Posts Tagged: "writing to google spreadsheet using python"

Is It A WordPress Website Checker Script In Python

- - Applications, Python, Tutorials

By the end of this read, you will be able to code a program that will check for a number of domain names stored in a text file to verify whether or not a website is powered by wordpress. In this program we will write the result for each website in a google spreadsheet for later use but you can apply certain code if a website is wordpress powered at the mean time.

Python script to check if a bunch website is wordpress powered

Below is the program written in python programming language which reads a text file storing the names of the domain one per each line and checks if it a wordpress website and writes the status in a google spreadsheet.

 

from bs4 import BeautifulSoup
import mechanize
import gdata.spreadsheet.service
import datetime
rowdict = {}
rowdict['date'] = str(datetime.date.today())
spread_sheet_id = '1mvebf95F5c_QUCg4Oep_oRkDRe40QRDzVfAzt29Y_QA'
worksheet_id = 'od6'
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = "email@domain.com"
client.password = 'password'
client.source = 'iswordpress'
client.ProgrammaticLogin()
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]
base_url = br.open("http://www.isitwp.com/")
with open('websitesforwpcheck.txt') as f:
    for line in f:
        rowdict['website'] = str(line)
        br.select_form(nr=0)
        br["q"] = str(line)
        isitwp_response = br.submit()
        isitwp_response = isitwp_response.read()
        if "Good news everyone" in a:
            rowdict['iswordpresswebsite'] = "yes"
        else:
            rowdict['iswordpresswebsite'] = "no"
        client.InsertRow(rowdict,spread_sheet_id, worksheet_id)

Isitwordpresswebsite code explanation

Before beginning , we need to create a spreadsheet and make three rows with names website, date, isitwordpresswebsite

1. Line 1 to 4

These are the import statements for the libraries we will use in our program. We use BeautifulSoup to convert a response from a request as a soup object. Mechanize is used in order to skip the length of code for sessions and cookies. Gdata is used to connect to our google account in order to access the spreadsheet we want to work with. Datetime is used to get the current date at the time of script run.

2. Line 5

In line 5, we create an empty dictionary where we later create a key:value pairs for date, website name and status of a website(is it wordpress or not). Also google spreadsheet accepts a dictionary item/json which is then written to the spreadsheet. In line 6 we store the current date at the time of script run to a key “date” which is later in our program pushed to the google spreadsheet.

3. Line 7 to 14

Now, before proceeding forward, we need to create a google spreadsheet. Now another step is to take a look at the url of the spreadsheet we created. We need the spreadsheet id to access the spreadsheet via our program. Below is a screenshot of how to get the spreadsheet id. The worksheet id is ‘od6‘ by default. Line 9 to 14 gets us logged in to our google account and accesses the spreadsheet we want to work with.

4. Line 15 to 18

In line 15, we use the mechanize module’s method to initiate a browser. Line 16 states to ignore the robots.txt file. Line 17 adds a user agent to the browser. Line 18 lets us open the website “isitwp.com” which we will be using to check if a website is wordpress powered or not.

5. Line 19

Line 19 allows us to open the text file where we have the names of the domain(one per each line).

6. Line 20 to 21

In line 20, we iterate through the lines of the file where we have stored domain names. At each iteration, the domain name at the current line is stored to the variable line. In line 21, we create a key:value pair where we store the domain name at each iteration to the key “website.

7. Line 22 to 30

Line 22 is a statement to select the form of index 0 i.e the first form present in the website isitwp.com. Now value of the name attribute of the seachbox is “q” which is where we store the name of the domain before submitting the form. Line 24 submits the form after passing the value of domain name in the br[‘q’] and stores the response in a variable isitwp_response. Line 25 is the complete webpage response stored in variable isitwp_response. In line 26 we check if “Good news everyone” substring is present in the response which means the website is powered by wordpress else it is not. Line 27 then makes a key:value pair where the key “iswordpresswebsite” is given value “yes” if the condition on line 26 passed and “no” if the condition failed. Now remember the key of the dictionary rowdict must be the name of the row in our spreadsheet.

This way we can test if a website is wordpress powered for a number of website names stored in a text file(one at each line). Thanks for reading :) . If you have any questions regarding the article or the code, comment below so we can discuss on that part.