Grab alexa rank and write to google spreadsheet using python


How to grab alexa rank using python?

By the end of this read you will be able to extract the Alexa rank of a website and write it to a Google spreadsheet programmatically. This can be helpful for SEO experts, who have to constantly analyze Alexa rankings and report them to their boss on a regular basis, or who simply track them for personal reference. Sometimes you have a huge number of websites to take care of, especially when your main expertise is search engine optimization, and visiting Alexa’s website to note down the rank of each one is a cumbersome task. The code provided below is written in Python. Running it with a valid Gmail address, password, spreadsheet id and worksheet id for the spreadsheet you want to work with will append the website link and the current Alexa rank, along with the date, to the spreadsheet. Don’t worry about the spreadsheet id and worksheet id, I will show you how to get them.

Requirements for extracting alexa rank and writing to a spreadsheet using python

 

1. email address and password associated with google docs

Your gmail address and password.

2. Spreadsheet id and worksheet id

Go to docs.google.com while logged in and create a new spreadsheet. Fill the first three columns of the first row with website, rank and date respectively. All the letters should be lowercase, with no whitespace. Now that you have created a new spreadsheet, take a look at the URL. It looks something like this one

https://docs.google.com/spreadsheets/d/1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU/edit#gid=0

The spreadsheet id (mentioned earlier) is present in the URL.

“1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU” in the above URL is the spreadsheet id we need. By default the worksheet id is ‘od6’.
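If you ever need to pull the spreadsheet id out of such a URL in code, a minimal sketch looks like this (the URL below is just the example from above):

url = 'https://docs.google.com/spreadsheets/d/1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU/edit#gid=0'
# The spreadsheet id sits between '/d/' and the next '/'
spread_sheet_id = url.split('/d/')[1].split('/')[0]
print(spread_sheet_id)  # 1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU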

3. List of websites in a text file

Since we’re writing a program that can extract alexa ranks for hundreds of websites and append it to google spreadsheet, therefore taking url from console input is never a good solution. We have to write the url of websites we want to take care of in a text file. Each website in a single line in the format www.domain.com. Make sure there is a valid website, one in each line because we will read the url from python line by line.
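For example, the text file (called websitesforalexa in the code below) might look like this; the domains here are only placeholders, use your own list:

www.google.com
www.wikipedia.org
www.example.com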

Grab alexa rank and write to a google spreadsheet

In the code below, make the necessary changes before you run it. Replace the email with your own email address, the password with your real password, and the filename with the name of the file that contains the URLs of the websites you want to take care of.

from urllib2 import urlopen
from bs4 import BeautifulSoup
import gdata.spreadsheet.service
import datetime

# Row to be appended; the keys must match the column headers of the spreadsheet
rowdict = {}
rowdict['date'] = str(datetime.date.today())

# Replace these with your own spreadsheet id and worksheet id
spread_sheet_id = '1VbNvh0TfFetKWU0hphrEyudXlJ-7m628p8Sbu82o8lU'
worksheet_id = 'od6'

# Log in to the Google spreadsheet service
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = 'email@gmail.com'   # your gmail address
client.password = 'Password'       # your gmail password
client.source = 'alexarank'
client.ProgrammaticLogin()

# Read the websites line by line and append one row per website
with open('websitesforalexa') as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        # Alexa's data endpoint returns XML containing the popularity rank
        soup = BeautifulSoup(urlopen('http://data.alexa.com/data?cli=10&dat=snbamz&url=' + url).read())
        rowdict['website'] = soup.popularity['url']
        rowdict['rank'] = soup.popularity['text']
        client.InsertRow(rowdict, spread_sheet_id, worksheet_id)

 

Modules required

For this program to run correctly we need the following modules:

BeautifulSoup

gdata

If you are on a Linux machine, the following commands should install the necessary modules:

sudo apt-get install python-bs4

sudo apt-get install python-gdata

If you are on some other operating system, or if the above commands didn’t work, you can get those modules from the following links.

https://code.google.com/p/gdata-python-client/downloads/list

https://pypi.python.org/pypi/beautifulsoup4/4.3.2
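If you prefer pip, the commands below should also install them; this assumes the packages are published on PyPI as beautifulsoup4 and gdata:

sudo pip install beautifulsoup4
sudo pip install gdata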

The other modules we use (urllib2 and datetime) are part of the Python standard library, hence we need not download or install them.

Getting the alexa rank using BeautifulSoup

BeautifulSoup lets us extract XML elements, which makes it a perfect fit for this program. Below is a screenshot of the XML we will parse to get the Alexa rank.

[Screenshot: parsing the XML returned by data.alexa.com to get the Alexa rank]

As seen in the XML above, the rank resides in the value of the attribute TEXT whose parent tag is <POPULARITY>. All we need to do is make a soup out of the XML returned by that URL and access the Alexa rank by tag and attribute.

soup.popularity['text']
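As a minimal sketch, assuming the response contains a <POPULARITY> tag like the one in the screenshot, the extraction boils down to the following (the XML string here is a made-up sample, not a real Alexa response):

from bs4 import BeautifulSoup

# Made-up sample of the relevant part of the response
xml = '<ALEXA><SD><POPULARITY URL="example.com/" TEXT="12345"/></SD></ALEXA>'
soup = BeautifulSoup(xml)
print(soup.popularity['url'])   # example.com/
print(soup.popularity['text'])  # 12345

Note that BeautifulSoup lowercases tag and attribute names, which is why we access the uppercase POPULARITY tag and TEXT attribute through their lowercase names.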

Writing alexa rank to google spreadsheet using python


In order to write the extracted rank to a Google spreadsheet programmatically we use the gdata module. To write to a spreadsheet we need the spreadsheet id, the worksheet id and a dictionary containing the values we want to write. The keys of the dictionary are the column headers (website, rank and date in our program) and the values are the strings to be written to the spreadsheet.
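As a rough sketch, a single insert looks like this, assuming the first-row headers of the worksheet are exactly website, rank and date, and that client, spread_sheet_id and worksheet_id have been set up as in the full program above:

# Placeholder values; the keys must match the first-row headers of the worksheet
row = {
    'website': 'www.example.com',
    'rank': '12345',
    'date': '2015-08-01',
}
client.InsertRow(row, spread_sheet_id, worksheet_id)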


