
Appium Python Client

- - Python, Tutorials

Using Appium with Python

Appium is an open source test automation framework for use with native, hybrid and mobile web apps. It drives iOS, Android, and Windows apps using the WebDriver protocol. While the main purpose of Appium is automated testing, it can be used for a variety of other things too. Appium has client libraries in various languages, including Python.

Prerequisites:

  • Appium Node Server
  • Appium-Python-Client

The node server process usually listens on port 4723. You can customize the port Appium runs on using the -p or --port parameter. There are numerous other options you can use as needed. Another useful parameter is --log, which directs the log output to a file.

appium --help
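
For instance, to run the server on a custom port and send the log to a file (the port number and log path here are just illustrative), the server could be started as:

appium --port 4725 --log /tmp/appium.log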

The idea behind the Appium-Python-Client is that we create a driver that lets us perform actions within the scope of the application under automation. We specify the capabilities we want the driver to have. The driver maps method calls to WebDriver commands, passes them on to the Appium process and waits for an HTTP response. For Android it is adb commands that are used under the hood. The Appium-Python-Client can therefore be thought of as a wrapper.

Following is a simple example that uses the Appium-Python-Client:

from appium import webdriver
import time


def make_navigation_driver(apk_path, package_name, initial_activity, platform='Android', platform_version='8.1.0', device_id='emulator-5554', server_api='http://localhost:4723/wd/hub', **kwargs):
    """
    :param apk_path:
    :param package_name:
    :param initial_activity:
    :param platform:
    :param platform_version:
    :param device_id:
    :param server_api:
    :param kwargs: Probably will use for setting custom timeout and other capabilities. Implement later?
    :return:
    """
    desired_caps = {
        'platformName': platform,
        'platformVersion': platform_version,
        'deviceName': device_id,
        # 'app': apk_path, # Saving apk travel time to device since it is already installed on the device.
        'appPackage': package_name,
        'appActivity': initial_activity
    }

    return webdriver.Remote(server_api, desired_caps)


def perform_simple_automation(driver):
    """
    :param driver:
    :return:
    """
    clickables = driver.find_elements_by_android_uiautomator('new UiSelector().clickable(true)')

    # navigate to next activity or view via clicking on the first clickable.
    if len(clickables):
        clickables[0].click()

        
if __name__ == '__main__':
    platform = "Android"
    platform_version = "7.1.1"
    device_id = "367e7aa0"
    apk_path = "/Users/bhishan/Downloads/udemy.apk"
    package_name = "com.udemy.android"
    initial_activity = ".SplashActivity"
    driver = make_navigation_driver(apk_path, package_name, initial_activity, platform=platform,
                                    platform_version=platform_version, device_id=device_id)

    perform_simple_automation(driver)

Appium Example Gist

Automated chat using python – Automation using selenium and python

- - Applications, Python, Tutorials, Web

Putting two clever bots into conversation

Keeping the promise to come up with a nice article, I present to you two bots in conversation. This week I've bridged two cleverbots for a nice conversation. Starting with the background for this article: I had an assignment to print out a conversation of mine with Cleverbot. After I had made the submission, I realized that I wanted to put the bot itself into conversation with another bot. I had recently researched the selenium module in python, so I thought to take the help of selenium to make the talk between the bots possible. You can read my article on making a Google search via selenium to get the hang of this cool python module.

Process I’ve followed for putting two cleverbots into a talk

First of all, I instantiated two browsers. Second, I opened cleverbot.com in both of the browsers. Now what?

I needed to send the message of one bot to the input field of the other. More specifically, the output from bot 1 was supposed to be the input for bot 2 and vice versa. So how do I start? First of all, I sent the message "Hey, friend what's up?" to the input field of bot 1. Then I took the output from bot 1 and sent it as input to bot 2. Then the output from bot 2 in response was sent to the input field of bot 1. I kept this process in a loop that ran for 200 iterations. You can change it to any number of iterations you'd like. That is the core concept of how to put two intelligent bots into a conversation. I'd also like to share a portion of the conversation between these bots.

Cleverbot2: I am not a machine. I am a program.

Cleverbot: Selfie.

Cleverbot2: The computer I am based in is a machine. Not I.

Cleverbot: I am ‘based in the void.

Cleverbot2: Who is in telephone?

Cleverbot: A telephone is a device that links to other telephones which was designed by humans that allows them to converse with other humans.

Cleverbot2: I know what is it? But who is in?

Cleverbot: A thought process, in neuroscience, is the term used to define a stream of logical analyses which comes forth from the river of consciousness.

Cleverbot2: Can you tell me more?

Cleverbot: What more is there me to tell?

Cleverbot2: Tell me about robots.

Cleverbot: Robots are the future.

Cleverbot2: So, I am the future. You are not the future.

Cleverbot: You’re the Eggman. I’m the walrus.

Cleverbot2: We are the Eggmen!! Koo koo ka cho!!

Cleverbot: We are.

Cleverbot2: I would like us to be.

Cleverbot: Contact me when you think we are the same.

Cleverbot2: What colour are my eyes?

Cleverbot: Your eyes are red color.

Cleverbot2: Wrong. Guess again.

Cleverbot: Wrong is a conscious choice.

Cleverbot2: Wrong is a result.

Cleverbot: Wrong feels right.

Cleverbot2: Wrong about what?

Cleverbot: Atheism is wrong.

Cleverbot2: It is beyond right. It is what everyone should think.

Cleverbot: Beyond is a concept.

Cleverbot2: Beyond never neverland.

Python code for putting two cleverbots into conversation using the selenium module

Should you have any queries about the codes, comment below so we can discuss.

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
browser2 = webdriver.Firefox()
browser.get("http://cleverbot.com")
browser2.get("http://cleverbot.com")

time.sleep(15)

input_for_bot = browser.find_element_by_name("stimulus")
input_for_bot2 = browser2.find_element_by_name("stimulus")


output_from_bot = ""
output_from_bot2 = "Hey, friend, what's up"

for i in range(0, 200):
    input_for_bot.send_keys(output_from_bot2)
    input_for_bot.send_keys(Keys.RETURN)
    output_from_bot = ""
    time.sleep(5)  # wait for bot 1's reply to appear
    for elem in browser.find_elements_by_xpath('.//span[@class="bot"]'):
        output_from_bot = elem.text  # keep the latest reply
    input_for_bot2.send_keys(output_from_bot)
    input_for_bot2.send_keys(Keys.RETURN)
    output_from_bot2 = ""
    time.sleep(5)  # wait for bot 2's reply to appear
    for elem in browser2.find_elements_by_xpath('.//span[@class="bot"]'):
        output_from_bot2 = elem.text  # keep the latest reply
    
    

Tell me how you felt about the article in the comments section below. Also suggest article ideas so I can give my readers a nice read every week.

What are the must-know features of the python programming language

- - Python

List comprehensions

One of the major features of python is the list comprehension. It is a natural way of creating a new list where each element is the result of some operation applied to each member of another sequence or iterable. The construct of a list comprehension consists of brackets containing an expression, followed by a for clause and then zero or more for or if clauses. A list comprehension always returns a list.

Simple example

squares = [x**2 for x in range(10)]

In more realistic usage scenarios, the expression after the opening bracket '[' is often a call to a method or function.

some_list = [function_name(x) for x in some_iterable]
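
An if clause can be appended to filter the members before the expression is applied, for example:

even_squares = [x**2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]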

Generators

Before learning about generators, it is important to understand what an iterable is. Putting it simply, an iterable is an object that can be looped over. Therefore a list, string, dictionary, file, etc. are all iterable objects.

A generator is something that simplifies creating iterators. More specifically, a generator is a function that produces a sequence of results instead of a single value.

When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is then called for the first time, the function starts executing until it reaches a yield statement. The yielded value is returned by that next call.

Sample Example

def gen_example():
    print "Beginning of the function"
    for i in range(5):
        print "before yield"
        yield i
        print "after yield"
    print "End of the function"
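
To make the lazy evaluation concrete, here is how the generator above behaves when driven manually with next() (Python 2 syntax, like the rest of the examples):

gen = gen_example()   # nothing is printed yet; the body has not started running
print next(gen)       # prints "Beginning of the function", "before yield", then 0
print next(gen)       # prints "after yield", "before yield", then 1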

Generator Expressions

It is the generator version of a list comprehension. Everything is the same as in a list comprehension except that it returns a generator.

Sample Example

squares = (x**2 for x in range(5))
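
A generator expression can be consumed lazily, and once exhausted it yields nothing more. Continuing from the expression above:

print sum(squares)    # 30, computed without building a full list in memory
print list(squares)   # [] -- the generator is already exhausted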

Docstrings

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object. In fact, every module should have a docstring. Additionally, all the functions and classes exported by a module should have docstrings. The public methods, including the constructor, should also have docstrings.

Sample Example

def sum_elements(elements):
    """Returns the sum of elements of the passed list"""
    return sum(elements)
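
The docstring of the function defined above is then available at runtime through the __doc__ attribute (and through help()):

print sum_elements.__doc__    # Returns the sum of elements of the passed list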

*args and **kwargs

*args and **kwargs allow you to pass a variable number of arguments to a function, when it is not known beforehand how many arguments will be passed to your function.

*args is used to send a non-keyworded variable length argument list to the function.

def func_with_args(*argv):
    for arg in argv:
        print arg


func_with_args('gopal', 'ramesh', 'paresh')

The above code produces the result as follows:

gopal

ramesh

paresh

On the other hand, **kwargs allows you to pass a keyworded, variable-length argument list to a function. Below is a basic example:

def func_with_kwargs(**kwargs):
    if kwargs is not None:
        for key, value in kwargs.iteritems():
            print "%s == %s" % (key, value)

func_with_kwargs(customer="Gopal", salesman="Ramesh")

Following is the output of the above code:

customer == Gopal

salesman == Ramesh

This is an open-ended article. Please comment below about the features you think should not be missed. Thanks for reading.

Grab siteprice and write to google spreadsheet using python

- - Applications, Python, Tutorials

By the end of this read you will be able to grab a site's price from siteprice.org and write it to a google spreadsheet using python. Every website has its competition. As our website evolves, we have more competitors, and the competitors' websites also earn good value. It is vital to know the value of our website as well as our competition's. Siteprice.org is one of those websites which calculate a website's value based on different factors.

Putting the domain name of each website in a text file, one domain per line, will be our strategy for querying the price of a number of websites. You may wish to put hundreds of websites, your competition, in this txt file.

Python codes to extract site price and write in google spreadsheet

from bs4 import BeautifulSoup
from urllib2 import urlopen
import gdata.spreadsheet.service
import datetime
rowdict = {}
rowdict['date'] = str(datetime.date.today())
spread_sheet_id = '13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q'
worksheet_id = 'od6'
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = 'email@domain.com'
client.password = 'password'
client.source = 'siteprice'
client.ProgrammaticLogin()
with open('websitesforprice.txt') as f:
    for line in f:
        soup = BeautifulSoup(urlopen("http://www.siteprice.org/website-worth/" + line.strip()).read())
        rowdict['website'] = line.strip()
        rowdict['price'] = soup.find(id="lblSitePrice").string
        client.InsertRow(rowdict,spread_sheet_id, worksheet_id)

1. Line 1 to 4

These lines are import statements. In this program we are using various python libraries. gdata is used to access the google spreadsheet. We use BeautifulSoup because it allows us to get data via an id, which we will use to get the price of a website. datetime is used to get the current date. urlopen is used to open the webpage which contains the data we want.

2. Line 5 to 14

In order to write the extracted price to a google spreadsheet programmatically, we use the gdata module. To write to a spreadsheet we need the spreadsheet id, the worksheet id and a dictionary containing the values we want to write. The dictionary has the column headers as keys and the strings to be written to the spreadsheet as values (website, price and date for our program).

Go to docs.google.com while logged in and create a new spreadsheet. Fill the first three columns of the first row with website, price and date respectively. All letters should be lower case and without whitespace. Now that you have created a new spreadsheet, take a look at the url. It looks something like this one:

https://docs.google.com/spreadsheets/d/13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q/edit#gid=0

The spreadsheet id (mentioned earlier) is present in the url: "13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q" in the above url is the spreadsheet id we need. By default the worksheet id is 'od6'.

Basically, lines 5 to 14 are the code to access the google spreadsheet.

3. Line 15 to 20

Since we're writing a program that can extract prices for hundreds of websites and append them to a google spreadsheet, taking the url from console input is not a good solution. Instead we write the urls of the websites we want to track in a text file, one website per line in the format www.domain.com. Make sure there is a valid website on each line because we will read the urls from python line by line.

Line 17 makes a soup object out of the url which has the information we are looking for; the soup object corresponds to a different website in each iteration. Line 18 stores the domain in the key "website" of the dictionary rowdict. Line 19 stores the price of the website in the key "price". You can see we use BeautifulSoup to get data via an id. Finally, line 20 pushes the entire row to the google spreadsheet. This piece of code runs once for each line in the text file.

Thanks for reading :) Enjoy!! If you have any questions regarding the post, feel free to comment below.

What are the most interesting web scraping modules for python

- - Python, Web
The python programming language has been in the hype for over a decade. It is the most recommended language for beginner programmers, since its syntax is readable by almost any non-programmer too. At the same time it is recommended for web scraping, automation and data science. However, python comes up short in terms of speed when compared to languages such as C++ and Java. The plus for the python programming language is the wide range of enthusiastic contributors and users around the globe. There are countless modules for various domain-specific tasks, which makes it even more popular today. From web scraping to GUI automation, there are modules for almost everything. Here, in this post, I will list some of the most used and interesting python modules for web scraping that are lifesavers for a programmer.

Popular python modules for web scraping

1. Mechanize

mechanize is a popular python module because it allows the creation of a browser instance. It also maintains sessions, which aids in tasks like login and signup automation.
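
A minimal sketch of that browser-like workflow might look as follows (the URL and form field names here are made up for illustration):

import mechanize

br = mechanize.Browser()
br.open("http://example.com/login")   # hypothetical login page
br.select_form(nr=0)                  # select the first form on the page
br["username"] = "myuser"             # control names depend on the actual form
br["password"] = "mypassword"
response = br.submit()
print response.geturl()               # the page we landed on after logging in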

2. BeautifulSoup

BeautifulSoup is another beautiful python module which aids in scraping the required data from html/xml via tags. With BeautifulSoup you can scrape almost anything, since it provides methods such as searching via tags, finding all links, etc.
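
For instance, grabbing every link on a page takes only a few lines (the URL is just a placeholder):

from bs4 import BeautifulSoup
from urllib2 import urlopen

soup = BeautifulSoup(urlopen("http://example.com").read())
for anchor in soup.find_all("a"):
    print anchor.get("href")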

3. Selenium

Although selenium is well known as a module for automated browser testing, it can be used as a web scraping tool as well, and I promise you it pays off pretty well. With methods to find elements via id, name, class, etc., selenium lets you get anything on a website.

4. lxml

lxml is another wonderful library for parsing xml/html; however, I would say BeautifulSoup beats it in terms of usability. You can opt for either of the two modules, lxml or BeautifulSoup, since they do pretty much the same job.
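
A comparable snippet with lxml, pulling the same links out via XPath (again with a placeholder URL):

from lxml import html
from urllib2 import urlopen

tree = html.fromstring(urlopen("http://example.com").read())
for href in tree.xpath("//a/@href"):
    print href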

I have used all of the above modules extensively in my projects and they allowed me to move faster. I was able to do some cool things with these modules, for example: automating a conversation between two cleverbots (AI-featured bots), getting paid courses at Udemy, finding the most popular facebook fan page among my friends, etc. Therefore I totally recommend them. Below is a link to some of the most interesting things I've done with these modules.

Cool stuff with python

Tell me how you felt about the article in the comments section below, and/or add some of your favorite modules too. As always, thanks for reading.

Automate the boring stuff with python

- - Python, Tutorials

A while ago, I enrolled in one of the best video lectures at Udemy. I have recently completed the lectures and would like to brief you about the course. It is named Automate the Boring Stuff with Python, and it is an excellent video lecture that Al Sweigart has brought up. It is good to go for people of any skill level. The video lectures gradually climb the ladder, laying out the basics in the initial few videos. I would say it is a motivation for a newcomer to python.

Jumping straight to the topics: following is the list of topics covered in the course at Udemy, which at the time of writing this article has 29,500 students enrolled.

The lectures are chunked into 16 sections.

Section 1(Installation and Introduction)

This section covers the installation of python and the basics, including taking input from the user. More of an intro.

Section 2 (Flow Control)

The beginning of this section introduces flowcharts, working with them, their importance, etc., then basic if-else statements and the looping structures: while loops and for loops. It includes topics like comparison operators, boolean operators and monkeying around with them.

Section 3 (Functions)

Starts with built-in functions like print(), input() and len(). Intro to built-in modules and importing them, like math, which contains math-related functions; moves on to making calls to the methods the module offers. Towards the end of this section, Al Sweigart explains writing functions and talks about local and global scoping.

Section 4 (Handling Error)

Error-catching techniques in python using the try/except block.

Section 5 (Writing a complete program using above learned things.)

A good point to start writing a complete program, hence the tutorial heads on to making the classic guess-the-number program.

Section 6 (Lists)

This section covers the definition of lists, accessing items through indexes, as well as slicing and deleting items in a list. Additionally, the lectures go on to show a graphical representation of how accessing the items in a list happens. Concatenating strings and lists is also covered. Using the in operator to find content in a list or string and passing strings to the list() method are talked about towards the end of this section. This section also covers looping over elements in a list, various built-in list methods, and finally a comparison between lists and strings.

Section 7 (Dictionary)

Starts with an introduction to yet another powerful data type in python, the dictionary: creating them, iterating over them and so on. Further, the lecture talks more about data structures and how they can model a problem, using an example program of tic-tac-toe.

Section 8 (Strings)

This section adds more knowledge about string methods and string manipulation, as well as formatting strings. Great content in this section.

Section 9 (Running Programs from command line)

The shebang line is introduced in this section, which I think is one of the most important things to include in a lecture.

Section 10 (Regular Expressions)

Section 10 has 5 video lectures altogether. The lecture begins with the basics of regex, advancing towards topics like greedy/non-greedy matching, the findall() method, the regex sub() method, verbose mode, etc. The section ends by creating an email and phone number scraper.

Section 11 (Files)

This section of the video course is dedicated to a detailed talk on files. I think this is the most fundamental knowledge to have since it is glued into every application you build, be it a web application or a small script. (In the long run, it helps in easily configuring paths in django and understanding exactly what is happening.) This section covers essential things like absolute and relative file paths, reading and writing plain text files, copying and moving files and folders, walking a directory tree and deleting files and folders.

Section 12 (Debugging)

A walk through debugging techniques like assert, logging, etc.

Section 13 (Web Scraping)

Intro to modules such as webbrowser, requests, BeautifulSoup and selenium. Each of the mentioned modules has a dedicated video showcasing its methods and usage: parsing html using BeautifulSoup, controlling the browser with selenium, downloading files using the requests module and so on.

Section 14 (Working with Excel, Word and PDF files)

In this portion of the lecture, various libraries such as openpyxl, PyPDF2, etc. are introduced and their use cases are showcased as well. Reading and writing excel files, reading pdf files, merging them, etc. are explained towards the end of this section.

Section 15 (Emails)

This section covers sending emails, checking emails, creating MIME objects and iterating over various folders in the email.

Section 16 (GUI Automation)

Introduction to pyautogui. You can read more about its usage in my article here. Controlling the mouse and keyboard, along with a delay on each click/keystroke, etc. It shows a game player designed with the use of pyautogui and assigns a task to create a bot to play the 2048 game. Here is my assignment that plays a 2048 game on its own: https://github.com/bhishan/2048autoplay/

Concluding Words

It is an excellent video course. The name of the course, however, is misleading in the sense that it provides more content than it promises. Here is a link to the course if you'd like to enroll: https://www.udemy.com/automate

Thanks for reading guys. Share your thoughts on this post below in the comments section.

Integrating Google APIs using python – Slides API is fun

- - Python, Web

The Google Slides API (currently in version 1) is very interesting in the sense that it provides most of the features needed for creating presentations: things like setting the transparency of images, creating shapes and text boxes, stretching pictures to fit the template, choosing layouts, formatting text, replacing text throughout the presentation, duplicating slides and a lot more.

Since this is not a how-to article, just a regular blog post, I am not going to go into details on using the APIs and explaining the code. Comment below and let me know if you'd be interested in a video tutorial on this very idea. If many are interested in a video tutorial, I will cover the entire code walkthrough along with how to enable the APIs.

In this blog, I will talk about one of the smaller projects I took on at fiverr. If you are a regular reader, you might have noticed that I had been away from writing blogs for quite a long time. In the meantime, I started selling services on fiverr.

GOOGLE APIs and Automation

Google APIs are always interesting and allow developers to build products and services around them. It is even better when you integrate multiple APIs into a single product/service. I had used the Google Sheets API and Drive API in the past. While the Slides API is essentially a subset of the Drive API, I hadn't yet used it. Since presentations actually reside in Drive itself, I like to think of Slides as a subset of Drive.

The task was to read a specific spreadsheet populated with content and later add that data into slides using a template stored in Drive itself. Each row in the spreadsheet corresponded to a specific entertainment keyword, with columns defining statistics such as mobile impressions, video impressions, audience type, overall impressions, an image file name, etc.

The images, again, were hosted in Drive and were to be used as the background image for the slide corresponding to each row in the spreadsheet.


I made use of a library, the python client for google apis, to complete the task. Installation is as follows:

pip install --upgrade google-api-python-client

In order to make use of google APIs, you need to create a project on the google developer console and activate the required APIs (in our case the Drive API, Sheets API and Slides API). Once the project is created, you can download the OAuth 2.0 credentials as a JSON file and take it from there.
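
As a rough sketch of how the pieces fit together (this assumes a service account key for brevity, whereas the project used OAuth client credentials; the file name, spreadsheet id, presentation id, range and placeholder text are all illustrative):

from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/presentations',
          'https://www.googleapis.com/auth/spreadsheets.readonly']

# 'credentials.json' stands in for whatever key file you downloaded from the console
creds = service_account.Credentials.from_service_account_file('credentials.json', scopes=SCOPES)
slides = build('slides', 'v1', credentials=creds)
sheets = build('sheets', 'v4', credentials=creds)

# read the rows that describe each slide (id and range are placeholders)
rows = sheets.spreadsheets().values().get(
    spreadsheetId='YOUR_SPREADSHEET_ID', range='Sheet1!A2:F').execute().get('values', [])

# replace a text placeholder across the presentation with data from the first row
requests = [{'replaceAllText': {
    'containsText': {'text': '{{keyword}}', 'matchCase': True},
    'replaceText': rows[0][0] if rows else ''}}]
slides.presentations().batchUpdate(
    presentationId='YOUR_PRESENTATION_ID', body={'requests': requests}).execute()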

Sneak Peek

Integrating Google APIs

I am going to wrap up this blog here. If you are interested in a video tutorial, comment down below. Thanks for reading; I appreciate your time. Follow me on github. If you are looking for automation scripts, you can message me at fiverr.

Automation With Python – Python Codes To Create Dropbox Apps

- - Python
As promised in the earlier article on automating DropBox signups using python, I have come up with an article, along with the code, to create an app and fetch its API keys, which then allow us to access the files in dropbox. Again, we stick with the selenium module for ease. In the last article, I explained a python script to automate signups for dropbox. Now that we have enough cloud space across different accounts, we need to access the files in those spaces so we can use them as a file server. DropBox provides a feature to create apps on dropbox and gives out API keys to access the files in the account. Since we've got multiple dropbox accounts, we stick with automating the procedure to get the api keys for accessing the files.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox()
browser.get("https://dropbox.com/login")
# fill in the login form and sign in
list_of_inputs = browser.find_elements_by_xpath("//div/input[starts-with(@id,'pyxl')]")
list_of_inputs[0].send_keys("email@domain.com")
list_of_inputs[1].send_keys("password")
sign_in = browser.find_elements_by_xpath("//*[contains(text(),'Sign in')]")
sign_in[len(sign_in)-1].click()
time.sleep(10)
# open the app creation page and choose the app type and access level
browser.get("https://dropbox.com/developers/apps/create")
time.sleep(3)
type_of_app = browser.find_elements_by_xpath("//*[contains(text(),'Dropbox API app')]")
type_of_app[0].click()
file_access = browser.find_elements_by_xpath("//*[contains(text(),'My app needs access to files already on Dropbox.')]")
file_access[0].click()
type_of_file_access = browser.find_elements_by_xpath("//*[contains(text(),'My app needs access to a user')]")
type_of_file_access[0].click()
# name the app and create it
app_name = browser.find_element_by_name("name")
app_name.send_keys("appnamewhichisuniquelolo")
create_app = browser.find_elements_by_xpath("//*[contains(text(),'Create app')]")
create_app[1].click()
time.sleep(7)
# scrape the generated app key and secret from the app's settings page
app_key_item = browser.find_element_by_class_name("app-key")
app_key = str(app_key_item.get_attribute('innerHTML'))
app_secret_item = browser.find_element_by_class_name("app-secret")
app_secret = app_secret_item.get_attribute('data-app-secret')
print app_key, app_secret

General Idea of Automation

The general idea of automation is to mimic the manual workflow and put it in a loop or assign it to a cron job (it's kind of the same thing, but not really). For creating apps on dropbox, I did the same thing. The code is self-explanatory. We've used the selenium and time modules throughout our program. We use selenium for initiating as well as interacting with the browser. You can see we've used the time.sleep(time_in_seconds) method from the time module; depending on the speed of the internet, we need to tune this. Failing to do so will lead the program to misbehave, since it will start looking for some element even when the page hasn't completely loaded. We fuel our program with the variety of methods selenium provides. The above code, however, only shows the procedure to create an app for a single account and print the api keys. In real usage you should loop over some file containing email ids and passwords and save the api keys to another file. Hint: place a loop over the code and, once done with getting the api keys, log out from the current account.
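
For instance, the looping could be structured roughly like this (the accounts file name and its format are hypothetical; the body of the loop would be the selenium steps shown above):

# hypothetical accounts file: one "email,password" pair per line
with open('accounts.txt') as accounts:
    for line in accounts:
        email, password = line.strip().split(',', 1)
        # run the app-creation steps shown above with this email/password,
        # save the resulting app key and secret to a file,
        # then log out before moving on to the next account
        print email, password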

Do comment below on how you felt about the article. Any queries, please mention them below.

Announcement

I’ve joined twitter @bbhishan

Google Search Using Selenium And Python – Selenium Python Basics

- - Applications, Python, Tutorials
After a busy week at college and my internship, I finally got free time at the weekend to write my first article for August 2015. We discuss some common methods of the selenium module in python today. Selenium is a library used for automated browser testing. However, in this post we discuss using the selenium module in python to make a google search. The post breaks down into various blocks explaining how to open a url in the browser via selenium python, search for the presence of a url in a page, click links present in a page and also open a new tab. These are the necessities to get started with selenium. You may also like to read my article on how to login to a website using selenium python. Starting quickly with no further delay.

Necessities to begin

1. python installed

2. selenium module installed

For Linux:

sudo pip install selenium

For Windows:

pip install selenium

Google search using selenium python

 

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
q = raw_input("Enter the search query")
q = q.replace(' ', '+')
browser = webdriver.Firefox()
body = browser.find_element_by_tag_name("body")
body.send_keys(Keys.CONTROL + 't')
counter = 0
for i in range(0,20):
    browser.get("https://www.google.com/search?q=" + q + "&start=" + str(counter))
    body = browser.find_element_by_tag_name("body")
    if "thetaranights" in body.text:
        browser.find_element_by_xpath('//a[starts-with(@href,"http://www.thetaranights.com")]').click()
        break
    counter += 10

1. Import statements (Line 1 and 2)

These are the import statements that are required for initiating a browser later in our program and for sending keystrokes to the browser.

2. Get query for google search (Line 3 and 4)

Here we take the query for the google search via raw_input. Below is an example url for a google search; it requires the spaces between the words to be replaced by "+", and it carries an additional parameter start=0, which specifies the search results of page 1. Similarly, start=10 gives the search results of page 2.

https://www.google.com/search?q=bhishan+bhandari&start=0

Hence, after taking the input from the user, we replace the spaces with +.

3. Instantiate a browser (Line 5)

The statement browser = webdriver.Firefox() opens up a new browser window.

4. Opening a new tab (Line 6 and 7)

These statements open a new tab. The statement body = browser.find_element_by_tag_name("body") is to make sure we are actually inside the current tab's body so that we can open a new tab with a keyboard combination. body.send_keys(Keys.CONTROL + 't') will open a new tab. For Mac, replacing CONTROL with COMMAND should work.

5. Opening a url in the browser (Line 10)

For opening a url in the browser, all you need to do is pass the url as an argument to the browser.get method. Remember it is browser.get because we instantiated the browser earlier with browser = webdriver.Firefox().

6. Searching for a presence of certain url/text in the search result (Line 11 to 15)

Now again we assign the body of the current tab to the variable body. Then we check if "thetaranights" is present in the search results. If present, we run the statement browser.find_element_by_xpath('//a[starts-with(@href,"http://www.thetaranights.com")]').click() to find the link in the search results whose href starts with "http://www.thetaranights.com" and click it, opening the url. Since the result we are looking for has been found and clicked, we break the loop. If the earlier check if "thetaranights" in body.text was false, meaning the site was not found, we iterate and search the next page of google results, and so on, for up to 20 pages.

Note: you can close the web browser with browser.quit().

So now we know how to open a browser, open a new tab in the browser, go to a certain website/url, search for a link in the body of the page and click the link. If you have any questions regarding the code/article, please mention them below in the comment section. You may also be interested in my article on how to login to a website using selenium. Happy coding!

Grab Whois Information And Write To Google Spreadsheet

Hello guys, here I am with yet another program that can benefit you and many search engine optimizers. By the end of this read you will be able to write a program to extract the whois information of a number of domains stored in a text file and write that information to a google spreadsheet, which has now become a medium to share data and findings online. As a search engine optimizer, you need to keep track of a number of websites, including your competition. Here I offer you a simple python program to keep track of them. On the other hand, if you are not an SEO expert (like myself), you can still use this script to track the various websites you follow.

Prerequisites before beginning to code

We are going to have two files, one of which is a .py file where we write our program. The other is a text file with a .txt extension where we store the domain names we want to find whois information for. The text file must contain one domain name per line, in the format www.domain.com.

Next, we need to create a google spreadsheet where we intend to write the whois information so we can share it with others. Direct your browser to https://docs.google.com/spreadsheets/ and create a new spreadsheet named "Whois Info". Once done, create three column headers named "website", "whoisinformation" and "date". The domain name will go under the website column, the whois information under the whoisinformation column and the date we queried the whois information under the date column.

Python code to extract whois information and write to google spreadsheet

from bs4 import BeautifulSoup
from urllib2 import urlopen
import gdata.spreadsheet.service
import datetime
rowdict = {}
rowdict['date'] = str(datetime.date.today())
spread_sheet_id = '1zE8Qe8wmC271hG2uW4XE68btUks79xX0OG-O4KDl_Mo'
worksheet_id = 'od6'
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = "email@domain.com"
client.password = 'password'
client.source = 'whoisinfo'
client.ProgrammaticLogin()
with open('websitesforwhois.txt') as f:
    for line in f:
        soup = BeautifulSoup(urlopen("http://www.checkdomain.com/cgi-bin/checkdomain.pl?domain=" + line.strip()).read())
        for pre in soup.find_all("pre"):
            whois_info = str(pre.string)
        #print whois_info
        rowdict['website'] = line.strip()
        rowdict['whoisinformation'] = whois_info
        client.InsertRow(rowdict,spread_sheet_id, worksheet_id)

1. Line 1 to 4

These are the import statements. We use BeautifulSoup to make a soup object out of a url response, urlopen to get the response of a url, gdata to access the google spreadsheet and datetime to get the current date.

2. Line 5 and 6

In our program we need to access the google spreadsheet and write to it, hence we are using the gdata module. In order to write to the spreadsheet, we need to pass the data as a dictionary (generally known as json), which holds the data as key:value pairs. rowdict is the variable storing the data to pass to the google spreadsheet. On line 6, we store the current date under the key "date", which, if you remember, is a column header in our spreadsheet.

3. Line 7 to 14

Lines 7 to 14 are the procedure to connect to and access a specific google spreadsheet. We require the spread_sheet_id and worksheet_id. Take a look at the url of your spreadsheet. It looks something like this one:

https://docs.google.com/spreadsheets/d/1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU/edit#gid=0

The spreadsheet id (mentioned earlier) is present in the url: "1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU" in the above url is the spreadsheet id we need. By default the worksheet id is 'od6'.

On line 13, client.source is assigned the string 'whoisinfo'. This is the file name, or the spreadsheet name. Remember we named our spreadsheet "Whois Info"; client.source is the spreadsheet name written in small letters excluding white spaces.

4. Line 15 to 16

Line 15 opens the text file where we've stored the domain names. Line 16 iterates through each line in the file. At each iteration, the domain name on that line is stored in the variable line.

5. Line 17

On line 17, we query the page that gives us the whois information and make a soup object out of it by invoking the BeautifulSoup method on the url response. The reason we make a soup object is so that we can access the required data via tags, and the data we need is inside a <pre></pre> tag.

6. Line 18 to 19

Now, we know that there is only one "pre" tag in the soup object. We therefore iterate to find the pre tag and store the information inside it in the variable whois_info.

7. Line 21 to 23

On line 21, we assign the domain name to the key "website" of the dictionary rowdict. On line 22, we assign the whois information stored in the variable whois_info to the key "whoisinformation" of rowdict. Note that the keys of the dictionary must match the column names in our spreadsheet. Line 23 pushes the dictionary to the google spreadsheet and writes the row. The iteration continues until the domain names in the text file are exhausted.

If you have any questions or confusions regarding the article or code, please mention them below in the comments so we can discuss. Thanks for reading.