Google Search Using Selenium And Python Selenium Python Basics

- - Applications, Python, Tutorials
After a busy week at college and my internship, I finally have some free time this weekend to write my first article for August 2015. Today we discuss some common methods of the selenium module in Python. Selenium is a library used for automated browser testing. In this post, however, we use it to make a Google search. The post is broken into blocks explaining how to open a URL in the browser via Selenium, check whether a URL is present in a page, click links in a page and open a new tab. These are the essentials for getting started with Selenium. You may also like to read my article on how to login to a website using selenium python. Let's get started without further delay.

Necessities to begin

1. python installed

2. selenium module installed

For Linux:

sudo pip install selenium

For Windows:

pip install selenium

Google search using selenium python

 

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
q = raw_input("Enter the search query")
q = q.replace(' ', '+')
browser = webdriver.Firefox()
body = browser.find_element_by_tag_name("body")
body.send_keys(Keys.CONTROL + 't')
counter = 0
for i in range(0,20):
    browser.get("https://www.google.com/search?q=" + q + "&start=" + str(counter))
    body = browser.find_element_by_tag_name("body")
    if "thetaranights" in body.text:
        browser.find_element_by_xpath('//a[starts-with(@href,"http://www.thetaranights.com")]').click()
        break
    counter += 10

1. Import statements (Line 1 and 2)

These import statements bring in webdriver, which we need to launch a browser later in the program, and Keys, which lets us send keyboard keys (such as CONTROL) to the browser.

2. Get query for google search (Line 3 and 4)

Here we take the query for the Google search via raw_input. Below is an example URL for a Google search. The spaces between the words must be replaced by "+", and the additional parameter start=0 selects page 1 of the search results. Similarly, start=10 gives page 2.

https://www.google.com/search?q=bhishan+bhandari&start=0

Hence, after taking the input from the user, we replace the spaces with +.
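As a minimal sketch of the same idea, the standard library can build the query string so that spaces and special characters are handled for us. It is written for Python 3 (the script in this post uses Python 2), and build_search_url and the two constants are illustrative names, not part of the original script.

```python
from urllib.parse import urlencode  # Python 3; on Python 2 use `from urllib import urlencode`

GOOGLE_SEARCH = "https://www.google.com/search"
RESULTS_PER_PAGE = 10  # start=0 is page 1, start=10 is page 2, and so on


def build_search_url(query, page=0):
    """Return the search URL for `query`; `page` is zero-based (hypothetical helper)."""
    params = {"q": query, "start": page * RESULTS_PER_PAGE}
    # urlencode turns spaces into "+" and escapes characters such as "&"
    return GOOGLE_SEARCH + "?" + urlencode(params)


print(build_search_url("bhishan bhandari"))
print(build_search_url("bhishan bhandari", page=1))
```

The first call prints the example URL shown above; the second prints the URL for page 2.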

3. Instantiate a browser (Line 5)

The statement browser = webdriver.Firefox() opens up a new browser window.

4. Opening a new tab (Line 6 and 7)

These statements open a new tab. The statement body = browser.find_element_by_tag_name("body") makes sure we are inside the current tab's body, so that we can open a new tab with a keyboard shortcut. body.send_keys(Keys.CONTROL + 't') then opens the new tab. On a Mac, replacing CONTROL with COMMAND should work.

5. Opening a url in the browser (Line 10)

For opening a url in the browser, all you need to do is pass the url as an argument to the browser.get method. Remember I’ve given browser.get because we instantiated the browser earlier with browser = webdriver.Firefox().

6. Searching for a presence of certain url/text in the search result (Line 11 to 15)

Now we again assign the body of the current tab to the variable body. Then we check whether "thetaranights" appears in the text of the search results. If it does, we run the statement browser.find_element_by_xpath('//a[starts-with(@href,"http://www.thetaranights.com")]').click() which finds the first link in the results whose href starts with "http://www.thetaranights.com", with anything after it, and clicks it to open the URL. Since the result we are looking for has been found and clicked, we break out of the loop. If the check "thetaranights" in body.text is false, meaning nothing was found, we move on to the next page of Google results, and so on for up to 20 pages.
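To see what the starts-with(@href, ...) test selects, here is a browser-free sketch that applies the same prefix check to a piece of HTML using only the Python 3 standard library. The class LinkFinder, the helper find_links and the sample markup are made up for illustration; real Google result pages are far more complex.

```python
from html.parser import HTMLParser  # Python 3 standard library


class LinkFinder(HTMLParser):
    """Collect href values of <a> tags that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None
        href = dict(attrs).get("href") or ""
        if href.startswith(self.prefix):
            self.matches.append(href)


def find_links(html, prefix):
    """Return every matching href in document order (hypothetical helper)."""
    finder = LinkFinder(prefix)
    finder.feed(html)
    return finder.matches


# Hypothetical stand-in for a page of search results
sample = (
    '<body><a href="http://example.com/">other</a>'
    '<a href="http://www.thetaranights.com/some-post/">hit</a></body>'
)
print(find_links(sample, "http://www.thetaranights.com"))
```

Selenium's find_element_by_xpath returns only the first such match, which corresponds to the first item of the list returned here.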

 Note: You can close the webbrowser with browser.quit()

So, now we know how to open a browser, open a new tab, go to a certain website/URL, search for a link in the body of the page and click it. If you have any questions regarding the code or the article, please mention them in the comment section below. You may also be interested in my article on How to login to a website using selenium. Happy Coding

How To Split And Merge Pdf Documents

- - Uncategorized

Not the type of posts I usually produce. A promotional review of a tool.

Everyone knows that PDF files are hard to work with. Apart from figuring out how to convert PDF documents, oftentimes we’re also trying to put together the best PDF document possible from other content.

But when those content sources are already in the PDF format, it can seem like an uphill battle just to get the content separated. More often than not, we need to figure out how to manipulate PDF documents at the page level.

Sometimes we may need to rework a PDF document by adding or removing a few pages. Manipulating PDF documents like this can seem intimidating at first.

If you have legal PDF documents your concern may be preserving the integrity of the PDF pages, or if you’re working with reports, you may be worried about deleting the original PDF pages for good.

Normally, you’d have to convert the entire PDF file into a Microsoft Word document, delete or insert the pages accordingly, and then convert it back to PDF. But there’s an easier way to do it.

With a tool like Able2Extract 10 from Investintech.com, you can merge and split your PDFs as easily as you can select a page. This latest version comes with features for converting, creating and editing PDF documents.

Under the latter category, Able2Extract 10 has added the ability to merge and split PDF files. It does this by letting you extract or insert PDF pages to your currently opened PDF document.

For instance, if you have blank pages or full page images in a PDF that you'd like to remove or collect into one file, you can extract them into a completely separate file. Or, if you'd like to add some supplementary information to complement your existing PDF content, you can easily add it page by page to an existing PDF document.

Here’s a look at how this can be done with Able2Extract 10’s latest PDF splitting and merging feature.

To Merge PDFs:

1. Open the PDF you wish to add pages to in Able2Extract 10.

2. Click on Edit from the toolbar

 

3. From the side editing panel, select Insert From PDF

4. From the dialog that appears, select the PDF file you want to insert pages from. Click on Open.

How To Download Udemy Videos Script For Downloading Udemy Videos

- - Web
This short post walks through simple steps to download Udemy videos that are not downloadable from the website. Most paid Udemy courses, as well as some free ones, cannot be downloaded at udemy.com. I personally have around 200 courses in my account, and most of them were not available for download. Fortunately, I found a Python script on the internet which solved my problem easily.

Let me introduce udemy-dl, which is a Python script.

Using udemy-dl to download courses from Udemy account

1. I assume you have Python installed on your device. If you are running a Linux environment, Python is usually pre-installed.

2. If you don't have the pip installer for Python packages, install it via the following command.

sudo apt-get install python-pip

3. Now that you have pip installed, type the following command to install the Python script that enables the download of Udemy courses.

sudo pip install udemy-dl

4. Below is the command to download the udemy course. You will need the url of the course.

udemy-dl https://www.udemy.com/COURSE_NAME

5. Next you will be asked for username and password of your udemy account.

Well that’s all you need to download a course from your udemy account.

Alternatively, you can pass the username and password as parameters in the same command:

udemy-dl -u user@domain.com -p password https://www.udemy.com/COURSE_NAME

Thanks for reading

Website Mobile Friendly Tester Automation Script Python Codes For Mobile Friendly Test

- - Uncategorized

Hey Guys, I am back again with another script that may prove useful to website owners, search engine optimization experts and normal people like me. With the code we write and discuss in this article, you will be able to check whether a website is mobile friendly or not. And here is a bonus: the code lets you submit a number of websites for a mobile friendly test in one go. Why is it necessary? Here's the answer. As of the latest update to Google's search algorithm, the search engine lord now considers mobile friendliness a major ranking factor for a website.

Python script to automate mobile friendly test

Before we begin

Before we begin coding, let me make a few things clear. We will be writing two files: a simple text file and a Python file. In the text file we write the domains we want to submit for a mobile friendly test, one per line, in the format domain.com, i.e. without www.

from json import loads
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]    

with open('websitesformobilefriendlytest.txt') as f:
    for line in map(str.strip, f):  # strip the trailing newline from each domain
        google_results = br.open("https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?url=http://" + str(line)).read()
        json_obj = loads(google_results)
        if json_obj["ruleGroups"]["USABILITY"]["pass"] == True:
            print "Congrats " + str(line)  + " is mobile friendly"
        else:
            print str(line) + " is not mobile friendly"

1. Line 1 and 2

These are the import statements. We use the mechanize module to query the mobile friendly test through a browser instantiated by the module, and since the response is JSON we import loads from json.

2. Line 3 to 5

On line 3 we use mechanize's Browser() to instantiate a browser. Line 4 tells the browser to ignore the robots.txt file. On line 5, we specify a user agent.

3. Line 7 to 14

Line 7 opens the text file where we previously stored the domain names. We can now reference the content of the file via the variable f.

Line 8 is the start of the for loop, which stores one domain name in the variable line on each iteration.

On line 9, we query a domain name/website for a mobile friendly test. The specified URL returns the test result, which we read and store in the variable google_results.

On line 10, we load the response as a JSON object into the variable json_obj.

Now on line 11, we have a conditional statement to check whether the website passed the mobile friendly test. The test result is a boolean stored under the key "pass", which sits inside the value of the key "USABILITY", which in turn sits inside the value of the key "ruleGroups" in json_obj. Below is an example of how it may look.

{"ruleGroups": {"USABILITY": {"pass": True/False}}}

If the website passed the mobile friendly test, the value will be True, otherwise False. Based on the result, we then print whether the website is mobile friendly or not.
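The nested lookup can be sketched without any network access. The snippet below (Python 3) parses hand-written sample responses that only mimic the shape described above; is_mobile_friendly is a hypothetical helper, and using .get() so that an unexpected response counts as "not mobile friendly" is my own assumption rather than something the script does.

```python
import json


def is_mobile_friendly(response_text):
    """Return True only if ruleGroups -> USABILITY -> pass is true."""
    data = json.loads(response_text)
    # .get() avoids a KeyError when the response has an unexpected shape
    usability = data.get("ruleGroups", {}).get("USABILITY", {})
    return usability.get("pass") is True


# Hand-written samples, not real API output
sample_pass = '{"ruleGroups": {"USABILITY": {"pass": true}}}'
sample_fail = '{"ruleGroups": {"USABILITY": {"pass": false}}}'
sample_error = '{"error": {"code": 400}}'

for text in (sample_pass, sample_fail, sample_error):
    print(is_mobile_friendly(text))
```

Note that JSON spells the booleans true and false; json.loads converts them to Python's True and False.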

Mobile friendly tester which writes result to google spreadsheet

Well, here is the bonus code. Let me know in the comment section below if you have any questions about it. Also, here's a similar program (Is it a wordpress website checker script) with an explanation of the code, which can help you understand and implement this one. Thanks for reading :)

from json import loads
import mechanize
import gdata.spreadsheet.service
import datetime
rowdict = {}
rowdict['date'] = str(datetime.date.today())
spread_sheet_id = '13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q'
worksheet_id = 'od6'
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = 'email@domain.com'
client.password = 'password'
client.source = 'mobilefriendlytest'
client.ProgrammaticLogin()

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]

with open('websitesformobilefriendlytest.txt') as f:
    for line in f:
        line = line.strip()  # drop the trailing newline
        google_results = br.open("https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?url=http://" + str(line)).read()
        json_obj = loads(google_results)
        rowdict['website'] = str(line)
        if json_obj["ruleGroups"]["USABILITY"]["pass"] == True:
            #print "Congrats " + str(line) + " is mobile friendly"
            rowdict['ismobilefriendly'] = "yes"
        else:
            #print str(line) + " is not mobile friendly"
            rowdict['ismobilefriendly'] = "no"
        # write one spreadsheet row per website
        client.InsertRow(rowdict,spread_sheet_id, worksheet_id)

Hadoop Starter Kit What Is Big Data

- - Web

I just watched an 18-minute introductory video on Big Data & Hadoop on Udemy. Here's a link to the course I've enrolled in, if you'd like to take it too: https://www.udemy.com/hadoopstarterkit/learn/ . I would like to summarize what I learned.

What is Big Data?

Three main factors help define big data: Volume, Velocity and Variety.

Let me take the example of an imaginary startup that has around 1 TB of data in its initial phase. How do we classify this data? Does it qualify as big data? If I say the amount of data is going to stay stable throughout the lifetime of the company, is it big data? Certainly not. For a data set to be called big data, it should have a good growth rate, which increases the volume of the data, and it should come in a variety of formats (text, pictures, PDFs, etc.).

Here are some of the examples of big data.

Companies like Amazon monitor not only your purchase history and wishlist but also each click, recording all the patterns and processing this large amount of data to give us a better recommendation system.

Here’s what NASA has to say about big data.

In the time it took you to read this sentence, NASA gathered approximately 1.73 gigabytes of data from our nearly 100 currently active missions! We do this every hour, every day, every year – and the collection rate is growing exponentially. – See more at: http://open.nasa.gov/blog/2012/10/04/what-is-nasa-doing-with-big-data-today/

Have a look at this

https://gigaom.com/2012/08/22/facebook-is-collecting-your-data-500-terabytes-a-day/

Big Data Challenges

Storage – Storage of data should be as efficient as possible, both in terms of hardware and in terms of processing and retrieving the data.

Computation Efficiency – It should be suitable for computation

Data Loss – Data may be lost due to hardware failure and other reasons. Hence data recovery strategies must be good.

Time – Big data is basically for analysis and processing, hence the amount of time for processing the data set should be minimal.

Cost – It should provide huge space and should also be cost effective.

Traditional Solutions

RDBMS

The main issue is scalability. As the data grows, processing time goes up and the number of tables becomes unmanageable, forcing us to denormalize. Queries may need to be changed for efficiency. Also, an RDBMS is for structured data sets only. Once the data comes in various formats, an RDBMS cannot be used.

GRID Computing

Grid computing distributes work across nodes, hence it is good for compute-intensive tasks. However, it does not perform well for big sets of data. It also requires programming in a lower-level language like C.

A good solution, HADOOP

Supports huge volume

Storage Efficiency both in terms of hardware and processing/retrieval

Good Data Recovery

Horizontal Scaling – Processing time is minimal

Cost Effective

Easy for Programmers and Non Programmers.

Is Hadoop replacing RDBMS?

So is Hadoop going to replace RDBMS? No. Hadoop is one thing and RDBMS is another; each is better for specific purposes.

Hadoop

Storage : Petabytes

Horizontal Scaling

Cost Effective

Made of commodity computers. These are cost effective but still enterprise-level hardware.

Batch Processing System

Dynamic Schema (Different formats of files)

RDBMS

Storage: Gigabytes

Scaling limited

Cost may increase sharply with volume

Static Schema

Mailchimp Subscription Integrating WordPress And Mailchimp

- - Web

Integrating Mailchimp and WordPress for Better Results

Mailchimp is an email marketing company whose service is free for up to 2,000 subscribers.

Mailchimp offers three plans:

Entrepreneur : Send 12,000 emails to 2,000 subscribers for free. No credit card is required at sign up. Mailchimp promises this package will be free forever.

Growing Business : 50,000 subscribers , unlimited number of emails.

High volume sender : More than 50,000 subscribers

Read more about the pricing at mailchimp.com/pricing

We will stick to the Entrepreneur plan for this post.

Why do you need Mailchimp ?

When you have satisfactory traffic to your website/blog and want the flow to continue and grow, it's best to equip your blog with a subscribe button. But you need to make sure you provide quality content to your subscribers.

The need for Mailchimp arises when you have a good number of followers and need a medium to deliver your news/blog posts to your audience.

Sign Up for MailChimp

1. Visit Mailchimp. On the homepage, look at the upper right of the page and click on Sign Up Free.

mailchimpsignup

2. Enter your details when prompted. It is recommended to use your website's official email address, but you are free to use any valid email. Enter the other details, such as username and password, and finally press the Create My Account button.

mailchimpenterdetails

3. Before starting off with MailChimp, you should click the confirmation link in your email. On the landing page, you need to verify that you are human (Captcha). As soon as you are done with verification, you will have to enter information for your account.

mailchimpsignupsuccess

These include your name, company/website information, address, etc. Make sure you fill in every field before submitting the form.

mailchimpactivate

Once done, you will be taken to a page similar to this.

mailchimpletsgetstarted

4. Moving on, click on Create A List as shown in the figure. Since we have no lists, we are going to create one. Click on the Create List button at the top right of the screen.

mailchimpcreatelist

You will now be taken to a form. The list name must be sensible, as it can be seen by your subscribers. “Your Website Name Newsletter” may be an appropriate name for the list.

Put the email address associated with your blog/website and an appropriate name (website name or your own name).

The “Remind people how they got on your list” section is a way of telling people how they are connected with your website, or a gentle reminder of why they are receiving your emails. For example: “You received this email because you have subscribed to Your Website Name Newsletter”.

Enter other essential details and submit the form.

mailchimplistdetails

5. Now that you have successfully created a mailing list, it's time to get people to subscribe to your newsletter. Let's go to our WordPress powered website/blog to get things done.

Under plugins, go to add new and search for “mailchimp widget”.

mailchimpwidget

Install and activate the plugin. As soon as you do so, you get a message at the top of the dashboard similar to the one below.

mailchimpwidgetmsg

Click the link and you will be prompted for an API key.

mailchimpapikey

6. Login to your mailchimp account and click account as shown below.

mailchimpaccounttogetapi

Next up, click on extras and then on API keys. Scroll down until you see Create A Key option. Click on it to get the API key.

mailchimpcreatealist mailchimpcreateapikey

Now copy the API key, paste it into the field asking for the API key in your WordPress blog and save the changes.

mailchimpapikey

7. Now go to Widgets, which is under Appearance, and drag the MailChimp List Signup widget to a sidebar your theme provides. I am using the Noteworthy theme at the moment, so I have a right sidebar where I am going to place the MailChimp List Signup widget.

mailchimpwidgetediting

Since we have only one mailing list, the widget selects it by default. Change the title to something appropriate; changing the button text to “Subscribe” is a good idea. Click on Collect First Name. You may edit the success and failure messages as well.

All set and done. Just wait for the subscribers and grow your business. You can view the subscribers in your Mailchimp account under the mailing list you created. To send an email to everyone in the mailing list, create a campaign; that's all it takes.

Object Oriented Programming With C++ Constructors Getter Setter

- - C++, Tutorials
Object Oriented Programming C++

In this article we discuss the basics of Object Oriented Programming. Our code is written in the C++ programming language, but the concepts are the same for other OOP languages too. We will write three files: the first is the header file, the second is the implementation of the class declared in the header, and the third is the main program. By the end of this read, you will be able to write code in Object Oriented Programming languages. We will cover constructors, destructors, setters and getters.

Class definition file Computer.h

#include <iostream>
using namespace std;

class Computer{
    private:
        string deviceType;
        string nameofBrand;
    public:
        Computer(string brandName="lenovo",string typeofDevice="laptop");
        ~Computer();
        void setBrandName(string brandName);
        void setDeviceType(string typeofDevice);
        string getBrandName();
        string getDeviceType();
        void displayDeviceInfo();
};

The code above shows the structure of our class Computer. The file Computer.h is our class definition file.

1. Line 1 includes our input/output header file, i.e. iostream, and line 2 brings the std namespace into scope.

2. Line 4: Our class for this example is Computer, whose first letter is capitalized, a common naming convention for classes.

3. Line 5 to 7: In C++ we place the private variables after the keyword private followed by a colon. In our example we have two private variables, deviceType and nameofBrand. Private variables cannot be accessed from outside as object.variableName, but they can be reached through member functions, i.e. the public methods. In general, private variables can be accessed only within the class.

4. Line 8 to 15 are the member functions of class Computer. Here the functions/methods are placed after the keyword public followed by a colon. This means code that uses an object of class Computer can call these member functions directly via object.memberFunction().

5. Line 9 and 10 are different from the other member functions. Line 9 is the declaration of the constructor for our class Computer. The constructor has the same name as the class, as it does in many OOP languages. A constructor has no return type, as it is basically used to initialize the private variables. The code inside the constructor runs at the time of object creation. In our header file, the constructor has two parameters and each of them is given a default value. Line 10 is the declaration of the destructor. In C++ the destructor has the same name as the class, except that it has a “~” sign before the name. Destructors are basically used to clean up when an object is destroyed, for example to release other objects created inside the current class.

6. Line 11 to 14 are the setter and getter methods for the private variables deviceType and nameofBrand. The setter methods have return type void and take, through their parameters, the values to be assigned to the private variables. The getter methods are used to read the private variables; they take no parameters, since their job is only to return a value. Therefore each getter has a return type that matches the type of the private variable it returns.

7. Line 15 is a member function like the others; it has return type void and takes no parameter/argument.

The following file is Computer.cpp file which contains the implementation of the class definition Computer.h

Class implementation file Computer.cpp

#include <iostream>
#include "Computer.h"
using namespace std;

Computer::Computer(string brandName,string typeofDevice){
    setBrandName(brandName);
    setDeviceType(typeofDevice);
}

Computer::~Computer(){
    cout<<"Object Destroyed!!"<<endl;
}

void Computer::setBrandName(string brandName){
    nameofBrand = brandName;
}

void Computer::setDeviceType(string typeofDevice){
    deviceType = typeofDevice;
}

string Computer::getBrandName(){
    return nameofBrand;
}

string Computer::getDeviceType(){
    return deviceType;
}

void Computer::displayDeviceInfo(){
    cout<< "It is a " << getDeviceType() << " and belongs to "<< getBrandName()<<endl;
}

1. Line 1 to 3 contain the include statements. We have to include the header file Computer.h in our implementation file. Standard header files are included via the statement #include <header>, while header files created by the user are included via the statement #include "Header.h".

2. Line 5 to 8 is the implementation of the constructor of the class Computer. It takes two arguments, namely brandName and typeofDevice. Inside the constructor, the methods setBrandName and setDeviceType are called with the arguments brandName and typeofDevice respectively. Whenever an object of class Computer is created, the code inside the constructor runs immediately.

3. Line 10 to 12 is the implementation of the destructor of the class Computer. A destructor is basically used to clean up, for example to release objects of other classes created inside the current class. In our example, we do nothing but print that the object has been destroyed.

4. Line 14 to 16 is the implementation of the method setBrandName. It is a setter method. Conventionally, a setter's name begins with “set” followed by the variable name. Our setBrandName takes one argument and has return type void. Inside the method, nameofBrand is set to the value passed in as the argument. nameofBrand is a private variable, hence a public method, i.e. setBrandName, is used to access and alter its value.

5. Line 18 to 20 is the implementation of setDeviceType. Like setBrandName, it is a setter method. It sets the value of the private variable deviceType from the argument typeofDevice. This method also takes one argument and has return type void.

6. Line 22 to 24 is the implementation of the method getBrandName. Unlike setBrandName, getBrandName is a getter method: it returns the value of a private variable, which in this case is nameofBrand. The return type of a getter is the same as the type of the variable it returns. In our example, getBrandName returns a string and takes no parameter/argument.

7. Line 26 to 28 is also a getter method, used to return the value of the variable deviceType. Its return type is string because deviceType is of type string.

8. Finally we have the last method of the class Computer, which we use to print the information of the device based on the values entered at the time of object creation. The method displayDeviceInfo has return type void and takes no parameter. Here we use the standard way of accessing the private variables, i.e. the getter methods. When invoked on an object, the method prints the deviceType and nameofBrand.
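Since the concepts carry over to other OOP languages, here is a rough Python counterpart of the same class, offered only as a comparison sketch. The snake_case names and the device_info method (which returns the text instead of printing it) are my own choices. Unlike C++, Python does not enforce private members; the leading underscore is only a convention.

```python
class Computer:
    """Python sketch mirroring the C++ Computer class above."""

    def __init__(self, brand_name="lenovo", type_of_device="laptop"):
        # Like the C++ constructor, initialise the fields through the setters
        self.set_brand_name(brand_name)
        self.set_device_type(type_of_device)

    def set_brand_name(self, brand_name):
        self._name_of_brand = brand_name  # underscore marks it as "private"

    def set_device_type(self, type_of_device):
        self._device_type = type_of_device

    def get_brand_name(self):
        return self._name_of_brand

    def get_device_type(self):
        return self._device_type

    def device_info(self):
        # Uses the getters, as displayDeviceInfo does in the C++ version
        return ("It is a " + self.get_device_type()
                + " and belongs to " + self.get_brand_name())


default_computer = Computer()          # falls back to the default arguments
custom_computer = Computer("dell", "desktop")
print(default_computer.device_info())
print(custom_computer.device_info())
```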

Let us take a look at our main program where we create objects of class Computer and invoke various methods of the class. Below is the main program.

Main program testprogram.cpp

#include <iostream>
#include "Computer.h"
using namespace std;

int main(){
    string deviceBrand;
    string typeofDevice;

    Computer computers[5];

    for(int i = 0; i < 5; i++){


        cout<< "Enter the brand of your computer for position "<< i+1<<endl;
        getline(cin,deviceBrand);

        cout<< "Enter the type of computer for position "<< i+1<<endl;
        getline(cin,typeofDevice);

        Computer objectHolder(deviceBrand, typeofDevice);

        computers[i] = objectHolder;
    }

    for(int i = 0; i < 5; i++){

        Computer objectHolder = computers[i];
        objectHolder.displayDeviceInfo();
        //computers[i].displayDeviceInfo();
    }
}

1. Line 1 to 3 are the statements that include iostream and the Computer class we coded earlier. As discussed, we include a non-standard header (Computer.h in this case) in the format #include "Header.h". One thing to note is that we include the class definition file and not the implementation file.

2. On lines 6 and 7, we declare two variables of type string.

3. Line 9 begins the OOP portion. Here we declare an array of type Computer of size 5. Each element of the array computers holds an object of the Computer class, initially constructed with the default arguments.

4. Line 11 to 23 is a for loop that iterates as many times as the size of our array, i.e. five. In each iteration we take input from the user for the variables deviceBrand and typeofDevice declared earlier. Next, we create an object named objectHolder of class Computer. Notice that we pass two arguments when creating the object. This invokes the constructor of the Computer class, and everything inside the constructor runs at this point. Finally, we assign objectHolder to the array's current index. Summing up, we will have five objects of our own in the array at the end of the loop.

5. Line 25 to 30 is another loop. Here we invoke the displayDeviceInfo method of the class Computer on each object stored in the array computers. On invoking the method, we get the information of the device we’ve entered at the time of creation of the object.

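Following is the output of our program. You will see "Object Destroyed!!" printed several times. This is because our Computer class has a destructor, which runs whenever an object goes out of scope: each objectHolder is destroyed at the end of its loop iteration, and the five array elements are destroyed when main returns.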

Grab Whois Information And Write To Google Spreadsheet

Hello Guys, here I am with yet another program that can benefit you and many search engine optimizers. By the end of this read you will be able to write a program that extracts the whois information of a number of domains stored in a text file and writes that information to a Google spreadsheet, which has become a popular medium for sharing data and findings online. As a search engine optimizer, you need to keep track of a number of websites, including your competitors'. Here I offer you a simple Python program to do that. On the other hand, if like me you are not an SEO expert, you can still use this script to track the various websites you follow.

Prerequisites before beginning to code

We are going to have two files. One is a .py file where we code our program. The other is a text file with the .txt extension where we store the domain names we want whois information for. The text file must contain one domain name per line, in the format www.domain.com.
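Reading such a file is worth a small sketch, because every line read from a file still carries its trailing newline. The Python 3 snippet below strips each line and skips blank ones; read_domains is a hypothetical helper, and a temporary file with placeholder domains stands in for the real text file.

```python
import os
import tempfile


def read_domains(path):
    """Return the non-empty lines of `path` with surrounding whitespace removed."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Throwaway file standing in for the real list of domains
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("www.example.com\n\nwww.example.org\n")

domains = read_domains(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(domains)
```

Stripping matters because a stray newline would otherwise end up inside the URL we request and inside the spreadsheet cell.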

Next, we need to create a Google spreadsheet where we intend to write the whois information so we can share it with others. Direct your browser to https://docs.google.com/spreadsheets/ and create a new spreadsheet named “Whois Info”. Once done, create three columns in the first row, namely “website”, “whoisinformation” and “date”. The domain name will go under the column website, the whois information under the column whoisinformation, and the date we queried the whois information under the column date.

Python code to extract whois information and write to google spreadsheet

from bs4 import BeautifulSoup
from urllib2 import urlopen
import gdata.spreadsheet.service
import datetime
rowdict = {}
rowdict['date'] = str(datetime.date.today())
spread_sheet_id = '1zE8Qe8wmC271hG2uW4XE68btUks79xX0OG-O4KDl_Mo'
worksheet_id = 'od6'
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = "email@domain.com"
client.password = 'password'
client.source = 'whoisinfo'
client.ProgrammaticLogin()
with open('websitesforwhois.txt') as f:
    for line in f:
        soup = BeautifulSoup(urlopen("http://www.checkdomain.com/cgi-bin/checkdomain.pl?domain=" + line.strip()).read())
        for pre in soup.find_all("pre"):
            whois_info = str(pre.string)
        #print whois_info
        rowdict['website'] = line.strip()
        rowdict['whoisinformation'] = whois_info
        client.InsertRow(rowdict,spread_sheet_id, worksheet_id)

1. Line 1 to 4

These are the import statements. We use BeautifulSoup to make a soup object out of a url response, urlopen to get the response of a url, gdata to access the Google spreadsheet, and datetime to get the current system date.

2. Line 5 and 6

In our program, we need to access the Google spreadsheet and write to it, hence we are using the gdata module. In order to write to the spreadsheet, we need to pass the data as a dictionary, which stores data as key:value pairs (much like a JSON object). rowdict is the variable storing the data to pass to the Google spreadsheet. On line 6, we store the current date under the key “date” which, if you remember, is one of the headers we created in our spreadsheet.

3. Line 7 to 14

Lines 7 to 14 are the procedure to connect to a specific Google spreadsheet. We require spread_sheet_id and worksheet_id. Take a look at the url of your spreadsheet. The url looks something like this one:

https://docs.google.com/spreadsheets/d/1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU/edit#gid=0

The spreadsheet id (mentioned earlier) is present in the url. “1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU” in the above url is the spreadsheet id we need. By default the worksheet id is ‘od6’.
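If you would rather not copy the id by hand, it can be pulled out of the url with a regular expression. This is only a sketch: the helper name spreadsheet_id_from_url is hypothetical, and it assumes the url follows the /spreadsheets/d/<id>/ shape shown above.

```python
import re


def spreadsheet_id_from_url(url):
    """Return the id that follows /spreadsheets/d/ in the url, or None."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    return match.group(1) if match else None


url = "https://docs.google.com/spreadsheets/d/1VbNph0TfFetKLU8hphrEyuNXlJ-7m628p8Sbu82o8lU/edit#gid=0"
print(spreadsheet_id_from_url(url))
```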

On line 13, client.source is assigned the string ‘whoisinfo’. In the gdata library this string identifies the application making the request. Here we simply reuse our spreadsheet name, “Whois Info”, written in lowercase letters without white space.

4. Line 15 to 16

Line 15 opens the text file where we’ve stored the domain names. Line 16 iterates through each line in the file. At each iteration, the domain name on the current line is stored in the variable line.

5. Line 17

On line 17, we query the page that gives us the whois information and make a soup object out of it by passing the url response to BeautifulSoup. The reason we make a soup object is so that we can access the required data via tags, and the data we need is inside a <pre></pre> tag.

6. Line 18 to 19

We know that there is only one “pre” tag in the soup object. We therefore iterate to find the pre tag and store the text inside it in the variable whois_info.
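To see the idea of pulling text out of a pre tag in isolation, here is a small sketch that swaps BeautifulSoup for the HTMLParser class from the Python 3 standard library, so it runs without extra installs. The class name PreTextParser, the helper extract_pre_text and the sample HTML are all made up for illustration; the real page returned by the whois site will look different.

```python
from html.parser import HTMLParser


class PreTextParser(HTMLParser):
    """Collect the text found inside <pre>...</pre> tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while we are inside a <pre> tag
        self.chunks = []  # pieces of text collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_pre_text(html):
    """Return all text that appears inside <pre> tags in the html string."""
    parser = PreTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Made-up response body with a single <pre> block
sample_html = "<html><body><h1>Result</h1><pre>Domain Name: EXAMPLE.COM</pre></body></html>"
print(extract_pre_text(sample_html))
```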

7. Line 21 to 23

On line 21, we assign the domain name to the key “website” of the dictionary rowdict. On line 22, we assign the whois information stored in the variable whois_info to the key “whoisinformation” of the dictionary rowdict. Note that each key of the dictionary must match a header name in our spreadsheet. Line 23 pushes the dictionary to the Google spreadsheet and writes it as a new row. The iteration continues until all the domain names in the text file have been processed.

If you have any questions or confusion regarding the article or the code, please mention them in the comments below so we can discuss. Thanks for reading.

Gui Automation With Python

Hello readers. It has been a longer delay than usual in publishing my article. However, today I will present to my awesome readers an introduction to a GUI automation module in Python (i.e. pyautogui). Pyautogui is a GUI automation module for python2 and python3 which provides methods for controlling the mouse and keystrokes. This decent module can be used to create bots that automate repetitive tasks while you enjoy your coffee. “Pyautogui can do anything a human user sitting at the computer can do, except spill coffee on the keyboard” says the geek responsible for this cool module.

Follow the link below to have pyautogui installed on your machine.

https://pyautogui.readthedocs.org/en/latest/install.html

Without dwelling further on the introduction, I would like to present a few basics of the module.

1. Locating coordinates of the mouse cursor.

>>> import pyautogui

>>> pyautogui.position()

(850, 504)

>>>

It returns the current x and y coordinates of the mouse cursor. On a computer screen, the top-left point is the origin, (0,0).

2. Moving the mouse cursor

>>> pyautogui.moveTo(10,10)

>>> pyautogui.moveTo(10,10,duration=1)

The moveTo function takes the x-coordinate and y-coordinate as parameters. An optional third parameter, duration, specifies the time in seconds the cursor takes to reach the given coordinate. The first call moves the cursor instantly, while the second moves it gradually, the way a human would.
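One way to picture what a duration implies is that the cursor passes through a series of intermediate points instead of jumping straight to the target. The sketch below computes such points with plain arithmetic. It is not pyautogui's actual implementation, and the function name linear_path is my own.

```python
def linear_path(start, end, steps):
    """Return `steps` evenly spaced points from start to end, end included."""
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / float(steps), y0 + (y1 - y0) * i / float(steps))
        for i in range(1, steps + 1)
    ]


# Five intermediate positions on the way from (0, 0) to (10, 10)
print(linear_path((0, 0), (10, 10), 5))
```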

3. Clicking

>>> pyautogui.click(80,80)

>>> pyautogui.doubleClick(80,80)

>>> pyautogui.rightClick(80,80)

Clicking on a certain coordinate on the screen is possible via the click method. The module also provides doubleClick and rightClick methods. All three take the x-coordinate and y-coordinate as parameters.

4. Keystrokes

For typing, we first need to locate an appropriate text area. Therefore, you might want to use this method after clicking on some coordinate which is writable. You can put two or more statements on one line, separated by semicolons, and they run one after another. For instance, I’ve specified the coordinates of the url bar in my browser and then typed my name in it via the following statements:

>>> pyautogui.click(50,80);pyautogui.typewrite("Bhishan")

>>> pyautogui.click(50,80);pyautogui.typewrite("Bhishan", interval=0.2)

We can pass an optional parameter, interval, to specify the time in seconds between each letter or keystroke.

5. Hot Key

The hotkey method can be used in cases where we need to press two or more keys at the same time. A handy example is Ctrl + S to save a file or Ctrl + Shift + q to quit.

>>> pyautogui.hotkey('ctrl','shift','q')

You can see all the key names the module accepts via this attribute:

>>> pyautogui.KEYBOARD_KEYS

Well, that’s enough to get you started with GUI automation via pyautogui. Below is a bot I have made using the module to automate a boring task for myself. Here is the story behind the need for the bot. I am a fourth semester CS undergrad student (I mean a lazy student). I never take notes in any of the classes I attend. At exam time, I rely on the photos of my friends’ notes, which they send me. As always I got the photos, but this time all the pictures (around 100-110 images) happened to be in landscape mode. It would be distracting to rotate each image to read it. So I wrote some 7-8 lines of code to make a bot that would open each image file, rotate it and save it while I have my dinner. I’ve used the time module along with pyautogui to keep some time gap between the statements.

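import pyautogui
import time

# Click on the desktop, type the folder name to select the folder, then open it
pyautogui.click(450, 450); pyautogui.typewrite('graphicsnotes'); pyautogui.press('enter')
time.sleep(2)
for i in range(107):
    # Select the next image and open it
    pyautogui.press('right'); pyautogui.press('enter')
    # Rotate clockwise, then save
    pyautogui.hotkey('ctrl', 'r'); pyautogui.hotkey('ctrl', 's')
    time.sleep(2)
    # Close the image and wait before moving on to the next one
    pyautogui.press('esc')
    time.sleep(4)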


The concept is to click anywhere on the desktop screen; I chose an arbitrary coordinate (450,450). Then the bot types the folder name to locate the folder and presses enter to open it. Inside the loop it presses the right arrow key to select the next image file, opens it by pressing enter, uses the hotkey ‘Ctrl’ + ‘r’ to rotate the image clockwise and then ‘Ctrl’ + ‘s’ to save it. Finally it presses esc to close the file, and the process repeats for the next image file. I had a total of 107 images, so the loop runs 107 times to reach all the image files. Tell me how you felt about the article in the comments section below so I can come up with a cool set of articles for next week. Till then, happy automation with pyautogui 🙂
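Since a bot like this takes over the keyboard, it can help to look at the planned key presses before letting it loose. Below is a rough sketch of that idea, not the bot I actually ran: the names rotation_steps and run are hypothetical, and run pauses after every single step, which is a simplification of the timing used above. pyautogui is imported inside run so that the plan can be built and inspected on a machine without a display.

```python
def rotation_steps(image_count):
    """Return the key actions for rotating image_count images, as (kind, keys) tuples."""
    steps = []
    for _ in range(image_count):
        steps.append(("press", ("right",)))      # select the next image
        steps.append(("press", ("enter",)))      # open it
        steps.append(("hotkey", ("ctrl", "r")))  # rotate clockwise
        steps.append(("hotkey", ("ctrl", "s")))  # save
        steps.append(("press", ("esc",)))        # close the image
    return steps


def run(steps, pause=2):
    """Send each step to pyautogui, sleeping `pause` seconds after every step."""
    import time
    import pyautogui  # imported lazily; it needs a real display
    for kind, keys in steps:
        getattr(pyautogui, kind)(*keys)  # calls pyautogui.press or pyautogui.hotkey
        time.sleep(pause)


plan = rotation_steps(2)
print(len(plan))
```

Calling run(rotation_steps(107)) would then replay the plan for all 107 images, once the folder is already open.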

Read the docs here: https://pyautogui.readthedocs.org

Python Selenium Time Tkinter Pyvirtualdisplay

Making use of available libraries and building something useful has always been my favorite. However, I haven’t yet been involved in making or contributing to one. I don’t even know the procedure for getting involved in building libraries. My latest program also makes use of various libraries such as selenium, time and tkinter. In a nutshell, the program fetches jokes from www.laughfactory.com and displays a message box with a joke at regular intervals. I usually run the program while I am at work (internship). I’ve set a 10 minute interval between the jokes. I used the time module and its time.sleep(time_in_seconds) method to handle the interval between message boxes. The program isn’t full-fledged, and you can make additions to it to customize it to your needs. I used the selenium module along with pyvirtualdisplay to fetch the jokes and hide the browser window, since it would be distracting. The program only fetches the 20 latest jokes from laughfactory.com and periodically displays one at a time in a message box. A little humor is always good at work. Anyway, that’s my personal thought, which you may disagree with. I used the tkinter module for displaying the message box. That’s all about the program. Have a look at the code. Thank god, the import statements < program statements.

Python code to get timely jokes

import tkinter
import tkinter.messagebox as mbox
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from pyvirtualdisplay import Display

# Start a virtual display so the browser window stays hidden
display = Display(visible=0, size=(800, 600))
display.start()
browser = webdriver.Firefox()
browser.get("http://www.laughfactory.com/jokes/latest-jokes")

# Give the page time to load completely
sleep(40)
#print("finding jokes")
jokes = [str(joke.text) for joke in browser.find_elements(By.XPATH, "//div/p[starts-with(@id,'joke_')]")]

#print("found")
browser.quit()
display.stop()

# Create a hidden root window so that only the message boxes appear
window = tkinter.Tk()
window.wm_withdraw()

# Show one joke every 10 minutes
for joke in jokes:
    sleep(600)
    mbox.showinfo('Bored? Enjoy the JOk3!', joke)
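The wait-then-show loop at the end is the heart of the program, and it can be tried out without a browser or a message box. The sketch below is a hypothetical variation: show_periodically is my own name, and the sleep and show functions are passed in so that a quick dry run does not really wait 10 minutes.

```python
import time


def show_periodically(items, show, interval_seconds, sleep=time.sleep):
    """Wait interval_seconds, then call show(item), for each item in turn."""
    for item in items:
        sleep(interval_seconds)
        show(item)


# Dry run: collect the jokes in a list and skip the real waiting
shown = []
show_periodically(["joke one", "joke two"], shown.append, 600, sleep=lambda seconds: None)
print(shown)
```

In the real program, show would be something like lambda joke: mbox.showinfo('Bored? Enjoy the JOk3!', joke) and sleep would be left at its default.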

Thank you for spending your time here. I have always tried to keep my articles short, and maybe this time I have succeeded. You can use the saved time for another read on this blog, maybe :p. Tell me how you felt about the article in the comments section below. This keeps me motivated to publish good content.