Into the web. Web mining remains a hot topic among programmers. When it comes to mining the web, many programmers choose the Python programming language because it provides numerous modules for the job. In this post we will learn how to log in to a website using the mechanize module. An additional bonus we get from mechanize is that we need not work with cookies ourselves. It handles them on its own.
The story behind this post is as follows. I had to log in to my college site every seven days. If I didn't, I was unable to log in to the system later and had to apologize to my principal to get my access back. So I wrote a script that logs me in to my college's site automatically.
To install the module, type the following command in your terminal:
sudo pip install mechanize
If you are not on a Linux machine, go to this link and get it installed.
Log in to a website using Python
Things you will need before starting off.
1. The login page URL.
2. The form you want to work with, which here is the login form. Right-click the username box and choose the inspect element option, then scroll up until you find the first form tag. In most cases the form tag has a name attribute, and its value is what you use to access the form, but some websites leave it out. The other way to access a form is by its index, and the first form has index 0. If the form has no name, count the forms on the login page. Most login pages have only one form, because all the page has to do is log you in, so the index is 0. If there happens to be more than one form, the index is one less than the form's position on the page. For example, the third form on the page has index 2.
3. The names of the fields that take the values you enter for email/username and password. To get these, inspect the email/username and password fields and read the name attribute of each input. Below is a snapshot to give you an idea of the variables you need to take care of.
4. Your login credentials for that particular website.
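If you would rather not count forms by eye, steps 2 and 3 can be sketched in code. This is only a rough illustration that uses Python's standard html.parser module instead of the browser inspector. The sample page, the FormLister class and the list_forms helper are all made up for this example, so point it at the HTML of your own login page.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collects every <form> on a page with its name and input field names."""

    def __init__(self):
        super().__init__()
        self.forms = []            # one dict per form, in page order
        self._inside_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"name": attrs.get("name"), "fields": []})
            self._inside_form = True
        elif tag == "input" and self._inside_form and attrs.get("name"):
            # only named inputs can be filled in by key
            self.forms[-1]["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside_form = False


def list_forms(html):
    """Return the forms found in an HTML string (hypothetical helper)."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


# Hypothetical page with two forms; the login form is the second one.
SAMPLE_PAGE = """
<form action="/search"><input type="text" name="q"></form>
<form name="login" action="/login/index.php" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""

for index, form in enumerate(list_forms(SAMPLE_PAGE)):
    print(index, form["name"], form["fields"])
# 0 None ['q']
# 1 login ['username', 'password']
```

Here the login form is the second form on the page, so its index is 1, and the field names you would use are "username" and "password".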
That's all it takes to log in to a website using the mechanize module in Python. Once logged in, you can access any authorized URL under that domain. Currently I am interested in extracting my assignments and uploading them to a Google spreadsheet, so that I can use Google's services to get an email when I have a new assignment. Although this post only walked through the login procedure, there are numerous possibilities you can derive from the mechanize module. Let's code for a better life, happy coding.
import mechanize  # pip install mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]

sign_in = br.open("http://school.dwit.edu.np/login/index.php")  # the login url
br.select_form(nr=0)  # accessing the form by its index; there is only one form in this example, so nr=0
# br.select_form(name="form name")  # alternatively, use this instead of the line above if your form has a name attribute
br["username"] = "email/username"  # the key "username" is the field that takes the username/email value
br["password"] = "password"  # the key "password" is the field that takes the password value
logged_in = br.submit()  # submitting the login credentials
logincheck = logged_in.read()  # reading the body of the page you are redirected to after a successful login
print(logincheck)  # printing the body of the redirected url after login
# req = br.open("http://school.dwit.edu.np/mod/assign/").read()  # accessing other url(s) after login is done this way
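The script above prints the page body and leaves it to you to judge whether the login worked. One possible way to make that check automatic is sketched below. It rests on an assumption that is not from the script above: that the site sends you back to the login URL when the credentials are rejected. Not every site does this, so treat still_on_login_page as a heuristic. Both function names and the default field names are made up for illustration.

```python
def still_on_login_page(final_url, login_url):
    """Heuristic (an assumption): a rejected login lands back on the login URL."""
    def strip(url):
        # ignore the query string and any trailing slash when comparing
        return url.split("?")[0].rstrip("/")
    return strip(final_url) == strip(login_url)


def login(login_url, username, password, form_index=0,
          user_field="username", pass_field="password"):
    """Log in with mechanize and return the browser, or raise if it looks like a failure."""
    import mechanize  # third-party: pip install mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.open(login_url)
    br.select_form(nr=form_index)   # form chosen by index, as in the script above
    br[user_field] = username
    br[pass_field] = password
    response = br.submit()
    if still_on_login_page(response.geturl(), login_url):
        raise RuntimeError("Login appears to have failed: still on the login page")
    return br
```

You would call it as br = login("http://school.dwit.edu.np/login/index.php", "email/username", "password") and then keep using the returned br to open other pages.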