
Grab siteprice and write to google spreadsheet using python

Categories: Applications, Python, Tutorials

By the end of this read you will be able to grab a site's price from siteprice.org and write it to a Google spreadsheet using Python. Every website has its competition. As our website evolves, we gain more competitors, and their websites also earn good value. It is vital to know the value of our own website as well as our competitors'. Siteprice.org is one of the websites that calculate a website's worth based on different factors.

Our strategy for querying the prices of a number of websites is to put their domain names in a text file, one domain per line. You may wish to put hundreds of competing websites in this text file.
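For example, a minimal websitesforprice.txt might look like this (the domains below are just placeholders):

www.example.com
www.competitor-one.com
www.competitor-two.com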

Python code to extract the site price and write it to a Google spreadsheet

from bs4 import BeautifulSoup
from urllib2 import urlopen
import gdata.spreadsheet.service
import datetime
rowdict = {}
rowdict['date'] = str(datetime.date.today())
spread_sheet_id = '13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q'
worksheet_id = 'od6'
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = True
client.email = 'email@domain.com'
client.password = 'password'
client.source = 'siteprice'
client.ProgrammaticLogin()
with open('websitesforprice.txt') as f:
    for line in f:
        soup = BeautifulSoup(urlopen("http://www.siteprice.org/website-worth/" + line.strip()).read(), "html.parser")
        rowdict['website'] = line.strip()
        rowdict['price'] = soup.find(id="lblSitePrice").string
        client.InsertRow(rowdict,spread_sheet_id, worksheet_id)

1. Lines 1 to 4

These lines are import statements. This program uses several Python libraries. gdata is used to access the Google spreadsheet. We use BeautifulSoup because it lets us fetch data by element id, which is how we will get the price of a website. datetime is used to get the current date, and urlopen is used to open the web page that contains the data we want.

2. Lines 5 to 14

In order to write the extracted price to a Google spreadsheet programmatically, we use the gdata module. To write to a spreadsheet we need the spreadsheet id, the worksheet id, and a dictionary containing the values we want to write. In the dictionary, each key is a column header and each value is the string to be written under that column (website, price and date for our program).
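For instance, after one iteration the dictionary might hold something like this (the values shown are purely illustrative):

rowdict = {'website': 'www.example.com', 'price': '$1,250', 'date': '2015-06-01'}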

Go to docs.google.com while logged in and create a new spreadsheet. Fill the first three columns of the first row with website, price and date respectively. All letters should be lowercase, with no whitespace. Once you have created the spreadsheet, take a look at the URL. It looks something like this:

https://docs.google.com/spreadsheets/d/13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q/edit#gid=0

The spreadsheet id (mentioned earlier) is present in the URL: "13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q" in the URL above is the spreadsheet id we need. By default the worksheet id is 'od6'.
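If you would rather not copy the id by hand, here is a quick sketch that pulls it out of the URL, assuming the standard /spreadsheets/d/&lt;id&gt;/ layout:

url = "https://docs.google.com/spreadsheets/d/13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q/edit#gid=0"
# the spreadsheet id is the path segment right after "/d/"
spread_sheet_id = url.split("/d/")[1].split("/")[0]
print spread_sheet_id  # 13mX6ALRRtGlfCzyDNCqY-G_AqYV4TpE7rq1ZNNOcD_Q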

Basically, lines 5 to 14 are the code that gets us access to the Google spreadsheet.

3. Lines 15 to 20

Since we're writing a program that can extract the prices of hundreds of websites and append them to a Google spreadsheet, taking URLs from console input is not a good solution. Instead, we write the websites we care about into a text file, one per line, in the format www.domain.com. Make sure each line holds a valid website, because we will read the file from Python line by line.
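One thing to watch out for: each line read from the file keeps its trailing newline, so strip it before building the URL. A minimal sketch of reading the file defensively, skipping any blank lines:

with open('websitesforprice.txt') as f:
    # strip the trailing newline and ignore empty lines
    domains = [line.strip() for line in f if line.strip()]
for domain in domains:
    print domain  # e.g. www.example.com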

Line 17 makes a soup element out of the URL that holds the information we are looking for; on each iteration the soup element corresponds to a different website. Line 18 stores the domain in the 'website' key of the rowdict dictionary, and line 19 stores the website's price in its 'price' key. You can see we use BeautifulSoup to get data via element id. Finally, line 20 pushes the whole row to the Google spreadsheet. This loop runs once for every line in the text file.
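Before running the whole batch, it can help to test the scrape on a single domain first. A minimal sketch using the same urllib2/BeautifulSoup approach as above (lblSitePrice is the element id siteprice.org used at the time of writing, and www.example.com is a placeholder):

from bs4 import BeautifulSoup
from urllib2 import urlopen

# fetch the worth page for one domain and pull out the price by element id
html = urlopen("http://www.siteprice.org/website-worth/www.example.com").read()
soup = BeautifulSoup(html, "html.parser")
print soup.find(id="lblSitePrice").string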

Thanks for reading :) Enjoy! If you have any questions regarding the post, feel free to comment below.

Enroll in 100% Off Courses at Udemy Automatically: Python Code to Get Paid Courses

Categories: Applications, Python, Tutorials
Udemy is a teaching and learning platform with loads of courses in various categories. Very often, coupon codes are available for purchasing courses at a minimal price or with a 100% discount. Various websites serve these coupon codes; one of those I rely on is GrowthCoupon.com.

Now, I am not writing a review of 100% off coupon providers. In this post I will explain the code I use to extract the 100% off coupon codes from growthcoupon.com and then get those courses automatically. I have automated the script so that I do not need to worry about new coupon codes becoming available, which saves me time. The code below enrolls you in the 10 latest 100% off courses available at growthcoupon.com each time it is run. You may wish to schedule the script to run every hour or so.

Get 100% off Udemy courses automatically using Python

from json import loads
from bs4 import BeautifulSoup
import mechanize
api_key = "8def4868-509c-4f34-8667-f28684483810%3AS7obmNY1SsOfHLhP%2Fft6Z%2Fwc46x8B2W3BaHpa5aK2vJwy8VSTHvaPVuUpSLimHkn%2BLqSjT6NERzxqdvQ%2BpQfYA%3D%3D"
growth_coupon_url = "https://api.import.io/store/data/a5ef05a9-784e-410c-9f84-51e1e8ff413c/_query?input/webpage/url=http%3A%2F%2Fgrowthcoupon.com%2Fcoupon-category%2F100-discount%2F&_user=8def4868-509c-4f34-8667-f28684483810&_apikey=" + api_key
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]
sign_in = br.open("https://www.udemy.com/join/login-popup/")
br.select_form(nr=3)
br["email"] = "email@domain.com"
br["password"] = "password"
logged_in = br.submit()

growth_coupon = br.open(growth_coupon_url)
json_obj = loads(growth_coupon.read())

for course_link in json_obj["results"]:
    try:
        course_page = br.open(str(course_link["couponcode_link"]))
        soup = BeautifulSoup(course_page, "html.parser")
        for link in soup.find_all("a"):
            req_link = link.get('href')
            if 'https://www.udemy.com/payment/checkout' in str(req_link):
                print req_link
                br.open(str(req_link))
                print "success"
                break
    except (mechanize.HTTPError,mechanize.URLError) as e:
        print e

The above program is pure Python: it extracts the 10 latest 100% off coupon codes from GrowthCoupon.com and then enrolls you in those courses automatically.

1. Lines 1 to 3

The first three lines are import statements. This program uses three Python libraries. Among them, mechanize is used to log in to the Udemy account. BeautifulSoup is used to get data on the basis of tags; here we use it to collect the links on a given page. json's loads is used to parse the JSON response.

2. Lines 4 and 5

We are using the import.io API to extract data from GrowthCoupon. I got to know about this very cool resource in my Programming Synthesis class at college. Here's how to get and use the import.io API. We store the API key in the variable api_key, then concatenate it to growth_coupon_url, which is the query URL that returns GrowthCoupon's data in JSON format.
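To see what the API returns before wiring it into the enrollment loop, you can fetch the URL and inspect the response. A rough sketch (the exact layout depends on your import.io extractor; "results" is the key mine returns):

from json import loads
from urllib2 import urlopen

# growth_coupon_url is the API url built in the listing above
response = urlopen(growth_coupon_url).read()
json_obj = loads(response)
print json_obj.keys()           # expect something like [u'results', ...]
print len(json_obj["results"])  # number of coupon entries returned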

3. Lines 6 to 13

Lines 6 to 13 are the procedure for logging in to a website (Udemy in our case). Line 6 initializes a browser. Line 7 tells mechanize to ignore the robots.txt file. Line 8 adds a user agent to the browser. Line 9 opens the login URL in the browser we initialized earlier.

The next thing you need is the form you want to work with, which here means the login form. Go to the username box, right-click on it, and choose the inspect element option. Now scroll up until you find the first form tag. In most cases the form will have a name attribute, but some websites do not set one. If it exists, the value of the name attribute on the form tag is what you use to access the form. The other way to access forms is by their index; the first form is indexed 0. If the form name is not available, you need to find out how many forms are present on the login page (most websites have only one, because all you want a login page to do is log you in if authenticated). In our case the form index is 3, which the sketch below can confirm.
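If you are unsure of the index, mechanize can list the forms on a page for you. A quick sketch (form names print as None when the attribute is absent):

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.open("https://www.udemy.com/join/login-popup/")
# enumerate the forms and note the index of the login form
for index, form in enumerate(br.forms()):
    print index, form.name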

Now you need to know the names assigned to the input fields that take the email/username and the password. To get these, inspect the element while your cursor is inside the email/username and password fields. Below is a snapshot to give you an idea of the fields you need to take care of.

[Screenshot: the Udemy login form in the element inspector, showing the email and password field names]

4. Lines 15 and 16

Here, on line 15, we open the URL that gives us the data from GrowthCoupon in JSON format. Line 16 parses the response into a JSON object.

5. Line 18

Our JSON object is stored in the json_obj variable, but the data we need sits inside an array that is the value of the key "results". Hence we iterate through this array.
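For reference, each entry of "results" looks roughly like this (the couponcode_link key comes from my extractor; the URLs below are placeholders):

json_obj = {
    "results": [
        {"couponcode_link": "http://growthcoupon.com/coupon/some-course/"},
        {"couponcode_link": "http://growthcoupon.com/coupon/another-course/"}
    ]
}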

6. Line 20

Now we open the coupon-code link, which is the value of the key "couponcode_link" in each element of the array. On each iteration, the response for that URL is stored in the variable course_page.

7. Line 21

We then convert the page response into a soup element by invoking BeautifulSoup on course_page.

8. Lines 22 to 27

Now we iterate through every link found in the soup element. The URL for enrolling in a Udemy course starts with the string "https://www.udemy.com/payment/checkout", so on each iteration we check whether that string is a substring of the link. If the condition is satisfied, we open the link to enroll ourselves in the course. And that's the end of the code.

Thanks for reading :) Enjoy! If you have any questions regarding the code or the post, comment below.