Category "Python"

Python and Excel

- - Python, Tutorials

I intend to host a set of examples on using Python to read, create and modify Excel files. This article uses the openpyxl module throughout the examples.

Installing openpyxl

I am using python3 throughout the examples, however it should work similarly with python2. As a good development practice, we should segregate dependencies between projects, so it is advisable to use a virtual environment of some sort.

python3 -m venv excelenv
source excelenv/bin/activate
pip install openpyxl

Open an existing document in python
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> wb
<openpyxl.workbook.workbook.Workbook object at 0x7f010dce4240>
>>>
Selecting sheets from the workbook/document
>>> wb.get_sheet_names()
['Sheet1']
>>>
>>> sheet = wb.get_sheet_by_name("Sheet1")
>>> sheet
<Worksheet "Sheet1">
>>>
>>> sheet.max_column
3
>>> sheet.max_row
3
>>> sheet.min_row
1
>>> sheet.min_column
2
>>>
>>> sheet['A1'].value
'thetaranights.com'
>>> sheet['B1'].value
102345
>>> sheet['A2'].value
'thetaranights.com'
>>> sheet['B2'].value
123443
>>>
When the cell being accessed doesn’t have a value
>>> a = sheet['A4'].value
>>> a
>>> type(a)
<class 'NoneType'>
>>>

OpenPyXL automatically interprets the types of the values in the cells of the sheet and returns each value as an object of that type: str, int, dates, etc.

>>> type(sheet['A3'].value)
<class 'str'>
>>> type(sheet['B3'].value)
<class 'int'>
Accessing cell values using row, column directive

Although we can access the value of a cell using its letter-based coordinate such as 'A1', we can also access it using row and column numbers. Excel rows and columns start at 1, not 0.

>>> sheet.cell(row=0, column=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bhishan-1504/excelinpython/excelenv/lib/python3.6/site-packages/openpyxl/worksheet/worksheet.py", line 296, in cell
    raise ValueError("Row or column values must be at least 1")
ValueError: Row or column values must be at least 1
>>>

>>> sheet.cell(row=1, column=1)
<Cell 'Sheet1'.A1>
>>> sheet.cell(row=1, column=1).value
'thetaranights.com'
>>>
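The relation between a letter coordinate such as 'B1' and the numeric column used by sheet.cell() can be sketched in plain Python. This is only an illustration of the 1-based numbering; the two helper names below are hypothetical, and openpyxl ships its own utilities for this in openpyxl.utils (get_column_letter and column_index_from_string).

```python
def column_letter_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based column number."""
    if not letters:
        raise ValueError("Column label must not be empty")
    index = 0
    for char in letters.upper():
        if not 'A' <= char <= 'Z':
            raise ValueError("Invalid column label: %r" % letters)
        # Treat the label as a base-26 number whose digits run from 1 to 26.
        index = index * 26 + (ord(char) - ord('A') + 1)
    return index


def index_to_column_letter(index):
    """Convert a 1-based column number back to its column label."""
    if index < 1:
        raise ValueError("Column index must be at least 1")
    letters = ''
    while index > 0:
        # Subtract 1 first because there is no zero digit in this numbering.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return letters


print(column_letter_to_index('A'), column_letter_to_index('AB'))
print(index_to_column_letter(1), index_to_column_letter(28))
```

So sheet['B1'] and sheet.cell(row=1, column=2) address the same cell.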
Reading values from excel sheet
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb.get_sheet_by_name("Sheet1")
>>> for each_row in range(sheet.min_row, sheet.max_row + 1):
...     for each_col in range(sheet.min_column, sheet.max_column + 1):
...         print(sheet.cell(row=each_row, column=each_col).value)
...
thetaranights.com
102345
thetaranights.com
123443
thetaranights.com
102234
>>>
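Newer openpyxl releases mark get_sheet_names() and get_sheet_by_name() as deprecated in favour of wb.sheetnames and wb['name']. The sketch below shows those, together with iter_rows(values_only=True) as an alternative to the nested range() loops. It builds a small workbook in memory instead of opening example.xlsx, so the two rows of data are assumptions made for illustration.

```python
import openpyxl

# Build a small workbook in memory so the example does not depend on example.xlsx.
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(['thetaranights.com', 102345])
sheet.append(['thetaranights.com', 123443])

# Property and item access instead of get_sheet_names()/get_sheet_by_name().
sheet_names = wb.sheetnames
same_sheet = wb[sheet_names[0]]

# iter_rows(values_only=True) yields one tuple of cell values per row.
rows = list(same_sheet.iter_rows(values_only=True))
for row in rows:
    print(row)
```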
Creating excel document
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> wb.get_sheet_names()
['Sheet']
>>> sheet = wb.get_sheet_by_name('Sheet')
>>> sheet
<Worksheet "Sheet">
>>> sheet.title = 'Custom Created worksheet'
>>> wb.get_sheet_names()
['Custom Created worksheet']
>>> sheet
<Worksheet "Custom Created worksheet">
>>> sheet['A1'] = 'thetaranights.com'
>>> sheet['A1'].value
'thetaranights.com'
>>>
>>> sheet.append(['thetaranights.com', '102948']) # adding rows to sheet
>>> wb.save('createdbyopenpyxl.xlsx')
>>>

Going serverless with Chalice and AWS lambda

- - Python, Tutorials

The intention of this post is to host a simple example of Chalice from AWS, which allows serverless API creation with the use of AWS Lambda. You also get auto-generation of the IAM policy, making it faster to deploy web applications. Chalice expects to pick up the AWS credentials from ~/.aws/config.

Prerequisites for Chalice (AWS credentials)

If you’ve used AWS API or boto/boto3 for python, you’ve probably already added the credentials in place. Otherwise you could do it as below:

mkdir ~/.aws
touch ~/.aws/config

Contents of the config file

[default]
aws_access_key_id = YOUR_ACCESS_KEY_HERE
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
region = YOUR_REGION
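Before deploying, it can help to confirm that the config file really contains the three keys shown above. The following is a minimal sketch using only the standard library; missing_keys is a hypothetical helper, and it checks a string here so that it does not touch your real ~/.aws/config.

```python
import configparser

# Same layout as the config file above, with placeholder values.
SAMPLE_CONFIG = """
[default]
aws_access_key_id = YOUR_ACCESS_KEY_HERE
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
region = YOUR_REGION
"""

REQUIRED_KEYS = ('aws_access_key_id', 'aws_secret_access_key', 'region')


def missing_keys(config_text, profile='default'):
    """Return the required keys that are absent from the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(profile, key)]


print(missing_keys(SAMPLE_CONFIG))
```

To check the real file, read its text with open() and os.path.expanduser('~/.aws/config') and pass it to the same function.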

Installing Chalice

As a good development practice, we should segregate dependencies between projects, so it is advisable to use a virtual environment of some sort.

$ python3 -m venv serverlessenv
$ source serverlessenv/bin/activate

(serverlessenv)$ pip install chalice

Creating simple API with chalice

Chalice comes with a command line tool. Run it without arguments to see which commands are available:

(serverlessenv)$ chalice

Usage: chalice [OPTIONS] COMMAND [ARGS]...

Options:
--version Show the version and exit.
--project-dir TEXT The project directory. Defaults to CWD
--debug / --no-debug Print debug logs to stderr.
--help Show this message and exit.

Commands:
delete
deploy
gen-policy
generate-pipeline Generate a cloudformation template for a...
generate-sdk
invoke Invoke the deployed lambda function NAME.
local
logs
new-project
package
url

(serverlessenv)$ chalice new-project

___ _ _ _ _ ___ ___ ___
/ __|| || | /_\ | | |_ _|/ __|| __|
| (__ | __ | / _ \ | |__ | || (__ | _|
\___||_||_|/_/ \_\|____||___|\___||___|

The python serverless microframework for AWS allows
you to quickly create and deploy applications using
Amazon API Gateway and AWS Lambda.

Please enter the project name: basichelloworld

(serverlessenv)$ cd basichelloworld/

(serverlessenv)~/basichelloworld$ ls -a

app.py .chalice .gitignore requirements.txt

Chalice is very simple, and its route decorators will feel familiar if you come from Flask.

Auto-generated app.py
from chalice import Chalice

app = Chalice(app_name='basichelloworld')


@app.route('/')
def index():
    return {'hello': 'world'}


# The view function above will return {"hello": "world"}
# whenever you make an HTTP GET request to '/'.
#
# Here are a few more examples:
#
# @app.route('/hello/{name}')
# def hello_name(name):
#    # '/hello/james' -> {"hello": "james"}
#    return {'hello': name}
#
# @app.route('/users', methods=['POST'])
# def create_user():
#     # This is the JSON body the user sent in their POST request.
#     user_as_json = app.current_request.json_body
#     # We'll echo the json body back to the user in a 'user' key.
#     return {'user': user_as_json}
#
# See the README documentation for more examples.
#
Running the API locally

(serverlessenv)~/basichelloworld$ chalice local
Serving on 127.0.0.1:8000

(serverlessenv)~/basichelloworld$ curl -X GET http://127.0.0.1:8000
{"hello": "world"}

Going serverless with chalice through AWS lambda

(serverlessenv)~/basichelloworld$ chalice deploy
Creating deployment package.
Updating policy for IAM role: basichelloworld
Creating lambda function: basichelloworld
Creating Rest API
Resources deployed:
- Lambda ARN: arn:aws:lambda:us-east-1:9582857991:function:basichelloworld
- Rest API URL: https://fxcdyzuitc.execute-api.us-east-1.amazonaws.com/api/

(serverlessenv)~/basichelloworld$ curl -X GET https://fxcdyzuitc.execute-api.us-east-1.amazonaws.com/api/
{"hello": "world"}

Python, Boto and AWS EC2

- - Python, Tutorials

Most, if not all, software companies have adopted cloud infrastructure and services, and AWS in particular is very popular. The intention of this post is to host a few examples of using boto with one of the services available on AWS, namely EC2. Sooner or later you are likely to need a way to programmatically fire up a few instances, shut them down, filter instances and send remote commands to them, to say the least.

Filter instances based on tag names from the AWS inventory

EC2 instances on AWS can have as many tags (key: value pairs) as required, for purposes like identifying an instance or a set of instances. Also, when an instance you work on frequently shuts down and boots again and you haven’t set up an Elastic IP, its public IP address keeps changing. Although you could argue for using the private IP to find an instance, that isn’t very effective when you have a lot of instances (>100).

Boto2
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1', aws_access_key_id='aws_access_id', aws_secret_access_key='aws_secret')
# tag filters are written as 'tag:<key>': '<value>'
reservations = conn.get_all_instances(filters={'tag:purpose': 'intelligence'})
public_ips = [each_instance.ip_address for r in reservations for each_instance in r.instances]
# each_instance.private_ip_address  to get the private ip address of the instance
Boto3
import boto3
session = boto3.session.Session(aws_access_key_id='aws_access_id',
                                aws_secret_access_key='aws_secret',
                                region_name='us-east-1')
 
ec2 = session.resource('ec2')
instances = ec2.instances.filter(
    Filters=[{'Name':'tag:purpose', 'Values':['intelligence']}
])
public_ips = [each_instance.public_ip_address for each_instance in instances]
# each_instance.private_ip_address to get the private ip address of the instance
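When you filter on more than one tag, the Filters list can be built from a plain dictionary. This is a small sketch of that idea; build_tag_filters is a hypothetical helper, not part of boto3, and the tag names in the usage line are made up.

```python
def build_tag_filters(tags):
    """Turn {'purpose': 'intelligence'} into the Filters structure used above.

    A value may be a single string or an iterable of strings.
    """
    filters = []
    for key, value in sorted(tags.items()):
        values = [value] if isinstance(value, str) else list(value)
        filters.append({'Name': 'tag:' + key, 'Values': values})
    return filters


print(build_tag_filters({'purpose': 'intelligence', 'env': ['staging', 'prod']}))
```

The result can be passed as ec2.instances.filter(Filters=build_tag_filters(...)).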
Boot/Shutdown an instance/instances from the AWS inventory

Using boto, you can boot/shutdown/terminate instances.

Boto2
def start_stop_terminate_instance(instance_ids, conn, action='start'):
    if action == 'start':
        conn.start_instances(instance_ids=instance_ids)
    elif action == 'stop':
        conn.stop_instances(instance_ids=instance_ids)
    elif action == 'terminate':
        conn.terminate_instances(instance_ids=instance_ids)
Boto3
def start_stop_terminate_instance(instance_ids, conn, action='start'):
    # conn here is the boto3 EC2 resource, e.g. session.resource('ec2')
    if action == 'start':
        conn.instances.filter(InstanceIds=instance_ids).start()
    elif action == 'stop':
        conn.instances.filter(InstanceIds=instance_ids).stop()
    elif action == 'terminate':
        conn.instances.filter(InstanceIds=instance_ids).terminate()
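Since the three branches above differ only in the method name, the same thing can be written as a lookup by name. The sketch below is one possible variation, not the author's code: apply_action is a hypothetical helper, and FakeCollection is a stand-in so the example runs without AWS credentials. Unlike the if/elif version, it raises on an unknown action instead of silently doing nothing.

```python
ALLOWED_ACTIONS = ('start', 'stop', 'terminate')


def apply_action(collection, action='start'):
    """Call start(), stop() or terminate() on a collection of instances."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("Unsupported action: %r" % action)
    return getattr(collection, action)()


class FakeCollection:
    """Stand-in for ec2.instances.filter(...); it only records the calls."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append('start')

    def stop(self):
        self.calls.append('stop')

    def terminate(self):
        self.calls.append('terminate')


fake = FakeCollection()
apply_action(fake, 'stop')
print(fake.calls)
```

With boto3 you would pass conn.instances.filter(InstanceIds=instance_ids) in place of the fake collection.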
Create Instances based on various metrics

Boto makes use of the AWS APIs, which also allow creating instances. An EC2 instance has various properties, the most common being the instance type. Instance families group instances by characteristics such as compute power, performance and bandwidth. Commonly used general purpose families are t2, m4 and m3; c5, c4 and c3 are compute optimized. For a process/application that leans towards in-memory activities, you’d use x1, r4 or r3. There are other families too, but the ones mentioned above are quite common. Other properties of an instance are the instance id, the instance size (nano, micro, small, large, xlarge, 2xlarge, 4xlarge, 8xlarge, 10xlarge), the key pair used to make a secure connection to the instance, tag names, display names, security groups, attached storage id, etc. Using boto we can create one or multiple instances based on the above mentioned parameters.

Boto2
import boto.ec2
conn = boto.ec2.connect_to_region('us-east-1', aws_access_key_id='aws_access_id', aws_secret_access_key='aws_secret')
conn.run_instances(
    'ami-ag139jf',
    min_count=10, 
    max_count=100,
    key_name='myKey',
    instance_type='t2.small',
    security_group_ids=['sg-4512']  # security_groups expects group names, not ids
)
Boto3
import boto3
session = boto3.session.Session(aws_access_key_id='aws_access_id',
                                aws_secret_access_key='aws_secret',
                                region_name='us-east-1')
 
ec2 = session.resource('ec2')
ec2.create_instances(
    ImageId='ami-ag139jf', 
    MinCount=10, 
    MaxCount=100, 
    InstanceType='t2.small',
    KeyName='myKey',
    SecurityGroupIds=['sg-4512']  # SecurityGroups expects group names, not ids
)
Send remote commands to an EC2 instance

Paramiko can be used to connect to a remote instance, send commands to be executed and read the standard output/error to act accordingly.

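import paramiko

path_to_pem_file = "/path/to/key.pem"  # placeholder: private key of the instance's key pair
instance_ip = "203.0.113.10"  # placeholder: public ip address of the instance

key = paramiko.RSAKey.from_private_key_file(path_to_pem_file)
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

# Connect to the instance
try:
    # using username, public ip address and the pem file, create connection to the instance
    client.connect(hostname=instance_ip, username="ubuntu", pkey=key)

    # Execute command remotely.
    stdin, stdout, stderr = client.exec_command("ls -l")
    print(stdout.read().decode())

except Exception as e:
    print(e)

finally:
    # close the connection whether or not the command succeeded
    client.close()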

Google APIs and Python – Part II

- - Python, Tutorials

Google services are cool and you can build products and services around them. We will see through examples how you can use various Google services such as Sheets, Slides and Drive through Python. I hope people can take ideas from the following example to do amazing things with Google services. There is a part one to this article where I walked through the procedure to enable Google APIs, the installation of the required Python packages and authentication, and demonstrated individual examples of the Sheets, Drive and Slides APIs. https://www.thetaranights.com/brief-introduction-to-google-apissheets-slides-drive/ . In this article, however, we will integrate the Sheets, Drive and Slides APIs altogether.

The Idea

We will use data from a sheet which contains some statistics about a few applications/websites. The end goal is to create a presentation, add a background image to each slide, add content from the sheet to the slide and do some other cool things. All the resources used in the following examples are public so you can follow along.

I will be using following resources throughout the example

Create Presentation Slides from Sheets data and Drive images using Python
from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

TEMPLATE_FILE = "TEM_F"

SCOPES = ('https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive')

CLIENT_SECRET = 'client_secret_760822340075-i0ark1h51pnbhii5dgafug3k4g1nodb8.apps.googleusercontent.com.json' # download from google console after activating apis

# It doesn't matter if storage.json is not present: you will be prompted to accept access to google resources on your account, and a token with the requested privileges will be generated and stored inside storage.json.
store = file.Storage('storage.json')

credz = store.get()

if not credz or credz.invalid:
    flow = client.flow_from_clientsecrets(CLIENT_SECRET, SCOPES)
    credz = tools.run_flow(flow, store)

HTTP = credz.authorize(Http())

SHEETS = discovery.build('sheets', 'v4', http=HTTP)

SLIDES = discovery.build('slides', 'v1', http=HTTP)

DRIVE = discovery.build('drive', 'v3', http=HTTP)


presentation_template_file_id = "1wLimfuGw1pqZZJvkc15lOfD7LSEAWCjuIlhbrOaiulE" # the template has been made public.
# name of the presentation file
DATA = {'name':'MobileApplicationsReport'}

PRESENTATION_ID = DRIVE.files().copy(body=DATA, fileId=presentation_template_file_id).execute()['id']
print(PRESENTATION_ID)

sheet_ID = '1xpjQkF692lNnTsfOckVll2OPTa659ZCuK3JezDSkris' # the sheet where we fetch data from to populate to the slides.

application_statistics = SHEETS.spreadsheets().values().get(range='Sheet1', spreadsheetId=sheet_ID).execute().get('values') # all the data from the sheet as lists including headers.

print(application_statistics)

presentation_details = SLIDES.presentations().get(presentationId=PRESENTATION_ID).execute()

slides_data = presentation_details.get('slides', [])[0]

page_id = slides_data['objectId'] # page id of the first slide of the presentation.

for each_data in application_statistics[1:]: # skip the headers.
    # duplicate slide for the next cycle before replacing content on a slide since we are using method of replacing text from the slide to populate data.
    reqs = [{"duplicateObject": {"objectId": page_id}}]
    copy_slide_rsp = SLIDES.presentations().batchUpdate(body={'requests':reqs}, presentationId=PRESENTATION_ID).execute()
    
    IMG_ID = each_data[10] # the id of the image present on google drive which we intend to have as a background image to this particular slide.
    img_url = '%s&access_token=%s' % (DRIVE.files().get_media(fileId=IMG_ID).uri, credz.access_token)
    print("Image url", img_url)

    # prepare a bulk requests that basically replaces the text from the template with the actual data from the sheets.
    bulk_requests = [
        {'updatePageProperties':{'objectId':page_id, 'pageProperties':{'pageBackgroundFill':{'stretchedPictureFill':{'contentUrl':img_url}}}, 'fields':'pageBackgroundFill'}},
        {'replaceAllText':{'containsText':{'text':'{{SHOWCASE  NAME}}', 'matchCase':True}, 'replaceText':each_data[1], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{DESCRIPTION}}', 'matchCase':True}, 'replaceText':each_data[2], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{COMPOSITION}}', 'matchCase':True}, 'replaceText':each_data[3], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{IMPRESSIONS}}', 'matchCase':True}, 'replaceText':each_data[8], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{VIDEO VIEWS}}', 'matchCase':True}, 'replaceText':each_data[7], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{USERS}}', 'matchCase':True}, 'replaceText':each_data[6], "pageObjectIds": [page_id]}},
        {'replaceAllText':{'containsText':{'text':'{{MOBILE}}', 'matchCase':True}, 'replaceText':each_data[9], "pageObjectIds": [page_id]}}
    ]
    bulk_update_response = SLIDES.presentations().batchUpdate(body={'requests':bulk_requests}, presentationId=PRESENTATION_ID, fields='').execute().get('replies')

    page_id = copy_slide_rsp['replies'][0]['duplicateObject']['objectId'] # update the page id as the one that was duplicated so we now can work on this slide.

delete_final_page = SLIDES.presentations().batchUpdate(body={'requests':[{"deleteObject": {"objectId": page_id}}]}, presentationId=PRESENTATION_ID, fields='').execute().get('replies')
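The replaceAllText entries in bulk_requests all have the same shape, so they can also be generated from a mapping of placeholder to column index. The sketch below is one way to do that under the column layout used above; build_replace_requests and PLACEHOLDER_COLUMNS are hypothetical names, only a few of the placeholders are shown, and the sample row is made-up data in the shape of the sheet.

```python
# Placeholder text in the template mapped to the column index in the sheet row.
PLACEHOLDER_COLUMNS = {
    '{{DESCRIPTION}}': 2,
    '{{COMPOSITION}}': 3,
    '{{USERS}}': 6,
    '{{VIDEO VIEWS}}': 7,
    '{{IMPRESSIONS}}': 8,
    '{{MOBILE}}': 9,
}


def build_replace_requests(row, page_id, placeholder_columns):
    """Build one replaceAllText request per placeholder for a single slide."""
    return [
        {'replaceAllText': {
            'containsText': {'text': placeholder, 'matchCase': True},
            'replaceText': row[column],
            'pageObjectIds': [page_id],
        }}
        for placeholder, column in placeholder_columns.items()
    ]


sample_row = ['Sports', 'Sports Fans', 'People who likes sports', 'online data', '12',
              'Sports Fans', '1000', '2000', '3000', '4000', 'sports-fans.jpg']
requests = build_replace_requests(sample_row, 'p3', PLACEHOLDER_COLUMNS)
print(len(requests), requests[0])
```

The returned list can be appended to the updatePageProperties request and sent in the same batchUpdate call.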
Output Presentation Created:

After the above program ran successfully, the following presentation was generated.
https://docs.google.com/presentation/d/1h9YqUnCWu5pxXmW3rs_9rKmMsVeJM9I8nIBGbr25pME/edit?usp=sharing

Brief Introduction to Google APIs(Sheets, Slides, Drive)

- - Python, Tutorials

The intention of this post is to introduce the usage of Google APIs with Python. Google services are cool and you can build products and services around them. We will see through examples how you can use various Google services such as Sheets, Slides and Drive through Python. I hope people can take ideas from the following examples to do amazing things with Google services. In order to work with Google services via their APIs, we first need to create a project on google console with specific APIs enabled. For the scope of this article, we will need the Sheets API, Slides API and Drive API enabled.

I will be using following resources throughout the examples

Installation of libraries and setup

pip install --upgrade google-api-python-client oauth2client

Creating a project on Google Console and enabling APIs

1. Open google console https://console.cloud.google.com/apis/dashboard
2. Create a new project on google console.
3. Name the new project.
4. Enable the Sheets, Slides and Drive APIs.
5. Create credentials for the project and download them.

Authentication

We need the credentials that were downloaded from google console for authentication. Google creates an access token to access and work on the google resources. The token does expire, and when it does, we will be prompted in a browser to grant the application access to the specified resources on our google account.

>>> from googleapiclient import discovery
>>> from httplib2 import Http
>>> from oauth2client import file, client, tools
>>> SCOPES = ('https://www.googleapis.com/auth/spreadsheets','https://www.googleapis.com/auth/drive')
>>> CLIENT_SECRET = 'client_secret_760822340075-i0ark1h51pnbhii5dgafug3k4g1nodb8.apps.googleusercontent.com.json'
>>> store = file.Storage('token.json')
>>> credz = store.get()
/home/bhishan-1504/googleapis/googleapienv/lib/python3.6/site-packages/oauth2client/_helpers.py:255: UserWarning: Cannot access token.json: No such file or directory
  warnings.warn(_MISSING_FILE_MESSAGE.format(filename))
>>> 

>>> if not credz or credz.invalid:
...     flow = client.flow_from_clientsecrets(CLIENT_SECRET, SCOPES)
...     credz = tools.run_flow(flow, store)
...

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?client_id=760822340075-i0ark1h51pnbhii5dgafug3k4g1nodb8.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2F&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fspreadsheets+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&access_type=offline&response_type=code

If your browser is on a different machine then exit and re-run this
application with the command-line parameter

  --noauth_local_webserver

Created new window in existing browser session.
Authentication successful.
>>>

Note: All of the examples below need authentication. I am skipping that step in the examples underneath to avoid redundancy.

Reading a google spreadsheet using Python
>>> HTTP = credz.authorize(Http())
>>> SHEETS = discovery.build('sheets', 'v4', http=HTTP)
>>> sheet_ID = '1xpjQkF692lNnTsfOckVll2OPTa659ZCuK3JezDSkris'
>>> spreadsheet_read = SHEETS.spreadsheets().values().get(range='Sheet1', spreadsheetId=sheet_ID).execute()
>>> spreadsheet_read
{'range': 'Sheet1!A1:Z1001', 'majorDimension': 'ROWS', 'values': [['Category', 'Showcase Name', 'Description', 'Audience Composition', 'ID', 'Audience Name', 'Users', 'Video Views', 'Impressions', 'Mobile Impressions', 'Image'], ['Sports', 'Sports Fans', 'People who likes sports', 'online data', '12', 'Sports Fans', '1000', '1000', '1000', '1000', 'sports-fans.jpg']]}
>>> spreadsheet_values = spreadsheet_read['values']
>>> spreadsheet_values
[['Category', 'Showcase Name', 'Description', 'Audience Composition', 'ID', 'Audience Name', 'Users', 'Video Views', 'Impressions', 'Mobile Impressions', 'Image'], ['Sports', 'Sports Fans', 'People who likes sports', 'online data', '12', 'Sports Fans', '1000', '1000', '1000', '1000', 'sports-fans.jpg']]
>>>
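The 'values' list comes back as rows of strings with the headers in the first row, which is awkward to index by position. A minimal sketch for turning it into one dictionary per row follows; rows_to_dicts is a hypothetical helper, the sample data is shortened, and the padding step assumes (as I understand the Sheets API) that trailing empty cells may be left out of a row.

```python
def rows_to_dicts(values):
    """Convert [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not values:
        return []
    header, *rows = values
    # Pad short rows so that every header still gets a key.
    return [dict(zip(header, row + [''] * (len(header) - len(row)))) for row in rows]


sample_values = [
    ['Category', 'Showcase Name', 'Users'],
    ['Sports', 'Sports Fans', '1000'],
    ['News', 'News Readers'],
]
for record in rows_to_dicts(sample_values):
    print(record['Showcase Name'], record['Users'])
```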
Search and Download a file from drive using Python
>>> import io
>>> from googleapiclient.http import MediaIoBaseDownload
>>> DRIVE = discovery.build('drive', 'v3', http=HTTP)
>>> resp = DRIVE.files().list(q="name='{filepath}'".format(filepath="P1150264.JPG")).execute()
>>> resp
{'kind': 'drive#fileList', 'incompleteSearch': False, 'files': [{'kind': 'drive#file', 'id': '0B54qUrMD2GDIa0FMdkxLMmpoZVU', 'name': 'P1150264.JPG', 'mimeType': 'image/jpeg'}]}
>>> file_id = resp['files'][0]['id']
>>> file_id
'0B54qUrMD2GDIa0FMdkxLMmpoZVU'
>>> file_request = DRIVE.files().get_media(fileId=file_id)
>>> fh = io.BytesIO()
>>> downloader = MediaIoBaseDownload(fh, file_request)
>>> done = False
>>> while done is False:
...     status, done = downloader.next_chunk()
...     print("Download {status}".format(status=status.progress() * 100))
...
Download 100.0
>>>
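The search query above breaks if the file name itself contains a single quote. As far as I know, the Drive query syntax expects such quotes (and backslashes) to be escaped with a backslash, so a small helper along these lines can be used to build the q parameter. name_query is a hypothetical helper and the second file name is made up.

```python
def name_query(filename):
    """Build a Drive 'q' expression that matches a file by its exact name."""
    # Escape backslashes first, then single quotes, so the escapes stay intact.
    escaped = filename.replace('\\', '\\\\').replace("'", "\\'")
    return "name = '{}'".format(escaped)


print(name_query("P1150264.JPG"))
print(name_query("Valentine's Day.jpg"))
```

The result would be passed as DRIVE.files().list(q=name_query("P1150264.JPG")).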

Google Slides API

Create a blank presentation using Python
>>> SLIDES = discovery.build('slides', 'v1', http=HTTP)
>>> body = {'title': 'AutomatedPresentation'}
>>> presentation_request = SLIDES.presentations().create(body=body).execute()
>>> presentation_request['presentationId']
'1ROJOeVFaA4PbC2voR5EFddohxQlZvUkrdi1dsJUks9c'
>>>

Follow the link to see the presentation the above code snippet creates.
https://docs.google.com/presentation/d/1ROJOeVFaA4PbC2voR5EFddohxQlZvUkrdi1dsJUks9c/edit?usp=sharing

Creating presentation using existing template from drive
>>> TEMPLATE_FILE = 'TEM_F'
>>> SLIDES = discovery.build('slides', 'v1', http=HTTP)
>>> DRIVE = discovery.build('drive', 'v3', http=HTTP)
>>> rsp = DRIVE.files().list(q="name='%s'"% TEMPLATE_FILE).execute()['files'][0]
>>> rsp
{'kind': 'drive#file', 'id': '1wLimfuGw1pqZZJvkc15lOfD7LSEAWCjuIlhbrOaiulE', 'name': 'TEM_F', 'mimeType': 'application/vnd.google-apps.presentation'}
>>> DATA = {'name': 'PresentationUsingTemplate'}
>>> create_presentation_request = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute()
>>> presentation_id = create_presentation_request['id']
>>> presentation_id
'10iDjayeyVkVSp5F6eQIqzpISAqjFlbqG4_jdYDAFJG4'
>>>

Follow the link to see the presentation the above code snippet creates. https://docs.google.com/presentation/d/10iDjayeyVkVSp5F6eQIqzpISAqjFlbqG4_jdYDAFJG4/edit?usp=sharing

Adding background image to a slide
>>> SLIDES = discovery.build('slides', 'v1', http=HTTP)
>>> DRIVE = discovery.build('drive', 'v3', http=HTTP)
>>> rsp = DRIVE.files().list(q="name='%s'"% TEMPLATE_FILE).execute()['files'][0]
>>> rsp
{'kind': 'drive#file', 'id': '1wLimfuGw1pqZZJvkc15lOfD7LSEAWCjuIlhbrOaiulE', 'name': 'TEM_F', 'mimeType': 'application/vnd.google-apps.presentation'}
>>> DATA = {'name': 'PresentationUsingTemplatePlusBackgroundImage'}
>>> create_presentation_request = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute()
>>> presentation_id = create_presentation_request['id']
>>> presentation_id
'1cxpaH19h582Q4Ot3b5GL9U6ETl9myqE3JlX4_Fa35e8'
>>> IMG_FILE = "sports-fans.jpg"
>>> img_file_request = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute()['files'][0]
>>> img_url = '%s&access_token=%s' % (DRIVE.files().get_media(fileId=img_file_request['id']).uri, credz.access_token)
>>> img_url
'https://www.googleapis.com/drive/v3/files/0B54qUrMD2GDIa2syZWF3OE5xSUk?alt=media&access_token=ya29.Glz3BcRtfadsfGzKwUQ-6llroeaMfdasfdaffadsjfdXiOewDdHqhdgBef2euMm9OMxGXyXF-axwZ0gFBwH2-T6qS29qmpc-H3ELcyh7CDZCbfzn7DTNJkugoA'

>>> presentation_details = SLIDES.presentations().get(presentationId=presentation_id).execute()
>>> first_slide_data = presentation_details.get('slides', [])[0]
>>> first_slide_id = first_slide_data['objectId']
>>> first_slide_id
'p3'
>>> bulk_reqs = [{'updatePageProperties':{'objectId':first_slide_id, 'pageProperties':{'pageBackgroundFill':{'stretchedPictureFill':{'contentUrl':img_url}}}, 'fields':'pageBackgroundFill'}}]
>>> bulk_update_req = SLIDES.presentations().batchUpdate(body={'requests':bulk_reqs}, presentationId=presentation_id).execute()

Follow the link to see the presentation the above code snippet creates.
https://docs.google.com/presentation/d/1cxpaH19h582Q4Ot3b5GL9U6ETl9myqE3JlX4_Fa35e8/edit?usp=sharing

In a follow-up post to this one, we will focus on integrating the Slides, Sheets and Drive APIs altogether. We shall use spreadsheet data and populate it onto presentation slides.
To be continued…

Update

Published the second part to this article. https://www.thetaranights.com/google-apis-and-python-part-ii/

Python filter() built-in

- - Python, Tutorials

filter() takes a function and an iterable, and returns an iterator that yields only those elements of the iterable for which the function (passed as the first argument) returns a truthy value. What makes this possible is the equal status of every object in Python; one of the main goals of Python was to have an equal status for all objects. Remember that even a function is an object in Python, and hence it can be assigned to a variable, passed as an argument to another function, etc.


filter(function or None, iterable)

The first argument is the function to which each element of the iterable is passed for evaluation. It can also be None, in which case the elements themselves are tested for truthiness.

The second argument is exactly one iterable, from which the arguments for the function are taken.

Filter takes two arguments
>>> def isdivisibleby2(x):
...     if x % 2 == 0:
...         return True
...     return False
...
>>> filter([1,2,3,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 1
>>> filter(isdivisibleby2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 1
>>> filter(isdivisibleby2, [1,2,3,4], [5,6,7,8])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 3
>>>
Filter Example
>>> def isdivisibleby2(x):
...     if x % 2 == 0:
...         return True
...     return False
...
>>> filtered_list = filter(isdivisibleby2, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb644da0>
>>> list(filtered_list)
[2, 4]
>>>
Filter evaluates Truthy and Falsy

The filter built-in keeps only those values for which the function (passed as the first argument) returns a truthy value. Empty sequences and collections such as an empty list [] or an empty dictionary {}, the number 0 and None are considered false values, or falsy. Almost anything else is considered truthy. You should read this post on Truthy and Falsy concepts in Python. https://www.thetaranights.com/idiomatic-python-use-of-falsy-and-truthy-concepts/

>>> def arbitrary_function(x):
...     return x
...
>>> filtered_list = filter(arbitrary_function, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb5e9550>
>>> list(filtered_list)
[1, 2, 3, 4]
>>>
>>> def arbitrary_function(x):
...     return 0 # any of False, None, [], {}
...
>>> filtered_list = filter(arbitrary_function, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb5e92b0>
>>> list(filtered_list)
[]
>>>
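The signature filter(function or None, iterable) also allows None as the first argument, in which case each element is tested for truthiness directly. A short sketch with made-up values, next to the equivalent list comprehension:

```python
values = [0, 1, '', 'text', None, [], [0], {}, 3.5]

# With None as the function, filter keeps the elements that are truthy themselves.
truthy = list(filter(None, values))

# The same selection written as a list comprehension.
same = [value for value in values if value]

print(truthy)
```

Note that [0] is kept: it is a non-empty list, even though its only element is falsy.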

Python map() built-in

- - Python, Tutorials

map() takes a function and one or more iterables, and returns an iterator that yields the result of calling the function with arguments taken from those iterables. What makes this possible is the equal status of every object in Python; one of the main goals of Python was to have an equal status for all objects. Remember that even a function is an object in Python, and hence it can be assigned to a variable, passed as an argument to a function, etc.


map(func, *iterables)

The first argument is the function to which the elements of the iterables are passed for evaluation.

Other than the function object, the map built-in needs at least one iterable and can take several; the function then receives one argument from each of the iterables.

Map takes at least two arguments
>>> def square(x):
...     return x**2
...
>>> map(square)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: map() must have at least two arguments.
>>>
Map Example
>>> def square(x):
...     return x**2
...
>>> squared = map(square, [1,2,3,4,5])
>>> squared
<map object at 0x7f1948bbbef0>
>>> list(squared)
[1, 4, 9, 16, 25]
>>>
Map could take multiple iterables
>>> def add_and_square(x, y):
...     return (x+y)**2
...
>>> added_and_squared = map(add_and_square, [1,2,3,4], [5,6,7,8])
>>> added_and_squared
<map object at 0x7f1948b79518>
>>> list(added_and_squared)
[36, 64, 100, 144]
>>>
When you pass iterables of varying length
>>> def add_and_square(x, y):
...     return (x+y)**2
...
>>> added_and_squared = map(add_and_square, [1,2,3,4], [5,6,7,8, 9])
>>> added_and_squared
<map object at 0x7f1948b795f8>
>>> list(added_and_squared)
[36, 64, 100, 144]
>>>

When you pass iterables of varying length to the map built-in, it stops as soon as the shortest iterable is exhausted.
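If dropping the extra elements is not what you want, one option is to pad the shorter iterable with itertools.zip_longest. The sketch below reuses add_and_square from above; the fill value of 0 is an assumption chosen for illustration, pick whatever default makes sense for your data.

```python
from itertools import zip_longest


def add_and_square(x, y):
    return (x + y) ** 2


short = [1, 2, 3, 4]
longer = [5, 6, 7, 8, 9]

# map stops at the shortest iterable, so the 9 is never used.
truncated = list(map(add_and_square, short, longer))

# zip_longest pads the shorter iterable with fillvalue instead of stopping.
padded = [add_and_square(x, y) for x, y in zip_longest(short, longer, fillvalue=0)]

print(truncated)
print(padded)
```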

Examples of Browser Automations using Selenium in Python

- - Python, Tutorials

Browser automation is one of the coolest things to do, especially when there is a real purpose to it. Through this post, I intend to host a set of examples on browser automation using Selenium in Python so people can take ideas from the code snippets below to perform browser automation as per their need. Selenium allows just about any kind of interaction with browser elements and hence is a go-to for tasks requiring user interaction and JavaScript support.

Installation:


pip install selenium
Download chromedriver from http://chromedriver.chromium.org/downloads
Download phantomjs from http://phantomjs.org/download.html

Login to a website using selenium
>>> from selenium import webdriver
>>> from selenium.webdriver.common.keys import Keys
>>> executable_path = "/home/bhishan-1504/Downloads/chromedriver_linux64/chromedriver"
>>> browser = webdriver.Chrome(executable_path=executable_path)
>>> browser.get("https://github.com/login")
>>> username_field = browser.find_element_by_name("login")
>>> password_field = browser.find_element_by_name("password")
>>> username_field.send_keys("bhishan")
>>> password_field.send_keys("password")
>>> password_field.send_keys(Keys.RETURN)
>>>
Switching proxy with selenium

Selenium is widely used for web scraping, but it is just as effective for web interactions. Suppose you have to cast votes in a competition that allows one vote per IP address. The following example shows how you could use Selenium to perform a repetitive task (casting a vote, in this case) from various IP addresses.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "somedummysite.com/voting/bhishan.php" # url not made public


def cast_vote(proxy):
    # route the PhantomJS traffic through the given HTTP proxy
    service_args = [
        '--proxy=' + proxy,
        '--proxy-type=http',
    ]
    print(service_args)
    browser = webdriver.PhantomJS(service_args=service_args)

    browser.get(url)
    try:
        # wait up to 10 seconds for the vote button to appear
        cast_vote_element = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'vote'))
        )
    except TimeoutException:
        print("Cast vote button not available. Seems like you have used this IP already!")
        browser.quit()
        return
    cast_vote_element.click()
    browser.quit()


def main():
    # proxies.txt holds one proxy address per line
    with open('proxies.txt', 'r') as f:
        for each_ip in f:
            cast_vote(each_ip.strip())


if __name__ == '__main__':
    main()
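A proxy list file tends to pick up blank lines over time, so it can help to clean it before handing each entry to the browser. The following is only a sketch: load_proxies and build_service_args are hypothetical helpers, the '#' comment convention is an assumption of mine, and the addresses are documentation-only placeholders.

```python
def load_proxies(lines):
    """Return cleaned proxy strings, skipping blank lines and '#' comments."""
    proxies = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            proxies.append(entry)
    return proxies


def build_service_args(proxy):
    """Build the same PhantomJS flags that cast_vote() passes as service_args."""
    return ["--proxy=" + proxy, "--proxy-type=http"]


# Hypothetical file contents; a real run would pass an open file object instead.
sample_lines = ["203.0.113.10:8080\n", "\n", "# disabled\n", "198.51.100.7:3128\n"]
proxies = load_proxies(sample_lines)
print(proxies)
print(build_service_args(proxies[0]))
```

Because load_proxies accepts any iterable of lines, it works the same with a list (as here) or with the file object opened in main().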
Execute JavaScript using selenium

There are cases where you’d want to execute JavaScript on the browser instance. The example below depicts one such scenario. Remember how, when a post in your Facebook News Feed has hundreds of thousands of comments, you have to click monotonously to expand the comment threads? The example below does that through Selenium, but it serves a bigger purpose: the code loops over a few thousand Facebook URLs (each relating to a post), expands the comment threads, and prints each page as a PDF file. It was part of a larger program that did something with the PDF files, which isn’t relevant to this post. Here is a link to the JavaScript code that the program below uses to expand the comments on Facebook posts. I don’t even remember where I found it, though.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

import json
import time


# get the js to be executed.
with open('js_code.txt', 'r') as f:
    js_code = f.read()

executable_path = '/home/bhishan-1504/Downloads/chromedriver_linux64/chromedriver'

# print preview settings that select "Save as PDF" as the destination
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}

profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState), 'savefile.default_directory': "/home/bhishan-1504/secret_project/"}

profile["download.prompt_for_download"] = False
profile["profile.default_content_setting_values.notifications"] = 2
chrome_options = webdriver.ChromeOptions()

chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument("--start-maximized")
# print without showing the print dialog
chrome_options.add_argument('--kiosk-printing')

# chrome_options.add_argument("download.default_directory=/home/bhishan-1504/secret_project/")
browser = webdriver.Chrome(executable_path=executable_path, chrome_options=chrome_options)


def save_pdf(count):
    # the page title is used as the name of the saved pdf file
    browser.execute_script("document.title=" + str(count) + ";")
    browser.execute_script('window.print();')
    time.sleep(1)


def visit_page(url, count):
    browser.get(url)
    try:
        # wait until the Home link is present, i.e. the page has loaded
        home_btn = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.LINK_TEXT, "Home"))
        )
    except TimeoutException:
        print("Didn’t work out!")
        return

    # expand the comment threads and give the page time to load them
    browser.execute_script(js_code)
    time.sleep(7)
    save_pdf(count)


if __name__ == '__main__':
    count = 1
    # loop through the text file and pass to visit page function.
    with open('urls.txt', 'r') as f:
        for each_url in f:
            visit_page(each_url.strip(), count)
            count += 1
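The save_pdf function builds its JavaScript by string concatenation, which works here because the title is a plain integer. If you wanted a text title instead, the value would need to be quoted and escaped. A minimal sketch of one way to do that with json.dumps follows; title_script is a hypothetical helper and not part of the original program.

```python
import json


def title_script(title):
    """Return a JavaScript statement that sets document.title to `title`."""
    # json.dumps() emits a quoted, escaped string literal that JavaScript also accepts.
    return "document.title = " + json.dumps(str(title)) + ";"


print(title_script(1))            # document.title = "1";
print(title_script('post "42"'))  # document.title = "post \"42\"";
```

The returned string can be passed to browser.execute_script in place of the concatenated one.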

I recently published an article on Web Scraping using BeautifulSoup. You should read it.

Web Scraping – BeautifulSoup Python

- - Python, Tutorials

Collecting data from public sources is often beneficial to a business or an individual, so the term “web scraping” is nothing new. The data is usually embedded within HTML tags and attributes, and Python is often used to collect it. This post hosts example code snippets so that you can take ideas from them to build scrapers for your own needs using BeautifulSoup and the urllib module in Python. I will be using GitHub’s trending page https://github.com/trending throughout this post, especially because it is well suited to demonstrating various BeautifulSoup methods.

Installation:

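pip install beautifulsoup4 lxml

The examples below pass "lxml" as the parser to BeautifulSoup, so the lxml package is installed as well.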

Get html of a page:
>>> import urllib.request
>>> resp = urllib.request.urlopen("https://github.com/trending")
>>> resp.getcode()
200
>>> resp.read() # the html
Using BeautifulSoup to get title from a page
>>> import urllib.request
>>> import bs4
>>> github_trending = urllib.request.urlopen("https://github.com/trending")
>>> trending_soup = bs4.BeautifulSoup(github_trending.read(), "lxml")
>>> trending_soup.title
<title>Trending  repositories on GitHub today · GitHub</title>
>>> trending_soup.title.string
'Trending  repositories on GitHub today · GitHub'
>>>
Find single element by tag name, find multiple elements by tag name
>>> ordered_list = trending_soup.find('ol') #single element
>>>
>>> type(ordered_list)
<class 'bs4.element.Tag'>
>>>
>>> all_li = ordered_list.find_all('li') # multiple elements
>>>
>>> type(all_li)
<class 'bs4.element.ResultSet'>
>>>
>>> trending_repositories = [each_list.find('h3').text for each_list in all_li]
>>> for each_repository in trending_repositories:
...     print(each_repository.strip())
...
klauscfhq / taskbook
robinhood / faust
Avik-Jain / 100-Days-Of-ML-Code
jxnblk / mdx-deck
faressoft / terminalizer
trekhleb / javascript-algorithms
apexcharts / apexcharts.js
grain-lang / grain
thedaviddias / Front-End-Performance-Checklist
istio / istio
CyC2018 / Interview-Notebook
fivethirtyeight / russian-troll-tweets
boyerjohn / rapidstring
donnemartin / system-design-primer
awslabs / aws-cdk
QUANTAXIS / QUANTAXIS
crossoverJie / Java-Interview
GoogleChromeLabs / ndb
dylanbeattie / rockstar
vuejs / vue
sbussard / canvas-sketch
Microsoft / vscode
flutter / flutter
tensorflow / tensorflow
Snailclimb / Java-Guide
>>>
Getting Attributes of an element
>>> for each_list in all_li:
...     anchor_element = each_list.find('a')
...     print("https://github.com" + anchor_element['href'])
...
https://github.com/klauscfhq/taskbook
https://github.com/robinhood/faust
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://github.com/jxnblk/mdx-deck
https://github.com/faressoft/terminalizer
https://github.com/trekhleb/javascript-algorithms
https://github.com/apexcharts/apexcharts.js
https://github.com/grain-lang/grain
https://github.com/thedaviddias/Front-End-Performance-Checklist
https://github.com/istio/istio
https://github.com/CyC2018/Interview-Notebook
https://github.com/fivethirtyeight/russian-troll-tweets
https://github.com/boyerjohn/rapidstring
https://github.com/donnemartin/system-design-primer
https://github.com/awslabs/aws-cdk
https://github.com/QUANTAXIS/QUANTAXIS
https://github.com/crossoverJie/Java-Interview
https://github.com/GoogleChromeLabs/ndb
https://github.com/dylanbeattie/rockstar
https://github.com/vuejs/vue
https://github.com/sbussard/canvas-sketch
https://github.com/Microsoft/vscode
https://github.com/flutter/flutter
https://github.com/tensorflow/tensorflow
https://github.com/Snailclimb/Java-Guide
>>>
Using class name or other attributes to get element
>>> for each_list in all_li:
...     total_stars_today = each_list.find(attrs={'class':'float-sm-right'}).text
...     print(total_stars_today.strip())
...
1,063 stars today
846 stars today
596 stars today
484 stars today
459 stars today
429 stars today
443 stars today
366 stars today
330 stars today
282 stars today
182 stars today
190 stars today
200 stars today
190 stars today
166 stars today
164 stars today
144 stars today
158 stars today
157 stars today
144 stars today
144 stars today
142 stars today
132 stars today
101 stars today
108 stars today
>>>
Navigate the children of an element
>>> for each_child in ordered_list.children:
...     if isinstance(each_child, bs4.element.Tag): # skip the '\n' text nodes between the li elements
...         print(each_child.find('h3').text.strip())
...
klauscfhq / taskbook
robinhood / faust
Avik-Jain / 100-Days-Of-ML-Code
jxnblk / mdx-deck
faressoft / terminalizer
trekhleb / javascript-algorithms
apexcharts / apexcharts.js
grain-lang / grain
thedaviddias / Front-End-Performance-Checklist
istio / istio
CyC2018 / Interview-Notebook
fivethirtyeight / russian-troll-tweets
boyerjohn / rapidstring
donnemartin / system-design-primer
awslabs / aws-cdk
QUANTAXIS / QUANTAXIS
crossoverJie / Java-Interview
GoogleChromeLabs / ndb
dylanbeattie / rockstar
vuejs / vue
sbussard / canvas-sketch
Microsoft / vscode
flutter / flutter
tensorflow / tensorflow
Snailclimb / Java-Guide
>>>

.children returns only the immediate children of the parent element. If you’d like to get all of the elements nested under an element, use .descendants instead.

Navigate descendants from an element
>>> for each_descendant in ordered_list.descendants:
...     pass  # perform operations
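To make the difference between immediate children and all descendants concrete, here is a small sketch on a hand-written HTML fragment (the markup is made up for illustration and is not GitHub’s). It uses Python’s built-in "html.parser" so that it runs without lxml.

```python
import bs4

html = "<ol><li><h3>first</h3></li>\n<li><h3>second</h3></li></ol>"
soup = bs4.BeautifulSoup(html, "html.parser")
ordered_list = soup.find("ol")

# .children yields only direct children, including the '\n' text node between the <li> tags.
children = list(ordered_list.children)

# .descendants walks the whole subtree: both <li>, both <h3>, and every text node.
descendants = list(ordered_list.descendants)

# Keep only real tags so the text nodes do not get in the way.
child_tags = [c.name for c in children if isinstance(c, bs4.element.Tag)]
descendant_tags = [d.name for d in descendants if isinstance(d, bs4.element.Tag)]

print(len(children), child_tags)
print(len(descendants), descendant_tags)
```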
Navigating previous and next siblings of elements
>>> all_li = ordered_list.find_all('li')
>>> fifth_li = all_li[4]
>>> # each li element is separated by '\n' and hence to navigate to the fourth li, we should navigate previous sibling twice
...
>>>
>>> fourth_li = fifth_li.previous_sibling.previous_sibling
>>> fourth_li.find('h3').text.strip()
'jxnblk / mdx-deck'
>>>
>>> # similarly for navigating to the sixth li from fifth li, we would use next_sibling
...
>>> sixth_li = fifth_li.next_sibling.next_sibling
>>> sixth_li.find('h3').text.strip()
'trekhleb / javascript-algorithms'
>>>
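Calling previous_sibling or next_sibling twice depends on exactly one whitespace node sitting between the elements. A sketch of an alternative, assuming a made-up HTML fragment: find_previous_sibling and find_next_sibling skip text nodes and return the nearest matching tag.

```python
import bs4

html = "<ol>\n<li>one</li>\n<li>two</li>\n<li>three</li>\n</ol>"
soup = bs4.BeautifulSoup(html, "html.parser")
second_li = soup.find_all("li")[1]

# The immediate neighbours are whitespace text nodes, not <li> tags.
print(repr(second_li.previous_sibling))

# find_previous_sibling()/find_next_sibling() skip text nodes and return the nearest matching tag.
first_li = second_li.find_previous_sibling("li")
third_li = second_li.find_next_sibling("li")
print(first_li.text, third_li.text)
```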
Navigate to parent of an element
>>> all_li = ordered_list.find_all('li')
>>> first_li = all_li[0]
>>> li_parent = first_li.parent
>>> # the li_parent is the ordered list <ol>
...
>>>
Putting it all together (GitHub Trending Scraper)
>>> import urllib.request
>>> import bs4
>>>
>>> github_trending = urllib.request.urlopen("https://github.com/trending")
>>> trending_soup = bs4.BeautifulSoup(github_trending.read(), "lxml")
>>> ordered_list = trending_soup.find('ol')
>>> for each_list in ordered_list.find_all('li'):
...     repository_name = each_list.find('h3').text.strip()
...     repository_url = "https://github.com" + each_list.find('a')['href']
...     total_stars_today = each_list.find(attrs={'class':'float-sm-right'}).text.strip()
...     print(repository_name, repository_url, total_stars_today)

klauscfhq / taskbook                             https://github.com/klauscfhq/taskbook                             1,404 stars today
robinhood / faust                                https://github.com/robinhood/faust                                960 stars today
Avik-Jain / 100-Days-Of-ML-Code 	         https://github.com/Avik-Jain/100-Days-Of-ML-Code                  566 stars today
trekhleb / javascript-algorithms 	         https://github.com/trekhleb/javascript-algorithms                 431 stars today
jxnblk / mdx-deck 			         https://github.com/jxnblk/mdx-deck 	                           416 stars today
apexcharts / apexcharts.js 		         https://github.com/apexcharts/apexcharts.js 	                   411 stars today
faressoft / terminalizer 		         https://github.com/faressoft/terminalizer 	                   406 stars today
istio / istio 			                 https://github.com/istio/istio 	                           309 stars today
thedaviddias / Front-End-Performance-Checklist 	 https://github.com/thedaviddias/Front-End-Performance-Checklist   315 stars today
grain-lang / grain 			         https://github.com/grain-lang/grain 	                           301 stars today
boyerjohn / rapidstring 			 https://github.com/boyerjohn/rapidstring 	                   232 stars today
CyC2018 / Interview-Notebook 			 https://github.com/CyC2018/Interview-Notebook 	                   186 stars today
donnemartin / system-design-primer 		 https://github.com/donnemartin/system-design-primer 	           189 stars today
awslabs / aws-cdk 			         https://github.com/awslabs/aws-cdk 	                           186 stars today
fivethirtyeight / russian-troll-tweets 		 https://github.com/fivethirtyeight/russian-troll-tweets 	   159 stars today
GoogleChromeLabs / ndb 			         https://github.com/GoogleChromeLabs/ndb 	                   172 stars today
crossoverJie / Java-Interview 			 https://github.com/crossoverJie/Java-Interview 	           148 stars today
vuejs / vue 			                 https://github.com/vuejs/vue 	                                   137 stars today
Microsoft / vscode 			         https://github.com/Microsoft/vscode 	                           137 stars today
flutter / flutter 			         https://github.com/flutter/flutter 	                           132 stars today
QUANTAXIS / QUANTAXIS 			         https://github.com/QUANTAXIS/QUANTAXIS 	                   132 stars today
dylanbeattie / rockstar 			 https://github.com/dylanbeattie/rockstar 	                   130 stars today
tensorflow / tensorflow 			 https://github.com/tensorflow/tensorflow 	                   106 stars today
Snailclimb / Java-Guide 			 https://github.com/Snailclimb/Java-Guide 	                   111 stars today
WeTransfer / WeScan 			         https://github.com/WeTransfer/WeScan 	                           118 stars today
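The scraper leaves the star count as text such as "1,404 stars today". If you want to sort or compare repositories, you will probably want a number instead. Below is a minimal sketch of one way to convert it; parse_star_count is a hypothetical helper and assumes the count is the first number in the text.

```python
import re


def parse_star_count(text):
    """Extract the leading integer from text such as '1,404 stars today'."""
    # Match a digit followed by any mix of digits and thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


print(parse_star_count("1,404 stars today"))
print(parse_star_count("  960 stars today\n"))
```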