Python filter() built-in

- - Python, Tutorials

filter() takes a function and an iterable and returns an iterator that yields only those elements of the iterable for which the function (passed as the first argument) returns a truthy value. What makes this possible is the equal status of every object in Python; giving all objects an equal status was one of the main goals of the language. Remember that even a function is an object in Python, and hence it can be assigned to a variable, passed as an argument to another function, etc.


filter(function or None, iterable)

The first argument is the function that each element of the iterable is passed to and evaluated by. If None is passed instead of a function, the elements themselves are tested for truthiness.

Other than the function object, the filter built-in takes exactly one iterable, and the arguments for the function are taken from that iterable one element at a time.

Filter takes two arguments
>>> def isdivisibleby2(x):
...     if x % 2 == 0:
...         return True
...     return False
...
>>> filter([1,2,3,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 1
>>> filter(isdivisibleby2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 1
>>> filter(isdivisibleby2, [1,2,3,4], [5,6,7,8])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: filter expected 2 arguments, got 3
>>>
Filter Example
>>> def isdivisibleby2(x):
...     if x % 2 == 0:
...         return True
...     return False
...
>>> filtered_list = filter(isdivisibleby2, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb644da0>
>>> list(filtered_list)
[2, 4]
>>>
Filter evaluates Truthy and Falsy

The filter built-in returns an iterator which contains only those values for which the function (passed as the first argument to filter) returned a truthy value. Empty sequences and collections such as an empty list [] or an empty dictionary {}, the number 0, None and False are considered false values, or falsy. Almost anything else is considered truthy. You should read this post on Truthy and Falsy concepts in Python. https://www.thetaranights.com/idiomatic-python-use-of-falsy-and-truthy-concepts/

>>> def arbitrary_function(x):
...     return x
...
>>> filtered_list = filter(arbitrary_function, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb5e9550>
>>> list(filtered_list)
[1, 2, 3, 4]
>>>
>>> def arbitrary_function(x):
...     return 0 # any of False, None, [], {}
...
>>> filtered_list = filter(arbitrary_function, [1, 2, 3, 4])
>>> filtered_list
<filter object at 0x7f04cb5e92b0>
>>> list(filtered_list)
[]
>>>
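
The signature above also allows None in place of the function. As a small sketch (the sample values are made up for illustration), passing None keeps the elements that are truthy on their own:

```python
# Mixed truthy and falsy values, made up for illustration.
values = [0, 1, "", "python", None, [], [1], {}, 3.5]

# With None as the first argument, filter() tests each element's own truthiness.
truthy_values = list(filter(None, values))
print(truthy_values)  # [1, 'python', [1], 3.5]

# The same selection written as a list comprehension.
assert truthy_values == [v for v in values if v]
```

This is handy for dropping empty strings, None and similar placeholders from a list without writing a helper function.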

Python map() built-in


map() takes a function and one or more iterables and returns an iterator that yields the result of calling the function with arguments taken from those iterables. What makes this possible is the equal status of every object in Python; giving all objects an equal status was one of the main goals of the language. Remember that even a function is an object in Python, and hence it can be assigned to a variable, passed as an argument to a function, etc.


map(func, *iterables)

The first argument is the function that is called with the elements of the following iterables as its arguments.

Other than the function object, the map built-in needs at least one iterable and can take several; on every call the function receives one argument from each of the iterables.

Map takes at least two arguments
>>> def square(x):
...     return x**2
...
>>> map(square)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: map() must have at least two arguments.
>>>
Map Example
>>> def square(x):
...     return x**2
...
>>> squared = map(square, [1,2,3,4,5])
>>> squared
<map object at 0x7f1948bbbef0>
>>> list(squared)
[1, 4, 9, 16, 25]
>>>
Map could take multiple iterables
>>> def add_and_square(x, y):
...     return (x+y)**2
...
>>> added_and_squared = map(add_and_square, [1,2,3,4], [5,6,7,8])
>>> added_and_squared
<map object at 0x7f1948b79518>
>>> list(added_and_squared)
[36, 64, 100, 144]
>>>
When you pass iterables of varying length
>>> def add_and_square(x, y):
...     return (x+y)**2
...
>>> added_and_squared = map(add_and_square, [1,2,3,4], [5,6,7,8, 9])
>>> added_and_squared
<map object at 0x7f1948b795f8>
>>> list(added_and_squared)
[36, 64, 100, 144]
>>>

When you pass iterables of varying length to the map built-in, it stops as soon as the shortest iterable is exhausted, so the extra elements of the longer iterables are ignored.
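
If dropping the extra elements is not what you want, one possible workaround (not something map does by itself) is to pad the shorter iterable with itertools.zip_longest. The fill value of 0 below is an assumption chosen for this illustration:

```python
from itertools import zip_longest


def add_and_square(x, y):
    return (x + y) ** 2


first = [1, 2, 3, 4]
second = [5, 6, 7, 8, 9]

# map() stops at the shortest iterable, so the trailing 9 is ignored.
shortest = list(map(add_and_square, first, second))

# zip_longest() pads the shorter iterable (here with 0) so every element is used.
padded = [add_and_square(x, y) for x, y in zip_longest(first, second, fillvalue=0)]

print(shortest)  # [36, 64, 100, 144]
print(padded)    # [36, 64, 100, 144, 81]
```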

Examples of Browser Automations using Selenium in Python


Browser automation is one of the coolest things to do, especially when there is a real purpose to it. Through this post, I intend to host a set of examples on browser automation using Selenium in Python so people can take ideas from the code snippets below and automate the browser as per their needs. Selenium allows just about any kind of interaction with browser elements and hence is a go-to for tasks requiring user interaction and JavaScript support.

Installation:


pip install selenium
Download chromedriver from http://chromedriver.chromium.org/downloads
Download phantomjs from http://phantomjs.org/download.html

Login to a website using selenium
>>> from selenium import webdriver
>>> from selenium.webdriver.common.keys import Keys
>>> executable_path = "/home/bhishan-1504/Downloads/chromedriver_linux64/chromedriver"
>>> browser = webdriver.Chrome(executable_path=executable_path)
>>> browser.get("https://github.com/login")
>>> username_field = browser.find_element_by_name("login")
>>> password_field = browser.find_element_by_name("password")
>>> username_field.send_keys("bhishan")
>>> password_field.send_keys("password")
>>> password_field.send_keys(Keys.RETURN)
>>>
Switching proxy with selenium

Selenium is used a lot for web scraping, but it is just as effective for web interactions. Suppose you have to cast a vote for a competition that allows one vote per IP address. The following example demonstrates how you would use Selenium to perform a repetitive task (casting a vote in this case) from various IP addresses.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = "somedummysite.com/voting/bhishan.php" # url not made public


def cast_vote(proxy):
    # route the PhantomJS traffic through the given HTTP proxy
    service_args = [
        '--proxy=' + proxy,
        '--proxy-type=http',
    ]
    print(service_args)
    browser = webdriver.PhantomJS(service_args=service_args)
    try:
        browser.get(url)
        # wait up to 10 seconds for the vote button to show up
        cast_vote_element = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'vote'))
        )
        cast_vote_element.click()
    except TimeoutException:
        print("Cast vote button not available. Seems like you have used this IP already!")
    finally:
        # always close the browser, even when the wait timed out
        browser.quit()


def main():
    # proxies.txt holds one proxy per line; read as text so it can be joined to '--proxy='
    with open('proxies.txt', 'r') as f:
        for each_ip in f:
            cast_vote(each_ip.strip())


if __name__ == '__main__':
    main()
Execute JavaScript using selenium

There could be cases where you’d want to execute JavaScript on the browser instance. The example below depicts one such scenario. Remember how, in your News Feed on Facebook, a post can have hundreds of thousands of comments and you have to monotonously click to expand the comment threads? The example below does that through Selenium but has an even bigger purpose. The following code snippet loops over a few thousand Facebook URLs (each relating to a post), expands the comment threads and prints the page as a PDF file. This was part of a larger program that had something to do with the PDF files; however, that isn’t relevant to this post. Here is a link to the JavaScript code which is used in the program below to expand the comments on Facebook posts. I don’t even remember where I found it though.

import selenium.common.exceptions  # needed by the TimeoutException handler in visit_page()
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

import json
import time


# get the js to be executed.

with open('js_code.txt', 'r') as f:
    js_code = f.read()

executable_path = '/home/bhishan-1504/Downloads/chromedriver_linux64/chromedriver'


appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local"
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}


profile = {"printing.print_preview_sticky_settings.appState": json.dumps(appState), 'savefile.default_directory': "/home/bhishan-1504/secret_project/"}

profile["download.prompt_for_download"] = False
profile["profile.default_content_setting_values.notifications"] = 2
chrome_options = webdriver.ChromeOptions()

chrome_options.add_experimental_option('prefs', profile)
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument('--kiosk-printing')

# chrome_options.add_argument("download.default_directory=/home/bhishan-1504/secret_project/")
browser = webdriver.Chrome(executable_path=executable_path, chrome_options=chrome_options)

def save_pdf(count):
    browser.execute_script("document.title=" + str(count) + ";")
    browser.execute_script('window.print();')
    time.sleep(1)


def visit_page(url, count):
    browser.get(url)
    try:
        home_btn = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.LINK_TEXT, "Home"))
        )
    except selenium.common.exceptions.TimeoutException:
        print("Didn’t work out!")
        return

    browser.execute_script(js_code)
    time.sleep(7)
    save_pdf(count)




if __name__ == '__main__':
    count = 1
    # loop through the text file and pass to visit page function.
    with open('urls.txt', 'r') as f:
        for each_url in f:
            # strip the trailing newline before handing the url to the browser
            visit_page(each_url.strip(), count)
            count += 1

I recently published an article on Web Scraping using BeautifulSoup. You should read it.

Web Scraping – BeautifulSoup Python


Data collection from public sources is often beneficial to a business or an individual, so the term “web scraping” isn’t something new. The data is usually wrapped inside HTML tags and attributes, and Python is often used to collect it. The intention of this post is to host example code snippets so people can take ideas from them and build scrapers as per their needs using BeautifulSoup and the urllib module in Python. I will be using GitHub’s trending page https://github.com/trending throughout this post for the examples, because it is well suited to applying various BeautifulSoup methods.

Installation:

pip install BeautifulSoup4 lxml

Get html of a page:
>>> import urllib.request
>>> resp = urllib.request.urlopen("https://github.com/trending")
>>> resp.getcode()
200
>>> resp.read() # the html
Using BeautifulSoup to get title from a page
>>> import urllib.request
>>> import bs4
>>> github_trending = urllib.request.urlopen("https://github.com/trending")
>>> trending_soup = bs4.BeautifulSoup(github_trending.read(), "lxml")
>>> trending_soup.title
<title>Trending  repositories on GitHub today · GitHub</title>
>>> trending_soup.title.string
'Trending  repositories on GitHub today · GitHub'
>>>
Find single element by tag name, find multiple elements by tag name
>>> ordered_list = trending_soup.find('ol') #single element
>>>
>>> type(ordered_list)
<class 'bs4.element.Tag'>
>>>
>>> all_li = ordered_list.find_all('li') # multiple elements
>>>
>>> type(all_li)
<class 'bs4.element.ResultSet'>
>>>
>>> trending_repositories = [each_list.find('h3').text for each_list in all_li]
>>> for each_repository in trending_repositories:
...     print(each_repository.strip())
...
klauscfhq / taskbook
robinhood / faust
Avik-Jain / 100-Days-Of-ML-Code
jxnblk / mdx-deck
faressoft / terminalizer
trekhleb / javascript-algorithms
apexcharts / apexcharts.js
grain-lang / grain
thedaviddias / Front-End-Performance-Checklist
istio / istio
CyC2018 / Interview-Notebook
fivethirtyeight / russian-troll-tweets
boyerjohn / rapidstring
donnemartin / system-design-primer
awslabs / aws-cdk
QUANTAXIS / QUANTAXIS
crossoverJie / Java-Interview
GoogleChromeLabs / ndb
dylanbeattie / rockstar
vuejs / vue
sbussard / canvas-sketch
Microsoft / vscode
flutter / flutter
tensorflow / tensorflow
Snailclimb / Java-Guide
>>>
Getting Attributes of an element
>>> for each_list in all_li:
...     anchor_element = each_list.find('a')
...     print("https://github.com" + anchor_element['href'])
...
https://github.com/klauscfhq/taskbook
https://github.com/robinhood/faust
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://github.com/jxnblk/mdx-deck
https://github.com/faressoft/terminalizer
https://github.com/trekhleb/javascript-algorithms
https://github.com/apexcharts/apexcharts.js
https://github.com/grain-lang/grain
https://github.com/thedaviddias/Front-End-Performance-Checklist
https://github.com/istio/istio
https://github.com/CyC2018/Interview-Notebook
https://github.com/fivethirtyeight/russian-troll-tweets
https://github.com/boyerjohn/rapidstring
https://github.com/donnemartin/system-design-primer
https://github.com/awslabs/aws-cdk
https://github.com/QUANTAXIS/QUANTAXIS
https://github.com/crossoverJie/Java-Interview
https://github.com/GoogleChromeLabs/ndb
https://github.com/dylanbeattie/rockstar
https://github.com/vuejs/vue
https://github.com/sbussard/canvas-sketch
https://github.com/Microsoft/vscode
https://github.com/flutter/flutter
https://github.com/tensorflow/tensorflow
https://github.com/Snailclimb/Java-Guide
>>>
Using class name or other attributes to get element
>>> for each_list in all_li:
...     total_stars_today = each_list.find(attrs={'class':'float-sm-right'}).text
...     print(total_stars_today.strip())
...
1,063 stars today
846 stars today
596 stars today
484 stars today
459 stars today
429 stars today
443 stars today
366 stars today
330 stars today
282 stars today
182 stars today
190 stars today
200 stars today
190 stars today
166 stars today
164 stars today
144 stars today
158 stars today
157 stars today
144 stars today
144 stars today
142 stars today
132 stars today
101 stars today
108 stars today
>>>
Navigate children of an element
>>> for each_children in ordered_list.children:
...     if isinstance(each_children, bs4.element.Tag): # skip the '\n' strings between the li elements
...         print(each_children.find('h3').text.strip())
...
klauscfhq / taskbook
robinhood / faust
Avik-Jain / 100-Days-Of-ML-Code
jxnblk / mdx-deck
faressoft / terminalizer
trekhleb / javascript-algorithms
apexcharts / apexcharts.js
grain-lang / grain
thedaviddias / Front-End-Performance-Checklist
istio / istio
CyC2018 / Interview-Notebook
fivethirtyeight / russian-troll-tweets
boyerjohn / rapidstring
donnemartin / system-design-primer
awslabs / aws-cdk
QUANTAXIS / QUANTAXIS
crossoverJie / Java-Interview
GoogleChromeLabs / ndb
dylanbeattie / rockstar
vuejs / vue
sbussard / canvas-sketch
Microsoft / vscode
flutter / flutter
tensorflow / tensorflow
Snailclimb / Java-Guide
>>>

The .children attribute only returns the immediate children of the parent element. If you’d like to get all of the elements nested under a certain element, you should use .descendants.

Navigate descendants of an element
>>> for each_descendant in ordered_list.descendants:
...     pass # perform operations
Navigating previous and next siblings of elements
>>> all_li = ordered_list.find_all('li')
>>> fifth_li = all_li[4]
>>> # each li element is separated by '\n' and hence to navigate to the fourth li, we should navigate previous sibling twice
...
>>>
>>> fourth_li = fifth_li.previous_sibling.previous_sibling
>>> fourth_li.find('h3').text.strip()
'jxnblk / mdx-deck'
>>>
>>> # similarly for navigating to the sixth li from fifth li, we would use next_sibling
...
>>> sixth_li = fifth_li.next_sibling.next_sibling
>>> sixth_li.find('h3').text.strip()
'trekhleb / javascript-algorithms'
>>>
Navigate to parent of an element
>>> all_li = ordered_list.find_all('li')
>>> first_li = all_li[0]
>>> li_parent = first_li.parent
>>> # the li_parent is the ordered list <ol>
...
>>>
Putting it all together(Github Trending Scraper)
>>> import urllib.request
>>> import bs4
>>>
>>> github_trending = urllib.request.urlopen("https://github.com/trending")
>>> trending_soup = bs4.BeautifulSoup(github_trending.read(), "lxml")
>>> ordered_list = trending_soup.find('ol')
>>> for each_list in ordered_list.find_all('li'):
...     repository_name = each_list.find('h3').text.strip()
...     repository_url = "https://github.com" + each_list.find('a')['href']
...     total_stars_today = each_list.find(attrs={'class':'float-sm-right'}).text
...     print(repository_name, repository_url, total_stars_today)

klauscfhq / taskbook                             https://github.com/klauscfhq/taskbook                             1,404 stars today
robinhood / faust                                https://github.com/robinhood/faust                                960 stars today
Avik-Jain / 100-Days-Of-ML-Code 	         https://github.com/Avik-Jain/100-Days-Of-ML-Code                  566 stars today
trekhleb / javascript-algorithms 	         https://github.com/trekhleb/javascript-algorithms                 431 stars today
jxnblk / mdx-deck 			         https://github.com/jxnblk/mdx-deck 	                           416 stars today
apexcharts / apexcharts.js 		         https://github.com/apexcharts/apexcharts.js 	                   411 stars today
faressoft / terminalizer 		         https://github.com/faressoft/terminalizer 	                   406 stars today
istio / istio 			                 https://github.com/istio/istio 	                           309 stars today
thedaviddias / Front-End-Performance-Checklist 	 https://github.com/thedaviddias/Front-End-Performance-Checklist   315 stars today
grain-lang / grain 			         https://github.com/grain-lang/grain 	                           301 stars today
boyerjohn / rapidstring 			 https://github.com/boyerjohn/rapidstring 	                   232 stars today
CyC2018 / Interview-Notebook 			 https://github.com/CyC2018/Interview-Notebook 	                   186 stars today
donnemartin / system-design-primer 		 https://github.com/donnemartin/system-design-primer 	           189 stars today
awslabs / aws-cdk 			         https://github.com/awslabs/aws-cdk 	                           186 stars today
fivethirtyeight / russian-troll-tweets 		 https://github.com/fivethirtyeight/russian-troll-tweets 	   159 stars today
GoogleChromeLabs / ndb 			         https://github.com/GoogleChromeLabs/ndb 	                   172 stars today
crossoverJie / Java-Interview 			 https://github.com/crossoverJie/Java-Interview 	           148 stars today
vuejs / vue 			                 https://github.com/vuejs/vue 	                                   137 stars today
Microsoft / vscode 			         https://github.com/Microsoft/vscode 	                           137 stars today
flutter / flutter 			         https://github.com/flutter/flutter 	                           132 stars today
QUANTAXIS / QUANTAXIS 			         https://github.com/QUANTAXIS/QUANTAXIS 	                   132 stars today
dylanbeattie / rockstar 			 https://github.com/dylanbeattie/rockstar 	                   130 stars today
tensorflow / tensorflow 			 https://github.com/tensorflow/tensorflow 	                   106 stars today
Snailclimb / Java-Guide 			 https://github.com/Snailclimb/Java-Guide 	                   111 stars today
WeTransfer / WeScan 			         https://github.com/WeTransfer/WeScan 	                           118 stars today


Python Lists


The intention of this article is to host a set of example operations that can be performed on lists, a crucial data structure in Python.

Lists

In Python, a list is an object that contains a sequence of other arbitrary objects. Unlike tuples, lists are mutable objects.

Defining a list

Lists are defined by enclosing a comma-separated sequence of objects inside square brackets, “[” and “]”. A list can contain objects of mixed data types.

>>> # empty list
...
>>> a = []
>>>
>>> type(a)
<class 'list'>
>>>
>>> # list containing same data types
...
>>>
>>> a = [1, 4, 9, 16]
>>>
>>> type(a)
<class 'list'>
>>>
>>> # list containing different data types
...
>>> a = [1, "python", 7.4, True]
>>>
>>> type(a)
<class 'list'>
>>>
A list can be nested
>>> # nested list
...
>>> a = [[1, 4, 9], ["thetaranights.com", "blog", "python"]]
>>> type(a)
<class 'list'>
>>>
>>>
>>> a = [[1, 4, 9], "thetaranights.com"]
>>>
>>> type(a)
<class 'list'>
>>>
Accessing elements from a list via index

List index is used to access elements of a list. List index starts from 0 and should be an integer.

>>> a = [[1, 4, 9], ["thetaranights.com", "blog", "python"]]
>>> a[0]
[1, 4, 9]
>>> a[0][2]
9
>>> a[1][0]
'thetaranights.com'
>>>
Negative Indexing

Python allows accessing elements from the end of a list via negative indexes: the last element is accessed via list_name[-1], the second last element via list_name[-2], and so on.

>>> a = [1, 4, 9, 16]
>>> a[-1]
16
>>> a[-2]
9
>>> a[-3]
4
>>>
Slicing
>>> a = [1, 4, 9, 16, 25]
>>> a[1:3]
[4, 9]
>>> a[:4]
[1, 4, 9, 16]
>>> a[3:]
[16, 25]
>>> a[:-1]
[1, 4, 9, 16]
>>>
Lists are mutable
>>> a = [1, 4, 9, 16, 25]
>>>
>>> a[0] = "mutable"
>>> a
['mutable', 4, 9, 16, 25]
>>>
>>> a[:2] = [256, 1024] # changing a range of elements of a list to the sequence given in the right of assignment operator
>>> a
[256, 1024, 9, 16, 25]
>>>
Adding elements to an existing list
>>> a = [1, 4, 9, 16, 25]
>>> a.append(36)
>>> a
[1, 4, 9, 16, 25, 36]
>>>
Extending a list with another sequence
>>> a = [1, 4, 9, 16]
>>> a.extend([25, 36, 49])
>>> a
[1, 4, 9, 16, 25, 36, 49]
>>> a.extend((64, 91))
>>> a
[1, 4, 9, 16, 25, 36, 49, 64, 91]
>>>
Concatenation and Multiplication
>>> a = [1, 4, 9, 16]
>>> a + [25, 36, 49]
[1, 4, 9, 16, 25, 36, 49]
>>>

>>> # multiplication
...
>>> a * 7
[1, 4, 9, 16, 1, 4, 9, 16, 1, 4, 9, 16, 1, 4, 9, 16, 1, 4, 9, 16, 1, 4, 9, 16, 1, 4, 9, 16]
>>>
Add items to a list before certain index
>>> # first item to the insert() is the index and the later is the value to insert
...
>>> a = [1, 4, 9, 16, 25]
>>> a.insert(2, "new value")
>>> a
[1, 4, 'new value', 9, 16, 25]
>>>
Various other methods on a list
method     usage                            description
append()   L.append(object)                 Append object to the end of list
clear()    L.clear()                        Remove all the items from list
copy()     L.copy()                         A shallow copy of list
count()    L.count(value)                   Return number of occurrences of value passed as argument to the method
extend()   L.extend(iterable)               Extend list by appending elements from the iterable
index()    L.index(value, [start, [stop]])  Return first index of the value
insert()   L.insert(index, object)          Insert object before index
pop()      L.pop([index])                   Remove and return item at index (defaults to last)
remove()   L.remove(value)                  Remove first occurrence of value
reverse()  L.reverse()                      Reverse the list in-place
sort()     L.sort(key=None, reverse=False)  Sort in-place
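
To see a few of these methods working together, here is a short sketch; the list of squares is made up for illustration and the comments show the state of the list after each call:

```python
squares = [16, 1, 9, 4, 9]

squares.sort()                  # [1, 4, 9, 9, 16], sorted in-place
nines = squares.count(9)        # 2 occurrences of the value 9
first_nine = squares.index(9)   # 2, the first index holding 9
squares.remove(9)               # [1, 4, 9, 16], only the first 9 is removed
largest = squares.pop()         # returns 16 and leaves [1, 4, 9]
squares.reverse()               # [9, 4, 1], reversed in-place

backup = squares.copy()         # a shallow copy that survives clear()
squares.clear()                 # []

print(nines, first_nine, largest, backup, squares)
```

Note that sort(), reverse(), remove() and clear() change the list itself and return None, whereas pop(), count(), index() and copy() return a value.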
List built-ins
built-in     description
len()        Returns the number of elements in a list
max()        Returns the largest element in the list
min()        Returns the smallest element in the list
sorted()     Returns the sorted version of the list. It does not sort the given list itself.
sum()        Returns the sum of all the elements of the list
all()        Returns True if all the elements of the list evaluate to true (See truthy and falsy concepts)
any()        Returns True if any element of the list evaluates to true
enumerate()  Returns an enumerate object that contains the index and corresponding values of an iterable.
list()       Converts an iterable (tuple, string, set, dictionary) to a list.
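
The built-ins above can be tried on a small list. The scores here are made-up sample data, with a 0 included on purpose to show how all() treats a falsy element:

```python
scores = [70, 0, 95, 82]

summary = {
    "count": len(scores),           # 4
    "largest": max(scores),         # 95
    "smallest": min(scores),        # 0
    "total": sum(scores),           # 247
    "all_truthy": all(scores),      # False, because 0 is falsy
    "any_truthy": any(scores),      # True
}
ordered = sorted(scores)            # a new list; scores itself is untouched
indexed = list(enumerate(scores))   # pairs of (index, value)
letters = list("abc")               # a string converted to a list of characters

print(summary)
print(ordered, scores)
print(indexed, letters)
```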
List Comprehension

One of the major features of Python is the list comprehension. It is a natural way of creating a new list where each element is the result of some operation applied to each member of another sequence or iterable. A list comprehension consists of square brackets containing an expression, followed by a for clause, then zero or more for or if clauses. A list comprehension always returns a list.

>>> [x ** 2 for x in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>>

In real usage scenarios, the expression after the opening bracket ‘[’ is often a call to a method or function.

some_list = [function_name(x) for x in some_iterable]
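
The optional if and extra for clauses mentioned above can be sketched as follows; the ranges and the string "ab" are arbitrary values picked for the illustration:

```python
# An if clause keeps only the members that pass the test.
even_squares = [x ** 2 for x in range(1, 11) if x % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]

# A second for clause behaves like a nested loop: the leftmost for is the outer loop.
pairs = [(x, y) for x in range(1, 4) for y in "ab"]
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
```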

Python Tuples


This is an introductory post about tuples in Python. We will see through examples what tuples are, their immutable property, their use cases and the various operations on them. Rather than a blog post, it is a set of examples on tuples in Python.

Tuples

A tuple is a sequence of objects in Python. Unlike lists, tuples are immutable, which means the contents of a tuple can’t be changed once assigned. We will see the immutable property of tuples through an example in a bit.

Defining a Tuple

Tuples are generally created by enclosing a comma-separated sequence of objects inside parentheses, “(” and “)”.

>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>> type(ip_addresses)
<class 'tuple'>
>>>
Defining an empty tuple vs single element tuple vs multi-element tuple
>>> # empty tuple
...
>>> ip_addresses = ()
>>> type(ip_addresses)
<class 'tuple'>
>>>
>>> # multi-element tuple
...
>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>> type(ip_addresses)
<class 'tuple'>
>>>
>>> # single element tuple
...
>>> ip_addresses = ("172.19.56.90") # incorrect
>>> type(ip_addresses)
<class 'str'>
>>>
>>> ip_addresses = ("172.19.56.90",)
>>> type(ip_addresses)
<class 'tuple'>
>>>

From the code snippet above, the way to define an empty tuple and a multi-element tuple seems obvious. However, what’s not obvious is that ip_addresses = ("172.19.56.90") evaluates to a str instead of a tuple, because the parentheses alone only group the expression. Although a single element tuple rarely comes in use, there had to be a way to define it. Hence, a single element tuple should end with a comma “,” for the interpreter to evaluate it as a tuple.

Parentheses are also optional
>>> ip_addresses = "172.19.56.90", "172.37.57.32", "172.54.21.23"
>>> type(ip_addresses)
<class 'tuple'>
>>>

The parentheses are optional when defining a tuple. This is possible due to a mechanism called packing, which is one of the many useful features of Python.

Packing and Unpacking

Packing, as the name suggests, creates one compact object by packing multiple other objects together. Unpacking is the reverse: the elements of the tuple are taken out and assigned to individual variables.

>>> # packing example
...
>>> ip_addresses = "172.19.56.90", "172.37.57.32", "172.54.21.23"
>>>

>>> # unpacking example
...
>>> ip, ip2, ip3 = ip_addresses
>>> ip
'172.19.56.90'
>>> ip2
'172.37.57.32'
>>> ip3
'172.54.21.23'
>>>
The number of variables on the left should equal the number of elements to be unpacked
>>> ip, ip2 = ip_addresses
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
>>>
>>>
>>> ip, ip2, ip3, ip4 = ip_addresses
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 4, got 3)
>>>
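
When the number of elements is not known in advance, a starred name on the left can absorb the remainder. This is a small sketch of that Python 3 feature using the same sample addresses; the variable names are only for illustration:

```python
ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")

# The starred name collects whatever is left over, always as a list.
first, *rest = ip_addresses
print(first)  # 172.19.56.90
print(rest)   # ['172.37.57.32', '172.54.21.23']

# The star can sit anywhere; here everything but the last element is discarded.
*_, last = ip_addresses
print(last)   # 172.54.21.23

# Packing and unpacking together give the idiomatic swap.
primary, secondary = first, last
primary, secondary = secondary, primary
```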
Accessing elements from a tuple through index
>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>>
>>> ip_addresses[1]
'172.37.57.32'
>>>
>>> ip_addresses[-1]
'172.54.21.23'
>>>
Looping over elements of a tuple
>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>> for each_ip in ip_addresses:
...     print(each_ip)
...
172.19.56.90
172.37.57.32
172.54.21.23
>>>
>>>
Slicing
>>> ip_addresses[1:]
('172.37.57.32', '172.54.21.23')
>>>

>>> ip_addresses[:-1]
('172.19.56.90', '172.37.57.32')
>>>
Tuples are immutable
>>> ip_addresses[0] = "Accidental edit"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>

Tuples are immutable. This avoids accidental data change attempts.
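
One thing worth keeping in mind is that the immutability applies to the tuple itself and not to the objects it holds. The record below is a made-up example with a list inside a tuple:

```python
# A tuple holding a string and a (mutable) list.
record = ("us-east", ["172.19.56.90"])

# The tuple's slots cannot be reassigned...
try:
    record[0] = "us-west"
    reassigned = True
except TypeError:
    reassigned = False

# ...but the list stored in the second slot can still be changed in place.
record[1].append("172.37.57.32")

print(reassigned)  # False
print(record)      # ('us-east', ['172.19.56.90', '172.37.57.32'])
```

So if the data really must not change, the elements themselves should be immutable too, for example by nesting tuples instead of lists.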

Concatenation and Multiplication
>>> us_east_ips = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>>
>>> us_west_ips = ("172.18.11.22", "172.99.22.3")
>>>
>>> type(us_east_ips)
<class 'tuple'>
>>> type(us_west_ips)
<class 'tuple'>
>>>
>>> all_ips = us_east_ips + us_west_ips
>>> type(all_ips)
<class 'tuple'>
>>> all_ips
('172.19.56.90', '172.37.57.32', '172.54.21.23', '172.18.11.22', '172.99.22.3')
>>>

Concatenating two tuples returns a third tuple which contains all the contents of both tuples, copied in order.

>>> duplicate_values = ("duplicate",)
>>> type(duplicate_values)
<class 'tuple'>
>>> duplicate_values * 5
('duplicate', 'duplicate', 'duplicate', 'duplicate', 'duplicate')
>>>
Count elements in a tuple
>>> arbitrary_values = ('thetaranights.com', 'python', 'python', 'tutorials')
>>>
>>> arbitrary_values.count('python')
2
>>>
>>>
Find index of an element
>>> arbitrary_values = ('thetaranights.com', 'python', 'python', 'tutorials')
>>>
>>> arbitrary_values.index('tutorials')
3
>>>
Comparison of tuples

Comparison operators work for tuples too. The evaluation starts by comparing the first elements of the two tuples and proceeds to the following elements until the result is conclusive.

>>> tup = (7, 14, 20)
>>>
>>> tup2 = (7, 14, 21)
>>>
>>> tup < tup2
True
>>>
>>> tup > tup2
False
>>>

For tup < tup2

It compares the first elements of the two tuples: 7 and 7 are equal, which is inconclusive. It then proceeds to compare 14 and 14, which is still inconclusive. Finally 20 < 21 is true, hence the result is True.
The case for tup > tup2 is evaluated the same way.
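
The same element-by-element rule is what sorted() and max() rely on when they are given tuples. Below is a small sketch with made-up version numbers stored as tuples:

```python
# When one tuple is a prefix of the other, the shorter one is the smaller.
prefix_is_smaller = (7, 14) < (7, 14, 0)
print(prefix_is_smaller)  # True

# Hypothetical version numbers stored as tuples.
releases = [(3, 6, 1), (2, 7, 15), (3, 5)]

ordered = sorted(releases)
newest = max(releases)

print(ordered)  # [(2, 7, 15), (3, 5), (3, 6, 1)]
print(newest)   # (3, 6, 1)
```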

Python Subprocess


Through this post, we will discuss and see via examples the purpose of subprocess, how to spawn processes, how to connect to their input/output and error pipes, etc.

subprocess

As the name suggests, subprocess is used to spawn sub-processes. It also lets us read the output of a process and its errors, if any, and send data to the input of the process, which means we can communicate with the processes we start. The subprocess module was introduced to replace the need for os.system and os.spawn*.

subprocess.run()

subprocess.run() was added in Python 3.5 and is available in all later versions.
It is the recommended way to invoke a subprocess for all use cases that it can handle. We can use Popen for more advanced use cases.

>>> import subprocess
>>> ls_process = subprocess.run(['ls', '-l'])
>>> ls_process
CompletedProcess(args=['ls', '-l'], returncode=0)
>>> ls_process.args
['ls', '-l']
>>> ls_process.returncode
0
>>> ls_process.stdout
>>> ls_process.stderr
>>>

subprocess.run() returns a CompletedProcess instance which has the attributes args, returncode, stdout and stderr. The stdout and stderr attributes of the CompletedProcess instance hold None unless the subprocess.run() call explicitly asks to capture them.

Capture standard output and error
>>> ls_process = subprocess.run(['ls'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> ls_process
CompletedProcess(args=['ls'], returncode=0, stdout=b'networkenv\ntest.py\n', stderr=b'')
>>> ls_process.stdout
b'networkenv\ntest.py\n'
>>>
>>> ls_process.stderr
b''
>>>

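There is an optional argument “input”, allowing you to pass data to the subprocess’s stdin. The data must be bytes unless text mode is enabled (for example through universal_newlines=True or encoding), in which case it must be a string. If you use this argument you may not also pass the “stdin” argument, as it is used internally.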
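
As a rough sketch of the input argument, the child process below is a one-line Python program started through sys.executable. That one-liner is an assumption made so the example runs on its own; it is not a script from this post.

```python
import subprocess
import sys

# A tiny child program (hypothetical) that reads a name from stdin and greets it.
child_code = "name = input('Enter your name: '); print('Hello, ' + name)"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    input="Bhishan\n",            # sent to the child's stdin
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,      # text mode, so input and output are str
)

print(completed.returncode)
print(completed.stdout)
```

Because text mode is enabled, input is given as a str; without it the same call would need bytes such as b"Bhishan\n".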

Another useful argument can be timeout
>>> subprocess.run(['ls'], timeout=0.000000000000000000000000000000000000001)

subprocess.TimeoutExpired: Command '['ls']' timed out after 1e-39 seconds

When the timeout argument is passed to run() and the process fails to complete in the given time, the child process is killed and a subprocess.TimeoutExpired error is raised.
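
In practice you would usually catch the error rather than let it propagate. A minimal sketch, assuming a child that simply sleeps for five seconds (again a made-up one-liner run through sys.executable):

```python
import subprocess
import sys

# Hypothetical slow child: it just sleeps longer than we are willing to wait.
slow_command = [sys.executable, "-c", "import time; time.sleep(5)"]

try:
    subprocess.run(slow_command, timeout=0.5)
    timed_out = False
except subprocess.TimeoutExpired as error:
    # run() has already killed the child by the time the exception reaches us.
    timed_out = True
    print("Gave up after", error.timeout, "seconds")
```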

check is another argument

When check is passed as True, run() raises a CalledProcessError exception if the exit code of the subprocess is non-zero. A zero exit code signifies success. The CalledProcessError object holds the return code in its returncode attribute. You can catch this error and, based on the returncode, perform operations as per your needs.

>>> ls_process = subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1.
>>>
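Catching the exception could look like the sketch below. The child is an inline Python one-liner that exits with status 3; that status is an arbitrary value picked for the example.

```python
import subprocess
import sys

try:
    # The child exits with status 3, so check=True turns that into an exception.
    subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        check=True,
    )
    exit_status = 0
except subprocess.CalledProcessError as exc:
    exit_status = exc.returncode
    print("Command failed with exit status {0}".format(exit_status))
```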
Popen Constructor

The underlying process creation and management in the module is handled by the Popen class. It offers a lot of flexibility so that developers can handle the less common cases not covered by the convenience function subprocess.run(). Popen() executes a child program in a new process. On POSIX, the class uses os.execvp()-like behavior to execute the child program. On Windows, the class uses the Windows CreateProcess() function. The examples below only demonstrate the usage of Popen; they do not imply that subprocess.run() cannot achieve the same operations.

Example Usage of Popen
>>> import subprocess
>>>
>>> wc_process = subprocess.Popen(['wc', '-l', 'test.py'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>>
>>> stdout, stderr = wc_process.communicate()
>>> stdout
b'35 test.py\n'
>>> stderr
b''
>>>

In the above example we run the Unix tool wc with the -l option, which counts the lines in test.py, and pass subprocess.PIPE so that its standard output and standard error can be read through communicate().

Direct Stdout and Stderr to file
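>>> with open('out.txt', 'w') as out_file, open('error.txt', 'w') as err_file:
...     wc_process = subprocess.Popen(['wc', '-l', 'test.py'], stdout=out_file, stderr=err_file)
...     exit_code = wc_process.wait()  # wait for wc to finish before the files are closed
...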
>>>
>>> with open('out.txt', 'r') as outbuff:
...     print(outbuff.read())
...
35 test.py

>>>
Passing input to stdin
>>> script_process = subprocess.Popen(['python3', 'expect_input.py'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='UTF-8')
>>>
>>> script_process.communicate("Bhishan")
('Enter your nameBhishan\n', '')
>>>

Here expect_input.py is a helper script that prompts for input and prints it back. You can pass input to the child's stdin via the communicate() method.

PIPE output of one subprocess as input to another
>>> ls_process = subprocess.Popen('ls', stdout=subprocess.PIPE)
>>> grep_process = subprocess.Popen(['grep', '.py'], stdin=ls_process.stdout, stdout=subprocess.PIPE)
>>> stdout, stderr = grep_process.communicate()
>>> stdout
b'expect_input.py\ntest.py\n'
>>>

In the above code snippet we passed ls_process.stdout as the stdin of the grep process. The ls_process lists the files in the current working directory, and the latter subprocess, grep_process, takes the output of ls_process as its input and lists the names that contain py.
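The same pipeline pattern can be sketched in a platform-independent way. In the example below both children are inline Python one-liners invented for illustration, standing in for ls and grep, and the file names are made up. Closing the producer's stdout in the parent after handing it to the consumer follows the pattern shown in the subprocess documentation, so the producer can be notified if the consumer exits early. The text argument assumes Python 3.7 or later.

```python
import subprocess
import sys

# Stand-in for `ls`: print three made-up file names.
producer_code = "print('expect_input.py'); print('notes.txt'); print('test.py')"
# Stand-in for `grep .py`: keep only the lines that contain '.py'.
consumer_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if '.py' in line:\n"
    "        print(line, end='')\n"
)

producer = subprocess.Popen(
    [sys.executable, "-c", producer_code],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c", consumer_code],
    stdin=producer.stdout,   # read from the producer's output
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()      # the parent no longer needs its copy of the pipe
stdout, _ = consumer.communicate()
producer.wait()

print(stdout)
```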

Other parameters of Popen
Each entry below gives the argument, its default value, and a description.

args (required, no default)
A string or a sequence of program arguments, e.g. "ls" or ["ls", "-l"].

bufsize (default: -1)
bufsize will be supplied as the corresponding argument to the open() function when creating the stdin/stdout/stderr pipe file objects:

  • 0 means unbuffered (read and write are one system call and can return short)

  • 1 means line buffered (only usable if universal_newlines=True, i.e. in text mode)

  • any other positive value means use a buffer of approximately that size

  • negative bufsize (the default) means the system default of io.DEFAULT_BUFFER_SIZE will be used.

executable (default: None)
The executable argument specifies a replacement program to execute. It is very seldom needed. When shell=False, executable replaces the program to execute specified by args. However, the original args is still passed to the program. Most programs treat the program specified by args as the command name, which can then be different from the program actually executed. On POSIX, the args name becomes the display name for the executable in utilities such as ps. If shell=True, on POSIX the executable argument specifies a replacement shell for the default /bin/sh.

stdin (default: None)
The executed program’s standard input file handle.

stdout (default: None)
The executed program’s standard output file handle.

stderr (default: None)
The executed program’s standard error file handle.

preexec_fn (default: None)
A callable object that is called in the child process just before the child program is executed (POSIX only).

close_fds (default: True; before Python 3.7 the default varied by platform)
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed (POSIX only). On Windows, if close_fds is true then no handles will be inherited by the child process. Before Python 3.7 the default was always true on POSIX, while on Windows it was true when stdin/stdout/stderr were None and false otherwise, and on Windows you could not set close_fds to true and also redirect the standard handles by setting stdin, stdout or stderr.

shell (default: False)
The shell argument specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.

cwd (default: None)
Sets the current directory before the child is executed.

env (default: None)
Defines the environment variables for the new process.

universal_newlines (default: False)
If true, the file objects stdin, stdout and stderr are opened in text mode with universal line endings.

encoding (default: None)
Text mode encoding for the file objects stdin, stdout and stderr. By default stdin, stdout and stderr use bytes.
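As an illustration of two of these parameters, the sketch below starts a child with a different working directory (cwd) and an extra environment variable (env). The variable name GREETING and the inline child program are made up for this example.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: report its working directory and one variable.
child_code = "import os; print(os.getcwd()); print(os.environ['GREETING'])"

# Copy the current environment and add one made-up variable to it.
child_env = dict(os.environ, GREETING="hello")

with tempfile.TemporaryDirectory() as workdir:
    process = subprocess.Popen(
        [sys.executable, "-c", child_code],
        cwd=workdir,              # the child starts in this directory
        env=child_env,            # the child sees this environment
        stdout=subprocess.PIPE,
        encoding="utf-8",
    )
    stdout, _ = process.communicate()
    reported_cwd, greeting = stdout.splitlines()
    # Compare resolved paths, since the temporary directory may be a symlink.
    same_dir = os.path.realpath(reported_cwd) == os.path.realpath(workdir)

print(same_dir, greeting)
```

Passing env replaces the whole environment of the child, which is why the example starts from a copy of os.environ instead of a dictionary with a single key.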

MongoDB and Python

Python is used in many applications, mainly because of its flexibility and the availability of libraries for almost any scenario. As a result, it is often coupled with database systems, including MongoDB, a NoSQL database. This post shows through examples how Python can be used to interact with MongoDB. We will use pymongo, the library built by the MongoDB developers for this purpose.

MongoDB

MongoDB is a free and open-source cross-platform document-oriented NoSQL database. It stores JSON-like documents, which keeps the data flexible and means no fixed schema is required.

Installation of MongoDB

Follow the official documentation to install MongoDB.

pymongo

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. There are various other libraries for the ease of interaction that provide higher level abstractions and also perform document validation, etc. One such library is MongoEngine. MongoEngine is an object document mapper (ODM), which is roughly equivalent to a SQL-based object relational mapper (ORM). We will stick to the official library for MongoDB in this blog.

Installation of pymongo

pip install pymongo

Establishing a connection
>>> from pymongo import MongoClient
>>> client = MongoClient() # defaults to host="localhost", port=27017
A helper function to create a connection
from pymongo import MongoClient

def mongo_connection(host="localhost", port=27017, username=None, password=None):
    if username and password:
        mongo_uri = 'mongodb://{user}:{password}@{host}:{port}'.format(user=username, password=password, host=host, port=port)
        return MongoClient(mongo_uri)
    return MongoClient('mongodb://{host}:{port}'.format(host=host, port=port))

As seen in the above snippet, you can also construct a URI and pass it to the MongoClient class to get the connection.
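One caveat when building the URI by hand is that characters such as "@", ":" or "/" in the username or password need to be percent-encoded, otherwise the URI is ambiguous. The sketch below uses only the standard library; build_mongo_uri is a hypothetical helper name and the credentials are made up. The resulting string is what would be passed to MongoClient.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Return a MongoDB connection URI with percent-encoded credentials."""
    if username and password:
        return "mongodb://{user}:{password}@{host}:{port}".format(
            user=quote_plus(username),
            password=quote_plus(password),
            host=host,
            port=port,
        )
    return "mongodb://{host}:{port}".format(host=host, port=port)


# Made-up credentials containing characters that must be escaped.
uri = build_mongo_uri(username="app_user", password="p@ss/word")
print(uri)
```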

Selecting Database

There are two ways to select a database from the MongoClient instance. You can access it as an attribute of the instance or use dictionary-style access on the instance. The latter is generally useful when you receive the database name as a parameter to some function.

db = client.example_db
OR

db = client['example_db']

Note: If the specified database doesn’t exist, no error is raised; the database is created as soon as you insert a document into one of its collections. In MongoDB, a collection is a grouping that contains documents. Collections can be thought of as tables, and documents as rows, in an SQL database.

Insert a single document to MongoDB
books_collection = db.books
data = {
    'title': 'Pymongo Introduction',
    'excerpt': 'pymongo is an official library for accessing and interacting with MongoDB.',
    'author': 'Mongo'
}
result = books_collection.insert_one(data)
print('One post: {0}'.format(result.inserted_id))

To insert a document into a collection, we select the collection in the same way as we selected the database. In the above code snippet we pass a Python dictionary, which maps naturally to a JSON-like document, to the insert_one() method. The ObjectId of the new document is available as the inserted_id attribute of the result. An ObjectId is the unique identifier of a document in a collection; when the document has no _id field, one is generated automatically.

Insert multiple documents to MongoDB
new_books = [
    {
        'title': 'Pymongo Introduction',
        'excerpt': 'pymongo is an official library for accessing and interacting with MongoDB.',
        'author': 'Mongo'
    }, 
    {
        'title': 'TheTaraNights',
        'excerpt': 'good coders automate',
        'author': 'Bhishan'
    },
    {
        'title': 'Automate the Boring Stuff with Python',
        'excerpt': 'Anything and everything that is done manually can be automated',
        'author': 'Al Sweigart'
    },
]
result = books_collection.insert_many(new_books)
print('Multiple posts: {0}'.format(result.inserted_ids))

To insert multiple documents into a collection, we can use the insert_many() method, which accepts a list of documents. Similarly, the list of Object Identifiers for the inserted documents is available via the inserted_ids attribute.

Retrieving a document
bhishan_book = books_collection.find_one({'author': 'Bhishan'})
print(bhishan_book)

The above code snippet finds the first document that matches the criteria and returns it. If we want all the books by author ‘Bhishan’, we would use find() method instead of find_one().

bhishan_book = books_collection.find({'author': 'Bhishan'})
print(bhishan_book)

<pymongo.cursor.Cursor object at 0x107822f78>

The find() method returns a cursor object which can be iterated like a normal iterable in Python.

for each_book in bhishan_book:
    print(each_book)

Python Variables

This post explains how variables are assigned in Python and the mechanism behind assignment, discusses equal status and how almost everything in Python is an object, and shows how to work with the objects that variable names refer to.

Variable Assignment

In Python, you don’t really assign a value to a variable. The variable stores a reference to an object, and the object holds the value. Unlike in many other programming languages, you don’t need to declare a variable before assigning to it. Assignment is done with the “=” operator.

>>> i = 7
>>> print(i)
7
>>>

Once the variable is assigned to a value, it can be used in any other expressions such that the variable will be substituted with the value assigned.

Changing value
>>> temp_value = 7
>>> print(temp_value)
7
>>> temp_value = 4
>>> print(temp_value)
4
>>> temp_value = "thetaranights.com"
>>> print(temp_value)
thetaranights.com
>>>
Chained Assignment

Python also supports what’s referred to as chained assignment. This allows assigning a value to multiple variables at once.

>>> a = b = c = d = 7
>>> print(a)
7
>>> print(b)
7
>>> print(c)
7
>>> print(d)
7
>>> # Alternatively print all values at once. The above construct is used to make it simple to understand.
>>> print(a, b, c, d)
7 7 7 7
>>>
Type

Unlike in statically typed languages, a variable in Python can be reassigned to values of different types. In a statically typed language, a variable must always hold a value of the type it was declared with.

>>> a = 7
>>> print(a)
7
>>> a = 'thetaranights.com'
>>> print(a)
thetaranights.com
>>>
How does assignment work in Python

One of the primary goals of Python was to give all objects equal status, i.e. anything from integers, strings, lists, dictionaries, functions, classes, modules and methods can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth. Python is a highly object-oriented language: almost everything in Python is an object.

>>> a = 7
>>> type(a)
<class 'int'>
>>> id(a)
11033280
>>> print(a)
7
>>>

Initially, Python creates an integer object, and creates a reference to the object from the variable name. Although now we can use the value directly from the variable name, it is still the object that holds the value and the variable holds the reference.

Same Object Reference
>>> a = "thetaranights.com"
>>> b = a
>>>
>>> id(a)
140521889892800
>>> id(b)
140521889892800
>>>

From the code snippet above, Python does not create a new object for b. What it does create is a reference to the same object that a points to.
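Sharing a reference becomes visible when the object is mutable. The short sketch below uses a list (the variable names are made up for the example): a change made through one name is seen through the other, while re-assigning a name only changes what that name points to.

```python
first = ["thetaranights.com"]
second = first                       # both names refer to the same list object

second.append("github.com/bhishan")  # mutate the object through one name
same_object = first is second
print(first)                         # the change is visible through the other name
print(same_object)

second = ["facebook.com"]            # re-assignment: second now refers to a new list
still_same = first is second
print(first)                         # first is unaffected by the re-assignment
print(still_same)
```

The is operator compares object identity, which is the same comparison as checking whether id() returns equal values for both names.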

Garbage Collection
>>> a = "facebook.com"
>>> b = a
>>> a = "github.com/bhishan"
>>> b = "thetaranights.com"
>>>

Now we have re-assigned a to point to the object “github.com/bhishan”, and b points to the object “thetaranights.com”. Neither a nor b, nor any other variable, still refers to the earlier value “facebook.com”, so that object is orphaned. An object’s life begins when the first reference to it is created. During its lifetime, additional references to it may be created, and references to it may be deleted. The object stays alive as long as at least one reference to it remains. When the number of references drops to zero, the object is no longer accessible and its lifetime is over. Python will eventually notice that it is inaccessible and reclaim the allocated memory so it can be used for something else. This process is called garbage collection.

In technical terms, the id of an object is released when its reference count drops to zero, and that id can then be reused. Until then, it is guaranteed that no other object alive at the same time has the same id. The id is returned by the built-in id(), which gives the integer identifier of an object.
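The end of an object's lifetime can be observed with a weak reference, which does not keep the object alive. The sketch below assumes CPython, where an object is reclaimed as soon as its reference count reaches zero; other implementations may reclaim it later. The Page class is a made-up example, used because plain strings cannot be weakly referenced.

```python
import weakref


class Page:
    """A tiny made-up class; its instances support weak references."""

    def __init__(self, url):
        self.url = url


page = Page("facebook.com")
watcher = weakref.ref(page)          # a weak reference does not keep the object alive

alive_before = watcher() is not None
page = Page("thetaranights.com")     # the first Page loses its only strong reference
alive_after = watcher() is not None  # on CPython the first Page is already reclaimed

print(alive_before, alive_after)
```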

No two objects can have the same id
>>> a = "thetaranights.com"
>>> b = a
>>> id(a)
140521889892800
>>> id(b)
140521889892800
>>> b = "facebook.com"
>>> id(b)
140521889823280
>>>

In the above code snippet, we see that the variables a and b initially point to the same object, which is verified by the built-in id() returning the same integer for both a and b. However, when we re-assign b to point to another object, its id changes. Therefore, no two objects can have the same id while their lifetimes overlap.

Python caches small integers

At interpreter startup, CPython creates objects for the integers in the range [-5, 256] (inclusive). Every time a variable is assigned a value in this range, it refers to the same cached object, so the id() built-in returns the same result for those variables.

>>> a = -5
>>> b = -5
>>> id(a)
11032896
>>> id(b)
11032896
>>>

When we do the same at the interactive prompt for integers outside this range, a new object is created for each assignment, so the results from the id() built-in differ.

>>> a = 257
>>> b = 257
>>> id(a)
139653174388272
>>> id(b)
139653174388144
>>>
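The caching can also be checked with the is operator. The sketch below assumes CPython, since the small integer cache is an implementation detail, and it builds the integers at run time with int() so that the result does not depend on how literals in the same piece of code are compiled.

```python
# Build the integers at run time instead of writing literals.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a is small_b)  # typically True on CPython: 256 is inside the cached range
print(big_a is big_b)      # typically False on CPython: 257 is outside the cached range
print(big_a == big_b)      # the values are equal either way
```

This is also why integers should be compared with == rather than is: identity depends on the implementation, equality does not.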
Naming Convention

A variable name can be of any length and can consist of uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), the underscore character (_) and, from Python 3 onwards, Unicode letters. A variable name can be any combination of these, except that it can’t start with a digit.

Valid                      Invalid
address                    $address
address1                   1address
address_1                  1_address
_                          $
Δ (Greek letter delta)     %
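These rules can be checked programmatically with the built-in str.isidentifier() method. The sketch below also rejects reserved keywords such as class, which are valid identifiers syntactically but cannot be used as variable names; is_valid_variable_name is a helper name made up for this example.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if name can be used as a variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = [
    "address", "address1", "address_1", "_", "Δ",   # valid names from the table
    "$address", "1address", "1_address", "$", "%",  # invalid names from the table
    "class",                                        # a reserved keyword
]
results = {name: is_valid_variable_name(name) for name in candidates}

for name, valid in results.items():
    print(name, valid)
```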

Normally, the purpose of a variable should be understandable at a glance, so a variable name should be as descriptive as possible. More often than not we need multi-word variable names, which should also be readable. The following are some commonly used constructs for multi-word names:

Construct Name   Description                                                           Example
Camel Case       Second and subsequent words are capitalized.                          numberOfStudents
Pascal Case      Identical to Camel Case except the first word is also capitalized.    NumberOfStudents
Snake Case       Words are separated by underscores.                                   number_of_students

The style guide for Python code, known as PEP 8, recommends Snake Case for function and variable names and Pascal Case for class names.