Posts Tagged: "python"

Web Scraping – BeautifulSoup Python

- - Python, Tutorials

Collecting data from public sources is often useful to a business or an individual, so “web scraping” is nothing new. The data usually sits inside HTML tags and attributes, and Python is a common choice for extracting it. The intention of this post is to host example code snippets that people can adapt to build scrapers for their own needs using BeautifulSoup and the urllib module in Python. I will use GitHub's trending page https://github.com/trending for the examples throughout this post, because it lends itself well to demonstrating various BeautifulSoup methods.

Installation:

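pip install beautifulsoup4 lxml

The examples below pass "lxml" as the parser, and lxml is a separate package from BeautifulSoup.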

Get html of a page:
>>> import urllib.request
>>> resp = urllib.request.urlopen("https://github.com/trending")
>>> resp.getcode()
200
>>> resp.read() # the html
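
Network calls can fail, and some sites respond differently to requests that carry no browser-like User-Agent header, so a slightly more defensive fetch can be useful. The sketch below is one possible approach and not part of the original example; build_request, fetch_html and the example-scraper/0.1 agent string are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="example-scraper/0.1"):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Return the decoded page body, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.URLError as exc:  # HTTPError is a subclass of URLError
        print("request failed:", exc)
        return None


# Building the request does not touch the network; only fetch_html() does.
request = build_request("https://github.com/trending")
print(request.full_url)
print(request.get_header("User-agent"))  # urllib stores header names capitalized this way
```

Calling fetch_html("https://github.com/trending") would then return the page as a string that can be handed to BeautifulSoup, or None on failure.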
Using BeautifulSoup to get title from a page
>>> import urllib.request
>>> import bs4
>>> github_trending = urllib.request.urlopen("https://github.com/trending")
>>> trending_soup = bs4.BeautifulSoup(github_trending.read(), "lxml")
>>> trending_soup.title
<title>Trending  repositories on GitHub today · GitHub</title>
>>> trending_soup.title.string
'Trending  repositories on GitHub today · GitHub'
>>>
Find single element by tag name, find multiple elements by tag name
>>> ordered_list = trending_soup.find('ol') #single element
>>>
>>> type(ordered_list)
<class 'bs4.element.Tag'>
>>>
>>> all_li = ordered_list.find_all('li') # multiple elements
>>>
>>> type(all_li)
<class 'bs4.element.ResultSet'>
>>>
>>> trending_repositories = [each_list.find('h3').text for each_list in all_li]
>>> for each_repository in trending_repositories:
...     print(each_repository.strip())
...
klauscfhq / taskbook
robinhood / faust
Avik-Jain / 100-Days-Of-ML-Code
jxnblk / mdx-deck
faressoft / terminalizer
trekhleb / javascript-algorithms
apexcharts / apexcharts.js
grain-lang / grain
thedaviddias / Front-End-Performance-Checklist
istio / istio
CyC2018 / Interview-Notebook
fivethirtyeight / russian-troll-tweets
boyerjohn / rapidstring
donnemartin / system-design-primer
awslabs / aws-cdk
QUANTAXIS / QUANTAXIS
crossoverJie / Java-Interview
GoogleChromeLabs / ndb
dylanbeattie / rockstar
vuejs / vue
sbussard / canvas-sketch
Microsoft / vscode
flutter / flutter
tensorflow / tensorflow
Snailclimb / Java-Guide
>>>
Getting Attributes of an element
>>> for each_list in all_li:
...     anchor_element = each_list.find('a')
...     print("https://github.com" + anchor_element['href'])
...
https://github.com/klauscfhq/taskbook
https://github.com/robinhood/faust
https://github.com/Avik-Jain/100-Days-Of-ML-Code
https://github.com/jxnblk/mdx-deck
https://github.com/faressoft/terminalizer
https://github.com/trekhleb/javascript-algorithms
https://github.com/apexcharts/apexcharts.js
https://github.com/grain-lang/grain
https://github.com/thedaviddias/Front-End-Performance-Checklist
https://github.com/istio/istio
https://github.com/CyC2018/Interview-Notebook
https://github.com/fivethirtyeight/russian-troll-tweets
https://github.com/boyerjohn/rapidstring
https://github.com/donnemartin/system-design-primer
https://github.com/awslabs/aws-cdk
https://github.com/QUANTAXIS/QUANTAXIS
https://github.com/crossoverJie/Java-Interview
https://github.com/GoogleChromeLabs/ndb
https://github.com/dylanbeattie/rockstar
https://github.com/vuejs/vue
https://github.com/sbussard/canvas-sketch
https://github.com/Microsoft/vscode
https://github.com/flutter/flutter
https://github.com/tensorflow/tensorflow
https://github.com/Snailclimb/Java-Guide
>>>
Using class name or other attributes to get element
>>> for each_list in all_li:
...     total_stars_today = each_list.find(attrs={'class':'float-sm-right'}).text
...     print(total_stars_today.strip())
...
1,063 stars today
846 stars today
596 stars today
484 stars today
459 stars today
429 stars today
443 stars today
366 stars today
330 stars today
282 stars today
182 stars today
190 stars today
200 stars today
190 stars today
166 stars today
164 stars today
144 stars today
158 stars today
157 stars today
144 stars today
144 stars today
142 stars today
132 stars today
101 stars today
108 stars today
>>>
Navigate children of an element
>>> for each_child in ordered_list.children:
...     if isinstance(each_child, bs4.element.Tag):  # skip the '\n' strings between the <li> tags
...         print(each_child.find('h3').text.strip())
...
klauscfhq / taskbook
robinhood / faust
Avik-Jain / 100-Days-Of-ML-Code
jxnblk / mdx-deck
faressoft / terminalizer
trekhleb / javascript-algorithms
apexcharts / apexcharts.js
grain-lang / grain
thedaviddias / Front-End-Performance-Checklist
istio / istio
CyC2018 / Interview-Notebook
fivethirtyeight / russian-troll-tweets
boyerjohn / rapidstring
donnemartin / system-design-primer
awslabs / aws-cdk
QUANTAXIS / QUANTAXIS
crossoverJie / Java-Interview
GoogleChromeLabs / ndb
dylanbeattie / rockstar
vuejs / vue
sbussard / canvas-sketch
Microsoft / vscode
flutter / flutter
tensorflow / tensorflow
Snailclimb / Java-Guide
>>>

The .children attribute only returns the immediate children of the parent element. If you'd like to get all of the elements under a certain element, use .descendants instead.

Navigate descendants of an element
>>> for each_descendant in ordered_list.descendants:
...     pass  # perform operations
Navigating previous and next siblings of elements
>>> all_li = ordered_list.find_all('li')
>>> fifth_li = all_li[4]
>>> # each li element is separated by '\n' and hence to navigate to the fourth li, we should navigate previous sibling twice
...
>>>
>>> fourth_li = fifth_li.previous_sibling.previous_sibling
>>> fourth_li.find('h3').text.strip()
'jxnblk / mdx-deck'
>>>
>>> # similarly for navigating to the sixth li from fifth li, we would use next_sibling
...
>>> sixth_li = fifth_li.next_sibling.next_sibling
>>> sixth_li.find('h3').text.strip()
'trekhleb / javascript-algorithms'
>>>
Navigate to parent of an element
>>> all_li = ordered_list.find_all('li')
>>> first_li = all_li[0]
>>> li_parent = first_li.parent
>>> # the li_parent is the ordered list <ol>
...
>>>
Putting it all together (GitHub Trending Scraper)
>>> import urllib.request
>>> import bs4
>>>
>>> github_trending = urllib.request.urlopen("https://github.com/trending")
>>> trending_soup = bs4.BeautifulSoup(github_trending.read(), "lxml")
>>> ordered_list = trending_soup.find('ol')
>>> for each_list in ordered_list.find_all('li'):
...     repository_name = each_list.find('h3').text.strip()
...     repository_url = "https://github.com" + each_list.find('a')['href']
...     total_stars_today = each_list.find(attrs={'class':'float-sm-right'}).text
...     print(repository_name, repository_url, total_stars_today)

klauscfhq / taskbook                             https://github.com/klauscfhq/taskbook                             1,404 stars today
robinhood / faust                                https://github.com/robinhood/faust                                960 stars today
Avik-Jain / 100-Days-Of-ML-Code 	         https://github.com/Avik-Jain/100-Days-Of-ML-Code                  566 stars today
trekhleb / javascript-algorithms 	         https://github.com/trekhleb/javascript-algorithms                 431 stars today
jxnblk / mdx-deck 			         https://github.com/jxnblk/mdx-deck 	                           416 stars today
apexcharts / apexcharts.js 		         https://github.com/apexcharts/apexcharts.js 	                   411 stars today
faressoft / terminalizer 		         https://github.com/faressoft/terminalizer 	                   406 stars today
istio / istio 			                 https://github.com/istio/istio 	                           309 stars today
thedaviddias / Front-End-Performance-Checklist 	 https://github.com/thedaviddias/Front-End-Performance-Checklist   315 stars today
grain-lang / grain 			         https://github.com/grain-lang/grain 	                           301 stars today
boyerjohn / rapidstring 			 https://github.com/boyerjohn/rapidstring 	                   232 stars today
CyC2018 / Interview-Notebook 			 https://github.com/CyC2018/Interview-Notebook 	                   186 stars today
donnemartin / system-design-primer 		 https://github.com/donnemartin/system-design-primer 	           189 stars today
awslabs / aws-cdk 			         https://github.com/awslabs/aws-cdk 	                           186 stars today
fivethirtyeight / russian-troll-tweets 		 https://github.com/fivethirtyeight/russian-troll-tweets 	   159 stars today
GoogleChromeLabs / ndb 			         https://github.com/GoogleChromeLabs/ndb 	                   172 stars today
crossoverJie / Java-Interview 			 https://github.com/crossoverJie/Java-Interview 	           148 stars today
vuejs / vue 			                 https://github.com/vuejs/vue 	                                   137 stars today
Microsoft / vscode 			         https://github.com/Microsoft/vscode 	                           137 stars today
flutter / flutter 			         https://github.com/flutter/flutter 	                           132 stars today
QUANTAXIS / QUANTAXIS 			         https://github.com/QUANTAXIS/QUANTAXIS 	                   132 stars today
dylanbeattie / rockstar 			 https://github.com/dylanbeattie/rockstar 	                   130 stars today
tensorflow / tensorflow 			 https://github.com/tensorflow/tensorflow 	                   106 stars today
Snailclimb / Java-Guide 			 https://github.com/Snailclimb/Java-Guide 	                   111 stars today
WeTransfer / WeScan 			         https://github.com/WeTransfer/WeScan 	                           118 stars today


Python Tuples

- - Python, Tutorials

This is an introductory post about tuples in Python. Through examples we will see what tuples are, their immutability, their use cases and the various operations on them. Rather than a blog post, it is a set of examples on tuples in Python.

Tuples

A tuple is a sequence of objects in Python. Unlike lists, tuples are immutable, which means the contents of a tuple can't be changed once assigned. We will see this immutability through an example shortly.

Defining a Tuple

Tuples are generally created by enclosing a comma-separated sequence of objects in parentheses, “(” and “)”.

>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>> type(ip_addresses)
<class 'tuple'>
>>>
Defining an empty tuple vs single element tuple vs multi-element tuple
>>> # empty tuple
...
>>> ip_addresses = ()
>>> type(ip_addresses)
<class 'tuple'>
>>>
>>> # multi-element tuple
...
>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>> type(ip_addresses)
<class 'tuple'>
>>>
>>> # single element tuple
...
>>> ip_addresses = ("172.19.56.90") # incorrect
>>> type(ip_addresses)
<class 'str'>
>>>
>>> ip_addresses = ("172.19.56.90",)
>>> type(ip_addresses)
<class 'tuple'>
>>>

From the code snippet above, defining an empty tuple or a multi-element tuple is straightforward. What is less obvious is that ip_addresses = ("172.19.56.90") evaluates to a str instead of a tuple, because the parentheses only group the expression. Although a single-element tuple is rarely needed, there has to be a way to define one: it must end with a trailing comma “,” for the interpreter to evaluate it as a tuple.

Parentheses are also optional
>>> ip_addresses = "172.19.56.90", "172.37.57.32", "172.54.21.23"
>>> type(ip_addresses)
<class 'tuple'>
>>>

The parentheses are optional when defining a tuple; it is the commas that create it. This is possible due to a mechanism called packing, which is one of the many useful features of Python.

Packing and Unpacking

Packing, as the name suggests, creates a single object by packing multiple other objects together. Unpacking is the reverse: a tuple is unpacked and its elements are assigned to individual variables.

>>> # packing example
...
>>> ip_addresses = "172.19.56.90", "172.37.57.32", "172.54.21.23"
>>>

>>> # unpacking example
...
>>> ip, ip2, ip3 = ip_addresses
>>> ip
'172.19.56.90'
>>> ip2
'172.37.57.32'
>>> ip3
'172.54.21.23'
>>>
The number of variables on the left should equal the number of elements to be unpacked
>>> ip, ip2 = ip_addresses
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
>>>
>>>
>>> ip, ip2, ip3, ip4 = ip_addresses
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 4, got 3)
>>>
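
When the number of elements is not known in advance, a starred name on the left can absorb the extras instead of raising ValueError. The following is a small sketch of that, along with the common swap idiom that relies on packing and unpacking; the variable names are illustrative only.

```python
ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")

# A starred name collects whatever is left over, as a list.
first, *others = ip_addresses
print(first)   # 172.19.56.90
print(others)  # ['172.37.57.32', '172.54.21.23']

# The starred name can sit anywhere; here only the last element is kept.
*_, last = ip_addresses
print(last)    # 172.54.21.23

# Packing and unpacking together allow swapping without a temporary variable.
primary, backup = "172.19.56.90", "172.37.57.32"
primary, backup = backup, primary
print(primary, backup)  # 172.37.57.32 172.19.56.90
```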
Accessing elements from a tuple through index
>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>>
>>> ip_addresses[1]
'172.37.57.32'
>>>
>>> ip_addresses[-1]
'172.54.21.23'
>>>
Looping over elements of a tuple
>>> ip_addresses = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>> for each_ip in ip_addresses:
...     print(each_ip)
...
172.19.56.90
172.37.57.32
172.54.21.23
>>>
>>>
Slicing
>>> ip_addresses[1:]
('172.37.57.32', '172.54.21.23')
>>>

>>> ip_addresses[:-1]
('172.19.56.90', '172.37.57.32')
>>>
Tuples are immutable
>>> ip_addresses[0] = "Accidental edit"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>>

Tuples are immutable, which guards against accidental changes to the data.
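
Two consequences of immutability are worth a quick sketch: the protection only covers the tuple's own slots, not mutable objects stored inside it, and tuples made of immutable items can be used as dictionary keys. The server record and port mapping below are made-up examples.

```python
server = ("web-01", ["172.19.56.90"])  # hypothetical (name, addresses) record

# The tuple itself cannot be changed...
try:
    server[0] = "web-02"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# ...but a mutable object stored inside it still can.
server[1].append("172.37.57.32")
print(server)  # ('web-01', ['172.19.56.90', '172.37.57.32'])

# Tuples of immutable items are hashable, so they work as dictionary keys.
ports = {("172.19.56.90", 80): "http", ("172.19.56.90", 443): "https"}
print(ports[("172.19.56.90", 443)])  # https
```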

Concatenation and Multiplication
>>> us_east_ips = ("172.19.56.90", "172.37.57.32", "172.54.21.23")
>>>
>>> us_west_ips = ("172.18.11.22", "172.99.22.3")
>>>
>>> type(us_east_ips)
<class 'tuple'>
>>> type(us_west_ips)
<class 'tuple'>
>>>
>>> all_ips = us_east_ips + us_west_ips
>>> type(all_ips)
<class 'tuple'>
>>> all_ips
('172.19.56.90', '172.37.57.32', '172.54.21.23', '172.18.11.22', '172.99.22.3')
>>>

Concatenating two tuples returns a third tuple that contains all the contents of both tuples, copied in order.

>>> duplicate_values = ("duplicate",)
>>> type(duplicate_values)
<class 'tuple'>
>>> duplicate_values * 5
('duplicate', 'duplicate', 'duplicate', 'duplicate', 'duplicate')
>>>
Count elements in a tuple
>>> arbitrary_values = ('thetaranights.com', 'python', 'python', 'tutorials')
>>>
>>> arbitrary_values.count('python')
2
>>>
>>>
Find index of an element
>>> arbitrary_values = ('thetaranights.com', 'python', 'python', 'tutorials')
>>>
>>> arbitrary_values.index('tutorials')
3
>>>
Comparison of tuples

Comparison operators work for tuples too. The evaluation starts by comparing the first elements of the two tuples and proceeds to further elements until the result is conclusive.

>>> tup = (7, 14, 20)
>>>
>>> tup2 = (7, 14, 21)
>>>
>>> tup < tup2
True
>>>
>>> tup > tup2
False
>>>

For tup < tup2:

It compares the first elements of the two tuples. 7 and 7 are equal, which is inconclusive, so it moves on to 14 and 14, still inconclusive, and finally to 20 < 21, hence True.
The same process applies to tup > tup2, where 20 > 21 gives False.
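
Because of this element-by-element rule, tuples sort naturally, which is handy for things like version numbers. A brief sketch follows; the release tuples are invented for illustration.

```python
# A shorter tuple that is a prefix of a longer one compares as smaller.
prefix_is_smaller = (7, 14) < (7, 14, 20)
print(prefix_is_smaller)  # True

releases = [(3, 6, 1), (3, 5, 9), (3, 10, 0)]  # hypothetical (major, minor, patch) versions
ordered = sorted(releases)
print(ordered)        # [(3, 5, 9), (3, 6, 1), (3, 10, 0)]
print(max(releases))  # (3, 10, 0)
```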

Python Subprocess

- - Python, Tutorials

Through this post, we will discuss and see via examples the purpose of subprocess, how to spawn processes, how to connect to their input/output and error pipes, etc.

subprocess

As the name suggests, subprocess is used to spawn sub-processes. It also lets us capture a process's output and errors, and send data to its standard input, which means we can communicate with the processes we start. The subprocess module was introduced to replace older interfaces such as os.system and os.spawn*.

subprocess.run()

subprocess.run() was added in Python 3.5 and is available in all later versions.
It is the recommended way to invoke a subprocess for all use cases it can handle. For more advanced use cases, we can use Popen directly.

>>> import subprocess
>>> ls_process = subprocess.run(['ls', '-l'])
>>> ls_process
CompletedProcess(args=['ls', '-l'], returncode=0)
>>> ls_process.args
['ls', '-l']
>>> ls_process.returncode
0
>>> ls_process.stdout
>>> ls_process.stderr
>>>

subprocess.run() returns a CompletedProcess instance with the attributes args, returncode, stdout and stderr. The stdout and stderr attributes hold None unless the subprocess.run() call explicitly asks to capture them.

Capture standard output and error
>>> ls_process = subprocess.run(['ls'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> ls_process
CompletedProcess(args=['ls'], returncode=0, stdout=b'networkenv\ntest.py\n', stderr=b'')
>>> ls_process.stdout
b'networkenv\ntest.py\n'
>>>
>>> ls_process.stderr
b''
>>>
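
On Python 3.7 and later, the same capture can be written more compactly with capture_output, and text=True returns str instead of bytes. The sketch below assumes such a version and uses sys.executable as the child program so that it does not depend on ls being available.

```python
import subprocess
import sys

# sys.executable is the running interpreter, used here as a portable child program.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    capture_output=True,  # shorthand for stdout=subprocess.PIPE, stderr=subprocess.PIPE
    text=True,            # decode the captured output to str instead of bytes
)
print(result.returncode)
print(repr(result.stdout))
print(repr(result.stderr))
```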

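There is an optional argument “input”, which lets you pass data to the subprocess's stdin. The data must be bytes unless text mode is enabled (for example through encoding or universal_newlines), in which case it must be a string. If you use this argument you may not also use the Popen constructor's “stdin” argument, as it is used internally.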
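
As a minimal sketch of the input argument, the child below simply upper-cases whatever arrives on its stdin. The child code is a stand-in written for this illustration, and capture_output and text assume Python 3.7 or later.

```python
import subprocess
import sys

# Hypothetical child program: read all of stdin and print it in upper case.
child_code = "import sys; print(sys.stdin.read().upper())"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="bhishan",      # a str, because text=True enables text mode
    capture_output=True,
    text=True,
)
print(result.stdout)
```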

Another useful argument can be timeout
>>> subprocess.run(['ls'], timeout=0.000000000000000000000000000000000000001)

subprocess.TimeoutExpired: Command '['ls']' timed out after 1e-39 seconds

When the timeout argument is passed to run() and the process fails to complete within the given time, the child process is killed and subprocess.TimeoutExpired is raised.
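
In practice the exception is usually caught rather than left to propagate. A small sketch, using a deliberately slow child written for this illustration:

```python
import subprocess
import sys

# Hypothetical slow child: sleeps far longer than the timeout allows.
slow_child = [sys.executable, "-c", "import time; time.sleep(5)"]

try:
    subprocess.run(slow_child, timeout=0.5)
    timed_out = False
except subprocess.TimeoutExpired as exc:
    timed_out = True
    print("gave up after", exc.timeout, "seconds")
```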

The check argument

check is another argument that can be passed to run(). When it is true and the subprocess exits with a non-zero exit code, CalledProcessError is raised. A zero exit code signifies success. The CalledProcessError object holds the exit code in its returncode attribute, so you can catch the error and act on the returncode as needed.

>>> ls_process = subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1.
>>>
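
Catching the error and branching on the return code might look like the sketch below. The failing child and its exit code 3 are invented for the example.

```python
import subprocess
import sys

# Hypothetical child that always fails with exit code 3.
failing_child = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    subprocess.run(failing_child, check=True)
    exit_code = 0
except subprocess.CalledProcessError as exc:
    exit_code = exc.returncode
    print("command", exc.cmd, "failed")

print(exit_code)  # 3
```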
Popen Constructor

Process creation and management in the module is handled by the underlying Popen class. It offers a lot of flexibility so that developers can handle the less common cases not covered by the convenience function subprocess.run(). Popen() executes a child program in a new process. On POSIX, the class uses os.execvp()-like behavior to execute the child program. On Windows, the class uses the Windows CreateProcess() function. The examples below only illustrate the usage of Popen; they do not necessarily mean that subprocess.run() cannot achieve the same operations.

Example Usage of Popen
>>> import subprocess
>>>
>>> wc_process = subprocess.Popen(['wc', '-l', 'test.py'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>>
>>> wc_process  # the Popen object representing the child process
>>> stdout, stderr = wc_process.communicate()
>>> stdout
b'35 test.py\n'
>>> stderr
b''
>>>

In the above example we are using Unix tool wc (word count) and also specifying to PIPE the standard output and error.

Direct Stdout and Stderr to file
>>> echo_process = subprocess.Popen(['wc', '-l', 'test.py'], stdout=open('out.txt', 'w'), stderr=open('error.txt', 'w'))
>>> echo_process.wait()  # block until wc has finished writing to the files
0
>>>
>>> with open('out.txt', 'r') as outbuff:
...     print(outbuff.read())
...
35 test.py

>>>
Passing input to stdin
>>> script_process = subprocess.Popen(['python3', 'expect_input.py'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='UTF-8')
>>>
>>> script_process.communicate("Bhishan")
('Enter your nameBhishan\n', '')
>>>

I have a helper script named expect_input.py which expects input and prints it. You can pass input to stdin via communicate() method.

PIPE output of one subprocess as input to another
>>> ls_process = subprocess.Popen('ls', stdout=subprocess.PIPE)
>>> grep_process = subprocess.Popen(['grep', '.py'], stdin=ls_process.stdout, stdout=subprocess.PIPE)
>>> stdout, stderr = grep_process.communicate()
>>> stdout
b'expect_input.py\ntest.py\n'
>>>

In the above code snippet we passed ls_process.stdout to the stdin of the grep process. ls_process lists the files in the current working directory, and the latter subprocess, grep_process, takes that listing as its input and prints the names that contain py.
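
One detail worth adding to such a pipeline is closing the parent's copy of the first child's stdout, so the first child sees a broken pipe if the second one exits early. The sketch below shows that pattern with two small Python children standing in for ls and grep; the file names they print are made up.

```python
import subprocess
import sys

# Stand-ins for `ls` and `grep .py`, written in Python so the sketch runs anywhere.
list_code = "print('expect_input.py'); print('networkenv'); print('test.py')"
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if '.py' in line:\n"
    "        sys.stdout.write(line)\n"
)

lister = subprocess.Popen([sys.executable, "-c", list_code], stdout=subprocess.PIPE)
matcher = subprocess.Popen(
    [sys.executable, "-c", filter_code],
    stdin=lister.stdout,
    stdout=subprocess.PIPE,
)
# Close the parent's copy of the pipe so that only the second child holds the read end.
lister.stdout.close()

stdout, _ = matcher.communicate()
lister.wait()  # reap the first child as well
print(stdout)
```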

Other parameters of Popen

The arguments accepted by Popen are listed below, each with its default value and a description.

args (required): A string or a sequence of program arguments, e.g. "ls" or ["ls", "-l"].

bufsize (default: -1): Supplied as the corresponding argument to the open() function when creating the stdin/stdout/stderr pipe file objects:

  • 0 means unbuffered (read and write are one system call and can return short)

  • 1 means line buffered (only usable if universal_newlines=True, i.e. in text mode)

  • any other positive value means use a buffer of approximately that size

  • a negative bufsize (the default) means the system default of io.DEFAULT_BUFFER_SIZE will be used.

executable (default: None): Specifies a replacement program to execute. It is very seldom needed. When shell=False, executable replaces the program to execute specified by args. However, the original args is still passed to the program. Most programs treat the program specified by args as the command name, which can then be different from the program actually executed. On POSIX, the args name becomes the display name for the executable in utilities such as ps. If shell=True, on POSIX the executable argument specifies a replacement shell for the default /bin/sh.

stdin (default: None): The executed program's standard input file handle.

stdout (default: None): The executed program's standard output file handle.

stderr (default: None): The executed program's standard error file handle.

preexec_fn (default: None): A callable object that is called in the child process just before the child program is executed (POSIX only).

close_fds (default: varies by platform): If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed (POSIX only). The default is always true on POSIX. On Windows it is true when stdin/stdout/stderr are None, false otherwise. On Windows, if close_fds is true then no handles will be inherited by the child process. Note that on Windows, you cannot set close_fds to true and also redirect the standard handles by setting stdin, stdout or stderr.

shell (default: False): Specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.

cwd (default: None): Sets the current directory before the child is executed.

env (default: None): Defines the environment variables for the new process.

universal_newlines (default: False): If true, the file objects stdin, stdout and stderr are opened in text mode with universal line endings.

encoding (default: None): Text mode encoding for the file objects stdin, stdout and stderr. By default stdin, stdout and stderr use bytes.

Python Variables

- - Python, Tutorials

The intention of this post is to show how variables are assigned and the mechanism behind assignment, to discuss “equal status” and how almost everything in Python is an object, and to look at manipulating the objects held by the symbolic names we call variables.

Variable Assignment

In Python, you don't really assign a value to a variable. Python stores a reference to an object, and the object has a value. Unlike in many other programming languages, you don't need to declare a variable before assigning to it. Assignment is done with the “=” operator.

>>> i = 7
>>> print(i)
7
>>>

Once the variable is assigned to a value, it can be used in any other expressions such that the variable will be substituted with the value assigned.

Changing value
>>> temp_value = 7
>>> print(temp_value)
7
>>> temp_value = 4
>>> print(temp_value)
4
>>> temp_value = "thetaranights.com"
>>> print(temp_value)
thetaranights.com
>>>
Chained Assignment

Python also supports what’s referred to as chained assignment. This allows assigning a value to multiple variables at once.

>>> a = b = c = d = 7
>>> print(a)
7
>>> print(b)
7
>>> print(c)
7
>>> print(d)
7
>>> # Alternatively print all values at once. The above construct is used to make it simple to understand.
>>> print(a, b, c, d)
7 7 7 7
>>>
Type

Unlike in statically typed languages, a variable in Python can be reassigned to values of different types. In statically typed languages, a variable must keep holding values of the type it was declared with.

>>> a = 7
>>> print(a)
7
>>> a = 'thetaranights.com'
>>> print(a)
thetaranights.com
>>>
How does assignment work in Python

One of the primary goals of Python was for all objects to have equal status, i.e. anything from integers, strings, lists, dictionaries, functions, classes, modules and methods can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth. Python is a highly object-oriented language; almost everything in Python is an object.

>>> a = 7
>>> type(a)
<class 'int'>
>>> id(a)
11033280
>>> print(a)
7
>>>

Initially, Python creates an integer object, and creates a reference to the object from the variable name. Although now we can use the value directly from the variable name, it is still the object that holds the value and the variable holds the reference.

Same Object Reference
>>> a = "thetaranights.com"
>>> b = a
>>>
>>> id(a)
140521889892800
>>> id(b)
140521889892800
>>>

From the code snippet above, Python does not create a new object for b. What it does create is a reference to the same object that a points to.
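
Sharing a reference matters most with mutable objects, because a change made through one name is visible through the other. The sketch below uses a list rather than a string to make that visible; the values are arbitrary.

```python
a = ["thetaranights.com"]
b = a                    # b is a second name for the same list object
b.append("github.com")
print(a)                 # ['thetaranights.com', 'github.com']
print(a is b)            # True

c = list(a)              # an explicit copy: a different object with the same contents
c.append("facebook.com")
print(a is c)            # False
print(len(a), len(c))    # 2 3
```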

Garbage Collection
>>> a = "facebook.com"
>>> b = a
>>> a = "github.com/bhishan"
>>> b = "thetaranights.com"
>>>

Now that we have re-assigned a to point to the object “github.com/bhishan” and b to the object “thetaranights.com”, no reference to the earlier value “facebook.com” remains, from a, b or any other variable. Since nothing references it any longer, it is orphaned. An object's life begins when the first reference to it is created. During its lifetime, additional references to it may be created and references to it may be deleted. The object stays alive as long as there is at least one reference to it. When the number of references drops to zero, the object is no longer accessible and its lifetime is over. Python will eventually notice that it is inaccessible and reclaim the allocated memory so it can be used for something else. This process is called garbage collection.

In technical terms, the id of an object is released when its reference count drops to zero, and the id can then be reused. Otherwise, it is guaranteed that no two objects with overlapping lifetimes have the same id. The id is given by the built-in id(), which returns an integer identifier for an object.
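
The lifetime described above can be observed with a weak reference, which watches an object without keeping it alive. This is a sketch under the assumption of CPython's reference counting; Site is a hypothetical class introduced only because plain strings cannot be weakly referenced.

```python
import gc
import weakref


class Site:
    """Hypothetical class used only for this illustration."""

    def __init__(self, url):
        self.url = url


a = Site("facebook.com")
b = a
probe = weakref.ref(a)   # observes the object without adding a strong reference

del a                    # one reference (b) still remains
alive_with_one_reference = probe() is not None
print(alive_with_one_reference)

del b                    # the last reference is gone
gc.collect()             # not needed in CPython; included for other implementations
reclaimed = probe() is None
print(reclaimed)
```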

No two objects can have the same id
>>> a = "thetaranights.com"
>>> b = a
>>> id(a)
140521889892800
>>> id(b)
140521889892800
>>> b = "facebook.com"
>>> id(b)
140521889823280
>>>

In the above code snippet, we see that the variables a and b initially point to the same object, which is verified by the built-in id() returning the same integer for both a and b. However, when we re-assign b to point to another object, its id changes. Therefore, no two objects alive at the same time can have the same id.

Python caches small integers

When the interpreter starts, CPython creates objects for the integers in the range [-5, 256] (inclusive). Every time a variable is assigned a value in this range, it refers to the same object, and hence the results of the id() built-in on those variables coincide.

>>> a = -5
>>> b = -5
>>> id(a)
11032896
>>> id(b)
11032896
>>>

When we do the same for integers outside this range, a new object is created for each assignment, and hence the results of the id() built-in differ.

>>> a = 257
>>> b = 257
>>> id(a)
139653174388272
>>> id(b)
139653174388144
>>>
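
Since this caching is an implementation detail, value comparisons are better written with == than with is. The sketch below builds the integers at run time with int(), because literals inside a single script may end up sharing one constant object; the identity results are left unstated as they depend on the interpreter.

```python
a = int("257")   # built at run time so that each call creates its own object
b = int("257")

values_equal = a == b
print(values_equal)   # True: the values are the same
print(a is b)         # identity check; depends on the interpreter's caching

small_a = int("256")
small_b = int("256")
print(small_a == small_b)  # True
print(small_a is small_b)  # identity check; 256 lies inside the cached range on CPython
```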
Naming Convention

A variable name can be of any length and can consist of uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), the underscore character (_) and, from Python 3 onwards, Unicode characters. A variable name can be any combination of these, except that it can't start with a digit.

Valid | Invalid
address | $address
address1 | 1address
address_1 | 1_address
_ | $
Δ (Greek letter delta) | %

Normally, the intent of a variable should be understandable at a glance, so a variable name should be as descriptive as it reasonably can be. More often than not, we need multi-word variable names, which should also be readable. The following are some commonly used constructs for multi-word variable names:

Construct Name | Description | Example
Camel Case | Second and subsequent words are capitalized. | numberOfStudents
Pascal Case | Identical to Camel Case except the first word is also capitalized. | NumberOfStudents
Snake Case | Words are separated by underscores. | number_of_students

The style guide for Python code, known as PEP 8, recommends Snake Case for function and variable names and Pascal Case for class names.
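
The naming rules can be checked programmatically with str.isidentifier(). The sketch below runs it over the names from the valid and invalid columns above, and also shows that reserved keywords need a separate check.

```python
import keyword

candidates = ["address", "address1", "address_1", "_", "Δ",
              "$address", "1address", "1_address", "$", "%"]

for name in candidates:
    print(repr(name), name.isidentifier())

valid_names = [name for name in candidates if name.isidentifier()]
print(valid_names)  # ['address', 'address1', 'address_1', '_', 'Δ']

# Keywords pass isidentifier() but still cannot be used as variable names.
print("class".isidentifier(), keyword.iskeyword("class"))  # True True
```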

Python Operators

- - Python, Tutorials

Operators are the constructs that perform operations on operands (values and variables). The operators in Python are represented by special symbols and keywords. The intention of this post is to introduce the various operators in Python.

Arithmetic Operators

These operators perform mathematical operations such as addition, subtraction, multiplication, division, modulus and exponentiation. The following table shows the arithmetic operators and their usage:

Operator | Usage | Description
+ | a + b | Adds the values on either side of the operator. Also unary plus.
- | a - b | Subtracts the right hand operand from the left hand operand. Also unary negation.
* | a * b | Multiplies the values on either side of the operator.
/ | a / b | Divides the left hand operand by the right hand operand.
% | a % b | Returns the remainder from dividing the left hand operand by the right hand operand.
** | a ** b | Exponent: returns the left operand raised to the power of the right operand.
// | a // b | Floor division: divides the left hand operand by the right hand operand and rounds the result down, towards negative infinity, so 7 // 2 is 3 while -7 // 2 is -4.
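
The difference between true division, floor division and modulus is easiest to see with a negative operand. A short sketch with arbitrary values:

```python
print(7 / 2)          # 3.5
print(7 // 2)         # 3
print(-7 // 2)        # -4  (rounded down, not towards zero)
print(7 % 2)          # 1
print(-7 % 2)         # 1   (the remainder takes the sign of the right operand)
print(divmod(-7, 2))  # (-4, 1)
print(2 ** 10)        # 1024

# Floor division and modulus always satisfy this identity.
a, b = -7, 2
identity_holds = (a // b) * b + (a % b) == a
print(identity_holds)  # True
```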
Comparison Operators

The comparison operators are used to identify the relation between the operands on either side of the operator. They are also called relational operators. The result of a comparison is either True or False.

Operator | Description | Usage
== | Returns True if the values on either side of the operator are equal, otherwise False. | a == b
!= | Returns True if the values on either side of the operator are not equal, otherwise False. | a != b
> | Returns True if the value of the left operand is greater than the value of the right operand. | a > b
< | Returns True if the value of the left operand is less than the value of the right operand. | a < b
>= | Returns True if the value of the left operand is greater than or equal to the value of the right operand. | a >= b
<= | Returns True if the value of the left operand is less than or equal to the value of the right operand. | a <= b
Assignment Operators

Assignment operators assign the value of the right operand to the left operand. The following are the assignment operators in Python:

Operator | Description | Usage | Equivalent to
= | Assigns the value of the right side operand to the left side operand. | c = a + b | c = a + b
+= | Adds the value of the right operand to the value of the left operand and assigns the result to the left operand. | b += a | b = b + a
-= | Subtracts the value of the right operand from the value of the left operand and assigns the result to the left operand. | b -= a | b = b - a
*= | Multiplies the value of the left operand by the value of the right operand and assigns the result to the left operand. | b *= a | b = b * a
/= | Divides the value of the left operand by the value of the right operand and assigns the result to the left operand. | b /= a | b = b / a
%= | Assigns the remainder from dividing the left hand operand by the right hand operand to the left hand operand. | b %= a | b = b % a
**= | Raises the left operand to the power of the right operand and assigns the result to the left operand. | b **= a | b = b ** a
//= | Performs floor division on the operands and assigns the result to the left operand. | b //= a | b = b // a
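
For numbers the augmented forms behave exactly like their expanded equivalents. For mutable objects such as lists there is one difference worth a sketch: += extends the existing object in place, while the expanded form builds a new object. The names below are illustrative.

```python
total = 10
total += 5       # same result as total = total + 5
total //= 4      # same result as total = total // 4
print(total)     # 3

original = [1, 2]
alias = original
alias += [3]             # extends the existing list in place
print(original)          # [1, 2, 3]

alias = alias + [4]      # builds a new list and rebinds alias only
print(original)          # [1, 2, 3]
print(alias)             # [1, 2, 3, 4]
```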
Python Logical Operators

Logical operators in Python combine or negate conditions, and the result is treated as true or false in conditional statements. and, or and not are the logical operators in Python.

Operator Description Usage
and True if both operands are true x and y
or True if either operand is true x or y
not Complements (negates) the truth value of the operand not val
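
A minimal sketch of the three operators. It also shows a detail the table does not spell out: and and or short-circuit and return one of their operands rather than always returning a bool. The variable names are illustrative only.

```python
x, y = True, False

print(x and y)  # both sides must be true
print(x or y)   # one true side is enough
print(not x)    # negation

# `or` returns the first truthy operand, which is handy for defaults.
name = "" or "anonymous"

# `and` stops at the first falsy operand, so 1 / 0 is never evaluated here.
count = 0 and 1 / 0

print(name, count)
```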
Membership Operator

These operators test for membership (presence) in a sequence such as a string, list or tuple. Following are the membership operators:

Operator Description Usage
in True if the value/operand on the left of the operator is present in the sequence on the right of the operator. x in y
not in True if the value/operand on the left of the operator is not present in the sequence on the right of the operator. x not in y
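
A short sketch of membership tests on a string, a list and a dictionary. The sample data is made up for illustration. Note that for a dictionary, in checks the keys.

```python
vowels = "aeiou"
primes = [2, 3, 5, 7]
config = {"debug": True}  # hypothetical settings dictionary

has_e = "e" in vowels                     # substring test on a string
four_missing = 4 not in primes            # element test on a list
has_debug_key = "debug" in config         # dictionaries test their keys
has_true_value = True in config.values()  # test the values explicitly

print(has_e, four_missing, has_debug_key, has_true_value)
```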
Identity Operator

Identity operators compare the memory locations of two Python objects, i.e. they check whether both operands refer to the same object.

Operator Description Usage
is True if both the operands refer to the same object. x is True
is not Evaluates to false if the variables on either side of the operator point to the same object and true otherwise. x is not True
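
The difference between identity (is) and equality (==) is easiest to see with two lists that hold the same values. The names below are illustrative only.

```python
first = [1, 2, 3]
second = [1, 2, 3]  # equal contents, but a separate object
alias = first       # another name for the same object

print(first == second)  # equality compares the contents
print(first is second)  # identity compares the objects themselves
print(first is alias)

# The most common use of `is` is checking for None.
value = None
print(value is None)
```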
Bitwise Operators

Bitwise operators work on bits of an operand hence the name.

>>> # Bitwise AND
...
>>> a = 3 # equivalent binary is 0011
>>> b = 4 # equivalent binary is 0100
>>> a & b
0
>>>
>>> # Bitwise OR
...
>>> a | b
7
>>> # 7 is equivalent to 0111 in binary
...
>>>
>>> # Bitwise NOT
...
>>> ~ a
-4
>>>
>>> # Bitwise XOR
...
>>> a ^ b
7
>>>
>>> # Bitwise right shift
...
>>> a >> 2
0
>>>
>>> # Bitwise left shift
...
>>> a << 2
12

File Handling in Python

- - Python, Tutorials

Python has convenient built-ins for working with files. The intention of this post is to discuss the various modes of open() and see them through examples. open() is a built-in function that returns a file object, also called a handle, which is used to read or modify the file. We will start by opening a file with default parameters, then look through examples at the important modes for reading and writing, and finally go over the parameters of the open() built-in.

Using open() with default parameters.
>>> file_handle = open('existingfile.txt')
>>> type(file_handle)
<class '_io.TextIOWrapper'>
>>>
>>> file_handle2 = open('nonexistentfile.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'nonexistentfile.txt'
>>>

The open() built-in has one required parameter, file. file is either a text or byte string giving the path of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless closefd is set to False.) By default, the file is opened in read text mode. If the file to be read isn’t present in the specified path, a FileNotFoundError is raised.

mode (Different file modes).

A file can be opened for reading or for writing. This is specified through the optional mode argument of the open() built-in, a string that specifies the mode in which the file is opened. As we have seen in the example above, it defaults to ‘r’, i.e. reading in text mode. Another common value is ‘w’ for writing, which truncates the file if it already exists. Other values are ‘x’ for creating and writing to a new file, and ‘a’ for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). Following are the available modes:

Mode Meaning
‘r’ open for reading (default)
‘w’ open for writing, truncating the file first
‘x’ create a new file and open it for writing
‘a’ open for writing, appending to the end of the file if it exists
‘b’ binary mode
‘t’ text mode (default)
‘+’ open a disk file for updating (reading and writing)

The default mode is ‘rt’ (open for reading text). The ‘x’ mode implies ‘w’ and raises a `FileExistsError` if the file already exists.
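
A minimal sketch of the ‘x’, ‘a’ and default ‘rt’ modes working together. It runs inside a temporary directory so it does not touch real files, and the file name report.txt is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "report.txt")  # hypothetical file name

    # 'x' succeeds because the file does not exist yet.
    with open(path, "x") as handle:
        handle.write("first line\n")

    # A second 'x' on the same path fails instead of truncating the file.
    try:
        open(path, "x")
    except FileExistsError:
        second_attempt_failed = True
    else:
        second_attempt_failed = False

    # 'a' keeps the existing contents and writes at the end.
    with open(path, "a") as handle:
        handle.write("second line\n")

    # No mode given, so the file is opened as 'rt'.
    with open(path) as handle:
        lines = handle.readlines()

print(second_attempt_failed, lines)
```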

Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (appending ‘b’ to the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when ‘t’ is appended to the mode argument), the contents of the file are returned as strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
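
To make the distinction concrete, the sketch below writes a short string containing one non-ASCII character and reads it back in both modes. The file name and the sample text are assumptions made for the example.

```python
import os
import tempfile

text = "caf\u00e9"  # 'cafe' with an accented e, a non-ASCII character

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Binary mode: raw bytes, no decoding.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Text mode: bytes are decoded into a str using the given encoding.
    with open(path, "r", encoding="utf-8") as handle:
        decoded = handle.read()

print(type(raw), len(raw))          # bytes; the accented letter takes two bytes
print(type(decoded), len(decoded))  # str; four characters
```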

Writing contents to a file:
>>> file_handler = open("text.txt", "w") # or use "wt"
>>> file_handler
<_io.TextIOWrapper name='text.txt' mode='w' encoding='UTF-8'>
>>> file_handler.write("This will use the default encoding from the machine.")
>>> file_handler.close()

In the above code snippet, we open the file in write text mode and do not specify the encoding. When the encoding is not specified, it defaults to a platform-specific encoding. Also note that the encoding argument should be used in text mode only.

Let’s try to read the file that we wrote and let’s see what happens when we pass a different encoding than the one used to write.

>>> file_handler = open("text.txt", mode="r", encoding="UTF-16")
>>> file_handler
<_io.TextIOWrapper name='text.txt' mode='r' encoding='UTF-16'>
>>> contents = file_handler.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/usr/lib/python3.6/encodings/utf_16.py", line 67, in _buffer_decode
    raise UnicodeError("UTF-16 stream does not start with BOM")
UnicodeError: UTF-16 stream does not start with BOM
>>>

Therefore it is important to read a file with the same encoding that was used to write it.

>>> file_handler = open('text.txt', 'r', encoding='UTF-8')
>>>
>>> file_handler
<_io.TextIOWrapper name='text.txt' mode='r' encoding='UTF-8'>
>>>
>>> contents = file_handler.read()
>>> contents
'This will use the default encoding from the machine.'
>>>
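
One way to avoid depending on the platform default is to pass the encoding explicitly on both sides. The sketch below is a script version of the same idea under that assumption: it writes with UTF-16, shows that reading as UTF-8 fails, and then round-trips with the matching encoding. The with statement closes each file automatically.

```python
import os
import tempfile

message = "same encoding in, same encoding out"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text.txt")

    with open(path, "w", encoding="utf-16") as handle:
        handle.write(message)

    # Reading with a different encoding raises a UnicodeError subclass,
    # because the UTF-16 byte order mark is not valid UTF-8.
    try:
        with open(path, "r", encoding="utf-8") as handle:
            handle.read()
        mismatch_detected = False
    except UnicodeError:
        mismatch_detected = True

    # Reading with the encoding used for writing works.
    with open(path, "r", encoding="utf-16") as handle:
        round_trip = handle.read()

print(mismatch_detected, round_trip)
```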
Writing in binary mode:

“Binary” files are any files where the format isn’t made up of readable characters. Binary files can range from image files like JPEGs or GIFs, audio files like MP3s or binary document formats like Word or PDF.

>>> file_handler = open('text.txt', 'wb')
>>> file_handler
<_io.BufferedWriter name='text.txt'>
>>>
>>> byte_arr = [120, 3, 255, 0, 100]
>>> binary_format = bytearray(byte_arr)
>>> file_handler.write(binary_format)
>>> file_handler.close()
Reading in binary mode:
>>> file_handler = open('text.txt', 'rb')
>>>
>>> file_handler
<_io.BufferedReader name='text.txt'>
>>> contents = file_handler.read()
Parameters of open() built-in:

 

Parameter Parameter Type Default value Description
file Required

The path to the file.

mode Optional ’r’

The mode to open the file in.

buffering Optional -1

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

  • Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on `io.DEFAULT_BUFFER_SIZE`. On many systems, the buffer will typically be 4096 or 8192 bytes long.

  • “Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

encoding Optional None

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent, but any encoding supported by Python can be passed. See the codecs module for the list of supported encodings.

errors Optional None

errors is an optional string that specifies how encoding errors are to be handled—this argument should not be used in binary mode. Pass ‘strict’ to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass ‘ignore’ to ignore errors. (Note that ignoring encoding errors can lead to data loss.) See the documentation for codecs.register or run ‘help(codecs.Codec)’ for a list of the permitted encoding error strings.

newline Optional None

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', ‘\n’, ‘\r’, and ‘\r\n’. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in ‘\n’, ‘\r’, or ‘\r\n’, and these are translated into ‘\n’ before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any ‘\n’ characters written are translated to the system default line separator, os.linesep. If newline is '' or ‘\n’, no translation takes place. If newline is any of the other legal values, any ‘\n’ characters written are translated to the given string.

closefd Optional True

If closefd is False, the underlying file descriptor will be kept open when the file is closed. This does not work when a file name is given and must be True in that case.

opener Optional None

A custom opener can be used by passing a callable as *opener*. The underlying file descriptor for the file object is then obtained by calling *opener* with (*file*, *flags*). *opener* must return an open file descriptor (passing os.open as *opener* results in functionality similar to passing None).
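
A minimal sketch of two of the parameters above, newline and errors, using a temporary file with made-up contents. The bytes are written in binary mode first so that the file contents are the same on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

    # Write Windows-style line endings as raw bytes.
    with open(path, "wb") as handle:
        handle.write(b"one\r\ntwo\r\n")

    # newline=None (the default): '\r\n' is translated to '\n' on input.
    with open(path, "r", newline=None) as handle:
        translated = handle.read()

    # newline='': line endings are returned untranslated.
    with open(path, "r", newline="") as handle:
        untranslated = handle.read()

    # b'\xe9' on its own is not valid UTF-8. errors='ignore' drops it
    # silently, which is the data loss the description warns about.
    with open(path, "wb") as handle:
        handle.write(b"caf\xe9")
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        lossy = handle.read()

print(repr(translated), repr(untranslated), repr(lossy))
```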

Magic Methods in Python – Dunder Methods

- - Python, Tutorials

Magic methods are methods that have two underscores as the prefix and suffix of the method name. They are also called dunder methods, short for double underscore. __init__ and __str__ are examples of magic methods. They are a set of special methods that can be used to enhance your classes in Python.

Dunder methods are commonly used for scenarios like operator overloading, and they allow you to emulate the behavior of the built-in types. We will start by creating a class, implement a dunder method or two, and then see the available dunder/magic methods that can be used to enrich the functionality of a custom class.

Creating a custom String class:

>>> class String:
...     def __init__(self, string):
...         self.string = string
...
>>> string = String("thetaranights.com")
>>> print(string)
<__main__.String object at 0x7fec2fad2400>
>>>

Even before we realize it, we have made use of one of those many magic methods. The __init__ method is a magic method in which you initialize instance attributes and perform other setup. People like to call it a constructor, but think about it for a while: the method already takes the instance (self) as a parameter, which means the object has already been created before __init__ is called. __init__ then initializes each attribute on that blank object.

Earlier in this post, we said that magic methods allow us to emulate the behavior of the built-in types. The result of print(string) doesn’t really give us what we would generally want. We can implement the magic method __repr__ to give users of the String class a better string representation.

>>> class String:
...     def __init__(self, string):
...         self.string = string
...     def __repr__(self):
...         return "String Object: {string}".format(string=self.string)
...
>>>
>>> string = String("thetaranights.com")
>>> print(string)
String Object: thetaranights.com
>>>

In the above code snippet, we have implemented the __repr__ magic method to return a better string representation of our String class’s instance.
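
print() actually looks for __str__ first and only falls back to __repr__ when __str__ is not defined, which is why the example above works. The sketch below, a variation on the String class rather than the code from this post, defines both to show where each one is used.

```python
class String:
    def __init__(self, string):
        self.string = string

    def __repr__(self):
        # Developer-facing representation, used by repr() and by containers.
        return "String({!r})".format(self.string)

    def __str__(self):
        # User-facing representation, used by print() and str().
        return self.string


site = String("thetaranights.com")
print(site)        # uses __str__
print(repr(site))  # uses __repr__
print([site])      # a list shows its items with __repr__
```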

Another example of dunder method:

Say we want to concatenate our custom String object with a string. We would do:

>>> print(string + " Thanks for visiting")

TypeError: unsupported operand type(s) for +: 'String' and 'str'

In order for this to work we need to implement the __add__ magic method to our class String.

>>> class String:
...     def __init__(self, string):
...         self.string = string
...     def __repr__(self):
...         return "Object String: {string}".format(string=self.string)
...     def __add__(self, to_concatenate):
...         return self.string + to_concatenate
...
>>>
>>> string = String("thetaranights.com")
>>>
>>> print(string + " thanks for visiting")
thetaranights.com thanks for visiting
>>>

Now that we have implemented the __add__ magic method, we can use the + operator. Following is the list of magic methods available:

Available Magic Methods

Binary Operators
Operator Method
+ object.__add__(self, other)
- object.__sub__(self, other)
* object.__mul__(self, other)
// object.__floordiv__(self, other)
/ object.__truediv__(self, other)
% object.__mod__(self, other)
** object.__pow__(self, other[, modulo])
<< object.__lshift__(self, other)
>> object.__rshift__(self, other)
& object.__and__(self, other)
^ object.__xor__(self, other)
| object.__or__(self, other)
Extended Assignment
Operator Method
+= object.__iadd__(self, other)
-= object.__isub__(self, other)
*= object.__imul__(self, other)
/= object.__itruediv__(self, other)
//= object.__ifloordiv__(self, other)
%= object.__imod__(self, other)
**= object.__ipow__(self, other[, modulo])
<<= object.__ilshift__(self, other)
>>= object.__irshift__(self, other)
&= object.__iand__(self, other)
^= object.__ixor__(self, other)
|= object.__ior__(self, other)
Unary Operators
Operator Method
- object.__neg__(self)
+ object.__pos__(self)
abs() object.__abs__(self)
~ object.__invert__(self)
complex() object.__complex__(self)
int() object.__int__(self)
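long() object.__long__(self) (Python 2 only)
float() object.__float__(self)
oct() object.__oct__(self) (Python 2 only; Python 3 uses object.__index__(self))
hex() object.__hex__(self) (Python 2 only; Python 3 uses object.__index__(self))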
Comparison Operators
Operator Method
< object.__lt__(self, other)
<= object.__le__(self, other)
== object.__eq__(self, other)
!= object.__ne__(self, other)
>= object.__ge__(self, other)
> object.__gt__(self, other)
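
As a rough illustration of how a few entries from these tables fit together, here is a small Vector class. The class and its comparison rule (ordering by length) are invented for this sketch and are not part of the String example above.

```python
class Vector:
    """Hypothetical 2D vector used only to illustrate operator methods."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return "Vector({}, {})".format(self.x, self.y)

    def __add__(self, other):   # binary operator: v + w returns a new object
        return Vector(self.x + other.x, self.y + other.y)

    def __iadd__(self, other):  # extended assignment: v += w updates in place
        self.x += other.x
        self.y += other.y
        return self

    def __neg__(self):          # unary operator: -v
        return Vector(-self.x, -self.y)

    def __eq__(self, other):    # comparison operator: v == w
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __lt__(self, other):    # comparison operator: v < w, by squared length
        return self.x ** 2 + self.y ** 2 < other.x ** 2 + other.y ** 2


a = Vector(1, 2)
b = Vector(3, 4)
total = a + b
same_object = a
a += b  # calls __iadd__, so `a` is still the same object
print(total, a, -b, Vector(1, 2) < b)
```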

That’s my little introduction to dunder/magic methods in Python. You should also read this article on Debugging with breakpoint in python3.7 https://www.thetaranights.com/debugging-with-breakpoint-in-python3-7/

Debugging with breakpoint in Python3.7

- - Python, Tutorials

Python has long had a default debugger named pdb in the standard library. pdb defines an interactive source code debugger for Python programs. The intention of this post is to clarify, through examples and explanations, what the new built-in breakpoint() in Python 3.7 offers compared with pdb in earlier versions.

Breakpoints are the points in your code where you’d like to temporarily stop the execution of the program to check values and look up the state of different objects. This is done by adding a line just above the point you’d like to debug.

In the earlier versions of python, you’d do:
def divide(divisor, dividend):
    import pdb; pdb.set_trace()
    return dividend / divisor

if __name__ == '__main__':
    print(divide(0, 4000))

Running the above code in shell produces results as following:

$ python pdbexample.py
> /home/bhishan-1504/pdbexample.py(3)divide()
-> return dividend / divisor
(Pdb) args
divisor = 0
dividend = 4000
(Pdb) continue
Traceback (most recent call last):
  File "pdbexample.py", line 6, in <module>
    print(divide(0, 4000))
  File "pdbexample.py", line 3, in divide
    return dividend / divisor
ZeroDivisionError: integer division or modulo by zero

It enters an interactive mode, stopping the flow of the program so you can type commands to view the state of the program and then continue or exit.

Here is a list of a few useful commands in the interactive mode:
Command Short form What it does
args a Print the argument list of the current function
break b Creates a breakpoint (requires parameters) in the program execution
continue c or cont Continues program execution
help h Provides list of commands or help for a specified command
jump j Set the next line to be executed
list l Print the source code around the current line
next n Continue execution until the next line in the current function is reached or returns
step s Execute the current line, stopping at first possible occasion
pp pp Pretty-prints the value of the expression
quit or exit q Aborts the program
return r Continue execution until the current function returns

With python3.7 you’d do:

def divide(divisor, dividend):
    breakpoint()
    return dividend / divisor

if __name__ == '__main__':
    print(divide(0, 4000))

$ python3.7 breakpointexample.py
> /home/bhishan-1504/breakpointexample.py(3)divide()
-> return dividend / divisor
(Pdb) args
divisor = 0
dividend = 4000
(Pdb) continue
Traceback (most recent call last):
  File "breakpointexample.py", line 6, in <module>
    print(divide(0, 4000))
  File "breakpointexample.py", line 3, in divide
    return dividend / divisor
ZeroDivisionError: division by zero

Python 3.7 comes with a built-in function named breakpoint() which enters the debugger at the call site. While the result is the same, it is more intuitive and idiomatic.

Why was this change necessary?
  1. In earlier versions, it’s a lot to type, which also invites typos.
  2. It ties debugging directly to the choice of pdb. There might be other debugging options, say if you’re using an IDE or some other development environment.
  3. It takes two statements: import pdb and pdb.set_trace().

This is also inspired by the JavaScript debugger statement.

More implementation details (From PEP 553):

Along with the new built-in breakpoint(), there are two new name bindings in the sys module, called sys.breakpointhook() and sys.__breakpointhook__. By default, sys.breakpointhook() implements the actual importing and entry into pdb.set_trace(), and it can be set to a different function to change the debugger that breakpoint() enters. This means there are no necessary ties to pdb in Python 3.7; you can use the debugger of your choice.
sys.__breakpointhook__ is initialized to the same function as sys.breakpointhook() so that you can always easily reset sys.breakpointhook() to the default value (e.g. by doing sys.breakpointhook = sys.__breakpointhook__). The signature of the built-in is breakpoint(*args, **kws). The positional and keyword arguments are passed straight through to sys.breakpointhook() and the signatures must match or a TypeError will be raised. The return from sys.breakpointhook() is passed back up to, and returned from breakpoint().

Since this new built-in is not bound to pdb and can enter any other debugger, the positional (*args) and keyword (**kwargs) arguments of breakpoint(*args, **kwargs) make sense: unlike pdb, other debuggers might expect arguments.
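
A minimal sketch of swapping the hook, assuming Python 3.7 or later. The recording_hook function is a made-up stand-in for a real debugger entry point: it only records the arguments it receives, which shows that they are passed straight through and that its return value comes back from breakpoint().

```python
import sys

captured = []


def recording_hook(*args, **kwargs):
    """Hypothetical hook: record the arguments instead of starting a debugger."""
    captured.append((args, kwargs))
    return "hook result"


sys.breakpointhook = recording_hook
try:
    # Arguments are forwarded to the hook; its return value is returned here.
    result = breakpoint("checkpoint", level=2)
finally:
    # Always restore the default hook so later breakpoint() calls enter pdb again.
    sys.breakpointhook = sys.__breakpointhook__

print(captured, result)
```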

The default implementation of sys.breakpointhook() consults an environment variable named PYTHONBREAKPOINT, whose value determines how breakpoint() behaves:

  • PYTHONBREAKPOINT=0 disables debugging. Specifically, with this value sys.breakpointhook() returns None immediately.
  • PYTHONBREAKPOINT= (i.e. the empty string). This is the same as not setting the environment variable at all, in which case pdb.set_trace() is run as usual.
  • PYTHONBREAKPOINT=some.importable.callable. In this case, sys.breakpointhook() imports the some.importable module and gets the callable object from the resulting module, which it then calls.

This environment variable allows external processes to control how breakpoints are handled. Some use cases include:

  • Completely disabling all accidental breakpoint() calls pushed to production. This could be accomplished by setting PYTHONBREAKPOINT=0 in the execution environment.
Disabling debugging:
$ PYTHONBREAKPOINT=0 python3.7 breakpointexample.py

This will disable any breakpoint() calls in the program file.

Run custom function on breakpoints:

With Python 3.7 you can also run a custom function wherever breakpoint() appears in the program. One example where this is handy is when you want to print the values of all local variables in the current function before the statements that follow are executed.

Let us define a custom function, saved in custom_code.py, that we want called at the breakpoint:

import sys
def local_variables():
    active = sys._getframe(1)
    print(active.f_locals)

$ PYTHONBREAKPOINT=custom_code.local_variables python3.7 breakpointexample.py
{'divisor': 0, 'dividend': 4000}
Traceback (most recent call last):
  File "breakpointexample.py", line 6, in <module>
    print(divide(0, 4000))
  File "breakpointexample.py", line 3, in divide
    return dividend / divisor
ZeroDivisionError: division by zero

That’s my little introduction to the new built-in breakpoint() in Python3.7 . You should also read about Python Assignment Expression which has been accepted for Python3.8 http://www.thetaranights.com/python-assignment-expression-pep-572-python3-8/

Python Decorators – Python Essentials

- - Python, Tutorials

The intention of this post is to introduce the concept of decorators and encourage their use. Python lets you pass a function as an argument to another function, which can then add some extra behavior to the function it received. Higher-order functions that accept a function and extend its behavior in this way are known as decorators. Passing functions as arguments is possible because functions are first-class objects in Python.

One of the primary goals of Python was for all objects to have equal status, i.e. anything from integers, strings, lists, dictionaries, functions, classes, modules and methods can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth. With that, it is possible to have a higher-order function that takes another function as an argument and extends its behavior without modifying it.

We will start by defining a function, then a nested function, then a function that takes another function as an argument, and finally the syntactic sugar that makes decorators easy to use.
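
The sketch below shows what first-class status means in practice. The functions shout, whisper and apply are made up for the example.

```python
def shout(text):
    return text.upper()


def whisper(text):
    return text.lower()


speak = shout                                 # assigned to a variable
styles = [shout, whisper]                     # placed in a list
by_name = {"loud": shout, "quiet": whisper}   # stored in a dictionary


def apply(function, text):                    # passed as an argument
    return function(text)


spoken = speak("hello")
styled = [style("Hello") for style in styles]
quiet = apply(by_name["quiet"], "HELLO")
print(spoken, styled, quiet)
```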

Defining a function:

>>> def foo(mixed_case):
...     return mixed_case.upper()
...
>>>
>>> foo("All upper case")
'ALL UPPER CASE'
>>>

A function is a first class object that returns a value based on the arguments passed to it. In the above example, it takes a string as an argument and returns the uppercase representation of the given string.

Defining a nested function:

>>> def foo(mixed_case):
...     def bar():
...         upper_case = mixed_case.upper()
...         print(mixed_case, " => ", upper_case)
...         return upper_case
...     return bar()
...
>>>
>>> foo("Subscribe to feeds http://feeds.feedburner.com/thetaranights/NZru")
Subscribe to feeds http://feeds.feedburner.com/thetaranights/NZru  =>  SUBSCRIBE TO FEEDS HTTP://FEEDS.FEEDBURNER.COM/THETARANIGHTS/NZRU
'SUBSCRIBE TO FEEDS HTTP://FEEDS.FEEDBURNER.COM/THETARANIGHTS/NZRU'
>>>

The bar() function is only in scope within the foo() function, so calling bar() from outside foo() raises a NameError exception, which makes sense.

>>> bar()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bar' is not defined
>>>

Decorators

>>> def foo(arbitrary_function):
...     print("Going to a bar.")
...     arbitrary_function()
...     print("Returning from a bar.")
...
>>>

foo() extends the functionality of arbitrary_function() by performing some actions before and after calling it. This is the core idea behind a decorator.

Using the decorator:

>>> # First we define an arbitrary function named bar()
>>> def bar():
...     print("Drinking some beer.")
...
>>>
>>> bar
<function bar at 0x7fed7efdd1b8>
>>>
>>> foo(bar)
Going to a bar.
Drinking some beer.
Returning from a bar.
>>>

First we create an arbitrary function named bar(), verify that it is in fact a function, and then pass it as an argument to the foo() function, which acts as a decorator.

Another example of a decorator

>>> def foo(arbitrary_function):
...     def wrapper():
...         print("Going to a bar.")
...         arbitrary_function()
...         print("Returning from a bar.")
...     return wrapper
...
>>>
>>> foo(bar)()  # foo(bar) returns wrapper; the trailing () calls it
Going to a bar.
Drinking some beer.
Returning from a bar.
>>>

Generally, decorators contain a nested function that performs some operations, calls the function that was passed as an argument, and then performs any clean-up operations. The decorator returns this nested function rather than calling it.

Syntactic Sugar for decorators

Syntactic sugar in a programming language is a syntax that is designed to make things easy to read and express. As such, @ symbol can be used for simplifying calls to a decorator for a function.

>>> def foo(arbitrary_function):
...     def wrapper():
...         print("Going to a bar.")
...         arbitrary_function()
...         print("Returning from a bar.")
...     return wrapper
...
>>>
>>> @foo
... def bar():
...     print("Drinking some beer.")
...
>>>
>>> bar()
Going to a bar.
Drinking some beer.
Returning from a bar.
>>>

All you have to do is write this directive @decorator_function on top of the function definition to be passed to the decorator. Note that you can also assign multiple decorators to a function, each decorator in a line.
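
When several decorators are stacked, the one closest to the function is applied first. The bold and italic decorators below are invented for this sketch.

```python
def bold(function):
    def wrapper():
        return "<b>" + function() + "</b>"
    return wrapper


def italic(function):
    def wrapper():
        return "<i>" + function() + "</i>"
    return wrapper


@bold
@italic
def greet():
    return "hello"


# The stacked form is equivalent to: greet = bold(italic(greet))
print(greet())
```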

 

When you need to pass arguments to a function that you intend to decorate, you have to explicitly add *args and **kwargs to the wrapper function of the decorator, otherwise the call fails because the wrapper does not accept them. The arguments are then passed on to the function call within the body of the wrapper function.

 

>>> def foo(arbitrary_function):
...     def wrapper(*args, **kwargs):
...         print("Going to a bar.")
...         arbitrary_function(*args, **kwargs)
...         print("Returning from a bar.")
...     return wrapper
...
>>>
>>> @foo
... def bar(drink_type):
...     print("Drinking some " + drink_type)
...
>>>
>>> bar("vodka")
Going to a bar.
Drinking some vodka
Returning from a bar.
>>> bar("beer")
Going to a bar.
Drinking some beer
Returning from a bar.
>>>
An example of a decorator when you need to track the execution time of a function call.
>>> import time
>>> def func_timer(arbitrary_function):
...     def wrapper(*args, **kwargs):
...         t = time.time()
...         arbitrary_function(*args, **kwargs)
...         t2 = time.time()
...         return "Total time for execution => " + str(t2 -t)
...     return wrapper
...
>>>
>>> @func_timer
... def bar(drink_type, bottles=1):
...     for i in range(bottles):
...         print("Drinking " + drink_type + " Bottle number: " + str(i+1))
...
>>>
>>> bar("beer", 10)
Drinking beer Bottle number: 1
Drinking beer Bottle number: 2
Drinking beer Bottle number: 3
Drinking beer Bottle number: 4
Drinking beer Bottle number: 5
Drinking beer Bottle number: 6
Drinking beer Bottle number: 7
Drinking beer Bottle number: 8
Drinking beer Bottle number: 9
Drinking beer Bottle number: 10
'Total time for execution => 0.000279188156128'
>>>
>>> bar("vodka")
Drinking vodka Bottle number: 1
'Total time for execution => 6.29425048828e-05'
>>>

The above decorator times the execution of a function call. That sums up my little introduction to decorators.
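
One limitation of the timer above is that it replaces the return value of the decorated function with the timing message. A possible variation, sketched here as an alternative rather than as the version used in this post, prints the timing and hands the original result back. It also uses functools.wraps so the decorated function keeps its own name and docstring. The count_bottles function is made up for the example.

```python
import functools
import time


def func_timer(arbitrary_function):
    @functools.wraps(arbitrary_function)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = arbitrary_function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("Total time for execution => " + str(elapsed))
        return result  # pass the original return value back to the caller
    return wrapper


@func_timer
def count_bottles(drink_type, bottles=1):
    """Return one line of text per bottle."""
    return ["{} bottle number {}".format(drink_type, i + 1) for i in range(bottles)]


lines = count_bottles("beer", 3)
print(lines)
```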

Zip files using Python

- - Python, Tutorials

Zipping files can be one part of a more complex operation that we perform programmatically. This usually comes up when you are working on a data pipeline and/or products requiring data movement. Python has easy methods available for zipping files and directories. For the record, ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.

How to archive files/directories using shutil?

The shutil module offers a number of high-level operations on files and collections of files. The following code block zips the files and directories present in the source directory, which is passed as the third argument to the make_archive function from shutil.

>>> from shutil import make_archive
>>> make_archive("July17-2018", "zip", "/home/bhishan-1504/shutil_test_archive")
Details about the parameters of make_archive function:

base_name : It is the name of the file to create. This filename is expected to be without the format specific extension.

format : It is the archive format which could be one of “zip”, “tar”, “gztar”, “bztar” or any other registered format.

root_dir : It is the directory that will be the root directory of the archive, i.e. we typically chdir into ‘root_dir’ before creating the archive.

base_dir : It is the directory where we start archiving from, i.e. ‘base_dir’ will be the common prefix of all files and directories in the archive.

The make_archive function returns the filename of the archived file. Note that owner and group are used when creating a tar archive. By default, it uses the current owner and group.
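
To see how root_dir and base_dir interact, the sketch below builds a small throwaway directory tree and archives only one subdirectory of it. The directory and file names (project, docs, readme.txt, data.csv, backup) are all made up for the example.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical layout: project/docs/readme.txt and project/data.csv
    project_dir = os.path.join(tmp_dir, "project")
    docs_dir = os.path.join(project_dir, "docs")
    os.makedirs(docs_dir)
    with open(os.path.join(docs_dir, "readme.txt"), "w") as handle:
        handle.write("hello")
    with open(os.path.join(project_dir, "data.csv"), "w") as handle:
        handle.write("a,b\n1,2\n")

    archive_path = shutil.make_archive(
        os.path.join(tmp_dir, "backup"),  # base_name, without the .zip extension
        "zip",
        root_dir=project_dir,  # paths in the archive are relative to this directory
        base_dir="docs",       # only this subdirectory is archived
    )

    # Inspect the archive while the temporary directory still exists.
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()

print(os.path.basename(archive_path), names)
```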

How to archive selective files/directories using zipfile?

We also have control over what files and directories should be archived rather than the entire directory tree. This can be achieved by the following code block:

>>> from zipfile import ZipFile
>>> with ZipFile("testarchive.zip", "w") as zip_buff:
...     zip_buff.write("1.txt")
...     zip_buff.write("3.txt")
...
>>>

All we do is write to the ZipFile object the files to be archived.

Details about the parameters of the ZipFile class:

file: Either the path to the file, or a file-like object. If it is a path, the file will be opened and closed by ZipFile.

mode: The mode can be read “r”, write “w”, append “a” or exclusive create “x”.

compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
allowZip64: if True ZipFile will create files with ZIP64 extensions when needed, otherwise it will raise an exception when this would be necessary.
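
A short sketch of the compression parameter, assuming zlib is available (which ZIP_DEFLATED requires). It writes one made-up file into an archive, then reopens the archive in read mode to compare the stored and original sizes.

```python
import os
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile

with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, "1.txt")
    with open(source, "w") as handle:
        handle.write("repetitive text " * 200)  # compresses well

    archive_path = os.path.join(tmp_dir, "testarchive.zip")

    # arcname sets the name stored in the archive, so the temporary
    # directory path does not end up inside the zip file.
    with ZipFile(archive_path, "w", compression=ZIP_DEFLATED) as zip_buff:
        zip_buff.write(source, arcname="1.txt")

    with ZipFile(archive_path, "r") as zip_buff:
        names = zip_buff.namelist()
        info = zip_buff.getinfo("1.txt")
        original_size = info.file_size
        compressed_size = info.compress_size
        contents = zip_buff.read("1.txt").decode()

print(names, original_size, compressed_size)
```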