File Handling in Python

- - Python, Tutorials

Python has convenient built-ins for working with files. The intention of this post is to walk through the various modes of open() and see them in action through examples. open() is a built-in function that returns a file object, also called a handle, which is used to read or modify the file. We will start by opening a file with the default parameters, then work through examples of the important reading and writing modes, and finally look at the parameters of the open() built-in.

Using open() with default parameters.
>>> file_handle = open('existingfile.txt')
>>> type(file_handle)
<class '_io.TextIOWrapper'>
>>>
>>> file_handle2 = open('nonexistentfile.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'nonexistentfile.txt'
>>>

The open() built-in has one required parameter, file. file is either a text or byte string giving the path of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless closefd is set to False.) By default, the file is opened in read text mode. If the file to be read isn’t present in the specified path, a FileNotFoundError is raised.

mode (Different file modes).

A file can be opened for reading or for writing. This is specified through the optional mode argument of the open() built-in. mode is a string that specifies the mode in which the file is opened. As we saw in the example above, it defaults to ‘r’, i.e. reading in text mode. Another common value is ‘w’ for writing (the file is truncated if it already exists), ‘x’ for creating and writing to a new file, and ‘a’ for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). Following are the available modes:

Mode Meaning
’r’ open for reading (default)
’w’ open for writing, truncating the file first
’x’ create a new file and open it for writing
’a’ open for writing, appending to the end of the file if it exists
’b’ binary mode
’t’ text mode (default)
’+’ open a disk file for updating (reading and writing)

The default mode is ‘rt’ (open for reading text). The ‘x’ mode implies ‘w’ and raises a FileExistsError if the file already exists.

Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (appending ‘b’ to the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when ‘t’ is appended to the mode argument), the contents of the file are returned as strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
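To make the distinction concrete, here is a small sketch (the file path is a throwaway created with tempfile, purely for illustration) that writes a short string and reads it back in both text and binary mode:

```python
import os
import tempfile

# Illustrative throwaway path.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="UTF-8") as fh:
    fh.write("hello")

with open(path, "rt", encoding="UTF-8") as fh:  # text mode -> str
    text_contents = fh.read()

with open(path, "rb") as fh:                    # binary mode -> bytes
    byte_contents = fh.read()

print(type(text_contents))  # <class 'str'>
print(type(byte_contents))  # <class 'bytes'>
```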

Writing contents to a file:
>>> file_handler = open("text.txt", "w") # or use "wt"
>>> file_handler
<_io.TextIOWrapper name='text.txt' mode='w' encoding='UTF-8'>
>>> file_handler.write("This will use the default encoding from the machine.")
52
>>> file_handler.close()

In the above code snippet, we open the file in write text mode and do not specify the encoding. When the encoding is not specified, it defaults to a platform-specific encoding. Also note that the encoding argument should be used in text mode only.

Let’s try to read the file that we wrote and see what happens when we pass a different encoding than the one used to write it.

>>> file_handler = open("text.txt", mode="r", encoding="UTF-16")
>>> file_handler
<_io.TextIOWrapper name='text.txt' mode='r' encoding='UTF-16'>
>>> contents = file_handler.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/usr/lib/python3.6/encodings/utf_16.py", line 67, in _buffer_decode
    raise UnicodeError("UTF-16 stream does not start with BOM")
UnicodeError: UTF-16 stream does not start with BOM
>>>

Therefore it is important to read the file with the same encoding that was used to write it.

>>> file_handler = open('text.txt', 'r', encoding='UTF-8')
>>>
>>> file_handler
<_io.TextIOWrapper name='text.txt' mode='r' encoding='UTF-8'>
>>>
>>> contents = file_handler.read()
>>> contents
'This will use the default encoding from the machine.'
>>>
Writing in binary mode:

“Binary” files are any files where the format isn’t made up of readable characters. Binary files can range from image files like JPEGs or GIFs, audio files like MP3s or binary document formats like Word or PDF.

>>> file_handler = open('text.txt', 'wb')
>>> file_handler
<_io.BufferedWriter name='text.txt'>
>>>
>>> byte_arr = [120, 3, 255, 0, 100]
>>> binary_format = bytearray(byte_arr)
>>> file_handler.write(binary_format)
5
>>> file_handler.close()
Reading in binary mode:
>>> file_handler = open('text.txt', 'rb')
>>>
>>> file_handler
<_io.BufferedReader name='text.txt'>
>>> contents = file_handler.read()
>>> contents
b'x\x03\xff\x00d'
>>> file_handler.close()
Parameters of open() built-in:

 

Parameter Parameter Type Default value Description
file Required

The path to the file.

mode Optional ’r’

The mode to open the file in.

buffering Optional -1

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

  • Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on `io.DEFAULT_BUFFER_SIZE`. On many systems, the buffer will typically be 4096 or 8192 bytes long.

  • “Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

encoding Optional None

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent, but any encoding supported by Python can be passed. See the codecs module for the list of supported encodings.

errors Optional None

errors is an optional string that specifies how encoding errors are to be handled—this argument should not be used in binary mode. Pass ‘strict’ to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass ‘ignore’ to ignore errors. (Note that ignoring encoding errors can lead to data loss.) See the documentation for codecs.register or run ‘help(codecs.Codec)’ for a list of the permitted encoding error strings.
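As a quick sketch of the errors argument (the file path is an illustrative throwaway), write some bytes that are not valid UTF-8 and read them back with ‘ignore’ and ‘replace’:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bad.txt")

with open(path, "wb") as fh:
    fh.write(b"abc\xffdef")  # 0xff is not valid UTF-8

with open(path, "r", encoding="UTF-8", errors="ignore") as fh:
    ignored = fh.read()      # the invalid byte is silently dropped

with open(path, "r", encoding="UTF-8", errors="replace") as fh:
    replaced = fh.read()     # the invalid byte becomes U+FFFD

print(ignored)               # abcdef
print(repr(replaced))        # 'abc\ufffddef'
```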

newline Optional None

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', or '\r\n'. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in ‘\n’, ‘\r’, or ‘\r\n’, and these are translated into ‘\n’ before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any ‘\n’ characters written are translated to the system default line separator, os.linesep. If newline is '' or ‘\n’, no translation takes place. If newline is any of the other legal values, any ‘\n’ characters written are translated to the given string.
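The two directions can be seen in a small sketch (throwaway tempfile path, for illustration). With newline='' on output no translation takes place, so ‘\r\n’ is written through literally; reading back with the default newline=None translates both endings to ‘\n’:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

with open(path, "w", newline="") as fh:  # no output translation
    fh.write("line1\r\nline2\n")

with open(path, "rb") as fh:             # binary mode shows the raw bytes
    raw = fh.read()
print(raw)                               # b'line1\r\nline2\n'

with open(path, "r") as fh:              # newline=None: universal newlines
    text = fh.read()
print(repr(text))                        # 'line1\nline2\n'
```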

closefd Optional True

If closefd is False, the underlying file descriptor will be kept open when the file is closed. This does not work when a file name is given and must be True in that case.

opener Optional None

A custom opener can be used by passing a callable as *opener*. The underlying file descriptor for the file object is then obtained by calling *opener* with (*file*, *flags*). *opener* must return an open file descriptor (passing os.open as *opener* results in functionality similar to passing None).
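A minimal sketch of a custom opener (names and path are illustrative): the callable receives (file, flags) and must return an open file descriptor. Here we simply delegate to os.open while recording the flags, which mirrors the default behaviour:

```python
import os
import tempfile

seen_flags = []

def logging_opener(file, flags):
    # Record the flags open() computed, then delegate to os.open.
    seen_flags.append(flags)
    return os.open(file, flags)

path = os.path.join(tempfile.mkdtemp(), "opener.txt")

with open(path, "w", opener=logging_opener) as fh:
    fh.write("opened via custom opener")

print(len(seen_flags))  # 1 -- the opener was called once
```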

Magic Methods in Python – Dunder Methods

- - Python, Tutorials

Magic methods are methods that have two underscores as a prefix and suffix to the method name. They are also called dunder methods, an adopted name for “double underscore”. __init__ and __str__ are examples of magic methods. These are a set of special methods that can be used to enhance your classes in Python.

Dunder methods are commonly used for scenarios like operator overloading and allow you to emulate the behavior of the built-in types. We will start by creating a class, implement a dunder method or two, and then see the available dunder/magic methods that can be used to enrich the functionality of a custom class.

Creating a custom String class:

>>> class String:
...     def __init__(self, string):
...         self.string = string
...
>>> string = String("thetaranights.com")
>>> print(string)
<__main__.String object at 0x7fec2fad2400>
>>>

Even before we realized it, we had already made use of one of those many magic methods. __init__ is a magic method: it is where you initialize instance attributes and do other setup work. People like to call it a constructor, but think about it for a while: the method already takes the instance (self) as a parameter, which means a blank object has been created before __init__ is even called. __init__ then dynamically initializes each member on that object.

Earlier in the post, we said that magic methods allow us to emulate the behavior of the built-in types. The result of print(string) doesn’t really give us what we would generally want. We can implement the magic method __repr__ to present the user of the String class with a better string representation.

>>> class String:
...     def __init__(self, string):
...         self.string = string
...     def __repr__(self):
...         return "String Object: {string}".format(string=self.string)
...
>>>
>>> string = String("thetaranights.com")
>>> print(string)
String Object: thetaranights.com
>>>

In the above code snippet, we have implemented the __repr__ magic method to return a better string representation of our String class’s instance.
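For completeness, print() actually uses __str__ when it is defined and only falls back to __repr__ otherwise. A small sketch (the class body is illustrative) showing the two side by side:

```python
class String:
    def __init__(self, string):
        self.string = string

    def __repr__(self):
        # Unambiguous, developer-facing representation.
        return "String({!r})".format(self.string)

    def __str__(self):
        # Readable, user-facing representation used by print().
        return self.string

s = String("thetaranights.com")
print(str(s))    # thetaranights.com
print(repr(s))   # String('thetaranights.com')
```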

Another example of dunder method:

Say we want to get the result of concatenating our custom String object with a string; we would do:

>>> print(string + " Thanks for visiting")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'String' and 'str'

In order for this to work, we need to implement the __add__ magic method in our String class.

>>> class String:
...     def __init__(self, string):
...         self.string = string
...     def __repr__(self):
...         return "Object String: {string}".format(string=self.string)
...     def __add__(self, to_concatenate):
...         return self.string + to_concatenate
...
>>>
>>> string = String("thetaranights.com")
>>>
>>> print(string + " thanks for visiting")
thetaranights.com thanks for visiting
>>>

Now that we have implemented the __add__ magic method, we can use the + operator. Following is a list of the available magic methods:

Available Magic Methods

Binary Operators
Operator Method
+   object.__add__(self, other)
-   object.__sub__(self, other)
*   object.__mul__(self, other)
//  object.__floordiv__(self, other)
/   object.__truediv__(self, other)
%   object.__mod__(self, other)
**  object.__pow__(self, other[, modulo])
<<  object.__lshift__(self, other)
>>  object.__rshift__(self, other)
&   object.__and__(self, other)
^   object.__xor__(self, other)
|   object.__or__(self, other)

Extended Assignment
Operator Method
+=   object.__iadd__(self, other)
-=   object.__isub__(self, other)
*=   object.__imul__(self, other)
/=   object.__itruediv__(self, other) (__idiv__ in Python 2)
//=  object.__ifloordiv__(self, other)
%=   object.__imod__(self, other)
**=  object.__ipow__(self, other[, modulo])
<<=  object.__ilshift__(self, other)
>>=  object.__irshift__(self, other)
&=   object.__iand__(self, other)
^=   object.__ixor__(self, other)
|=   object.__ior__(self, other)

Unary Operators
Operator Method
-         object.__neg__(self)
+         object.__pos__(self)
abs()     object.__abs__(self)
~         object.__invert__(self)
complex() object.__complex__(self)
int()     object.__int__(self)
long()    object.__long__(self) (Python 2 only)
float()   object.__float__(self)
oct()     object.__oct__(self) (Python 2 only)
hex()     object.__hex__(self) (Python 2 only)

Comparison Operators
Operator Method
<   object.__lt__(self, other)
<=  object.__le__(self, other)
==  object.__eq__(self, other)
!=  object.__ne__(self, other)
>=  object.__ge__(self, other)
>   object.__gt__(self, other)
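You rarely need to implement all six comparison methods by hand. A sketch (class name and attribute are illustrative) using functools.total_ordering, which derives the remaining comparisons from __eq__ and one ordering method:

```python
from functools import total_ordering

@total_ordering
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __eq__(self, other):
        return self.amount == other.amount

    def __lt__(self, other):
        return self.amount < other.amount

print(Money(5) < Money(10))    # True
print(Money(5) >= Money(10))   # False -- derived by total_ordering
```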

That’s my little introduction to dunder/magic methods in Python. You should also read this article on debugging with breakpoint in Python 3.7: https://www.thetaranights.com/debugging-with-breakpoint-in-python3-7/

Debugging with breakpoint in Python3.7

- - Python, Tutorials

Python has long had a default debugger named pdb in the standard library. pdb defines an interactive source code debugger for Python programs. The intention of this post is to clarify, through examples and explanations, what the new built-in breakpoint() in Python 3.7 offers compared to pdb in earlier versions.

Breakpoints are the points in your code where you’d temporarily like to stop execution of the program, check some values, and look up the status of different objects in your program. This is done by hooking up a line just above the point you’d like to debug.

In the earlier versions of python, you’d do:
def divide(divisor, dividend):
    import pdb; pdb.set_trace()
    return dividend / divisor

if __name__ == '__main__':
    print(divide(0, 4000))

Running the above code in shell produces results as following:

$ python pdbexample.py
> /home/bhishan-1504/pdbexample.py(3)divide()
-> return dividend / divisor
(Pdb) args
divisor = 0
dividend = 4000
(Pdb) continue
Traceback (most recent call last):
  File "pdbexample.py", line 6, in <module>
    print(divide(0, 4000))
  File "pdbexample.py", line 3, in divide
    return dividend / divisor
ZeroDivisionError: integer division or modulo by zero

It enters an interactive mode, stopping the flow of the program so you can issue commands to view the program’s state and then continue or exit.

Here is a list of a few useful commands in the interactive mode:
Command   Short form  What it does
args      a           Print the argument list of the current function
break     b           Create a breakpoint (takes parameters) in the program execution
continue  c or cont   Continue program execution
help      h           Provide a list of commands or help for a specified command
jump      j           Set the next line to be executed
list      l           Print the source code around the current line
next      n           Continue execution until the next line in the current function is reached or it returns
step      s           Execute the current line, stopping at the first possible occasion
pp        pp          Pretty-print the value of an expression
quit/exit q           Abort the program
return    r           Continue execution until the current function returns

With python3.7 you’d do:

def divide(divisor, dividend):
    breakpoint()
    return dividend / divisor

if __name__ == '__main__':
    print(divide(0, 4000))
$ python3.7 breakpointexample.py
> /home/bhishan-1504/breakpointexample.py(3)divide()
-> return dividend / divisor
(Pdb) args
divisor = 0
dividend = 4000
(Pdb) continue
Traceback (most recent call last):
  File "breakpointexample.py", line 6, in <module>
    print(divide(0, 4000))
  File "breakpointexample.py", line 3, in divide
    return dividend / divisor
ZeroDivisionError: division by zero

Python 3.7 comes with a built-in function named breakpoint() which enters the debugger at the call site. While it produces the same results, it is more intuitive and idiomatic.

Why was this change necessary?
  1. In earlier versions, it’s a lot to type, and it invites typos.
  2. It ties debugging directly to the choice of pdb. There might be other debugging options, say if you’re using an IDE or some other development environment.
  3. It is two statements: import pdb and pdb.set_trace().

This is also inspired by the JavaScript debugger statement.

More implementation details (From PEP 553):

Along with the new built-in breakpoint(), there are two new name bindings in the sys module: sys.breakpointhook() and sys.__breakpointhook__. By default, sys.breakpointhook() implements the actual importing and entry into pdb.set_trace(), and it can be set to a different function to change the debugger that breakpoint() enters. This means there are no hard ties to pdb in Python 3.7; you can use the debugger of your choice.
sys.__breakpointhook__ is initialized to the same function as sys.breakpointhook() so that you can always easily reset sys.breakpointhook() to the default value (e.g. by doing sys.breakpointhook = sys.__breakpointhook__). The signature of the built-in is breakpoint(*args, **kws). The positional and keyword arguments are passed straight through to sys.breakpointhook(), and the signatures must match or a TypeError will be raised. The return value of sys.breakpointhook() is passed back up to, and returned from, breakpoint().
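A minimal sketch of swapping the hook out (the hook function and its arguments are illustrative): arguments passed to breakpoint() are forwarded to whatever sys.breakpointhook refers to, and the default can be restored afterwards from sys.__breakpointhook__:

```python
import sys

calls = []

def my_hook(*args, **kwargs):
    # Stand-in for a real debugger entry point.
    calls.append((args, kwargs))

sys.breakpointhook = my_hook
breakpoint("checkpoint", stage=1)            # enters my_hook, not pdb
sys.breakpointhook = sys.__breakpointhook__  # restore the default

print(calls)  # [(('checkpoint',), {'stage': 1})]
```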

Since with this new built-in you are not bound to pdb but can use any other debugger, the positional (*args) and keyword (**kwargs) arguments of breakpoint(*args, **kwargs) make sense: unlike pdb, another debugger might expect arguments.

The default implementation of breakpointhook() consults an environment variable named PYTHONBREAKPOINT to configure the behavior of the debugger.

The environment variable can take various values, each giving the debugger a different behavior:

  • PYTHONBREAKPOINT=0 disables debugging. Specifically, with this value sys.breakpointhook() returns None immediately.
  • PYTHONBREAKPOINT= (i.e. the empty string). This is the same as not setting the environment variable at all, in which case pdb.set_trace() is run as usual.
  • PYTHONBREAKPOINT=some.importable.callable. In this case, sys.breakpointhook() imports the some.importable module and gets the callable object from the resulting module, which it then calls.

This environment variable allows external processes to control how breakpoints are handled. One use case is completely disabling all accidental breakpoint() calls pushed to production, which can be accomplished by setting PYTHONBREAKPOINT=0 in the execution environment:

$ PYTHONBREAKPOINT=0 python3.7 breakpointexample.py

This will disable any breakpoint() calls in the program file.

Run custom function on breakpoints:

With Python 3.7, you can also have a custom program/function executed wherever breakpoint() appears in the program. One example where this is handy is when you want to print all the local variables of the current function before the following statements execute.

Let us define, in a module named custom_code.py, a custom function that we want called at the breakpoint:

# custom_code.py
import sys

def local_variables():
    active = sys._getframe(1)
    print(active.f_locals)
$ PYTHONBREAKPOINT=custom_code.local_variables python3.7 breakpointexample.py
{'divisor': 0, 'dividend': 4000}
Traceback (most recent call last):
  File "breakpointexample.py", line 6, in <module>
    print(divide(0, 4000))
  File "breakpointexample.py", line 3, in divide
    return dividend / divisor
ZeroDivisionError: division by zero

That’s my little introduction to the new built-in breakpoint() in Python 3.7. You should also read about the Python assignment expression, which has been accepted for Python 3.8: http://www.thetaranights.com/python-assignment-expression-pep-572-python3-8/

Python Decorators – Python Essentials

- - Python, Tutorials

The intention of this post is to introduce the concept of decorators and encourage their use. Python has the special ability to pass a function as an argument to another function that adds some extra behavior to the function passed in. These higher-order functions that accept function arguments are known as decorators. Passing functions as arguments is possible because functions are first-class objects in Python.

One of Python’s primary goals was to give everything equal, first-class status: anything from integers, strings, lists, dictionaries, functions, classes, modules, and methods can be assigned to variables, placed in lists, stored in dictionaries, passed as arguments, and so forth. With that, it is possible to have a higher-order function that takes another function as an argument and extends its behavior without actively modifying it.

We will start by defining a function, then a nested function, then a nested function with another function as an argument, and finally the syntactic sugar that makes decorators easy to use.

Defining a function:

>>> def foo(mixed_case):
...     return mixed_case.upper()
...
>>>
>>> foo("All upper case")
'ALL UPPER CASE'
>>>

A function is a first class object that returns a value based on the arguments passed to it. In the above example, it takes a string as an argument and returns the uppercase representation of the given string.

Defining a nested function:

>>> def foo(mixed_case):
...     def bar():
...         upper_case = mixed_case.upper()
...         print(mixed_case, " => ", upper_case)
...         return upper_case
...     return bar()
...
>>>
>>> foo("Subscribe to feeds http://feeds.feedburner.com/thetaranights/NZru")
Subscribe to feeds http://feeds.feedburner.com/thetaranights/NZru  =>  SUBSCRIBE TO FEEDS HTTP://FEEDS.FEEDBURNER.COM/THETARANIGHTS/NZRU
'SUBSCRIBE TO FEEDS HTTP://FEEDS.FEEDBURNER.COM/THETARANIGHTS/NZRU'
>>>

The bar() function’s scope is only within the foo() function, hence when you call bar() from outside of foo(), you get a NameError exception, which makes sense.

>>> bar()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bar' is not defined
>>>

Decorators

>>> def foo(arbitrary_function):
...     print("Going to a bar.")
...     arbitrary_function()
...     print("Returning from a bar.")
...
>>>

That’s the essence of a decorator: it extends the functionality of the arbitrary_function() passed to it by performing some actions before and after calling it. (Strictly speaking, a decorator returns the extended function rather than calling it immediately; we will see that form next.)

Using the decorator:

>>># First we define an arbitrary function named bar()
>>> def bar():
...     print("Drinking some beer.")
...
>>>
>>> bar
<function bar at 0x7fed7efdd1b8>
>>>
>>> foo(bar)
Going to a bar.
Drinking some beer.
Returning from a bar.
>>>

First we created an arbitrary function named bar(), verified that it is in fact a function, and passed it as an argument to the foo() function.

Another example of a decorator

>>> def foo(arbitrary_function):
...     def wrapper():
...         print("Going to a bar.")
...         arbitrary_function()
...         print("Returning from a bar.")
...     return wrapper
...
>>>
>>> foo(bar)()
Going to a bar.
Drinking some beer.
Returning from a bar.
>>>

Generally, decorators have nested functions within them which perform some setup operations, call the function that was passed as an argument, and follow up with cleanup operations.

Syntactic Sugar for decorators

Syntactic sugar in a programming language is syntax designed to make things easier to read and express. As such, the @ symbol can be used to simplify applying a decorator to a function.

>>> def foo(arbitrary_function):
...     def wrapper():
...         print("Going to a bar.")
...         arbitrary_function()
...         print("Returning from a bar.")
...     return wrapper
...
>>>
>>> @foo
... def bar():
...     print("Drinking some beer.")
...
>>>
>>> bar()
Going to a bar.
Drinking some beer.
Returning from a bar.
>>>

All you have to do is write the directive @decorator_function on top of the definition of the function to be passed to the decorator. Note that you can also apply multiple decorators to a function, one per line.
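A small sketch of stacking two decorators (names are illustrative): the decorator closest to the function is applied first, so the outer one wraps the inner one’s result:

```python
def exclaim(func):
    def wrapper():
        return func() + "!"
    return wrapper

def shout(func):
    def wrapper():
        return func().upper()
    return wrapper

@shout
@exclaim
def greet():
    return "hello"

# Equivalent to greet = shout(exclaim(greet)):
# exclaim runs first, then shout uppercases the result.
print(greet())  # HELLO!
```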

 

When you need to pass arguments to a function that you intend to decorate, you have to explicitly add *args and **kwargs to the wrapper function of the decorator, otherwise the arguments will be lost. The arguments are then passed on to the function call from within the body of the wrapper function.

 

>>> def foo(arbitrary_function):
...     def wrapper(*args, **kwargs):
...         print("Going to a bar.")
...         arbitrary_function(*args, **kwargs)
...         print("Returning from a bar.")
...     return wrapper
...
>>>
>>> @foo
... def bar(drink_type):
...     print("Drinking some " + drink_type)
...
>>>
>>> bar("vodka")
Going to a bar.
Drinking some vodka
Returning from a bar.
>>> bar("beer")
Going to a bar.
Drinking some beer
Returning from a bar.
>>>
An example of a decorator for when you need to track the execution time of a function call:
>>> import time
>>> def func_timer(arbitrary_function):
...     def wrapper(*args, **kwargs):
...         t = time.time()
...         arbitrary_function(*args, **kwargs)
...         t2 = time.time()
...         return "Total time for execution => " + str(t2 - t)
...     return wrapper
...
>>>
>>> @func_timer
... def bar(drink_type, bottles=1):
...     for i in range(bottles):
...         print("Drinking " + drink_type + " Bottle number: " + str(i+1))
...
>>>
>>> bar("beer", 10)
Drinking beer Bottle number: 1
Drinking beer Bottle number: 2
Drinking beer Bottle number: 3
Drinking beer Bottle number: 4
Drinking beer Bottle number: 5
Drinking beer Bottle number: 6
Drinking beer Bottle number: 7
Drinking beer Bottle number: 8
Drinking beer Bottle number: 9
Drinking beer Bottle number: 10
'Total time for execution => 0.000279188156128'
>>>
>>> bar("vodka")
Drinking vodka Bottle number: 1
'Total time for execution => 6.29425048828e-05'
>>>

The above decorator is used to time the execution of a function call. This sums up my little introduction to decorators.
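One caveat worth knowing (not covered above): wrapping a function replaces its metadata (__name__, __doc__, and so on) with the wrapper’s. A sketch, with illustrative names, using functools.wraps to copy the metadata over:

```python
from functools import wraps

def foo(arbitrary_function):
    @wraps(arbitrary_function)  # copies __name__, __doc__, etc.
    def wrapper(*args, **kwargs):
        return arbitrary_function(*args, **kwargs)
    return wrapper

@foo
def bar():
    "Drink some beer."
    return "beer"

print(bar.__name__)  # bar (would be 'wrapper' without @wraps)
print(bar.__doc__)   # Drink some beer.
```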

Idiomatic Python – Writing better Python

- - Python, Tutorials

This is a follow-up to Idiomatic Python – Looping Approaches. The purpose of this article is to highlight better code and encourage it.

Looping over dictionary keys

>>> books_price = {
...     'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17,
...     'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09,
...     'The Art of Computer Programming, Volumes 1-4A Boxed Set': 174.96
... }
>>> for book in books_price:
...     print(book)
...

The above code snippet should not be used for mutating the dictionary: you do not want to change the size of the dictionary while actively iterating over it.

>>> for book in books_price:
...     if book.startswith('The'):
...             del books_price[book]
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
>>>
>>> books_price
{'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17, 'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09}
>>>

If by any chance you wrapped the above code in a broad exception handler, you would end up with inconsistent data. See how in the above example one key-value pair has been removed from the books_price dictionary before the error was raised.

The proper way of mutating a dictionary while iterating over its keys:
>>> books_price = {'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17, 'The Art of Computer Programming, Volumes 1-4A Boxed Set': 174.96, 'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09}
>>>
>>> for book in books_price.keys():
...     if book.startswith('The'):
...         del books_price[book]
...
>>>
>>> books_price
{'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17}
>>>

books_price.keys() copies all the keys of the books_price dictionary into a list. What we are really doing is iterating over that list while modifying the dictionary. This way we avoid mutating the dictionary while actively iterating over it.

Note that in Python 3, dict.keys() only creates an iterable that provides a dynamic view of the dictionary’s keys, without actually making a new list. This is also known as a dictionary view. Therefore, in Python 3 you have to explicitly pass dict.keys() to the list class, i.e. list(dict.keys()).
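Under Python 3 the same deletion therefore looks like this (a sketch, with the dictionary shortened for brevity): materialise the keys with list() before mutating, since dict.keys() is only a view:

```python
books_price = {
    'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17,
    'The Art of Computer Programming, Volumes 1-4A Boxed Set': 174.96,
}

# list() snapshots the keys, so the dictionary can be mutated safely.
for book in list(books_price.keys()):
    if book.startswith('The'):
        del books_price[book]

print(books_price)
# {'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17}
```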

Construct a dictionary from sequence pairs

>>> books = [
... 'Clean Code: A Handbook of Agile Software Craftsmanship',
... 'The Art of Computer Programming, Volumes 1-4A Boxed Set',
... 'The Self-Taught Programmer: The Definitive Guide to Programming Professionally']
>>> prices = [42.17, 174.96, 15.09]
>>>
>>> from itertools import izip
>>> books_price = dict(izip(books, prices))
>>> books_price
{'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17, 'The Art of Computer Programming, Volumes 1-4A Boxed Set': 174.96, 'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09}
>>>

 

You could also use zip instead of izip. zip builds a list while izip provides an iterator, hence the name izip (i for iterator). In Python 3, though, zip is equivalent to Python 2’s izip, and izip itself is gone.

 

>>> books_price = dict(zip(books, prices))
>>> books_price
{'Clean Code: A Handbook of Agile Software Craftsmanship': 42.17, 'The Art of Computer Programming, Volumes 1-4A Boxed Set': 174.96, 'The Self-Taught Programmer: The Definitive Guide to Programming Professionally': 15.09}
>>>

Looping over dictionary keys and values

>>> for key in books_price:
...     print key, ' => ', books_price[key]
...
Clean Code: A Handbook of Agile Software Craftsmanship  =>  42.17
The Art of Computer Programming, Volumes 1-4A Boxed Set  =>  174.96
The Self-Taught Programmer: The Definitive Guide to Programming Professionally  =>  15.09
>>>

While the above code works as expected, it needs to re-hash every key and do a value lookup. Alternatively, you could do something like this:

>>> for key, value in books_price.items():
...     print key, " => ", value
...
Clean Code: A Handbook of Agile Software Craftsmanship  =>  42.17
The Art of Computer Programming, Volumes 1-4A Boxed Set  =>  174.96
The Self-Taught Programmer: The Definitive Guide to Programming Professionally  =>  15.09
>>>

The problem with the above code is that, in Python 2, it creates a huge list of the keys and values. A better approach would be:

>>> for key, value in books_price.iteritems():
...     print key, " => ", value
...
Clean Code: A Handbook of Agile Software Craftsmanship  =>  42.17
The Art of Computer Programming, Volumes 1-4A Boxed Set  =>  174.96
The Self-Taught Programmer: The Definitive Guide to Programming Professionally  =>  15.09
>>>

iteritems() returns an iterator instead of a list. Note that in Python 3, iteritems() is not present; Python 3’s items() is equivalent to Python 2’s iteritems().

You should read the predecessor to this article, which encourages writing better Python around list-looping approaches: http://www.thetaranights.com/idiomatic-python-looping-approaches/

Zip files using Python

- - Python, Tutorials

Zipping files can be one part of a more complex operation that we perform using programming. This usually happens when you are working on a data pipeline and/or products requiring data movement. Python has easy methods available for zipping files and directories. For the record, ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.

How to archive files/directories using shutil?

The shutil module offers a number of high-level operations on files and collections of files. The following code block will zip the files and directories present in the source directory provided as the third argument to the make_archive function from shutil.

>>> from shutil import make_archive
>>> make_archive("July17-2018", "zip", "/home/bhishan-1504/shutil_test_archive")

Details about the parameters of the make_archive function:

base_name : It is the name of the file to create. This filename is expected to be without the format specific extension.

format : It is the archive format which could be one of “zip”, “tar”, “gztar”, “bztar” or any other registered format.

root_dir : It is the directory that will be the root directory of the archive, i.e., we typically chdir into ‘root_dir’ before creating the archive.

base_dir : It is the directory where we start archiving from, i.e., ‘base_dir’ will be the common prefix of all files and directories in the archive.

The make_archive function returns the filename of the archived file. Note that owner and group are used when creating a tar archive. By default, it uses the current owner and group.
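As a sketch of how root_dir and base_dir interact, the following builds a small directory tree on the fly (the paths are created by the example itself, not taken from above) and archives only the data subdirectory:

```python
import os
import tempfile
from shutil import make_archive
from zipfile import ZipFile

# Build a throwaway tree: <root>/data/notes.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "data"))
with open(os.path.join(root, "data", "notes.txt"), "w") as f:
    f.write("hello")

# base_name has no extension; we chdir into root_dir, archive base_dir
archive_path = make_archive(
    os.path.join(root, "backup"), "zip", root_dir=root, base_dir="data"
)

with ZipFile(archive_path) as zf:
    print(zf.namelist())  # entries are prefixed with base_dir ("data/...")
```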

How to archive selective files/directories using zipfile?

We also have control over what files and directories should be archived rather than the entire directory tree. This can be achieved by the following code block:

>>> from zipfile import ZipFile
>>> with ZipFile("testarchive.zip", "w") as zip_buff:
...     zip_buff.write("1.txt")
...     zip_buff.write("3.txt")
...
>>>

All we do is write the files to be archived to the ZipFile object.

Details about the parameters of the ZipFile class:

file: Either the path to the file, or a file-like object. If it is a path, the file will be opened and closed by ZipFile.

mode: The mode can be either read “r”, write “w” or append “a”.

compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).

allowZip64: if True, ZipFile will create files with ZIP64 extensions when needed; otherwise it will raise an exception when that would be necessary.
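To see the round trip, here is a sketch that creates two small files (the names 1.txt and 3.txt mirror the example above), archives them, and reads the archive back with namelist() and read():

```python
from zipfile import ZipFile

# Create the files to be archived (contents are illustrative)
for name in ("1.txt", "3.txt"):
    with open(name, "w") as f:
        f.write("contents of " + name)

# Archive selectively, as in the example above
with ZipFile("testarchive.zip", "w") as zip_buff:
    zip_buff.write("1.txt")
    zip_buff.write("3.txt")

# Read it back; mode defaults to "r"
with ZipFile("testarchive.zip") as zf:
    print(zf.namelist())
    print(zf.read("1.txt").decode())  # read() returns bytes
```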

Python Assignment Expression – PEP 572 – Python3.8

- - Python, Tutorials

A recent buzz in the Python community is the acceptance of PEP 572 for Python 3.8.
PEP stands for Python Enhancement Proposal; each PEP is assigned a number by the PEP editors and, once assigned, that number is never changed.

What exactly is PEP 572(Directly from PEP 572)?

Abstract

This is a proposal for creating a way to assign to variables within an expression using the notation NAME := expr. A new exception, TargetScopeError, is added, and there is one change to evaluation order.

Rationale

Naming the result of an expression is an important part of programming, allowing a descriptive name to be used in place of a longer expression, and permitting reuse. Currently, this feature is available only in statement form, making it unavailable in list comprehensions and other expression contexts.
Additionally, naming sub-parts of a large expression can assist an interactive debugger, providing useful display hooks and partial results. Without a way to capture sub-expressions inline, this would require refactoring of the original code; with assignment expressions, this merely requires the insertion of a few name := markers. Removing the need to refactor reduces the likelihood that the code be inadvertently changed as part of debugging (a common cause of Heisenbugs), and is easier to dictate to another programmer.

What are assignment expressions?

As of now, in Python, an assignment has to be a statement. This rules out, for example, assigning from within an if or while condition. The following is therefore a syntax error in Python:

if x = foo():
    # do something
else:
    # do something else

If this were valid Python, it would invite errors from confusing the assignment operator (=) with the comparison operator (==): such code would still execute without errors but produce unintended results.

Interestingly, PEP 572 introduces a new operator, :=, that assigns and returns a value. It is not a replacement for the assignment statement and serves a different purpose. Let us see the use case:

In most contexts where arbitrary Python expressions can be used, a named expression can appear. This is of the form NAME := expr where expr is any valid Python expression other than an unparenthesized tuple, and NAME is an identifier.
The value of such a named expression is the same as the incorporated expression, with the additional side-effect that the target is assigned that value:
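A tiny demonstration of that side effect (requires Python 3.8+; the values are illustrative):

```python
# The named expression evaluates to the assigned value,
# and the name remains bound afterwards (Python 3.8+)
result = (y := 4 * 2)
print(result)  # 8
print(y)       # 8
```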

Our scenario: we want to process the contents of a file in a chunk-wise fashion. What we would naturally do is:

chunk = file.read(64)
while chunk:
    process(chunk)
    chunk = file.read(64)

The above code is redundant: we have to call file.read(64) twice.

You could also do the following to avoid redundancy:

while True:
    chunk = file.read(64)
    if not chunk:
        break
    process(chunk)

The above doesn’t communicate the intent very well.

With assignment expression, you could do:

while chunk := file.read(64):
    process(chunk)

Remember, we discussed that := assigns and returns. The assignment to chunk happens in the while condition, which makes the data available inside the loop body in addition to deciding whether or not to exit the loop. It has no redundancy and communicates the intent gracefully. That's pretty awesome, is it not? There has been a lot of discussion about this on the internet.
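The loop above can be tried without a file on disk by substituting an in-memory stream; here io.StringIO stands in for the opened file (an assumption for the sake of a runnable sketch):

```python
import io

# io.StringIO stands in for an opened text file of 100 characters
file = io.StringIO("a" * 100)

chunks = []
while chunk := file.read(64):  # Python 3.8+
    chunks.append(chunk)

print([len(c) for c in chunks])  # [64, 36]
```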

One other example for use case of assignment expression:

# Share a subexpression between a comprehension filter clause and its output
filtered_data = [y for x in data if (y := f(x)) is not None]
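With a concrete f and data (both illustrative), the comprehension calls f only once per element, sharing the result between the filter clause and the output:

```python
# Illustrative: f returns None for values it rejects (Python 3.8+)
def f(x):
    return x * 2 if x % 2 == 0 else None

data = [1, 2, 3, 4, 5]

# f(x) is evaluated once per element, used by both filter and output
filtered_data = [y for x in data if (y := f(x)) is not None]
print(filtered_data)  # [4, 8]
```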

PEP 572 has been discussed at length on various forums such as Reddit and Hacker News. Here are a few threads that are interesting to go through to get a variety of viewpoints and opinions.
https://www.reddit.com/r/Python/comments/8ylelb/feedback_on_draft_post_to_pythonideas/
https://news.ycombinator.com/item?id=17448439

Idiomatic Python – Looping Approaches

- - Python, Tutorials, Web

Python has its own unique techniques and guidelines for looping. Through this article, I will present a few examples of bad and better approaches to looping. While the end goal can be achieved with either set of code, the purpose is to highlight the better approaches and encourage their use.

Looping over a range of numbers:

>>> for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
...     print(i ** 2)
...

Better approach when using Python3

>>> for i in range(10):
...     print(i ** 2)
...

Better approach when using Python2

>>> for i in xrange(10):
...     print i ** 2
...

Prior to Python 3, range was implemented as a function that returned a list instance. This meant an entire list was created in memory before it was looped over, which caused friction in the community due to the memory and run-time costs that came with it. In Python 3, range is evaluated lazily: it produces each item in the sequence as the loop advances. This eliminates the memory and computation issues of Python 2's range. The approach itself is not new; Python 2's xrange is the equivalent of Python 3's range.
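The difference is easy to observe with sys.getsizeof: a range object stays tiny no matter how long the sequence it describes, while the materialised list grows with its length (exact byte counts vary by interpreter):

```python
import sys

lazy = range(1_000_000)          # evaluated lazily, constant size
eager = list(range(1_000_000))   # fully materialised list

print(sys.getsizeof(lazy))   # a few dozen bytes
print(sys.getsizeof(eager))  # several megabytes
```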

Looping over a collection:

>>> fruits = ["apple", "mango", "grapes", "banana"]
>>> for i in range(len(fruits)):
...     print(fruits[i])
...

Better Approach:

>>> for fruit in fruits:
...     print(fruit)
...

In Python, a list is an iterable and hence implements the __iter__ method, which returns an iterator. An iterable is an object that has an __iter__ method returning an iterator, or that defines a __getitem__ method which can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
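The protocol can be exercised by hand with the built-ins iter() and next() (Python 3 shown; the list is illustrative):

```python
fruits = ["apple", "mango"]

it = iter(fruits)    # list.__iter__ returns an iterator
first = next(it)     # iterator.__next__ yields items in order
second = next(it)
print(first, second)  # apple mango

# one more next(it) would raise StopIteration
```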

Looping Backwards:

>>> fruits = ["apple", "mango", "grapes", "banana"]
>>> for i in range(len(fruits) - 1, -1, -1):
...     print(fruits[i])
...

Better Approach:

>>> for fruit in reversed(fruits):
...     print(fruit)
...

Looping over two collections:

>>> names = ["Hussein", "Mohammed", "Osama"]
>>> fruits = ["Apple", "Mango", "Banana"]
>>>
>>> n = min(len(names), len(fruits))
>>> for i in range(n):
...     print(names[i], " => ", fruits[i])
...

Good Approach:

>>> for name, fruit in zip(names, fruits):
...     print(name, " => ", fruit)
...

Better Approach:

>>> from itertools import izip
>>> for name, fruit in izip(names, fruits):
...     print(name, " => ", fruit)
...

zip builds a whole list while izip provides an iterator instead, hence the name izip (i for iterator). You can argue there are scenarios where zip would be appropriate, but not for the above example. Note that izip exists only in Python 2; in Python 3 it was removed because the built-in zip itself returns an iterator.
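A quick Python 3 check, using the same illustrative lists, that the built-in zip is lazy there:

```python
names = ["Hussein", "Mohammed", "Osama"]
fruits = ["Apple", "Mango", "Banana"]

pairs = zip(names, fruits)  # Python 3: an iterator, not a list
print(next(pairs))          # ('Hussein', 'Apple')
print(list(pairs))          # the remaining pairs
```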

Idiomatic Python – Use of Falsy and Truthy Concepts

- - Python, Tutorials

Out of many, one reason for Python's popularity is readability. Python has code style guidelines and idioms, and these allow future readers of the code to comprehend its intentions. It is highly important that code be readable and concise. One such important tip is to use the falsy and truthy concepts.

It is in our best interest to avoid direct comparisons to True, False, and None. As such, we should be well acquainted with the truthy and falsy concepts.

Truthy refers to the values that shall always be considered true. Similarly falsy refers to the values that shall always be considered false.

An empty sequence such as an empty list [] or an empty dictionary {}, the numeric 0, and None are considered false values, or falsy. Almost everything other than those just mentioned is considered truthy.
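The classification is easy to verify with the bool() built-in:

```python
# Falsy values: empty collections, zero, None, empty string
print(bool([]), bool({}), bool(0), bool(None), bool(""))  # all False

# Truthy values: non-empty collections, non-zero numbers, non-empty strings
print(bool([1]), bool("a"), bool(42), bool(-1))           # all True
```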

An example of a code snippet that is considered bad:

i = 0
if i == 0:
    foo()
    # do something
else:
    bar()
    # do something else
x = True
if x == True:
    foo()
    # do something
else:
    bar()
    # do something else
numbers_list = [1, 2, 3, 4]
if len(numbers_list) > 0:
    foo()
    # do something

An example of code snippet that is considered relatively wise:

i = 0
if not i:
    foo()
    # do something
else:
    bar()
    # do something else
x = True
if x:
    foo()
    # do something
else:
    bar()
    # do something else
numbers_list = [1, 2, 3, 4]
if numbers_list:
    foo()
    # do something

Copying mutable objects in Python

- - Python, Tutorials

An assignment statement in Python does not create a copy of an object; it binds a name to the object. When working with mutable objects and/or collections of mutable objects, this creates inconsistencies, so it is in our interest to have ways to make real copies of objects. Essentially, we want copies such that modifying the copy does not modify the original object. Here is an example of what happens when we use an assignment statement to "copy" a mutable object.

>>> fruits = ["apple", "mango", "orange"]
>>> fruits_copy = fruits
>>> id(fruits)
140684382177688
>>> id(fruits_copy)
140684382177688
>>> fruits_copy.append("grapes")
>>> fruits_copy
['apple', 'mango', 'orange', 'grapes']
>>> fruits
['apple', 'mango', 'orange', 'grapes']
>>>

The above example demonstrates how using an assignment statement to copy a mutable object causes the original object to change when the copy is modified.

Python’s built-in mutable collections like lists, dicts, and sets can be copied by calling their factory functions on an existing collection.

new_list = list(original_list)
new_dict = dict(original_dict)
new_set = set(original_set)

However, this only makes a shallow copy of the collection, going just one level deep in the recursion tree.

A shallow copy refers to the construction of a new object followed by populating it with references to the child objects found in the original object. This implies that a shallow copy is only one level deep: the recursion tree on copying does not proceed further into the children of the object.

>>> x = [[1, 2 , 3], [4, 5, 6]]
>>> x_copy = list(x)
>>> id(x) == id(x_copy)
False
>>> x_copy.append([7, 8, 9])
>>> x_copy
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> x
[[1, 2, 3], [4, 5, 6]]

In the above example, since x_copy is a new object and the contents up to one level deep were copied into it, appending to it does not affect the original object. But remember, a shallow copy is only one level deep; the children are only referenced, not copied as new objects. The following example should explain it.

>>> x = [[1, 2, 3], [4, 5, 6]]
>>> x_copy = list(x)
>>> x_copy[0][0] = "Changed"
>>> x_copy
[['Changed', 2, 3], [4, 5, 6]]
>>> x
[['Changed', 2, 3], [4, 5, 6]]

On the flip side, a deep copy is a recursive process. It constructs a new object and then recursively populates it with copies of the child objects found in the original object. A deep copy walks the complete object tree to create a fully independent clone of the original object and all of its children.

>>> import copy
>>> x = [[1, 2, 3], [4, 5, 6]]
>>> x_copy = copy.deepcopy(x)
>>> id(x) == id(x_copy)
False
>>> x_copy
[[1, 2, 3], [4, 5, 6]]
>>> x
[[1, 2, 3], [4, 5, 6]]
>>> x_copy[0][0] = "New Object confirmation"
>>> x_copy
[['New Object confirmation', 2, 3], [4, 5, 6]]
>>> x
[[1, 2, 3], [4, 5, 6]]
>>>

The above example shows a deep copy: a change in the copied object does not have any effect on the original object.
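For completeness, the copy module also offers copy.copy() for a shallow copy of an arbitrary object, mirroring the factory-function approach shown earlier; a short sketch:

```python
import copy

x = [[1, 2, 3], [4, 5, 6]]
x_shallow = copy.copy(x)     # one level deep, like list(x)

x_shallow[0][0] = "shared"   # inner lists are still shared references
print(x)          # [['shared', 2, 3], [4, 5, 6]]
print(x_shallow)  # [['shared', 2, 3], [4, 5, 6]]
```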