
To Python 3 and Back Again: Is It Worth the Switch?

Python 3 has been around for 7 years now, yet some still prefer Python 2 over the newer version. This is a problem especially for newcomers approaching Python for the first time. I realized this at my previous workplace, where colleagues were in exactly this situation. Not only were they unaware of the differences between the two versions, they didn’t even know which version they had installed.

Inevitably, different colleagues had installed different versions of the interpreter. That would have been a recipe for disaster had they then tried to blindly share scripts among themselves.

This wasn’t really their fault. On the contrary, a greater effort in documentation and awareness is needed to dispel the veil of FUD (fear, uncertainty and doubt) that sometimes affects our choices. This post is thus meant for them, and for those who already use Python 2 but aren’t sure about moving to the next version, perhaps because they tried Python 3 only in its early days, when it was less refined and library support was worse.

Two Dialects, One Language


First of all, are Python 2 and Python 3 really different languages? This is not a trivial question. Some people would settle it with “No, it’s not a new language”, and indeed several proposals that would have broken compatibility without yielding important advantages were rejected.


Python 3 is a new version of Python, but it isn’t necessarily backwards compatible with code written for Python 2. At the same time, it’s possible to write code that is compatible with both versions, and this is no accident but a clear commitment of the developers who drafted the various PEPs (Python Enhancement Proposals). In the few cases where the syntax is incompatible, since Python lets us dynamically modify code at runtime, we can solve the problem without relying on a preprocessor whose syntax is completely alien to the rest of the language.
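The runtime trick mentioned above can be sketched as follows. This is a simplified version of the approach the six library uses for six.reraise, not its exact code: the Python 2-only three-argument raise lives in a string that is compiled with exec only when running on Python 2, so the Python 3 parser never sees it.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # `raise tp, value, tb` is a SyntaxError on Python 3, so it is kept in a
    # string and only compiled when the interpreter is Python 2
    exec("def reraise(tp, value, tb=None):\n"
         "    raise tp, value, tb\n")
else:
    def reraise(tp, value, tb=None):
        """Re-raise `value` with the traceback `tb` (Python 3 branch)."""
        if value is None:
            value = tp()
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
        raise value
```

Both branches define the same name with the same signature, so calling code such as `reraise(*sys.exc_info())` stays identical on both versions.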

The syntax is thus not a problem (especially if we ignore Python 3 versions before 3.3, which reintroduced the u'' prefix for string literals). The other big differences are the behavior of code, its semantics, and the presence or absence of big libraries for only one of the two versions. This is indeed a significant problem, but it’s not unique or new to those who already have experience with other programming languages. You have probably already come across an old codebase/library that fails to build with recent versions of the same compiler originally used. In those cases the compiler itself will help you; in Python, help will instead come from your own test suite.

Why make the new version different then? What advantages will these changes bring to us?

A Concrete Example


Let’s assume we want to write a program to read the owner of files/directories (on a Unix system) in our current directory and print them on screen.

# encoding: utf-8

from os import listdir, stat

# to keep this example simple, we won't use the `pwd` module
names = {1000: 'dario',
         1001: u'олга'}

for node in listdir(b'.'):
    owner = names[stat(node).st_uid]
    print(owner + ': ' + node)

Does everything work correctly? Apparently it does. We specified the encoding of the source file; if our directory contains a file created by олга (uid 1001), her name will be printed correctly; and files with non-ASCII names will be printed correctly as well.

There’s still a case that we haven’t covered yet though: a file created by олга AND with non-ASCII characters in the name…

su олга -c "touch é"

If we launch our small script again, we’ll get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)



If you think about it, a situation like this could be nasty: you have written your program (thousands of lines long instead of the 4 of this example), and you start to gather some users, some of them even from non-English-speaking countries with more exotic names. Everything is okay until one of these users creates a file with a non-ASCII name, something users with more prosaic names can do without any problem. Now your code throws an error, the server might answer every request from this user with an error 500, and you’ll need to dig into the codebase to understand why these errors are suddenly appearing.

How does Python 3 help us with this? If you execute the same script with it, you’ll discover that Python detects right away that you’re about to perform a dangerous operation. Even without files with peculiar names and/or created by peculiar users, you’ll immediately receive an exception like:

`TypeError: Can't convert 'bytes' object to str implicitly`

raised by the line:

print(owner + ': ' + node)

The error message is even easier to understand, in my opinion. owner is the str object, and node is a bytes object. Knowing this, it’s obvious that the problem is due to listdir returning a list of bytes objects.

A detail that not everybody knows is that listdir returns a list of bytes objects or of unicode strings depending on the type of its argument. I avoided using listdir('.') precisely to obtain the same behavior on Python 2 and Python 3: otherwise, on Python 3 '.' would have been a unicode string, which would have made the bug disappear.
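A quick way to check this behavior on Python 3 is the sketch below. It creates a throwaway directory with tempfile (the file name notes.txt is arbitrary) and lists it once with a text path and once with a bytes path obtained via os.fsencode:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'notes.txt'), 'w').close()

    text_names = os.listdir(tmp)               # str argument -> str entries
    byte_names = os.listdir(os.fsencode(tmp))  # bytes argument -> bytes entries

print(text_names, byte_names)  # ['notes.txt'] [b'notes.txt']

try:
    # Python 3 refuses to mix text and bytes implicitly
    'dario: ' + byte_names[0]
except TypeError as exc:
    mixing_error = exc
```

Concatenating the bytes entry with a text prefix fails immediately, which is exactly the early TypeError described above.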

If we change a single character, from listdir(b'.') to listdir(u'.'), we’ll see that the code now works on both Python 3 and Python 2. For completeness, we should also change 'dario' to u'dario'.
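For reference, here is a minimal Python 3 sketch of the corrected script, wrapped in a function so it can be tested. The function name list_owners, the path parameter and the fallback to the numeric uid for unknown owners are assumptions added for illustration; the original script would raise a KeyError instead.

```python
import os
from os import listdir, stat


def list_owners(path, names):
    """Return 'owner: filename' lines for each entry of `path`.

    Passing a text path (not bytes) makes listdir return text names,
    so concatenating them with the text owner names is safe.
    """
    lines = []
    for node in sorted(listdir(path)):
        uid = stat(os.path.join(path, node)).st_uid
        # Hypothetical fallback: show the numeric uid for unknown owners
        owner = names.get(uid, str(uid))
        lines.append(owner + ': ' + node)
    return lines
```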

This difference in behavior between Python 2 and Python 3 stems from a radical difference in how the two versions handle string types, a difference that is felt mainly when porting code from one version to the other.

In my opinion, this situation is emblematic of the maxim: “splitters can be lumped more easily than lumpers can be split”. What was lumped together in Python 2 (unicode strings and the default byte strings, which could be freely coerced into each other) has been split in Python 3.
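In Python 3 every conversion between the two types has to be explicit. The sketch below encodes é to UTF-8 and then decodes the resulting bytes with the wrong codec, which reproduces the very UnicodeDecodeError shown earlier (0xc3 is the first byte of é in UTF-8):

```python
text = '\u00e9'                # str: the single code point é
data = text.encode('utf-8')    # bytes: b'\xc3\xa9'

# Going back to text requires stating which encoding the bytes are in
assert data.decode('utf-8') == text

try:
    # Decoding with the wrong codec fails on the first non-ASCII byte
    data.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = str(exc)
```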

Tools for Automated Conversion


For this reason, tools like 2to3, although well written and extremely useful for automating the conversion of every other difference, have some limitations. With the bytes/unicode split, the difference in behavior surfaces at runtime, so a tool that can only do parsing/static analysis won’t be able to save you if you have a huge Python 2 codebase that mixes these two types. You’ll have to roll up your sleeves and properly design your API, deciding whether functions that until now indiscriminately accepted any type of string should now work only with some of them (and which ones). Conversely, tools that convert from Python 3 to Python 2, though much less used, have a much easier life. Let’s see an example:

Some time ago I wrote a toy HTTP server (only dependency: python-magic), and this is the version for Python 2 (automatically converted from the Python 3 one without any need for manual changes): https://gist.github.com/berdario/8abfd9020894e72b310a

Now, if you want, you can look directly at the code converted to Python 3 with 2to3, or run the conversion on your own system. When you try to execute it, you’ll realize that every error you have to fix by hand is related to the bytes/unicode split.

You can manually apply changes like these: https://gist.github.com/berdario/34370a8bc39895cae139/revisions

And thus you get your program working again on Python 3. These are not complex changes, but they nonetheless require you to reason about which data types your functions work on, and about the control flow. That’s 13 changed lines out of 120, roughly 1 in 10, a ratio that is not easy to handle: with thousands of lines of code to port, you could easily end up with hundreds to modify.
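One possible way to settle the “which string types does this function accept?” question is to decode at the boundary of your API and work only with text inside. The helper below, to_text, is hypothetical (it is not taken from the gists above) and assumes UTF-8 as the default encoding:

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical boundary helper: accept str or bytes, always return str.

    Decoding happens once, at the edge of the API, so the rest of the
    code only ever deals with text.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    # Anything else is a caller bug, so fail loudly instead of coercing
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)
```

A stricter design would accept only str and let callers do the decoding; which one fits better depends on where your bytes come from.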



If you’re curious, you could then try to convert the code you just brought to Python 3 back to Python 2. Using 3to2 you’d obtain this: https://gist.github.com/berdario/cbccaf7f36d61840e0ed, in which the only change that had to be applied manually is the .encode('utf-8') at line 55.

Starting from Python 3 makes things much easier if you ever need to convert back to Python 2. But if you need your code to work on both versions, a complete one-off conversion like this is not the best choice: it’s much better to maintain compatibility with both versions of Python. To do that, you can rely on tools like futurize.
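As a rough idea of what single-source code looks like, here is a hand-written sketch (not actual futurize output). The __future__ imports opt in to Python 3 behavior on Python 2 and are harmless no-ops on Python 3; the function names are made up for the example:

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import io


def average(values):
    # With `division` imported, / is true division on Python 2 as well
    return sum(values) / len(values)


def read_text(path):
    # io.open returns unicode text on both versions when an encoding is
    # given, mirroring Python 3's built-in open()
    with io.open(path, encoding='utf-8') as f:
        return f.read()


greeting = 'hello'  # with unicode_literals this is a text string on both versions
print(greeting, average([1, 2]))  # print_function: print is a function on both
```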

Python 3 Is Not Just about Unicode


Even if you don’t have the chance to use Python 3 in production (maybe one of the libraries you’re using is bulky and compatible with Python 2 only), I’d suggest keeping your code compatible with Python 3. You could even stub/mock out the incompatible libraries, just so you can continuously run your tests on both versions. This will make things easier when you’re finally ready to migrate to Python 3, not to mention how it can help you design better APIs, or catch errors like the one in the example at the beginning of this post.
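Stubbing out a Python 2-only dependency can be as simple as registering a stand-in module in sys.modules before the code under test imports it. In the sketch below, legacy_reports and its render function are hypothetical; in a real project this would typically live in your test setup rather than in production code:

```python
import sys
import types

try:
    import legacy_reports  # hypothetical Python 2-only dependency
except (ImportError, SyntaxError):
    # Register a minimal stand-in so modules importing the library still
    # load, and the test suite can run on Python 3
    stub = types.ModuleType('legacy_reports')
    stub.render = lambda data: 'report with %d rows' % len(data)
    sys.modules['legacy_reports'] = stub
    import legacy_reports  # now resolved from sys.modules
```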



All this talk about porting and the bytes/unicode difference may have led you, even if you were initially skeptical about starting with Python 3, to see it as the lesser evil compared to tackling the porting in the future. But if porting is the stick, where’s the carrot? Is it the new features added to the language and to its standard library?

Well, 5 years after the release of the last minor version of Python 2, plenty of interesting tidbits have piled up. For example, I often find myself relying on things like the new keyword-only arguments.

Keyword-Only Arguments


When I wanted to write a function to merge an arbitrary number of dictionaries (similar to what dict.update does, but without modifying the inputs), I found it natural to add a function argument that lets the caller customize how values for the same key are combined. This way, the function can be invoked as follows to simply merge multiple dictionaries, retaining the values from the rightmost dicts:

merge_dicts({'a':1, 'c':3}, {'a':4, 'b':2}, {'b': -1})
# {'b': -1, 'a': 4, 'c': 3}

Likewise, to merge by adding the values:

from operator import add
merge_dicts({'a':1, 'c':3}, {'a':4, 'b':2}, {'b': -1}, withf=add)
# {'b': 1, 'a': 5, 'c': 3}

Implementing such an API in Python 2 would have required defining a **kwargs parameter and looking up the withf argument in it. If the caller mistyped the argument as, e.g., withfun, though, the error would be silently ignored. In Python 3, instead, it’s perfectly fine to add an optional argument after *args; it can then only be passed by keyword:

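def second(a, b):
    # Default combiner: keep the value from the rightmost dict
    return b

def merge_dicts(*dicts, withf=second):
    # withf follows *dicts, so it can only be passed by keyword
    newdict = {}
    for d in dicts:
        # Keys already present must be combined with withf
        shared_keys = newdict.keys() & d.keys()
        # New keys are copied as they are
        newdict.update({k: d[k] for k in d.keys() - newdict.keys()})
        newdict.update({k: withf(newdict[k], d[k]) for k in shared_keys})
    return newdict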
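For comparison, here is a sketch of the Python 2-compatible **kwargs version described above, under the hypothetical name merge_dicts_kwargs. Catching a mistyped keyword requires an explicit check, which Python 3’s keyword-only arguments give you for free:

```python
from operator import add


def _second(a, b):
    return b


def merge_dicts_kwargs(*dicts, **kwargs):
    # Python 2-compatible signature: withf has to be fished out of **kwargs
    withf = kwargs.pop('withf', _second)
    if kwargs:
        # Without this check, a typo such as withfun=add would be silently ignored
        raise TypeError('unexpected keyword arguments: %s' % ', '.join(sorted(kwargs)))
    newdict = {}
    for d in dicts:
        for k, v in d.items():
            newdict[k] = withf(newdict[k], v) if k in newdict else v
    return newdict
```

With the keyword-only version, Python 3 raises the equivalent TypeError on its own.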

Unpacking Operator


Since Python 3.5, the naive merge (keeping the rightmost values) can actually be done with the new unpacking syntax from PEP 448, as in {**d1, **d2}. But even before 3.5, Python had gained an improved form of unpacking:

a, b, *rest = [1, 2, 3, 4, 5]
rest
# [3, 4, 5]

This has been available since Python 3.0 (PEP 3132). Akin to destructuring, this kind of unpacking is a limited, ad hoc form of the pattern matching common in functional languages (where it is also used for flow control), and it’s a common feature of dynamic languages like Ruby and JavaScript (in environments that support ECMAScript 2015).
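The sketch below (Python 3.5 or later) shows the dict merge from PEP 448 and a couple more unpacking forms; the variable names are arbitrary:

```python
d1 = {'a': 1, 'c': 3}
d2 = {'a': 4, 'b': 2}
d3 = {'b': -1}

# PEP 448 (3.5+): later dicts win on duplicate keys, like merge_dicts' default
merged = {**d1, **d2, **d3}

# PEP 3132 (3.0+): the starred target can appear in any position
first, *middle, last = [1, 2, 3, 4, 5]

# PEP 448 also allows several starred iterables in a single display
combined = [*range(2), *'ab']
```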

Simpler APIs for Iterables


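In Python 2, a lot of APIs that dealt with iterables were duplicated (range() and xrange(), zip() and itertools.izip(), map() and itertools.imap(), dict.items() and dict.iteritems()), and the default ones had strict semantics, building a full list up front. In Python 3, instead, everything generates values as needed: zip(), map(), range(), and dict.items() (which now returns a view). Do you want to write your own version of enumerate? In Python 3 it’s as simple as composing functions from the standard library: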

import itertools

zip(itertools.count(1), 'abc')

is equivalent to enumerate('abc', 1).
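Wrapped in a function, that composition could look like the sketch below; my_enumerate is a hypothetical name. Because both zip() and itertools.count() are lazy on Python 3, it also works on infinite iterables, something Python 2’s eager zip() could not handle:

```python
import itertools


def my_enumerate(iterable, start=0):
    """Lazily pair each item with a counter, like the built-in enumerate()."""
    # Neither zip() nor itertools.count() builds a list on Python 3,
    # so values are produced only when the caller asks for them
    return zip(itertools.count(start), iterable)
```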

Function Annotations


Wouldn’t you like to define HTTP APIs as simply as this?

from decimal import Decimal

# `get` and `post` are route decorators provided by a web framework (not shown)
@get('/balance')
def balance(user_id: int):
    pass

@post('/pay')
def pay(user_id: int, amount: Decimal):
    pass

No more '<int:user_id>' ad-hoc syntax, and the ability to use any type/constructor (like Decimal) inside your routes without having to define your own converter.

Something like this has already been implemented, and what you see is valid Python 3 syntax: it exploits the new function annotations to make APIs more convenient to write and self-documenting.
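How might a framework turn those annotations into conversions? Here is a minimal sketch, not the implementation of any particular framework: a hypothetical convert_args decorator reads the annotations through inspect.signature and applies each one to the raw string arguments, as a router might receive them from the URL or query string:

```python
import functools
import inspect
from decimal import Decimal


def convert_args(func):
    """Hypothetical helper: convert string arguments using func's annotations."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(**raw_params):
        converted = {}
        for name, value in raw_params.items():
            annotation = sig.parameters[name].annotation
            # Unannotated parameters are passed through unchanged
            if annotation is inspect.Parameter.empty:
                converted[name] = value
            else:
                converted[name] = annotation(value)
        return func(**converted)

    return wrapper


@convert_args
def balance(user_id: int):
    return user_id


@convert_args
def pay(user_id: int, amount: Decimal):
    return user_id, amount
```

Any callable that accepts a string works as an annotation here, which is why Decimal needs no dedicated converter.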

Wrapping Up


These are just a couple of simple examples, but the improvements are far-reaching and ultimately help you write more robust code. One example is exception chaining, with chained tracebacks shown by default, showcased in the aptly named post “The most underrated feature in Python 3” by Ionel Cristian Mărieș. It is also covered in this other post by Aaron Maxwell, together with Python 3’s stricter comparison semantics and the new super() behavior.
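A compact sketch of the three behaviors mentioned above (the class and function names are made up for the example): explicit exception chaining with raise ... from, the stricter comparison that refuses to order an int against a str, and the zero-argument super():

```python
class Base:
    def greet(self):
        return 'base'


class Child(Base):
    def greet(self):
        # Zero-argument super() replaces Python 2's super(Child, self)
        return 'child of ' + super().greet()


def parse_port(raw):
    try:
        return int(raw)
    except ValueError as exc:
        # Explicit chaining: the ValueError is kept as __cause__ and is
        # printed in the traceback above the RuntimeError
        raise RuntimeError('invalid port: %r' % raw) from exc


def compare_mixed():
    # Python 2 returns an arbitrary but consistent answer; Python 3 raises TypeError
    try:
        return 1 < 'a'
    except TypeError:
        return 'TypeError'
```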

This is not all. There are plenty of other improvements; these are the ones I feel have the most impact day-to-day:


For a more thorough panorama, see the “What’s New” pages of the documentation; for another overview of the changes, I also suggest this other post by Aaron Maxwell and these slides by Brett Cannon.

Python 2.7 will be supported until 2020, but don’t wait until 2020 to move to a new (and better) version!

The original article is from Toptal.
