As a follow-up to the previous article, I'd like to list some hints on adapting Python 2 sources for the migration to Python 3, which I discovered while doing the same job on some of my scripts. The list is not complete because, as I said, it's the result of personal experience and not of a thorough exploration.
To write code compatible with both Python 2 and Python 3 you have to enable some backports, which are available starting from version 2.5 of the language.
General changes
“long”s will be gone
In Python 3 the distinction between int and long disappears, and so does the "0L" syntax. To avoid syntax errors you can replace "0L" with "long(0)" and, in Python 3, define long = int so that everything works again. At the beginning of the script you have to check the Python version to enable the substitution:
import sys
python_version_major = sys.version_info[0]
if python_version_major > 2:
    long = int
xrange function replaced by range
As in the previous case, it's enough to check the Python version and define xrange = range when it's needed.
The “unicode” type will disappear
With Python 3 all strings are unicode, so the "str" datatype is unicode by default. As in the previous cases (again!), testing the Python version is enough to define unicode = str when needed.
Some functions will return an iterator in place of a list
These functions are zip, range and map. Starting from Python 3 they return an iterator (a lazy range object in the case of range), while in Python 2 they return a list. For compatibility you can enclose the result of the function in a "list()" call, which forces it to be a list in Python 3 also.
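The version checks described so far and the list() wrapping can be put together in a short preamble. This is only a sketch, assuming the script runs on Python 2.6 or later; the sample values are made up for illustration.

```python
import sys

# On Python 3 the old names no longer exist, so alias them to their
# replacements; on Python 2 the builtins are left untouched.
if sys.version_info[0] > 2:
    long = int
    xrange = range
    unicode = str

big = long(0)             # replaces the old 0L literal
label = unicode("total")  # text on both major versions

# zip, map and range are wrapped in list() so that the result is a real
# list on both major versions.
pairs = list(zip([1, 2, 3], "abc"))
squares = list(map(lambda n: n * n, xrange(4)))
indexes = list(range(3))

print(pairs)    # [(1, 'a'), (2, 'b'), (3, 'c')]
print(squares)  # [0, 1, 4, 9]
print(indexes)  # [0, 1, 2]
```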
next method replaced by __next__
When you define an iterator you must use the "__next__" method in place of "next". So, for compatibility you can define them to be equal:
def next(self):
    ...
__next__ = next
This also implies that to retrieve the next element of an iterator you can no longer call its "next" method: you must use the "next" function introduced with Python 2.6:
next(foo) instead of foo.next()
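Putting the two rules together, an iterator that works on both versions might look like the following. It is a minimal sketch: the Countdown class is a hypothetical example, not something taken from my scripts.

```python
class Countdown(object):
    """Iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def next(self):             # method name used by Python 2
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    __next__ = next             # method name used by Python 3


counter = Countdown(3)
first = next(counter)           # builtin next(), available from Python 2.6
rest = list(counter)            # consumes what is left
print("%s %s" % (first, rest))  # 3 [2, 1]
```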
__nonzero__ method replaced by __bool__
To implement the truth test on an object in Python 2 you use the __nonzero__ method, while in Python 3 this is replaced by the __bool__ method. So, for compatibility it's enough to define them to be equal:
def __nonzero__(self):
    ...
__bool__ = __nonzero__
Exceptions cannot be strings
In other words, this syntax is now invalid:
raise SerializationError, "decoding error"
and it should be:
raise SerializationError("decoding error")
Changes to modules
Removed “exceptions” module
Exceptions don't belong to the "exceptions" module any more: they've been moved to the builtins, so the "exceptions" module has become useless and has been removed.
“md5”, “sha”, etc. modules collected under “hashlib”
Modules related to hash generation have been collected under hashlib, so imports must be changed. For example, the following instruction:
import md5
should be:
from hashlib import md5
Warning: the "sha" module is now split into several functions, "sha1", "sha224" etc., so when writing the import you should remember to alias the name to preserve compatibility:
from hashlib import sha1 as sha
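One detail to keep in mind is that the old modules were used as md5.new(data) and sha.new(data), while the names imported from hashlib are constructors to be called directly. A small sketch with an arbitrary sample value:

```python
from hashlib import md5
from hashlib import sha1 as sha

data = b"hello"   # hash functions work on bytes, not on unicode text

# Call the imported names directly instead of using the old .new() form.
md5_digest = md5(data).hexdigest()
sha_digest = sha(data).hexdigest()

print(md5_digest)  # 32 hexadecimal characters
print(sha_digest)  # 40 hexadecimal characters
```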
Changes introduced from Python 2.5
Absolute imports
In Python 3 imports are absolute: to import a module from the same package you must put a dot "." before its name to make the import relative to the current package.
To enable this behaviour from Python 2.5:
from __future__ import absolute_import
and after it imports must be written as follows:
from .foo import Foo
from . import foo
etc., while the usual syntax:
import foo
will always look for "foo" among the top-level modules on sys.path, never inside the current package.
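The dotted syntax can be tried out on a throwaway package. The sketch below builds one in a temporary directory and imports a module that uses relative imports; the names demo_relimport_pkg, foo and bar are made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Build a small package on disk: demo_relimport_pkg/{__init__,foo,bar}.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_relimport_pkg")
os.mkdir(pkg_dir)

sources = {
    "__init__.py": "",
    "foo.py": "class Foo(object):\n    label = 'package-local Foo'\n",
    "bar.py": "\n".join([
        "from __future__ import absolute_import",
        "from .foo import Foo   # a name from a sibling module",
        "from . import foo      # the sibling module itself",
        "",
    ]),
}
for filename, source in sources.items():
    with open(os.path.join(pkg_dir, filename), "w") as handle:
        handle.write(source)

# The directory that CONTAINS the package must be on sys.path.
sys.path.insert(0, root)
bar = importlib.import_module("demo_relimport_pkg.bar")

print(bar.Foo.label)      # package-local Foo
print(bar.foo.__name__)   # demo_relimport_pkg.foo
```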
This scheme has a drawback: relative imports are resolved using the __name__ attribute of a module. If you execute a module as __main__, for example when you're testing it, the interpreter doesn't know where to start looking for modules imported with a relative path and raises an exception. This means that you cannot execute such modules as standalone programs: you are forced to install and import them, after which relative paths work. Alternatively, you can use only absolute paths, but this still forces you to install a module before you can run it, and if you move it into another directory of your package you have to fix all the paths in the imports.
In other words: the first module must be imported with an absolute path, then all the relative imports will work.
This is annoying, to say the least, because you cannot use the __name__ == '__main__' trick any more to test a single module.
A rather complex solution is to add the following code to the beginning of the module you need to make a standalone executable:
if __name__ == '__main__':
    import os, sys
    # full path of this file
    __this_path = os.path.abspath(sys.argv[0])
    # module name (this file name without extension)
    __this_name = os.path.splitext(os.path.basename(__this_path))[0]
    # full path of the directory that contains this file
    __parent_path = os.path.dirname(__this_path)
    # name of the directory that contains this file
    __parent_name = os.path.basename(__parent_path)
    # add to the path the directory up two levels (that is not the one
    # that contains the script, but one more above it)
    sys.path.insert(0, os.path.dirname(__parent_path))
    # import as a module the directory that contains the script
    __import__(__parent_name)
    # change the script name
    __name__ = '%s.%s' % (__parent_name, __this_name)
    # from this point on relative imports work
These lines turn the directory containing the module into a package, by adding its parent directory to sys.path and importing it, and change the module's internal name (the one the interpreter sees) to '<package>.<module>' so that relative imports start to work. Since every path is computed at run time, the code isn't bound to the module's location, which can change.
Be warned: by using sys.path.insert you may insert names that will override identical system names (if your package happens to have overlapping directory names with system ones); by using sys.path.append the risk is to use a previously installed version of the package instead of the one you're testing. I don't think there's a way to fix both issues at the same time.
Changes introduced from Python 2.6
Print function
from __future__ import print_function
def print(*args, sep=' ', end='\n', file=None)
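To give an idea of how the keyword arguments replace the old statement forms (the trailing comma and "print >>f"), here is a short sketch. The report function is a hypothetical helper written only for this example.

```python
from __future__ import print_function  # must be the first statement; no effect on Python 3

import sys


def report(items, stream=sys.stdout):
    """Write the items on one line, separated by commas."""
    # sep goes between the arguments, end after the last one,
    # file selects the destination stream.
    print(*items, sep=", ", end="!\n", file=stream)


report(["spam", "eggs"])                           # spam, eggs!
print("warning: low disk space", file=sys.stderr)  # goes to standard error
```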
New syntax to capture the value of an exception
In Python 3 the "as" keyword is mandatory:
try:
    ...
except TypeError as exc:
    ...
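A complete example, combining the call syntax for raising with the "as" syntax for catching, could look like this. Both SerializationError (named after the exception used earlier in this article) and the decode function are hypothetical.

```python
class SerializationError(Exception):
    """Hypothetical exception type used only for this example."""


def decode(payload):
    if not isinstance(payload, bytes):
        # Call syntax: valid on both Python 2 and Python 3.
        raise SerializationError("decoding error")
    return payload.decode("ascii")


try:
    decode(None)
except SerializationError as exc:   # "as" is accepted from Python 2.6
    message = str(exc)

print(message)  # decoding error
```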
Unicode format for all literal strings
from __future__ import unicode_literals
This probably doesn't mean that strings read from the terminal will be unicode.
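What the import does change is the type of the literals written in the source file, as this sketch tries to show. The text_type name is an assumption of mine, chosen only to make the check portable.

```python
from __future__ import unicode_literals  # must be the first statement; no effect on Python 3

import sys

# Pick the text type of the running interpreter.
if sys.version_info[0] > 2:
    text_type = str
else:
    text_type = unicode  # only evaluated on Python 2

plain = "caff\u00e8"     # unprefixed literal: text on both versions
raw = b"caff\xc3\xa8"    # b-prefixed literal: always 8-bit data

print(isinstance(plain, text_type))                            # True
print("%d characters, %d bytes" % (len(plain), len(raw)))      # 5 characters, 6 bytes
print(raw.decode("utf-8") == plain)                            # True
```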
Difference between unicode and 8 bit
Everything that is 8 bit should be built with "bytes" or with the "b" prefix. In Python 2 "bytes" is an alias for "str", so the result is different from Python 3, but you should use "bytes" in Python 2 too, in instructions like "isinstance(x, bytes)", to make the 2to3 tool understand that you're managing bytes and not unicode strings.
"bytes" in Python 2 returns an immutable string. If you want a mutable byte string you should use "bytearray".
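The difference between the immutable and the mutable type can be seen in a few lines. This is a sketch with arbitrary sample values, and describe is a hypothetical helper.

```python
def describe(value):
    """Tell 8-bit data apart from everything else."""
    # On Python 2 a plain str also matches, because bytes is str there.
    if isinstance(value, (bytes, bytearray)):
        return "binary, %d bytes" % len(value)
    return "not binary"


frozen = b"abc"               # immutable byte string
buffer = bytearray(b"abc")    # mutable byte string
buffer[0] = ord("x")          # changed in place

print(describe(frozen))       # binary, 3 bytes
print(bytes(buffer) == b"xbc")  # True
```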
Notation for binary and octal numbers
The new syntax for octal numbers is "0o" instead of the simple "0" at the beginning. There's also the new "0b" syntax for binary numbers, together with the "bin" function that converts a number to its binary representation. For example, 0o17 is 15, 0b101 is 5 and bin(5) returns '0b101'.
Integer division
In Python 2 the "classic" division between integers is an integer division. To switch to "true" division you need:
from __future__ import division
True division for ints and longs will convert the arguments to float and then apply a float division. That is, even 2/1 will return a float (2.0), not an int. For floats and complex, it will be the same as classic division.
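The effect on the two operators can be checked with a few values. A minimal sketch, with variable names chosen only for the example:

```python
from __future__ import division  # must be the first statement; no effect on Python 3

true_quotient = 7 / 2      # 3.5
exact = 2 / 1              # 2.0, a float even if the result is whole
floor_quotient = 7 // 2    # 3, floor division keeps ints as ints
negative_floor = -7 // 2   # -4, it rounds toward minus infinity
float_floor = 7.0 // 2     # 3.0, floor division of a float stays a float

print(true_quotient, exact, floor_quotient, negative_floor, float_floor)
```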
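Integer (floor) division is written "//".
"callable" removed
"callable" was removed in Python 3.0 (it came back in Python 3.2); where it is missing it can be replaced by:
isinstance(x, collections.Callable)
From Python 3.3 the class lives in collections.abc, and the old collections.Callable alias was removed in Python 3.10.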
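Since the location of the Callable class depends on the Python version, a portable check can try both places. A minimal sketch, where is_callable is a hypothetical helper name:

```python
try:
    from collections.abc import Callable   # Python 3.3 and later
except ImportError:
    from collections import Callable       # Python 2.6 to 3.2


def is_callable(obj):
    """Portable stand-in for callable(obj)."""
    return isinstance(obj, Callable)


print(is_callable(len))   # True
print(is_callable(42))    # False
```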
"has_key" removed
The "has_key" method has been removed: you must use the "in" operator.
"apply" removed
"apply" has been removed: you must use the extended call syntax:
func(*args, **kwargs)
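Both replacements, for "has_key" and for "apply", already work in Python 2, so the code can be changed right away. A short sketch with made-up names (settings, connect):

```python
settings = {"host": "localhost", "port": 8080}

# Instead of settings.has_key("port")
has_port = "port" in settings


def connect(host, port, timeout=10):
    return "%s:%d (timeout %d)" % (host, port, timeout)


args = ("localhost", 8080)
kwargs = {"timeout": 5}

# Instead of apply(connect, args, kwargs)
result = connect(*args, **kwargs)

print(has_port)  # True
print(result)    # localhost:8080 (timeout 5)
```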
Changes introduced from Python 2.7
DeprecationWarning
These warnings have been disabled by default. While developing you should enable them. There are several ways to do it:
- by using the "-Wdefault" flag from the command line;
- by setting the PYTHONWARNINGS environment variable to "default";
- by calling warnings.simplefilter('default') in your code.
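The third option can be sketched as follows. The old_api function is a hypothetical deprecated function, and the catch_warnings block is there only to collect the warning so that it can be inspected; in a real script calling simplefilter once at startup is enough.

```python
import warnings


def old_api():
    """Hypothetical function that announces its own deprecation."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")   # show DeprecationWarning again
    result = old_api()

print(result)                      # 42
print(caught[0].category.__name__) # DeprecationWarning
print(caught[0].message)           # old_api() is deprecated
```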
Dictionary: keys, values and items
Starting from Python 3 these methods return a "view" that is automatically updated when the dictionary is updated. In Python 2.7 you get this behaviour with the viewkeys, viewvalues and viewitems methods, which are automatically converted by the 2to3 tool.
Alternatively you can keep on using keys, values and items the old way but enclosing them in a "list()" call, that is converting the result to a list, so that you get a list in Python 3 also.
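The difference between a view and a list shows up as soon as the dictionary changes. A minimal sketch with a made-up dictionary; the hasattr test is just my way to pick the right method on each version.

```python
stock = {"apples": 3}

# A list is a snapshot: it does not follow later changes.
snapshot = list(stock.keys())

# Python 2.7 has viewkeys(); in Python 3 keys() already returns a view.
if hasattr(stock, "viewkeys"):
    live = stock.viewkeys()
else:
    live = stock.keys()

stock["pears"] = 5

print(sorted(snapshot))  # ['apples']
print(sorted(live))      # ['apples', 'pears']
```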
Literal set syntax
Braces: {1, 2, 3, 4}
Empty set: set()
Dictionary and set comprehensions
>>> {x: x*x for x in range(6)}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
>>> {('a'*x) for x in range(6)}
set(['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa'])
__cmp__ method removed
The __cmp__ method is removed in Python 3.0: you must define every single rich comparison method of an object (__lt__, etc.). There's a utility that helps with the task.
importlib
Python 3.1 has a richer importlib that contains a complete reimplementation of the import machinery. The Python 2.7 version contains only the import_module function.
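Since import_module is the part available on both versions, it is the one to rely on when a module name is only known at run time. A small sketch; the choice of the "json" module is arbitrary.

```python
import importlib

# The module name is a plain string, so it could come from a
# configuration file or from the command line.
module_name = "json"
json_module = importlib.import_module(module_name)

encoded = json_module.dumps({"answer": 42})
print(encoded)  # {"answer": 42}
```

The function also accepts a relative name together with a package argument, for example import_module(".foo", package="pkg"), where "pkg" and "foo" are placeholder names.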