PYTHON OPTIMISATION
• Understand the performance costs of Python
• Learn how to avoid common performance pitfalls
• Learn about available tools to improve Python performance
Why Python
• Python is easy to learn
• Good for text processing
• Good for quick visualisations
• Many scientific libraries
Python drawbacks
• Python is interpreted
• Python is dynamically typed
• Numbers are boxed (and arbitrary-precision)
• Everything is an object
• Python is garbage collected
• Python lacks true parallelism
• i.e. the Global Interpreter Lock (GIL)
• Optimising code carries some risks
• Makes maintenance difficult
• Introduces subtle bugs
• Investment of time/money
• Optimising the wrong thing
• Measure first
• Profiling shows time spent in individual parts of code.
cProfile
• Part of the standard library
• Low overhead
• Can profile calls to C functions
import cProfile
import re
cProfile.run('re.compile("foo|bar")')
199 function calls (194 primitive calls) in 0.001 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000    0.001    0.001  enum.py:265(__call__)
     4    0.000    0.000    0.000    0.000  enum.py:515(__new__)
     4    0.000    0.000    0.000    0.000  enum.py:801(__and__)
     2    0.000    0.000    0.001    0.000  re.py:231(compile)
     1    0.000    0.000    0.001    0.001  re.py:286(_compile)
     1    0.000    0.000    0.000    0.000  sre_compile.py:223(_compile)
• Can also be applied to full script:
python -m cProfile [-o output_file] [-s sort_order] myscript.py
• Can process saved profile data with pstats.Stats:
import pstats
p = pstats.Stats('restats')
p.strip_dirs().sort_stats(-1).print_stats()
p.sort_stats('time').print_stats(10)
p.print_callers(.5, 'init')
• for loops can be slow:

for i, el in enumerate(lst):
    lst[i] = el ** 2

map(lambda x: x ** 2, lst)   # note: map returns a lazy iterator in Python 3

• Use list comprehensions:

[el ** 2 for el in lst]

• Not always true
• Don't do this if you are not building a list
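A quick comparison of the two styles above with `timeit` (a sketch; absolute timings vary by machine, but the comprehension is usually faster because its loop body runs as specialised bytecode):

```python
import timeit

lst = list(range(10_000))

# Explicit loop: one interpreted iteration and method call per element
def square_loop(lst):
    out = []
    for el in lst:
        out.append(el ** 2)
    return out

# List comprehension: the append is a dedicated bytecode operation
def square_comp(lst):
    return [el ** 2 for el in lst]

print("loop:", timeit.timeit(lambda: square_loop(lst), number=100))
print("comp:", timeit.timeit(lambda: square_comp(lst), number=100))
```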
Global variables and imports
• Global variable accesses are slower than local ones

def func():
    global x   # avoid
    x = 5
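A common related trick, sketched below: binding a frequently used global (here `math.sqrt`) to a local name before a hot loop replaces a global lookup plus an attribute lookup per call with a single fast local lookup:

```python
import math

# Global + attribute lookup on every iteration
def norms_global(vals):
    return [math.sqrt(v) for v in vals]

# One lookup up front, then cheap local access inside the loop
def norms_local(vals):
    sqrt = math.sqrt
    return [sqrt(v) for v in vals]
```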
• Imports are slow
• Some code can be optimised by doing them only when necessary
• Doesn’t speed up the whole program, just delays the cost
• Problems for large parallel runs of serial programs
def func():
    import numpy as np
    np.some_numpy_function()

non_imported_function()   # if only this runs, numpy is never imported
Other things
• Built-ins are usually written in C
• So prefer them where applicable
• Use tuples over lists
• tuples are immutable
• tuples are stored like an array, lists have metadata and data storage
tuple1 = (1,2,3,4)
list1 = [1,2,3,4]
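As a sketch of the built-ins point above: the C-implemented `sum()` does the same work as a hand-written Python loop, but without interpreting one bytecode iteration per element:

```python
data = list(range(1_000))

# Hand-written Python loop: one interpreted iteration per element
def manual_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

# Built-in sum() runs the equivalent loop in C
print(manual_sum(data) == sum(data))   # True; sum() is typically faster
```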
• Use str.join instead of repeated + when building strings (CPython sometimes optimises +=, but join is reliably linear)
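A minimal sketch of why: each `+` copies the partial string again (quadratic in the worst case), whereas `join` computes the final size once and copies each piece once:

```python
words = ["join", "is", "linear"]

# Repeated +: every concatenation copies the growing partial result
s1 = ""
for w in words:
    s1 = s1 + w + " "

# join: one pass over the pieces
s2 = "".join(w + " " for w in words)

assert s1 == s2
```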
Parallel Python
• Common way to accelerate program is to parallelise it using threads
• Python has thread support
• Unfortunately, Python has Global Interpreter Lock
• This means only a single thread can run at any one time
• Threads only useful for I/O or calls to native code.
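A sketch of the I/O point: `time.sleep` stands in for an I/O wait here. Sleeping, like real I/O, releases the GIL, so four 0.1 s waits overlap instead of serialising:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    time.sleep(0.1)   # GIL is released while waiting
    return i * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")   # ~0.1 s total, not 0.4 s
```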
Parallel Python
• Python has support for parallelism using multiple processes
• Message-passing based communication
• Support for lock-based synchronisation
• Worker pool for easier offload
• Extremely heavyweight
• Only generally useful for large, trivially parallel jobs…
• …or distributed rather than parallel computing
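A minimal `multiprocessing` worker-pool sketch: each worker is a full interpreter process with its own GIL, so CPU-bound work genuinely runs in parallel, at the cost of process start-up and pickling arguments and results:

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Work is pickled to the workers, results pickled back
        print(pool.map(square, range(8)))   # [0, 1, 4, 9, 16, 25, 36, 49]
```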
Native code
• If all else fails, write everything in C/Fortran and call from Python
• Many ways of doing this:
• CPython API
• PyBind11
• ctypes
• SWIG
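A sketch of the ctypes route: calling `sqrt()` from the C maths library directly. The library name is platform-dependent; `"libm.so.6"` is used here as a common Linux fallback when `find_library` cannot locate it:

```python
import ctypes
import ctypes.util

# Load the C maths library (name varies by platform)
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature: double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))   # 1.4142135623730951
```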
• For good performance, Python alone is not enough
• Need to use other tools, or hand off to native, optimised code
• Fortunately, many of these tools exist.
Performant Python: Numpy
• NumPy is a library for numerical computing
• Adds support for multidimensional arrays
• Why are Python lists of lists not suitable?
• Provides operations on those arrays
• Implemented in C and Fortran
• Implicitly vectorised
• Uses well-established libraries with good performance
• e.g. MKL, OpenBLAS, etc.
• Use this for anything involving 1D or 2D arrays
def MatrixMul(mtx_a, mtx_b):
    tpos_b = list(zip(*mtx_b))   # zip returns an iterator in Python 3
    rtn = [[sum(ea * eb for ea, eb in zip(a, b)) for b in tpos_b]
           for a in mtx_a]
    return rtn
import numpy as np
a = np.array([[1, 3], [3, 4]])
b = np.array([[1, 3], [3, 4]])
np.dot(a, b)
• Code written entirely in NumPy should be almost as fast as the C version
• Try to keep code on the NumPy “side”
• Avoid copies between Python lists and NumPy arrays if possible
• Avoid for loops: use NumPy operations
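A sketch of the "avoid for loops" advice, evaluating a polynomial two ways (the function names are illustrative):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Python-level loop: each element is unboxed, computed, re-boxed
def poly_loop(x):
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = 3.0 * x[i] ** 2 + 2.0 * x[i] + 1.0
    return out

# Vectorised: a few whole-array operations, loops run in C
def poly_vec(x):
    return 3.0 * x ** 2 + 2.0 * x + 1.0

assert np.allclose(poly_loop(x[:1000]), poly_vec(x[:1000]))
```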
SciPy – Higher level functionality
• SciPy is a library for general scientific computing
• Contains useful, fast codes:
• Fourier transforms
• Linear algebra
• Sparse matrices
• Written in C or NumPy
import numpy as np
from scipy.sparse import csr_matrix
A = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
v = np.array([1, 0, -1])
A.dot(v)   # array([ 1, -3, -1])
NUMBA
• Compiles Python code to native code "Just-in-Time" (JIT)
• Uses LLVM backend
• No need to replace existing Python code: only need to annotate
• Compiles Python numbers to native numeric types
• Falls back to Python if it can’t optimise
• Can auto-parallelise loops
• Works well with NumPy
• Best used with long-running functions
NUMBA example
• Compiles on the first function call

from numba import jit

@jit(nopython=True)
def f(arr):
    for i, el in enumerate(arr):
        arr[i] = el ** el
• Can parallelise:

from numba import jit, prange

@jit(nopython=True, parallel=True)
def f(arr):
    for i in prange(len(arr)):
        arr[i] = arr[i] ** arr[i]
Cython
• Cython is a superset of the Python language
• Optionally typed
• Type-annotated code is statically compiled to C
• Good for accelerating critical parts of Python programs
• Running an unmodified program through Cython will not speed anything up
• Good for calling C code from Python
def fib(n):
    """Print the Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a + b
    print()
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("fib.pyx"),
)
python setup.py build_ext --inplace
Intel Python Distribution
• Intel provides optimised library for Scientific and Numerical computing (MKL)
• Can call this directly from Python
• But Intel also has its own Python distribution
• Accelerates NumPy code with MKL
• Comes with NUMBA and Cython and others
• Not clear on whether these are accelerated
• Possibly uses Intel C compiler and threading libraries
• Drop-in replacement for your existing numerical Python code
PyPy
• PyPy is an alternative implementation of the Python language
• Uses completely new virtual machine and runtime
• "Hot" loops are JIT-compiled to machine code
• Can result in large speedup, for free
• Average of 7.6 times faster
• Not all libraries are supported
• NumPy mostly supported
• Still no proper multithreading
• CPython extensions are slow
• Use cffi
Summary
• Python is generally slow
• Compared to fully compiled languages such as C and Fortran
• Library support exists for faster scientific computing
• NumPy, SciPy, TensorFlow et al.
• NUMBA allows us to compile Python functions to machine code
• PyPy should speed up vanilla Python code