Since Numexpr was only recently added to the Fedora repositories, it may be a good time to briefly describe what it is and what is new in version 2.0. According to the Numexpr website:

The numexpr package evaluates multiple-operator array expressions many times faster than NumPy can. It accepts the expression as a string, analyzes it, rewrites it more efficiently, and compiles it to faster Python code on the fly. It's the next best thing to writing the expression in C and compiling it with a specialized just-in-time (JIT) compiler, i.e. it does not require a compiler at runtime.

Example use:

In [1]: import numpy as np
In [2]: import numexpr as ne
In [3]: a = np.random.rand(10**6)
In [4]: b = np.random.rand(10**6)
In [5]: %timeit ne.evaluate('a**b')
10 loops, best of 3: 29.5 ms per loop

Array slices can also be passed through the local_dict argument, which maps the names used in the expression to arrays:

In [10]: %timeit ne.evaluate('c**d', local_dict={'c': a[1:], 'd': b[:-1]})
10 loops, best of 3: 28 ms per loop
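In the first example, numexpr found a and b by name in the interactive session; local_dict replaces that lookup with an explicit mapping, which is why c and d can stand for slices that have no name of their own. The toy sketch below shows what such a name-resolution step can look like with Python's ast module. expression_names() and resolve_operands() are hypothetical helpers written for illustration, not part of numexpr, whose own parser and lookup rules differ in detail.

```python
import ast
import sys


def expression_names(expr):
    """Return the sorted variable names an expression string refers to."""
    tree = ast.parse(expr, mode="eval")
    # Names used as function calls (e.g. sqrt in 'sqrt(c)') are not operands.
    funcs = {node.func.id for node in ast.walk(tree)
             if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(names - funcs)


def resolve_operands(expr, local_dict=None):
    """Map every operand name in `expr` to its value (hypothetical helper).

    With local_dict=None, names are looked up in the caller's globals and
    locals (locals win); otherwise only local_dict is consulted.
    """
    if local_dict is None:
        frame = sys._getframe(1)  # the frame of whoever called us
        scope = {**frame.f_globals, **frame.f_locals}
    else:
        scope = local_dict
    names = expression_names(expr)
    missing = [n for n in names if n not in scope]
    if missing:
        raise KeyError(f"undefined names in {expr!r}: {missing}")
    return {n: scope[n] for n in names}
```

For instance, resolve_operands('c**d', {'c': a[1:], 'd': b[:-1]}) would return just the two slices, and a name missing from the mapping raises KeyError.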

Computations are multithreaded and bypass the GIL. Version 2.0 comes with a new virtual machine that speeds up broadcasting operations, on-the-fly dtype conversion and operations on Fortran-ordered arrays. The drawback is a slower startup of the virtual machine, but overall the gain is significant for large arrays.
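To see why working block by block can cut memory use and still let several threads run, here is a rough pure-NumPy sketch for the expression a**2 + b**2 - 2*a*b. It only illustrates the idea; numexpr runs its loop in C inside its virtual machine, and the name blocked_eval, the 4096-element block size and the default of 4 threads are arbitrary assumptions. Because NumPy releases the GIL inside many ufunc loops, the worker threads can overlap to some extent.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def blocked_eval(a, b, out=None, block=4096, nthreads=4):
    """Compute a**2 + b**2 - 2*a*b block by block in a thread pool.

    Assumes 1-D arrays a and b of equal length. Each block uses one
    block-sized scratch buffer and writes straight into `out`.
    """
    if out is None:
        out = np.empty_like(a)
    n = a.shape[0]

    def work(start):
        stop = min(start + block, n)
        x, y = a[start:stop], b[start:stop]
        tmp = np.empty_like(x)          # block-sized scratch buffer
        res = out[start:stop]           # view into the output array
        np.multiply(x, x, out=res)      # a**2
        np.multiply(y, y, out=tmp)      # b**2
        np.add(res, tmp, out=res)       # a**2 + b**2
        np.multiply(x, y, out=tmp)      # a*b
        tmp *= 2                        # 2*a*b
        np.subtract(res, tmp, out=res)  # a**2 + b**2 - 2*a*b

    # Blocks are disjoint, so threads never write to the same elements.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        list(pool.map(work, range(0, n, block)))  # re-raises worker errors
    return out
```

Only block-sized scratch arrays are allocated here, whereas plain NumPy may create full-size temporaries for a**2, b**2 and 2*a*b. The per-block Python calls add overhead, so this sketch makes no claim to match numexpr's speed.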

For small arrays and simple expressions, numexpr is slower than plain NumPy and should not be used:

In [15]: %timeit a**2 + b**2 - 2*a*b
100000 loops, best of 3: 13 us per loop
 
In [16]: %timeit ne.evaluate('a**2 + b**2 - 2*a*b')
10000 loops, best of 3: 31.1 us per loop

However, as the expression becomes more complex, numexpr can come fairly close to NumPy even for a small number of elements.

At around 10⁶ elements, the improvement is clear, roughly a 7.6× speedup:

In [7]: a = np.random.rand(10**6)
In [8]: b = np.random.rand(10**6)
In [12]: %timeit a**2 + b**2 - 2*a*b
100 loops, best of 3: 19.5 ms per loop
In [10]: %timeit ne.evaluate('a**2 + b**2 - 2*a*b')
100 loops, best of 3: 2.55 ms per loop

The following graph shows the impact of the number of threads on the computation time. numexpr.evaluate() was called with out=an_existing_array to avoid allocating a new output array. Each measurement was repeated 9 times; each plotted point is the median of this set, and the error bars show the best and worst cases of the time ratio.

[Figure: Numpy vs Numexpr]

For some reason, there is a gap around 2000 elements when the computations are multithreaded. Any suggestion about its origin would be welcome!
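The measurement procedure described above can be sketched as follows. The original script is not shown, so the helper names, the pairing of the i-th NumPy run with the i-th numexpr run, and number=10 calls per timing are assumptions; numexpr.set_num_threads() and the local_dict= and out= arguments of numexpr.evaluate() are part of numexpr's API.

```python
import statistics
import timeit

import numpy as np

EXPR = "a**2 + b**2 - 2*a*b"


def ratio_stats(numpy_times, numexpr_times):
    """Median, best and worst of the per-run ratios t_numpy / t_numexpr."""
    ratios = [tn / te for tn, te in zip(numpy_times, numexpr_times)]
    return statistics.median(ratios), max(ratios), min(ratios)


def measure(n, nthreads, repeat=9, number=10):
    """Time NumPy vs numexpr on n elements using nthreads numexpr threads."""
    import numexpr as ne  # imported here so ratio_stats() works without numexpr

    a = np.random.rand(n)
    b = np.random.rand(n)
    out = np.empty_like(a)  # reused output, so numexpr allocates no result array
    env = {"a": a, "b": b}

    previous = ne.set_num_threads(nthreads)  # returns the old setting
    try:
        t_np = timeit.repeat(lambda: a**2 + b**2 - 2*a*b,
                             repeat=repeat, number=number)
        t_ne = timeit.repeat(lambda: ne.evaluate(EXPR, local_dict=env, out=out),
                             repeat=repeat, number=number)
    finally:
        ne.set_num_threads(previous)  # restore the thread count
    return ratio_stats(t_np, t_ne)
```

Calling measure(n, k) over a range of array sizes n and thread counts k (from 1 up to the number of cores) gives one (median, best, worst) triple per plotted point.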

Overall, Numexpr is a powerful Python module that speeds up array operations while reducing their memory requirements. Thanks to its developers! A list of supported functions is available on the Numexpr website. By the way, PyTables makes good use of Numexpr; I might write about that soon.