Basic parallelization
In this tutorial, we take a simple Python function and show how to parallelize it so that it runs on the GPU. The code for this example is in examples/hello.py in the ParPy repository. To run it, execute python examples/hello.py from the root directory of the repository.
Elementwise addition
Assume we have an implementation of elementwise addition in Python that we want to run on the GPU:
def elemwise_add(x, y, out, N):
    for i in range(N):
        out[i] = x[i] + y[i]
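Before annotating the function, it can be useful to confirm that the sequential version behaves as expected on the CPU. The following is a minimal sketch using only NumPy; the small input size and the input values are chosen here for illustration and are not part of the ParPy example.

```python
import numpy as np


def elemwise_add(x, y, out, N):
    # Sequential reference: one addition per loop iteration.
    for i in range(N):
        out[i] = x[i] + y[i]


# Small illustrative inputs (values chosen for this sketch).
N = 8
x = np.arange(N, dtype=np.float32)
y = np.ones(N, dtype=np.float32)
out = np.empty_like(x)

elemwise_add(x, y, out, N)

# The loop should agree with NumPy's vectorized addition.
assert np.array_equal(out, x + y)
```

Having a CPU result like this gives us a baseline to compare against once the function runs on the GPU.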
We have to annotate this function to be able to parallelize it. An annotated version of this function is shown below:
import parpy
@parpy.jit
def elemwise_add(x, y, out, N):
    parpy.label('outer')
    for i in range(N):
        out[i] = x[i] + y[i]
To use the ParPy features, we first need to import the parpy package. We decorate the function using @parpy.jit to indicate that the function should be just-in-time (JIT) compiled when it is called. Finally, to be able to parallelize the function, we use parpy.label to associate the label outer with the subsequent for-loop. When we invoke the function, we can refer to this label to control how much parallelism to use in the for-loop.
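The labeled loop is a natural candidate for parallelization because each iteration reads and writes only index i, so no iteration depends on another. The sketch below illustrates this property in plain Python by running the same loop body in a shuffled order; it does not show how ParPy executes the loop, and the helper elemwise_add_in_order is a hypothetical name introduced only for this illustration.

```python
import numpy as np


def elemwise_add_in_order(x, y, out, order):
    # Hypothetical helper: same loop body as elemwise_add, but the
    # iterations are visited in the order given by the caller.
    for i in order:
        out[i] = x[i] + y[i]


N = 16
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.standard_normal(N).astype(np.float32)
y = rng.standard_normal(N).astype(np.float32)

forward = np.empty_like(x)
shuffled = np.empty_like(x)
elemwise_add_in_order(x, y, forward, range(N))
elemwise_add_in_order(x, y, shuffled, rng.permutation(N))

# Iteration order does not affect the result, because every
# iteration touches a distinct element of the output.
assert np.array_equal(forward, shuffled)
```

A loop whose iterations did depend on each other, such as a running sum, would not pass this check in general and would need more care before being parallelized.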
Before calling the function, we need to allocate its input and output data. We use NumPy to allocate arrays for testing our implementation:
import numpy as np
N = 1024
x = np.random.randn(N).astype(np.float32)
y = np.random.randn(N).astype(np.float32)
out = np.empty_like(x)
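If reproducible test inputs are preferred, the same allocation can be written with a seeded NumPy generator. This is an optional variation rather than what examples/hello.py does, and the seed value is an arbitrary choice for this sketch.

```python
import numpy as np

N = 1024
rng = np.random.default_rng(seed=42)  # arbitrary seed for repeatable inputs
x = rng.standard_normal(N).astype(np.float32)
y = rng.standard_normal(N).astype(np.float32)

# np.empty_like allocates without initializing: out has the shape and
# dtype of x, but its contents are undefined until the function writes them.
out = np.empty_like(x)

# All three arrays are 1-D, of length N, and single precision.
assert x.shape == y.shape == out.shape == (N,)
assert x.dtype == y.dtype == out.dtype == np.float32
```

Because out starts uninitialized, a later comparison against x + y only passes if the function actually wrote every element.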
When calling a JIT-compiled function, we need to specify how it should be parallelized. Parallelization is controlled via the labels declared in the function body. Each call takes an object containing compilation options, which includes a parallel specification: a dictionary whose keys are labels and whose values define how to parallelize each label. We use the parpy.par function to construct a default compiler options object with a given parallel specification p:
p = {'outer': parpy.threads(N)}
opts = parpy.par(p)
Provided that a supported backend has been properly set up, we are now ready to call the function. We pass the compiler options using the opts keyword argument:
elemwise_add(x, y, out, N, opts=opts)
assert np.allclose(out, x + y, atol=1e-5)
print("Test OK")