Python: Multiprocessing to Boost Up Preprocessing Speed


In today’s world, data is ubiquitous. Not only is the quantity of data growing exponentially, but individual datasets keep getting larger as well. Processing such big data often becomes a bottleneck when running a deep learning model. Here, I would like to discuss how to speed up preprocessing by running the same task in parallel.

In this article, a brief introduction to multiprocessing is laid out, followed by a case study of implementing multiprocessing to slice a whole slide image into patches.

Before I go any deeper, I would like to briefly talk about how Python manages incoming data.

Python has a global interpreter lock (GIL), a mutex that allows only one thread to execute Python bytecode at a time. The lock exists because CPython’s memory management (reference counting) is not thread-safe, so it protects the interpreter’s internal state from race conditions. The downside is that only one thread can make progress at any given moment, which prevents a pure-Python program from utilizing multiple CPU cores at full capacity.

Check out this link if you want to learn more about GIL and how it operates.

Fortunately, Python 3 ships with a built-in package for launching parallel tasks: concurrent.futures. The package comes with two options for asynchronous execution: a ThreadPoolExecutor and a ProcessPoolExecutor.

Simply put, threading uses a pool of threads and processing uses a pool of processes to execute calls asynchronously. The key difference is that threads run in the same memory space, while each process gets its own. Because of the GIL, threads mainly help with I/O-bound work, whereas processes can run CPU-bound work truly in parallel; the choice between the two really depends on the task you are trying to accomplish.
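As a minimal sketch of the two options (the `square` function and the worker count are illustrative stand-ins, not part of the case study), switching between the two executors is a one-line change:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def square(x):
    # stand-in for a CPU-bound preprocessing step
    return x * x

def run_parallel(values, use_processes=True):
    # Processes have separate memory and sidestep the GIL, so they suit
    # CPU-bound work; threads share memory and suit I/O-bound work.
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=4) as executor:
        # map applies the function to every item, preserving input order
        return list(executor.map(square, values))

if __name__ == "__main__":
    print(run_parallel(range(5)))  # [0, 1, 4, 9, 16]
```

The `__main__` guard matters for ProcessPoolExecutor: on platforms that spawn fresh interpreters for each worker, the module is re-imported, and unguarded top-level code would run again in every worker.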

Case Study

A whole slide image (WSI), a digital image of biopsy tissue, is heavy. What I really mean is that it requires a large amount of memory to manipulate in place, taking up more than 16 GB when converted to float.

There are many reasons to train a model on patches rather than on the whole image itself. To mention a few, the amount of training data increases significantly, and the model can inspect the configuration of the cells much more closely, resulting in more detailed output.

For the above reasons, I decided to slice the WSI… but generating the patches took forever. This made it the perfect place to implement multiprocessing, using concurrent.futures, to boost the speed.

Figure 1. Pooling executor pseudocode

When a multiprocessing executor is launched, as you can see in Figure 1, we can simply map the desired function onto an iterable sequence (mapping is the key to injecting parallelism into the program).

A suggestion: when formulating the iterable sequence, it may be better to yield values than to return them. yield turns a regular Python function into a generator, which produces a series of values over time without storing the entire sequence in memory. So if the iterable sequence holds a lot of information, yielding may be the better path to take. See Figure 2 for an example of generating iterable sequences.

Figure 2. Pseudocode of generating iterable sequences
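A generator for such a task list might look like the sketch below; the grid-of-corner-locations layout is an assumption for illustration, not the exact structure used in the case study:

```python
def patch_locations(slide_width, slide_height, patch_size):
    # Yield one (x, y) patch corner at a time instead of materializing
    # the full list of locations in memory up front.
    for y in range(0, slide_height, patch_size):
        for x in range(0, slide_width, patch_size):
            yield (x, y)

# consumed lazily, e.g. by executor.map
tasks = patch_locations(100_000, 80_000, 512)
```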

For this case study, the function is responsible for generating patches from the WSI after the preprocessing and filtering procedures. The iterable sequence is a list of tuples holding each patch’s location and the information needed to generate it. Once the function and the iterable are properly set up, we can start the multiprocessing executor: each incoming item is handed to an available worker and the function is applied to it, generating multiple patches at once.
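Put together, the pattern might look like this sketch (the `make_patch` name and the tuple layout are hypothetical; real code would read the region from the slide, preprocess, filter, and save it):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def make_patch(task):
    # unpack the patch location and the metadata needed to generate it
    x, y, size = task
    # ... read the (x, y) region from the WSI, preprocess, filter ...
    return (x, y, size)  # stand-in for the saved patch

def generate_patches(tasks, pool_cls=ProcessPoolExecutor):
    # each available worker picks up the next task, so several patches
    # are produced concurrently
    with pool_cls() as executor:
        return list(executor.map(make_patch, tasks))
```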

To help visualize the inner workings of multiprocessing, see Figure 3. The executor loads each worker with a patch location and the information needed for preprocessing and filtering. This allows multiple patches to be generated in the same amount of time it previously took to generate a single patch.

Figure 3. Inner workings of multiprocessing.

The effect of multiprocessing was quite surprising. Previously, it took an average of 8 minutes to fully composite the patches after preprocessing and filtering. Once multiprocessing was applied, runtime dropped by 57%, from an average of 8 minutes down to about 3.

Figure 4. Patch generation runtime.
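Runtime comparisons like this are easy to reproduce with a small wall-clock helper (a generic sketch; `generate_serial` and `generate_parallel` in the comments are hypothetical names, not the author’s benchmark code):

```python
import time

def timed(fn, *args, **kwargs):
    # return the function's result together with its wall-clock runtime
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# e.g. compare the serial and multiprocessing versions:
# _, serial_s = timed(generate_serial, tasks)
# _, parallel_s = timed(generate_parallel, tasks)
# print(f"speedup: {serial_s / parallel_s:.1f}x")
```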

Overall, multiprocessing is a fast and efficient approach when used appropriately. If you are dealing with large data, I definitely recommend sitting back and pondering ways to instill parallelism into your algorithm to make your code faster and cooler :D. It is pretty satisfying to watch all your CPU cores working at full capacity.

Thanks for reading! Any comments and questions are welcome.
