Run Python Function Faster


Introduction

To make Python code run faster, we can use multiprocessing to map a function over a list. We will also use the helper built into the tqdm library to show a progress bar while the code runs across multiple processes.
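For orientation, here is a minimal sketch of the same idea using only the standard library, without a progress bar. As far as I know, tqdm's process_map is a thin wrapper around concurrent.futures.ProcessPoolExecutor.map, so the structure should look familiar. The worker simple_tokenize, the helper tokenize_all and the sample sentences are placeholders made up for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def simple_tokenize(sentence):
    """Hypothetical stand-in worker: split a sentence on whitespace."""
    return sentence.split()


def tokenize_all(sentences, max_workers=4):
    """Apply simple_tokenize to every sentence using a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs.
        return list(executor.map(simple_tokenize, sentences))


if __name__ == '__main__':
    # The guard stops worker processes from re-running this block
    # when they import the module.
    print(tokenize_all(['map a function', 'over a list']))
```

The worker has to be defined at module level so that it can be pickled and sent to the worker processes.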

Main Code

In this example, we want to speed up the tokenisation of some text, using at most 4 worker processes.

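from nltk import word_tokenize
from tqdm.contrib.concurrent import process_map

# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded beforehand
# (via nltk.download; the resource name depends on the NLTK version).


def main():
    sentences = [
        'Due to multiprocessing, the estimation time could be unstable.',
        'Context manager for Pool is only available in Python 3.',
        'This will redraw the bar at each step on a new line.',
        'We can use following code as suggested in.'
    ]
    # Tokenise each sentence in a pool of up to 4 worker processes while
    # tqdm shows a progress bar; results keep the order of the inputs.
    results = process_map(word_tokenize, sentences, max_workers=4)
    print(results)


# The guard lets worker processes import this module without re-running main().
if __name__ == '__main__':
    main()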

Output:

[
    ['Due', 'to', 'multiprocessing', ',', 'the', 'estimation', 'time', 'could', 'be', 'unstable', '.'],
    ['Context', 'manager', 'for', 'Pool', 'is', 'only', 'available', 'in', 'Python', '3', '.'],
    ['This', 'will', 'redraw', 'the', 'bar', 'at', 'each', 'step', 'on', 'a', 'new', 'line', '.'],
    ['We', 'can', 'use', 'following', 'code', 'as', 'suggested', 'in', '.']
]
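
With only four short sentences, the cost of starting worker processes will likely outweigh any gain, so it is worth timing both approaches on your own data. The sketch below is one rough way to do that with the standard library. The worker count_words, the helper names, the made-up input list and the chunksize of 256 are all assumptions for illustration, not measurements or settings from this article.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def count_words(sentence):
    """Hypothetical stand-in for an expensive per-item function."""
    return len(sentence.split())


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_sequential(items):
    """Baseline: process every item in the current process."""
    return [count_words(item) for item in items]


def run_parallel(items, max_workers=4):
    """Process items in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # chunksize sends items to the workers in batches, which reduces
        # the per-item communication overhead on long lists.
        return list(executor.map(count_words, items, chunksize=256))


if __name__ == '__main__':
    items = ['a short made-up sentence for timing'] * 10_000
    seq_result, seq_time = timed(run_sequential, items)
    par_result, par_time = timed(run_parallel, items)
    assert seq_result == par_result  # same values, in the same order
    print(f'sequential: {seq_time:.3f}s, parallel: {par_time:.3f}s')
```

If the parallel version is not clearly faster, the per-item work is probably too cheap to justify the extra processes.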

References

  1. https://stackoverflow.com/a/59905309
  2. https://blog.csdn.net/weixin_39274659/article/details/107794635

Author: Yang Wang
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Yang Wang.