Concurrency in Python. A beginners guide to exploiting the… | by David Farrugia | May, 2023


A common example used to illustrate the benefits of multi-threading is the concurrent downloading of multiple images over a network. We can use this example to gain an understanding of how effective multi-threading can be, and how it can be implemented in Python.

import os
import time
from urllib.parse import urlparse
from urllib.request import urlretrieve
from typing import List
from numpy import round

IMGS_URL_LIST =
['https://dl.dropboxusercontent.com/s/2fu69d8lfesbhru/pexels-photo-48603.jpeg',
'https://dl.dropboxusercontent.com/s/zch88m6sb8a7bm1/pexels-photo-134392.jpeg',
'https://dl.dropboxusercontent.com/s/lsr6dxw5m2ep5qt/pexels-photo-135130.jpeg',
'https://dl.dropboxusercontent.com/s/6xinfm0lcnbirb9/pexels-photo-167300.jpeg',
'https://dl.dropboxusercontent.com/s/2dp2hli32h9p0y6/pexels-photo-167921.jpeg',
'https://dl.dropboxusercontent.com/s/fjb1m3grcrceqo2/pexels-photo-173125.jpeg',
'https://dl.dropboxusercontent.com/s/56u8p4oplagc4bp/pexels-photo-185934.jpeg',
'https://dl.dropboxusercontent.com/s/2s1x7wz4sdvxssr/pexels-photo-192454.jpeg',
'https://dl.dropboxusercontent.com/s/1gjphqnllzm10hh/pexels-photo-193038.jpeg',
'https://dl.dropboxusercontent.com/s/pcjz40c8pxpy057/pexels-photo-193043.jpeg',
'https://dl.dropboxusercontent.com/s/hokdfk7y8zmwe96/pexels-photo-207962.jpeg',
'https://dl.dropboxusercontent.com/s/k2tk2co7r18juy7/pexels-photo-247917.jpeg',
'https://dl.dropboxusercontent.com/s/m4xjekvqk4rksbx/pexels-photo-247932.jpeg',
'https://dl.dropboxusercontent.com/s/znmswtwhcdbpc10/pexels-photo-265186.jpeg',
'https://dl.dropboxusercontent.com/s/jgb6n4esquhh4gu/pexels-photo-302899.jpeg',
'https://dl.dropboxusercontent.com/s/rjuggi2ubc1b3bk/pexels-photo-317156.jpeg',
'https://dl.dropboxusercontent.com/s/cpaog2nwplilrz9/pexels-photo-317383.jpeg',
'https://dl.dropboxusercontent.com/s/16x2b6ruk18gji5/pexels-photo-320007.jpeg',
'https://dl.dropboxusercontent.com/s/xqzqzjkcwl52en0/pexels-photo-322207.jpeg',
'https://dl.dropboxusercontent.com/s/frclthpd7t8exma/pexels-photo-323503.jpeg',
'https://dl.dropboxusercontent.com/s/7ixez07vnc3jeyg/pexels-photo-324030.jpeg',
'https://dl.dropboxusercontent.com/s/1xlgrfy861nyhox/pexels-photo-324655.jpeg',
'https://dl.dropboxusercontent.com/s/v1b03d940lop05d/pexels-photo-324658.jpeg',
'https://dl.dropboxusercontent.com/s/ehrm5clkucbhvi4/pexels-photo-325520.jpeg',
'https://dl.dropboxusercontent.com/s/l7ga4ea98hfl49b/pexels-photo-333529.jpeg',
'https://dl.dropboxusercontent.com/s/rleff9tx000k19j/pexels-photo-341520.jpeg'
]

def download_images(img_url_list: List[str]) -> None:
# validate inputs
if not img_url_list:
return
os.makedirs('images', exist_ok=True)

# get time in seconds
start = time.perf_counter()

# for every url in our list, we parse the url and download its contents
for img_num, url in enumerate(img_url_list):
urlretrieve(url, f'images{os.path.sep}{img_num+1}')

print(f"Retrieved {len(img_url_list)} images took {round(time.perf_counter() - start, 2)} seconds")

download_images(IMGS_URL_LIST)

In the single-threaded script above, we set up a function (download_images) that retrieves some images publicly hosted on Dropbox.

This script took 22.06 seconds to download the 26 images.

To improve this, we can shift our programming mindset and use a different approach when building a script to use concurrency.

Instead of writing a function that loops through every single URL and retrieves its contents, we can split the logic into two main functions: the target and the runner functions.

The role of the target function is to encapsulate the logic needed to process a single URL.

As we want to have a thread for every URL, we need to provide the individual threads with the knowledge of how to process the URL.

The runner function is then used to trigger a new thread for every URL and store their results.

In the runner function, we need to instruct the main thread to create and start a child thread for every URL.

We can do this by looping through the passed URLs and creating and starting a new thread in Python as follows:

from threading import Thread

t = Thread(target=<TARGET_FUNCTION>, args=(<SOME_ARGS>))
t.start()

In our example, we want to wait for all images to download before continuing with the program. To do this, we can instruct the main thread to wait for all child threads to complete before proceeding with the program execution by calling the join() function. This function joins the child thread with the main thread and the main thread will not proceed until the execution of all joined threads completes.

from threading import Thread

# store our threads
threads = []

t = Thread(target=<TARGET_FUNCTION>, args=(<SOME_ARGS>))
t.start()
threads.append(t)

# join all child threads to the main thread
for thread in threads:
thread.join()

By using join() function we can ensure that our program waits for all the child threads to complete before continuing with the next steps of the script.

Adapting the original script to use multi-threading would look something like this:

import threading
import os
import time
from urllib.parse import urlparse
from urllib.request import urlretrieve
from typing import List
from numpy import round

IMGS_URL_LIST =
['https://dl.dropboxusercontent.com/s/2fu69d8lfesbhru/pexels-photo-48603.jpeg',
'https://dl.dropboxusercontent.com/s/zch88m6sb8a7bm1/pexels-photo-134392.jpeg',
'https://dl.dropboxusercontent.com/s/lsr6dxw5m2ep5qt/pexels-photo-135130.jpeg',
'https://dl.dropboxusercontent.com/s/6xinfm0lcnbirb9/pexels-photo-167300.jpeg',
'https://dl.dropboxusercontent.com/s/2dp2hli32h9p0y6/pexels-photo-167921.jpeg',
'https://dl.dropboxusercontent.com/s/fjb1m3grcrceqo2/pexels-photo-173125.jpeg',
'https://dl.dropboxusercontent.com/s/56u8p4oplagc4bp/pexels-photo-185934.jpeg',
'https://dl.dropboxusercontent.com/s/2s1x7wz4sdvxssr/pexels-photo-192454.jpeg',
'https://dl.dropboxusercontent.com/s/1gjphqnllzm10hh/pexels-photo-193038.jpeg',
'https://dl.dropboxusercontent.com/s/pcjz40c8pxpy057/pexels-photo-193043.jpeg',
'https://dl.dropboxusercontent.com/s/hokdfk7y8zmwe96/pexels-photo-207962.jpeg',
'https://dl.dropboxusercontent.com/s/k2tk2co7r18juy7/pexels-photo-247917.jpeg',
'https://dl.dropboxusercontent.com/s/m4xjekvqk4rksbx/pexels-photo-247932.jpeg',
'https://dl.dropboxusercontent.com/s/znmswtwhcdbpc10/pexels-photo-265186.jpeg',
'https://dl.dropboxusercontent.com/s/jgb6n4esquhh4gu/pexels-photo-302899.jpeg',
'https://dl.dropboxusercontent.com/s/rjuggi2ubc1b3bk/pexels-photo-317156.jpeg',
'https://dl.dropboxusercontent.com/s/cpaog2nwplilrz9/pexels-photo-317383.jpeg',
'https://dl.dropboxusercontent.com/s/16x2b6ruk18gji5/pexels-photo-320007.jpeg',
'https://dl.dropboxusercontent.com/s/xqzqzjkcwl52en0/pexels-photo-322207.jpeg',
'https://dl.dropboxusercontent.com/s/frclthpd7t8exma/pexels-photo-323503.jpeg',
'https://dl.dropboxusercontent.com/s/7ixez07vnc3jeyg/pexels-photo-324030.jpeg',
'https://dl.dropboxusercontent.com/s/1xlgrfy861nyhox/pexels-photo-324655.jpeg',
'https://dl.dropboxusercontent.com/s/v1b03d940lop05d/pexels-photo-324658.jpeg',
'https://dl.dropboxusercontent.com/s/ehrm5clkucbhvi4/pexels-photo-325520.jpeg',
'https://dl.dropboxusercontent.com/s/l7ga4ea98hfl49b/pexels-photo-333529.jpeg',
'https://dl.dropboxusercontent.com/s/rleff9tx000k19j/pexels-photo-341520.jpeg'
]

# this is our target function
def download_image(url: str, img_num: int) -> None:
urlretrieve(url, f'images{os.path.sep}{img_num+1}')

def download_images(img_url_list: List[str]) -> None:
# validate inputs
if not img_url_list:
return
os.makedirs('images', exist_ok=True)

# get time in seconds
start = time.perf_counter()

# create a list to store all of our threads
threads = []

# for every url in our list, we parse the url and download its contents
for img_num, url in enumerate(img_url_list):
# create a new thread
t = threading.Thread(target=download_image, args=(url, img_num))

# start the new thread
t.start()

# add the new thread to our list of threads
threads.append(t)

# here we instruct the main thread to wait for all child threads to complete before proceeding
for thread in threads:
thread.join()

print(f"Retrieved {len(img_url_list)} images took {round(time.perf_counter() - start, 2)} seconds")

download_images(IMGS_URL_LIST)

This script retrieves the same 26 images in only 2.92 seconds! That is only 13.24% of the time taken using single-threaded code.

Impressive isn’t it?



Source link

Leave a Comment