Run Things in the Background with Julia | by Bence Komarniczky | May, 2023

Stop waiting and start multi-threading

Even though Julia is one of the fastest languages out there, sometimes it can take time for things to execute. If you’re a data scientist or analyst using Julia, maybe you want to send computation off to a server, wait for it to finish, and then do something with the results.

But waiting is boring.

When you’re in the middle of your work, full of ideas and enthusiasm to deliver something interesting, you want to keep pounding that keyboard to find something else.

Let me show you a simple technique in Julia for dispatching computation to another thread, so you can get on with your work.

As I said before, Julia is fast. As a modern language, it is also built with parallelism in mind, so using those extra cores in your machine is easy once you know how.

First of all, we must make sure we start a Julia instance with multiple threads:

julia -t 4

This will start Julia using 4 threads. We can confirm this by asking for the number of threads:

julia> using Base.Threads

julia> Threads.nthreads()
4
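
If you want to see for yourself that work really lands on one of those threads, here is a minimal sketch. The variable name `spawned_on` is my own, and which thread picks up the task is up to the scheduler, so the number printed will vary. As an aside, the thread count can also be set with the `JULIA_NUM_THREADS` environment variable before launching Julia.

```julia
using Base.Threads

println("threads available: ", nthreads())

# threadid() reports the thread the current task is running on.
# Spawn a tiny task and ask it where it ran.
spawned_on = fetch(@spawn threadid())
println("spawned task ran on thread ", spawned_on)
```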

Making a slow function

Now that we have more threads it’s time to see this magic in action. But we need something to run for a while for this to make sense. I assume if you’re reading this article, you already have something in mind, but because I prefer to have complete examples in my articles, I’ll write a little function here to entertain myself.

This “slow” function could be a call to build an ML model, run some SQL-like queries on a database or fetch some data from cloud storage. Use your imagination and go wild!

julia> function collatz(n, i=0)
           if n == 1
               i                        # reached 1: return the step count
           elseif iseven(n)
               collatz(n / 2, i + 1)    # even: halve
           else
               collatz(3n + 1, i + 1)   # odd: triple and add one
           end
       end
collatz (generic function with 2 methods)

julia> collatz(989345275647)

julia> averageSteps(n) = sum(i -> collatz(i) / n, 1:n)
averageSteps (generic function with 1 method)

If you’re curious about what the above is about and why I picked 989,345,275,647, read the Wikipedia page on the Collatz conjecture.
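
The recursive version works, but because it calls `collatz(n / 2, ...)` it drifts into floating-point numbers after the first even step. If that bothers you, here is a sketch of an iterative, integer-only variant. The name `collatz_steps` is hypothetical and not part of the example above, and I am assuming the values stay within `Int` range, which is not guaranteed for very large starting numbers.

```julia
# Count the Collatz steps needed to reach 1, using integers only.
# Assumption: 3n + 1 never overflows Int for the inputs you pass in.
function collatz_steps(n::Integer)
    n >= 1 || throw(ArgumentError("n must be a positive integer"))
    steps = 0
    while n != 1
        n = iseven(n) ? n ÷ 2 : 3n + 1   # ÷ is integer division
        steps += 1
    end
    return steps
end

println(collatz_steps(27))
```

A loop also avoids building a deep call stack for long trajectories.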

Since we have Threads in our namespace, we can use the @spawn macro to send computation to another thread. This means that we get our REPL back immediately and we can continue working as before.

julia> res = @spawn averageSteps(1e7)
Task (runnable) @0x000000015d061f90

julia> 2^5 + 12
44

julia> fetch(res)

Ignore my lack of imagination; I just couldn’t be bothered to come up with something more sophisticated after spawning.

Basically, what’s happening here is that @spawn returns a Task. The scheduler automatically dispatches this task to a free thread, which works on it in the background while you write more code and ask more questions in the meantime. Once you need the result, you collect it with fetch, which waits for the Task to finish and returns its value.
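
Here is a small, self-contained sketch of that spawn, carry on, fetch cycle. The function `slow_square` is a made-up stand-in for your real slow job, and `istaskdone` is a non-blocking way to peek at a task without waiting for it.

```julia
using Base.Threads

# Hypothetical stand-in for a slow job (a model fit, a query, a download).
function slow_square(x)
    sleep(0.5)       # pretend to work for half a second
    return x^2
end

task = @spawn slow_square(7)   # returns immediately with a Task

# Non-blocking check: does not wait for the task.
println("finished yet? ", istaskdone(task))

result = fetch(task)           # blocks until the task has finished
println("result: ", result)
```

If the spawned function throws, fetch rethrows the failure wrapped in a TaskFailedException, so errors in the background do not go unnoticed once you ask for the result.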

One way to show that this indeed works is to show some timings.

First, we’ll run our function on the current thread and measure the time it takes. Then we’ll spawn a Task and finally we’ll spawn and immediately wait for the results.

julia> @time averageSteps(1e7)
16.040698 seconds

julia> @time res = @spawn averageSteps(1e7)
0.009290 seconds (31.72 k allocations: 1.988 MiB)
Task (runnable) @0x000000015d179f90

julia> @time fetch(@spawn averageSteps(1e7))
16.358641 seconds (24.31 k allocations: 1.553 MiB, 0.06% compilation time)

As you can see, our function takes about 16 s to run. If we spawn it instead, we get a Task back almost immediately (about 0.009 s) and the REPL is free again. Spawning is not free, though: in the final row, spawning and immediately fetching took about 0.3 s longer than running the computation directly on the main thread.
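
A single spawned task does not make the computation itself any faster; it only frees up your REPL. If you also want to use the other threads, one option is to split the work into several tasks and combine the results. This is only a sketch under my own assumptions: `steps` and `average_steps_parallel` are hypothetical names, the helper is an integer-only loop rather than the recursive function above, and I have not timed it here.

```julia
using Base.Threads

# Integer-only Collatz step counter (assumed helper for this sketch).
function steps(n::Int)
    count = 0
    while n != 1
        n = iseven(n) ? n ÷ 2 : 3n + 1
        count += 1
    end
    return count
end

# Split 1:n into roughly one chunk per thread, spawn a task per chunk,
# then fetch and add up the partial sums.
function average_steps_parallel(n::Int; ntasks::Int = nthreads())
    chunks = Iterators.partition(1:n, cld(n, ntasks))
    tasks = [@spawn sum(steps, chunk) for chunk in chunks]
    return sum(fetch, tasks) / n
end

println(average_steps_parallel(1_000))
```

The chunks here are equal in length, not in cost, so some tasks may finish earlier than others; more, smaller chunks would balance the load better at the price of extra scheduling overhead.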

Hopefully, this little trick will enlighten newcomers to Julia about the awesome superpowers a modern, multi-threaded language can give them. If you enjoyed reading my ramble about this topic, give me a 👏 or 👏 👏.
