How to Make Your Python Packages Really Fast with Rust | by Isaac Harris-Holt | May, 2023

Goodbye, slow code

Python is… slow. This is not a revelation. Lots of dynamic languages are. In fact, Python is so slow that many authors of performance-critical Python packages have turned to another language — C. But C is not fun, and C has enough foot guns to cripple a centipede.

Introducing Rust.

Rust is a memory-efficient language with no runtime or garbage collector. It’s incredibly fast, super reliable, and has a really great community around it. Oh, and it’s also super easy to embed into your Python code thanks to excellent tools like PyO3 and maturin.

Sound exciting? Great! Because I’m about to show you how to create a Python package in Rust step-by-step. And if you don’t know any Rust, don’t worry — we’re not going to be doing anything too crazy, so you should still be able to follow along. Are you ready? Let’s oxidise this snake.


Before we get started, you’re going to need to install Rust on your machine. You can do that by following the official Rust installation instructions. I would also recommend creating a virtual environment that you can use for testing your Rust package.

Script overview

Here’s a script that, given a number n, will calculate the nth Fibonacci number 100 times and time how long it takes to do so.
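The original listing is not reproduced in this copy, so what follows is only a minimal sketch of what such a script could look like. The names fib and time_fib, the base case (fib(0) = 0, fib(1) = 1) and the default n = 10 are assumptions, not necessarily the author's exact code.

```python
import timeit


def fib(n: int) -> int:
    """Naive recursive Fibonacci (assumed convention: fib(0) = 0, fib(1) = 1)."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


def time_fib(n: int, runs: int = 100) -> float:
    """Return the total number of seconds taken to compute fib(n) `runs` times."""
    return timeit.timeit(lambda: fib(n), number=runs)


if __name__ == "__main__":
    n = 10  # assumed default; change this to try other inputs
    print(f"fib({n}) = {fib(n)}")
    print(f"Python: {time_fib(n):.6f}s for 100 runs")
```

Because the recursion recomputes the same values over and over, the running time grows roughly exponentially with n, which makes it a convenient stress test.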

This is a very naive, totally unoptimised function, and there are plenty of ways to make it faster using Python alone, but I won’t go into those today. Instead, we’re going to take this code and use it to create a Python package in Rust.

Maturin setup

The first step is to install maturin, which is a build system for building and publishing Rust crates as Python packages. You can do that with pip install maturin.

Next, create a directory for your package. I’ve called mine fibbers. The final setup step is to run maturin init from your new directory. At this point, you’ll be prompted to select which Rust bindings to use. Select pyo3.

Now, if you take a look at your fibbers directory, you’ll see a few files. Maturin has created some config files for us, namely a Cargo.toml and pyproject.toml. The Cargo.toml file is configuration for Rust’s build tool, cargo, and contains some default metadata about the package, some build options and a dependency for pyo3. The pyproject.toml file is fairly standard, but it’s set to use maturin as the build backend.

Maturin will also create a GitHub Actions workflow for releasing your package. It’s a small thing, but makes life so much easier when you’re maintaining an open source project. The file we mostly care about, however, is the file in the src directory.

Here’s an overview of the resulting file structure.

├── .github/
│   └── workflows/
│       └── CI.yml
├── .gitignore
├── Cargo.toml
├── pyproject.toml
└── src/

Writing the Rust

Maturin has already created the scaffold of a Python module for us using the PyO3 bindings we mentioned earlier.

The main parts of this code are the sum_as_string function, which is marked with the pyfunction attribute, and the fibbers function, which represents our Python module. All the fibbers function is really doing is registering our sum_as_string function with our fibbers module.

If we installed this now, we’d be able to call fibbers.sum_as_string() from Python, and it would all work as expected.

However, what I’m going to do first is replace the sum_as_string function with our fib function.

This has exactly the same implementation as the Python we wrote earlier: it takes an unsigned integer n and returns the nth Fibonacci number. I’ve also registered our new function with the fibbers module, so we’re good to go!

Benchmarking our function

To install our fibbers package, all we have to do is run maturin develop in our terminal. This will download the dependencies, compile our Rust package and install it into our virtual environment.

Now, back in our Python script, we can import fibbers, print out the result of fibbers.fib() and then add a timeit case for our Rust implementation.
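The updated script is not included in this copy, so here is a rough sketch of how the comparison might be wired up. It assumes the fibbers extension exposes fib(n) as described above. The benchmark helper and the ImportError fallback are illustrative additions, not part of the original script.

```python
import timeit


def fib(n: int) -> int:
    """Pure-Python baseline using the naive recursion described earlier."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


try:
    import fibbers  # the Rust extension installed by `maturin develop`
except ImportError:
    fibbers = None  # extension not built in this environment


def benchmark(n: int, runs: int = 100) -> dict:
    """Time every available implementation of fib(n) over `runs` calls."""
    results = {"python": timeit.timeit(lambda: fib(n), number=runs)}
    if fibbers is not None:
        results["rust"] = timeit.timeit(lambda: fibbers.fib(n), number=runs)
    return results


if __name__ == "__main__":
    n = 10  # assumed input; the article also tries 20 and 30
    for name, seconds in benchmark(n).items():
        print(f"{name}: {seconds:.6f}s for 100 runs")
```

If the extension has not been built yet, only the Python timing is printed, so the script stays runnable either way.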

If we run this now for the 10th Fibonacci number, you can see that our Rust function is about 5 times faster than Python, despite the fact we’re using an identical implementation!

If we run for the 20th and 30th fib numbers, we can see that Rust gets up to being about 15 times faster than Python.

Figure: 20th fib number results.
Figure: 30th fib number results.

But what if I told you that we’re not even at maximum speed?

You see, by default, maturin develop will build the dev version of your Rust crate, which will forgo many optimisations to reduce compile time, meaning the program isn’t running as fast as it could. If we head back into our fibbers directory and run maturin develop again, but this time with the --release flag, we’ll get the optimised production-ready version of our binary.

If we now benchmark our 30th fib number, we see that Rust now gives us a whopping 40 times speed improvement over Python!

Figure: 30th fib number, optimised.

Rust limitations

However, we do have a problem with our Rust implementation. If we try to get the 50th Fibonacci number using fibbers.fib(), you’ll see that we actually hit an overflow error and get a different answer to Python.

Figure: Rust experiences integer overflow.

This is because, unlike Python, Rust has fixed-size integers, and a 32-bit integer isn’t large enough to hold our 50th Fibonacci number.
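To make the size limit concrete, here is a small Python sketch that checks which Fibonacci numbers fit into 32 and 64 unsigned bits. The iterative helper fib_iter and the function largest_index_fitting are hypothetical names introduced for this illustration. The indexing assumes fib(0) = 0 and fib(1) = 1, and the masking step only mimics what wrapping arithmetic would leave behind in a build without overflow checks.

```python
U32_MAX = 2**32 - 1  # largest value a Rust u32 can hold
U64_MAX = 2**64 - 1  # largest value a Rust u64 can hold


def fib_iter(n: int) -> int:
    """Iterative Fibonacci, used here only so that large n finish quickly."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def largest_index_fitting(limit: int) -> int:
    """Return the largest n for which fib(n) <= limit."""
    n = 0
    while fib_iter(n + 1) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    print(fib_iter(50))                     # 12586269025
    print(fib_iter(50) > U32_MAX)           # True: too big for 32 bits
    print(fib_iter(50) & U32_MAX)           # what is left after wrapping to 32 bits
    print(largest_index_fitting(U32_MAX))   # 47
    print(largest_index_fitting(U64_MAX))   # 93
```

Python never hits this limit because its integers grow as needed, which is why the two implementations disagree once the result passes the 32-bit ceiling.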

We can get around this by changing the type in our Rust function from u32 to u64. That uses more memory per value and can be slower on 32-bit targets, and it only postpones the problem: a u64 overflows once we go past the 93rd Fibonacci number. We could also solve it by using a crate like num_bigint, but that’s outside the scope of this article.

Another small limitation is that there’s some overhead to using the PyO3 bindings. You can see that here where I’m just getting the 1st Fibonacci number, and Python is actually faster than Rust thanks to this overhead.

Figure: Python is faster for n=1.

Things to remember

The numbers in this article were not recorded on a perfect machine. The benchmarks were run on my personal machine, and may not reflect real-world problems. Please take care with micro-benchmarks like this one in general, as they are often imperfect and fail to emulate many aspects of real-world programs.
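One inexpensive way to reduce noise in a micro-benchmark is to repeat the whole measurement several times and look at the best and median timings rather than a single run. The sketch below uses timeit.repeat from the standard library. The measure helper and its parameters are assumptions made for illustration, not something the benchmarks above used.

```python
import statistics
import timeit


def fib(n: int) -> int:
    """Naive recursive Fibonacci used as the workload."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


def measure(func, n: int, runs: int = 100, repeats: int = 5) -> dict:
    """Run `runs` calls of func(n), `repeats` times, and summarise the timings."""
    timings = timeit.repeat(lambda: func(n), number=runs, repeat=repeats)
    return {"best": min(timings), "median": statistics.median(timings)}


if __name__ == "__main__":
    summary = measure(fib, 10)
    print(f"best:   {summary['best']:.6f}s")
    print(f"median: {summary['median']:.6f}s")
```

The best time is usually the least disturbed by other activity on the machine, while a large gap between best and median hints that the measurements are noisy.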
