Async for LangChain and LLMs. How to make Langchain chains work with… | by Gabriel Cassimiro | Jul, 2023

So for this, we are going to use a technique called asynchronous calls. To explain why it helps, I will first briefly describe what the code is doing and where it spends most of its time.

In our example, we go through each row of the data frame, extract some information from it, add that information to our prompt, and call the GPT API to get a response. Once the response comes back, we parse it and add it back to the data frame.
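Here is a minimal sketch of this sequential loop that makes the bottleneck concrete. It uses a list of dicts in place of the data frame and a hypothetical `fake_gpt_call` that just sleeps for 0.1 s to imitate waiting on the API. The field names `product` and `review` are also invented for illustration.

```python
import time

def fake_gpt_call(prompt: str, latency: float = 0.1) -> str:
    """Hypothetical stand-in for the GPT API: blocks for `latency` seconds."""
    time.sleep(latency)  # the program sits idle here, like a real network wait
    return f"Answer: {len(prompt)} characters received"

def run_sequential(rows: list[dict]) -> list[dict]:
    for row in rows:
        # 1. Pre-process: extract fields from the row and build the prompt.
        prompt = f"Summarize this review of {row['product']}: {row['review']}"
        # 2. Call the API and wait for the answer (the slow step).
        response = fake_gpt_call(prompt)
        # 3. Post-process: parse the response and store it back on the row.
        row["summary"] = response.removeprefix("Answer: ")
    return rows

rows = [{"product": f"item {i}", "review": "Works well."} for i in range(5)]
start = time.perf_counter()
run_sequential(rows)
elapsed = time.perf_counter() - start
print(f"Sequential: {elapsed:.2f} s")  # roughly 5 calls x 0.1 s each
```

Because each call blocks until its response arrives, the total time grows linearly with the number of rows.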


The main bottleneck here is the GPT API call: our computer sits idle while it waits for the response (about 3 seconds per call). The rest of the steps are fast. They could still be optimized, but that is not the focus of this article.

So instead of waiting idly for each response, what if we sent all the calls to the API at the same time? That way, the total wait would be roughly the time of a single response (the slowest one) instead of the sum of all of them. This is what calling the API asynchronously means.
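To see why sending the calls together helps, here is a small sketch using Python's standard `asyncio` module. The `fake_api_call` coroutine is hypothetical and only sleeps to imitate network latency, so no real API is involved. Awaiting the calls one by one takes about the sum of the latencies, while `asyncio.gather` starts them all and then waits about as long as the slowest one.

```python
import asyncio
import time

async def fake_api_call(i: int, latency: float) -> str:
    """Hypothetical async stand-in for one API request (no network involved)."""
    await asyncio.sleep(latency)  # while this call waits, the event loop can run others
    return f"response {i}"

async def one_by_one(latencies: list[float]) -> list[str]:
    # Each call must finish before the next starts: total ≈ sum of latencies.
    return [await fake_api_call(i, lat) for i, lat in enumerate(latencies)]

async def all_at_once(latencies: list[float]) -> list[str]:
    # All calls are started together, then awaited: total ≈ the slowest latency.
    return await asyncio.gather(*(fake_api_call(i, lat) for i, lat in enumerate(latencies)))

latencies = [0.3, 0.1, 0.2, 0.25]
timings = {}
for strategy in (one_by_one, all_at_once):
    start = time.perf_counter()
    results = asyncio.run(strategy(latencies))
    timings[strategy.__name__] = time.perf_counter() - start
    print(f"{strategy.__name__}: {timings[strategy.__name__]:.2f} s")
```

`asyncio.gather` returns the results in the same order as the calls were passed in, not in the order they finished. This ordering makes it easy to match each response back to its row.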


This way, we still do the pre-processing and post-processing sequentially, but each API call no longer has to wait for the previous response to come back before it is sent.

So here is the code for the Async chains:

In this code, we use Python's async and await syntax. LangChain also provides a way to run a chain asynchronously, the arun() function. We first process each row sequentially (which could be optimized) and create multiple “tasks” that await the API responses concurrently. Then we process the responses into the final desired format sequentially (which could also be optimized).
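The three-phase pattern described above can be sketched as follows, under some assumptions. `FakeChain` is a hypothetical stand-in that only mimics the shape of awaiting a chain's `arun()`, so the example runs without an API key. With a real LangChain chain you would await `chain.arun(...)` instead, although newer LangChain releases deprecate `arun()` in favor of `ainvoke()`. The `title` and `review` fields are invented for illustration.

```python
import asyncio

class FakeChain:
    """Hypothetical stand-in for a LangChain chain; it calls no model."""

    async def arun(self, text: str) -> str:
        await asyncio.sleep(0.1)  # simulated API latency
        return f"summary: {text.lower()}"

async def process_rows(rows: list[dict], chain: FakeChain) -> list[dict]:
    # 1. Pre-process each row sequentially and schedule one task per row.
    tasks = []
    for row in rows:
        prompt_input = f"{row['title']} - {row['review']}"
        tasks.append(asyncio.create_task(chain.arun(prompt_input)))
    # 2. Wait for all API calls concurrently; gather keeps results in input order.
    responses = await asyncio.gather(*tasks)
    # 3. Post-process sequentially and write results back to the rows.
    for row, response in zip(rows, responses):
        row["summary"] = response.removeprefix("summary: ")
    return rows

rows = [{"title": f"Item {i}", "review": "Great VALUE"} for i in range(10)]
result = asyncio.run(process_rows(rows, FakeChain()))
print(result[0]["summary"])  # item 0 - great value
```

Creating every task before the first `await` is what lets the requests overlap. Awaiting `chain.arun()` directly inside the loop would bring back the sequential behavior.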

Run time (10 examples):

Summary Chain (Async) executed in 3.35 seconds.
Characteristics Chain (Async) executed in 2.49 seconds.

Compared to the sequential:

Summary Chain (Sequential) executed in 22.59 seconds.
Characteristics Chain (Sequential) executed in 22.85 seconds.
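You can check the speedup these timings imply by dividing each sequential run time by its async counterpart. The numbers below are the ones reported above.

```python
# Reported run times in seconds (10 examples each).
timings = {
    "Summary Chain": {"sequential": 22.59, "async": 3.35},
    "Characteristics Chain": {"sequential": 22.85, "async": 2.49},
}

# Speedup = sequential time / async time.
speedups = {name: t["sequential"] / t["async"] for name, t in timings.items()}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x faster")
```

This gives about 6.7x for the Summary Chain and 9.2x for the Characteristics Chain.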

We can see a roughly 7x to 9x improvement in run time. So for big workloads, I highly recommend using this method. Also, my code is full of for loops that could be optimized further to improve performance.

The full code for this tutorial can be found in this GitHub repo.
