Decoupled Frontend — Backend Microservices Architecture for a ChatGPT-Based LLM Chatbot | by Marie Stephen Leo | May, 2023

A practical guide to building a headless ChatGPT application with Streamlit, FastAPI, and the OpenAI API

Image generated by Author using Midjourney V5.1 using the prompt: “decoupled frontend backend software application”

In my previous post, I wrote about the differences between monolithic and microservices architecture patterns for LLM-based chatbot applications. One significant advantage of the microservices pattern is that it separates the frontend code from the data science logic, so a data scientist can focus on the data science logic without worrying about the frontend. In this post, I will show you how to build a microservice chatbot application with Streamlit, FastAPI, and the OpenAI API. We will decouple the frontend and backend code from each other so we can easily swap the frontend for another framework like React, Swift, Dash, or Gradio.

First, create a new conda environment and install the necessary libraries.

# Create and activate a conda environment
conda create -n openai_chatbot python=3.9
conda activate openai_chatbot

# Install the necessary libraries
pip install streamlit streamlit-chat "fastapi[all]" openai

As in my previous blog post, we’ll build the backend using FastAPI. The most crucial part of any API is the API contract, which defines the input format the API accepts and the output format it sends back to the client. Defining and aligning on a robust API contract allows frontend developers to work independently of API developers, as long as both parties respect the contract. This is the beauty of decoupling the frontend from the backend. FastAPI lets us easily specify and validate the API contract using Pydantic models. The API contract for our backend is as follows:

API contract details. Image by Author
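To make the contract concrete, here is a sketch of a request body and the corresponding response shape. The `history`, `message`, and `token_usage` field names match the backend code later in this post; the message content and token counts are illustrative.

```python
# Request body the /chat endpoint accepts: a "history" list of role/content pairs
request_body = {
    "history": [
        {"role": "user", "content": "tell me a quote from DC comics about life"},
    ]
}

# Shape of the response the backend sends back (token counts are illustrative)
response_body = {
    "message": {"role": "assistant", "content": "..."},
    "token_usage": {"prompt_tokens": 60, "completion_tokens": 20, "total_tokens": 80},
}
```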

The backend will be responsible for the following tasks:

  1. First, we initialize a new FastAPI app, load the OpenAI API key, and define a system prompt that will inform ChatGPT of the role we want it to play. In this case, we want ChatGPT to play the role of a comic book assistant, so we prompt it as such. Feel free to “engineer” different prompts and see how ChatGPT responds!
  2. Next, we create two Pydantic models, Conversation and ConversationHistory, to validate the API payload. The Conversation model will validate each message in the conversation history, while the ConversationHistory model is just a list of conversations to validate the entire conversation history. The OpenAI ChatGPT API can only accept assistant or user in the role parameter, so we specify that restriction in the Conversation model. If you try sending any other value in the role parameter, the API will return an error. Validation is one of the many benefits of using Pydantic models with FastAPI.
  3. Next, we reserve the root route for a health check.
  4. Finally, we define a /chat route that accepts a POST request. The route will receive a ConversationHistory payload, which is a list of conversations. The route will then convert the payload to a Python dictionary, initialize the conversation history with the system prompt and list of messages from the payload, generate a response using the OpenAI ChatGPT API, and return the generated response and the token usage back to the API caller.
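The role restriction described in step 2 is easy to see in isolation. A minimal sketch, using the same `Conversation` model as the backend:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Conversation(BaseModel):
    role: Literal["assistant", "user"]  # any other role is rejected
    content: str


# A valid message validates cleanly...
print(Conversation(role="user", content="tell me a quote from DC comics"))

# ...but an unsupported role raises a ValidationError
try:
    Conversation(role="system", content="this will fail")
except ValidationError:
    print("role must be 'assistant' or 'user'")
```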
# %%writefile backend.py
import os
from typing import Literal

import openai
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

# Load your API key from an environment variable or secret management service
openai.api_key = os.getenv("OPENAI_API_KEY")

system_prompt = "You are a comic book assistant. You reply to the user's question strictly from the perspective of a comic book assistant. If the question is not related to comic books, you politely decline to answer."


class Conversation(BaseModel):
    role: Literal["assistant", "user"]
    content: str


class ConversationHistory(BaseModel):
    history: list[Conversation] = Field(
        example=[
            {
                "role": "assistant",
                "content": "Hello, I'm a comic book assistant. How can I help you today?",
            },
            {"role": "user", "content": "tell me a quote from DC comics about life"},
        ]
    )


@app.get("/")
async def health_check():
    return {"status": "OK!"}


@app.post("/chat")
async def llm_response(history: ConversationHistory) -> dict:
    # Step 0: Receive the API payload as a dictionary
    history = history.dict()

    # Step 1: Initialize messages with a system prompt and conversation history
    messages = [{"role": "system", "content": system_prompt}, *history["history"]]

    # Step 2: Generate a response
    llm_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages
    )

    # Step 3: Return the generated response and the token usage
    return {
        "message": llm_response.choices[0]["message"],
        "token_usage": llm_response["usage"],
    }

That’s it! We can now run the backend on our local machine using `uvicorn backend:app --reload` and test it using the Swagger UI that FastAPI automatically serves at the `/docs` endpoint.

FastAPI docs for the backend. Image by Author

We’ll build the frontend headless and completely independent of the backend. We only have to respect the API contract used by the backend. Before building the frontend user interface, let’s define a few helper functions.

  1. clear_conversation() will help us clear the conversation history. It will also initialize the conversation_history session state variable to store the conversation history and the total_cost session state variable to hold the total conversation cost.
  2. display_conversation() will help us display the conversation history in reverse order, with the most recent message on top.
  3. download_conversation() will allow us to download the conversation history as a CSV file.
  4. calc_cost() will help us calculate the cost of the conversation based on the number of tokens used. The OpenAI API charges $0.002 per 1,000 tokens, so we’ll use that rate to calculate the conversation cost.
# %%writefile utils.py
from datetime import datetime

import pandas as pd
import streamlit as st
from streamlit_chat import message


def clear_conversation():
    """Clear the conversation history."""
    if (
        st.button("🧹 Clear conversation", use_container_width=True)
        or "conversation_history" not in st.session_state
    ):
        st.session_state.conversation_history = []
        st.session_state.total_cost = 0


def display_conversation(conversation_history):
    """Display the conversation history in reverse chronology."""
    for idx, item in enumerate(reversed(conversation_history)):
        # Display the messages on the frontend
        if item["role"] == "assistant":
            message(item["content"], is_user=False, key=f"ai_{idx}")
        elif item["role"] == "user":
            message(item["content"], is_user=True, key=f"human_{idx}")


def download_conversation():
    """Download the conversation history as a CSV file."""
    conversation_df = pd.DataFrame(
        st.session_state.conversation_history, columns=["role", "content"]
    )
    csv = conversation_df.to_csv(index=False)

    st.download_button(
        label="💾 Download conversation",
        data=csv,
        file_name=f"conversation_{datetime.now():%Y%m%d_%H%M%S}.csv",  # timestamped file name
        mime="text/csv",
        use_container_width=True,
    )


def calc_cost(token_usage):
    """Cost in USD at $0.002 per 1,000 tokens."""
    return token_usage["total_tokens"] * 0.002 / 1000
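As a quick sanity check, `calc_cost` on a 1,500-token exchange at $0.002 per 1,000 tokens comes to $0.003:

```python
def calc_cost(token_usage):
    """Cost in USD at $0.002 per 1,000 tokens."""
    return token_usage["total_tokens"] * 0.002 / 1000


cost = calc_cost({"total_tokens": 1500})
print(f"US${cost:.4f}")  # US$0.0030
```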

Now we have everything we need to build the user interface using Streamlit. Let’s create a file and import the helper functions we defined above.

  1. First, we’ll define the URL of our FastAPI backend.
  2. openai_llm_response() will append the latest user input to the conversation_history session state variable using the user role. Then, we’ll create the payload in the format our backend FastAPI app expects with a history field. Finally, we’ll send the payload to the backend and append the generated response to the conversation_history session state variable. We’ll also increment the cost with the cost of the generated response.
  3. main() is the bulk of the UI design. Below the title, we add buttons for clearing and downloading the conversation using the helper functions in utils.py. Then we have a text input field where the user can enter their question. Pressing Enter sends the text in the input field to the backend. Finally, we display the cost of the conversation and the conversation history.
# %%writefile
import requests
import streamlit as st

import utils

# Replace with the URL of your backend
app_url = ""


def openai_llm_response(user_input):
    """Send the user input to the LLM API and append the response."""
    # Append user question to the conversation history
    st.session_state.conversation_history.append(
        {"role": "user", "content": user_input}
    )
    payload = {"history": st.session_state.conversation_history}

    # Send the entire conversation history to the backend
    response = requests.post(app_url, json=payload).json()

    # Add the generated response and cost to the session state
    st.session_state.conversation_history.append(response["message"])
    st.session_state.total_cost += utils.calc_cost(response["token_usage"])


def main():
    st.title("🦸 ChatGPT Comic Book Assistant")

    col1, col2 = st.columns(2)
    with col1:
        utils.clear_conversation()

    # Get user input
    if user_input := st.text_input(
        "Ask any comic book question 👇", key="user_input", max_chars=50
    ):
        openai_llm_response(user_input)

    # Display the cost
    st.caption(f"Total cost of this session: US${st.session_state.total_cost}")

    # Display the entire conversation on the frontend
    utils.display_conversation(st.session_state.conversation_history)

    # Download conversation code runs last to ensure the latest messages are captured
    with col2:
        utils.download_conversation()


if __name__ == "__main__":
    main()
That’s it! We’ve completed our frontend app. We can now test it with the `streamlit run` command.

Streamlit App interface. Image by Author

Building a chatbot using the OpenAI API following a microservices architecture is easy by decoupling the frontend from the backend. Here are some thoughts on when to consider a decoupled architecture:

  1. Your app is relatively complex or needs to support mid to large-scale traffic. The decoupled architecture allows for independent scaling of the frontend and backend to handle large-scale traffic.
  2. You have dedicated frontend developer resources to build the UI or need to serve external customers requiring a highly polished UI. In this tutorial, we used Streamlit to construct a simple user interface, but it can get difficult or even impossible to build more complex UIs. It’s best to build customer-facing apps using specialized UI frameworks like React, Swift, etc.
  3. You want to improve the data science logic independent of the frontend. For example, you can update the prompts or add multiple microservices, all orchestrated by an API server entry point, without worrying about the frontend code as long as you respect the same API contract you’ve aligned with the frontend engineers.

However, there may be situations when there are better architectural choices than decoupling for your app. Here are some thoughts on when NOT to use decoupled architecture:

  1. Your app is simple or has low traffic. You can have a monolithic app since scaling is not an issue.
  2. You do not have dedicated frontend developer resources to build the UI, or your app only serves internal customers who might be more forgiving of a rough UI design. This is especially true while building a minimal viable product or prototype.
  3. You’re a unicorn wanting to improve the data science logic and frontend interface simultaneously!
