Best Practices for Prompt Engineering | by Dmytro Nikolaiev (Dimid)

More sophisticated approaches to solving even more complex tasks are now being actively developed. While they significantly outperform in some scenarios, their practical usage remains somewhat limited. I will mention two such techniques: self-consistency and the Tree of Thoughts.

The authors of the self-consistency paper offered the following approach. Instead of just relying on the initial model output, they suggested sampling multiple times and aggregating the results through majority voting. By relying on both intuition and the success of ensembles in classical machine learning, this technique enhances the model’s robustness.

Self-consistency. Figure 1 from the Self-Consistency Improves CoT Reasoning in Language Models paper

You can also apply self-consistency without implementing the aggregation step. For tasks with short outputs ask the model to suggest several options and choose the best one.

Tree of Thoughts (ToT) takes this concept a stride further. It puts forward the idea of applying tree-search algorithms for the model’s “reasoning thoughts”, essentially backtracking when it stumbles upon poor assumptions.

Tree of Thoughts. Figure 1 from the Tree of Thoughts: Deliberate Problem Solving with LLMs paper

If you are interested, check out Yannic Kilcher’s video with a ToT paper review.

For our particular scenario, utilizing Chain-of-Thought reasoning is not necessary, yet we can prompt the model to tackle the summarization task in two phases. Initially, it can condense the entire job description, and then summarize the derived summary with a focus on job responsibilities.

Output for a prompt v5, containing step-by-step instructions. Image by Author created using ChatGPT

In this particular example, the results did not show significant changes, but this approach works very well for most tasks.

Few-shot Learning

The last technique we will cover is called few-shot learning, also known as in-context learning. It’s as simple as incorporating several examples into your prompt to provide the model with a clearer picture of your task.

These examples should not only be relevant to your task but also diverse to encapsulate the variety in your data. “Labeling” data for few-shot learning might be a bit more challenging when you’re using CoT, particularly if your pipeline has many steps or your inputs are long. However, typically, the results make it worth the effort. Also, keep in mind that labeling a few examples is far less expensive than labeling an entire training/testing set as in traditional ML model development.

If we add an example to our prompt, it will understand the requirements even better. For instance, if we demonstrate that we’d prefer the final summary in bullet-point format, the model will mirror our template.

This prompt is quite overwhelming, but don’t be afraid: it is just a previous prompt (v5) and one labeled example with another job description in the For example: 'input description' -> 'output JSON' format.

Output for a prompt v6, containing an example. Image by Author created using ChatGPT

Summarizing Best Practices

To summarize the best practices for prompt engineering, consider the following:

  • Don’t be afraid to experiment. Try different approaches and iterate gradually, correcting the model and taking small steps at a time;
  • Use separators in input (e.g. <>) and ask for a structured output (e.g. JSON);
  • Provide a list of actions to complete the task. Whenever feasible, offer the model a set of actions and let it output its “internal thoughts”;
  • In case of short outputs ask for multiple suggestions;
  • Provide examples. If possible, show the model several diverse examples that represent your data with the desired output.

I would say that this framework offers a sufficient basis for automating a wide range of day-to-day tasks, like information extraction, summarization, text generation such as emails, etc. However, in a production environment, it is still possible to further optimize models by fine-tuning them on specific datasets to further enhance performance. Additionally, there is rapid development in the plugins and agents, but that’s a whole different story altogether.

Prompt Engineering Course by DeepLearning.AI and OpenAI

Along with the earlier-mentioned talk by Andrej Karpathy, this blog post draws its inspiration from the ChatGPT Prompt Engineering for Developers course by DeepLearning.AI and OpenAI. It’s absolutely free, takes just a couple of hours to complete, and, my personal favorite, it enables you to experiment with the OpenAI API without even signing up!

That’s a great playground for experimenting, so definitely check it out.

Wow, we covered quite a lot of information! Now, let’s move forward and start building the application using the knowledge we have gained.

Generating OpenAI Key

To get started, you’ll need to register an OpenAI account and create your API key. OpenAI currently offers $5 of free credit for 3 months to every individual. Follow the introduction to the OpenAI API page to register your account and generate your API key.

Once you have a key, create an OPENAI_API_KEY environment variable to access it in the code with os.getenv('OPENAI_API_KEY').

Estimating the Costs with Tokenizer Playground

At this stage, you might be curious about how much you can do with just a free trial and what options are available after the initial three months. It’s a pretty good question to ask, especially when you consider that LLMs cost millions of dollars!

Of course, these millions are about training. It turns out that the inference requests are quite affordable. While GPT-4 may be perceived as expensive (although the price is likely to decrease), gpt-3.5-turbo (the model behind default ChatGPT) is still sufficient for the majority of tasks. In fact, OpenAI has done an incredible engineering job, given how inexpensive and fast these models are now, considering their original size in billions of parameters.

The gpt-3.5-turbo model comes at a cost of $0.002 per 1,000 tokens.

But how much is it? Let’s see. First, we need to know what is a token. In simple terms, a token refers to a part of a word. In the context of the English language, you can expect around 14 tokens for every 10 words.

To get a more accurate estimation of the number of tokens for your specific task and prompt, the best approach is to give it a try! Luckily, OpenAI provides a tokenizer playground that can help you with this.

Side note: Tokenization for Different Languages

Due to the widespread use of English on the Internet, this language benefits from the most optimal tokenization. As highlighted in the “All languages are not tokenized equal” blog post, tokenization is not a uniform process across languages, and certain languages may require a greater number of tokens for representation. Keep this in mind if you want to build an application that involves prompts in multiple languages, e.g. for translation.

To illustrate this point, let’s take a look at the tokenization of pangrams in different languages. In this toy example, English required 9 tokens, French — 12, Bulgarian — 59, Japanese — 72, and Russian — 73.

Tokenization for different languages. Screenshot of the OpenAI tokenizer playground

Cost vs Performance

As you may have noticed, prompts can become quite lengthy, especially when incorporating examples. By increasing the length of the prompt, we potentially enhance the quality, but the cost grows at the same time as we use more tokens.

Our latest prompt (v6) consists of approximately 1.5k tokens.

Tokenization of the prompt v6. Screenshot of the OpenAI tokenizer playground

Considering that the output length is typically the same range as the input length, we can estimate an average of around 3k tokens per request (input tokens + output tokens). By multiplying this number by the initial cost, we find that each request is about $0.006 or 0.6 cents, which is quite affordable.

Even if we consider a slightly higher cost of 1 cent per request (equivalent to roughly 5k tokens), you would still be able to make 100 requests for just $1. Additionally, OpenAI offers the flexibility to set both soft and hard limits. With soft limits, you receive notifications when you approach your defined limit, while hard limits restrict you from exceeding the specified threshold.

For local use of your LLM application, you can comfortably configure a hard limit of $1 per month, ensuring that you remain within budget while enjoying the benefits of the model.

Streamlit App Template

Now, let’s build a web interface to interact with the model programmatically eliminating the need to manually copy prompts each time. We will do this with Streamlit.

Streamlit is a Python library that allows you to create simple web interfaces without the need for HTML, CSS, and JavaScript. It is beginner-friendly and enables the creation of browser-based applications using minimal Python knowledge. Let’s now create a simple template for our LLM-based application.

Firstly, we need the logic that will handle the communication with the OpenAI API. In the example below, I consider generate_prompt()function to be defined and return the prompt for a given input text (e.g. similar to what you saw before).

And that’s it! Know more about different parameters in OpenAI’s documentation, but things work well just out of the box.

Having this code, we can design a simple web app. We need a field to enter some text, a button to process it, and a couple of output widgets. I prefer to have access to both the full model prompt and output for debugging and exploring reasons.

The code for the entire application will look something like this and can be found in this GitHub repository. I have added a placeholder function called toy_ask_chatgpt() since sharing the OpenAI key is not a good idea. Currently, this application simply copies the prompt into the output.

Without defining functions and placeholders, it is only about 50 lines of code!

And thanks to a recent update in Streamlit it now allows embed it right in this article! So you should be able to see it right below.

Now you see how easy it is. If you wish, you can deploy your app with Streamlit Cloud. But be careful, since every request costs you money if you put your API key there!

In this blog post, I listed several best practices for prompt engineering. We discussed iterative prompt development, the use of separators, requesting structural output, Chain-of-Thought reasoning, and few-shot learning. I also provided you with a template to build a simple web app using Streamlit in under 100 lines of code. Now, it’s your turn to come up with an exciting project idea and turn it into reality!

It’s truly amazing how modern tools allow us to create complex applications in just a few hours. Even without extensive programming knowledge, proficiency in Python, or a deep understanding of machine learning, you can quickly build something useful and automate some tasks.

Don’t hesitate to ask me questions if you’re a beginner and want to create a similar project. I’ll be more than happy to assist you and respond as soon as possible. Best of luck with your projects!

Source link

Leave a Comment