4 Crucial Factors for Evaluating Large Language Models in Industry Applications | by Skanda Vivek | Aug, 2023

Over the past few months, I’ve had the opportunity to chat with folks from the legal, healthcare, finance, tech, and insurance industries about LLM adoption. Each of them comes with unique requirements and challenges. In healthcare, for example, privacy is king. In finance, getting the numbers right is paramount. Lawyers want specialized, fine-tuned models for tasks like drafting legal documents.

In this article, I go through the key decision factors that help you choose the right model for your particular use case.

As Satya Nadella stated in his 2023 keynote at Microsoft Inspire, generative AI introduces two main paradigm shifts:

  1. A more natural language computer interface
  2. A reasoning engine that sits on top of all your custom documents

Response quality is extremely important in both of these use categories. Our interface with computers has been getting closer and closer to natural language (think of how much friendlier Python is than C++, or how much friendlier C++ is than machine language). However, the reliability of these programming languages has never really been in question: if something goes wrong, we call it a programming bug and attribute it to human error. The more natural interface of LLMs creates a new problem: LLMs are known to hallucinate or give wrong answers, so a new type of “AI bug” gets introduced. Thus, response quality becomes extremely important.

The same holds for the second use case. While we are all comfortable using Google search, behind the scenes Google is using vector embeddings and other matching techniques to figure out which page most likely contains an answer to the question you ask. If the page lists wrong results, that again is a human error, due to humans listing incorrect information. However, LLMs again introduce the possibility that answers…
