Orca: Properly Imitating Proprietary LLMs | by Cameron R. Wolfe, Ph.D. | Sep, 2023

Cameron R. Wolfe, Ph.D.

Leveraging imitation to create high-quality, open-source LLMs… (Photo by Thomas Lipke on Unsplash) As research progresses on large language models (LLMs), one key question that remains unanswered is whether an existing, high-quality LLM can be used to effectively train another LLM. Currently, there is a lot of debate and contention around this topic. The recent … Read more

Hybrid Search 2.0: The Pursuit of Better Search | by Noam Schwartz | Sep, 2023

The normalization function was slightly biased: it weighted text search higher and gave it more significance in the final results. Distance-based algorithms such as K-Nearest Neighbors (KNN) calculate distances between data points, whereas BM25 is based on the frequency of occurrence of keywords. Both return scores that are on completely different scales. This can lead … Read more
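
To make the scale mismatch concrete, here is a minimal sketch of one common way to combine the two kinds of scores: min-max normalize each result set to [0, 1], then take a weighted sum. This is not the normalization function the article describes, which the excerpt does not show. The names `min_max_normalize`, `hybrid_scores`, and `text_weight` are hypothetical, and the sketch assumes the KNN scores are similarities (higher is better) rather than raw distances.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map each to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float],
    knn: Dict[str, float],
    text_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of normalized BM25 and KNN scores (illustrative only).

    A document missing from one result set contributes 0.0 for that side.
    text_weight > 0.5 favors text search; this is an assumed knob, not
    something taken from the article.
    """
    bm25_n = min_max_normalize(bm25)
    knn_n = min_max_normalize(knn)
    doc_ids = set(bm25_n) | set(knn_n)
    return {
        d: text_weight * bm25_n.get(d, 0.0)
        + (1.0 - text_weight) * knn_n.get(d, 0.0)
        for d in doc_ids
    }


# Made-up scores: BM25 values are unbounded, KNN similarities lie in [0, 1].
bm25_results = {"a": 12.0, "b": 3.0, "c": 7.5}
knn_results = {"a": 0.2, "b": 0.9, "d": 0.55}
combined = hybrid_scores(bm25_results, knn_results, text_weight=0.5)
ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

Without the normalization step, the raw BM25 values (here up to 12.0) would dominate the sum regardless of the weight, which is the kind of bias toward text search the paragraph above describes.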

Designing Operations Research Solutions: A User-friendly Routing Application with Streamlit | by Bruno Scalia C. F. Leite | Sep, 2023

As previously described, we can define general Streamlit settings in the config.toml file inside the .streamlit folder. In the TSP example, I used it to define colors and font. This is kind of a light purple layout, but you can try different colors. Just make sure to define them in Hex color codes. [theme]primaryColor = … Read more

Exploring GEMBA: A New LLM-Based Metric for Translation Quality Assessment | by Dr. Varshita Sher | Sep, 2023

Dr. Varshita Sher

#GEN-AI RESEARCH PAPERS Using LLMs for evaluating translation quality Image generated by Author using DALL.E 2 Introduction I recently read an intriguing paper from the Microsoft¹ team (published in May 2023) that caught my attention. The paper delves into the world of translation evaluation, shedding light on an innovative metric called GEMBA (GPT Estimation Metric … Read more

Add Your Own Data to an LLM Using Retrieval-Augmented Generation (RAG) | by Beatriz Stollnitz | Sep, 2023

Beatriz Stollnitz

Learn how to add your own proprietary data to a pre-trained LLM using a prompt-based technique called Retrieval-Augmented Generation Photo by Joshua Sortino on Unsplash Introduction Large Language Models (LLMs) know a lot about the world, but they don’t know everything. Since training these models takes a long time, the data they were last trained … Read more

GenAI for Better NLP Systems I: A Tool for Generating Synthetic Data | by Nabanita Roy | Sep, 2023

Nabanita Roy

Experimenting with the usage of GenAI for generating and augmenting synthetic data using Python for Prompt Engineering Photo by SR on Unsplash One of the key challenges of Machine Learning (ML) is unbalanced data and the biases they introduce in ML models. With the advent of powerful Generative AI (GenAI) models, we can augment imbalanced training … Read more

A (Philosophical) Perspective on Skills Gaps in AI | by Mathieu Lemay | Sep, 2023

Recently, a few projects of ours had clients asking about project handoffs to not-yet-existing internal teams. “How do we train our team to own the solution that you built?” “How can we ensure future-proofing our team with changes in AI?” Variations on these questions were, for the most part, answered with recommendations or change management … Read more

Linear Algebra 2: Echelon Matrix Forms | by tenzin migmar (t9nz) | Sep, 2023

tenzin migmar (t9nz)

Image from Europeana on Unsplash Row echelon form and reduced row echelon form Preface Welcome back to the second essay of my ongoing series on the basics of Linear Algebra, the foundational math behind machine learning. In my previous article, I introduced linear equations and systems, matrix notation, and row reduction operations. This article will … Read more