Chatbots Are About to Disrupt Music Search | by Max Hilsdorf | Jun, 2023

Photo by Alexandre Debiève on Unsplash

The recent advancements in artificial intelligence have propelled chatbot technology to new heights, enabling them to understand and respond to human queries with increased intelligence and nuance. Recognizing the advantages of conversational search over traditional methods, industry giants such as Microsoft and Google have started implementing AI chatbots into their web search engines.

In the realm of music search, this shift holds particular relevance as simple keyword-based searches, akin to “Googling,” have only gained prominence in the past few months. Some music tech startups like Cyanite have launched free-text music search features as a breakthrough technology earlier this year. Prior to that, music search primarily relied on selecting genres, moods, or artists and sifting through potentially numerous songs to find the desired one.

However, the emergence of models like ChatGPT has facilitated a transition from “free-text” search to truly conversational search. This transformative approach allows us to overcome the repetitive “input-output-repeat” workflow and replace it with dynamic and natural conversations. This paradigm shift aligns with the ongoing evolution witnessed in major search engines like Microsoft’s Bing and Google.

Moreover, the accessibility of building custom music chatbots has reached unprecedented levels. In fact, I built my own music chatbot prototype within less than a day and at less than $5 — but more on that in a follow-up post. While ChatGPT itself already proves quite useful for music discovery, tailored chatbot systems offer even more refined responses, seamless integrations with music databases or web players, and greater control over the conversation. Thanks to recent developments in the open-source community, building custom chatbots trained on domain- or company-specific data and customized to individual needs has become easier than ever before.

Photo by Christina @ on Unsplash

When it comes to music discovery, conversations provide a natural and intuitive way to explore and find new songs or artists. We often rely on recommendations from friends, music enthusiasts, or experts because these conversations allow us to express our preferences and receive personalized suggestions. Chatbots can replicate this conversational experience, enabling users to engage in a dialogue about their musical tastes and receive tailored recommendations.

In contrast, the traditional “Googling” workflow, which we have become accustomed to over the past 20 years, can sometimes feel unintuitive. When searching for music in this manner, we follow a strict sequence of steps:

  1. Craft a text prompt that describes what we are looking for.
  2. Search through the responses and listen to some tracks.
  3. If unsatisfied, we either analyze how to improve our prompt to make the search engine understand us better or return to step 1.

The problem with this approach is that each search attempt effectively erases all previous results, even if they could be useful for subsequent stages of the search process. This limitation is where chatbots excel, as they have the capability to store the entire search history, i.e. conversation.

Suppose your prompt is

A punk rock song with moderate tempo, a female singer, and lyrics about unrequited love.

A traditional search engine will perform a search and recommend you 20 tracks, sorted by predicted relevance. You listen to the first 5 tracks and don’t like them. All of them are either live recordings or have a tempo that is a little bit too fast for your taste. In consequence, you alter your prompt

A studio recording of a punk rock song with moderate-to-low tempo, a female singer, and lyrics about unrequited love.

and restart the search. The search engine performs another (potentially costly) search and presents you with another 20 songs. This procedure continues until you have found something you like.

In contrast, a music search chatbot solves this problem much more elegantly. You start with your prompt

A punk rock song with moderate tempo, a female singer, and lyrics about unrequited love.

The chatbot is smart and asks you to specify the tempo more precisely, because it knows that the results could otherwise be imprecise. You tell the chatbot “dunno, maybe 110–130 bpm?” without altering your original prompt. Taking into account all the information gathered from the conversation, the chatbot initiates a search and presents you with a new list of 20 tracks ranked by predicted relevance.

However, upon reviewing the top 5 tracks, you find that they are once again live recordings, which you had not specified as a preference. Fortunately, instead of going back to the prompting stage and starting the search process anew, you can leverage the chatbot’s capabilities to refine the results. You simply ask the chatbot to exclude all live recordings from the recommendations. Understanding your request, the chatbot filters the existing 20 tracks into two categories: studio recordings and live recordings. It then presents you with the filtered results, eliminating the need for another costly search.

This approach of conceptualizing music search as a process, not as a simple input-output operation, clearly benefits the user who usually does not know precisely what they want. A chatbot that…

  1. guides the user to describe their needs by asking follow-up questions or pointing to imprecise formulations
  2. updates its recommendations quickly based on further specifications

has the potential to massively disrupt all existing music search systems.

Photo by Jonas Leupe on Unsplash

One of the most exciting prospects of music chatbots is their potential to act as domain-specific experts, akin to pocket musicologists. Musicologists are individuals with extensive knowledge about various genres, artists, historical context, and other intricate details of music. By encapsulating this expertise within a chatbot, users can access a wealth of information and insights instantaneously.

To illustrate this, let us consider a scenario where you are searching for a song that embodies a specific musical attribute, but you can only describe it through a reference such as “a guitar solo in the style of AC/DC.” While a competent music search engine can search for songs with guitar solos, it may struggle to understand the reference, especially if it doesn’t have any songs by AC/DC in its database. In such cases, your only alternative is to attempt to articulate what makes an AC/DC solo sound unique, which can be challenging for someone without extensive musical knowledge.

This is where the pocket musicologist chatbot becomes invaluable. A music chatbot trained on a variety of sources such as album reviews, fan forums, sheet music, and published scientific music analyses possesses a deep understanding of how a typical AC/DC guitar solo is structured and performed. Therefore, the chatbot can take your reference and formulate a precise prompt that describes the desired musical characteristics in a way that can be effectively utilized by the search engine.

Here, I asked ChatGPT to list a few stereotypical attributes of an AC/DC solo. This is what it came up with:

1. Rhythm and blues influence.

2. Simple and catchy melodies.

3. Raw and gritty classic rock sound.

4. Power chords and iconic riffs.

5. Bluesy bends and vibrato.

7. High energy and aggressive playing.

Of course, these results are not amazing. However, keep in mind that this a general-purpose chatbot with no specific musicological training. You can imagine that such a domain-specific chatbot would be able to come up with more accurate and precise descriptions.

This is only one example of how conceptualizing music chatbots as pocket musicologists can help improve the search experience. However, the possibilities are endless. For example, you could start the conversation by asking the chatbot to list some interesting jazz subgenres and explain the key characteristics of each. Then, you can choose a genre that sounds interesting and initiate a search within that genre. Consider this: When ChatGPT was released in late 2022, no one could have predicted the wide range of uses it would have for its millions of users. The same will apply to chatbot-based music search.

Photo by JESHOOTS.COM on Unsplash

To Chat or Not to Chat

In the previous sections, I have presented three arguments for why we may be on the brink of a paradigm shift in music search. While there may be dissenting opinions regarding the validity of these arguments, the crucial question at hand is whether the advantages and feasibility of chatbot-based music search can persuade companies and research institutions to pursue the development of this technology.

It is by no means obvious that chatbots will completely replace traditional search engines. Fortunately, we can examine the developments in other search domains, particularly web search. Despite the availability of web-search chatbots such as the new Bing or Google search and Perplexity AI, their usage remains primarily limited to technology enthusiasts and AI professionals. Clearly, these chatbots have not gained widespread adoption comparable to the general-purpose (and offline!) chatbot, ChatGPT. Most notably, they are far from replacing the conventional Google search engine.

While this may be partly due to the time it takes for the products to mature and their potential users to adopt them, there are also practical considerations that favor the continued use of more traditional keyword-based or semantic searches. For instance, search engines are frequently employed to locate specific articles, websites, or songs that we have previously encountered but don’t recall the complete name or web address. In such situations, employing a keyword-based search and matching the search input to results that closely align with the entered keywords is much more practical. Utilizing a sophisticated technology like a chatbot for this purpose would be like cracking nuts with a sledgehammer.

Furthermore, not every interaction needs to be a conversation. When searching for songs within a particular genre, for example, you might prefer not to engage in a dialogue with an AI bot. There are two reasons for this. Firstly, you might want instant results, making any response from the search engine that deviates from providing search results a waste of time. Secondly, employing a chatbot can transform the purely mechanical act of conducting a music search into a social interaction. This potential social aspect could be seen as a drawback, particularly for individuals who seek solace in music as a means to escape their social environment.

In summary, I anticipate that this paradigm shift will not completely eliminate traditional approaches to music search. Instead, I envision chatbots being utilized in scenarios where users seek guidance and consultation rather than a quick filtering of a music catalog. In the realm of production music, a chatbot-based search can greatly assist in finding the perfect track for a commercial or a YouTube video. However, for the average user of music streaming services, resorting to more traditional search systems may be more practical in most cases. Ultimately, the paradigm shift could manifest in production music libraries and music streaming services incorporating both types of searches to accommodate the diverse needs of their users.

Technical Implementation

In today’s tech landscape, building customized chatbots that align with your specific business requirements has become remarkably accessible. There are several approaches to achieving this. One method involves utilizing foundational models such as OpenAI’s GPT models via their API and augmenting them with custom logic. Another approach involves leveraging open-source language models and fine-tuning them using domain-specific data to ensure relevance and accuracy.

Using pre-built foundational models like GPT-4 via an API offers numerous advantages. Firstly, it enables businesses to begin utilizing the models directly without the need for additional data acquisition, preparation, or machine learning efforts on their part. This is particularly advantageous for companies with limited or no in-house data scientists, as it simplifies the task into a software engineering problem. Secondly, there is no need to concern yourself with constructing on-premise or cloud infrastructure to accommodate the computational requirements of these large models. By leveraging the API, businesses can access a managed solution at a relatively low cost.

One downside is that your data, including user search inputs, music metadata, etc., could potentially be accessed by the model provider (e.g., OpenAI) or even used to further train their models. This may not comply with internal data governance guidelines or external regulations, depending on your situation. Another drawback is that these third-party solutions often do not permit fine-tuning their models on your data. As a result, it becomes impossible to develop the type of “pocket musicologist” described earlier.

On the other hand, using open-source models on your own infrastructure presents several advantages. Firstly, there are no limitations in constructing a chatbot that is entirely customized to meet your specific requirements. Through the process of fine-tuning, you can transform it into a comprehensive music expert or train it to understand the specific vocabulary used within your company or domain. Secondly, each step of the workflow, including fine-tuning and model inference, can be implemented within your own infrastructure. This eliminates any concerns regarding compliance with internal guidelines or external regulations.

The drawback of building your own chatbot using open-source models is that it produces a significant engineering overhead. This consumes both computational and human resources, which you may be hesitant to invest in a product before witnessing initial results. Furthermore, open-source models become outdated at an accelerated pace. Consequently, you would need to transition to newer models and repeat the fine-tuning process regularly, leading to the consumption of additional resources. In contrast, a managed solution through an API provides greater flexibility to experiment with and switch to alternative chatbot models.

In conclusion, if you aim to develop a quick prototype or lack the necessary human resources to fine-tune and deploy your own models within a suitable infrastructure, I suggest opting for a managed API solution at present. This is precisely what I did for my chatbot prototype, and you are welcome to draw inspiration from my approach. However, it is important to note that these solutions are likely to be outcompeted by more advanced and tailored systems, such as those based on open-source models.

Source link

Leave a Comment