But first, let’s start with some perspective. If we look back to the introduction of business intelligence tools in the early 2000s, the great value of those tools lay in letting non-technical, line-of-business people leverage their domain knowledge to select, analyze and present data, without writing a stitch of code. Sound familiar?
Providing user-friendly means to analyze data is nothing new. It will always have incredible value. Indeed, it is a multi-billion dollar industry that continues to grow. However, these tools have no use without domain knowledge. This applies to any data analysis, regardless of the tool(s) being used. Even if it’s generative AI. Without domain knowledge, we do not know what questions to ask of our data. And even if the questions were provided to us, how do we interpret our findings?
And in my view, the greatest value of data analysis work lies in its ability to answer ad hoc questions. Unforeseen, mission-critical questions. Complex, multi-layered, nonlinear types of questions. Answering these questions requires domain knowledge.
For example, why did sales on our best-selling product just drop off a cliff? Our primary supplier just went out of business, what do we do? Why did our customer churn rate double last month? These are not straightforward types of questions that can follow an established decision tree.
What these few examples have in common is that they require immediate answers to situational questions that have never been asked before. And that is really the key. Once you understand how generative AI is constructed, its inability to answer questions of this nature is clearly its Achilles’ heel, and the reason it is unlikely ever to replace data analysts fully.
To briefly summarize, generative AI utilizes existing data sets to ‘train’ an LLM to generate a probability-driven answer based on whatever training data it has been fed. And while you can continuously fine-tune your model with ever more precise data sets, how would you train your model on multi-layered, situational questions that have never been asked before?
It would be analogous to you starting a new job as a data analyst in an industry that you are not yet familiar with. And on day one, you are asked to urgently answer one of the questions above. Where would you even start? What data would you pull? How would you even know all of the potential variables you would need to consider? And, even if you could somehow derive an answer, how would you know if it is correct?
It is for these reasons that I don’t foresee the role of data analyst ever being fully replaced by generative AI. However… generative AI, in its current state, already has many uses in the data analysis field and those uses will only continue to expand with ever-increasing functionality.
Current Potential Uses for Generative AI in Data Analysis
As of today, the highest and best use of generative AI in the data analysis field is its ability both to write code and, in turn, to explain the code it writes (which it does quite well). I’ve personally used it to help me write and understand Python code.
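To make this concrete, here is a minimal sketch of the kind of short, commented Python snippet you might get back when asking an assistant for help with a churn calculation. It is an illustration only: the function name monthly_churn_rate and the sample records are hypothetical, and a real analysis would start from your own customer data.

```python
from collections import defaultdict


def monthly_churn_rate(records):
    """Return {month: share of that month's customers who churned}.

    `records` is an iterable of (month, churned) pairs, one per customer.
    This input shape is an assumption made for the example.
    """
    # month -> [number of customers, number who churned]
    totals = defaultdict(lambda: [0, 0])
    for month, churned in records:
        totals[month][0] += 1
        if churned:
            totals[month][1] += 1
    # Divide churned customers by total customers for each month.
    return {month: lost / count for month, (count, lost) in sorted(totals.items())}


# Hypothetical sample data: four customers per month.
records = [
    ("2023-01", False), ("2023-01", False), ("2023-01", True), ("2023-01", False),
    ("2023-02", True), ("2023-02", False), ("2023-02", True), ("2023-02", False),
]

rates = monthly_churn_rate(records)
for month, rate in rates.items():
    print(f"{month}: {rate:.0%}")
```

Notice what the code does and does not do. It can tell you that churn doubled from one month to the next. It cannot tell you why, which is exactly where domain knowledge comes back in.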
For those of you who are looking to enter the data analysis field, I cannot encourage you enough to take advantage of generative AI to help you learn to code. It would have greatly shortened my learning curve when I was first cutting my teeth in this field.
In another truly exciting development for data analysts, generative AI has fueled the development of dedicated coding tools. GitHub has released its Copilot product, which can suggest code and improvements in real time as you write!
Earlier in this article, I referred to the potential hurdles companies would face in building their own LLMs. There is possibly one new alternative to that: Databricks has recently released an open-source LLM called ‘Dolly’. In theory, this could solve the issues of cost (being open source) and having to push your data outside of your company’s firewall. It’s a smaller-scale LLM, more suited for focused datasets.
I mention Dolly primarily as an example of how quickly developments in the field of generative AI are moving, and as a heads-up about how they may affect the data analysis field going forward.
As we have already seen, AI is evolving at light speed, and that pace will only continue.
There is no doubt in my mind that generative AI will reshape the workflows in data analysis. Generally speaking, repetitive tasks, and even repetitive analyses, will in time be performed by generative AI. I could also see coding becoming more of a commodity, rather than a highly developed skill.
Based on the above, I believe that the prototypical data analyst in the future will possess business line-level domain knowledge combined with an ability to incorporate generative AI tools to help them be more efficient and productive with their time.
Lastly, on a personal note, I would encourage anyone reading this to embrace generative AI. Learn about it and use it in both your personal and business lives. With new APIs and plugins constantly being created, its reach and capabilities will only grow.
For better or worse.