Everything has changed in a short period of time. AI tools, like ChatGPT and GPT-4, are taking over and completely changing both education and the landscape of learning technical skills. I felt that I needed to write this article to address some important things:
- In the new age of artificial intelligence, is it still important to learn data science?
- If so, what is the best way to learn these skills by leveraging the new technologies that are out there? And how would I do that if I had to start over again, right now?
- What does the future of the data science look like?
As AI continues to evolve, will data scientists become obsolete or will their role be more crucial than ever?
From a personal perspective, I still feel that I add more value to my clients than just the AI would, and I’ve been able to (at least) double my work output with these new tools available. Right now, I feel like AI won’t take my job, but, realistically, the future is more uncertain than ever.
Before you get scared about jobs disappearing, let’s take a look at the following scenario: In some future, you run a company that has AI doing your analytics work for you.
Who would you want running the AI, prompting it, and overseeing it? Would you want someone with a background in data science or software engineering to oversee these programs or would you like someone who is untrained?
I think the answer is pretty obvious. You would want someone with experience and knowledge of how to work with data running these AI systems.
In the short term, this scenario is hopefully hypothetical. But it does give me some confidence that some aspect of these skills have resilience.
Even if the landscape changes to where data scientists are doing less hands-on coding, I still feel like these skills you develop from learning this field will be very useful in a world more heavily integrated with AI. AI is grounded in data science, and at some level we are integrated into this system more than other careers.
In addition to that, AI still hallucinates, and we will need as many people as possible with good knowledge to oversee it and act as a feedback loop.
While I am uncertain about the future of data scientists work, there is one thing I am quite certain about: data, analytics, and AI will become an even bigger part of our lives moving forward. Don’t you think that people who have learned these domains will be set up for more relative success as well?
This article would end here if I didn’t think it was still worth learning data science. To be clear, I still think it is still 100% worth it. But, to be honest, learning just data science isn’t enough anymore. You need to learn how to use new AI tools as well.
The funny thing is learning both data science and these AI tools is easier than learning just data science alone. Let me explain.
As it so happens, you’re entering at the perfect time to learn these two domains together.
If you learn data science by leveraging the new AI tools that are out there, you get a twofold benefit:
- You get a more personalized and iterative education experience from learning the data domain with the AI
- You also get to upskill in AI tools at the same time.
You get twice the benefit for about half the work if my calculations are correct.
If the ability to use AI tools can help you land a job and do better work, it is better to know how to work with them than to ignore them. In the last three months, I feel like I’ve learned more about data science than I have in the past three years combined. I attribute the majority of this to the use of ChatGPT.
So, how do you do this? How do you actually learn data science with AI?
This is exactly what I would do if I had to start over with all these tools available to me.
Step 1: Develop A Roadmap
I would develop a roadmap. You can do this by looking through other courses or by having a conversation with ChatGPT. You can literally ask it to make you a data science learning roadmap based on your learning objectives.
If you don’t have learning objectives, you can also ask it to create a list for you and you can find ones you like.
If you want more information about developing educational roadmaps, check out this article where I go more in-depth about the subject.
Step 2: Design ChatGPT to Be My Tutor
I would design ChatGPT to be my tutor. You can create personas with GPT-4, which is probably my favorite feature. You can use a prompt like this:
In this scenario, you are one of the best data science teachers in the world. Please answer my data science questions in a way that will help me develop the best understanding of the domain. Please use many real-world or practical examples and give me practice problems that are relevant along the way.
Step 3: Develop a Course of Study
I’m almost definitely biased, but I think that free courses or paid courses, like mine, are still a good option for creating a structure for learning. As you go through the course of study, you can ask your ChatGPT tutor to give you examples, expand on topics, and give you practice problems.
Step 4: Try Advanced Tools Like AutoGPT
If you’re a little more advanced on the AI front, you could use a tool like AutoGPT to generate a course curriculum for you. I may try to do this and see what it comes up with. If I do, I will share it on my GitHub. I also interviewed GPT-4 on my podcast where I go more in-depth about what GPT-4 is.
Step 5: Do Projects
If you’re already comfortable with coding, you could probably skip to doing projects. I have personally learned a lot from doing projects in tandem with ChatGPT. I did this for the real estate Kaggle challenge.
If it is your very first project, just asking for it to do things is fine, but as you progress, you want to be more intentional and interactive about how you use it.
Let’s compare how a beginner versus an advanced practitioner should go about learning on a project.
A Beginner’s Project Walkthrough
An example of a beginner’s project walkthrough could look like this:
- You feed ChatGPT the information about the rows and columns of the data
- You ask it to create boilerplate code to explore this data for null values, outliers, and normality
- You ask it what questions you should ask of this data
- You ask it to clean the data and build the model for you to make a prediction on the dependent variable
While it may seem like it is doing all the work for you, you still have to get this project to run in your environment. You are also prompting and problem solving as you go along.
There is no guarantee that it will work like there is when you’re copying someone else’s project, so I feel like this is a nice learning middle ground for involvement.
An Advanced Practitioner’s Project Walkthrough
Now, let’s think about how a more advanced practitioner would use this:
1. You could follow the same steps of generating boilerplate code, but this should be expanded upon. So, you might want to experiment with more hands-on exploration of the data and hypothesis testing. Maybe, choose one or two questions you want to answer with data and descriptive statistics and start analyzing it.
2. For someone who has done a few projects, I recommend generating some of the code yourself. Let’s say you made a simple bar chart in plotly. You could feed that in and ask ChatGPT to reformat it, to change the color or the scale, etc.
By doing this, you can rapidly iterate on visualizations, and you can see in real time how different tweaks to the code change the graph. This immediate feedback is great for learning.
3. I also think it is important that you review these changes and see how they were made. Also if you don’t understand something, just ask ChatGPT right there to expand on what it did.
4. More advanced practitioners should also focus more heavily on the data engineering and the pipelines for productionizing code. These are things that you still need to be fairly hands-on with. I found that ChatGPT was able to get me part of the way there, but I needed to do a lot of debugging myself.
5. From there, you may want to go through and have the AI run some algorithms and do parameter tuning. To be honest, I think this will be the part of data science that will be automated the fastest. I think parameter tuning will see diminishing returns for normal practitioners, but maybe not for the highest level Kagglers.
6. You should focus your time on feature engineering and feature creation. This is also something that the AI models can help with, but not completely master. After you’ve got some decent models, see what data you can add, what features you can create, or what transforms you can do to increase your results.
In a world with these advanced AI tools, I think it is even more important to do projects than ever. You have to build things, and share your work. Fortunately, with these AI tools, it is also easier than ever to do that. It’s easier produce a web app. It’s easier to work with new packages that you’ve never worked with before.
I would highly encourage you to create real-world impact and tangible things in your data science work. That will be the new way to differentiate when others are also using these tools to learn and build.
The world is changing, and so is data science. Are you ready to embrace the challenge and create a real-world impact with your projects?
I alluded to it earlier, but I think the way we all work is changing. I think it is an uncertain time for all fields, including data science.
On the other hand, I think that data science is an excellent mix of technical and problem-solving skills that scale well to almost any new world or field.
I’ve talked at length in my podcast about how I think data science is one of the closest fields to pure entrepreneurship out there. I think that, in a world changed by AI, we will need to leverage that entrepreneurial spirit as much as possible.