How I built a programming language: The (difficult) Path to Success | by Yashrajvishwakarma | Jul, 2023

Frankly, difficult is an understatement.

My previous article about my programming language outlined its syntax and provided a general idea of how I built it.

But I decided to write another one devoted to my journey towards the final outcome, because if I’m being honest, the journey certainly had a lot more downs than ups, and the challenges I had to overcome were really intimidating. Hopefully, this article also serves as motivation for anyone undertaking a similar project and facing similar difficulties, to not give up and instead know that someone else was once in the same position as them, and ended up fulfilling their goal.

But why should you care?

After all, all of you are from varied backgrounds, whether it be data science, or just good old Python & Java.

Well, even if you aren’t interested in building your own programming language, many of the soft skills I improved on during this experience will certainly be relatable (and hopefully motivational) to all of you, especially if you’re interested in programming.

How many times have you written a program where every line looks correct, only to be given a cryptic error that seems near-impossible to debug, at which point you begin losing your motivation and think of giving up?


Well, I faced those all-too-common issues way too many times while building the language, and the manner in which I eventually overcame them and persevered hopefully communicates a memorable message as you progress through the article!

Getting started

The original idea for this came when I was in Grade 8 (or Grade 9, can’t quite remember), but back then I barely understood conditionals and iterative loops, so I forgot about it.

But years later, the same idea reignited itself in my mind, except I had no clue how to even start with it.


So, like everything else I didn’t know how to do, I Googled:

“How to make your own programming language”.

And well, the results were quite promising. I spent a long time going through each of them line-by-line, and picked up on the gist of it:

There are two broad types of language implementations: interpreted and compiled. Interpreters tend to be simpler to build, but programs run slower because the code is analyzed while it executes. Compilers, on the other hand, translate your code to machine code ahead of time, and the resulting executable is then run (which is why compiled languages are generally faster).

The general process for building a language is:

  • Define the purpose and syntax of the language: This was quite easy for me (I knew I wanted to build a math-related language). To save myself some work and keep the language as high-level as possible, I decided to keep only a few (essential & useful) functions and just two variable types: a function and a floating-point number.
  • Build a lexer & parser: A lexer is the first component of any language. It splits your code into lexemes (words and symbols) and tags each one with a category (such as keyword, operator, comment etc.), producing tokens. But by themselves, these tokens are meaningless, and to make sense of them, one needs to build a parser. The parser checks that the tokens follow the language’s syntax, and arranges them into an AST (abstract syntax tree).
  • Execute the code: Each node in the AST represents a construct built from the tokens our lexer collected (a number, an operation, a function call and so on). Thus, walking the AST node by node allows us to execute the written code statement by statement; this is the general basis for an interpreted programming language.
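To make the three steps above concrete, here is a minimal sketch in Python of a lexer, a parser and a tree-walking interpreter for plain arithmetic. It is not the code behind my language: the token categories, the grammar and every name in it (`lex`, `Parser`, `Num`, `BinOp`, `evaluate`) are assumptions chosen purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical token categories for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source):
    """Lexer: split the source text into (category, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace carries no meaning here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


@dataclass
class Num:  # AST leaf: a floating-point literal
    value: float


@dataclass
class BinOp:  # AST node: an operator applied to two sub-trees
    op: str
    left: object
    right: object


class Parser:
    """Recursive-descent parser: checks the syntax and builds the AST."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return node

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = BinOp(op, node, self.term())
        return node

    def term(self):  # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):  # factor := NUMBER | '(' expr ')'
        kind, text = self.advance()
        if kind == "NUMBER":
            return Num(float(text))
        if (kind, text) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(node):
    """Interpreter: execute the program by walking the AST."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    return left / right


print(evaluate(Parser(lex("(1 + 2) * 3")).parse()))
```

Giving `term` its own rule below `expr` is what makes multiplication bind tighter than addition; a real language would add more rules (variables, function calls) in the same style.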

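But a compiler requires a few extra steps: instead of executing the AST directly, you translate it to machine code, typically with the help of popular libraries such as LLVM or libgccjit, which generate that machine code for you so that you end up with a compiled executable file.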
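LLVM and libgccjit are far too large to show in a few lines, but the core idea of "translate first, execute later" can be sketched without them. The example below is only an illustration under loud assumptions: it borrows Python’s built-in `ast` module as the parser, and it targets a made-up stack machine (the `PUSH`, `ADD`, `SUB`, `MUL` and `DIV` instructions are hypothetical) rather than real machine code.

```python
import ast
import operator

# Hypothetical instruction set for a toy stack machine; a real compiler
# backend would emit machine code instead.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}
RUNTIME = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def compile_expr(source):
    """Compile step: translate an arithmetic expression into a flat instruction list."""
    tree = ast.parse(source, mode="eval")
    program = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            program.append(("PUSH", float(node.value)))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)   # operands first ...
            emit(node.right)
            program.append((BINARY_OPS[type(node.op)], None))  # ... then the operator
        else:
            raise SyntaxError(f"unsupported syntax: {type(node).__name__}")

    emit(tree.body)
    return program


def run(program):
    """Execution step: run the instructions without looking at the AST again."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(RUNTIME[opcode](left, right))
    return stack.pop()


program = compile_expr("1 + 2 * 3")
for instruction in program:
    print(instruction)
print(run(program))
```

The point of the split is that `compile_expr` does all the syntax analysis once, and `run` only has to follow a flat list of instructions; a real compiler pushes the same idea all the way down to the processor’s own instructions.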

Compiled or interpreted?

On the one hand, I could create an interpreted language, which is quite challenging and requires me to devote a large amount of time and effort. On the other hand, I could create a compiled language, which is even more challenging, and requires me to devote even more time and effort.

But the idea of building my own compiled language just seemed far too appealing, and I fell into the trap.

Advice: don’t do it.

While there are resources available on building a compiler using LLVM (I definitely recommend watching the 100-second video on LLVM, which is both entertaining and somewhat educational), I quickly got lost in its documentation because of how complex it was.

I’d hazard a guess and say that you could probably learn the basics of Java faster than the basics of building a compiler with LLVM.


A colossal waste of time (or was it)?

Unfortunately for me, I spent over two months trying to hustle my way through first LLVM, and then libgccjit. The latter proved more promising and I even wrote code for the compiler. However, I then faced a plethora of issues with setting up the libgccjit package; I went as far as downloading its entire source code from GitHub and placing it in the same directory as my compiler code. But still, nothing seemed to work. I scrolled through countless Reddit and StackOverflow threads on it, but to no avail.

When StackOverflow can’t help you, you know something’s very wrong.

On the verge of giving up

At this point, I had really hit a new low. Because I hadn’t made an iota of progress in so long, I began devaluing all the work I’d done, such as building a lexer & parser from scratch, without the popular lex/yacc & flex/Bison tools. I asked myself, “Should I throw it all in the (virtual) garbage can?” I was tempted to say yes, but I don’t quit that easily, especially when it comes to coding-related projects.

And so I went for a walk, cleared my head, and went back to the drawing board, this time with a different aim: Building an interpreted language.

Finally completing it

Once I decided that I would build an interpreter, I also made the difficult decision of ditching C and returning to Python. Under normal circumstances, it’s best to write an interpreter in a lower-level language, as the speed of that language helps compensate for the overhead of interpretation.

But I was in a desperate time, and those call for desperate measures.

After returning to India, I spent the next week in a state of sleeplessness, and quite literally worked 24 hours a day (around 21 if you don’t count breakfast, lunch and dinner — I’m a slow eater).

And what happened next is something that happened to me for the first time, and will likely never happen again — my code worked on the first attempt! No debugging, no more confusion.

When I ran my source code, every single component, from the lexer to the parser & the AST, worked properly, and each line of the code executed as it should have.

Photo by the blowup on Unsplash

Reflection & a final note

The quote above may sound cliche, but that doesn’t detract from its validity; I faced far more failures than successes while building AdvAnalysis, and yet I ended up on top.

While you could learn about things to be cognizant of when building your own programming language (such as compiled vs interpreted, and the process to follow), the main takeaway I had (and hope you had as well) is that no matter how discouraging it may be to constantly encounter a cryptic error in your program, in the end it always works as you expect it to, if only you have the patience to return to square one and follow a different approach.

To be honest, as a fellow programming enthusiast, you probably already knew that. 🙂
