This article sums it up perfectly. I was interested in building a compiler long before going to college and this was the most accessible body of work.
Building a recursive descent parser from scratch was an eye opener to 17yo me on how a seemingly very complex problem that I had no idea how to approach can be made simple by breaking it down into the right primitives.
"breaking things down into the right primitives" is the real key to programming. There are many books and web pages about algorithms, but I wish there were more searchable and browsable resources for how to approach problems through primitives.
The process of breaking a complex problem down into the right primitives requires great understanding of the original problem in the first place.
Whats blocking me during programming usually are edge cases I had no idea about. Its still hard to find good material on compilers if you are not into reading dry ass books. Thats a me problem though, I simply cant force myself to read boring factual only content (one of the reasons as to why I love beejs guides).
From the Publications section of that Wikipedia page:
>The April 1971 Communications of the ACM article "Program Development by Stepwise Refinement",[22][23] concerning the teaching of programming, is considered to be a classic text in software engineering.[24] The paper is considered to be the earliest work to formally outline the top-down method for designing programs.[25][26] The article was discussed by Fred Brooks in his influential book The Mythical Man-Month and was described as "seminal" in the ACM's brief biography of Wirth published in connection to his Turing Award.[27][28]
> Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on
Good summary.
I had no background in compilers or related theory but read Jack Crenshaw's Let's Build a Compiler tutorials some time ago. My main take away from reading half a dozen or so of these tutorials was that building a simple compiler for a toy language was a small project that was well within my grasp and ability, not a huge undertaking that required mastery of esoteric pre-requisites or a large amount of planning.
I got a lot of enjoyment messing about with toy compiler projects related to Brainfuck.
Why Brainfuck? It's a beautiful little toy language. Brainfuck has 8 instructions, each instruction is 1 character, so parsing reduces to getting a char and switching on it. I guess it depends on what you want to explore. If you want to focus on writing recursive descent parsers, not the best choice!
One initial project could be to compile (transpile) from Brainfuck source to C source. You can do this as a source to source compiler without any internal representation by transforming each Brainfuck operation to a corresponding C statement. Brainfuck is specified in terms of a single fixed length array of bytes, and a pointer - an index into that array - that can be moved around, and basic manipulations of the byte it is pointing it. So on the C side you need two variables: one for the array and a second, an index for the pointer.
A second project could be compiling from Brainfuck to assembly language, skipping C. You'd need to read a few tutorials/reference docs about your chosen assembly language and learn how to run the assembler to compile tiny assembly programs into native executables. You could explore some examples of what output assembly programs you get when you compile small Brainfuck programs to C and then compile those C programs to assembly. You could write a direct source to source compiler without an internal representation, where each Brainfuck operation is directly mapped to a snippet of assembly instructions. Once you've got this working, you can compile a Brainfuck program into an assembly program, and then use the usual toolchain to assemble that into a native executable and run it.
There's also lots of projects in another direction, treating Brainfuck as the target language. Imagine that your job is to write Brainfuck programs for a CPU that natively executes Brainfuck. Try writing a few tiny Brainfuck programs by hand and savour how trying to do almost anything involves solving horrible little puzzles. Maybe it'd be much easier to do your job if you, the Brainfuck programmer, didn't have to manually track which index of the array is used to store what. You could invent a higher level language supporting concepts like local variables, where you could add two local variables together and store the results in a third local variable! Maybe you could allow the programmer to define and call their own functions! Maybe you could support `if` blocks, comparisons! You could have a compiler that manages the book-keeping of memory allocation and mapping complex high level abstractions such as integer addition into native Brainfuck concepts of adding one to things and moving left or right. Projects in this direction let you explore more stuff about parsers (the input syntax for your higher level language is richer), internal representations, scopes and so on.
This article sums it up perfectly. I was interested in building a compiler long before going to college and this was the most accessible body of work.
Building a recursive descent parser from scratch was an eye opener to 17yo me on how a seemingly very complex problem that I had no idea how to approach can be made simple by breaking it down into the right primitives.
"breaking things down into the right primitives" is the real key to programming. There are many books and web pages about algorithms, but I wish there were more searchable and browsable resources for how to approach problems through primitives.
The process of breaking a complex problem down into the right primitives requires great understanding of the original problem in the first place.
Whats blocking me during programming usually are edge cases I had no idea about. Its still hard to find good material on compilers if you are not into reading dry ass books. Thats a me problem though, I simply cant force myself to read boring factual only content (one of the reasons as to why I love beejs guides).
>a seemingly very complex problem that I had no idea how to approach can be made simple by breaking it down into the right primitives.
https://en.wikipedia.org/wiki/Niklaus_Wirth
From the Publications section of that Wikipedia page:
>The April 1971 Communications of the ACM article "Program Development by Stepwise Refinement",[22][23] concerning the teaching of programming, is considered to be a classic text in software engineering.[24] The paper is considered to be the earliest work to formally outline the top-down method for designing programs.[25][26] The article was discussed by Fred Brooks in his influential book The Mythical Man-Month and was described as "seminal" in the ACM's brief biography of Wirth published in connection to his Turing Award.[27][28]
This brings back memories, I got to learn from Jack Crenshaw's tutorial from comp.compilers newsgroup on USENET.
Which by the way, it is still active, https://compilers.iecc.com/index.phtml
For modern compiler and a more direct approach I recommend https://www.cs.cornell.edu/~asampson/blog/llvm.html
> Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on
Good summary.
I had no background in compilers or related theory but read Jack Crenshaw's Let's Build a Compiler tutorials some time ago. My main take away from reading half a dozen or so of these tutorials was that building a simple compiler for a toy language was a small project that was well within my grasp and ability, not a huge undertaking that required mastery of esoteric pre-requisites or a large amount of planning.
I got a lot of enjoyment messing about with toy compiler projects related to Brainfuck.
Why Brainfuck? It's a beautiful little toy language. Brainfuck has 8 instructions, each instruction is 1 character, so parsing reduces to getting a char and switching on it. I guess it depends on what you want to explore. If you want to focus on writing recursive descent parsers, not the best choice!
One initial project could be to compile (transpile) from Brainfuck source to C source. You can do this as a source to source compiler without any internal representation by transforming each Brainfuck operation to a corresponding C statement. Brainfuck is specified in terms of a single fixed length array of bytes, and a pointer - an index into that array - that can be moved around, and basic manipulations of the byte it is pointing it. So on the C side you need two variables: one for the array and a second, an index for the pointer.
A second project could be compiling from Brainfuck to assembly language, skipping C. You'd need to read a few tutorials/reference docs about your chosen assembly language and learn how to run the assembler to compile tiny assembly programs into native executables. You could explore some examples of what output assembly programs you get when you compile small Brainfuck programs to C and then compile those C programs to assembly. You could write a direct source to source compiler without an internal representation, where each Brainfuck operation is directly mapped to a snippet of assembly instructions. Once you've got this working, you can compile a Brainfuck program into an assembly program, and then use the usual toolchain to assemble that into a native executable and run it.
There's also lots of projects in another direction, treating Brainfuck as the target language. Imagine that your job is to write Brainfuck programs for a CPU that natively executes Brainfuck. Try writing a few tiny Brainfuck programs by hand and savour how trying to do almost anything involves solving horrible little puzzles. Maybe it'd be much easier to do your job if you, the Brainfuck programmer, didn't have to manually track which index of the array is used to store what. You could invent a higher level language supporting concepts like local variables, where you could add two local variables together and store the results in a third local variable! Maybe you could allow the programmer to define and call their own functions! Maybe you could support `if` blocks, comparisons! You could have a compiler that manages the book-keeping of memory allocation and mapping complex high level abstractions such as integer addition into native Brainfuck concepts of adding one to things and moving left or right. Projects in this direction let you explore more stuff about parsers (the input syntax for your higher level language is richer), internal representations, scopes and so on.