Representing-Programs - Nate's Notes

# Representing Programs What is actually meant by representing programs? > What we consider to be the programming language when we are manipulating programs as part of the compilation process. So, what we read in and transform and eventually translate into another language as part of the compiler. ### Concrete Syntax The most obvious way to represent a program is as *text*. That is as concrete text, just as you would type into a text editor when you are writing a program: ![](Screen%20Shot%202021-10-27%20at%207.08.56%20AM.png) What are the challenges associated with representing a program via concrete syntax? Well, it is definitely very easy to write, but it is a terrible representation for actually transforming. This is not the right *level of abstraction* for understanding the semantics of programs that are necessary in order to correctly transform it. ### Abstract Syntax Now, the next type of syntax that you might image is **abstract syntax**. This will generally be done via an [Abstract Syntax Trees](Abstract%20Syntax%20Trees.md). Now, what may make this good vs. bad for a compiler? The Good... * This representation is perfect if you are writing an [interpreter](interpreter). It is very easy to see how you could use a recursive function to process the levels in the tree and recall yourself, and to run the program. * For instance, to run a block you just need to recursively interpret all the statements in the block while updating some reflection of the programs environment. * It is also convenient for a lot of [compilers](Compiler.md) The Bad... * The biggest downside to the AST representation is that all the different types of nodes in this tree have different behavior. For instance, if you want to write a compiler analysis that can process any program, you have to constantly be reasoning about the difference in semantics of all these different types of nodes ![](posts/images/Abstract%20syntax%20trees%202.png) * This is great for representing programs as humans write them. ### Lists of instructions What if we were to represent programs as lists of instructions? The main thing that this will allow us to do is to make the representation *more regular* and *more predictable* (specifically for the purpose of transforming in a compiler). ![](Screen%20Shot%202021-10-27%20at%207.55.23%20AM.png) So, one question is: How can we look at this representation of the program? One way is the **Control Flow Graph**. The basic idea of a CFG is that every vertex is an instruction (something you might execute in the program, no a label), and edges between the instructions represent the transfer of control. ![](Screen%20Shot%202021-10-27%20at%208.01.54%20AM.png) For instance consider the program above that is written in an instruction based language. There are only two instructions. There is one constant instruction and one print instruction. So the entire program can be represented as a CFG: ![](posts/images/Abstract%20syntax%20trees%203.png) So, we can summarize a CFG as: * A **directed** [graph](Types-of-Graphs.md) * Vertices are instructions * Edges indicate **possible** flow of control * Exactly one **entry** vertex and one **exit** vertex Here is a slightly more complicated example: ![](posts/images/Abstract%20syntax%20trees%204.png) The CFG has four instructions (not including labels). We see that we start with a const instruction and immediately after that we execute a jump instruction. Immediately after the jump instruction the print instruction executes, so we have an edge represent that *transfer of control*. We have one lonely instruction hanging out all on it's own that is guaranteed to never be executed. So it is worth noting that based on this representation of the CFG we already can see that we are able to remove certain code that is useless. So, if we wanted to write our first compiler pass on these programs, we could just construct a control flow graph and look for the lonely nodes that are guaranteed to never be executed and delete them (think about this as *graph optimization*). There is a key idea at play here: > By *shifting* the *representation of the program* we were able to look at the *semantics* of the program differently, and different properties became obvious. --- Date: 20211027 Links to: [Advanced Compilers Course](Advanced%20Compilers%20Course.md) [Big Idea Representation](Big%20Idea%20Representation.md) Tags: References: * [Video Lesson](https://vod.video.cornell.edu/media/1_vnx6laq9)