The main purpose of Compiler is to convert high-level language to low level language. The different phases of Compiler are as:
- Lexical Analysis
- Syntax Analysis
- Semantic Analysis
- Intermediate Code Generator
- Code optimization
- Target Code Generator
But before discussing the phases of Compiler, let’s first understand the processing of a program code.
Stages of program written in high-level language converted to low-level language:-
Fig.1 Stages of processing a Program code
- High-level language – It contains files such as #include<…>, #define, which is removed by the Pre-Processor by including the meaning of those files (File Inclusion).
- Pre-processor – It substitutes the files in the source program written by us in file inclusion.
Note: #define is a macro expansion function which is a small function for which we really don’t call the actual function. Main function calling is overhead so we call the small functions.
- Compiler – Once the program reaches the compiler, it will be in pure form which means it will not contain any # lines. The # lines are also called pre-processor derivatives, i.e., they direct the pre-processor about what to do. The compiler takes the high-level language and converts it into assembly language. Assembly language is not entirely 0s and 1s and not entirely high-level language. It is an intermediate.
- Assembler – For every platform (intel, amd, motorola) and top of it, the OS (Windows, Linux and 64 bit/32 bit), we are going to have some assembler. An assembler for 1 platform will not work for another platform. Therefore, assembler is like a manual using which we can operate one instrument but not any other instrument. The assembler converts the assembly language into machine code (0s and 1s). There are 2 types of machine codes:-
- Relocatable – we can load this machine code at any point in the computer and run.
- Absolute – it cannot be relocated during the execution of the computer program and is loaed into computer fixed storage at each use.
- Loader/Linker – It will take the relocateable machine code and convert it into absolute code.
Phases of Compiler
Fig. 2 Phases of Compiler
- Lexical Analysis – In this phase, the lexical analyzer will take the program, read it and convert the program into tokens.
- Syntax Analysis – the stream of tokens is given to the syntax analyzer (Parser) which is going to convert it into a parse tree.
- Semantic Analysis – The parse tree is given to semantic analyzer which verifies whether it is valid or not.
- Intermediate Code Generator – The semantically verified parse tree is given to Intermediate code generator which generates three-address code.
- Code Optimization – The three-address code is given to the code optimizer. The main purpose of code optimizer is to reduce the size of the program or to reduce the number of lines in the three address code.
- Target Code Generator – The reduced/optimized code is given to target code generator which is finally going to generate the assembly code.
All the phases of the entire compiler software module is going to take support of the Symbol Table Manager and also the Error Handler.
Example to show the working of the phases of Compiler:-
x = a + b * c; // assume as the source program
Now,
Fig. 3 Lexical Analysis, Syntax Analysis
Lexical Analyzer removes the white spaces, cmments and converts a stream of lexemes into stream of tokens. It identifies the different tokens using patterns. For example, in case of identifier we have, l(l + d)*, where, l = letters/alphabets, d = digits
Syntax Analyzer takes the tokens one by one and also takes a grammar (generally Context-free grammar) which defines some rules to which the program is matched and verified. Also called as parser, it is going to design the parse tree.
Rule:
S -> id = E
E -> E + T/T
T -> T * F/F
F -> id
We have to check that the yield of the parse tree and the input to the syntax analyzer is same. If it is, we can say that the input is according to the format we provided.
Fig. 4 Semantic Analysis, Intermediate Code Generator
Semantic Analyzer works by analyzing the tokens received, such as the left-hand side has to be a variable name, here, it cannot be a digit or a function name or an array name. The variable name should be compatible with the type of right-hand side. These kind of type checking is done here.
Intermediate Code generator generates intermediate codes. There are many types of intermediate code but the most popular is the three-address code, in which there are only three addresses at maximum in every line of the code.
Note: Why do we convert into intermediate code?
If we convert a program into intermediate code, then the intermediate code can be converted into machine language in the last 2 phases which are depended on the platform. So, till the ICG, whatever machine we run the compiler on, it is the same for all, which means if we want to design a new compiler for a new platform, we need not start from scratch. We can take all phases till the ICG and change the last 2 phases, so that new compilation can be performed.
Fig. 5 Code optimization, Target Code Generator
Code Optimizer reduces the number of lines. Here, we don’t need three lines to do the job, only 2 lines could do it.
Target Code Generator receives the output of the code optimizer. The main purpose of TCG is to write the code which assembler can understand.
Note: Most of the jobs are done by the syntax analyzer although we have so many phases. Therefore, syntax analyzer is the heart of the compiler. Till the phase ICG, it is also called as front-end, the remaining part is called back-end (CO and TCG). Theoretically, the entire front-end is implemented together and then the back-end.
For implementing the compiler, there are various tools available. For lexical analyzer, there is a tool called lex. For syntax analyzer, we have Yacc (Yet Another Compiler Compiler).
Yacc is the standard parser generator for the Unix operating system. An open source program, Yacc generates code for the parser in the C programming language.
LANCE is implementation of front-end of the compiler which will give intermediate code and then we can implement back-end on our own, depending on the platform.
Related Links
- Compiler Design Notes – Click Here
- Theory of Computation Notes – Click Here