01

Introduction to Compiler Design

Chapter 1 • Beginner

50 min

Introduction to Compiler Design

Compiler translates source code to target code. Understanding compiler phases is essential for GATE CS.

What is a Compiler?

Compiler is a program that translates source code (high-level language) to target code (machine code or intermediate code).

Phases:

  1. Lexical Analysis
  2. Syntax Analysis
  3. Semantic Analysis
  4. Intermediate Code Generation
  5. Code Optimization
  6. Code Generation

Compiler Phases

1. Lexical Analysis (Scanning)

Purpose: Convert source code to tokens.

Input: Source code (character stream)

Output: Tokens (keywords, identifiers, operators, etc.)

Example:

  • Input: "int x = 5;"
  • Tokens: KEYWORD(int), ID(x), OPERATOR(=), NUMBER(5), SEMICOLON

2. Syntax Analysis (Parsing)

Purpose: Build parse tree from tokens.

Input: Tokens

Output: Parse tree or syntax tree

Example:

  • Tokens → Parse tree showing expression structure

3. Semantic Analysis

Purpose: Check semantic correctness.

Tasks:

  • Type checking
  • Scope resolution
  • Symbol table management

4. Intermediate Code Generation

Purpose: Generate intermediate representation.

Forms:

  • Three-address code
  • Quadruples
  • Triples

5. Code Optimization

Purpose: Improve code efficiency.

Optimizations:

  • Constant folding
  • Dead code elimination
  • Loop optimization

6. Code Generation

Purpose: Generate target code.

Output: Machine code or assembly

Lexical Analysis

Lexical Analyzer (Scanner): Converts source to tokens.

Token: Smallest meaningful unit.

Token Types:

  • Keywords: if, while, int
  • Identifiers: variable names
  • Operators: +, -, *, /
  • Literals: numbers, strings
  • Punctuation: ;, {, }

Regular Expressions: Used to specify token patterns.

Finite Automata: Used to recognize tokens.

Symbol Table

Symbol Table: Stores information about identifiers.

Information:

  • Name
  • Type
  • Scope
  • Address

Operations:

  • Insert
  • Lookup
  • Delete

GATE CS Weightage

Compiler Design typically accounts for:

  • 6-10 marks out of 100 in GATE CS
  • Lowest priority subject
  • Requires understanding of parsing and code generation

Important Topics for GATE CS

  1. Lexical Analysis (Medium Priority)
  • Token recognition
  • Regular expressions
  1. Parsing (High Priority)
  • Top-down parsing
  • Bottom-up parsing
  1. Syntax-Directed Translation (Medium Priority)
  • SDT schemes
  • Attribute grammars
  1. Code Generation (Low Priority)
  • Three-address code
  • Code optimization