GATE CS - COMPILER DESIGN:LL(1) and LR Parsing — GATE CS Complete Reference

Mastering ll(1) and lr parsing — gate cs complete reference concepts and implementation.

LL(1) and LR Parsing — GATE CS Complete Reference

Parsing is consistently the most heavily tested topic in GATE Compiler Design, contributing 4–7 marks. The key skill is computing FIRST and FOLLOW sets correctly, then constructing or validating parse tables.

Context-Free Grammars (CFG) — Foundations

A CFG G = (V, T, P, S) where:

  • V = non-terminals (uppercase), T = terminals (lowercase)
  • P = productions, S = start symbol

Derivation types:

  • Leftmost derivation: Always expand the leftmost non-terminal
  • Rightmost derivation: Always expand the rightmost non-terminal (used in LR parsing)
  • Ambiguous grammar: A sentence has more than one parse tree

FIRST and FOLLOW Sets

FIRST(X)

The set of terminals that can begin a string derived from X.

Rules:

  1. If X is terminal: FIRST(X) = {X}
  2. If X → ε: add ε to FIRST(X)
  3. If X → Y₁Y₂…Yₖ: add FIRST(Y₁) − {ε}; if ε ∈ FIRST(Y₁), add FIRST(Y₂) − {ε}; etc.

Example:

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = FIRST(F) = {(, id}  [T' can be ε]
FIRST(E') = {+, ε}
FIRST(E) = FIRST(T) = {(, id}

FOLLOW(X)

The set of terminals that can appear immediately to the right of X in a sentential form.

Rules:

  1. Add $ (end marker) to FOLLOW(S)
  2. If A → αBβ: add FIRST(β) − {ε} to FOLLOW(B)
  3. If A → αB or ε ∈ FIRST(β): add FOLLOW(A) to FOLLOW(B)

Example (continuing above):

FOLLOW(E) = {), $}
FOLLOW(E') = FOLLOW(E) = {), $}
FOLLOW(T) = FIRST(E') ∪ FOLLOW(E') = {+} ∪ {), $} = {+, ), $}
FOLLOW(T') = FOLLOW(T) = {+, ), $}
FOLLOW(F) = FIRST(T') ∪ FOLLOW(T') = {*} ∪ {+, ), $} = {*, +, ), $}

LL(1) Parsing

LL(1) = scan input Left-to-right, Leftmost derivation, 1 lookahead symbol.

Predictive Parsing Table Construction

For each production A → α:

  1. For each terminal a ∈ FIRST(α): add A → α to M[A, a]
  2. If ε ∈ FIRST(α): for each terminal b ∈ FOLLOW(A): add A → α to M[A, b]

LL(1) condition: No cell M[A, a] has more than one entry.

Example LL(1) Table (arithmetic grammar)

        id     +      *      (      )      $
E    E→TE'               E→TE'
E'      ε    E'→+TE'     ε      ε     ε
T    T→FT'               T→FT'
T'      ε    T'→ε   T'→*FT' ε      T'→ε  T'→ε
F    F→id               F→(E)

LL(1) Parse Trace

Input: id + id * id$

StackInputAction
$Eid+id*id$E → T E'
$E'Tid+id*id$T → F T'
$E'T'Fid+id*id$F → id
$E'T'idid+id*id$Match id
$E'T'+id*id$T' → ε
$E'+id*id$E' → +TE'
$E'T++id*id$Match +
$E'Tid*id$T → FT'
.........
$$Accept

LR Parsing

LR(k) = scan Left-to-right, Rightmost derivation (reversed), k lookahead.

LR parsers are bottom-up and strictly more powerful than LL parsers.

Power hierarchy: LR(1) > SLR(1) > LL(1)

SLR(1) — Simple LR

Uses FOLLOW sets to resolve reduce conflicts. Fastest to construct but weakest.

LALR(1) — Look-Ahead LR

Merges LR(1) states with identical core (most practical choice — used by yacc/bison).

Same number of states as SLR but handles more grammars.

LR(1) — Canonical LR

Most powerful. Separate states for different lookaheads. Large state count.

LR Item and Closure

An LR(0) item is a production with a dot indicating parse position:

  • [E → T • E'] means we've recognized T, expecting E' next

Closure(I): For each item [A → α•Bβ] in I, add all items [B → •γ] for B → γ productions.

Goto(I, X): Move dot past X in all items of I, then take closure.

Ambiguous Grammars

Ambiguous grammar: One string has multiple parse trees.

Example (dangling else — classic GATE):

S → if E then S | if E then S else S | other

"if E1 then if E2 then S1 else S2" is ambiguous — else can bind to either if.

Resolution: Disambiguate by convention (else matches nearest if) and modify grammar.

GATE fact: No algorithm can determine if an arbitrary CFG is ambiguous (undecidable problem). But specific grammars can be proven ambiguous by exhibiting two parse trees.

Grammar Classification

GrammarLanguageProperty
Regular (Type 3)RegularFinite automaton
Context-Free (Type 2)CFLPDA (pushdown automaton)
Context-Sensitive (Type 1)CSLLinear bounded automaton
Unrestricted (Type 0)RETuring machine

CFL closure properties (GATE exam favourite):

  • Closed under: union, concatenation, Kleene star, homomorphism
  • NOT closed under: intersection (in general), complement

Regular vs CFL: {aⁿbⁿ | n ≥ 0} is CFL but NOT regular. {aⁿbⁿcⁿ | n ≥ 0} is neither.

Practice Problems

  1. Compute FIRST and FOLLOW for: S → AbB | ε, A → a | aS | ε, B → b | bB
  2. Construct the LL(1) parse table for the grammar and verify whether it is LL(1)
  3. Show that the grammar E → E + E | E E | id is ambiguous (two parse trees for id + id id)
  4. How many LR(0) items does the augmented grammar S' → S, S → aAb, A → a | ε have?
  5. Convert the ambiguous expression grammar to unambiguous form maintaining operator precedence