GATE CS - COMPILER DESIGN:LL(1) and LR Parsing — GATE CS Complete Reference
Mastering ll(1) and lr parsing — gate cs complete reference concepts and implementation.
LL(1) and LR Parsing — GATE CS Complete Reference
Parsing is consistently the most heavily tested topic in GATE Compiler Design, contributing 4–7 marks. The key skill is computing FIRST and FOLLOW sets correctly, then constructing or validating parse tables.
Context-Free Grammars (CFG) — Foundations
A CFG G = (V, T, P, S) where:
- V = non-terminals (uppercase), T = terminals (lowercase)
- P = productions, S = start symbol
Derivation types:
- Leftmost derivation: Always expand the leftmost non-terminal
- Rightmost derivation: Always expand the rightmost non-terminal (used in LR parsing)
- Ambiguous grammar: A sentence has more than one parse tree
FIRST and FOLLOW Sets
FIRST(X)
The set of terminals that can begin a string derived from X.
Rules:
- If X is terminal: FIRST(X) = {X}
- If X → ε: add ε to FIRST(X)
- If X → Y₁Y₂…Yₖ: add FIRST(Y₁) − {ε}; if ε ∈ FIRST(Y₁), add FIRST(Y₂) − {ε}; etc.
Example:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = FIRST(F) = {(, id} [T' can be ε]
FIRST(E') = {+, ε}
FIRST(E) = FIRST(T) = {(, id}
FOLLOW(X)
The set of terminals that can appear immediately to the right of X in a sentential form.
Rules:
- Add $ (end marker) to FOLLOW(S)
- If A → αBβ: add FIRST(β) − {ε} to FOLLOW(B)
- If A → αB or ε ∈ FIRST(β): add FOLLOW(A) to FOLLOW(B)
Example (continuing above):
FOLLOW(E) = {), $}
FOLLOW(E') = FOLLOW(E) = {), $}
FOLLOW(T) = FIRST(E') ∪ FOLLOW(E') = {+} ∪ {), $} = {+, ), $}
FOLLOW(T') = FOLLOW(T) = {+, ), $}
FOLLOW(F) = FIRST(T') ∪ FOLLOW(T') = {*} ∪ {+, ), $} = {*, +, ), $}
LL(1) Parsing
LL(1) = scan input Left-to-right, Leftmost derivation, 1 lookahead symbol.
Predictive Parsing Table Construction
For each production A → α:
- For each terminal a ∈ FIRST(α): add A → α to M[A, a]
- If ε ∈ FIRST(α): for each terminal b ∈ FOLLOW(A): add A → α to M[A, b]
LL(1) condition: No cell M[A, a] has more than one entry.
Example LL(1) Table (arithmetic grammar)
id + * ( ) $
E E→TE' E→TE'
E' ε E'→+TE' ε ε ε
T T→FT' T→FT'
T' ε T'→ε T'→*FT' ε T'→ε T'→ε
F F→id F→(E)
LL(1) Parse Trace
Input: id + id * id$
| Stack | Input | Action |
|---|---|---|
| $E | id+id*id$ | E → T E' |
| $E'T | id+id*id$ | T → F T' |
| $E'T'F | id+id*id$ | F → id |
| $E'T'id | id+id*id$ | Match id |
| $E'T' | +id*id$ | T' → ε |
| $E' | +id*id$ | E' → +TE' |
| $E'T+ | +id*id$ | Match + |
| $E'T | id*id$ | T → FT' |
| ... | ... | ... |
| $ | $ | Accept |
LR Parsing
LR(k) = scan Left-to-right, Rightmost derivation (reversed), k lookahead.
LR parsers are bottom-up and strictly more powerful than LL parsers.
Power hierarchy: LR(1) > SLR(1) > LL(1)
SLR(1) — Simple LR
Uses FOLLOW sets to resolve reduce conflicts. Fastest to construct but weakest.
LALR(1) — Look-Ahead LR
Merges LR(1) states with identical core (most practical choice — used by yacc/bison).
Same number of states as SLR but handles more grammars.
LR(1) — Canonical LR
Most powerful. Separate states for different lookaheads. Large state count.
LR Item and Closure
An LR(0) item is a production with a dot indicating parse position:
- [E → T • E'] means we've recognized T, expecting E' next
Closure(I): For each item [A → α•Bβ] in I, add all items [B → •γ] for B → γ productions.
Goto(I, X): Move dot past X in all items of I, then take closure.
Ambiguous Grammars
Ambiguous grammar: One string has multiple parse trees.
Example (dangling else — classic GATE):
S → if E then S | if E then S else S | other
"if E1 then if E2 then S1 else S2" is ambiguous — else can bind to either if.
Resolution: Disambiguate by convention (else matches nearest if) and modify grammar.
GATE fact: No algorithm can determine if an arbitrary CFG is ambiguous (undecidable problem). But specific grammars can be proven ambiguous by exhibiting two parse trees.
Grammar Classification
| Grammar | Language | Property |
|---|---|---|
| Regular (Type 3) | Regular | Finite automaton |
| Context-Free (Type 2) | CFL | PDA (pushdown automaton) |
| Context-Sensitive (Type 1) | CSL | Linear bounded automaton |
| Unrestricted (Type 0) | RE | Turing machine |
CFL closure properties (GATE exam favourite):
- Closed under: union, concatenation, Kleene star, homomorphism
- NOT closed under: intersection (in general), complement
Regular vs CFL: {aⁿbⁿ | n ≥ 0} is CFL but NOT regular. {aⁿbⁿcⁿ | n ≥ 0} is neither.
Practice Problems
- Compute FIRST and FOLLOW for: S → AbB | ε, A → a | aS | ε, B → b | bB
- Construct the LL(1) parse table for the grammar and verify whether it is LL(1)
- Show that the grammar E → E + E | E E | id is ambiguous (two parse trees for id + id id)
- How many LR(0) items does the augmented grammar S' → S, S → aAb, A → a | ε have?
- Convert the ambiguous expression grammar to unambiguous form maintaining operator precedence