Compilers(Question about Computer Science)
1. Notes on Code generation and analysis: (see CH 5,8,9 Aho + Ullman)
What to generate depends on source language, architecture of target machine or runtime sys.
Common intermediate codes: 3 address, high-level ( Lisp/Scheme, C, Modula 2) parse tree/CFG. I-code is still research topic.
Three-address code. Statements of form x := y op z
types: assignment
binary op x := y op z => OP x y z unary op x := op y => OP x y copy - x := y => MOV x y
indexed assignment x[i] := y => MOVI x y offset(i) x := y[i] (i is offset on address of x | y)
pointer/address assignment *x := y => MOV contents(address(x)) y x := *y => MOV x content(address(y)) x := &y => MOV x addrsss(y)
jumps unconditional jump: GOTO label(x) => JMP x conditional jump: IF x op y GOTO z => COND x y z
calls param x => PARM x => PAR x
.. call label(x) number-parm(y) => GOSUB x y .. return value(x) => RET x
Loops := conditional backward jumps
// do {} while
label z { loop body } if(x op y) jmp z or // while{}
label w if NOT (x op y) jmp z { loop body }
jmp w label z
1
2
Logic (if-then-else) := conditional/unconditional forward jumps
if( NOT (x op y ) ) jmp z { true block } jmp endif label z { false block } label endif
Note: jumps are not L-attributed (inherited values) or synthezied values: (L-attribute means on the parse tree to the left of where we need the value)
Backward jumps - label is on LH side of tree, but arbitrarily far Forward jumps - label is on RH side of tree.
Therefore handling labels cannot be done in grammar, requires symbol table.
Implies => we can’t (easily) check correctness of IF-THEN-ELSE structure built with jumps (so we dislike GOTOs!!). When we build structure using jumps from IF-THEN-ELSE, we know it is correct.
Backpatching:
To handle things like addresses of forward jumps, we place marker when we encounter the jump, fill it in on second pass after we find the labels (error message if we don’t find the label).
Analysis of larger code blocks: Expression has value as synthesized attribute - what is value of an assignment statement? (C++ says it is a boolean, value T-F)
Values/types transferred from one statement to another via memory/symbol table.
Problem: when can values change? Example : ++x, x++operator in C/C++ causes change during execution of a
statement, gets in the way of DAG representation/optimization
We dislike side effects because they get in way of optimizations/ interpretation of semantics: .
x = 5; y = 10; z = y*(x + y++); // what is value of z? // otherwise z = (++y)*(x + y); . Change in the value of y is a side effect of the z assignment statements. Questions: - When do we know that contents of a given memory location/register
is unchanged? (so we can use common expression elimination). -
3
When can we change statement/evaluation order without affecting meaning? (so we can reorder things so can keep values in registers or cache).
Statements in a basic block may be analyzed together, since they will be executed together. Note that order matters, because
assignments modify memory => dependences: • WRITE(x) -> READ(x) : direct (ReadAfterWrite - RAW) dependence:
preserve order because must read value written. • READ(x) <- WRITE(x) : anti-dependence (WriteAfterRead - WAR): pre-
serve order because must read before value is replaced. • WRITE(x) - WRITE(x) : Write-Write - strictly not a dependence, but
preserve order because must read the correct write.
Reaching definitions. Which statement writes a value that is used by some later statement? Is there a write statement that cannot reach a read? (DEAD code)
Basic block: -. IS : list of statements B=(k1 ... kN) such that IF execution reaches k1 THEN all the statements in B must be executed, AND there is no way to execute any statement kj in B without executing k1 : Also called straight-line code.
Normal execution order is lexicographic order, unless modified by a control state- ment (i.e., we execute statement n+1 after statement n).
control statemnt : anything which changes the standard execution order (jump, switch, it-else, loop
Basic block analysis is normally done on a function/procedure. A program is a function, it may call other functions. Class (object) methods are functions
Definition. statment = "leader" if and only if - it is the first executable statement of a function - it is the statment inmediately after a control statement - it is the target of a jump (a transfer of control) nothing else is a leader.
• BASIC BLOCKS Given: function/procedure text
Base case: first executable statemnt is leader assign unique label B_0 to block
Loop: • add statement to to block B_i • while next statment is not( leader) goto Loop • else: • if( next is leader) assign unique label B_j to block ; goto Loop: • else halt.
Result: list of basic blocks, with code in each block
Control-Flow Graph : (CFG). CFG={V,A,s,E} is a directed graph, constructed as follows: V = nodes A = arcs s = start node in V E = end nodes in V
(1) Split program into basic blocks, number/label each block, place in V (2) The first block in the program is the root/start block = s s is a member of
V (3) Let x, y be blocks in V. If it is possible to reach the first statement in x
from the last statement in y (e.g., y immediately follows x, or there is a jump from y to x) then add (directional) arc (x,y) to A
4
(4) If x is a block in V that contains a halt or stop statement, or a block with no outgoing arcs, place x in E.
Dead code: if x is a node in V and there is no path s-*>x, then x cannot be reached from the start of the program and is dead code.
Infinite loops: if x is a node in V and there is no path x-*e, where e is some node in E, then x is part of an infinite loop.
Execution: Can be described as a sequence of basic blocks that trace a path through the CFG, starting at s, following each arc in the specified direction, ending in one of the nodes in E