moreover, we could, in principle, pass the data directly from
"memory" (data cache) to the ALU, since the "memory" (data cache)
read gives us the data as soon as the read is complete.
recall the final example from lecture 8 (last week, friday, 2 lectures ago):
A=B+C
D=E-F
program to implement this example:
LDB $201
LDC $202
ADDABC; A=B+C
STA $200; save result
LDE $204
LDF $205
SUBDEF; D=E-F
STD $203
re-arrange the instructions to avoid data hazard stalls.
answer:
LDB $201
LDC $202
LDE $204; avoid a data hazard stall
ADDABC
LDF $205; exchanged with the STA below to avoid a data hazard stall
STA $200; exchanged with the LDF above to avoid a data hazard stall
SUBDEF
; an independent instruction from further down in the code needs to go here, otherwise STD stalls waiting for D
STD $203
by scheduling the instructions in this way, the compiler attempts to produce code that completes one instruction per clock tick (per execution unit), i.e. code with no stalls.
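
the effect of the re-arrangement can be checked with a small stall counter. this is only a sketch in Python, not part of the lecture's machine: the function name `count_stalls`, the register sets, and the `latency` parameter are assumptions made for illustration. it assumes that a result cannot be read by the very next instruction, so one independent instruction must sit between a producer and its consumer.

```python
# Each instruction: (text, registers written, registers read).
# The read/write sets are inferred from the mnemonics in the example.
LDB = ("LDB $201", {"B"}, set())
LDC = ("LDC $202", {"C"}, set())
ADD = ("ADDABC", {"A"}, {"B", "C"})
STA = ("STA $200", set(), {"A"})
LDE = ("LDE $204", {"E"}, set())
LDF = ("LDF $205", {"F"}, set())
SUB = ("SUBDEF", {"D"}, {"E", "F"})
STD = ("STD $203", set(), {"D"})

ORIGINAL = [LDB, LDC, ADD, STA, LDE, LDF, SUB, STD]
SCHEDULED = [LDB, LDC, LDE, ADD, LDF, STA, SUB, STD]


def count_stalls(program, latency=1):
    """Count stall cycles for an in-order, one-issue-per-tick machine.

    Assumption: a value written by an instruction issued at tick t can
    first be read by an instruction issued at tick t + 1 + latency.
    """
    ready = {}   # register -> earliest tick at which it may be read
    tick = 0     # tick at which the next instruction would issue
    stalls = 0
    for _text, writes, reads in program:
        earliest = max((ready.get(r, 0) for r in reads), default=0)
        if earliest > tick:
            stalls += earliest - tick  # wait until the operands are ready
            tick = earliest
        for r in writes:
            ready[r] = tick + 1 + latency
        tick += 1
    return stalls


if __name__ == "__main__":
    print("original order :", count_stalls(ORIGINAL), "stall(s)")
    print("scheduled order:", count_stalls(SCHEDULED), "stall(s)")
```

under these assumptions the original order stalls before ADDABC, STA, SUBDEF and STD, while the scheduled order stalls only before STD, which is the slot that another instruction from further down in the code is meant to fill.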