even better than just forwarding

generally, we can send the ALU result that is destined for the destination register directly back to the ALU input, without waiting for it to be written to the register file.

moreover, we could, in principle, forward a loaded value directly from "memory" (the data cache) to the ALU, since the data cache read delivers the data as soon as the read is complete, before it is written back to the register file.
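to see what the two kinds of forwarding buy us, here is a minimal sketch in python. it assumes a classic 5-stage pipeline (IF, ID, EX, MEM, WB) whose register file is written in the first half of a cycle and read in the second half; the function name `stalls_between` and its parameters are hypothetical, chosen for illustration only.

```python
# hypothetical model of a classic 5-stage pipeline: IF, ID, EX, MEM, WB.
# the producer occupies IF..WB in cycles 1..5; a consumer issued `distance`
# instructions later reaches ID in cycle 2 + distance and EX in cycle 3 + distance.

def stalls_between(producer: str, distance: int, forwarding: bool) -> int:
    """stall cycles for a consumer that reads the producer's result.

    producer: "alu" (e.g. ADD/SUB) or "load" (e.g. LD)
    distance: 1 if the consumer immediately follows the producer, 2 if one
              independent instruction sits between them, and so on
    """
    if producer not in ("alu", "load"):
        raise ValueError("producer must be 'alu' or 'load'")
    if not forwarding:
        # result reaches the register file in WB (cycle 5); with the split-cycle
        # register file, the consumer's ID may fall in cycle 5 at the earliest
        needed = 3
    elif producer == "alu":
        # ALU result (end of EX, cycle 3) goes straight into the next EX (cycle 4)
        needed = 1
    else:
        # loaded value (end of MEM, cycle 4) goes straight into the consumer's EX (cycle 5)
        needed = 2
    return max(0, needed - distance)


for producer in ("alu", "load"):
    for forwarding in (False, True):
        label = "forwarding" if forwarding else "no forwarding"
        print(producer, label, [stalls_between(producer, d, forwarding) for d in (1, 2, 3)])
```

under these assumptions, forwarding removes every stall after an ALU instruction, but a consumer that immediately follows a load still waits one cycle even with data-cache forwarding. that leftover load-use stall is what instruction scheduling tries to hide.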

the following was the final example from last week (friday, two lectures ago, lecture 8):

A=B+C
D=E-F

program to implement this example:
LDB $201
LDC $202
ADDABC; A=B+C
STA $200; save result
LDE $204
LDF $205
SUBDEF; D=E-F
STD $203; save result

re-arrange the instructions to avoid data hazards.

answer:
LDB $201
LDC $202
LDE $204; moved up to avoid a data hazard stall
ADDABC
LDF $205; swapped with STA below to avoid a data hazard stall
STA $200; swapped with LDF above to avoid a data hazard stall
SUBDEF
; another instruction from further down in the code needs to go here
STD $203
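one way to check the answer is to count stall cycles for both orderings. the sketch below is a minimal python model, assuming a single-issue, in-order pipeline in which a result can be used two issue slots after its producer (one independent instruction in between), which is the rule the comments in the answer follow. the decoder for the lecture's notation (`LDB $201`, `ADDABC`, `STA $200`) and the names `decode` and `count_stalls` are hypothetical.

```python
# hypothetical decoder for the lecture's notation, e.g. "LDB $201", "ADDABC", "STA $200"
def decode(line):
    """return (destination register or None, tuple of source registers)."""
    mnemonic = line.split(";")[0].split()[0]
    if mnemonic.startswith("LD"):
        return mnemonic[2], ()                      # LDx: load register x from memory
    if mnemonic.startswith("ST"):
        return None, (mnemonic[2],)                 # STx: store register x to memory
    return mnemonic[3], (mnemonic[4], mnemonic[5])  # ADDabc / SUBabc: a = b op c


def count_stalls(program, latency=2):
    """stall cycles in a single-issue, in-order pipeline.

    assumption: a result can be used `latency` issue slots after its producer;
    latency=2 means one independent instruction must sit in between.
    """
    ready = {}   # register -> first cycle in which its new value can be used
    cycle = -1   # issue cycle of the previous instruction
    stalls = 0
    for line in program:
        dest, srcs = decode(line)
        earliest = cycle + 1
        issue = max([earliest] + [ready.get(r, 0) for r in srcs])
        stalls += issue - earliest
        cycle = issue
        if dest is not None:
            ready[dest] = issue + latency
    return stalls


ORIGINAL = ["LDB $201", "LDC $202", "ADDABC; A=B+C", "STA $200; save result",
            "LDE $204", "LDF $205", "SUBDEF; D=E-F", "STD $203"]
RESCHEDULED = ["LDB $201", "LDC $202", "LDE $204", "ADDABC",
               "LDF $205", "STA $200", "SUBDEF", "STD $203"]

print(count_stalls(ORIGINAL), count_stalls(RESCHEDULED))
```

under this assumption the original order stalls for 4 cycles and the rescheduled order for 1. the remaining stall sits between SUBDEF and STD $203, which is exactly the slot the comment says should be filled by an instruction from further down in the code.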

by scheduling (re-ordering) the instructions, the compiler attempts to produce code that issues one instruction per clock tick (per execution unit), i.e. without stall cycles.
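a common way for a compiler to do this is greedy list scheduling: each cycle, issue the first instruction (in original program order) whose dependences are satisfied, and insert a NOP if none is ready. the python sketch below is one such scheduler, not necessarily what any particular compiler does. it assumes the same two-slot result latency as the answer above, treats WAR/WAW register dependences and same-address memory accesses as plain ordering constraints, and uses hypothetical names (`decode`, `min_distance`, `list_schedule`).

```python
# hypothetical decoder for the lecture's notation, e.g. "LDB $201", "ADDABC", "STA $200"
def decode(line):
    """return (dest register or None, source registers, memory address or None, is_store)."""
    fields = line.split(";")[0].split()
    m = fields[0]
    if m.startswith("LD"):
        return m[2], (), fields[1], False
    if m.startswith("ST"):
        return None, (m[2],), fields[1], True
    return m[3], (m[4], m[5]), None, False        # ADDabc / SUBabc: a = b op c


def min_distance(earlier, later, latency):
    """minimum issue distance from `earlier` to `later`, or None if they are independent."""
    d1, s1, a1, st1 = earlier
    d2, s2, a2, st2 = later
    if d1 is not None and d1 in s2:
        return latency                             # true (read-after-write) dependence
    reg_order = d2 is not None and (d2 in s1 or d2 == d1)     # WAR or WAW
    mem_order = a1 is not None and a1 == a2 and (st1 or st2)  # same address, one is a store
    return 1 if reg_order or mem_order else None   # only the original order must be kept


def list_schedule(program, latency=2):
    ins = [decode(line) for line in program]
    preds = []                                     # preds[i]: {j: minimum distance j -> i}
    for i in range(len(ins)):
        p = {}
        for j in range(i):
            d = min_distance(ins[j], ins[i], latency)
            if d is not None:
                p[j] = d
        preds.append(p)

    issued = {}                                    # instruction index -> issue cycle
    remaining = list(range(len(ins)))
    out, cycle = [], 0
    while remaining:
        for i in remaining:                        # priority: original program order
            if all(j in issued and cycle >= issued[j] + d for j, d in preds[i].items()):
                issued[i] = cycle
                remaining.remove(i)
                out.append(program[i])
                break
        else:
            out.append("NOP")                      # nothing is ready: stall this cycle
        cycle += 1
    return out


ORIGINAL = ["LDB $201", "LDC $202", "ADDABC", "STA $200",
            "LDE $204", "LDF $205", "SUBDEF", "STD $203"]
for line in list_schedule(ORIGINAL):
    print(line)
```

on the lecture's example this produces the same order as the hand-made answer, with a NOP between SUBDEF and STD $203 where that answer asks for an instruction from further down. with `latency=1` the original order is already stall-free and nothing moves. real schedulers usually rank ready instructions by critical-path length rather than program order; program order just keeps the sketch short.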