Friday, July 17, 2020

Latch timing failure

I think I've found the problem in the i4004 CPU instruction pointer incrementer, and it doesn't bode well for the latch-based implementation.

Here's the problem. The instruction pointer DRAM is configured as four rows of 12 bits, each row representing one of the four instruction pointers. The normal cycle is:
  1. Pre-charge the DRAM column sense lines.
  2. Read all 12 bits of the active IP into a 12-bit register.
  3. Gate the low-order 4 bits onto the data bus.
  4. Add 1 to the low 4 bits, saving the carry out.
  5. Update the low 4 bits of the register.
  6. Gate the middle 4 bits onto the data bus.
  7. Add the carry to the middle 4 bits, saving the carry out.
  8. Update the middle 4 bits of the register.
  9. Gate the high 4 bits onto the data bus.
  10. Add the carry to the high 4 bits.
  11. Update the high 4 bits of the register.
  12. Write the 12-bit register back to the active IP.
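
The cycle above can be sketched in a few lines of Python. This is only a software model of the arithmetic, not the HDL from this project: the function name `increment_ip` is made up for illustration, and discarding the carry out of the high 4 bits (so the pointer wraps at 0xFFF) is an assumption based on step 10 not saving a carry.

```python
def increment_ip(ip: int) -> int:
    """Increment a 12-bit instruction pointer one 4-bit slice at a time."""
    register = ip & 0xFFF              # step 2: read the active IP into the register
    carry = 1                          # step 4 adds 1; later slices add the saved carry
    for shift in (0, 4, 8):            # low, middle, then high 4 bits
        nibble = (register >> shift) & 0xF        # gate this slice onto the 4-bit bus
        total = nibble + carry                    # 4-bit add
        carry = total >> 4                        # save the carry out for the next slice
        cleared = register & ~(0xF << shift) & 0xFFF
        register = cleared | ((total & 0xF) << shift)  # update this slice of the register
    # Assumption: the carry out of the high slice is dropped, so 0xFFF wraps to 0x000.
    return register                    # step 12: value written back to the active IP


print(hex(increment_ip(0x0FF)))
```

The carry only ripples between slices because each slice's sum is written back before the next slice is read, which is why the register update in step 5 has to capture the right value.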

Here's what Step 5 looks like in behavioral simulation. At the bottom is the least significant bit (LSB) of the adder output. The short (400ns) pulse in the middle is the write enable for the low 4 bits of the 12-bit register. Above that is the LSB of the 12-bit register itself.

Next let's look at the post-map simulation with the same arrangement of signals. The LSB of the adder becomes a 1 much later than in the behavioral simulation, and goes back to a 0 much sooner. How much sooner?

Here's the same post-map simulation, zoomed in at the falling edge of the write-enable. The adder output falls 25ps (that's picoseconds, or 0.000000000025 seconds) before the gate enable goes inactive, but that's long enough for the latch to capture a zero rather than a one. Bummer.

The problem appears to be in the way I've coded the 12-bit temporary register. This register (in a real i4004 CPU it's really just charge held on MOSFET inverter gates) is written from three non-overlapping sources. Because of the way the circuitry is implemented, any of these three can set the register content without conflict.

In an FPGA, this is implemented using a mux to select an input source and a storage element to retain the value. Since I coded the input selection logic as implemented in the real i4004 CPU, the mux selectors and the latch gate are driven by the same signals. In this case, though, the mux output is changing 25 picoseconds before the latch gate has gone inactive, and the latch captures the wrong value. This is why most modern logic uses clocked flip-flops rather than latches.
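
To make the race concrete, here is a minimal Python model of an idealized transparent latch. It is a sketch under stated assumptions, not a timing-accurate simulation: the latch has zero setup and hold time, it is transparent from time 0 until the gate falls, and `latched_value`, `GATE_FALL_PS`, and `ADDER_RISE_PS` are hypothetical names and values chosen for illustration. Only the 25ps skew comes from the post-map simulation.

```python
def latched_value(d_changes, gate_fall_ps, initial_d=0):
    """Return the value an ideal transparent latch holds after its gate falls.

    d_changes is a time-ordered list of (time_ps, value) transitions on D.
    Assumption: the latch is transparent from time 0 until gate_fall_ps.
    """
    held = initial_d
    for time_ps, value in d_changes:
        if time_ps < gate_fall_ps:     # gate still active: the latch follows D
            held = value
    return held                        # later D changes are ignored once the gate is low


GATE_FALL_PS = 400_000                 # hypothetical: gate falls 400ns into the window
ADDER_RISE_PS = 100_000                # hypothetical time at which the adder LSB becomes 1

# Post-map case: the mux output drops back to 0 just 25ps before the gate closes.
post_map = [(ADDER_RISE_PS, 1), (GATE_FALL_PS - 25, 0)]
# Well-behaved case: the mux output stays stable until after the gate closes.
stable = [(ADDER_RISE_PS, 1), (GATE_FALL_PS + 25, 0)]

print(latched_value(post_map, GATE_FALL_PS), latched_value(stable, GATE_FALL_PS))
```

In this model the only thing that matters is which edge arrives first, so even a 25ps lead on the data edge is enough to flip the stored bit.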

The challenge I face is to separate the mux selectors from the latch gates such that the mux outputs are stable before and after the latch is enabled. This is turning out to be non-trivial.
