Sunday, July 19, 2020

More latching mux timing problems

As I mentioned in the previous post, my timing problems seemed to be related to the timing between the output of a mux and the latch intended to capture the output of the mux. Here's a simpler example.

This is the mux in the instruction pointer that determines whether the Effective Address Counter or the Refresh Counter is used to select the active DRAM row. Counting from the top left, the first two signals are the mux selector inputs: the subcycle X12 and X32 signals. The middle pair are the Refresh Counter outputs, and the right-most pair are the Effective Address Counter outputs.

During subcycles X12 and X22, a DRAM row is read and written back unmodified to refresh the row. Thus the Refresh Counter outputs are selected to drive the DRAM row decoder at the beginning of subcycle X12. For the rest of the instruction cycle, the Effective Address Counter needs to drive the DRAM row decoder, so it is selected at the beginning of subcycle X32.

Rather than have the selected counter actively drive the decoder continuously, this is a latching mux: when the selector signals are inactive, the previously-selected counter outputs are latched.

In an FPGA, this is implemented using a multiplexer followed by a flip-flop or latch. I coded this in Verilog as follows:
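// Row selection mux with latched output
reg  [1:0]  row;
always @(*) begin
    // Incomplete assignment is intentional: when neither selector is
    // active, row holds its value, so the tools infer a transparent latch.
    if (x12)                // subcycle X12: Refresh Counter drives the row
        row <= addr_rfsh;
    if (x32)                // subcycle X32: Effective Address Counter drives the row
        row <= addr_ptr;
end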
This appears to work properly on the Spartan-3E. However, on the Spartan-6, the mux output sometimes changes before the latch captures it. In this specific case, the simulator reports that the input to the row[1] latch changes 30ps before the gate goes inactive, while the row[0] input changes only 48ps after the gate goes inactive. Neither of these is deterministic, which makes this implementation unreliable.
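
To make the race easier to see, the same behavior can be written with the latch gate and the mux data spelled out. This is only a sketch: the module name row_latch_explicit is hypothetical, and the logic the Spartan-6 tools actually generate may be factored differently.

```verilog
// Sketch only: the module name is an assumption; the signal names
// follow the code above.
module row_latch_explicit (
    input  wire       x12,        // subcycle X12: select the Refresh Counter
    input  wire       x32,        // subcycle X32: select the Effective Address Counter
    input  wire [1:0] addr_rfsh,
    input  wire [1:0] addr_ptr,
    output reg  [1:0] row
);
    always @(*) begin
        // Latch gate: transparent while either selector is active.
        if (x12 | x32)
            // Mux data: x32 wins if both selectors are active, matching
            // the later assignment in the two-if version.
            row <= x32 ? addr_ptr : addr_rfsh;
        // Neither selector active: row holds its previous value.
    end
endmodule
```

Written this way, both the gate and the mux select derive from x12 and x32. When x32 goes inactive, the gate closes at the same moment the mux switches back to addr_rfsh, so which event the latch sees first depends on routing delays.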

This leaves me in a bit of a quandary. In the situation I described in the previous post, CLK2 was a common term in the mux/latch selectors, so I refactored the mux selectors to remove CLK2 and used that to gate the latch. In this case there isn't a common factor. I could use a flip-flop clocked on the falling edge of CLK1, but that's not a general solution to all instances of this problem.

The most reliable general solution I can think of is to go back to my original design, using edge-clocked flip-flops and a relatively fast system clock. While this is not quite as true to the i4004 design, the transparent latch model may not be feasible. I might even consider a hybrid design, where transparent latches are used only in selected paths.
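
As a rough sketch of what the edge-clocked version of the row mux might look like: the clock name sysclk and the module name are assumptions, and the sketch assumes x12, x32, and both counters are themselves registered on the same system clock.

```verilog
// Sketch only: "sysclk" and the module name are hypothetical, and the
// inputs are assumed to be synchronous to sysclk.
module row_select_ff (
    input  wire       sysclk,     // assumed fast system clock
    input  wire       x12,        // subcycle X12: select the Refresh Counter
    input  wire       x32,        // subcycle X32: select the Effective Address Counter
    input  wire [1:0] addr_rfsh,
    input  wire [1:0] addr_ptr,
    output reg  [1:0] row
);
    always @(posedge sysclk) begin
        if (x32)
            row <= addr_ptr;      // x32 takes priority, as in the latch version
        else if (x12)
            row <= addr_rfsh;
        // Neither selector active: the flip-flops hold the last selection.
    end
endmodule
```

Here the data is sampled only at the clock edge, so a selector going inactive can no longer race the mux output; the cost is that row changes one system clock edge after the selector rather than immediately.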

I should probably trace through the original schematics and find the path that passes through the most latches in a single clock pulse. This will determine the minimum number of system clock edges per MCS-4 clock pulse and thus the system clock frequency. While the Spartan-6 will happily run with a clock frequency in the hundreds of megahertz, I'd like to keep the frequency as low as I can to limit power dissipation.

3 comments:

  1. Your project is very cool!

    Can you explain how your counter module works? I can't figure out what it does. Can it go up or down?

    1. Thanks! It's always rewarding when someone finds value in my insanity. :-)

      The counter can only count up (or only down when using the inverted outputs). For the refresh counters, only one direction is needed. One might think that the IP row counter needs to go in both directions -- one way for a call (JMS) and the other for a return (BBL) -- but they solve this by incrementing the counter once for a JMS and three times for a BBL. Because the counter is two bits wide and wraps around, three increments have the same effect as one decrement.

      A step-by-step analysis of the counter is in the April 26, 2020, article titled "Revisiting the i4004 counter": https://insanity4004.blogspot.com/2020/04/revisiting-i4004-counter.html

  2. Thanks for the info! I've been working on a 4004 RTL simulator in Python, and your Verilog code is very useful in figuring out what's going on!
