Sunday, April 26, 2020

Revisiting the i4004 counter

When I first started this project (has it really been eight years??), one of my first experiments was to build a two-bit counter based on the i4004 schematic on a solderless breadboard. I wrote about this in a post entitled Breadboarding a 2-bit counter. It became my eighth posting -- there are now over 250 and growing.

When I originally coded the i4004 emulation, I faced the question of how to emulate the FETs used as transmission gates. Many of these are used to temporarily latch a value. Having little experience with Verilog and FPGAs, and not really understanding how the dynamic logic in the i4004 worked, I coded them using Verilog constructs which infer edge-clocked flip-flops. For comparison, here is the code for an edge-clocked flip-flop with a clock enable (ce), and for a transparent latch with a gate enable (ge):
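Edge-clocked flip-flop:

reg q;
always @(posedge clk) begin
    if (ce)           // clock enable
        q <= d;       // q changes only on a rising edge of clk
end

Transparent latch:

reg q;
always @(*) begin
    if (ge)           // gate enable
        q <= d;       // q follows d for as long as ge is high
end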

The problem with using edge-clocked flip-flops is that when there are n of them in a row, a value does not propagate to the end of the chain until n clock cycles have elapsed. The real i4004, which uses transmission gates, does not have this latency. Thus if a section of logic has six of these in a row, the synchronous system clock must tick at least six times for each i4004 clock pulse.
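To make the latency concrete, here is a small simulation sketch. The modules ff_chain and latch_chain are hypothetical demos, not part of my emulation: each is N storage stages in a row with every stage enabled at once. The flip-flop chain needs one sysclk cycle per stage, while the latch chain passes the value straight through for as long as its gate is open.

```verilog
// Hypothetical demo modules (not from the i4004 emulation):
// N storage stages in a row, all enabled at once.

// Edge-clocked version: each stage samples its predecessor on a rising
// edge of sysclk, so a new value needs N sysclk cycles to reach q.
module ff_chain #(parameter N = 6) (
    input  sysclk,
    input  ce,
    input  d,
    output q
);
    reg [N-1:0] s;
    always @(posedge sysclk)
        if (ce)
            s <= {s[N-2:0], d};
    assign q = s[N-1];
endmodule

// Transparent-latch version: while ge is high every stage is open,
// so d flows through all N stages in zero simulation time.
module latch_chain #(parameter N = 6) (
    input  ge,
    input  d,
    output q
);
    reg  [N-1:0] s;
    wire [N:0]   link = {s, d};   // link[k] feeds stage k: d, s[0], s[1], ...
    genvar k;
    generate
        for (k = 0; k < N; k = k + 1) begin : stage
            always @(*)
                if (ge)
                    s[k] <= link[k];
        end
    endgenerate
    assign q = s[N-1];
endmodule
```

With N = 6, a 1 applied at d shows up at the latch chain's output immediately, but takes six rising edges of sysclk to reach the flip-flop chain's output.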

For this reason I want to replace the edge-clocked flip-flops in my Verilog emulation of the i4004 with transparent latches to eliminate this requirement.

My first attempt at replacing edge-clocked flip-flops with transparent latches was with the i4004's instruction phase generator. This can be thought of as a series of eight Master/Slave flip-flops forming an 8-bit shift register. The D input of each MS FF is clocked into the master by CLK2, and from the master into the slave by CLK1. The output of the slave is the Q output, which is fed into the next stage's D input.

Here's a comparison of the Verilog required to create such a master/slave flip-flop with edge-clocked flip-flops and with transparent latches:
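Edge-clocked flip-flops:

reg m, q;
always @(posedge sysclk) begin
    if (clk2)
        m <= d;     // master samples d on a sysclk edge while CLK2 is high
    if (clk1)
        q <= m;     // slave samples the master on a sysclk edge while CLK1 is high
end

Transparent latches:

reg m, q;
always @(*) begin
    if (clk2)
        m <= d;     // master is transparent while CLK2 is high
    if (clk1)
        q <= m;     // slave is transparent while CLK1 is high
end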
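Extending that to the full instruction phase generator, here's a sketch of eight latch-based master/slave stages chained into an 8-bit shift register. It assumes CLK1 and CLK2 never overlap. The module name phase_shift8 and its din input are names I made up for this example, and whatever feeds the first stage in the real i4004 is not modeled.

```verilog
// Sketch: eight master/slave stages built from transparent latches,
// forming an 8-bit shift register. CLK1 and CLK2 are assumed to be
// non-overlapping. What drives din in the real chip is not modeled.
module phase_shift8 (
    input        clk1,
    input        clk2,
    input        din,     // hypothetical serial input to the first stage
    output [7:0] phase    // slave outputs, one per stage
);
    reg [7:0] m, s;       // master and slave latches

    // CLK2 high: each master is transparent and follows the previous
    // stage's slave (stage 0 follows din).
    always @(*)
        if (clk2)
            m <= {s[6:0], din};

    // CLK1 high: each slave is transparent and follows its own master.
    always @(*)
        if (clk1)
            s <= m;

    assign phase = s;
endmodule
```

Shifting a single 1 in through din and then running seven more CLK2/CLK1 cycles moves it from bit 0 to bit 7, one stage per i4004 clock cycle, with no extra system clocks required.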

When testing the i4002 RAM emulation, my testbench used edge-clocked flip-flops to generate its instruction phase timing, while my i4002 emulation used transparent latches. As a result, the testbench's phases lagged the i4002's phases by one system clock period (50 ns).

I thought I'd start with something small, like the counter. Here's a screenshot of one stage of such a counter, taken from the i4002 RAM refresh counter circuit. The inputs are on the lower left, the output of the counter is at the top, and the carry circuits to the next stage are at the lower right. In the i4002 RAM, five of these circuits are cascaded to produce the five-bit refresh row counter.

To pick this apart, let's start with the two middle FETs that are back to back. These form a pair of inverters, each with its output cross-linked to the other's input: a classic bistable configuration. The stacked FETs outside of these form AND gates, and connecting the AND gates to the cross-linked transistors turns the inverters into NOR gates. The top two FETs are inverting output drivers.

To step the counter forward, the upper control line is brought high. This gates the state of the outputs onto the lower FETs of the AND gates, which is the equivalent of clocking the Master FF. When this control line is brought low, the gates of these FETs hold their state long enough for the next cycle to occur.

This dependence on undriven FET gates holding their state is characteristic of dynamic logic, but it results in one of the few known design flaws in the i4004. This type of counter is used to determine which of the four instruction pointers is used to fetch instructions. If the software does not execute a call (JMS) or return (BBL) instruction for too long, the counter suffers bit-rot and may randomly switch to the wrong IP when the software finally does. The workaround was to call a subroutine frequently enough that the counter changed state before bit-rot set in.
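An emulation built from ordinary latches never exhibits this bug, because a Verilog reg holds its value forever. One way to make it visible in simulation is to let an undriven node decay to x after a while. This is only a sketch: dyn_node and its RETENTION parameter (measured in system clock cycles) are hypothetical, and I don't know the actual hold time of the i4004's gate nodes.

```verilog
// Sketch of a dynamic storage node with simulated charge leakage.
// RETENTION (in sysclk cycles) is a hypothetical parameter; the real
// hold time of an i4004 gate node is not known here.
module dyn_node #(parameter RETENTION = 1000) (
    input      sysclk,
    input      ge,     // gate enable: node is driven while ge is high
    input      d,
    output reg q
);
    integer age = 0;   // sysclk cycles since the node was last driven

    // Transparent while ge is high, like an ordinary latch.
    always @(*)
        if (ge)
            q <= d;

    // While undriven, count cycles; once the hold time is exceeded,
    // the stored charge is treated as lost and q becomes unknown.
    always @(posedge sysclk)
        if (ge)
            age <= 0;
        else if (age < RETENTION)
            age <= age + 1;
        else
            q <= 1'bx;
endmodule
```

Driving a counter's master nodes from something like this would let a testbench check that software steps the counter often enough, which is exactly what the JMS/BBL workaround guarantees.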

To finish the step, the upper control line is brought low and the lower control line is brought high. This causes the cross-connected FET flip-flop outputs to switch state. Unlike the Master flip-flop, whose state can be corrupted if the counter goes too long without changing state, this value is actively held by the cross-connected pair and is stable.
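Here's a rough gate-level sketch of one stage following that description: two cross-coupled NOR gates, each with an extra AND-gate input enabled by the lower control line, and two dynamic nodes loaded by the upper control line. Which output feeds which AND gate is my assumption, chosen so that the stage toggles, and the rst input is hypothetical, added only so the simulation starts from a known state.

```verilog
// Sketch of one counter stage at roughly the gate level described above.
// Which output feeds which AND gate is an assumption chosen so that the
// stage toggles; rst is a hypothetical input for a known start state.
module counter_stage (
    input  rst,
    input  step_a,    // upper control line: load the dynamic (master) nodes
    input  step_b,    // lower control line: let the cross-coupled pair switch
    output q,
    output q_n
);
    reg mq, mqn;      // dynamic nodes on the gates of the lower AND-gate FETs

    // step_a high: the outputs are gated onto the dynamic nodes.
    always @(*)
        if (step_a) begin
            mq  <= q;
            mqn <= q_n;
        end

    // Cross-coupled NOR gates; each has an extra AND-gate input that is
    // active only while step_b is high. The side whose stored value is 1
    // gets pulled low, so the pair flips.
    assign q   = ~(q_n | (step_b & mq) | rst);
    assign q_n = ~(q   | (step_b & mqn));
endmodule
```

Pulsing step_a alone leaves q unchanged; following it with a step_b pulse flips q and q_n.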

My original Verilog implementation of the counter using edge-clocked flip-flops didn't match the behavior of the i4004 counter because I failed to understand how the counter worked in detail: it changed the outputs when the Master FF was updated rather than when the Slave FF was updated. Changing the code to use transparent latches caused the testbench to fail: the first-stage counter changed state when it shouldn't have.

After several hours I tracked the problem down to the testbench itself. I'd taken a shortcut in the testbench that didn't faithfully generate the control signals. Even though the waveform display never showed the lower (slave) step enable line going active, it was, in fact, active long enough for the simulator to clock the slave FF. Making the testbench more accurate addressed this issue.

Here's a comparison of the old and new counter implementations using behavioral Verilog:
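Edge-clocked flip-flops (old):

reg q, q_n;
always @(posedge sysclk) begin
    if (step_a) q <= ~q_n;         // q changes on step_a, when the master would update
    if (step_b) q_n <= q;          // q_n then catches up to q
end

Transparent latches (new):

reg master, slave;
always @(*) begin
    if (step_a) master <= ~slave;  // master captures the next state
    if (step_b) slave  <= master;  // the output (slave) changes only on step_b
end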
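Cascading stages gives the five-bit refresh counter. In this sketch each stage keeps the master/slave structure above, but the carry wiring (a stage is enabled only when every stage below it is 1) and the rst input are my assumptions rather than a transcription of the i4002 die.

```verilog
// Sketch of latch-based stages cascaded into an N-bit counter, in the
// spirit of the i4002 refresh counter. The carry wiring and the rst
// input are assumptions, not taken from the die.
module refresh_counter #(parameter N = 5) (
    input          rst,      // hypothetical reset
    input          step_a,   // master phase
    input          step_b,   // slave phase
    output [N-1:0] count
);
    reg  [N-1:0] master, slave;
    wire [N:0]   carry;      // carry[k] enables stage k
    assign carry[0] = 1'b1;  // the first stage steps every time

    genvar k;
    generate
        for (k = 0; k < N; k = k + 1) begin : stage
            // Ripple carry: stage k+1 is enabled only if stage k is 1.
            assign carry[k+1] = carry[k] & slave[k];

            // step_a: master captures the next state (toggled if enabled).
            always @(*)
                if (rst)         master[k] <= 1'b0;
                else if (step_a) master[k] <= slave[k] ^ carry[k];

            // step_b: slave (the visible output) copies the master.
            always @(*)
                if (rst)         slave[k] <= 1'b0;
                else if (step_b) slave[k] <= master[k];
        end
    endgenerate

    assign count = slave;
endmodule
```

Because the masters only sample while step_a is high, when the slaves (and therefore the carries) are stable, the carry chain never changes under a stage while it is sampling. Thirty-two full steps bring the count back to zero.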

What I still don't understand is why the counter sections use internal signals to connect to the next higher counter section. Why not use the outputs that drive external circuits?

Two possibilities come to mind: loading and latency. The inverters that drive external (non-counter) circuits isolate the counter circuits from the external circuit loading, at the cost of additional latency. Maybe there was a concern that the extra latency would slow the ripple through the counter stages enough to cause timing problems? The i4004 CPU uses 2-bit and 3-bit counters, while the i4002 RAM uses a 5-bit counter. I coded a structural implementation of the counter so I could add latencies, but it's a ripple counter and I don't see any timing issues within the counters themselves. Without knowing a lot more about the characteristics of the chip itself, I don't think I have the information to speculate further.
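The latency question can at least be framed with a simple model: a ripple carry chain in which each AND stage has a fixed delay, so the worst-case carry (every stage at 1) reaches the top N × STAGE_DELAY time units after the inputs change. The module name and the delay value are purely illustrative, not measured i4002 figures.

```verilog
// Sketch: a ripple carry chain with a hypothetical per-stage delay.
// STAGE_DELAY is illustrative, not a measured i4002 value.
module carry_chain #(parameter N = 5, parameter STAGE_DELAY = 10) (
    input  [N-1:0] q,          // stage outputs feeding the carry logic
    output         carry_out   // carry out of the last stage
);
    wire [N:0] carry;
    assign carry[0] = 1'b1;
    genvar k;
    generate
        for (k = 0; k < N; k = k + 1) begin : link
            // Each AND stage adds STAGE_DELAY, so a carry that has to
            // ripple through all N stages arrives N * STAGE_DELAY later.
            assign #STAGE_DELAY carry[k+1] = carry[k] & q[k];
        end
    endgenerate
    assign carry_out = carry[N];
endmodule
```

With N = 5 and STAGE_DELAY = 10, the carry out rises 50 time units after all five inputs go to 1. Whether that matters depends on how it compares with the width of the step pulses.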
