It's the use of a transmission gate as a latch that caused me the most trouble. There's an example on the left. The input is the red ("high" or "1") coming in from the bottom. It connects to a FET configured as a transmission gate. If the gate were "open" (conducting), the input would pass through and drive the other side of the gate to the same value. Here the gate is "closed" (not conducting) because the FET's Gate lead is low (blue), so the high on the input does not pass through.
So what's the level on the other side of the transmission gate? It's whatever level was present the last time the transmission gate was open. In this example it's a "floating low", depicted in a faint blue color. Thus the transmission gate is acting as a latch.
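The hold behavior described above can be sketched as a tiny behavioral model. This is only an illustration in Python, not the FPGA code; the class name `TransmissionGateLatch` and its `update` method are made up for the example, and the model ignores charge leakage on the floating node.

```python
class TransmissionGateLatch:
    """Behavioral model of a pass transistor whose output node holds its level."""

    def __init__(self, initial=0):
        # Level left on the floating output node (assumed never to leak away).
        self.stored = initial

    def update(self, data_in, gate):
        # gate=1: the transistor conducts, so the output follows the input.
        # gate=0: the transistor is off, so the output keeps its previous level.
        if gate:
            self.stored = data_in
        return self.stored


latch = TransmissionGateLatch()
followed = latch.update(data_in=0, gate=1)  # gate conducting: output follows the low input
held = latch.update(data_in=1, gate=0)      # gate off: the high input is blocked
print(followed, held)
```

Both values come out low: the second call corresponds to the picture, where a high input sits in front of a gate that is switched off and the far side keeps its "floating low".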
Why is this a problem to implement in an FPGA? Latch primitives are available, and can be inferred in behavioral code. The problem is that there are cases where the input signals and the selection signals are derived from the same sources. When I tried to implement the DRAM logic using latches, the data input to the latch was the output of a multiplexer, and there was a race condition between the deassertion of the latch enable and a change on the select inputs to the mux. The only way to get it to work in simulation was to carefully structure delays in the signals, and that's not practical in real hardware.
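To make the race concrete, here is a small Python sketch, not the actual DRAM logic. It assumes a two-input mux feeding a transparent latch, with the select and the latch enable changing "at the same time"; the helper names `mux` and `settle` and the input values are invented for the illustration. The only thing that differs between the two runs is which change is applied first.

```python
def mux(sel, a, b):
    """Two-input multiplexer: returns a when sel is 0, b when sel is 1."""
    return b if sel else a


def settle(order):
    """Apply two nominally simultaneous signal changes in the given order
    and return the value left in the latch."""
    state = {"sel": 0, "enable": 1}
    a, b = 1, 0                          # hypothetical mux data inputs
    latched = mux(state["sel"], a, b)    # latch is transparent, so it holds a
    for signal, value in order:
        state[signal] = value
        if state["enable"]:              # a transparent latch follows its input
            latched = mux(state["sel"], a, b)
    return latched


enable_first = settle([("enable", 0), ("sel", 1)])  # latch closes before the mux switches
select_first = settle([("sel", 1), ("enable", 0)])  # mux switches before the latch closes
print(enable_first, select_first)
```

The two orderings leave different values in the latch, which is why the result depended on hand-tuned delays in simulation.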
The most robust solution is to use clocked flip-flops rather than latches. Accidental latches are such a common problem that even the synthesis tools issue warnings when a latch is inferred. The benefit of using clocked flip-flops is that if all signals change only in response to the clock, none will have changed before the inputs are captured. The cost is that rather than having the results flow through all stages of logic at full speed, they can only move one stage per clock. My problem was two-fold: I wanted to be as close to the original as possible, and I didn't know how many stages of clocked logic there were between input and output.
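The "one stage per clock" cost can be illustrated with a minimal Python model of a chain of flip-flops. This is a sketch with an invented function name, `clock_edge`, and an arbitrary chain length of three, not a model of the real design.

```python
def clock_edge(stages, data_in):
    """Return the register values after one rising clock edge.

    Every register captures the value its upstream neighbour held
    before the edge, so all inputs are sampled before anything changes.
    """
    return [data_in] + stages[:-1]


stages = [0, 0, 0]   # three flip-flops in series, all initially low
history = []
for _ in range(3):
    stages = clock_edge(stages, 1)   # hold the input high
    history.append(list(stages))

for step in history:
    print(step)
```

The high input needs three edges to reach the end of the three-stage chain, which is why the number of clocked stages between input and output matters when choosing the system clock.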
Eventually I threw in the towel. My current iteration uses the 50 MHz oscillator on the Spartan-3E reference board as a system clock. This means there are 68 rising clock edges in each of the 8 execution phases, and 19 in each 380 ns minimum-width i4004 "clock" pulse. I haven't counted the maximum number of sequential stages yet, but my impression is that it's fewer than 6, and probably far fewer. This will result in a design that should work on any FPGA, at the cost of requiring a system clock several times faster than the i4004's "clocks".
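The edge counts can be checked with a few lines of Python. The 50 MHz clock and the 380 ns pulse width come from the text above. The 1350 ns phase length is an assumption on my part (roughly one i4004 clock period at its maximum rate) and is not stated in the post, and the stage count of 6 is only the rough upper estimate mentioned above.

```python
SYSTEM_CLOCK_HZ = 50_000_000
PERIOD_NS = 1_000_000_000 // SYSTEM_CLOCK_HZ        # one system clock period, in ns

PULSE_WIDTH_NS = 380                                # minimum-width i4004 clock pulse
edges_per_pulse = PULSE_WIDTH_NS // PERIOD_NS

ASSUMED_PHASE_NS = 1350                             # assumption: length of one execution phase
edges_per_phase = -(-ASSUMED_PHASE_NS // PERIOD_NS) # integer ceiling division

STAGE_ESTIMATE = 6                                  # rough upper bound on sequential stages
spare_edges = edges_per_pulse - STAGE_ESTIMATE      # edges left over within one pulse

print(PERIOD_NS, edges_per_pulse, edges_per_phase, spare_edges)
```

Under these assumptions, even the shortest pulse spans more system clock edges than the estimated number of sequential stages, so results have time to propagate before the pulse ends.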