The last time I posted on interfacing a VFD with the Busicom emulation, I commented that "All I need to do is connect the WR register in an emulated 4002 to my VFD driver logic. […] I can peek into the internal circuits and use the contents stored there." The Verilog I wrote to test the VFD assumed this would be true. The effects of that assumption can be seen in the surprisingly high resource requirements I noted in my post "Packing worms into a can", though I didn't realize it at the time.
In software, the emphasis is on abstracting the implementation away from the underlying hardware. This was true when I started back in the 1970s, and the emphasis has only gotten greater as years have passed. It's so extreme now that many recent Computer Science graduates no longer understand anything about cache line utilization or translation lookaside buffer churn. I owe much of my professional career to bucking this trend and understanding how software interacts with hardware in great detail. But here I neglected to consider the hardware implications of this design decision.
Let's think about how the WR register in an emulated i4002 RAM chip will be implemented in an FPGA. If the WR register is constructed of individual flip-flops my original statement would be correct: the input mux to the VFD could just read the outputs of these flops. But such an implementation would be highly inefficient in terms of FPGA resource utilization. There are eight flip-flops in each Spartan 6 slice, so storing the 640 bits in two i4002 RAMs would require 80 slices of the 1430 available just for data storage. A better design would be to put the emulated registers in some sort of FPGA memory.
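The flip-flop cost above is easy to check. Here is a minimal Python sketch of the arithmetic, assuming each i4002 holds four registers of 16 main-memory plus 4 status characters at 4 bits each; the constant and function names are made up for illustration.

```python
import math

BITS_PER_CHAR = 4
CHARS_PER_REGISTER = 16 + 4   # 16 main-memory + 4 status characters
REGISTERS_PER_CHIP = 4
FLOPS_PER_SLICE = 8           # Spartan 6 figure quoted in the text
TOTAL_SLICES = 1430           # slices available, as quoted in the text


def flop_slices(chips: int) -> int:
    """Slices needed to hold every i4002 bit in its own flip-flop."""
    bits = chips * REGISTERS_PER_CHIP * CHARS_PER_REGISTER * BITS_PER_CHAR
    return math.ceil(bits / FLOPS_PER_SLICE)


slices = flop_slices(2)
print(f"{slices} slices, {100 * slices / TOTAL_SLICES:.1f}% of the device")
```

This only counts storage; the muxes needed to address individual characters would come on top of it.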
The Spartan family supports two such memory types: Block RAM and Distributed RAM. The Spartan 6 LX9 FPGA I'm using has 32 Block RAMs, each of which is an array of 18K bits that can be configured in a number of ways. I'll probably construct the 1280-word i4001 ROM array using one of these in a 2Kx8 configuration (62.5% utilization). But for the two i4002 RAM chips using a Block RAM would be overkill (<4% utilization).
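The utilization figures work out as follows in a small Python sketch. It assumes the ROM is measured in words against the 2K depth of the 2Kx8 configuration, and the RAM in bits against the full 18K bits (parity bits included); the names are illustrative only.

```python
BRAM_BITS = 18 * 1024      # 18K bits per Block RAM, parity bits included
ROM_WORDS = 1280           # i4001 ROM array size from the text
ROM_DEPTH = 2048           # depth of a 2Kx8 configuration
RAM_BITS = 2 * 320         # two i4002 chips at 320 bits each


def utilization(used: int, available: int) -> float:
    """Percentage of a resource that is actually occupied."""
    return 100.0 * used / available


rom_pct = utilization(ROM_WORDS, ROM_DEPTH)
ram_pct = utilization(RAM_BITS, BRAM_BITS)
print(f"ROM: {rom_pct:.1f}%  RAM: {ram_pct:.1f}%")
```

Measured against only the 16K data bits instead, the RAM figure rises slightly but still stays under 4%.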
Distributed RAM in the Spartan 6 is assembled using the storage in the 6-input lookup tables (LUTs) in a SLICEM type slice. These LUTs can be used in 64x1 or 32x2 configurations. The simplest implementation would be to use two of these in 32x2 mode to represent each of the four 20x4 registers in the i4002. Two emulated i4002s would thus require 16 LUTs (2 LUTs x 4 registers x 2 chips). A SLICEM contains four such LUTs, so two i4002s would occupy four of the 357 available SLICEMs in this FPGA.
Another implementation would be to put the eight 16x4 "registers" of "main memory" storage in one RAM configured as 128x4, and the eight sets of 4x4 "status memory" storage in another RAM configured as 32x4. This cuts the resource usage for two emulated i4002s to 10 LUTs, or 2.5 SLICEMs.
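The two Distributed RAM layouts can be compared with a simplified LUT-count model. This Python sketch assumes a LUT stores either 64x1 or 32x2 bits and ignores any extra logic for address decoding or output muxing; `luts_for` is a hypothetical helper, not a vendor formula.

```python
import math

LUTS_PER_SLICEM = 4


def luts_for(depth: int, width: int) -> int:
    """LUTs for a depth x width single-port distributed RAM (simplified)."""
    if depth <= 32:
        return math.ceil(width / 2)        # 32x2 mode: two data bits per LUT
    return math.ceil(depth / 64) * width   # 64x1 mode: one data bit per LUT


# Layout 1: every 20x4 register in its own 32x4 RAM (4 registers x 2 chips).
per_register = 8 * luts_for(32, 4)

# Layout 2: one 128x4 main-memory RAM plus one 32x4 status-memory RAM.
shared = luts_for(128, 4) + luts_for(32, 4)

for name, luts in (("per-register", per_register), ("shared", shared)):
    print(f"{name}: {luts} LUTs, {luts / LUTS_PER_SLICEM} SLICEMs")
```

The saving comes from the main-memory array filling its 64-deep LUTs completely, whereas a 20-deep register wastes 12 of the 32 locations in each LUT.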
However, both of these memory types share a common attribute: to read data you present an address, tickle the control lines, and the data for that address appears on the output port. Yet both the i4002 emulation and the VFD driver need to access the same data asynchronously with respect to each other. One option is some sort of arbiter; another is to configure the RAM in dual-port mode. With dual ports, the i4002 emulation reads and writes the RAM through one port while the VFD reads it via the second, read-only port.
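To make the dual-port arrangement concrete, here is a small behavioral model in Python. It is only a sketch: the class and method names are invented, and it says nothing about timing or how the real primitive is instantiated.

```python
class DualPortRam:
    """Behavioral model: port A is read/write, port B is read-only."""

    def __init__(self, depth: int, width: int = 4) -> None:
        self._mask = (1 << width) - 1
        self._cells = [0] * depth

    def port_a_write(self, addr: int, data: int) -> None:
        # Used by the i4002 emulation; data is truncated to the word width.
        self._cells[addr] = data & self._mask

    def port_a_read(self, addr: int) -> int:
        return self._cells[addr]

    def port_b_read(self, addr: int) -> int:
        # Used by the VFD driver with its own, independent address.
        return self._cells[addr]


ram = DualPortRam(depth=32)
ram.port_a_write(5, 0x9)      # the emulation stores a digit
print(ram.port_b_read(5))     # the display side sees the same cell
```

The key property is that each port supplies its own address, so neither side has to wait for the other. The VFD still sees only one character per read, though.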
There are two problems with using dual-port mode. The first is cost: while Block RAMs natively support dual R/W ports, multi-port configuration of Distributed RAM uses double the number of LUTs, with one LUT for each of the read ports. Realistically, 5 or even 8 SLICEMs is probably an acceptable cost, and I might be able to arrange it such that only the RAMs used to represent the WR register are dual-port. The second, larger problem is the logic I developed for suppressing leading zeros on the display, which depends on having simultaneous access to all 20 characters of the WR register. I'll have to redesign and test the entire VFD driver firmware subsystem to fix that.
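One way a redesigned display path could cope with reading a single character at a time is to scan the digits in order and carry a one-bit "seen a non-zero digit" flag. The Python sketch below illustrates that idea only; the digit ordering (address 0 as the most significant digit), the function name, and the rule that the final digit is always shown are assumptions, not the existing VFD logic.

```python
from typing import Callable, List, Optional


def scan_display(read_char: Callable[[int], int],
                 n_digits: int) -> List[Optional[int]]:
    """Read digits one per step, most significant first; None means blank."""
    shown: List[Optional[int]] = []
    seen_nonzero = False
    for pos in range(n_digits):
        digit = read_char(pos)             # one RAM read per step
        is_last = pos == n_digits - 1
        if digit != 0 or is_last:
            seen_nonzero = True            # from here on, show every digit
        shown.append(digit if seen_nonzero else None)
    return shown


wr = [0, 0, 1, 0, 5]                       # stand-in for the WR register
print(scan_display(wr.__getitem__, len(wr)))
```

In hardware the flag would be a single flip-flop reset at the start of each display scan, which is far cheaper than looking at all 20 characters at once.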
An alternative is to implement the VFD using bus snooping. Basically the VFD would monitor the bus for writes to the addresses representing the WR register and the sign and decimal position status characters and capture these values as they fly by. This keeps the VFD logic separate from the i4002 emulation at the cost of having to decode the bus transactions for several instructions (SRC, WRM, WR0, WR1). I'd also have to allocate storage for a duplicate of the WR register. I'll have to look at this, but I expect it will be cheaper in resource usage, and possibly simpler, to redesign the display logic to access the WR register as dual-port RAM.
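The snooping idea can be sketched at the instruction level in Python. This is a simplification: real logic would decode bus cycles rather than mnemonics, and the chip and register numbers that locate WR are hypothetical parameters here. The SRC address is split as 2 bits of chip, 2 bits of register, and 4 bits of character.

```python
class VfdSnooper:
    """Keeps a shadow copy of one register by watching writes go past."""

    def __init__(self, chip: int, register: int) -> None:
        self.chip, self.register = chip, register   # hypothetical WR location
        self.selected = False
        self.char = 0
        self.main = [0] * 16      # duplicate of the main-memory characters
        self.status = [0] * 2     # status characters 0 and 1 only

    def observe(self, op: str, value: int) -> None:
        if op == "SRC":
            # Latch the address; later writes apply to it until the next SRC.
            chip, reg = (value >> 6) & 0x3, (value >> 4) & 0x3
            self.selected = (chip, reg) == (self.chip, self.register)
            self.char = value & 0xF
        elif not self.selected:
            return                                   # some other register
        elif op == "WRM":
            self.main[self.char] = value & 0xF
        elif op in ("WR0", "WR1"):
            self.status[int(op[2])] = value & 0xF


snoop = VfdSnooper(chip=0, register=1)
snoop.observe("SRC", 0x13)    # chip 0, register 1, character 3
snoop.observe("WRM", 7)
print(snoop.main[3])
```

The shadow copy in this model is 18 characters, or 72 bits, which is the duplicate storage mentioned above.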