Insanity 4004: i4001 ROM emulation refactoring

This weekend I worked on the i4001 emulation, bringing the quick hack I did years ago closer to the operation of the actual chip. While doing this I've been wrestling with two conflicting objectives: making the emulation conform as closely as possible to the real i4001, while also using the hardware resources available in the Spartan-6 FPGA efficiently.

The i4001 (and i4002) chips are like Frankenstein's monster, with unrelated pieces bolted together. The i4001 is a 256x8-bit ROM with a 4-bit I/O port tacked on for convenience. The I/O port shares little circuitry with the ROM: primarily the data bus buffers and the timing recovery flip-flops. Everything else is separate, and the two share the same silicon only to reduce the number of physical chips a product designer would need to place on a PCB.

My initial i4001 emulation supported only a single 256x8-bit memory array with no I/O Port support. The Busicom 141-PF calculator used five i4001 chips, and instantiating five of these i4001 modules would also instantiate five separate 2K-bit ROM arrays. All five arrays together total 10K bits, so they would fit within a single 16K-bit Block RAM on a Spartan-6 or Spartan-3E FPGA, but the Xilinx XST synthesizer has no way to merge them on its own. Leaving them as five separate arrays makes very inefficient use of the FPGA's hardware resources. But how best to implement the ROM array using a single Block RAM while maintaining the appearance of five separate chips?

I thought about an implementation where a single module would be instantiated to emulate all the i4001 chips in a system. This would work nicely for the ROM portions but it made the I/O Port portions ugly. There's no mechanism I know of in Verilog to allow a variable number of module ports based on an instantiation parameter. That meant providing a single module port with a parameterized width to represent all the I/O ports. This would work, and I already use a generate loop to handle the individual I/O Port configuration (input/output, true/inverted, pullup/pulldown). But I just didn't like the idea: it was too far from the original implementation. I want to have a module instantiation for each i4001 chip.
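For illustration, the single-port-with-parameterized-width idea might look something like the sketch below. This is not the code from the project: the module name, port names, and configuration parameters are all hypothetical, and the pullup/pulldown options are left out to keep it short.

```verilog
// Hypothetical sketch only: names and parameters are assumptions,
// not the actual project code. Pullup/pulldown handling is omitted.
module i4001_io_ports #(
    parameter CHIPS = 5,                                  // number of emulated i4001 chips
    parameter [CHIPS*4-1:0] IS_OUTPUT = {CHIPS*4{1'b1}},  // 1 = pin is an output
    parameter [CHIPS*4-1:0] INVERT    = {CHIPS*4{1'b0}}   // 1 = pin is inverted
) (
    input  wire [CHIPS*4-1:0] latch,   // output latch bits for all chips
    inout  wire [CHIPS*4-1:0] pins,    // all I/O pins packed into one vector
    output wire [CHIPS*4-1:0] pin_in   // values read back from input pins
);
    genvar i;
    generate
        for (i = 0; i < CHIPS*4; i = i + 1) begin : g_pin
            if (IS_OUTPUT[i]) begin : g_out
                // Output pin: drive the pad from the latch, optionally inverted
                assign pins[i]   = latch[i] ^ INVERT[i];
                assign pin_in[i] = 1'b0;
            end else begin : g_in
                // Input pin: leave the pad undriven and read it, optionally inverted
                assign pin_in[i] = pins[i] ^ INVERT[i];
            end
        end
    endgenerate
endmodule
```

Chip n would own bits 4n+3 down to 4n of each vector, which is exactly the kind of bookkeeping that makes this approach feel unlike five separate chips.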

The next idea was to separate the ROM and I/O Port portions into separate modules. I could instantiate one ROM module to represent all the ROMs, and instantiate an I/O Port module to represent each of the I/O Ports on a per-chip basis. I actually coded and tested this, but it still seemed like too much of a deviation from the original.

Yesterday I decided that the way to most closely emulate the i4001 chip was to separate the ROM array from the rest of the i4001 module. I added Verilog ports to the i4001 module for a 12-bit ROM address output port and an 8-bit ROM data input port, and recoded the ROM array access to use these ports as a shared bus. Then I created a simple i4001_rom module that instantiates the FPGA Block RAM by inference and handles its initialization via the $readmemh system task, requiring all of 20 lines of Verilog. I can now instantiate up to sixteen i4001 modules and they can share one i4001_rom module through a common ROM interface bus.
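A minimal sketch of what such a ROM module could look like follows. It is an approximation rather than the actual i4001_rom source: the port names, the ADDR_WIDTH and ROM_FILE parameters, the default file name, and the use of a clocked read are all assumptions.

```verilog
// Sketch of an inferred Block RAM ROM; port names, parameters, and the
// default file name are assumptions, not the project's actual code.
module i4001_rom #(
    parameter ADDR_WIDTH = 12,         // 12 bits addresses sixteen 256-byte ROMs
    parameter ROM_FILE   = "rom.hex"   // hypothetical hex image file
) (
    input  wire                  clk,
    input  wire [ADDR_WIDTH-1:0] rom_addr,
    output reg  [7:0]            rom_data
);
    // One array holds the contents of every emulated i4001
    reg [7:0] rom_array [0:(1 << ADDR_WIDTH) - 1];

    // Load the ROM image; XST uses this to initialize the Block RAM
    initial $readmemh(ROM_FILE, rom_array);

    // A registered (synchronous) read is what lets the synthesizer
    // map the array onto Block RAM instead of LUTs
    always @(posedge clk)
        rom_data <= rom_array[rom_addr];
endmodule
```

With the full 12-bit address the array is 32K bits, which is more than one 16K-bit Block RAM holds; setting ADDR_WIDTH to 11 (eight ROMs) would keep it within a single block, which is plenty for the five ROMs of the Busicom 141-PF.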

One could argue that this is an inefficient use of FPGA resources because there are now multiple MCS-4 bus decoders when one would suffice for all emulated i4001 chips. However, XST is pretty good about optimizing such duplicated logic. For example, all the instantiations of the timing_recovery module get collapsed into one, since they all share the same inputs. Regardless, I'm happier with this, and that's what counts on a hobby project. As long as it fits.

To keep an eye on resource utilization, I've been updating a top-level Verilog module that instantiates the major pieces that would make up a Busicom 141-PF implementation. With five i4001s, two i4002s, and one i4004 instantiated, it uses 193 of the 1430 Spartan-6 LX9 slices. That's less than 14%, which leaves plenty of room for my keyboard, VFD, and printer interfaces.

Monday, May 11, 2020
