For example, the i4002 distinguishes a register write from a register read by monitoring the data bus during the M2 phase and decoding the OPA portion of the instruction being fetched from the i4001 ROM. This isn't terribly complex logic, but it requires a thorough understanding of how the data bus operates, which isn't explained in the datasheets.
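Here is a rough Python sketch of that decode. It is not my Verilog, and it assumes the standard MCS-4 opcode table, in which every I/O and RAM instruction has OPR = 1110. `classify_io` and the two dictionaries are hypothetical names. The sketch also ignores the SRC chip selection and the CM-RAM line, which a real i4002 has to track as well:

```python
# Hypothetical helper names; opcode values follow the MCS-4 instruction set,
# where all I/O and RAM instructions share OPR = 0b1110.
IO_OPR = 0xE

# OPA values (the nibble on the bus during M2) that target an i4002.
RAM_WRITES = {0x0: "WRM", 0x1: "WMP", 0x4: "WR0", 0x5: "WR1", 0x6: "WR2", 0x7: "WR3"}
RAM_READS = {0x8: "SBM", 0x9: "RDM", 0xB: "ADM", 0xC: "RD0", 0xD: "RD1", 0xE: "RD2", 0xF: "RD3"}
# WRR (0x2), WPM (0x3) and RDR (0xA) target ROM ports or program RAM, not the i4002.


def classify_io(opr: int, opa: int) -> str:
    """Classify a fetched instruction from the i4002's point of view."""
    if opr != IO_OPR:
        return "not-io"   # not an I/O or RAM instruction at all
    if opa in RAM_WRITES:
        return "write"    # accumulator -> RAM character, status, or output port
    if opa in RAM_READS:
        return "read"     # RAM character or status -> accumulator
    return "ignore"       # an I/O instruction meant for some other chip


print(classify_io(0xE, 0x9))  # RDM -> "read"
```

For the instructions the i4002 actually responds to, OPA bit 3 separates writes (0) from reads (1). In hardware, that means the read/write decision comes down to very little logic once the chip knows it has been selected.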
It also makes testing the i4002 chip complex, because the test module has to emulate the entire eight-phase bus transaction, including the instruction fetch. For this I've implemented a state machine that fetches, decodes, and executes several of the MCS-4 instructions. I could test it against my full Verilog i4004 emulation instead, but that would run much slower in simulation.
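The phase sequence such a test module has to reproduce can be sketched in Python as a generator of (phase, data-bus nibble) pairs. This is a minimal sketch under some assumptions: `fetch_cycle` and `PHASES` are hypothetical names, `None` stands for an undriven bus, and SYNC, CM-ROM, and CM-RAM are not modeled at all. It uses the MCS-4 ordering, in which the 12-bit address goes out low nibble first (A1 to A3) and the ROM returns OPR in M1 and OPA in M2:

```python
from typing import Iterator, Optional, Tuple

PHASES = ("A1", "A2", "A3", "M1", "M2", "X1", "X2", "X3")


def fetch_cycle(pc: int, opcode: int, x2: Optional[int] = None,
                x3: Optional[int] = None) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (phase, data-bus nibble) for one MCS-4 instruction cycle.

    None means nobody drives the bus in that phase (don't care).
    """
    assert 0 <= pc < 4096 and 0 <= opcode < 256
    bus = {
        "A1": pc & 0xF,          # low address nibble, from the i4004
        "A2": (pc >> 4) & 0xF,   # middle address nibble
        "A3": (pc >> 8) & 0xF,   # high address nibble
        "M1": opcode >> 4,       # OPR, driven by the ROM
        "M2": opcode & 0xF,      # OPA, driven by the ROM
        "X1": None,
        "X2": x2,                # e.g. SRC high nibble, or I/O data
        "X3": x3,                # e.g. SRC low nibble
    }
    for phase in PHASES:
        yield phase, bus[phase]


# SRC P0 (0x21) selecting chip 0, register 1 (high nibble 0b0001), character 5.
for phase, nibble in fetch_cycle(0x123, 0x21, x2=0x1, x3=0x5):
    print(phase, nibble)
```

A testbench built this way only has to drive the bus from the sequence and check what the device under test puts back during X2. That is far cheaper to simulate than a complete CPU.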
Another thing I've been trying to understand is how the Xilinx XST tools allocate resources when implementing distributed RAM.
My Busicom 141-PF emulation will require two emulated i4002 RAM chips. Each i4002 contains four "registers" of twenty 4-bit characters (16 main memory characters plus 4 status characters), plus a 4-bit output port. My Vacuum Fluorescent Display has to show the content of one of these registers continuously: the "WR" register, stored in chip 0, register 1. To allow that, the storage for that register must be instantiated in the Spartan 6 as a dual-port RAM. The others can be instantiated as single-port RAMs, which I assumed would save resources.
In the Spartan 6, a dual-port RAM is implemented using two lookup tables (LUTs). Both are written with the same write address and data inputs, but each has its own read address and output. A Spartan 6 LUT holds 64 bits and can be configured as either 64x1 or 32x2, provided both halves share the same address inputs. I took these two statements and rather naively assumed that this would allow packing two bits of a single-port RAM into a single LUT. Apparently this is not true.
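The two LUT configurations can be illustrated with a toy Python model. This is not a Xilinx primitive, and `Lut6Ram` is a hypothetical name. I'm assuming the usual fractured-LUT arrangement, in which O6 reads the upper 32 bits and O5 the lower 32 bits at the same 5-bit address. The model shows why a 20-deep, single-port register looks as if it could put two bit columns into one LUT: both outputs share one address. It also shows that a LUT in this mode cannot supply two different read addresses by itself:

```python
from typing import Tuple


class Lut6Ram:
    """Toy model of one 64-bit LUT used as distributed RAM (not a Xilinx primitive)."""

    def __init__(self) -> None:
        self.bits = [0] * 64

    # 64x1 mode: six address bits select one of 64 cells, read on O6.
    def write64(self, addr: int, bit: int) -> None:
        self.bits[addr & 0x3F] = bit & 1

    def read64(self, addr: int) -> int:
        return self.bits[addr & 0x3F]

    # 32x2 mode: one shared 5-bit address selects a cell in each half.
    # Assumption: O6 comes from the upper 32 bits, O5 from the lower 32 bits.
    def write32x2(self, addr: int, o6_bit: int, o5_bit: int) -> None:
        a = addr & 0x1F
        self.bits[32 + a] = o6_bit & 1
        self.bits[a] = o5_bit & 1

    def read32x2(self, addr: int) -> Tuple[int, int]:
        a = addr & 0x1F
        return self.bits[32 + a], self.bits[a]


lut = Lut6Ram()
lut.write32x2(19, 1, 0)    # character 19 of a 20-deep register, two bit columns
print(lut.read32x2(19))    # (1, 0): both bits come back at the same address
```

Whether XST actually chooses this packing for a single-port RAM is a separate question, and the report below answers it.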
My latest Verilog implementation of an i4002 RAM chip assumes that all of the registers require dual-port access, and it exposes four pairs of address and data-out ports, one pair for each of the emulated registers. I'd previously considered various other schemes, but after some experimentation I decided this was the smartest way to handle it.
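The behavior of that interface can be sketched as a cycle-level Python model. It has one shared write port plus an independent read address and data output per register. The model is only an illustration: `I4002Model` and `step` are hypothetical names, and I'm assuming the usual distributed-RAM timing of asynchronous reads with a write that takes effect at the clock edge. For that reason `step` returns the outputs before applying the write:

```python
from typing import List

CHARS_PER_REG = 20  # 16 main memory characters + 4 status characters
REGS_PER_CHIP = 4


class I4002Model:
    """Cycle-level sketch of i4002 storage with one read port per register."""

    def __init__(self) -> None:
        self.regs: List[List[int]] = [[0] * CHARS_PER_REG for _ in range(REGS_PER_CHIP)]

    def step(self, we: bool, reg: int, char: int, data: int,
             read_addrs: List[int]) -> List[int]:
        """One clock: sample all four read ports, then perform the optional write."""
        # Asynchronous reads: each register has its own read address and output.
        outs = [self.regs[r][read_addrs[r]] for r in range(REGS_PER_CHIP)]
        # Synchronous write through the single shared write port.
        if we:
            self.regs[reg][char] = data & 0xF
        return outs


chip = I4002Model()
chip.step(True, 1, 3, 0x7, [0, 0, 0, 0])        # write 7 to register 1, char 3
print(chip.step(False, 0, 0, 0, [0, 3, 0, 0]))  # display port on register 1 sees 7
```

In the Busicom configuration only the register 1 read port of chip 0 would be wired to anything outside the module, which leaves synthesis free to trim the others.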
I still wanted to understand how the Xilinx XST tool was allocating the resources, so I created a test module that instantiated two i4002 RAM modules. It also exported one dual-port address/data pair, connected to i4002 chip 0 register 1. This is the configuration I expect to use for the Busicom 141-PF implementation. I then used conditional compilation so I could easily build it with or without the dual-port interface.
Here's a table that makes it easy to compare the resource utilization of the two versions:
Slice LUT usage | No dual-port | One dual-port |
---|---|---|
Number used as Memory | 32 | 32 |
Number used as Dual Port RAM | 0 | 4 |
Dual Port RAM: using O6 output only | 0 | 0 |
Dual Port RAM: using O5 output only | 0 | 0 |
Dual Port RAM: using O5 and O6 | 0 | 4 |
Number used as Single Port RAM | 32 | 28 |
Single Port RAM: using O6 output only | 32 | 28 |
Single Port RAM: using O5 output only | 0 | 0 |
Single Port RAM: using O5 and O6 | 0 | 0 |
Number used as Shift Register | 0 | 0 |
In both cases, the number of LUTs used is the same: 32. XST properly recognized that only one of the registers is being configured as dual-port. It appears that a dual-port RAM uses both outputs (O5 and O6) from the LUT, while the single-port RAM uses only one output (O6). This isn't what I expected at all, but I guess it makes sense.
One of the mistakes engineers sometimes make is trying to outsmart their tools. Sometimes you just have to let the tools do their thing and see whether the results are acceptable. There are 1,440 LUTs usable as RAM in a Spartan 6 LX9, and this design uses only 32 of them, about 2.2%. I think that qualifies as "acceptable".
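The numbers in the table can be cross-checked with a little arithmetic. The sketch below assumes that each 20-deep, 1-bit column of a register occupies one LUT. The XST report doesn't state this, but it matches both columns of the table: 28 single-port LUTs for 7 registers and 4 dual-port LUTs for 1 register. `luts_for_register` is a hypothetical helper:

```python
CHIPS = 2
REGS_PER_CHIP = 4
BITS_PER_CHAR = 4
DEPTH = 20                # characters per register
LUT_RAM_CAPACITY = 1440   # LUTs usable as RAM in a Spartan 6 LX9


def luts_for_register(depth: int, width: int) -> int:
    # Assumption: one LUT per bit column, as long as the depth fits in 64 entries.
    assert depth <= 64
    return width


registers = CHIPS * REGS_PER_CHIP
total = registers * luts_for_register(DEPTH, BITS_PER_CHAR)
dual = 1 * luts_for_register(DEPTH, BITS_PER_CHAR)   # chip 0, register 1 only
single = total - dual

print(total, single, dual)                        # 32 28 4
print(f"{100 * total / LUT_RAM_CAPACITY:.1f}%")   # 2.2%
```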