Why not? Because I noticed that the resource requirements for my latch-based i4002 RAM were off the charts: 550 slice registers! What happened?
It turns out that LUT RAM requires a clock edge to perform a write. This is noted in the documentation, but I overlooked it. When I removed the posedge clause from the always block, the toolchain recognized that it couldn't use LUT RAM for the array and switched to slice registers instead. Only four of the storage elements in a Spartan-6 slice can be configured as latches, so the number of occupied slices jumps massively.
Here's the Verilog code, illustrating the one line that changes between inferring a write clock or not:
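```verilog
reg [3:0] ram_array [0:(RAM_ARRAY_SIZE-1)];

`ifdef CLOCKED_LUTRAM
always @(posedge sysclk) begin   // clocked write: can map to LUT RAM
`else
always @(*) begin                // no write clock: maps to latches
`endif
    if (write) begin
        ram_array[addr] <= data_in;
    end
end

// Two read ports on the same array
assign data_out  = ram_array[addr];
assign data2_out = ram_array[addr2];
```

All the other logic in the i4002 emulation uses latched registers rather than clocked flip-flops. The two assign statements at the end of the code block show that the array supports dual-port access: `data_out` reads the location selected by `addr` (which is also the write address), while `data2_out` independently reads the location selected by `addr2`. Here's a comparison of the resource requirements: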
Resource | Latched | Clocked |
---|---|---|
Occupied slices | 204 (14.3%) | 34 (2.4%) |
Slice Registers | 550 (4.8%) | 38 (0.3%) |
Slice LUTs as Logic | 357 (6.2%) | 20 (0.3%) |
Slice LUTs as Memory | 0 (0.0%) | 16 (1.1%) |
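To make the write-clock requirement concrete, here is a minimal sketch of the clocked variant wrapped in a self-contained module. The module name `lutram_dual_read`, the `ADDR_WIDTH` parameter, and the port widths are assumptions for illustration, not the actual i4002 interface; only the array, the `always` block, and the two `assign` statements follow the excerpt in the post.

```verilog
// Hypothetical wrapper around the clocked (CLOCKED_LUTRAM) variant.
// Module name, ADDR_WIDTH, and port list are illustrative assumptions.
module lutram_dual_read #(
    parameter ADDR_WIDTH = 4                  // 16 nibbles by default
) (
    input  wire                  sysclk,
    input  wire                  write,       // write enable
    input  wire [ADDR_WIDTH-1:0] addr,        // read/write address
    input  wire [ADDR_WIDTH-1:0] addr2,       // second, read-only address
    input  wire [3:0]            data_in,
    output wire [3:0]            data_out,
    output wire [3:0]            data2_out
);
    reg [3:0] ram_array [0:(1<<ADDR_WIDTH)-1];

    // Synchronous write: the clock edge is what allows the array to be
    // mapped onto distributed (LUT) RAM rather than individual latches.
    always @(posedge sysclk) begin
        if (write)
            ram_array[addr] <= data_in;
    end

    // Asynchronous reads on two independent addresses.
    assign data_out  = ram_array[addr];
    assign data2_out = ram_array[addr2];
endmodule
```

The write only takes effect on `posedge sysclk`, while both reads remain combinational. That combination of a synchronous write with asynchronous reads is the shape that fits distributed RAM; removing the clock leaves the tools no choice but to build each stored bit from a latch.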
What about the i4001 ROM? My intent has always been to put the ROM array into a Block RAM. The standard Busicom calculator had four i4001 ROMs, providing 1024 bytes of code storage. However, the version with the optional square-root support required a fifth ROM, for a total of 1280 bytes. The easiest way for me to support that is to use one of the Spartan-6's 18K-bit Block RAMs in an 8-bit-wide, 2048-deep configuration. I haven't decided exactly how to have five i4001 instantiations share a single BRAM, but I have a pretty firm idea.
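One possible packing, offered only as a hypothetical sketch since the sharing scheme isn't decided in the post, is to give each 256-byte i4001 its own 256-byte page of the 2048 × 8-bit array. The BRAM address is then just the chip index concatenated with the 8-bit in-chip address, i.e. `chip_id * 256 + chip_addr`. The module and signal names below are assumptions.

```verilog
// Hypothetical address mapping for five i4001 ROMs in one 2048 x 8 BRAM.
// Chip n occupies BRAM addresses n*256 .. n*256+255.
module rom_bank_addr (
    input  wire [2:0]  chip_id,    // 0..4 for the five i4001 ROMs
    input  wire [7:0]  chip_addr,  // address within one 256-byte i4001
    output wire [10:0] bram_addr   // address into the 2048-byte array
);
    // Concatenation is equivalent to chip_id * 256 + chip_addr.
    // chip_id values 5..7 would select the unused 768-byte tail.
    assign bram_addr = {chip_id, chip_addr};
endmodule
```

With five chips, addresses 0 through 1279 hold code and the remaining 768 bytes (1280 through 2047) are unused, which matches the zero-filled tail mentioned below.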
For the purposes of this test, though, I simply hacked my 256-byte i4001 emulation to expand the ROM depth to 2048 bytes. Again, here's the Verilog code, illustrating the one line that changes between inferring a read clock or not:
```verilog
reg [7:0] rom_array [0:2047];
reg [7:0] rom_temp;

`ifdef CLOCKED_BRAM
always @(posedge sysclk) begin   // clocked read: can map to a Block RAM
`else
always @(*) begin                // unclocked read: built from LUT logic
`endif
    rom_temp <= rom_array[rom_addr];
end
```

The comparison is trickier than with the i4002 RAM, because the toolchain knows the contents of the ROM and can optimize the non-BRAM version somewhat. For example, the last 768 bytes of the array are known to be zero. Still, here's a comparison of the resource requirements, using the Busicom software with the square-root code included:
Resource | Latched | Clocked |
---|---|---|
Occupied slices | 100 (7.0%) | 14 (1.0%) |
Slice Registers | 18 (0.2%) | 18 (0.2%) |
Slice LUTs as Logic | 226 (4.0%) | 5 (0.1%) |
16K-bit Block RAMs | 0 (0.0%) | 1 (3.1%) |
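Spartan-6 block RAM reads are synchronous: the data for an address appears only after a clock edge. That is why only the clocked variant can be placed in a BRAM. The sketch below wraps that variant in a self-contained module. The module name `rom_2k_sync`, the 11-bit `rom_addr` width, and the `INIT_FILE`/`$readmemh` loading scheme are assumptions, since the post doesn't show how the Busicom code gets into `rom_array`. The default empty `INIT_FILE` skips loading.

```verilog
// Hypothetical sketch: 2048 x 8-bit ROM with a registered read,
// the pattern that can be mapped onto a single block RAM.
module rom_2k_sync #(
    parameter INIT_FILE = ""          // hypothetical hex file with the ROM image
) (
    input  wire        sysclk,
    input  wire [10:0] rom_addr,      // 11 bits address 2048 bytes
    output reg  [7:0]  rom_temp       // registered read data
);
    reg [7:0] rom_array [0:2047];

    // Load the ROM image at initialization, if a file name is given.
    initial begin
        if (INIT_FILE != "")
            $readmemh(INIT_FILE, rom_array);
    end

    // Synchronous read: rom_temp updates only on the rising clock edge.
    always @(posedge sysclk) begin
        rom_temp <= rom_array[rom_addr];
    end
endmodule
```

The cost of this arrangement is one cycle of read latency, which the surrounding i4001 emulation logic has to account for.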