Tuesday, April 28, 2020

Choosing the right FPGA for the wrong reasons

I finished changing my i4004 emulation from using edge-clocked flip-flops to level-sensitive data latches. I haven't even tried simulating it yet, but just for fun I ran it through the Xilinx toolchain to see what sort of resources it required. As I suspected, it's quite small: it occupies all or part of 9% of the available slices in the Spartan-6 LX9 FPGA.

With that in mind, I got to wondering if this would have fit into a Lattice iCE40. I'd touched on this a few years ago in the post "Packing worms into a can", but I don't think I ever tried running the i4004 CPU emulation through the Lattice iCEcube2 toolchain. Today I did, and the results were eye-opening in ways I didn't expect.

There are many differences between the Spartan-6 and the iCE40, the most notable being that the S6 has 6-input look-up tables while the iCE40 has 4-input LUTs. Today I discovered another: the flip-flops in the iCE40 cannot be used as transparent data latches. Latch mode is a feature of the Spartan-6 and other Xilinx FPGAs, including the Spartan-3e on the reference board I use to drive my hardware re-creation. It is supported in some other Lattice FPGAs, but apparently not by the iCE40.

[Note: Per the Xilinx FPGA CLB User Guide (UG384), each Spartan-6 slice contains eight storage elements. Four can be used as either level-sensitive latches or edge-clocked D flip-flops. The other four can only be used as edge-clocked flip-flops.]
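For readers less familiar with the distinction, here is a minimal behavioral sketch in Python of the two kinds of storage element. The class names `DLatch` and `DFlipFlop` and the stimulus are made up for illustration; this is not the emulation's HDL, and it ignores timing entirely.

```python
class DLatch:
    """Level-sensitive: q follows d while gate is high, holds while gate is low."""

    def __init__(self):
        self.q = 0

    def step(self, gate, d):
        if gate:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-clocked: q captures d only on a rising edge of clk."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk, d):
        if clk and not self._prev_clk:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


# Hypothetical stimulus: (clock or gate level, data) sampled at successive instants.
stimulus = [(0, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]

latch, ff = DLatch(), DFlipFlop()
latch_q = [latch.step(c, d) for c, d in stimulus]
ff_q = [ff.step(c, d) for c, d in stimulus]

print("latch:", latch_q)
print("ff:   ", ff_q)
```

While the gate is high the latch output tracks every change on its data input, whereas the flip-flop only samples the data at the rising edge and ignores it otherwise.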

How much would this affect the resource usage when emulating the i4004 and associated chips in an iCE40 FPGA? I ran the code that used edge-clocked flip-flops through the same iCEcube2 toolchain with the same options. Here's a comparison using the iCE40-HX4K FPGA:

iCE40 Resource          Edge-Clocked FF        Data Latches
Logic blocks (PLBs)     80 of 440 (18.2%)      99 of 440 (22.5%)
Lookup tables (LUTs)    541 of 3520 (15.4%)    714 of 3520 (20.3%)
Storage elements (FFs)  187 of 3520 (5.3%)     0 of 3520 (0.0%)
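As a quick cross-check of these figures, the short Python sketch below recomputes the utilization percentages and the difference between the two builds. The dictionary names are mine; the numbers are taken straight from the table.

```python
# Resource counts from the iCEcube2 reports quoted in the table above.
capacity = {"plb": 440, "lut": 3520, "ff": 3520}
edge_clocked = {"plb": 80, "lut": 541, "ff": 187}
data_latches = {"plb": 99, "lut": 714, "ff": 0}


def percent(used, total):
    """Utilization as a percentage, rounded to one decimal place."""
    return round(100 * used / total, 1)


edge_pct = {k: percent(v, capacity[k]) for k, v in edge_clocked.items()}
latch_pct = {k: percent(v, capacity[k]) for k, v in data_latches.items()}

# How many LUTs the latch build adds, and how many flip-flops it drops.
extra_luts = data_latches["lut"] - edge_clocked["lut"]
removed_ffs = edge_clocked["ff"] - data_latches["ff"]

print(edge_pct, latch_pct)
print("extra LUTs:", extra_luts, "removed FFs:", removed_ffs)
```

The latch build uses 173 more LUTs while dropping all 187 flip-flops, so the increase in LUTs is of the same order as the number of storage elements that disappeared.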

Without hardware support for level-sensitive data latches, I can only surmise that they are implemented as LUTs with their outputs feeding back to their inputs. This would account for the higher LUT usage of the latch-based design in the iCE40. I don't know if this would work reliably, and I suspect that if I'd used the iCE40-HX4K on my P170-DH replacement board, this would have forced me back to using edge-clocked flip-flops.
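To make that surmise concrete, here is a small Python sketch of a latch built from a single 4-input LUT whose output is fed back to one of its inputs. The input ordering, the helper names `lut4` and `latch_init`, and the stimulus are assumptions for illustration only; I haven't inspected what iCEcube2 actually generates. The model steps in discrete time, so it says nothing about the glitches or propagation delays that make such a loop risky in real hardware.

```python
def lut4(init, a, b, c, d):
    """Evaluate a 4-input LUT: init is a 16-bit truth table indexed by the inputs."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> index) & 1


def latch_init():
    """Truth table for q_next = d if gate else q_prev (fourth input unused)."""
    init = 0
    for index in range(16):
        gate = index & 1
        d = (index >> 1) & 1
        q_prev = (index >> 2) & 1
        q_next = d if gate else q_prev
        init |= q_next << index
    return init


LATCH_INIT = latch_init()


def simulate(stimulus):
    """Step the LUT, feeding its previous output back in as the q_prev input."""
    q = 0
    trace = []
    for gate, d in stimulus:
        q = lut4(LATCH_INIT, gate, d, q, 0)
        trace.append(q)
    return trace


# Hypothetical stimulus: (gate, data) pairs at successive instants.
trace = simulate([(0, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0)])
print(hex(LATCH_INIT), trace)
```

The latch function needs only three of the LUT's four inputs (gate, data, and the fed-back output), which is at least consistent with roughly one extra LUT per latch.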

How does this compare with the Spartan-6 I did use? A head-to-head comparison is difficult due to the different architectures and the way the tools report the data. To make the comparison more apples-to-apples, I've used the Spartan-6 LX4 rather than the LX9 I actually placed on my board:

FPGA Resource                Spartan-6 LX4          iCE40-HX4K
Logic blocks (Slices/PLBs)   137 of 600 (22.8%)     99 of 440 (22.5%)
Lookup tables (LUTs)         254 of 2400 (10.5%)    714 of 3520 (20.3%)
Storage elements (FFs)       234 of 4800 (4.9%)     0 of 3520 (0.0%)

The Spartan-6 LX9 has more than double (2.38x) the number of slices of the LX4. Although the iCE40-HX8K has a bit more than double (2.18x) the number of programmable logic blocks (PLBs) of the HX4K, it is only available as a fine-pitch BGA, which made it unusable for me.
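The two ratios can be reproduced with a few lines of Python. The LX4 and HX4K totals come from the tables above; the LX9 slice count (1430) and the HX8K PLB count (960, from 7680 logic cells at eight per PLB) are not stated in this post and are assumed here from the vendor datasheets.

```python
# Device totals. LX9 and HX8K figures are assumptions taken from vendor datasheets.
SLICES = {"Spartan-6 LX4": 600, "Spartan-6 LX9": 1430}
PLBS = {"iCE40-HX4K": 440, "iCE40-HX8K": 7680 // 8}


def ratio(larger, smaller):
    """Size ratio between two devices, rounded to two decimal places."""
    return round(larger / smaller, 2)


spartan_ratio = ratio(SLICES["Spartan-6 LX9"], SLICES["Spartan-6 LX4"])
ice40_ratio = ratio(PLBS["iCE40-HX8K"], PLBS["iCE40-HX4K"])

print(spartan_ratio, ice40_ratio)
```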

In the Spartan-6, about 25% of the LUTs can instead be used as a 64-bit RAM. The Xilinx tools chose not to place the 12x4 instruction pointer array or the 8x8 scratch pad array in LUT RAM; I expect these are too shallow to make efficient use of a LUT RAM. [See the next post for an explanation.] However, the four 20x4 RAM arrays in each of the two i4002 RAM chips certainly do end up in LUT RAM, a feature not available in the iCE40.

If I'd used the iCE40-HX4K I probably would have structured the i4002 RAM chip emulation to share a single Block RAM, as I plan to do with the S6 emulation of the i4001 ROM chip. However, the iCE40 BRAMs have only one write and one read port, which would have made using the WR register as the source for the Vacuum Fluorescent Display more difficult.

I chose the Xilinx Spartan-6 LX9 FPGA over the Lattice iCE40-HX4K primarily for its size and my familiarity with the tools. It was a good choice, but I should have also considered features such as the availability of LUT RAM and the ability to use flip-flops as latches. These are turning out to make a significant difference in resource requirements.
