Insanity 4004: Packing worms into a can

Back a few years I threw the Verilog source code for the 4004 CPU at the Lattice toolchain and observed that it occupied "40% of the logic cells and one of the 16 block RAMs" of an iCE40-HX1K FPGA. From this I felt that using an iCE40-HX4K part, which is 2.75 times the size, would be plenty to implement the entire Busicom 141-PF calculator system. Of course it'd be even nicer if I could use the HX8K part, 6 times the size of the HX1K, but it's only available in a BGA.

While trying to get the comma positioning working in the VFD driver code I took a look at the resource utilizations for two potential implementations. What shocked me was not the difference, but the total size: 492 of the 1280 lookup tables (LUTs). That's 38%!

Once I caught my breath I asked myself how this compared with the Xilinx parts I've worked with. So I built the same Verilog code using Xilinx ISE 14.7, specifying a Spartan 6 chip; it used 331 LUTs. However, the Spartan 6 has 6-input LUTs, versus the 4-input LUTs of the iCE40. Trying to find a more apples-to-apples comparison I built it yet again, this time specifying the Spartan 3E chip which also has 4-input LUTs. The Spartan 3E used 320 LUTs. Advantage Xilinx, whether by better architecture, better synthesis, or both.

But what was using all these LUTs? I thought over my code and really couldn't see any real hogs. Most of the selectors in the case statements are 4-bit quantities, which should require only one LUT per output bit to implement. Looking at the Xilinx ISE Map report I noticed a section for "Utilization by Hierarchy." Excellent! I enabled the detailed Map report and reran the mapper. To my great surprise it is the Digilent Adept I/O Expansion interface that is taking up the majority of the LUTs: 205 of 331 in the LX9, and 157 of 320 in the S3E. (I don't know how to get the same statistics out of the Lattice version of Synplify Pro).

While it's certainly a relief to find my VFD driver isn't the hog I thought it was, it does point out the risk of underestimating resource utilization. The Adept interface doesn't look very complicated -- it's only about 100 lines of Verilog -- but it takes a lot of resources. For comparison, the entire 4004 CPU and one 4001 ROM, which I think of as complex, requires only 400 S3E LUTs.

So back to the choice of chips for the calculator. I found I was able to do more with ISE in a shorter time, though whether that's better integration or just greater familiarity I'm not sure. The Xilinx tools and chips seem be more efficient in their use of resources than the Lattice. Finally, I can get the Spartan 6 LX9 in a TQFP-144 package, while the iCE40-HX8K is only available in a 0.8mm-pitch BGA. Yes, mounting a BGA scares me, but more importantly it would be infeasible with this design unless I made the board with blind vias and that would really jack up the cost.

I think that settles the matter. The Lattice iCE40 series is quite reasonable to work with, if you're looking for something small, power efficient, or are willing to stick with commercial boards. The toolchain is quite acceptable, even if you do have to obtain a new (free) license every year. But for my specific needs, I think the Spartan 6 LX9 is the best fit.

Why not Altera? They make fine chips. The answer, simply put, is that I don't know Altera, I don't have any equipment to program them, and I just don't have time or interest in adding yet another variable. They do see to have quite a selection of chips in hobbyist-friendly (i.e. non-BGA) packages, though. Some other project, perhaps.

Monday, May 15, 2017

Packing worms into a can

No comments:

Post a Comment