Monday, March 15, 2021

A simulated Busicom 141-PF using Verilator?

Last summer I came up with the grand idea of mating my Verilog implementation of the Busicom 141-PF with a simulated keyboard and printer using the Verilog Procedural Interface (VPI) and the Icarus simulator. However, a quick experiment showed that the simulation would run far too slowly to be useful.

Today, while browsing through this blog it, occurred to me that there is a Verilog compiler called Verilator that converts synthesizable Verilog into multithreaded C++. The published benchmarks suggest the resulting program runs as much as 100 times faster than simulation, and can take advantage of multi-core CPUs. That's pretty much what I'd need to make this work.

Verilator supports the SystemVerilog Direct Programming Interface (DPI), and support for a very limited subset of VPI. I suspect I could implement the keyboard and printer interfaces using either.

Scoping the CCLK line

Given how much time I spent fretting over the layout of the configuration clock (CCLK) circuit (1, 2, 3, 4), it seemed strange that I hadn't gone back and checked what the signal actually looked like on the board.

Thursday, December 17, 2020

KiCad import of an Eagle project

Now that I've gotten the FPGA-based emulation of the i4004 CPU running, I thought I'd go back to the discrete component implementation. It would be nice to get that running for the 50th anniversary of the chip, and that's only 11 months away.

That project, though, was started using Eagle as a PCB CAD tool. I haven't used Eagle in about four years, having spent all my hobbyist efforts on KiCad. I could finish them using my perpetually-licensed Eagle 7 installation, but do I really want to? I'd have to relearn the UI, after spending so long with the KiCad UI.

I've long wondered whether it would be worth the effort to move the remaining four PCBs from Eagle to KiCad. Each of these PCBs have complete schematics occupying several Eagle schematic pages. I'd completed basic layouts, and started routing to some extent. I knew KiCad v5 had an Eagle import capability, but how would it handle this situation?

Frankly, I didn't hold out much hope for this working well enough to be practical.

With KiCad v6 on the horizon it seemed smarter to try that rather than play with v5. After updating my KiCad sources with the latest code I rebuilt and installed v6. Then I created a new KiCad project and started experimenting with the Eagle import feature.

It took a few tries to figure out how this is supposed to work. However, quite to my surprise, the results look very good. The schematics imported cleanly, including the custom symbols I'd created for the FDV-301 and BSS-83 MOSFETs. The inter-sheet connections were all properly translated to global labels (Eagle 7 has no concept of hierarchical sheets). The layouts and initial routing I'd done also came across nicely.

This might actually work!

Wednesday, July 29, 2020

Running the Busicom software in a Spartan-6

Eight years ago I decided I wanted to learn about programmable logic: PLAs, CPLDs, and FPGAs.

About the same time, I became aware that a group of engineers and computer history buffs had gotten Intel to release the schematics for first commercially-available microprocessor: the Intel 4004 CPU. They'd also retrieved the software that drove the first commercial product that used the i4004: the Busicom 141-PF.

I decided to re-create the i4004 CPU in an FPGA.

It turns out that's a lot like learning to swim by attempting to cross the English channel. But I've never been known to shy away from a challenge.

Sunday, July 26, 2020

An (unlatched) house of cards

While trying to understand why my latch-based implementation of the Busicom 141-PF calculator didn't work when implemented in a Spartan-6, I tested the i4001 ROM implementation separately. It seemed to work, so I focused on the i4004 CPU.

Now that I have the i4004 CPU switched back to using clocked flip-flops I tried a broader test. I instantiated one CPU, one ROM, and one RAM, and loaded the ROM with the basic functional test that is loaded by default into the i400x analyzer. This test starts with subroutine calls and returns, then checks conditional jumps before testing the ALU functions. My intent was to see whether there were enough flip-flop clock edges per CLK1 or CLK2 pulse for everything to propagate as needed. It's the path through the ALU I'm most worried about.

I first started with behavioral simulation, and noted that the first few instructions executed as expected. This gave me confidence to try a Post-P&R simulation, which failed. Unlike the original failure, though, where the address placed on the 4-bit data bus alternated between 000 and 001, the ROM address on the bus appeared correct. However, the address presented to the Block RAM containing the instructions was always zero. This suggested the i4001 ROM emulation wasn't working.

This made no sense to me. I'd tested the latch-based i4001 in both Post-P&R simulation and a real Spartan-6 and it seemed to work just fine, but when combined with other modules it appears to fail.

In my career as a professional software engineer I've sometimes encountered code whose author claimed didn't work because of "a bug in the optimizer". Now, I've found a couple of very real compiler bugs, but they're extremely rare in commonly-used compilers. This sort of problem is almost invariably caused by the programmer not understanding subtle details of the language, and not by bugs in the compiler.

With this in mind I decided to convert the i4001 back to using edge-clocked flip-flops; something I sort-of expected to do anyway but hadn't gotten to.

Of course the thing worked immediately. I think I'm done playing with the latch-based implementation of any of these chips.

As I've said before, the problem I have with the edge-clocked flip-flop version of these chips is that the original design often assumes data can flow through multiple latches during a single CLK1 or CLK2 pulse. Since data can only propagate through a flip-flop on a clock edge, there must be more clock edges during a CLK1 or CLK2 pulse than there are flip-flops in series.

An instruction cycle is divided into eight subcycles using 8-stage shift registers that produce one-hot outputs (meaning only one of the outputs is active at any one time, uniquely identifying the subcycle). The shift register in the i4004 is self-initializing, and produces a SYNC signal output that is used by the shift registers in the i4001 and i4002 to synchronize themselves to the one in the i4004.

Rather than duplicate this critical code in several places I'd extracted both the timing generator and the timing recovery logic into separate Verilog modules. This allows me to test them individually, and use them to test other modules.

One of my concerns was that using edge-clocked flip-flops would result in a one-clock skew in the timing between the generator and the recovery outputs. This would eat into the number of clock edges seen by the flip-flops within a CLK1 or CLK2 "clock" pulse, and result in data not arriving in time or in tristate output driver overlaps.

Subcycle timing signals change in response to CLK1 going active. In my current design, CLK1 (and CLK2) is the output of a flip-flop. The flip-flop that drives CLK1 changes state in response to a rising clock edge, and thus lags the clock edge by about a nanosecond. That means that the flip-flops in the subcycle shift register won't change state, because CLK1 hasn't yet changed when the rising clock edge occurs. Instead, the subcycle shift registers change state on the next rising clock edge, causing a delay of one clock cycle time.

This, in turn, means that the logic that depends on the subcycle timing signal won't change until the rising clock edge after that, or two clock edges after the one that caused the change in CLK1. That's two of my eight rising clock edges consumed already.

[Edit: It's actually only one clock edge. The combinational logic gets the entire period between the CLK1 pulse going active and the next clock edge to update, but the flip-flop doesn't act on the results until that clock edge occurs.]

See why I'm concerned about the possibility of a cycle of skew between the subcycle signals in the i4004 and other chips?

Fortunately, the shift registers in the i4001 and i4002 see the same timing relationship of as the shift register in the i4004. Thus the two shift registers should stay synchronized with each other.

But after a series of failures I'm done making assumptions. The behavioral simulation looked good, but what about the Post-P&R simulation?

These screen captures shows the Post-P&R simulation waveforms. We can clearly see that the generated and recovered subcycle signals are in sync. Zooming in on this shows they change on the same clock edge.

Just to avoid disappointment later, I generated a bit file from this test and loaded it into the Spartan-6 on my P170-DH replacement board. The results shown on my logic analyzer match the Post-P&R simulation. I'd post a screen capture of this too, but having a resolution of 2ns rather than the 1ps of the simulation it's actually less interesting than those above.

Tuesday, July 21, 2020

Another flaw in the original i4004?

While trying to count the number of latches a signal might need to pass during either the CLK1 or CLK2 pulses, I took a look at a signal named by the analyzer "M12+M22+CLK1~(M11+M21)".  I determined its function is to gate the internal data bus into the scratchpad register array and instruction pointer array, so I tend to refer to this signal as the "data gate". Other signals determine which, if any, of the array cells are written.

This seemed straight-forward enough until I started looking at how this signal is generated. Here's the logic as depicted on the original i4004 schematics: