Insanity 4004: 2020

Thursday, December 17, 2020

KiCad import of an Eagle project

Now that I've gotten the FPGA-based emulation of the i4004 CPU running, I thought I'd go back to the discrete component implementation. It would be nice to get that running for the 50th anniversary of the chip, and that's only 11 months away.

That project, though, was started using Eagle as a PCB CAD tool. I haven't used Eagle in about four years, having spent all my hobbyist efforts on KiCad. I could finish the remaining boards using my perpetually-licensed Eagle 7 installation, but do I really want to? I'd have to relearn the UI, after spending so long with the KiCad UI.

I've long wondered whether it would be worth the effort to move the remaining four PCBs from Eagle to KiCad. Each of these PCBs has complete schematics occupying several Eagle schematic pages. I'd completed basic layouts, and started routing to some extent. I knew KiCad v5 had an Eagle import capability, but how would it handle this situation?

Frankly, I didn't hold out much hope for this working well enough to be practical.

With KiCad v6 on the horizon it seemed smarter to try that rather than play with v5. After updating my KiCad sources with the latest code I rebuilt and installed v6. Then I created a new KiCad project and started experimenting with the Eagle import feature.

It took a few tries to figure out how this is supposed to work. However, quite to my surprise, the results look very good. The schematics imported cleanly, including the custom symbols I'd created for the FDV-301 and BSS-83 MOSFETs. The inter-sheet connections were all properly translated to global labels (Eagle 7 has no concept of hierarchical sheets). The layouts and initial routing I'd done also came across nicely.

This might actually work!

Friday, August 28, 2020

My small contribution to KiCad v6

It's been merged.

Running the Busicom software in a Spartan-6

Eight years ago I decided I wanted to learn about programmable logic: PLAs, CPLDs, and FPGAs.

About the same time, I became aware that a group of engineers and computer history buffs had gotten Intel to release the schematics for the first commercially-available microprocessor: the Intel 4004 CPU. They'd also retrieved the software that drove the first commercial product that used the i4004: the Busicom 141-PF.

I decided to re-create the i4004 CPU in an FPGA.

It turns out that's a lot like learning to swim by attempting to cross the English Channel. But I've never been known to shy away from a challenge.

An (unlatched) house of cards

While trying to understand why my latch-based implementation of the Busicom 141-PF calculator didn't work when implemented in a Spartan-6, I tested the i4001 ROM implementation separately. It seemed to work, so I focused on the i4004 CPU.

Now that I have the i4004 CPU switched back to using clocked flip-flops I tried a broader test. I instantiated one CPU, one ROM, and one RAM, and loaded the ROM with the basic functional test that is loaded by default into the i400x analyzer. This test starts with subroutine calls and returns, then checks conditional jumps before testing the ALU functions. My intent was to see whether there were enough flip-flop clock edges per CLK1 or CLK2 pulse for everything to propagate as needed. It's the path through the ALU I'm most worried about.

I first started with behavioral simulation, and noted that the first few instructions executed as expected. This gave me confidence to try a Post-P&R simulation, which failed. Unlike the original failure, though, where the address placed on the 4-bit data bus alternated between 000 and 001, the ROM address on the bus appeared correct. However, the address presented to the Block RAM containing the instructions was always zero. This suggested the i4001 ROM emulation wasn't working.

This made no sense to me. I'd tested the latch-based i4001 in both Post-P&R simulation and a real Spartan-6 and it seemed to work just fine, but when combined with other modules it appears to fail.

In my career as a professional software engineer I've sometimes encountered code that its author claimed didn't work because of "a bug in the optimizer". Now, I've found a couple of very real compiler bugs, but they're extremely rare in commonly-used compilers. This sort of problem is almost invariably caused by the programmer not understanding subtle details of the language, and not by bugs in the compiler.

With this in mind I decided to convert the i4001 back to using edge-clocked flip-flops; something I sort-of expected to do anyway but hadn't gotten to.

Of course the thing worked immediately. I think I'm done playing with the latch-based implementation of any of these chips.

As I've said before, the problem I have with the edge-clocked flip-flop version of these chips is that the original design often assumes data can flow through multiple latches during a single CLK1 or CLK2 pulse. Since data can only propagate through a flip-flop on a clock edge, there must be more clock edges during a CLK1 or CLK2 pulse than there are flip-flops in series.

An instruction cycle is divided into eight subcycles using 8-stage shift registers that produce one-hot outputs (meaning only one of the outputs is active at any one time, uniquely identifying the subcycle). The shift register in the i4004 is self-initializing, and produces a SYNC signal output that is used by the shift registers in the i4001 and i4002 to synchronize themselves to the one in the i4004.

Rather than duplicate this critical code in several places I'd extracted both the timing generator and the timing recovery logic into separate Verilog modules. This allows me to test them individually, and use them to test other modules.
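As a rough illustration of the one-hot, self-initializing shift register described above, here is a minimal sketch. It is not the project's actual timing generator: the module name, the port names, the `step` enable, and the choice to assert `sync` during the last subcycle are all assumptions made for the example.

```verilog
// Hypothetical sketch of a self-initializing one-hot subcycle generator.
// Module, port names, and SYNC placement are illustrative assumptions.
module timing_gen_sketch (
    input  wire       sysclk,   // fast synchronous system clock
    input  wire       step,     // one-sysclk pulse that advances the subcycle
    output reg  [7:0] subcycle, // one-hot: bit 0 = first subcycle, bit 7 = last
    output wire       sync      // assumed to mark the last subcycle
);
    // Only needed so simulation does not start at X; any value works.
    initial subcycle = 8'b0;

    // Feed a 1 into stage 0 only when stages 0..6 are all zero. Any
    // starting pattern therefore settles to exactly one circulating
    // bit within a few shifts, with no reset input.
    wire feed = ~|subcycle[6:0];

    always @(posedge sysclk) begin
        if (step)
            subcycle <= {subcycle[6:0], feed};
    end

    assign sync = subcycle[7];
endmodule
```

A recovery module in a peripheral chip could use the same shift register but reload it from `sync` instead of relying on the feedback term.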

One of my concerns was that using edge-clocked flip-flops would result in a one-clock skew in the timing between the generator and the recovery outputs. This would eat into the number of clock edges seen by the flip-flops within a CLK1 or CLK2 "clock" pulse, and result in data not arriving in time or in tristate output driver overlaps.

Subcycle timing signals change in response to CLK1 going active. In my current design, CLK1 (and CLK2) is the output of a flip-flop. The flip-flop that drives CLK1 changes state in response to a rising clock edge, and thus lags the clock edge by about a nanosecond. That means that the flip-flops in the subcycle shift register won't change state, because CLK1 hasn't yet changed when the rising clock edge occurs. Instead, the subcycle shift registers change state on the next rising clock edge, causing a delay of one clock cycle time.

This, in turn, means that the logic that depends on the subcycle timing signal won't change until the rising clock edge after that, or two clock edges after the one that caused the change in CLK1. That's two of my eight rising clock edges consumed already.

[Edit: It's actually only one clock edge. The combinational logic gets the entire period between the CLK1 pulse going active and the next clock edge to update, but the flip-flop doesn't act on the results until that clock edge occurs.]

See why I'm concerned about the possibility of a cycle of skew between the subcycle signals in the i4004 and other chips?

Fortunately, the shift registers in the i4001 and i4002 see the same timing relationship as the shift register in the i4004. Thus the two shift registers should stay synchronized with each other.

But after a series of failures I'm done making assumptions. The behavioral simulation looked good, but what about the Post-P&R simulation?

These screen captures show the Post-P&R simulation waveforms. We can clearly see that the generated and recovered subcycle signals are in sync. Zooming in on this shows they change on the same clock edge.

Just to avoid disappointment later, I generated a bit file from this test and loaded it into the Spartan-6 on my P170-DH replacement board. The results shown on my logic analyzer match the Post-P&R simulation. I'd post a screen capture of this too, but having a resolution of 2ns rather than the 1ps of the simulation it's actually less interesting than those above.

Tuesday, July 21, 2020

Another flaw in the original i4004?

While trying to count the number of latches a signal might need to pass during either the CLK1 or CLK2 pulses, I took a look at a signal named by the analyzer "M12+M22+CLK1~(M11+M21)". I determined its function is to gate the internal data bus into the scratchpad register array and instruction pointer array, so I tend to refer to this signal as the "data gate". Other signals determine which, if any, of the array cells are written.

This seemed straightforward enough until I started looking at how this signal is generated. Here's the logic as depicted on the original i4004 schematics:

Latch timing failure

I think I found the problem in the i4004 CPU instruction pointer incrementer. And it doesn't bode well for the latch-based implementation.

Here's the problem. The instruction pointer DRAM is configured as four rows of 12 bits, each row representing one of the four instruction pointers. The normal cycle is:

1. Pre-charge the DRAM column sense lines.
2. Read all 12 bits of the active IP into a 12-bit register.
3. Gate the low-order 4 bits onto the data bus.
4. Add 1 to the low 4 bits, saving the carry out.
5. Update the low 4 bits of the register.
6. Gate the middle 4 bits onto the data bus.
7. Add the carry to the middle 4 bits, saving the carry out.
8. Update the middle 4 bits of the register.
9. Gate the high 4 bits onto the data bus.
10. Add the carry to the high 4 bits.
11. Update the high 4 bits of the register.
12. Write the 12-bit register back to the active IP.
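The arithmetic in the steps listed above can be sketched as three 4-bit additions chained by a carry. This is only a combinational illustration with made-up names; the real CPU spreads the work over several subcycles through a single 4-bit adder, and none of that sequencing is modeled here.

```verilog
// Hypothetical sketch: increment a 12-bit value 4 bits at a time,
// carrying between nibbles. Names are illustrative assumptions.
module ip_increment_sketch (
    input  wire [11:0] ip_in,
    output wire [11:0] ip_out
);
    wire [3:0] lo, mid, hi;
    wire       c_lo, c_mid;

    // Low nibble: add 1, save the carry out.
    assign {c_lo,  lo}  = ip_in[3:0] + 4'd1;
    // Middle nibble: add the saved carry, save the new carry out.
    assign {c_mid, mid} = ip_in[7:4] + c_lo;
    // High nibble: add the carry; the final carry is discarded,
    // so the pointer wraps from 0xFFF to 0x000.
    assign hi = ip_in[11:8] + c_mid;

    assign ip_out = {hi, mid, lo};
endmodule
```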

Here's what Step 5 looks like in behavioral simulation. On the bottom we have the least significant bit output of the adder. The short (400ns) pulse in the middle is the write enable for the low 4 bits of the 12-bit register. Above that is the LSB of the 12-bit register itself.

Next let's look at the post-map simulation with the same arrangement of signals. The LSB of the adder becomes a 1 much later than in the behavioral simulation, and goes back to a 0 much sooner. How much sooner?

Here's the same post-map simulation, zoomed in at the falling edge of the write-enable. The adder output falls 25ps (that's picoseconds, or 0.000000000025 seconds) before the gate enable goes inactive, but that's long enough for the latch to capture a zero rather than a one. Bummer.

The problem appears to be in the way I've coded the 12-bit temporary register. This register (really charges on MOSFET inverter gates in a real i4004 CPU) is written from three non-overlapping sources. Because of the way the circuitry is implemented, any of these three can set the register content without conflict.

In an FPGA, this is implemented using a mux to select an input source and a storage element to retain the value. Since I coded the input selection logic as implemented in the real i4004 CPU, the mux selectors and the latch gate are driven by the same signals. In this case, though, the mux output is changing 25 picoseconds before the latch gate has gone inactive, and the latch captures the wrong value. This is why most modern logic uses clocked flip-flops rather than latches.

The challenge I face is to separate the mux selectors from the latch gates such that the mux outputs are stable before and after the latch is enabled. This is turning out to be non-trivial.
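The structure described above, where the same enables drive both the input mux and the latch gate, might be sketched for a single register bit as follows. The signal names are invented for the example and this is not the project's code; it only shows why the arrangement is fragile.

```verilog
// Hypothetical one-bit sketch of a latch whose gate and input mux
// share the same enable signals. All names are assumptions.
module temp_reg_bit_sketch (
    input  wire we_a, we_b, we_c,  // three non-overlapping write enables
    input  wire d_a,  d_b,  d_c,   // the three possible data sources
    output reg  q
);
    // The latch is transparent whenever any source is enabled...
    wire gate = we_a | we_b | we_c;
    // ...and the same enables pick which source the latch sees.
    wire mux  = we_a ? d_a :
                we_b ? d_b : d_c;

    // When an enable falls, 'mux' and 'gate' race each other. If the
    // mux output changes first, even by picoseconds, the latch closes
    // on the wrong value.
    always @(*) begin
        if (gate) q <= mux;
    end
endmodule
```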

Thursday, July 16, 2020

Xilinx ISE iSim simulation modes

I've been doing some research into the various levels of simulation available using the Xilinx ISE toolchain and simulator. All of my simulation to this point had been at the behavioral level. I knew it was possible to simulate ASICs that had been designed using Verilog at the component level, but I'd never had need to look into this as behavioral simulation had been sufficient.

My research showed that using the Xilinx ISE toolchains, a design can be simulated in any of five stages along the way from HDL to loadable bit file:

Fixing combinational loops

As part of the conversion from edge-clocked flip-flops to transparent data latches I re-coded my i4004 counter module. This resulted in several warnings about combinational loops, but those warnings were buried in a sea of warnings about my use of data latches. It seemed to work well enough when implemented for the Spartan-3E, and I assumed it would work well enough on the Spartan-6.

I was wrong.

More bugs in the i4004 CPU emulation

I knew there was at least one lingering bug in the emulated i4004 CPU because I didn't see the contents of the OPA "register" output on the data bus during the X1 subcycle. But apparently the problems are bigger than that.

In my all-in test of the Busicom 141‑PF reconstruction, I noted several problems. To try and isolate the problem to one of the four emulated MCS-4 chips I decided I needed to test each of the emulated chips separately, starting with the i4001 ROM.

Earlier today I cleaned up the VFD driver and the keypad remapping code so it was consistent with the rest of this project's code, and updated the top module for my P170-DH rebuild. Behavioral simulations ran fine, and synthesis and implementation gave only the expected warnings, so I decided to load it into the hardware and see how it ran.

At first I was pretty happy. The VFD displayed a single zero in the right-most digit, with the decimal point illuminated to just the right of the zero. The VFD driver reads the WR register, which is stored in RAM 0 register 1, via a dual-port interface to the RAM. Since an i4002 RAM clears itself to zeros while the reset input is asserted, this is what it should display. Unfortunately, that was the last part that worked as expected.

First-Word Fall-Through FIFOs

Basic AXI Handshake protocol

Sometimes an engineer will see a technique on a website or in a document and think, "Wow! That's a good way to do that!" Such is what happened when I came across the AXI-style handshake described in this Wikipedia article. I'd been looking for a good way for the M-32TL translator and printer modules to synchronize, and this seemed like a simple and well-tested mechanism.
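For readers who haven't met this handshake, here is a minimal sketch of a single register stage using valid/ready signalling. A transfer happens on any rising clock edge where both valid and ready are high. The module and port names are assumptions for illustration, not the translator's actual interface.

```verilog
// Hypothetical valid/ready register stage. Names are illustrative.
module handshake_stage_sketch #(parameter WIDTH = 4) (
    input  wire             clk,
    input  wire             rst,
    // upstream side
    input  wire [WIDTH-1:0] in_data,
    input  wire             in_valid,
    output wire             in_ready,
    // downstream side
    output reg  [WIDTH-1:0] out_data,
    output reg              out_valid,
    input  wire             out_ready
);
    // Accept a new word when the stage is empty, or when the word it
    // holds is being taken in this same cycle.
    assign in_ready = ~out_valid | out_ready;

    always @(posedge clk) begin
        if (rst) begin
            out_valid <= 1'b0;
        end else if (in_ready) begin
            out_valid <= in_valid;
            if (in_valid)
                out_data <= in_data;
        end
    end
endmodule
```

Once `out_valid` is raised, `out_data` is held unchanged until the downstream side asserts `out_ready`, which is the property that makes the scheme easy to reason about.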

This led me to consider the interface on the other side of the translator module, which is intended to talk to a FIFO. When I'd originally coded the translator I used a standard FIFO generated using ISE Coregen, figuring that was the best way to make sure the FIFO worked properly and was implemented efficiently. After discovering the AXI handshake I decided to re-implement it to use a First-Word Fall-Through FIFO because that allowed me to use the AXI handshake on both sides. Again I used Coregen.

If I wanted to run simulations using Icarus Verilog, though, I needed an alternative to a Coregen-constructed FIFO. Then I noticed that the Coregen FIFO used 27 slices for a 64 by 4-bit, common-clock FIFO. And I started looking at other FIFO implementations.
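To show what "first-word fall-through" means in practice, here is a small common-clock FIFO sketch that should simulate under Icarus Verilog. It is one plausible hand-written alternative, not the Coregen design and not necessarily what the project ended up using; the names and the valid/ready port style are assumptions.

```verilog
// Hypothetical first-word fall-through FIFO, 64 x 4 bits by default.
module fwft_fifo_sketch #(
    parameter WIDTH = 4,
    parameter ABITS = 6               // 2**6 = 64 entries
) (
    input  wire             clk,
    input  wire             rst,
    input  wire [WIDTH-1:0] wr_data,
    input  wire             wr_valid,
    output wire             wr_ready, // high when not full
    output wire [WIDTH-1:0] rd_data,  // head word, meaningful when rd_valid
    output wire             rd_valid, // high when not empty
    input  wire             rd_ready
);
    reg [WIDTH-1:0] mem [0:(1<<ABITS)-1];
    reg [ABITS:0]   wr_ptr, rd_ptr;   // extra MSB tells full from empty

    wire empty = (wr_ptr == rd_ptr);
    wire full  = (wr_ptr == (rd_ptr ^ {1'b1, {ABITS{1'b0}}}));

    assign wr_ready = ~full;
    assign rd_valid = ~empty;
    // Asynchronous read: the head word "falls through" to the output
    // without needing a read strobe first.
    assign rd_data  = mem[rd_ptr[ABITS-1:0]];

    always @(posedge clk) begin
        if (rst) begin
            wr_ptr <= 0;
            rd_ptr <= 0;
        end else begin
            if (wr_valid && wr_ready) begin
                mem[wr_ptr[ABITS-1:0]] <= wr_data;
                wr_ptr <= wr_ptr + 1'b1;
            end
            if (rd_valid && rd_ready)
                rd_ptr <= rd_ptr + 1'b1;
        end
    end
endmodule
```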

Translating EP102 codes to M-32TL codes

After recoding my EP102 printer emulation to send a stream of print drum sector numbers into the print buffer FIFO, I needed to develop some sort of translator to take those sector numbers and convert them to M-32TL print wheel sector numbers that the M-32TL printer driver could swallow.

This turned out to be harder than I expected.

An experiment with Icarus Verilog

Something I've given serious consideration is creating a simulated Busicom 141-PF using the Verilog Procedural Interface (VPI, aka PLI) to interface the simulated system with the outside world.

While many Verilog simulators support VPI, the Xilinx iSim simulator is not one of them. However, Icarus Verilog does, and as engineers do I started gathering data and doing experiments.

Timing diagram applications

I'm old-school enough that I draw most of my timing diagrams with graph paper, a pencil and a big eraser. I knew what I wanted the timing to look like for the module that translates EP102 printer codes to M-32TL printer codes, but this time I thought it might be nice to find an application to draw them.

In general I favor open-source apps, but I'm willing to consider something closed-source if the license terms are good. Here are the applications I found worthy of note.

Reworking the EP102 emulation

A few months ago I designed, coded, and started testing a Verilog module to emulate the EP102 printer found in the Busicom 141-PF calculator. I wrote a bit about it in the post EP-102 to M-32TL Glue logic.

The EP102 is a drum printer capable of printing one of 13 characters in each of 17 columns. When a row of characters is properly aligned for printing, the printer sends a "sector" signal to the computer. To allow the printer to identify which row is aligned for printing, a second "index" signal is sent when the first row is aligned. To print a given character, the computer waits until it sees the sector signal indicating that character is aligned for printing, then sends a signal to the printer that causes a mechanical hammer to press the paper against the drum. It thus takes one full rotation of the drum to print all the columns of that line.

My first task was to code a module that generates the print drum index and sector signals for the calculator, while tracking the active sector number. Since the sector number ranges from 0 to 12, it's represented by a 4-bit value.
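A sector tracker along these lines can be very small. The sketch below counts sectors 0 through 12 and flags the first one; the `sector_tick` input, the signal polarities, and the module name are assumptions, and the real drum timing is not modeled.

```verilog
// Hypothetical sketch of a print drum sector tracker.
module drum_sector_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire       sector_tick, // assumed one-clock pulse per drum row
    output reg  [3:0] sector,      // 0..12, hence 4 bits
    output wire       index        // high while the first row is aligned
);
    always @(posedge clk) begin
        if (rst)
            sector <= 4'd0;
        else if (sector_tick)
            sector <= (sector == 4'd12) ? 4'd0 : sector + 4'd1;
    end

    assign index = (sector == 4'd0);
endmodule
```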

First trial of the Busicom calculator software

I've been slowly assembling a Verilog top module that will instantiate an emulation of the Busicom 141-PF calculator in the Spartan-6 LX9 FPGA in my rebuilt Canon P170-DH calculator. With the emulation of all four required MCS-4 chips appearing to work properly, it seemed time to test it out.

The MCS-4 chips required for this are:

Five i4001 256 x 8-bit ROM with a 4-bit I/O port
Two i4002 20 x 4-bit RAM with a 4-bit Output port
Three i4003 10-bit serial-in, parallel-out Shift Register
One i4004 Central Processing Unit

To this I've added a clock driver that generates the two-phase clocks required by the MCS-4 chips.

The i4003 Shift Register

Intel 4003 shift register

The last of the original MCS-4 chips I need to emulate for the Busicom 141-PF re-creation is the i4003 shift register. I expected this to be simple: it's a 10-bit shift register with a clock input, a serial input, a serial output, 10 parallel outputs, and a parallel output enable input. How complex could this be?

In my mind I was picturing one Verilog always block for the shift register and a continuous assignment statement for the output enables. Four lines of code plus the module definition.
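That simple mental picture might look something like the sketch below. It is deliberately the naive version: the clock edge, the enable polarity, and the port names are guesses, and it does not capture the behavior of the real chip.

```verilog
// Hypothetical naive model: one always block plus continuous
// assignments. Not an accurate i4003.
module i4003_naive_sketch (
    input  wire       cp,         // shift clock (edge assumed)
    input  wire       data_in,    // serial input
    input  wire       e,          // parallel output enable (polarity assumed)
    output wire       serial_out, // serial output
    output wire [9:0] q           // parallel outputs
);
    reg [9:0] sr = 10'b0;

    always @(posedge cp)
        sr <= {sr[8:0], data_in};

    assign q          = e ? sr : 10'b0;
    assign serial_out = sr[9];
endmodule
```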

As it turns out, the i4003 shift register is a more complicated chip than I expected.

i4001 data bus output timing

The next thing I wanted to try was to run the MCS-4 Digital Clock using the Instruction Pointer board I built in 2012.

In theory, all I should need to do is pull the Verilog IP module out of the clock project and replace it with the interface to the IP board. When I tried it, though, it didn't work.

XST conditional synthesis with Verilog generate

One of the purposes of creating my MCS-4 Digital Clock was to come up with something that would run on my hybrid debugging setup and give a clear indication of proper operation. This setup is the Digilent Spartan-3E Starter reference board connected to an expansion card that serves as a carrier for my i4004 CPU boards. This allows me to run the as-yet unimplemented parts of the i4004 CPU in the Spartan-3E FPGA.
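As a generic illustration of conditional synthesis with `generate`, the sketch below selects between an internal instruction pointer and one supplied from outside the FPGA. The `USE_EXTERNAL_IP` parameter and all the port names are hypothetical and are not taken from the project.

```verilog
// Hypothetical sketch of conditional synthesis using generate.
module ip_select_sketch #(
    parameter USE_EXTERNAL_IP = 0    // assumed build-time switch
) (
    input  wire        clk,
    input  wire        inc,
    input  wire [11:0] ext_ip,       // value from an external board
    output wire [11:0] ip
);
    generate
        if (USE_EXTERNAL_IP) begin : g_ext
            // Only the pass-through is synthesized in this build.
            assign ip = ext_ip;
        end else begin : g_int
            // Only the internal counter is synthesized in this build.
            reg [11:0] ip_r = 12'd0;
            always @(posedge clk)
                if (inc) ip_r <= ip_r + 12'd1;
            assign ip = ip_r;
        end
    endgenerate
endmodule
```

Because the condition is a parameter, the branch not taken is never elaborated, so it consumes no FPGA resources.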

An MCS-4 Digital Clock

Who needs yet another digital clock? Haven't we all done this before?

I did something like this in PIC-16 assembly language when I first connected my Microchip PICDEM-2 reference board to an LCD in 1999. (Anyone need a UV EPROM eraser?) I did it again in Verilog when I bought this Digilent Spartan-3E Starter board in 2009. But this digital clock is special: it's written in Intel MCS-4 Assembly language, and it's running on my emulation of an Intel 4004 CPU.

i4001 ROM emulation refactoring

This weekend I worked on the i4001 emulation, bringing the quick hack I did years ago closer to the operation of the actual chip. While doing this I've been wrestling with two conflicting objectives: making the emulation conform as closely as possible to the real i4001, while also using the hardware resources available in the Spartan-6 FPGA efficiently.

Another bug? I FIN you not

The FIN instruction is supposed to fetch the contents of the ROM location specified by the scratchpad R0R1 register pair, and store those contents into another scratchpad register pair. Obviously this didn't work properly or I wouldn't be writing about it.

A bug in my i4004 ALU implementation

Almost eight years ago I observed in Coding the ALU in Verilog that the i4004 arithmetic logic unit used positive logic for bits 0 and 2, and negative logic for bits 1 and 3. I thought this was odd but didn't ponder the matter for very long.

Last night I discovered a bug in my implementation of the ALU: if the accumulator contained 0x0 (4'b0000), executing the CMA (Complement Accumulator) instruction set the accumulator to 0x5 (4'b0101) rather than 0xF (4'b1111). Obviously this was wrong, and it bore a suspicious resemblance to the positive/negative pattern I'd noted before.

Debugging the i4004 emulation

I had fantasies of just quickly hacking up the i4001 ROM and i4002 RAM emulations and loading the full Busicom 141-PF into my P170-DH board to see if it would run, but I've been an engineer for way too long to do that. Instead I've been plodding along, coding and testing each step of the way.

I'm not quite finished with the peripheral chips, but I felt good enough about their current state to finally start a simulation of an i4004 CPU connected to an i4001 ROM and an i4002 RAM in the Xilinx iSim simulator. For a test program I used the sample code that is loaded by default into Lajos Kintli's i4004 analyzer.

Inferring LUT and Block RAM

As part of my testing of the latch-based i4004 emulation, I constructed an emulation of a complete, if minimal, MCS-4 system consisting of one i4004 CPU, one i4001 ROM, and one i4002 RAM. My intent was to run the short test program that loads by default into the i400x analyzer. But I never got that far.

Why not? Because I noticed that the resource requirements for my latch-based i4002 RAM were off the charts: 550 slice registers! What happened?

Choosing the right FPGA for the wrong reasons

I finished changing my i4004 emulation from using edge-clocked flip-flops to level-sensitive data latches. I haven't even tried simulating it yet, but just for fun I ran it through the Xilinx toolchain to see what sort of resources it required. As I suspected it's quite small, occupying all or part of 9% of the available slices in the Spartan-6 LX9 FPGA.

With that in mind, I got to wondering if this would have fit into a Lattice iCE40. I'd touched on this a few years ago in the post Packing worms into a can, but I don't think I ever tried running the i4004 CPU emulation through the Lattice iCEcube2 toolchain. Today I did, and the results of that were eye-opening in ways I didn't expect.

Revisiting the i4004 counter

When I first started this project (has it really been eight years??), one of my first experiments was to build a two-bit counter based on the i4004 schematic on a solderless breadboard. I wrote about this in a post entitled Breadboarding a 2-bit counter. It became my eighth posting -- there are now over 250 and growing.

When I originally coded the i4004 emulation, I faced the question of how to emulate the FETs used as transmission gates. Many of these are used to temporarily latch a value. Having little experience with Verilog and FPGAs, and not really understanding how the dynamic logic in the i4004 worked, I coded them using Verilog constructs which infer edge-clocked flip-flops. For comparison, here is the code for an edge-clocked flip-flop with a clock enable (ce), and for a transparent latch with a gate enable (ge):

Edge-clocked flip-flop:

    reg q;
    always @(posedge clk) begin
        if (ce) q <= d;
    end

Transparent latch:

    reg q;
    always @(*) begin
        if (ge) q <= d;
    end

The problem with using edge-clocked flip-flops is that when there are n of these in a row, the output does not propagate to the end until there have been n clock cycles. The real i4004, which uses transmission gates, does not have this latency. Thus if a section of logic has six of these in a row, the synchronous system clock must clock at least 6 times for each i4004 clock pulse.

For this reason I want to replace the edge-clocked flip-flops in my Verilog emulation of the i4004 with transparent latches to eliminate this requirement.
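The latency difference can be seen by putting three storage elements in series each way. This is a toy sketch with made-up names, assuming the clock enable and the latch gate are both derived from CLK1 or CLK2; it is not code from the emulation.

```verilog
// Hypothetical comparison of a flip-flop chain and a latch chain.
module chain_sketch (
    input  wire clk,     // fast system clock
    input  wire ce,      // clock enable (assumed derived from CLK1/CLK2)
    input  wire ge,      // latch gate (assumed to be CLK1/CLK2 itself)
    input  wire d,
    output wire q_ff,
    output wire q_latch
);
    // Three flip-flops in series: d reaches q_ff only after three
    // enabled rising edges of clk.
    reg [2:0] ff = 3'b0;
    always @(posedge clk)
        if (ce) ff <= {ff[1:0], d};
    assign q_ff = ff[2];

    // Three transparent latches sharing one gate: while ge is high,
    // d flows through all three with only propagation delay.
    reg l0, l1, l2;
    always @(*) if (ge) l0 <= d;
    always @(*) if (ge) l1 <= l0;
    always @(*) if (ge) l2 <= l1;
    assign q_latch = l2;
endmodule
```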

i4002 DRAM array operation

To match the operation of the i4002 RAM chip in my Verilog emulation, I needed to understand exactly how the RAM array is accessed.

The i4002 RAM array is physically and electrically constructed of 20 rows by 16 columns of DRAM cells. The 16 columns are grouped into 4-bit wide "registers". This is much the way it is depicted in the Intel documentation.

This arrangement means all RAM operations -- reads, writes, and refreshes -- act on an entire row at a time. The selection of one of the four 4-bit wide registers is handled by multiplexing and not by addressing.
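In behavioral terms, "row at a time, with a multiplexer choosing the register" might be sketched as below. The names are invented, refresh is not modeled, and this is not how the emulation itself is coded.

```verilog
// Hypothetical sketch of row-wide access with a 4-bit group select.
module ram_row_sketch (
    input  wire       clk,
    input  wire       we,
    input  wire [4:0] row,      // 0..19 (higher values are out of range)
    input  wire [1:0] reg_sel,  // which 4-bit group within the row
    input  wire [3:0] wdata,
    output wire [3:0] rdata
);
    reg [15:0] cells [0:19];    // 20 rows of 16 columns

    // Every access reads the whole 16-bit row...
    wire [15:0] row_data = cells[row];
    // ...and a multiplexer picks one 4-bit group from it.
    assign rdata = row_data[reg_sel*4 +: 4];

    // A write rewrites the whole row, changing only the selected group.
    reg [15:0] new_row;
    always @(*) begin
        new_row = row_data;
        new_row[reg_sel*4 +: 4] = wdata;
    end

    always @(posedge clk)
        if (we) cells[row] <= new_row;
endmodule
```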

Now let's take a look at what happens in each phase of the execution of an instruction.

i4002 RAM emulation development

Most of my recent tinkering with my Verilog implementation of the Busicom 141-PF calculator has been focused on the i4002 RAM chip bus interface logic. Because the i4004 was limited to a 4-bit data bus by Intel management, the i4002 (and i4001 ROM) have to pay more attention to the bus traffic than one might expect, which complicates the bus interface.

For example, the way the i4002 distinguishes a register write from a register read is by monitoring the data bus during the M2 cycle and decoding the OPA portion of the instruction being fetched from the i4001 ROM. This isn't terribly complex logic but it requires a thorough understanding of the operation of the data bus, as this is not explained in the datasheets.
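A decoder of this kind might be sketched as follows. The OPA values are the I/O-group encodings as I understand them from the MCS-4 instruction set (WRM = 0, WR0..WR3 = 4..7, SBM = 8, RDM = 9, ADM = B, RD0..RD3 = C..F); the `io_instr` qualifier and all names are assumptions, and the actual bus timing is not shown.

```verilog
// Hypothetical sketch of read/write classification from OPA.
module io_opa_decode_sketch (
    input  wire       io_instr,  // assumed: high for an I/O-group instruction
    input  wire [3:0] opa,       // data bus value captured during M2
    output wire       ram_write, // WRM or WR0..WR3
    output wire       ram_read   // SBM, RDM, ADM, or RD0..RD3
);
    wire wrm = (opa == 4'h0);
    wire wrx = (opa[3:2] == 2'b01);   // WR0..WR3 (4..7)
    wire sbm = (opa == 4'h8);
    wire rdm = (opa == 4'h9);
    wire adm = (opa == 4'hB);
    wire rdx = (opa[3:2] == 2'b11);   // RD0..RD3 (C..F)

    assign ram_write = io_instr & (wrm | wrx);
    assign ram_read  = io_instr & (sbm | rdm | adm | rdx);
endmodule
```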

Load testing the AC Mains supply

In the previous post AC Mains power supply I asked, "I wonder what will happen when I fire up the VFD filament and +30V supplies, plus a complex FPGA design?" Well, there's no time like the present to find out.

What I did was to take the printer test I described in the post Testing the printer interface and add the VFD display test I described in the post Proof of Life. The two don't interact so this was easy to do.

With the bench output supply set to 7.5VDC, the board draws about 124mA. This varies depending on how many digits and segments of digits are illuminated.

AC Mains power supply

This morning I wondered what the output of the AC Mains power supply would look like while the printer was running. I hate leaving questions unanswered, so I took a look:

This is a screen capture from my oscilloscope connected across the unregulated DC supply. The vertical scale is 2.00 volts per division, DC coupled, and the horizontal is 100 milliseconds per division. The single-sweep trigger is set for the negative edge at 8V DC.

With the AC power applied and the regulators turned off, the unregulated output of the bridge rectifier is 10.5 volts. Turning on the regulators drops this to about 10.1V. Printing the same eight-character pattern as before pulls this down to about 6.5V as the motor starts up before stabilizing at about 7.5V with a 0.75Vp-p, 120 Hz ripple. If you look closely you can pick out the seven small dips where the print hammer solenoid fires, with the big valley at the end occurring when the solenoid fires to print the eighth character and is held engaged to trigger the paper feed. Once the motor shuts off the big filter capacitor recharges with the classic RC charge curve.

In November I tested a prototype of my FPGA interface to the M-32TL printer using the Lattice iCEblink40-HX1K and a circuit assembled on solderless breadboard, described here. Since this worked well I was pretty sure this would also work on the P170-DH replacement board. But there is always a chance something didn't get laid out correctly, or that I overlooked something.

When I put the replacement PCB into the P170-DH shell last month, I decided I really didn't want to be taking the board out again if I could help it. All the logic is on the exposed side of the board, and removing it requires removing 20 small screws. So I also connected the AC mains transformer and the printer. After proving out the keyclick sounder, the last untested subsystem was the printer interface.

Keypad remapping

When I wrote the previous blog entry Sunday I fully expected I'd implement the first option I listed, which was to scan and decode the Canon P170-DH keypad matrix, then convert these keycodes back to Busicom 141-PF matrix signals. As usual, things did not go according to plan.

Busicom keyboard interface

Now that I know the replacement board's keypad works, I started thinking about how to interface this to the emulated Busicom 141-PF calculator's "circuitry".

The Busicom scanned its keyboard using two MCS-4 family chips: an i4001 ROM and an i4003 shift register. A ROM may seem to be an odd device to use in this situation, but the i4001 also provided a 4-bit I/O port. The shift register provided 10 output pins and a simple way to walk a "0" bit down the outputs in sequence. Thus the keyboard was arranged electrically as an 8 x 4 matrix. The shift register selects which of the eight columns is being read, and the I/O port reads which of the four rows contains a pressed key, if any. The last two "columns" selected by the shift register return 4-bit coded values from two slide switches.

This dictates the interface my emulation must present.
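One way to picture that interface is a purely combinational block that looks at which shift register output is low and returns the matching four row bits. The sketch below assumes active-high row data, a flat 32-bit key vector, and invented names; the real signal polarities may differ.

```verilog
// Hypothetical sketch of the keyboard matrix as seen by the I/O port.
module kbd_matrix_sketch (
    input  wire [9:0]  shift_out, // i4003 outputs; a low bit selects a column
    input  wire [31:0] keys,      // assumed: keys[col*4 + row] = 1 when pressed
    input  wire [3:0]  switch_a,  // assumed 4-bit code from slide switch 1
    input  wire [3:0]  switch_b,  // assumed 4-bit code from slide switch 2
    output reg  [3:0]  rows       // value presented to the ROM I/O port
);
    integer col;

    always @(*) begin
        rows = 4'b0000;
        // Columns 0..7 are the key matrix.
        for (col = 0; col < 8; col = col + 1)
            if (!shift_out[col])
                rows = rows | keys[col*4 +: 4];
        // The last two "columns" return the slide switch codes.
        if (!shift_out[8]) rows = rows | switch_a;
        if (!shift_out[9]) rows = rows | switch_b;
    end
endmodule
```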

Keypad and Switch tests

For some reason I've been hesitant to test my keypad design. Maybe because it's something that is either going to work right or I'm going to have to scrap the board and have another one fabricated. But I figured it was time to find out.

High on my eventual success with the keyclick sounder, I decided to test the keypad and associated slide switches. My original plan was to do a simple "walking ones" test to make sure each keypad column could be driven high or low. However, because I decided to put Schmitt trigger inverters between the keypad row tracks and the FPGA pins, I couldn't do the same for the rows. To see what the row inputs were getting I decided to drive the 32-bit debug connector with both the column outputs and the row inputs. Having gone that far I added the inputs from the slide switches to the debug connector too.

Keyclick sounder tests

Today dawned dreary and rainy, so I decided I needed to spend some time working on the Canon P170-DH calculator rebuild. I pulled out my checklist of things that needed testing and decided to start with the audible keyclick sounder.

Before I decided I could make the fluorescent display of the P170-DH work, I decided that it would be advantageous to provide an audible feedback to indicate that a key had been pressed and recognized. I described my search for the "right" sound in my posting Tick... tick... tick... a couple of years ago, and I did a proof of concept using a 556 dual timer and a piezoelectric sounder. But I'd never actually tried driving the final circuit from an FPGA, figuring there really wasn't much to test with such a simple circuit.

Treating CLK1/2 signals as global latch gate signals

If I were to change the Verilog i4004 emulation to treat the MCS-4 system CLK1 and CLK2 signals as latch gates and not as clock enables, it seemed it might make sense to treat them as I would global clock signals.

Emulating old tech in an FPGA

When I started this project I had a couple of goals. The first was to learn about CPLDs and FPGAs, and how they were programmed using HDLs like Verilog and VHDL. My second was to understand how the i4004 CPU functioned internally, and recreate one in an FPGA. Somehow this second goal morphed into a third goal, which was to build an i4004 CPU from discrete components.

But my prime focus has always been on the learning, not on producing a result.

Spartan-6 distributed RAM confusion

Okay, I'm utterly confused.

According to the documentation, dual-port distributed RAM has one read/write port and one read-only port. It is created by allocating twice as many LUTs as needed for single-port RAM; the write port writes both LUTs, but each read port addresses only one of the two LUTs. Thus one might expect a dual-port RAM to use twice as many LUTs as a single-port RAM.

Previously I created a test that instantiated one of the four dual-port 20x4-bit "registers" in an emulated i4002 RAM chip. Since the docs say a Spartan-6 LUT can be used either as a 64x1-bit RAM or as a 32x2-bit RAM, I expected XST to allocate two LUTs per port for a total of four LUTs per register. The "technology schematic" generated by ISE showed four RAM32X1D library primitives, yet the synthesis summary showed six LUTs allocated as dual-port RAM.

I could see four with things packed as I described above, or I could see eight if the synthesis used the 64x1 rather than the 32x2 configuration. But six LUTs made no sense. Changing the array size from 20 to 32 had no effect. So I created another test with the full complement of four registers.
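The arithmetic behind those expectations can be written out as a short Python sketch. It only reproduces the back-of-the-envelope reasoning above (one set of LUTs per port, sized by the LUT configuration); it is not a model of what XST actually does, and the function name and parameters are my own invention for illustration.

```python
import math


def expected_dual_port_luts(depth, width, lut_depth, lut_width):
    """Naive LUT estimate for a dual-port distributed RAM.

    Assumes each LUT stores lut_depth x lut_width bits and that the
    second port simply duplicates the LUTs of the first.
    """
    banks = math.ceil(depth / lut_depth)       # LUT sets needed to cover the depth
    luts_per_port = banks * math.ceil(width / lut_width)
    return 2 * luts_per_port                   # two ports, each with its own copy


if __name__ == "__main__":
    # One 20x4-bit register with LUTs used as 32x2-bit RAM
    print(expected_dual_port_luts(20, 4, lut_depth=32, lut_width=2))
    # The same register with LUTs used as 64x1-bit RAM
    print(expected_dual_port_luts(20, 4, lut_depth=64, lut_width=1))
```

Under these assumptions the estimate comes out as four or eight LUTs, and it doesn't change when the depth grows from 20 to 32. No combination of these two configurations gives six.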

Testing the External I/O interface

I finally got time to test the new P170-DH board's external I/O interface with the Spartan-3E Starter Kit's PS/2 interface. I found the results most interesting, and not quite what I expected.

Testing the Verilog RAM-based VFD driver

Over several evenings I wrote the Verilog to drive my VFD from a RAM-based i4002 RAM emulation. The comparison between this and the previous one is apples and bananas, which is to say they're not very similar. But I did get it working.

This evening I finished debugging in the simulator and loaded the bitstream into the FPGA through the JTAG. The test is set up to increment the content of the emulated WR register twice a second.

Although most people reading this blog wouldn't have trouble picturing a counter incrementing twice a second, here's a video anyway, showing the carry from the low order digit to the higher digits as it counts from 1990 to 2012.
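If you'd rather read than watch, the carry behavior can be sketched in a few lines of Python. This assumes one decimal digit per 4-bit character, stored least significant digit first; the excerpt doesn't say how the test counter is actually laid out in the emulated register, so treat the representation and the function name as illustrative only.

```python
def bcd_increment(digits):
    """Return a copy of `digits` (least significant first) incremented by one.

    Each entry is a decimal digit 0-9. A digit that rolls over from 9 to 0
    carries into the next higher digit; the counter wraps to all zeros
    when every digit is 9.
    """
    result = list(digits)
    for i, digit in enumerate(result):
        if digit < 9:
            result[i] = digit + 1   # no carry out, so we're done
            return result
        result[i] = 0               # roll over and carry into the next digit
    return result                   # every digit was 9: wrapped around


if __name__ == "__main__":
    count = [0, 9, 9, 1]            # 1990, least significant digit first
    for _ in range(22):             # 22 steps takes 1990 up to 2012
        count = bcd_increment(count)
    print("".join(str(d) for d in reversed(count)))
```

The interesting step is 1999 to 2000, where the carry ripples through three digits in a single increment.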

Spice simulation of the External I/O

I got to wondering how accurately my "back of the envelope" calculations modeled the External I/O interface circuit. So I set it up in LTspice:

The best laid plans

When I decided to add an external interface using PS/2 compatible 6-pin mini-DIN connectors, one of the ideas was to be able to interface it with my Digilent Spartan-3E Starter board (on sale now, btw) which has a PS/2 connector. Rather than connect it to the FPGA with resistors, I decided to use SN74LVC1T45 level translators in a configuration taken from the Digilent PmodLVLSHFT level-shifter:

One of eight channels of the PmodLVLSHFT circuit

Proof of Life

I got tired of taking baby steps. Here is the VFD in operation on the replacement PCB:

Supplied with 5 VDC, the board draws about 205 mA in this configuration. The digit values are hard-coded, and I haven't coded the RAM-based implementation described in the previous post.

The odd brightness variation is a beat frequency between the VFD's refresh rate (a 5.85 ms period, or ~171 Hz) and my phone's camera. It's not visible to the naked eye.
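A rough Python sketch of where such a beat comes from follows. The 5.85 ms refresh period is the figure quoted above; the 30 frames-per-second camera rate is purely an assumption (I'm not stating what the phone actually records at), and the model ignores rolling-shutter effects, so the resulting number is illustrative rather than measured.

```python
def refresh_rate_hz(period_ms):
    """Convert a refresh period in milliseconds to a rate in hertz."""
    return 1000.0 / period_ms


def beat_hz(refresh_hz, camera_fps):
    """Difference between the refresh rate and the nearest multiple of the frame rate."""
    harmonic = max(1, round(refresh_hz / camera_fps))
    return abs(refresh_hz - harmonic * camera_fps)


if __name__ == "__main__":
    refresh = refresh_rate_hz(5.85)     # period quoted in the post
    assumed_fps = 30.0                  # assumption: not stated in the post
    print("refresh rate: {:.1f} Hz".format(refresh))
    print("beat against {:.0f} fps: {:.1f} Hz".format(assumed_fps, beat_hz(refresh, assumed_fps)))
```

A beat of a few hertz is slow enough to show up as a visible pulsing in the video, even though the refresh itself is far too fast for the eye to follow.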

i4002 emulation and the VFD interface

The last time I posted on interfacing a VFD with the Busicom emulation I commented that "All I need to do is connect the WR register in an emulated 4002 to my VFD driver logic. […] I can peek into the internal circuits and use the contents stored there." The Verilog I wrote to test the VFD assumed this would be true. The effects of this can be seen in the surprisingly high resource requirements I noted in my post Packing worms into a can, though I didn't realize it at the time.

In software, the emphasis is on abstracting the implementation away from the underlying hardware. This was true when I started back in the 1970s, and the emphasis has only gotten greater as years have passed. It's so extreme now that many recent Computer Science graduates no longer understand anything about cache line utilization or translation lookaside buffer churn. I owe much of my professional career to bucking this trend and understanding how software interacts with hardware in great detail. But here I neglected to consider the hardware implications of this design decision.

Thursday, December 17, 2020

Friday, August 28, 2020

Wednesday, July 29, 2020

Sunday, July 26, 2020

Tuesday, July 21, 2020

Sunday, July 19, 2020

Friday, July 17, 2020

Thursday, July 16, 2020

Wednesday, July 15, 2020

Monday, June 29, 2020

Saturday, June 27, 2020

Friday, June 26, 2020

Monday, June 22, 2020

Monday, June 15, 2020

Saturday, June 13, 2020

Tuesday, May 26, 2020

Friday, May 22, 2020

Tuesday, May 19, 2020

Wednesday, May 13, 2020

Monday, May 11, 2020

Tuesday, May 5, 2020

Sunday, May 3, 2020

Wednesday, April 29, 2020

Tuesday, April 28, 2020

Sunday, April 26, 2020

Sunday, April 19, 2020

Friday, April 17, 2020

Wednesday, April 8, 2020

Monday, April 6, 2020

Friday, April 3, 2020

Sunday, March 29, 2020

Saturday, March 28, 2020

Monday, January 20, 2020

Saturday, January 18, 2020

Wednesday, January 15, 2020

Friday, January 10, 2020

Tuesday, January 7, 2020

Monday, January 6, 2020

Friday, January 3, 2020