Insanity 4004: 2012

Monday, December 31, 2012

An interesting coincidence

It's New Year's Eve for most of us who use the Gregorian calendar, and New Year's Day for those more than 10 hours east of me. Best wishes to everyone!

I was browsing around the other day, and I noticed the following announcement on the Intel 4004 35th Anniversary site:

Coming soon: Synthesizable Verilog source and testbench for the 4001 ROM, 4002 RAM, and 4004 microprocessor.

They're doing one too! Or... are they planning to link to or repost mine, once I get the rest of the Verilog posted? I'd like to think so, but I would have thought I'd have gotten a "that's a cool project!" email by now. Not that they're obligated -- most of the code is under the Creative Commons license. Maybe they're just shy? :-D

Regardless, there will be updates. It's just that they're moving at the speed of boredom: the more bored I am the faster they move. Recently work has been very not-boring, which has slowed the pace of my hobby projects.

ALU board progress

I've been working on the Arithmetic Logic Unit board layout. I have roughly placed all four bits of the data path (bit 0 at the top, bit 3 at the bottom) and the Keyboard Process logic. This accounts for 67% of the transistors with plenty of area left over.

I've done very little routing yet. With the Instruction Pointer board I was able to route everything using just the top and bottom layers, reserving the inner layers for unbroken ground and power planes. That isn't going to work on this board. I briefly considered a 6-layer board, but that seems like gross overkill. I spent some idle time perusing my copy of High-Speed Digital Design, A Handbook of Black Magic, and have come to the conclusion that while keeping an unbroken ground plane will be advantageous, there's little point in fighting hard for a power plane.

My rationale is that most of the FETs are wired in a grounded-source configuration, which makes them respond to gate inputs with reference to ground. The FDV301N is designed to work in low voltage circuits, and is only specified to remain off with an input at or below 0.5V; its Vgs(th) is as low as 0.70V. Thus a negative ground bounce of less than a volt could easily cause one to turn on when it should be off. In contrast, a droop of even several volts in the Vdd supply really won't affect them.
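The ground-bounce argument comes down to a little arithmetic, sketched here in Python. The 0.5V and 0.70V figures are the ones quoted above; the 0.2V "low" gate level and the bounce values are assumptions picked only for illustration, and the function names are hypothetical.

```python
# Figures quoted in the post for the FDV301N.
VGS_GUARANTEED_OFF = 0.5   # volts: specified to remain off at or below this
VGS_TH_MIN = 0.70          # volts: lowest threshold voltage

def effective_vgs(gate_low_v, ground_bounce_v):
    """Gate-to-source voltage seen by a grounded-source FET.

    ground_bounce_v is the local ground (source) voltage relative to the
    ground the driving signal is referenced to; negative means the local
    ground has dipped below it.
    """
    return gate_low_v - ground_bounce_v

def may_turn_on(gate_low_v, ground_bounce_v):
    """True if a nominally 'off' FET could reach its minimum threshold."""
    return effective_vgs(gate_low_v, ground_bounce_v) >= VGS_TH_MIN

# Assumed 0.2 V low level on the gate, with two assumed amounts of bounce.
for bounce in (-0.2, -0.6):
    print(bounce, round(effective_vgs(0.2, bounce), 2), may_turn_on(0.2, bounce))
```

A droop in Vdd never appears in this calculation, which is the point of the paragraph: for a grounded-source FET only the ground reference matters.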

The only place where the Vdd supply voltage is critical is the drive to the BSS83N pass-gate FETs; except for these, I could run the entire CPU on 3.3V. Of secondary concern are push-pull drivers, due to the Vgs drop across the pull-up transistors. I expect reasonable bypassing will take care of both these problem areas.

This frees up one of the inner layers for routing. My stack-up plan now looks like this:

Layer	#	Use
Top	1	Local connections
Inner	2	Ground Plane
Inner	3	Vertical connections
Bottom	16	Horizontal connections

In my last posting I said I might rotate the bit paths 90 degrees. I did give that a try. However, I hadn't placed the last of the components in the paths, and it got a bit tight. Further, it became more difficult to mentally translate from the schematic to the board. In the end I switched it back around.

Tuesday, December 4, 2012

Starting the ALU board

The instruction decoder and the arithmetic/logic unit are so tightly coupled that they really should be on one board. Of the 70-some inter-board connections, better than one out of three connect only these two boards. Thus it makes sense that the layout of these two boards, and the pin-out of the remaining connector pins, be accomplished in parallel.

Here's the start of the ALU board layout:

As with the other boards, I've used "wires" drawn on one of the documentation layers to sketch out the various pieces. The four horizontal areas containing components represent the four-bit data path through the ALU. The right-most group forms the input value "register" (really a transmission gate to capture the data bus, two inverters to generate true and inverted versions of the data bit, and a pair of push-pull drivers). Moving right to left, the remaining groups are the carry predictor, the carry output driver, the adder, and the true/complement selector. The group on the far left is a combination of the bidirectional shifter and the accumulator register. Since the accumulator's value may need to remain stable for long periods of time, it includes a built-in refresh circuit to keep the charges from leaking away.

The odd collection of rectangular areas in the lower left represent the so-called "Keyboard Process" logic, a translation table of sorts that converts a one-of-four accumulator value (0/1/2/4/8/*) into a 4-bit binary output value (0/1/2/3/4/15). I was surprised to find such a function implemented in hardware, but apparently they felt it was a reasonable speed vs silicon area trade-off.
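As a software analogy, the translation described above fits in a small lookup table. This is a minimal Python sketch of the mapping only, not of the hardware; the names `KBP_TABLE` and `kbp` are made up for illustration.

```python
# One-of-four accumulator value -> binary output, as listed in the post.
KBP_TABLE = {0b0000: 0, 0b0001: 1, 0b0010: 2, 0b0100: 3, 0b1000: 4}

def kbp(acc):
    """Translate a 4-bit accumulator value; anything else maps to 15."""
    return KBP_TABLE.get(acc & 0xF, 15)

for value in (0, 1, 2, 4, 8, 3):
    print(f"{value:04b} -> {kbp(value)}")
```

Any input with more than one bit set (such as 0011) falls through to the default of 15, which corresponds to the "*" case above.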

There's still a good bit of circuitry to be placed on the board outside of the sketched areas. The original layout of the data path occupied the entire width of the sketched area, but clearly that can be narrowed significantly. So much so, that I may rotate the whole area 90 degrees. We'll see once I get some routing done.

Clearly I'm not going to have boards ready to be fabbed in time to get them back by Christmas. But maybe I'll be able to spend some of my holiday break on the project.

Saturday, November 24, 2012

Starting the Instruction Decode board

If I'm going to have boards to work on by Christmas I need to get them submitted in the next week or so. It's not the highest thing on my priority list, but I did make a start this week.

I started by beginning the layout of the Instruction Decode board. Here's a snapshot of its current state, with about 80 components yet to be placed and essentially no routing done:

The schematic shows the OPR decoder above the OPA decoder, with the final combinational logic below that. This makes sense if the ALU is the next thing down, but in my implementation the ALU is on another board reached through the smaller connector. This led me to rearrange the groups a bit so the final combinational logic is between the OPR and OPA decoders. The groups of components nearest the larger connector form the OPR and OPA "registers"; since these registers are updated for each instruction cycle, transmission gates capture the state of the data bus on MOSFET gate capacitances rather than using flip-flops.

The structure of the combinational logic doesn't lend itself to tight packing, so I went ahead and repartitioned the design as described in the post A little re-partitioning. This moved the 61 components of the ALU timing circuits off the Instruction Decode board and onto the ALU board where they belong. To avoid making the ALU board more dense, I moved the 60 components of the external chip select logic off the ALU board and onto the Timing and I/O board.

Not bad for a day's work.

Tuesday, November 13, 2012

It leads to the Source...

I've finally had time to post the Verilog source for the 4004 CPU to my OpenCores project.

The sources posted synthesize for the Xilinx Spartan-3E using the WebPack edition of ISE 12.4 XST, and simulate using ISim. They also synthesize using the Lattice iCEcube2™ tool chain for the iCE40 HX series, though I haven't tried loading that yet.

I haven't yet posted the test bench code, nor have I posted the code implementing the 4001 ROM chip. I'll get around to this sooner or later, I promise.

I have this fantasy of finishing the layout of the remaining boards in time to populate them over the Christmas holidays, but realistically that's unlikely to happen. It's somewhat more likely that I'll do some layout during the holidays, get them fabricated in January, and work on them over the following months.

Monday, October 22, 2012

A Non-Update

After devoting a lot of personal time over several months to this project, my attention has necessarily turned elsewhere. Activities such as client projects, where an internal release is pending at the end of this month, have priority. They are paying the bills, after all.

That said, I have not abandoned this project. I still plan to release the Verilog source for my 4004 CPU implementation later this month. Also on my "to do" list for later this year is characterization of the DRAM array, and the layout of the remaining four PCBs to complete the CPU.

Sunday, September 30, 2012

Like a Phoenix

Like the mythical Phoenix that rises from its own ashes, this project has taken wing.

While writing the last post I started to say, "The possible causes are too numerous to itemize." Once posted, I realized that the problem was so repeatable that the cause was likely in the soft implementation of the 4001 ROM, which I'd hacked together rather quickly. I really intended to leave it like that, but my brain wouldn't stop thinking about it.

One of the hacks in the soft 4001 code was to only assert the external data bus during the portions of the M12 and M22 phases while CLK2 was asserted, when I knew the soft 4004 was sampling the internal data bus. That's a far narrower period than a real 4001 would drive. After some research I concluded that I could safely expand it to the entire M12 and M22 phases, which is what I believe a real 4001 would do. That worked OK in simulation (i.e. no invalid states with multiple drivers), so I had to try it with the real IP board.

And it worked! My logic analyzer has a limited buffer and I was only able to capture the first 260us after POC de-assertion. That's a bit more than 16 machine cycles, sampled on the rising edge of the 20ns sysclk. During that period it executed two NOPs, a JUN, three consecutive JMS followed by three consecutive BBL (nested subroutine calls and returns, which test the EA counter and all four rows of the IP array), a LDM (load accumulator), and a JCN (jump condition). That tests most of the functions of the board, and I think that's pretty good!
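The captured sequence can be tallied against the 16 machine cycles mentioned above. This short Python sketch assumes the standard MCS-4 instruction lengths (JUN, JMS and JCN take two machine cycles; NOP, BBL and LDM take one); the variable names are illustrative.

```python
# Machine cycles per instruction, per the MCS-4 instruction set.
CYCLES = {"NOP": 1, "JUN": 2, "JMS": 2, "BBL": 1, "LDM": 1, "JCN": 2}

# The instructions seen in the logic analyzer capture, in order.
captured = ["NOP", "NOP", "JUN",
            "JMS", "JMS", "JMS",
            "BBL", "BBL", "BBL",
            "LDM", "JCN"]

total_cycles = sum(CYCLES[mnemonic] for mnemonic in captured)
print(total_cycles)  # prints 16
```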

Of course, this really should work, as the underlying design was in production for 15 years (1971-1986). But this also validates the work I put in selecting components, laying out the board, and placing and soldering the 500-some components on the board. Not to mention over a thousand lines of Verilog HDL. Not too shabby for a software engineer playing at electronics engineering.

Note that I haven't added any of the extra charge-storage capacitors I laid out on the board; this is purely using the MOSFET Gate capacitances and whatever small capacitance the PCB traces add for value storage. Apparently that's sufficient with a 1.36us cycle time. At some point I'll do some margin testing with the clock rate and see just how fast and slow this will run.

It starts... and crashes

Just for a laugh I chopped out the Verilog version of the Instruction Pointer board and hooked up the real IP board. With a logic analyzer connected to the interface between the two I triggered the recording on the falling edge of the Power-On Clear output. At first I got garbage, then realized that the soft reset was far too short to clear the real IP DRAM array. Holding the POC input button I reset the analyzer's trigger, held my breath, and released the reset.

During the first machine cycle the hardware IP board reports an address of 000 (hex) at the appropriate time, which is correct. The soft 4001 ROM responds with 00 (hex) — a NOP instruction — which is also correct. So far, so good. During the second machine cycle the IP board reports an address of 001h, which suggests that the incrementer circuit is working. The soft ROM responds with another NOP, again correct.

The third cycle produces address 002h, more evidence of a successful increment. The instruction fetched is 40h — a JUN, or unconditional jump, a two-cycle instruction — again correct, and the JUN+JMS and SC signals from the soft instruction decoder switch states at the proper time. The fourth cycle produces address 003h and the byte fetched is 0Bh, the proper destination address for the jump. What should happen here is that, instead of incrementing the current IP value, the IP board should store the destination address of 0Bh. This apparently is not happening properly, as the next address to come out is F0Fh. From there it goes F10h, which is wrong but makes sense, 101h, which doesn't, and downhill from there.
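For reference, the address the IP board should have latched can be worked out from the two bytes fetched. A JUN carries the upper four address bits in the low nibble of its first byte and the lower eight bits in its second byte. The Python sketch below is an illustration only; `jun_target` is a hypothetical helper name.

```python
def jun_target(first_byte, second_byte):
    """Return the 12-bit destination of a JUN from its two instruction bytes."""
    if (first_byte >> 4) != 0x4:
        raise ValueError("first byte is not a JUN opcode (4xh)")
    return ((first_byte & 0x0F) << 8) | (second_byte & 0xFF)

# The bytes seen in the capture: 40h followed by 0Bh.
print(f"{jun_target(0x40, 0x0B):03X}")  # prints 00B
```

So the address after the jump should have been 00Bh rather than the F0Fh that actually came out.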

The signals being fed into the IP board look roughly correct, based on both simulators' outputs. The problem appears to be fairly repeatable, which will make it easier to debug. In fact, it's so repeatable I suspect the problem is that the soft 4001 ROM isn't outputting the data long enough to meet timing with the real IP board. If this isn't it I may have to probe some internal signals, which will necessitate soldering some wire loops to the board to form test points I can grab with a micro-clip.

Unfortunately I don't have time to dig into the problem further tonight.

A cheaper FPGA reference board

One of the things I found time to look for was a cheaper reference board. The Digilent Spartan-3E Starter board I'm using is packed full of features, and as reference boards go is quite reasonable at around $200 (USD). However, it's gross overkill for this little application. I'm using 3% of the FPGA's slices and LUTs, and none of the other on-board hardware.

I came across some really hobbyist-unfriendly products in my search. One product came with a free toolchain license, but it automatically synthesized logic to monitor how long the FPGA had been running and shut it down after 24 hours. The cost of the lowest tier license that didn't produce self-terminating loads was prohibitively high, and required expensive, yearly renewals. I don't care how good their products are or how cheap a single chip is, the lack of a usable toolchain makes them not suitable for hobbyist use.

Poking around a bit further I found a demo board with a Lattice HX1K part that sells for $20 (USD). Yes, that's 10% of the S3E board's cost. There's no extra hardware on this board, just the FPGA, an oscillator, a SPI Flash ROM, and an ATMEL microcontroller that implements the USB programming interface. The board is in a form factor that supports both PMOD and ChipKit interfaces, for those who value such features.

Running my current 4004 CPU sources through their toolchain produces a load that uses about 40% of the logic cells and one of the 16 block RAMs (it put the 64-bit scratchpad array in a 4K-bit block RAM, but not the 48-bit instruction pointer array). The remaining block RAMs are sufficient to support a full set (16 each) of 4001 ROMs and 4002 RAMs, and there should be enough logic cells to interface them. Power consumption at full speed is spec'd as a small fraction of the quiescent consumption of the S3E, though it's a small fraction of the capacity too, and requires only two voltage sources rather than three. For more ambitious designs there is another chip in the same family that has 4x the logic cells, though it's in a larger package (TQ144 vs VQ100) so you can't just swap chips on the same board.
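The block RAM claim can be sanity-checked with a rough bit count. The Python sketch below uses the figures from the paragraph above plus the standard chip organizations (a 4001 is 256 x 8 bits; a 4002 holds 4 registers of 16 main and 4 status characters, 4 bits each). It counts raw bits only and ignores how a synthesizer actually packs memories into blocks, so treat it as an estimate.

```python
BLOCK_RAM_BITS = 4 * 1024      # one 4K-bit block RAM
BLOCK_RAMS_TOTAL = 16
BLOCK_RAMS_USED_BY_CPU = 1     # the scratchpad array

CHIPS_EACH = 16                # a full set of 4001s and of 4002s
rom_bits = CHIPS_EACH * 256 * 8            # 4001: 256 x 8-bit ROM
ram_bits = CHIPS_EACH * 4 * (16 + 4) * 4   # 4002: 4 regs x 20 chars x 4 bits

available_bits = (BLOCK_RAMS_TOTAL - BLOCK_RAMS_USED_BY_CPU) * BLOCK_RAM_BITS
needed_bits = rom_bits + ram_bits

print(rom_bits, ram_bits, needed_bits, available_bits)
print("fits" if needed_bits <= available_bits else "does not fit")
```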

There is a free development environment license (Windoze-only *grumble*), though the license is only valid for one year. I'm hoping they'll allow hobbyists to request a new license. After all, hobbyists sometimes turn into consultants who recommend products in their professional capacity.

My plan is to continue to use the Xilinx S3E as a target, as I'm much more familiar with the Xilinx ISE development environment, but with an eye toward making sure it'll port easily to the Lattice HX series.

No code posts this weekend

Firstly, work is sucking up my time, and I literally haven't touched the code since early last week. Secondly, my lawyer friend is totally saturated with end-of-quarter work and hasn't been able to review my proposed license disclosure verbiage.

So I won't be posting the Verilog source code for my 4004 CPU implementation this weekend. Rest assured, though, it will come, and will be under the same Creative Commons non-commercial license as the original Intel materials.

Wednesday, September 26, 2012

OpenCores project created

A couple of days ago I created a project on the OpenCores.org website for the Verilog version of the MCS-4 components. There's a direct link to the project here, or you can look under the "Processor" projects.

The project is currently empty, other than a general description and a couple of links. Since the Verilog is derived directly from the Intel schematics, I want to make sure all of the Verilog source files have appropriate copyright and license notices before posting them. This may happen this weekend.

I've started to notice an up-tick in the number of hits on this blog from that project. For those of you looking for more background on the Verilog aspect of this blog, here are a few links to some of the early entries in this blog you may find most helpful:

That should be enough to get you started. I've also begun tagging the postings based on their content.

Sunday, September 23, 2012

It's a RACE!

I've been switching back and forth between the hardware and Verilog versions of the 4004 CPU, based mostly on mood. The Verilog version had been moving ahead so quickly I was thinking I'd have it working before I got the Instruction Pointer board fully populated and tested, but then I ran into the inevitable bugs in my translation from schematic to Verilog.

Suddenly I have an Instruction Pointer board which needs testing, with nothing to properly test it. When my plans for the evening were rescheduled, I got some time to focus on the Verilog version.

Most of the problems I've run into have either been from misreading the schematic, or from dropping a negation in the constant polarity reversals. Let me explain the latter: the circuits used in the 4004 are all either NOR or NAND gates. This means that the output is the negation (i.e. "NOT") of the basic logic. When they want to OR two signals the output comes out inverted (if either A OR B is 1, the output is 0, else it's 1). The same thing happens with a NAND (if A AND B are both 1, the output is a 0, else 1). Every time I trace a signal through a series of logic circuits I have to account for this reversal. If I do a verbatim translation to Verilog I usually get it right, but if I need to do a bit of creative interpretation the chance of getting it wrong goes up.
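The polarity bookkeeping described above is easy to see in a few lines of Python. This is only a truth-table sketch with made-up helper names, not a model of the actual 4004 circuits.

```python
def nor(*inputs):
    """Inverting OR: output is 0 if any input is 1, else 1."""
    return 0 if any(inputs) else 1

def nand(*inputs):
    """Inverting AND: output is 0 only if every input is 1, else 1."""
    return 0 if all(inputs) else 1

def inverter(a):
    """A single-input NOR is just NOT."""
    return nor(a)

for a in (0, 1):
    for b in (0, 1):
        # To get a true OR you need a second inversion after the NOR.
        # Dropping that inverter in translation flips the signal's polarity.
        true_or = inverter(nor(a, b))
        print(a, b, "NOR:", nor(a, b), "NAND:", nand(a, b), "OR:", true_or)
```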

Finding such mistakes can be a challenge, but this is a geek's version of a good detective mystery. Every possible clue must be considered; some clues are misleading, but others lead you closer and closer to catching the culprit. It's something I'm pretty good at, and it's fun when there isn't a deadline hanging over my head.

A little while ago I was very excited, thinking I had the system working. The simulator had successfully fetched and executed the first two instructions, but then the simulator screen full of nice, green signal traces (valid states, either 0 or 1) turned to a cascade of red (invalid or unknown states, neither 0 nor 1, represented as 'X'). The first clue was that the first two instructions are both NOPs (No-Operation), which are coded as '0000'. The third instruction, which was being fetched when things blew up, was a JUN (Jump Unconditional), coded as '0100'. For 20 nanoseconds, one internal clock cycle, the data bus showed '0100', but then it changed to '0X00'. That started a cascade of red 'X' states as one signal after another became unknown. It took about an hour (including time spent petting the cat while pondering the problem) to track it back to a missing inversion in the generation of the OPR-IB signal; this caused a second source to attempt to drive the data bus to '0000', and the conflict caused the failure.
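Why '0100' fighting '0000' shows up as '0X00' can be illustrated with a simplified bus-resolution rule: bits where all active drivers agree keep their value, and bits where they disagree become 'X'. The Python below is a sketch under that assumption; it ignores drive strengths, which a real Verilog simulator also takes into account, and the function names are illustrative.

```python
def resolve_bit(drivers):
    """Resolve one bus bit from a sequence of '0', '1' or 'Z' (not driving)."""
    active = {d for d in drivers if d != "Z"}
    if not active:
        return "Z"           # nobody is driving the bit
    if len(active) == 1:
        return active.pop()  # all active drivers agree
    return "X"               # conflicting drivers: unknown

def resolve_bus(*drivers):
    """Resolve a multi-bit bus given one equal-length string per driver."""
    return "".join(resolve_bit(bits) for bits in zip(*drivers))

# The ROM drives the JUN opcode nibble while a second, erroneous source drives zeros.
print(resolve_bus("0100", "0000"))  # prints 0X00
# With the second source properly released, the opcode gets through.
print(resolve_bus("0100", "ZZZZ"))  # prints 0100
```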

I'm not claiming I've fixed all the bugs, but the simulator just stepped through a sequence of JMS and BBL (call and return) instructions, which tests a whole lot of logic in the Instruction Decode and Instruction Pointer groups. This is enough to be able to test the Instruction Pointer board pretty thoroughly.

So the race is on!

Adventures in reflow soldering

Late in the afternoon I woke up feeling well enough to do some hardware work. I'd been considering whether to populate and test another small section of the Instruction Pointer board, or just do the rest of the board in one marathon session. Experiments suggested that if I populated much more I wouldn't be able to use the solder stencils for the rest, which sort-of settled the issue.

At about 6pm I used the stencil to put solder paste on as much of the board as I could, then started placing components. At first this was slow, but eventually I developed a fairly efficient rhythm. Still, I had a bit more than half the board to populate, or about 300 parts to place. It took me about four hours, by which time the solder paste had mostly dried out, so rather than the components settling into the paste like a foot into soft mud, they sat on top. I was rather worried that even with a low airflow, the hot air soldering wand would blow components away before the solder paste reflowed. Indeed, this happened with a few components, but most stayed in place.

Reflow soldering has to be seen to be appreciated. The dull-gray solder paste suddenly turns bright silver, and misshapen smears suck together like a scene from the Terminator 2 movie. Components that are slightly misaligned rise and twist into alignment, then settle to sit flat on the board. It looks like animation, but it's surface tension in action.

And then there are the failures. Resistors rise on one end until they look like partly-raised drawbridges, a condition aptly named "drawbridging", or until they're completely upright, known as "tombstoning". Transistors rise on one lead, looking like an acrobat balancing on one arm. All these have known causes, mostly related to poor solder paste application or uneven heating, both offenses I'm horribly guilty of committing. A couple components simply flew away, like a speck of dust blowing in a gust of wind and barely perceptible until a careful inspection shows a couple of empty pads where there should be a resistor. There's a reason I bought 100 extra resistors.

I think for the next board I'll just populate the whole thing at once, which will minimize my problems with the stencils. I may also try the hot plate method of reflow, which gives a more even temperature distribution and avoids blowing components away.

With the missing components replaced and the soldering problems fixed, I checked for major problems (like a short between the +5V and Ground buses). Not finding any real problems, I plugged the thing into my test jig. My simple test driver doesn't give a real test, but the few signals I probed looked reasonable.

By this point it was midnight, and I crawled off to bed.

A Verilog 4001 ROM

Saturday morning I woke up way too early and couldn't get back to sleep, so I worked some more on the Verilog version of the MCS-4 system.

I started by coding a Verilog implementation of the Intel 4001, a combination of a 256 x 8-bit read-only memory and a 4-bit I/O port. No Flash ROM this chip, it was programmed to order during manufacturing by Intel, along with the configuration of the I/O port. The 4004 CPU could address up to 16 of these, giving a maximum program address space of 4096 bytes. The Busicom 141 calculator used four of these to store its program, with an optional fifth that implemented the square-root function. Firmware updates required replacing the chips — not something one did willy-nilly.
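The address arithmetic in that paragraph can be written out in a few lines. In this Python sketch the upper four bits of the 12-bit program address select one of the 16 ROM chips and the lower eight bits select a byte within it; `split_address` is a hypothetical helper name.

```python
ROM_BYTES = 256      # bytes per 4001
MAX_ROMS = 16        # chips the 4004 can address

def split_address(addr):
    """Split a 12-bit program address into (chip number, byte offset)."""
    if not 0 <= addr < ROM_BYTES * MAX_ROMS:
        raise ValueError("address out of the 12-bit range")
    return addr >> 8, addr & 0xFF

print(ROM_BYTES * MAX_ROMS)   # prints 4096
print(split_address(0x00B))   # prints (0, 11)
print(split_address(0x3A7))   # prints (3, 167)
```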

Of course, my version has an important refinement: the program is loaded from a text file containing the hexadecimal representation of the program code. In simulation this is done when the simulator starts, but the XST synthesizer can merge this into the bitstream used to program a real Xilinx FPGA. This will allow me to create a complete virtual MCS-4 system within the FPGA.
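To make the idea of loading a program from hexadecimal text concrete, here is a small Python sketch. The file format is an assumption (whitespace-separated two-digit hex bytes), chosen because it resembles what Verilog's `$readmemh` system task accepts; the post doesn't say exactly which mechanism or layout the real code uses, and `load_rom_image` is a made-up name.

```python
def load_rom_image(text, size=256):
    """Parse whitespace-separated hex bytes into a ROM image of `size` bytes.

    Locations not covered by the text are left at zero (a NOP on the 4004).
    """
    rom = [0] * size
    tokens = text.split()
    if len(tokens) > size:
        raise ValueError("program does not fit in one ROM")
    for index, token in enumerate(tokens):
        value = int(token, 16)
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not a byte: {token}")
        rom[index] = value
    return rom

# A tiny assumed image: two NOPs, then a JUN to address 00Bh.
rom = load_rom_image("00 00\n40 0B")
print([f"{byte:02X}" for byte in rom[:4]])
```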

When it looked like I'd worked out the bugs in the 4001, I created a test bench that would connect the 4004 to a 4001 loaded with the test program supplied with Lajos Kintli's simulator and tried running the system. Unfortunately there were bugs, always bugs. Fatigue set in and I ended up taking a long afternoon nap.

Friday, September 21, 2012

Verilog debugging 1

As the haze and malaise of my fever begins to subside, I've started looking for the problems in my Verilog implementation. I found a couple minor problems — one allowed the external data bus to float when POC was asserted, the other failed to output the Carry bit on D0 during the X3 phase — but nothing that would prevent basic functionality from operating, like initializing the IP DRAM array during POC.

While I won't be surprised if there are more real errors, it's starting to look like most of the problem is how picky the Verilog simulator is being regarding uninitialized, or 'X', inputs. I suspect that the system would eventually stabilize if these inputs were either 0 or 1 (most likely 1, given the resistive pull-ups), but the X values keep propagating.

Update: I decided to initialize the OPR and OPA registers to zero. These are the inputs to the instruction decoder, and thus cause almost all the 'X' signals to become defined by the end of the first instruction cycle. My test bench code drives an "INC R1" instruction onto the external data bus during each instruction cycle, and I can see scratchpad register 1 incrementing. This is a HUGE step forward.

Unfortunately, I also see the Instruction Pointer counting 000, 001, 002, 062, 063, 062, 063 (hex) and so on. The IP incrementer works properly in the unit test, so there must be something in the interaction with the other components on the data bus. An "INC R1" instruction is coded as 61 (hex), which could be a hint.

Later update: After spending three hours looking for a bug in the incrementer that would cause a jump from 002 to 062, the truth occurred to me: there isn't a bug in the incrementer, there's a bug in the instruction decoder. Because of an error in interpreting the schematic, it's executing my "INC R1" opcode as a JCN instruction and as an INC instruction. At the same time. With ugly results.

That it took me three hours to notice this is why I'm not working on "real work" yet: I'm not quite back to normal and I'd probably screw it up.
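For reference, here is how the 61 (hex) opcode from this debugging session is supposed to split. The upper nibble (OPR) selects the instruction and the lower nibble (OPA) is its operand. The Python sketch below covers only the handful of OPR values mentioned in these posts, using the standard 4004 encodings; the table and function names are illustrative.

```python
# A partial OPR table: only the instructions discussed in these posts.
OPR_NAMES = {
    0x1: "JCN",
    0x4: "JUN",
    0x5: "JMS",
    0x6: "INC",
    0xC: "BBL",
    0xD: "LDM",
}

def decode(opcode):
    """Split an opcode byte into (mnemonic, OPA); unknown OPRs return '???'."""
    opr, opa = (opcode >> 4) & 0xF, opcode & 0xF
    if opcode == 0x00:
        return "NOP", 0
    return OPR_NAMES.get(opr, "???"), opa

print(decode(0x61))  # prints ('INC', 1)
```

A correct decoder yields exactly one instruction per opcode. The bug described above amounted to two entries, JCN and INC, both claiming 61 (hex) at once.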

Thursday, September 20, 2012

The entire i4004 CPU in Verilog

It seems weird that, after wrestling with a Verilog implementation of the i4004 CPU off and on since 2009, it would require only about 1250 lines of actual behavioral Verilog code (about 200 more in blank lines and comments). I was really expecting it to take a lot more, though I don't know why as that's about one line for every two transistors, and a lot of that is port definition stuff.

Of course it doesn't quite work yet. I tracked down one bug caused by an errant tilde — the "L" signal was inverted from what it should be, causing havoc on the data bus. Even with that fixed it still doesn't work. Debugging is being hindered by another sort of bug: I'm running a 101.8F (38.8C) fever. Yes, I seem to have caught the flu or whatever is going around.

Once I get it working I'll look into posting the code to the OpenCores site, and the 4004 35th Anniversary website if they'll have it. Since it's derived from the Intel schematics it will carry the same non-commercial Creative Commons license, but I really can't see that many people clamoring to use an implementation of a 42 year old design in a commercial product.

I think I'm going to order a pizza and see what's on TV. If I have the attention span for TV.

Wednesday, September 19, 2012

More Population, More Bugs

I've been debating how best to approach populating this board. It's certainly easier to populate the board if I can use the solder stencils to place solder paste, rather than have to put dollops of paste on each pad. Thus the limited flexibility of the stainless steel stencils dictates what areas I can easily populate. As tempting as it might be to just do the whole board at once, that's just not practical. I'm sure I'd manage to brush against the board and smear a few hundred components off their pads before I'd be able to get everything properly placed.

In the end I decided to get all the decode logic populated and tested, then work my way across the board toward the upper-left corner. So that's what I spent the evening doing.

Here's a crappy photo of a section of board partially populated, taken through the microscope. The upper row of components is 4.7K Ohm resistors, already in place. The second row has two FDV301 FETs and an unpopulated space for a third, while the third row is still unpopulated. You can clearly see the little rectangles of solder paste on the pads left by the stencil.

Remember, the larger components in this photo are half the size of a single grain of long-grain rice. This makes component placement a tedious, exacting activity. It's obvious why they have machines to do this. It'd be even worse if I had to then touch a soldering iron to each of these pads, but the Hot Air station makes it so much easier. I really am sold on this thing.

Here's the result, after everything was soldered down, and the board placed in the test jig. I'm not quite as good as a commercial pick-and-place machine followed by a computer-controlled reflow soldering system, but this is a hobby.

I went through the various outputs with my 'scope. I'm still not used to seeing RC charging curves, having worked mostly with TTL and CMOS logic that gives actively-driven, fast-rising edges. But if I mentally cut off the signal at the 2V mark, which is higher than any signal needs to go unless it's driving a transmission gate, things look pretty good.
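How long an RC charging curve takes to cross that 2V mark follows from the usual exponential, t = -RC ln(1 - V/Vdd). The Python sketch below assumes a 5V supply and a 4.7K pull-up (both figures appear elsewhere in these posts) plus a purely hypothetical 20pF load; the real capacitance on any given node isn't stated anywhere here.

```python
import math

def time_to_reach(v_target, v_supply, r_ohms, c_farads):
    """Seconds for an RC node charging from 0 V toward v_supply to reach v_target."""
    if not 0 <= v_target < v_supply:
        raise ValueError("target must be below the supply voltage")
    return -r_ohms * c_farads * math.log(1 - v_target / v_supply)

# Assumed values: 5 V supply, 4.7K pull-up, hypothetical 20 pF load.
t = time_to_reach(2.0, 5.0, 4700, 20e-12)
print(f"{t * 1e9:.1f} ns to reach 2 V")
```

Because the 2V mark is well below the supply, the node gets there in roughly half of one RC time constant, long before the curve flattens out.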

Or they did, until I got to the Row Read decoder. That signal simply refused to go high. There are four inputs to the final NOR circuit; all need to be low for the output to go high. In this test configuration, two inputs are a constant low, and showed grounded on the 'scope. Two other inputs are pulses, and the output is supposed to go high when both of those pulses are low at the same time. I could see that happening at the appropriate time, but the output never went high. I tried reheating the connections of the four parallel transistors that make up the NOR circuit, but that didn't help. Eventually I removed the bottom two FETs -- the two whose Gate leads were being held low -- and the circuit started working. If you look really closely at the picture above you'll see where these two used to be.

I can see several possible causes. I could have had a solder bridge under one of the components, even though I didn't see a short with my ohmmeter. I could have damaged one of the FETs, either through static or by over-zealous use of the hot air. Or one of the FETs could have been defective. I'll check the board to make sure there isn't a short from drain to gate pads (which shouldn't be, if the electrical test was thorough) and then replace them with new parts.

I think there's a lesson here about laying out the circuits such that testable groups are separated enough that I can use the stencil without kinking it.

Update: The problem with the Row Read decoder was that the POC input, which drives one of the four FETs, wasn't connected on the test jig and was floating. Putting a scope probe on the gate lead was enough to siphon off any charge and show 0V, but removing the probe allowed it to float high enough to turn the FET on and pull the drain to ground. Fixing this on the test jig restored full functionality.

Wiring the Test Jig

Last night I finished wiring the test jig. It's not pretty, but none of the signals need to have fast rise or fall times, and there's a lot of noise immunity in the signaling levels.

The signals arrive from the Spartan-3E reference board through the connector on the right of this photo and terminate at the pin header next to it. From there, the rat's nest of wires connects to the Source leads of the array of FDV301 MOSFETs. The Gate leads of this array are connected to the +3.3V power bus sourced by the reference board; if you look closely you'll see a heavy, bare copper wire running vertically through each of the four columns of FETs. The Drain leads then connect to the larger inter-board connector on the left of this picture.

I've already had one of these leads break due to flexing while the rest were being attached, so I'm planning on attaching a piece of bare perf-board to the standoffs to provide some mechanical protection while the board is handled. I'll attach identical standoffs and perf-board to the bottom of the reference board so it'll sit at the same height off the bench.

With this done, of course I couldn't wait to test it. I hooked everything up, loaded the FPGA from my laptop, and started probing the partially-populated IP board. My plan was to test the 2-bit row refresh counter, but it wasn't counting. I started debugging by checking the INH and ~INH signals, looking for any indication that they were changing. They weren't. Probing the inputs that generated this signal I saw that none of them were changing either. Then I realized that all the inputs to this circuit are statically driven from the Verilog code, and that this is exactly what I'd intended. Doh!

The counter was getting the proper CLK1 input, so why wasn't the counter counting? Could it have something to do with the unpopulated area of the board I'd clearly marked "Rfsh" (as in "Refresh")? Well, yes...

Back to my soldering bench (which is separate from my test bench -- 500 MHz scope probes are damned expensive and far too easy to melt). I put dabs of solder paste on each of the pads in the refresh enable logic, carefully placed the components, then soldered each of them with my soldering iron. Back to the test bench. Still nothing. I probed the circuit and found what appeared to be an open solder joint. Back to the soldering bench, but this time I fired up the hot air station. It takes a little longer, because the board has to come up to temp unless you want to fry things, but the results seem to be a lot more reliable. Especially when you're dealing with leads that are barely 0.5mm (0.020") wide.

With a lesson on soldering SMT components learned, I went back to the test bench. Hot! Hot! Hot! I keep forgetting that hot air work leaves the board hot, and having two full interior copper planes holds a lot of heat. When the board cooled down I hooked it up and sat there with a silly smile on my face, watching as a 42-year-old design counted 0, 1, 2, 3, 0, 1, 2, 3...

It's ALIVE...

Monday, September 17, 2012

Coding the ALU in Verilog

I finished coding the Arithmetic/Logic Unit (ALU) board in Verilog. The result is surprisingly compact: just 218 lines, if you ignore blank lines and comments. The Instruction Decoder board only required another 239 lines of Verilog.

Since one of my goals is to really understand how everything works, rather than just doing a rote translation, it took longer than it might otherwise. There are still pieces that aren't clear to me, though. It didn't take long to recognize that there's a carry prediction stage before the actual adder, but I don't understand why the even bits (0 and 2) use positive logic while the odd bits (1 and 3) use negative. Google had one hit that indicated that doing so had speed advantages, but the link led to an IEEE paper that I didn't feel like paying $20 to read. I haven't really dug deeply into the adder either. When I understand these more clearly I'll try to write them up here.

I already understand the operation of the two DRAM arrays, and I'm guessing I can have them re-coded in a few evenings. The timing logic I'm already using to drive the test jig. To make a system that will execute the test program provided as part of the simulator I'll need to implement the 4001 ROM/GPIO chip, and maybe the 4002 RAM/GPO chip.

Maybe I'll see the 4004 running in an FPGA sooner than I expected.

Sunday, September 16, 2012

Dynamic problems, Static solutions

I thought I'd explain a little bit about the problems that stymied me for so long when trying to implement the 4004 in an FPGA.

The biggest problem was the use of transmission gates. These are FETs used as analog switches, passing or blocking whatever signals are presented. These can be used bidirectionally, as in the picture to the right, where the horizontal FET makes this circuit act as either a NAND or a NOR gate. Once the operation of the circuit is clearly understood, though, this can be worked around.

It's the use of a transmission gate as a latch that caused me the most trouble. There's an example on the left. The input is the red ("high" or "1") coming in from the bottom. It connects to a FET configured as a transmission gate. If the gate was "open", the input would pass through and drive the other side of the gate to the same value. Here the gate is "closed" because the FET's Gate lead is low (blue). Thus the high on the input does not pass through.

So what's the level on the other side of the transmission gate? It's whatever level was present the last time the transmission gate was open. In this example it's a "floating low", depicted in a faint blue color. Thus the transmission gate is acting as a latch.
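A minimal sketch of that behavior, assuming the floating node simply keeps its last driven level (real charge leaks away over time, which this toy model ignores). The class and method names are made up for the example.

```python
class TransmissionGateLatch:
    """Toy model of a pass FET feeding a node with some capacitance."""

    def __init__(self, stored: bool = False):
        self.stored = stored  # level left on the floating node

    def update(self, data_in: bool, gate_high: bool) -> bool:
        if gate_high:
            # FET conducts: the node follows the input.
            self.stored = data_in
        # FET off: the node floats and keeps its previous level.
        return self.stored


node = TransmissionGateLatch(stored=False)
print(node.update(data_in=True, gate_high=False))   # False: the high input is blocked
print(node.update(data_in=True, gate_high=True))    # True: the input passes through
print(node.update(data_in=False, gate_high=False))  # True: the node holds the old level
```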

Why is this a problem to implement in an FPGA? Latch primitives are available, and can be inferred in behavioral code. The problem is that there are cases where the input signals and the selection signals are derived from the same sources. When I tried to implement the DRAM logic using latches, the data input to the latch was the output of a multiplexer, and there was a race condition between the deassertion of the latch enable and the select inputs to the mux. The only way to get it to work in simulation was to carefully structure delays in the signals, and that's not practical in real life.

The most robust solution is to use clocked flip-flops rather than latches. This is such a common problem that even the Verilog compiler issues warnings against using latches. The benefit of using clocked flip-flops is that if all signals change in response to the clock, none will have changed before the inputs are captured. The problem is that rather than having the results flow through all stages of logic at full speed, they can only move one stage per clock. My problem was two-fold: I wanted to be as close to the original as possible, and I didn't know how many stages of clocked logic there were between input and output.
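The race can be illustrated with a small event-ordering model. This is a sketch under simplifying assumptions (two-input mux, events applied one at a time, invented function names), not a reconstruction of the DRAM logic itself.

```python
def mux(sel: bool, a: bool, b: bool) -> bool:
    return b if sel else a


def latch_capture(events, a, b):
    """Apply (signal, value) events one at a time to a transparent latch.

    The latch follows the mux output while enable is high and holds it
    once enable goes low. Returns the value finally held.
    """
    state = {"sel": False, "enable": True}
    q = mux(state["sel"], a, b)
    for name, value in events:
        state[name] = value
        if state["enable"]:
            q = mux(state["sel"], a, b)
    return q


def flop_capture(sel_before_edge, a, b):
    """A clocked flip-flop samples whatever the mux shows at the edge."""
    return mux(sel_before_edge, a, b)


a, b = True, False
# Enable and select are meant to change "together"; the order decides the result.
enable_first = latch_capture([("enable", False), ("sel", True)], a, b)
select_first = latch_capture([("sel", True), ("enable", False)], a, b)
print(enable_first, select_first)  # True False
print(flop_capture(False, a, b))   # True
```

With the latch, the held value depends on which signal wins the race. With the flip-flop there is only one sampling instant, so the select change can only affect the next clock.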

Eventually I threw in the towel. My current iteration uses the 50 MHz oscillator on the Spartan-3E reference board as a system clock. This means there are 68 rising clock edges in each of the 8 execution phases, and 19 in each 380ns minimum-width i4004 "clock" pulse. I haven't counted the maximum number of sequential stages yet, but my impression is it's less than 6, and probably much less. This will result in a design that should work on any FPGA, at the cost of requiring a system clock several times faster than the i4004's "clocks".
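The clock-edge counts above can be checked with a few lines of arithmetic. The phase length printed here is simply what 68 edges at 50 MHz implies, and the stage count of 6 is the rough upper bound guessed in the text, not a measured figure.

```python
SYSTEM_CLOCK_HZ = 50_000_000                  # Spartan-3E board oscillator
PERIOD_NS = 1_000_000_000 // SYSTEM_CLOCK_HZ  # 20 ns per system clock

MIN_PULSE_NS = 380       # minimum-width i4004 clock pulse
EDGES_PER_PHASE = 68     # figure quoted in the text
ASSUMED_MAX_STAGES = 6   # guessed upper bound on sequential stages

edges_per_pulse = MIN_PULSE_NS // PERIOD_NS
phase_ns = EDGES_PER_PHASE * PERIOD_NS

print(PERIOD_NS)                              # 20
print(edges_per_pulse)                        # 19
print(phase_ns)                               # 1360
print(edges_per_pulse // ASSUMED_MAX_STAGES)  # 3
```

Even under the pessimistic six-stage assumption, each stage gets about three system clocks within a single minimum-width pulse.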

Saturday, September 15, 2012

A little re-partitioning

While coding the Verilog version of the i4004, I came to realize there's a group of circuits I originally assigned to the Instruction Decoder board that is really part of the ALU.

At the same time, there's a group of I/O circuits on the ALU (the ROM/RAM chip selects) that I'd really rather have on the Timing and I/O board, just to keep all the I/O drivers together.

I think the number of components to be moved off the ALU just about matches the number of components to be moved onto it, which will help keep the density manageable. It's actually a net reduction in inter-board connections, though there still are plenty of uncommitted pins on the connectors.

Full Circle

When I started this project a couple of years ago, my goal was to implement the i4004 CPU in an FPGA. What blocked that was my lack of experience with FPGAs, and the techniques that could be used to translate a slow, dynamic-logic design into a high-speed, static-logic implementation. The goal eventually morphed into building the CPU out of discrete components, but I'm still using the FPGA reference board I bought for the project to drive the test jig.

One of the questions floating around in my head was how to properly test each board or sets of boards. I gave thought to using a PicoBlaze soft microcontroller core in the FPGA, and I also considered building a simple state machine that would just drive defined signal patterns. These may yet become reality, but to test anything complex would require a really detailed analysis of the signal patterns found in a real i4004 CPU.

Having now found a solid solution to my FPGA implementation problem, another means for testing the discrete logic version presents itself: implement the CPU in the FPGA, broken into modules the same way the discrete implementation is broken into boards. This way I can mix and match soft implementations and hard, subject only to the limitations imposed by the available I/O pins on the FPGA. Which may, unfortunately, preclude testing the Instruction Decoder or ALU except as a pair.

Thus far I have a large portion of the Instruction Decoder board coded. It's amazing how large arrays of transistors collapse into a single line of behavioral Verilog. I have several versions of the DRAM arrays prototyped from previous experiments, and though most are structural implementations I wouldn't expect those to take very long to recode. Figuring out how the ALU works in detail may take some time, but it's something I need to do anyway, and it's something I can do in bed with my laptop propped up on my lap rather than in the lab.

Friday, September 14, 2012

A little test jig work

This evening I didn't get home until after 9pm, so I only had a couple of hours. I mounted additional FDV301s on Surfboard adapters to bring the count to 40. I thought about using some of the other MOSFETs I'd tried, like the DMN26 and BSS138 types, but decided it would be smarter to have them all the same. Here I'm taking advantage of the FDV301's protection diode from Source to Gate as additional protection against the Source going dangerously high and damaging the FPGA.

Because of the way the FETs are lined up, connecting all the gates to the Vcc bus (3.3V) on the expansion board was just a matter of running a heavy wire down the column hitting all the middle pins, and a little solder work. I got them all soldered onto the test jig board, but only had time to wire through and test 4 of the 40 FPGA I/Os. Completing the rest of the wiring shouldn't take any longer than it would if I'd used resistors.

Here is a photo of the top of the board:

If it's not totally obvious, each of the little cards is a Surfboard adapter card carrying one FDV301 MOSFET. Four columns of ten is forty FETs, each protecting one of the 40 FPGA I/O lines that show up at the pin header to the left of the prototyping area. Sorry if the photo isn't great.

Thursday, September 13, 2012

Spartan protection

Last night I re-verified the results of my earlier tests of using a FET as input protection for the Spartan-3E FPGA board. This time, though, I specifically looked at what happens when the gate voltage drops to zero.

While feeding a strongly-driven more-or-less square 5V signal into the drain lead of the FET, I brought the gate voltage down to zero while monitoring the voltage on the source lead. At zero I see a weak ghost of the input signal with an amplitude of about ±200 mV. That's not enough to turn on the protection diodes, so it should be safe. Even when powered (and why would the gate voltage ever be zero while the FPGA is powered?) this still counts as a zero input.

With the gate held at 3.3V, the output at the source lead is clamped to 2.8V, a half-volt drop. The minimum Vih for LVCMOS33 inputs is 2.0V, giving a 0.8V noise margin which should be plenty.
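Those numbers can be restated as a short calculation. This is an idealized sketch: it uses the half-volt drop measured above rather than any datasheet value, works in millivolts to avoid floating-point rounding, and ignores the roughly ±200 mV feed-through seen with the gate at zero.

```python
GATE_MV = 3300          # gate tied to the 3.3V rail
OBSERVED_DROP_MV = 500  # drop measured in the text, not a datasheet figure
VIH_MIN_MV = 2000       # minimum Vih quoted for LVCMOS33


def clamped_output_mv(input_mv, gate_mv=GATE_MV, drop_mv=OBSERVED_DROP_MV):
    """Pass-FET output: follows the input until it hits gate minus drop."""
    ceiling = max(0, gate_mv - drop_mv)
    return min(input_mv, ceiling)


high = clamped_output_mv(5000)
print(high)                                # 2800
print(high - VIH_MIN_MV)                   # 800
print(clamped_output_mv(5000, gate_mv=0))  # 0
```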

The test jig for the IP board drives all the internal signals except for the four data lines, but once I get to the Instruction Decode and ALU boards this situation will reverse. I think it makes sense to go ahead and put FETs in line with all the signals now (except CLK1 and CLK2, which are already buffered) to avoid major reworking of the jig later. I may just do all 40 lines in two rows of 20 to make it consistent.

Wednesday, September 12, 2012

Easing the interface problem

Re-creating the i4004 CPU is a fun project, but the proof is in creating a running system of some sort. The obvious target is a replica of the Busicom calculator, especially since the firmware for it is available.

The thought of re-creating another set of chips (4001s, 4002s, and 4003s) really isn't appealing. Nor is developing a bunch of discrete glue logic for modern RAMs, ROMs and I/O ports. Not when there's an easier, more flexible approach: a small FPGA. Unfortunately, the days of 5V FPGAs are past; most will do 3.3V but internally use 2.5V or even 1.2V. That means adding level converters, just like on the test jig.

That got me thinking... the FPGAs use one voltage for the core logic, and a different voltage for the I/O circuits. Why couldn't I do the same on my i4004 replica? All the external I/O interfaces are on one board, and it'll be the last board I make. If I connect the drains of the high-side FETs to a separate Vio pin, I should be able to run the I/O interfaces at 3.3V while the rest of the i4004 runs at 5V. That'd make it easier to work with.

Unfortunately this won't work for the test jig, which uses FPGA logic to drive and monitor internal signals. I have 25 FDV301s already mounted on "Surfboard®" adapter boards, and another 40 Surfboards as yet unused, with less and less need for them. I might as well put them to use protecting the Spartan-3E reference board.

First signs of life

I did a bit of work on the test jig last night. First I wired up the GND and +5V pins on the J1 connector. Then I temporarily connected the I/O pin carrying the SYNC output from the FPGA board to the ~CN pin on the inter-board connector through a 470 ohm resistor. This pin connects to a single inverter (an FDV301 FET and a 4.7K resistor), and would allow me to see what effect the series resistor would have on a signal.

I plugged the test jig into my FPGA board, and was rather alarmed to see that even with the power switch in the "OFF" position the FPGA board was feeding +5V through the expansion connector. Examining the schematic I found that the power switch controls a signal input to the regulators on the board and doesn't disconnect the +5V input or the feed to the expansion board. I consider that a design error. It also heightens my worries about driving +5V into an unpowered FPGA I/O pin, even through a resistor. I may have to put FETs or other level converters in series with all the connections on the test jig.

After wrestling a bit with Impact, Xilinx's FPGA loader program, I got the FPGA loaded. The output drivers are still set to FAST (something I ought to change) so there were sharp peaks on the rising and falling edges of the output. On the other side of the resistor, though, there were no sharp peaks, and only a little bit of rounding and no loss of amplitude.

Even better was the signal on the output side of the inverter. On the breadboard, the fast turn-on of the FDV301 causes significant ringing, with the initial excursions reaching -1V and +0.7V. I see none of that on the real PCB. I expect this is because the source lead is now connected to a true ground plane through a very short via and thermals, rather than comparatively long wires.

I didn't have the rest of the signals that feed into the INH circuit connected, and the ~CN pin needs to be connected to the proper source, so I haven't seen the rest of the circuit function. I'll get to that another night.

Tuesday, September 11, 2012

PCB Faux Pas

If it sounds like I was overly nervous about something as simple as a PCB, it's because I've had some bad experiences in the past. Here are a couple of PCB fabrication errors I've run into.

Way back in 2002 I had a batch of PCBs made by a board house specializing in one-day, quick-turn prototypes. I believe I submitted the Gerber files on a Thursday morning and the boards arrived Saturday morning. Most of the boards looked great, but this one appears to have been over-etched in a small area.

To give you a sense of scale, these holes are sized for 4-40 machine screws; I think they're 0.122" in diameter.

My other example is from 2009. I'd started experimenting with CPLDs, and decided to use one to design a controller for some old NEC 4164 (64K x 1-bit) DRAM chips I had lying around. The CPLD I wanted to use comes only in surface-mount packages, so I made my first tentative steps towards using SMDs. I spent days trying to decide how big the pads should be, trying to balance my desire for wide pads against my utter lack of experience soldering anything tighter than 0.1" pitch, let alone a fifth that (~20 mils). Finally I settled on 12 mil wide pads as a good compromise.

Here's what I got. Notice that the 8 mil traces don't get any wider when they emerge from under the solder mask. The device leads are 8.7 mils wide, making them wider than the "pads" they're supposed to sit on. Since the "pads" are pre-tinned (hot air leveled solder) they're convex, and trying to align the device on the "pads" was a real pain. The leads kept sliding off into the open spaces between the "pads", which are quite wide enough for a lead to sit in.

When I called the board house, they protested that they'd made the boards exactly as I'd specified. Yet Eagle and every Gerber viewer I could find showed 12-mil pads. After far too much denial, their customer service person finally admitted that they'd left off the pads entirely. What really pissed me off, though, was that the best they'd offer to make things right was a discount on another order. Needless to say, I never placed another order with them.

So forgive me when I get excited about the high quality of the boards I just received. It's a welcome change.

Monday, September 10, 2012

First components on the IP Board

It's been a busy weekend, so I'm splitting this update into several posts.

One of my goals for the weekend was to figure out how to best use the hot air soldering station I borrowed from work. I've watched a number of videos on YouTube on the topic, and while many are fun to watch, none that I saw gave specifics for temperature or flow rates. Fortunately I have a collection of dead and useless boards chock full of SMDs to experiment with.

First I experimented with de-soldering a bunch of 0603 resistors and capacitors. I learned that they often stay stuck to the board when they're already loose and can be lifted off with no effort. That was a surprise. I played with the flow rates to see how high I could go before loose 0603s sitting on the bench would blow around. I duplicated a SparkFun video by unsoldering a medium sized TQFP with no mechanical damage. This was fun -- I gotta get one of these things!

Finally I experimented with re-soldering the loose 0603s using solder paste. I've done this before with wire solder, but using paste was a new experience. The stuff comes out of the syringe looking like a gelatinous ooze, speckled with little solder balls. I was expecting it to adhere to the board, like toothpaste, but this stuff (Chip-Quik SMD291AX) slithers around, floating on the flux in the suspension, making it hard to distribute evenly. This is not what I expected, given what I'd seen in the videos. My first attempts resulted in way too much solder in some areas, leaving bridges and causing components to shift badly. Eventually I tried squeezing out a blob of solder paste on a convenient surface, and using a pointy tool to transfer small quantities by dipping the tip in the paste and then dabbing it onto the pads. Keeping the hot air flow rate down avoided blowing the components away and they ended up soldered very nicely.

I got stainless-steel solder stencils with my boards, but I was unsure if I'd use them. I want to build up and test the board in sections, rather than laying everything down at once, which would make it difficult to use the stencils. I decided to start with the two 2-bit counters on the far right side of the board. This would also require the circuits that generate the INH and ~INH signals, which are also on the far right side. I put the board flat on my bench, carefully aligned the stencil, and clamped it in place. I squeezed some solder paste onto the stencil and used an old credit card to squeegee it over the holes. It looked good when I lifted the stencil, but through the microscope it looked like gelatinous blobs rather than the rectangular blocks I expected. Some of the blobs had spread enough to form larger blobs bridging two unconnected pads. I hoped this would fix itself when I heated them, but I have plenty of solder wick if not.

Placing the components was repetitious and rather tedious. Fortunately I only have four types of components to place: two transistors with different numbers of leads, and two values of resistors. There's also the occasional capacitor, but there aren't many of them. I think it took about an hour, but I finally had them all positioned. I fired up the hot air station, took a deep breath, and started in the lower right corner of the board. I don't have an under-the-board pre-heater, so I began by holding the nozzle a bit higher above the board to warm that area, watching to see if any of the components showed signs of blowing off. After a few minutes I lowered the nozzle and watched in fascination as the misshapen blobs of ooze turned into shiny solder joints. I worked my way up the board, allowing the excess heat from the area I was soldering to warm the adjacent areas. Once started the process went surprisingly fast.

One big difference between using hot air and using a soldering iron: the iron heats small spots rather rapidly, while the hot air heats large areas of the board rather slowly. This is especially true for a multi-layer board with large copper planes. Don't try to pick up the board until it cools!

Once the board had cooled enough to handle, I gave it a thorough inspection. I found one joint that hadn't reflowed; a brief touch with an iron fixed that. I could see the solder-stealing effects of the via-in-pad technique, where the via had wicked away some of the solder that otherwise would have been on the joint. With 12 mil vias the effect wasn't disastrous, but I could see where it might be a problem with production runs. And those blobs that had run together? They separated out nicely into separate joints with no bridges.

Here's a blurry photo of the result, after I added the large connector and plugged into the test jig:

The smaller connector on the left isn't soldered in place, so I may be able to use the stencil to prep the DRAM array when I get to it; I'm not sure yet. I'll try to take a better picture later this week.

Test jig hardware

It's been a busy weekend, so I'm splitting this update into several posts.

The test jig firmware does nothing unless there is a way for the signals to reach the i4004 board under test. Here's a picture of the unmodified Digilent prototyping board for the Spartan-3E Starter board I'm using:

There are a couple of problems with this board as I need to use it. First are the pre-installed sockets to the left and above-left of the prototyping area. These sockets are intended to be used with a solderless breadboard glued over the prototyping area. If you intend to solder parts to the board (or wire-wrap them, which is the "official" use for this board variant) the sockets block access to the breakouts of the expansion connector on the far left. So those had to go.

I thought it might be handy to have quick access to the CLKIN, CLKOUT, and CLKIO circuits, which I'm using to carry the CLK1, CLK2, and 50 MHz system clock outputs. My logic analyzer pod connects to the circuit being observed with fly-wires that have pin sockets on their ends, so it's handiest if these circuits have pin headers in them. The i4004 boards would block access to straight headers, so I used a right-angle header instead.

The next change is to add the connectors that will mate to the i4004 boards. These are a PC/104-type 64-pin (32x2) stack-through connector on the right, and a 40-pin (20x2) connector on the left. There are several reasons for putting these connectors on opposite sides of the board. The first is that it provides some mechanical stability without the need of stand-offs. Secondly, I plan to use the smaller connector for the extra signals that have to pass from the Instruction Decoder to the Arithmetic and Logic Unit, and it looks like it'd be easier to route if they're away from the larger connector. Thirdly, and most importantly, it's easy to solder two rows of pins, but if the connector pins are long it's a royal pain to try to solder three or more rows, as the additional rows block access.

The last addition I've completed thus far is the TC4427A clock driver chip in the upper-middle of the board. Eventually it will be joined by series dropping resistors or MOSFETs for the other lines.

Sunday, September 9, 2012

Test jig firmware

It's been a busy weekend, so I'm splitting this update into several posts.

I spent part of this weekend working on the test jig. I reacquainted myself with Verilog (the FPGA coding language I use) and whipped up a behavioral implementation of a two-phase clock generator and the i4004 timing chain.

As I mentioned before, the Spartan-3E's DCMs and PLLs won't run slowly enough to generate the i4004's clocks. Instead, I put the input from the on-board 50 MHz crystal oscillator onto a global clock line and use that to clock a divide-by-100 counter. The two clock phases' start and end points are triggered at pre-determined points in this counter's cycle. This is made more complex since I intend to drive the boards through a Microchip TC4427A clock driver, which introduces about 75ns of delay. Rather than have the timing chain outputs arrive before the signal that supposedly generated them, I generate both internal and external flavors of CLK1 and CLK2, with the external flavor being advanced by 4 counter steps (80ns). The result is that the output of the TC4427A arrives at the inter-board connector shortly before the other timing signals.
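The counter-based scheme can be sketched as a small behavioral model. The divide ratio, step size, and 4-step advance come from the text above; the window start and end counts are hypothetical placeholders, since the real trigger points are not given here.

```python
DIVIDE = 100        # counter length from the text
STEP_NS = 20        # one 50 MHz period
ADVANCE_STEPS = 4   # external clocks lead by 4 steps (80 ns)

# Hypothetical start/end counts; the real trigger points are not stated.
CLK1_WINDOW = (10, 40)
CLK2_WINDOW = (60, 90)


def in_window(count, window, advance=0):
    """True while the (optionally advanced) counter is inside the window."""
    start, end = window
    shifted = (count + advance) % DIVIDE
    return start <= shifted < end


def clocks(count):
    return {
        "clk1_int": in_window(count, CLK1_WINDOW),
        "clk2_int": in_window(count, CLK2_WINDOW),
        "clk1_ext": in_window(count, CLK1_WINDOW, ADVANCE_STEPS),
        "clk2_ext": in_window(count, CLK2_WINDOW, ADVANCE_STEPS),
    }


def first_rise(name):
    """Counter value at which the named signal goes from low to high."""
    for count in range(DIVIDE):
        if clocks(count)[name] and not clocks((count - 1) % DIVIDE)[name]:
            return count
    raise ValueError(name)


lead_ns = (first_rise("clk1_int") - first_rise("clk1_ext")) * STEP_NS
print(first_rise("clk1_ext"), first_rise("clk1_int"), lead_ns)  # 6 10 80
```

With an 80 ns lead and roughly 75 ns of driver delay, the external clock edge lands about 5 ns ahead of the internal one, which matches the "shortly before" behavior described above.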

The i4004 executes instructions in eight distinct phases. During phases A1, A2, and A3, the i4004 outputs the three nibbles (4 bits each) of the 12-bit instruction address. These are followed by phases M1 and M2, during which the external ROMs containing the program supply the two nibbles of the 8-bit instruction code. The last three phases, X1, X2, and X3, are used to execute the instruction.
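As a rough illustration of how the address and instruction are carved into nibbles, here is a short sketch. The nibble ordering (low-order address nibble in A1, high-order instruction nibble in M1) is an assumption for the example, since the text above does not state it, and the function names are invented.

```python
PHASES = ["A1", "A2", "A3", "M1", "M2", "X1", "X2", "X3"]


def address_nibbles(address: int) -> list:
    """Split a 12-bit address into three 4-bit values for A1, A2, A3.

    Assumption: low-order nibble first.
    """
    if not 0 <= address < (1 << 12):
        raise ValueError("address must fit in 12 bits")
    return [(address >> shift) & 0xF for shift in (0, 4, 8)]


def opcode_from_nibbles(m1: int, m2: int) -> int:
    """Rebuild the 8-bit instruction from the M1 and M2 nibbles.

    Assumption: M1 carries the high-order nibble.
    """
    return ((m1 & 0xF) << 4) | (m2 & 0xF)


print(dict(zip(PHASES[:3], address_nibbles(0x2A5))))  # {'A1': 5, 'A2': 10, 'A3': 2}
print(hex(opcode_from_nibbles(0xD, 0x7)))             # 0xd7
```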

The i4004 generates the eight signals identifying the eight execution phases from the two-phase input clock through a shift register. Its operation seems ingenious to me: the first seven phase signals are NORed together to produce the input to the shift register. Thus it will clock in zeros until the first seven bits are all zero, and only then will a one bit be clocked in. Regardless of the shift register's initial state, and with no need for special initialization circuitry, it will self-initialize within eight clock cycles.
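The self-initializing behavior is easy to check exhaustively. The sketch below models the chain as a plain 8-bit shift register with one shift per step, which glosses over the two-phase clocking of the real part; the bit ordering and function names are assumptions for the example.

```python
def step(state):
    """One shift: the NOR of the first seven bits becomes the new first bit."""
    feed = 0 if any(state[:7]) else 1
    return [feed] + state[:7]


def clocks_to_one_hot(state, limit=16):
    """Count shifts until exactly one bit is set."""
    for n in range(limit + 1):
        if sum(state) == 1:
            return n
        state = step(state)
    raise RuntimeError("never settled")


# Try every possible power-up state of the eight bits.
all_states = [[(value >> bit) & 1 for bit in range(8)] for value in range(256)]
worst = max(clocks_to_one_hot(s) for s in all_states)
print(worst <= 8)  # True

# Once one-hot, the single 1 walks through all eight positions and repeats.
state = [1, 0, 0, 0, 0, 0, 0, 0]
seen = []
for _ in range(8):
    seen.append(state.index(1))
    state = step(state)
print(seen)   # [0, 1, 2, 3, 4, 5, 6, 7]
print(state)  # [1, 0, 0, 0, 0, 0, 0, 0]
```

Every one of the 256 starting states settles into the rotating one-hot pattern without any reset input.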

The only non-obviousness in my implementation is that I clock all the FPGA logic with the 50 MHz oscillator input, and the CLK1 and CLK2 signals are used as clock enables. Much like in the real i4004, CLK2 enables the clocking of new inputs and CLK1 enables the clocking of the new outputs, but my implementation uses (inferred) clocked D flip-flops rather than the pass-through latches of the original. This avoids a huge range of problems I encountered when I tried to implement the circuitry in the FPGA with two clocks and latches. Too much of the i4004 depends on the inertia in floating input signals for a direct implementation.
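The clock-enable arrangement can be modeled in a few lines. This is a behavioral sketch with invented names, not the Verilog: two flip-flops share one system clock, and the CLK2 and CLK1 enables decide which of them updates on a given edge.

```python
class TwoPhaseStage:
    """Two D flip-flops on one system clock, gated by CLK2 and CLK1 enables."""

    def __init__(self):
        self.captured = 0  # updated on edges where the CLK2 enable is high
        self.output = 0    # updated on edges where the CLK1 enable is high

    def system_clock_edge(self, data_in, clk1_en, clk2_en):
        # Compute both next values from the current state first, so every
        # flip-flop sees the values from before the edge.
        next_captured = data_in if clk2_en else self.captured
        next_output = self.captured if clk1_en else self.output
        self.captured, self.output = next_captured, next_output
        return self.output


stage = TwoPhaseStage()
trace = [
    stage.system_clock_edge(1, clk1_en=False, clk2_en=True),   # capture 1
    stage.system_clock_edge(0, clk1_en=False, clk2_en=False),  # idle, input ignored
    stage.system_clock_edge(0, clk1_en=True, clk2_en=False),   # publish 1
    stage.system_clock_edge(0, clk1_en=False, clk2_en=True),   # capture 0
    stage.system_clock_edge(0, clk1_en=True, clk2_en=False),   # publish 0
]
print(trace)  # [0, 0, 1, 1, 0]
```

Because nothing changes between system clock edges, an input that moves while neither enable is high has no effect, which is the property the latch-based attempts lacked.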

By the time I was done I had all the appropriate timing control outputs in place. I first simulated the design, then loaded it into the real Spartan-3E for a look-see with my logic analyzer. Both results match what the i4004 simulator produces, which gives me a sense of confidence.

In order to fully test the Instruction Pointer board I'll need to properly emulate a number of Instruction Decode and ALU board outputs, but for initial testing it should be sufficient to have the correct static values asserted on these lines.

Friday, September 7, 2012

It's HERE! It's HERE!

My local FedEx depot isn't far from where I'm working, so I went over after work last night to pick up my package. As with the last order I placed with PCB-Pool, the boards were shrink-wrapped to a stiffening card and packaged in a very sturdy cardboard box. This avoids problems with damage en route. I'd also ordered the stainless steel solder paste stencils and they were taped securely to the stiffening card.

To avoid having to rush the production of a replacement board if one turns up defective, a board house may choose to make extra copies of a board. Normally these are discarded, but one of the PCB-Pool options is to agree to buy these extra boards at half price. I'm sure that keeps their overhead costs down and is a benefit if you can make use of extra boards.

In my case, this is the first time I've attempted to build something so dense, or with this many components. It's also the first time I've had hot air soldering equipment available, and there's a reasonable chance I could damage the board. I decided to agree to buy one extra board, if they made an extra. They did, and both passed the electrical test, so I have two boards.

Whenever I make a PCB, I worry that I'm going to get the holes too small or pads the wrong size, and I'll end up with an expensive piece of trash. The evening I placed the order I found that the holes for the connector pins specified in the Eagle library would be a very tight fit. I understand this would be good for a production run where you don't want the connector to rattle around and get out of alignment, but not so good for a hobbyist. To avoid a potential problem I made the holes bigger. I know from experience that this connector will fit nicely in an 0.042" hole so that's what I specified, but since PCB-Pool uses metric drills the holes are probably even bigger than that. Practically the first thing I did after ooohing and aaaahing over the boards was to fit the connectors into the holes. A loose fit, but far better than the alternative. Maybe I'll tighten it up a bit on the next boards. Or not.

My next concern was the shape, size, and positioning of the components on the top of the board. Here's a photo taken through my microscope, showing two SOT-23s pretending to be FDV301s, a real BSS83, and an 0603 resistor sitting on the board. They're all slightly out of alignment -- I must have bumped the board while I was fiddling with the camera -- but it looks like everything fits right. I still think there will be room for me to get my soldering iron between the components, but I don't think I'd want to make things more dense.

In case you're wondering, those are 8 mil (~0.2mm) tracks with 12 mil (~0.3mm) vias. You'll also note that I've placed the vias within the component pads, rather than separate. This allows for a denser layout, but I understand it can cause problems with solder starvation when doing production reflow soldering. Since my problem is usually too much solder, that doesn't worry me. The next couple of days will show whether it should.
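For anyone converting between the mil dimensions used here and a metric board house's tooling, the arithmetic is a single multiplication. A small sketch (the helper name is made up for the example):

```python
MM_PER_MIL = 0.0254  # one mil is 0.001 inch, and an inch is 25.4 mm


def mils_to_mm(mils: float) -> float:
    return mils * MM_PER_MIL


for label, mils in [("track", 8), ("via", 12), ("0.042 in hole", 42)]:
    print(f"{label}: {mils} mil = {mils_to_mm(mils):.3f} mm")
```

This gives about 0.203 mm for the tracks and 0.305 mm for the vias, in line with the rounded figures quoted above.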

If you click on the picture to enlarge it, you may be able to see the divot in each pad. That's the mark made by the electrical test machine, as it probes each pad to verify electrical connectivity.

So the boards look great. I have all the parts, and they appear to fit properly. The only question now is what the hell was I thinking? There are 585 parts on this frelling board, and three and a half more equally-dense boards yet to come. It'll take weeks just to get all the parts soldered down, longer if you count the testing I need to do along the way. I'll be at this for a year.

It's insanity.

Hence the name of this blog.

Thursday, September 6, 2012

PCB Fab, Teil drei

My phone is set to speak the name of an incoming caller, or the number if it's not in my phonebook. When it announced an incoming call starting with 353, I was a bit confused because I couldn't remember ever hearing of that area code, and when the number of digits in the number exceeded 10 I was really confused. As it turns out, there isn't an area code 353 in the USA; it's the country code for Ireland.

The call was from a pleasant young woman from Beta Layout, the folk behind PCB-Pool, calling to let me know that they had indeed shipped my board Wednesday, and that I should expect to receive it Friday. This made me happy on several levels: it's great customer service, especially when you consider that I'm a hobbyist and unlikely to place an order for 10,000 boards anytime soon; it means that I'll be able to spend far too much time playing mad scientist in my basement this weekend; and like many Americans, I find hearing from a woman with a pleasingly foreign accent to be a day-brightener.

Lastly, her call reminded me of the not quite a week I spent in Ireland in 1996. The company I was working for at the time had a customer service office in Shannon, and when I had to visit Portugal on other business I made a case for visiting Shannon to give them some training and a chance to interact directly with a product developer. They, of course, insisted on taking me out to the local pub in the evenings for a pint or two. When I explained that Guinness wasn't to my tastes, they insisted that the problem was that it didn't travel well and it was far better nearer the source. I gave it a fair trial, and they celebrated my open-mindedness even if I still couldn't stomach the stuff. It was a great experience and I plan to visit the country again.

Tracking the package containing my board through the FedEx website shows that since yesterday afternoon it's taken a scenic tour of Shannon, Cork, Paris, New Jersey, and Virginia. And since I wasn't expecting it until tomorrow I didn't leave a note waiving the signature requirement, and the delivery attempt they just made failed because I'm at work. *sigh*

Wednesday, September 5, 2012

Voltage protection, not conversion

While the series resistors would probably serve the function of limiting the input voltage to the FPGA to a non-damaging level, I'm really not happy with that solution. If the i4004 boards are powered before the FPGA, the resistors need to drop about 4 volts instead of 1.7, and that would drive the protection diodes pretty hard. The back-flow of current into the Vcco supply could disrupt the required power sequencing, and all hell could break loose. I really don't want to damage the Spartan-3E reference board.

While re-reading Xilinx Application Note XAPP459, I noted that they mention the use of "FET switches" for level conversion. I've done this with I²C buses, but had dismissed that option here for some reason. The obvious thing was to try it on the breadboard!

My test setup has the input of the FET driven by my trusty PIC, the FET's gate hard-wired to 3.3V, and the output connected to an FDV301 set up as an inverter. First I tried using a BSS83 with the substrate tied to ground. That didn't work too well, with a crisp 3.3V input signal showing up on the output as a somewhat ragged 1.6V signal. That pretty much vetoes running the boards at 3.3V as well.

Swapping in an FDV301, though, worked much better. Driving the drain with a signal between 4.5V and 6.0V results in a nice flat 3.0V output at the source with good rise and fall times and very little propagation lag. As the input drive drops the output level tracks down with about a 0.6V difference. Still, the FPGA considers anything above 2.0V as a valid "high" which leaves plenty of margin. This doesn't appear to adversely affect the input signal. Flipping things around to drive the source to 3.3V also gives a nice output at the drain with less than a half-volt loss, even when driving a 330pF load. If the i4004 circuits allow it (in other words, if they don't depend on the bus capacitance to store a level) I could add a pull-up resistor to 5V on the high side and get full-swing outputs.
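To put numbers on the margin, here is a rough sketch built only from the bench measurements above. The piecewise model (input minus about 0.6V, capped at 3.0V) is one reading of those measurements and an assumption, not a device model of the FDV301; the function names are hypothetical.

```python
V_IH_MIN = 2.0        # FPGA minimum valid "high" (from the post)
CLAMPED_HIGH = 3.0    # measured output with a 4.5 V to 6.0 V drive
TRACKING_DROP = 0.6   # measured input-to-output difference at lower drive

def shifted_level(v_in: float) -> float:
    """Rough output estimate: input minus the drop, capped at the clamp.

    ASSUMPTION: a simple piecewise fit to the measurements, nothing more.
    """
    return max(0.0, min(CLAMPED_HIGH, v_in - TRACKING_DROP))

def high_margin(v_in: float) -> float:
    """Headroom above the FPGA's minimum valid high level."""
    return shifted_level(v_in) - V_IH_MIN

for v_in in (6.0, 5.0, 4.5, 3.3):
    print(f"drive {v_in:.1f} V -> out {shifted_level(v_in):.2f} V, "
          f"margin {high_margin(v_in):+.2f} V")
```

Under these assumptions a normal 5V drive leaves about a volt of headroom, and the drive would have to sag to roughly 2.6V before the FPGA stopped seeing a valid high.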

What happens when the FPGA is powered down while the i4004 boards are powered up? Since the FET's gate is connected to the FPGA's +3.3V supply, the FET should stay turned off with no back-flow into the FPGA or supply. If the FPGA is powered up while the i4004 boards are powered down they'll see 3.3V but they can handle that: the FDV301 is the worst case with a Vgs of 8.0V maximum.

The joke's on me

When I was just starting layout I asked my friend Peter whether he thought I could consider using 0805 resistors, or if I should play it safe and use the larger 1206 size. During a rambling, friendly conversation he opined that 0805s would be fine, but he found hand-soldering 0603s wasn't really that difficult. He did advise me to stay away from the 0402s, not that I'd ever considered them.

As I worked on the initial DRAM array layout I hadn't yet decided to stick the vias in the corner of the pads. I was having trouble finding places for the vias and the 0805 capacitors seemed to take up a lot of room. Taking Peter's comments to heart I tried the layout with 0603s, and it made things fit much more nicely. Just to reassure myself that I wasn't being stupid, I dug out a dead board a client had given me, and convinced myself that this would work.

My long-delayed Digi-Key order finally arrived this afternoon. Along with the solder paste, it contained the passive components for this project. Eager to see what I'd gotten myself into, I took a roll of resistors, peeled back the plastic cover tape, and tried to remove the device with my fine tweezers. And found the pocket was too narrow. I turned the tape over to drop it onto a piece of paper and promptly lost sight of the thing. Eventually I noticed a rectangular dark spot and carefully centered it under my microscope.

Here's a photo I managed to shoot of the bugger through the microscope. This is an image I really suggest clicking on to enlarge. Those faint blue lines are the printed divisions on a sheet of graph paper. They're 0.1 inch apart. And you could put three of these things side-by-side in that square.

I guess I knew intellectually that 0603 meant that it's 0.06 by 0.03 inches. I'd even looked at some on a PCB, but handling them loose is a new experience. I feel like I need to wear a face mask lest I sneeze and blow it away (or worse, inhale it!).

When I was looking for a cheaper alternative to the BSS83, I considered the Diodes Inc. DMN26D0UT. But at 1.6mm x 0.8mm, I thought it was too small for easy handling. This thing is even smaller (1.52mm x 0.76mm), if only by a hair. The BSS83 and FDV301 transistors I settled on are twice this size.
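The size codes are simple enough to decode mechanically. This sketch assumes the usual imperial convention (first two digits are length, last two are width, both in hundredths of an inch); real parts follow nominal datasheet dimensions that can differ slightly, and the function name is made up for illustration.

```python
import math

MM_PER_INCH = 25.4

def imperial_code_to_mm(code: str) -> tuple[float, float]:
    """Decode a chip size code such as '0603' into (length_mm, width_mm).

    ASSUMPTION: digits are hundredths of an inch, per the common convention.
    """
    length_in = int(code[:2]) / 100
    width_in = int(code[2:]) / 100
    return length_in * MM_PER_INCH, width_in * MM_PER_INCH

for code in ("1206", "0805", "0603", "0402"):
    length_mm, width_mm = imperial_code_to_mm(code)
    print(f"{code}: {length_mm:.2f} mm x {width_mm:.2f} mm")

# How many 0603 parts fit side by side across one 0.1 inch graph-paper square?
across_square = math.floor(0.1 / 0.03)
print(f"0603 parts across a 0.1 inch square: {across_square}")
```

That reproduces the 1.52mm x 0.76mm figure and the three-to-a-square observation from the graph paper photo.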

Did I mention that Peter is an electrical engineer? He's one of the designers I worked with at a previous job. When he says these things aren't hard to work with, he's talking about something he does for a living, not as a hobby.

Still, all is not lost. The DMN26 has three tiny leads protruding from the sides. Each end of the resistor, by contrast, is a lead in itself, making them much easier to position and solder. There are only 87 resistors on this board, compared to 418 transistors. By putting a tiny dab of solder paste on each pad and then putting the device onto those, I should be able to hold the thing with one hand while I wield the soldering iron with the other. And they only cost a fraction of a cent each, so if I trash a few learning how to manage them, it's not a concern.

Most of all, it's a learning experience...

Tuesday, September 4, 2012

Delivery updates

My Digi-Key order finally reached my local post office and is "out for delivery" now. Also arriving today, a day early, is the prototyping board I ordered from Digilent.

I spoke with the folk at PCB-Pool about the delivery of my board. The board is very close to completion, and they think they can ship it tomorrow rather than Thursday, which should get it to me Friday. While not critical in any sense of the word, having it for the weekend rather than waiting until the following week would make me happy.

They also corrected my misunderstanding regarding the site making my board. The primary CAM work is done in Ireland, but boards are fabricated at shops in Germany and South Africa. They recently bought another shop in California, but that isn't running yet. My board is being fabbed in Germany, near Wiesbaden.

As if this isn't enough of a sign of the increasing globalization of commerce, the woman who called to discuss delivery had a pronounced Irish accent. It took a couple of seconds for my brain to adapt to the cadence of her speech, as I was expecting a northern California dialect based on the caller-id on my phone. I wonder if she was really in their Vacaville office, or if the call was routed through there to avoid the international calling rates?

Monday, September 3, 2012

Test jig preparation

One of the issues with using the Spartan-3E to drive the test jig is voltage: my i4004 boards are designed to run on +5.0V, while the FPGA uses +3.3V and is not 5V tolerant. The Spartan-3E's maximum input voltage is a hair over 4V, beyond which the circuits may suffer damage.

PCB Fabrication, part deux

Today is a national holiday in the USA, but in Europe it's just another work day. PCB-Pool has continued the fabrication of my PCB, and I have two new sets of pictures to share.

The first set shows the board after the application of the green solder mask and the white silkscreening. These days the printing is applied by something resembling an ink-jet printer instead of a silkscreen, but the name lives on.

Top

Bottom

This step is labeled as "UV Curing", which I guess refers to the use of ultraviolet light to harden the coatings. You'll note there is no silkscreening on the bottom of the board; I've only placed two components there -- a pair of bypass capacitors directly under the push-pull row select drivers -- and I didn't see the benefit.

One of the things I wanted to do with the silkscreening is make it easy to identify the groups of components that make individual bits in the DRAM array. If you look closely at the top of the board you'll see dashed lines running horizontally between the rows of the array. What you won't see are the dashed lines running vertically between the columns of the array; they were too close to the component pads, and only little bits of the lines survived the clipping process. If I'd made them solid instead of dashed more of them would have survived. Oh well, something to fix on the next board -- this is intended as a learning process.

At this point the pads where the surface-mount components will be mounted are still bare copper, which would rapidly begin to oxidize and become difficult to solder. To prevent that, the exposed copper areas get a thin coating of tin.

Top

Bottom

Hey, it looks like a real PC board, doesn't it? In fact, the only thing left is cutting the board from the panel. Somewhere along the line they have performed, or will perform, an electrical test to make sure everything is properly connected. With a multi-layer board, this is an important step.

I'm really hoping to get the board in time to work on it this weekend. The original estimated shipping date is Thursday, but given that the board looks to be complete I'm hoping they'll ship it a day or two early.

Sunday, September 2, 2012

Creating a test jig

Up until now I've done my breadboard testing using a Microchip PIC® microcontroller to generate the signals. This works well for simple patterns, but even clocking the 16F876 at its maximum clock rate of 16 MHz I can just barely make documented i4004 timing. There are faster PICs, but there's a better solution.

One of my past acquisitions is this nifty little Xilinx Spartan-3E FPGA reference board from Digilent:

The Spartan-3E is more than capable of keeping up with the clock rates I need. In fact, one of my lesser problems is that the on-chip PLLs and digital clock managers can't generate clocks slow enough to drive the i4004 CPU, and I've had to resort to dividing the on-board 50 MHz clock using fabric logic.
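Picking the divide ratio is a one-line calculation. The sketch below assumes a target of no more than 740 kHz, the maximum clock commonly quoted for the 4004; that figure is not stated in this post, so treat it as an assumption. It also ignores the fact that the real part wants a two-phase clock, and only finds the integer ratio.

```python
F_FABRIC_HZ = 50_000_000   # on-board oscillator (from the post)
F_TARGET_MAX_HZ = 740_000  # ASSUMPTION: commonly quoted 4004 maximum clock

def min_divisor(f_in_hz: int, f_max_hz: int) -> int:
    """Smallest integer divisor that keeps f_in / divisor at or below f_max."""
    return -(-f_in_hz // f_max_hz)  # ceiling division using integers only

divisor = min_divisor(F_FABRIC_HZ, F_TARGET_MAX_HZ)
f_out_hz = F_FABRIC_HZ / divisor
period_ns = 1e9 / f_out_hz

print(f"divide by {divisor}: {f_out_hz / 1e3:.1f} kHz, period {period_ns:.0f} ns")
```

With a 20 ns fabric clock period, an integer divider can only land on multiples of 20 ns, so the output ends up slightly slower than the assumed target rather than exactly on it.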

Obviously there's no space on the reference board for user-added circuitry. The connector visible on the right edge of the picture is for attaching expansion boards, but the connector pitch is not conducive to connecting to a standard perf board. Fortunately, Digilent sells an expansion board for just this purpose:

There is a 32 x 65 hole prototyping area in the middle of the board. An astute reader of this blog will note that the larger connector on my PCB has 32 x 2 pins on a 0.1" spacing, and the outer rows of pins of the two connectors are 6.3" apart. Sockets matching the IP board's connectors could be soldered to this board with plenty of room for level-shifting circuitry, allowing the FPGA to take the place of the rest of the i4004 circuitry. Quite a coincidence, eh? :-D
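The fit can be checked with a little grid arithmetic. This sketch assumes the prototyping area is on the same 0.1" pitch as the connectors, which the post implies but does not state outright; the variable names are only for illustration.

```python
PITCH_IN = 0.1                 # ASSUMPTION: proto area uses a 0.1 inch grid
AREA_ROWS, AREA_COLS = 32, 65  # prototyping area, in holes (from the post)
CONNECTOR_PINS_PER_ROW = 32    # larger connector is 32 x 2 pins
OUTER_ROW_SPAN_IN = 6.3        # distance between the outermost pin rows

# A span of N pitches touches N + 1 columns of holes.
pitches = round(OUTER_ROW_SPAN_IN / PITCH_IN)
cols_needed = pitches + 1
spare_cols = AREA_COLS - cols_needed
rows_fit = CONNECTOR_PINS_PER_ROW <= AREA_ROWS

print(f"columns needed: {cols_needed} of {AREA_COLS} ({spare_cols} spare)")
print(f"32-pin rows fit in {AREA_ROWS} rows: {rows_fit}")
```

So the two sockets span 64 of the 65 columns and the long connector uses all 32 rows, which is about as snug a fit as a grid allows.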

The only problem with this scheme is that, although the expansion connector has 100 pins, almost half are grounds. Only 43 signal pins come across, and five of those are input-only. There are 71 inter-board signals, so clearly I can't connect every inter-board signal to an FPGA pin. The 43 pins are just enough to substitute for the Timing and I/O board, which is the last board I plan to build. But I'll start with a setup that allows me to drive the Instruction Pointer and Scratch Pad boards.
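The pin budget is worth writing down explicitly. All of the counts below come from the paragraph above; only the variable names are invented.

```python
CONNECTOR_PINS = 100       # total pins on the expansion connector
SIGNAL_PINS = 43           # pins that actually carry FPGA signals
INPUT_ONLY_PINS = 5        # signal pins that cannot drive outputs
INTER_BOARD_SIGNALS = 71   # signals running between the i4004 boards

non_signal_pins = CONNECTOR_PINS - SIGNAL_PINS       # grounds, power, etc.
bidirectional_pins = SIGNAL_PINS - INPUT_ONLY_PINS   # usable as in or out
shortfall = INTER_BOARD_SIGNALS - SIGNAL_PINS        # signals left unconnected

print(f"non-signal pins:     {non_signal_pins}")
print(f"bidirectional pins:  {bidirectional_pins}")
print(f"signals left over:   {shortfall}")
```

Even counting the input-only pins, 28 of the inter-board signals have no FPGA pin to land on, which is why the jig has to be configured for a subset of the boards at a time.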

PCB Fabrication

PCB-Pool has a fun feature called WATCH"ur"PCB® in which they email you photos of your board at various stages of fabrication. They claim it's to let you make sure the board is being prepared properly, but I think it's just eye candy for hobbyists. Of course I signed up for it!

Any of the pictures can be enlarged by clicking on it.

The first peek at your board-to-be is a PDF with four images of the board. These depict the top mask and silkscreen, the bottom mask and silkscreen, the top copper layer, and the bottom copper layer:

The images in this picture are shown looking through the board from the top side. This is the same view presented by Eagle during board layout. The rest of the pictures in this post are photographs of the actual board in production, so naturally the bottom is viewed from the bottom.

For some reason they don't give you images of the inner layers. This is somewhat annoying, especially since this is my first 4-layer board and I don't happen to have ready access to an X-ray machine to visualize them after the board arrives. Since I've placed a lot of vias in the corners of SMD pads I want thermals for inner layers, and I'm just going to have to trust that I selected the proper option (Eagle shows that I did, but it'd be nice to see it from PCB-Pool too).

The next step is drilling. There are only three sizes of holes on this board: four large mounting holes in the corners, 104 mid-sized holes for the connector pins, and a gazillion teeny-tiny vias.

Top

Bottom

After drilling comes the application of the photo resist. If I understand the process correctly, a photosensitive laminate is applied to the board, exposed and cured. This leaves the areas to be protected from etching exposed.

Top


Bottom

Tin is then applied to the exposed areas to protect the copper from the etchant. After etching the tin is removed, leaving bare copper.

Top

Bottom

My understanding is that the round dots surrounding my layout are there to balance the chemical copper etching process, and that they sit in areas of the panel that will be discarded. In any case, they roughly outline my board, but are not part of it.

That's the last of the photos I've received to date. The board must still be coated with a solder mask, silkscreened, and protected from oxidation by the application of chemical tin. This board is just too densely packed to allow for each component's name to be screened on, so I chose to have only component outlines and circuit blocks printed.