A few months ago I designed, coded, and started testing a Verilog module to emulate the EP102 printer found in the Busicom 141-PF calculator. I wrote a bit about it in the post "EP-102 to M-32TL Glue logic".
The EP102 is a drum printer capable of printing one of 13 characters in each of 17 columns. When a row of characters is properly aligned for printing, the printer sends a "sector" signal to the computer. To allow the printer to identify which row is aligned for printing, a second "index" signal is sent when the first row is aligned. To print a given character, the computer waits until it sees the sector signal indicating that character is aligned for printing, then sends a signal to the printer that causes a mechanical hammer to press the paper against the drum. It thus takes one full rotation of the drum to print all the columns of that line.
My first task was to code a module that generates the print drum index and sector signals for the calculator, while tracking the active sector number. Since the sector number ranges from 0 to 12, it's represented by a 4-bit value.
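To make the sector/index relationship concrete, here is a rough Python model of the signals the drum module produces. It's a software sketch, not the Verilog module. `drum_pulses` and `DrumPulse` are hypothetical names, and drum timing and pulse widths are ignored. Each yielded item stands for one sector pulse, with index asserted alongside row 0. The largest sector number (12) needs exactly 4 bits.

```python
from typing import Iterator, NamedTuple

NUM_SECTORS = 13  # drum rows 0..12; the largest value, 12, fits in 4 bits

class DrumPulse(NamedTuple):
    sector: int   # active sector number tracked alongside the pulse
    index: bool   # asserted only when the first row (sector 0) is aligned

def drum_pulses(rotations: int) -> Iterator[DrumPulse]:
    """Yield one sector pulse per aligned row for the given number of rotations."""
    for _ in range(rotations):
        for sector in range(NUM_SECTORS):
            yield DrumPulse(sector, index=(sector == 0))

pulses = list(drum_pulses(2))
print(len(pulses))                                   # 26
print([i for i, p in enumerate(pulses) if p.index])  # [0, 13]
print(max(p.sector for p in pulses).bit_length())    # 4
```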
My next task was to code a module representing a single print column. It has a 4-bit register that captures the active print drum sector number when that column's hammer is fired, and a 1-bit register that records whether that column has been printed or left blank. Since there are 17 printable columns, this module is instantiated 17 times (actually 18, as I added a "column" to indicate whether to print in black or red).
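A Python sketch of one column's state follows. It assumes the hammer fire input is simply sampled on each clock. The class and method names are hypothetical, and clearing the registers between lines isn't modeled.

```python
from dataclasses import dataclass

@dataclass
class PrintColumn:
    """One print column: a 4-bit sector register plus a 1-bit printed flag."""
    sector: int = 0        # drum sector captured when the hammer fired
    printed: bool = False  # False means the column stays blank on this line

    def clock(self, hammer_fire: bool, active_sector: int) -> None:
        """On each clock, latch the active sector if this column's hammer is firing."""
        if hammer_fire:
            self.sector = active_sector & 0xF  # register is only 4 bits wide
            self.printed = True

# 17 printable columns, plus one extra "column" for the black/red selection.
columns = [PrintColumn() for _ in range(18)]
columns[3].clock(hammer_fire=True, active_sector=7)
print(columns[3])  # PrintColumn(sector=7, printed=True)
```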
The next issue is that the EP102 printer's drum has the characters in a different order than the M-32TL printer in the Canon P170-DH calculator. To fix this, the sector number is translated from EP102 codes to M-32TL codes when it's clocked into the register.
To get the data from the EP102 emulation to the M-32TL driver, I connected the column modules so they formed a 5-bit wide shift register. When the paper feed signal is received, the contents are shifted one column at a time into a FIFO from which the M-32TL printer driver reads.
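The shift-out can be modeled in a few lines of Python. This sketch assumes each stage holds a 5-bit word, with the printed flag in bit 4 and the sector in bits 3..0. That bit layout and the helper names `pack` and `shift_out` are assumptions for illustration, not necessarily how the Verilog arranges them.

```python
from collections import deque

def pack(printed: bool, sector: int) -> int:
    """Combine a column's registers into one 5-bit word: printed flag in bit 4,
    sector number in bits 3..0 (an assumed layout)."""
    return (int(printed) << 4) | (sector & 0xF)

def shift_out(stages: list[int], fifo: deque) -> None:
    """Clock the chain once per stage; on each clock the stage at the output end
    is pushed into the FIFO and every other word moves one stage closer."""
    for _ in range(len(stages)):
        fifo.append(stages[-1] & 0x1F)  # output-end stage feeds the FIFO
        stages[1:] = stages[:-1]        # shift the rest of the chain along
        stages[0] = 0                   # an empty word enters at the input end

stages = [0] * 18                       # 18 columns, all blank
stages[0] = pack(True, 5)               # column at the input end printed sector 5
stages[17] = pack(True, 12)             # column at the output end printed sector 12
fifo: deque = deque()
shift_out(stages, fifo)
print(list(fifo))
```

In this model the column at the output end of the chain reaches the FIFO first, so the reader has to know the chain order to reassemble the line.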
As I moved from experimentation to implementation, I realized that I really wanted to send EP102 sector codes through the FIFO rather than M-32TL codes. This meant moving the translation from the EP102 emulator to the M-32TL driver, and that led me to rethink my strategy.
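With the translation moved to the driver, it reduces to a 13-entry lookup performed as each word is read from the FIFO. The table below is only a placeholder (a simple reversal) because the actual EP102-to-M-32TL character correspondence isn't listed in this post. Only the structure of the lookup is meant to carry over, and `translate` and `drain_fifo` are hypothetical names.

```python
# HYPOTHETICAL placeholder: the real EP102 -> M-32TL mapping is not given here,
# so this table only illustrates the shape of the lookup.
EP102_TO_M32TL: tuple[int, ...] = (12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

def translate(ep102_sector: int) -> int:
    """Translate an EP102 sector code read from the FIFO into an M-32TL code."""
    if not 0 <= ep102_sector < len(EP102_TO_M32TL):
        raise ValueError(f"EP102 sector out of range: {ep102_sector}")
    return EP102_TO_M32TL[ep102_sector]

def drain_fifo(fifo_words: list[int]) -> list[int]:
    """Driver side: the FIFO carries raw EP102 codes; translation happens only here."""
    return [translate(w) for w in fifo_words]

print(drain_fifo([0, 5, 12]))  # [12, 7, 0] with the placeholder table
```

With this arrangement the emulator side deals only in EP102 codes, and the table lives in a single place on the driver side.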
I also realized that my original implementation of the EP102 emulation was going to be a bit of a resource hog. Each of the column modules required 6 LUTs and 5 flip-flops, and there were 18 of these modules. With the other required logic this totaled 117 LUTs and 94 FFs, occupying 42 slices. Okay, so 3% of the 1,430 available slices makes it a piglet rather than a hog, but this just seemed to cry out for an implementation using a LUTRAM rather than individual FFs.
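The resource figures above are self-consistent, as a quick calculation shows. Eighteen columns at 6 LUTs and 5 FFs each account for 108 LUTs and 90 FFs, leaving 9 LUTs and 4 FFs for the other logic. The 5 FFs per column correspond to the 4-bit sector register plus the 1-bit printed flag.

```python
COLUMNS = 18
LUTS_PER_COLUMN = 6
FFS_PER_COLUMN = 4 + 1       # 4-bit sector register + 1-bit printed flag
TOTAL_LUTS, TOTAL_FFS = 117, 94
USED_SLICES, AVAILABLE_SLICES = 42, 1430

column_luts = COLUMNS * LUTS_PER_COLUMN   # 108
column_ffs = COLUMNS * FFS_PER_COLUMN     # 90
other_luts = TOTAL_LUTS - column_luts     # 9 for the other required logic
other_ffs = TOTAL_FFS - column_ffs        # 4
utilization = USED_SLICES / AVAILABLE_SLICES
print(f"{utilization:.1%}")               # 2.9%
```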
The reason I'd implemented the first version as a shift register was that the EP102 emulator is essentially a parallel-to-serial converter: the firing of the hammers is a parallel action, and the transfer of the data from the column registers to the FIFO is serial. A LUTRAM, on the other hand, can only be written one location at a time, so the parallel hammer events have to be captured serially.
The key to solving this is speed. The hammer fire signals are intended to drive electro-mechanical solenoids and are active for 5ms. Compared to the 50ns period of the 20MHz system clock, that's an eternity: 100,000 clock cycles. It would take 17 clock cycles to scan the 17 hammer fire signals, leaving 99,983 cycles before the signals go inactive. If I were worried about power consumption, I might add logic to scan the hammer signals only once when they go active, but to keep things simple I let it scan the 17 hammer lines continuously. Yes, that means an active column gets scanned 5,882 times in a 5ms period, but who's counting?
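The cycle budget and the scanning idea can be checked with a small Python model. `scan` is a hypothetical name, and the model makes simplifying assumptions. One hammer line is examined per clock, and an active line causes a single 4-bit write to that column's RAM location. Only the 17 hammer lines are modeled, not the extra color column. How blank columns are marked in the RAM isn't described here, so unwritten locations simply keep their initial value.

```python
CLOCK_HZ = 20_000_000   # 20 MHz system clock -> 50 ns period
HAMMER_ACTIVE_MS = 5    # hammer fire pulse width
NUM_HAMMERS = 17        # printable columns scanned

cycles_active = CLOCK_HZ * HAMMER_ACTIVE_MS // 1000   # clock cycles in one 5 ms pulse
cycles_spare = cycles_active - NUM_HAMMERS             # left after one full scan
scans_per_pulse = cycles_active // NUM_HAMMERS         # times each line is visited

def scan(hammers: list[bool], active_sector: int, ram: list[int],
         cycles: int, start: int = 0) -> int:
    """Visit one hammer line per clock; if it is active, write the current sector
    into that column's RAM location. At most one location is written per clock,
    as with a single-port LUTRAM. Returns the scan counter's next position."""
    col = start
    for _ in range(cycles):
        if hammers[col]:
            ram[col] = active_sector & 0xF   # 4-bit RAM word
        col = (col + 1) % len(hammers)       # free-running scan counter
    return col

print(cycles_active, cycles_spare, scans_per_pulse)  # 100000 99983 5882

# Two hammers fire while sector 6 is aligned; one full scan captures both.
hammers = [False] * NUM_HAMMERS
hammers[2] = hammers[9] = True
ram = [0] * NUM_HAMMERS
next_col = scan(hammers, active_sector=6, ram=ram, cycles=NUM_HAMMERS)
```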
The new implementation uses only 7 slices instead of 42: 18 LUTs as logic, 4 LUTs as RAM, and 8 flip-flops. It does, however, make the design dependent on the FPGA being able to use its LUTs as small RAMs. These are common in Xilinx FPGAs and available in some Lattice FPGAs (MachXO2 and XP2, for example, but not the iCE40), but are not available in many Intel (Altera) FPGAs, such as the Cyclone IV.
Out of curiosity, I disabled the use of LUTRAM by adding a ram_extract="no" constraint to the RAM array definition. This resulted in more LUTs and slices than my shift register implementation needed, though fewer flip-flops: 120 LUTs and 81 FFs, occupying 48 slices. This is actually worse than it seems because my shift register implementation was 5 bits wide rather than the 4 bits of the RAM implementation. Since I've already committed to using Xilinx FPGAs for this project, I'm going to go forward with the RAM-based implementation, but I'll probably keep the shift register implementation around for reference.