Like the mythical Phoenix that rises from its own ashes, this project has taken wing.
While writing the last post I started to say, "The possible causes are too numerous to itemize." After posting it, I realized the failure was so repeatable that the cause was most likely in my soft implementation of the 4001 ROM, which I'd hacked together rather quickly. I really intended to leave it like that, but my brain wouldn't stop thinking about it.
One of the hacks in the soft 4001 code was to drive the external data bus only during the portions of the M12 and M22 phases when CLK2 was asserted, since I knew that was when the soft 4004 sampled its internal data bus. That's a far narrower window than a real 4001 would drive. After some research I concluded that I could safely expand it to the entire M12 and M22 phases, which is what I believe a real 4001 would do. That worked OK in simulation (i.e. no invalid states with multiple drivers), so I had to try it with the real IP board.
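The "no invalid states with multiple drivers" check amounts to confirming that, at every sample, at most one device is driving the shared data bus. A minimal Python sketch of that idea follows. The trace format and the driver names `cpu_4004` and `rom_4001` are hypothetical, and this is not the project's Verilog testbench.

```python
from typing import Dict, List

# Hypothetical trace format: one dict per sample, mapping each
# potential bus driver to whether its output enable is asserted.
Sample = Dict[str, bool]


def find_contention(trace: List[Sample]) -> List[int]:
    """Return the indices of samples where more than one driver is enabled."""
    conflicts = []
    for i, sample in enumerate(trace):
        drivers = [name for name, enabled in sample.items() if enabled]
        if len(drivers) > 1:
            conflicts.append(i)
    return conflicts


# Hypothetical example: the ROM's widened drive window starts only after
# the CPU has released the bus, so no sample has two drivers.
trace = [
    {"cpu_4004": True, "rom_4001": False},   # CPU driving the bus
    {"cpu_4004": False, "rom_4001": False},  # bus released
    {"cpu_4004": False, "rom_4001": True},   # ROM driving
    {"cpu_4004": False, "rom_4001": True},   # ROM still driving
]
print(find_contention(trace))  # []

# If the windows overlapped, the offending sample would be reported.
print(find_contention([{"cpu_4004": True, "rom_4001": True}]))  # [0]
```

Widening a drive window is safe exactly when a check like this still reports no conflicts over every phase of the machine cycle.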
And it worked! My logic analyzer has a limited buffer, so I was only able to capture the first 260us after POC de-assertion. That's a bit more than 16 machine cycles, sampled on the rising edge of the 20ns sysclk. During that period it executed two NOPs, a JUN, three consecutive JMSs followed by three consecutive BBLs (nested subroutine calls and returns, which test the EA counter and all four rows of the IP array), an LDM (load accumulator), and a JCN (jump conditional). That exercises most of the functions of the board, and I think that's pretty good!
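As a cross-check, the sketch below tallies instruction cycles for the captured sequence and models how three nested JMS calls walk through all four address rows. It is a simplified behavioral model, not the board's logic or the project's Verilog. It assumes a 2-bit EA counter selecting one of four 12-bit rows, it omits BBL's accumulator load, and the addresses are hypothetical.

```python
from typing import List, Set

# Instruction cycles per instruction: JUN, JMS and JCN are two-word
# instructions (two cycles); NOP, BBL and LDM are one-word (one cycle).
CYCLES = {"NOP": 1, "JUN": 2, "JMS": 2, "BBL": 1, "LDM": 1, "JCN": 2}

# The captured instruction sequence, in execution order.
program = ["NOP", "NOP", "JUN", "JMS", "JMS", "JMS",
           "BBL", "BBL", "BBL", "LDM", "JCN"]
total = sum(CYCLES[op] for op in program)
print(total)  # 16


class AddressStack:
    """Simplified model: four 12-bit address rows plus a 2-bit
    effective-address (EA) counter. The row selected by EA acts as the
    program counter; the rows below it hold return addresses."""

    def __init__(self, start_pc: int = 0) -> None:
        self.rows: List[int] = [start_pc & 0xFFF, 0, 0, 0]
        self.ea = 0
        self.rows_used: Set[int] = {0}

    @property
    def pc(self) -> int:
        return self.rows[self.ea]

    def jms(self, target: int) -> None:
        # Save the return address (just past the two-word JMS) in the
        # current row, then advance EA and load the target address.
        self.rows[self.ea] = (self.pc + 2) & 0xFFF
        self.ea = (self.ea + 1) & 0b11  # 2-bit counter: wraps after row 3
        self.rows[self.ea] = target & 0xFFF
        self.rows_used.add(self.ea)

    def bbl(self) -> None:
        # Step EA back; the row it now selects holds the return address.
        self.ea = (self.ea - 1) & 0b11


# Hypothetical addresses: the first JMS sits at 0x010, and each
# subroutine begins with the next JMS ("three consecutive JMS").
stack = AddressStack(start_pc=0x010)
for target in (0x100, 0x200, 0x300):
    stack.jms(target)
print(stack.ea, sorted(stack.rows_used))  # 3 [0, 1, 2, 3]
for _ in range(3):
    stack.bbl()
print(stack.ea, hex(stack.pc))  # 0 0x12
```

The cycle counts add up to 16, which is consistent with the "bit more than 16 machine cycles" in the capture. The stack model shows why three nested calls are a good test: at the deepest point the EA counter selects the fourth row, and the three BBLs step it back to row 0 at the return address just past the first JMS.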
Of course, this really should work, since the underlying design was in production for 15 years (1971-1986). But it also validates the work I put into selecting components, laying out the board, and placing and soldering its 500-some components, not to mention writing over a thousand lines of Verilog HDL. Not too shabby for a software engineer playing at electronics engineering.
Note that I haven't added any of the extra charge-storage capacitors I laid out on the board. Value storage relies purely on MOSFET gate capacitance, plus whatever small capacitance the PCB traces add. Apparently that's sufficient with a 1.36us cycle time. At some point I'll do some margin testing on the clock rate to see just how fast and how slow this will run.