Friday, September 7, 2018

Every Now, And Again (why my CPU isn't working)

It can run, but it can't hide (or jump, or fetch)


If my FPGA-based 6502 can "free run" (described in an earlier post), why can't it run even a simple program?

Short answer

My theory is currently this - from the time the CPU indicates what memory address it wants to access, and when the result is actually available to the CPU, there is a delay (latency). That delay is messing up memory accesses, and confusing the CPU.

Long answer

OK, you asked for it. Let's start with synchronous versus combinational circuits.

Every now, and again (quoted from Ben Bailey, with apologies...)


Some digital circuit components and circuits do what they do instantly. As an example, let's take the venerable AND gate. The 2 input AND gate has, as the name implies, two inputs, and one output. When both of the inputs are ON, the output of the AND gate is ON. When one or both of the inputs are OFF, the output of the AND gate is OFF.

Now, how often does the AND gate check its inputs? Well, now. And now. And every now. Basically, the AND gate, always vigilant, watches its inputs and provides the appropriate output instantaneously. What a trooper.

(Some of you who know physics and stuff might be growling at me now about things like ramp time, propagation delay and other practical phenomenon that prevent an AND gate from being truly instantaneous. Disclaimer: I am describing an idealized, abstracted, super-relativistic AND gate...)

If you wire the output of one AND gate into an input of another gate (like an OR gate), the output of the AND gate also changes the state of the OR gate instantly(-ish). Circuits wired like that are called combinational, or "time independent".

To the beat of the drum


As you either know or have guessed, not all digital circuits work like this. A quote that has been (perhaps mistakenly) attributed to Albert Einstein is that "Time is what keeps everything from happening all at once." In synchronous digital circuits, a clock is what keeps everything from happening all at once. More to the point, a clock is how various operations of the circuit maintain synchronization with each other. You could say the clock is like the drum major in a marching band, keeping everyone stepping in sync.

An example of a simple synchronous circuit is a D-style flip flop. In this circuit, there is an input (D), an output (Q), and a clock. (There can be some more I/Os, but ignore those for now...) Whenever the clock "ticks", the value currently at the input of D is copied to Q, and remembered there. D could change a hundred times between clock ticks, and Q wouldn't change - Q only remembers what the value of D was at the last clock tick. It's like a little one-bit storage location.

Just like a combinational circuit with multiple connected gates, synchronous components like the flip flop can be daisy-chained into more complex circuits. Therefore, some operations performed in a synchronous circuit can take more than one clock cycle to propagate. For example, the output of flip-flop A may be connected to the input of flip-flop B, and B to C, and so on. By the time you get to flip-flop Z, a change in A will have taken nearly 30 clock cycles to reach flip-flop Z.

RAM memory access can be like this. During one clock cycle, you might place the address of the memory chunk (e.g., byte) that you want to read on the address lines. Later, in a subsequent clock cycle, the data at that address is available to be read on the data lines. How many clock cycles before the data is available? This depends on the design. It can be anywhere from one clock cycle to many.

In the case of the Block RAM (BRAM) on my FPGA, the latency is 2 clock cycles. But the 6502 CPU core I'm using expects the data to be ready in 1 clock cycle. Therefore, the CPU is getting bogus data, and things are going awry.

CPU, hold your horses


My guess is that this RAM latency issue is truly the culprit. The CPU free-runs OK, which shows that it's operating when it has valid values on the data lines.

But assuming I'm right, how do I fix it?

I'm not sure about the best way to handle this. In my Google searches, I've found discussions of:

  • Using mixed clock "domains" (where the RAM is clocked at a higher, fixed multiple of the CPU clock so it gets data to the CPU on time),
  • Using an input line on the 6502 called "RDY", which pauses the CPU until memory is ready, and,
  • Other stuff that I don't quite understand.

One of the interesting aspects of this project is that I'm flailing in the deep end of the pool. I've found that this is often where I learn things.

No comments: