Sunday, September 30, 2018

Where do we go from here

Initial Goal Reached - Stretch Goals TBD

My goal for the Retrochallenge 2018/09 was to get a retro CPU core running on my FPGA board.

Mission accomplished - I used:

  • The well-respected open-source Arlet 6502 core,
  • Verilog code to create 64K of RAM with pre-loaded contents,
  • Some other Verilog glue code to put those pieces together, and jump-start the 6502 with a reset pulse.

The pre-loaded contents, specified by a simple HEX file, made the 6502 perform a few easy operations (like loading an immediate value into the accumulator), then jump to an endless loop padded with NOPs. On start-up, the 6502 jumps to the address stored at $FFFC and $FFFD. I placed the address $0400 at those RAM locations, and the program itself at $0400.
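Here's a rough Python sketch of what a memory image like that looks like. The opcode bytes are the standard 6502 encodings, but the immediate value ($42), the number of NOPs, and the one-byte-per-line output layout are my assumptions for illustration, not a copy of the actual HEX file.

```python
# Sketch of a 64K memory image like the one described above.
MEM_SIZE = 0x10000       # 64K address space
PROGRAM_START = 0x0400   # where the program lives (from the post)
RESET_VECTOR = 0xFFFC    # low byte at $FFFC, high byte at $FFFD

LDA_IMM = 0xA9           # LDA #imm
NOP = 0xEA               # NOP
JMP_ABS = 0x4C           # JMP absolute


def build_image():
    """Return a bytearray holding the program and the reset vector."""
    mem = bytearray(MEM_SIZE)
    # LDA #$42 - the value $42 is an assumption for illustration
    program = [LDA_IMM, 0x42]
    # The endless loop starts right after the LDA
    loop_addr = PROGRAM_START + len(program)
    program += [NOP, NOP, NOP]  # padding (count is assumed)
    program += [JMP_ABS, loop_addr & 0xFF, loop_addr >> 8]  # little-endian target
    mem[PROGRAM_START:PROGRAM_START + len(program)] = bytes(program)
    # Reset vector points at the program, low byte first
    mem[RESET_VECTOR] = PROGRAM_START & 0xFF
    mem[RESET_VECTOR + 1] = PROGRAM_START >> 8
    return mem


def to_hex_lines(mem):
    """One two-digit hex byte per line (assumed file layout)."""
    return [f"{byte:02X}" for byte in mem]
```

Writing "\n".join(to_hex_lines(build_image())) to a file gives 65,536 lines, one per RAM location.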

Since I had no I/O, it was a challenge to validate the program was actually running. I was able to do this by watching the contents of the PC (program counter) using the built-in debugging tools of Xilinx Vivado, the development software used to program my FPGA.

Initial challenges, some of which were documented in my earlier blog entries, included the RAM not working right and my HEX file not being properly formatted. I did make it through those challenges, borrowing from work documented on the C64 on an FPGA blog.

However, my stretch goals were to expand that into something that looked like a retro computer. I didn't get very far down that road.

The 6502 chip uses memory-mapped I/O. To get data in or out of the 6502, I needed to modify the RAM interface so that certain addresses, instead of storing to or loading from actual RAM, would transfer data to or from some other circuitry. I did get this working just a little bit - I created a Verilog register and connected it to the 4 LEDs on the Zybo board. Then, I modified the RAM interface so that, if location hex E000 was written to, the output (lower 4 bits) would appear on the LEDs. This was actually pretty awesome, and worked well. But it's not very much I/O :-)
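The idea behind that address decode can be modeled in a few lines of Python. This is a behavioral sketch, not my Verilog: the class and method names are made up, and I'm assuming that a write to $E000 goes only to the LED register and not also to RAM.

```python
LED_ADDR = 0xE000  # address from the post


class Bus:
    """Toy model of a RAM interface with one memory-mapped LED register."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.leds = 0  # 4-bit register driving the LEDs

    def write(self, addr, value):
        addr &= 0xFFFF
        if addr == LED_ADDR:
            # Only the lower 4 bits reach the LEDs
            self.leds = value & 0x0F
        else:
            self.ram[addr] = value & 0xFF

    def read(self, addr):
        return self.ram[addr & 0xFFFF]
```

With this model, bus.write(0xE000, 0xA5) leaves bus.leds holding 0x5, while a write to any other address lands in RAM as usual.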

Where From Here

Learning digital design by doing has a lot of challenges, but you can learn lots of great stuff. One thing I learned is how FIFO (first-in, first-out) blocks can be useful in an FPGA design. You write info into the goes-in side of the FIFO, and another circuit reads that value from the goes-out side. The FIFO will tell you when it's empty (nothing to read) and when it's full (the data that's been written has filled the available storage of the FIFO). The goes-in and goes-out sides can use different clocks, meaning that the FIFO can help when you cross clock domains. This is important. I'm still learning more about what this means, but it's super helpful when two disparate digital circuits need to give data to each other.
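To make the empty and full flags concrete, here's a small software model of a FIFO. It's only a sketch: the names are made up, and it doesn't model the two-clock behavior at all, just the first-in, first-out ordering and the flags.

```python
class Fifo:
    """Fixed-depth FIFO with empty/full flags (single-clock toy model)."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [0] * depth
        self.wr = 0      # next slot to write (goes-in side)
        self.rd = 0      # next slot to read (goes-out side)
        self.count = 0   # number of stored items

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def push(self, value):
        """Write one item; returns False (and stores nothing) if full."""
        if self.full:
            return False
        self.slots[self.wr] = value
        self.wr = (self.wr + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        """Read the oldest item; returns None if empty."""
        if self.empty:
            return None
        value = self.slots[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return value
```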

So, for I/O, what I want to do is have two FIFOs, one for 6502 input, the other for output. If you write a byte to whatever I pick for the output memory address on the 6502, it will push that byte to the output FIFO. If you read a corresponding input address on the 6502, it will read the byte from the input FIFO. In both cases, the full or empty indicators of the FIFO would be checked as appropriate.
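Sketched as a behavioral model, the plan might look something like this. Everything specific here is hypothetical: I haven't picked addresses yet, so $E001, $E002 and $E003, the status bit layout, and the FIFO depth of 16 are placeholders, and a status register is just one possible way to expose the full and empty indicators.

```python
from collections import deque

# All of these values are hypothetical placeholders
TX_DATA = 0xE001   # write here to push a byte to the output FIFO
RX_DATA = 0xE002   # read here to pop a byte from the input FIFO
STATUS = 0xE003    # read here to check the FIFO flags
TX_FULL = 0x01     # status bit: output FIFO is full
RX_EMPTY = 0x02    # status bit: input FIFO is empty
DEPTH = 16         # assumed FIFO depth


class IoBus:
    """Toy model: RAM plus two FIFOs mapped into the address space."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.tx = deque()  # 6502 -> outside world
        self.rx = deque()  # outside world -> 6502

    def write(self, addr, value):
        if addr == TX_DATA:
            # Drop the byte if the output FIFO is full
            if len(self.tx) < DEPTH:
                self.tx.append(value & 0xFF)
        else:
            self.ram[addr] = value & 0xFF

    def read(self, addr):
        if addr == RX_DATA:
            # Return 0 if there is nothing to read
            return self.rx.popleft() if self.rx else 0
        if addr == STATUS:
            status = 0
            if len(self.tx) >= DEPTH:
                status |= TX_FULL
            if not self.rx:
                status |= RX_EMPTY
            return status
        return self.ram[addr]
```

A 6502 program would poll the status address before touching either data address, so it never writes to a full FIFO or reads from an empty one.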

On the other end of those FIFOs, I'd like to put an RS232 UART. This will enable the 6502 to send and receive via a standard serial connection. Bang, Bob's your uncle, I/O. It will take some Verilog glue logic to hook this all up, but when done, I don't see any reason it wouldn't work.

Even though Retrochallenge is over, I'm going to keep going with this. Now that I have a working 6502, I'm hungry for more.

Thanks for reading! Stay Tuned!

Saturday, September 8, 2018

That time when I didn't smash my FPGA board into small pieces

As the old sports intro goes, there is the thrill of victory, and the agony of defeat.

Trying to learn how FPGAs work is obviously feasible, and many have succeeded in doing so, even without being a trained EE. However, I'm finding that it's not for the faint of heart. In my last blog post, I mentioned that I like to learn things in the deep end of the pool. That end of the pool can be exhausting, and you might splash a lot of water without making much forward progress.

But as things go, just when you think you're sinking, some kind, sharing soul on the Internet throws you a life preserver.

My goal for the current Retrochallenge is to get a retro CPU core working on my FPGA board, and have it run a simple program. Since that's been done before many times, and most of the work has already been open-sourced, it shouldn't be a heavy lift. But each FPGA board and development toolchain is different. There are breaking changes between versions of the FPGA vendor's tools. And, even more so than in the software development world, documentation can be terse or non-existent.

Some details - I'm trying to get the following to work:
  • A Zybo development board (original, revision B) with a Zynq-7000 XC7Z010 chip,
  • The Vivado HLx toolset, used to create solutions for Xilinx FPGAs,
  • A 6502 core, written in Verilog by Arlet Ottens, and
  • A simple 6502 program running on the above, proving that the 6502 core is functional

And it turns out that, in my desperate Google searches, I found a project that does all of the above, and more.

From the work done on the "C64 on an FPGA" blog by Johan Steenkamp, I should be able to learn the necessary steps to get a 6502 running on my Zybo. The author not only does the work, but graciously teaches you how it works.

It's kind of cheating, but my goal is to learn, so I can build upon what I've learned. Good artists copy, great artists steal.

More to come...

Friday, September 7, 2018

Every Now, And Again (why my CPU isn't working)

It can run, but it can't hide (or jump, or fetch)

If my FPGA-based 6502 can "free run" (described in an earlier post), why can't it run even a simple program?

Short answer

My theory is currently this - between the time the CPU indicates what memory address it wants to access and the time the result is actually available to the CPU, there is a delay (latency). That delay is messing up memory accesses, and confusing the CPU.

Long answer

OK, you asked for it. Let's start with synchronous versus combinational circuits.

Every now, and again (quoted from Ben Bailey, with apologies...)

Some digital circuit components and circuits do what they do instantly. As an example, let's take the venerable AND gate. The 2 input AND gate has, as the name implies, two inputs, and one output. When both of the inputs are ON, the output of the AND gate is ON. When one or both of the inputs are OFF, the output of the AND gate is OFF.

Now, how often does the AND gate check its inputs? Well, now. And now. And every now. Basically, the AND gate, always vigilant, watches its inputs and provides the appropriate output instantaneously. What a trooper.

(Some of you who know physics and stuff might be growling at me now about things like ramp time, propagation delay and other practical phenomena that prevent an AND gate from being truly instantaneous. Disclaimer: I am describing an idealized, abstracted, super-relativistic AND gate...)

If you wire the output of one AND gate into an input of another gate (like an OR gate), the output of the AND gate also changes the state of the OR gate instantly(-ish). Circuits wired like that are called combinational, or "time independent".

To the beat of the drum

As you either know or have guessed, not all digital circuits work like this. A quote that has been (perhaps mistakenly) attributed to Albert Einstein is that "Time is what keeps everything from happening all at once." In synchronous digital circuits, a clock is what keeps everything from happening all at once. More to the point, a clock is how various operations of the circuit maintain synchronization with each other. You could say the clock is like the drum major in a marching band, keeping everyone stepping in sync.

An example of a simple synchronous circuit is a D-style flip flop. In this circuit, there is an input (D), an output (Q), and a clock. (There can be some more I/Os, but ignore those for now...) Whenever the clock "ticks", the value currently at the input of D is copied to Q, and remembered there. D could change a hundred times between clock ticks, and Q wouldn't change - Q only remembers what the value of D was at the last clock tick. It's like a little one-bit storage location.
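If it helps, here's the same behavior as a tiny Python model. It's a sketch with made-up names, and it ignores the extra I/Os mentioned above: D can be changed freely, but Q only updates when tick() is called.

```python
class DFlipFlop:
    """One-bit storage: Q takes the value of D only on a clock tick."""

    def __init__(self):
        self.d = 0  # input, free to change at any time
        self.q = 0  # output, remembered between ticks

    def tick(self):
        # On the clock tick, copy whatever is on D right now to Q
        self.q = self.d


ff = DFlipFlop()
ff.d = 1
ff.d = 0
ff.d = 1       # D wiggles around; Q hasn't moved yet
ff.tick()      # now Q captures the current D
ff.d = 0       # changing D again doesn't affect Q until the next tick
```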

Just like a combinational circuit with multiple connected gates, synchronous components like the flip flop can be daisy-chained into more complex circuits. Therefore, some operations performed in a synchronous circuit can take more than one clock cycle to propagate. For example, the output of flip-flop A may be connected to the input of flip-flop B, and B to C, and so on. By the time you get to flip-flop Z, a change at A's output will have taken about 25 clock cycles to show up at Z's output.
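A quick sketch makes the counting concrete. Treating A through Z as a chain of 26 flip-flops, this toy model (the function name is made up) counts how many ticks it takes for a change on the first output to show up on the last one. It assumes the first flip-flop keeps holding its new value the whole time.

```python
def ticks_to_reach(n_flops):
    """Ticks needed for a change on the first flop's output to reach the last."""
    q = [0] * n_flops
    q[0] = 1  # the first flop's output has just changed
    ticks = 0
    while q[-1] == 0:
        # On a tick, every flop captures its predecessor's old output
        # at the same moment; the first flop holds its value (assumed).
        q = [q[0]] + q[:-1]
        ticks += 1
    return ticks


a_to_z = ticks_to_reach(26)  # 26 flip-flops, A through Z
```

For a chain of n flip-flops the answer comes out to n - 1 ticks, since each tick moves the change along by exactly one stage.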

RAM memory access can be like this. During one clock cycle, you might place the address of the memory chunk (e.g., byte) that you want to read on the address lines. Later, in a subsequent clock cycle, the data at that address is available to be read on the data lines. How many clock cycles before the data is available? This depends on the design. It can be anywhere from one clock cycle to many.

In the case of the Block RAM (BRAM) on my FPGA, the latency is 2 clock cycles. But the 6502 CPU core I'm using expects the data to be ready in 1 clock cycle. Therefore, the CPU is getting bogus data, and things are going awry.
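Here's a toy model of that mismatch. It's a sketch of my theory, not of the actual BRAM: the class name, the pipeline structure, and the sample contents are all assumptions. The "CPU" presents a new address on every tick and reads the data lines right after the tick, which is what a 1-cycle design expects.

```python
class SyncRam:
    """Synchronous RAM whose read data appears `latency` ticks after the address."""

    def __init__(self, contents, latency):
        self.contents = contents
        # Pipeline registers; the last one drives the data lines
        self.stages = [0] * latency

    def tick(self, addr):
        # New read enters the pipeline, older reads shift toward the output
        self.stages = [self.contents[addr]] + self.stages[:-1]
        return self.stages[-1]  # what the data lines show after this tick


def fetch_sequence(ram, addresses):
    """Present one address per tick and sample the data lines each time."""
    return [ram.tick(addr) for addr in addresses]


contents = [0x10, 0x11, 0x12, 0x13]  # made-up sample data
one_cycle = fetch_sequence(SyncRam(contents, latency=1), [0, 1, 2, 3])
two_cycle = fetch_sequence(SyncRam(contents, latency=2), [0, 1, 2, 3])
```

With a latency of 1, every fetch returns the byte at the address just presented. With a latency of 2, every fetch returns the byte for the previous address, so the CPU would be executing the wrong bytes.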

CPU, hold your horses

My guess is that this RAM latency issue is truly the culprit. The CPU free-runs OK, which shows that it's operating when it has valid values on the data lines.

But assuming I'm right, how do I fix it?

I'm not sure about the best way to handle this. In my Google searches, I've found discussions of:

  • Using mixed clock "domains" (where the RAM is clocked at a higher, fixed multiple of the CPU clock so it gets data to the CPU on time),
  • Using an input line on the 6502 called "RDY", which pauses the CPU until memory is ready, and,
  • Other stuff that I don't quite understand.

One of the interesting aspects of this project is that I'm flailing in the deep end of the pool. I've found that this is often where I learn things.

Monday, September 3, 2018

RC 2018/09 Part 01 - Run Free Little CPU, Run Free

My first test with a couple of different 6502 CPU cores on the Zybo FPGA board was a partial failure, but that's OK. I'm farther along than I was before.

What Is Goal One

My first goal is to get a retro CPU core on the FPGA board to do something called a "Free Run" - an activity documented on a real chip in this blog post.

Here's the concept - instead of connecting actual RAM or ROM to the CPU, you instead hard-wire the data lines so that the CPU sees the same value at every memory address. In the case of a free run, that value is a CPU instruction called a "NOP", for No Operation. This instruction tells the CPU to do nothing, move to the next memory location, and execute the instruction there.

Since the data lines are hard-wired with the value of NOP, the CPU will see NOPs everywhere. So, it will start at some location (more on that in a future post), move to the next location, and so on, until it reaches the end of its memory space. It will then roll around to the bottom of memory (location zero), and keep going around again, forever.
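As a rough sketch, a free run boils down to a 16-bit counter that wraps. The model below is simplified: it tracks only the sequence of instruction addresses, not the individual bus cycles of each NOP, and the starting address is just a parameter.

```python
NOP = 0xEA  # 6502 opcode for "no operation"


def free_run(start_pc, steps):
    """Return the addresses the CPU fetches from when every byte reads as NOP."""
    pc = start_pc & 0xFFFF
    trace = []
    for _ in range(steps):
        trace.append(pc)
        # NOP is a one-byte instruction, so the next fetch is the next
        # address; the 16-bit program counter wraps from $FFFF to $0000.
        pc = (pc + 1) & 0xFFFF
    return trace


wrap = free_run(0xFFFE, 4)  # crosses the top of memory
```

Here wrap holds $FFFE, $FFFF, $0000, $0001, which is the roll-around described above.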

Why do this? Well, even though the CPU address lines aren't actually changing the data that the CPU is seeing, the address lines still happily increment. You can watch them do this, and seeing them bounce up and down shows the CPU is running.

And for some reason, mine isn't.

What's What

Here's a bit more about the FPGA chip and board that I'm using.

The FPGA I'm using is a Zynq-7000, model XC7Z010. The Zynq family is a really interesting series of chips. They combine, on the same silicon, an ARM processor (with varying speeds and numbers of cores), interfaces for a variety of physical devices, some general purpose I/O pins, and most importantly, FPGA functionality ("fabric"). For this project, I'm not using the ARM at all, and will only use a small bit of the I/O capabilities.

The software used to program this chip is called Xilinx Vivado HLx. If you're used to lightweight tool chains in the software development world, then the FPGA world will surprise you. Vivado requires 20+ GB on your hard drive, and the install process takes a long time. For this project, I downloaded and installed the latest, version 2018.2.

What Works

In the past, I've tried a couple of times to get a retro CPU core in Verilog or VHDL to simply compile under Vivado, and failed. But this time, I got a couple of different cores working, and even set up enough of a project to generate a bitstream and download it to the board. It didn't work, but this is still progress.

What Doesn't Work

I'm using a simple Verilog 6502 CPU core. As far as I can tell, the inputs are wired such that the CPU should run, and I also supplied a clock signal to the CPU to make it go. But the address lines aren't bouncing up and down - they are staying off. So, I've got some more work to do.

What's Next

When creating FPGA designs, Real Hardware Engineers (which I am, most decidedly, not) write test benches and do simulations, using various tools that permit a digital design to be executed in software and debugged. Simulation and test benches are really important, and they're something I really haven't studied enough. My method (let's throw it on the board and see what sticks) is that of a hobbyist, and it's OK, but it might not get me through this project. So, digging further into the simulation and test bench abilities of Vivado may be my next step.

More to come...

Sunday, September 2, 2018

RC2018/09 Part 00

Retrochallenge 2018/09 is off to the races! Here's a description of my devious plans...

The What

My goal for the 2018/09 Retrochallenge event is to get a retro CPU core running on my Zybo FPGA development board.

The end result doesn't have to do much - it could just be a 6502 blinking an LED, or a Z80 running a short program and depositing a value into a register. It would be lovely to wind up with a retro computer implementation like a virtual PET or CP/M machine, but that would be a stretch goal!

The Why

Years before I wrote my first computer program, I loved tinkering with electronics. My parents started me out with a Radio Shack crystal radio when I was 5, then a few years later bought me a Science Fair 65 in 1 kit. That kit was awesome - even Rod Serling thought so. I learned how transistors, capacitors, resistors, etc. worked, and how they combined to form useful stuff like radios, sirens, and other fun gadgets. If I wasn't a geek already, this sealed the deal.

A few years after that, I stumbled upon solid-state digital circuits, and my journey down the rabbit hole was complete. After successfully nagging my dad (on multiple occasions) to buy me a bunch of stuff at the local discount electronics shop, I would spend hours wiring up my solderless breadboard with 74xx TTL ICs, 555 timers, LEDs and switches. The BUGBOOK original book series (BUGBOOK I, BUGBOOK II) was a wonderful introduction to this world of digital chips. This season of my life taught me about rudimentary Boolean logic, and how to fry things by hooking them up backwards. The opportunity to see 1s and 0s up close has been useful my whole career. That said, once I leapt into the world of microcomputers, my electronics hobby took a decades-long back seat.

Enter the world of the Parallax Propeller, Arduino, Raspberry Pi and the like, and I was happily dragged back into the world of hacking around with electronics. These modern microcontroller boards combine programming, breadboarding, and built-in I/O, and are all documented with the cumulative knowledge of the Internet. I'm almost glad this wasn't around when I was a kid, or I likely would have never emerged from my bedroom.

Moving along with the maker community flow, I noticed programmable logic devices such as CPLDs and FPGAs being discussed with more frequency. I wondered, what are these funky things, and what can be done with them? A podcast listener heard my musings and graciously sent me a Mojo FPGA board to play around with. I learned that this was a 21st century version of my TTL tinkering, albeit a gajillion times more powerful and very "virtual" in nature. With an FPGA, you dictate the circuit you need (usually in a specialized "hardware description language" or HDL like Verilog or VHDL), and the circuit appears almost magically inside the FPGA chip. Wow!

What kinds of circuits can you make? Almost anything within the speed and size boundaries of the FPGA, it turns out, including modern day digital reproductions of the CPUs of old, like the 6502, Z80, 6809, 68000, 8086, and more. I knew I wanted to, at some point, make an old CPU live on a modern FPGA board. Now, this has been done by many people already, and for my first go, I'll be using their work and just adapting it to my particular FPGA board. This could turn out to be easy or tough. Finding out is part of the point of this Retrochallenge entry.

The How

Designs on FPGAs or other logic devices are often called "cores". (I'm not sure why.) It seems there are "cores" in the open source world for all ye CPUs of olde, many of them written in the HDLs that I referred to earlier. I'll take one of these, compile it for my particular board using the tool set for that board (Xilinx Vivado HLx), and make it do something. (Well, that's the plan.)

If that works out OK, then I'll try to do more, like combining that CPU with other circuits to make a rudimentary computer.

More to come!