Choosing the processor

I'm strongly in favour of the Z80.  

A lot of people working with retro builds gravitate towards the CPU of their first computer. My first computer wasn't a Z80 machine: it was a TI-99/4A, based on the rather obscure TMS9900 processor.  The TMS9900 was a reimplementation of an old TTL minicomputer processor in NMOS microprocessor format, an idea that really didn't work very well.  The processor needed very fast RAM (because, like many minicomputer processors, it stored its registers in external RAM), but that made attaching a reasonable amount of RAM to the processor too expensive for a home computer.  In order to compete with computers using commodity DRAM, the TI-99/4A had to limit the memory attached to its processor to just 256 bytes: the rest of its 16KB of RAM was attached to the display processor and only indirectly accessible to the CPU.  I'm not going to build a TMS9900 based machine.

My second computer was a Sinclair ZX Spectrum, and it was on the Spectrum that I first learned assembly programming (assembly programming for the TI required an expensive RAM expansion pack -- otherwise there wouldn't be any free RAM to load your program in -- and an external disk drive, neither of which I could afford).  It was only later that I learned 6502 programming, and ever since I have held a sincere belief that the Z80 is a much more programmer-friendly processor.

In the 1981-1982 time period, the Z80 was available in 3 speed grades: the original 2.5MHz version, the Z80A (4MHz), and the Z80B (6MHz). The faster Z80H (8MHz) didn't show up until later, and in any case would have been expensive to use in 1982 as DRAM fast enough to keep up with it wasn't yet available, so more expensive static RAM would be necessary (or you could use WAIT states to reduce its memory access rate, but that slows it down far enough that you see almost no benefit for having the faster and much more expensive chip anyway).  So the Z80B it is.  Z80s are notoriously easy to overclock (Grant Searle asserts that every Z80A or B he has tested is able to manage at least 10MHz) so I have some flexibility to increase beyond 6MHz - I plan to do this in order to run at an integer divisor of my pixel clock (see video hardware below).

At 6MHz, a bus cycle lasts about 167ns; at 6.25MHz it's exactly 160ns, which is the figure I'll work with.  In order to run with no WAIT states (which would be ideal - WAIT states can really slow a machine down) I'll need memory that responds within 240ns to each request (because the Z80 requires a 1.5 cycle response time when fetching instruction data).  This can be achieved with 4164-120 DRAM chips, which have a cycle time of 230ns, or a page mode cycle time of 120ns, plus 90ns RAS precharge + 20ns RAS-to-CAS delay when changing page.  4164-100 chips cost a bit more, but may help with implementing video hardware (see below); they have a page mode cycle time of 105ns, plus 95ns overhead on page changes.
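As a quick sanity check on the timing budget above, here is a minimal Python sketch of the arithmetic. The divisor of 4 is an assumption on my part (25MHz / 4 happens to give the 6.25MHz quoted above), and the function names are made up purely for illustration.

```python
from fractions import Fraction

PIXEL_CLOCK_HZ = 25_000_000   # pixel clock discussed in the video section
CPU_DIVISOR = 4               # assumption: 25MHz / 4 = 6.25MHz


def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds, as an exact fraction."""
    return Fraction(1_000_000_000, clock_hz)


def fetch_window_ns(clock_hz, cycles=Fraction(3, 2)):
    """Time the memory has to respond to an instruction fetch (1.5 cycles)."""
    return cycles * cycle_time_ns(clock_hz)


cpu_clock_hz = PIXEL_CLOCK_HZ // CPU_DIVISOR
print(cpu_clock_hz)                       # 6250000
print(cycle_time_ns(cpu_clock_hz))        # 160
print(fetch_window_ns(cpu_clock_hz))      # 240

# The 230ns full cycle time quoted for the 4164-120 fits inside that window.
print(230 <= fetch_window_ns(cpu_clock_hz))   # True
```

Using exact fractions avoids any floating point fuzz when comparing against datasheet figures that sit only 10ns inside the limit.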

(I originally planned to use 41256 chips to get a reasonably large memory capacity without making the board too big, but it seems those weren't available until 1983.)

Memory design

How much of it should the system have, and how should it be organized?

Because of the speed constraints above, no commonly available 16Kb DRAM chip would be fast enough.  And seriously, why bother?  64KB RAM is the minimum specification an 8 bit machine should ship with.  Catering for smaller capacities makes the design more expensive, because it has to handle different installed amounts that, within a couple of years, nobody will want.  The existence of the 16KB Spectrum made the 48KB model harder to design, and more expensive, because they needed to share a board design.

It's tempting to just say 64KB and be done with it.  It simplifies everything.  But -- I want to design a machine that's superior to its competitors.  And in very short order, those competitors will release 128KB models.  I'm going to short circuit this, and design for the future.  My base machine will have 128KB, but be designed to expand up to 256KB.  DRAM wasn't expensive, even in 1981/2 (128KB would have cost about $250).

I'll divide the memory space into 8KB pages.  A lot of contemporary machines used 16KB pages, but I think this makes working with banked memory harder, as it's quite likely that you'll have to page stuff in and out too often; 8KB pages let you pick 8 of them at a time, which makes it easier to find an arrangement where everything you want is in memory.  An address is therefore divided into page = A15..A13 and offset = A12..A0.  A15 and an inverted copy of it are used as chip selects to enable 2 out of 4 74xx670 register files, while A14 and A13 are the read address inputs for each chip.  The outputs are 8 additional address bits, which are placed above A12..A0 to give 21 bits of addressing, allowing the system to expand to 2MB total (of which 1MB is reserved for ROMs).
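To make the paging scheme concrete, here is a rough Python model of the address translation. It treats the four register files as a single 8-entry table of 8-bit values indexed by A15..A13; the function name and the example table contents are hypothetical, not part of the design.

```python
PAGE_BITS = 13                        # 8KB pages: the offset is A12..A0
OFFSET_MASK = (1 << PAGE_BITS) - 1


def translate(page_table, cpu_address):
    """Map a 16-bit CPU address to a 21-bit physical address.

    page_table models the register files: 8 entries (selected by A15..A13),
    each holding the 8 upper physical address bits.
    """
    if not 0 <= cpu_address <= 0xFFFF:
        raise ValueError("CPU address must fit in 16 bits")
    page = cpu_address >> PAGE_BITS        # A15..A13
    offset = cpu_address & OFFSET_MASK     # A12..A0
    bank = page_table[page] & 0xFF         # 8 bits out of the register files
    return (bank << PAGE_BITS) | offset


# Hypothetical mapping: CPU pages 0-6 map straight through, while CPU page 7
# is pointed at physical page 0x20.
table = [0, 1, 2, 3, 4, 5, 6, 0x20]
print(hex(translate(table, 0x1234)))   # 0x1234
print(hex(translate(table, 0xE010)))   # 0x40010
```

The largest value this can produce is 0x1FFFFF, which matches the 2MB ceiling mentioned above.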

Thinking about video hardware

I don't want to build a computer that can't be used.  A computer that only works with out-of-date display hardware is nearly useless.  I don't want to have to have an analogue TV or RGB monitor hanging around: the world has moved on from these technologies.  But what can we do?

It is, of course, possible to produce VGA outputs with discrete TTL components. And that's certainly an option to consider (and one I am considering), but there's another option I think is worth contemplating.

The minimum specification required to produce a valid HDMI output is a 25MHz pixel clock, with 10 bits of data per pixel for each of the 3 output channels.  If I've read the specification of the encoding scheme correctly, by choosing pixel intensities where every alternating bit is zero and there are never 4 bits set in the pixel, we can choose output values where each pair of bits is identical, so the output can be produced by 5 bits of a shift register clocked at 125MHz.  That is achievable using discrete TTL components (e.g. a 74F166), at least as long as you're able to keep them cool enough.

So, by using a set of 125MHz shift registers, it should be possible to produce an HDMI signal with a pixel clock of 25MHz.  Because blanking intervals are required, this apparently gets you a resolution of up to 720x480.  I'd most likely use 6 bit per pixel colour, with a 16 colour palette (so 2 pixels per byte, plus a byte every line selecting a palette for the line).  Each line of output needs to be produced in 32us.  Additionally, if we're using typical DRAM, every 125 lines of output we'll need to run 256 refresh operations (other than for rows that have been used during the process).  If we can ensure that during each line's processing we hit at least 2 unique RAM rows (e.g. by using a nonlinear bit arrangement), we then only have 6 rows of additional refresh to deal with, which should be reasonably simple.
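The line timing and refresh numbers above can be checked with a short Python sketch. It assumes the DRAM needs all 256 rows refreshed every 4ms (which is what 125 lines at 32us works out to) and that the rows touched on different lines never overlap; the names are illustrative only.

```python
PIXEL_CLOCK_MHZ = 25
LINE_TIME_US = 32           # time to produce one line of output
REFRESH_PERIOD_US = 4000    # assumption: 256 rows must be refreshed every 4ms
DRAM_ROWS = 256


def pixel_clocks_per_line():
    """Pixel clocks in one line, including the blanking interval."""
    return PIXEL_CLOCK_MHZ * LINE_TIME_US


def extra_refresh_rows(unique_rows_per_line):
    """Return (lines per refresh period, rows still needing explicit refresh).

    Assumes every row hit during display fetches is distinct from the rows
    hit on the other lines in the same refresh period.
    """
    lines = REFRESH_PERIOD_US // LINE_TIME_US
    covered = min(DRAM_ROWS, lines * unique_rows_per_line)
    return lines, DRAM_ROWS - covered


print(pixel_clocks_per_line())    # 800
print(extra_refresh_rows(2))      # (125, 6)
print(extra_refresh_rows(0))      # (125, 256)
```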

The Z80 has the following interleaving pattern to its memory accesses:

With 3.5 cycles = 560ns available, 4164-120 RAM in page mode lets us read 3 full bytes per instruction fetch (which in the worst case, because we need to insert a WAIT state, takes 5 cycles = 800ns).  If we really want to push things, 4164-100 RAM gives us enough headroom to get 4 bytes out in the same time.
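Here is how those byte counts fall out of the DRAM figures quoted earlier, as a small Python sketch. It assumes the whole 560ns window is spent on a single page: one page-change overhead up front, then back-to-back page mode cycles. The function name is hypothetical.

```python
def bytes_in_window(window_ns, page_cycle_ns, page_open_ns):
    """Whole bytes readable in page mode after paying one page-change cost."""
    usable_ns = window_ns - page_open_ns
    return max(0, usable_ns // page_cycle_ns)


WINDOW_NS = 560   # 3.5 cycles at 160ns per cycle

# 4164-120: 120ns page mode cycle, 90ns precharge + 20ns RAS-to-CAS delay
print(bytes_in_window(WINDOW_NS, 120, 90 + 20))   # 3

# 4164-100: 105ns page mode cycle, 95ns overhead on a page change
print(bytes_in_window(WINDOW_NS, 105, 95))        # 4
```

A fourth byte from the 120ns part would need 590ns in total, which is why it just misses the window.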

So, in the worst case, we get 3 bytes read for each instruction fetch and 1 byte per additional set of 3 bus cycles.  The worst possible instruction for this is "ex (sp), hl", which has a single instruction fetch cycle followed by 5 additional M cycles.  This lets us fetch 8 bytes in 16 bus cycles, so over the 32us (200 bus cycles) we can get 100 bytes (or 112 if we're using 100ns RAM).  This is clearly not enough to allow refreshing a screen of this resolution.
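The per-line totals follow from the worst-case figures given above. This Python sketch simply takes those figures as inputs (16 bus cycles per worst-case instruction, one display byte for each of the 5 additional cycles) rather than deriving them from Z80 timing; the function names are made up for illustration.

```python
LINE_BUS_CYCLES = 200    # 32us at 160ns per bus cycle
INSTR_BUS_CYCLES = 16    # worst-case instruction length used in the text
EXTRA_M_CYCLES = 5       # each contributes one display byte


def bytes_per_instruction(fetch_bytes):
    """Display bytes read during one worst-case instruction."""
    return fetch_bytes + EXTRA_M_CYCLES


def bytes_per_line(fetch_bytes, line_cycles=LINE_BUS_CYCLES):
    """Display bytes available per line if every instruction is worst case."""
    return line_cycles * bytes_per_instruction(fetch_bytes) // INSTR_BUS_CYCLES


print(bytes_per_line(3))   # 100  (120ns RAM, 3 bytes per instruction fetch)
print(bytes_per_line(4))   # 112  (100ns RAM, 4 bytes per instruction fetch)
```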

But then, this kind of resolution would only really be necessary in text mode.  If the character bitmaps are stored into a separate memory used by the display hardware (presumably writable by the CPU during retrace intervals), we only need to get a line of text and colours (90 columns * 3 bytes = 270 bytes) every 8 rows (we'd need to buffer the characters for rows after the first they appear in, and we'd have to prefetch the characters at least 3 lines in advance).  For full screen graphics mode, doubling each pixel both horizontally and vertically (using the same memory buffer used by character mode) would give us 400 bus cycles, allowing us to fetch 200 bytes (or 225), which lets us do 360 pixel horizontal resolution and gives us 20 bytes (or 45 bytes) spare for control data (which we'll come to later).  
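As a rough check of the budgets in this paragraph, the following Python sketch works out how many bytes each mode needs against what can be fetched. The 2 pixels per byte figure comes from the 16 colour palette mentioned earlier; the function names are hypothetical.

```python
def text_row_bytes(columns=90, bytes_per_cell=3):
    """Bytes to fetch for one row of text (character plus colours)."""
    return columns * bytes_per_cell


def graphics_budget(bytes_fetchable, width_px, pixels_per_byte=2):
    """Return (bytes needed for pixels, bytes left over) for a doubled line."""
    needed = -(-width_px // pixels_per_byte)   # ceiling division
    return needed, bytes_fetchable - needed


print(text_row_bytes())              # 270
print(graphics_budget(200, 360))     # (180, 20)  with 120ns RAM
print(graphics_budget(225, 360))     # (180, 45)  with 100ns RAM
```

A negative second value would mean the chosen width does not fit in the fetch budget at all.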

I have more thoughts about display hardware - I'd like to do something a little bit different to most designs - but that'll need to wait until another day.