
Z80 Memory Refresh Cycle for LCD refresh? Getting carried away...

A project log for Vintage Z80 palmtop compy hackery (TI-86)

It even has a keyboard!

Eric Hertz, 07/20/2021 at 01:27 (12 Comments)

The Z80 has an interesting feature in that it's designed to automatically refresh DRAM after each instruction-fetch.

The process goes: fetch instruction op-code, then while processing that, refresh the next row in the RAM. It handles 128 rows, incrementing through, then repeating. Thus, during the refresh portion of every instruction, the address bus outputs one address from 0-127, in sequence.
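Here's a rough toy model of that counter, just to make the sequence concrete. The class and method names are made up for illustration. The bit-7 behavior (it holds whatever was last loaded and doesn't take part in the count) comes from Voja Antonic's comment in the Discussions below, not from anything I've measured.

```python
class RefreshCounter:
    """Toy model of the Z80 R register (names here are illustrative)."""

    def __init__(self, value=0):
        self.value = value & 0xFF

    def load(self, value):
        # Models LD R,A: all 8 bits are written.
        self.value = value & 0xFF

    def m1_fetch(self):
        """Return the value driven during refresh, then advance the count."""
        out = self.value
        low7 = (self.value + 1) & 0x7F            # only bits 0-6 count
        self.value = (self.value & 0x80) | low7   # bit 7 is left alone
        return out


r = RefreshCounter()
addresses = [r.m1_fetch() for _ in range(130)]
print(addresses[126:130])  # [126, 127, 0, 1]
```

So 128 distinct addresses, 0 through 127, then around again.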

Huh.

Then, apparently it was somewhat common to hack that a bit for larger memories, e.g. adding a flip-flop or a counter to handle higher address bits for DRAMs that had more than 128 rows.

But... there's no reason the same couldn't be used for /other/ purposes than refreshing DRAM.... and refreshing /displays/ would be darn-near the perfect use! 

That might well also explain why the (static) RAM's /Output Enable is always active (wired directly to ground). So, in the case of the TI-86 display, 128x64 pixels at 8 pixels per byte = 1024 bytes... 128 on the refresh counter, with three extra flip-flops as a three-bit counter (128x8=1024)... and now your DRAM refresh is turned into a framebuffer-access for refreshing the LCD. Attach the LCD's "load row" input (essentially hsync) to address bit 4... a /tiny/ bit of glue, maybe, to delay the next loading of 8 pixels until a few nanoseconds after the hsync... and, that's basically all it takes! Oh, and the z80 /refresh output is wired to the LCD's "load [pixel] data" input.
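A quick sketch of that addressing scheme, under my own assumptions: the three extra flip-flops sit directly above the 7-bit refresh count, and "load row" fires whenever A4 changes state (the glue that would turn that change into an actual pulse isn't modeled). All names are hypothetical.

```python
COLS_BYTES = 16                    # 128 pixels / 8 pixels per byte
ROWS = 64
FRAME_BYTES = COLS_BYTES * ROWS    # 1024


def frame_addresses():
    """Yield (address, load_row) for one pass over the framebuffer.

    The address is a hypothetical 3-bit extension counter stacked on top
    of the 7-bit refresh count; load_row is True whenever A4 changes state.
    """
    prev_a4 = None
    for n in range(FRAME_BYTES):
        r7 = n & 0x7F              # what the Z80 refresh counter supplies
        ext3 = (n >> 7) & 0x07     # the three extra flip-flops
        addr = (ext3 << 7) | r7
        a4 = (addr >> 4) & 1
        load_row = prev_a4 is not None and a4 != prev_a4
        prev_a4 = a4
        yield addr, load_row


# The first row starts without an A4 change, hence the +1.
rows_started = 1 + sum(load_row for _, load_row in frame_addresses())
print(FRAME_BYTES, rows_started)  # 1024 64
```

Ten address bits cover the whole 1024-byte frame, and A4 changes exactly once per 16-byte row.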

And, it doesn't slow /anything/ down, as far as CPU overhead.

So, it probably wouldn't work with VGA, CGA, MDA, HDMI, etc., since they rely on steady pixel-clocks... and, I think, the DRAM refresh only occurs once per instruction-fetch (which would vary in duration depending on how many bytes are in the instruction). But with minor circuitry it'd probably work with most "Display Parallel Interface" LCDs, and most of the older graphical LCDs, which are similarly interfaced except that they typically load several pixels with each "pixel clock". Just like the TI-86 and TI-85 (and TI-81, I presume) displays, which load 8 pixels at a time.
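To see why the "pixel clock" wouldn't be steady, here's a small sketch that lists when each refresh would land for a made-up instruction mix. It assumes one opcode fetch per instruction, with the refresh starting 2 T-states into that fetch. Prefixed opcodes, which fetch more than once, aren't modeled.

```python
def refresh_times(t_states_per_instruction):
    """Return the T-state at which each instruction's refresh begins.

    Assumes one opcode fetch (M1) per instruction, with the refresh
    starting 2 T-states after the fetch begins.
    """
    times, now = [], 0
    for t in t_states_per_instruction:
        times.append(now + 2)
        now += t
    return times


program = [4, 7, 4, 10, 4]   # made-up instruction lengths, in T-states
times = refresh_times(program)
gaps = [b - a for a, b in zip(times, times[1:])]
print(gaps)  # [4, 7, 4, 10]
```

The gap between byte-loads simply follows the instruction lengths. That's fine for an LCD that latches data whenever it's strobed, and hopeless for anything that expects pixels at a fixed rate.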

Basically the stock Z80 has /most/ of a graphical LCD controller built-in. How cool is that?!

...

Now, here's some other hackery-potential gathered from what I've read... Increasing the Timer-Interrupt frequency, say by 2x, causes the LCD to display the first half of the image twice. Apparently the timer-interrupt resets the LCD's row-counter. What does this mean for hackery? 

In an earlier log I discussed the idea of using the LCD itself for general-purpose outputs. One method is to consider the fact that the row/column drivers in LCDs of this sort of interface are completely unaware of the screen dimensions. So, say, outputting a 160x64 signal to a 128x64 display would simply display the first 128 pixels on each row, and the remaining 32 would be shifted-out the "carry out" outputs of the column-driver chips which, then, could be fed directly into shift-registers of our own, and offscreen data in the framebuffer could then be used to drive outputs for any purpose we'd like. Turns out the column drivers on this LCD are actually designed for 160 pixels, but only 128 are used... and the remaining 32 are physically inaccessible... and the "CPU" seems to only be configurable to 160 pixels. So, that idea is pretty much a nogo.

BUT... Since the timer-interrupt is apparently responsible for resetting the row-counter (and, presumably, sending the "frame" signal, which is equivalent to Vsync, but in my experience tends to get ignored if it comes too soon, which appears to be what's happening to cause the screen data to duplicate, rather than just get cut-off and start again at the top... e.g. if the row drivers are made for 64 rows, then they assume you'll send at least that many rows, before paying attention to "frame")... 

Then... What this all likely means is that the display is really being driven with some /random/ number of rows, greater than 64... and that it just ignores the extra that come in until the next timer-interrupt... probably: passing those undisplayed row accesses right out the row-driver's "carry out" output... which means... a tiny bit of glue, basically an 8-bit latch attached to the LCD's data inputs, and maybe a gate or two tied between the row driver's carry-out and the latch's clock... and now store some data in the bytes /after/ the frame-buffer, and maybe they'll be accessible on our latch outputs.

Either that, or the built-in byte-counter (daisychained with the 7 address bits in the z80's refresh logic) would automatically overflow, causing those undrawn rows to start again at row zero... but, I'm thinking not, since the thing's capable of 1280 bytes, but only using 1024, which'd introduce a lot of extra logic for no real purpose.

So, one more time, the display is likely being sent /more/ than 64 rows of data direct from the RAM... because the timer-interrupt resets the row counter. So, therefore, to assure the display is always /fully/ refreshed, the timer interrupt's period has to be far longer than the worst-case time it takes for a screen refresh (which takes longer if there are a lot of multibyte opcodes). It's not likely to /stop/ sending row-data because that'd require a lot of unnecessary circuitry. And, similarly, it's almost as unlikely it'd automatically "roll around" the row-counter because... /there is none/. No need for a row counter. It only needs to count bytes in each row to send the "next row" signal to the LCD. And it doesn't need to count them, because all it has to do is look at bit 4 on the address bus.

The "row counter" that gets reset with the timer interrupt, frankly, is simply a byte-counter, which, again, is supplied by the address bus. (BUT WAIT: What about 160x64...? Can't look at A4 for that! And division-by-20 is no easy task... could they /really/ have a separate counter just for the bytes in a row? But, again, that needn't count /rows/.)

So, yes, I still think it unlikely they actually count rows... And the "row-counter" that gets reset is just the byte-counter, which counts to at least 1280 before overflowing, (if it counts past 20!) and only /need/ count to 1024 with the display attached... So, again, it's my guess that most-likely bytes /outside/ the framebuffer are being spewed out to the LCD input. And those bytes could be used for other things, like /outputs/. BAM.
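For a feel of the numbers, here's a back-of-envelope sketch of that worst-case refresh time. The clock speed and the worst-case gap between opcode fetches are both assumptions of mine, not measurements, so treat the results as ballpark only.

```python
def worst_case_frame_seconds(frame_bytes, max_t_states, clock_hz):
    """Longest time to push frame_bytes out, at one byte per opcode fetch."""
    return frame_bytes * max_t_states / clock_hz


CLOCK_HZ = 6_000_000    # assumed CPU clock, not measured
MAX_T_STATES = 23       # assumed worst-case gap between opcode fetches

for frame_bytes in (1024, 1280):
    seconds = worst_case_frame_seconds(frame_bytes, MAX_T_STATES, CLOCK_HZ)
    print(frame_bytes, "bytes:", round(seconds * 1000, 2), "ms")
```

Whatever the real figures are, the timer-interrupt period would have to be longer than this for every row to get drawn before the reset.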

But wait! How would the timer-interrupt reset the refresh-counter, smartypants? Hmmm.... if they /didn't/ work that out... then the refresh-counter isn't being output to the external address bus, and they've gotta have a dedicated counter that does, and counts to at least 1280... AND... Actually, maybe that makes more sense, sadly-ish for the elegantly-simple display controller idea, because... the framebuffer can allegedly be loaded at, say, 0xC100... 1024bytes thereafter would result in that bit at 0x0100 toggling... which... would mean a separate byte-counter, entirely (kinda doubt they did this with a 16bit adder).
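The 0xC100 problem is easy to check with a few lines. This sketch just asks whether the address bits above a 10-bit counter stay put across the whole buffer. The function name is made up, and 0xC000 is only there as an aligned comparison.

```python
FRAME_BYTES = 1024


def needs_full_width_counter(base):
    """True if the high address bits change somewhere inside the buffer.

    If they do, "fixed high bits + free-running 10-bit counter" can't
    generate the addresses; something has to carry into the upper bits.
    """
    first, last = base, base + FRAME_BYTES - 1
    return (first >> 10) != (last >> 10)


print(hex(0xC100 + FRAME_BYTES - 1))      # 0xc4ff
print(needs_full_width_counter(0xC100))   # True
print(needs_full_width_counter(0xC000))   # False
```

So a buffer at 0xC100 crosses a 1K boundary partway through, which is why a bare low-bits counter (refresh-based or otherwise) wouldn't be enough on its own.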

BUT. Doesn't change the fact the refresh counter and such could be used as most of a display controller on another z80 project. So There. And its bus-access timing /probably/ is used for the TI-86's display, since the bus is otherwise unused at that time. So There. And, it's still rather unlikely there's a row counter (as opposed to a byte-counter that needn't count higher than 20 alongside a 16bit presettable byte-counter used only for addressing). So There.

...

Somewhere in here I think about DMA, as well... the memory-refresh cycles could just as easily be used for DMA which /doesn't/ stop the CPU from doing its own thing simultaneously.

...

Anyhow, I'm getting WAY carried-away, drawing schematics, contemplating chips and LCDs in my stockpiles... I have /barely/ written any code for this thing yet, and now I'm contemplating designing an entirely new z80-based "computer". Heh!

...

Also, I somehow hadn't really made the connection, I have a full-on (z80-based!) Logic Analyzer, #OMNI 4 - a Kaypro 2x Logic Analyzer  which would certainly be handy for the part I keep putting off. But, really, my situation doesn't really allow for using that these days... and, really, it's just a handful of pins I'm curious about, presently, I think a logic probe will do. Really, the only thing stopping me from getting answers to questions from, now, months ago is just soldering up some external power, and maybe some leads to the pins I'm curious about, lest I do something stupid like short two pins with the probe while testing, and again have to reload the cleared memory. Heh! 

Frankly, I'm pretty-durn convinced at least the FLASH attached to the extra/unused RAM pages should work, enough that I'm half-tempted to just start soldering it up without even bothering to test if it'll work. But, that'd be stupid. And, well... the FLASH is 256K, the RAM space is only 128K, unless there are even /more/ address bits, which actually may be the case, and is yet another thing I've been wanting to test first... because... If there are, then I may put the FLASH in ROM-space instead. We'll see. I just keep putting it off for some reason... get carried away with new ideas (like DRAM refresh refreshing the display).

Soldering a test setup. TODAY!

...

Another thought about why I think they didn't add a row-counter is due to the memory layout... in the case of the 86's 128x64 screen (loaded as 16bytes per row) it'd be easy to use a byte counter for both column and row counting... the framebuffer stores all the bytes sequentially, even the last byte in one row followed immediately by the first byte in the next row. So, everything from the byte-counter's bit 4 on up is the row... yahknow, /if/ they wanted a row-counter.

But, the same chip can also be configured for 160 pixels wide, which is 20 bytes. Presumably those bytes are also stored sequentially... because if they aren't the next step up for a simple rowcounter method like described earlier would be 32 bytes per row, which'd be a tremendous waste of 12 bytes per row. So, then, if they wanted a row counter, there, it'd make more sense for it to increment after 20 bytes... two separate counters, one for the column, one for the row, which /would/ make sense, and be sorta de-facto for screen-controller designs, EXCEPT: to convert that to a memory-address means 16bit multiplication by 20, and 16bit addition... and, well, I highly doubt that's done in the silicon when it's not necessary.

The alternative, again, is just one counter that counts to 20 (or 16) bytes and resets, alongside a presettable 16bit counter, loaded with the first byte's address. I could be wrong, but I think a 16bit adder would be more complex than a presettable counter. Remember: the typical method for addition uses carry between each bit, so it takes quite some time to propagate 16 carries. I vaguely recall a 16bit adder that somehow doesn't propagate like that, but it's a LOT of circuitry. Meanwhile, 16 flipflops daisychained (an asynchronous counter) also take some time to propagate... but that propagation could occur between pixel loads.

Sure, the adder's propagation could occur during pixel loads, but would require several 16bit registers: one to hold the framebuffer start address, a counter for the byte-offset, and that's /not/ counting if they used separate row-counting. And, again some additional circuitry to count to or divide by 16 or 20 [nevermind the multiply necessary if they used separate row/col counters]. OR, they use ONE presettable 16bit register, plausibly even wired for ripple counting (which is /really/ easy, circuitry-wise) since there's plenty of time for ripple-propagation between pixel loads... and, again, one simultaneous 5bit counter configured to reset at 16 or 20 to toggle the "load row" (hsync). And, again, they don't even need a dedicated register to hold the framebuffer start address, since the byte-counter/address register is reset in software with the timer interrupt. And, in fact, that 16-bit register/counter could be daisychained with the already-implemented refresh-address counter, so brought down to 9 bits, or only 9 additional flipflops.
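To convince myself the two approaches really are interchangeable, here's a sketch comparing them in software. It says nothing about what's actually in the silicon. The function names are made up, and the 0xC100 base is just the example address from earlier.

```python
def addresses_multiply_add(base, bytes_per_row, rows):
    """Separate row/column counters: address = base + row*width + col."""
    return [(base + row * bytes_per_row + col) & 0xFFFF
            for row in range(rows)
            for col in range(bytes_per_row)]


def addresses_preset_counter(base, bytes_per_row, rows):
    """One presettable 16-bit counter plus a small per-row byte counter.

    Returns (address, load_row) pairs; load_row marks the last byte of a row.
    """
    out = []
    addr, col = base & 0xFFFF, 0          # "preset" at frame start
    for _ in range(bytes_per_row * rows):
        col += 1
        load_row = col == bytes_per_row   # small counter hits 16 or 20
        if load_row:
            col = 0
        out.append((addr, load_row))
        addr = (addr + 1) & 0xFFFF        # plain increment, no adder
    return out


for width in (16, 20):
    a = addresses_multiply_add(0xC100, width, 64)
    b = addresses_preset_counter(0xC100, width, 64)
    assert a == [addr for addr, _ in b]        # same address sequence
    assert sum(flag for _, flag in b) == 64    # one "load row" per row
```

Same addresses either way, for both the 16-byte and 20-byte widths. The counter version never needs a multiply or an add, only an increment and a compare.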

Now: if this were designed with some sort of CAD package with "blocks" ala CPLDs or FPGAs, they mightn't've cared... heck, they might not've even bothered to notice just how much of the hard part is already accomplished by the DRAM refresh circuitry... (they would, however, still have to have shared the memory-bus with the processor, so might've opted for the defacto DMA, which would halt the CPU, or maybe recognized it could be done during the refresh cycles)... but this chip was first found in a product in 1990... my guess is they still thought about such things at the gate, or at most flip-flop level.

Discussions

Paul McClay wrote 01/23/2024 at 05:45 point

Some "best of HaD" right here.


Voja Antonic wrote 01/22/2024 at 02:42 point

The 8th bit D7 actually exists in the R register, but it is not affected by counting. So it goes 0x00 after 0x7F and 0x80 after 0xFF. You can preset that bit using instruction LD R,A but it will stay the same after D6 in the R register flips from 1 to 0. 

I used the R register in a simple DIY computer project in 1983 to generate a composite video signal mostly by software, and the 8th bit was replaced with the output latch. Note that at the same time the R register's contents are placed on the low portion of the Address bus (A0-A7), the I register is placed on the high portion of the Address bus (A8-A15).

At that time (1984), more than 8000 readers built the computer, named Galaksija (Galaxy), as it was the name of the magazine. After 40 years, the publisher decided to reprint the issue of the magazine where the DIY project was published. They asked me to send them Gerbers of the PCB as they had the idea to give the PCB to the readers with the magazine as a present, and then I took some time to rework the whole project a little. Now it's on a new 2-layer PCB which is half the size (and that much cheaper), with the same chipset, but with larger EPROM and RAM chips. Also, Flash was added instead of the cassette i/o, and a bitbanged serial port to the computer too. It will be published in Serbia next month.

In the new project, I used bit D7 of the R register instead of the external latch. Now it works fine, but it was much harder to implement in firmware. Don't ask me why, I don't know exactly, but simply writing to R instead of the latch didn't work quite well. So I had to add a lookup table to make it possible.

At this moment, there are about 200 preliminary orders for the component set for the new version of the computer. The population of Serbia is only about 6M, so it's a nice number of retro computer enthusiasts and "sentimental fools" :)

https://en.wikipedia.org/wiki/Galaksija_(computer)

There is a mailing list, the following page is mostly in English, except the first few paragraphs:

https://pcpress.info:10000/virtualmin-mailman/unauthenticated/listinfo.cgi/galaksija


Eric Hertz wrote 01/22/2024 at 04:52 point

Interesting timing! I don't think I'd ever heard of the Galaksija, here in the U.S. until @zpekic just mentioned it, here. Sounds like some clever hackery went into its design. Very cool that it's being reproduced and updated. Thanks for sharing some more details on its functionality.


Eric Hertz wrote 01/22/2024 at 05:33 point

Hah, was I mistaken... I had, in fact, stumbled on Galaksija in the past: Looking again through your projects, it seems one I skulled a while-back was exactly regarding how you reused the Galaksija line-drawing loop discussed earlier to do double-duty as ASCII strings.

I think I must've come across that as part of the 1K challenge...(?)

#How to use more than 100% of program memory 

Ironically, as I was writing my earlier response, I started writing: "I wonder if the line drawing loop could be used in other ways..." but stopped short because it looked like there was specific reason for the "arbitrary op-codes" chosen. Yep, Strings.


Eric Hertz wrote 01/22/2024 at 01:50 point

@Voja Antonic  ... interesting nostalgia in the comments below.


zpekic wrote 01/21/2024 at 19:40 point

Exact same trick (using I:R register output during /RFSH) was used to generate composite video on Galaksija home computer: https://www.tablix.org/~avian/galaksija/rom/rom1.html#l0038h


Eric Hertz wrote 01/22/2024 at 01:32 point

Very interesting, thank you. The sleuthed-comments are quite informative. I'll have to look deeper to see how the R register is involved, but I guess the precision T-state counting must have something to do with it.


zpekic wrote 01/22/2024 at 02:03 point

Yes, it does - refresh cycle is generated after each M1 (instruction fetch), so each horizontal line (which contains 32 characters) has to execute 32 instructions of exactly the same duration (4 T cycles, see this sequence starting at comment "VIDEO_LINE_LOOP"). At each refresh, 8-bit pixel pattern is latched from character generator EPROM to 8-bit shift register, which is driven by clock 2x CPU frequency (which is the pixel clock, 6.144MHz). External hardware (CD40xx counters) generated /INT with HSYNC frequency, and then to make the timing exact, a WAIT signal is used to block start of instruction execution perfectly in sync where horizontal scan line should start. See more here (there are different schematics floating around, some are just modernized a bit) https://blog.vladovince.com/building-a-galaksija-the-1980s-yugoslav-8-bit-microcomputer-part-i-the-tech/


Eric Hertz wrote 01/22/2024 at 01:48 point

ah hah:

ld r,a ;0089 9 Load R with A

; E holds the number of scan lines to be drawn for this line of characters.

; During one iteration of VIDEO_LINE_LOOP one scanline of video is drawn.

;; VIDEO_LINE_LOOP
l008bh:
ld (hl),d ;008b 7 Latch value in D to the character
; generator register

; Opcodes are arbitrary, but must all consist of 4 T states (one M state) -
; 8 pixels are drawn during each opcode.

inc d ;008c 4
inc d ;008d 4
inc d ;008e 4
inc d ;008f 4
xor a ;0090 4
scf ;0091 4
rra ;0092 4
rra ;0093 4
xor d ;0094 4
ld d,a ;0095 4
ld h,c ;0096 4
ld a,b ;0097 4

.....

hahaha, nice work keeping each instruction exactly 8 pixel-clocks (4T-States), so each character, then, must be 8 pixels wide (*32 per line).

.....

Interestingly, I recognize this name, I think from HaD recently...

"The original assembly listing

...by Voja Antonic 03.01.1984"


zpekic wrote 01/22/2024 at 02:05 point

Lol our replies crossed, but you are right, exactly that sequence and trick. Voja is a computing legend and was a great inspiration for me as a teen obsessed with computers :-) 


Eric Hertz wrote 01/22/2024 at 02:08 point

Oh, oh! I was wondering how they'd deal with the unpredictable timing (in T-States) of interrupt-handling... /WAIT! Thanks for the explanation!


zpekic wrote 01/22/2024 at 02:18 point

Yup - a masterpiece! But the best part is maybe how one 6116 (2k RAM) is enabled during regular /MREQ /RD or /WR to allow program to access it, but also during /RFSH when its data outputs drive part of the address input for the character generator! So same modest and cheap SRAM chip lives alternate life as fancy video RAM, often found in 10x more expensive computers of that era :-)
