
The Cash involved in Cache

A project log for Random Tricks and Musings

Electronics, Physics, Math... Tricks/musings I haven't seen elsewhere.

Eric Hertz, 03/02/2023 at 08:12 (1 Comment)

I just saw a few minutes of Adrian's vid wherein he dug out a 386 motherboard with CACHE!

This is one of those long-running unknowns for me... How does cache work? I was pretty certain I recalled that 486s had it, but 386s didn't... which means for a 386 /to/ have it requires an additional chip... And /chips/ I can usually eventually understand.

So I did some searching, discovered Intel's 82385 cache controller, and took on some reading, excited to see that, really, there's no reason it couldn't be hacked into other systems I'm more familiar with, like, say, a Z80 system. Heck, with a couple/few 74LS574's, I think it could easily be put into service as a memory-paging system: up to 4GB of 8K pages!

But the more I read of the datasheet, the more I started realizing how, frankly, ridiculous this thing is, for its intended purpose.

I mean, sure, if it made a Cray supercomputer a tiny bit more super when the technology first came out, or a CAD system more responsive to zooming/scrolling, I could see it. But... how it works in a home system seems, frankly, a bit crazy. A Cash-Grab from computer-holics, who /still/ post questions about adding cache to their 35-year-old 386s for gaming. LOL.

Seriously.

Read the description of how it actually functions, and be amazed at how clever they were in implementing it in such a way as to be essentially completely transparent to the system, except on the rare occasion it can actually be useful.
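In case that's too abstract, here's a little C sketch of the direct-mapped lookup as I understand it from the datasheet. Fair warning: the 1024-lines-of-32-bytes organization is my guess at a simplified model, not necessarily the 82385's exact directory layout. The point is the transparency: a hit answers fast, and a miss just passes through to DRAM, no slower than having no cache at all.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_BYTES 32
#define NUM_LINES  1024                 /* 1024 x 32 bytes = 32KB */

typedef struct {
    uint32_t tag;                       /* upper address bits     */
    int      valid;
    uint8_t  data[LINE_BYTES];
} cache_line_t;

static cache_line_t cache[NUM_LINES];
static uint8_t dram[1 << 20];           /* stand-in for slow DRAM */

static uint8_t cached_read(uint32_t addr)
{
    uint32_t offset = addr % LINE_BYTES;
    uint32_t index  = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag    = addr / (LINE_BYTES * NUM_LINES);
    cache_line_t *line = &cache[index];

    if (line->valid && line->tag == tag)
        return line->data[offset];      /* hit: zero wait states  */

    /* miss: read from DRAM exactly as a cacheless system would,
     * and remember the whole line for next time                  */
    memcpy(line->data, &dram[addr - offset], LINE_BYTES);
    line->tag   = tag;
    line->valid = 1;
    return line->data[offset];
}

int main(void)
{
    dram[1234] = 42;
    printf("%d\n", cached_read(1234));  /* miss: fetched from DRAM */
    printf("%d\n", cached_read(1234));  /* hit: served from cache  */
    return 0;
}
```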

Then continue reading, in the datasheet itself, how carefully it's worded that the more sophisticated mode actually provides only a "slight" improvement over the simpler mode, which itself provides basically no improvement over no cache at all, unless the software is explicitly designed with cache in mind.

Which, frankly, seems to me to mean the software would have to be written in inefficient ways that would inherently be slower on cacheless systems, rendering benchmarking completely meaningless.

So, unless I'm mistaken about the progression of this trend--as I've extrapolated from the advancement of the "state of the art" from the era of computers without cache to the first home computers with cache--here's how it seems to me:

If programmers had kept the skills they learned from coding in the before-times, cache would've been nothing more than a very expensive addition (all those transistors! Static RAMs!) to squeeze a few percent more usable clock cycles out of top-end systems, for folk using rarely-used software which could've been carefully programmed to squeeze even more computations out of a system with cache.

But, because programmers started adopting cache as de-facto, they forced the market that didn't need it /to/ need it. The cacheless systems would've run code not requiring cache roughly as efficiently as systems with cache run code poorly-written to use it. Which is to say that already-existing systems lacking cache were hobbled not by what they were capable of, but by programmers' tactics to take advantage of tools that really weren't needed at the time, and, worse, often for little gain other than to use up the newly-available computing resources.

The 486, then, came with a cache-controller built-in, and basically every consumer-grade CPU has required it, since.

Now, I admit this is based purely on my own extrapolation from what I read of the 82385's datasheet. Maybe things like burst read-ahead and write-back, which didn't exist in the 82385, caused *huge* improvements. But I'm not convinced, from what I've seen.

Think about it like this: In the time of the 386, one measly MB of DRAM was rather expensive... so how much, then, would 32KB of /SRAM/ have cost? The 82385 also had nearly 2KB of internal SRAM/registers just to keep track of what was and wasn't cached. And all this in addition to a huge number of gates for address-decoding, etc.
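For kicks, here's the back-of-the-envelope math on that tag storage, using the same assumed organization as my sketch above (1024 lines of 32 bytes, 4GB address space; the real chip's directory layout may differ):

```c
#include <stdio.h>

int main(void)
{
    int addr_bits   = 32;   /* 4GB physical address space          */
    int offset_bits = 5;    /* 32-byte line                        */
    int index_bits  = 10;   /* 1024 lines                          */
    int tag_bits    = addr_bits - index_bits - offset_bits; /* 17  */
    int lines       = 1 << index_bits;
    int total_bits  = lines * (tag_bits + 1);  /* +1 valid bit     */

    printf("%d tag bits per line, %d bits total (~%.2f KB of SRAM)\n",
           tag_bits, total_bits, total_bits / 8.0 / 1024.0);
    return 0;
}
```

That works out to 17 tag bits per line and roughly 2.25KB of SRAM, right in the neighborhood of that "nearly 2KB" figure.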

With all that taken into account, I'm guessing the 82385 and cache chips probably contained *far* more transistors than the 386 itself. Heck, the 82385 alone might've been roughly on par.

All that, and, frankly, the 82385 really doesn't /do/ much, aside from intercepting and regurgitating reads when the data has already been read recently. Folk talk about 20% improvements in benchmarks, but were those benchmarks written with cache in mind? If so, then it's not an improvement in computing power, but an improvement in handling new and generally useless requirements. And, again, frankly, looking at the datasheet, my guess would've been something like a best-case 8% average improvement for code not written for cache.
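To put rough numbers behind that guess, here's the arithmetic I'm doing in my head. Every figure in it is made up purely for illustration; I haven't measured anything:

```c
#include <stdio.h>

int main(void)
{
    /* all numbers below are guesses for illustration only        */
    double hit_clocks    = 2.0;   /* 0-wait-state cache read      */
    double miss_clocks   = 4.0;   /* 2-wait-state DRAM read       */
    double hit_rate      = 0.80;  /* code NOT written for cache   */
    double read_fraction = 0.20;  /* share of clocks spent on reads */

    double avg_read = hit_rate * hit_clocks
                    + (1.0 - hit_rate) * miss_clocks;      /* 2.4  */
    double relative = (1.0 - read_fraction)
                    + read_fraction * (avg_read / miss_clocks);

    printf("whole-program speedup: ~%.0f%%\n",
           (1.0 / relative - 1.0) * 100.0);    /* prints ~9%       */
    return 0;
}
```

Juggle those guesses and you can land anywhere from a few percent to twenty, which is exactly why I don't trust benchmark numbers without knowing what was benchmarked.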

[Though, again, I admit the cleverness in coming up with this peripheral cache controller in such a way that it's guaranteed not to *slow* the system under any circumstances, due to its transparent design!]

.

Frankly, it seems to me a bit like a 'hack' trying to squeeze a few more CPU cycles out of existing CPUs. Clever, but certainly not the sort of relic that should've been built upon. E.g., 64-bit 486s might've been a comparative *tremendous* improvement, requiring maybe even fewer transistors. Dual-core, similar... But cache? And the friggin' slew of rabbit-hole processing/transistors/power required? Nevermind the mental hurdles of coding with it in mind?

For /that/ to have become so de-facto, so early, and yet at the same time so late, boggles my mind.

Again, think about it: What were folk running on 386s? Windows 3.1. Multitasking. BIOS-calls and TSRs via ISRs. Video games that pushed every limit. The idea *any* program would be under 32KB, nevermind *both* the program *and* its data, nevermind its being switched in and out of context... Absurd.

Pretty much the only place I imagine it being useful, without coding specifically for cache, is in e.g. small loops used for performing iterative calculations (like computing Pi, which is why I mentioned supercomputers), and only then if there's no risk of the data sitting at the same offset as the loop itself within another 8K page, where it would keep evicting the loop from cache.
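E.g. something like this Leibniz-series Pi loop: the whole thing compiles down to a few dozen bytes of code with its working data in registers, which is about the only shape of program a tiny cache helps without any special effort:

```c
#include <stdio.h>

int main(void)
{
    /* Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
     * The loop body is a few dozen bytes of code with its data in
     * registers -- the ideal case for a small cache.              */
    double pi4  = 0.0;
    double sign = 1.0;
    for (long k = 0; k < 10000000L; k++) {
        pi4 += sign / (2.0 * k + 1.0);
        sign = -sign;
    }
    printf("pi ~= %.6f\n", 4.0 * pi4);
    return 0;
}
```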

...

I'm sure I've lost my train of thought, but that brings us to another topic....

Where is data vs code located? A HUGE problem, for quite some time, has been buffer-overflows causing executable code to get overwritten. Why The Heck are we still using the same memory-space for code and data?

I propose that part of the reason may have to do with cache, as I understand it from the 82385 datasheet. Now, don't get me wrong, I'm all for the backwards-compatibility that the x86 architecture has provided. But it wouldn't have been at all difficult to separate code and data into separate memory-spaces in newer programs *if it weren't for cache*.

Why? Imagine your code is 32KB, and you've got 8KB of strings to manipulate as data. Now, your cache is 32KB... But if your sprintf function's character-by-character loop was located at byte 256 in the code-space, and the string you're sprintfing to started at byte 257 in your data-space, then the cache would "thrash" between the data-space and the code-space's bytes 257++. OTOH, if your sprintf was printing to a string defined in your code-space just after the loop, there'd be no thrashing, because the loop would be around byte 256 and the string around byte 576: In The Same [cache] page. No thrashing necessary... both would be cached into the same cache-"page." [Not that it would be a huge benefit that they're both cached, since sprintf would be *writing*, which isn't at all sped up by the 82385, but at least the loop could run from cache.]
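Here's a toy miss-counter in C that re-runs that exact scenario through the same assumed direct-mapped model as before (again, 1024 lines of 32 bytes is my simplification, not gospel):

```c
#include <stdio.h>
#include <stdint.h>

#define LINE_BYTES 32
#define NUM_LINES  1024              /* 32KB direct-mapped, as above */

static uint32_t tags[NUM_LINES];
static int      valid[NUM_LINES];
static long     misses;

static void touch(uint32_t addr)     /* model one cached read */
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);
    if (!valid[index] || tags[index] != tag) {
        misses++;                    /* would go out to slow DRAM */
        tags[index]  = tag;
        valid[index] = 1;
    }
}

int main(void)
{
    /* string 32KB away from the loop: byte 256 of the code and byte
     * 32768+257 of the data land on the SAME cache line and evict
     * each other every single iteration                            */
    for (int i = 0; i < 1000; i++) { touch(256); touch(32768 + 257); }
    printf("far apart:      %ld misses\n", misses);   /* ~2000      */

    /* string just after the loop, at byte 576: different line,
     * both stay resident after the first pass                      */
    misses = 0;
    for (int i = 0; i < 1000; i++) { touch(256); touch(576); }
    printf("close together: %ld misses\n", misses);   /* just 2     */
    return 0;
}
```

Same model, same addresses as the story above; the only difference between ~2000 misses and 2 is where the string happens to sit.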

But, there's no reason it had to remain that way (code and data being interwoven)... except, it seems to me, that by the time we discovered how risky buffer-overflows were, we'd already come to rely on our data being interwoven with our code in order to reduce cache-misses.

The friggin' Pentium could've, quite simply, added one additional output pin, a la an address bit (which could feed into a cache controller), alongside MEM/IO, to indicate Code/Data. For backwards-compatibility it wouldn't really even have been used, but for new programs it would've allowed separate address-spaces, should the programmer decide to make use of the new feature (no different than MMX extensions). One additional "address bit" indicating code/data in the cache controller would prevent cache-misses, despite the address offsets of the sprintf loop and the string buffer being near each other. And then executable overflows would be a thing of the past, a concern only for old code. Programmers jumped on efficiently coding for cache; surely they'd've jumped on this!
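To show what I mean, here's the same toy model with that hypothetical Code/Data pin wired in as an extra index bit, effectively splitting the cache into a code half and a data half. To be clear, this pin is pure speculation on my part; no real x86 ever had it:

```c
#include <stdio.h>
#include <stdint.h>

#define LINE_BYTES 32
#define NUM_LINES  1024

static uint32_t tags[NUM_LINES];
static int      valid[NUM_LINES];
static long     misses;

/* the hypothetical Code/Data pin becomes the top index bit,
 * splitting the 32KB cache into a 16KB code half + 16KB data half */
static void touch(uint32_t addr, int is_code)
{
    uint32_t index = ((addr / LINE_BYTES) % (NUM_LINES / 2))
                   + (is_code ? NUM_LINES / 2 : 0);
    uint32_t tag   = addr / (LINE_BYTES * (NUM_LINES / 2));
    if (!valid[index] || tags[index] != tag) {
        misses++;                     /* would go out to DRAM */
        tags[index]  = tag;
        valid[index] = 1;
    }
}

int main(void)
{
    /* loop code at byte 256 of code-space, string at byte 257 of a
     * *separate* data-space: nearly identical offsets, yet no
     * aliasing, because they index different halves of the cache   */
    for (int i = 0; i < 1000; i++) {
        touch(256, 1);                /* fetch the sprintf loop */
        touch(257, 0);                /* access the string      */
    }
    printf("split spaces: %ld misses\n", misses);   /* just 2   */
    return 0;
}
```

Same toy as before, but the pin keeps code and data from ever fighting over a cache line, no matter how close their offsets are.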

.

But, back to cache...

I fought cache issues back in the ARM7 days, which, if I understand correctly, were *long* before ARMv7, and whatever passes for the lowest-end ARMs used in smartphones today.

I could be entirely mistaken, but I gather that on-chip cache controllers, and on-chip cache SRAM, are de-facto today... on basically every processor short of microcontrollers (and even some of them).

Is it possible these huge arrays of transistors de-facto-included in any "current" processor--expensive, space-consuming, heat-producing as they are--are actually *hindering* the abilities of our multicore multithreaded systems, today?

I'm beat. Who knows?

...

That said, I am a bit intrigued by the hackery possible with the 82385... Seriously, e.g. connecting something like it [preferably in a DIP] to a Z80 could be quite interesting. It seems to be, roughly, a slew of 74646(?) address decoders and 1024x(!) 74574 8-bit registers, which could be quite useful in completely unrelated tasks, as well.

Discussions

Eric Hertz wrote 03/08/2023 at 03:22 point

I've just looked up the 82495 level-2 cache controller's datasheet, which, VERY unlike the 82385's, doesn't go into *any* detail about how its features work... it just throws around buzzwords. Hmmm... Well, they /sound/ promising!
