-
Reset values (bis)
02/23/2018 at 12:29 • 0 comments
The following was suggested to me:
"When initializing the registers after a hardware reset, try to have the CPU hardware revision number in one of the registers. This way, if some instructions are missing or buggy in the CPU you have a fair chance of getting around this by software."
I much prefer to have the core's type, revision, capabilities, etc. stored in ROM in the Special Registers (or IO) space, because you don't have to reset the CPU to get the information :-)
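As a rough sketch of how software could use such a read-only identification word to work around a missing or buggy instruction: the bit layout, the `CAP_HW_MULTIPLY` flag, the "revision 0 has a buggy multiplier" rule and all function names below are hypothetical, and the special-register read is modeled by a plain `id` argument so the logic can be tried on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a read-only core ID word (an assumption, not a real ISA):
   bits 31..24 = core type, bits 23..16 = revision, bits 15..0 = capability flags */
#define CAP_HW_MULTIPLY 0x0001u   /* hypothetical capability bit */

uint8_t core_type(uint32_t id)     { return (uint8_t)(id >> 24); }
uint8_t core_revision(uint32_t id) { return (uint8_t)(id >> 16); }
bool    core_has(uint32_t id, uint32_t cap) { return (id & cap) != 0; }

/* Shift-and-add fallback for cores without a usable multiplier. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    while (b != 0) {
        if (b & 1u)
            r += a;
        a <<= 1;
        b >>= 1;
    }
    return r;
}

/* Pick the implementation at run time from the ID word.
   Revision 0 is assumed (for illustration only) to have a buggy multiplier. */
uint32_t mul32(uint32_t id, uint32_t a, uint32_t b)
{
    if (core_has(id, CAP_HW_MULTIPLY) && core_revision(id) >= 1)
        return a * b;          /* stands in for the hardware instruction */
    return soft_mul(a, b);
}
```

Because the ID word can be read at any time, this kind of dispatch does not depend on catching a value left in a register at reset.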
-
Delay slots
02/22/2018 at 05:48 • 0 comments
Delay slots, or delayed jumps, are one of those neat tricks that you can use for a fixed architecture, and that become a nightmare when the architecture changes. That's why they helped MIPS processors take off, but the Alpha avoided them (which was wise).
So yes, it's a pretty cool trick when you have a canonical single-issue RISC pipeline. Otherwise, stay away.
-
Interrupt speed
02/22/2018 at 05:44 • 0 comments
Swapping the register set has always been a concern, for various reasons... but yes, mainly because of speed.
Some architectures have windowed registers (SPARC has one kind, the TMS9900 has a different one). Each creates issues of one kind or another.
Some have two or more banks (DSPs often have two sets for almost instant IRQ handling).
Some just prefer the slow way, or even microcoded operations.
Some just don't bother and let the tedious IRQ work be done by smaller, nimbler, better adapted companion processors: the recent ARM "big/little" and "little/big" approach, or simply the CDC6600/CDC7600 PPs (Peripheral Processors), delegate the tedious tasks so the main CPU can concentrate on the hard work (which also simplifies the main CPU, by the way).
F-CPU's FC0 introduced the SRB system: the "Smooth Register Backup" spies on the register set and performs the transition in the background. But it's still not ideal.
Traps are annoying as well, but context switches also occur when sending data to a different process: this is actually the real speed bump if you listen to the microkernel people. Then, different mitigation systems are required...
But overall, don't focus too much on this, because CPUs waste so much time on so many other things!
-
Register windows
02/22/2018 at 05:17 • 2 comments
SPARC uses register windows to provide a bunch of fresh registers across function calls. It was touted as a very RISC thing, and history has shown that it was not the best idea overall. So yeah, forget about it as is, because it only moves the actual problems to where KISS doesn't work.
Instead, why not just map more than one data register to memory for each address register? (See Memory-mapped registers in the F-CPU project.)
-
Endians
02/22/2018 at 04:54 • 0 comments
Little Endian has won.
Yet, be ready to swap bytes...
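Being "ready to swap bytes" can be as small as the two helpers sketched below. The names `bswap32` and `load_be32` are made up for this example, and the code uses only shifts and masks, so it does not rely on any compiler builtin or on the host's own byte order.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (portable, no compiler builtins). */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Read a big-endian 32-bit value from a byte buffer, whatever the host order. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Reading external data byte by byte, as `load_be32` does, avoids having to know the host's endianness at all; `bswap32` is for the cases where a whole word has already been loaded in the wrong order.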
-
Tagged registers
02/22/2018 at 04:54 • 0 comments
One of the tricks I included in F-CPU FC0 was flags associated with each register, holding hidden (and restorable) states about the contents.
One of these flags is the ZERO flag, calculated each time the register is written. This works like a distributed status register.
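A minimal software model of the per-register ZERO flag might look like this. It is only a sketch: the register count, the type `regfile_t` and the function names are assumptions made for the example, not FC0's actual design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64   /* arbitrary size for this sketch */

typedef struct {
    uint64_t value[NUM_REGS];
    bool     zero[NUM_REGS];   /* one ZERO tag per register */
} regfile_t;

/* Every write recomputes the tag, so it can never go stale. */
void reg_write(regfile_t *rf, unsigned idx, uint64_t v)
{
    assert(idx < NUM_REGS);
    rf->value[idx] = v;
    rf->zero[idx]  = (v == 0);
}

/* A conditional branch only needs to look at the tag bit, not the full value. */
bool branch_if_zero(const regfile_t *rf, unsigned idx)
{
    assert(idx < NUM_REGS);
    return rf->zero[idx];
}
```

Since each register carries its own tag, a test on one register is not disturbed by a later write to another, which is what makes this behave like a distributed status register.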
Another flag is a "valid" flag: the SRB (Smooth Register Backup) steals cycles to save or restore the monolithic register set across thread switches or during an IRQ.
Also very interesting is an "address valid" flag, meaning that the register contains a pointer that has already been cleared by (checked against) the TLB. The tag should also contain the access rights, for example to prevent a store if the page is read-only. More information can be added, such as the cache set or other architecture-specific details, which accelerates the execution of a load/store instruction.
Similarly, a flag can indicate whether a register contains a valid instruction pointer, for example for looping or function return. Not only can it say that the TLB should not be checked again, it can also indicate the cache line number.
As long as you can recover this information, you can cache it. It might be erased during a context switch, a TLB invalidation, whatever... Restoring the state will add a few cycles of penalty, but it will function just as well.
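The "cache it as long as you can recover it" idea can be sketched in software as follows. Everything here is an assumption for illustration: the page size, the toy `tlb_lookup` with its two hard-coded pages, the choice to drop the tag whenever the register is rewritten, and all the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PTR_REGS 8
#define PAGE_BITS    12

typedef struct { bool valid, writable; } addr_tag_t;

typedef struct {
    uint64_t   value[NUM_PTR_REGS];
    addr_tag_t tag[NUM_PTR_REGS];
    unsigned   tlb_lookups;     /* counts the slow-path lookups */
} cpu_t;

/* Stand-in for a real TLB: page 1 is read-write, page 2 is read-only,
   everything else is unmapped. Purely illustrative. */
bool tlb_lookup(uint64_t addr, bool *writable)
{
    uint64_t page = addr >> PAGE_BITS;
    if (page == 1) { *writable = true;  return true; }
    if (page == 2) { *writable = false; return true; }
    return false;
}

/* Writing a register changes the pointer, so the cached tag is dropped. */
void ptr_write(cpu_t *c, unsigned r, uint64_t v)
{
    assert(r < NUM_PTR_REGS);
    c->value[r] = v;
    c->tag[r].valid = false;
}

/* Context switch or TLB invalidation: forget every cached tag. */
void invalidate_tags(cpu_t *c)
{
    for (unsigned r = 0; r < NUM_PTR_REGS; r++)
        c->tag[r].valid = false;
}

/* Returns true if a store through register r is allowed. */
bool store_allowed(cpu_t *c, unsigned r)
{
    assert(r < NUM_PTR_REGS);
    if (!c->tag[r].valid) {               /* slow path: rebuild the tag */
        bool w;
        c->tlb_lookups++;
        if (!tlb_lookup(c->value[r], &w))
            return false;                 /* unmapped: would trap */
        c->tag[r].valid = true;
        c->tag[r].writable = w;
    }
    return c->tag[r].writable;            /* fast path: tag only */
}
```

After `invalidate_tags`, the next access through a register simply takes the slow path once and rebuilds its tag, which is the "few cycles of penalty" mentioned above.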
-
Input-Output architecture
02/22/2018 at 04:15 • 0 comments
In the wild, you will find two approaches, best illustrated by the Motorola vs Intel debate.
- Motorola and their ilk map peripherals into the memory space. Typically you end up with a single address decoding logic ("glue" chips) with a pretty wide variety of granularities.
- The Intel school has a dedicated IO space that uses a few dedicated instructions.
In the 1970s and 80s, separate IO spaces would ease decoding at the cost of more IO pins on the CPU.
In the 1990s and 2000s, well, memory became a black art, then PCI arrived, so the mess is much worse.
I design a dedicated "space" to separate differing resources because they have different requirements : latency, speed, bandwidth, granularity, protection/safety, ordering, restartability...
- Memory can be weakly ordered and optimised for bandwidth, it uses various cache levels and has a coarse granularity for protection. Usually, there is one main area of memory, maybe split among several homogeneous banks. You usually move cache-line-wide chunks of data in interleaved transactions...
- IO can have many uses, from controlling internal CPU resources such as the TLB, protection settings, essential peripherals... to exchanging data with other (more or less dependent) units such as other CPUs or coprocessors... You need a clear and clean execution where access rights are immediately evaluated, with maybe some latency, but with no speculative execution and no risk of re-executing the instruction after a trap (for example), because this would mess with the environment.
I use IN and OUT instructions to access anything that is not related to data storage. This is more or less equivalent to Intel's MSRs (Model-Specific Registers), introduced with the Pentium 25 years ago. Semaphores, synchronisation, interrupt management, debug, profiling... can only work with word-wide accesses and fine-grained rights. This allows capability-based (or whitelist, or object-based) rights management: for example, each peripheral could be accessed only by a given thread ID. Of course this also greatly simplifies the memory system, because it no longer has to provide those properties: they are relegated to a dedicated channel.
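To make the "each peripheral could be accessed only by a given thread ID" example concrete, here is a small software model. It is a sketch under assumptions: the port count, the `NO_OWNER` sentinel, the one-owner-per-port rule and every name are invented for the example, and a `false` return stands in for the trap a real core would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 16
#define NO_OWNER  0xFFFFu       /* sentinel: port not granted to anyone */

typedef struct {
    uint32_t data[NUM_PORTS];
    uint16_t owner[NUM_PORTS];  /* thread ID allowed to touch each port */
} io_space_t;

/* Nothing is accessible by default (whitelist approach). */
void io_init(io_space_t *io)
{
    for (unsigned p = 0; p < NUM_PORTS; p++) {
        io->data[p]  = 0;
        io->owner[p] = NO_OWNER;
    }
}

/* Privileged operation: grant one port to one thread. */
void io_grant(io_space_t *io, unsigned port, uint16_t thread_id)
{
    assert(port < NUM_PORTS);
    io->owner[port] = thread_id;
}

bool io_allowed(const io_space_t *io, uint16_t thread_id, unsigned port)
{
    return port < NUM_PORTS
        && thread_id != NO_OWNER
        && io->owner[port] == thread_id;
}

/* OUT: the right is checked before any side effect; false models a trap. */
bool io_out(io_space_t *io, uint16_t thread_id, unsigned port, uint32_t value)
{
    if (!io_allowed(io, thread_id, port))
        return false;
    io->data[port] = value;
    return true;
}

/* IN: same check, word-wide read. */
bool io_in(const io_space_t *io, uint16_t thread_id, unsigned port, uint32_t *value)
{
    if (!io_allowed(io, thread_id, port))
        return false;
    *value = io->data[port];
    return true;
}
```

The check happens before the access has any effect, so a refused `io_out` leaves the port untouched and can be retried safely once the right has been granted.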