-
Carry on
03/29/2023 at 04:23 • 0 comments
I was wrong...
I thought that having a carry flag would be enough to solve multi-precision arithmetic codes. It is not.
The conditional execution of instructions does not solve it either.
It is really necessary to have an ADC opcode that takes the carry bit as an input (and SBB as well).
And this is very worrying because the opcode map was frozen and now I need 2 more opcodes while all the opcode space is taken ._._.
It's not that multi-precision addition is expected to be very common, but when it occurs it gets ugly: it takes multiple instructions and registers, and it loses orthogonality. I have covered a possible trick at https://hackaday.io/project/27280-ygrec8/log/199934-add-with-carry-the-macro but I don't like it.
ADC is easy to implement: it's just an AND gate between the output of the carry flag and the carry input of the adder. SBB is almost the same.
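To make the "one AND gate" claim concrete, here is a minimal Python sketch of an 8-bit adder whose carry input is gated. The function name `add8` and the `use_carry` signal are my own labels for illustration, not names from the Y8 sources.

```python
def add8(a, b, carry_flag, use_carry):
    """8-bit add; the adder's carry input is (carry_flag AND use_carry)."""
    carry_in = carry_flag & use_carry          # the single AND gate
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8            # (result byte, carry out)

# 16-bit addition 0x01FF + 0x0001 done as two byte-wide steps
lo, c = add8(0xFF, 0x01, 0, 0)   # plain ADD on the low bytes
hi, c = add8(0x01, 0x00, c, 1)   # ADC on the high bytes
print(hex((hi << 8) | lo))       # 0x200
```

With `use_carry` tied to 0 the unit behaves as the existing ADD, so the datapath cost really is small; the hard part remains finding the opcode space.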
The real problem is that I thought it was not necessary. The opcode map has been frozen and now I need 2 more opcodes. I could dump CMPU and CMPS but their "no writeback" behaviour will increase register pressure or bloat code in common code sequences.
Any more complex reorganisation will break a lot of code as well as the electronic devices I have already built. It's not impossible but probably not worth it, as the opcode map has been thoroughly polished.
The last solution I see is ugly in principle, but convenient: create a "prefix opcode" that enables the carry input.
- An I/O port address could be attributed to this function but it would read or destroy a register. Anyway : OUT 0 PC looks alluring.
- Using the case 3 from https://hackaday.io/project/27280-ygrec8/log/200314-undecided-overlay-options is possible but that would prevent register-defined overlays.
- Using coding space from INV is a last resort.
This also requires an update to the FSM
...
-
A little explanation
04/12/2022 at 20:09 • 0 comments
It seems that the "main schematic" below, used as the project's avatar image, is not obvious enough, at least before zooming in enough.
@Ken KD5ZXG was not sure how to interpret/decipher the upper-left side.
- I see YGCREC8 mux between Carry Sign Zero Always and the lower four bits of B. What is B? I get what Write Enable and N are for, but how do these interact with the tested condition to drive some rectangle called 3=>8 that I might guess 8 way mux if you hadn't already drawn other mux with the proper symbol. Maybe the 3=>8 rectangle is just a decoder? I've got a condition mixed with N thats probably one of the 3 selects, a write enable thats not clear how it works (chip enable?), SND whose purpose remains a mystery drives some other select bits? Enlighten me, cause I'm clearly not getting it.
So here is the explanation.
- The 4 basic conditions are ACSZ, as already obvious. I now use the alphabetical order for convenience and as a mnemonic aid.
- The 4 extended conditions are optional and user-configurable: either fixed (if you have 4 input/GPIO pins or other internal signals) OR you can implement a set of Special Registers that select the source of each condition bit. This is one of the tricks inherited from the YASEP.
The home page says :
- Always
- Z (Zero, all bits cleared)
- C (Carry)
- S (Sign, MSB)
- B0, B1, B2, B3 (for register-register form only, we can select 4 bits to test from user-defined sources)
The COND field has 3 bits (for Imm4) or 4 bits, more than YGREC16, so we can add more direct binary input signals. All conditions can be negated so we have :
So this extra condition bit allows extensions for later, which could speed up some IO intensive algos, such as bit banging. "B" means "bit", it's not a register per se (though it must be latched before to prevent race conditions) and it is user-defined wires. They could be front panel switches, synchronous or asynchronous data over 1 or 2 bits... or some condition inside the extensions blocks like UART ready/overflow/whatever status bits. I was a bit inspired by the CDP1802 on this, I admit.
The "Never" condition could be mapped to another bit/wire/condition but I don't want to play this game yet. ARM mapped this to an extension to the instruction set but YGREC8 is too young for such gymnastics.
... Maybe the 3=>8 rectangle is just a decoder?
Yes, once the condition for writing the destination register is determined, it is sent to the appropriate destination register for writeback. I should have made it clearer but the drawing is already pretty crowded :-)
The register set uses latches, and not DFF, to cut the register set power/area/cost in half. Imagine routing the clock signal to 64 bits and only updating 8 (or 16) every time... The diagram misses a buffer latch on the result datapath btw.
I'll try to summarize :
- First the condition must be evaluated. It is by default 1 : enable. The XOR can negate the condition when needed, directed by the relevant bit in the instruction word (in red). The condition can be any of A/C/S/Z/b0/b1/b2/b3 (the B conditions are optional as noted above). So it is quite simple : we have a bit that is usually 1, but also selectively cleared.
- The condition is valid at the output of the XOR. Then this signal will
- enable the update of the C/S/Z condition flags
- go to the 3=>8 decoder to enable and select one byte of latches, so the relevant register (indicated by the SND field) is written (or not).
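The summary above can be modelled in a few lines of Python. This is only a sketch: the ordering of the selector values (A, C, S, Z, then b0..b3) and the function names are assumptions for illustration, not taken from the VHDL sources.

```python
def evaluate_condition(select, negate, flags, b_inputs=(0, 0, 0, 0)):
    """Pick one condition source, then optionally negate it with the XOR.

    select: 0..3 -> Always/C/S/Z, 4..7 -> b0..b3 (assumed ordering).
    """
    sources = [1, flags["C"], flags["S"], flags["Z"], *b_inputs]
    return sources[select] ^ negate

def write_enables(cond, snd):
    """3=>8 decoder: one latch enable per register, gated by the condition."""
    return [int(bool(cond) and i == snd) for i in range(8)]

flags = {"C": 1, "S": 0, "Z": 0}
cond = evaluate_condition(1, 0, flags)    # "if carry"
print(write_enables(cond, 3))             # only register 3 is enabled
never = evaluate_condition(0, 1, flags)   # "Always" negated gives "Never"
print(write_enables(never, 3))            # no register is written
```

The same `cond` bit would also gate the update of the C/S/Z flags, which is why a skipped instruction leaves the flags untouched.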
I hope it helps.
-
counters
11/22/2021 at 09:35 • 0 comments
In the log "129. Counters strike !" I started considering the new version of the "counters". It's easier said than done and the many clock domains don't ease the design. But I managed to draft one "bit":
Each byte can be read with one byte select per counter (but the latency is quite high as the data ripple through many gates). Writing is another story because it's asynchronous, so a byte clear precedes a selective set. Each byte has its own clock domain, selected among various sources, including the preceding counters. And the counter's value comes either from the local incrementer or the previous counter's value, for the cases of arbitrary frequency dividers (think: baud generator as a trivial example, the previous incrementer is left unused in this case).
This is quite scalable, the size of the pool of counters can be configured at will. A 8×8 bits block seems like a good compromise but nothing keeps one from changing that.
. . . . . . . . . . . .
Looking further, 2 main concerns arise:
- Latency : I don't want the I/O space to be too constrained or constraining. There might be a cacophony of wires, registers and decoders that would probably slow everything down. So let's consider the I/O space as "mostly asynchronous" from the core's point of view. This means that IN and OUT should have a "completion" flag that lets the core resume operation once the IO is done. Which means I must adapt/change/update the instruction's FSM...
- The register map. So far I have identified that each counter byte could have 2 addresses: one for read and write of the value, the other for control and status. Thus there is no constraint on the total number of counters: implement as many as you like. However, timing and scheduling becomes critical at this point so look at the previous point.
-
Tri-mode TAP
11/21/2021 at 07:36 • 0 comments
Now that I have an assembler, how do I upload the program into the core ?
The TAP (Test Access Port) allows one to upload data, instruction by instruction, then make the core run them. However this is quite complex and not suitable for autonomous operation, like, when the core works alone.
One thing I would loooove is to hook the circuit to a serial port of my computer and then cat a binary file through /dev/ttyS0 to the circuit. When 512 bytes are transferred, the processor starts the uploaded program. It's convenient from the user's point of view, though it doesn't allow full debugging and requires baud generation circuits. Some sort of external adaptation circuit on a dongle must be designed.
Another interesting situation would be that the TAP circuit itself goes to fetch the program by itself. Usually it comes from a SPI Flash device, and I have already developed such a system in 2014/2015 for #WizYasep.
I am familiar with SPI Flash devices: #SPI Flasher implements a few protocols already. SPI usually works with 4 signals :
- MISO
- MOSI
- CLK
- SEL
Compare this to the TAP interface:
- Din
- Dout
- CLK
- R/W
- /Reset (optional)
It looks quite similar, right? Furthermore, the TAP contains the shift register and the other circuits that write to the program memory, which is indexed by the PC register (the latter conveniently wraps around to 0 when the upload is over, and signals the FSM to start running said program).
So there are already about one half of the circuits in place for autonomous loading, now the trick is how to hack the existing system to add the new features.
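The loading mechanism described above (shift register, write at PC, PC wrap-around starts the program) can be sketched in Python. The 256-word depth follows from 512 bytes of 16-bit instructions; the MSB-first bit order and all names are assumptions made for this illustration.

```python
WORDS = 256        # 512 bytes of 16-bit instructions
WORD_BITS = 16

def upload(bits):
    """Shift a serial bit stream into program memory, indexed by PC."""
    memory = [0] * WORDS
    pc = 0
    shift = 0
    count = 0
    running = False
    for bit in bits:
        shift = ((shift << 1) | (bit & 1)) & 0xFFFF   # MSB first (assumed)
        count += 1
        if count == WORD_BITS:
            memory[pc] = shift
            count = 0
            pc = (pc + 1) % WORDS
            if pc == 0:            # PC wrapped: the upload is complete
                running = True     # the FSM may now start the program
                break
    return memory, running
```

A stream shorter than 4096 bits leaves `running` false, which matches the idea that the core only starts once the whole program memory has been filled.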
First, the Y8 interface needs to know the operating mode. So far, the TAP had only one mode so the question didn't arise but we have identified 3 modes, that would be conveniently selected by weak pull-up/down resistors on the pins:
- TAP mode : R/W low, CLK low (?)
- External/slave programming mode : R/W low, CLK high (?)
- Autonomous SPI programming mode : R/W high
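As a sketch of the mode selection listed above, assuming the pin levels are sampled once at reset (the CLK levels are still marked "(?)" in the list, so treat them as tentative; the function name is mine):

```python
def boot_mode(rw, clk):
    """Decode the operating mode from the sampled R/W and CLK pin levels."""
    if rw:
        return "autonomous SPI"        # R/W high, CLK is a don't-care
    return "external/slave" if clk else "TAP"

for rw, clk in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(rw, clk, boot_mode(rw, clk))
```

Because R/W high ignores CLK, the default pull-up on R/W is enough to select autonomous loading when nothing is plugged in.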
It starts to look like what modern FPGAs provide...
Usually, the R/W pin is pulled up by an internal weak resistor, which is overridden by an external upload/control device. The default behaviour is to get the stream of 4096 bits from external SPI storage. I'm not sure yet about the CRC/scrambling, which can be designed/added later, but should not remain an afterthought for ever.
The state of the R/W pin is sampled by the FSM during the Reset sequence, just after the /Reset pin is brought high. Note that the TAP is still functional even when the rest of the chip is held in RESET state, since the TAP has its own reset sequence and clock domain.
Note: the TAP used to have 3 pins only (plus external RESET though it could also be controlled from within the TAP registers). See https://hackaday.io/project/27280-ygrec8/log/182563-tap-pins for the diagrams. Now, in addition to the previously defined interface, the external debug/upload device must control the /Reset pin to take over the internal FSM and prevent conflicts with the internal operation. The minimal number of pins for the probe is 6 but more become desired:
- Din
- Dout
- CLK
- R/W
- SSel (open collector)
- /Reset (open collector)
- Vtarget
- GND
It's on the "high tier" of the pin count range for probes but that's the price for sharing a SPI bus between 2 masters. It's also for safety due to the uncertainty of the support of "half-duplex" mode by SPI Flash chips. I added these 2 pins:
- The debug probe would certainly want to access the SPI Flash for programming, and the SPI Flash should not interfere with the normal TAP operation. A separate pulled-up SPI Sel pin (SSel) becomes necessary so the SPI Flash is deselected during TAP operation.
- I also added Vtarget because the TAP probe shouldn't operate if the target circuit is not energised. Some voltage translation buffers will ensure electrical integrity, prevent "ghost powering" through pins' protection diodes, and the probe must be sure that the target is correctly powered.
I know it's getting more complex but not everything is required in the beginning. Maybe I'll find a trick to remove one of the signals because each pin is a precious resource...
-
The YGREC8 Assembler
11/20/2021 at 16:20 • 0 comments
This is a copy of the current documentation for y8asm :
_________________________________________________________________
The YGREC8 Assembler
Y8asm is a 2-pass assembler written in VHDL. It transforms source/assembly language .y8 files into .hyx files suitable for flashing/emulation. The VHDL code is compiled and elaborated with GHDL by the provided build/test script, and generates an executable program.
Pre-processing
The program can't include files or manage macros. Use external programs such as cpp or m4. You could also concatenate source files with cat to a temporary file.
Program invocation
The program runs on the command line interface or in a script. There are 3 active parameters:
• Input file name: -gname
$ ./y8asm -gname=example.y8
will assemble the file named example.y8.
The output file name is derived from this parameter, with the .hyx suffix.
The output file will be overwritten without a warning.
• Symbol table output: -gdump
$ ./y8asm -gname=example.y8 -gdump=yes
appends a dump of the user-defined symbols in the comments at the end of the .hyx output file.
The dump also contains the number of times the symbol has been referenced (though it might be over-estimated because it includes both passes). That's still convenient if you want to clean up your source code and prune some useless lines.
The option -gdump=full dumps all the defined symbols, including the reserved words, keywords, opcodes etc.
• Maximum Symbol Length : -gmax_sym_len
$ ./y8asm -gname=example.y8 -gmax_sym_len=12
changes the maximum length of symbols (labels, identifiers etc.) from the default 16 characters to 12 characters.
Basic Syntax
• Comments start at ';' and remove the rest of the line.
• All the symbols are translated to upper case during parsing.
• The symbols can not be redefined. Unless they are in a nested context (not yet implemented).
• User's symbols have a range dependent on the VHDL simulator, "at least 32 bits". This is practical for intermediary values since the assembler checks each range for every instruction field.
• Identifiers can contain the following letters: '_', 'A' to 'Z' and '0' to '9' (but no digit at the first position).
• Separators are space ' ', comma ',', horizontal tab, and character code 160 (the non-breaking space, &nbsp; in HTML).
• The dollar sign '$' represents/returns the value of the current address.
• Numbers are decimal by default. Binary numbers have the b suffix and h is the hexadecimal suffix.
• Numerical computations ("expressions") are always between parentheses, to avoid precedence issues. The assembler supports the following arithmetic operations: '+', '-', '*', '/', '%'. VHDL does not allow easy boolean operations on integers, unless you go the std_logic_vector route, but I can't change the standard... I'll add them later if needed.
• By default, code is assembled starting at address 0. Don't forget to ORG if you need otherwise.
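The number syntax above (decimal by default, `b` suffix for binary, `h` suffix for hexadecimal) can be illustrated with a short Python function. This is not the y8asm code, which is written in VHDL; the function name and the order in which the suffixes are tested are assumptions.

```python
def parse_number(token):
    """Parse a literal in decimal, binary ('b' suffix) or hex ('h' suffix)."""
    t = token.upper()              # symbols are upper-cased during parsing
    if t.endswith("H"):
        return int(t[:-1], 16)
    if t.endswith("B"):
        return int(t[:-1], 2)
    return int(t, 10)

print(parse_number("42"), parse_number("101b"), parse_number("2Ah"))
```

Testing the `h` suffix first matters because `B` is also a hexadecimal digit: `1Bh` must be read as hexadecimal, not rejected as a malformed binary number.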
Pseudo-instructions
The assembler provides some housekeeping commands that greatly help even basic programs.
• END
Ends parsing of the file. The source file can contain any garbage below this line.
• DEF
DEFines a new user symbol.
The line
DEF plop 42
defines the symbol "PLOP" and assigns the value 42. The symbol "PLOP" can be used later in the source file, and even before for instructions and DW.
See the -gdump=yes command line argument to list all the user-defined symbols.
• Label:
The line
plop:
is a shorthand to the line "DEF plop $".
There is no separator before ':' and the label must be alone on the line.
• ORG
Change the address to which the next instruction will be assembled/stored.
ORG 42
means that the next instruction will be stored at address 42.
The value may be a number, symbol or expression, but can not be post-defined or have a value lower than the current address.
• DW
DW 42
will output the value 42 to the instruction memory space, as if it was an instruction.
Just like an instruction, it is 16-bit wide and can have a post-defined value.
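To show why two passes are needed for post-defined values, here is a deliberately tiny Python model that handles only END, DEF, labels, ORG and DW with plain numbers or symbols. It is a sketch under those assumptions, not a port of y8asm; `resolve` and `assemble` are hypothetical names and no expression parsing is attempted.

```python
def resolve(tok, symbols, addr, must_exist):
    """Turn a token into a value; unknown symbols are tolerated in pass 1."""
    if tok == "$":
        return addr
    if tok.lstrip("-").isdigit():
        return int(tok)
    if tok in symbols:
        return symbols[tok]
    if must_exist:
        raise ValueError("undefined symbol " + tok)
    return 0                       # placeholder, patched during pass 2

def assemble(source):
    symbols, memory = {}, {}
    for final in (False, True):    # pass 1 collects symbols, pass 2 emits
        addr = 0
        for raw in source.splitlines():
            line = raw.split(";")[0].replace(",", " ").upper().split()
            if not line:
                continue
            if line[0] == "END":
                break              # anything below END is ignored
            if len(line) == 1 and line[0].endswith(":"):
                symbols[line[0][:-1]] = addr          # same as DEF name $
            elif line[0] == "DEF":
                symbols[line[1]] = resolve(line[2], symbols, addr, True)
            elif line[0] == "ORG":
                new_addr = resolve(line[1], symbols, addr, True)
                if new_addr < addr:
                    raise ValueError("ORG cannot move backwards")
                addr = new_addr
            elif line[0] == "DW":
                word = resolve(line[1], symbols, addr, final)
                if final:
                    memory[addr] = word & 0xFFFF      # 16-bit wide
                addr += 1
    return memory, symbols
```

In this model `DW plop` placed before `DEF plop 42` assembles correctly because the value is only required during the second pass, while ORG must be resolvable immediately since it changes the addresses of everything that follows.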
Instructions
The assembler pre-defines the following keywords as elements of an instruction:
• Opcodes
"ADD", "AND", "ANDN", "CALL", "CMPS", "CMPU", "HLT", "IN", "INV", "LDCH", "LDCL", "NOP", "OR", "OUT", "RC", "RO", "SA", "SET", "SH", "SUB", "XOR"
• Registers
"A1", "A2", "D1", "D2", "PC", "R1", "R2", "R3"
• Conditions
"ALWS", "IF0", "IF1", "IF2", "IF3", "IFC", "IFN0", "IFN1", "IFN2", "IFN3", "IFN", "IFNC", "IFNZ", "IFP", "IFZ", "NEVR"
Follow the opcode map to see which operations are possible:
The instruction's textual format follows the binary format (except the condition which is a suffix).
• No operand: "HLT", "INV", "NOP"
Examples:
NOP ; do nothing
HLT ; put the core in sleep mode until the next IRQ
INV ; trigger a trap/reboot/panic
• One immediate operand and a register: "IN", "OUT"
Examples:
IN 67, D2 ; get value from IOspace at address 67 and write register D2
OUT 45 A1 ; Put the value of A1 into IOspace at address 45
• One short immediate or a source register, a destination register and optional condition: "LDCH", "LDCL", "RC", "RO", "SA", "SH"
Examples:
RC 1 D1 ; Rotate register D1 through carry by 1 position (left)
RO -3 D2 IFNC ; Rotate register D2 by 3 positions (right) when the carry bit is clear
SA R1 A1 ; Arithmetic Shift of A1 by R1 positions
SH R2 A2 IFZ ; Logic shift of A2 by R2 positions if the Zero flag is set
• One long immediate and a destination register, OR One short immediate or a source register, a destination register and optional condition: "ADD", "AND", "ANDN", "CALL", "CMPS", "CMPU", "OR", "SET", "SUB", "XOR"
Examples:
AND 123 R1 ; bit-mask R1 with byte 123
ADD -76 R2 ; subtract byte 76 from register R2
SUB 95 R3 ; Subtract R3 from byte 95 and put the result in R3
ADD 1 A1 IFC ; increment A1 by short 1 if the carry bit is set
ADD 0 A1 IFC ; INVALID ! \ short add increments positive
ADD 8 A1 IFC ; VALID !   / immediate values by 1
             ;             to extend the range of jumps
ADD D1 A1 IFN ; Add D1 to A1 (result in A1) if the Negative flag is set.
-
jumping back and forth, and carry
11/20/2021 at 15:34 • 0 comments
The Y8 can jump:
set label pc
it can also jump relative :
add (label-$) pc
and it can even jump relative conditionally :
add (label-$) pc ifc
But then the range is limited because only 4 bits are available for the signed amplitude. And I have already sacrificed one condition bit...
With the new assembler, here is the best that can be reasonably done:
 add (forward-$) PC ifnz
 nop ; 1
 nop ; 2
 nop ; 3
 nop ; 4
 nop ; 5
 nop ; 6
 nop ; 7
forward:

backwards:
 nop ; -8
 nop ; -7
 nop ; -6
 nop ; -5
 nop ; -4
 nop ; -3
 nop ; -2
 nop ; -1
 add (backwards-$) PC ifz
The output in .hyx:
;;hyx1
; L1: add (forward-$) PC ifnz
75BF ; @0: ADD 8 PC IFNZ
; L2: nop ; 1
0000 ; @1: NOP
; L3: nop ; 2
0000 ; @2: NOP
; L4: nop ; 3
0000 ; @3: NOP
; L5: nop ; 4
0000 ; @4: NOP
; L6: nop ; 5
0000 ; @5: NOP
; L7: nop ; 6
0000 ; @6: NOP
; L8: nop ; 7
0000 ; @7: NOP
; L9: forward:
; = 8
; L11: backwards:
; = 8
; L12: nop ; -8
0000 ; @8: NOP
; L13: nop ; -7
0000 ; @9: NOP
; L14: nop ; -6
0000 ; @10: NOP
; L15: nop ; -5
0000 ; @11: NOP
; L16: nop ; -4
0000 ; @12: NOP
; L17: nop ; -3
0000 ; @13: NOP
; L18: nop ; -2
0000 ; @14: NOP
; L19: nop ; -1
0000 ; @15: NOP
; L20: add (backwards-$) PC ifz
77C7 ; @16: ADD -8 PC IFZ
;;;; SYMBOL DUMP :
; * 'FORWARD'=8 ref:1 / sym_usr
; * 'BACKWARDS'=8 ref:1 / sym_usr
A backwards loop could then contain 8 instructions (including a test for the end of the loop) but the forward jump can only skip over 7 instructions, despite the ability to encode the constant 8 when dealing with the PC register.
The offset 1 is still possible and this represents the next instruction, which would be trivial to execute otherwise. And the offset 8 points to the 8th instruction after the skipped block, it's not the size of the skipped block.
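The "short add trick" amounts to a small remapping of the 4-bit immediate. The Python sketch below assumes the field is stored as a two's complement nibble and that non-negative field values are bumped by one in hardware; only the resulting ranges (-8..-1 and 1..8, with 0 rejected) come from the text above, and the function names are mine.

```python
def encode_short_add(offset):
    """Return the 4-bit immediate field for a short ADD."""
    if 1 <= offset <= 8:
        return offset - 1          # hardware adds 1 back to non-negative fields
    if -8 <= offset <= -1:
        return offset & 0xF        # negative values are stored unchanged
    raise ValueError("short ADD offset must be in -8..-1 or 1..8")

def decode_short_add(field):
    """What the adder actually receives for a given 4-bit field."""
    return field - 16 if field & 8 else field + 1

print(encode_short_add(8), encode_short_add(-8))
```

Since 0 has no encoding, the assembler has to reject `ADD 0 ...` in the short form, which is exactly the INVALID case shown in the assembler documentation.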
At least it's now impossible to do a pointless loop such as
ADD 0 PC ifnz ; spin endlessly doing nothing
To achieve a more practical goal, the operand should be the NPC, or PC+1, which is being computed at the same time as the addition. But this creates a whole lot of troubles, in particular:
- if we compute PC+2 then the backwards jump will only reach 7 instructions
- Timing becomes too tight, since the pipeline must choose between PC and PC+1 depending on the imm's sign
- This will require a stall cycle, and there is already one because writes to PC must discard the prefetched instruction.
At this point, the "short add trick" requires only a few logic gates (to detect the opcode, the format and the sign of imm4, detecting the PC register is not even necessary) and no deep modification of the state machine.
Trying to squeeze one more instruction, to skip 8 opcodes, would complicate the whole circuit with quite little benefits...
-
Status 20211114
11/14/2021 at 08:17 • 0 comments
The new source code archive is there ! YGREC8_VHDL.20211114.tgz
Some pretty things are there though it's still missing quite a lot:
- The assembler is not complete: it misses some keywords, thorough tests, padding, proper symbol definitions and backward patches...
update 20211118: VHDL assembler is mostly functional. More tests, doc & examples are welcome, as well as nested expressions.
- The ALU8 is more-or-less working, the tests run but find an error.
- TAP is incomplete and more debug infrastructure is required.
- Some files have been split and/or moved around
- the gates library #Libre Gates project is still somehow evolving in parallel, in a soft fork that I'll have to reconcile.
The core is still not complete... there are still many things to manage under the hood. But what works works well :-)
And having a proper assembler helps a lot too. No more estimates, it's now possible to test and reproduce ideas! So I think it's the priority for the next days, then I'll go back to #Libre Gates so I can fix the ALU and progress on the TAP, which is critical to enter programs, control the core and read back its status...
The Shift unit and the register set will then be quite easy to design, I think.
-
Undecided overlay options
11/12/2021 at 01:55 • 0 comments
Work is progressing nicely on the new assembler. This also allows me to find some corner cases that I didn't consider carefully yet. Let's look at the existing disassembler:
if OPC=Op_CALL and SND=Reg_PC then
  if Imm9="111111111" then
    result(1 to 3):="HLT";
  else
    result(1 to 7):="OVL " & SLV_to_Hex(Imm8) & "h"; -- /!\ bit 11 not used ?
  end if;
  return;
end if;
The HLT (halt) opcode uses all the 9 bits but the OVL (overlay) only uses 8, since only a byte can be managed by the overlay register.
The 11th bit is not handled so there are 3 ideas that come to mind :
- extend the immediate field to work with IMM9 like IN/OUT (simplest)
- consider the R/I bit so the OVL instruction can use a register argument as well (useful to deal with multiple or indirect overlay numbers)
- create a new instruction that provides another functionality (which ?)
The jury is still out.
-
Towards a better assembler, still in VHDL, sans Lex & Yacc
11/06/2021 at 05:04 • 0 comments
More and more, I'm looking at the YASEP's JavaScript framework and want to reuse it to develop better programs for the YGREC. This would be great to explore the practical limitations of the Y8...
Then I realise how outdated, clunky, messy and unmanageable #YGWM still is :-/
And I don't want to use C stuff : I have committed to using only bash and VHDL.
And here we are...
The project already provides YGREC8/ASM/Y8asm.vhdl but this is useful only for inline, context-less instructions. I want to create/write/assemble/run real programs, generate .HYX files and load them for simulation.
Without EVEN starting to deal with all the parsing, two critical parts are already required :
- The .HYX filter (only a C in/out filter is available so far)
- The symbols table.
IN VHDL.
I think I'll start with 1. because the algorithms are already written in C and JS.
Then, I'll deal with the dynamic allocation of symbols. I have chosen to use a unified table where the opcodes, the pseudo-opcodes, the unknown symbols and the defined symbols are kept together, to keep the complexity low and ensure there is no "shadowing", such as redefinition of opcodes, numbers etc. (as was the case in the buggy YASEP assembler in JS, where I lazily used string substitutions as a shortcut and it could totally break everything...)
Update: drawing from the YASM experience, the assembler should use a collection of dictionaries, first the pseudocodes and opcodes, then the local symbols, then the global symbols, eventually more, such that function nesting becomes possible (for example) and important things don't get re-defined. At first, only a global symbol table will be defined but more should be able to be allocated and de-allocated. A kind of linked list of symbol tables will ensure precedence.
Ephemeral Local symbol tables will be useful for the macros, for example. In this context, some inspiration from C syntax will help: the pseudo-symbol '{' would create one table and '}' destroys the last created.
Preprocessing would use m4.
_________________________________________________________________
OK, as usual, I say something then do the contrary, so here is dictionary.vhdl.
"Methods" are provided to create a dictionary, look it up, add a symbol and flush the whole dictionary. The idea being that there is one dictionary per context and the contexts can be stacked with "{" and unstacked/flushed by "}".
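The stacked-context idea can be sketched in Python as follows. This does not mirror dictionary.vhdl; the class and method names are hypothetical, and the rule that a name may only be rejected when it already exists in the innermost table is an assumption about how nesting would behave.

```python
class SymbolContexts:
    """A stack of dictionaries: '{' pushes a context, '}' flushes the last one."""

    def __init__(self):
        self.stack = [{}]                  # the global table sits at the bottom

    def push(self):                        # pseudo-symbol '{'
        self.stack.append({})

    def pop(self):                         # pseudo-symbol '}'
        if len(self.stack) == 1:
            raise ValueError("cannot flush the global table")
        self.stack.pop()

    def add(self, name, value):
        name = name.upper()
        if name in self.stack[-1]:
            raise KeyError(name + " is already defined in this context")
        self.stack[-1][name] = value

    def lookup(self, name):
        name = name.upper()
        for table in reversed(self.stack):  # innermost context has precedence
            if name in table:
                return table[name]
        return None
```

Searching from the innermost table outwards gives the precedence that the linked list of symbol tables is meant to ensure, and flushing a context is a single pop.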
That will make the assembler way easier to write, and now I need to handle .HYX files...
____________________________________________________________________
20211112: a few days of passionate work and YGREC8_VHDL.20211112.tgz is now available !
It's missing at least two important features: arithmetic expressions and symbol definition&update. Anyway it's starting to be useful to write simple programs. It's a bit bloated and requires some refactoring but it's only 741 lines so far (not including a few external packages).
I have so far been very, overly confident maybe, about the use of the ISA and now I'll be able to prove its worth.
____________________________________________________________________
20211118: YGREC8_VHDL.20211118.tgz provides a refactored assembler that supports more features and works better. I am now able to write whole programs!
And it's all written in VHDL, 700 lines so far.
-
Add with carry : the macro
11/03/2021 at 02:52 • 6 comments
The Y8 core has a carry flag but no ADC opcode. That's a compromise, turned into a fact now. So how do we perform multi-precision add/sub ?
The first way uses the conditional form that can contain a small immediate. This can skip an instruction that increments the MSB but then comes the problem of the possible secondary carry, which requires another conditional test.
Another way uses the rotate-through-carry instruction. Again, secondary carry and all...
The last way was imagined a few moments ago and exploits the fact that the SUB opcode forces the carry to 1. The trick then is to negate the register operand, which could be simplified in some cases.
Y8 was not meant to be an efficient multi-precision core, but not plainly awkward either.
I'd like to run PEAC16 as a programmed BIST to exercise the RAM, ALU and decoder so the ability to use 16-bit numbers pushes the core to its limits.
The idea is to configure the debug probe to spy on the carry signal and observe the pattern that arrives, then compare it to an internally programmed bitstream (this easily fits in a small FPGA or even an EPLD). Slowly increase the clock speed and watch when the output bitstream diverges from the internally generated one, and you can bin the chips.
So it turns out that handling multi-precision addition is slightly more important than I thought but I'll find a pretty hack.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
So let's say we have two 16-bit integers in R1-R2 and D1-D2.
The LSB is added by ADD R1 D1 with result in D1. The Carry flag is set accordingly.
The carry can then be merged with D2 : ADD 1 D2 C
At this moment, we look if we need the extra carry, or 17th bit. If not, just do ADD R2 D2 and you're done.
But PEAC requires the 17th bit so the 2nd instruction does not work.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Let's go back to ADD R1 D1 which generates a carry. It must be included in the MSB and this can generate a carry by itself. The second ADD R2 D2 will generate a carry too, though not both at the same time, so an OR is possible. If a secondary carry occurs when incrementing D2, this means that its new value is 0 and no value of R2 could trigger another/tertiary carry.
The easy way to deal with it is to dedicate R3 to a sort of "carry register".
1. SET 0 R3 ; init
2. ADD R1 D1 ; primary add
3. ADD 1 D2 C ; secondary add
4. SET 1 R3 C ; first correction, could also be a RCL 1 R3
5. ADD R2 D2 ; tertiary add
6. SET 1 R3 C ; final fix. No need of OR.
This code is branchless : 4 is executed only if 3 generates a carry, which only happens if 2 also generates the carry. The ADD opcode overwrites the carry so 4 only occurs when we really need it.
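The six-instruction sequence can be checked with a small Python model. It assumes that ADD always overwrites the carry flag, that SET leaves it untouched, and that a skipped conditional instruction changes nothing; the helper names are mine.

```python
def add8(a, b):
    """Byte add returning (result, carry out), like the Y8 ADD."""
    total = a + b
    return total & 0xFF, total >> 8

def add16_branchless(r1, r2, d1, d2):
    """(R2:R1) + (D2:D1) -> (D1, D2, R3), with R3 holding the 17th bit."""
    r3 = 0                        # 1. SET 0 R3
    d1, c = add8(r1, d1)          # 2. ADD R1 D1
    if c:
        d2, c = add8(1, d2)       # 3. ADD 1 D2 C
    if c:
        r3 = 1                    # 4. SET 1 R3 C
    d2, c = add8(r2, d2)          # 5. ADD R2 D2
    if c:
        r3 = 1                    # 6. SET 1 R3 C
    return d1, d2, r3

d1, d2, r3 = add16_branchless(0xFF, 0xFF, 0xFF, 0xFF)   # 0xFFFF + 0xFFFF
print(hex((r3 << 16) | (d2 << 8) | d1))                 # 0x1fffe
```

The two places where R3 is set can never both fire for the same addition, which is why a plain SET is enough and no OR is needed.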
That's 6 instructions and half of them manage the external carry flag. The flag can be kept in place by using the "SUB trick". However a couple of branches are required.
- ADD R1 D1 ; primary add
- ADD 3 PC NC ; conditional branch to normal ADD
- XOR -1 D2 ; pre-correction to compensate the SUB
- SUB R2 D2 ; tertiary add, +1
- ADD 1 PC ; Goto END.
- ADD R2 D2 ; tertiary add, normal
- the end.
That's still 6 instructions but we save one register. But wait ! The jump uses ADD which also destroys the carry flag ! Fortunately it's also possible to do a direct jump when no condition is needed.
- ADD R1 D1 ; primary add
- ADD 3 PC NC ; conditional branch to normal ADD
- XOR -1 D2 ; pre-correction to compensate the SUB
- SUB R2 D2 ; tertiary add, +1
- SET theend PC ; Goto END.
- ADD R2 D2 ; tertiary add, normal
- theend:
Et voilà.
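A Python model of the "SUB trick" shows why the pre-inversion works. It assumes that SUB computes SRC - DST as SRC + NOT(DST) + 1 (operand order as in the assembler documentation) and that the carry flag takes the raw carry out of the adder; both points are assumptions, as are the function names.

```python
def sub8(src, dst):
    """SUB SRC DST modelled as SRC + ~DST + 1, returning (result, carry out)."""
    total = src + (dst ^ 0xFF) + 1
    return total & 0xFF, total >> 8

def adc(src, dst, carry):
    """Add with carry, following the jump / XOR / SUB / ADD sequence."""
    if not carry:                 # ADD 3 PC NC: jump to the plain ADD
        total = src + dst         # ADD SRC DST
        return total & 0xFF, total >> 8
    dst ^= 0xFF                   # XOR -1 DST: pre-correction
    return sub8(src, dst)         # SUB SRC DST: SRC + DST + 1

# 16-bit add of (R2:R1) = 0x01FF and (D2:D1) = 0x0001
d1, c = adc(0xFF, 0x01, 0)
d2, c = adc(0x01, 0x00, c)
print(hex((d2 << 8) | d1))        # 0x200
```

The double inversion (XOR -1, then the inversion inside SUB) cancels out, so only the forced carry-in of 1 remains, which is exactly the missing ADC behaviour.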
PEAC requires 2 consecutive byte adds with carry, and each takes 5 opcodes. Then the whole block is register-swapped to emulate the copy.
A macro could be created :
Define ADC SRC DST label
 ADD 3 PC NC
 XOR -1 DST
 SUB SRC DST
 SET label PC
 ADD SRC DST
label:
And the #PEAC16 could be coded as :
ADC R1 D1
ADC R2 D2
ADC D1 R1
ADC D2 R2
phew.
It's still not as handy as a direct ADC opcode and there could be side effects (with the XOR -1) but it does the job.