
YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

#YGREC16 is getting pretty large and moving away from the original #AMBAP inspiration, making it less likely to be implemented within my lifetime. So here is a "back to minimalism" version with
* 256 bytes of DRAM (plus one parity)
* 8 byte-wide registers
* fewer relays than the YGREC16
This core is so simple that I focus now on the debug/test access port.
Like the others, it's suitable for implementation with relays, transistors, SSI TTL, FPGA and ASIC.

I have given up on the idea of playing the Game of Life (the forte of #YGREC-РЭС15-bis) but I am designing a VHDL version because @llo sees the YGREC8 as a perfect replacement for PICs for his #SteamBot Willie !


A significant reduction of the register set's size is required so I/O must be managed differently, through specific instructions. The register map is expected to be:

  • D1  <= for NOP
  • A1
  • D2
  • A2
  • R1
  • R2
  • R3
  • PC  <= for INV

I shrunk the instruction word down to 16 bits. It is still reminiscent of the YGREC16 older brother but I had to make clear cuts... The YGREC8 is a 1R1W machine (like x86) instead of the RISCy YGREC16, to remove one field.

I have swapped the condition field and the ALU code field, which is now a more classical opcode.

20171116: The latest evolution of the instruction format has added a 9-bit immediate address field for the I/O instructions.
20180112: Imm9 is now removed again...

There are two classical instruction forms : either an IMM8 field, or a source & condition field, combined with the destination field and a small opcode. The source field can also become a short immediate field (3 bits only but essential for conditional short jumps or increments/decrements).

The opcode field has 4 bits and the following values:

Logic group :

  • OR  => Reg OR Reg does not change Reg
  • XOR
  • AND
  • ANDN

Arithmetic group:

  • CMPU
  • CMPS
  • SUB
  • ADD

Beware : There is no point in adding 0, so ADD with a short immediate (Imm3) skips the value 0: the range is -4 to -1 and +1 to +4. (see 17. Basic assembly programming idioms)
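
The zero-skipping rule can be sketched in a few lines of Python. This is only an illustration: the function name `add_imm3_value` is hypothetical, and it assumes the 3-bit field is a two's-complement number (-4 to +3) that ADD bumps by one when it is not negative, as described in log 17.

```python
def add_imm3_value(field: int) -> int:
    """Return the value that ADD applies for a 3-bit short immediate.

    Assumption: `field` is the raw 3-bit pattern (0..7), read as
    two's complement (-4..+3). Non-negative values are incremented
    so that the useless ADD 0 is never encoded.
    """
    if not 0 <= field <= 7:
        raise ValueError("imm3 is a 3-bit field")
    signed = field - 8 if field & 0b100 else field  # -4..+3
    return signed + 1 if signed >= 0 else signed    # -4..-1, +1..+4


print([add_imm3_value(f) for f in range(8)])
# [1, 2, 3, 4, -4, -3, -2, -1]
```

Other opcodes would presumably use the plain -4 to +3 value; only ADD needs the shift.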

Shift group (optional)

  • SHR
  • SHL
  • SAR
  • ROL

Control group:

The COND field has 4 bits, more than YGREC16, so we can add more direct binary input signals. CALL is moved to the opcodes so one more code is available.  All conditions can be negated so we have :

  • Always
  • Z (Zero, all bits cleared)
  • C (Carry)
  • S  (Sign, MSB)
  • B0, B1, B2, B3 (input signals)
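
As a rough sketch, the 16 condition codes can be modelled as 8 signals times one negation bit. The bit layout below is an assumption made for this example, not the real encoding: bits 2..0 select the signal and a cleared bit 3 negates it, chosen so that code 0 reads as NEVER. The names `SELECTORS` and `condition_met` are hypothetical.

```python
SELECTORS = ("ALWAYS", "Z", "C", "S", "B0", "B1", "B2", "B3")


def condition_met(cond: int, flags: dict) -> bool:
    """Evaluate a 4-bit COND code against the current flags.

    Assumed layout (not from the YGREC8 sources): bits 2..0 pick one of
    the eight signals, bit 3 clear means "negated". With this choice
    code 0 is "not ALWAYS", i.e. NEVER.
    """
    if not 0 <= cond <= 15:
        raise ValueError("COND is a 4-bit field")
    name = SELECTORS[cond & 0b111]
    signal = True if name == "ALWAYS" else bool(flags.get(name, False))
    negate = (cond & 0b1000) == 0
    return signal != negate  # XOR: invert the signal when negated


flags = {"Z": True, "C": False, "S": False, "B0": True}
print(condition_met(0b0000, flags))  # NEVER  -> False
print(condition_met(0b1000, flags))  # ALWAYS -> True
print(condition_met(0b1001, flags))  # Z      -> True
print(condition_met(0b0010, flags))  # not C  -> True
```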

Instruction code 0000h should map to NOP, and the NEVER condition. (???)

Instruction code FFFFh should map to INV, which traps or reboots the CPU (through the overlay mechanism) : the condition is implicitly ALWAYS because it's an IMM8 format : CALL PC FFh (thus rebooting/alerting with some code placed there, if any; otherwise keep the instruction at FFh equal to INV to make an endless loop)

Overall, it's still orthogonal and very simple to decode, despite the added complexity of dealing with 1R1W code.


Logs:
1. Honey, I forgot the MOV
2. Small progress
3. Breakpoints !
4. The YGREC debug system
5. YGREC in VHDL, ALU redesign
6. ALU in VHDL, day 2
7. Programming the YGREC8  
8. And a shifter, and a register set...
9. I/O registers
10. Timer(s)
11. Structure update
12. Instruction cycle counter
13. First synthesis
14. Coloration syntaxique pour Nano (syntax highlighting for Nano)
15. Assembly language and syntax
16. Inspect and control the core
17. Basic assembly programming idioms
18. Constant tables in program space
19. Trap/Interrupt vector table
20. Automated upload of overlays into program memory
21. Making room for another instruction
22. Opcode map
23. Sequencing the core
.

ygrec8_20180116_yg.svg

Core diagram in SVG, added LDCx MUXes

svg+xml - 17.96 kB - 01/17/2018 at 17:38


svg+xml - 6.99 kB - 01/12/2018 at 18:57


YGREC8_VHDL.20171209.tgz

Added: license, readme, mustfail...

x-compressed-tar - 36.61 kB - 12/08/2017 at 23:21


ygrec8.nanorc

Syntax highlighting for the Nano text editor

nanorc - 1.16 kB - 12/08/2017 at 14:43


ygrec_debug.svg

How the YGREC8 is split and controlled for debug, development and test

svg+xml - 8.55 kB - 12/03/2017 at 16:26


  • Sequencing the core

    Yann Guidon / YGDES, 5 days ago, 0 comments

    2018 has seen a first significant change happen in the YGREC8 architecture, with the new instruction set map (see 22. Opcode map). This follows the discussions in the logs 18. Constant tables in program space, 20. Automated upload of overlays into program memory and 21. Making room for another instruction. The new core diagram shows the modifications with two added MUX at the bottom:

    The non-glorious control&decoding signals are not shown here. They are rather simple but the new LDCx instructions increase the complexity, and this is what this log is about.

    Here's a quote from a private conversation :

    Well, it IS a kludge.

    I wish I could come up with something better but I have examined other alternatives. The constraints are :
    * information density : we got 16 instruction bits and it'd be a shame to waste one half because we only got 256 instructions to address and so many switches or transistors...
    * minimal gate count : the mechanism should barely increase the number of gates/transistors, so it's necessary to time-multiplex the access because adding another read port is prohibitive
    * Ease of programming : it must be easy to use and code density should not be reduced (hence no access through the IO registers)

    It's not a problem if it takes 2 cycles because LDC is rarely time-critical and the core is already pretty fast. It's just annoying that I break the clean, smooth, lean single-cycle machinery. But at least it's not part of the initial design.

    .

    A previous log 18. Constant tables in program space also explains that reading the program memory requires temporal multiplexing because a 2nd read port (in the instruction memory) would be prohibitive. This implies that LDCx instructions must use 2 cycles:

    1. First cycle (green) brings the address from the SRC field (normally, a register, because an immediate would not make sense) to the program memory address bus. This is why the left-hand MUX is added. It is tied to the RESULT bus on the picture for convenience but the output of the registers MUX8 should be used instead. Conditions should be checked and if OK, then update of PC is inhibited, and instead, a new bit (LDCstate or something) is set.
    2. Second cycle (red) starts with the Instruction word MUXed to select the high or low byte, depending on the previous value of the R/I8 flag of the instruction. The value then goes through the datapath (and not directly to the RESULT bus to avoid adding another MUX in the critical datapath). The new MUX's latency is a bit lower than the MUX8's latency so no time is wasted. The RESULT value is written to the designated DST register.
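
The two cycles can be sketched as a toy Python model. It only illustrates the sequencing described above: the class `LdcModel` and its attribute names are hypothetical, conditions are ignored, and it assumes that PC advances normally at the end of the second cycle.

```python
class LdcModel:
    """Toy model of the two-cycle LDCL/LDCH sequence (illustrative only)."""

    def __init__(self, program):
        self.program = program        # 256 instruction words, 16 bits each
        self.regs = {"R1": 0, "R2": 0, "R3": 0}
        self.pc = 0
        self.addr = 0                 # program memory address bus
        self.ldc_state = False        # the new state bit
        self.latched_dst = None       # DST address saved in cycle 1
        self.latched_high = False     # R/I8 flag saved in cycle 1

    def cycle1(self, dst, src, high):
        """Cycle 1: SRC register drives the address bus, PC is held."""
        self.addr = self.regs[src] & 0xFF
        self.latched_dst = dst
        self.latched_high = high
        self.ldc_state = True         # PC is NOT updated in this cycle

    def cycle2(self):
        """Cycle 2: the fetched word is data; pick one byte, write DST."""
        if not self.ldc_state:
            raise RuntimeError("cycle2 without a pending LDC")
        word = self.program[self.addr]
        byte = (word >> 8) & 0xFF if self.latched_high else word & 0xFF
        self.regs[self.latched_dst] = byte
        self.ldc_state = False
        self.pc = (self.pc + 1) & 0xFF  # assumed: sequencing resumes here
        self.addr = self.pc


program = [0] * 256
program[0x80] = 0xBEEF                # one packed pair of constants
core = LdcModel(program)
core.regs["R1"] = 0x80
core.cycle1("R2", "R1", high=False)   # LDCL R2 R1
core.cycle2()
core.cycle1("R3", "R1", high=True)    # LDCH R3 R1
core.cycle2()
print(hex(core.regs["R2"]), hex(core.regs["R3"]), core.pc)
# 0xef 0xbe 2
```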

    .

    But it's more complicated than that...

    The first cycle is almost like others. But it must prepare the state of the 2nd cycle and save data from the instruction word because it will be wiped during the 2nd cycle. Note that this design applies to the FPGA version, so the SRAM address is latched at the end of the cycle and the output changes some ns after the start of the new cycle.

    What is not shown on the diagram is the necessary latches on the opcode and the DST address. Fortunately, the critical datapath goes to the register set and the 4 layers of MUX and one gate layer can be added on the DST write decoder.

    The normal, clean way to deal with that is to save the value of the DST address in a DFF on the first cycle, then MUX the DST and delayed DST to feed the register address decoder. But transistor-wise it's not very efficient. A transparent latch uses fewer transistors and has potentially the same gate delay as a MUX. The delicate part is to drive it properly, with the right timing...

    Concerning the opcode, there is nothing to "remember" from the first cycle. The opcode can simply be forced, using only a few logic gates, to emulate a MOV instruction.

    So here is a summary of the modifications to the code :

    • Create a new FF called LDCstate....

  • Opcode map

    Yann Guidon / YGDES, 01/12/2018 at 15:14, 0 comments

    As of 20180112:

    opcode value  R/I8  name  Imm8  cond  Reg SRC  flag  description
    0h                  OR                         SZ    DST = SRC | DST
    1h                  XOR                        SZ    DST = SRC ^ DST
    2h                  AND                        SZ    DST = SRC & DST
    3h                  ANDN                       SZ    DST = SRC & ~DST
    4h                  CMPU                       CSZ   CoMPare, Unsigned (SUB with no write)
    5h                  CMPS                       CSZ   CoMPare, Signed (SUB with no write, MSB tweaked)
    6h                  SUB                        CSZ   DST = SRC + (-DST)
    7h                  ADD                        CSZ   DST = SRC + DST
    8h                  SHR                        SZ    SHift Right
    9h                  SHL                        SZ    SHift Left
    Ah                  SAR                        SZ    Shift Arithmetic Right
    Bh                  ROL                        SZ    ROtate Left
    Ch            0     IN          NO    NO             Read INput port
    Ch            1     OUT         NO    NO             Write OUTput port
    Dh            0     LDCL  NO                   SZ?   Load Constant Low (read instr memory)
    Dh            1     LDCH  NO                   SZ?   Load Constant High (read instr memory)
    Eh                  MOV                        SZ    Copy register value
    Fh                  CALL                       SZ?   SRC=>PC, PC+1=>DST
    FFh                 OVL                        SZ?   Halts core, wait for loading of OVerLay. Special case of CALL PC (SRC=>PC)
    FFFFh         1     INV         NO             SZ?   INValid instruction (halt core). Special case of OVL FFh

  • Making room for another instruction

    Yann Guidon / YGDES, 01/12/2018 at 14:34, 0 comments

    The last log 20. Automated upload of overlays into program memory brings a new instruction: OVL (for "OVerLay") that puts the core in "upload" mode and waits for new instructions to be written in the instruction SRAM. OVL is a special case of the CALL instruction, the degenerate CALL PC which acts like a Jump.

    As a side effect, OVL FFh can also become the general version of INV, with the added bonus of halting the core and waiting for user intervention. For this to be viable, CALL must exchange its opcode value with MOV. Easy.

    In the case of INV=FFFFh as a side effect of OVL, the effect is : PC will be loaded with FFh, and the core will wait for one instruction followed by one CRC. Unless the CRC is good, the core will get stuck or simply reboot hard.

    But there is another issue, explained in 18. Constant tables in program space : we need a special instruction to read a byte from the instruction space. The instruction coding space is saturated so another trick is necessary.

    I have decided to shrink the IO addressing space to recover one bit. That bit will then select between IN or OUT. There will be "only" 256 registers but it's already "a lot".

    This makes room for the desired LDC opcodes: LDCH and LDCL each access 256 bytes. The difference is again with the R/Imm8 flag, which is not used because Imm8 is useless in this case (contrary to IN and OUT).

    LDCH/LDCL can now have the required two register operands, for the address and the destination. It looks like a MOV with 2 cycles instead of one. The SRC operand is routed to the Instruction Memory's address bus. The PC and the destination register's address need to be latched to preserve the program's state. Sequencing gets a bit trickier but it's a necessary kludge...

    I've updated the instruction format diagram :

    I must update the whole software now...

  • Automated upload of overlays into program memory

    Yann Guidon / YGDES, 01/12/2018 at 00:31, 0 comments

    I'm focusing here on the FPGA implementation of YGREC8. The instruction memory is made of SRAM blocks that need to be pre-loaded before the program starts. Some families provide this functionality but this is not often convenient (you have to mess with the bitstream files). Other technologies (such as ProASIC and ASIC) don't preload the SRAM. Furthermore, you want to be able to upload any program, at any time, with no fuss. This log explains how it's done for #YGREC8.

    The core can work in 2 states : after reset, it is in "upload mode", where it waits for 512 bytes to be streamed through a dedicated port. If the checksum is correct (ok that makes 513 bytes, sorry), the core goes to "run" mode, where it executes the program until the next reset or shutdown.

    The basic trick is to reuse the existing circuits. In particular, the PC circuit increments the 8 bits and the incrementer provides a "carry out" signal that shows an overflow happened. The PC is also directly connected to the instruction memory's address bus so we save a MUX and a counter or state machine.

    In detail, it works like this :

    • Upon reset, PC=0 and state is "upload".
    • The circuit waits for 16 bits of data presented to the instruction memory's write port, followed by a "latch" strobe. PC is incremented (while the instruction sent to the pipeline is NOP and the writeback condition is disabled)
    • If PC overflows, meaning PC=0 again, the last received word is compared to the CRC register. If it is correct, the mode goes to "run".
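
The three steps above can be sketched as a small Python state machine. Several things here are assumptions made for the example: the check value is a plain XOR of all the words (the real CRC is still TBD), the check arrives as one extra 16-bit word after the 256 instructions, and a failed check leads to a hypothetical "error" mode. The class name `Uploader` is also hypothetical.

```python
class Uploader:
    """Toy model of the upload mode; the XOR checksum is a placeholder."""

    def __init__(self, size=256):
        self.size = size
        self.mem = [0] * size     # instruction memory, 16-bit words
        self.pc = 0               # reused as the write address
        self.check = 0            # running check register (assumed XOR)
        self.wrapped = False      # set by the carry-out of the PC incrementer
        self.mode = "upload"

    def write_word(self, word):
        """Present one 16-bit word followed by the "latch" strobe."""
        if self.mode != "upload":
            raise RuntimeError("core is not in upload mode")
        word &= 0xFFFF
        if self.wrapped:
            # PC overflowed: this word is the check value, not an instruction.
            self.mode = "run" if word == self.check else "error"
            return
        self.mem[self.pc] = word
        self.check ^= word
        self.pc = (self.pc + 1) % self.size
        self.wrapped = self.pc == 0


program = [(i * 37) & 0xFFFF for i in range(256)]
check = 0
for w in program:
    check ^= w

up = Uploader()
for w in program:
    up.write_word(w)
up.write_word(check)
print(up.mode, up.pc)
# run 0
```

The model needs no address counter of its own, which mirrors the point of reusing the PC and its carry-out.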

    This system requires few added gates :

    • The 16-bit write port is connected "somewhere"
    • a few gates control the "state machine"
    • some inhibit bits here and there
    • some sort of CRC (TBD)

    I'm pretty happy with this system :-)

    The 16-bit words can be provided asynchronously, from a byte-wide port for example, or a serial port (SPI, RS232 or USB) with minimal handshake. The host computer must simply be able to control the /RESET and read the current status of the core (with some GPIO or ACK pin). This detail is independent from the system's principles.


    Preloading the data RAM blocks is a different story. The write bus and address bus are already tied to the core and adding a MUX would slow the whole circuit down. The written data must go through the normal datapath...

    There is also the issue of granularity, as incoming words are 16 bits wide but memory blocks are 8 bits wide. It is not possible to split a 16-bit word into two parallel 8-bit bytes because there is only one write port (usually). The whole design promotes economy and reuse and a dedicated circuit seems impractical. Reusing the PC like above creates more problems than it solves.

    There is one easy solution though : reuse the program upload system several times.

    The idea is to allow the program to trigger a "soft reset" (to restart the upload FSM) and indicate which program to load and execute. In other words : create some sort of overlays.

    In the simplest case, the whole program is sequentially split between several consecutive (temporal) overlays, the first one(s) contain all the data and initialise all the peripherals. When each overlay has done its work, it triggers a "soft reset" that requests 512 more bytes to upload and execute. This breaks the addressing problem in a flexible and cheap way.

    In a more elaborate case, instead of receiving consecutive overlays, the program can control which overlay to receive and execute. One interesting way is to dedicate one GPIO register to the current overlay number, which can be read on the outside of the core by the upload system. For example, it can latch the MSB of an address bus to a ROM or a SPI bus master.
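
As a minimal sketch of that addressing scheme, the overlay number can simply provide the upper address bits of an external ROM. The function name `overlay_rom_address` is hypothetical, and the example assumes that the 512-byte images are stored back to back, without their check bytes.

```python
OVERLAY_BYTES = 512  # one full instruction memory image


def overlay_rom_address(overlay_number: int, byte_offset: int) -> int:
    """Return the ROM address of one byte of a given overlay.

    Assumption: the 8-bit overlay number (from the GPIO register) is
    latched as the MSB part of the ROM address.
    """
    if not 0 <= overlay_number <= 0xFF:
        raise ValueError("overlay number is an 8-bit register value")
    if not 0 <= byte_offset < OVERLAY_BYTES:
        raise ValueError("offset must stay inside one overlay")
    return overlay_number * OVERLAY_BYTES + byte_offset


print(hex(overlay_rom_address(3, 0)))
# 0x600
```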

    This opens another avenue for tweaks because on a secondary upload, the PC might not be cleared. The upload will then only fill the remaining high addresses of the instruction memory, leaving the lower addresses unchanged. This means : partial overlays are possible ! Complex webs of overlays can then call each other at will, in a FSM...


  • Trap/Interrupt vector table

    Yann Guidon / YGDES, 01/04/2018 at 08:59, 0 comments

    Typically : the YGREC8 starts running at address 0. That's where programs are usually located.

    Another special address is FFh : that's the INV vector code, the invalid instruction. Typically, uninitialised Flash or PROM reads as all-1s, so running into uninitialised regions will return an opcode of FFFFh, which is MOV PC FFh. It's a jump to FFh where, normally, there is the same FFFFh value, which loops there endlessly.

    External IRQ signals will be provided (one day) and they require specific addresses. They could be stored in the IO space (as constants or not) and fetched whenever the request happens.

    But what is interesting here is the software traps. There is no such thing, actually, but implementing memory bounds for the stack for example, would be nice. However nothing distinguishes the SRAM regions and there is no such thing as a dedicated HW stack. So we have to check in SW. There is no bound check instruction but we have conditional jumps on Zero, Carry and Sign.

    Zero is pretty useful : if a stack grows down, we can easily check if there is an overflow. If the stack grows up, the same is true, it detects the wrap-around from FFh to 00h.

    Conveniently, the sign flag can work the same, if you want to restrict the stack to only one half of the memory bank. You can define the zone from 00h to 7Fh, or 80h to FFh, and detect overflow when the sign bit changes.

    No comparison needed, so a check is just :

    MOV PC 0 IFP ; when the last decrement went from 80h to 7Fh 

    or

    MOV PC 0 IFN ; when the last increment went from 7Fh to 80h

     The same principle applies for the IFC, IFNC, IFZ and IFNZ conditions.
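
The flag behaviour these checks rely on can be reproduced with a few lines of Python. This is only a sketch of 8-bit pointer arithmetic; the helper name `step_pointer` is hypothetical and the carry flag is left out.

```python
def step_pointer(value: int, delta: int):
    """Add delta to an 8-bit pointer and return (result, zero, sign)."""
    result = (value + delta) & 0xFF
    return result, result == 0, bool(result & 0x80)


print(step_pointer(0xFF, 1))   # wrap-around, Zero set:   (0, True, False)
print(step_pointer(0x7F, 1))   # crossing up, Sign set:   (128, False, True)
print(step_pointer(0x80, -1))  # crossing down, Sign clear: (127, False, False)
```

The first case is what an IFZ check catches, the second an IFN check and the third an IFP check.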

    OK, it's a crude boundary check that consumes only one instruction and one cycle without any special adaptation. It won't save any

    But where to jump ? Address 0 is the reset vector, so the program will restart without indication that something wrong happened. And going to -1 (FFh) is not very useful either.

    The conditional MOV PC allows a range from -4 to +3 so there are 6 other addresses to jump to, and they can contain a long jump to actual code.

    So you can have the following source code:

    .org 0
      MOV PC EntryPoint; // reset vector
      MOV PC ErrorRoutine1;
      MOV PC ErrorRoutine2;
      MOV PC ErrorRoutine3;
    
    EntryPoint:
      ; your application goes here
    
    .org FCh
      MOV PC ErrorRoutine4;
      MOV PC ErrorRoutine5;
      MOV PC ErrorRoutine6;
      MOV PC INV_routine; // Process the INV instruction

    As a consequence, the hardware traps and external IRQs should be mapped outside of this  -4..+3 range, for example starting at address 4.

    It looks a bit similar to the PIC16 architecture, except that there is no mirrored vector table...

    The following "idiom" shows a non-leaf function that tests for stack overflow :

    .org 0
      MOV PC EntryPoint; // Reset vector
      MOV PC StackOverflow;
    
    EntryPoint:
      CALL D1 Nested
      ...
    
    StackOverflow:
       ; send an error message
    
    
    Nested:
      ADD A1 1 ; push
      MOV PC 1 IFZ ; jump to vector 1 if detecting a wraparound from FFh to 00h
    
     ; do some actual work, like
      CALL D1 Nested2 ; another non-leaf function
    
      ADD A1 -1 ; pop
      MOV PC D1 ; return

  • Constant tables in program space

    Yann Guidon / YGDES, 01/03/2018 at 01:33, 2 comments

    YGREC8 separates the program and data spaces. Like the Microchip PIC16, it's a "Harvard" architecture which is safe, secure and convenient (PC does not have to deal with unused LSB for example). This creates the problem of transferring data (constants such as tables) from the program (fixed) space, to the data space for example.

    YGREC8 does not provide a way to access the program space, which is actually larger : 256 bytes of data RAM and 512 bytes of instructions (packed as 256 instructions). Using the IO space to read back the program space is not a good idea because the necessary circuits are too cumbersome. And the ability to probe the program space at will is not a good idea (from a security perspective).

    Enter the Microchip PIC16F again. Its architecture has really awkward aspects but it uses a wide range of crazy tricks. For example, constant data tables are implemented with series of RETURN-like instructions that load a constant in the accumulator before returning. It's awkward but cheap.

    Actually the only good way to access the program space is through PC. Doing otherwise messes with the program sequence/schedule or adds a read port so it's better to multiplex accesses in some way. And it's easy to control PC because we can easily jump there :

    MOV R1 TableBase
    ADD R1 R2 ; adds the index into the table
    CALL R1 R2 ; jump into the table and link in R2
    

    OK but then what ? 

    It is not possible to implement the PIC's retlw instruction as is, but a similar effect is possible within the YGREC8's datapath, by reusing tricks developed for the CALL instruction.

    CALL is like a MOV that swaps two write buses : the result goes to PC, and NPC goes to the registers.

    The new desired instruction (let's call it MOVJ) is also like a MOV but it also jumps : we want to write a value to a register (typically an Imm8 but a reg is also possible) while also writing a register to PC. There are the following problems :

    • The opcode space is already saturated, unfortunately... I don't know which opcode to axe.
    • Only one half of the coding space is used. Instructions are 16 bits long and we get only 8 bits.
    • It's a subtly different datapath and it requires a new MUX because it's not possible to make two MOV at the same time. The DST output from the register set needs a direct connexion to the PC MUX input.

    Supporting this MOVJ instruction adds little overhead (a bypass from DST to PC) but the other problems are more severe. If only we could use the whole width of the instructions and pack 2× more constants...


    So @roelh comes along and suggests a different approach, which I have tried to avoid for fear of introducing complexity :

    you might use the pipeline effect to call a single-instruction subroutine. This is done by Marcel: https://hackaday.io/project/20781/logs . A flag (set by the 'calling' instruction) is needed that tells that the single instruction is a value and not an instruction.

    In my https://hackaday.io/project/11012-risc-relay-cpu I have a LDC (load constant) instruction, that loads the pc with an address and fetches the table value from program space. It takes advantage of the fact that the PC has a master and a slave section. The trick is that the master can contain the constant location while the slave still contains the original (incremented) PC, so it is very easy to restore the PC after the constant has been loaded.

    Using such an instruction solves the density problem because it can potentially read 16 bits at once and prevent waste.

    However, the YGREC8 is not pipelined, it's a single-cycle processor. It might be implemented with several phases but accessing the program memory and the data memory each use one cycle (simultaneously). The NPC value (which is the current PC plus one) is not a register but a combinatorial output from the incrementer.

    There are two ways to...


  • Basic assembly programming idioms

    Yann Guidon / YGDES, 01/02/2018 at 02:38, 0 comments

    After the previous logs 7. Programming the YGREC8 and 15. Assembly language and syntax, this log shows a few basic programming idioms and explains how to use the YGREC8 architecture.

    Conditional execution

    Most instructions can be predicated (be executed if a certain condition is met) except IN and OUT (they really stand out, I know). A condition can be given if the SRC operand is a register or a short immediate (imm3), giving a range of -4 to +3. Due to this very narrow range, the ADD opcode will add 1 to imm3 if it ranges from 0 to 3. This extends the range from -4 to +4, thus skipping the value 0 because ADD R1 0 does nothing.

    There are three main cases :

    1. The instruction to conditionally execute is only one opcode and uses only registers, or an imm3 value : the condition is appended to the instruction.
      ADD R1 1 IFC ; increments R1 if carry bit set
      That's the best, and simplest case.
    2. There is a sequence of two or three instructions to conditionally execute : a conditional short jump will do the trick
      ADD PC 4 IFC ; skip 3 instructions if carry bit set
        MOV R1 R2
        MOV R2 R3  ; exchange R2 and R3
        MOV R3 R1
      ; resume here : 
    3. When 4 or more instructions must be jumped over, a long jump is required but this type of instruction can't have a condition. There are two methods :
      - Preload the target address in a register, then conditionally MOV it to PC
      MOV R1 42h
      MOV PC R1 IFC 
      
      - Invert the condition to perform a short jump, over a long jump to the target
      ADD PC 1 IFNC
      MOV PC 42h

      The first method requires an additional register so the second is often preferred.

    All these methods apply for CALL as well as loops. In particular, the first method of 3 might be used to hold the starting address of a loop and save one instruction cycle.

    Call and return

    The instruction CALL acts like a MOV but the core swaps NPC and RESULT so the result value is written to PC and the NPC is written to the selected register.
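
The swap between the two write buses can be sketched in Python. This is a simplified model of the writeback only: the function name `execute` is hypothetical, and it treats any write to PC (MOV PC or CALL PC) as a plain jump.

```python
def execute(opcode, dst, value, regs, pc):
    """Return (new_regs, new_pc) for a MOV or CALL with result `value`."""
    npc = (pc + 1) & 0xFF          # combinatorial PC+1
    value &= 0xFF
    regs = dict(regs)              # do not modify the caller's registers
    if dst == "PC":
        return regs, value         # MOV PC x and CALL PC x both jump
    if opcode == "MOV":
        regs[dst] = value          # RESULT -> register, NPC -> PC
        return regs, npc
    if opcode == "CALL":
        regs[dst] = npc            # NPC -> register (the link)
        return regs, value         # RESULT -> PC
    raise ValueError(f"unsupported opcode {opcode}")


regs, pc = execute("CALL", "R1", 0x40, {"R1": 0}, pc=0x10)
print(hex(regs["R1"]), hex(pc))    # 0x11 0x40
regs, pc = execute("MOV", "PC", regs["R1"], regs, pc)
print(hex(pc))                     # 0x11 : back after the CALL
```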

    There is no return instruction, this is simply a MOV to PC.

    There is no dedicated stack or "link register". Conventions can be drafted but 7 registers are equally possible and they can be optimised easily. CALL to PC however makes no sense and amounts to a MOV (or jump) {this could be used as a special function code or something later}.

    • Terminal calls (that don't call anything else) can store the caller's address in any register, be it R or A (D has more side effects).
      CALL R1 MyFunc
      
      MyFunc:
         ; do something here
         ; like, well, actual work ?
         MOV PC R1 ; return

      In the above example, R1 is the "link register" and this leaves R2, R3, A1 and A2 for passing arguments.

    • Nested calls must save NPC on the stack. There are two memory spaces and registers : A1/D1 and A2/D2 so it's a matter of internal convention. In this case, CALL goes to D1 or D2, and the callee must increment and decrement the stack pointer :
      CALL D1 Nested
      
      ...
      
      Nested:
        ADD A1 1 ; push
      
       ; do some actual work, like
      
        CALL D1 Nested2 ; another non-leaf function
      
        ADD A1 -1 ; pop
        MOV PC D1 ; return

      Of course, if the function is a leaf, there is no need to increment or decrement the stack pointer.

    In all cases, it's important to be sure of the calling convention for each function.

    Access to memory is pretty limited (with only 2 ports) so the stack pointer must often be saved and restored if one pointer is not enough to access the required data...

    Example

    The following listing uses both looping and function call/return to compute the sum of all the integers up to R1 :

    ; max register value is 255, so the sum overflows when R1 exceeds about sqrt(2×256) ≈ 22
    MOV R1 22
    CALL R2 SUM
    ; display result here,
    ; expect (22×23)/2 = 253
    INV
    
    SUM:
      MOV R3 0 ; clear the accumulator
    
    SUM_LOOP:
      ADD R3 R1 ; accumulate
      ADD R1 -1 ; decrement the counter
      ADD PC -2 IFNC ; loop back to SUM_LOOP if carry not set
         ; (could have been IFNS if R1 argument
         ;   is guaranteed > 0, saving one iteration)
    
     ;...
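
The arithmetic behind this example can be checked on the host with a short Python sketch. It only reproduces the 8-bit accumulation, not the carry flag or the exact loop exit of the listing; the function name `sum_up_to` is hypothetical.

```python
def sum_up_to(n: int) -> int:
    """Sum n + (n-1) + ... + 1 in an 8-bit accumulator (wraps modulo 256)."""
    acc = 0
    while n > 0:
        acc = (acc + n) & 0xFF
        n -= 1
    return acc


print(sum_up_to(22))  # 253, the largest sum that still fits in 8 bits
print(sum_up_to(23))  # 20, because 276 wraps around
```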

  • Inspect and control the core

    Yann Guidon / YGDES, 01/01/2018 at 05:31, 0 comments

    The YGREC8 core is a rather simple collection of sequential and logic gates, that are usually controlled by a stream of instructions coming from the instruction memory.

    The core can also be controlled by an outside debugging device through a "test access port" (TAP) that provides a minimal but essential interface to access the state of all the internal resources (memories, registers, IOs...) by reading or writing a sequence of user-invisible registers.

    The complexity and size of the TAP and the internal resources must be kept as small as possible to keep the design fast and compact. The TAP can be removed when debugging or inspection are not required, to save room and increase speed.

    The basic features are :

    • Read the SRC, DST, RESULT, FLAGS, INSTRUCTION, PC and NPC buses, which provide enough information about the core's state (as well as all the user-visible peripherals)
    • Write (if needed) the INSTRUCTION and CONTROL registers

    The CONTROL register provides several bits:

    • RESET : clears all the resettable registers when 0
    • START/STOP : lets the core run loose (when 1), or prevents it from running (when 0)
    • STEP : runs one clock cycle then stops (when 1, monostable)
    • UPDATE : same as STEP but does not increment the PC
    • BYPASS : select the source of instruction, either from program memory (when 0) or the TAP (when 1)
    • (more bits are reserved for later, for padding and write-as-0)
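
For illustration, the CONTROL bits could be described on the host side with a Python `IntFlag`. The bit positions below are pure assumptions (the log does not give them), as are the names `Control` and `single_step_word`.

```python
from enum import IntFlag


class Control(IntFlag):
    """CONTROL register bits; positions are assumed for this sketch."""
    RESET = 1 << 0   # active low: writing 0 clears the resettable registers
    RUN = 1 << 1     # START/STOP: 1 lets the core run, 0 holds it
    STEP = 1 << 2    # run one clock cycle, then stop
    UPDATE = 1 << 3  # like STEP, without incrementing PC
    BYPASS = 1 << 4  # 1: instruction comes from the TAP, 0: program memory


def single_step_word() -> int:
    """CONTROL value to execute one TAP-supplied instruction."""
    # RESET is kept at 1 (inactive), RUN stays at 0 so the core is held.
    return int(Control.RESET | Control.STEP | Control.BYPASS)


print(f"{single_step_word():#06x}")
# 0x0015
```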

    The INSTRUCTION (INST) register does more than allow execution of arbitrary instructions, it's a way to control the state of the datapath : the DST, SRC and RESULT buses are combinatorially and directly affected by the SRC and DST fields of the instruction, and visible immediately.

    More complex interactions and control come from sequences of values written to the CONTROL and INSTRUCTION registers, by allowing results of fake instructions to be registered or not.

    More registers will be available for read/write access to provide breakpoints.


    The debug system has one logic view (read and write registers) that can be implemented in any suitable way with circuits. The physical interface to the registers could be synchronous, asynchronous, parallel, multiplexed or serial... But the user can access these registers (as described above) :

    • Write :
      - CONTROL / CTRL : 16 bits (see the detail of the bits above)
      - INSTRUCTION / INST : 16 bits (holds an instruction to execute, but may also contain data to write to the breakpoint registers)
      .
      Total : 16+16=32 bits
      .
    • Read :
      - INST (16 bits) : the instruction coming from the program memory
      - DST (8 bits) the DST field, coming from the register set
      - SRC (8 bits) the SRC field (either a register value or an immediate value from INST)
      - RESULT/RES (8 bits) : the result of the current instruction, before writeback to the register set. Can contain data from INput ports.
      - PC (8 bits) : the address of the current instruction
      - NPC (8 bits) : the address of the next instruction (before being written to PC, useful for debugging and tracing CALLs)
      - FLAGS (8 bits) : the Carry, Sign and Zero flags, as well as the 4 external condition flags
      - Core state (16 bits) : reserved for later, one bit is used for the LDCstate bit.
      .
      Total : 16+8+8+8+8+8+8+16=80 bits
      .
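
The register list above translates directly into a host-side description. The Python sketch below checks the totals and splits an 80-bit read frame into its fields; the frame ordering (first listed register in the most significant bits) and the names `READ_REGS`, `WRITE_REGS` and `unpack` are assumptions, since the physical interface is left open.

```python
WRITE_REGS = {"CTRL": 16, "INST": 16}
READ_REGS = {"INST": 16, "DST": 8, "SRC": 8, "RES": 8,
             "PC": 8, "NPC": 8, "FLAGS": 8, "STATE": 16}


def unpack(frame: int, layout: dict) -> dict:
    """Split an integer frame into named fields, first field in the top bits."""
    total = sum(layout.values())
    if frame < 0 or frame >> total:
        raise ValueError(f"frame does not fit in {total} bits")
    fields = {}
    shift = total
    for name, width in layout.items():
        shift -= width
        fields[name] = (frame >> shift) & ((1 << width) - 1)
    return fields


print(sum(WRITE_REGS.values()), sum(READ_REGS.values()))  # 32 80
state = unpack(0x1234_56_78_9A_BC_DE_07_0001, READ_REGS)
print(hex(state["INST"]), hex(state["PC"]), state["STATE"])
# 0x1234 0xbc 1
```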

    The timing, as well as the order of writing and reading information, are critical. For example, reading the registers while the core is running makes no sense because data will be garbled.

  • Assembly language and syntax

    Yann Guidon / YGDES, 01/01/2018 at 04:30, 0 comments

    So far, there is a short introduction to the YGREC8 assembly language in 7. Programming the YGREC8, which focuses on the syntax and structure of the instructions. The valid combinations are :

    • {IN/OUT} Reg, Imm9
    • OPC Reg Imm8
    • OPC Reg Reg
    • OPC Reg Reg Cond
    • OPC Reg Imm3
    • OPC Reg Imm3 Cond
    • {NOP}

    But a program in assembly language needs more than instructions. Classic asm listings have

    • Directives : these are meta-instructions, they direct the assembler and provide information or set options. Their name starts with a dot, at the first character of the line.
      For example : .ORG 42h sets the current location of assembly to address 42h.
    • Pseudo-instructions : these actually assemble into exacutable code (or data). For example NOP is a shortcut, not a real opcode. Another case is DB and DW that  directly translate into actual numbers.
    • Symbols and symbolic names : the string is substituted for the actual numeral value. The string must NOT be a reserved word or number.
    • Labels : it's a word that starts at the beginning of a line and ends with ":" and creates a new symbol.
      When the label is alone on the line, this new symbol is equal to the current assembly address.
      When a value or string follows the label declaration, the created symbol receives the optional value.
      Yes, this is a lazy way to define symbols ;-)

    For example :

    ; file example.y8
    ; example source code
    
    .org 42 ; sets the current assembly address to 42
    mylabel:   ; creates the symbol called "mylabel"
                      ; and gives it the value of the current position, that is: 42
    somevalue: 23h ; creates the symbol somevalue and gives it the value 23h (=35)
    DB somevalue ; injects the value 23h into the current assembly stream
    
    MOV PC, mylabel   ; jumps to the address mylabel

    I'm being lazy with the labels, with dual use as symbol declaration, because it saves some code.
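
    The dual use of labels can be sketched as the first pass of an assembler. The Python code below is only an illustration, not the actual assembler (which is planned in VHDL). The function names parse_number and collect_symbols are hypothetical, and it assumes that every instruction or DB/DW statement advances the assembly address by exactly one unit.

```python
def parse_number(text):
    """Parse '42' (decimal) or '23h' (hexadecimal with an 'h' suffix)."""
    text = text.strip().lower()
    if text.endswith("h"):
        return int(text[:-1], 16)
    return int(text, 10)


def collect_symbols(source):
    """First pass: return {symbol: value} for an assembly listing.

    Assumption: each non-label, non-directive statement occupies
    exactly one address unit.
    """
    symbols = {}
    address = 0
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()    # drop the comment
        if not line:
            continue
        if line.startswith("."):               # directive
            name, _, arg = line.partition(" ")
            if name.lower() == ".org":
                address = parse_number(arg)
            continue
        head, sep, rest = line.partition(":")
        if sep and " " not in head:            # label declaration
            rest = rest.strip()
            # a label alone takes the current address,
            # a label followed by a value takes that value
            symbols[head] = parse_number(rest) if rest else address
            continue
        address += 1                           # instruction, DB or DW
    return symbols


EXAMPLE = """\
.org 42 ; sets the current assembly address to 42
mylabel:
somevalue: 23h
DB somevalue
MOV PC, mylabel
"""

print(collect_symbols(EXAMPLE))
```

    On the example listing, mylabel receives the current address (42) and somevalue receives 23h (35), without any dedicated directive for constants.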

    The YGREC8 also provides a new kind of instruction that is used only in debug/trace/development mode through the TAP (Test Access Port). These "para-instructions" are not directives but get encoded alongside the normal instructions. They drive key internal signals to control start/step/stop, read or set debug registers, report the values of the buses...

    The resulting code sequence is then interpreted by software that sequences the various simple signals and events to provide sophisticated debugging and internal testing functions. This mode is enabled only when the proper directive is invoked at the start of the code listing.

    For example the following sequence will dump the values of all the registers :

    ; file dumpReg.y8p
    ; /!\ syntax is subject to change !
    .PARA
    
    STOP  ; in case the core was not stopped before
    ASM AND A1 D1
    REPORT "A1=" DST " D1=" SRC
    ASM AND A2 D2
    REPORT "A2=" DST " D2=" SRC
    ASM AND R1 R2
    REPORT "R1=" DST " R2=" SRC
    ASM AND R3 PC
    REPORT "R3=" DST " PC=" SRC
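
    Since the sequence above repeats the same two lines for each pair of registers, it can be generated. This is a small Python sketch under the assumption that the para-instruction syntax stays as shown (it is subject to change). The names REGISTER_PAIRS and dump_registers_listing are hypothetical.

```python
# Register pairs in the order used by the listing above: (DST, SRC)
REGISTER_PAIRS = [("A1", "D1"), ("A2", "D2"), ("R1", "R2"), ("R3", "PC")]


def dump_registers_listing(pairs=REGISTER_PAIRS):
    """Return the text of a para-assembly listing that reports two registers per step."""
    lines = [".PARA", "", "STOP  ; in case the core was not stopped before"]
    for dst, src in pairs:
        # load an instruction whose operands are the two registers to observe
        lines.append(f"ASM AND {dst} {src}")
        # report the DST and SRC values seen by the debug port
        lines.append(f'REPORT "{dst}=" DST " {src}=" SRC')
    return "\n".join(lines)


print(dump_registers_listing())
```

    Each step exposes two registers at once through the DST and SRC debug registers, so the 8 registers need only 4 instructions.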

    Such simple sequences can be created to examine and change the contents of :
     - the instruction memory
     - the data memory
     - the registers
     - the I/O registers
     - the debug/trap/breakpoint registers
     - ...

    To reduce complexity, the same assembler routine handles both normal assembly and para-assembly, whether at an interactive prompt (dynamic use) or in batch mode (to process an assembly file).

    Writing such a powerful assembler in VHDL is going to be quite a challenge so please be patient ;-)

    And happy new year !

  • Syntax highlighting for Nano

    llo, 12/08/2017 at 14:42

    Today, some preparatory work ahead of intensive use of the YGREC8 and YGREC16 assemblers : I created the .y8 syntax highlighting for Nano, a simple but complete text editor that is very handy for coding.

    Feel free to reuse this file by adding it to the list of already supported syntaxes (HTML, JS, ASM, etc.), probably in /usr/share/nano/ :)

    ## syntax highlighting for YGREC8
    
    syntax "assembly YGREC" "\.y8$"
    
    # opcodes in cyan
    icolor cyan "CMPU|CMPS|SUB|ADD|SHR|SHL|SAR|ROL|IN|OUT|CALL|MOV"
    icolor brightcyan "OR|XOR|AND|ANDN"
    
    # numbers in blue
    icolor blue "[0-9]"
    
    # registers in green
    icolor green "[A][1-2]|[D][1-2]|[R][1-3]|PC"
    
    # logic (conditions) in magenta
    icolor magenta "ALWS|NEVR|IF|IF([0-3]|C|P|Z|N|N[0-3]|N(C|Z))"
    
    # labels in bright blue
    icolor brightblue "^[0-9A-Z_]+:"
    
    # comments in gray
    color brightblack ";.*"
    
    ## Valid colors: white, black, red, blue, green, yellow, magenta, cyan.
    ## For foreground colors, you may use the prefix "bright" to get a
    ## stronger highlight.

Discussions

Bartosz wrote 11/08/2017 at 16:40

Will this work on Epiphany or oHm or another cheap machine?

Yann Guidon / YGDES wrote 11/08/2017 at 18:07

I'm preparing a version that would hopefully use less than half of an A3P060 FPGA, which is already the smallest of that family that can reasonably implement a microcontroller.

But it's a lot less fun than making one with hundreds of SPDT relays !

Bartosz wrote 11/14/2017 at 14:13

The question is the price and the possibility to buy it.

Yann Guidon / YGDES wrote 11/14/2017 at 16:08

@Bartosz : what do you want to buy ?

If you can simulate and/or synthesise VHDL, the source code is being developed and available for free, though I can't support all FPGA vendors.

If you want a ready-made FPGA board, that could be made too.

If you want relays, it's a bit more tricky ;-)

I have just enough RES15 to make my project and it might take a long while to succeed. There will be many PCBs and other stuff.

However if, in the end, I see strong interest from potential buyers, I might make a cost-reduced version with easily-found minirelays. I don't remember well but the Chinese models I found cost around $0.50 apiece. Factor in PCB and other costs and you get a very rough price estimate... It's not cheap, it's not power efficient, it's slow and won't compute useful stuff... But it certainly can make a crazy nice interactive display, when coupled with flip dots :-D

So the answer is : "it depends" :-D
