The TAP's eXecute module

A project log for YGREC8

A byte-wide stripped-down version of the YGREC16 architecture

Yann Guidon / YGDES 08/01/2020 at 12:43

The previous modules are quite simple and self-contained, while the X command (described earlier) subtly touches more things at once.

Talking to the Instruction slice is not very hard, but it requires some decoding first, and some of it would be best shared with the Selector. The "addresses" 'S' and 'X' are very close, and sharing their decoding would save some gates.

I think it's the perfect time to talk about how I mapped the S-decoder to gates :-)

It started easy enough for the 'S' condition :

valid <= '1' when  SRi( 7 downto 0)="01010011" -- signature
        and SRi(15 downto 11)="00110"   -- command : MSB select ASCII chars '0'-'7'
        and SAT='0' and W='0' and J1='1' and J0='0'
    else '0';

The next simple step is to sort the '0's and the '1's into two separate equations: the '1's go into an AND and the '0's are gathered by a big NOR :

norx <= not (SAT or W or J0 or SRi(15) or SRi(14) or SRi(11)
             or SRi(7)  or SRi(5)  or SRi(3)  or SRi(2));
valid <=  SRi(13) and SRi(12) and SRi(6) and SRi(4)
              and SRi(1) and SRi(0) and J1 and norx;
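The rewrite above is easy to get wrong by one bit, so here is a minimal self-checking testbench sketch that compares both forms over every input combination. The entity name `tb_valid_equiv` and the variable names `v` and `expected` are hypothetical, and the inputs are modelled as plain variables rather than the real TAP signals.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_valid_equiv is            -- hypothetical testbench name
end entity;

architecture sim of tb_valid_equiv is
begin
  process
    variable v               : std_logic_vector(19 downto 0);
    variable SRi             : std_logic_vector(15 downto 0);
    variable SAT, W, J1, J0  : std_logic;
    variable expected        : std_logic;  -- behavioural form
    variable norx, valid     : std_logic;  -- AND/NOR form
  begin
    -- 16 shift register bits + SAT, W, J1, J0 = 20 bits to enumerate
    for i in 0 to 2**20 - 1 loop
      v   := std_logic_vector(to_unsigned(i, 20));
      SRi := v(15 downto 0);
      SAT := v(16);  W := v(17);  J1 := v(18);  J0 := v(19);

      -- behavioural 'S' condition, as in the first equation
      if SRi(7 downto 0) = "01010011" and SRi(15 downto 11) = "00110"
         and SAT = '0' and W = '0' and J1 = '1' and J0 = '0' then
        expected := '1';
      else
        expected := '0';
      end if;

      -- sorted form: the '0' bits in one NOR, the '1' bits in one AND
      norx  := not (SAT or W or J0 or SRi(15) or SRi(14) or SRi(11)
                    or SRi(7) or SRi(5) or SRi(3) or SRi(2));
      valid := SRi(13) and SRi(12) and SRi(6) and SRi(4)
               and SRi(1) and SRi(0) and J1 and norx;

      assert valid = expected
        report "mismatch at input " & integer'image(i)
        severity failure;
    end loop;
    report "both forms agree on all 2**20 input combinations";
    wait;
  end process;
end architecture;
```

SRi(10 downto 8) are not decoded, so they are swept here only to keep the loop simple.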

Then it's easy to group the ORs and ANDs together into 3-input gates. And when an AND gate has spare inputs, they can take the result of the NORs :-)

Finally, bubble-pushing can transform two consecutive levels of ANDs into NANDs followed by a NOR.
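As an illustration of the bubble-pushing step, this small sketch checks that three AND3 gates feeding an AND3 compute the same function as three NAND3 gates feeding a NOR3. It is a generic De Morgan check on nine hypothetical inputs `x(0)` to `x(8)`, not the actual gates of the decoder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_bubble is                 -- hypothetical testbench name
end entity;

architecture sim of tb_bubble is
begin
  process
    variable x          : std_logic_vector(8 downto 0);
    variable n1, n2, n3 : std_logic;
    variable and_and    : std_logic;
    variable nand_nor   : std_logic;
  begin
    for i in 0 to 2**9 - 1 loop
      x := std_logic_vector(to_unsigned(i, 9));

      -- two consecutive levels of AND3
      and_and := (x(0) and x(1) and x(2))
             and (x(3) and x(4) and x(5))
             and (x(6) and x(7) and x(8));

      -- after bubble-pushing: a level of NAND3, then one NOR3
      n1 := not (x(0) and x(1) and x(2));
      n2 := not (x(3) and x(4) and x(5));
      n3 := not (x(6) and x(7) and x(8));
      nand_nor := not (n1 or n2 or n3);

      assert and_and = nand_nor
        report "mismatch at input " & integer'image(i)
        severity failure;
    end loop;
    report "AND-AND and NAND-NOR agree on all 512 inputs";
    wait;
  end process;
end architecture;
```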

So let's do this all over again but this time the X condition is also decoded so some gates are common.

S <= '1' when SRi(7 downto 0)="01010011" and SRi(15 downto 11)="00110"
                    and SAT='0' and W='0' and J1='1' and J0='0'
   else '0';
X <= '1' when SRi(7 downto 0)="01011000"
                    and SAT='0' and W='0' and J2='1' and J1='0'
   else '0';

The common terms are

COM <= '1' when SRi(7 downto 4)="0101" and SRi(2)='0' and SAT='0' and W='0'
  else '0';

and S and X can be taken separately :

S <= '1' when SRi(3)='0' and SRi(1 downto 0)="11" and SRi(15 downto 11)="00110"
                    and J1='1' and J0='0'
   else '0';
X <= '1' when SRi(3)='1' and SRi(1 downto 0)="00" and J2='1' and J1='0'
   else '0';

Now these 3 can be checked in parallel. Let's separate their bits according to their value :

X <= SRi(3) and J2 and
       not ( SRi(1) or SRi(0) or J1); -- nice fit for this one !
S <= SRi(1) and SRi(0) and SRi(13) and SRi(12) and J1 and
       not (SRi(3) or J0 or SRi(15) or SRi(14) or SRi(11));
COM <= SRi(6) and SRi(4) and
       not (SAT or W or SRi(2) or SRi(7) or SRi(5));
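To convince oneself that the factoring loses nothing, here is a testbench sketch that checks `S and COM` and `X and COM` against the unfactored S and X conditions for every input. The names `tb_sx_factor`, `S_ref`, `X_ref`, `S_p` and `X_p` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_sx_factor is              -- hypothetical testbench name
end entity;

architecture sim of tb_sx_factor is
begin
  process
    variable v                  : std_logic_vector(20 downto 0);
    variable SRi                : std_logic_vector(15 downto 0);
    variable SAT, W, J2, J1, J0 : std_logic;
    variable S_ref, X_ref       : std_logic;  -- unfactored conditions
    variable S_p, X_p, COM      : std_logic;  -- factored terms
  begin
    -- 16 shift register bits + SAT, W, J2, J1, J0 = 21 bits
    for i in 0 to 2**21 - 1 loop
      v   := std_logic_vector(to_unsigned(i, 21));
      SRi := v(15 downto 0);
      SAT := v(16);  W := v(17);
      J2  := v(18);  J1 := v(19);  J0 := v(20);

      S_ref := '0';
      if SRi(7 downto 0) = "01010011" and SRi(15 downto 11) = "00110"
         and SAT = '0' and W = '0' and J1 = '1' and J0 = '0' then
        S_ref := '1';
      end if;

      X_ref := '0';
      if SRi(7 downto 0) = "01011000"
         and SAT = '0' and W = '0' and J2 = '1' and J1 = '0' then
        X_ref := '1';
      end if;

      -- the three partial terms, bits sorted by value
      X_p := SRi(3) and J2 and
             not (SRi(1) or SRi(0) or J1);
      S_p := SRi(1) and SRi(0) and SRi(13) and SRi(12) and J1 and
             not (SRi(3) or J0 or SRi(15) or SRi(14) or SRi(11));
      COM := SRi(6) and SRi(4) and
             not (SAT or W or SRi(2) or SRi(7) or SRi(5));

      assert (S_p and COM) = S_ref and (X_p and COM) = X_ref
        report "mismatch at input " & integer'image(i)
        severity failure;
    end loop;
    report "factored S and X match the full conditions";
    wait;
  end process;
end architecture;
```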

From there the gates are easy to cluster and bubble-push.

The result is 11 gates. The speed is not striking, but 4 or 5 gates of latency shouldn't be limiting for this slow circuit, and it is only 2 more gates than the previous circuit.

   sa: entity  OR3 port map(A=>SRi(15), B=>SRi(14), C=>SRi(11), Y=>tSo  );
   sb: entity NOR3 port map(A=>J0     , B=>SRi( 3), C=>tSo    , Y=>tSn  );
   sc: entity AND3 port map(A=>SRi(13), B=>SRi(12), C=>tSn    , Y=>S2   );
   sd: entity AND3 port map(A=>SRi( 1), B=>SRi( 0), C=>J1     , Y=>S1   );

   c1: entity  OR3 port map(A=>SRi(2) , B=>SRi( 7), C=>SRi( 5), Y=>Co1  );
   c2: entity NOR3 port map(A=>SAT    , B=>W      , C=>Co1    , Y=>Co2  );
   co: entity AND3 port map(A=>SRi(6) , B=>SRi( 4), C=>Co2    , Y=>COM  );

   x1: entity NOR3 port map(A=>SRi(1) , B=>SRi( 0), C=>J1     , Y=>tXo  );
   x2: entity AND3 port map(A=>tXo    , B=>SRi( 3), C=>J2     , Y=>tX   );

  vx:  entity AND3 port map(A=>tX     , B=>COM    , C=>FB     , Y=>X    );
  vld: entity AND3 port map(A=>S1     , B=>COM    , C=>S2     , Y=>valid);

(one thing I dislike about VHDL is the requirement to label ALL the instantiated entities, it really gets nasty fast).

OK !

But the slice requires more than these signals and the FSM is an even tougher beast... Let's just focus on the control of the slice :

But this is getting hard because these signals cross clock domain boundaries.

On top of that, I can't use the other timing tricks because the length of the binary stream can't be guaranteed (it's always shifted, with no gating). I can't use a latch either, because /WR must be the clock: a transparent latch would output a strobe/signal before the end of the message, which could be longer, so the output could be an invalid/spurious signal...

For most of the signals, I have chosen to use a DFF. Some are simple (they output the result of the decoding logic). Others (the FSM strobes) are "set" by the decoding logic, then the FSM itself sends an ACK/Clear signal that asynchronously resets the signal. The DFF's output is then resynchronised by another DFF inside the FSM.
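A rough sketch of one such strobe could look like the following. Everything here is an assumption for illustration, not the actual YGREC8 code: the entity and port names (`strobe_xing`, `WRn`, `decoded`, `ack`, `clk`, `strobe`) are hypothetical, /WR is assumed to clock on its rising edge, and the ACK/Clear is assumed active high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity strobe_xing is               -- hypothetical name
  port (
    WRn     : in  std_logic;  -- TAP-side clock, the /WR pin (edge assumed)
    decoded : in  std_logic;  -- result of the decoding logic, e.g. X
    ack     : in  std_logic;  -- ACK/Clear from the FSM (assumed active high)
    clk     : in  std_logic;  -- FSM clock (hypothetical name)
    strobe  : out std_logic   -- strobe as seen inside the FSM clock domain
  );
end entity;

architecture rtl of strobe_xing is
  signal req  : std_logic;    -- DFF in the /WR domain
  signal sync : std_logic;    -- resynchronising DFF in the FSM domain
begin
  -- "Set" by the decoder on the /WR edge, cleared asynchronously by the FSM.
  -- No power-up value is given: req is unknown until the first clear.
  tap_side : process (WRn, ack)
  begin
    if ack = '1' then
      req <= '0';
    elsif rising_edge(WRn) then
      if decoded = '1' then
        req <= '1';
      end if;
    end if;
  end process;

  -- Second DFF: brings the request into the FSM clock domain.
  fsm_side : process (clk)
  begin
    if rising_edge(clk) then
      sync <= req;
    end if;
  end process;

  strobe <= sync;
end architecture;
```

The "simple" signals would drop the asynchronous clear and just register `decoded` on the /WR edge.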

And then, you need to reset these DFFs because they are in an unknown state on power-up : one of the Selector addresses could be used for this.

But two other signals create really big timing problems.

Continued in "The TAP crosses 3 clock domains !" with some diagrams...