Random Available Memories

A project log for Discrete YASEP

a 16-bits YASEP computer (mostly) made of DIP/SOIC chips like in the 70s and 80s... with 2010's twists!

Yann Guidon / YGDES, 10/27/2015 at 23:37 (7 comments)

So I found that old stash of pulled SRAM chips.

Each of these chips is 256K bits / 32K bytes at about 15-20ns, way faster than needed, but it's handy for the prototypes.

The YASEP16 is a 16-bit computer, so pointers can address 64K bytes, or rather 32K words of 16 bits (the LSB is handled in a specific way but that's explained elsewhere). Each memory bank needs two 32K×8 chips to store 16-bit words, but since I chose to implement dual-port memories, I "shadow"/clone the data, so 4 chips are required per bank.

3 memory spaces exist so overall, with this approach, 12 chips are required.
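
The chip count can be checked with a few lines of arithmetic. This is only a sketch of the reasoning above, and every constant name is made up for the illustration (none comes from the YASEP sources):

```python
# Back-of-the-envelope chip count for the layout described in this log.
# All names are illustrative assumptions, not YASEP identifiers.

CHIP_BITS = 256 * 1024        # one SRAM chip holds 256 Kbit
CHIP_WIDTH = 8                # ...organised as 32K x 8
WORD_WIDTH = 16               # YASEP16 data word
WORDS_PER_BANK = 32 * 1024    # 64 KB of pointer range = 32K words
READ_PORTS = 2                # one clone of the data per read port
MEMORY_SPACES = 3

chip_depth = CHIP_BITS // CHIP_WIDTH                 # locations per chip
chips_per_bank = (WORD_WIDTH // CHIP_WIDTH) * (WORDS_PER_BANK // chip_depth)
chips_per_dual_port_bank = chips_per_bank * READ_PORTS
total_chips = chips_per_dual_port_bank * MEMORY_SPACES

print(chips_per_bank, chips_per_dual_port_bank, total_chips)
```

Changing `READ_PORTS` to 1 shows the cost of the cloning: the chip count doubles compared with plain single-port banks.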

Dual ported memories are a common feature in FPGA. This was not a luxury that people could afford "back in the day" (except maybe some of Cray's customers).

The YASEP architecture does not force a precise memory layout so it's more a matter of convenience and compromises with the available technology.

Another option is to use a couple of IDT7132s (dual-ported 2K×8 SRAMs), which make it a bit easier to write to the memory, but the capacity is much smaller and one of the data buses must still be multiplexed during the write cycle...

Discussions

esot.eric wrote 11/07/2015 at 04:48

Interesting... Not sure I totally understand off-hand, but you're saying to use two (parallel) single-port SRAMs to implement dual-port functionality? hmmm....

Yann Guidon / YGDES wrote 11/07/2015 at 04:52

Yes, this is possible as long as there is only one write port AND read & write cycles are exclusive.
During read cycles, two addresses can be read at the same time, one from each of the two "clones".
During write cycles, one data word is written to both clones at the same address, so they always hold identical data.

This happens to be what CPU register sets do :-) 2 reads then 1 write.
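
As a rough behavioural model (a sketch only: the class and method names are invented for this example, and real hardware would need address/data multiplexers and control signals that are not shown), the cloning scheme looks like this:

```python
class ClonedDualPortRAM:
    """Two single-port RAMs kept identical: 2 read ports, 1 write port.

    Hypothetical model for illustration; not taken from the YASEP sources.
    """

    def __init__(self, words=32 * 1024, width=16):
        self.words = words
        self.mask = (1 << width) - 1
        self.clone_a = [0] * words   # serves read port A
        self.clone_b = [0] * words   # serves read port B

    def read2(self, addr_a, addr_b):
        """Read cycle: each clone answers one address, in parallel."""
        return (self.clone_a[addr_a % self.words],
                self.clone_b[addr_b % self.words])

    def write(self, addr, value):
        """Write cycle: the same word goes to both clones."""
        addr %= self.words
        value &= self.mask
        self.clone_a[addr] = value
        self.clone_b[addr] = value


# "2 reads then 1 write", like a register set feeding an ALU.
ram = ClonedDualPortRAM()
ram.write(0x0010, 0x1234)
ram.write(0x0020, 0xBEEF)
x, y = ram.read2(0x0010, 0x0020)   # both operands in one read cycle
ram.write(0x0030, x ^ y)           # result stored in a separate write cycle
```

Because `read2` and `write` are separate calls, the model enforces the "exclusive read and write cycles" condition by construction.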

esot.eric wrote 11/07/2015 at 06:17

ah hah! For some reason I always assumed "Dual Port" meant it could write to one location and read from another at the same time, e.g. for a frame-buffer. But I see what you're saying about register-sets... could do some crazy shizzle with 64KB of internal *registers*, but I guess that'd require some really long op-codes ;)

Yann Guidon / YGDES wrote 11/07/2015 at 06:25

There ARE real dual ported memories, in FPGA or as ICs (for example the IDT7132, I pulled a number of them from a server mainboard).

In the YASEP architecture, the memory (ies?) is mapped to 5 pairs of registers. The original idea was to access memory with them, through some kind of buffers... in embedded systems, it's more interesting to use several dual ported memories to simplify the system/logic, increase safety (through separation of spaces) and even increase the total capacity of the 16-bits pointers without using segmentation.

Bandwidth is also increased as a single instruction can read 2 operands and write a result, while checking conditions and post-inc/decrementing pointers... That's not possible with a load/store architecture ;-)

danjovic wrote 11/07/2015 at 09:33

I came across the same solution some time ago but never implemented it. It is good to know somebody else had the same idea; in other words, it was not craziness, lol!! Another approach, if you have an SRAM that is fast enough, is to share it in time, for example using different phases of a master clock to mux the address, data and R/W lines, and maybe adding a pair of registers that act as buffers to speed things up or hold data, like a 1-level cache.
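
A minimal behavioural sketch of that time-sharing idea, assuming two phases per master-clock cycle and one buffer register for the first port (all names are hypothetical, chosen only for this example):

```python
class TimeSharedRAM:
    """One single-port RAM accessed twice per master-clock cycle.

    Hypothetical model: phase 0 serves port A, phase 1 serves port B.
    """

    def __init__(self, words=32 * 1024, width=16):
        self.mask = (1 << width) - 1
        self.cells = [0] * words
        self.latch_a = 0   # buffer register holding port A's data

    def _access(self, addr, write_enable, data):
        # A single access of the underlying single-port SRAM.
        if write_enable:
            self.cells[addr] = data & self.mask
        return self.cells[addr]

    def cycle(self, req_a, req_b):
        """Each request is (addr, write_enable, data); returns (data_a, data_b)."""
        self.latch_a = self._access(*req_a)   # phase 0: port A, result latched
        data_b = self._access(*req_b)         # phase 1: port B
        return self.latch_a, data_b


ram = TimeSharedRAM()
# Port A writes while port B reads the same address within one cycle.
a, b = ram.cycle((0x100, True, 0xCAFE), (0x100, False, 0))
```

In this model port A is served first, so a read on port B in the same cycle already sees port A's write. The price is that the SRAM has to complete two accesses per master-clock cycle.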

esot.eric wrote 11/07/2015 at 17:49

@danjovic, almost like DDR made from SDR parts... hmm...

Yann Guidon / YGDES wrote 11/07/2015 at 18:04

@danjovic

Register port multiplexing was used in the Alpha 21x64, IIRC (I forget which generation did this). The register set was running at 2× speed; I have no idea how they could achieve that. In the Intel P4, it was the ALU that was double-pumped, using domino logic, so if you had a 3GHz CPU, the ALU was running at 6GHz... no wonder it was such a power drain!

In the YASEP, the register set is one of the "slow" parts, so it takes almost one cycle to read 2 values in parallel. On the second cycle, the ALU does its job but the register set is used again, to read the condition and possibly OR-combine the destination. I'm rather proud of this pipeline ;-)
