Hacking Ancient DRAM - Compute In Memory

Massively parallel computation in a 64kx1 DRAM from the 1980s?

Tim

Compute-in-Memory (CIM) is a very active topic in current research, aiming to perform arithmetic operations directly on information stored in a memory array. There are two benefits to that: Firstly, it would enable massively parallel computation on a large amount of data at the same time. Secondly, it would save power that is usually associated with moving data through the memory hierarchy to a CPU or GPU.

While there are myriad approaches to this objective, there is strong interest in staying close to existing memory technologies. A few years ago, researchers proposed a method to implement parallel AND and OR operations in standard DRAM. Essentially, two or more rows of memory capacitors in a DRAM are connected so that their charge can equalize. This allows several logic operations to be implemented in the analog domain without adding any circuitry to the DRAM array.
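
As a rough illustration of the charge-equalization idea, the toy model below treats every cell as an ideal, equally sized capacitor and ignores the bitline capacitance. The three-row arrangement with one preset control row is an assumption of this sketch (it is one commonly described variant), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each cell is an ideal capacitor holding either 0 V (logic 0)
 * or VDD (logic 1). Connecting three cells lets the charge equalize to the
 * mean voltage; the sense amplifier then resolves it against VDD/2.
 * Real cells leak, differ in size, and share charge with the bitline. */
#define VDD_MILLIVOLT 5000

static int share_and_sense(int a, int b, int c)
{
    assert(((a | b | c) >> 1) == 0);        /* inputs must be 0 or 1 */
    int mean_mv = (a + b + c) * VDD_MILLIVOLT / 3;
    return mean_mv > VDD_MILLIVOLT / 2;     /* majority of the three cells */
}

/* With the third row preset to 0 the majority is AND, with 1 it is OR. */
static int cim_and(int a, int b) { return share_and_sense(a, b, 0); }
static int cim_or(int a, int b)  { return share_and_sense(a, b, 1); }

/* Every column of a row is processed at the same time, so on whole rows
 * the operation is a bitwise majority of three row words. */
static uint32_t row_majority(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}
```

The point of the model is only that the logic function falls out of averaging charge and thresholding; in a real array one such operation would act on all 256 columns of the opened rows at once.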

Curiously, another group found a way to implement these operations in off-the-shelf DRAM ("ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs") by cleverly violating the timing parameters. Earlier, it was also shown how to copy data within the DRAM without leaving the chip ("RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization").

Taking these building blocks, a more recent publication even implements parts of large language model (LLM) computations in memory using this approach ("MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration").

Now, if this isn't a glorious hack, what is?

Of course, working with modern DRAMs is not that easy, and this research required a relatively advanced setup with a modern FPGA ("DRAM Bender").

But how about pairing an older, slow DRAM with a modern microcontroller? Let's see how far we get.

  • Reset Behavior

    Tim, 2 days ago

    Now we can read and write the DRAM. To inspect the basic behavior of the memory array, I displayed the first word of each of the 256 pages after turning on the power.

    All memory cells start with the same voltage on the capacitor (0V I presume, since the memory was powered off). However, depending on the row, half of the memory cells read as 0 and half as 1.

    The reason for this is that the read amplifiers are actually differential amplifiers. Half of the memory cells are connected to the inverting branch and half to the noninverting branch. The purpose of this is to balance the bitline capacitance by ensuring that the same number of memory cells is connected to each bitline. When a row is activated, it changes the voltage on either the inverting or the noninverting bitline, and hence we get either a 0 or a 1 for 0V initial charge on the capacitor.
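
The effect of the two bitline polarities can be captured in a few lines. This is a hypothetical model: which rows sit on the inverting bitline is chip specific, so the sketch simply assumes that a single row-address bit (`polarity_bit`, an invented parameter) decides it.

```c
#include <assert.h>

/* Assumption: one bit of the row address selects the bitline polarity.
 * The real mapping depends on the chip layout and vendor. */
static int row_is_inverted(unsigned row, unsigned polarity_bit)
{
    assert(row < 256 && polarity_bit < 8);
    return (row >> polarity_bit) & 1u;
}

/* Logical value seen at the data output for a cell whose capacitor is
 * either charged or discharged. Rows on the inverting bitline report
 * the opposite of the physical charge state. */
static int sensed_value(int cell_charged, unsigned row, unsigned polarity_bit)
{
    int raw = cell_charged ? 1 : 0;
    return row_is_inverted(row, polarity_bit) ? !raw : raw;
}
```

With every capacitor discharged, this model reports 0 for half of the rows and 1 for the other half, which matches the power-on pattern described above.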


  • Implementing DRAM access functions

    Tim, 4 days ago

    Reading and writing can easily be implemented by bit-banging. To meet the DRAM timing requirements, we have to insert delays in the form of NOP instructions.

    // Compile-time delay macros for exact cycle counts without loop overhead
    #define DELAY_1_CYCLES() __asm volatile ("nop")
    #define DELAY_2_CYCLES() __asm volatile ("nop\nnop")
    #define DELAY_3_CYCLES() __asm volatile ("nop\nnop\nnop")
    #define DELAY_4_CYCLES() __asm volatile ("nop\nnop\nnop\nnop")
    #define DELAY_5_CYCLES() __asm volatile ("nop\nnop\nnop\nnop\nnop")
    
    // Specific delay macros for each timing parameter
    #define DELAY_RAS_CYCLES() DELAY_5_CYCLES() // ~100ns
    #define DELAY_CAS_CYCLES() DELAY_5_CYCLES() // ~100ns
    #define DELAY_RCD_CYCLES() DELAY_2_CYCLES() // ~40ns (2 cycles at 48MHz)
    #define DELAY_RP_CYCLES()  DELAY_5_CYCLES() // ~100ns

    The read routine first asserts RAS and then CAS to read one bit from the array at the given row and column address.
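
The project's own routine is not reproduced in this excerpt. The following is a minimal sketch of such a RAS-then-CAS read, not the author's code. To keep it runnable without hardware, the pin helpers drive a small software stand-in for the chip; on the microcontroller they would toggle GPIO pins instead, and `delay_cycles` would be replaced by the NOP macros. All function and variable names are hypothetical, and nWE is assumed to stay high for the whole access.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for a 64k x 1 DRAM (hypothetical, for host testing). */
static uint8_t sim_cells[256][256];
static uint8_t sim_addr, sim_row, sim_col;
static int sim_ras_n = 1, sim_cas_n = 1;

static void set_address(uint8_t a) { sim_addr = a; }

static void set_ras_n(int level)
{
    if (sim_ras_n && !level) sim_row = sim_addr;  /* falling edge latches row */
    sim_ras_n = level;
}

static void set_cas_n(int level)
{
    if (sim_cas_n && !level) sim_col = sim_addr;  /* falling edge latches column */
    sim_cas_n = level;
}

static int read_dout(void)
{
    assert(!sim_ras_n && !sim_cas_n);   /* output is valid only while both are low */
    return sim_cells[sim_row][sim_col];
}

static void delay_cycles(int n) { (void)n; }      /* NOPs on real hardware */

/* Read one bit: row address with RAS, then column address with CAS. */
static int dram_read_bit(uint8_t row, uint8_t col)
{
    set_address(row);
    set_ras_n(0);          /* open the row */
    delay_cycles(2);       /* RAS-to-CAS delay */
    set_address(col);
    set_cas_n(0);          /* select the column */
    delay_cycles(5);       /* wait out the access time */
    int bit = read_dout();
    set_cas_n(1);
    set_ras_n(1);          /* close the row */
    delay_cycles(5);       /* precharge before the next access */
    return bit;
}
```

The delay counts mirror the macro values above, but they are placeholders here; the real numbers have to come from the datasheet of the specific 4164 variant.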


  • Accessing the Memory

    Tim, 05/05/2025 at 21:09


    A classical DRAM has only a few control lines:

    A0-A7        Multiplexed address lines
    Din, Dout    Data in/out. These lines can be shorted together for bidirectional I/O.
    nWE          Write enable. When low, the selected memory cell is written to.
    nRAS         Row Address Strobe: "opens" a row in the memory array.
    nCAS         Column Address Strobe: selects which bit in the row to access.

    A 64k*1 DRAM consists of a memory array of 256 columns and 256 rows. Each memory cell is made up of a transistor and a capacitor.
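
Because the eight address pins are multiplexed, a 16-bit bit index has to be presented in two halves. A small sketch of that split follows; which half is used as the row is an assumption here (either choice works as long as reads and writes agree), and the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t row;   /* presented on A0-A7 while nRAS falls */
    uint8_t col;   /* presented on A0-A7 while nCAS falls */
} dram_addr_t;

/* Split a linear bit index (0..65535) into row and column addresses. */
static dram_addr_t split_address(uint32_t bit_index)
{
    assert(bit_index < 65536u);
    dram_addr_t a = { (uint8_t)(bit_index >> 8), (uint8_t)(bit_index & 0xFFu) };
    return a;
}
```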

    The core functionality of a DRAM is actually controlled by the RAS line. When it is pulled low ("Access"), the row indicated by the address pins is loaded from the memory array onto the bitlines by activating the transistors in the memory cells. Since the charge on the capacitors is fairly small, the bitlines change their voltage only slightly. At the same time, however, the read amplifiers are activated; they amplify the small voltage change on the bitlines ("Sense") and pull each bitline fully high or low ("Restore").
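
How small the bitline change is follows from simple charge sharing between the cell capacitor and the much larger bitline capacitance. The sketch below computes it; the capacitance and precharge values in the example are purely illustrative assumptions, not data for the 4164.

```c
#include <assert.h>

/* Voltage change on a bitline when a cell capacitor is connected to it.
 * Charge conservation gives:
 *   dV = (V_cell - V_precharge) * C_cell / (C_cell + C_bitline)
 * Capacitances may be in any unit as long as both use the same one. */
static double bitline_delta_v(double v_cell, double v_precharge,
                              double c_cell, double c_bitline)
{
    assert(c_cell > 0.0 && c_bitline >= 0.0);
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bitline);
}
```

For example, assuming a 50 fF cell charged to 5 V on a 450 fF bitline precharged to 2.5 V, `bitline_delta_v(5.0, 2.5, 50.0, 450.0)` gives a swing of about 0.25 V, which the read amplifier then has to amplify to a full logic level.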


  • Experimental Set Up

    Tim, 05/05/2025 at 20:40

    I will be experimenting with 4164 DRAM chips, which were used as main memory in many computers in the early 80s. They are organized as 64k*1, so eight chips are required for 64 kbytes of RAM. Luckily, I found a few from different vendors in my parts bin (I think I had more, but kept just one from each vendor).

    The 4164 requires a single 5V supply. Its access times are rather slow, 200ns to 300ns for the devices I have. The general rationale behind this experiment is that modern microcontrollers are vastly faster than these devices and can generate the timing violations required for compute-in-memory without resorting to an FPGA.

    I am going to use a CH32V003 microcontroller. These are rather low-cost devices, but they come with a 48MHz system clock and a RISC-V core (RV32EC), which is able to execute instructions 100-150x faster than a 1MHz 6502 from the 80s. In addition, they offer 5V I/O, which greatly simplifies interfacing to the 4164.
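
To get a feeling for the timing resolution this offers: at 48MHz one clock cycle lasts about 20.8ns, so a 200ns access time spans roughly ten cycles. The helper below does this arithmetic. It assumes one instruction per clock cycle, which flash wait states or pipeline effects may not guarantee in practice; `SYSCLK_HZ` and `cycles_for_ns` are names chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SYSCLK_HZ 48000000u   /* system clock stated for the CH32V003 */

/* Smallest number of clock cycles that lasts at least `ns` nanoseconds. */
static uint32_t cycles_for_ns(uint32_t ns)
{
    assert(ns <= 1000000u);   /* keep the 64-bit product comfortably small */
    uint64_t scaled = (uint64_t)ns * SYSCLK_HZ;
    return (uint32_t)((scaled + 999999999u) / 1000000000u);   /* round up */
}
```

For instance, `cycles_for_ns(100)` yields 5 and `cycles_for_ns(200)` yields 10, so single-cycle steps are fine enough to land both inside and outside the specified timing windows of a 200ns to 300ns part.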



Discussions

Ken Yap wrote 05/04/2025 at 23:47

Damn, I gave away my old DRAM some time ago. Never mind, if the technique is practical to manufacture then I expect to see a new category of chips come onto the market.


Tim wrote 05/05/2025 at 21:10

They seem to go for a lot on eBay nowadays. I also disposed of most of my collection, but luckily I still kept some.

