FPGA Computer/Eval board

This is a small FPGA development board that I am currently designing. Lots of I/O connectivity: HDD, Ethernet, USB, Audio, PS/2, VGA, BLVDS; 5V tolerant.

Similar projects worth following
A couple of HaD articles (Cheap PCB proto service, the 68000 computer, Bunnie's laptop with an FPGA) got me started on this journey. I have found FPGA development board projects aimed at vintage computing/arcade, but they seem to lack the connectivity that we have taken for granted. Commercial boards are close, but not quite there.

What features can I fit on a 10cm x 10cm (3.93"x3.93") double-sided PCB with about $150 worth of parts? Proto PCB stats: parts occupy 6384mm^2 (63.8% of the PCB area).

FPGA projects that target the MiST (same FPGA part and memory sizes) can eventually work on this board. The hardware is completely designed from scratch.

Project Soapbox Racer

There is a community of people developing open source FPGA SoC cores to emulate vintage computers such as the Amiga, Atari ST, C64, etc. With a bit of customization, it would be possible to recompile them for this board, which has the same amount of FPGA resources and the same SDRAM size as the MiST.

This can be used as a development board for learning ARM and/or FPGA/CPLD. The FPGA is well supported by Altera with their free suite of design/synthesis and debugging tools, including the free SignalTap II logic analyzer.

As the name implies, it doesn't come with a powerful V8 engine. This board is not built for speed. It is designed for emulating computers that fell off the technology curve in the late 80's, using today's components with good performance/price ratios. I have also added a few of the connectivity options (Ethernet, HDD, RTC, front panel soft on/off control, etc.) that we have come to expect but that are missing from the other open FPGA projects/products.

Unfortunately, the I/O of modern FPGAs is no longer 5V tolerant. Most of the FPGA boards out there simply break out the pins onto a header without any level translation, leaving users to fend for themselves. Even if the inputs are 3.3V, the loose pieces of wire in a typical "proto" setup are a signal integrity nightmare; combined with fast rise/fall times, they can generate high voltage pulses.

The FPGA is the main star of this board. It has enough resources to implement a 68000 CPU core and the Amiga chipset. It is connected to 8Mx32 (32MB) of SDRAM, the audio CODEC, the CPLD and the ARM chip. The CPLD controls the HDD PATA interface and shares its bus with the 10/100 BaseT Ethernet.

The ARM chip is for loading FPGA configurations stored in a file system, driving the on-screen menu for selecting the FPGA core, configuring the PLL frequency, debugging, and emulating peripherals (e.g. USB or PS/2 mouse/keyboard, Real Time Clock).

3D rendering of what the PCB would look like (based on the EagleUp ULP):

Updated pictures: (finalized rev 0 PCB)

User I/O:

The user ports on this board are 5V tolerant. While the board has been optimized for emulating a computer, the ARM, CPLD and FPGA can be programmed to do something else entirely.

"Nothing is true, everything is permitted." ―The Creed's maxim.

Power: 5V DC

An alternate 3.5mm screw terminal is available if the board is to be powered internally inside a case. The 5V input is protected by an AP2511 2.5A current-limited switch (UL recognized). There are 4 additional PTCs for the user I/O connectors. The current-limit switch also allows the ARM to switch the rest of the system on/off under firmware control.

Mass Storage:

Since the I/O can be re-purposed for the end application, it is important to have alternative means of storing the FPGA configuration or hosting a file system for the computer emulation environment. Onboard 16MB SPI FLASH and removable MicroSD are supported by this board.


Ethernet:

10/100 BaseT Ethernet is provided by a Microchip ENC624J600 on a 16-bit multiplexed address/data bus (Microchip PSP) connected to the FPGA.

Parallel ATA:

Parallel ATA is fully 5V tolerant and is connected to a Xilinx XC9572XL CPLD with series termination resistors. There is a 16-bit data path connecting the CPLD and FPGA, which is also shared by the Ethernet chip. The Ethernet and ATA data paths can be rerouted to the ARM inside the FPGA once it is configured. (The ARM chip also has an unused SPI data path connected to the CPLD.) This interface can be reprogrammed if an application requires a parallel high-speed LVTTL/TTL/5V bus to the FPGA, e.g. an 18-bit LVTTL LCD, logic analyzer inputs, an 8-bit I/O bus, etc.

There are a lot of sub-$10 bidirectional PATA-to-SATA converters available from China at the usual places, should one decide to keep up with technology.

A 2.5" HD can be mounted on the lower portion of the PCB using standoffs from the 3 mounting holes. A short 2mm pitched...

Read more »

  • 1 × Altera EP3C24P240 FPGA that houses the soft core CPU, memory controller, peripherals etc
  • 1 × IS42S32800D-6TL Memory ICs / Synchronous Dynamic RAM (SDRAM) 166MHz 32MB (8Mx32)
  • 1 × MK22DX256VLK5 Microprocessors, Microcontrollers, DSPs / ARM, RISC-Based Microcontrollers 256K/32K + 64K FlexMemory
  • 1 × XC9572XL Logic ICs / Programmable Logic: PLDs
  • 1 × ENC624J600 Interface and IO ICs / Ethernet, T1, E1 10/100 BaseT Ethernet

View all 11 components

  • Caching from 20,000 feet

    K.C. Lee10/28/2014 at 16:33 0 comments

    This is my preliminary look into memory interface of the design.

My design was influenced by [AMR]'s blog on getting more memory bandwidth for better video output, at a very late stage of layout. 16-bit colour at 800×600 or above is the bare minimum these days, so I decided to double the SDRAM data width to 32 bits. I was glad to spend the extra time redoing the layout, as it was a good trade-off compared to moving to DDR.

    The Amiga FPGA design

Luckily, a lot of the hard work has been done by [AMR] and documented in his blog: Part 13: Timing closure at last! and Part 14: Improving the SDRAM controller.

The cache line is implemented in Altera Cyclone III M9K memory using true dual-port mode (in DualPortRAM.vhd), with one port facing the SDRAM side and the other facing the CPU/graphics chipset. The bus sizing can be taken care of by the mixed-width feature of the memory block, without the additional delay of a layer of mux/demux logic.

[AMR] has divided the 16-clock-cycle SDRAM access "ring" into 2 slots. Each slot can perform an 8-word burst read or 2 word writes.

In "Line (Block) Size Choice for CPU Cache Memories" (Alan Jay Smith, Senior Member, IEEE) [1], Smith examined 27 memory traces (4 from the MC68000) from five microprocessor/minicomputer systems. The following simulated/averaged/interpolated results for a unified cache were shown in the paper. (Results depend on usage patterns; YMMV.)

Basically, you can reduce cache misses by increasing the size of the cache and/or the cache line. There are no free lunches: as you grow the cache too much, its access time increases because more memory blocks are required and the data-out bus needs more muxing.

[AMR] is using a 2Kx18 configuration for the cache, so the cache size is 4096 and the line size is 16 (8x16-bit), yielding a miss ratio of 0.120.

One would have thought that doubling the memory data width would double the memory bandwidth. That is not true for a small cache line, as the access delay is the dominant factor; I need to increase the burst length to improve the overall memory bandwidth.

Because the memory width I am using is twice as wide, I could reduce the SDRAM "ring" to 12 cycles, which is about 33% faster, if I were to double the line size to 32. The new miss ratio is 0.082.
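As a sanity check on those numbers, the effect of the miss ratio on the average access cost can be modelled with the classic hit-time plus miss-ratio times miss-penalty formula. This is a rough sketch: the 1-cycle hit cost below is my assumption, and the whole ring length is used as the fill penalty for simplicity.

```c
#include <assert.h>

/* Rough average-cost-per-access model: a hit costs hit_cycles, a miss
 * additionally pays for the line fill (here, one SDRAM "ring"). */
static double avg_access_cycles(double miss_ratio, double hit_cycles,
                                double fill_cycles)
{
    return hit_cycles + miss_ratio * fill_cycles;
}
```

Under these assumptions, the 16-cycle ring at a 0.120 miss ratio comes out to 1 + 0.120*16 = 2.92 cycles/access, while the 12-cycle ring at 0.082 gives 1 + 0.082*12 = 1.98, so the wider memory plus the longer line roughly halves the average cost.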

There are changes required in the cache tag and SDRAM modules. This may change as reality sets in during the actual work, e.g. clock speed trade-offs for the SDRAM (166MHz) vs the FPGA SDRAM/cache module speed limits, SDRAM latency (CL3) and the actual clock speed achievable on my particular PCB layout.

Note: There are a couple of projects (Phoenix Core from the fallout of the Natami project, FPGA Arcade) that are of interest on a similar topic, but as they have not released their source code for the last few years, they are effectively closed source as far as I am concerned. Phoenix Core is scheduled to be released in Q3, but I won't hold my breath. I'll revisit this if/when the source becomes available to the general public.

    Further reading:

These designs have much better performance, but might not work too well for the Amiga core as-is; the graphics chipset relies on deterministic memory cycles.


    Looks like this could be useful for working out external interface timing.

    "The TimingAnalyzer is free to use by anyone without any limitations and is currently licensed as "Freeware" while beta testing. When the first final version 1.0 is released, the license will change to a "Commercial" license. The "Commercial" license will require businesses to purchase the program, but it will continue to be free for personal and academic use."

Back in the old days, Chronology's TimingDesigner started out as a "free" working program, then became a crippled no-save demo a few years later, and is now a 10-day node-locked demo. I hope this one doesn't go down the same path.

  • High level PS/2 Keyboard Mouse Driver

    K.C. Lee10/12/2014 at 22:53 0 comments

I have a preliminary PS/2 keyboard/mouse driver working. Two threads are used to support the two PS/2 devices. The driver detects mouse/keyboard types, initializes the mouse into streaming mode and formats raw data into packets. It also handles hot-removal/plug of the keyboard/mouse. That last part is the more difficult one, as all the error conditions need to be detected and the devices re-initialized. Handling this in the ARM firmware simplifies the FPGA cores.

The mouse packet for a generic 2-button mouse comes in packets of 3 bytes; the Intellimouse adds a 4th byte for the scroll wheel. Bit 3 of the first byte in the packet is set to '1' to allow detection of packet misalignment. However, that is not a sufficient test on its own.

If a mouse is reconnected, streaming mode is off by default. The power-on message for a mouse is 0xAA, 0x00. Assuming the power-on message aligns with the packet, the 0xAA would pass the simple test, as bit 3 is set. The mouse driver would then hang because it expects a 3rd byte that never arrives unless the mouse is reinitialized. (While I could use Remote Mode (cmd 0xF0) to poll for mouse reports, each PS/2 send command involves a dozen interrupts.)

My code also relies on the timing between the individual bytes sent. I used the timeout feature of iqGetTimeout() to specify the deadline by which bytes #2 and #3 have to arrive. A misaligned packet then causes a timeout, and the mouse is reset and reinitialized.
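The packet-level checks described above can be sketched like this (names and layout are mine, not the actual driver's): bit 3 of byte 0 must be '1', and the X/Y deltas are 9-bit two's complement with their sign bits carried in byte 0. Note that a power-on 0xAA slips past this test, which is exactly why the timeout is also needed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical parse of a generic 3-byte PS/2 mouse packet. */
typedef struct {
    bool left, right, middle;
    int  dx, dy;
} mouse_report_t;

static bool ps2_mouse_parse(const uint8_t pkt[3], mouse_report_t *r)
{
    if ((pkt[0] & 0x08) == 0)       /* alignment bit must be '1' */
        return false;
    r->left   = pkt[0] & 0x01;
    r->right  = pkt[0] & 0x02;
    r->middle = pkt[0] & 0x04;
    /* 9-bit two's complement: sign bits live in byte 0 */
    r->dx = pkt[1] - ((pkt[0] & 0x10) ? 256 : 0);
    r->dy = pkt[2] - ((pkt[0] & 0x20) ? 256 : 0);
    return true;
}
```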

    In my design and the MiST, the PS/2 interface is handled in the ARM chip. The ARM passes the raw PS/2 packets into the FPGA core via the SPI bus. This allows for PS/2 device emulation for USB or other devices on the ARM. In this design, PS/2 pins are also mapped to the hardware UART1/2 and I2C0 of the K22 which allows for all kinds of hacking possibilities.

There is additional work to be done on the drivers. The keyboard scancode stream needs to be decoded and diverted to the user interface for the on-screen menu, etc.


The device driver now also updates the keyboard LED status and maintains it across a reconnection.

While "Scan Code Set 3" on PS/2 keyboards seems to be a better design, most cheap generic keyboards do not implement it correctly, so it is rarely used.

The driver can also optionally filter out the typematic repeats from the event stream, as they are not used in the Amiga core.

  • Low Level PS/2 Driver using DMA

    K.C. Lee09/29/2014 at 20:50 0 comments

The K22 has 16 DMA channels. Thanks to the DMAMUX cross-point switch, there are almost no hardwired restrictions on which peripherals they can be used with. The first 4 can also be triggered from the PIT (periodic interrupt timer).

I rewrote the PS/2 driver using DMA transfers. There is now only 1 interrupt per byte (vs 11); e.g. mouse reports of 3 bytes/packet at 200 packets/sec require 600 interrupts/sec instead of 6600.


- The bit-band feature is only available to the Cortex-M4 core, so it cannot be used from a DMA channel to set/clear individual GPIO bits without affecting the entire port, nor to write to a single bit in RAM.

- While individual bits on a port can be used to trigger a DMA, there is only one DMA trigger source per I/O port. So the two PS/2 ports have to be on 2 separate ports, C and D in this design (which also map to UART1 & 2).

    PS/2 Rx:

The IRQC (Interrupt Configuration) field in the port's PCR (Pin Control Register) is programmed to trigger a DMA transfer on the rising edge of the PS/2 clock, storing a snapshot of the input pins from the corresponding GPIO PDIR (Port Data Input Register).

After 11 clock edges, the DMA triggers an IRQ and the PS/2 data is extracted from the captured port data. The raw scancode (11 bits, which includes the start/parity/stop bits) is sent as a pair of bytes and queued using the byte-oriented I/O queue chIQPutI(). The upper-level driver uses the raw scancode for error detection.
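The extraction step can be sketched as follows (the snapshot layout, names and bit position are assumptions for illustration, not the actual driver code): each DMA-captured PDIR word contributes one bit, and the start, odd-parity and stop bits are verified while unpacking.

```c
#include <stdint.h>
#include <stdbool.h>

/* Unpack 11 DMA-captured port snapshots into one byte.  'snap' holds
 * one 32-bit PDIR image per PS/2 clock edge; 'data_bit' is the bit
 * position of the PS/2 Data line within that port.
 * Frame: start(0), 8 data bits LSB first, odd parity, stop(1). */
static bool ps2_decode_frame(const uint32_t snap[11], int data_bit,
                             uint8_t *out)
{
    uint8_t data = 0;
    int ones = 0;

    if (snap[0] & (1u << data_bit))        /* start bit must be 0 */
        return false;
    for (int i = 0; i < 8; i++) {
        if (snap[1 + i] & (1u << data_bit)) {
            data |= (uint8_t)(1u << i);    /* LSB first */
            ones++;
        }
    }
    if (snap[9] & (1u << data_bit))        /* parity bit */
        ones++;
    if ((ones & 1) == 0)                   /* odd parity required */
        return false;
    if (!(snap[10] & (1u << data_bit)))    /* stop bit must be 1 */
        return false;
    *out = data;
    return true;
}
```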

A wakeup + message is another way of passing the scancode directly to a thread, but it is not asynchronous, which makes it messy; e.g. the thread needs to wait while sending PS/2 commands and implement timeouts for packet-level synchronization.

    PS/2 Tx:

Since bit-banding is limited to the Cortex-M4 core, DMA cannot be used here. An IRQ routine is used to send data to the PS/2 devices; each byte sent involves 12 interrupts. Thankfully, most of the PS/2 traffic is ingress.

    Found some tidbits on keyboard commands: (from first link)

    • The keyboard clears its output buffer when it receives any command.
    • If the keyboard receives an invalid command or argument, it must respond with "resend" (0xfe).
    • The keyboard must not send any scancodes while processing a command.
    • If the keyboard is waiting for an argument byte and it instead receives a command, it should discard the previous command and process this new one.
• For a mouse, streaming mode should be disabled first before sending command(s).


    PS/2 communication error detection:

Sometimes, when a PS/2 device is disconnected/reconnected in the middle of a transfer or noise is injected in the process, the bit counting can get out of sync. The upper-level driver matches the start bit ('0'), stop bit ('1') and parity against the bit stream to identify bad bits/out-of-sync conditions.

    PS/2 communication error recovery:

The high-level driver sends a reset command to the PS/2 device when it detects bad data. While one could try to realign the data, that can get messy, as there are no timestamps in the snapshots collected by DMA.

The bit counting is also reset in the send process, which is then synchronized to the incoming bit stream of the device as it comes out of reset/self-test.

The PS/2 device driver for the keyboard/mouse is responsible for reinitializing device-specific settings.



    I have compiled scan code info from various sources into this:

    Keyboard Scan code:

    PS/2 Interrupt version:



  • HAL or Hell?

    K.C. Lee09/25/2014 at 17:49 0 comments

The way the HAL model is written does not lend itself well to a shared bus, e.g. I2C or SPI. The upper-level driver mmc_spi.c assumes it is the sole owner of the LLD (Low Level Driver), including being able to shut it down. Yes, you can spiAcquireBus()/spiReleaseBus(), but that's not what I am talking about; it's not like mmc_spi is using that either.

If you have 3 or 4 SPI devices time-sharing the single bus and mmc stops the SPI driver because you removed the SD card, the rest of the SPI devices are left out to dry.

While you could start the drivers on each access, that overhead adds up. Ultimately, the LLD or its corresponding high-level driver should make the decision whether or not it should be shut down. spiSelect()/spiUnselect() can be used to determine the temporary owner of the bus; the driver would queue the requests from the other devices on the shared bus. An exception is mmcConnect(), which keeps the SD card unselected while generating clock pulses on the bus. That can be worked around by changing the config of the SPI chip select lines. Ditto for I2C: the temporary ownership is between I2C start and I2C stop, so all I need to do is keep each I2C transfer exclusive.

    Should it be a transaction based model that breaks the HAL model or should it be a driver with logical units? I'll need to do more thinking.


Other than the MMC driver, I am interested in SPI FLASH, which is not yet supported in ChibiOS. So there isn't any reason for me to stick with spi.c, as it doesn't have the right API. Here is what I have in mind for the API:
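A minimal sketch of the shape such an API might take (every name here is hypothetical, not ChibiOS code): per-device configs, an open count so the bus is only shut down by its last user, and select/unselect acting as temporary bus ownership.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical shared-bus SPI API sketch. */
typedef struct {
    uint32_t baud;      /* per-device clock rate   */
    uint8_t  mode;      /* CPOL/CPHA               */
    int      cs_line;   /* GPIO used as chip select */
} spi_dev_config_t;

typedef struct {
    int open_count;                 /* opens minus closes            */
    const spi_dev_config_t *owner;  /* current bus owner, or NULL    */
} spi_bus_t;

static void spi_open(spi_bus_t *bus)
{
    if (bus->open_count++ == 0) {
        /* ...power up and configure the SPI peripheral here... */
    }
}

static void spi_close(spi_bus_t *bus)
{
    if (--bus->open_count == 0) {
        /* ...only now is it safe to shut the hardware down... */
    }
}

static int spi_select(spi_bus_t *bus, const spi_dev_config_t *dev)
{
    if (bus->owner != NULL)
        return -1;      /* real code would queue/block here */
    bus->owner = dev;   /* ...reprogram baud/mode, assert CS... */
    return 0;
}

static void spi_unselect(spi_bus_t *bus)
{
    bus->owner = NULL;  /* ...deassert CS, wake the next in queue... */
}
```

The key design point is that spi_close() mirrors spi_lld_stop() as described below: the hardware only goes down when the open count returns to zero.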

As I mentioned previously, short packets have a costly 5.29us OS synchronization overhead, which could be eliminated by polling. Contrast that with 2.3us, the time for the raw hardware to send an 8-byte packet. In a system with lots of small packets, fixing that could significantly reduce the overhead.

I think ChaN's sample FatFS project is a good starting point for rewriting the MMC driver. I believe in going back to the source whenever possible.

The SPI driver would have its own private data structure and keep track of its open count. spi_lld_stop() should only shut down the device when none of the SPI devices need it any more, i.e. when it sees a number of spi_lld_stop() calls matching the spi_lld_start() calls.

Each of the drivers that sits on top of SPI would keep track of its own configuration, e.g. data format/speed, the GPIO pin for the select line and an optional dummy access for synchronizing deselect. There is a similar synchronization requirement for asserting /CS in the sample code; see "Consideration on Multi-slave Configuration":

    However MMC/SDC drives/releases DO signal in synchronising to the SCLK. [...] Therefore to make MMC/SDC release DO signal, the master device must send a byte after CS signal is deasserted.

The configuration gets passed to the driver in the spi_select()/spi_unselect() calls, which also provide exclusive bus ownership/queuing. A separate function can be used for reconfiguring the SPI after device discovery.

While you could also keep track of the SPI peripheral instance in the config, you wouldn't normally be dealing with multiple slaves per bus if you had multiple SPI peripherals in the first place.

  • Milestone reflection

    K.C. Lee09/23/2014 at 19:55 0 comments

There is a lot of work/clean-up to be done at the ARM level, but I am a bit closer to being able to open an FPGA config file on a MicroSD and download it to the FPGA. Believe it or not, I am still on track for the contest's original timeline. Unfortunately, I don't even qualify for the Fail of the Week. LOL.

Porting the bleeding-edge development ChibiOS to an unsupported ARM chip was high risk, but it paid off once I got the HAL drivers coded.

My life would have been easier if I had used the STM32F series. Freescale's support of the K22 wasn't that great; there wasn't even an eval board until recently. I initially designed this board with the STM32F051, but the part was running late and I wasn't comfortable with the small amount of memory. There were a few drawbacks to the K22: no 5V-tolerant I/O, no factory-preprogrammed bootstrap mode, 1 SPI (vs 2 in the STM32F0), needs a crystal for USB, etc. On the other hand, the K22 has lots of FLASH, a decent amount of RAM, and the CM4 is the right path. Thankfully, my timeline lined up with the recent ChibiOS K20 porting effort.

I would say that I like working with ChibiOS and driver code. ARM chips, especially Freescale's peripherals, are a bit complicated. There are a lot of gotchas, and the core will simply barf (aka hard fault) if you forget to dot the i's. On the other hand, with a decent SWD debugger and IDE, it is workable. I might at some point make an FPGA peripheral core for using the K22 as the main processor. Thankfully, the ARM core can run code from RAM, so a small DOS-like system is possible. There are packages like lwIP (lightweight TCP/IP stack) and µGFX (GUI library for microcontrollers) that can be used on ChibiOS. It would be small enough to learn and understand how things work, but with a modern RTOS and peripherals. I think that in itself can bring back some of the "retro" computing excitement.

Everything on this board was new to me, including the cheap Chinese PCB vendor and the fine-pitch packages. Once in a while, I need to put myself outside of my comfort zone to grow.

    Advice: Don't be afraid to push your limits.  After all, it is only time and money to try.


I have given it some thought, and as I am no longer constrained by a deadline, I think I'll continue working on improving the firmware.

  • MicroSD & FatFS in DMA!

    K.C. Lee09/22/2014 at 04:22 0 comments

    The following scope picture shows the voltage droop at the 3.3V rail measured at the SD/SPI breakout near the MicroSD socket when a card is inserted.

The 3.3V rail is at 3.32V nominal; with a droop of 200mV, 3.32V - 0.2V = 3.12V. This is well within the 3.3V +/- 0.3V normal operating range of the 3.3V parts. The audio circuits are on their own 3.3V rails (low noise, high bandwidth LDO) regulated from the 5V, so they are not affected.

The PAM2305D dual switch-mode regulator (1.5MHz switching frequency) does a very good job of reacting to the droop quickly (~1us) and takes about 2.5us to restore the steady-state voltage without overshooting.

See "Consideration to Bus Floating and Hot Insertion" for more detail.


The ChibiOS MMC SPI driver requires 2 stub functions: one for the write protect status and one for detecting card presence.

mmc_lld_is_write_protected(mmcp): There are no write protect switches on a MicroSD, so this returns FALSE.


The mechanical switch in the socket isn't connected, as I ran out of GPIO lines on the ARM, CPLD and FPGA. I'll need to find a different way to detect card presence, maybe by polling.


DAT3/CD (used as /CS in SPI mode) on a MicroSD card can be used for card detection. MicroSD cards have an internal pull-up resistor (10K-90K) that is turned on at power up.

NXP AN10911 page 1 shows a way of detecting the card by sensing the voltage across a 270K pull-down resistor, which forms a voltage divider with the card's pull-up. A logic '1' means the card is present.

Unfortunately, the K22's internal programmable pull-down resistor is 22K (min) to 50K (max), and it is only available when the pin is configured as a digital input. So while that pin can be configured as an input to the analog comparator/ADC, the voltage values cannot be trusted, as the pin could be floating.

A quick multimeter test shows the pin at 1.1V for my particular MicroSD and K22, which means the pull-up is at 2X the pull-down resistance. That puts it in the 44K to 100K range (which agrees with the MicroSD specs).

Going by the usual rule of thumb for my PCB geometry, there is roughly 1pF/in x 4 in = 4pF on my /CS line. The actual number might be a bit higher due to the LVC1G11 and K22 I/O pad capacitance and coupling from nearby tracks.

    Here is my quick & dirty routine:

    Initially the /CS line is configured as an output line at logic low. This will discharge the parasitic capacitance to logic '0'. This will be the default value if the card is not there.

    The pin is then turned into an input pin. If there is a card, the pull up will charge up the capacitance. After waiting a bit, the port is sampled. If the pad is at a logic '1', then the card is detected.
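The routine can be sketched like this, with the GPIO access stubbed out behind function pointers (the fake backend and all names are mine, for illustration; a real port would program the K22 PDDR/PCR registers):

```c
#include <stdbool.h>

/* Abstracted /CS pin operations. */
typedef struct {
    void (*drive_low)(void);   /* /CS as output, driven to 0    */
    void (*make_input)(void);  /* /CS as a high-impedance input */
    bool (*read_pin)(void);    /* sample the pad level          */
} cs_pin_ops_t;

/* Charge-based detect: discharge the ~4pF parasitic, float the pin,
 * wait a few RC time constants, then sample.  A '1' means the card's
 * internal pull-up charged the pad.  Keep the total wait well under
 * the ~28ms leakage limit measured in this log. */
static bool sd_card_present(const cs_pin_ops_t *pin, unsigned settle_loops)
{
    pin->drive_low();
    pin->make_input();
    for (volatile unsigned i = 0; i < settle_loops; i++)
        ;                      /* crude settle delay */
    return pin->read_pin();
}

/* Fake backend standing in for real GPIO, so the logic runs anywhere. */
static bool fake_card_inserted;
static void fake_drive_low(void)  { }
static void fake_make_input(void) { }
static bool fake_read_pin(void)   { return fake_card_inserted; }
static const cs_pin_ops_t fake_pin = {
    fake_drive_low, fake_make_input, fake_read_pin
};
```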

    Quick measurement: (YMMV)

The minimum value for the loop count is about 0x0F, where the code starts to detect the card (a few RC time constants for the cap to charge up). The maximum wait is about 28 ms; beyond that, leakage pulls the floating pin to logic '1' even without a card.


FatFS by ChaN is an optional FAT file system for ChibiOS. I got the FatFS demo code compiled, but it didn't work at first.

Here is the init sequence that was sent. Note the first 5 bytes are 0x00, which is probably not what the MicroSD expects to see. See the SD reference:


    /* Use dmaDummy as the source/destination when a buffer is not provided */    
    dmaDummy = 0;    ///  <-- This should be 0xff

Note that the sout value changed. This is because Tx and Rx share a common dummy variable, and the Tx value gets overwritten by the Rx DMA after the Rx FIFO is filled.

    Fixing them seems to get FatFS mounted.

    ChibiOS demo of FatFS here:

I had to change a few lines, as there are changes in the FatFS API. The demo board uses SDC and not MMC over SPI, but it is just a matter of changing a few lines of HAL driver calls.

    f_mkfs() failed. Here is the reason: I forgot to enable the mkfs option in ffconf.h

    #define _USE_MKFS 0 /* 0:Disable or 1:Enable...
    Read more »


    K.C. Lee09/21/2014 at 20:00 0 comments

    Starting to work on DMA SPI driver for ChibiOS. I previously made some notes on the eDMA in K22 here:

My DMA SPI code structure is loosely based on the STM32F4 spi_lld v2 (which is DMA based) in the ChibiOS source tree. The DmaSpi code is very useful, as it tells me what the SPI and DMA registers should be set to. I wrote some simple code to dynamically allocate/deallocate DMA channels and manage the corresponding interrupt handlers.

I have consolidated the SPI routines down to spi_lld_exchange(SPIDriver *spip, size_t n, const void *txbuf, void *rxbuf), as send(), receive() and ignore() are just variations of it, i.e. whether txbuf/rxbuf are provided, which can be handled by different DMA settings.
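The consolidation can be pictured like this, with a loopback stand-in replacing the actual DMA programming (function names are simplified from the real spi_lld_* ones): the three public calls differ only in which buffers they pass down.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for spi_lld_exchange(): the real function programs the
 * Tx/Rx DMA channels; here a loopback keeps the pattern visible.
 * When no txbuf is given, 0xFF is clocked out, the safe idle pattern
 * for SPI slaves such as SD cards. */
static void spi_exchange(size_t n, const uint8_t *txbuf, uint8_t *rxbuf)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t out = txbuf ? txbuf[i] : 0xFF;  /* dummy must be 0xFF */
        uint8_t in  = out;                      /* loopback stand-in  */
        if (rxbuf)
            rxbuf[i] = in;
    }
}

/* The three variants collapse into thin wrappers. */
static void spi_send(size_t n, const uint8_t *txbuf) { spi_exchange(n, txbuf, NULL); }
static void spi_receive(size_t n, uint8_t *rxbuf)    { spi_exchange(n, NULL, rxbuf); }
static void spi_ignore(size_t n)                     { spi_exchange(n, NULL, NULL); }
```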

    Here is what I got so far for sending SPI data via DMA for the following code: (data rate slowed down to allow for waveform capture)

    spiSend(&SPID1,sizeof(txbuf),txbuf); <--- Synchronization issue here

The good news is that the (tricky) DMA seems to work for this case; the bad news is that the spiSend() synchronization is not working. spiUnselect() gets executed while the DMA is still working away.

    spiSend() code:

    osalDbgAssert(spip->state == SPI_READY, "not ready");
    osalDbgAssert(spip->config->end_cb == NULL, "has callback");
    spiStartSendI(spip, n, txbuf);
    _spi_wait_s(spip); <-- this is supposed to wait until it hears from the interrupt routine

    Tx DMA interrupt code:

    /* Stop Tx DREQ */
    DMA->CERQ = spip->DMA_Tx;
    /* Portable SPI ISR code defined in the high level driver, note, it is a macro.*/
_spi_isr_code(spip); <-- this is supposed to give the waiting thread the go-ahead.

I am hoping it is a matter of setting the #define options or something simple, as it is not easy following a mess of #defines or doing an instruction trace inside the OS code.



Shortly after I posted this, I found that SpaceCoaster already had working code. Procrastination sometimes works.


Ah, I understand why spiUnselect() got executed early: there are 4 entries in the Tx FIFO, so as soon as the FIFO gets filled, the Tx DMA is done. No wonder they use the Rx FIFO! Changed to the Rx DMA interrupt and now it looks okay.


    SCK waveform vs Port setting: This is a multi-drop bus with long stubs driven from the ARM chip.

High slew rate/low drive strength: rise time of 8.5ns with no undershoot/overshoot.

    Low Slew rate/high drive strength: rise time of 6ns with minor undershoot.

  • ChibiOS 3.0 tests/benchmarks

    K.C. Lee09/20/2014 at 21:28 0 comments

I came across the directory "demos\STM32\RT-STM32F429-DISCOVERY" in the ChibiOS 3.0 tree and thought I'd try to compile it for the hell of it. There is no USB stack (yet), so I am using the serial port.

The demo is a shell with the ChibiOS test/benchmark suite, which for the longest time I had been trying to figure out how to run. The following is a copy of the terminal session output.

    ChibiOS/RT Shell
    ch> info
    Kernel: 3.0.0dev
    Compiler: GCC 4.2 (EDG gcc mode)
    Architecture: ARMv7-ME
    Core Variant: Cortex-M4
    Port Info: Advanced kernel mode
    Platform: Kinetis
    Board: Soapbox Racer Rel 0
    Build time: Sep 20 2014 - 17:24:13
    ch> mem
    core free memory : 23968 bytes
    heap fragments : 0
    heap free total : 0 bytes
    ch> test

    *** ChibiOS/RT test suite
    *** Kernel: 3.0.0dev
    *** Compiled: Sep 20 2014 - 17:24:12
    *** Compiler: GCC 4.2 (EDG gcc mode)
    *** Architecture: ARMv7-ME
    *** Core Variant: Cortex-M4
    *** Port Info: Advanced kernel mode
    *** Platform: Kinetis
    *** Test Board: Soapbox Racer Rel 0

    --- Test Case 1.1 (Threads, enqueuing test #1)
    --- Result: SUCCESS
    --- Test Case 1.2 (Threads, enqueuing test #2)
    --- Result: SUCCESS
    --- Test Case 1.3 (Threads, priority change)
    --- Result: SUCCESS
    --- Test Case 1.4 (Threads, delays)
    --- Result: SUCCESS
    --- Test Case 2.1 (Semaphores, enqueuing)
    --- Result: SUCCESS
    --- Test Case 2.2 (Semaphores, timeout)
    --- Result: SUCCESS
    --- Test Case 2.3 (Semaphores, atomic signal-wait)
    --- Result: SUCCESS
    --- Test Case 2.4 (Binary Semaphores, functionality)
    --- Result: SUCCESS
    --- Test Case 3.1 (Mutexes, priority enqueuing test)
    --- Result: SUCCESS
    --- Test Case 3.2 (Mutexes, priority return)
    --- Result: SUCCESS
    --- Test Case 3.3 (Mutexes, status)
    --- Result: SUCCESS
    --- Test Case 3.4 (CondVar, signal test)
    --- Result: SUCCESS
    --- Test Case 3.5 (CondVar, broadcast test)
    --- Result: SUCCESS
    --- Test Case 3.6 (CondVar, boost test)
    --- Result: SUCCESS
    --- Test Case 4.1 (Messages, loop)
    --- Result: SUCCESS
    --- Test Case 5.1 (Mailboxes, queuing and timeouts)
    --- Result: SUCCESS
    --- Test Case 6.1 (Events, registration and dispatch)
    --- Result: SUCCESS
    --- Test Case 6.2 (Events, wait and broadcast)
    --- Result: SUCCESS
    --- Test Case 6.3 (Events, timeouts)
    --- Result: SUCCESS
    --- Test Case 7.1 (Heap, allocation and fragmentation test)
    --- Result: SUCCESS
    --- Test Case 8.1 (Memory Pools, queue/dequeue)
    --- Result: SUCCESS
    --- Test Case 9.1 (Dynamic APIs, threads...

    Read more »

  • eDMA, eh?

    K.C. Lee09/19/2014 at 00:19 0 comments

The eDMA (Enhanced Direct Memory Access) controller is probably one of the most complicated peripherals in the K22 series. Since no one has done any DMA driver work for the K2x in ChibiOS, I guess I'll have to tackle that. Bear with me as I try to understand the eDMA controller.

The eDMA controller is implemented as an eDMA engine (state machine + logic), which is time-shared, plus a TCD entry storing the context for each of the channels.

1. At the beginning of each DMA request, the currently active DMA channel's context is fetched from its TCD.

2. The eDMA engine then performs the minor loop transfer.

3. After each minor loop transfer, the current DMA channel context is stored back in the TCD.

    Abstract view of the eDMA Engine:

The heart of a DMA controller can be thought of as a hardware implementation of a set of simple C copy loops.
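A software model of those copy loops, using the TCD field names (SADDR/DADDR, SOFF/DOFF, NBYTES, SMOD/DMOD), might look like the following. This is an illustrative byte-wide reconstruction, not the real engine: the hardware also honours programmable transfer widths and applies the modulus to the raw address bits.

```c
#include <stdint.h>
#include <stddef.h>

/* Software model of one eDMA minor-loop transfer. */
typedef struct {
    uint8_t *saddr;   /* source address                                 */
    uint8_t *daddr;   /* destination address                            */
    int      soff;    /* added to saddr after each read (0 = fixed port) */
    int      doff;    /* added to daddr after each write                */
    uint32_t nbytes;  /* bytes per minor loop                           */
    uint32_t smod;    /* source modulus, power of 2 (0 = off)           */
    uint32_t dmod;    /* destination modulus -> circular buffer         */
} tcd_model_t;

/* Keep the low log2(mod) address bits circulating (base-relative here;
 * the real engine masks the address itself). */
static uint8_t *wrap(uint8_t *base, uint8_t *p, uint32_t mod)
{
    return mod ? base + ((uintptr_t)(p - base) & (mod - 1)) : p;
}

static void edma_minor_loop(tcd_model_t *t, uint8_t *sbase, uint8_t *dbase)
{
    for (uint32_t i = 0; i < t->nbytes; i++) {
        *t->daddr = *t->saddr;
        t->saddr = wrap(sbase, t->saddr + t->soff, t->smod);
        t->daddr = wrap(dbase, t->daddr + t->doff, t->dmod);
    }
}
```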

The data widths of the source and destination are programmable, as is the number of bytes (NBYTES) to be transferred, e.g. if you want to copy from a peripheral that is 8 bits wide into 32-bit SRAM.

Both the source and destination addresses can optionally be incremented (SOFF/DOFF), e.g. for a memory-to-memory copy, or held fixed if you want to read from an I/O port into memory.

The DMA can also optionally perform a modulus operation (SMOD/DMOD) on the source and destination addresses. This is useful when you want to make a circular buffer in memory.

The DMA controller can optionally raise an interrupt at the half-way point or the end of the major loop. Another DMA channel can optionally be triggered at the end of the minor or major loop. This is called chaining, and you can use it for complex data transfers, e.g. dynamic scatter/gather by using one DMA channel to load the TCD of another.

    The DMAMUX is a cross-point switch that routes DMA requests from the peripheral devices to any of the 16 DMA channels, which allows a lot of flexibility in how the DMA channels are used in the firmware.

    While DMA is useful for high bandwidth data transfer (e.g. SPI at 24Mbps), here is a not so obvious application:

    DMA could be used to replace the bit-banging interrupts used for the PS/2 ports.  Instead of triggering an interrupt on each rising edge of the PS/2 Clk, DMA transfers can take a snapshot of the GPIO port into memory on each rising clock edge.  After 11 clock edges, a DMA completion IRQ can be triggered and the buffered GPIO snapshots processed in one go.  This reduces the number of IRQs by a factor of 11, e.g. mouse reports at 100 reports/sec x 3 bytes/report x 11 bits/byte x 1 IRQ/bit = 3,300 IRQ/sec can be reduced to just 300 IRQ/sec.
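    A runnable simulation of the decode step that would run in that single completion IRQ.  The frame layout is standard PS/2 (start bit 0, 8 data bits LSB first, odd parity, stop bit 1); the function and the `data_bit` parameter are my own illustration, not the board firmware.

```c
#include <stdint.h>

/* Decode one PS/2 frame from 11 GPIO snapshots captured (e.g. by DMA)
   on each rising clock edge.  'data_bit' is whichever port pin carries
   the PS/2 Data line; returns the byte, or -1 on a framing/parity error. */
static int ps2_decode(const uint32_t snap[11], int data_bit)
{
    uint8_t byte = 0, ones = 0;

    if (snap[0] >> data_bit & 1u) return -1;          /* start bit must be 0 */
    for (int i = 0; i < 8; i++) {                     /* data bits, LSB first */
        uint32_t b = snap[1 + i] >> data_bit & 1u;
        byte |= (uint8_t)(b << i);
        ones += (uint8_t)b;
    }
    ones += (uint8_t)(snap[9] >> data_bit & 1u);      /* odd parity bit */
    if (!(ones & 1u)) return -1;                      /* parity error */
    if (!(snap[10] >> data_bit & 1u)) return -1;      /* stop bit must be 1 */
    return byte;
}
```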

  • I2C Driver/Debugging

    K.C. Lee 09/05/2014 at 03:44 0 comments

    Starting to work on the I2C driver.  I am not really sure about the status of the ChibiOS 3.0 K20 I2C driver, so right now I am thinking of writing one myself.  There is at least 1 interrupt per byte transferred; 400kHz I2C means a burst of interrupts at 40k/sec (25us intervals).  The CPU load can be reduced by dropping the data rate.

    It'll be non-thread-safe, i.e. handled outside of ChibiOS.  ChibiOS likes to have interrupt hooks (to get tasks rescheduled), and that can be too much overhead.  Alternatively, DMA could be used, but it does not handle any error conditions, so it is only useful for onboard I2C devices that we can assume to be working correctly.

    Note: I measured the time spent inside the interrupt at about 1-1.25us per byte transferred and 2.25us for the final byte.  That time doesn't include the processor core overhead of entering/exiting interrupt routines.  See "A Beginner’s Guide on Interrupt Latency - and Interrupt Latency of the ARM® Cortex®-M processors"

    I2C is a complicated protocol, so some debugging is needed.  The I2C_0 pins can be mapped to the PS/2 keyboard port, where a scope/logic analyzer can be attached to monitor the I2C bus transactions.

    There are a couple of gotchas in programming the I2C driver that the reference manual doesn't mention.

    For read mode, after sending the Command Byte you need to issue a Repeated Start.  To do this, you actually have to keep the TX bit set!  To tell the slave device that you want to read data, you then issue the Control Byte again with the LSB set to '1' for read mode.

    // Restart
    I2C0->C1 = I2Cx_C1_IICEN|I2Cx_C1_IICIE|I2Cx_C1_MST|I2Cx_C1_TX|I2Cx_C1_RSTA;
    // resend device address + read mode
    I2C0->D = i2c_state.I2C_Addr|I2C_READ_MODE;

    Because the internal state machine reads in the I2C data and issues the ACK/NAK, there is a pipeline delay of 1!  On the next interrupt from the I2C hardware following the Control Byte, you need to do a dummy read (to flush the pipeline)!

    For the last byte of data, you have to NAK the slave.  But because of the pipeline, you have to set the NAK while you are servicing the interrupt for the second-to-last byte!


      I2C0->C1 = I2Cx_C1_IICEN|I2Cx_C1_IICIE|I2Cx_C1_MST|I2Cx_C1_TXAK;   // set NAK before the next byte gets sent!
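    Putting the two gotchas together, here is a small simulation of the receive path run against a mock of the registers involved.  The mock struct and the driver state are my own sketch, not the real ISR; only the TXAK bit position follows the Kinetis I2Cx_C1 layout.

```c
#include <stdint.h>

/* Mock of the two Kinetis I2C registers touched by the receive path. */
#define C1_TXAK  (1u << 3)          /* transmit NAK on next received byte */
typedef struct { uint8_t C1; uint8_t D; } i2c_mock_t;

/* Receive-interrupt logic with both gotchas baked in: the first read
   after the repeated start is a dummy read that only flushes the
   pipeline, and TXAK is set while servicing the SECOND-TO-LAST byte
   so the NAK goes out with the last one. */
static void i2c_rx_isr(i2c_mock_t *hw, uint8_t *buf, int *idx, int total)
{
    if (*idx == -1) {               /* first IRQ after the repeated start */
        (void)hw->D;                /* dummy read flushes the pipeline */
        *idx = 0;
        return;
    }
    if (*idx == total - 2)          /* pipeline delay of 1: arm the NAK   */
        hw->C1 |= C1_TXAK;          /* now, so it lands on the last byte  */
    buf[(*idx)++] = hw->D;
}
```

    The key point is that TXAK is written one interrupt early, matching the pipeline delay of 1.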

    The shell program that comes with 2.6.5 is not in the 3.0 tree, probably because of compatibility issues.  Thankfully, the shell I wrote for 8-bit microcontrollers can be used now that the crashing has been fixed by the new port.

    The following test shows I2C sending a command byte 0x55 and then reading 4 bytes of data from my DS1631 at I2C address 0x90 (control byte).

    S = Start, A = Ack, N = NAK (no Ack) for the last byte read from the chip, P = Stop.  I2C uses the LSB of the address for R/W (0 = Write, 1 = Read).

    On my board, the I2C bus can be used to access the GPIO expander and the PLL.  It is also connected to the FPGA.  Both the ARM and the FPGA can be I2C master(s) or slave device(s) with the right firmware/HDL.


    The Gameport/GPIO is connected to the XRA1201 16-bit GPIO expander, a very nice $1.50 part with 5V-tolerant inputs.  All of the pins can be individually programmed as input/output, with built-in (weak) pull-ups and interrupts on rising/falling/both edges.  The /IRQ is wired to the ARM so that it can keep track of the GPIO status via interrupts (instead of polling).

    It is possible to support quadrature-encoded mice, character LCDs, keypads etc.  3.3V and 5V (protected by PTC fuses) are available on the connectors.

    The I2C address of the XRA1201 is 0x40 as A[2:0] are strapped to GND.

    Registers GSR1 & GSR2 can be used to read the GPIO pins in software.  I shorted the individual pins to ground and read back these registers.  The GPIO now passes the live test.
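    The read-back itself is just stitching the two byte-wide registers into one 16-bit port value.  A small sketch, assuming GSR1 covers pins P0-P7 and GSR2 covers P8-P15; the helper names are mine, and the raw bytes would come from an I2C read of the two registers.

```c
#include <stdint.h>

/* Combine the two XRA1201 GPIO State registers into one 16-bit value
   (assumed split: GSR1 = P0-P7, GSR2 = P8-P15). */
static uint16_t xra1201_pins(uint8_t gsr1, uint8_t gsr2)
{
    return (uint16_t)gsr1 | ((uint16_t)gsr2 << 8);
}

/* Live test: a pin shorted to ground should read back as 0. */
static int xra1201_pin_low(uint16_t pins, unsigned pin)
{
    return !(pins >> pin & 1u);
}
```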

    (I had previously checked the continuity between the chip and the connector by using the diode test on my multimeter against the ESD diodes on the board, without power applied.)



  • 1

    I have optimized the PCB for those $20-ish Chinese proto PCB deals with 8/8 mil spacing and 12 mil vias.  This board also has components mounted only on the primary side, for low-cost assembly.

    This is an advanced DIY project: it requires a custom PCB and a lot of fine-pitch (0.5mm) soldering/reflow, so it is not for beginners.  There are also a lot of 0402 decoupling caps, 0402 terminating resistors and QFN packages.

    See the project log for the build:

    tl;dr (aka executive summary)

    The most common soldering issue is too much solder.  Too much solder leads to solder bridges, which can get very messy on fine-pitched parts.

    I am trying to eliminate that with my experimental soldering method, as this prototype has a limited budget that doesn't cover stencils or solder paste.  I did some previous experiments here:

    The experiment was inspired by this video, from which I learned how to reflow QFNs.

    My experiment supported the idea that the amount of solder already on the PCB pads is sufficient to form a solder joint.

    Instead of aligning the package and tacking down a couple of pins, I decided to solder all of the pins individually.

    1. Coat the PCB with a thin layer of solder, making sure there are no solder bridges.  The pads acquire a thin coating of solder, which happens to be about the right thickness.

    2. Clean off the rosin-based flux with acetone and let the board dry.

    3. Place the parts (fine-pitched first, then the smaller passives, one at a time).

    4. Line up the package and add flux paste to the pads.  This is not an easy task, as the pads are convex because of the solder.

    5. Solder by reheating each pad and lead one at a time.  My iron is set to 550°F (290°C).  This is not drag soldering, so I do not add solder to the joint.  I make sure that the soldering iron tip has a solder coating to protect it.

    For passives, I make sure that both leads are wetted so that the part won't tombstone during reflow.

    For leaded chips, gently push the pins toward the package.  This keeps a gentle pressure on the pins while the rest of the package is lifted up by the thickness of the solder.

    For QFN (leadless) parts, I used my Weller butane tool with the heat gun attachment.  See the usual YouTube videos for instructions.  It works well on the 2 QFN chips and the tiny leadless crystals.

    Since no solder is added, I have not encountered any solder bridges even for fine pitched (0.5mm) parts!

    6. Do not solder the through-hole parts yet, as they may or may not be designed for reflow temperatures.

    Reflow the board in the toaster oven.  This helps the tiny passives self-align and solders the thermal pads under the chips.  It also lets the chips reseat/relax a bit, as I bend their pins slightly when I solder.

    I use my thermocouple to measure the board temperature.  I poke it into the ground pin of a through-hole connector, which lets it make good thermal contact with the ground plane.

    7. Clean the board.

    8. Solder in the through-hole parts.

    9. Clean the board.


    I don't have a camcorder, so there is no video of the soldering.  My regular camera requires holding down the button to record and only manages up to 90 seconds, which is not the kind of thing I want to deal with while soldering.





Joao Ribeiro wrote 06/27/2016 at 16:12 point

Do you think that your design would lend itself to running on a Cyclone IV with 64Mb RAM and a few other toys, but more importantly, with 4 Cyclone 2s for additional horsepower, and even add-on bus expansion connectors along the conceptual lines of the Zorro III "family" and, for obvious reasons, PCIe?


K.C. Lee wrote 06/27/2016 at 17:11 point

Probably, but I would start with DDR memory.  The Cyclone 3 was the largest non-BGA part at the time with sufficient I/O.  This board was my first attempt at ordering a PCB and my first time doing that many fine-pitched SMT parts.
The BLVDS was intended for bus expansion, so this board would become the I/O board.  Terasic boards with more memory/larger FPGAs came out shortly after this.  With additional latches, PATA could also be hijacked for a Zorro II type of bus expansion.

You would need a BGA part for PCIe, as most of the leaded parts do not come with SERDES.


K.C. Lee wrote 06/20/2014 at 22:42 point
There should be plenty of leftover PCBs for the before picture, as I am only planning on buying parts for 2.  I will probably also take some pictures of the stages of reflow assembly if/when I get to that stage.  I am going to plate/coat a board with regular solder and use no-clean flux instead of the much more expensive solder paste.  I think I can have much better control over the amount of solder this way (rather than randomly applying paste with a toothpick).  Fingers crossed.


Adam Fabio wrote 06/09/2014 at 03:44 point
Thanks for entering The Hackaday Prize! Don't forget to upload some pics of the actual board when you get it (I'd love to see before and after you build them up)

