SD card interface with high-speed storage (>50MB/s) on a host computer via gigabit ethernet
Pleased to discover there are ready to use, non-emulation SD card mux devices available!
These are great for 99% of the use cases folks message me about.
SD Wire
https://www.tindie.com/products/badgerdnl/sdwire-usb-c-sd-card-reader-sd-mux/ ~ $95
https://3mdeb.com/shop/open-source-hardware/sdwire/ ~ 89 €
USB-SD-Mux from Linux Automation GmbH ~ 100 €
https://shop.linux-automation.com/usb_sd_mux-D02-R01-V02-C00-en
Although SD emulation, as used in this project, provides benefits such as speed and durability, the related cost and complexity usually outweigh these advantages. It only makes sense to use SD emulation if the cost of not having these features is higher.
For ready-to-use applications check out the SD Card Wifi Adapter.
Caveat: it uses SPI mode (low speed), so it's slow and sometimes glitchy.
One goal of this project was to be accessible to lots of people. The total BOM cost using off the shelf components is < $100 (USD):
* (micro)SD card extender
* FPGA [ know of an open source FPGA that works here? Please share! ]
(1) Connect your (micro)SD card extender to your FPGA pins
(1) b. Map out your pins
(2) Load up the open source SD card emulator onto your FPGA
(3) A driver on the host machine that can do 3 things:
Read
Write
lock / unlock the device
open source release coming soon
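Until that release is available, the three driver operations listed above can be illustrated with a minimal Python sketch. This is not the project's driver: the class name `BackingStore`, the in-memory storage, and the reading of "lock" as "reject writes while locked" are all assumptions made for illustration.

```python
import threading

SECTOR_SIZE = 512  # bytes per SD sector


class BackingStore:
    """Hypothetical host-side store for an emulated SD card (illustration only)."""

    def __init__(self, num_sectors):
        self._data = bytearray(num_sectors * SECTOR_SIZE)
        self._locked = False              # assumption: locked means read-only
        self._mutex = threading.Lock()    # serialize access from several threads

    def lock(self):
        with self._mutex:
            self._locked = True

    def unlock(self):
        with self._mutex:
            self._locked = False

    def read(self, sector, count=1):
        start = sector * SECTOR_SIZE
        end = start + count * SECTOR_SIZE
        with self._mutex:
            if sector < 0 or end > len(self._data):
                raise ValueError("read outside of device")
            return bytes(self._data[start:end])

    def write(self, sector, payload):
        if len(payload) % SECTOR_SIZE:
            raise ValueError("payload must be a whole number of sectors")
        start = sector * SECTOR_SIZE
        end = start + len(payload)
        with self._mutex:
            if self._locked:
                raise PermissionError("device is locked")
            if sector < 0 or end > len(self._data):
                raise ValueError("write outside of device")
            self._data[start:end] = payload
```

A real driver would serve these calls over the link to the FPGA instead of a local bytearray; the sketch only shows the shape of the interface.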
Fun fact: Consultants quoted this project at between $25k and $250k, taking anywhere from 3 to 18 months. No one could guarantee it would work.
Here is open-source code if you're interested in doing this yourself: https://github.com/enjoy-digital/litesdcard/blob/master/examples/arty.py#L109 (Thanks to Florent @ Enjoy-Digital + Ramtin and PO @ Lamda Digital for making this available)
Should work on any capable FPGA, Altera and Xilinx are what we have here.
Detailed update to follow.
Are there any more details on this? The log is a bit sparse!
Where are the files and PCB? Can this be duplicated?
This is a really neat idea. You say it's completed, but I don't really see anything in the comments to pick up on.
I think stuff like the Phison SD-card controller chips will be unsuitable because the CPU is there just to do setup stuff (at quite low speed) but for the main job of sector read/write it'll be a pure hardware datapath (including doing the ECC etc), the CPU will be pretty much just waiting for the hardware to finish.
The FX3 is a good choice; if you were ok with max 40MB/sec you could use an FX2 (which are dirt cheap) - (I achieved 40MB/sec writes from a linux laptop on an FX2 but it'll be a little slower on other platforms). The FX3 eval board is I think only about $40, I have one in a drawer somewhere. The FX series are specifically designed to be master or slave FIFO data pumps to an 8/16/32-bit bus; usually you interface them to an ASIC or FPGA. They're quite flexible (they have programmable hardware for the FIFO interface) but I don't think they'll do 4-bit conversion and the max bus rate is 100 MHz (FX3). Actual throughput is generally limited by the host PC but I got well over 100MB/sec out of my FX3; it's significantly faster than Gig-E and much less complicated to get working. Doing the 'last step' of interfacing an FX3 to an SD interface you'd use a really simple CPLD/FPGA - it's a pretty trivial task, you're just converting bus widths, so a $5-$10 chip would probably be ample, and you could probably do the CRC in there while you're at it. I've not really looked at it but I'd probably use the FX3 in 16-bit FIFO mode; you could go 32 bit but it's more wiring and the performance (100 MHz x 2 bytes) is already likely faster than any device hosting the SD card can go. This could be a small, cheap interface PCB (connects to FX3 eval board); you could even use a thin PCB and have it be SD-card shaped. Doing your own FX3 PCB is non-trivial as it's a BGA and USB3 is very high speed.
That Atmel you found - I didn't find any reasonably priced eval boards ($500 and up), it's a BGA so rolling your own board is non-trivial, it looked like it required external DRAM, etc, etc. To me it seems to be not a great fit for your goals.
How much money are you prepared to put into prototyping this? How many do you expect to make/sell? If it was me I'd be looking at using an off-the-shelf board (either BeagleBone or FX3 eval board) for the hard stuff and make a simple/cheap interface PCB to do the SD interface bit.
The FX3+CPLD route will give you the highest performance and is relatively simple to implement. The Enhanced Beaglebone may work with no extra CPLD/FPGA required (you may be able to hook it directly to an SD card connector) and I expect you can get very good performance out of it (>30MB/sec).
You seem very interested in performance but at some point you're going to hit the limits of what your SD host device will do. Given that even a $5 FX2 board can push >30MB/sec connected to a very cheap CPLD/FPGA, that would be your lowest cost solution (e.g $20-$30 BOM). The FTDI FT232H is quite similar to an FX2 for your purposes (although the FX2 is probably preferable as it has a cpu core - albeit a crappy one - and 16-bit output bus).
I suspect there aren't that many host devices that will read an SD card at 50MB/sec. What sort of hosts are you wanting to connect to - I assume linux SBCs that boot off SD?
So I guess I'd say:
Simplest (possibly no extra hw required at all): Beaglebone Black/Enhanced
Cheapest (USB2): FX2/FTDI (cheap ebay board) + CPLD
Fastest (USB3): FX3 (CYUSB3KIT-003 board, $45) + CPLD
Personally if I was going to make one I'd do the FX3 option; the CPU in the FX3 is considerably nicer than the one in the FX2 and it'll go insanely fast; people have got over 300MB/sec out of them. If you intend to roll your own board with everything on it, I'd say consider the FX2/FTDI option because the PCB will be significantly easier to get working due to lower speeds (and you can avoid using BGAs).
BGAs aren't too bad.. I've baked a few PCBs and graphics cards.
Maybe I'm missing something basic but can you explain what you mean by: "for the main job of sector read/write it'll be a pure hardware datapath (including doing the ECC etc), the CPU will be pretty much just waiting for the hardware to finish."
What is a pure hardware datapath? And what hardware is the CPU waiting on to finish?
The SD spec states 3.3v signaling at 400KHz for initialization and then switching to 1.8v for high speed data transfer at: SDR104/104MB/s (208MHz), SDR50/50MB/s (100MHz), SDR25/25MB/s (50MHz), DDR50/50MB/s (50MHz), and lower speeds as well. Then commands for read/write, sector address and length are issued and then something has to interpret that and respond with the data.
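As a quick sanity check on those figures, the raw bus ceiling for each mode follows from clock rate times data lines, doubled for DDR. The sketch below assumes the 4-bit data bus and ignores command, CRC, and busy overhead, so real throughput will be lower; the function name is made up for this example.

```python
def bus_throughput_mb_s(clock_mhz, data_lines=4, ddr=False):
    """Raw data-bus ceiling in MB/s: bits moved per clock, divided by 8."""
    bits_per_clock = data_lines * (2 if ddr else 1)
    return clock_mhz * bits_per_clock / 8


# (clock in MHz, double data rate?) for the modes quoted above
MODES = {
    "SDR104": (208, False),
    "SDR50": (100, False),
    "SDR25": (50, False),
    "DDR50": (50, True),
}

for name, (clock_mhz, ddr) in MODES.items():
    print(f"{name}: {bus_throughput_mb_s(clock_mhz, ddr=ddr):.0f} MB/s")
```

DDR50 reaches the same 50 MB/s as SDR50 at half the clock because it transfers data on both clock edges.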
Data doesn't magically go from one place to another (this isn't a quantum computer) so a controller must be driving/controlling data bits (via some signal encoding) back and forth from the SD interface(commands like read/write, address and length) and the storage (flash). If this isn't how it works and this isn't the PHISON controller's doing, then how does the data get transferred?
Regarding the CRC checking, does a hardware data path mean a circuit that adds the CRC bits to the data as it passes through? How susceptible is signaling over the interface to errors? I know NAND is fairly unreliable so it makes sense to have it on there. Ethernet over short runs is fairly reliable, and from reading the spec it seems USB is too (<5m cable), but it still includes a CRC in the data packet.
The FX2 boards were $5 so I picked up a few, it'll be cool to learn how USB works in practice. I'm investigating some CPLDs and FPGAs, though CPLDs probably won't work because of the shift in signal level after initialization.
By "pure hardware datapath" I mean the CPU in a commercial SD card (very likely) won't be handling the sector data directly when doing bulk reads or writes (it's too fast for a CPU to reasonably do it), the CPU would just do the register setup at the start of each sector for a hardware engine that will read from flash (possibly several flash dies in parallel), do the ECC, calculate CRCs, and output to SD, and signal when it's done - the point being that you (likely) wouldn't be able to reprogram these to get the data from a source that wasn't a flash chip. I obviously don't know for sure, but if you're doing many tens of MB a second you'd have to have an extremely fast general purpose CPU to do all this; a non-programmable hardware pipeline seems much more likely here. Obviously the CPU would probably be capable of accessing both SD and flash interfaces, but more for housekeeping/setup tasks than the bulk data transfer.
Secondarily of course there's the question of how (if reprogramming an existing SD card controller) you'd get the data in/out of it to your back-end emulation store.
I imagine SD card manufacturers would not expect to get signal integrity errors anywhere except from the flash itself (especially in the case of MLC flash - a lot of errors) and perhaps the SD-to-host interface. I expect in a properly electrically designed configuration running within spec you would not get a significant number of CRC errors between SD card and host (it's typically a very short path physically of course). With USB and ethernet (well any modern external interface running over any moderate length of wire) a CRC is included because 'why not' - it's basically free in hardware (very simple logic) and of course with those interfaces you'll always get some errors from EMI and other sources. SD has some advantages (shorter wires plus it's fully synchronous so you don't get potential clock recovery issues).
The FX2 is a very handy board - you usually have a small EEPROM on there so you can set the USB VID/PID, and then you have a driver for that VID/PID which downloads some FX2 firmware (it uses an 8051 MCU, runs out of internal RAM, you can use the SDCC compiler) which further configures things (often this will set a different USB PID and re-enumerate), including setting up the GPIF interface. There's a handy linux package called "fx2pipe" which sets it up to be a simple data pipe; I used this a few years ago and you can get ~40MB/sec writes (slightly slower reads). From there you'd hook it up to a fairly simple CPLD (probably 16-bit bus width to the FX2, plus a few control lines) and use that to do (at least) the data bus width conversion to/from 4-bit, and possibly the CRCs too. Trying to calculate the CRC with the FX2's CPU isn't going to fly; usually with these chips you just configure the data path (USB endpoint <-> GPIF interface) and let it do its thing fully automatically (minimal housekeeping by CPU).
One thing that will impact performance with FX2/FX3 is the latency when the SD host requests a sector; because USB is polled by the host PC there are some unavoidable delays in the process, but they won't be hugely significant especially if you can do something like optimistically assume that the next sector requested will be the one after the one just read (i.e. pipeline it) and have some sort of abort/refetch in the FX2 if this isn't the case.
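To make the pipelining idea concrete, here is a small Python sketch of "assume the next sector, refetch on a miss". It is only a model of the logic: the class name and the `fetch` callable are hypothetical, and the speculative fetch runs synchronously here, whereas in real hardware it would overlap with the SD transfer.

```python
class ReadAhead:
    """Serve sector reads, speculatively fetching the following sector."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: sector number -> bytes
        self._next = None        # sector number currently held in the buffer
        self._data = None        # contents of that sector
        self.hits = 0
        self.misses = 0

    def read(self, sector):
        if sector == self._next:
            # Guess was right: the data is already here, no round trip needed.
            data = self._data
            self.hits += 1
        else:
            # Guess was wrong: drop the speculative sector and refetch.
            data = self._fetch(sector)
            self.misses += 1
        # Optimistically fetch the sector after the one just served.
        self._next = sector + 1
        self._data = self._fetch(sector + 1)
        return data


def fake_fetch(sector):
    """Stand-in for the USB round trip: a sector filled with its own number."""
    return bytes([sector & 0xFF]) * 512


reader = ReadAhead(fake_fetch)
for sector in (10, 11, 12, 40):
    reader.read(sector)
print(reader.hits, reader.misses)  # 2 hits (11, 12) and 2 misses (10, 40)
```

Sequential workloads such as booting from an image should mostly hit; random access pays the full round trip plus one wasted fetch.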
You really should check to see if the host devices you're talking about actually support UHS speeds/voltage signalling; for example the RPi 1-3 doesn't, Allwinner A20 boards don't, etc. Many SBCs actually max out at 25MB/sec (50Mhz 3v3). If you only intend to support this it'll be significantly easier to get working (walk before you can run).
Finally I see people have been saying to you (e.g. on EEVBlog) "Use UBoot and load off ethernet!" and they do have a point; I think there's a reason that there's not many SD card emulators about, because it's a very limited use case when network booting works well.
After more research in the SD spec and these two links, http://www.eetimes.com/author.asp?doc_id=1283050 and http://www.microchip.com/forums/m497059.aspx, it looks like the CRC7 can either be a 256-entry table lookup or be calculated on the fly by feeding the data through a shift register with XOR feedback.
BeagleBoneBlack ($62) looks interesting, I didn't know about the two PRUs. My concern is the PRUs are limited to 200MHz, which might require dropping down to SDR50 (50 MB/s at 100 MHz) instead of SDR104 (104 MB/s at 208 MHz).
Also looking into FPGAs, which can emulate SDR104 but at significantly higher complexity and BOM cost. These look suitable:
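Both CRC7 approaches mentioned above can be sketched in a few lines of Python. This is an illustration, not code taken from the linked articles; the function names are invented here. It assumes the SD command CRC7 polynomial x^7 + x^3 + 1 with a zero initial value, and checks itself against the well-known CMD0 frame, whose last byte is 0x95.

```python
CRC7_POLY = 0x09  # x^7 + x^3 + 1, with the x^7 term implicit


def crc7_bitwise(data):
    """Shift-register version: one input bit per step, MSB first."""
    crc = 0
    for byte in data:
        for shift in range(7, -1, -1):
            feedback = ((crc >> 6) & 1) ^ ((byte >> shift) & 1)
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= CRC7_POLY
    return crc


# 256-entry table: CRC7 of every possible single byte, starting from zero.
CRC7_TABLE = [crc7_bitwise(bytes([value])) for value in range(256)]


def crc7_table(data):
    """Table version: one lookup per input byte."""
    crc = 0
    for byte in data:
        crc = CRC7_TABLE[((crc << 1) ^ byte) & 0xFF]
    return crc


def command_frame(index, argument):
    """Build a 6-byte SD command: start bits + index, argument, CRC7 + stop bit."""
    body = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7_table(body) << 1) | 1])


print(command_frame(0, 0).hex())  # CMD0 (GO_IDLE_STATE): 400000000095
```

The table costs 256 bytes of storage and removes the inner bit loop, which matters on a small MCU; in an FPGA or CPLD the shift-register form is the natural one.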
http://www.myirtech.com/list.asp?id=502 - $99
https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=163&No=941&PartNo=2 - $99
Buffering shouldn't be too RAM intensive: the SD host waits for data as long as the slave (SD card) signals busy/wait. Gigabit ethernet has <1ms latency and 112+ MB/s of bandwidth (tested linux to linux with iperf), so with SD reads/writes at 50-104 MB/s against ethernet at >112 MB/s, the network side can stay ahead of the SD side and reads can be buffered ahead of the host. This can be an optimized ring buffer holding 1s of data, which is probably more than necessary.
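A rough sizing of that buffer, as a Python sketch: the buffer has to cover the SD-side data rate for however long the network might stall. The numbers below are the ones quoted above (104 MB/s, 1 ms latency, 1 s ring buffer); the function names are made up, and MB is taken as 10^6 bytes.

```python
SECTOR_SIZE = 512  # bytes per SD sector


def buffer_bytes(sd_rate_mb_s, window_us):
    """Bytes needed to cover `window_us` microseconds at `sd_rate_mb_s` MB/s.

    MB/s times microseconds gives bytes directly (10^6 * 10^-6 = 1).
    """
    return sd_rate_mb_s * window_us


def buffer_sectors(sd_rate_mb_s, window_us):
    """Same figure rounded up to whole sectors."""
    return -(-buffer_bytes(sd_rate_mb_s, window_us) // SECTOR_SIZE)


# One network round trip (about 1 ms) at the SDR104 rate:
print(buffer_bytes(104, 1_000), buffer_sectors(104, 1_000))          # 104000 204
# The full 1 s ring buffer at the same rate:
print(buffer_bytes(104, 1_000_000), buffer_sectors(104, 1_000_000))  # 104000000 203125
```

So covering a single round trip takes on the order of 100 KB, while a full second takes about 104 MB, which supports the guess that 1 s is more than necessary.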
Of course, this is all theoretical until prototyping happens :D
I can't think of anything you can really use _except_ an FPGA if you want the kind of speeds you're talking about; obviously that significantly increases the cost/complexity of your project - depends what your goals + limitations are. Just possibly you could figure out a hack using SDRAM sticks (see the hacks on here using SDRAM sticks directly as a video generator or logic analyzer). The BB Black is attractive because it'd be pretty much "just" software - not trivial (optimizing the PRU programming would be quite fun) but minimal extra hardware and low cost. There is an enhanced BBB with Gig-E and 1GB ram that may be a better choice if you want to emulate large SD cards (e.g. full linux images). see https://beagleboard.org/enhanced
An FPGA board is the most obvious choice if you insist on the highest speed but you'd want gig-E on there which isn't trivial to do nor cheap if you buy off the shelf.
If it was me I'd try the enhanced BBB to see if the PRUs are up to it (note there is some latency accessing the main DRAM via DMA), and if it couldn't meet my needs I'd go FPGA. It's not a trivial project given what you're asking.
Oh... mind you - there is one other choice although the sector-read latency might be a little high and impact performance - Have a look at the Cypress FX3 (USB3); you can get an evaluation board from Cypress cheaply. You may be able to use that, possibly you might need a (small, simple) FPGA/CPLD just to format/serialize the data correctly but the bulk of the work would be the FX3 - it's an easier to use data pump than Gig-E.
I originally planned on using a USB3 interface but I couldn't find any reasonable FPGA boards with USB3 and the FX3 was in pre-production at the time. The example application diagram in the Cypress FX3 use cases looks very familiar http://www.cypress.com/file/136056/download :) Coming from software I have a better understanding of ethernet than USB.. what are the advantages vs ethernet? I would guess guaranteed bandwidth (unless on a hub) and lower latency?
I've been flip-flopping between microcontroller and FPGA. From bunnie's blog https://www.bunniestudios.com/blog/?page_id=1022, https://www.bunniestudios.com/blog/?p=3554, https://www.bunniestudios.com/blog/?p=2297, http://bunniefoo.com/bunnie/sdcard-30c3-pub.pdf the SD-to-flash controller is an 8051 or ARM7 CPU. A modern UHS-I card uses a Phison UHS-I/PS8035 or similar http://www.phison.com/English/newProductView.asp?SortID=59&ID=233, http://goughlui.com/2015/05/23/unintentional-teardown-repair-kingston-128gb-uhs-i-sdxc-card/. "PHISON's PS8035 SD-to-Flash micro-controller specially designed for SD Card and embedded NAND applications." so this is an ASIC = more likely emulated via FPGA vs microcontroller. However, this is handling complex wear leveling and accessing multiple NAND banks, whereas this use case is packaging the 4 data signals into an ethernet frame (~9000 bytes if jumbo), making me wonder if a microcontroller is feasible.
Someone pointed me towards the Atmel SAMA5, which is a 500+MHz microcontroller with integrated Gigabit ethernet (http://www.atmel.com/products/microcontrollers/arm/sama5.aspx). If it's possible to use this, the reduced development time and BOM cost make it very attractive.
Anything >50MB/s is already besting any SD card on the market and more importantly, this method eliminates manually swapping the card which is the biggest win.
I looked at doing this a while ago; obviously you need to emulate 4-bit mode to get any kind of performance. Some issues with doing it all in software in an mcu include the requirement for all packets to have a CRC field which is calculated in a slightly odd way on each bit (see SD specs), perhaps this could be precalculated and stored with the data if the SD is read-only. Clearly you also need a ton of RAM for the emulation to be useful - hence a microcontroller is probably out. I'd suggest looking at a Beaglebone Black, which has plenty of ram plus - specifically - two PRUs which may do the trick.
Pre-calculating the CRC is an interesting idea, but it only works if it's known beforehand what the target system will read. If that's the case, and you have a network-connected SD emulator, RAM is only needed as a buffer.
IIRC the crc will be constant for any given sector read. Anyway, I think beaglebone black is the first thing I'd try, seems by far the most suitable hw for the job.
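Since the CRC only depends on the sector contents, it can be computed once and stored next to the data. The Python sketch below shows one way to do that for 4-bit mode, where (as I understand the SD spec) each DAT line carries its own CRC16 with polynomial x^16 + x^12 + x^5 + 1 and each byte goes out high nibble first with DAT3 as the top bit. Treat that bit mapping as an assumption to verify against the spec; the function names and the dict cache are invented for this example.

```python
CRC16_POLY = 0x1021  # x^16 + x^12 + x^5 + 1, with the x^16 term implicit


def crc16_update(crc, bit):
    """Advance a CRC16 shift register by one input bit."""
    feedback = ((crc >> 15) & 1) ^ bit
    crc = (crc << 1) & 0xFFFF
    return crc ^ CRC16_POLY if feedback else crc


def crc16(data):
    """CRC16 over a byte string, MSB first, zero initial value (1-bit mode)."""
    crc = 0
    for byte in data:
        for shift in range(7, -1, -1):
            crc = crc16_update(crc, (byte >> shift) & 1)
    return crc


def dat_line_crcs(block):
    """Return [crc_dat0, crc_dat1, crc_dat2, crc_dat3] for one data block.

    Assumed mapping: each byte is sent as two nibbles, high nibble first,
    and bit n of the nibble travels on DAT line n.
    """
    crcs = [0, 0, 0, 0]
    for byte in block:
        for nibble in (byte >> 4, byte & 0x0F):
            for line in range(4):
                crcs[line] = crc16_update(crcs[line], (nibble >> line) & 1)
    return crcs


_crc_cache = {}  # hypothetical cache: sector number -> four per-line CRCs


def cached_crcs(sector, block):
    """Compute the CRCs on first use and reuse them on later reads."""
    if sector not in _crc_cache:
        _crc_cache[sector] = dat_line_crcs(block)
    return _crc_cache[sector]


def invalidate(sector):
    """Call this whenever the sector is written."""
    _crc_cache.pop(sector, None)
```

For a read-only image the cache never needs invalidating, which is the case where precalculation is most attractive.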
FYI:
"Here's the answer to CRC-7 algorithm for the SD Card"
http://www.eetimes.com/author.asp?doc_id=1283050
There is a downloadable link with C source:
http://m.eet.com/media/1077846/crc7expl.zip
Here is some background material:
"SD Media Format Expands the MAXQ2000's Space for Nonvolatile Data Storage"
https://www.maximintegrated.com/en/app-notes/index.mvp/id/3969
Hi, I am very interested in this project.
Do you have schematics?