SD card secure RAID USB storage

This project is for data what the Sankara Stones in "Indiana Jones and the Temple of Doom" were. It is a USB SD card reader, but it requires two cards. The data is striped in the style of RAID 0, but the data is also encrypted with a key that is stored in a key storage block on each card. In essence, each card is useless without the other. With possession of both cards, the data is available without restriction, but with only one, the remaining data is completely opaque.

This allows you to securely transport a data set by writing it onto a pair of cards and separately transporting them to a destination for recombination.

The intent is that only the pairing of two cards becomes in any way special. A card pair could be inserted in any Orthrus device and the data would be made available. But with only one card, all you get is half of the data encrypted with a key which you only half-possess.

The basis for Orthrus is an ATXMega32A4U. This is an AVR with a built-in full speed USB interface and hardware AES support. It also has 32K of program memory and 4K of RAM. It has other peripherals, but of chief interest for us is that it has hardware support for SPI and more than a few GPIO pins. At first glance, it would seem to be ridiculous overkill for what we want to achieve, but it's our choice because it's the minimum available chip that includes the AES accelerator, and being able to do AES at north of 1 MB/sec is worth it. Hardware AES not only runs more than 100 times faster than my software implementation, it can also proceed in the background, leaving the CPU free for actual I/O.

To review, SPI works by sharing 3 lines among all of the peripherals - MOSI, MISO and SCK. In addition to that, each peripheral on the bus has a unique chip select line (usually active low, so !CS). For each cycle of the SCK line, one bit is shifted out from the master over MOSI to the slave, and at the same time a bit from the slave is shifted out to the master over MISO. There are four choices for configuration of the polarity and phase of the clock signal relative to the setup and sampling of the two data lines, but the SPI system in the controller will generally shift a byte at a time. Since the AVR SPI system is only single-buffered, there will be inter-byte gaps as the data from the peripheral is read and/or the next byte of data to be written is set up.
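The simultaneous shift in both directions is easier to see in a small simulation. This is only a sketch of the shift-register behavior, not firmware; the function name `spi_exchange` is made up for illustration, and it assumes MSB-first transfers.

```python
def spi_exchange(master_byte, slave_byte):
    """Simulate one 8-clock SPI transfer, MSB first.

    On every clock one bit travels master -> slave on MOSI and one bit
    travels slave -> master on MISO, so after eight clocks the two
    shift registers have swapped contents.
    """
    master_reg = master_byte & 0xFF
    slave_reg = slave_byte & 0xFF
    for _ in range(8):
        mosi = (master_reg >> 7) & 1  # bit the master drives this clock
        miso = (slave_reg >> 7) & 1   # bit the slave drives this clock
        master_reg = ((master_reg << 1) & 0xFF) | miso
        slave_reg = ((slave_reg << 1) & 0xFF) | mosi
    return master_reg, slave_reg


# The master sends 0xA5 while the slave answers with 0x3C.
received_by_master, received_by_slave = spi_exchange(0xA5, 0x3C)
```

Every byte written is also a byte read, which is why a "read" from a card still means clocking out dummy bytes.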

The controller requires 3.3 volt power. Since that's also what the cards want, there's no need for level shifting and the entire system can run from a single supply. Two SD cards and a rather beefy controller are probably pushing things for an LDO, but a buck converter can be used with almost no extra board space.

It is a good idea to provide a mechanism for the controller to turn power to the cards on and off. This way, power can be applied only once the two cards have been inserted. If one card is removed, power to both can be dropped, ensuring that both cards will cold-start once the second is installed. We can use a P channel MOSFET as a power switch, and an AP2331 current limiting switch will ensure that any inrush from the cards won't impact the supply rail for the controller.

Since most of the pins we use are only general purpose and we barely use half of the available pins, we have the luxury of picking pins mostly for convenience. Since the USB pins are only in one spot, we place the chip so that those are close to the USB connector. The remaining available SPI pinning is convenient for the SD cards, fortunately, and the rest of the connections can be selected to be near the peripherals in question. That puts the card related lines on port C near the SPI, the LED, switch and card power on port A, and the random number generator on port D. We'll use the USART on port E as a diagnostic I/O for development. It will be put on the two unused pins of the 2x3 PDI programming interface.

All that would be enough to simply provide a RAID 0 double SD card reader, but for the cryptography we need to be able to generate keys. This means coming up with random numbers, and when they have to be cryptographic quality, that's always non-trivial. I've decided to add a hardware RNG. This is done with a transistor in an avalanche configuration as a noise generator. The noise is fed into a second transistor to amplify it, followed by a pair of AC coupled self-biased inverters to amplify it further, before it's fed into one of the controller's GPIO pins. The inconvenient part of this circuit is that it requires a (relatively) high voltage supply. An AP3012 boost converter is used to make around 20 volts.

The first block of each card is reserved as a key storage block. The size of the volume reported to the USB host is determined by taking the smaller of the two card sizes, subtracting one and doubling that. The first block on each card contains...
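As a rough sketch of the sizing rule above, together with one possible striping layout: the size calculation follows the description directly, but the stripe granularity is an assumption (single-block stripes, even logical blocks on one card and odd on the other), and the names `volume_blocks` and `locate` are hypothetical.

```python
KEY_BLOCKS = 1  # block 0 of each card is the key storage block


def volume_blocks(card_a_blocks, card_b_blocks):
    """Size reported to the host: (smaller card - key block) * 2."""
    return (min(card_a_blocks, card_b_blocks) - KEY_BLOCKS) * 2


def locate(logical_block):
    """Map a host block to (card index, block on that card).

    Assumed layout, not confirmed by the description: stripes are one
    block wide and alternate between the two cards.
    """
    card = logical_block % 2
    physical = logical_block // 2 + KEY_BLOCKS  # skip the key block
    return card, physical


size = volume_blocks(1000, 1200)  # two mismatched cards
last = locate(size - 1)           # where the final host block lands
```

With this layout the last host block lands on the last block of the smaller card, so the extra space on the larger card simply goes unused.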


Orthrus v2.0.pdf

Schematic of the 2nd prototype

Adobe Portable Document Format - 59.07 kB - 04/19/2017 at 22:00



AES implementation for AVR - ECB Counter mode and CMAC.

- 50.95 kB - 04/03/2017 at 04:25



  • Firmware rewrite

    Nick Sayer, a day ago

    I rewrote the firmware today for the 32A4U. Of course, I did this in a complete vacuum, as the hardware isn't here yet.

    The firmware is much smaller, as would be expected given the removal of all the code to do AES in software. It takes around 11K of flash and uses around 1.5K of global RAM.

    I originally chose the 32A4U because it was actually cheaper than the 16A4U, but I'm glad I did - not for the additional flash space, but for the extra RAM.

    The AES system is set up to run almost entirely in the background with DMA both loading in the nonce data and reading out the pre-ciphertext into a buffer for XORing during I/O. DMA completion interrupts ping-pong two DMA channels to make this happen, incrementing the counter within for each block.
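For anyone unfamiliar with counter mode, here is a minimal sketch of the data flow: a counter block is encrypted to produce a pad (the "pre-ciphertext"), and the pad is XORed with the data. Python's standard library has no AES, so `stand_in_block` below is a SHA-256 based placeholder and is NOT AES; it is only there so the structure runs. The 12-byte nonce plus 4-byte counter split and all function names are assumptions for illustration.

```python
import hashlib

BLOCK = 16  # AES block length in bytes


def stand_in_block(key, counter_block):
    # NOT AES: a SHA-256 based placeholder so the sketch runs with only
    # the standard library. Real firmware would use the AES engine here.
    return hashlib.sha256(key + counter_block).digest()[:BLOCK]


def ctr_xor(key, nonce, first_counter, data):
    """XOR data with a counter-mode pad; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        counter = first_counter + offset // BLOCK
        counter_block = nonce + counter.to_bytes(4, "big")  # 12 + 4 bytes
        pad = stand_in_block(key, counter_block)            # "pre-ciphertext"
        chunk = data[offset:offset + BLOCK]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


key = bytes(range(16))
nonce = bytes(12)
message = b"forty bytes of example data for the test"
ciphertext = ctr_xor(key, nonce, 0, message)
recovered = ctr_xor(key, nonce, 0, ciphertext)
```

Because the pad depends only on the key, nonce and counter, it can be computed ahead of the data, which is what makes it possible to run the AES work in the background.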

    I've also got an interrupt driven ring buffer system for the diagnostic serial output so you don't have to wait around for it to transmit. Not sure yet what use I'll make of that, but it may be quite handy.
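The ring buffer idea, sketched in Python rather than firmware C: the main code calls `put()` and returns immediately, while the transmit interrupt would call `get()` each time the USART is ready for another byte. The class and method names are hypothetical, and this version keeps one slot free to tell "full" from "empty".

```python
class RingBuffer:
    """Fixed-size byte queue with separate producer and consumer indexes."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # next slot to write (main code)
        self.tail = 0  # next slot to read (would be the transmit interrupt)

    def put(self, byte):
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            return False  # full: caller decides whether to drop or retry
        self.buf[self.head] = byte
        self.head = nxt
        return True

    def get(self):
        if self.head == self.tail:
            return None  # empty: a real ISR would disable its interrupt here
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.size
        return byte


rb = RingBuffer(4)
accepted = [rb.put(b) for b in (1, 2, 3, 4)]  # the fourth does not fit
drained = [rb.get() for _ in range(4)]        # the last read finds it empty
```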

  • Random validation

    Nick Sayer, 4 days ago

    The missing part came in for the hardware entropy source. I had it generate 5 MB of unwhitened data and ran ent on it. This is what I got back:

    Entropy = 0.970410 bits per bit.
    Optimum compression would reduce the size
    of this 41943040 bit file by 2 percent.
    Chi square distribution for 41943040 samples is 1708712.21, and randomly
    would exceed this value less than 0.01 percent of the times.
    Arithmetic mean value of data bits is 0.3991 (0.5 = random).
    Monte Carlo value for Pi is 3.638902145 (error 15.83 percent).
    Serial correlation coefficient is -0.016616 (totally uncorrelated = 0.0).

    I also asked ent to generate a histogram.

    What this shows is a slight bias towards generating a zero a little bit more often than generating a 1. Still, when this is whitened, the result should be good enough to be a reasonable key generator.
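As a sanity check on the ent figures, the entropy and chi-square values can be estimated from the reported mean alone. This sketch assumes the bits are independent (the serial correlation is small but not zero) and uses only the mean and sample count quoted above; the function names are made up.

```python
import math


def bit_entropy(p_one):
    """Shannon entropy in bits per bit for a source with P(1) = p_one."""
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))


def bit_chi_square(p_one, n_bits):
    """Chi-square statistic for ones/zeros counts against a 50/50 split."""
    expected = n_bits / 2
    ones = p_one * n_bits
    zeros = n_bits - ones
    return ((ones - expected) ** 2 + (zeros - expected) ** 2) / expected


entropy = bit_entropy(0.3991)              # mean reported by ent
chi_sq = bit_chi_square(0.3991, 41943040)  # bit count reported by ent
```

Both values come out close to what ent printed; the small differences are presumably down to the mean being rounded to four digits.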

  • Prototype v2 design

    Nick Sayer, 5 days ago

    I've got a potential design for the v2 prototype. I just want to get some confirmation from Microchip that I've selected the correct chip. The schematic is in the files.

    Since the XMega core runs at 3.3 volts, we can do away with the logic level translation stuff we had on the v1 prototype. It has an internal 32 MHz oscillator and separate clocking for USB, so we don't need to connect a crystal. Since the 3.3 volt supply has to drive everything, we'll upgrade from an LDO to a PAM2305 buck converter. So the incoming USB power simply goes straight into the two switching supplies - the HV supply for the entropy generator and the 3.3 volt logic supply.

    I've confirmed that my compiler supports the 32A4U, and that I have PDI support in avrdude. I do have to select and get (or make) a PDI programmer of some sort. I've got a pile of 32u2s, so I could make an AVR mkII clone - there are a few out there that use LUFA based firmware and are open.

    After basic functionality is working, one thing I hope to try to explore is using the USART in SPI master mode and remapping it to the SPI pins. The reason for this is that the USART is double-buffered, which would potentially eliminate the inter-byte gaps and increase performance. We'll see how that works out eventually, I hope. I still think prospects for initial performance of ~300 kB/sec are good. I may, however, be able to reduce the inter-byte gaps during transfers by being careful about ordering operations. If I perform the SPI write immediately after the SPI read and then prepare for the next operation before waiting for the SPI operation to finish, I can use the time while the hardware is shifting bits out profitably. With that, the inter-byte gaps might not be so bad. Given that the bulk read/write operations have to have an XOR operation for the crypto in the middle, it's already a job the DMA engine can't really help with.

  • ATXmega

    Nick Sayer, 6 days ago

    You have to read datasheets carefully.

    Not all ATXMega chips come with the AES engine, it turns out. I haven't absolutely confirmed it yet, but one contender is the ATXmega32A4U. Unfortunately, it's a TQFP44 - a slight size boost from the 32U2. It's quite an upgrade from the 32u2, but it's only $3 at Digikey (Q:1). Since it does have the AES engine built in, the prospects are excellent for acceptable performance. Unfortunately, the XMega is still only a full-speed USB device, so it'll never beat ~1 MB/sec, but with no crypto and an 8 MHz SPI clock the 32u2 is hitting 150 kB/sec. It may not only be possible to increase the SPI clock speed with the XMega, but it has a DMA engine, which may make it possible to get rid of (or at least reduce) the inter-byte gaps.

    Of course, I have to figure out whether the current code can be massaged to compile for the XMega, and I want to figure that out first before making a prototype.

  • Rude awakening

    Nick Sayer, 6 days ago

    Well, after looking at some of the access traces on a scope, I saw groups of 16 byte blocks being sent with huge gaps between them. Of course, 16 bytes is the block length of AES, so that was sort of to be expected streaming the data through AES counter mode, but those gaps are on the order of 10 ms long.

    To test out the impact, I commented out the AES block computation that takes place in the counter mode code (effectively gutting the encryption), and sure enough, Orthrus is 5 times faster. It's not a speed demon, by any means, but the performance is actually borderline acceptable (~150 kB/sec instead of ~30 kB/sec).

    So... I don't know which direction I'll go from here. I sort of believe the AES engine I'm using is reasonably performant for a software implementation. I'm not sure how adding an external hardware accelerator is going to improve things, given that you have to talk to it over some sort of serial bus (either SPI or I2C). Getting a faster chip seems like the most obvious move, but I'm not sure there's a good fit in the AVR family other than moving into BGA territory, which I'd rather not do. Moving away from AES means moving away from a world class crypto engine, which is unacceptable.

    Some thinking is definitely called for.

  • First results

    Nick Sayer, 7 days ago

    It's very rough around the edges, but I am able to perform some basic I/O on my Mac with the prototype.

    The speed is as I feared - I'm getting around 27 kB/sec with sequential reads (dd). It takes a spectacular amount of time to even just format a FAT partition.

    But it works.

  • Improvements from the prototype

    Nick Sayer, 04/17/2017 at 18:36

      I'm making some improvements from the prototype given some of the lessons learned so far.

      1. I'm going to add an AP2331 current limiter to the 3.3 volt LDO output for the cards, just to ensure that the host is protected from any transients that may happen when the cards are inserted and/or removed. To that same end...
      2. I'm going to add a power on/off control from the controller to allow it to power up and down the cards. This is simple - all it takes is running a GPIO line to the enable pin on the LDO. The controller will keep the power off on both cards until both are inserted, and it will turn the power off when one is removed. This makes for much easier recovery if something gets wedged.
      3. I'm going to replace the diode+pull-up shifters on the MISO and SCK lines with a buffer chip. This will make the transitions faster, which may be important for high speed stability. The card select lines will stay as they are. The pull-ups on the CS lines will be 1 kΩ instead of 10 kΩ to keep them reasonably fast, but that's probably less critical.

  • Firmware progress

    Nick Sayer, 04/17/2017 at 07:07

    I spent the whole evening hacking on the firmware. With the help of a ton of online resources and looking at other folks' implementations, I got a handle on SD cards. It's not quite as easy as it might seem - particularly because there are two cards on the same bus with separate !CS lines. From a strictly SPI perspective this would seem to be a non-issue, but the problem is that SD cards are not SPI devices until you make them SPI devices, which involves sending a command with the !CS line low. A command sent with !CS high does not make that switch. So we have to send that initial command to both cards simultaneously, which is... not really kosher, since both will drive MISO, but it can't be helped, and seems to be working ok. After that initial command forces both cards into SPI mode, the commands are repeated individually, but at this point the cards respect !CS, so everything works as expected.
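For reference, in the SD specification the command that switches a card into SPI mode is CMD0 (GO_IDLE_STATE), sent while !CS is low. The post doesn't show the firmware's code for this, so the following is only a sketch of how the 6-byte command frame and its CRC7 are built; `crc7` and `sd_command` are hypothetical helper names.

```python
def crc7(data):
    """CRC-7 as used by SD commands (polynomial x^7 + x^3 + 1)."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):  # MSB first
            in_bit = (byte >> bit) & 1
            top = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09
    return crc


def sd_command(index, argument):
    """Build a 6-byte SD command frame: start bits + index, 32-bit arg, CRC + stop bit."""
    body = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


cmd0 = sd_command(0, 0)  # GO_IDLE_STATE
```

This frame is the one that would go out with both !CS lines low at once; everything after it can be addressed to one card at a time.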

    Orthrus will only support SDHC or SDXC cards (at least at first). I'm not going to try to support older and smaller cards, as it's more trouble than it's worth.

    The state of the code at the moment is that it will wait for you to stick two cards in, then it will attempt to prepare the volume. If the key block tests fail, the error light will turn on. Otherwise, the ready light will turn on. In either case, if you push down the button, the error light will start to blink. After 5 seconds, the volume will be re-keyed. You can pull either card out at any time and the lights will go out. You can swap the cards around and they'll still work.

    Next step is to integrate this code into a LUFA mass storage driver.

  • Wear leveling as an attack

    Nick Sayer, 04/14/2017 at 19:38

      Wear leveling is a technique to spread the writes around a flash storage volume. Reading flash memory causes no reduction in lifespan, but flash memory can tolerate only a finite number of writes before it stops working. Since filesystems tend to make write "hotspots" around the filesystem metadata, wear leveling translates each write of the same block to a different location within the flash array. Wear leveling can become an attack in two ways:

      1. If you can obtain direct access to the flash array, bypassing the controller that does the wear leveling, you could conceivably read previous "versions" of a block, provided the chip knew that it could use "deallocated" blocks as replacements that didn't need their content preserved.
      2. If the controller is "lazy", it might allow reading remapped "deallocated" blocks (at their mapped locations) rather than always returning a fixed value for a read on deallocated blocks.

      If either of these happens, then it's conceivable that a re-keying of an Orthrus volume would leave behind an accessible copy of the previous key block.
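A toy model makes the first case concrete. This is not how any particular card's controller works; it is a deliberately simplified translation layer (the `ToyFlash` class is invented for illustration) in which every write goes to a fresh physical page and the old page is merely unmapped, with no erase and no handling of a full array.

```python
class ToyFlash:
    """Toy flash translation layer: each write lands on a fresh physical page."""

    def __init__(self, pages):
        self.pages = [None] * pages  # the raw flash array
        self.mapping = {}            # logical block -> physical page
        self.next_free = 0

    def write(self, logical, data):
        self.pages[self.next_free] = data
        self.mapping[logical] = self.next_free  # old page is unmapped, not erased
        self.next_free += 1

    def read(self, logical):
        """What the host sees through the controller."""
        return self.pages[self.mapping[logical]]

    def raw_dump(self):
        """What an attacker with direct access to the array sees."""
        return [p for p in self.pages if p is not None]


flash = ToyFlash(8)
flash.write(0, b"old key block")
flash.write(0, b"new key block")  # re-keying overwrites logical block 0
visible = flash.read(0)
leftovers = flash.raw_dump()
```

Through the controller only the new key block is visible, but the raw array still holds the old one until that page is eventually erased and reused.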

      Orthrus itself doesn't support TRIM operations (I honestly don't know if it's supported generally on SD media), but in principle, by doing enough writing onto a volume without using TRIM, you could eventually cause all blocks to be marked as used, which would mean that the controller would be unable to be "lazy" about moving blocks without swapping them.

      If you're particularly paranoid, then the thing to do is to treat an Orthrus card the same way you'd treat an ordinary hard disk that had the secret on it. The only protection Orthrus offers is that if someone has only one card, they don't have the key material that's on the other card, and therefore can't get at any of the data. If you re-key cards, then because of wear leveling, you can't be absolutely sure that that destroyed all traces of the previous key material. If your threat model is that powerful, then you should be destroying the cards when you're done with them.

  • Speed

    Nick Sayer, 04/11/2017 at 05:38

    Let's just get this out of the way. Orthrus, at least when compared to modern SD card readers, is going to be slow.

    The ATMega32U2 only supports full-speed USB, which is 12 Mb/sec. In practice, that's at best about 1 MB/sec. Compare that to a benchmark of SD cards in Raspberry Pis, where the results start at 10 times faster.

    But it's worse than that. The AVR SPI system can only be clocked at 8 MHz - half the system clock speed. Making things worse, it's not double-buffered, so there is an inter-byte gap as well. On top of that, every 16 bytes an AES ECB operation will need to be performed in software. I think we can count ourselves fortunate if Orthrus achieves a throughput of 100 kB/sec.
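The raw ceilings behind those numbers are simple arithmetic. This sketch takes 1 kB as 1000 bytes and ignores USB protocol overhead, so the figures are upper bounds rather than predictions; the variable names are just for illustration.

```python
USB_FULL_SPEED_BITS_PER_SEC = 12_000_000  # full-speed USB signalling rate
SPI_CLOCK_HZ = 8_000_000                  # half the 16 MHz system clock

# Raw byte rates: eight bits per byte, one bit per clock.
usb_raw_bytes_per_sec = USB_FULL_SPEED_BITS_PER_SEC / 8  # before protocol overhead
spi_raw_bytes_per_sec = SPI_CLOCK_HZ / 8                 # no inter-byte gaps at all

# Fraction of the time the SPI bus would actually be shifting data
# if the end-to-end throughput were 100 kB/sec.
hoped_for_bytes_per_sec = 100_000
spi_busy_fraction = hoped_for_bytes_per_sec / spi_raw_bytes_per_sec
```

On these numbers, 100 kB/sec corresponds to the SPI bus moving data only about a tenth of the time, with the rest going to inter-byte gaps, AES and USB handling.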

    How can we do better? Well, Atmel offers the AT32UC3A464S. That's a chip with a hi-speed USB interface, built-in support for SD v2.0, and a built-in AES accelerator. The downside is that that chip is $6 in Q:4K, comes in a 100 ball BGA package, and is otherwise way, way overpowered for what we need. Even that wouldn't likely be able to operate a modern SD card at its maximum speed, but it'd be in the ballpark at least.

    So you're going to want to store small things on Orthrus, like secure key material, small text files, things like that. Today's 8 or 16 GB SD cards are going to be a waste, but at least they're cheap.


  • 1

    Using Orthrus

    To use Orthrus, just stick any two SDHC or SDXC microSD cards in the slots and connect a USB cable to your host. You can do this in the opposite order if you wish - the microSD slots are hot-swappable. If the two cards have not been previously paired with Orthrus, then the error light will turn on. Press and hold the button and the error light will blink for 5 seconds and then the cards will be paired and initialized. At that point the ready light will turn on and the host will see a volume with twice the space of the smaller of the two cards. You will need to use your host to initialize this volume. After that, it works just like any other USB storage. When ejecting the volume, you can either remove the USB cable or the two cards first.

    If you insert an Orthrus paired card into a computer, it will look like a card filled with garbage. If you damage the key block (block 0 on the card), then THE ENTIRE VOLUME ON BOTH CARDS WILL BE DESTROYED. Once the key material is corrupted, then all the data is irrecoverably lost. That's kinda the point, of course.

    There are three lights on Orthrus - ready, activity and error. "Ready" indicates that a correctly matched pair of cards has been inserted and the volume is available to the host. "Error" means that the two cards that are inserted are not a matched pair. You can press the button to pair two such cards, but that will destroy any data on both of them. You can hold the button down for 5 seconds (the error light will blink while you do this) at any time and the two cards will be initialized. If you do this while two paired cards are inserted then all the data on the volume will be destroyed and the volume made ready for new data.

    It does not matter which card of a pair is inserted into each slot. The two slots are marked on the board, but in use they are fungible.




Martin wrote 04/14/2017 at 08:50 point

Be careful with your entropy generator. When you want to use noise as a RNG you have to keep noise out :-) That means any non-thermal, non-random noise. So you have to use good decoupling and shielding for your noise generator. Otherwise there could be some interference from power line hum or your local (AM) radio station which compromises your randomness and thus your security, because it adds a deterministic element.

If the Atmel is too weak perhaps the recently discussed STM32F103 could be a solution.


Nick Sayer wrote 04/14/2017 at 13:40 point

I plan on gathering a goodly chunk of the entropy from the generator and running it through DieHarder to ensure that it's of good quality, plus it's going to be run through AES to whiten it before it's actually used. This design is well worn. It's the basis for several open hardware entropy source peripherals out there, so I am fairly confident.

