SD card secure RAID USB storage

This project is for data what the Sankara Stones in "Indiana Jones and the Temple of Doom" were. It is a USB SD card reader, but it requires two cards. The data is striped across them in the style of RAID 0, and it is also encrypted with a key whose material is split between the key storage blocks on the two cards. In essence, each card is useless without the other. With both cards, the data is available without restriction; with only one, the remaining data is completely opaque.

This allows you to securely transport a data set by writing it onto a pair of cards and separately transporting them to a destination for recombination.

The intent is that the only thing that becomes in any way special is the pairing of the two cards. A card pair can be inserted into any Orthrus device and the data will be made available. But with only one card, all you get is half of the data, encrypted with a key of which you possess only half.
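To make the RAID 0 idea concrete, here is a minimal C sketch of how a logical block on the host-visible volume could map to a card and a block on that card. The one-block stripe, the single reserved key block at block 0 of each card (the usage instructions say block 0 is the key block), and the names `map_block`, `volume_blocks` and `stripe_loc` are assumptions for illustration; the write-up doesn't describe the actual interleave.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: block 0 of each card holds key material, so
   data starts at physical block 1. Logical blocks alternate between
   the two cards (a one-block RAID 0 stripe). Blocks are 512 bytes. */
#define KEY_BLOCKS 1u

typedef struct {
    unsigned card;      /* which slot: 0 or 1 */
    uint32_t physical;  /* block number on that card */
} stripe_loc;

/* Usable volume size: twice the data area of the smaller card.
   Returned as 64 bits because doubling a large card can overflow 32. */
static uint64_t volume_blocks(uint32_t card_a_blocks, uint32_t card_b_blocks)
{
    uint32_t smaller = card_a_blocks < card_b_blocks ? card_a_blocks : card_b_blocks;
    assert(smaller > KEY_BLOCKS);
    return 2u * (uint64_t)(smaller - KEY_BLOCKS);
}

/* Map a host logical block to (card, physical block). */
static stripe_loc map_block(uint32_t logical)
{
    stripe_loc loc;
    loc.card = logical & 1u;                     /* even -> card 0, odd -> card 1 */
    loc.physical = (logical >> 1) + KEY_BLOCKS;  /* skip the key block */
    return loc;
}
```

With this layout, any single card holds only every other logical block, which is why one card alone gives you at most half of the (encrypted) data.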

The basis for Orthrus is an ATXMega32A4U. This is an AVR with a built-in full-speed USB interface and hardware AES support. It also has 32K of program memory and 4K of RAM. It has other peripherals, but of chief interest to us are its hardware SPI support and its generous supply of GPIO pins. At first glance it might seem ridiculous overkill for what we want to achieve, but we chose it because it's the smallest available chip that includes the AES accelerator, and being able to do AES at north of 1 MB/sec is worth it. Hardware AES not only runs more than 100 times faster than my software implementation, it can also proceed in the background, leaving the CPU free for actual I/O.

To review, SPI works by sharing three lines among all of the peripherals: MOSI, MISO and SCK. In addition, each peripheral on the bus has its own chip select line (usually active low, so !CS). For each cycle of SCK, one bit is shifted out from the master to the slave over MOSI, and at the same time one bit is shifted from the slave to the master over MISO. There are four possible configurations of clock polarity and phase relative to when the two data lines are set up and sampled, but in every case the controller's SPI system will generally shift a byte at a time. Since the AVR SPI system is only single-buffered, there will be inter-byte gaps while the data from the peripheral is read and/or the next byte to be written is set up.
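The full-duplex exchange described above can be modelled on a PC in a few lines. This is a simplified sketch, not firmware: it treats master and slave as two 8-bit shift registers, shifts MSB first, and ignores which clock edge does the sampling (that's what the four polarity/phase modes decide). The `spi_regs` type and `spi_exchange` function are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of one SPI byte exchange, MSB first. On every SCK
   cycle the MSB of each register goes out on its data line (MOSI for
   the master, MISO for the slave) and the bit arriving on the other
   line is shifted in at the bottom. */
typedef struct {
    uint8_t master;
    uint8_t slave;
} spi_regs;

static void spi_exchange(spi_regs *r)
{
    assert(r != 0);
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)(r->master >> 7);  /* bit driven by master */
        uint8_t miso = (uint8_t)(r->slave >> 7);   /* bit driven by slave  */
        /* both sides sample the incoming bit and shift */
        r->master = (uint8_t)((r->master << 1) | miso);
        r->slave  = (uint8_t)((r->slave << 1) | mosi);
    }
}
```

After eight clocks the two registers have swapped contents, which is why "reading" over SPI always means writing something (often 0xFF) at the same time.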

One way to alleviate those inter-byte gaps is to use USART0 in SPI master mode. In that mode the transmit register is double-buffered, so you can write a new value to it while the previous one is still going out. The port we're using for SPI has a REMAP register, and one of its bits (USART0) moves USART0 onto the same upper-nibble pins that SPI uses. Unfortunately, the USART's synchronous pin mapping has TXD (its MOSI) and the clock pin swapped relative to SPI. There's a REMAP bit to accommodate that as well, but Atmel, in all their wisdom, made it work backwards from what you'd expect: the SPI bit changes the SPI port's wiring to match the USART layout, rather than moving the USART to match SPI. What this effectively means is that the most versatile way to wire the SPI port is on the upper nibble of the port with MOSI and clock swapped. If you want to use traditional SPI, you turn on the SPI bit in REMAP and the SPI subsystem lines up with that wiring. If you want the USART in SPI master mode, you turn on the USART0 bit in REMAP instead, and that subsystem is shifted into place with the pins correct.
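Because this REMAP interplay is easy to get backwards, here is a small host-side C model of the two pins involved. The pin assignments (MOSI on PC5 and SCK on PC7 by default; XCK0 on PC5 and TXD0 on PC7 once USART0 is remapped to the upper nibble) are my reading of the XMEGA AU pinout and should be checked against the datasheet. The types and `upper_nibble_map` are purely illustrative and do not touch real registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Which peripheral we are asking about, and what it drives on a pin. */
typedef enum { PERIPH_SPI, PERIPH_USART0 } periph;
typedef enum { SIG_NONE, SIG_CLOCK, SIG_DATA_OUT } pin_signal;

typedef struct {
    pin_signal pc5;
    pin_signal pc7;
} upper_pins;

/* Model of port C pins 5 and 7. For SPI, remap_bit is the REMAP SPI bit
   (swaps MOSI and SCK). For USART0, remap_bit is the REMAP USART0 bit
   (moves USART0 from PC0..PC3 up to PC4..PC7). */
static upper_pins upper_nibble_map(periph p, bool remap_bit)
{
    upper_pins m = { SIG_NONE, SIG_NONE };
    assert(p == PERIPH_SPI || p == PERIPH_USART0);
    if (p == PERIPH_SPI) {
        m.pc5 = remap_bit ? SIG_CLOCK : SIG_DATA_OUT;   /* SCK : MOSI */
        m.pc7 = remap_bit ? SIG_DATA_OUT : SIG_CLOCK;   /* MOSI : SCK */
    } else if (remap_bit) {
        m.pc5 = SIG_CLOCK;      /* XCK0 */
        m.pc7 = SIG_DATA_OUT;   /* TXD0 */
    }
    /* un-remapped USART0 lives on the lower nibble, so PC5/PC7 stay SIG_NONE */
    return m;
}
```

The check that matters is that "SPI with its REMAP bit set" and "USART0 with its REMAP bit set" put the clock and the data output on the same pins, so a board wired with the card's clock on PC5 and its data input on PC7 can be driven by either peripheral.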

The controller requires 3.3 volt power. Since that's also what the cards want, there's no need for level shifting and the entire system can run from a single supply. Two SD cards and a rather beefy controller are probably pushing things for an LDO, however, but a buck converter can be used with almost no extra board space. It is a good idea to give the controller a way to turn power to the cards on and off. This way, power can be applied only once both cards have been inserted. If one card is removed, power to both can be dropped, ensuring that both cards will cold-start once the second is reinstalled. We can use a P-channel MOSFET as a power switch, and an AP2331 current-limiting switch will ensure that any inrush from the cards won't disturb the controller's supply rail. Since most of the pins we use are only general purpose and we barely use half of the available pins, we have the luxury of picking pins mostly for convenience. Since the USB pins are fixed, we orient the chip so that they are close to the USB connector. Fortunately, the remaining available SPI pinning is convenient for the SD cards, and the rest of the connections can be chosen to be near the peripherals in question. That puts the card-related lines on port C near the SPI pins, the LED, switch and card power on port A, and the random number generator on port D. We'll use the USART on...
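The card power policy above is a tiny state machine. Here is a sketch of it in C, assuming the firmware can read one card-detect input per slot and call an update function whenever those inputs change. The `card_power` type and `card_power_update` are hypothetical names, and the actual pin write to the power switch is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical card-power policy: power is applied only when both
   card-detect switches report a card, and is dropped for both slots as
   soon as either card leaves, so both cards cold-start together. */
typedef struct {
    bool power_on;
} card_power;

/* Call whenever the card-detect inputs change (or on a periodic tick).
   Returns true if the power state changed, so the caller knows to
   (re)initialize both cards after a power-up. */
static bool card_power_update(card_power *cp, bool card0_present, bool card1_present)
{
    assert(cp != 0);
    bool want = card0_present && card1_present;
    if (want == cp->power_on)
        return false;
    cp->power_on = want;
    /* firmware would drive the card power-switch enable pin here */
    return true;
}
```

A real implementation would also want to debounce the card-detect switches and give the cards some settling time after power-up before initializing them.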



Schematic as a PDF

Adobe Portable Document Format - 61.47 kB - 04/27/2017 at 15:45



EAGLE Schematic

sch - 357.14 kB - 04/27/2017 at 15:42



EAGLE board file

brd - 94.71 kB - 04/27/2017 at 15:42


Data files for the Orthrus challenge

Zip Archive - 3.01 MB - 04/26/2017 at 15:40



AES implementation for AVR - ECB, counter mode and CMAC.

- 50.95 kB - 04/03/2017 at 04:25



  • 1 × ATXMega32A4U Microprocessors, Microcontrollers, DSPs / ARM, RISC-Based Microcontrollers
  • 1 × PAM2305 Power Management ICs / Switching Regulators and Controllers
  • 1 × AP3012 Power Management ICs / Switching Regulators and Controllers
  • 1 × AP2331 Discrete Semiconductors / Power Transistors and MOSFETs
  • 2 × MMBT3904 Discrete Semiconductors / Transistors, MOSFETs, FETs, IGBTs
  • 1 × 74LVC2G04 Logic ICs / Gates and Inverters
  • 1 × FDV304P Discrete Semiconductors / Diode-Transistor Modules
  • 1 × NSR0530HT1G
  • 1 × 5 pin USB MicroB SMD jack
  • 2 × DM3D-SF Connectors and Accessories / Other PCB Connectors


  • No double-buffered SPI for you!

    Nick Sayer, 17 hours ago

    At the outset, I had hoped that I would be able to switch to using USART0 in SPI master mode by setting the SPI and USART0 bits in the port C REMAP register. The USART0 bit shifts USART0 from bits 0-3 to bits 4-7, realigning it with the SPI pins. But it turns out that the SPI bit swaps the MOSI and SCK pins of the SPI device so that it matches the USART layout, not vice versa.


    So the only way to even try the USART in SPI master mode is to swap the two pins on the board and remap the SPI device to match the USART layout. Then the USART can be tried in SPI master mode by turning off that SPI remap and shifting USART0 up to the top 4 bits of the port instead.

    I'm going to have to spin some new boards and give that a try. I do think the double buffering that the USART in SPI master mode can do is going to be worth the effort.

    Meanwhile, with interrupt-driven AES and ordinary SPI the current board is getting just shy of 200 kB/sec, which is actually usable. If I can just work out some SCSI behavior issues with the card removal, I'll be able to put the current version up for sale.

  • v2 Prototype Report

    Nick Sayer, a day ago

    The version 2 prototype boards arrived, and after an evening of hacking on the firmware, it is functional. I'm getting around 250 kB/sec even with full crypto. There are some rough edges still - the crypto isn't working with DMA (though it is interrupt driven), and I haven't yet attempted the USART-in-SPI-master mode to see if it makes any difference.

  • Orthrus Challenge

    Nick Sayer, 2 days ago

    Schneier's Law states that "anyone can design a cryptosystem that they themselves can't break." The upshot of this is that the only thing that offers any hope that a cryptosystem is secure is that it survives peer review. Since I am aware of Schneier's Law and I know for a fact that there are many, many folks who know cryptography better than me, I'd like to offer a challenge to test Orthrus' cryptographic design.

    Let me say at the outset that I don't have a prize to offer. Sorry, I spend all my money on getting PCBs made.

    Moving on from there, the challenge is rather simple (to describe). In the project files, there is an Orthrus Challenge ZIP file. It contains the OrthrusDecrypt Java code and two card images (it also has a copy of Bouncy Castle, which is needed to add AES CMAC support to Java). If you run the Java code on the two card images, you'll get back just shy of 200K of zeros - the decrypted content of the volume. The challenge is: if you had only one of those card images, could you decrypt the content of that card without having to brute-force the missing key material (which I'm fairly confident is infeasible)?

    A successful answer to the challenge will demonstrate taking one card and discerning the plaintext stream of zeros from it without directly referencing the content of the other card. It's not interesting to show that you can decrypt one card if you know the other card's material in advance - the whole idea behind Orthrus is that it's the user's responsibility to ensure that they keep the two cards separate from each other in the presence of adversaries.

    Comments or questions can be posted in the comments to this log (below).

    Thanks for your time and consideration.

  • Correctness testing

    Nick Sayer, 2 days ago

    I burned the midnight oil this evening. I wrote a Java program that decrypts an Orthrus volume given images of the two cards. It should be no surprise that this is possible to do - the whole security of Orthrus is based around the idea that you're going to keep the cards separate so that no one has a chance to get both images (unless they're supposed to).

    I then paired two cards with a real Orthrus and zeroed out a goodly chunk of blocks. I then read in images of the two cards and ran them through the Java program, expecting to get all zero bytes out. Of course, there were some bugs to find, but at the end of that effort, I did, indeed, get the expected result.

    Most of the bugs were in my translation of the Bouncy Castle AES implementation used in the first prototype, so they (in principle) won't have any impact on the next prototype (which will do AES natively in hardware), but there were one or two in the actual logic of Orthrus, so it's good that they were found. And while AES ECB is hardware accelerated, CMAC is still done in software, so it's important to validate that it interoperates as well.

    I've checked the Java code in question into the GitHub repository. It's useful because the Java code is MUCH simpler and easier to read, so it should be fairly straightforward for (almost) anyone to understand what it's doing. You can also test the hardware yourself by pairing two cards, writing a filesystem on them, then reading in the encrypted card images and running them through the Java code.

  • On "why"

    Nick Sayer, 3 days ago

    Security is hard. If you're going to encrypt something, the first thing you have to work out is how you're going to manage the keys. More to the point, if you expect them to be managed by a human, that means coming up with some way for people to remember or otherwise manage them. And that means passwords. Passwords are an awful solution to security, and having a USB mass storage device that demands a key be presented means requiring some sort of user interface for that.

    Orthrus is a way to do truly world-class encryption without any key management at all, because the key material is automatically spread between the two cards. The security solution is trivial. Got both cards? You have access. Got fewer? None for you.

  • Firmware rewrite

    Nick Sayer, 5 days ago

    I rewrote the firmware today for the 32A4U. Of course, I did this in a complete vacuum, as the hardware isn't here yet.

    The firmware is much smaller, as would be expected given the removal of all the code to do AES in software. It takes around 11K of flash and uses around 1.5K of global RAM.

    I originally chose the 32A4U because it was actually cheaper than the 16A4U, but I'm glad I did - not for the additional flash space, but for the extra RAM.

    The AES system is set up to run almost entirely in the background, with DMA both loading in the nonce data and reading out the pre-ciphertext (the keystream) into a buffer for XORing during I/O. DMA completion interrupts ping-pong between two DMA channels to make this happen, incrementing the counter for each block.
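    The counter-mode scheme behind this is easy to see in software. The sketch below encrypts a counter block with a pluggable 128-bit block cipher to get the keystream, increments the counter per 16-byte block, and XORs the keystream into the data, so the same call both encrypts and decrypts. It is not the Orthrus firmware: the whole 16-byte counter is incremented as one big-endian integer (Orthrus's actual nonce/counter layout isn't described here), and `toy_encrypt` is a stand-in so the sketch runs on a PC. It is not a cipher; on the XMega that role would be played by the hardware AES engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 16

/* Any 128-bit block cipher's encrypt function (AES on real hardware). */
typedef void (*block_encrypt_fn)(const uint8_t in[BLOCK_LEN], uint8_t out[BLOCK_LEN]);

/* Increment the counter block as a 128-bit big-endian integer. */
static void ctr_increment(uint8_t ctr[BLOCK_LEN])
{
    for (int i = BLOCK_LEN - 1; i >= 0; i--)
        if (++ctr[i] != 0)
            break;  /* stop once a byte didn't wrap */
}

/* Counter mode: keystream = E(counter), data ^= keystream.
   Encrypting and decrypting are the same operation. ctr advances in place. */
static void ctr_crypt(block_encrypt_fn enc, uint8_t ctr[BLOCK_LEN], uint8_t *data, size_t len)
{
    uint8_t ks[BLOCK_LEN];
    assert(enc != NULL && ctr != NULL && (data != NULL || len == 0));
    for (size_t off = 0; off < len; off += BLOCK_LEN) {
        enc(ctr, ks);        /* the "pre-ciphertext" for this block */
        ctr_increment(ctr);
        size_t n = (len - off < BLOCK_LEN) ? len - off : BLOCK_LEN;
        for (size_t i = 0; i < n; i++)
            data[off + i] ^= ks[i];
    }
}

/* Toy stand-in for AES so the sketch runs anywhere. NOT secure. */
static void toy_encrypt(const uint8_t in[BLOCK_LEN], uint8_t out[BLOCK_LEN])
{
    for (int i = 0; i < BLOCK_LEN; i++)
        out[i] = (uint8_t)(in[(i + 5) % BLOCK_LEN] * 167u + 13u * (unsigned)i);
}
```

    Because the keystream depends only on the counter and not on the data, it can be computed ahead of time (for example by DMA in the background, as described above), leaving only the XOR for the I/O path.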

    I've also got an interrupt driven ring buffer system for the diagnostic serial output so you don't have to wait around for it to transmit. Not sure yet what use I'll make of that, but it may be quite handy.
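    A transmit ring buffer of this kind usually looks something like the sketch below: the main code calls `tx_put()` and returns immediately, and the USART's "data register empty" interrupt calls `tx_get()` to fetch the next byte until the buffer is empty. The names and the 64-byte size are assumptions, not the Orthrus code, and the interrupt enable/disable step is only described in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-producer/single-consumer ring buffer. Size must be a power of
   two; one slot is kept empty to tell "full" from "empty". */
#define TX_SIZE 64u

typedef struct {
    uint8_t buf[TX_SIZE];
    volatile uint8_t head;   /* next slot to write (main code) */
    volatile uint8_t tail;   /* next slot to read (ISR) */
} tx_ring;

/* Main code: queue a byte. Returns false if the buffer is full. */
static bool tx_put(tx_ring *r, uint8_t c)
{
    assert(r != 0);
    uint8_t next = (uint8_t)((r->head + 1u) & (TX_SIZE - 1u));
    if (next == r->tail)
        return false;        /* full: caller may drop or retry */
    r->buf[r->head] = c;
    r->head = next;          /* publish only after the byte is stored */
    /* firmware would (re)enable the data-register-empty interrupt here */
    return true;
}

/* ISR side: fetch the next byte to send. Returns false when empty,
   at which point the ISR would disable itself. */
static bool tx_get(tx_ring *r, uint8_t *c)
{
    assert(r != 0 && c != 0);
    if (r->tail == r->head)
        return false;
    *c = r->buf[r->tail];
    r->tail = (uint8_t)((r->tail + 1u) & (TX_SIZE - 1u));
    return true;
}
```

    On an 8-bit AVR, single-byte `head` and `tail` indices are read and written atomically, which is what lets the main code and the interrupt share the buffer without disabling interrupts.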

  • Random validation

    Nick Sayer, 04/20/2017 at 17:52

    The missing part came in for the hardware entropy source. I had it generate 5 MB of unwhitened data and ran ent on it. This is what I got back:

    Entropy = 0.970410 bits per bit.
    Optimum compression would reduce the size
    of this 41943040 bit file by 2 percent.
    Chi square distribution for 41943040 samples is 1708712.21, and randomly
    would exceed this value less than 0.01 percent of the times.
    Arithmetic mean value of data bits is 0.3991 (0.5 = random).
    Monte Carlo value for Pi is 3.638902145 (error 15.83 percent).
    Serial correlation coefficient is -0.016616 (totally uncorrelated = 0.0).

    I also asked ent to generate a histogram:

    What this shows is a slight bias towards zeros: the generator produces a zero a little more often than a one. Still, once this is whitened, the result should be good enough for a reasonable key generator.
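    As a sanity check on the numbers above: the per-bit entropy ent reports comes from the bit frequencies alone, so it should match the binary entropy of the observed fraction of ones, p = 0.3991 (to the precision of that rounded mean):

```latex
H(p) = -p\log_2 p - (1-p)\log_2(1-p)

H(0.3991) \approx 0.3991 \times 1.3252 + 0.6009 \times 0.7348
          \approx 0.5289 + 0.4415 = 0.9704
```

    That agrees with ent's 0.970410 bits per bit, so the shortfall from 1.0 is the 0/1 bias itself. It says nothing about structure across bits, which is what the serial correlation figure and a fuller test suite are for.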

  • Prototype v2 design

    Nick Sayer, 04/19/2017 at 15:31

    I've got a potential design for the v2 prototype. I just want to get some confirmation from Microchip that I've selected the correct chip. The schematic is in the files.

    Since the XMega core runs at 3.3 volts, we can do away with the logic level translation stuff we had on the v1 prototype. It has an internal 32 MHz oscillator and separate clocking for USB, so we don't need to connect a crystal. Since the 3.3 volt supply has to drive everything, we'll upgrade from an LDO to a PAM2305 buck converter. So the incoming USB power simply goes straight into the two switching supplies - the HV supply for the entropy generator and the 3.3 volt logic supply.

    I've confirmed that my compiler supports the 32A4U, and that avrdude has PDI support. I do have to select and get (or make) a PDI programmer of some sort. I've got a pile of 32u2s, so I could make an AVRISP mkII clone - there are a few out there that use open, LUFA-based firmware.

    After basic functionality is working, one thing I hope to explore is using the USART in SPI master mode, remapped to the SPI pins. The reason is that the USART is double-buffered, which could eliminate the inter-byte gaps and increase performance. We'll see how that works out eventually, I hope. I still think prospects for initial performance of ~300 kB/sec are good. I may, however, be able to reduce the inter-byte gaps during transfers by being careful about the ordering of operations. If I start the next SPI write immediately after the SPI read, and then prepare for the following operation before waiting for the SPI transfer to finish, I can make profitable use of the time the hardware spends shifting bits out. With that, the inter-byte gaps might not be so bad. Given that the bulk read/write operations need an XOR for the crypto in the middle, this is already a job the DMA engine can't really help with.
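    Here is a sketch of that ordering for the write direction: start shifting byte i, compute the XOR for byte i+1 while the hardware is busy, and only then wait for the transfer to complete. The `spi_start`/`spi_finish` pair is a hypothetical loopback mock so the code runs on a PC; on the XMega they would be a write to the SPI data register, and a spin on the SPI interrupt flag followed by a read of the data register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SPI hardware: a loopback mock that
   records every byte sent, so the sketch can be checked on a PC. */
static uint8_t mock_shift_reg;
static uint8_t spi_sent[64];
static size_t spi_sent_len;

static void spi_start(uint8_t b)          /* real: write the SPI data register */
{
    assert(spi_sent_len < sizeof spi_sent);
    mock_shift_reg = b;
    spi_sent[spi_sent_len++] = b;
}

static uint8_t spi_finish(void)           /* real: wait for the flag, then read data */
{
    return mock_shift_reg;
}

/* Write len bytes of plaintext to the card, XORed with the keystream.
   The XOR for byte i+1 happens while byte i is on the wire, so that
   work overlaps the transfer instead of widening the inter-byte gap. */
static void spi_write_encrypted(const uint8_t *plain, const uint8_t *keystream, size_t len)
{
    assert(plain != NULL && keystream != NULL);
    if (len == 0)
        return;
    uint8_t next = (uint8_t)(plain[0] ^ keystream[0]);
    for (size_t i = 0; i < len; i++) {
        spi_start(next);                               /* hardware starts shifting */
        if (i + 1 < len)
            next = (uint8_t)(plain[i + 1] ^ keystream[i + 1]);  /* prepare while busy */
        (void)spi_finish();                            /* response ignored on a write */
    }
}
```

    The read direction is the mirror image: read byte i, immediately start the dummy write for byte i+1, then XOR byte i with its keystream while the next byte is shifting in.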

  • ATXmega

    Nick Sayer, 04/18/2017 at 19:43

    You have to read datasheets carefully.

    Not all ATXMega chips come with the AES engine, it turns out. I haven't absolutely confirmed it yet, but one contender is the ATXmega32A4U. Unfortunately, it's a TQFP44 - a slight size increase from the 32U2. It's quite an upgrade from the 32u2, but it's only $3 at Digikey (Q:1). Since it does have the AES engine built in, the prospects for acceptable performance are excellent. Unfortunately, the XMega is still only a full-speed USB device, so it'll never beat ~1 MB/sec, but with no crypto and an 8 MHz SPI clock the 32u2 is hitting 150 kB/sec. Not only may it be possible to increase the SPI clock speed with the XMega, but it also has a DMA engine, which may make it possible to eliminate (or at least reduce) the inter-byte gaps.

    Of course, I have to figure out whether the current code can be massaged to compile for the XMega, and I want to figure that out first before making a prototype.

  • Rude awakening

    Nick Sayer, 04/18/2017 at 15:55

    Well, after looking at some of the access traces on a scope, I saw groups of 16-byte blocks being sent with huge gaps between them. Of course, 16 bytes is the AES block length, so that was sort of to be expected when streaming the data through AES counter mode, but those gaps are on the order of 10 ms long.

    To test out the impact, I commented out the AES block computation that takes place in the counter mode code (effectively gutting the encryption), and sure enough, Orthrus is 5 times faster. It's not a speed demon, by any means, but the performance is actually borderline acceptable (~150 kB/sec instead of ~30 kB/sec).

    So... I don't know which direction I'll go from here. I sort of believe the AES engine I'm using is reasonably performant for a software implementation. I'm not sure how adding a hardware accelerator is going to improve things, given that you have to talk to them over some sort of serial bus (either SPI or i2c). Getting a faster chip seems like the most obvious move, but I'm not sure there's a good fit in the AVR family other than moving into BGA territory, which I'd rather not do. Moving away from AES means moving away from a world class crypto engine, which is unacceptable.

    Some thinking is definitely called for.


  • 1

    Using Orthrus

    To use Orthrus, just insert any two SDHC or SDXC microSD cards into the slots and connect a USB cable to your host. You can do this in the opposite order if you wish; the microSD slots are hot-swappable. If the two cards have not previously been paired with Orthrus, the error light will turn on. Press and hold the button; the error light will blink for 5 seconds and then the cards will be paired and initialized. At that point the ready light will turn on and the host will see a volume with twice the capacity of the smaller of the two cards. You will need to use your host to initialize (format) this volume. After that, it works just like any other USB storage device. After ejecting the volume, you can remove either the USB cable or the two cards first.

    If you insert an Orthrus-paired card into a computer, it will look like a card filled with garbage. If you damage the key block (block 0 on the card), THE ENTIRE VOLUME ON BOTH CARDS WILL BE DESTROYED. Once the key material is corrupted, all the data is irretrievably lost. That's kinda the point, of course.

    There are three lights on Orthrus: ready, activity and error. "Ready" indicates that a correctly matched pair of cards has been inserted and the volume is available to the host. "Error" means that the two inserted cards are not a matched pair. You can use the button to pair two such cards, but that will destroy any data on both of them: hold the button down for 5 seconds (the error light will blink while you do this) at any time and the two cards will be initialized. If you do this while two paired cards are inserted, all the data on the volume will be destroyed and the volume made ready for new data.
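    The "hold for 5 seconds" behaviour can be implemented as a simple tick counter. This is a sketch under assumptions: a 10 ms periodic tick, an already-debounced button input, and the hypothetical names `hold_detector` and `hold_tick`. Letting go before 5 seconds cancels, and holding longer does not pair twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_MS 10u     /* assumed period of the timer tick */
#define HOLD_MS 5000u   /* hold time before pairing */

typedef struct {
    uint16_t held_ticks;  /* nonzero while held: caller can blink the error LED */
    bool fired;           /* pairing already triggered during this press */
} hold_detector;

/* Call once per tick with the (debounced) button state. Returns true
   on the single tick where the hold time is reached. */
static bool hold_tick(hold_detector *h, bool pressed)
{
    assert(h != 0);
    if (!pressed) {               /* release resets; early release cancels */
        h->held_ticks = 0;
        h->fired = false;
        return false;
    }
    if (h->fired)
        return false;             /* don't re-pair while still held */
    if (++h->held_ticks >= HOLD_MS / TICK_MS) {
        h->fired = true;
        return true;              /* caller pairs and initializes the cards */
    }
    return false;
}
```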

    It does not matter which card of a pair is inserted into each slot. The two slots are marked on the board, but in use they are fungible.

Martin wrote 04/14/2017 at 08:50

Be careful with your entropy generator. When you want to use noise as an RNG, you have to keep noise out :-) That means any non-thermal, non-random noise. So you have to use good decoupling and shielding for your noise generator. Otherwise there could be some interference from power line hum or your local (AM) radio station which compromises your randomness and thus your security, because it adds a deterministic element.

If the Atmel is too weak, perhaps the recently discussed STM32F103 could be a solution.


Nick Sayer wrote 04/14/2017 at 13:40

I plan on gathering a goodly chunk of the entropy from the generator and running it through DieHarder to ensure that it's of good quality, plus it's going to be run through AES to whiten it before it's actually used. This design is well worn. It's the basis for several open hardware entropy source peripherals out there, so I am fairly confident.

