SD card secure RAID USB storage

This project is a hardware mechanism to provide secure "two man control" over a data store. It is a USB microSD card reader, but it requires two cards. The data is striped in the style of RAID 0, but the data is also encrypted with a key that is stored in a key storage block on each card. In essence, each card is useless without the other. With possession of both cards, the data is available without restriction, but with only one, the remaining data is completely opaque.

This allows you to securely transport a data set by writing it onto a pair of cards and separately transporting them to a destination for recombination.

The intent is that only the pairing of two cards becomes in any way special. A card pair could be inserted in any Orthrus device and the data would be made available. But with only one card, all you get is half of the data encrypted with a key which you only half-possess.
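As a rough illustration of the striping, here is a minimal Python sketch of how a logical block might map onto the two cards. The one-block stripe size, the alternating card order and the assumption that data starts right after the key block are mine for illustration; the actual firmware layout may differ. The function names are hypothetical.

```python
NUM_CARDS = 2
DATA_START = 1   # assumption: block 0 on each card is the key block, data follows it


def volume_blocks(card_a_blocks, card_b_blocks):
    """Usable volume size in blocks: twice the data area of the smaller card."""
    return NUM_CARDS * (min(card_a_blocks, card_b_blocks) - DATA_START)


def locate(logical_block):
    """Map a logical block to (card index, physical block), RAID 0 style."""
    card = logical_block % NUM_CARDS                      # alternate between cards
    physical = DATA_START + logical_block // NUM_CARDS    # skip the key block
    return card, physical


for n in range(4):
    print(n, locate(n))
```

With this mapping, consecutive logical blocks land on alternating cards, so either card alone holds only every other block of the (encrypted) volume.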

I'd like to express my gratitude to Dean Camera of the LUFA project.

Orthrus dramatically simplifies the problems of providing a securely encrypted data store. There are no passwords or key material to manage. The act of pairing two cards together automatically creates all of the key material necessary to secure the store without any human action (other than initiating the pairing process by pressing and holding a single button). The security offered by Orthrus is simple to explain and trivial to use: if you have both of the cards, you have the data. If you have only one of them, then it is cryptographically opaque.

The basis for Orthrus is an ATXMega32A4U. This is an AVR with a built-in full speed USB interface and hardware AES support. It also has 32K of program memory and 4K of RAM. It has other peripherals, but of chief interest for us is that it has hardware support for SPI and more than a few GPIO pins. At first glance, it would seem to be ridiculous overkill for what we want to achieve, but it's our choice because it's the smallest available chip that includes the AES accelerator, and being able to do AES at north of 1 MB/sec is worth it. Hardware AES not only runs more than 100 times faster than my software implementation, it can also proceed in the background, leaving the CPU free for actual I/O.

To review, SPI works by sharing 3 lines among all of the peripherals - MOSI, MISO and SCK. In addition to that, each peripheral on the bus has a unique chip select line (usually active low, so !CS). For each cycle of the SCK line, one bit is shifted out from the master over MOSI to the slave, and at the same time a bit from the slave is shifted out to the master over MISO. There are four choices for configuration of the polarity and phase of the clock signal relative to the setup and sampling of the two data lines, but the SPI system in the controller will generally shift a byte at a time. Since the AVR SPI system is only single-buffered, there will be inter-byte gaps as the data from the peripheral is read and/or the next byte of data to be written is set up.
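To make the simultaneous shifting concrete, here is a small Python simulation of one byte exchange. It is only a model of the two shift registers, assuming MSB-first transfer (the usual convention for SD cards in SPI mode); clock polarity and phase are ignored, and the function name is made up for this sketch.

```python
def spi_exchange(master_byte, slave_byte):
    """Clock 8 bits; return (byte received by master, byte received by slave)."""
    master_reg, slave_reg = master_byte, slave_byte
    for _ in range(8):                     # one iteration per SCK cycle
        mosi = (master_reg >> 7) & 1       # master drives its top bit onto MOSI
        miso = (slave_reg >> 7) & 1        # slave drives its top bit onto MISO
        master_reg = ((master_reg << 1) & 0xFF) | miso
        slave_reg = ((slave_reg << 1) & 0xFF) | mosi
    return master_reg, slave_reg


received_by_master, received_by_slave = spi_exchange(0xA5, 0x3C)
print(hex(received_by_master), hex(received_by_slave))  # 0x3c 0xa5
```

After eight clocks the two registers have simply traded contents, which is why every SPI write is also a read.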

One way to alleviate those inter-byte gaps is to use USART0 in SPI master mode. When you do this, the transmit register is double-buffered, so you can write a new value to it while one is going out. There is a REMAP register for the port we're using for SPI which allows USART0 to be mapped to the same pins used by SPI. Unfortunately, the USART synchronous pin mapping swaps TXD (MOSI) and the clock pin relative to SPI. There's a bit in the REMAP register to accommodate that as well, but unfortunately Atmel in all their wisdom made this bit work backwards from what you'd expect. The SPI bit in the remap register changes the SPI port wiring to match the USART layout. What this effectively means is that the most versatile way to wire the SPI port is with MOSI and clock swapped and using the upper nibble of the port. If you want to use traditional SPI, you can turn on the SPI flag in REMAP and the SPI subsystem will line up properly. If you want USART in SPI master mode, you turn on the USART0 bit in REMAP and that subsystem is shifted into place with the pins correct.
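The remap behaviour is easier to see as a table of pin assignments. The Python sketch below models the two REMAP bits as described above. The specific port C pin numbers are my reading of the XMEGA A4U pinout and should be treated as assumptions to check against the datasheet; the function names are hypothetical.

```python
# Assumed default port C pin numbers; verify against the datasheet.
SPI_DEFAULT = {"SS": 4, "MOSI": 5, "MISO": 6, "SCK": 7}
USART0_DEFAULT = {"XCK": 1, "RXD": 2, "TXD": 3}


def spi_pins(remap_spi):
    """SPI peripheral pins; the SPI remap bit swaps MOSI and SCK."""
    pins = dict(SPI_DEFAULT)
    if remap_spi:
        pins["MOSI"], pins["SCK"] = pins["SCK"], pins["MOSI"]
    return pins


def usart_spi_pins(remap_usart0):
    """USART0 in SPI master mode; the USART0 remap bit moves it to the upper nibble."""
    shift = 4 if remap_usart0 else 0
    u = {name: pin + shift for name, pin in USART0_DEFAULT.items()}
    # In SPI master mode TXD acts as MOSI, RXD as MISO and XCK as SCK.
    return {"MOSI": u["TXD"], "MISO": u["RXD"], "SCK": u["XCK"]}


def data_lines(pins):
    """Drop chip select so the two peripherals can be compared."""
    return {k: pins[k] for k in ("MOSI", "MISO", "SCK")}


print(data_lines(spi_pins(True)))   # SPI remapped
print(usart_spi_pins(True))         # USART0 remapped
```

Under these assumptions the two layouts only coincide when the board is wired for the swapped arrangement: remapped SPI and remapped USART0 put MOSI, MISO and SCK on the same pins, while default SPI does not match USART0 in either position.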

The controller requires 3.3 volt power. Since that's also what the cards want, there's no need for level shifting and the entire system can run from a single supply. Two SD cards and a rather beefy controller are probably pushing things for an LDO, but a buck converter can be used with almost no extra board space. It is a good idea to provide a mechanism for the controller to turn power to the cards on and off. This way, power can be applied only once the two cards have been inserted. If one card is removed, power to both can be dropped, ensuring that both cards will cold-start once the second is installed. We can use a P channel MOSFET as a power switch, and an AP2331 current limiting switch will ensure that any inrush from the cards won't impact the supply rail for the controller. Since most of the pins we use are only general purpose and we barely use...



Schematic as a PDF

Adobe Portable Document Format - 62.63 kB - 05/08/2017 at 15:59



EAGLE Schematic

sch - 358.07 kB - 05/08/2017 at 15:59



EAGLE board file

brd - 96.30 kB - 05/08/2017 at 15:59


Orthrus Decryption java code and two challenge card images

Zip Archive - 2.00 MB - 05/07/2017 at 17:47


  • 1 × ATXMega32A4U Microprocessors, Microcontrollers, DSPs / ARM, RISC-Based Microcontrollers
  • 1 × PAM2305 Power Management ICs / Switching Regulators and Controllers
  • 1 × AP3012 Power Management ICs / Switching Regulators and Controllers
  • 1 × AP2331 Discrete Semiconductors / Power Transistors and MOSFETs
  • 2 × MMBT3904 Discrete Semiconductors / Transistors, MOSFETs, FETs, IGBTs


  • Chip choice for next gen

    Nick Sayer, 05/21/2017 at 01:13, 1 comment

    I had a very nice chat just now with a very nice guy from Microchip (he used to work for Atmel before the acquisition). I didn't ask permission to use his name, so I won't mention it, but he was quite helpful. I went over my requirements for the next-gen Orthrus and he recommended the ATSAM4E16E. I haven't looked myself yet at the datasheet, but he said it has high speed USB, hardware AES (with 256 bit keys) and 4 bit SD support, which is exactly the feature set I need.

    Since it requires me to get a whole new toolchain for 32 bit micros, I'm not really beholden to AVR over ARM or PIC (I believe this one is an ARM cortex M4), and (again, I haven't verified it - I'm typing this from Maker Faire) these come in QFN packages as opposed to BGA (I'm familiar with the former as opposed to the latter).

    So I think that's definitely a direction to go in.


    Nick Sayer, 05/11/2017 at 16:35, 0 comments

    I've started a GoFundMe campaign to get a USB VID.

    If you google around, you'll find a couple of avenues to obtain product IDs for open hardware projects. I've inquired of both of them and the silence has been deafening. The other avenue is to use Microchip's VID, since I'm using Microchip chips. Unfortunately, they haven't fixed their sign-up widget since acquiring Atmel, and they're not answering their e-mails either.

    So I have no choice but to "squat" on 0xf055 for the short term and try and raise money to obtain a legitimate USB VID longer term.

    I actually hope that this campaign gets enough notoriety to put some pressure on the USBIF to solve the problem of USB VID/PID for small manufacturers and makers. That would be a better solution than all of us trying to raise $5K to get a range of 65,536 PIDs of which we will each use a tiny fraction.

  • Switching off the RNG

    Nick Sayer, 05/08/2017 at 16:20, 0 comments

    A lot of folks have said that leaving a transistor in avalanche mode is bad for its long term health, but I'm not convinced. The metrics of that health that continuous avalanche impacts are important for the normal use of the transistor, but in this case the transistor has no other function.

    Still, if nothing else it seems a waste of power to run the 20v boost converter and avalanche circuit continuously when they're needed only for a few ms every once in a while.

    To that end, I've decided future hardware will include a logic output from the controller to the boost converter's !SHDN pin to allow the avalanche supply to be turned on and off.

    But there's a wrinkle there: you can turn off a boost controller, but there will still be a conduction path from the boost input supply through the inductor and catch diode to the output. Without taking extra steps, you can never turn a boost supply completely off.

    Fortunately, there's an easy solution to this, and it's apparently a classic one. You connect a P channel MOSFET up on the output of the boost converter and connect the gate to the input power supply. P channel MOSFETs are high impedance ("off") when the gate voltage is (nearly) equal to the source voltage, which will be the case when the boost converter isn't switching. Usually you turn on a P channel MOSFET by dropping the gate voltage, but in this case it will turn on because the source voltage will rise relative to the gate. The result is a true on-off controlled "high" voltage power supply. Exactly what we want. This doesn't switch the inverter chip on and off, but that's ok. It should wind up in a stable state without any input and take relatively little power on its own.

  • On diffusion

    Nick Sayer, 05/08/2017 at 04:39, 0 comments

    In searching around for information about the state of the art in WDE, I came across this particularly interesting article.

    One problem with whole-disk encryption is that you're generally not allowed to alter the block size. At this point, it's almost completely universal that we use disks (or pseudo-disks) that are simple one-dimensional arrays of 512 byte blocks.

    One desirable quality of encryption is that you'd like to know if someone tried to tamper with the ciphertext. In general, this means either using authenticated modes or adding a MAC to the ciphertext. Unfortunately, this means that the ciphertext (or ciphertext plus MAC) is longer than the plaintext. For WDE, this is untenable.

    Since we can't add any bits to the block to authenticate the content, the best we can do is try to use encryption to perturb errors so that an adversary can't, for example, be allowed to flip arbitrary bits in the ciphertext to flip the same bits in the plaintext. Such an adversary would be able to modify files in place, which is almost as good as being able to read them.

    XEX (or XTS) will cause a 16 byte corruption in decrypting a block that has a single bit flipped. That blunts an attacker's ability to modify files. It would, however, be better if the mode could cause an entire block to be corrupted beyond recognition if a single bit of the ciphertext is altered. This property is called diffusion. Diffusion and confusion are two basic properties of a cipher. Confusion means that each bit of the ciphertext relies on more than one bit of the key, and that different bits of the key combine in an unpredictable pattern to alter bits of the ciphertext. Diffusion means more or less the same thing with regard to the plaintext during encryption and ciphertext during decryption. Altering one input bit will cause radical changes to the entire output. Both confusion and diffusion are necessary to prevent statistical analysis of a cipher. This was all worked out by Shannon in 1945.

    Ideally, we'd use a 4096 bit block size cipher for WDE, but that isn't practical. XEX provides confusion by perturbing the plaintext and ciphertext on both sides of the encryption operation, but because it handles each 16 byte AES block individually, it supplies no diffusion.

    So far as I can find, since the BitLocker post was written, there haven't really been significant advances on the diffusion front for WDE. So far as I am aware, most solutions still use plain XTS (or XEX), meaning that a single bit flip will cause a 16 byte aligned block diffusion error and no other changes beyond. It certainly blunts bit-flipping attacks, but doesn't really eliminate every possibility of efficacy.
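The 16 byte corruption behaviour can be demonstrated with a short Python sketch. Note the loud assumptions: a toy Feistel cipher built from SHA-256 stands in for AES (it is not secure and is only here so the example runs with the standard library), the tweak is stepped XTS-style by doubling in GF(2^128), and a single key is used for both the tweak and the data. None of this is the Orthrus firmware.

```python
import hashlib

BLOCK = 16     # cipher block size in bytes
SECTOR = 512   # disk block size in bytes


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_encrypt(key, block):
    """4-round Feistel network: a stand-in for AES, NOT a real cipher."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def _double(tweak):
    """Multiply the tweak by alpha in GF(2^128), little-endian as in XTS."""
    n = int.from_bytes(tweak, "little") << 1
    if n >> 128:
        n = (n & ((1 << 128) - 1)) ^ 0x87
    return n.to_bytes(16, "little")


def xex_sector(key, sector_no, data, decrypt=False):
    cipher = toy_decrypt if decrypt else toy_encrypt
    tweak = toy_encrypt(key, sector_no.to_bytes(BLOCK, "little"))
    out = bytearray()
    for off in range(0, len(data), BLOCK):
        whitened = _xor(data[off:off + BLOCK], tweak)   # XOR before the cipher
        out += _xor(cipher(key, whitened), tweak)       # and again after it
        tweak = _double(tweak)
    return bytes(out)


key = b"demo key, not a real one"
plain = bytes(range(256)) * 2
tampered = bytearray(xex_sector(key, 7, plain))
tampered[40] ^= 0x01                    # flip one ciphertext bit (inside block 2)
damaged = xex_sector(key, 7, bytes(tampered), decrypt=True)
bad_blocks = [off // BLOCK for off in range(0, SECTOR, BLOCK)
              if damaged[off:off + BLOCK] != plain[off:off + BLOCK]]
print(bad_blocks)  # [2]
```

Only the 16 byte block containing the flipped bit decrypts differently; the other 31 blocks of the sector come back intact, which is the lack of sector-wide diffusion discussed above.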

    What does this mean for Orthrus? Not much. Orthrus differs from most WDE systems in that Orthrus isn't really intended to be a primary volume (not something on which you'd install an operating system to boot) so much as an offline storage system. It's intended to take away the job of key management for a particular, limited use case. So we're going to stick with XEX.

  • The impact of XEX

    Nick Sayer, 05/07/2017 at 03:14, 0 comments

    Just for completeness' sake, I coded up an implementation of XEX for Orthrus just to do a speed comparison. It's a third slower - around 150 KB/sec instead of 225 KB/sec. I'm fairly confident that most of this stems from the fact that the encryption cannot be precomputed in the background and must be done interactively as the block is read and written from the card. It's not as bad as I had feared, but it's certainly an impact on what is already quite a slow mass storage device.

    Still, I think the weakness of straight counter mode makes the changeover to a very widely used encryption mode for the given purpose seem like a good move. With this change, we can truly say with a straight face that we're doing whole-disk encryption using universally accepted standards.

    Incidentally, if you google it, you'll find that most implementations of WDE talk about using XTS rather than XEX. However, the two are equivalent if the disk sector size is a whole multiple of the cipher block size, which is the case for us. Some implementations use two separate keys - one to encrypt the nonce to form the tweak and one to encrypt or decrypt the data. However, the value of doing that seems (in the literature) to be disputed, so we just use the same key for both. If we had to pick two different keys, we could do so by cutting the volume ID in half and performing the key derivation twice - once on each half.

  • Crypto standards validation

    Nick Sayer, 05/02/2017 at 14:39, 0 comments

    It turns out that the method I'm using to derive the volume key is just AES-CMAC-PRF as described in RFC-4615. In other words, we're just calculating the AES-CMAC-PRF with the concatenation of the two card keys as the "key" and the volume ID as the "data."

    On the other hand, counter mode isn't the best choice for the block encryption. If an adversary can force you to write a known plaintext to a disk block and then observe the encrypted result, they can discover the pre-ciphertext stream for that block. It is then possible for them to trivially recover any plaintext written to that block anytime after that. The only mitigation possible for this scenario is to use a mode that includes the plaintext in the cipher usage itself (rather than just XORing it as the last step). XEX mode is widely used in whole-disk encryption and has this property. The trouble with this for Orthrus is that it means that the pre-ciphertext can no longer be pre-computed in the background, so performance would suffer, possibly fatally (performance is already quite constrained compared to other microSD card readers).
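The known-plaintext problem with counter mode can be shown in a few lines of Python. As a labeled assumption, the keystream here comes from SHA-256 rather than AES, purely so the sketch runs with the standard library; the attack only depends on the same keystream being reused for the same disk block, so the stand-in does not change the point. All names are hypothetical.

```python
import hashlib


def keystream(key, block_no, length=512):
    """Stand-in for the AES-CTR pre-ciphertext of one disk block (not real AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        material = key + block_no.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(material).digest()
        counter += 1
    return out[:length]


def ctr_crypt(key, block_no, data):
    """Encrypt or decrypt: XOR the data with the block's keystream."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, block_no, len(data))))


key = b"volume key the attacker never sees"

# Step 1: the adversary gets a known plaintext written to block 42 and reads the card.
known = b"A" * 512
observed = ctr_crypt(key, 42, known)
recovered_stream = bytes(a ^ b for a, b in zip(known, observed))

# Step 2: something secret is later written to the same block.
secret = b"confidential payload".ljust(512, b".")
later = ctr_crypt(key, 42, secret)

# Step 3: the adversary decrypts it without ever learning the key.
leaked = bytes(a ^ b for a, b in zip(later, recovered_stream))
print(leaked == secret)  # True
```

Because the plaintext is only XORed in as the last step, learning the keystream for a block once is enough to read everything written to that block afterwards.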

    So Orthrus will retain counter mode at least for the initial version. That means that Orthrus won't be resilient against more sophisticated attacks which assume an adversary can force various requests of his choosing.

    An improved performance version of Orthrus would have high-speed USB and perhaps a native SDHC controller of some sort. There are more sophisticated microcontrollers that have these features, and they might have the horsepower to support XEX mode as well (and use AES-256 possibly), but they're 144 pin TQFP or BGA packages and at least double the price of the current device. Not out of the question, but not... today.

  • USART in SPI master mode FTL

    Nick Sayer, 04/30/2017 at 23:57, 0 comments

    I took a scalpel to my Orthrus prototype today to swap the wiring for MOSI and SCK so that I could try out USART0 in SPI Master mode. It took quite a bit of swearing to get the kludge wires to work, but they finally do, and at least with the code I've written, USART0 in SPI master mode is around 5% slower than straight-up SPI.

    I'm surprised by this, but I've stared at the code and experimented with it for a while now and I can't see any improvements to be made. The USART code works, it just doesn't work any better.

    So with that, the final performance numbers I'm getting on a small variety of different SD card makes is around 225 KB/sec.

    It's possible that a future version of LUFA might bring improvements in performance - in particular the ability to use ping-pong buffers might be a big boost (if it's the USB performance that's throttling the system). To test that out, I replaced the disk block read method with one that skips all of the SPI stuff and just reads zeros. That achieved a throughput of ~270 KB/sec - only 20% faster than actually doing the I/O properly.

    So with that, I'm going to declare that v2.0.1 is ready for prime time. v2.0.2 just swaps SCK and MOSI. I will keep that change going forward just in case there's some sort of epiphany down the road that makes USART SPI mode work, but there's no reason not to release the current design now.

  • DMA FTW!

    Nick Sayer, 04/29/2017 at 04:47, 0 comments

    After a lot of fussing around this evening, I finally got DMA based AES working.

    It turns out we have to use 3 DMA channels to get it working - one each to transfer the key and nonce into AES and the third to transfer the pre-ciphertext out. The first two can run simultaneously and there's tricky logic in the ISR (it's common for both of those channels) to figure out when both transfers are finished before starting AES. The third channel triggers on AES completion, and its ISR checks for completion, increments the counter and kicks off the two inward channels.

    The net result is a 20% speed boost. We're now up to 220 KB/sec. And that tops out this hardware rev. We'll have to wait for the next one to come back to see how much (if anything) we get from USART in SPI master mode. And that will likely mark the completion of the project.

    EDIT: If that wasn't enough, I followed it up with automatic AES triggering. That gets rid of the first two ISRs, which gives us another 5 kB/sec. Now AES automatically starts when the key and data are filled in, and then channel 2 is triggered when it's done. The ISR for channel 2 just checks for completion, increments the nonce counter and triggers channels 0 and 1.

  • SCSI bugs ironed out

    Nick Sayer, 04/29/2017 at 00:04, 0 comments

    Thanks to Dean Camera for giving me some hints as to how to best handle a removable media SCSI device on USB. His suggestions were spot on and now the SCSI layer is working absolutely correctly, so far as I can tell.

    So if you yank out a card, you get an immediate "forcible disk ejection" reaction from your host computer. Same thing happens if you push the button - when you push it, the disk is forcibly ejected. If you let go before the time runs out, it's re-mounted without changes, and should be seen by the host just fine. If you push the button long enough, it will re-key the volume and remount the disk, and your host should notice right away that it's all filled with garbage and needs to be initialized with a filesystem.

    Dean, if you weren't aware, is the guy behind LUFA, which is the USB library that Orthrus uses to do the USB stuff. Despite the documentation stating the XMega support is experimental, it seems to be working just fine, at least for what I need it to do. So, big ups to Dean - without him and LUFA, Orthrus wouldn't have been nearly as easy to create.

  • No double-buffered SPI for you!

    Nick Sayer, 04/28/2017 at 01:19, 0 comments

    At the outset, I had hoped that I would be able to switch to using USART0 in SPI master mode by setting the SPI and USART0 bits in the port C REMAP register. The USART0 bit shifts USART0 from bits 0-3 to bits 4-7, which realign it with the SPI bits. But it turns out that the SPI bit swaps the MOSI and XCK pins on the SPI device so that it matches the USART layout, not vice versa.


    So the only way to even try the USART in SPI master mode is to swap the two pins on the board and actually remap the SPI device to match the USART layout, and then try USART in SPI mode by undoing that remap and shifting USART0 up to the top 4 bits of the port.

    I'm going to have to spin some new boards and give that a try. I do think the double buffering that the USART in SPI master mode can do is going to be worth the effort.

    Meanwhile, with interrupt-driven AES and ordinary SPI the current board is getting just shy of 200 kB/sec, which is actually usable. If I can just work out some SCSI behavior issues with the card removal, I'll be able to put the current version up for sale.


  • 1

    Using Orthrus

    To use Orthrus, just stick any two SDHC or SDXC microSD cards in the slots and connect a USB cable to your host. You can do this in the opposite order if you wish - the microSD slots are hot-swappable. If the two cards have not been previously paired with Orthrus, then the error light will turn on. Press and hold the button and the error light will blink for 5 seconds and then the cards will be paired and initialized. At that point the ready light will turn on and the host will see a volume with twice the space of the smaller of the two cards. You will need to use your host to initialize this volume. After that, it works just like any other USB storage. When ejecting the volume, you can either remove the USB cable or the two cards first.

    If you insert an Orthrus paired card into a computer, it will look like a card filled with garbage. If you damage the key block (block 0 on the card), then THE ENTIRE VOLUME ON BOTH CARDS WILL BE DESTROYED. Once the key material is corrupted, then all the data is irrecoverably lost. That's kinda the point, of course.

    There are three lights on Orthrus - ready, activity and error. "Ready" indicates that a correctly matched pair of cards have been inserted and the volume is available to the host. "Error" means that the two cards that are inserted are not a matched pair. You can press the button to pair two such cards, but that will destroy any data on both of them. You can hold the button down for 5 seconds (the error light will blink while you do this) at any time and the two cards will be initialized. If you do this while two paired cards are inserted then all the data on the volume will be destroyed and the volume made ready for new data.

    It does not matter which card of a pair is inserted into each slot. The two slots are marked on the board, but in use they are fungible.
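The button behaviour described in these instructions can be summarized as a small decision function. This Python sketch is only a model of the user-visible behaviour, not the firmware; the action strings and the function name are invented for illustration, and the 5 second threshold is taken from the text above.

```python
HOLD_SECONDS = 5.0   # hold time before the cards are (re)initialized


def button_outcome(held_for, cards_paired):
    """Return the list of actions for one press-and-release of the button."""
    actions = []
    if cards_paired:
        actions.append("eject volume")             # a mounted volume is ejected on press
    if held_for >= HOLD_SECONDS:
        # Long press: new key material, so any old data becomes unrecoverable.
        actions.append("pair cards with new key material")
        actions.append("mount uninitialized volume")
    elif cards_paired:
        actions.append("remount volume unchanged")  # released early, nothing changes
    else:
        actions.append("stay in error state")       # unpaired cards, released early
    return actions


print(button_outcome(1.0, True))
print(button_outcome(6.0, True))
print(button_outcome(6.0, False))
```

The key design point is that a short press is always harmless, while the destructive re-key only happens after a deliberate long hold.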





matt venn wrote 05/09/2017 at 08:58 point

Hey Nick,

thanks a lot for posting this - I learnt a lot over my morning coffee! You made it very easy to browse - even a pdf schematic, nice.


Nick Sayer wrote 05/09/2017 at 13:34 point

You're quite welcome, and thanks for saying so! For security related projects, I feel like it's really important to shine a bright light onto every aspect so it's obvious that nothing is hiding behind a curtain. Inviting scrutiny is the only way you can have any confidence that you got it right.


tz wrote 05/08/2017 at 21:12 point

Depending on configuration, you might find my SPI transfer faster

I also have write protect and password protect routines, and the whole is a minimal FAT32 implementation, originally for Sparkfun's OpenLog.


Nick Sayer wrote 05/08/2017 at 23:13 point

Thanks for that! I'll definitely take a look. I bought an OpenLog a while ago to capture logs from my GPSDOs. I don't need filesystem support for Orthrus, but I'll take any opportunity to see if there are better ways to get the block I/O done.


Nick Sayer wrote 05/09/2017 at 00:51 point

The big difference I see is that you've been very clever about timing your writes to SPDR to eliminate the dead time caused by the read-and-test-branch busy-wait operation. I thought I could achieve something similar by using the ATXmega USART-in-SPI-master mode functionality - the transmit register is double-buffered, which in principle means that you can always have a byte going out. At least for my first experiments that didn't work out so well, but I am contemplating giving that another go at some point.


Clayton G. Hobbs wrote 04/29/2017 at 00:25 point

Very interesting idea!  Couldn't the whole thing be done in software though, using two normal SD card read/writers, and with faster data transfer speeds than are possible with Full Speed USB?


Nick Sayer wrote 04/29/2017 at 00:57 point

Very likely. This does, however, turn the whole concept into an appliance that's very easy to use. You could write a FUSE module to do an interoperable version of this for Linux, certainly.


Martin wrote 04/14/2017 at 08:50 point

Be careful with your entropy generator. When you want to use noise as a RNG you have to keep noise out :-) That means any non-thermal, non-random noise. So you have to use good decoupling and shielding for your noise generator. Otherwise there could be some interference from power line hum or your local (AM) radio station which compromises your randomness and thus your security, because it adds a deterministic element.

If the Atmel is too weak perhaps the recently discussed STM32F103 could be a solution.


Nick Sayer wrote 04/14/2017 at 13:40 point

I plan on gathering a goodly chunk of the entropy from the generator and running it through DieHarder to ensure that it's of good quality, plus it's going to be run through AES to whiten it before it's actually used. This design is well worn. It's the basis for several open hardware entropy source peripherals out there, so I am fairly confident.

