Orthrus

SD card secure RAID USB storage

This project is a hardware mechanism to provide secure "two man control" over a data store. It is a USB microSD card reader, but it requires two cards. The data is striped in the style of RAID 0, but the data is also encrypted with a key that is stored in a key storage block on each card. In essence, each card is useless without the other. With possession of both cards, the data is available without restriction, but with only one, the remaining data is completely opaque.

This allows you to securely transport a data set by writing it onto a pair of cards and separately transporting them to a destination for recombination.

The intent is that only the pairing of two cards becomes in any way special. A card pair could be inserted in any Orthrus device and the data would be made available. But with only one card, all you get is half of the data encrypted with a key which you only half-possess.
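To make the RAID 0 style striping concrete, here is a minimal sketch of how a volume block might map onto the two cards. The stripe unit (one 512-byte block, alternating between cards) and the helper names are assumptions for illustration; the description above only says that the data is striped and that each card carries a key storage block (block 0, per the usage notes). Subtracting that key block from the capacity is likewise an assumption.

```python
BLOCK_SIZE = 512   # bytes per block, the usual SD card block size
KEY_BLOCKS = 1     # assumption: one reserved key block (block 0) on each card


def volume_blocks(card_a_blocks, card_b_blocks):
    """Usable volume size in blocks: twice the smaller card's data area."""
    return 2 * (min(card_a_blocks, card_b_blocks) - KEY_BLOCKS)


def locate(logical_block):
    """Map a volume block to (card index, physical block on that card).

    Assumed layout: even volume blocks live on card 0, odd ones on card 1,
    and data starts right after the key block.
    """
    card = logical_block % 2
    physical = KEY_BLOCKS + logical_block // 2
    return card, physical


if __name__ == "__main__":
    for lb in range(4):
        print(lb, locate(lb))
```

With a layout like this, either card alone holds only every other block of the (encrypted) volume, which is the "half of the data" property described above.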

I'd like to express my gratitude to Dean Camera of the LUFA project.

Orthrus dramatically simplifies the problems of providing a securely encrypted data store. There are no passwords or key material to manage. The act of pairing two cards together automatically creates all of the key material necessary to secure the store without any human action (other than initiating the pairing process by pressing and holding a single button). The security offered by Orthrus is simple to explain and trivial to use. It's simply that if you have both of the cards, you have the data. If you have only one of them, then it is cryptographically opaque.

The basis for Orthrus is an ATXMega32A4U. This is an AVR with a built-in full speed USB interface and hardware AES support. It also has 32K of program memory and 4K of RAM. It has other peripherals, but of chief interest for us is that it has hardware support for SPI and more than a few GPIO pins. At first glance, it would seem to be ridiculous overkill for what we want to achieve, but it's our choice because it's the minimum available chip that includes the AES accelerator, and being able to do AES at north of 1 MB/sec is worth it. Hardware AES not only runs more than 100 times faster than my software implementation, it can also proceed in the background, leaving the CPU free for actual I/O.

To review, SPI works by sharing 3 lines among all of the peripherals - MOSI, MISO and SCK. In addition to that, each peripheral on the bus has a unique chip select line (usually active low, so !CS). For each cycle of the SCK line, one bit is shifted out from the master over MOSI to the slave, and at the same time a bit from the slave is shifted out to the master over MISO. There are four choices for configuration of the polarity and phase of the clock signal relative to the setup and sampling of the two data lines, but the SPI system in the controller will generally shift a byte at a time. Since the AVR SPI system is only single-buffered, there will be inter-byte gaps as the data from the peripheral is read and/or the next byte of data to be written is set up.
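The simultaneous shift in both directions can be illustrated with a small simulation. This is only a sketch of the exchange itself, assuming MSB-first transfers; it ignores clock polarity and phase, chip select, and the hardware buffering discussed here, and the function name is made up for the example.

```python
def spi_exchange(master_byte, slave_byte):
    """Simulate one 8-clock SPI transfer, MSB first.

    Returns (byte received by the master, byte received by the slave).
    """
    master_shift = master_byte & 0xFF
    slave_shift = slave_byte & 0xFF
    for _ in range(8):
        mosi = (master_shift >> 7) & 1   # master drives MOSI with its MSB
        miso = (slave_shift >> 7) & 1    # slave drives MISO with its MSB
        # On the sampling edge, each side shifts in the other side's bit.
        master_shift = ((master_shift << 1) | miso) & 0xFF
        slave_shift = ((slave_shift << 1) | mosi) & 0xFF
    return master_shift, slave_shift


if __name__ == "__main__":
    print([hex(b) for b in spi_exchange(0xA5, 0x3C)])
```

After eight clocks the two shift registers have swapped contents, which is why every SPI write is also a read.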

One way to alleviate those inter-byte gaps is to use USART0 in SPI master mode. When you do this, the transmit register is double-buffered, so you can write a new value to it while one is going out. There is a REMAP register for the port we're using for SPI which allows USART0 to be mapped to the same pins used by SPI. Unfortunately, the USART synchronous pin mapping swaps TXD (MOSI) and the clock pin relative to SPI. There's a bit in the REMAP register to accommodate that as well, but unfortunately Atmel in all their wisdom made this bit work backwards from what you'd expect. The SPI bit in the remap register changes the SPI port wiring to match the USART layout. What this effectively means is that the most versatile way to wire the SPI port is with MOSI and clock swapped and using the upper nibble of the port. If you want to use traditional SPI, you can turn on the SPI flag in REMAP and the SPI subsystem will line up properly. If you want USART in SPI master mode, you turn on the USART0 bit in REMAP and that subsystem is shifted into place with the pins correct.

The controller requires 3.3 volt power. Since that's also what the cards want, there's no need for level shifting and the entire system can run from a single supply. Two SD cards and a rather beefy controller are probably pushing things for an LDO, but a buck converter can be used with almost no extra board space. It is a good idea to provide a mechanism for the controller to turn power to the cards on and off. This way, power can be applied only once the two cards have been inserted. If one card is removed, power to both can be dropped, ensuring that both cards will cold-start once the second is installed. We can use a P channel MOSFET as a power switch, and an AP2331 current limiting switch will ensure that any inrush from the cards won't impact the supply rail for the controller. Since most of the pins we use...

Orthrus_2_0_3.pdf

Schematic as a PDF

Adobe Portable Document Format - 62.63 kB - 05/08/2017 at 15:59

Orthrus_2_0_3.sch

EAGLE Schematic

sch - 358.07 kB - 05/08/2017 at 15:59

Orthrus_2_0_3.brd

EAGLE board file

brd - 96.30 kB - 05/08/2017 at 15:59

OrthrusChallenge.zip

Orthrus Decryption java code and two challenge card images

Zip Archive - 2.00 MB - 05/07/2017 at 17:47

  • 1 × ATXMega32A4U (Microprocessors, Microcontrollers, DSPs / ARM, RISC-Based Microcontrollers)
  • 1 × PAM2305 (Power Management ICs / Switching Regulators and Controllers)
  • 1 × AP3012 (Power Management ICs / Switching Regulators and Controllers)
  • 1 × AP2331 (Discrete Semiconductors / Power Transistors and MOSFETs)
  • 2 × MMBT3904 (Discrete Semiconductors / Transistors, MOSFETs, FETs, IGBTs)

  • Possible SD mux solution

    Nick Sayer, 08/05/2017 at 13:41, 0 comments

    It appears that I may have a good solution for the problem of two SD cards on a single interface. The QS3VH257PAG8 is a 4 x 2:1 bidirectional bus multiplexer. It essentially is an on-off switch for signals going in either direction, and this particular chip's configuration has two switches in parallel with one side wired to a common pin and configured so that a selector input chooses one or the other. Since this chip is a 4x, it would take 2 chips to handle the 6 pins on an SD card. Each chip also has an absolute enable, which we can tie to the GPIO pin from the controller that turns the card power on. The selector lines will go to another GPIO pin that will act as an A/!B line. When the select and enable lines are stable, the maximum propagation delay through the chip is 200 ps, which is mighty small (25 MHz is 40 ns), so I have fairly high hopes it won't get in the way.

    Edit:

    I've designed an experimental board with the v2.0 circuit (without the RNG) and one of these chips in place of the shared SPI bus. The SD initialization code will need a bit of adjustment, as the two separate !CS lines are replaced by an A/!B pin and a shared !CS line. But if it works, it will give me some measure of confidence that this will be an acceptable way to mux the cards on the next variant once the design with the SAME70 is ready. See? Knocking away the question marks a few at a time instead of all at once.
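As a quick sanity check on the figures quoted in this entry (200 ps propagation delay, a 25 MHz clock, 6 signals through 4-channel chips), the arithmetic can be written out as below. The function names are made up for the example, and the check only compares the delay against one clock period; it says nothing about setup and hold margins on a real board.

```python
import math


def clock_period_ns(freq_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz


def mux_chips_needed(signals, channels_per_chip=4):
    """How many 4-channel mux chips it takes to switch the given signal count."""
    return math.ceil(signals / channels_per_chip)


if __name__ == "__main__":
    period = clock_period_ns(25e6)       # period of a 25 MHz clock
    delay_ns = 0.2                       # 200 ps, from the figure quoted above
    print("period (ns):", period)
    print("delay as fraction of period:", delay_ns / period)
    print("chips for 6 signals:", mux_chips_needed(6))
```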

  • Skipping ahead a bit

    Nick Sayer, 08/05/2017 at 02:20, 0 comments

    I got a SAM E70 Xplained development board in the mail today. I still need to figure out what to do with regard to a development environment. I am strongly inclined to use an ARM compiler for Mac and the command line, as it's where I'm most comfortable. But the weight does seem to be behind running Atmel Studio on a Windows VM.

    Meanwhile, one of the nice things about potentially using an ATSAMS70E19 for this is that it has a TRNG built-in. That means that the whole boost converter and avalanche transistor can go away.

    But to make up for that, I need to figure out some way to multiplex the HSMCI port, since the S70 only supports a single SD slot. What really complicates the hell out of things is that there are 6 pins of the HSMCI interface for an SD card: data lines 0-3, a command line, and a clock line (plus power and ground).

    All of those signals except the clock are bidirectional.

    My fervent hope is that all of the data lines can be shared and that just the clock line can be switched back and forth with a simple pair of gates. If that isn't going to fly, then the only choice is forgoing the HSMCI interface and just using SPI. We already know that works from the current generation of Orthrus.

    In theory, a 25 MHz single-bit SPI system could transfer around 3 MB/sec, so having to start from a place of such low throughput would make it hard for the rest of the system to not make it worse, particularly given that there isn't (so far as I am aware) any support for pipelined or multiplexed I/O over USB. A 25 MHz 4 bit setup could do 12 MB/sec, which is much more in line with expectations. The only thing that would really get in the way is the fact that we need to intersperse the AES computations in every 16 bytes of I/O.

    Going to a faster ARM processor would let us go from 16 MHz SPI to 25 MHz as well as going to 480 Mb/sec USB. But I'm dubious that those changes and the faster AES engine by themselves will be enough for us to crack the 1 MB/sec barrier.
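The bus-rate estimates in this entry follow from clock rate times bus width. The sketch below reproduces them as an upper bound only: it assumes one bit per data line per clock and ignores command/response overhead, CRCs, and card latency, so real throughput will be lower. The function name is invented for the example.

```python
def raw_bus_mb_per_s(clock_hz, data_lines):
    """Upper-bound transfer rate in MB/sec for a synchronous serial bus."""
    bits_per_second = clock_hz * data_lines
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    print("1-bit SPI at 25 MHz:", raw_bus_mb_per_s(25e6, 1), "MB/sec")
    print("4-bit SD at 25 MHz:", raw_bus_mb_per_s(25e6, 4), "MB/sec")
```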

  • Best Product Semi-finals!

    Nick Sayer, 08/01/2017 at 20:37, 1 comment

    I am overjoyed that Orthrus has been chosen as one of 20 semi-finalists for the Hackaday Prize best product round. I totally did not see this coming (to be truthful, I expected my other entry to move on). I've not been doing a lot with Orthrus of late mostly because the current design as it exists on Tindie hasn't sold even one and I had other more interesting irons in the fire.

    But all of that changed today!

    The basic functionality of Orthrus as it is today is there, but Orthrus is just too slow to be taken seriously. If this is going to be worthy of the label Best Product, then it needs to be at least within an order of magnitude or so of customary USB/SD mass storage device speeds - something north of 1 MB/sec (instead of the current 150 kB/sec), while retaining the current basic feature set and operational characteristics.

    The last long entry held out hope for the ATSAM4E16E, but it only supports full-speed USB. Given our new expectations, we need to find an interface capable of high speed USB (480 Mb/sec, not 12). That, of course, will bring with it a whole new set of challenges - primarily getting the interface wiring just right. At first glance, the AT32UC3A4128S looks like it might be a contender. They're $6.10 @ Q:1 from DigiKey and in stock. But in addition to the aforementioned high speed USB challenges, this chip also brings with it the challenge of programming over JTAG (which I've never done before) and BGA reflow (which I've also never done before). And since it's BGA, that means moving to a RoHS reflow process, which - again - is something I've never done. I'm also going to have to figure out how to use the hardware SD interface on this chip as well as adapting the existing firmware to the UC3 architecture generally (the good news is that LUFA does support it).

    It's always really nerve-wracking to have so many "firsts" all at one time... the really hard part is if it doesn't work, it's not always easy to tell which of the firsts is the one you've gotten wrong. Fingers crossed.

    EDIT:

    After a nice Twitter conversation with MarkAtMicrochip, another contender is the ATSAMS70N19. There's a nice eval board for the E70, which Mark explained is a superset, so I've ordered one to start getting familiar with the toolchain and architecture and whatnot. One of the remaining questions will be how to multiplex two SD card sockets across a single HSMCI interface, but I can't imagine there isn't some easy way to do that with just a GPIO pin as a "slot select" and some external buffers.

  • Chip choice for next gen

    Nick Sayer, 05/21/2017 at 01:13, 1 comment

    I had a very nice chat just now with a very nice guy from Microchip (he used to work for Atmel before the acquisition). I didn't ask permission to use his name, so I won't mention it, but he was quite helpful. I went over my requirements for the next-gen Orthrus and he recommended the ATSAM4E16E. I haven't looked at the datasheet myself yet, but he said it has high speed USB, hardware AES (with 256 bit keys) and 4 bit SD support, which is exactly the feature set I need.

    Since it requires me to get a whole new toolchain for 32 bit micros, I'm not really beholden to AVR over ARM or PIC (I believe this one is an ARM Cortex-M4), and (again, I haven't verified it - I'm typing this from Maker Faire) these come in QFN packages as opposed to BGA (I'm familiar with the former as opposed to the latter).

    So I think that's definitely a direction to go in.

  • USB VID

    Nick Sayer, 05/11/2017 at 16:35, 0 comments

    I've started a GoFundMe campaign to get a USB VID.

    If you google around, you'll find a couple of avenues to obtain product IDs for open hardware projects. I've inquired of both of them and the silence has been deafening. The other avenue is to use Microchip's VID, since I'm using Microchip chips. Unfortunately, they haven't fixed their sign-up widget since acquiring Atmel, and they're not answering their e-mails either.

    So I have no choice but to "squat" on 0xf055 for the short term and try and raise money to obtain a legitimate USB VID longer term.

    I actually hope that this campaign gets enough notoriety to put some pressure on the USB-IF to solve the problem of USB VID/PID for small manufacturers and makers. That would be a better solution than all of us trying to raise $5K to get a range of 65,536 PIDs of which we will each use a tiny fraction.

  • Switching off the RNG

    Nick Sayer, 05/08/2017 at 16:20, 0 comments

    A lot of folks have said that leaving a transistor in avalanche mode is bad for its long term health, but I'm not convinced. The metrics of that health that continuous avalanche impacts are important for the normal use of the transistor, but in this case the transistor has no other function.

    Still, if nothing else it seems a waste of power to run the 20 V boost converter and avalanche circuit continuously when they're needed only for a few ms every once in a while.

    To that end, I've decided future hardware will include a logic output from the controller to the boost converter's !SHDN pin to allow the avalanche supply to be turned on and off.

    But there's a wrinkle there: you can turn off a boost controller, but there will still be a conduction path from the boost input supply through the inductor and catch diode to the output. Without taking extra steps, you can never turn a boost supply completely off.

    Fortunately, there's an easy solution to this, and it's apparently a classic one. You connect a P channel MOSFET up on the output of the boost converter and connect the gate to the input power supply. P channel MOSFETs are high impedance ("off") when the gate voltage is (nearly) equal to the source voltage, which will be the case when the boost converter isn't switching. Usually you turn on a P channel MOSFET by dropping the gate voltage, but in this case it will turn on because the source voltage will rise relative to the gate. The result is a true on-off controlled "high" voltage power supply. Exactly what we want. This doesn't switch the inverter chip on and off, but that's ok. It should wind up in a stable state without any input and take relatively little power on its own.
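The switching behavior described in this entry can be checked with a rough threshold model of the P channel MOSFET. The specific numbers below are assumptions for illustration only: a 3.3 V input rail, roughly 0.3 V lost across the catch diode when the converter is idle, and a gate threshold of -1 V. A real part's datasheet (including its maximum gate-source voltage) would need to be consulted.

```python
def pmos_conducts(v_source, v_gate, v_threshold=-1.0):
    """Rough on/off model: a P channel MOSFET conducts when Vgs <= Vth (Vth < 0)."""
    v_gs = v_gate - v_source
    return v_gs <= v_threshold


V_IN = 3.3           # assumed input rail, also wired to the MOSFET gate
DIODE_DROP = 0.3     # assumed drop across the catch diode

# Converter idle: the output just follows the input through inductor and diode.
idle_output = V_IN - DIODE_DROP
# Converter running: the output rises to the 20 V avalanche supply.
running_output = 20.0

if __name__ == "__main__":
    print("idle, switch on?", pmos_conducts(idle_output, V_IN))
    print("running, switch on?", pmos_conducts(running_output, V_IN))
```

In this model the gate never moves; the switch turns on purely because the source (the boost output) rises far above it.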

  • On diffusion

    Nick Sayer, 05/08/2017 at 04:39, 0 comments

    In searching around for information about the state of the art in WDE, I came across this particularly interesting article.

    One problem with whole-disk encryption is that you're generally not allowed to alter the block size. At this point, it's almost completely universal that we use disks (or pseudo-disks) that are simple one-dimensional arrays of 512 byte blocks.

    One desirable quality of encryption is that you'd like to know if someone tried to tamper with the ciphertext. In general, this means either using authenticated modes or adding a MAC to the ciphertext. Unfortunately, this means that the ciphertext (or ciphertext plus MAC) is longer than the plaintext. For WDE, this is untenable.

    Since we can't add any bits to the block to authenticate the content, the best we can do is try to use encryption to perturb errors so that an adversary can't, for example, be allowed to flip arbitrary bits in the ciphertext to flip the same bits in the plaintext. Such an adversary would be able to modify files in place, which is almost as good as being able to read them.

    XEX (or XTS) will cause a 16 byte corruption in decrypting a block that has a single bit flipped. That blunts an attacker's ability to modify files. It would, however, be better if the mode could cause an entire block to be corrupted beyond recognition if a single bit of the ciphertext is altered. This property is called diffusion. Diffusion and confusion are two basic properties of a cipher. Confusion means that each bit of the ciphertext relies on more than one bit of the key, and that different bits of the key combine in an unpredictable pattern to alter bits of the ciphertext. Diffusion means more or less the same thing with regard to the plaintext during encryption and ciphertext during decryption. Altering one input bit will cause radical changes to the entire output. Both confusion and diffusion are necessary to prevent statistical analysis of a cipher. This was all worked out by Shannon in 1945.

    Ideally, we'd use a 4096 bit block size cipher for WDE, but that isn't practical. XEX provides confusion by perturbing the plaintext and ciphertext on both sides of the encryption operation, but because it handles each 16 byte AES block individually, it supplies no diffusion.

    So far as I can find, since the BitLocker post was written, there haven't really been significant advances on the diffusion front for WDE. So far as I am aware, most solutions still use plain XTS (or XEX), meaning that a single bit flip will cause a 16 byte aligned block diffusion error and no other changes beyond. It certainly blunts bit-flipping attacks, but doesn't really eliminate every possibility of efficacy.

    What does this mean for Orthrus? Not much. Orthrus differs from most WDE systems in that Orthrus isn't really intended to be a primary volume (not something on which you'd install an operating system to boot) so much as an offline storage system. It's intended to take away the job of key management for a particular, limited use case. So we're going to stick with XEX.
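The "16 byte corruption" behavior of XEX discussed in this entry can be demonstrated with a short sketch. Important assumptions: the block cipher here is a toy 4-round Feistel network built from SHA-256, used only as a stand-in for AES so the example runs with the standard library; it is not secure and is not what Orthrus uses. The tweak schedule (encrypt the sector number, then multiply by alpha in GF(2^128) per 16-byte block, little-endian) follows the XTS convention; the byte order Orthrus actually uses is not stated here. All function names are invented for the example.

```python
import hashlib

BLOCK = 16  # cipher block size in bytes, as with AES


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    # Toy round function; a stand-in only, NOT AES and NOT secure.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_encrypt(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def double(tweak):
    """Multiply a 16-byte tweak by alpha in GF(2^128), XTS-style."""
    n = int.from_bytes(tweak, "little") << 1
    if n >> 128:
        n = (n & ((1 << 128) - 1)) ^ 0x87
    return n.to_bytes(16, "little")


def xex_sector(key, sector_number, data, decrypt=False):
    """Encrypt or decrypt one sector, block by block, with a per-block tweak."""
    tweak = toy_encrypt(key, sector_number.to_bytes(16, "little"))
    cipher = toy_decrypt if decrypt else toy_encrypt
    out = bytearray()
    for off in range(0, len(data), BLOCK):
        chunk = data[off:off + BLOCK]
        out += _xor(cipher(key, _xor(chunk, tweak)), tweak)
        tweak = double(tweak)
    return bytes(out)


def damaged_blocks(flip_byte, sector_number=7):
    """Flip one ciphertext bit and list which 16-byte blocks decrypt wrongly."""
    key = b"0123456789abcdef"            # arbitrary demo key
    plain = bytes(range(256)) * 2        # one 512-byte sector
    ct = bytearray(xex_sector(key, sector_number, plain))
    ct[flip_byte] ^= 0x01
    recovered = xex_sector(key, sector_number, bytes(ct), decrypt=True)
    return sorted({i // BLOCK for i in range(len(plain)) if recovered[i] != plain[i]})


if __name__ == "__main__":
    print(damaged_blocks(40))   # byte 40 lies in 16-byte block 2
```

The damage stays inside the one 16-byte block that contained the flipped bit, while the other 31 blocks of the sector decrypt correctly. That is the limited diffusion described above.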

  • The impact of XEX

    Nick Sayer, 05/07/2017 at 03:14, 0 comments

    Just for completeness' sake, I coded up an implementation of XEX for Orthrus just to do a speed comparison. It's a third slower - around 150 KB/sec instead of 225 KB/sec. I'm fairly confident that most of this stems from the fact that the encryption cannot be precomputed in the background and must be done interactively as the block is read and written from the card. It's not as bad as I had feared, but it's certainly an impact on what is already quite a slow mass storage device.

    Still, I think the weakness of straight counter mode makes the changeover to a very widely used encryption mode for the given purpose seem like a good move. With this change, we can truly say with a straight face that we're doing whole-disk encryption using universally accepted standards.

    Incidentally, if you google it, you'll find that most implementations of WDE talk about using XTS rather than XEX. However, the two are equivalent if the disk sector size is an even multiple of the cipher block size, which is the case for us. Some implementations use two separate keys - one to encrypt the nonce to form the tweak and one to encrypt or decrypt the data. However, the value of doing that seems (in the literature) to be disputed, so we just use the same key for both. If we had to pick two different keys, we could do so by cutting the volume ID in half and performing the key derivation twice - once on each half.

  • Crypto standards validation

    Nick Sayer, 05/02/2017 at 14:39, 0 comments

    It turns out that the method I'm using to derive the volume key is just AES-CMAC-PRF as described in RFC 4615. In other words, we're just calculating the AES-CMAC-PRF with the concatenation of the two card keys as the "key" and the volume ID as the "data."

    On the other hand, counter mode isn't the best choice for the block encryption. If an adversary can force you to write a known plaintext to a disk block and then observe the encrypted result, they can discover the pre-ciphertext stream for that block. It is then possible for them to trivially recover any plaintext written to that block anytime after that. The only mitigation possible for this scenario is to use a mode that includes the plaintext in the cipher usage itself (rather than just XORing it as the last step). XEX mode is widely used in whole-disk encryption and has this property. The trouble with this for Orthrus is that it means that the pre-ciphertext can no longer be pre-computed in the background, so performance would suffer, possibly fatally (performance is already quite constrained compared to other microSD card readers).

    So Orthrus will retain counter mode at least for the initial version. That means that Orthrus won't be resilient against more sophisticated attacks which assume an adversary can force various requests of his choosing.

    An improved performance version of Orthrus would have high-speed USB and perhaps a native SDHC controller of some sort. There are more sophisticated microcontrollers that have these features, and they might have the horsepower to support XEX mode as well (and use AES-256 possibly), but they're 144 pin TQFP or BGA packages and at least double the price of the current device. Not out of the question, but not... today.
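The known-plaintext weakness of counter mode described in this entry can be sketched in a few lines. Assumption: the keystream generator below is built from SHA-256 purely as a stand-in for AES in counter mode, so the example needs only the standard library; the attack does not depend on which generator is used, only on the same keystream being reused for every write to a given block. The names are invented for the example.

```python
import hashlib


def keystream(key, block_number, length=512):
    """Stand-in for the per-block counter-mode keystream (NOT real AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + block_number.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def ctr_crypt(key, block_number, data):
    """Encrypt or decrypt: XOR the data with the block's keystream."""
    stream = keystream(key, block_number, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def recover_later_write():
    key = b"volume key the attacker never sees"
    block_number = 42

    # Step 1: the adversary gets a known plaintext written and reads the result.
    known = b"A" * 512
    observed = ctr_crypt(key, block_number, known)
    stolen_stream = bytes(a ^ b for a, b in zip(known, observed))

    # Step 2: the victim later writes something secret to the same block.
    secret = b"something private".ljust(512, b".")
    later_ciphertext = ctr_crypt(key, block_number, secret)

    # Step 3: the adversary decrypts it without ever learning the key.
    guess = bytes(a ^ b for a, b in zip(later_ciphertext, stolen_stream))
    return guess, secret


if __name__ == "__main__":
    guess, secret = recover_later_write()
    print("recovered:", guess == secret)
```

A mode such as XEX avoids this because the plaintext passes through the cipher itself rather than being XORed with a reusable stream.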

  • USART in SPI master mode FTL

    Nick Sayer, 04/30/2017 at 23:57, 0 comments

    I took a scalpel to my Orthrus prototype today to swap the wiring for MOSI and SCK so that I could try out USART0 in SPI Master mode. It took quite a bit of swearing to get the kludge wires to work, but they finally do, and at least with the code I've written, USART0 in SPI master mode is around 5% slower than straight-up SPI.

    I'm surprised by this, but I've stared at the code and experimented with it for a while now and I can't see any improvements to be made. The USART code works, it just doesn't work any better.

    So with that, the final performance numbers I'm getting on a small variety of different SD card makes are around 225 KB/sec.

    It's possible that a future version of LUFA might bring improvements in performance - in particular the ability to use ping-pong buffers might be a big boost (if it's the USB performance that's throttling the system). To test that out, I replaced the disk block read method with one that skips all of the SPI stuff and just reads zeros. That achieved a throughput of ~270 KB/sec - only 20% faster than actually doing the I/O properly.

    So with that, I'm going to declare that v2.0.1 is ready for prime time. v2.0.2 just swaps SCK and MOSI. I will keep that change going forward just in case there's some sort of epiphany down the road that makes USART SPI mode work, but there's no reason not to release the current design now.
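The two throughput figures in this entry (about 225 KB/sec with real card I/O, about 270 KB/sec with the card reads stubbed out) give a rough split of where the time goes. The sketch below assumes the USB side and the card side run strictly one after the other with no overlap, which is a simplification; the function name is invented for the example.

```python
def time_shares(full_rate_kb_s, usb_only_rate_kb_s):
    """Split per-kilobyte transfer time into (USB share, card I/O share)."""
    t_full = 1.0 / full_rate_kb_s          # seconds per KB with real card I/O
    t_usb = 1.0 / usb_only_rate_kb_s       # seconds per KB with I/O stubbed out
    usb_share = t_usb / t_full
    card_share = (t_full - t_usb) / t_full
    return usb_share, card_share


if __name__ == "__main__":
    usb, card = time_shares(225.0, 270.0)
    print("USB and everything else: %.1f%%" % (usb * 100))
    print("SPI and card I/O:        %.1f%%" % (card * 100))
```

Under that assumption, the card I/O accounts for only about a sixth of the total time, which is consistent with the suspicion that the USB side is the bottleneck.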

  • Step 1

    Using Orthrus

    To use Orthrus, just stick any two SDHC or SDXC microSD cards in the slots and connect a USB cable to your host. You can do this in the opposite order if you wish - the microSD slots are hot-swappable. If the two cards have not been previously paired with Orthrus, then the error light will turn on. Press and hold the button and the error light will blink for 5 seconds and then the cards will be paired and initialized. At that point the ready light will turn on and the host will see a volume with twice the space of the smaller of the two cards. You will need to use your host to initialize this volume. After that, it works just like any other USB storage. When ejecting the volume, you can either remove the USB cable or the two cards first.

    If you insert an Orthrus paired card into a computer, it will look like a card filled with garbage. If you damage the key block (block 0 on the card), then THE ENTIRE VOLUME ON BOTH CARDS WILL BE DESTROYED. Once the key material is corrupted, then all the data is irrecoverably lost. That's kinda the point, of course.

    There are three lights on Orthrus - ready, activity and error. "Ready" indicates that a correctly matched pair of cards have been inserted and the volume is available to the host. "Error" means that the two cards that are inserted are not a matched pair. You can press the button to pair two such cards, but that will destroy any data on both of them. You can hold the button down for 5 seconds (the error light will blink while you do this) at any time and the two cards will be initialized. If you do this while two paired cards are inserted then all the data on the volume will be destroyed and the volume made ready for new data.

    It does not matter which card of a pair is inserted into each slot. The two slots are marked on the board, but in use they are fungible.

Discussions

matt venn wrote 05/09/2017 at 08:58

Hey Nick,

thanks a lot for posting this - I learnt a lot over my morning coffee! You made it very easy to browse - even a pdf schematic, nice.

Nick Sayer wrote 05/09/2017 at 13:34

You're quite welcome, and thanks for saying so! For security related projects, I feel like it's really important to shine a bright light onto every aspect so it's obvious that nothing is hiding behind a curtain. Inviting scrutiny is the only way you can have any confidence that you got it right.

tz wrote 05/08/2017 at 21:12

Depending on configuration, you might find my SPI transfer faster

https://github.com/tz1/sparkfun/blob/master/fat32lib/sdhc.c

I also have write protect and password protect routines, and the whole is a minimal FAT32 implementation, originally for Sparkfun's OpenLog.

Nick Sayer wrote 05/08/2017 at 23:13

Thanks for that! I'll definitely take a look. I bought an OpenLog a while ago to capture logs from my GPSDOs. I don't need filesystem support for Orthrus, but I'll take any opportunity to see if there are better ways to get the block I/O done.

Nick Sayer wrote 05/09/2017 at 00:51

The big difference I see is that you've been very clever about timing your writes to SPDR to eliminate the dead time caused by the read-and-test-branch busy-wait operation. I thought I could achieve something similar by using the ATXmega USART-in-SPI-master mode functionality - the transmit register is double-buffered, which in principle means that you can always have a byte going out. At least for my first experiments that didn't work out so well, but I am contemplating giving that another go at some point.

Clayton G. Hobbs wrote 04/29/2017 at 00:25

Very interesting idea!  Couldn't the whole thing be done in software though, using two normal SD card read/writers, and with faster data transfer speeds than are possible with Full Speed USB?

Nick Sayer wrote 04/29/2017 at 00:57

Very likely. This does, however, turn the whole concept into an appliance that's very easy to use. You could write a FUSE module to do an interoperable version of this for Linux, certainly.

Martin wrote 04/14/2017 at 08:50

Be careful with your entropy generator. When you want to use noise as a RNG you have to keep noise out :-) That means any non thermal, non random noise. So you have to use good decoupling and shielding for your noise generator. Otherwise there could be some interference from power line hum or your local (AM) radio station which compromises your randomness and thus your security, because it adds a deterministic element.

If  the Atmel is too weak perhaps the recently discussed STM32F103 could be a solution.

Nick Sayer wrote 04/14/2017 at 13:40

I plan on gathering a goodly chunk of the entropy from the generator and running it through DieHarder to ensure that it's of good quality, plus it's going to be run through AES to whiten it before it's actually used. This design is well worn. It's the basis for several open hardware entropy source peripherals out there, so I am fairly confident.
