In my previous post I proposed a series of primitive data types with only three possible sizes: 1 tryte (9 trits), 3 trytes (27 trits) and 6 trytes (54 trits).
Simplifying the number of data type sizes is not just an aesthetic desire for design elegance. It could also potentially improve the performance of reads from RAM. Conventional memory devices are designed to access chunks of data that are multiples of a byte. This is because reading from memory is agonizingly slow compared to the speed of a modern processor. If a memory device accessed each bit individually, it would have to perform eight separate actions to read or store a single byte, and each one of those actions would be at the same slow rate (slow compared to the processor). The most time-efficient solution would be to only ever deal with chunks of memory the same size as the largest possible data type; that way there is no decision-making process and no need for multiple reads. However, this would waste a lot of space, since even a char would take up 8 bytes (64 bits).
There are various architectures, but most RAM devices take the middle road: they store data in individual bytes (never bits) while providing access to 4 or 8 bytes at a time. These groups of bytes are called chunks. If the processor is storing a char, it only uses one of the bytes in the chunk, a short would use two, and so on. The problem is that this can lead to misaligned data storage. For example, an integer takes up four bytes but could span across the boundary between two four-byte chunks in memory. When this happens, the processor has to read both chunks in, do a series of logical shifts on each chunk to remove its unwanted bytes, and finally combine the two to recover the original integer. You end up performing half a dozen processor operations just to get the data you originally wanted to operate on. Having fewer data sizes means fewer misalignments overall and fewer operations necessary to recover misaligned data.
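To make the shift-and-combine dance concrete, here is a rough Python sketch of what the processor has to do for a misaligned 32-bit integer. The function name, the little-endian layout, and the 4-byte chunk size are my assumptions for illustration, not a description of any particular processor:

```python
def read_misaligned_u32(memory: bytes, addr: int, chunk_size: int = 4) -> int:
    # Find the chunk-aligned addresses the value touches.
    first = (addr // chunk_size) * chunk_size
    second = first + chunk_size
    # Read whole chunks, as the memory device would present them.
    lo = int.from_bytes(memory[first:first + chunk_size], "little")
    hi = int.from_bytes(memory[second:second + chunk_size], "little")
    off = addr - first                       # offset into the first chunk, in bytes
    lo >>= off * 8                           # shift away the unwanted leading bytes
    hi &= (1 << (off * 8)) - 1               # mask off to keep only the overlapping bytes
    # Combine the surviving pieces of both chunks into one value.
    return (lo | (hi << ((chunk_size - off) * 8))) & 0xFFFFFFFF

mem = bytes([0x00, 0x00, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00])
# The integer 0x12345678 starts at address 2, straddling two 4-byte chunks.
print(hex(read_misaligned_u32(mem, 2)))  # -> 0x12345678
```

Two reads, two shifts, a mask, and a combine: that's the half-dozen operations hiding behind one misaligned load.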
If we use our 9-trit, 27-trit, 54-trit example above, a memory device that presented 3-tryte chunks (27 trits) would have only one kind of boundary to cross, and only two ways to cross it. The first is that a 3-tryte data type could be misaligned with a 3-tryte chunk. The second is that a 6-tryte data type, which always spans two chunks, could also be misaligned with the chunks and therefore span across three of them, so three reads would be needed to recover the data. This is analogous to a 64-bit double being misaligned with the 4-byte chunks in memory: the two chunks holding most of the double would need to be read, as well as the chunk containing the overlapping piece.
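The chunk-counting here is easy to check with a few lines of Python. This is just arithmetic over the sizes in the post; the helper name is mine:

```python
def chunks_spanned(start_trit: int, size_trits: int, chunk_trits: int) -> int:
    # Number of chunks a value touches, given its starting trit offset.
    first_chunk = start_trit // chunk_trits
    last_chunk = (start_trit + size_trits - 1) // chunk_trits
    return last_chunk - first_chunk + 1

# An aligned 6-tryte (54-trit) value in 3-tryte (27-trit) chunks: two reads.
print(chunks_spanned(0, 54, 27))   # -> 2
# Misaligned by one tryte (9 trits): it now spans three chunks.
print(chunks_spanned(9, 54, 27))   # -> 3
# A 1-tryte value always fits in a single 3-tryte chunk.
print(chunks_spanned(9, 9, 27))    # -> 1
```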
If a memory device presented 6-tryte chunks instead of 3-tryte chunks, both situations would still be possible, but in the case of a misaligned 54-trit data type only two reads would be necessary, since a 6-tryte variable couldn't span three 6-tryte chunks no matter how misaligned it was. The downside is that more shifts would be needed to correct a misalignment, and more space would be wasted by data types smaller than 54 trits.
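A small brute force over the possible tryte-aligned starting offsets confirms the worst-case read counts for the two chunk sizes. Again a sketch, with a function name of my own choosing:

```python
def max_reads(size_trits: int, chunk_trits: int) -> int:
    # Worst-case number of chunk reads over all tryte-aligned (9-trit) offsets.
    worst = 0
    for start in range(0, chunk_trits, 9):
        first = start // chunk_trits
        last = (start + size_trits - 1) // chunk_trits
        worst = max(worst, last - first + 1)
    return worst

print(max_reads(54, 27))  # -> 3: misalignment can force a third read
print(max_reads(54, 54))  # -> 2: never more than two 6-tryte chunks
```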
Of course, this all presupposes the availability of memory devices that not only store ternary values but also present chunks in multiples of three rather than two. In the meantime, an intermediary device to do translation between a prototype ternary processor and a conventional memory device wouldn't be exceptionally difficult to throw together. This would support proof-of-concept work on a ternary processor.