Header and payload checksums with PEAC (deprecated)

Update 20230620 : The header format has changed ! Check the newer log.

-------------------------------------

I have played with different ideas for header checksums for a long time. At one point I even considered Hamming SECDED but #PEAC is "just right" (and it works with 16 bits of granularity). Here I'll define the header, how to build it and how to check it.

Let's have a look at a description of the S/PDIF logic format : https://www.ni.com/fr-fr/support/documentation/supplemental/06/developing-a-spdif-input-module-in-labview-fpga.html (and kudos to #Propeller S/PDIF Receiver as well)

Sample data can use 16 to 24 bits, but the packet structure is not strictly defined and the number of channels is left to the implementation. This could create an ambiguity in the ordering of the samples, although there is usually a good understanding of what is left and what is right : S/PDIF differentiates channel A (left) and channel B (right) in the header. The S/PDIF stream could be either mono or stereo, but an adaptation layer should properly reorder the samples (and pre-parse packets). Let's assume for now that we get a continuous stream of 16-bit words, which may be either both channels interleaved or a mono stream. This data could also come from a serial port or a UDP socket... It's just a stream. Just make sure that you don't mix up left and right, that a mono stream is not expanded into duplicated samples in a stereo stream, that the channels are not shifted, and so on.

In a "continuous stream" context, each packet/message should ideally be separated by a pair of 0 samples. If transmitting over UDP however, this is not required since the datagram provides a clear structure of the dataflow. Thus the header format does not include the extra leading or trailing 0s (they are a bit superfluous, maybe I am superstitious). See also 12. Streams vs packets.

S/PDIF transmits LSB first and ChanA/Left first. The header must be aligned to this 32-bit boundary, even in a mono stream : just pad with a 0 sample until the header is correctly aligned.
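
The padding rule can be sketched as follows. This is only an illustration under stated assumptions : the helper names `n00n_pad_words` and `n00n_align_stream` are hypothetical, and so is the idea of tracking the number of 16-bit samples written since the start of the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of zero samples (0 or 1) needed so that the next 16-bit word
   falls on a 32-bit (channel A / left) boundary.
   words_sent : hypothetical count of 16-bit samples already written. */
static size_t n00n_pad_words(size_t words_sent) {
  return words_sent & 1u;
}

/* Appends one 0 sample if the stream is misaligned and returns the
   new word count. The caller must leave room for one more sample. */
static size_t n00n_align_stream(uint16_t *out, size_t words_sent) {
  assert(out != NULL);
  if (n00n_pad_words(words_sent))
    out[words_sent++] = 0;
  return words_sent;
}
```

An encoder would call `n00n_align_stream()` right before writing Sign1, so that the 8 header words always occupy four complete left/right pairs.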

Let's now look at the header's structure :

typedef struct {
  uint16_t
     Sign1,  // 'N', '0'
     Sign2,  // '0', 'n'
     Timecode_Frac,
     Timecode_Sec,
     Channel_ID,
     Type_Flags,
     Payload_Words,
     Header_checksum;
} N00N_header_struct;

To prevent Endian problems, I use only this 16-bit representation. Specific implementations could be optimised and process 32 bits at once but a reference is needed first.

Two fields Sign1 and Sign2 are fixed so they are not considered during the checksum. The last field is the checksum itself, the result, so only 5 fields remain. PEAC16x2 uses 2 additions per word, plus a finishing round, so that's about 12 arithmetic operations. It's not a lot so they can be unrolled. I started from the reference code of PEAC found in snippets_v3.c :

void CHKS(uint16_t m) {
  // extra caution with the extension of the sizes:
  CHKSC += CHKSX;     // 0..10001
  CHKSC += CHKSY;     // 0..2FFFF
  CHKSY  = CHKSX + m; // 0..1FFFE
  CHKSX  = CHKSC;     // 0..FFFF
  CHKSC >>= 16;       // 0, 1 or 2
}
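
The snippet relies on three variables that are not declared here. A minimal way to run it is sketched below, with two assumptions : the widths are deduced from the range comments (X is truncated to 16 bits, Y and C are wider), and the state is seeded with X=0xABCD, Y=0x4567, C=0, which are the PEAC16X2_INIT values of the unrolled version. The `chks_words` helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State of the reference checksum (widths and seeds are assumptions) */
static uint32_t CHKSC = 0;       /* carry : 0, 1 or 2 between calls */
static uint16_t CHKSX = 0xABCD;  /* assumed seed, truncated to 16 bits */
static uint32_t CHKSY = 0x4567;  /* assumed seed, may grow to 0x1FFFE */

static void CHKS(uint16_t m) {
  CHKSC += CHKSX;
  CHKSC += CHKSY;
  CHKSY  = (uint32_t)CHKSX + m;
  CHKSX  = (uint16_t)CHKSC;
  CHKSC >>= 16;
}

/* Hypothetical helper : reseeds the state, mixes n words,
   then folds C, X and Y into a 16-bit result. */
static uint16_t chks_words(const uint16_t *w, size_t n) {
  assert(w != NULL || n == 0);
  CHKSC = 0; CHKSX = 0xABCD; CHKSY = 0x4567;
  for (size_t i = 0; i < n; i++)
    CHKS(w[i]);
  return (uint16_t)(CHKSC + CHKSX + CHKSY);
}
```

With these seeds, five calls followed by the final fold should give the same value as the unrolled header checksum, since the unrolled code only swaps the roles of X and Y every other round.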

I unrolled, substituted some variables and values, et voilà :

#define PEAC16X2_INIT1 (0xABCD)
#define PEAC16X2_INIT2 (0x4567)

uint16_t N00N_header_checksum_ref2(N00N_header_struct* header) {
  uint32_t C, X, Y;

  C  = ((2*PEAC16X2_INIT1) + PEAC16X2_INIT2) + header->Timecode_Frac;
  Y  =   ( PEAC16X2_INIT1  + PEAC16X2_INIT2) + header->Timecode_Sec;
  X  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  X += header->Channel_ID;
  Y  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  Y += header->Type_Flags;
  X  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  X += header->Payload_Words;
  Y  = C >> 16;

  return (uint16_t)(C+X+Y);
}

Total : 20 operations, half of them are pairable (ILP>2)

This is used both by the encoder and the decoder, so there is no discrepancy or code duplication.
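
To illustrate how one function can serve both sides, here is a minimal sketch. `n00n_header_seal` and `n00n_header_ok` are hypothetical names, the signature words are deliberately left unchecked because their exact packing is not detailed here, and the struct and checksum are repeated only to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PEAC16X2_INIT1 (0xABCD)
#define PEAC16X2_INIT2 (0x4567)

typedef struct {
  uint16_t Sign1, Sign2, Timecode_Frac, Timecode_Sec,
           Channel_ID, Type_Flags, Payload_Words, Header_checksum;
} N00N_header_struct;

/* Same code as the unrolled version above */
uint16_t N00N_header_checksum_ref2(N00N_header_struct* header) {
  uint32_t C, X, Y;

  C  = ((2*PEAC16X2_INIT1) + PEAC16X2_INIT2) + header->Timecode_Frac;
  Y  =   ( PEAC16X2_INIT1  + PEAC16X2_INIT2) + header->Timecode_Sec;
  X  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  X += header->Channel_ID;
  Y  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  Y += header->Type_Flags;
  X  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  X += header->Payload_Words;
  Y  = C >> 16;

  return (uint16_t)(C+X+Y);
}

/* Encoder side (hypothetical) : fill in the last field */
static void n00n_header_seal(N00N_header_struct* h) {
  assert(h != NULL);
  h->Header_checksum = N00N_header_checksum_ref2(h);
}

/* Decoder side (hypothetical) : recompute and compare */
static bool n00n_header_ok(N00N_header_struct* h) {
  assert(h != NULL);
  return h->Header_checksum == N00N_header_checksum_ref2(h);
}
```

Because Payload_Words is added just before the final sum, changing it by one changes the result by one, so a header with a corrupted length is rejected by `n00n_header_ok()`.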

From there it is also easy to deduce the corresponding code for the payload's checksum :

void N00N_payload_checksum(uint16_t* buffer,
    uint16_t* r, uint16_t* s, uint16_t Payload_Words) {
  uint32_t C, X=0, Y;

  C = Payload_Words
    + ((uint32_t)PEAC16X2_INIT1 << 16) // cast first : 0xABCD << 16 overflows a 32-bit signed int
    +  PEAC16X2_INIT2;
  Y = C >> 16;
  C &= 0xFFFF;

  while (Payload_Words > 1) {
    Payload_Words--;
    C += X+Y;
    X += buffer[0];
    Y  = C & 0xFFFF;
    C >>= 16;

    C += X+Y;
    Y += buffer[1];
    X  = C & 0xFFFF;
    C >>= 16;

    buffer+=2;
  }

  C += X+Y;
  Y += PEAC16X2_INIT1;
  *r = (uint16_t)C;
  *s = (uint16_t)Y;
}

Pro tip :

If you're not sure the code really is a PEAC algorithm, remove all the references to external data, and you get the typical "add X to Y and Y to X" mantra.
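
Applied to the CHKS() reference code, the tip gives something like the sketch below. It is simply CHKS() with the `m` term deleted and the three variables moved into a struct; `peac_state` and `peac_step` are hypothetical names used for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint32_t C;  /* carry left over from the previous step */
  uint16_t X;  /* truncated to 16 bits, as in CHKS() */
  uint32_t Y;
} peac_state;

/* One step with no external data : the new X is the 16-bit sum of
   X, Y and the old carry, the new Y is the old X. */
static void peac_step(peac_state *s) {
  assert(s != NULL);
  s->C += s->X;
  s->C += s->Y;
  s->Y  = s->X;
  s->X  = (uint16_t)s->C;
  s->C >>= 16;
}
```

Each step adds the two values together, keeps the previous one, and feeds the overflow back into the next addition.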

...

The inner loop is about 15 opcodes for 4 bytes. It's a bit slower than Fletcher but way more secure, and more than half of the opcodes can be executed in parallel. It's not the fastest ever but it remains simple, with very low overhead : there is no corner case to handle because the payload's granularity is 32 bits for the whole protocol. There could be a way to make it even better by loading the buffer's values 32 bits at a time, since modern CPUs don't like 16-bit memory accesses, but I won't go there yet. I don't want to deal with Endianness, and no mainstream programming language lets you handle the carry after an addition :-(
