
More drafting

A project log for n00n - Real Time Music Sensor Streaming Protocol

MIDI is so outdated, welcome to the 20s and the 16-bit world !

Yann Guidon / YGDES, 12/02/2022 at 14:51

(updated 20230625)

This project has been on hold for a long time because, for now, the #PEAC Pisano with End-Around Carry algorithm is a critical priority and it must be explored and examined thoroughly before I can move on. But ideas keep bubbling up...

I wanted to rename this format "NOON" because "MIDI" means noon in French. Whatever, the domain names are all taken. RTFM domain names are mostly taken as well so... I'll have to find a better definitive name. Later.


UPDATE 20221215 : n00n.org is now dedicated to the project.

All that aside, one of the main characteristics of the format is "it's a bit over the top". Let's allocate ample space because there is never enough of it once people start to use it. The most direct sign is its granularity : 16 bits per value, grouped in pairs, little-endian style, so the basic data transmission element is a 32-bit word. Just like an S/PDIF stereo sound sample from a CD player (though S/PDIF can carry up to 24 bits per sample if all the bits are used). This eases alignment a LOT for modern CPUs and also helps with the checksum optimisation (fewer corner cases to manage).
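To make the 32-bit word concrete, here is a minimal sketch of packing a pair of 16-bit values. The helper names are made up for illustration, and which value of the pair comes first is an assumption: the text only says "little-endian style".

```python
import struct

def pack_pair(first: int, second: int) -> bytes:
    """Pack two unsigned 16-bit values into one little-endian 32-bit word.

    Assumption: the first value of the pair sits in the low half-word.
    """
    return struct.pack("<HH", first, second)

def unpack_pair(word: bytes) -> tuple[int, int]:
    """Split a 4-byte little-endian word back into its two 16-bit values."""
    return struct.unpack("<HH", word)

word = pack_pair(0x1234, 0xABCD)
as_u32 = struct.unpack("<I", word)[0]  # the same 4 bytes read as one 32-bit integer
```

Every field is then a whole number of half-words, so a parser never has to deal with odd byte offsets.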

Since we're using pairs of 16-bit values, the time code can become "sort of" an absolute time code, with 15 bits for the number of seconds, and a fractional 16-bit part (so it's a S15.16 fixed point value, 9 hours should be enough and negative times are allowed). It's a good compromise that evades the question of the sampling speed, and signal sources can use an 18,432,000 Hz clock source, divide by 5×5×5×3×3=1125 and get a clean 16384 ticks per second resolution (each tick is then 4 units of the 16-bit fraction). So clock drift is mostly avoided and sources with different clock speeds or sampling speeds can adjust easily. A 1pps signal could even synchronise all the devices if needed, to reduce inter-device drifts/biases.
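The arithmetic can be checked with a few lines. This is only a sketch: the function names are hypothetical, and storing negative times as 32-bit two's complement is my assumption (the text only says that negative times are allowed).

```python
CLOCK_HZ = 18_432_000
DIVIDER = 5 * 5 * 5 * 3 * 3               # 1125 = 5^3 * 3^2
TICKS_PER_SECOND = CLOCK_HZ // DIVIDER     # 16384 = 2**14, exact division
FRACTION_UNITS = 1 << 16                   # the 16-bit fractional part

def ticks_to_timecode(seconds: int, ticks: int) -> int:
    """Build a S15.16 timecode, as a signed count of 1/65536 s units."""
    return seconds * FRACTION_UNITS + ticks * (FRACTION_UNITS // TICKS_PER_SECOND)

def encode_timecode(timecode: int) -> int:
    """Signed timecode -> raw 32-bit word (two's complement, an assumption)."""
    if not -(1 << 31) <= timecode < (1 << 31):
        raise ValueError("timecode does not fit in S15.16")
    return timecode & 0xFFFFFFFF

def decode_timecode(word: int) -> int:
    """Raw 32-bit word -> signed timecode."""
    return word - (1 << 32) if word & 0x80000000 else word

def timecode_to_seconds(timecode: int) -> float:
    return timecode / FRACTION_UNITS

# Largest positive timecode, in hours: a bit more than 9.
max_hours = timecode_to_seconds((1 << 31) - 1) / 3600
```

Since 16384 divides 65536, a source running at this tick rate never needs rounding: one tick is exactly 4 units of the fraction.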

Some of the packets can contain absolute, raw sample data, at least one per second, up to the maximum possible rate (64Ki per second ?) but bandwidth can be reduced through compaction and compression (looking at you #Recursive Range Reduction (3R) HW&SW CODEC)

At first the purpose is to transmit absolute key pressure in real time, so it's not a sound format, though other things than keys can be sampled, in particular knob or pedal positions. This means that each type or family of sensor must be on a separate "channel", which is again a 16-bit value. The companion half-word will be flags for the channel, as well as type (so each sink can "accept" a stream of the proper type, knobs will be used as a different "auxiliary" input while the keys are the main input)

So far we have packets split into a fixed-size header and payload, each protected by a word or half-word PEAC16x2 checksum.

Header

Pretty basic and general, not too compact but who cares today. After all it's not 32 bits per value right ? I try to keep the features down to the minimum : no complex fragmentation info for example, and the timecode (timestamp) has the dual use of a sequence number.

That looks nice and square : 16 bytes (not counting the prefix) or 128 bits fits nicely in a cache line. Checksumming the header can take a short, fixed amount of code (the prefix is known, so there is no need to checksum it) : see log "4. Header and payload checksums with PEAC (deprecated)".

The CTF (channel, type, flags) field can be flexible because one channel can accept multiple types so... Maybe we can define subtypes or type ranges. There is a bit of wiggle room, and uncertainty.

Note : the header has its own shorter checksum to prevent corruption of the size field. Along with the "fixed" prefix, there are 32+16=48 bits for the parser to perform a first rough validation of the header. The timecode must be consistent too, and the CTF must also be coherent. If something does not smell right, the packet can be discarded, since another will arrive very soon and the gap should not be noticeable.
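Here is a rough sketch of that first validation pass. Almost everything in it is an assumption made for illustration: the prefix value, the order of the fields, the set of fields covered by the header checksum, and the "timecode never goes backwards" rule. The checksum itself is a plain 16-bit sum used as a stand-in, it is NOT the PEAC algorithm.

```python
import struct

PREFIX = 0x6E30306E  # hypothetical marker, the real prefix is not defined here
# Assumed order: prefix, timecode (S15.16), channel, type/flags,
# payload size, header checksum, payload checksum.
HEADER = struct.Struct("<IiHHHHI")   # 4 bytes of prefix + 16 bytes of header
COVERED = struct.Struct("<iHHHI")    # header fields minus prefix and header checksum

def placeholder_sum16(data: bytes) -> int:
    """Stand-in checksum (plain sum of half-words), NOT PEAC."""
    return sum(struct.unpack(f"<{len(data) // 2}H", data)) & 0xFFFF

def build_header(timecode, channel, type_flags, size_words, payload_sum):
    covered = COVERED.pack(timecode, channel, type_flags, size_words, payload_sum)
    return HEADER.pack(PREFIX, timecode, channel, type_flags, size_words,
                       placeholder_sum16(covered), payload_sum)

def parse_header(buf: bytes, last_timecode=None):
    """Return the header fields as a dict, or None if the packet smells wrong."""
    if len(buf) < HEADER.size:
        return None
    prefix, timecode, channel, type_flags, size_words, hsum, payload_sum = \
        HEADER.unpack_from(buf)
    if prefix != PREFIX:
        return None                  # 32 known bits
    covered = COVERED.pack(timecode, channel, type_flags, size_words, payload_sum)
    if placeholder_sum16(covered) != hsum:
        return None                  # 16 more bits of validation
    if last_timecode is not None and timecode < last_timecode:
        return None                  # assumed consistency rule for the timecode
    return {"timecode": timecode, "channel": channel, "type_flags": type_flags,
            "size_words": size_words, "payload_sum": payload_sum}
```

A sink would call `parse_header()` on each candidate packet, drop it on `None` and simply wait for the next one.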

Payload

OK here it gets a bit more complex...

  1. in which granularity is the payload size field expressed ?
  2. each "type" of data packet has its own peculiarities and format.

What is sure is that the payload gets a full PEAC16×2 checksum (32 bits), stored in the header. It's the same algorithm as for the header, but it's not unrolled/inlined. Enhancements can reuse some code from the header to handle blocks of 16 bytes / 4 words.

Can a payload be empty ? Well, sure, why not, so the header is used as a "heartbeat" for example. Like a clock pulse. Then the size field and payload checksum are zero.

Otherwise the count is in 32-bit word granularity (of course excluding the checksum word). This allows block transmission of chunks of almost 256 KiB if needed. Which is a lot, sure. An ideal average maximum size would be 4K bytes or 1K words. But anyway : this is the abstract format that can be packetised or fragmented in any way you want (over S/PDIF, UDP, pigeons, whatever).
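As a quick check of these sizes, here is a small sketch. It assumes the size field is an unsigned 16-bit count of words, which is what the "almost 256Ki bytes" figure implies; the function names are hypothetical.

```python
WORD_BYTES = 4
MAX_SIZE_WORDS = 0xFFFF  # assumption: unsigned 16-bit size field

def payload_bytes(size_words: int) -> int:
    """Payload length in bytes for a given value of the size field."""
    if not 0 <= size_words <= MAX_SIZE_WORDS:
        raise ValueError("size field out of range")
    return size_words * WORD_BYTES

def is_heartbeat(size_words: int, payload_sum: int) -> bool:
    """An empty packet: both the size field and the payload checksum are zero."""
    return size_words == 0 and payload_sum == 0

largest = payload_bytes(MAX_SIZE_WORDS)  # 4 bytes short of 256 KiB
typical = payload_bytes(1024)            # 1K words
```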

Keys

The keyboard's key positions are transmitted as a string of 16-bit values, one per key. There is a prefix too : the first value is an offset, or the index of the first key, to enable transposition. Beware of buffer overflows : packets that contain too many values must of course be rejected.
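A receiver for this packet type could look like the sketch below. The 88-key limit and the function name are hypothetical, and padding of the payload to a whole number of 32-bit words is not handled here since it is not defined yet.

```python
import struct

MAX_KEYS = 88  # hypothetical size of the receiving keyboard

def decode_keys(payload: bytes, max_keys: int = MAX_KEYS) -> dict[int, int]:
    """Map key index -> 16-bit position; the first half-word is the key offset."""
    if len(payload) < 2 or len(payload) % 2:
        raise ValueError("payload must hold an offset and whole 16-bit values")
    values = struct.unpack(f"<{len(payload) // 2}H", payload)
    first_key, positions = values[0], values[1:]
    # Reject the packet before touching any buffer if it would overflow.
    if first_key + len(positions) > max_keys:
        raise ValueError("too many key values for this keyboard")
    return {first_key + i: pos for i, pos in enumerate(positions)}

# Three keys starting at index 60: released, half pressed, fully pressed.
keys = decode_keys(struct.pack("<4H", 60, 0, 32768, 65535))
```

Transposing the whole packet is then only a matter of changing the first half-word.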

__________________________________________________________

Overall this is still quite simple, rough but sufficient to transmit asynchronous streams of data. The stream can be played by a keyboard, recorded, replayed, re-allocated, without putting too many constraints. The latest trick in the bag is the fractional time code, which allows more seamless temporal operations/editing. I hope it can be implemented in my #RD1000 mod one day.

Part of the simplicity comes from the one-way nature of the link. Unlike TCP for example, there is no notion of feedback, of management, of connexion to establish or break : just shoot the packets and count on the redundancy and oversampling to deal with the losses/alterations. Just throw packets away, it's much less critical than in an audio stream or a general/generic digital communication protocol, and the source does not need feedback.
