
Even more drafting

A project log for n00n - Real Time Music Sensor Streaming Protocol

MIDI is so outdated, welcome to the 20s and the 16-bit world !

Yann Guidon / YGDES, 12/13/2022 at 19:59

The last draft gave a broad outline of the system. Let's refine it.

The very first defining characteristic is that the data stream must be able to be transmitted over a SPDIF link. Hence the 32 bits per atomic data. This is equivalent to what a CD player would output but instead of sounds, the data would represent sensors' values.

Chaining

Like MIDI, the devices can have an input and/or an output. Devices can be daisy-chained (but there is no "Thru") so several instruments (keyboards, knob panels, pedals, expanders, mixers, whatevers) share a single stream that can be recorded by the last device in the chain.

Of course it is possible to have multiple inputs in a multiplexer-like device that will coalesce all the data into a coherent stream, but it will have to rewrite all the timecodes.

In a chain, the first device receives no information so it outputs its own timecode, which synchronises the rest of the chain. The timecode could also come from an independent generator device, or from a master DAW. The rule is simple though : if a device does not receive an external timecode within one second, it outputs its own local timecode, otherwise it follows and synchronises to the external timecode (using a software PLL for example). Of course, the received timecode must be valid, with something like 3 consecutive timecodes with coherent values (they must increase monotonically at a reasonable rate).
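The rule above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name, the `MAX_STEP` bound on a "reasonable" increment and the use of seconds for local time are all hypothetical, and timecode wrap-around and the software PLL are left out.

```python
class TimecodeSync:
    """Decides whether a device follows an external timecode or its own clock."""

    TIMEOUT_S = 1.0      # from the text: fall back after one second of silence
    REQUIRED_VALID = 3   # from the text: about 3 consecutive coherent timecodes
    MAX_STEP = 1000      # assumption: largest plausible increment between timecodes

    def __init__(self):
        self.last_code = None      # last external timecode received
        self.last_rx_time = None   # local time (seconds) of the last reception
        self.valid_run = 0         # length of the current coherent run

    def on_external_timecode(self, code, now):
        """Feed one received timecode; 'now' is the local time in seconds."""
        if self.last_code is not None and 0 < code - self.last_code <= self.MAX_STEP:
            self.valid_run += 1    # coherent: strictly increasing, reasonable step
        else:
            self.valid_run = 1     # first sample or incoherent jump: restart the run
        self.last_code = code
        self.last_rx_time = now

    def is_following(self, now):
        """True when the device should follow the external timecode."""
        if self.last_rx_time is None:
            return False           # nothing received yet: use the local timecode
        if now - self.last_rx_time > self.TIMEOUT_S:
            self.last_code = None  # link went silent: forget the old run
            self.valid_run = 0
            return False
        return self.valid_run >= self.REQUIRED_VALID
```

A device would call `is_following()` before emitting each timecode and output its local clock whenever the answer is `False`.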

The problem with chains is when/if a device in the middle is dysfunctional... or if one link is broken. This also cuts all the upstream devices.

Bandwidth

At the end of the chain, the last device receives the sum of all the streams generated upstream. The result might exceed the available bandwidth. Several techniques are borrowed from packet-switched networks.

Timing

The stream contains timecodes that may be ignored in real-time performance but allow recording and playing in plain files (as do MIDI streams already). The timecode format allows a simple and direct timeshifting (because it's plain fractional binary) and can wrap around when the range is exceeded (so you can record more than 9h of performance). Only timecodes that leap forward (within a decent window) are legal, otherwise they get discarded (unless too many are received during 1 second).

Well there is also the exception of a null timecode, value #0000, indicating "no internal clock", for example for dumb sources with no input. This is not recommended because the data/packets could be discarded and the sink (or the next device in the chain) might have to rewrite the timecode (though that is extra effort that would be best spared for more useful things).
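The "leap forward within a window" test with wrap-around can be sketched with modular arithmetic. The 16-bit width and the quarter-range window below are assumptions made for the example (the text fixes neither), and the "too many discarded in 1 second" escape hatch is not modelled.

```python
TC_BITS = 16                 # assumption: timecode width, not fixed by the text
TC_MOD = 1 << TC_BITS
WINDOW = 1 << (TC_BITS - 2)  # assumption: accept leaps up to a quarter of the range
NULL_TIMECODE = 0            # from the text: #0000 means "no internal clock"

def forward_distance(prev, new):
    """Distance from prev to new when counting forward, modulo the range."""
    return (new - prev) % TC_MOD

def is_legal(prev, new):
    """True if 'new' leaps forward from 'prev' and stays inside the window."""
    if new == NULL_TIMECODE:
        return False         # null timecodes need separate handling
    return 0 < forward_distance(prev, new) <= WINDOW
```

With these assumptions `is_legal(65530, 5)` is accepted (the counter simply wrapped) while `is_legal(200, 100)` is rejected as a step backwards. Note that in this sketch a clock that wraps exactly onto 0 is indistinguishable from the null timecode.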

But there is the question of a source that does not increment its own timecode for a while, to send a large chunk of data for example, because it would clog the chain if the chunk exceeds the FIFO size.

Here, the timecodes also replace the "sequence numbers" found in other formats/protocols. This is due to several factors :

This results in a shorter/smaller header.

Compression

Since all the packets must be stand-alone and no resend is possible, the compressed data can't rely on time-based deltas. The #Recursive Range Reduction (3R) HW&SW CODEC looks ideal for this situation. A baseline compaction/decompaction routine would be developed to handle 256 numbers of 16 bits. Negative numbers would be pairs of 16-bit numbers with one of them zero, and the result is the subtraction of these numbers (which can be positive or negative and 3R would manage the 0 value).
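The representation of signed values as a pair of unsigned numbers can be illustrated as follows. The ordering of the pair (positive part first) and the function names are assumptions, and the 3R compaction itself is not shown.

```python
def encode_signed(value):
    """Split a signed value into (positive_part, negative_part), one of them zero."""
    if not -0xFFFF <= value <= 0xFFFF:
        raise ValueError("magnitude does not fit in 16 bits")
    return (value, 0) if value >= 0 else (0, -value)

def decode_signed(positive_part, negative_part):
    """Recover the signed value by subtracting the two unsigned numbers."""
    return positive_part - negative_part
```

For example `encode_signed(-7)` gives `(0, 7)` and `decode_signed(0, 7)` gives `-7` back; zero is simply the pair `(0, 0)`.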

Data sinks

Since the stream is potentially asynchronous and lossy, the data sinks (the expanders, oscillators etc.) must correctly interpolate data at the internal higher sampling rate.
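As an illustration only, here is the simplest possible interpolation a sink could apply between two timestamped samples. Linear interpolation is an assumption chosen for brevity, the text does not say which method should be used, and the function names are hypothetical.

```python
def interpolate(t0, v0, t1, v1, t):
    """Linearly interpolate the value at time t between (t0, v0) and (t1, v1)."""
    if t1 == t0:
        return v1
    t = min(max(t, t0), t1)  # clamp: hold the edge value instead of extrapolating
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def upsample(t0, v0, t1, v1, steps):
    """Return 'steps' evenly spaced values from v0 (included) towards v1 (excluded)."""
    return [interpolate(t0, v0, t1, v1, t0 + (t1 - t0) * i / steps)
            for i in range(steps)]
```

Clamping means that when the next packet is late or lost, the sink holds the last known value rather than running away with an extrapolated one.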

Any source could fall offline at any moment so failsafes, watchdogs, timers etc. must detect and correct "anomalous conditions". Proper operation should be resumed within 1 second. After all, you never know when your mate will trip on a dangling wire, and you don't want it to end the whole show.

Naming

The sinks "attach" or "listen" to a given channel, associated with a given source. Each channel has a UTF-8 label sent regularly by the sources or upon receiving a "ping"/"enumerate" packet, so the users don't have to rely on numbers only. If a source receives a packet that contains its own channel ID, it must change its own ID to a different random one to prevent collision (should be changed) : this is a dynamic arbitration system, so the label is what really matters. Thus, as long as all the labels are different, you could plug any device anywhere in the chain and forget about low-level IDs, they would be random anyway (unless you decide to fix them statically). Eventually, a message could order a given channel to change its ID or label for convenience.
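The dynamic arbitration described above could look like the following sketch. The `Source` class and its method names are hypothetical, and a real device would also have to announce its new ID together with its label.

```python
import random

ID_SPACE = 1 << 16  # 16-bit channel IDs

class Source:
    """A source that re-draws its channel ID when it sees it used upstream."""

    def __init__(self, label, channel_id=None, rng=None):
        self.rng = rng or random.Random()
        self.label = label  # the label is what the sinks really rely on
        self.channel_id = (self.rng.randrange(ID_SPACE)
                           if channel_id is None else channel_id)

    def on_packet(self, packet_channel_id):
        """Inspect an incoming packet's ID; return True if our own ID changed."""
        if packet_channel_id != self.channel_id:
            return False
        new_id = self.channel_id
        while new_id == self.channel_id:  # draw until the ID actually differs
            new_id = self.rng.randrange(ID_SPACE)
        self.channel_id = new_id
        return True
```

The label is untouched by the re-draw, which is why sinks that attach by label keep working after a collision.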

With 16 bits for channel ID (leaving 16 bits for type and flags), there are low chances that 2 random IDs will collide. See the Birthday Paradox however. But with half a dozen devices in a chain, and a decent re-allocation scheme with decent entropy, the operation should be pretty smooth. Even crazy long chains should work, and bandwidth+latency will become the problem (particularly with a store&forward mechanism) long before ID collision becomes a critical thing.
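To put a number on the Birthday Paradox, this small helper computes the probability that at least two devices pick the same ID, assuming every device draws its ID uniformly at random and independently (the function name is just for illustration).

```python
def collision_probability(devices, id_space=1 << 16):
    """Probability that at least two of 'devices' random IDs are identical."""
    p_unique = 1.0
    for k in range(devices):
        p_unique *= (id_space - k) / id_space  # the k-th device avoids k taken IDs
    return 1.0 - p_unique
```

With half a dozen devices the probability is about 0.02%, and it only reaches roughly 50% at around 300 devices, long after bandwidth and latency have become the real problem.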

Anyway, each source MUST have a means to input its UTF-8 label. It could be a simple push-button on the front panel that will enable the reception of a rename packet command, or a full-blown keyboard, or whatever...

Length of the label could be "up to" 255 bytes though maybe 32 or 64 bytes could be stored by smaller implementations...

 
-o-O-0-O-o-
 

Wow, the more I write about it, the more it draws from 2 decades of design experience. The #PEAC and #3R algorithms were designed for this sort of application and purpose, so I'm glad they all come together at last. The protocol does not look like Open Sound Control at all since OSC is a verbose JSON/XML-like textual format, while N00N is raw binary and lightweight to interpret with limited CPU resources. Maybe I'll succeed one day in implementing it in my #RD1000 ?
