
n00n - Real Time Music Sensor Streaming Protocol

MIDI is so outdated, welcome to the 20s !

This is a draft for a one-way continuous-flow packet protocol for all those digital music instruments that suffer so much from degraded expression with MIDI's crude quantisation.
MIDI was adapted to the state-of-the-art systems and technologies of the 1980s. It's a venerable and reliable protocol but it had no headroom. 128 levels are not enough for everybody and the whole MIDI landscape has become a miserable kludgefest! Come on, we now have "plug and play", USB and more powerful, smarter devices...

So let's bump to 16-bit quantities, high bandwidth, bulkier packets that fit in Ethernet, S/PDIF, TCP or other modern transport, and high-speed refresh for true-to-life capture. The actual bandwidth can be close to that of sampled sound: "CD quality" has used 1.5Mbps for decades now and storage is dirt cheap, so why do we limit ourselves ?

Using the datastream or converting it to actual sounds is left as an exercise to the reader.

Let's replace MIDI at last, after 40 years of holding us all back. Today, it is not limited to a slow serial link, let's tunnel the data through the higher speed S/PDIF or Toslink interfaces, capable of 1.5Mbps. Just don't listen to the stream as a sound. Or send over raw UDP packets, whatever.

At this moment, N00N is mostly a real-time oriented, one-way, general encapsulation protocol. There is room to define almost anything, even encapsulate MIDI if you wanted, but there is also support for compression.

While MIDI represents music performances with a lot of "discrete events" that describe state changes (like a press or release of a key), N00N is stateless and represents the complete state of the devices, all the time, allowing for "a certain amount of losses" that makes it more resilient.

 
-o-O-0-O-o-
 

Logs:
1. 16 bits
2. More drafting
3. Even more drafting
4. Header and payload checksums with PEAC
5. The project is renamed !
6. Packet types
7. Topology
8.

N00N_header.h

Header file with constants and field definitions

x-chdr - 1.18 kB - 12/22/2022 at 04:15


N00N_checksum.c

Computes the checksums for the header and the payload.

x-csrc - 1.17 kB - 12/22/2022 at 04:15


  • Topology

    Yann Guidon / YGDES, 12/15/2022 at 22:56

    Similarly to MIDI, N00N uses a model where the devices are chained, which creates a hierarchy and enforces some priorities. But this is not the only possible topology. This mostly depends on the type of physical transmission interface, and we suppose a unidirectional stream.

    Each device should have an input and an output so it can be integrated in a chain, otherwise it is forced to be at one end or another of the chain, which makes it harder to integrate smoothly.

    "Hubs" or concentrators are possible but they require enough CPU to resynchronise the incoming streams. This does not solve many issues and introduces more of them.

    N00N over IP (UDP) is possible but this requires more complex methods to configure each device. Static or dynamic addresses are only one easy aspect, devices must also "discover" where to send their own stream... The LSB of the IP address then matches the device's ID. 10BaseT and 100BaseT have latencies in the millisecond range when taking all the OS overhead into account, and it's a big source of jitter... S/PDIF has a much lower latency because data are processed at sample speed, in very small chunks and with very short FIFO.

    I was able to reach sub-millisecond latencies with Wiznet modules in a barebones configuration but I'm not sure it applies here.

    It would be interesting to use RJ45 connectors and 50-ohm twisted pairs between devices because the parts and cables are widely available, though there is always a risk of confusion with other standards (and PoE could burn the devices). RCA with 75 ohms is used in some older video applications and is not expensive.

    ___________________

    Back to the daisy chain.

    The simplest configuration is a source device (like a keyboard) directly connected to a sink device (an expander).

    The chain can be expanded by adding more sources (more keyboards, more knobs) upstream, and more sinks downstream. The ends of the chain allow this addition, though there is the risk of breaking it in the middle.

    Performance recorders can be added at the end of the chain to capture (and replay) the stream, there can be sequencers as well... But most probably, a computer will act as a DAW at the ends of the chain :

    • upstream : it will generate the Tick (beware of OS jitter !) and send configuration messages (label, ID etc.)
    • downstream : it will be able to receive all the source data, massage and process the raw information, implement flexible synthesis...

  • Packet types

    Yann Guidon / YGDES, 12/15/2022 at 20:32

    Here I will try to list all the types of packets I can think of, to allocate numerical identifiers.

    Each data source, or device, can generate any type of packet. A device can generate "keyboard" and "knobs" packets, but can only have one set thereof. So if your device has 2 keyboards, you have to emulate 2 separate devices.

    (note : after mixing tables were suggested to me, it seems I'll have to create structured packets with sub-fields...)

    There are 16 bits in the header for the type and flags fields so each could take one byte at first glance, though there is no strict boundary. Flags could supplement the type and implement subtypes, for example, or an important type could use fewer bits and allocate more flags.

    The 2 LSB are not reserved but that's where the compression flags are located :

    • 00 : no compression, raw data
    • 01 : 16-bit 3R
    • 1x : reserved, could be LZW or others TBD.

    Compression is optional and provided to save bandwidth so it might be ignored by some data sinks. That's why the source should send raw data once per second. Furthermore : compression does not work all the time and could occasionally expand data, so raw transmission is always possible as a failsafe.
     

    There are 4 big classes of packet types, using the 2 MSB :

    • 00 : Instrument (actual useful data)
    • 01 : Experimental (playground in the protocol, expect things to break)
    • 10 : Reserved (leave it alone)
    • 11 : Management (non-music data that keeps the system working)

    As usual, if a packet type is not recognised or understood, it is ignored.
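    The class and compression bits above can be sketched as a pair of accessors. This is only an illustration : the log does not pin down the exact layout, so the code assumes the 2 MSB are bits 15..14 and the 2 LSB are bits 1..0 of the 16-bit Type_Flags header field, and all the names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: the log only says "2 MSB" and "2 LSB"; here they are
   taken as bits 15..14 and 1..0 of the 16-bit Type_Flags header field. */
enum n00n_class {
  N00N_CLASS_INSTRUMENT   = 0, /* 00 : actual useful data */
  N00N_CLASS_EXPERIMENTAL = 1, /* 01 : playground */
  N00N_CLASS_RESERVED     = 2, /* 10 : leave it alone */
  N00N_CLASS_MANAGEMENT   = 3  /* 11 : non-music data */
};

enum n00n_compression {
  N00N_COMP_RAW  = 0, /* 00 : no compression, raw data */
  N00N_COMP_3R16 = 1  /* 01 : 16-bit 3R ; 1x is reserved */
};

static inline unsigned n00n_get_class(uint16_t type_flags) {
  return (unsigned)(type_flags >> 14);
}

static inline unsigned n00n_get_compression(uint16_t type_flags) {
  return type_flags & 3u;
}

/* A sink that only understands raw Instrument data: returns 1 if the packet
   is usable, 0 if it should be ignored (unknown class or compressed data). */
static inline int n00n_sink_accepts_raw(uint16_t type_flags) {
  return n00n_get_class(type_flags) == N00N_CLASS_INSTRUMENT
      && n00n_get_compression(type_flags) == N00N_COMP_RAW;
}
```

    A sink that cannot decompress simply skips the packets it rejects and waits for the next raw one.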

     
    -o-O-0-O-o-
     

    The Instrument class :

    Type 0 is invalid (just in case).

    1. Knobs : just a collection of 16-bit values that could represent potentiometers, buttons, slide pots, ribbons, whatever : it's the most generic type.
      Format : 16-bit size prefix, followed by as many 16-bit unsigned values (that may be compressed, see the compression flags).
      The data sink is in charge of "patching"/associating each value with a meaning or function.
    2. Keys : a string of 16-bit values representing the absolute position of each key, for the whole keyboard.
      Format : 1 byte of length prefix (the number of keys),
       1 byte of offset (the position of the lowest note/key)
      Followed by as many 16-bit unsigned values (that may be compressed, see the compression flags).
      Note : no other flags than the 2 compression LSB are defined so there is a lot of room to play with.
    3. Aftertouch : a shadow of the Keys type that adds pressure information. Could be redundant but it's available anyway. Same format as Keys.
    4. Bend : another shadow of the Keys for lateral pressure on the keys. A fun novelty. Same format as Keys.
      A second Bend might be possible, like BendY and BendZ...
    5. Mixing table : a collection of Knobs packets ?
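    As an illustration of the Keys format, here is a minimal parser for a raw (uncompressed) payload held as 16-bit words. The position of the two prefix bytes inside the first word is not specified in this log, so the code assumes the length is the low byte and the offset is the high byte ; the type and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint8_t count;          /* number of keys in the packet */
  uint8_t offset;         /* position of the lowest note/key */
  const uint16_t *values; /* count absolute key positions (raw data) */
} n00n_keys_view;

/* Parses a raw Keys payload stored as 16-bit words.
   Assumption: the length byte is the low byte of the first word and the
   offset byte is the high byte; the log does not specify this order.
   Returns 0 on success, -1 if the payload is too short. */
static int n00n_parse_keys(const uint16_t *payload, size_t payload_words,
                           n00n_keys_view *out) {
  if (payload_words < 1)
    return -1;
  uint8_t count  = (uint8_t)(payload[0] & 0xFF);
  uint8_t offset = (uint8_t)(payload[0] >> 8);
  if (payload_words < (size_t)count + 1)
    return -1;              /* fewer values than announced */
  out->count  = count;
  out->offset = offset;
  out->values = payload + 1; /* points into the caller's buffer, no copy */
  return 0;
}
```

    Aftertouch and Bend share the same format, so the same view could be reused for them.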

    .

    Well, that's about it for this class, there is not much else to add, but there is room for extension. Knobs can represent just about anything anyway, so other types would only be needed when a different data format is required, for example events for drums ?

    .

     
    -o-O-0-O-o-
     

    The Management class

    That's where the kludges are.

    For convenience, these type numbers count down from the top of the range, so I'll write them as negative numbers for now.

    -1 : Tick

    It's just a header that is passed along the chain to synchronise all the other clocks downstream. No data payload. Upon reception :

    • Make sure the time code is coherent (sample for 1 second ?)
    • Update the local clock (and/or the soft PLL)
    • Make sure that the Channel_ID does not collide (otherwise, rename yourself randomly and send a rename message downstream)

    If no tick received within 1 second, send your own tick (8-16Hz ?)
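    The fallback rule can be sketched as a small watchdog. This assumes a local millisecond counter (now_ms, hypothetical) and picks 10 Hz inside the tentative 8-16Hz range ; neither is fixed by the draft.

```c
#include <stdint.h>

#define N00N_TICK_TIMEOUT_MS 1000u /* "no tick received within 1 second" */
#define N00N_TICK_PERIOD_MS   100u /* 10 Hz, an arbitrary pick in 8-16 Hz */

typedef struct {
  uint32_t last_rx_ms; /* local time of the last valid upstream Tick */
  uint32_t last_tx_ms; /* local time of the last Tick generated here */
} n00n_tick_watchdog;

/* Call whenever a valid Tick arrives from upstream. */
static void n00n_tick_received(n00n_tick_watchdog *w, uint32_t now_ms) {
  w->last_rx_ms = now_ms;
}

/* Call periodically; returns 1 when the device should emit its own Tick.
   Unsigned subtraction keeps the comparison valid across counter wrap. */
static int n00n_tick_poll(n00n_tick_watchdog *w, uint32_t now_ms) {
  if ((uint32_t)(now_ms - w->last_rx_ms) < N00N_TICK_TIMEOUT_MS)
    return 0; /* upstream is alive: stay a follower */
  if ((uint32_t)(now_ms - w->last_tx_ms) < N00N_TICK_PERIOD_MS)
    return 0; /* our own Tick is not due yet */
  w->last_tx_ms = now_ms;
  return 1;
}
```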

    -2 : Ping

    This is another header with no payload, that is used to enumerate the devices and their IDs in a chain.

    When a device receives this header, it must send its own Ping header before forwarding the one(s) it received. This is to prevent storms of packets, FIFO stuffing/overflow, and it also preserves the order of the chain, unlike other methods. The...


  • The project is renamed !

    Yann Guidon / YGDES, 12/15/2022 at 02:20

    I didn't believe it but the deal is done !

    "Noon" is the English word for "midi", but most good domain names for "noon" were already taken.

    Only tonight did it occur to me that I could simply replace O with 0 and n00n.org is now mine.

    Now I have to rename a few things but it's fortunately not a large corpus yet.

  • Header and payload checksums with PEAC

    Yann Guidon / YGDES, 12/14/2022 at 23:32

    I have played with different ideas for header checksums for a long time. At one point I even considered Hamming SECDED but #PEAC is "just right" (and it works with 16 bits of granularity). Here I'll define the header, how to build it and how to check it.

    Let's have a look at a description of the S/PDIF logic format : https://www.ni.com/fr-fr/support/documentation/supplemental/06/developing-a-spdif-input-module-in-labview-fpga.html (and kudos to #Propeller S/PDIF Receiver as well)

    Sample data can use 16 to 24 bits but the packet structure is not absolutely strictly defined, the number of channels is left to the implementation. This could create an ambiguity in the ordering of the samples, although there is usually a good understanding of what is left and what is right : S/PDIF differentiates channel A (left) and channel B (right) in the header. The S/PDIF stream could be either mono or stereo, but an adaptation layer should properly reorder the samples (and pre-parse packets). Let's assume for now that we get a continuous stream of 16-bit words, that may be both channels interleaved OR a mono stream. These data could also come from a serial port or a UDP socket... It's just a stream. Just make sure you don't mix left and right or that a mono stream is not expanded to duplicated samples in a stereo stream, or the channels are shifted or...

    In a "continuous stream" context, each packet/message should ideally be separated by a pair of 0 samples. If transmitting over UDP however, this is not required since the datagram provides a clear structure of the dataflow. Thus the header format does not include the extra leading or trailing 0s (they are a bit superfluous, maybe I am superstitious).

    S/PDIF transmits LSB first and ChanA/Left first. The header must be aligned to this 32-bit boundary even though it could be in a mono stream, just pad with a 0 sample until the header is correctly aligned.
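    The separator and alignment rules could be combined as below. It is only a sketch, assuming that the stream is addressed as 16-bit words and that even word indexes fall on the ChanA/Left slot ; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of zero words to emit before a header, given the index of the next
   free 16-bit word in the stream.
   Assumption: even word indexes are the ChanA/Left slot. */
static size_t n00n_gap_words(size_t next_word_index) {
  size_t gap = 2;                   /* the separating pair of 0 samples */
  if ((next_word_index + gap) & 1u) /* would the header start on ChanB ? */
    gap += 1;                       /* one more 0 sample restores alignment */
  return gap;
}

/* Writes the gap into buf at pos and returns the position where the header
   must start. The caller guarantees that buf has room for 3 more words. */
static size_t n00n_write_gap(uint16_t *buf, size_t pos) {
  size_t gap = n00n_gap_words(pos);
  for (size_t i = 0; i < gap; i++)
    buf[pos + i] = 0;
  return pos + gap;
}
```

    Over UDP the gap is not required, so this step would simply be skipped.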

    Let's now look at the header's structure :

    typedef struct {
      uint16_t
         Sign1,  // 'N', '0'
         Sign2,  // '0', 'n'
         Timecode_Frac,
         Timecode_Sec,
         Channel_ID,
         Type_Flags,
         Payload_Words,
         Header_checksum;
    } N00N_header_struct;

    To prevent Endian problems, I use only this 16-bit representation. Specific implementations could be optimised and process 32 bits at once but a reference is needed first.

    Two fields Sign1 and Sign2 are fixed so they are not considered during the checksum. The last field is the checksum itself, the result, so only 5 fields remain. PEAC16x2 uses 2 additions per word, plus a finishing round, so that's about 12 arithmetic operations. It's not a lot so they can be unrolled. I started from the reference code of PEAC found in snippets_v3.c :

    void CHKS(uint16_t m) {
      // extra caution with the extension of the sizes:
      CHKSC += CHKSX;     // 0..10001
      CHKSC += CHKSY;     // 0..2FFFF
      CHKSY  = CHKSX + m; // 0..1FFFE
      CHKSX  = CHKSC;     // 0..FFFF
      CHKSC >>= 16;       // 0, 1 or 2
    }
    

    I unrolled, substituted some variables and values, et voilà :

    #define PEAC16X2_INIT1 (0xABCD)
    #define PEAC16X2_INIT2 (0x4567)
    
    uint16_t N00N_header_checksum_ref2(N00N_header_struct* header) {
      uint32_t C, X, Y;
    
      C  = ((2*PEAC16X2_INIT1) + PEAC16X2_INIT2) + header->Timecode_Frac;
      Y  =   ( PEAC16X2_INIT1  + PEAC16X2_INIT2) + header->Timecode_Sec;
      X  = C & 0xFFFF;
      C >>= 16;
    
      C += X+Y;
      X += header->Channel_ID;
      Y  = C & 0xFFFF;
      C >>= 16;
    
      C += X+Y;
      Y += header->Type_Flags;
      X  = C & 0xFFFF;
      C >>= 16;
    
      C += X+Y;
      X += header->Payload_Words;
      Y  = C >> 16;
    
      return (uint16_t)(C+X+Y);
    }

    Total : 20 operations, half of them are pairable (ILP>2)

    This is used both by the encoder and the decoder, so there is no discrepancy or code duplication.
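    To illustrate the shared use, here is a sketch of an encoder-side "seal" and a decoder-side check built on the function above (copied unchanged so that the example compiles alone). The two helpers are hypothetical, and the signature constants assume that the first character of each pair sits in the low byte of the word, which this log does not state.

```c
#include <stdint.h>

typedef struct {
  uint16_t Sign1, Sign2, Timecode_Frac, Timecode_Sec,
           Channel_ID, Type_Flags, Payload_Words, Header_checksum;
} N00N_header_struct;

#define PEAC16X2_INIT1 (0xABCD)
#define PEAC16X2_INIT2 (0x4567)

/* Copied from the log above, unchanged. */
uint16_t N00N_header_checksum_ref2(N00N_header_struct* header) {
  uint32_t C, X, Y;

  C  = ((2*PEAC16X2_INIT1) + PEAC16X2_INIT2) + header->Timecode_Frac;
  Y  =   ( PEAC16X2_INIT1  + PEAC16X2_INIT2) + header->Timecode_Sec;
  X  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  X += header->Channel_ID;
  Y  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  Y += header->Type_Flags;
  X  = C & 0xFFFF;
  C >>= 16;

  C += X+Y;
  X += header->Payload_Words;
  Y  = C >> 16;

  return (uint16_t)(C+X+Y);
}

/* Assumed byte order: first character in the low byte of each word. */
#define N00N_SIGN1 0x304Eu /* 'N' | ('0' << 8) */
#define N00N_SIGN2 0x6E30u /* '0' | ('n' << 8) */

/* Encoder side: fill the fixed fields and the checksum (hypothetical helper). */
void n00n_header_seal(N00N_header_struct *h) {
  h->Sign1 = N00N_SIGN1;
  h->Sign2 = N00N_SIGN2;
  h->Header_checksum = N00N_header_checksum_ref2(h);
}

/* Decoder side: returns 1 if signature and checksum match, 0 otherwise. */
int n00n_header_is_valid(N00N_header_struct *h) {
  return h->Sign1 == N00N_SIGN1 && h->Sign2 == N00N_SIGN2
      && h->Header_checksum == N00N_header_checksum_ref2(h);
}
```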

    From there it is also easy to deduce the corresponding code for the payload's checksum :

    void N00N_payload_checksum(uint16_t* buffer,
        uint16_t* r, uint16_t* s, uint16_t Payload_Words) {
      uint32_t C, X=0, Y;
    
      C = Payload_Words
        + (PEAC16X2_INIT1 << 16)
        +  PEAC16X2_INIT2;
      Y = C >> 16;
      C &= 0xFFFF;
    
      while...

  • Even more drafting

    Yann Guidon / YGDES, 12/13/2022 at 19:59

    The last draft gave a broad outline of the system. Let's refine it.

    The very first defining characteristic is that the data stream must be able to be transmitted over a SPDIF link. Hence the 32 bits per atomic data. This is equivalent to what a CD player would output but instead of sounds, the data would represent sensors' values.

    • The stream could be played from a pre-recorded audio CD for example
    • conversely the stream could be recorded to a CD, but not a MiniDisc because it uses lossy compression.
    • The medium can reuse cheap TOSlink transceivers and fibres, or 75Ω RCA connectors/patch cables, but UDP over RJ45 could also work for encapsulation.
    • The stream could be received and emitted by a DAW (Digital Audio Workstation) that recognises the specific type of data.
    • Since it's considered as raw audio data by underlying components, it is transparent and could find its way in the web browsers thanks to the HTML5 Web Audio API. N00N could even encapsulate other one-way protocols over these links...
    • SPDIF/Toslink transmits ancillary data in extra bits but they are not accessible to most data sinks. For example, track names can be sent by a CD player, but this is not available in the raw audio data stream. Hence ALL the management data must be transmitted "in band". Which is nice because this allows easy storage in a dumb computer file.
    • Inactive state of the stream is "0" samples. They must separate the packets on a continuous stream (this eases parsing a bit).

    Chaining

    Like MIDI, the devices can have an input and/or an output. Devices can be daisy-chained (but there is no "Thru") so several instruments (keyboards, knob panels, pedals, expanders, mixers, whatevers) share a single stream that can be recorded by the last devices in the chain.

    Of course it is possible to have multiple inputs in a multiplexer-like device that will coalesce all the data into a coherent stream, but it will have to rewrite all the timecodes.

    In a chain, the first device receives no information so it outputs its own timecode, which synchronises the rest of the chain. A timecode generator could be an independent device, or generated by a master DAW. The rule is simple though : if a device does not receive an external timecode in one second, it outputs its own local timecode, otherwise it follows and synchronises to the external timecode (using a software PLL for example). Of course, the received timecode must be valid, with like 3 consecutive timecodes with coherent values (they must increase monotonically at a reasonable rate).
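    The validity rule could look like the sketch below. It treats the timecode as one signed 32-bit S15.16 value and assumes that "a reasonable rate" means less than one second between two consecutive timecodes ; that threshold and the names are placeholders, not part of the draft.

```c
#include <stdint.h>

#define N00N_LOCK_COUNT 3       /* "like 3 consecutive timecodes" */
#define N00N_MAX_STEP   65536   /* assumed limit: 1 second in S15.16 units */

typedef struct {
  int32_t last; /* last timecode seen, S15.16 */
  int     good; /* consecutive coherent timecodes, capped at the lock count */
} n00n_tc_lock;

/* Feed every received timecode. Returns 1 once enough consecutive values
   increased at a plausible rate, 0 while the external timecode is not yet
   trusted (the device keeps its local timecode in the meantime). */
static int n00n_tc_feed(n00n_tc_lock *l, int32_t tc) {
  int64_t step = (int64_t)tc - l->last; /* 64 bits: no signed overflow */
  if (l->good > 0 && step > 0 && step <= N00N_MAX_STEP) {
    if (l->good < N00N_LOCK_COUNT)
      l->good++;
  } else {
    l->good = 1; /* incoherent value: restart the count from this one */
  }
  l->last = tc;
  return l->good >= N00N_LOCK_COUNT;
}
```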

    The problem with chains is when/if a device in the middle is dysfunctional... or if one link is broken. This also cuts all the upstream devices.

    Bandwidth

    At the end of the chain, the last device receives the sum of all the streams generated upstream. The result might exceed the bandwidth available. Several techniques are borrowed from switched packet networks.

    • A decent FIFO is recommended at the input of the device. 4KiB seems to be a minimum. Too large would create "bufferbloat" and losses in subsequent devices which don't have such a large FIFO. 16KiB is a decent value that I hope is never getting filled completely. BTW : ideally, FIFO size should not be required to be larger than the size of the packets that the device can send, but some margin can always help.
    • If the input FIFO overflows, the next incoming packets are discarded entirely
    • This is why each packet should be "standalone" and not depend on the next or previous packet.
    • For a link with limited capacity, activity LEDs would indicate the FIFO usage and average bandwidth occupation.
    • Optionally, the device could have "nice" settings such as refresh rate (10, 100, 1000Hz ?) and prioritisation of its own packets (like: drop X% of my own packets and 100-X% of the incoming ones)
    • Devices closer to the end would mechanically have the higher priority because their data have fewer chances of being dropped
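    The "discard entire packets on overflow" rule can be sketched with a simple ring buffer of 16-bit words. The 16KiB size comes from the list above ; the structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define N00N_FIFO_WORDS 8192u /* 16 KiB of 16-bit words */

typedef struct {
  uint16_t data[N00N_FIFO_WORDS];
  size_t head;           /* index of the next word to read */
  size_t used;           /* number of words currently stored */
  unsigned long dropped; /* packets discarded because of overflow */
} n00n_fifo;

/* Stores a complete packet (header + payload) or nothing at all, so the
   FIFO never holds a truncated packet. Returns 1 if stored, 0 if dropped. */
static int n00n_fifo_push_packet(n00n_fifo *f, const uint16_t *pkt,
                                 size_t words) {
  if (words > N00N_FIFO_WORDS - f->used) {
    f->dropped++;
    return 0;
  }
  size_t tail = (f->head + f->used) % N00N_FIFO_WORDS;
  for (size_t i = 0; i < words; i++) {
    f->data[tail] = pkt[i];
    tail = (tail + 1) % N00N_FIFO_WORDS;
  }
  f->used += words;
  return 1;
}

/* Reads up to max words from the FIFO; returns the number of words read. */
static size_t n00n_fifo_pop(n00n_fifo *f, uint16_t *out, size_t max) {
  size_t n = (max < f->used) ? max : f->used;
  for (size_t i = 0; i < n; i++) {
    out[i] = f->data[f->head];
    f->head = (f->head + 1) % N00N_FIFO_WORDS;
  }
  f->used -= n;
  return n;
}
```

    The dropped counter is the kind of value an activity LED or a "nice" setting could be driven from.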

    Timing

    The stream contains timecodes that may be ignored in real-time performance...


  • More drafting

    Yann Guidon / YGDES, 12/02/2022 at 14:51

    This project has been on hold for a long time because for now #PEAC Pisano with End-Around Carry algorithm is a critical priority and it must be explored and examined thoroughly, completely, before I can move on. But ideas keep bubbling up...

    I wanted to rename this format "NOON" because "MIDI" means noon in French. Whatever, the domain names are all taken. RTFM domain names are mostly taken as well so... I'll have to find a better definitive name. Later.


    UPDATE 20221215 : n00n.org is now dedicated to the project.


    All that aside, one of the main characteristics of the format is "it's a bit over the top". Let's allocate ample space because there is never enough of it once people start to use it. The most direct sign is its granularity : 16 bits per value, grouped in pairs, little-endian style, so the basic data transmission element is a 32-bit word. Just like a SPDIF stereo sound sample from a CD player (though SPDIF can increase to 24 bits if all are used). This eases alignment a LOT for modern CPUs and also helps with the checksum optimisation (fewer corner cases to manage).

    Since we're using pairs of 16-bit values, the time code can become "sort of" an absolute time code, with 15 bits for the number of seconds, and a fractional 16-bit part (so it's a S15.16 fixed point value, 9 hours should be enough and negative times are allowed). It's a good compromise that evades the question of the sampling speed, and signal sources can use a 18,432,000 Hz clock source, divide by 5×5×5×3×3=1125 and get a clean 16384 ticks per second resolution. So clock drift is mostly avoided and sources with different clock speeds or sampling speeds can adjust easily. A 1pps signal could even synchronise all the devices if needed, to reduce inter-device drifts/biases.
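    As a worked example of the arithmetic : 18432000 / 1125 = 16384 = 2^14 ticks per second, so a free-running tick counter maps to the S15.16 timecode with a division and a shift by 2. The sketch below assumes such a counter exists ; the type and function names are illustrative only.

```c
#include <stdint.h>

#define N00N_CLOCK_HZ      18432000u
#define N00N_CLOCK_DIVIDER 1125u  /* 5*5*5*3*3 */
#define N00N_TICKS_PER_SEC (N00N_CLOCK_HZ / N00N_CLOCK_DIVIDER) /* 16384 */

typedef struct {
  uint16_t sec;  /* 15 bits of seconds (the sign bit is left at 0 here) */
  uint16_t frac; /* 16-bit binary fraction of a second */
} n00n_timecode;

/* Converts a free-running 16384 Hz tick counter into the two timecode
   fields. The seconds wrap after 32768 s, a bit more than 9 hours. */
static n00n_timecode n00n_timecode_from_ticks(uint32_t ticks) {
  n00n_timecode tc;
  tc.sec  = (uint16_t)((ticks / N00N_TICKS_PER_SEC) & 0x7FFF);
  tc.frac = (uint16_t)((ticks % N00N_TICKS_PER_SEC) << 2); /* 14 -> 16 bits */
  return tc;
}

/* Packs both fields into a single S15.16 value for comparisons. */
static int32_t n00n_timecode_to_s15_16(n00n_timecode tc) {
  return (int32_t)(((uint32_t)tc.sec << 16) | tc.frac);
}
```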

    Some of the packets can contain absolute, raw sample data, at least one per second, up to the maximum possible rate (64Ki per second ?) but bandwidth can be reduced through compaction and compression (looking at you #Recursive Range Reduction (3R) HW&SW CODEC)

    At first the purpose is to transmit absolute key pressure in real time, so it's not a sound format, though other things than keys can be sampled, in particular knob or pedal positions. This means that each type or family of sensor must be on a separate "channel", which is again a 16-bit value. The companion half-word will be flags for the channel, as well as type (so each sink can "accept" a stream of the proper type, knobs will be used as a different "auxiliary" input while the keys are the main input)

    So far we have packets split into a fixed-size header and payload, each protected by a word or half-word PEAC16x2 checksum.

    Header

    Pretty basic and general, not too compact but who cares today. After all it's not 32 bits per value right ? I try to keep the features down to the minimum : no complex fragmentation info for example.

    • Prefix : "N00n" in ASCII for frame start (that's 32 bits, right)
    • timecode in S15.16 signed fixed point format, in seconds. MUST be > the last one.
    • CTF : channel, type, flags (such as compression) (should fit in 32 bits)
    • Payload size (16) and header checksum (16)

    That looks nice and square : 16 bytes or 128 bits fits nicely in a cache line. Checksumming the header can take a short, fixed amount of code with 6 additions (because the prefix is known so no need to checksum it).

    The CTF (channel, type, flags) field can be flexible because one channel can accept multiple types so... Maybe we can define subtypes or type ranges. There is a bit of wiggle room, and uncertainty.

    Note : the header has its own shorter checksum to prevent corruption of the size field. Along with the "fixed" prefix, there are 32+16=48 bits for the parser to perform a first rough validation of the header. The timecode must be consistent too, and the CTF must also be coherent.
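    The first step of that rough validation, finding the prefix in a stream of 16-bit words, might be sketched like this. The constants assume that the first character of each pair is in the low byte of the word (an assumption, not something the draft specifies), and the function name is hypothetical ; the checksum, timecode and CTF checks would follow on each candidate.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed byte order: first character in the low byte of each word. */
#define N00N_SIGN1 0x304Eu /* 'N' | ('0' << 8) */
#define N00N_SIGN2 0x6E30u /* '0' | ('n' << 8) */
#define N00N_HEADER_WORDS 8u /* 16 bytes */

/* Scans a buffer of 16-bit words for the "N00n" prefix, starting at start.
   Returns the index of the first candidate with a complete 8-word header
   available, or words if there is none. */
static size_t n00n_find_header(const uint16_t *s, size_t words, size_t start) {
  for (size_t i = start; i + N00N_HEADER_WORDS <= words; i++) {
    if (s[i] == N00N_SIGN1 && s[i + 1] == N00N_SIGN2)
      return i;
  }
  return words;
}
```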

    Payload

    OK here it gets a bit more complex...

    1. the payload size field is in which granularity ?
    2. each "type" of data packet...

  • 16 bits

    Yann Guidon / YGDES, 06/17/2021 at 01:12

    MIDI was awesome for its time and perfectly suited for the 8-bit microprocessors of the era (Z80, 6809, 6502, you name it). Now we're in the 2020s and the interfaces and electronics have evolved !

    OK, Open Sound Control uses 32 bits but I don't want to waste bandwidth for the sake of it. The motivation is mostly that I want to use SPDIF, which uses pairs of 16-bit values, and reasonably fast ADCs easily handle 16-bit samples (hey, that's CD quality !); beyond this resolution, it's more noise than anything. And the https://www.wiznet.io/product-item/w5300/ has a convenient high-speed bus with 16 bits.

    Of course, if only 8 bits make sense for something, that's fine and bytes will be packed.
     


Discussions

Yann Guidon / YGDES wrote 12/02/2022 at 13:59 point

Time to revive this project...
