-
Topology
12/15/2022 at 22:56
Similarly to MIDI, N00N uses a model where the devices are chained, which creates a hierarchy and enforces some priorities. But this is not the only possible topology : it mostly depends on the type of physical transmission interface, and we assume a unidirectional stream.
Each device should have an input and an output so it can be integrated in a chain, otherwise it is forced to be at one end or another of the chain, which makes it harder to integrate smoothly.
"Hubs" or concentrators are possible but they require enough CPU to resynchronise the incoming streams. This does not solve many issues and introduces more of them.
N00N over IP (UDP) is possible but this requires more complex methods to configure each device. Static or dynamic addresses are only one easy aspect, devices must also "discover" where to send their own stream... The LSB of the IP address then matches the device's ID. 10BaseT and 100BaseT have latencies in the millisecond range when taking all the OS overhead into account, and it's a big source of jitter... S/PDIF has a much lower latency because data are processed at sample speed, in very small chunks and with very short FIFO.
I was able to reach sub-millisecond latencies with Wiznet modules in a barebones configuration but I'm not sure it applies here.
It would be interesting to use RJ45+50ohms twisted pairs for connection between devices because the parts and cables are widely available, though there is always a risk of confusion with other standards that use the same connector (and PoE could burn the devices). RCA with 75-ohm cable is used in some older video applications and is not expensive.
___________________
Back to the daisy chain.
The simplest configuration is a source device (like a keyboard) directly connected to a sink device (an expander).
The chain can be expanded by adding more sources (more keyboards, more knobs) upstream, and more sinks downstream. The ends of the chain allow this addition, though there is the risk of breaking it in the middle.
Performance recorders can be added at the end of the chain to capture (and replay) the stream, there can be sequencers as well... But most probably, a computer will act as a DAW at the ends of the chain :
- upstream : it will generate the Tick (beware of OS jitter !) and send configuration messages (label, ID etc.)
- downstream : it will be able to receive all the source data, massage and process the raw information, implement flexible synthesis...
-
Packet types
12/15/2022 at 20:32
Here I will try to list all the types of packets I can think of, to allocate numerical identifiers.
Each data source, or device, can generate any type of packet. A device can generate "keyboard" and "knobs" packets, but can only have one set thereof. So if your device has 2 keyboards, you have to emulate 2 separate devices.
There are 16 bits in the header for the type and flags fields so each could take one byte at first glance, though there is no strict boundary. Flags could supplement the type and implement subtypes, for example, or an important type could use fewer bits and allocate more flags.
The 2 LSB are not reserved but that's where the compression flags are located :
- 00 : no compression, raw data
- 01 : 16-bit 3R
- 1x : reserved, could be LZW or others TBD.
Compression is optional and provided to save bandwidth so it might be ignored by some data sinks. That's why the source should send raw data once per second. Furthermore : compression does not work all the time and could eventually expand data, so raw transmission is always possible as a failsafe.
There are 4 big classes of packet types, using the 2 MSB :
- 00 : Instrument (actual useful data)
- 01 : Experimental (playground in the protocol, expect things to break)
- 10 : Reserved (leave it alone)
- 11 : Management (non-music data that keeps the system working)
As usual, if a packet type is not recognised or understood, it is ignored.
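The two bit fields described above can be sketched in C. This is only an illustration under one assumption: that the class occupies the 2 MSB and the compression flags the 2 LSB of the 16-bit Type_Flags header word. All the names (`n00n_class`, `n00n_compression`, `n00n_accept_raw_instrument`) are hypothetical and not part of any existing N00N code.

```c
#include <stdint.h>

/* Hypothetical names. Assumption: class = 2 MSB, compression = 2 LSB
   of the 16-bit Type_Flags word of the header. */
typedef enum {
    N00N_CLASS_INSTRUMENT   = 0, /* 00 : actual useful data */
    N00N_CLASS_EXPERIMENTAL = 1, /* 01 : playground         */
    N00N_CLASS_RESERVED     = 2, /* 10 : leave it alone     */
    N00N_CLASS_MANAGEMENT   = 3  /* 11 : non-music data     */
} n00n_class_t;

typedef enum {
    N00N_COMP_RAW      = 0, /* 00 : no compression, raw data */
    N00N_COMP_3R16     = 1, /* 01 : 16-bit 3R                */
    N00N_COMP_RESERVED = 2  /* 1x : reserved                 */
} n00n_comp_t;

static n00n_class_t n00n_class(uint16_t type_flags) {
    return (n00n_class_t)(type_flags >> 14);
}

static n00n_comp_t n00n_compression(uint16_t type_flags) {
    unsigned c = type_flags & 3u;
    return (c >= 2u) ? N00N_COMP_RESERVED : (n00n_comp_t)c;
}

/* Returns 1 if a sink that only understands raw Instrument data should
   process the packet, 0 if it should silently ignore it. */
static int n00n_accept_raw_instrument(uint16_t type_flags) {
    return n00n_class(type_flags) == N00N_CLASS_INSTRUMENT
        && n00n_compression(type_flags) == N00N_COMP_RAW;
}
```

A sink without a 3R decoder would call `n00n_accept_raw_instrument()` on every header and simply wait for the raw packet that the source sends once per second.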
-o-O-0-O-o-
The Instrument class :
Type 0 is invalid (just in case).
- Knobs : just a collection of 16-bit values that could represent potentiometers, buttons, slide pots, ribbons, whatever : it's the most generic type.
Format : 16-bit size prefix, followed by as many 16-bit unsigned values (that may be compressed, see the compression flags).
The data sink is in charge of "patching"/associating each value with a meaning or function.
- Keys : a string of 16-bit values representing the absolute position of each key of the keyboard.
Format : 1 byte of length prefix (the number of keys),
1 byte of offset (the position of the lowest note/key),
followed by as many 16-bit unsigned values (that may be compressed, see the compression flags).
Note : no other flags than the 2 compression LSB are defined so there is a lot of room to play with.
- Aftertouch : a shadow of the Keys type that adds pressure information. Could be redundant but it's available anyway. Same format as Keys.
- Bend : another shadow of the Keys for lateral pressure on the keys. A fun novelty. Same format as Keys.
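As an illustration of the Keys format (shared by Aftertouch and Bend), here is a minimal parser sketch for a raw, uncompressed payload. It assumes that the payload arrives as 16-bit words and that the first word packs the length in its low byte and the offset in its high byte; the text does not fix that byte order. `N00N_MAX_KEYS`, `n00n_keys_t` and `n00n_parse_keys` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define N00N_MAX_KEYS 128  /* hypothetical local limit of this sink */

typedef struct {
    uint8_t  count;               /* number of keys in the packet   */
    uint8_t  offset;              /* position of the lowest key     */
    uint16_t pos[N00N_MAX_KEYS];  /* absolute position of each key  */
} n00n_keys_t;

/* Parses a raw Keys payload made of `words` 16-bit words.
   Assumption: payload[0] = length (low byte) | offset (high byte).
   Returns 0 on success, -1 if the packet must be rejected. */
static int n00n_parse_keys(const uint16_t *payload, size_t words,
                           n00n_keys_t *out) {
    if (words < 1) return -1;                    /* no prefix at all       */
    uint8_t count  = (uint8_t)(payload[0] & 0xFF);
    uint8_t offset = (uint8_t)(payload[0] >> 8);
    if (count > N00N_MAX_KEYS) return -1;        /* too many values        */
    if ((size_t)count + 1 > words) return -1;    /* truncated payload      */
    out->count  = count;
    out->offset = offset;
    for (size_t i = 0; i < count; i++)
        out->pos[i] = payload[1 + i];
    return 0;
}
```

The two size checks happen before any copy, so an oversized or truncated packet is rejected without touching the destination buffer.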
.
Well, that's about it for this class, there is not much else to add, but there is room for extension. Knobs can represent about anything anyway, so other types would be required if a different data format is required, for example events for drums ?
.
-o-O-0-O-o-
The Management class
That's where the kludges are.
For convenience, the type numbers of this class count down from the top of the range, so I'll write them as negative numbers for now.
-1 : Tick
It's just a header that is passed along the chain to synchronise all the other clocks downstream. No data payload. Upon reception :
- Make sure the time code is coherent (sample for 1 second ?)
- Update the local clock (and/or the soft PLL)
- Make sure that the Channel_ID does not collide (otherwise, rename yourself randomly and send a rename message downstream)
If no tick received within 1 second, send your own tick (8-16Hz ?)
-2 : Ping
This is another header with no payload, that is used to enumerate the devices and their IDs in a chain.
When a device receives this header, it must send its own Ping header before forwarding the one(s) it received. This is to prevent storms of packets, FIFO stuffing/overflow, and it also preserves the order of the chain, unlike other methods. The device must then ignore other Pings for a second or until it receives several packets that are not Pings.
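The Ping rule can be sketched as a tiny state machine. This is a sketch under assumptions: time is expressed in S15.16 units (so 1 second is 0x10000), and "several" non-Ping packets is arbitrarily set to 3. `n00n_ping_t`, `n00n_ping_on_packet` and both constants are hypothetical names.

```c
#include <stdint.h>

#define N00N_PING_HOLDOFF   0x00010000u /* 1 second in S15.16             */
#define N00N_PING_NONPINGS  3           /* hypothetical value of "several" */

typedef struct {
    int      holdoff;    /* 1 while further Pings must be ignored   */
    uint32_t since;      /* local time when we sent our own Ping    */
    int      non_pings;  /* non-Ping packets seen during the holdoff */
} n00n_ping_t;

/* Call for every received packet. Returns 1 when the device must emit
   its own Ping header before forwarding the received one, else 0. */
static int n00n_ping_on_packet(n00n_ping_t *p, int is_ping, uint32_t now) {
    if (p->holdoff) {
        if (!is_ping && ++p->non_pings >= N00N_PING_NONPINGS)
            p->holdoff = 0;               /* enough other traffic seen */
        else if ((uint32_t)(now - p->since) >= N00N_PING_HOLDOFF)
            p->holdoff = 0;               /* one second has elapsed    */
    }
    if (!is_ping || p->holdoff)
        return 0;                         /* just forward the packet   */
    p->holdoff   = 1;                     /* answer once, then hold off */
    p->since     = now;
    p->non_pings = 0;
    return 1;
}
```

Because each device answers only the first Ping of a burst, the Pings arrive at the end of the chain in the physical order of the devices.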
-3 : Label
This packet allows getting and setting the "label" of a device (when possible). This packet can be emitted when the user changes the label on the device itself, so the DAW can update its display. But the DAW can also send this packet to inquire and/or change the label remotely.
When receiving a Label message, check if the ID matches, otherwise forward.
If the ID matches, check the set/get flag : if the flag is "set" then update the device's local label.
Then confirm by sending a Label message containing the device's label (1 size prefix byte and 0 to 255 UTF-8 bytes).
-4 : Serial
Get the read-only identification of the device : manufacturer, model, revision, serial number...
When receiving an empty Serial message with an ID that matches the device, the device sends a Serial message with a UTF-8 payload containing the detailed information.
More options or details could be obtained or selected using the Flags field. TBD.
-5 : Capabilities
This is where it gets somewhat MIDI-ish but capability enumeration is not strictly necessary because the available data streams are output anyway.
Maybe this could even be used to "tune"/enable/disable/configure each stream type from the device (sensitivity, max/min values, speed...)
-6 : Rate
Manage the update frequency, the relative priority for stream insertion...
.
.
Other possible type : Tunnel (to encapsulate other types of traffic, such as MIDI, files, network data...)
-o-O-0-O-o-
This numbering with 2 ends (management at the top of the range, and instruments at the bottom) allows the numbering and attributions to grow towards the middle of the range, at their own pace, while allowing efficient table-driven decoding of the most frequent messages.
-
The project is renamed !
12/15/2022 at 02:20
I didn't believe it but the deal is done !
"Noon" is the English word for "midi", but most good domain names for "noon" were already taken.
Only tonight did it occur to me that I could simply replace O with 0 and n00n.org is now mine.
Now I have to rename a few things but it's fortunately not a large corpus yet.
-
Header and payload checksums with PEAC
12/14/2022 at 23:32
I have played with different ideas for header checksums for a long time. At one point I even considered Hamming SECDED but #PEAC is "just right" (and it works with 16 bits of granularity). Here I'll define the header, how to build it and how to check it.
Let's have a look at a description of the S/PDIF logic format : https://www.ni.com/fr-fr/support/documentation/supplemental/06/developing-a-spdif-input-module-in-labview-fpga.html (and kudos to #Propeller S/PDIF Receiver as well)
Sample data can use 16 to 24 bits but the packet structure is not absolutely strictly defined, the number of channels is left to the implementation. This could create an ambiguity in the ordering of the samples, although there is usually a good understanding of what is left and what is right : S/PDIF differentiates channel A (left) and channel B (right) in the header. The S/PDIF stream could be either mono or stereo, but an adaptation layer should properly reorder the samples (and pre-parse packets). Let's assume for now that we get a continuous stream of 16-bit words, that may be both channels interleaved OR a mono stream. These data could also come from a serial port or a UDP socket... It's just a stream. Just make sure you don't mix left and right or that a mono stream is not expanded to duplicated samples in a stereo stream, or the channels are shifted or...
In a "continuous stream" context, each packet/message should ideally be separated by a pair of 0 samples. If transmitting over UDP however, this is not required since the datagram provides a clear structure of the dataflow. Thus the header format does not include the extra leading or trailing 0s (they are a bit superfluous, maybe I am superstitious).
S/PDIF transmits LSB first and ChanA/Left first. The header must be aligned to this 32-bit boundary even though it could be in a mono stream, just pad with a 0 sample until the header is correctly aligned.
Let's now look at the header's structure :
    typedef struct {
      uint16_t
        Sign1,          // 'N', '0'
        Sign2,          // '0', 'n'
        Timecode_Frac,
        Timecode_Sec,
        Channel_ID,
        Type_Flags,
        Payload_Words,
        Header_checksum;
    } N00N_header_struct;
To prevent Endian problems, I use only this 16-bit representation. Specific implementations could be optimised and process 32 bits at once but a reference is needed first.
Two fields Sign1 and Sign2 are fixed so they are not considered during the checksum. The last field is the checksum itself, the result, so only 5 fields remain. PEAC16x2 uses 2 additions per word, plus a finishing round, so that's about 12 arithmetic operations. It's not a lot so they can be unrolled. I started from the reference code of PEAC found in snippets_v3.c :
    void CHKS(uint16_t m) {
      // extra caution with the extension of the sizes:
      CHKSC += CHKSX;     // 0..10001
      CHKSC += CHKSY;     // 0..2FFFF
      CHKSY  = CHKSX + m; // 0..1FFFE
      CHKSX  = CHKSC;     // 0..FFFF
      CHKSC >>= 16;       // 0, 1 or 2
    }
I unrolled, substituted some variables and values, et voilà :
    #define PEAC16X2_INIT1 (0xABCD)
    #define PEAC16X2_INIT2 (0x4567)

    uint16_t N00N_header_checksum_ref2(N00N_header_struct* header) {
      uint32_t C, X, Y;
      C = ((2*PEAC16X2_INIT1) + PEAC16X2_INIT2) + header->Timecode_Frac;
      Y = (   PEAC16X2_INIT1  + PEAC16X2_INIT2) + header->Timecode_Sec;
      X = C & 0xFFFF;  C >>= 16;
      C += X+Y;  X += header->Channel_ID;     Y = C & 0xFFFF;  C >>= 16;
      C += X+Y;  Y += header->Type_Flags;     X = C & 0xFFFF;  C >>= 16;
      C += X+Y;  X += header->Payload_Words;  Y = C >> 16;
      return (uint16_t)(C+X+Y);
    }
Total : 20 operations, half of them are pairable (ILP>2)
This is used both by the encoder and the decoder, so there is no discrepancy or code duplication.
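To show how the same routine serves both sides, here is a small sketch of a "seal" function for the encoder and a "valid" function for the decoder. The checksum routine is copied from the text above. The two signature constants assume that the first character of each pair sits in the low byte ('N','0' gives 0x304E and '0','n' gives 0x6E30), which is an assumption about byte order. `n00n_header_seal` and `n00n_header_valid` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint16_t Sign1, Sign2, Timecode_Frac, Timecode_Sec,
             Channel_ID, Type_Flags, Payload_Words, Header_checksum;
} N00N_header_struct;

#define PEAC16X2_INIT1 (0xABCD)
#define PEAC16X2_INIT2 (0x4567)
#define N00N_SIGN1 0x304E  /* 'N' | '0' << 8, assumed byte order */
#define N00N_SIGN2 0x6E30  /* '0' | 'n' << 8, assumed byte order */

/* Checksum routine copied from the text above. */
uint16_t N00N_header_checksum_ref2(N00N_header_struct* header) {
    uint32_t C, X, Y;
    C = ((2*PEAC16X2_INIT1) + PEAC16X2_INIT2) + header->Timecode_Frac;
    Y = (   PEAC16X2_INIT1  + PEAC16X2_INIT2) + header->Timecode_Sec;
    X = C & 0xFFFF;  C >>= 16;
    C += X+Y;  X += header->Channel_ID;     Y = C & 0xFFFF;  C >>= 16;
    C += X+Y;  Y += header->Type_Flags;     X = C & 0xFFFF;  C >>= 16;
    C += X+Y;  X += header->Payload_Words;  Y = C >> 16;
    return (uint16_t)(C+X+Y);
}

/* Encoder side: fill the fixed fields and the checksum. */
void n00n_header_seal(N00N_header_struct *h) {
    h->Sign1 = N00N_SIGN1;
    h->Sign2 = N00N_SIGN2;
    h->Header_checksum = N00N_header_checksum_ref2(h);
}

/* Decoder side: 1 if signature and checksum both match, else 0. */
int n00n_header_valid(N00N_header_struct *h) {
    return h->Sign1 == N00N_SIGN1
        && h->Sign2 == N00N_SIGN2
        && h->Header_checksum == N00N_header_checksum_ref2(h);
}
```

The decoder recomputes the checksum over the same 5 fields and compares, so a corrupted Payload_Words field is caught before the parser trusts the size.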
From there it is also easy to deduce the corresponding code for the payload's checksum :
    void N00N_payload_checksum(uint16_t* buffer, uint16_t* r, uint16_t* s, uint16_t Payload_Words) {
      uint32_t C, X=0, Y;
      // cast before shifting : 0xABCD << 16 would overflow a signed int
      C = Payload_Words + ((uint32_t)PEAC16X2_INIT1 << 16) + PEAC16X2_INIT2;
      Y = C >> 16;
      C &= 0xFFFF;
      while (Payload_Words > 1) {
        Payload_Words--;
        C += X+Y;  X += buffer[0];  Y = C & 0xFFFF;  C >>= 16;
        C += X+Y;  Y += buffer[1];  X = C & 0xFFFF;  C >>= 16;
        buffer+=2;
      }
      C += X+Y;
      Y += PEAC16X2_INIT1;
      *r = (uint16_t)C;
      *s = (uint16_t)Y;
    }
Pro tip :
If you're not sure the code really is a PEAC algorithm, remove all the references to external data, and you get the typical "add X to Y and Y to X" mantra.
...
The inner loop is about 15 opcodes for 4 bytes, it's a bit slower than Fletcher but way more secure and more than half of the opcodes can be executed in parallel. It's not the fastest ever but it remains simple, with very low overhead : there is no corner case to handle because the payload's granularity is 32 bits for the whole protocol. There could be a way to make it even better by loading the buffer's values 32 bits at a time but I won't go there yet, even though I know that modern CPUs don't like 16-bit memory accesses. However, I don't want to deal with Endianness and no mainstream programming language lets you handle the carry after an addition :-(
-
Even more drafting
12/13/2022 at 19:59
The last draft gave a broad outline of the system. Let's refine it.
The very first defining characteristic is that the data stream must be able to be transmitted over a SPDIF link. Hence the 32 bits per atomic data. This is equivalent to what a CD player would output but instead of sounds, the data would represent sensors' values.
- The stream could be played from a pre-recorded audio CD for example
- conversely the stream could be recorded to a CD, but not a MiniDisc because it uses lossy compression.
- The medium can reuse cheap TOSlink transceivers and fibres, or 75Ω RCA connectors/patch cables, but UDP over RJ45 could also work for encapsulation.
- The stream could be received and emitted by a DAW (Digital Audio Workstation) that recognises the specific type of data.
- Since it's considered as raw audio data by underlying components, it is transparent and could find its way in the web browsers thanks to the HTML5 Web Audio API. N00N could even encapsulate other one-way protocols over these links...
- SPDIF/Toslink transmits ancillary data in extra bits but they are not accessible to most data sinks. For example, track names can be sent by a CD player, but this is not available in the raw audio data stream. Hence ALL the management data must be transmitted "in band". Which is nice because this allows easy storage in a dumb computer file.
- Inactive state of the stream is "0" samples. They must separate the packets on a continuous stream (this eases parsing a bit).
Chaining
Like MIDI, the devices can have an input and/or an output. Devices can be daisy-chained (but there is no "Thru") so several instruments (keyboards, knob panels, pedals, expanders, mixers, whatevers) share a single stream that can be recorded by the last devices in the chain.
Of course it is possible to have multiple inputs in a multiplexer-like device that will coalesce all the data into a coherent stream, but it will have to rewrite all the timecodes.
In a chain, the first device receives no information so it outputs its own timecode, which synchronises the rest of the chain. A timecode generator could be an independent device, or generated by a master DAW. The rule is simple though : if a device does not receive an external timecode in one second, it outputs its own local timecode, otherwise it follows and synchronises to the external timecode (using a software PLL for example). Of course, the received timecode must be valid, with like 3 consecutive timecodes with coherent values (they must increase monotonically at a reasonable rate).
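The timecode rule above could look like this in C. It is a sketch only: timecodes are handled as 32-bit S15.16 values, "coherent" is assumed to mean "increases by more than 0 and at most 1 second since the previous one", and the PLL itself is left out. `n00n_sync_t`, `n00n_sync_rx` and `n00n_sync_use_local` are hypothetical names.

```c
#include <stdint.h>

#define N00N_LOCK_COUNT  3            /* consecutive coherent timecodes  */
#define N00N_ONE_SECOND  0x00010000   /* 1.0 in S15.16                   */

typedef struct {
    uint32_t last_tc;     /* last external timecode received (S15.16)  */
    uint32_t last_local;  /* local clock at that reception (S15.16)    */
    int      coherent;    /* consecutive coherent timecodes so far     */
} n00n_sync_t;

/* Feed one received external timecode; `now` is the local clock. */
static void n00n_sync_rx(n00n_sync_t *s, uint32_t tc, uint32_t now) {
    int32_t step = (int32_t)(tc - s->last_tc);
    if (s->coherent > 0 && step > 0 && step <= N00N_ONE_SECOND) {
        if (s->coherent < N00N_LOCK_COUNT) s->coherent++;
    } else {
        s->coherent = 1;  /* restart the count from this timecode */
    }
    s->last_tc    = tc;
    s->last_local = now;
}

/* Returns 1 if the device must output its own local timecode,
   0 if it should follow the external one. */
static int n00n_sync_use_local(n00n_sync_t *s, uint32_t now) {
    if ((uint32_t)(now - s->last_local) >= N00N_ONE_SECOND)
        s->coherent = 0;  /* nothing received for one second */
    return s->coherent < N00N_LOCK_COUNT;
}
```

A device at the head of the chain never sees an external timecode, so `n00n_sync_use_local()` keeps returning 1 and it naturally becomes the time source.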
The problem with chains is when/if a device in the middle is dysfunctional... or if one link is broken. This also cuts all the upstream devices.
Bandwidth
At the end of the chain, the last device receives the sum of all the streams generated upstream. The result might exceed the bandwidth available. Several techniques are borrowed from switched packet networks.
- A decent FIFO is recommended at the input of the device. 4KiB seems to be a minimum. Too large would create "bufferbloat" and losses in subsequent devices which don't have such a large FIFO. 16KiB is a decent value that I hope is never getting filled completely. BTW : ideally, FIFO size should not be required to be larger than the size of the packets that the device can send, but some margin can always help.
- If the input FIFO overflows, the next incoming packets are discarded entirely
- This is why each packet should be "standalone" and not depend on the next or previous packet.
- For a link with limited capacity, activity LEDs would indicate the FIFO usage and average bandwidth occupation.
- Eventually, the device could have "nice" settings such as refresh rate (10, 100, 1000Hz ?) and prioritisation of its own packets (like: drop X% of my own packets and 100-X% of the incoming ones)
- Devices closer to the end would mechanically have the higher priority because their data have fewer chances of being dropped
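The "discard the whole packet on overflow" rule from the list above can be sketched with a ring buffer that admits a packet only if all of it fits. The 16 KiB size comes from the text; the sketch assumes the packet length is already known from its header when it is pushed. `n00n_fifo_t`, `n00n_fifo_push_packet` and `n00n_fifo_pop` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define N00N_FIFO_WORDS 8192  /* 16 KiB of 16-bit words */

typedef struct {
    uint16_t      buf[N00N_FIFO_WORDS];
    size_t        head;     /* index of the oldest word        */
    size_t        used;     /* number of words currently held  */
    unsigned long dropped;  /* packets discarded so far        */
} n00n_fifo_t;

/* Stores a complete packet, or nothing at all.
   Returns 1 if the packet was stored, 0 if it was discarded. */
static int n00n_fifo_push_packet(n00n_fifo_t *f, const uint16_t *pkt,
                                 size_t words) {
    if (words > N00N_FIFO_WORDS - f->used) {
        f->dropped++;            /* never store a partial packet */
        return 0;
    }
    size_t tail = (f->head + f->used) % N00N_FIFO_WORDS;
    for (size_t i = 0; i < words; i++) {
        f->buf[tail] = pkt[i];
        tail = (tail + 1) % N00N_FIFO_WORDS;
    }
    f->used += words;
    return 1;
}

/* Removes up to max_words words; returns how many were copied. */
static size_t n00n_fifo_pop(n00n_fifo_t *f, uint16_t *dst, size_t max_words) {
    size_t n = (max_words < f->used) ? max_words : f->used;
    for (size_t i = 0; i < n; i++) {
        dst[i] = f->buf[f->head];
        f->head = (f->head + 1) % N00N_FIFO_WORDS;
    }
    f->used -= n;
    return n;
}
```

The `dropped` counter and `used` level are the kind of values that the activity LEDs mentioned above could display.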
Timing
The stream contains timecodes that may be ignored in real-time performance but allow recording and playing in plain files (as do MIDI streams already). The timecode format allows a simple and direct timeshifting (because it's plain fractional binary) and can wrap around when the range is exceeded (so you can record more than 9h of performance). Only timecodes that leap forward (within a decent window) are legal, otherwise they get discarded (unless too many are received during 1 second).
Well there is also the exception of a null timecode, value #0000, indicating "no internal clock", for example for dumb sources with no input. This is not recommended because the data/packets could be discarded and the sink (or the next device in the chain) could rewrite the timecode (though it's extra efforts that would be best spared for more useful things).
But there is the question of a source that does not increment its own timecode for a while, to send a large chunk of data for example, because it would clog the chain if the chunk exceeds the FIFO size.
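The "leap forward within a decent window" test is easy to express with modular arithmetic, which also takes care of the wrap-around. A minimal sketch, assuming the two header fields are combined into one 32-bit value and that the window is 2 seconds (an arbitrary choice). The null timecode and the "too many rejected in 1 second" escape are not handled here. `n00n_timecode`, `n00n_tc_is_forward` and `N00N_TC_WINDOW` are hypothetical names.

```c
#include <stdint.h>

#define N00N_TC_WINDOW 0x00020000u  /* hypothetical window : 2 s in S15.16 */

/* Combines the Timecode_Sec and Timecode_Frac header fields. */
static uint32_t n00n_timecode(uint16_t sec, uint16_t frac) {
    return ((uint32_t)sec << 16) | frac;
}

/* Returns 1 if `next` is strictly after `last` and inside the window.
   The subtraction is modulo 2^32, so a wrap-around still counts as
   a small forward step. */
static int n00n_tc_is_forward(uint32_t last, uint32_t next) {
    uint32_t delta = next - last;
    return delta != 0 && delta <= N00N_TC_WINDOW;
}
```

A timecode that goes backwards produces a huge unsigned `delta`, so it falls outside the window and the packet is discarded.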
Here, the timecodes also replace the "sequence numbers" found in other formats/protocols. This is due to several factors :
- There is more than one source and their streams get interleaved, and each device in a chain should not have to rewrite the packet header anyway,
- each source can have its own sequence number, but they get synchronised such that the time codes should not go backwards anyway
- Each packet should be standalone, without fragmentation
This results in a shorter/smaller header.
Compression
Since all the packets must be stand-alone and no resend is possible, the compressed data can't rely on time-based deltas. The #Recursive Range Reduction (3R) HW&SW CODEC looks ideal for this situation. A baseline compaction/decompaction routine would be developed to handle 256 numbers of 16 bits. Negative numbers would be pairs of 16-bit numbers with one of them zero, and the result is the subtraction of these numbers (which can be positive or negative and 3R would manage the 0 value).
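The trick for negative numbers can be illustrated in a few lines, before any 3R compaction takes place. This sketch assumes the positive part comes first in the pair (the text does not say which) and that values stay within -65535..65535. `n00n_pair_t`, `n00n_pair_encode` and `n00n_pair_decode` are hypothetical names.

```c
#include <stdint.h>

/* A signed value stored as two unsigned 16-bit numbers, one of them 0.
   Assumption : the positive part comes first. */
typedef struct { uint16_t plus, minus; } n00n_pair_t;

/* v must be in the range -65535..65535. */
static n00n_pair_t n00n_pair_encode(int32_t v) {
    n00n_pair_t p = {0, 0};
    if (v >= 0) p.plus  = (uint16_t)v;
    else        p.minus = (uint16_t)(-v);
    return p;
}

/* The value is simply the subtraction of the two numbers. */
static int32_t n00n_pair_decode(n00n_pair_t p) {
    return (int32_t)p.plus - (int32_t)p.minus;
}
```

Every pair contains at least one zero, which is the kind of redundancy that 3R is expected to squeeze out.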
Data sinks
Since the stream is potentially asynchronous and lossy, the data sinks (the expanders, oscillators etc.) must correctly interpolate data at the internal higher sampling rate.
Any source could fall offline at any moment so failsafes, watchdogs, timers etc. must detect and correct "anomalous conditions". Proper operation should be resumed within 1 second. After all, you never know when your mate will trip on a dangling wire, and you don't want it to end the whole show.
Naming
The sinks "attach" or "listen" to a given channel, associated to a given source. Each channel has a UTF-8 label sent regularly by the sources or upon receiving a "ping"/"enumerate" broadcast packet, so the users don't have to rely on numbers only. If a source receives a packet that contains its own channel ID, it must change its own ID to a different random one to prevent collision : this is a dynamic arbitration system, so the label is what really matters. Thus, as long as all the labels are different, you could plug any device anywhere in the chain and forget about low-level IDs, they would be random anyway (unless you decide to fix them statically). Eventually, a message could order a given channel to change its ID or label for convenience.
With maybe 16 bits for channel ID (leaving 16 bits for type and flags), there are low chances that 2 random IDs will collide. See the Birthday Paradox however. But with half a dozen devices in a chain, and a decent re-allocation scheme with decent entropy, the operation should be pretty smooth. Even crazy long chains should work, and bandwidth+latency will become the problem (particularly with a store&forward mechanism) long before ID collision becomes a critical thing.
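To put a number on the Birthday Paradox remark, here is the usual estimate, assuming that each of the n devices draws its 16-bit ID uniformly and independently among N = 2^16 values (an idealisation of the "decent entropy" mentioned above):

```latex
P_{\text{collision}}(n)
  = 1 - \prod_{k=1}^{n-1}\left(1 - \frac{k}{N}\right)
  \approx 1 - e^{-n(n-1)/(2N)}
  \approx \frac{n(n-1)}{2N},
  \qquad N = 2^{16} = 65536

% n = 6 devices :
P_{\text{collision}}(6) \approx \frac{6 \cdot 5}{2 \cdot 65536}
  = \frac{15}{65536} \approx 2.3 \times 10^{-4}

% 50% collision probability :
n_{1/2} \approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \times 256 \approx 300
```

So with half a dozen devices a collision happens roughly once in 4000 random allocations, and the chain would need about 300 devices before a collision becomes as likely as not.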
Anyway, each source MUST have a means to input its UTF8 label. It could be a simple push-button on the front panel that will enable the reception of a rename packet command, or a full-blown keyboard, or whatever...
Length of the label could be "up to" 255 bytes though maybe 32 or 64 bytes could be stored by smaller implementations...
-o-O-0-O-o-
Wow, the more I write about it, the more it draws from 2 decades of design experience. The #PEAC and #3R algorithms were designed for this sort of application and purpose, so I'm glad they all come together at last. The protocol does not look like Open Sound Control at all since OSC addresses its parameters with verbose textual paths, while N00N is raw binary and lightweight to interpret with limited CPU resources. Maybe I'll succeed one day in implementing it in my #RD1000 ?
-
More drafting
12/02/2022 at 14:51
This project has been on hold for a long time because for now #PEAC Pisano with End-Around Carry algorithm is a critical priority and it must be explored and examined thoroughly, completely, before I can move on. But ideas keep bubbling up...
I wanted to rename this format "NOON" because "MIDI" means noon in French. Whatever, the domain names are all taken. RTFM domain names are mostly taken as well so... I'll have to find a better definitive name. Later.
UPDATE 20221215 : n00n.org is now dedicated to the project.
All that aside, one of the main characteristics of the format is "it's a bit over the top". Let's allocate ample space because there is never enough of it once people start to use it. The most direct sign is its granularity : 16 bits per value, grouped in pairs, little-endian style, so the basic data transmission element is a 32-bit word. Just like a SPDIF stereo sound sample from a CD player (though SPDIF can increase to 24 bits if all are used). This eases alignment a LOT for modern CPUs and also helps with the checksum optimisation (fewer corner cases to manage).
Since we're using pairs of 16-bit values, the time code can become "sort of" an absolute time code, with 15 bits for the number of seconds, and a fractional 16-bit part (so it's a S15.16 fixed point value, 9 hours should be enough and negative times are allowed). It's a good compromise that evades the question of the sampling speed, and signal sources can use an 18,432,000 Hz clock source, divide by 5×5×5×3×3=1125 and get a clean 16384 ticks per second resolution. So clock drift is mostly avoided and sources with different clock speeds or sampling speeds can adjust easily. A 1pps signal could even synchronise all the devices if needed, to reduce inter-device drifts/biases.
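The arithmetic of the clock divider can be checked with a short sketch : 18,432,000 / 1125 = 16384 = 2^14 ticks per second, so a tick counter only needs a shift by 2 bits to become a timecode with a 16-bit fraction. The macro and function names are hypothetical, and the split into the two 16-bit header fields assumes the layout Timecode_Sec : Timecode_Frac.

```c
#include <stdint.h>

#define N00N_XTAL_HZ        18432000u
#define N00N_DIVIDER        1125u   /* 5*5*5*3*3 */
#define N00N_TICKS_PER_SEC  (N00N_XTAL_HZ / N00N_DIVIDER)  /* 16384 = 2^14 */

/* Converts a free-running tick counter (2^14 ticks per second) into a
   32-bit timecode with 16 fractional bits. It wraps around naturally. */
static uint32_t n00n_ticks_to_timecode(uint32_t ticks) {
    return ticks << 2;  /* 2^14 ticks/s -> 2^16 units/s */
}

/* Splits the timecode into the two 16-bit header fields. */
static uint16_t n00n_tc_sec(uint32_t tc)  { return (uint16_t)(tc >> 16); }
static uint16_t n00n_tc_frac(uint32_t tc) { return (uint16_t)(tc & 0xFFFF); }
```

Since each tick is worth 4 units of the fraction, the 2 lowest bits of Timecode_Frac stay at 0 with this clock, which leaves room for sources with a finer time base.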
Some of the packets can contain absolute, raw sample data, at least one per second, up to the maximum possible rate (64Ki per second ?) but bandwidth can be reduced through compaction and compression (looking at you #Recursive Range Reduction (3R) HW&SW CODEC)
At first the purpose is to transmit absolute key pressure in real time, so it's not a sound format, though other things than keys can be sampled, in particular knob or pedal positions. This means that each type or family of sensor must be on a separate "channel", which is again a 16-bit value. The companion half-word will be flags for the channel, as well as type (so each sink can "accept" a stream of the proper type, knobs will be used as a different "auxiliary" input while the keys are the main input)
So far we have packets split into a fixed-size header and payload, each protected by a word or half-word PEAC16x2 checksum.
Header
Pretty basic and general, not too compact but who cares today. After all it's not 32 bits per value right ? I try to keep the features down to the minimum : no complex fragmentation info for example.
- Prefix : "N00n" in ASCII for frame start (that's 32 bits, right)
- timecode in S15.16 signed fixed point format, in seconds. MUST be > the last one.
- CTF : channel, type, flags (such as compression) (should fit in 32 bits)
- Payload size (16) and header checksum (16)
That looks nice and square : 16 bytes or 128 bits fits nicely in a cache line. Checksumming the header can take a short, fixed amount of code with 6 additions (because the prefix is known so no need to checksum it).
The CTF (channel, type, flags) field can be flexible because one channel can accept multiple types so... Maybe we can define subtypes or type ranges. There is a bit of wiggle room, and uncertainty.
Note : the header has its own shorter checksum to prevent corruption of the size field. Along with the "fixed" prefix, there are 32+16=48 bits for the parser to perform a first rough validation of the header. The timecode must be consistent too, and the CTF must also be coherent.
Payload
OK here it gets a bit more complex...
- the payload size field is in which granularity ?
- each "type" of data packet has their own peculiarities and format.
What is sure is that the payload gets a full PEAC16×2 checksum postfix (32 bits) which is independent of the header. It's the same algorithm but it's not unrolled/inlined like the header. Or it can reuse some code from the header to handle blocks of 16 bytes / 4 words.
Can a payload be empty ? Well, sure, why not, so the header is used as a "heartbeat" for example. Like a clock pulse. Then the size field is 0.
Otherwise the count is in 32-bit words granularity, excluding the checksum word. This allows block transmission of 256Ki bytes chunks if needed. Which is a lot, sure. Ideal average max size would be 4K bytes or 1K words. But anyway : this is the abstract format that can be packetised or fragmented in any way you want (over SPDIF, UDP, pigeons, whatever).
Keys
The keyboard's key positions are transmitted as a string of as many 16-bit values. There is a prefix too : the first value is an offset, or the index of the first key, to enable transposition. Beware of buffer overflows, of course packets that contain too many values must be rejected.
__________________________________________________________
Overall this is still quite simple, rough but sufficient to transmit asynchronous streams of data. The stream can be played by a keyboard, recorded, replayed, re-allocated, without putting too many constraints. The new/latest trick in the bag being the fractional time code, which allows more seamless temporal operations/edition. I hope it can be implemented in my #RD1000 mod one day.
Part of the simplicity comes from the one-way nature of the link. Unlike TCP for example, there is no notion of feedback, of management, of connexion to establish or break : just shoot the packets and count on the redundancy and oversampling to deal with the losses/alterations. Just throw packets away, it's much less critical than in an audio stream or a general/generic digital communication protocol, and the source does not need feedback.
-
16 bits
06/17/2021 at 01:12
MIDI was awesome for its time and perfectly suited for the 8-bit microprocessors of the era (Z80, 6809, 6502, you name it). Now we're in the 2020s and the interfaces and electronics have evolved !
OK, Open Sound Control uses 32 bits but I don't want to waste bandwidth for the sake of it. The motivation is mostly that I want to use SPDIF, which uses pairs of 16-bit values, and that reasonably fast ADCs easily handle 16-bit samples (hey, that's CD quality !) while beyond this resolution, it's more noise than anything. And the https://www.wiznet.io/product-item/w5300/ has a convenient high-speed bus with 16 bits.
Of course, if only 8 bits make sense for something, that's fine and bytes will be packed.