05/20/2021 at 06:54 •
It's been a long time since I worked on this and my other projects. It's been very difficult to find time and space to work on things. But in the last few evenings I managed to take some small steps toward significant goals on some of my projects, such as this one.
TXX, We Love You, but...
As you may remember, TXX is a module that I wrote to allow the Propeller 1 to send data in various formats to a serial port at high speed, up to 8 megabits per second. It started here in this project because I needed a fast way to log data, but it grew into a fairly significant reusable module which was well received in the Parallax forums. I even uploaded it to the OBEX website, where Parallax used to publish user-submitted projects.
However, OBEX doesn't exist anymore (Parallax moved their user-submitted projects to GitHub), and though I can't find my TXX project there, I realized that it wasn't easy to find in my own GitHub repositories either: it was in the PropSPDIFDecoder repo, in a branch that kind of dead-ended, and that branch wasn't the default branch in the GitHub web GUI, so you had to know where it was.
So I figured out how to do a filter-branch on a clone of the repo and start a new repository with just the TXX module (and its demo module) in it, so it could have its own rightful place. You can find it at https://github.com/jacgoudsmit/TXX.
I pruned the dead-end branch and the TXX module away from the PropSPDIFDecoder project. I will still use TXX as part of the project, but now I can just include it as an external. That might cause some trouble with the Propeller Tool, which doesn't like to import modules from other directories, but I'm sure there are ways around that.
Still Struggling with Subchannels
I've been thinking for a long time about the best way to get the subchannel data from an SPDIF signal from a DCC recorder into a PC or into another microcontroller. I thought of serializing the subchannel data by itself on the Propeller 1 and sending it over the serial port in blocks, maybe even using separate serial ports for the User and Channel Status subchannels.
But then I thought: "Why not both?" With TXX it should be possible to forward all the SPDIF data to a PC and let the PC process it.
Snag 1: Bandwidth
The first snag in that plan is that it's apparently really difficult to find USB to serial converter chips that support baud rates higher than 3 megabits per second. The FTDI chip that Parallax puts on their FLiP modules is no exception, and it can't easily be replaced anyway. So anyone who wanted to replicate my circuit with a faster converter wouldn't be able to use the FLiP, which would make it much harder to build the circuit on a breadboard.
3 Mbps is pretty respectable for a serial port, but it won't do for a 48 kHz stereo SPDIF data stream. For that you need at least 3.072 megabits per second: 48000 frames per second, times two subframes (left and right) per frame, times (for simplicity's sake) 32 bits per subframe. 48000 * 2 * 32 = 3072000. And that's continuous traffic, not including overhead such as start bits, stop bits and the time the Propeller needs to set things up in the TXX cog. So yeah, that's not going to work.
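Here is the same arithmetic as a small Python sketch, extended with UART framing overhead. It assumes 8N1 framing (one start bit and one stop bit per byte, no idle time between bytes), which the text only alludes to, and it also covers the 3-bytes-per-subframe idea discussed further on.

```python
# Serial bandwidth needed to stream S/PDIF subframes over a UART.
# Assumption: 8N1 framing = 1 start + 8 data + 1 stop = 10 wire bits per byte,
# with no idle time between bytes.

FRAME_RATE = 48_000        # stereo frames per second at 48 kHz
SUBFRAMES_PER_FRAME = 2    # left + right
WIRE_BITS_PER_BYTE = 10    # 8N1


def payload_bps(bits_per_subframe):
    """Raw payload bit rate, ignoring UART framing."""
    return FRAME_RATE * SUBFRAMES_PER_FRAME * bits_per_subframe


def wire_bps(bytes_per_subframe):
    """Bit rate on the wire when every subframe is sent as whole 8N1 bytes."""
    return FRAME_RATE * SUBFRAMES_PER_FRAME * bytes_per_subframe * WIRE_BITS_PER_BYTE


if __name__ == "__main__":
    print(payload_bps(32))  # 3072000, the figure from the text
    print(wire_bps(4))      # 3840000 with 4 bytes per subframe
    print(wire_bps(3))      # 2880000 with 3 bytes per subframe
```

Under those assumptions, 4 bytes per subframe needs 3.84 Mbps on the wire, while 3 bytes per subframe comes in at 2.88 Mbps, just under the 3 Mbps limit of the converter.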
Snag 2: Synchronization
A second snag is that serial ports transfer their data as bytes, not as 32-bit words like SPDIF. Even if we found a way to send the 32-bit subframes as 4 bytes each without the bandwidth problem, it would be almost impossible for the receiver to reliably determine where each subframe starts, unless there's some sort of synchronization protocol. A 9-bit scheme, where the extra bit indicates the start of a frame or subframe, would be possible if another microcontroller is the receiver, but is pretty much impossible with a USB to serial converter.
So I thought about possible compromises. Let's see what kind of data is available in each subframe:
- There are potentially 24 bits of audio data. However, no DCC recorder generates more than 20 bits of audio data.
- There are two subchannel bits (User data and Channel Status), however the Channel Status subchannel only requires half the bandwidth: one subchannel bit per frame, not subframe.
- There are two bits that indicate whether a subframe is usable: the Parity bit and the Validity bit. But the Propeller, as the transmitter of the serial data, can easily just mute the audio when either of these two bits indicates a problem.
- The preambles help us distinguish between the left channel and right channel, and indicate the beginning of a block, once every 192 frames. Internally, we store these as two bits in the subframe (and we store them twice so as not to influence the parity, but that's irrelevant).
There's some redundancy in there, and I thought it would be great to transfer each subframe as 3 bytes instead of 4 bytes, reducing the required bandwidth by 25%, and then rearrange the bits so that, for example, the most significant bit of each byte could be used for synchronization. If I reduced the audio to 20 bits, a 3-byte subframe would leave 4 bits for other data and synchronization.
I thought about setting the msb of the first byte of a 3-byte subframe to 1 and the msb of the last byte to 0, so that the synchronization code could look for a byte with a cleared msb followed by a byte with a set msb. But on second thought, I didn't like that idea. I didn't work it out any further, but let's just say I had a funny feeling that it wasn't going to work very well. Besides, it wouldn't leave enough room for the other bits, unless I reduced the audio data even further.
But if I designed the protocol around a full frame (i.e. the samples for both the left and right channel) in 6 bytes, it would make things a little easier:
- There would not be a need to distinguish between a left subframe or a right subframe, because if the receiver could sync to a frame, it would know that the left subframe is always the first 3 bytes and the right subframe is the last 3 bytes.
- The beginning-of-block marker and the Channel Status only need to be transferred once per frame.
So I came up with the matrix shown above. I may make some changes later on, depending on how practical it turns out to be once I implement it. As a matter of fact, I've already thought of a change to make synchronization easy:
| Byte | Bit 7 (msb) | Bits 6..0 |
|---|---|---|
| 0 | 0 | User, Left 19 .. Left 14 |
| 1 | Block | Left 13 .. Left 7 |
| 2 | 0 | Left 6 .. Left 0 |
| 3 | Ch.Status | User, Right 19 .. Right 14 |
| 4 | Validity | Right 13 .. Right 7 |
| 5 | 1 | Right 6 .. Right 0 |
The protocol consists of 6 bytes per frame. The lowest 7 bits of each byte contain audio bits as well as the User subchannel bits. The most significant bits contain the once-per-frame bits: The block detection and the Channel Status subchannel.
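To make the layout concrete, here is a minimal Python sketch of a packer and unpacker that follows the bit positions in the table. It is an illustration, not the final protocol. Assumptions: the audio values are the top 20 bits of each sample as unsigned integers, all flags are 0 or 1, the User bit in byte 0 belongs to the left subframe and the one in byte 3 to the right subframe, and byte 4's msb carries the Validity flag exactly as the table shows. The function names are hypothetical.

```python
def pack_frame(left, right, user_l, user_r, block, ch_status, validity):
    """Pack one stereo frame into 6 bytes, following the table's bit layout."""
    if not (0 <= left < 1 << 20 and 0 <= right < 1 << 20):
        raise ValueError("audio samples must be unsigned 20-bit values")
    return bytes([
        (0 << 7) | (user_l << 6) | (left >> 14),           # byte 0: 0, User, L19..L14
        (block << 7) | ((left >> 7) & 0x7F),               # byte 1: Block, L13..L7
        (0 << 7) | (left & 0x7F),                          # byte 2: 0, L6..L0
        (ch_status << 7) | (user_r << 6) | (right >> 14),  # byte 3: Ch.Status, User, R19..R14
        (validity << 7) | ((right >> 7) & 0x7F),           # byte 4: Validity, R13..R7
        (1 << 7) | (right & 0x7F),                         # byte 5: 1, R6..R0
    ])


def unpack_frame(f):
    """Inverse of pack_frame: recover the fields from 6 synchronized bytes."""
    return {
        "left": ((f[0] & 0x3F) << 14) | ((f[1] & 0x7F) << 7) | (f[2] & 0x7F),
        "right": ((f[3] & 0x3F) << 14) | ((f[4] & 0x7F) << 7) | (f[5] & 0x7F),
        "user_l": (f[0] >> 6) & 1,
        "user_r": (f[3] >> 6) & 1,
        "block": f[1] >> 7,
        "ch_status": f[3] >> 7,
        "validity": f[4] >> 7,
    }


if __name__ == "__main__":
    frame = pack_frame(0x12345, 0xABCDE, 1, 0, 1, 0, 1)
    print(frame.hex(), unpack_frame(frame))
```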
The first byte has its msb cleared to 0, and the last two bytes of a frame have their msb set to 1. The other msbs are arranged so that the beginning of a frame can easily be detected by finding the first byte with a cleared msb after at least two bytes with their msb set.
None of this is implemented yet, of course. I just wanted to get it out here because I'm excited that the project can make progress again after it got a bit stuck. I think it's a viable solution for getting SPDIF data, including subchannels, out of a DCC recorder (or another device) and into a PC for further processing, through an interface that uses common, easy-to-get chips and very few passive components. I'm looking forward to sharing recorded subchannel data with fellow DCC enthusiasts, and especially with those hackers (you know who you are!) who are working on ITTS decoders and are eager to get their hands on some real data, without needing to reproduce my SPDIF decoder hardware or even to own the DCCs that contain the data they want to work with.
01/17/2018 at 05:32 •
I added a module to the project which is capable of generating fully formatted serial output. This is not directly related to S/PDIF receiving, of course, but it helps to have a module that can quickly send debugging output to the serial port, e.g. for debugging the subchannel decoder as I tried to do in the previous log. And this should definitely be fast enough: the theoretical bitrate is a whopping 8 megabits per second, and the measured throughput at 3 Mbps (the highest speed that a Prop Plug and the Propeller Terminal will allow) is about 250,000 characters per second. That's still pretty respectable compared to the 115200 bps maximum of the Full Duplex Serial module from the Parallax library.
By "fully formatted", I mean:
- Text: nul-terminated strings or fixed-length arrays of characters stored in the hub, unfiltered or filtered (i.e. unprintable characters replaced by a period)
- Numbers: bytes, words or longwords (or arrays of any of those), in decimal (signed or unsigned), binary or hexadecimal.
- Memory hexdump: the address, the hex bytes and the filtered ASCII on each line, in the usual hexdump format
The module can easily be controlled from Spin or from PASM: commands are passed through a single longword with bit fields. All you do is wait for the longword to be 0 (indicating the cog is done with the previous command) and then set it to the value that represents the new command.
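For reference, this is roughly what that hexdump layout looks like, as a host-side Python sketch. It is not TXX's actual output: the 16 bytes per line, the 8-digit address and the column spacing are my assumptions, and only the "address, hex bytes, filtered ASCII" structure and the period for unprintable characters come from the list above.

```python
def hexdump(data, base=0, width=16):
    """Format bytes as address / hex bytes / filtered ASCII, one line per `width` bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        # Pad the hex column so the ASCII column lines up on a short last line.
        hex_part = " ".join(f"{b:02X}" for b in chunk).ljust(width * 3 - 1)
        # Filtered ASCII: unprintable characters are replaced by a period.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{base + off:08X}  {hex_part}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump(b"S/PDIF\x00\x01\xff subchannel dump", base=0x1000))
```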
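The handshake itself is simple enough to model on a PC. In this Python sketch, a thread stands in for the TXX cog and a plain integer stands in for the hub longword. The command values are arbitrary nonzero placeholders: TXX's real bit-field layout is not reproduced here, and the class and function names are hypothetical.

```python
import threading
import time


class CommandMailbox:
    """Single-longword command mailbox: 0 means 'idle, ready for a command'."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # stands in for atomic hub long access

    def read(self):
        with self._lock:
            return self._value

    def write(self, value):
        with self._lock:
            self._value = value


def send(mailbox, command):
    """Caller side: wait until the previous command is done, then post a new one."""
    while mailbox.read() != 0:
        time.sleep(0)
    mailbox.write(command)


def worker(mailbox, log, count):
    """Stand-in for the TXX cog: pick up a command, 'execute' it, clear the long."""
    for _ in range(count):
        while (command := mailbox.read()) == 0:
            time.sleep(0)
        log.append(command)  # real code would transmit something here
        mailbox.write(0)     # signal completion


if __name__ == "__main__":
    mb, log = CommandMailbox(), []
    t = threading.Thread(target=worker, args=(mb, log, 3))
    t.start()
    for c in (0x101, 0x202, 0x303):
        send(mb, c)
    t.join()
    print([hex(c) for c in log])
```

The nice property of this scheme is that no separate "busy" flag or lock is needed: the value 0 doubles as the "done" signal, so commands themselves must never be 0.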
I called the module TXX.spin and uploaded it to the Parallax Object Exchange (OBEX) at http://obex.parallax.com/object/870 (Update: OBEX is no longer available. You can get TXX from my GitHub repo at https://github.com/JacGoudsmit/TXX). I also wrote a post in the Parallax forums at https://forums.parallax.com/discussion/167981/txx-8mbps-serial-transmitter-with-extended-features-for-use-by-pasm-and-spin-code.
01/09/2018 at 09:08 •
It's been a while since I worked on the Propeller S/PDIF decoder, and there are really two reasons for that.
One reason was that I was having trouble wrapping my head around the best way to get the subchannel data out of the subchannel decoders. The other reason was that I realized that the maximum speed (115200 bps) of the FullDuplexSerial module from the Propeller library would probably not be fast enough to keep up with the subchannels.
An audio CD plays 44100 frames of audio per second, each divided into two subframes: one for the left channel and one for the right channel. Besides the audio information, each subframe also has two bits that are used to store and transmit extra information about the CD, such as track markers. Each subchannel can be regarded as a bitstream that's multiplexed into the main data stream, so on a CD the subchannel data is transferred at 88200 bits per second for each subchannel. On a DAT tape that's recorded at 48 kHz, each subchannel runs at 96 kilobits per second; on a DAB tuner that generates 32 kHz audio, the subchannels run at 64 kbps each.
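The rule behind those numbers is simply "one subchannel bit per subframe, two subframes per frame", as this small Python sketch shows:

```python
def subchannel_bps(sample_rate_hz):
    """Bit rate of one subchannel: one bit per subframe, two subframes per frame."""
    return sample_rate_hz * 2


if __name__ == "__main__":
    for source, fs in (("CD", 44_100), ("DAT at 48 kHz", 48_000), ("32 kHz source", 32_000)):
        print(f"{source}: {subchannel_bps(fs)} bit/s per subchannel")
```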
There is some interesting information in the subchannels, which is what I want to get to in this project. But even though the subchannels generate a lot less data than the total volume of data coming in through S/PDIF, the maximum of 96 kbps is still a lot of data.
When I wrote the subchannel decoder module for the project, I was mostly focused on demultiplexing the subchannel bits, and putting them in memory in some efficient way to analyze them by comparing them to known values, or whatever. Though the timing of the subchannel decoder is not nearly as tight as the biphase decoder, there's still too much to do to let the subchannel decoder take care of all the decoding. Besides, each type of medium encodes the data in a different way and I wanted to be able to use the module for the Channel Status subchannel as well as the User Data subchannel.
For synchronization, the subchannels are organized in blocks, and at the start of every block, the transmitter uses a special preamble. I wrote the original subchannel decoder to gather up all the bits in a block and then copy them to the hub at the end of the block. I used a counter to make it possible to detect whether a block didn't get decoded fast enough.
Subchannel Decoder Rewrite
The original implementation of the subchannel decoder turned out to not be very efficient or convenient. I decided to do a partial rewrite based on the following:
- Instead of writing an entire block at the end of an incoming block, the code now copies the subchannel data to the hub one longword at a time. 32 divides evenly into 384 and 192 (the number of bits per block for the User Data and Channel Status subchannels, respectively), so this is convenient.
- Because of this, it was no longer possible for other cogs to recognize whether an incoming block of data was ready for processing. To fix this, I changed the code to use two buffers instead of one. One buffer gets written by the subchannel decoder while the other buffer is processed further, elsewhere. Also, to indicate that a buffer is ready for processing, I made the code use two locks (one per buffer).
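The longword-at-a-time approach from the first bullet can be modelled in a few lines of Python. This sketch extracts one bit from each subframe and packs 32 of them per longword. Assumptions: the default bit position 30 is the Channel Status bit as mentioned in a later log, the first-received bit lands in bit 0 of each longword (the real decoder may order them differently), and the function name is hypothetical. For Channel Status, only one subframe per frame is needed, hence the every-other-subframe slice in the example.

```python
CH_STATUS_BIT = 30  # Channel Status bit position in each subframe long (per a later log)


def pack_subchannel(subframes, bit=CH_STATUS_BIT):
    """Extract `bit` from each subframe and pack 32 extracted bits per longword.
    Assumption: the first-received bit goes into bit 0 of each longword."""
    longs, acc, n = [], 0, 0
    for sf in subframes:
        acc |= ((sf >> bit) & 1) << n
        n += 1
        if n == 32:          # a full longword: this is where it would go to the hub
            longs.append(acc)
            acc, n = 0, 0
    return longs


if __name__ == "__main__":
    block = [1 << 30] * 384                   # one block of 384 subframes
    print(len(pack_subchannel(block)))        # one bit per subframe
    print(len(pack_subchannel(block[::2])))   # one bit per frame (left subframes only)
```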
The subchannel module makes the pointers to the buffers and the lock numbers available, which makes it relatively easy for another cog to process the buffers using Spin or PASM (though Spin is likely to be much too slow for all but the simplest decoding). Such an analysis module (which will be written in the future) would:
1. Set the lock for a buffer and check whether the lock was taken. If not, repeat step 1.
2. Process the buffer.
3. Optionally, set the lock again to check for an overrun.
4. Switch to the other lock and the other buffer.
5. Repeat from step 1.
Improving the Output Stage
The second problem that I wanted to deal with was that I had run into a bit of a wall: at 48 kHz, each subchannel runs at 96 kbps, and if I wanted to do a hex dump of a subchannel, I would have to deal with dropping lots of data, because the serial driver that's part of the Propeller library just isn't fast enough at 115200 bps.
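A quick back-of-the-envelope check in Python shows how far off 115200 bps is. It assumes a bare hex dump with two hex digits and a space per byte, and 8N1 framing; addresses, ASCII columns and line breaks would only add to this, so the result is a lower bound.

```python
SUBCHANNEL_BPS = 96_000   # one subchannel at 48 kHz
CHARS_PER_BYTE = 3        # "XX " per byte (assumption)
WIRE_BITS_PER_CHAR = 10   # 8N1: start + 8 data + stop (assumption)


def live_hexdump_bps(subchannel_bps=SUBCHANNEL_BPS):
    """Lower bound on the serial bit rate needed to hex-dump a subchannel live."""
    bytes_per_second = subchannel_bps // 8
    return bytes_per_second * CHARS_PER_BYTE * WIRE_BITS_PER_CHAR


if __name__ == "__main__":
    print(live_hexdump_bps())  # 360000, roughly three times 115200
```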
Fortunately I found an open-source module on OBEX (the Parallax Object Exchange website where programmers could store their open-source projects). It's a module called tx.spin, written by a guy called Barry Meaker. It's a module with an Assembler routine that can only transmit (not receive) but at very high speeds. He claimed speeds of almost a megabit per second, but with a few simple optimizations I actually got it up to 4 megabits per second. Now we're talking!
The module is heavily specialized towards transmitting null-terminated strings, and if you want to print a decimal or hexadecimal number, the Spin subroutine basically prepares a buffer and then stores a pointer that is picked up and printed by the PASM code. Afterwards, the PASM code resets the pointer to let other cogs know it's available.
I want to change this (EDIT: Done, see next log) so that the pointer becomes a command with a buffer address and a length. Commands can be:
- Print a nul-terminated string as before (no length needs to be given in the command)
- Print a buffer with a given length (that way all possible values can be printed)
- Print a single character
- Print a decimal number (I just discovered there's an easy algorithm to convert binary numbers to BCD in a few assembler instructions, called the "double dabble" algorithm)
- Print a hexadecimal number
- Do a hex dump of a buffer
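For anyone who hasn't seen double dabble before, here is the algorithm as a Python sketch. It is a host-side illustration of the technique, not the PASM code in TXX: for every input bit (most significant first), add 3 to each BCD digit that is 5 or more, then shift the whole BCD register left and shift the input bit in. The `digits` parameter is my own addition and must be large enough for the input; 10 digits cover any 32-bit value.

```python
def double_dabble(value, digits=10):
    """Convert an unsigned integer to BCD digits (most significant first)
    using the shift-and-add-3 ("double dabble") algorithm."""
    if value < 0:
        raise ValueError("value must be unsigned")
    bcd = 0  # packed BCD scratch register, 4 bits per decimal digit
    for i in range(max(value.bit_length(), 1) - 1, -1, -1):
        # Add 3 to every digit that is 5 or more, so the next doubling
        # carries into the next digit exactly like decimal arithmetic would.
        for d in range(digits):
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        # Shift the next input bit (MSB first) into the BCD register.
        bcd = (bcd << 1) | ((value >> i) & 1)
    return [(bcd >> (4 * d)) & 0xF for d in range(digits - 1, -1, -1)]


if __name__ == "__main__":
    print(double_dabble(243, 3))  # [2, 4, 3]
```

In assembler, the same loop is just a handful of compare/add/shift instructions per bit, which is what makes it attractive on the Propeller.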
The last command in that list should be particularly useful to decode the bits in the subchannel streams and analyze the values shown in the terminal window. Then at a later stage, I can add some code to do the analysis and print interesting information on the terminal at a speed that's high enough to keep up with all the subframes.
(Note, the source code for these changes is in a new branch "subchannel_rewrite" for now. The changes will be merged back to the main branch as soon as they do something useful. Currently the code doesn't compile successfully because the subchannel decoder API changed. As soon as I get, say, a hexdump of a subchannel to work, I'll merge it back to the main branch).
06/23/2017 at 13:22 •
This is what the Channel Status subchannel looks like in binary:
The code that I used for this output prints a block counter, followed by the Channel Status subchannel in binary (chronological order). As you can see, it contains the expected data: 11000011 in the second byte, because I connected my Propeller to a Digital Compact Cassette recorder which uses category code 110 0001L and this is a prerecorded tape so L=1. The rest of the data is all zeroes, which is a little boring but this at least demonstrates that I understand how block decoding and subchannel demultiplexing works.
To get this output, I created a quick-and-dirty copy-paste module based on the status channel decoder, which doesn't just copy the status channel bits but entire blocks of subframes. This is what full blocks might look like in hexadecimal (actually the following output only shows the subframes for the left channel):
The Channel Status subchannel is bit 30 of each subframe, in case you want to dig through the hexadecimal and make sure it checks out :-)
So as part of narrowing down the problem with my subchannel decoder which only produces zeroes, I've proven that the data is there and has the expected format. Obviously there's something wrong with the PASM code that extracts the bits and sends them to the hub.
I must be overlooking something in the PASM code or I'm misunderstanding how Spin works to copy the data.
More research is needed :)
UPDATE: Now we're talking!
I found out what the problem was with the subchannel decoder: a rogue JMP made another instruction unreachable, and that instruction was needed to increment the destination address of the rotate instruction. So yeah, I saw a lot of zeroes because I had initialized the bit storage in the cog to 0, and the $C3 in the second byte wasn't showing up because the first longword (the one with the $C3 in it) had been overwritten by the 5 other ones by the time I got to see it.
I rewrote the status channel module to be more universal (it can now also be used for the user data channel) and it looks like that works. Yay!
However: remember how I was sometimes hearing horrible distortion from the audio generator, which I could eliminate by resetting the Propeller? It looks like this is not a problem in the audio module, but in the biphase decoder. Apparently, sometimes (and by sometimes I mean: way too often) it locks onto the signal in the wrong way. I'm going to have to fix that, because it's crucial for that to work. I must have overlooked some sort of corner case. I'm pretty confident I can fix it with a change to the initialization of the biphase decoder and/or preamble detector.
06/21/2017 at 07:24 •
...But it doesn't work.
The Status Subchannel is a group of 192 bits that's continuously transmitted as part of the S/PDIF signal, interleaved with the audio. For every stereo sample, one bit of the status subchannel is added to the stream.
There's supposed to be some interesting information in this subchannel; for example, it can indicate what type of device the sound is coming from.
But as you can see in the above screenshot, I'm not getting ANY data from my player. I'm not sure why.
The User Data subchannel should prove more interesting for my "secret" final purpose of this project. For starters, it has twice as much data, because there is a separate bit of information for each audio channel, and I know from looking at the logic analyzer earlier that there's definitely data in there. But when I changed the Assembly instruction that pulls the Channel Status bit from the subframes so that it pulls the User Data bit (or any other bit, for that matter), the output of the hex dump stayed zero, though I've seen some glitches with some random data. That probably means I'm trashing memory somewhere...
Well, it's night night time, I'll have to do a thorough code review tomorrow.
PS: By the way, I decided that modifying the biphase decoder to write the subframes to a block of memory in the hub was too much work. The audio player is easiest to implement when it can just wait for the next subframe by checking PRADET, and it's not too difficult to implement the subchannel decoders that way too. And for other future purposes (CD+G decoders, I2S output generators, etc.) it's also not that hard to just wait for a single sample. Other advantages of this method are that the propagation delay stays low (one subframe delay instead of one block delay) and it keeps things easy for the Spin parts of the code, so I don't have to wrestle with partially filled circular buffers and other stuff. The buffer is one single longword and the synchronization method is the PRADET signal, and that's good enough for pretty much everything. If necessary, I can always use another cog to serialize each block of samples somehow.
06/18/2017 at 08:12 •
What's this, another change to the hardware? Well... Yes and no. I like the new Parallax FLiP for breadboard development, but I wanted to do something today that actually makes the S/PDIF decoder do something that's (arguably) useful, so I connected a QuickStart board with a Human Interface Board for the QuickStart. The HIB has filters and a connector for stereo headphones (just like the Propeller Demo board, by the way), and I wanted to write a small module that just grabs samples from the Biphase decoder and sends them to the headphones.
It took me a while to get it to work, because apparently I had made a mistake and passed the pointer to a parameter variable (which was already a pointer) instead of passing the parameter directly.
Because of this, I now have a sort-of sanity check that lets me know that the biphase data is decoded and processed correctly (well... at least the audio part of the subframes).
The next step will be to change the Biphase decoder to write the data into a buffer instead of a single longword. This should come in handy for extracting subchannel data, and that's what this project is all about.
06/12/2017 at 06:51 •
The idea that I presented in my previous log, to count pulses (instead of measuring time between pulses, or sampling for a second pulse in the middle of a bit) works great!
The Biphase Decoder now consists of two cogs that take care of all the following:
- Decoding the biphase bits in each subframe to regular binary values
- Detecting the preamble to find the start and end of each subframe
- Decoding the preamble type to distinguish left and right channel samples
- Decoding the preamble type to distinguish the first subframe of a block from all the other subframes
- Storing a LONG word into the hub with all the information above, at the end of each subframe.
That took quite a few smart tricks with Propeller Assembler (PASM). For example I found out that it was more efficient to set the channel output pin to 1 for the left channel and 0 for the right channel instead of the other way around, and I found out that it was more efficient to decode the biphase bits in one's complement.
Let's have a look at some of the important parts of the code; for the full story, check out the source code on GitHub. The file biphasedec.spin has a lot of documentation at the top.
Two Cogs to Decode Biphase
When I started on writing code to count pulses instead of measure time intervals, I thought it might be possible to detect preambles in the same cog as the biphase decoder. I quickly discovered this was going to be very difficult or impossible. It's much easier to just leave the preamble decoding to a separate cog. So now there are two cooperating cogs: the Biphase Decoder cog and the Preamble Detector cog.
As a reminder, the XORIN input is produced by a small circuit that combines the S/PDIF input with a slightly delayed version of itself through an XOR gate (see the diagram above; schematics are in previous log entries and in the source code).
The Biphase decoder decodes the bits by letting a timer count pulses on the XORIN input while the code stays in sync with the bit clock: There's always at least one pulse per bit, but there are two pulses for all bits whose value is "1".
The Preamble Detector uses a timer in NCO mode (Numerically Controlled Oscillator) to detect preambles. The timer is configured to make the PRADET pin go high when a long pulse (characteristic for preambles) comes in, and the code resets the timer at the beginning of each bit, unless the timer went off already. When a preamble is detected, the Preamble Detector decodes the type of the preamble. Then it reconfigures the timer again so that the PRADET output will go low again just before the XORIN pulse that marks the end of the preamble. The detected preamble type is then encoded on two output pins: BLKDET (BLocK DETect) and LCHAN (Left CHANnel).
The Biphase decoder tests whether the PRADET (PReAmble DETECT) pin is high when it starts processing a bit. If so, it "knows" that it's done with the entire subframe, and it stores the value in the hub. Then it waits for PRADET to go low again, which happens just before the beginning of the bit after the preamble (bit 4 of the subframe). Then it starts counting pulses again like before.
Biphase Decoder Cog
The following diagram illustrates the algorithm of the Biphase Decoder cog:
The Biphase cog sets up timer A of its cog to run in POSEDGE mode for the XORIN pin. That means that every Propeller clock cycle, the timer hardware checks if the XORIN input went from low to high, and if so, it increments register PHSA.
The code is designed to synchronize with the bit clock by executing 5 regular instructions of 4 Propeller clock cycles each, followed by a WAITPxx instruction which takes at least 6 clocks, to wait for a pulse on the XORIN input. Those instructions take a minimum of 26 clocks (325ns at 80MHz).
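A quick Python check of that timing against the length of one biphase bit cell (64 bit cells per stereo frame: 2 subframes of 32 time slots). The 26 clocks are the minimum; the WAITPxx simply waits longer when the bit cells are longer, as at 44.1 or 32 kHz.

```python
CLOCK_HZ = 80_000_000
CLOCK_NS = 1e9 / CLOCK_HZ    # 12.5 ns per clock
LOOP_CLOCKS = 5 * 4 + 6      # 5 regular instructions + minimum WAITPxx


def bit_period_ns(sample_rate_hz):
    """Length of one biphase bit cell: 64 cells per stereo frame."""
    return 1e9 / (sample_rate_hz * 64)


loop_ns = LOOP_CLOCKS * CLOCK_NS

if __name__ == "__main__":
    print(LOOP_CLOCKS, loop_ns)  # 26 325.0
    for fs in (32_000, 44_100, 48_000):
        margin = bit_period_ns(fs) - loop_ns
        print(f"{fs} Hz: bit cell {bit_period_ns(fs):.1f} ns, margin {margin:.1f} ns")
```

At 48 kHz the margin is only about half a nanosecond, so under these numbers there is no room for even one more 4-clock instruction in the loop.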
The purpose of the code is to test PHSA immediately after the WAITPxx instruction to see if the current count is even or odd. If the oddness changed since the previous bit, it means a 0 was encoded in the previous bit; if the oddness didn't change, it means there was an extra pulse, so a 1 must have been encoded.
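Here is a Python model of the pulse-counting idea, to show that the oddness test really recovers the bits. It models the XOR-with-delay circuit as producing exactly one pulse per transition of the S/PDIF line, and samples the running pulse count right after the pulse at the start of each bit. Timing, the WAITPxx synchronization and the preamble are not modelled, and the function names are hypothetical.

```python
def biphase_mark_encode(bits, level=0):
    """Return half-bit-cell line levels for a biphase-mark signal: the level
    toggles at the start of every bit, and again mid-bit for a 1."""
    cells = []
    for b in bits:
        level ^= 1           # transition at the start of every bit
        cells.append(level)
        if b:
            level ^= 1       # extra transition in the middle of a 1
        cells.append(level)
    return cells


def decode_by_pulse_count(cells, initial_level=0):
    """Count one pulse per transition and compare the count's oddness at
    successive bit starts: changed -> 0, unchanged (two pulses) -> 1."""
    prev, count, at_bit_start = initial_level, 0, []
    for i, level in enumerate(cells):
        if level != prev:
            count += 1       # the XOR circuit emits one pulse per edge
        prev = level
        if i % 2 == 0:       # first half-cell of a bit: just after its start pulse
            at_bit_start.append(count)
    return [1 if (a & 1) == (b & 1) else 0 for a, b in zip(at_bit_start, at_bit_start[1:])]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    # A bit can only be decoded when the next bit (or preamble) starts,
    # so one dummy bit is appended to the signal.
    print(decode_by_pulse_count(biphase_mark_encode(data + [0])) == data)
```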
Even Loop and Odd Loop
So the code needs to determine whether the value in PHSA went from odd to even, even to odd, even to even or odd to odd between the start of the previous bit and the start of the current bit. It's easy to test for oddness by testing bit 0 of PHSA and storing the result in one of the flags, but there are no instructions that combine one flag with another in a way that's useful for this.
So instead of storing the oddness, I wrote two loops: the Even Loop and the Odd Loop. The Even loop is executed when PHSA was even after the previous bit, the Odd loop is executed when PHSA was odd after the previous bit.
Both loops start by testing the new oddness of PHSA. If the count is odd, the Even Loop should rotate a 0 into the result, whereas the Odd Loop should rotate a 1 into the result. Then, depending on the new oddness, execution should jump to the Even Loop or the Odd Loop.
The Even Loop happens to be at the top of the code, so it can conditionally "fall through" to the Odd Loop, which saves 4 clocks because no JMP is needed. At the end of the Odd Loop, there is an unconditional JMP to the conditional JMP at the end of the Even Loop. This way, the Odd Loop executes the same number of instructions as the Even Loop, regardless of the current oddness of PHSA.
The Biphase decoder also tests whether the PRADET (PReAmble DETECT) pin (whose signal is generated by the Preamble Detector cog) goes high. When that happens, it jumps to the preamble code to store the value in the hub (see below).
There's a problem with this: The code uses the Carry flag to check the oddness of PHSA so that it can use the RCR (Rotate with Carry Right) instruction to store the decoded bit into the data that will eventually be stored in the hub, but the Even loop would need to store a "1" when CF=0 and a "0" when CF=1, and there's no time to add an extra instruction to invert the bit in the Even Loop.
On the other hand, the Odd loop doesn't have to do the test for the preamble, so though it has the extra JMP (to ensure that execution time is always the same), there is time in the Odd loop to reverse the bit after it gets rotated into the result data longword.
This made the solution very simple: keep track of the result data in one's complement. That way, the Even Loop doesn't have to invert the Carry flag, and the extra instruction that inverts the bit after the Carry is shifted in can go in the Odd Loop. When the code processes a preamble, it can invert the data again before posting it to the hub.
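The one's complement trick can be checked with a short Python model. Each step sets the carry from the new oddness of PHSA and does the equivalent of RCR D,#1 (shift right, carry into bit 31); only the Odd Loop then inverts the bit it just rotated in, and a final XOR with $FFFF_FFFF undoes the complement. For simplicity the model rotates in a full 32 bits rather than the bits of a real subframe, and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def rcr1(value, carry):
    """What RCR D,#1 does to D: shift right by one, carry enters at bit 31."""
    return ((value >> 1) | (carry << 31)) & MASK32


def decode_with_ones_complement(parities):
    """parities[k] is the oddness (0/1) of PHSA sampled at the start of bit k."""
    data = 0
    prev_odd = parities[0]
    for odd in parities[1:]:
        data = rcr1(data, odd)   # both loops: carry = new oddness, rotate it in
        if prev_odd:             # Odd Loop only: invert the bit just rotated in
            data ^= 1 << 31
        prev_odd = odd
    return data ^ MASK32         # undo the one's complement before posting


def parities_for(bits, start=0):
    """Helper: PHSA oddness sequence for a bit sequence (a 1 adds two pulses)."""
    par = [start]
    for b in bits:
        par.append(par[-1] if b else par[-1] ^ 1)
    return par


if __name__ == "__main__":
    value = 0xC3A5_0F96
    bits = [(value >> k) & 1 for k in range(32)]  # first bit ends up in bit 0
    print(hex(decode_with_ones_complement(parities_for(bits))))
```

In the Even Loop the carry is exactly the inverted bit, so storing it as-is costs nothing; the one extra instruction lands in the Odd Loop, which has time to spare because it doesn't test PRADET.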
The Even Loop checks the PRADET pin to check if a preamble is coming in. If so, it jumps to a separate section that encodes the detected preamble type and stores the result in the hub as a long.
PRADET is activated just before the end of the first long pulse of a preamble (see the Preamble Detector Cog section below). It goes low again just before the start of the first bit after the preamble. The Even Loop doesn't detect the preamble until the first XORIN pulse inside the preamble occurs, and then it has to get all its processing done before the end of the preamble, which is 814 ns later (worst case). Let's see how the preamble code in the Biphase Decoder cog has just enough time to get its work done.
- The Biphase decoder cog has 814ns from the first pulse in the preamble to the end of the preamble. That's 65 Propeller clocks (at 80MHz).
- It takes 3 instructions (3*4=12 Propeller cycles) to get from the Even Loop into the actual Preamble processing code. That leaves 65-12=53 clocks.
- The code needs to undo the one's complement encoding of the data, and encode the channel and the start-of-block flag into the result longword. We'll get back to this in a minute.
- Then it needs to post the result longword into the hub. The WRLONG instruction to do this is a hub instruction, which may take anywhere between 8 and 23 Propeller clocks. That leaves 53-23=30 clocks.
- To synchronize with the bit clock again at the end of the preamble (and the beginning of bit 4 in the subframe), the preamble processing code falls through to the WAITPxx at the beginning of the Even Loop. But before it's safe to wait on XORIN again, we first have to make sure that PRADET is low. It's okay if we're too late to wait for PRADET to go low, as long as we're waiting for XORIN before the pulse at the beginning of bit 4 happens. Each WAITPxx instruction takes a minimum of 6 clocks, so that leaves 30-(2*6)=18 clocks.
The Preamble Detector generates two signals on the Propeller external pins, so that the Biphase Decoder can decode them and encode them into the result longword quickly. One pin is used to indicate which audio channel the subframe is for, and one pin indicates that the subframe is the first in a block. We could encode these bits into the data with two TEST instructions and two MUX instructions, but then we wouldn't have time to XOR the data with $FFFF_FFFF to undo the fact that the Even Loop and Odd Loop store one's complement bits into the data.
In a previous setup, I encoded the channel as Left=0 and Right=1. But it's actually more efficient to encode Left as 1 and Right as 0, because it makes it possible to test both signals on the INA port at the same time using both the WC and WZ modifiers of the instruction (otherwise, two separate TEST instructions would have been needed to get an unambiguous result). With those two modifiers, the Zero flag gets set if the test result is zero, and the Carry flag gets set if there is an odd number of ones in the result. So depending on the two pins (whose signals are generated by the Preamble Detector cog), we get:
- ZF=0 and CF=0 if LCHAN high and BLKDET is high (Left, first in block)
- ZF=0 and CF=1 if LCHAN high and BLKDET is low (Left)
- ZF=1 and CF=0 if LCHAN low and BLKDET is low (Right)
- Any other combination is impossible.
The code encodes the channel and the block detection into the 4 preamble bits of the result longword, two bits at a time, so that the parity stays even.
It can use a simple MUXZ (or MUXNZ) instruction to encode the channel into the longword, because the Zero flag by itself tells left and right apart. Encoding BLKDET is trickier: an unconditional MUXC or MUXNC would make every right-channel subframe look the same as a left-channel subframe that starts a block, because the Carry flag is 0 in both cases. So to encode the BLKDET bits, the code has to use a MUXC/MUXNC that is only executed for the left channel, but that would leave the BLKDET bits undefined for the right channel, because there the instruction isn't executed.
So the code sets the bits for the channel and BLKDET first, and the values of the bits are chosen such that the value for the right channel is equal to the value that stands for "no BLKDET". Then we can conditionally encode the BLKDET bits (again) to overwrite them if we're working on the left channel.
So to encode the bits from the INA register (generated by the Preamble Detector cog), we need 3 instructions: one TEST followed by an unconditional MUX instruction and a conditional MUX instruction. That makes a total of 4 instructions (including the XOR to undo the one's complement encoding) to finalize the data before posting it to the hub. At 4 clocks each, that leaves 18-16=2 clocks. So the preamble code in the Biphase Decoder cog is done right on time. Nice!
The Preamble decoder is in charge of detecting preambles in the first place, and making sense of the various preamble types.
Every subframe starts with a preamble, and there are three possible types, all of which start with a long pulse.
- The B preamble indicates that the following subframe is for the left channel, and also marks the beginning of a 384 subframe block (which is important for subchannel decoding)
- The M preamble indicates that the following subframe is for the left channel but isn't the start of a block
- The W preamble indicates that the following subframe is for the right channel
The Preamble Detector cog uses two timers to decode preambles:
- Timer A counts pulses on the XORIN input, just like timer A in the Biphase Decoder cog (each cog has two timers which can only be used by their own cog). We use this to distinguish between a B preamble and the other preambles, because the B preamble has two short pulses in the middle and the others don't.
- Timer B is used as an NCO timer that uses PRADET as output and is configured in such a way that it sets that output high after a little more time than 2*t (where t is the minimum pulse length that we have to deal with, which is the time that two pulses in a biphase encoded "1" bit are apart). We use this to detect the start of a preamble, and to distinguish between the M preamble and the other preambles, because the M preamble has a long pulse in the middle as well as at the start.
The Preamble detection code repeatedly waits for a pulse on XORIN to stay in sync with the bit clock, and executes 5 normal 4-cycle instructions in between to make sure that no false triggers happen, just like the Biphase Decoder cog. During normal incoming bits, the NCO timer doesn't go off before the end of the bit, so PRADET stays low. During each loop, the code resets the timer unless it already went off. It also stores the pulse count increased by 2. This code repeats itself unless the NCO timer went off.
All of the above happens after the end of the first long pulse, and takes exactly 5 instructions. The code then does a WAITPxx to wait for the next pulse.
- For B preambles, this happens after the two short (1*t each) pulses in the middle of the preamble
- For M preambles, this happens after the long (3*t) pulse in the middle of the preamble
- For W preambles, this happens after the single short (2*t) pulse in the middle of the preamble
So to decode the preamble, the code checks the counter against the previous counter value plus 2, and it compares the elapsed time in the NCO phase register against a known value. Depending on the results, the LCHAN pin is set to 1 for B or M preambles, and the BLKDET pin is set to 1 for B preambles.
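The decision logic can be summarized in an idealized Python model. It works in units of t and takes the middle of each preamble from the list above. It ignores the instruction timing that, in the real cog, determines which pulse the WAITPxx actually catches. The pulse counts and the 2.5*t threshold are assumptions for illustration, not values from the project.

```python
# Idealized model of the Preamble Detector's decision, in units of t
# (the shortest biphase pulse). The threshold is a hypothetical value.
MIDDLE = {            # what follows the first long pulse, per the text
    "B": [1, 1],      # two short pulses
    "M": [3],         # one long pulse
    "W": [2],         # one 2*t pulse
}
LONG_THRESHOLD = 2.5  # hypothetical: anything longer than this counts as 3*t


def measure(middle: list[int]) -> tuple[int, int]:
    """Return (extra XORIN pulses, elapsed time) for the middle of a preamble."""
    return len(middle), sum(middle)


def classify(extra_pulses: int, elapsed: float) -> tuple[int, int]:
    """Return the (LCHAN, BLKDET) pin levels for one preamble."""
    if extra_pulses == 2:          # counter == previous value + 2: B preamble
        return 1, 1
    if elapsed > LONG_THRESHOLD:   # long pulse in the middle: M preamble
        return 1, 0
    return 0, 0                    # otherwise: W preamble (right channel)


for name, middle in MIDDLE.items():
    print(name, classify(*measure(middle)))
```

The pulse count alone singles out B, and only M and W need the elapsed-time comparison.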
Generating the BLKDET and LCHAN outputs takes just a little too long to be done before the preamble is over. So while it's processing the above, the code resets the NCO timer to make PRADET go low. At that time the Biphase decoder knows that the preamble is almost over and it's time to wait for the next bit pulse for the start of bit 4. But the Preamble Detector cog still has to do some work, including putting one of the flags onto an output pin. It's not a problem that the BLKDET and LCHAN pins get updated so late because the pins aren't probed until the end of the subframe.
The code now contains enough functionality to decode the biphase data and store each subframe (with additional bits set for the channel and for block markers) in the hub at a specific location. But the location never changes, so any code that wants to use the data probably has to be written in PASM just to keep up.
I tried an experiment to change the Biphase Decoder cog so that it stores the subframes in an array of 384 longs (one block), but it takes too many instructions to get it done before the end of the preamble. However, the good news is that it doesn't need to be done by the end of the preamble!
The Biphase Decoder cog is always one bit behind: while bit 4 (the first bit right after the preamble) is arriving, the code tries to decode bit 3, which doesn't actually exist. Because of the way we store our state, it always decodes bit 3 as zero, but that doesn't even matter: once the Biphase Decoder encodes the BLKDET and LCHAN signals into the longword, bit 3 gets overwritten anyway.
That's a waste of time. So the plan is to modify the Preamble Detector cog to turn the PRADET signal off a little later: not before the start of the first bit, but before the end of the first bit. That gives the Biphase Decoder cog enough extra time to update the pointers, buffers and counters it needs to fill a buffer with subframes.
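Here is a rough Python sketch of the bookkeeping that the extra time would have to cover: a pointer into a 384-long array that wraps around at the end of a block. Resynchronizing the pointer on a block-start flag is a hypothetical design choice, and the flag bits reuse a hypothetical layout (1111 in the low nibble for a block start). In PASM, all of this would have to fit in the time that keeping PRADET high a little longer frees up.

```python
# Sketch of filling a one-block buffer of subframes. The buffer layout and the
# realign-on-block-start policy are assumptions, not the project's design.
BLOCK_SIZE = 384      # subframes per block (192 frames x 2 channels)
BLK_BITS = 0b1100     # hypothetical result bits that flag a block start


class SubframeBuffer:
    def __init__(self) -> None:
        self.longs = [0] * BLOCK_SIZE   # stands in for 384 longs in the hub
        self.index = 0                  # next slot to write

    def store(self, subframe: int) -> None:
        if (subframe & BLK_BITS) == BLK_BITS:   # start of a block: realign to slot 0
            self.index = 0
        self.longs[self.index] = subframe
        self.index = (self.index + 1) % BLOCK_SIZE   # wrap around like a ring buffer
```

Realigning on the block flag means a consumer can always find the first subframe of a block in slot 0, at the cost of one extra test per subframe.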
The Preamble Detector cog doesn't have a whole lot to do between preambles. But it does need a carefully crafted timing constant. I'm thinking of adding code to that cog to automatically determine the timing constants while it's waiting for the next preamble.
But I'm also thinking I should write some demonstration code based on what I have now: a VU meter cog that works with LEDs on output pins, or maybe a PWM-based D/A converter that plays the digital data to a speaker through a single pin.
06/03/2017 at 22:02 •
I haven't had time to work on this for a few days, but I wanted to share a new idea for decoding the biphase input with the Propeller timers, and this picture:
The photo shows a minor change of venue: I rebuilt the schematic on a different breadboard using the new Parallax FLiP module. This is a great new product from the producers of the Propeller (I'm not affiliated, just a fan as you may have noticed) that offers an entire Propeller circuit including USB to serial converter, 5 MHz crystal, and a great flexible power supply on a board that's the size of a DIP Propeller. They even had space to solder two LEDs onto pins 26 and 27 which can help greatly with debugging. Using the FLiP makes it much easier to put together a Propeller circuit because (unlike with the PE kit) you don't have to mess around with the power supply, the crystal, and/or a prop plug.
Anyway, about that new idea I had: As I've mentioned before, the Propeller has to be able to process one bit of biphase data in 326ns if I want to make it work at 48kHz stereo sampling rate. That's about 6 assembler instructions. I already got it to (kinda) decode a signal by using two cogs (one for recovering the clock and one for measuring time) but I wasn't happy with the result.
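Where the 326ns and "about 6 instructions" figures come from can be shown with a quick calculation. It assumes 64 bits per stereo frame (two 32-bit subframes), an 80MHz system clock (5MHz crystal times 16) and 4 clocks per ordinary instruction.

```python
# Worked arithmetic for the biphase timing budget at 48 kHz stereo.
SAMPLE_RATE = 48_000        # frames per second
BITS_PER_FRAME = 64         # two 32-bit subframes (left and right)
CLOCK_HZ = 80_000_000       # assumed Propeller system clock
CLOCKS_PER_INSTR = 4        # ordinary PASM instruction

bit_rate = SAMPLE_RATE * BITS_PER_FRAME            # data bits per second
bit_time_ns = 1e9 / bit_rate                       # time available per data bit
clocks_per_bit = CLOCK_HZ / bit_rate               # system clocks per data bit
instr_per_bit = clocks_per_bit / CLOCKS_PER_INSTR  # ordinary instructions per bit

print(f"{bit_time_ns:.1f} ns, {clocks_per_bit:.2f} clocks, {instr_per_bit:.2f} instructions")
```

That works out to roughly 6.5 instructions per bit. Since there's no such thing as half an instruction, 6 whole instructions is the practical limit.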
The last implementation of the biphase decoder counted flanks on the XORIN input which is better, but it was very difficult to get the timing right because of all the propagation delays, and I had the feeling that that wasn't going to be robust enough. For one thing, the timing would have to be adjusted to the input frequency, even if it changed only slightly.
But earlier this week I had an epiphany: what if I just keep counting flanks from the beginning of a subframe to the end of a subframe, and never reset the counter? That would have some serious advantages!
In the previous version of the code, I would read the timer/counter (PHSA register) at the start of every bit, and then reset it. But by the time that the timer actually gets reset, the next pulse is already almost coming in if the current bit is a 1. That's exactly what I didn't like about that code: I had to time the reset exactly right (within one 12.5ns Propeller clock cycle accuracy) so the timers wouldn't miss anything, and I had the feeling that this was pretty much impossible given the amount of jitter. That's not something I want the end user to have to do (besides, it would probably be difficult to do while the system is running).
I thought up a simple loop in PASM that goes about as follows:
- Set timer A to count positive edges on XORIN.
- Wait for a positive edge on XORIN using a WAITPxx instruction. This signifies the start of a new bit.
- Test the lsb of PHSA to find out if there was an odd or even number of transitions in the SPDIF input.
- Shift the odd/even result into a long word as a single bit.
- Check if a preamble was detected and jump out of the loop if so (I may do this a different way but that's not relevant for this discussion)
- By now, 6 instructions should have passed so jump back to the beginning of the loop (step 2). Alternatively, I may unroll the loop and just paste the above 32 times.
At the end of the subframe, there are 32 bits of data available but each bit doesn't represent an actual data bit value but a record of the oddness of the total number of biphase flanks at the end of each bit time.
On the SPDIF input, a zero-bit is represented as a single transition and a one-bit is represented as two transitions. So if the total number of transitions goes from even to odd or from odd to even in one bit-time, the encoded data bit must have been a 0 because one transition on the SPDIF line causes one pulse on the XORIN line and an odd number plus 1 is always an even number and vice versa. Similarly if the total number of positive edges on XORIN stayed even or stayed odd, the encoded data bit must have been a 1 because the oddness of a number doesn't change if you add (a multiple of) 2.
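The loop and the parity rule above can be modeled in a few lines of Python. This is an idealized sketch. It assumes exactly one XORIN pulse per bit boundary plus one extra mid-bit pulse for a "1", and it ignores preambles. It also keeps the samples in a list instead of shifting them into a longword.

```python
# Model of the "never reset the counter" idea: sample the parity of a running
# XORIN pulse count once per bit, then recover data bits from parity changes.


def record_parities(bits: list[int], start_count: int = 0) -> list[int]:
    """Emulate the loop: PHSA keeps counting; store its lsb once per bit time."""
    count = start_count
    parities = []
    for bit in bits:
        count += 2 if bit else 1    # a "1" has an extra mid-bit transition
        parities.append(count & 1)  # TEST the lsb of PHSA
    return parities


def decode(parities: list[int], start_parity: int) -> list[int]:
    """Unchanged parity means two transitions (a 1); changed parity means a 0."""
    bits = []
    previous = start_parity
    for p in parities:
        bits.append(1 if p == previous else 0)
        previous = p
    return bits


data = [1, 0, 0, 1, 1, 1, 0, 1]
print(decode(record_parities(data, start_count=5), start_parity=5 & 1))
```

The absolute count never matters, only its parity at consecutive bit boundaries. That is why the counter never needs to be reset. The decoder does need one extra bit of state: the parity before the first bit.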
I'll have to figure out if it's possible to process the oddness-changes on the fly (if so, the biphase decoding would only take a single cog, that would be perfect!), otherwise I'll stick to my earlier idea of running two cogs in parallel; one for the left channel and one for the right channel, and each cog processes the data while the other cog is sampling the next subframe.
But I'm pretty confident that this is a solid way of decoding biphase without ANY need to adjust timing constants (except maybe the expected length of a preamble) as long as the input frequency is between 32kHz and 48kHz and as long as the Propeller frequency is at least 80MHz. If the Propeller is too fast for the signal (i.e. it's able to execute all the instructions before the secondary pulse of a biphase "1" comes in) or too slow (i.e. it can't distinguish a preamble from a biphase "0" bit), it will stop working but I think with the current requirements, I can make it work.
I'll try to program something around this idea in the next few days. I'm getting pretty confident that I'm close to writing a robust biphase decoder.
Update: I tried it out and it works: I'm getting a reliable stream of bits, decoded from the biphase data! Now to figure out how to decode preambles and store the data in the hub of the Propeller for processing.
05/31/2017 at 07:57 •
A quick update:
I rewrote the code that regenerates the clock from the incoming SPDIF signal, and as you can see in the picture above (next to the RECCLK label), it's very steady now, and it runs at half the bit rate. So instead of generating one clock pulse (i.e. an up/down cycle) for each input bit, the RECCLK signal only changes once for each input bit.
That means that the code in the data sampling cog(s) has to be repeated twice: first it waits for RECCLK to go high, then it reads a bit, then it waits for RECCLK to go low and reads another bit. This may look a bit sloppy but is not uncommon in Assembly programming: it's basically a partially unrolled loop. I may go one step further and unroll the entire loop for a single subframe (so the same code will be in the source file 32 times), if there's not enough time to get things done.
The code uses a constant to delay the RECCLK signal by a configurable number of clock cycles (it uses a timer in NCO mode). Basically:
- The sync cog waits for the XORIN input (the "original" SPDIF xor'ed with the delayed SPDIF to detect flanks) to go high
- It then starts a timer to toggle the RECCLK output
- Then it waits until the output is actually toggled and starts the timer to toggle the RECCLK output again
- After this, things start over from the top.
I also added preamble detection code which correctly identifies preambles (see the PRADET trace) but needs a little work: I want to make it generate a single pulse that starts right when the first long input pulse is detected, and ends when the first bit of the next subframe comes in, but as you can see, it triggers multiple times. For now it's good enough to trigger my Logic Analyzer so I can record an entire subframe.
I also implemented a small "debug monitor" cog that puts the data bits on a pin that's shown here as "DEBUG". As I mentioned in the previous log, the idea was to synchronize the data cogs on the recovered clock, and sample the XORIN pin at the time when RECCLK changes state. But this is actually a bad idea: The delay circuitry is very simple and there's a lot of jitter: I've seen recovered clock cycles that were more than 50ns too long or too short because of the jitter on the delay. That makes it pretty much impossible to get the timing right if I want to sample the signal right at the time when a second pulse comes in over XORIN.
But the Propeller timers can be programmed to count negative or positive edges, so that the exact delay time in the external circuitry becomes a lot less critical. The Debug Monitor cog that I programmed basically waits for the RECCLK signal to change, and then checks if the Propeller timer detected two negative edges in the last cycle. If so, it sets the DEBUG output to 1, otherwise it sets it to 0. As you can see, it correctly decoded the subframe in the middle (between the PRADET pulses) as (1)0000'0010'0110'1001'0001'0111'0001(0) (the 1 at the start and the 0 at the end are false readings because of the preamble, I'll deal with those later).
What bothers me a bit though is that it was difficult to get the debug cog to work right: it's synchronized to the RECCLK with WAITPEQ/WAITPNE instructions but when I simply had the code read the data after each WAIT and then restart the timer, I couldn't get reliable results no matter how I set the delay for the sync cog. I had to basically:
- Wait for the recovered clock RECCLK using WAITPEQ/WAITPNE
- Wait for a little while longer using NOP
- Check if there were 2 negative edges of XORIN
- Reset the edge counter
- Wait for the next edge of RECCLK.
It bothers me that this way, the timing is partially based on the time it takes to execute the instructions. It means I'm doing something wrong or I need to use another timer in the debug cog (which may later become the data cog).
I'll have to take a thorough look at the debug cog and analyze what's going on and where the delays are. For one thing, the timer that detects negative edges is delayed by one clock cycle.
Another thing is that if I need to start the edge-counter at a different time compared to the RECCLK signal, it may be easier to just wait for a positive-going XORIN pulse instead of going through so much trouble to regenerate the clock and synchronize it with the time that an extra pulse would come in.
In short: it looks like it works but this needs some more analysis.
05/30/2017 at 08:44 •
As I noted in the previous log, S/PDIF is way too much to handle for one cog of a Propeller. That's okay, I have 7 more cogs. But in projects like this where you have to divide the work over several different cogs, it's difficult to wrap your head around things. At the time that I'm typing this, the plan is somewhat as follows:
- Unfortunately, some external circuitry is unavoidable. In the previous log I said I would like to get rid of it, but that's just impossible. The external circuitry converts the S/PDIF bi-phase signal of any polarity to a simple signal where each transition of the input is converted to a pulse. Then it's just a matter of timing the distances between those pulses to get the original binary stream.
- One cog (let's call it the "sync cog") will be in charge of recovering the clock signal from the input, and will put that signal on a helper pin that can be used by other cogs for synchronization.
- The same "sync cog" will also detect preambles and will generate output on another pin to synchronize the other cogs.
- A "data cog" will use the output signals from the first cog to synchronize to the clock and read input bits. At the end of a subframe, it will store the 32-bit decoded word in the hub.
- If I can't make the "data cog" fast enough to store data into the hub at the end of each subframe, there will be two "data cogs", one that reads the left subframe and one that reads the right subframe.
- Once the data is in the hub, the subchannels can be demultiplexed and dumped to the serial port or to the screen. It will probably also be possible to use existing open-source modules to send the data out to a recorder or a playback device, after optionally modifying it.
I already noted that it was going to be necessary to mess around with the timers in the Propeller. Timers in the Propeller are incredibly useful -- as long as you just want to use them for output. For input tasks such as measuring time in a high-speed environment, the timers need a lot of help from the code:
- You can't let the code wait for a timer (only for the system counter and it takes three assembly instructions to set that up, and that's two too many for our purposes)
- There are timer modes that automatically start a timer when e.g. an input pin is low or high, to measure how long a timer input has been in that state. Unfortunately, the code still has to babysit the timer: if you're measuring how long a signal is high or low, the only way to tell that the signal is no longer in that state is to test it with assembly instructions (either by testing the signal directly or by testing if the timer is counting; both of which are several instructions that I don't have time for in this project)
- There are 2 timers per cog which is really useful (16 timers total is not too shabby), but that means that one cog doesn't have direct access to a timer's state in another cog. So to synchronize one cog with the timer of another cog, you have to sacrifice a pin and use it for output on one cog and input on another cog.
- There are some "logic" modes which allow you to analyze two input pins in various ways, but unfortunately it's not possible to generate direct output from these modes: the code has to examine the timer registers to see if anything has been registered. Nice for slow events but unusable for our purposes.
- The timer modes that let you measure the time that an input pin was low or high allow you to provide an output pin, but that output is always simply the input pin delayed by one clock cycle.
- There are various timer modes that generate useful output, especially the NCO and DUTY modes which use bit 31 of the counter, or the carry flag of an adder to generate output. This can be used to generate a pulse of a predefined length or after a predefined delay.
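The NCO trick from the last bullet can be illustrated with a small Python emulation of a Propeller 1 counter in NCO mode. In that mode, PHS is incremented by FRQ on every system clock and the pin follows bit 31 of PHS. The helper names and preset values are illustrative, and the model ignores any pipeline delay between the counter and the pin.

```python
# Emulation of a Propeller 1 counter in NCO mode: every system clock,
# PHS += FRQ, and the output pin follows bit 31 of PHS.
MASK32 = 0xFFFF_FFFF


def nco_output(phs: int, frq: int, clocks: int) -> list[int]:
    """Pin level (PHS[31]) right after PHS is written, then after each clock."""
    levels = [(phs >> 31) & 1]
    for _ in range(clocks):
        phs = (phs + frq) & MASK32     # PHS += FRQ on every system clock
        levels.append(phs >> 31)       # the pin follows bit 31
    return levels


def delayed_high(delay: int) -> tuple[int, int]:
    """Hypothetical preset: the pin goes high `delay` clocks after PHS is written."""
    return (0x8000_0000 - delay) & MASK32, 1


def one_shot_pulse(length: int) -> tuple[int, int]:
    """Hypothetical preset: the pin is high for `length` clocks, then goes low."""
    return (-length) & MASK32, 1


print(nco_output(*one_shot_pulse(5), 8))
```

With FRQ=1, the value written to PHS sets the delay or pulse length in system clocks, without any further help from the code. After a one-shot pulse, the pin stays low until PHS wraps around to bit 31 again, which takes 2^31 clocks (about 27 seconds at 80MHz).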
When I started on the project, I had set up a simple circuit with a 74HC04 and some passive parts to amplify the input signal to the full range of CMOS, and to generate a slightly delayed version of that signal that could be used to detect flanks, regardless of the polarity of the original signal. With a WAITPEQ or WAITPNE instruction, it's easy to detect whether pins are equal or different (just test both pins at the same time and use the Carry flag to detect the odd parity). As I already mentioned in the previous log, that wasn't going to be fast enough by a long shot.
With timers, it's not possible to use two input pins at the same time. So I added a 74HC00 to my breadboard to make an XOR gate, so I would have a low-latency signal that indicated whether a flank (rising or falling) had been detected in the input signal. The signal on my breadboard is now roughly as follows:
I combined the two delay stages into one by putting the two 100pF capacitors in parallel. Of course now there's only one inverter between the first stage (called SPDIN for SPDIF IN in this picture) and the second stage (called SPDDEL for SPDIF DELAYED), so the polarity of the delayed signal is now reversed compared to the schematic in the previous log.
The three NAND gates that I added and the two NOT gates generate an output that's HIGH when the SPDIN and SPDDEL signals are the same, which happens for a short time when the input changes polarity (remember there's an inverter, so the delayed signal is different as long as the input doesn't change). By the way, it looks like I may not need the NAND gates after all. I'll regard that as an optimization, so I won't worry about it for now.
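The behavior of this flank detector can be modeled sample by sample in Python. This is a behavioral sketch, not a gate-level model. It treats SPDDEL as an inverted copy of SPDIN delayed by a hypothetical number of samples, and XORIN as high whenever SPDIN and SPDDEL are equal.

```python
# Idealized model of the flank detector: SPDDEL is an inverted, delayed copy
# of SPDIN, and XORIN is high whenever the two are equal.


def flank_detector(spdin: list[int], delay: int) -> list[int]:
    xorin = []
    for i, level in enumerate(spdin):
        previous = spdin[i - delay] if i >= delay else spdin[0]  # delay, in samples
        spddel = 1 - previous                                     # the extra inverter
        xorin.append(1 if level == spddel else 0)
    return xorin


print(flank_detector([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], delay=2))
```

Each transition in either direction produces a pulse on XORIN that is as wide as the delay, which is why the polarity of the original signal doesn't matter.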
I decided to send in the big guns and dig out my HP 16500C logic analyzer / oscilloscope (see the photo near the top of the log) from the garage, so I could play around with this. I bought it on eBay a few years ago and I don't use it often because I don't really have space for it. When I got it, it was also noisy, but I replaced the hard disk with a 64MB CF card and replaced the two loud fans with a single quiet one that still moves plenty of air to keep the system cool.
Having an oscilloscope and a logic analyzer allowed me to quickly try out a bunch of different approaches to recover the clock from the input (and generate a signal that I can hopefully use in a future "data cog" to read the data).
This piece of code sets up two timers, both in NCO (Numerically Controlled Oscillator) mode. It then strategically resets the timers when certain things happen.
- At the beginning of the loop, the program assumes that the P3 input (the XOR output of the circuit diagram above) is LOW.
- It waits until the pin goes HIGH, indicating that the input changed.
- Then it resets timer A which generates a pulse that is exactly the right size to end at the time when (during a bit with value 1) a second pulse comes in on XOR.
- Then it tests whether the time since the previous change in the SPDIF signal is long enough to indicate the start of a preamble (all preambles start with a period of 3 biphase cells, i.e. 1.5 data bit times, without any change in the SPDIF signal). The Z flag is cleared if a preamble was detected, or set if it wasn't.
- If this wasn't a preamble, clear the preamble timer (i.e. start measuring from now), wait for the end of the pulse on the XOR line if necessary, and then repeat the loop to wait for the next pulse.
- If this was a preamble, keep busy for a while, then do the same things that a non-preamble loop would do and repeat.
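The preamble test in this loop boils down to comparing the gap between consecutive XORIN pulses against a threshold. Here is a rough Python model of that comparison. Times are in units of t (the shortest biphase pulse), and the 2.5*t threshold is a hypothetical value. The NCO timers and the instruction timing are not modeled.

```python
# Rough model of the sync cog's preamble test: for every XORIN pulse, flag a
# preamble when the gap since the previous pulse is longer than a threshold.
PREAMBLE_GAP = 2.5   # hypothetical threshold in units of t (preambles start with 3*t)


def sync_loop(pulse_times: list[float]) -> list[tuple[float, bool]]:
    """Return (time, is_preamble) for each XORIN pulse after the first one."""
    events = []
    for previous, current in zip(pulse_times, pulse_times[1:]):
        events.append((current, current - previous > PREAMBLE_GAP))
    return events


print(sync_loop([0, 2, 3, 4, 7, 8]))
```

Gaps of 1*t and 2*t are ordinary biphase pulses. Only the 3*t gap at the start of a preamble crosses the threshold.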
Obviously this could use some cleanup but the result can be seen in the photo of the logic analyzer screen: the code correctly generates clock pulses and a longer pulse indicating that a preamble was encountered.
I wrote some code to test whether another cog could successfully read the data by waiting for the recovered clock to go low and then sampling the XOR pin, but this didn't work. First of all, it took too many instructions (basically it can't get its work done on time). Second, there is not enough time to wait for the clock and then read the data: by the time the data-reading instruction is being executed, the input is gone.
In the next log I'll try to fix this by doing two things:
- First, I'm going to change the code so that the recovered clock runs at half speed. That way, the data cog(s) only has/have to wait for the next state with WAITPEQ/WAITPNE; they don't have to wait for the end of that state.
- Also, I'll have to change the timing so that it's possible to either do a WAITPEQ/WAITPNE that waits for the clock and also checks the data, or I'll have to change the timing of the sync cog to allow for sufficient delay in the data cog to pick up the data at the right time. This might be difficult: I have a funny feeling that the sync cog may not have enough time to do this.