Protocol suggestions

Hi there,

As someone who's interested in the "reimplement HDL" option for the tiles, I thought a suitable name might be "OpenNX4" (resisting the temptation to run the two n's together), and I had a few thoughts/suggestions to throw out there.

For the sake of this discussion, let's say an "end user" is someone who (at least initially) has one or more NX4 tiles and a Raspberry Pi and wants to use them as a video panel (daisychained, with the tiles in a plane or just placed artistically), with minimum extra hassle.

Because the only limits on how many tiles you can daisychain are your power source, your frame rate and your wallet, let's look at how we can get the best frame rate (= the highest-speed pixel interface).

The tiles come with an IN and OUT port. The IN port has 2x in and 1x out LVDS pairs, and the OUT port has 2x OUT and 1x IN pairs. The pairs can be used as single-ended i/os as well of course.

As NX4 is FPGA driven it means almost any protocol can be implemented (although HDMI video is not reasonable). You really just want a lot of downstream performance; an upstream link back from the tiles can be arbitrarily slow.

If we look at what protocols are available on typical hacker boards (OPi, RPi, Allwinner, or even Teensies):

I2C up to about 1-2Mbps, UART ~4Mbps.. and..

SPI (and/or I2S) ports at up to ~100Mbit/s

(There is also SD - as 'make the tile a fake SD card' which is fiddly and takes too many wires in fast 4-bit SD mode)

Another option is that some SBCs have a connector for an LCD display which is usually LVDS, and we could probably provide an interface to that too, assuming there are enough pins (or we ignore some of them).

The current crop of cheap Allwinner H3 boards are great, and some tricks are possible including running both I2S ports synchronously (= 2 bits @ 80+MHz); they're my fave platform right now, but much of this is doable on a Raspberry Pi too (possibly at lower performance).

Suggested configuration

For the sake of discussion let's say you stick an Orange Pi right physically on the back of an NX4 tile, and that's your video processor; you can send it stuff over wifi or ethernet, generate trippy images, play video off SD, whatever. It runs some code that preprocesses and maps the pixels to the tiles, and blasts the result to the first (and subsequent daisychained) NX4 tiles. (You can probably power the Orange Pi off the 4v6 supply in the tile).
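To make the "maps the pixels to the tiles" step a bit more concrete, here's a rough sketch of what the OPi-side code might do. It assumes the frame is a flat RGB888 buffer in row-major order and that tiles are simply sent in daisychain order; `extract_tile`, `frame_to_chain` and `tile_order` are made-up names for illustration, and the real tile scan order may well differ.

```python
TILE_W, TILE_H, BYTES_PER_PIXEL = 32, 36, 3  # tile size from this post, RGB888

def extract_tile(frame, frame_width, tile_col, tile_row):
    """Copy one 32x36 tile out of a flat, row-major RGB888 frame buffer."""
    row_bytes = TILE_W * BYTES_PER_PIXEL
    stride = frame_width * BYTES_PER_PIXEL
    x0 = tile_col * row_bytes
    y0 = tile_row * TILE_H
    out = bytearray()
    for y in range(y0, y0 + TILE_H):
        start = y * stride + x0
        out += frame[start:start + row_bytes]
    return bytes(out)

def frame_to_chain(frame, frame_width, tile_order):
    """Concatenate tiles in daisychain order (assumed: first tile's data first).

    tile_order is a list of (tile_col, tile_row) pairs describing how the
    chain snakes across the frame.
    """
    return b"".join(extract_tile(frame, frame_width, c, r) for c, r in tile_order)

# Example: a 64x36 frame (two tiles side by side), left half 0x01, right half 0x02
row = b"\x01" * (TILE_W * 3) + b"\x02" * (TILE_W * 3)
frame = row * TILE_H
stream = frame_to_chain(frame, 64, [(0, 0), (1, 0)])
```

Because `tile_order` is just a list, tiles placed "artistically" rather than in a neat grid only need a different list.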

First tile in array receives single-ended

So the NX4 is an SPI slave. All our tiles boot up in "spi slave" mode until they hear otherwise.

So from your OPi you run SPI MOSI and SCK over nice short terminated wires into the tile data input. Add MISO so you've got a data stream back (we could connect CS but could also design the protocol so it's not required). This will probably achieve in the ballpark of 30-80Mbps actual throughput.
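If CS is left out, the receiver needs some other way to find packet boundaries in the stream. One common approach is a sync word plus a length and CRC; the sketch below is purely illustrative. The sync pattern, header layout and function names are invented here, not part of any existing NX4 or Barco format. The real receiver would be HDL and would also have to find bit alignment, which this byte-level Python model ignores.

```python
import struct
import zlib

SYNC = b"\xA5\x5A\xC3\x3C"  # hypothetical sync pattern, chosen arbitrarily

def build_packet(tile_index, payload):
    """Frame one payload: SYNC | tile index (1 byte) | length (2 bytes) | payload | CRC32."""
    header = struct.pack(">BH", tile_index, len(payload))
    crc = zlib.crc32(header + payload)
    return SYNC + header + payload + struct.pack(">I", crc)

def parse_stream(stream):
    """Scan a byte stream and return [(tile_index, payload), ...] for valid packets."""
    results = []
    pos = 0
    while True:
        pos = stream.find(SYNC, pos)
        if pos < 0:
            break
        hdr = pos + len(SYNC)
        if hdr + 3 > len(stream):
            break
        tile_index, length = struct.unpack_from(">BH", stream, hdr)
        end = hdr + 3 + length + 4
        if end > len(stream):
            pos += 1  # possibly a false sync; keep scanning
            continue
        body = stream[hdr:hdr + 3 + length]
        (crc,) = struct.unpack_from(">I", stream, hdr + 3 + length)
        if zlib.crc32(body) == crc:
            results.append((tile_index, body[3:]))
            pos = end
        else:
            pos += 1  # CRC mismatch: resync on the next candidate
    return results

# Two packets preceded by a couple of junk bytes
stream = b"\x00\x13" + build_packet(0, b"abc") + build_packet(1, b"xyz")
packets = parse_stream(stream)
```

A full RGB888 tile (3456 bytes) fits comfortably in the 16-bit length field.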

Even higher performance option : I2S on Allwinner boards; instead of using SPI it's likely possible to use synchronous dual channel I2S; i.e. run PCM0_DOUT and PCM1_DOUT, PCM0_CLK (and PCM0_DIN for a return data channel) which might achieve ~100Mbps.

Daisychained are differential

Ok so that's the first tile, what about daisychaining? Note that we can use the first tile as a "single ended to differential converter" and have the OUT port sending LVDS signals downstream to all the other tiles; hence we gain the noise immunity, longer wire length and generally better performance of LVDS for all the other tile data links, however many there are (limited only by your desired frame rate).

Advanced hacker bonus points would be so only one tile needs to be reflashed with different Xilinx code, and it accepts a single-ended input from the linux board and converts it to Barco format output for the other daisychained tiles...

Bottom Line - performance

Ok so if in practice we get say 30Mbps data rate with a negligible amount of overhead that isn't actual pixels, what frame rates can we get?

Each tile is 32x36 = 1152 pixels.

Starting simplistically: if we use RGB888 pixel format (with internal 8->12 bit per channel upscaling in the tile), that's 1152 x 24 = 27,648 bits (~27.6kbit) per tile; at 60fps each tile consumes ~1.66Mbit/sec.

If we have 30Mbps usable data rate, we can run 18 tiles (2x9 sets) at 60fps. This seems conservative.
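The arithmetic above is easy to play with in a few lines of Python. This is just a back-of-envelope helper (the function names are mine), and like the text it assumes zero protocol overhead:

```python
TILE_WIDTH = 32
TILE_HEIGHT = 36

def bits_per_tile(bits_per_pixel=24):
    """Bits needed for one full tile update."""
    return TILE_WIDTH * TILE_HEIGHT * bits_per_pixel

def tile_bitrate(fps, bits_per_pixel=24):
    """Bits per second one tile consumes at the given frame rate."""
    return bits_per_tile(bits_per_pixel) * fps

def max_tiles(link_bps, fps, bits_per_pixel=24):
    """How many whole tiles a link of link_bps can feed, ignoring overhead."""
    return link_bps // tile_bitrate(fps, bits_per_pixel)

print(bits_per_tile())            # 27648
print(tile_bitrate(60))           # 1658880
print(max_tiles(30_000_000, 60))  # 18
```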

...this is without any fancy tricks of course; we can use <24 bits per pixel quite reasonably with the right encoding, use compressed pixel streams, partial updates, all sorts of things; I'd guess you can double the above performance with some optimization. There also seems to be plenty of processing power in each tile remaining to use doing pixel decompression.
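As one example of "<24 bits per pixel", here's a sketch of packing RGB888 down to RGB565 on the sending side. RGB565 is just a convenient, well-known format picked for illustration; it is an assumption that the tile HDL would be written to unpack it, and it gives 1.5x rather than the full 2x guessed at above.

```python
def rgb888_to_rgb565(r, g, b):
    """Keep the top 5/6/5 bits of each channel in one 16-bit value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def pack_tile_rgb565(rgb888):
    """Convert a flat RGB888 byte buffer to big-endian RGB565 bytes."""
    out = bytearray()
    for i in range(0, len(rgb888), 3):
        value = rgb888_to_rgb565(rgb888[i], rgb888[i + 1], rgb888[i + 2])
        out += value.to_bytes(2, "big")
    return bytes(out)

tile_rgb888 = bytes(32 * 36 * 3)            # one all-black tile
tile_rgb565 = pack_tile_rgb565(tile_rgb888)
bits_per_tile = len(tile_rgb565) * 8        # 18432 instead of 27648
tiles_at_60fps = 30_000_000 // (bits_per_tile * 60)
print(tiles_at_60fps)                       # 27
```

So at the same 30Mbps and 60fps, 16 bits per pixel would feed 27 tiles instead of 18, before any compression or partial-update tricks.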

(I would also expect to be able to get >30Mbps out of the SPI link if the wiring is reasonably carefully done; currently in my HDL I'm not using a PLL so my SPI receiver is clocked at CLK_40, which limits it to somewhere <20MHz, but that's easily fixed.)

Simple "Barco-lite" setup suggestion

So; a chunky 24v power supply (or several) off ebay, some cables+connectors (possibly the most expensive part) and an Orange Pi, one for each 18 (or more) NX4 tiles.

This setup could display live HDMI video input using one of those ~$30 "HDMI capture" dongles which send H264 frames over ethernet (UDP); these have some latency (maybe 1-2 secs) but may be fine for your use.

The Orange Pi can also of course act as a remote desktop/VNC etc output, video player, etc, but could make a decent replacement for (at least) two Barco NX4 controllers and the main Barco video processor thingy. Obviously put a web interface on it for panel control from your phone... :-)

Large arrays

For a larger array (say 24 tiles wide by 18 high for 768x648 on 432 tiles), you might use one OPi per 36 daisychained tiles (@ 30-60fps) so that's 12 OPis, which could all be on a 100Mbps LAN and receive the same multicast H264 stream, each extracting its own tile area from the image.
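A quick sanity check of those numbers, as a sketch (`array_plan` is a made-up helper, RGB888 and zero overhead assumed):

```python
import math

TILE_W, TILE_H = 32, 36

def array_plan(tiles_wide, tiles_high, tiles_per_chain, fps, bits_per_pixel=24):
    """Resolution, controller count and per-chain pixel rate for a tile array."""
    total_tiles = tiles_wide * tiles_high
    return {
        "resolution": (tiles_wide * TILE_W, tiles_high * TILE_H),
        "tiles": total_tiles,
        "controllers": math.ceil(total_tiles / tiles_per_chain),
        "chain_bps": tiles_per_chain * TILE_W * TILE_H * bits_per_pixel * fps,
    }

plan_30 = array_plan(24, 18, 36, fps=30)
plan_60 = array_plan(24, 18, 36, fps=60)
print(plan_30)
print(plan_60["chain_bps"])
```

A 36-tile chain needs roughly 30Mbps at 30fps and roughly 60Mbps at 60fps, so the top of the "30-60fps" range depends on getting towards the upper end of the SPI/I2S throughput estimates.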

Alternatively if you don't want to feed your display H264 (i.e. send it raw pixels instead, e.g. from a MAME framebuffer) it's very doable, maybe you'd use ethernet switches with 100M ports and a Gig-E uplink (i.e. practically any ethernet switch on ebay) fanning out to each OPi and then unicast the raw pixels as UDP jumbo frames.
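For the raw-pixel case, a rough check that the Ethernet side holds up (RGB888 payload only; this ignores UDP/IP/Ethernet header overhead, and the function name is just for illustration):

```python
TILE_BITS_RGB888 = 32 * 36 * 24  # bits per full tile update

def raw_ethernet_load(total_tiles, tiles_per_controller, fps):
    """Return (bits/sec into each OPi's port, bits/sec on the shared uplink)."""
    per_port = tiles_per_controller * TILE_BITS_RGB888 * fps
    uplink = total_tiles * TILE_BITS_RGB888 * fps
    return per_port, uplink

per_port, uplink = raw_ethernet_load(432, 36, 60)
print(per_port / 1e6, uplink / 1e6)
```

Even at 60fps that's about 60Mbps per OPi and about 717Mbps total for the 432-tile example, which fits within 100M ports and a Gig-E uplink, though without a huge margin on the uplink.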

Discussions

Richard Aplin wrote 11/07/2017 at 18:53

It seems odd that they didn't go for a higher voltage; I mean why not? I'd assume 24v because the power supplies are so readily available but if you asked me to guess blind I'd have said 36v/48v just because of the wattage involved in a 3x3 array. You can get a spare PSU on the Barco site but it doesn't list the voltage.

modder_mike wrote 11/07/2017 at 01:58

Keep in mind that the connector pins are only good for 7.5A... you'll probably only get 6-8 panels in a single daisy chain. In the NX-4 installation only 3 panels are cascaded. Of course if you run power to each panel individually from your power supply, no problem.

Does a small ARM SoC have enough horsepower to break apart a video frame and reserialize it in real time? I kind of expected we'd have to put another FPGA in as a head node to do the video conversion and chunking.

Richard Aplin wrote 11/07/2017 at 02:17

yeah I think so; Orange Pis are an H3; quad 1GHz Cortex cores with >=256MB of 32-bit DDR3; plenty of oomph. Hardware H264 decode. The Orange Pi Plus 2E (specific model) has Gig-E although they're a bit hard to find now; 100M ethernet should be fine - the pixel rates aren't that high and you could put them on a switch w/ Gig-E uplink anyway. I don't personally recommend/like Raspberry Pis b/c they have such bottlenecked i/o bandwidth compared to a cheaper OPi (which has 3 separate HS USB host ports which you can saturate in parallel, plus - uncontended with USB - 100/1000M ethernet, plus etc etc).
The power supply stuff is basically an end-user thing to sort out :-)

Richard Aplin wrote 11/07/2017 at 16:42

It's not exactly clear what the max power supply voltage is; there's only one main buck converter that feeds everything else on the tile, and it doesn't have any electrolytics on it, so it's probably just down to the two switching FETs (it's synchronous I think) and the single yellow tantalum cap (code 226T; the 'T' should be the voltage code but I can't find a sensible answer for what 'T' is). The FETs are Renesas parts that are rated at 30v VDSS so I guess that's the answer; very probably 24v (the LTC1778 goes up to 36v). Personally I'd have gone higher than that - why not use 48v - but hey...

modder_mike wrote 11/07/2017 at 18:32

Oh, good catch! I hadn't thought to check on the FETs. Would they have used 30V FETs on a 24V line though? I wouldn't, if I were designing it. For a more rugged design, maybe the nominal Vin is only 18V, for 40% headroom?

[Edit] Then again, the Linear appnotes use 30V FETs for 28V inputs... to each their own amount of overkilling, I guess.
