How do the machines communicate?

A project log for Maelstrom: 35 machines discuss you

Using sensors, scrapers, and questionnaires, this art installation gets your data--and spreads rumors about you with NRF24L01+ radios.

Chris Combs, 02/03/2021 at 18:28

One of the most fun parts of building Maelstrom has been getting ~35 machines to speak to each other. In a setting with solid infrastructure, perhaps I would use WiFi. Art galleries aren't always happy about this idea. I also wasn't thrilled about the machines possibly getting Internet access; in my perfect world, they would be completely isolated from wider networks. 

I have worked with Nordic NRF24L01+ radios in other projects and decided to use them in Maelstrom. These radios accept up to 32 bytes of data per packet and let you, as a developer, "press send and walk away" while they handle the mechanics of retransmission, error checking, etc.

They have other interesting properties, like a five-byte "pipe" identification scheme that (in radio firmware) lets you ignore messages on the same channel that are not intended for a given client. And it's possible to set up a multicast network. I chose this approach for Maelstrom, using a common "message bus" with target device identifiers baked into the data payload. So all the radios are reading from and writing to a common channel, at a rate of a message every few seconds (with some exceptions).
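To make the "target identifier baked into the payload" idea concrete, here is a minimal sketch in Python. The field layout (one byte each for message type, target, and sender), the BROADCAST value, and the function names are all assumptions for illustration, not Maelstrom's actual wire format. The only fixed fact is the 32-byte payload limit.

```python
import struct

PAYLOAD_SIZE = 32  # maximum NRF24L01+ payload, in bytes

# Hypothetical header: message type, target node ID, sender node ID.
HEADER = struct.Struct("<BBB")
BODY_SIZE = PAYLOAD_SIZE - HEADER.size  # 29 bytes left for the body
BROADCAST = 0xFF  # hypothetical "addressed to everyone" target


def pack_message(msg_type: int, target: int, sender: int, body: bytes) -> bytes:
    """Build a fixed-size 32-byte payload, padding the body with null bytes."""
    if len(body) > BODY_SIZE:
        raise ValueError(f"body is limited to {BODY_SIZE} bytes")
    return HEADER.pack(msg_type, target, sender) + body.ljust(BODY_SIZE, b"\x00")


def unpack_message(payload: bytes):
    """Split a 32-byte payload back into (msg_type, target, sender, body)."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("unexpected payload size")
    msg_type, target, sender = HEADER.unpack_from(payload)
    body = payload[HEADER.size:].rstrip(b"\x00")
    return msg_type, target, sender, body


def is_for_me(payload: bytes, my_id: int) -> bool:
    """Every node hears every message; software decides whether to act on it."""
    _, target, _, _ = unpack_message(payload)
    return target in (my_id, BROADCAST)
```

With a layout like this, every radio can listen on the same pipe, and the filtering that the hardware would normally do per address happens in software instead.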

There are additional complexities with this, particularly when sourcing NRF24L01+ radios on a budget. Some of the modules sold today are using mostly-compatible clone parts. One, in particular, clones the datasheet very well--one signal described by Nordic in their own datasheet is flipped in practice in actual Nordic devices, and the clone devices cloned the datasheet instead of the real device behavior… making them incompatible in practice with real devices. 

In my perfect world, I'd be able to keep each radio listening both on a group channel and on an individual address. I wasn't able to get this working across my mix of radio modules. I ended up using Pipe 1, as recommended by the RF24 documentation, to listen to just the multicast channel, and encoding an individual address into the message types that need it.

What data is shared over the radio?

I wanted to have visitor data literally being shared among the machines. A given datum is observed by one particular Maelstrom node and it shares it to two others. Each of them, if they successfully receive it, can choose to share it to two others, themselves. A given node's output generally shows the most recent datum it received. In this way, a given rumor (datum) can quickly spread among the Maelstrom network. 

It exhibits an "echo chamber" effect in real time, with some facts growing in prominence and other facts being suppressed over time. After a few minutes all of the machines might be repeating the same thing back to each other.
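The fan-out described above can be sketched as a small simulation. This is an illustration under assumptions: the targets mapping, the share_probability parameter, and the function name are hypothetical, and radio loss is ignored.

```python
import random


def spread(origin, targets, share_probability=1.0, rng=None):
    """Return the set of nodes reached by a datum first observed at `origin`.

    `targets` maps each node to the two nodes it forwards to. The origin
    always shares; every other recipient forwards with `share_probability`.
    """
    rng = rng or random.Random()
    reached = {origin}
    frontier = [origin]
    while frontier:
        node = frontier.pop()
        for peer in targets[node]:
            if peer in reached:
                continue  # this peer already has the rumor
            reached.add(peer)
            if rng.random() < share_probability:
                frontier.append(peer)  # the peer chooses to pass it on
    return reached


# Example: six nodes, each forwarding to two others.
targets = {0: (1, 2), 1: (3, 4), 2: (4, 5), 3: (0, 1), 4: (5, 0), 5: (2, 3)}
everyone = spread(0, targets, share_probability=1.0)
```

When every recipient forwards, one observation reaches all six nodes in a few hops. Lowering share_probability lets some rumors die out early, which is one way the "suppressed facts" effect could arise.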

It's very important to me that visitor data not be retained for longer than 15 minutes. So each internal radio transmission includes an expiration time along with the data, the visitor ID, and the visitor's color (a 6-digit hex code, as used in HTML/CSS). If a node receives a radio transmission with an implausible expiration time, it throws it out.
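A plausibility check for the expiration time might look something like the sketch below. It assumes the expiration is a Unix timestamp and that the nodes' clocks roughly agree; the clock_slack allowance and the function name are hypothetical.

```python
import time

MAX_RETENTION_SECONDS = 15 * 60  # visitor data lives for at most 15 minutes


def is_plausible_expiry(expires_at, now=None, clock_slack=30.0):
    """Accept only expirations that are in the future but no more than
    15 minutes (plus a small, assumed allowance for clock drift) away."""
    now = time.time() if now is None else now
    if expires_at <= now:
        return False  # already expired: drop it
    return expires_at <= now + MAX_RETENTION_SECONDS + clock_slack
```

A corrupted packet whose expiration decodes to next year fails the upper bound and is discarded, so a garbled timestamp cannot keep visitor data alive past the limit.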

There are also some "housekeeping" transmissions sent among the nodes--to wit:

How do the nodes know which other two nodes to send their data towards?

This is the fun part. So, if all the machines always functioned perfectly, I could set this per machine in advance and call it a day. This would lead to a boring structure where a given machine always shared its data with the same two machines, and so on, but it would work fine. More like a rushing river than a truly chaotic environment.

The problem is that in the real world, these computers and radios don't function perfectly. Some radios in particular seem to drift out of frequency or something, causing them to stop "hearing" after many hours or days. I can reset them every so often, but I can easily imagine a scenario where a few particular nodes fail and the entire installation becomes unresponsive. 

That would not be good! So, I built a system to dynamically build a directed graph of all responsive Maelstrom nodes and renegotiate it as needed as nodes appear or disappear.
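To give a feel for the problem, here is one simple way to turn a list of responsive nodes into a digraph where every node has two targets. This is not Maelstrom's negotiation protocol. It is a centralized sketch that assumes something already tracks when each node was last heard from, and the names, the timeout, and the shuffled-ring construction are all hypothetical.

```python
import random


def responsive_nodes(last_heard, now, timeout=60.0):
    """Node IDs heard from within `timeout` seconds of `now`."""
    return sorted(n for n, t in last_heard.items() if now - t <= timeout)


def build_digraph(nodes, rng=None):
    """Give every node two distinct targets.

    Nodes are shuffled into a ring, and each one points at the next two
    nodes around it. Following the "next" edges visits every node, so a
    rumor that starts anywhere can reach everyone.
    """
    rng = rng or random.Random()
    order = list(nodes)
    rng.shuffle(order)
    n = len(order)
    if n < 3:
        raise ValueError("need at least three nodes for two distinct targets")
    return {order[i]: (order[(i + 1) % n], order[(i + 2) % n]) for i in range(n)}
```

Rebuilding the graph whenever the responsive set changes drops dead nodes, and reshuffling each time avoids the fixed "rushing river" structure.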

Synthetic test of a 50-node digraph

It works something like this:

The hard parts have been working around these failure cases:

One interesting aspect to all of this is that as someone trying to replicate rumormongering in device format, I don't necessarily want perfect message transmission. I like when the machines are playing "Telephone," sharing a typo-ridden version of a fact instead of the real thing.

To help with this, I don't apply any extra CRC or error correction in my Maelstrom software. I do leave hardware CRC enabled at 16 bits, and throw away messages that are obviously badly structured--but if a given message turns "Christopher" into "Chrisjrote," to me, that's just part of the fun.

As part of building and troubleshooting all of this, I have an internal diagnostic tool that outputs the graph in visual form. Sometimes it makes fun shapes. As a bonus, here are some of the weirder graphs it's formed.