We now have about two dozen sensors in the field and about a dozen communicators. Early on, we made a test unit. That unit has been operating for weeks on a counter here in San Francisco, and it has never skipped a beat, missed a report, or had a problem of any kind since we got everything working. The units in the field have not all been so reliable. Some of them have worked flawlessly for months, and some of them never powered up. Overall, they are working about 90 percent of the time. Deploying electronics in a wet, salty sea environment tests your design in extreme ways.
We are building and programming more than 2,500 miles from where the units are being deployed. We need them to run, unattended, for more than two months, enduring moisture, salt, wind, blowing sand, and who knows what else. When they stop working, we can't easily determine what went wrong, and that is a challenge. With two interconnected custom devices driving a third device that talks to a server, there are lots of places where things can go wrong. This is where "hacking" turns into full-scale engineering.
It is one thing to get the basic functions of a device up and running. That part is fun: it requires innovation and creative design. But getting everything to work flawlessly in the field requires diligence, attention to detail, and lots of troubleshooting. That is much less sexy. For this project to be successful, we have to make units that are totally reliable. Ninety percent is not sufficient.
We've learned quite a bit about the ways things can go wrong. Boards can come loose, connections can break, and moisture and salt can get into places they are not supposed to reach. Cell phone reception can be interrupted by lightning, and that can cause embedded code to hang.
We anticipated several processes that could fail and programmed the code to handle those problems as they happened. For everything else, there is a watchdog timer that resets the device. But it is difficult to test how things will behave when they fail if you can't observe them failing. We don't get thunderstorms in San Francisco, and because we are in the middle of the city, the cell phone reception is excellent. This summer is teaching us about all the problems we will need to deal with.
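To illustrate how a watchdog timer catches a hang nobody planned for, here is a minimal sketch in plain C that runs on a desktop rather than on the actual hardware. It models the watchdog as a countdown that the main loop must keep restarting; if the loop stops doing so, the countdown expires and the device is reset. The names `watchdog_t`, `wdt_kick`, `wdt_tick`, and `simulate_hang`, and the tick-based timing, are assumptions made for illustration and are not taken from our firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a hardware watchdog timer. */
typedef struct {
    uint32_t timeout;      /* ticks allowed between kicks */
    uint32_t ticks_left;   /* countdown until a forced reset */
    uint32_t reset_count;  /* how many resets the watchdog has forced */
} watchdog_t;

void wdt_init(watchdog_t *w, uint32_t timeout_ticks) {
    assert(timeout_ticks > 0);
    w->timeout = timeout_ticks;
    w->ticks_left = timeout_ticks;
    w->reset_count = 0;
}

/* Healthy code calls this regularly to restart the countdown. */
void wdt_kick(watchdog_t *w) {
    w->ticks_left = w->timeout;
}

/* Advance time by one tick; returns true if the watchdog fired. */
bool wdt_tick(watchdog_t *w) {
    if (w->ticks_left > 0) {
        w->ticks_left--;
    }
    if (w->ticks_left == 0) {
        w->reset_count++;
        w->ticks_left = w->timeout;  /* countdown restarts after reset */
        return true;
    }
    return false;
}

/* Run a main loop for total_ticks. At tick hang_at the code "hangs"
 * and stops kicking; the forced reset clears the hang, as a reboot would.
 * Returns the number of resets the watchdog forced. */
uint32_t simulate_hang(uint32_t total_ticks, uint32_t hang_at,
                       uint32_t timeout_ticks) {
    watchdog_t w;
    bool hung = false;
    wdt_init(&w, timeout_ticks);
    for (uint32_t t = 0; t < total_ticks; t++) {
        if (t == hang_at) {
            hung = true;          /* e.g. stuck waiting on the modem */
        }
        if (!hung) {
            wdt_kick(&w);         /* normal operation */
        }
        if (wdt_tick(&w)) {
            hung = false;         /* reset restarts the firmware */
        }
    }
    return w.reset_count;
}
```

The point of the sketch is that the watchdog needs no knowledge of why the code stopped; it only notices the missing kicks. What it cannot do by itself is preserve whatever the device was working on when it hung.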
There are two main things we have learned thus far. First, the watchdog timer works for most unanticipated problems, and the system recovers most of the time. But we need to handle recovering from resets better. Sometimes the system recovers and no data is lost, but sometimes we lose a few hours' or a day's worth of data. We knew this might happen, and we knew how to fix it, but unfortunately, the 16K devices we are using (TI's MSP430s) ran out of code space before we could implement the fixes. Just at the point when we ran out of code space, TI introduced new versions with up to 128K of memory. These new processors should be more than adequate for what we need. But making these fixes is out of the question for this year. The devices are in the field, and the turtle season is half over. So for now, we are accumulating more knowledge about the ways our devices can fail. We will incorporate recovery from those failures into the next version of the software.
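One common way to keep data across a watchdog reset is to store readings in memory that survives the reset and to check on boot whether that memory already holds a valid log before clearing it. The sketch below shows that idea in plain C on a desktop. It is not necessarily the fix we have in mind for the next version: the names `nv_log_t`, `log_boot`, `log_append`, and `log_get`, the `LOG_MAGIC` marker value, and the capacity are all hypothetical, and a static variable stands in for a real non-volatile region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_MAGIC    0xC0FFEEu  /* hypothetical "log is valid" marker */
#define LOG_CAPACITY 8u         /* hypothetical number of stored samples */

typedef struct {
    uint32_t magic;                  /* equals LOG_MAGIC once initialized */
    uint16_t count;                  /* number of committed samples */
    uint16_t samples[LOG_CAPACITY];
} nv_log_t;

/* Stand-in for a memory region that keeps its contents across a reset. */
static nv_log_t nv_log;

/* Call at every startup. Only wipes the log if it does not look valid,
 * so a watchdog reset resumes where the device left off. */
void log_boot(void) {
    if (nv_log.magic != LOG_MAGIC || nv_log.count > LOG_CAPACITY) {
        nv_log.count = 0;           /* clear first ... */
        nv_log.magic = LOG_MAGIC;   /* ... then mark the log as valid */
    }
}

/* Store one reading. The sample is written before the count is bumped,
 * so a reset between the two steps loses at most this one reading. */
bool log_append(uint16_t sample) {
    if (nv_log.count >= LOG_CAPACITY) {
        return false;               /* log is full */
    }
    nv_log.samples[nv_log.count] = sample;
    nv_log.count++;
    return true;
}

uint16_t log_count(void) {
    return nv_log.count;
}

uint16_t log_get(uint16_t index) {
    assert(index < nv_log.count);
    return nv_log.samples[index];
}
```

Calling `log_boot()` a second time, as would happen after a watchdog reset, leaves the earlier readings in place, and new readings are appended after them. A real implementation would also have to deal with the write and erase rules of the actual memory, which this sketch ignores.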
Second, the connection between the Smart Sensor and the Communications Unit is the weak link in our design. We consulted with a few engineers about what connector to use, and we ended up choosing a 9-pin Molex connector with gold contacts. The first thing we discovered was that the joint between the cable and the connector was prone to failure because the outer cover of the cable was removed near the connectors. In the field, the cable bent easily during connecting and disconnecting, and the thin wires in the Cat5e cable could break after a few uses. So we cast the ends of the connectors and the cables in quick-setting polyurethane to strengthen them. That did make them strong, but the polyurethane wicked its way up into the connector and made some of the crimp connections unreliable. So we started soldering the crimp connections after they were crimped. This seems to work much better, but the assembly became very labor intensive to make and difficult to repair. Ultimately, we need a better, simpler, and cheaper way to connect these units. We are considering using a single coax cable for the next version. We would send RF signals in both directions over the coax line, and we could also use the coax line to send power to the sensor. Coax has a long history of being used outdoors, and there is a selection of waterproof connectors and tools for quick repairs. We are also considering ways to make a wireless connection between the devices, but that might create more problems than it solves. The "power over coax" solution could be engineered so that the device can easily be adapted for wireless use as well. This would be very good for applications that do not involve underground nests.