Much of the fundamental design for this project has already been done on #TritiLED: the basics of driving LEDs with peak-efficiency pulses and aggressively sleeping the microcontroller to bring the current draw down to µA levels have been proven. What remains is a thorough analysis of all the ways such a system could fail, and what can be done about each of them. I'm going to keep a table of all the identified failure modes here, and I'm asking for your help: if you think of anything I've missed, please comment and let me know! In the build logs, I'll discuss some of these issues and possible solutions. Ideally, I'd like to have a few units running by January 1, 2017.
I'm also going to have a look at the available NASA resources to see what advice they have on building ultra-reliable systems. If anyone has good links to this or other relevant information, please post them.
What Could Possibly Go Wrong?
"It is exceedingly difficult to make predictions, particularly about the future." - variously attributed to Niels Bohr, Yogi Berra, Samuel Goldwyn, Robert Storm Petersen, and Mark Twain
I'll be updating this table as issues are identified (entries are in no particular order):
|Defect or Event||Effect||Mitigation||Identified by|
|PIC12LF1571 flash program memory degradation||Program gets corrupted, device fails||Microchip used to specify 40-year program memory endurance, although I can't find that figure for this part. They also used to distinguish between a "prototype" programmer and a "production" programmer; I need to research this. Probably OK (per @jaromir.sukuba; see discussion)||me|
|RAM corrupted by radiation||Program hangs or malfunctions||Periodically re-initialize all RAM from flash||me|
|Battery vents, releasing high-pressure gas||Sealed case could rupture or explode||Provide vent holes or a sacrificial vent in the case||me|
|Program hangs causing MOSFET to short battery across inductor||Large current drain; inductor heats, causing burns or fire||Include current-limiting resistors in battery connections (accept < 1% efficiency loss)||me|
|Battery contacts / springs corrode||Increased resistance to battery; possible failure||Spot-weld battery contacts? Silver-epoxy battery connections? Battery doesn't have to be user-replaceable!||me|
|Flux residue or contaminants (from handling) cause corrosion||Device failure||Thoroughly clean board after assembly, enclose in protective case||me|
|Thermal cycling on board breaks SMD components||Device failure||Use vendor-approved pad layout and reflow solder profile. Consider using through-hole components: are they more rugged under thermal cycling?||me|
|Rogue comet hits planet||Life on Earth ceases||Enjoy it while you can||me|
|Component failure||Device failure||Burn-in all components to catch early failures. (Most) components are hugely de-rated in this circuit, which should greatly extend lifetime.||me|
|PIC GPIO pins may exceed ratings on peak MOSFET gate drive current||Early PIC degradation or failure||Measure peak drive current to ensure it's within spec. Include gate drive resistor (at cost of circuit efficiency) if required.||me|
|Post-assembly ESD event damages PIC or MOSFETs||Device failure||Enclose circuit to prevent direct contact with board||me|
|ESD event during assembly||Latent damage to PIC or MOSFET(s) reduces lifetime||Use proper ESD protection in all phases of assembly||me|
|Errant program behavior (unspecified cause)||Device malfunction||Code is periodically awakened by WDT; this should also break out of unintentional hung loops. The assumption is that the WDT has been designed to be robust.||me|
|Environmental contaminants (incl. moisture)||Corrosion or parasitic power drain||Conformal coating on assembled board after thorough cleaning||me|
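Two of the mitigations in the table fit together: the WDT wakes the code periodically, and each wake-up is a natural point to re-initialize RAM from flash. Here's a minimal sketch of that idea in portable C. It is not the actual firmware: the `led_state_t` struct, its fields, the default values, and the `refresh_state()` / `on_wake()` names are all made up for illustration, and it only refreshes one state struct rather than literally all RAM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical run-time state; field names and values are illustrative only. */
typedef struct {
    uint8_t pulse_width;      /* LED pulse width, in timer ticks (assumed unit) */
    uint8_t pulses_per_wake;  /* pulses emitted per wake-up */
    uint8_t mode;             /* operating mode */
} led_state_t;

/* Golden copy. On a PIC, a const object is normally placed in flash. */
static const led_state_t DEFAULT_STATE = { 12u, 1u, 0u };

/* Working copy in RAM: the part a radiation-induced bit flip could corrupt. */
static led_state_t state;

/* Overwrite the RAM copy with the golden copy from flash. */
void refresh_state(void)
{
    memcpy(&state, &DEFAULT_STATE, sizeof state);
}

/* Intended to be called once per watchdog wake-up, before the state is used.
 * Returns the pulse width that the (not shown) LED pulse routine would use. */
uint8_t on_wake(void)
{
    refresh_state();
    return state.pulse_width;
}
```

With this arrangement, a corrupted RAM value survives for at most one wake period. The trade-off is a few extra instruction cycles per wake-up, and any state that genuinely has to persist between wake-ups can't be handled this way.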