Thursday, May 24, 2012

Flip happens :(

Hello dear DS readers,

Let me start this post with a sad story about the recent Russian space mission Phobos-Grunt.

"16 February 2012—The failure of Russia’s ambitious Phobos-Grunt sample-return probe has been shrouded in confusion and mystery, from the first inklings that something had gone wrong after its 9 November launch all the way to inconsistent reports of where it fell to Earth on 15 January." You can find more detailed information here. Image by Michael Carroll.

According to the official Roscosmos report, the most likely cause of this failure was an SRAM fault caused by "a local influence of heavy charged particles", aka galactic cosmic rays.
This is a particular case of a well-known hardware fault, the so-called "bit-flip".


A negative environmental influence, such as rising heat, dropping voltage, or cosmic radiation (as in the case of Phobos-Grunt), corrupts a part of the system's memory. This can result in one or several bit-flips, as shown in the figure. The bit-flips may change the application state, for instance the value of a critical variable. Later, during the execution of some software function, this erroneous value can be read and propagated further as a system error. Such an error may lead to various unintended consequences. Similar hardware failures can happen not only in memory, but also in the CPU or on a bus.
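To get a feeling for how much damage a single flipped bit can do, here is a minimal Python sketch. It only emulates the effect in software, it is not how a real fault is injected. The helper `flip_bit` and the variable `altitude_m` are made up for illustration, and a 32-bit word is an assumption.

```python
def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Return `value` with one bit inverted, emulating a single bit-flip
    in a `width`-bit memory word (hypothetical helper, illustration only)."""
    if not 0 <= bit < width:
        raise ValueError("bit index is outside the word")
    mask = (1 << width) - 1          # keep the result inside the word
    return (value ^ (1 << bit)) & mask


# Hypothetical "critical variable" stored in a 32-bit word.
altitude_m = 1000

# A flip in the lowest bit is barely noticeable...
corrupted_low = flip_bit(altitude_m, 0)

# ...while the same kind of fault in a high bit changes the value completely.
corrupted_high = flip_bit(altitude_m, 30)

print(altitude_m, corrupted_low, corrupted_high)
```

The fault is physically the same in both cases: one bit changed. Whether it is harmless or catastrophic depends only on which bit was hit and how the value is used afterwards.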

A number of research projects address this problem. Roughly speaking, all of them can be classified into two groups: hardware-based and software-based. Heat- and radiation-protected hardware and memory/CPU/cache redundancy are typical hardware-based solutions. However, these approaches usually have a number of disadvantages, such as high cost, limited markets, and extremely low performance.

The second group contains software-based approaches to bit-flip detection and masking. In my opinion, these solutions are much more advanced and interesting, and they fit better into the scope of the DS blog. In the next post I plan to give an overview of existing methods and even tools to cope with bit-flips.
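Just to make the words "detection" and "masking" a bit more concrete, here is a minimal Python sketch of one classic idea: keep redundant copies of a value, compare them to detect a fault, and take a bitwise majority vote over three copies to mask it. This is only an illustration under my own assumptions (the function name `majority_vote` and the sample value are made up), not a description of any particular tool.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: each result bit takes the value
    that at least two of the three copies agree on."""
    return (a & b) | (a & c) | (b & c)


original = 0b1011_0010               # hypothetical critical value

# Store the value three times instead of once.
copies = [original, original, original]

# Emulate a bit-flip that hits bit 5 of the second copy.
copies[1] ^= 1 << 5

# Detection: two copies that should be equal no longer match.
fault_detected = copies[0] != copies[1]

# Masking: the vote restores the value despite the corrupted copy.
recovered = majority_vote(*copies)

print(fault_detected, recovered == original)
```

Two copies are enough to notice that something went wrong, but not to tell which copy is the good one. The third copy is what makes masking possible, at the price of extra memory and extra work on every read.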

Here, as a teaser, I want to share the following fantastic video created by my colleagues from the German R&D company Silistra.