Air New Zealand Flight 901, Mount Erebus
What happened
A sightseeing DC-10 flew directly into Mount Erebus, a 12,448-foot active volcano, because its navigation computer had been loaded with the wrong coordinates.
- The original 1977 route passed over Mount Erebus. When routes were computerized in 1978, a data entry error shifted a key waypoint 27 nautical miles west—inadvertently placing flights over the open water of McMurdo Sound instead. Crews flew this safer but incorrect route for fourteen months.
- In the early hours before the fatal flight, Air New Zealand's navigation section “corrected” the coordinates back to what they believed was the original route—unknowingly shifting the flight path 27 nautical miles east, directly over Mount Erebus.
- The crew was never informed of the coordinate change. They descended through cloud expecting to be over water, following a path the computer told them was safe.
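The scale of the shift can be sanity-checked with basic spherical geometry. A sketch, not from the investigation, using the commonly cited waypoint longitudes (164°48′E versus 166°58′E, at a latitude near 77.5°S) shows how a seemingly small longitude change becomes a large lateral displacement:

```python
import math

# Commonly cited figures, used here as assumptions: the McMurdo-area
# waypoint was moved from 164°48'E to 166°58'E, at a latitude near 77.5°S.
lat_deg = 77.5
lon_shift_deg = 2 + 10 / 60          # 2°10' of longitude

# One minute of longitude equals one nautical mile at the equator,
# shrinking with the cosine of latitude.
offset_nm = lon_shift_deg * 60 * math.cos(math.radians(lat_deg))
print(f"lateral shift ~ {offset_nm:.0f} nm")   # close to the cited 27 nm
```

Two degrees of longitude near the pole is a small number on a data-entry form and a mountain's width on the ground.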
The dead
All 257 people on board (237 passengers and 20 crew) were killed. Air New Zealand initially blamed the pilots. Justice Peter Mahon's Royal Commission found an “orchestrated litany of lies” by the airline to cover up the navigation error.
When a navigation computer says you are safe, you trust it. A silent coordinate change, invisible to the crew, turned a routine sightseeing flight into a collision course. Data integrity in navigation systems is a life-or-death matter.
Therac-25 Radiation Therapy Machine
What happened
The Therac-25 was a computer-controlled linear accelerator for radiation therapy. Unlike its predecessors, it removed all hardware safety interlocks and relied entirely on software. Two bugs conspired to kill patients:
- A race condition allowed an experienced operator who edited parameters quickly (within ~8 seconds) to bypass a critical safety check. The machine would fire a high-current beam configured for X-ray production directly at the patient without the tungsten target in place—roughly 100 times the intended dose.
- A counter overflow bug: a shared one-byte flag was set with `flag = flag + 1` instead of `flag = true`. Every 256th pass it overflowed to zero, failing to detect that the turntable was in the wrong position.
- The machine displayed cryptic "MALFUNCTION" codes not explained in the manual. Operators routinely pressed proceed, unaware a massive overdose had occurred.
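The overflow mechanism fits in a few lines. A sketch, with Python standing in for the original assembly and one-byte arithmetic simulated with a mask:

```python
# Sketch of the Therac-25 counter bug: a shared one-byte variable
# (Class3 in Leveson's account) incremented on every pass instead of
# simply being set true.
flag = 0
for tick in range(1, 257):
    flag = (flag + 1) & 0xFF   # buggy: flag = flag + 1 on a one-byte counter

# On the 256th pass the byte wraps back to zero, which downstream code
# reads as "no inconsistency detected" -- the turntable check is skipped.
print(flag)   # 0, exactly when the safety check should have fired

# The fix is trivial: a boolean assignment cannot overflow.
flag_fixed = True
```

The failure is silent and periodic: 255 passes behave correctly, and the 256th disarms the check.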
The dead
At least six patients received massive radiation overdoses across four hospitals. At least two died directly from radiation injuries—both at the East Texas Cancer Center, one within weeks. Others suffered severe radiation burns, neurological damage, and chronic pain; total deaths linked to the incidents range from three to six.
AECL, the manufacturer, initially denied the machine could overdose patients and blamed operators. Nancy Leveson and Clark Turner's 1993 investigation became the canonical case study in software safety. The Therac-25 taught us: software must never be the sole safety mechanism in life-critical systems.
Patriot Missile Failure, Dhahran
What happened
A Patriot missile battery failed to intercept an incoming Iraqi Scud missile. The root cause was a fixed-point arithmetic error in the weapon control computer.
- The system tracked time as an integer count of tenths of a second, multiplying by 0.1 to convert to seconds. But 0.1 cannot be represented exactly in the 24-bit fixed-point register, introducing a truncation error of ~0.000000095 seconds per tick.
- After 100 hours of continuous operation, the accumulated error was ~0.34 seconds. At Scud speeds (~Mach 5), this shifted the tracking gate by ~687 meters. The Patriot looked in the wrong place and never found the missile.
- Israeli forces had reported accuracy degradation after 8+ hours two weeks earlier. A software patch was in transit. It arrived at Dhahran on February 26—one day too late.
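The arithmetic is easy to verify. A sketch reproducing the commonly cited reconstruction, in which 1/10 is chopped after 23 fractional bits of the 24-bit register:

```python
from fractions import Fraction

FRAC_BITS = 23                 # fractional bits in the common reconstruction
tenth = Fraction(1, 10)
# Truncate (chop) 1/10 to FRAC_BITS fractional bits, as the register did.
stored = Fraction(int(tenth * 2**FRAC_BITS), 2**FRAC_BITS)

err_per_tick = tenth - stored  # ~0.000000095 s lost per 0.1 s tick
ticks = 100 * 3600 * 10        # 100 hours of tenth-second ticks
drift = float(err_per_tick * ticks)
print(f"clock drift after 100 h ~ {drift:.2f} s")   # ~0.34 s
```

Exact rational arithmetic (`fractions.Fraction`) makes the chopping error visible directly instead of hiding it inside floating point.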
The dead
The Scud struck a barracks. 28 U.S. Army soldiers were killed and 99 wounded.
The canonical example of lethal numerical error. Cumulative truncation in a fixed-point register over a long-running system can be fatal. Known bugs must be patched with urgency proportional to risk.
London Ambulance Service LASCAD Failure
What happened
The London Ambulance Service deployed a new Computer Aided Dispatch system that collapsed within 36 hours, causing catastrophic delays in emergency response across the entire city.
- The system went live with 81 known bugs. A memory leak in code handling incident records caused the file server to progressively slow and eventually fail.
- The software could not handle invalid or incomplete data about ambulance positions. It generated duplicate and phantom calls, lost track of ambulance locations, and could not process corrections when crews pressed wrong buttons on their mobile terminals.
- Response times increased catastrophically. Documented cases include an 11-year-old girl who died after waiting 53 minutes and a man who died of a heart attack after waiting two hours.
The dead
The exact death toll was never officially established. The official inquiry acknowledged deaths but did not attribute a specific number; media and union estimates at the time suggested up to 30 people may have died due to delayed response. Individual cases were documented.
Deploying safety-critical software with known bugs is gambling with lives. A dispatch system failure doesn't kill anyone directly—it kills by absence, by the ambulance that never arrives. The dead are invisible in the logs.
China Airlines Flight 140
What happened
An Airbus A300-600R crashed during landing approach when the pilots and the autopilot entered a physical tug-of-war for control of the aircraft.
- The copilot accidentally triggered the Go-Around lever, engaging go-around thrust. Moments later, attempting to regain the glideslope, he inadvertently re-engaged the autopilot—which was now in go-around mode and began commanding a climb.
- The pilots pushed the nose down manually to continue the approach. The autopilot simultaneously drove the trimmable horizontal stabilizer to full nose-up over 18 seconds. The two forces worked against each other.
- The autopilot did not disengage when pilots applied opposing inputs. An Airbus service bulletin addressing exactly this scenario existed but was classified "recommended" rather than "mandatory." China Airlines had not installed it.
The dead
264 of 271 people on board were killed. After the crash, the French DGAC issued an airworthiness directive making the service bulletin mandatory.
Autopilot systems must not silently fight pilot inputs. "Recommended" safety fixes for known design flaws should be mandatory. The human must always be able to override automation clearly and immediately.
Chinook ZD576, Mull of Kintyre
What happened
A RAF Chinook HC.2 helicopter crashed into a hillside in fog, killing all 29 on board including senior Northern Ireland intelligence officials. The engine control software was known to be deeply flawed.
- The Chinook HC.2’s Full Authority Digital Engine Control (FADEC) software had been partially reviewed by EDS-SCICON, who examined only 18% of the code (2,897 of 16,254 lines) and found 486 anomalies before abandoning the review entirely.
- Documented problems included uncommanded engine run-up and run-down, and undemanded flight control movements. A September 1993 MoD memo described one anomaly as “positively dangerous.” A separate internal memo written the day of the crash stated that recommendations regarding FADEC software had “been ignored.”
- The pilots were blamed for gross negligence. A 2011 Parliamentary review overturned this verdict, finding it unjustified given the known software deficiencies. The software was never proven to have caused this specific crash—but it could not be ruled out.
The dead
All 25 passengers and 4 crew were killed. The two pilots, Flight Lieutenants Jonathan Tapper and Richard Cook, had their names cleared 17 years after their deaths.
486 known anomalies in 18% of the code. The review was abandoned, not completed. Dead pilots cannot defend themselves. When software with known defects is deployed in safety-critical systems, the people who die may also be blamed for the crash.
Panama Radiotherapy Overdoses
What happened
Treatment planning software (Multidata RTP/2) used to calculate Cobalt-60 radiation doses had a critical flaw in how it handled shielding blocks.
- The software only allowed four shielding blocks. Doctors needed five. They discovered they could enter all five as a single irregular block with a hole in the middle.
- The software gave different dose calculations depending on which direction the outline of the hole was drawn. One direction: correct dose. The other direction: approximately double the necessary exposure.
- No input validation warned that something was wrong. No sanity check flagged the anomalous results. The error went undetected for seven months.
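One plausible mechanism for direction-dependent results, offered purely as an illustration and not as the vendor's actual code, is geometry that consumes a signed area without normalizing for winding order:

```python
def signed_area(poly):
    """Shoelace formula: positive for counter-clockwise outlines,
    negative for clockwise ones."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return s / 2

square_ccw = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_cw = list(reversed(square_ccw))

print(signed_area(square_ccw))   # 1.0
print(signed_area(square_cw))    # -1.0
# Dose code that consumes this value raw, instead of taking abs(),
# silently changes its answer when the outline is drawn the other way.
```

The two outlines describe the identical shape; only the drawing direction differs, which is precisely the behavior the physicists observed.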
The dead
28 patients received overdoses of +10% to +105%. By 2005, at least 23 of the 28 had died, with at least 18 deaths attributed to radiation effects. Three medical physicists were charged with second-degree murder. Two were convicted and sentenced to four years in prison; the third received a lesser sentence with a fine.
Multidata Systems International was permanently barred from manufacturing medical devices. Medical software must validate all input and flag anomalous results. Workarounds discovered by users can interact with software in ways no one predicted.
Patriot Missile Fratricide, Iraq
What happened
During the 2003 Iraq War, Patriot missile batteries shot down two friendly aircraft because the system’s software misclassified them as incoming missiles.
- The automated target classification algorithm used overly broad criteria for identifying “Anti-Radiation Missiles.” Friendly aircraft matched the profile closely enough to trigger engagement.
- The IFF (Identification Friend or Foe) interrogation system had low reliability due to electronic interference between closely-spaced batteries, generating false targets that were correlated with real aircraft.
- On March 22, a Patriot battery shot down an RAF Tornado, killing Flight Lieutenants Kevin Main and David Williams. On April 2, a Patriot battery shot down a US Navy F/A-18C Hornet, killing Lt. Nathan White.
The dead
Three allied aircrew were killed by their own air defense system—two in the Tornado (Main was the pilot, Williams the navigator) and one in the Hornet. The Defense Science Board found the Patriot had been given “too much autonomy” and that its automated functions were a contributing factor in misidentifying friend as foe.
Automated weapons systems that cannot reliably distinguish friend from foe should not be given autonomous firing authority. The same Patriot system that failed to intercept a Scud in 1991 due to a software bug killed allied pilots in 2003 due to a different software flaw.
Helios Airways Flight 522
What happened
A Boeing 737-300 gradually depressurized after the cabin pressurization system was left in "manual" mode after maintenance. Everyone aboard slowly lost consciousness as the plane flew on autopilot for two hours.
- As the aircraft climbed through 12,040 feet, the cabin altitude warning horn sounded. This horn produces the exact same sound as the takeoff configuration warning, an alarm that can only trigger on the ground. The captain apparently confused the two and never donned an oxygen mask.
- As cabin pressure dropped, the crew became hypoxic and lost consciousness. The aircraft continued to its cruising altitude on autopilot.
- Greek Air Force F-16 fighters intercepted the aircraft and could see unconscious passengers through the windows. Andreas Prodromou, a flight attendant who held a UK Commercial Pilot Licence, reached the cockpit shortly before the engines flamed out from fuel exhaustion.
The dead
All 121 people on board were killed. In 2011, the FAA required all 737-100 through -500 models to install additional cockpit warning lights to differentiate pressurization problems from takeoff configuration issues.
Different emergencies must have distinct, unambiguous warnings. A single shared alarm sound for multiple failure modes is a dangerous design choice. A warning system that confuses rather than informs is worse than no warning at all.
Epinal Radiotherapy Accident
What happened
24 prostate cancer patients received 20–30% more radiation than prescribed after the hospital switched from physical to dynamic wedges without adequate training or verification.
- In May 2004, the hospital switched wedge types for treatment delivery. The planning software calculated doses differently for each type, but staff continued to enter parameters as if using physical wedges while the machine delivered radiation using dynamic wedges.
- Staff had not been adequately trained on the new technique. The English-language software manual had not been translated into French.
- No independent system existed to verify that calculated doses matched delivered values. The systematic 20–30% overdose went undetected for over a year.
The dead
At least 12 patients died from complications attributed to the overdoses across the broader incident, which affected nearly 450 patients in multiple cohorts. Many survivors suffered severe rectal and urinary damage. Two doctors and a radiophysicist were convicted of manslaughter. The two doctors were sentenced to four years (18 months non-suspended) with lifetime practice bans; the radiophysicist received 18 months in prison.
The most severe radiotherapy accident in French history. Changing a treatment technique without retraining staff or verifying output is lethal negligence. Independent dose verification is essential. Software that accepts incompatible parameters without warning is complicit.
Air France Flight 447
What happened
An Airbus A330 crashed into the Atlantic after a cascade of automation failures left the pilots unable to understand what was happening to their aircraft.
- Ice crystals blocked the pitot tubes. The autopilot disconnected because it could no longer determine airspeed. Flight control law reverted from "Normal" (where the computer prevents exceeding safe parameters) to "Alternate" (where most protections are removed).
- The stall warning created a perverse feedback loop: it sounded when pilots did the correct thing (push nose down, angle-of-attack decreases into the valid range), and went silent when they did the wrong thing (pull nose up, angle-of-attack exceeds the validity threshold, data declared implausible, warning suppressed).
- The A330 uses non-coupled side-sticks. Neither pilot could see or feel what the other was doing.
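The suppression logic can be sketched in a few lines. Thresholds here are invented for illustration, except the roughly 60 kt validity floor, which the BEA report describes: below that indicated airspeed, the angle-of-attack data was declared invalid.

```python
def stall_warning(aoa_deg, airspeed_kt,
                  stall_aoa=10.0, min_valid_speed=60.0):
    # stall_aoa is an invented illustrative threshold; the ~60 kt
    # validity floor reflects the BEA's description of the A330 logic.
    if airspeed_kt < min_valid_speed:
        return False          # AoA declared invalid: warning suppressed
    return aoa_deg > stall_aoa

# Deep stall, very low airspeed: no warning despite an extreme AoA.
print(stall_warning(aoa_deg=40.0, airspeed_kt=50.0))   # False

# The pilot pushes the nose down, airspeed rises into the valid range:
# the aircraft is recovering, and the warning *starts* sounding.
print(stall_warning(aoa_deg=15.0, airspeed_kt=80.0))   # True
```

Each branch is individually defensible; the composition punishes the correct recovery action and rewards the fatal one.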
The dead
All 228 people on board were killed. The wreckage was not found for nearly two years, at a depth of approximately 3,980 meters. The aircraft had been in a full aerodynamic stall for approximately three and a half minutes.
The automation paradox: the more reliable automation becomes, the less prepared humans are to take over when it fails. Warning systems must not give contradictory signals. The stall warning was technically correct at every individual moment and catastrophically misleading in aggregate.
Toyota Unintended Acceleration
What happened
Toyota vehicles experienced sudden, unintended acceleration where the throttle would open without driver input and resist braking. For years, Toyota blamed floor mats. Then experts examined the code.
- Expert Michael Barr spent 20 months reviewing Toyota's source code. An internal Toyota document from 2007 had already described the engine control application as “spaghetti-like.” Barr found 67 functions scoring above 50 on Cyclomatic Complexity (rated “untestable”) and 81,514 MISRA-C coding rule violations.
- A critical software task controlled throttle, cruise control, and many failsafe functions. If this task died due to stack overflow, buffer overflow, or memory corruption—all possible given the code quality—the throttle could open with no software failsafe.
- The system's watchdog timer, meant to detect crashes, was poorly implemented. The system lacked protection against single-bit memory flips from cosmic rays or EMI.
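The watchdog criticism is worth spelling out: a watchdog only proves that whoever kicks it is alive. A minimal sketch of the distinction, with invented names and Python standing in for embedded C:

```python
import time

class Watchdog:
    """Expires if not kicked within timeout_s seconds."""
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_kick = time.monotonic()

    def kick(self):
        self.last_kick = time.monotonic()

    def expired(self):
        return time.monotonic() - self.last_kick > self.timeout_s

# Weak pattern: a timer interrupt kicks the dog unconditionally, proving
# only that the timer is alive; a dead throttle task goes unnoticed.
# Stronger pattern, shown here: the monitored task sets a flag each
# cycle, and the kicker refuses to kick unless the flag was freshly set.
throttle_task_ran = False

def timer_isr(dog):
    global throttle_task_ran
    if throttle_task_ran:        # only vouch for a task that actually ran
        dog.kick()
        throttle_task_ran = False

dog = Watchdog(timeout_s=0.05)
timer_isr(dog)                   # the task never ran: no kick
time.sleep(0.06)
print(dog.expired())             # True -- the stalled task is detected
```

The stronger pattern converts "the CPU is running" into "the safety-critical task is running", which is the property that actually matters.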
The dead
NHTSA estimated at least 89 deaths and 57 injuries. Toyota recalled ~9 million vehicles and paid $1.2 billion to the DOJ to settle criminal charges of concealing safety defects.
A jury found Toyota acted with "reckless disregard." Code quality in safety-critical systems is literally a life-or-death matter. A NASA study found "no electronic defect" but didn't review the full source. The independent expert who did found catastrophic flaws.
Wenzhou High-Speed Train Collision
What happened
A high-speed train rear-ended a stopped train at full speed because the signaling software violated the most fundamental rule in railway safety: when something fails, show red.
- A lightning strike burned out fuses in the signal assembly. Instead of defaulting to a “stop” indication—the standard fail-safe behavior for railway signaling worldwide—the LKD2-T1 train control system sent an erroneous “track clear” signal to dispatch.
- The dispatch center showed the track section containing stopped train D3115 as unoccupied. Following train D301 was authorized to proceed at full speed into the same block.
- The official investigation found “serious design flaws” in the signaling software. The Railway Research & Design Institute had never organized a formal R&D team for the LKD2-T1 system or conducted comprehensive testing.
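The fail-safe principle reduces to a few lines of logic. A hypothetical sketch, not the LKD2-T1 code, of the difference between failing safe and failing dangerous:

```python
def aspect_fail_dangerous(circuit_ok, occupied):
    # What the failure effectively produced (simplified): an equipment
    # fault falls through to "clear".
    if circuit_ok and occupied:
        return "RED"
    return "GREEN"

def aspect_fail_safe(circuit_ok, occupied):
    # Railway convention: anything short of a positive, healthy
    # "track clear" reading shows a stop aspect.
    if circuit_ok and not occupied:
        return "GREEN"
    return "RED"

# Lightning burns out the fuse: circuit_ok = False, train still present.
print(aspect_fail_dangerous(False, True))   # GREEN -- lethal
print(aspect_fail_safe(False, True))        # RED
```

The difference is which branch carries the burden of proof: a fail-safe signal must earn its green, while a fail-dangerous one must earn its red.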
The dead
40 people were killed and 172 injured. Authorities initially attempted to bury the wreckage before the investigation was complete, provoking public outrage.
Fail-safe is the oldest principle in railway signaling: if anything goes wrong, show a stop signal. The LKD2-T1 system did the opposite. A signal system that shows green when it should show red is not merely broken—it is actively lethal.
Airbus A400M Acceptance Flight Crash
What happened
A military transport aircraft lost power on three of four engines shortly after takeoff on its first production acceptance flight because critical software configuration files had been accidentally wiped during engine installation.
- During installation, technicians accidentally deleted the torque calibration parameter files from three of the four engines’ Electronic Control Units. Without this data, the ECUs could not correctly interpret engine sensor readings.
- Without valid calibration data, the three affected engines’ power became frozen and unresponsive to throttle inputs. When the crew moved the throttles to flight idle attempting to manage the situation, the engines complied—and then locked at idle, unable to respond to any further commands.
- Airbus was reportedly aware of the risk of calibration data being wiped during installation but had not implemented any safeguard to prevent it or detect it before flight.
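The missing safeguard is cheap to describe. A hypothetical pre-flight gate, with file names invented for illustration, that refuses dispatch when calibration data is absent:

```python
import os

# Hypothetical file names -- one torque calibration file per engine.
REQUIRED_CAL_FILES = [f"engine{n}_torque.cal" for n in range(1, 5)]

def preflight_calibration_check(cal_dir):
    """Hypothetical no-go check: every engine must have its torque
    calibration data present before the aircraft may dispatch."""
    missing = [f for f in REQUIRED_CAL_FILES
               if not os.path.isfile(os.path.join(cal_dir, f))]
    if missing:
        raise RuntimeError(f"NO-GO: missing calibration data: {missing}")
    return "GO"
```

Even a check this crude, testing mere presence rather than validity, would have caught the A400M condition on the ground.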
The dead
Four of the six crew members were killed. Two survived with serious injuries. Airbus confirmed that “incorrectly installed engine control software” caused the crash.
A known risk with no safeguard is a decision to accept casualties. Engine control software that can be silently wiped during routine installation, with no pre-flight check to detect the absence, is a system designed to fail.
Uber Self-Driving Car Fatality
What happened
An Uber self-driving test vehicle struck and killed Elaine Herzberg, 49, as she walked her bicycle across a road at night. She was the first known pedestrian killed by a self-driving vehicle.
- The system detected something in the road 5.6 seconds before impact but cycled the classification between “other,” “vehicle,” and “bicycle.” Each reclassification reset the object’s predicted path, preventing the system from recognizing an imminent collision.
- The system had no concept of "jaywalking pedestrian" as an object category. It was not designed to identify pedestrians outside of crosswalks.
- Uber had disabled Volvo's built-in automatic emergency braking to prevent "erratic vehicle behavior." Uber's own system could not initiate emergency braking autonomously.
- The sole safety driver was watching a video on her phone.
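The tracking flaw can be sketched directly: if reclassification discards history, the tracker never accumulates the two-plus observations it needs to extrapolate motion. A hypothetical sketch, not Uber's code:

```python
class Track:
    """Sketch of the flaw: changing an object's class throws away
    its observation history."""
    def __init__(self):
        self.cls = None
        self.positions = []

    def update(self, cls, pos):
        if cls != self.cls:
            self.cls = cls
            self.positions = []   # flaw: history reset on reclassification
        self.positions.append(pos)

    def predicted_motion(self):
        # Needs at least two points in the same track to estimate velocity.
        if len(self.positions) < 2:
            return None
        (x0, y0), (x1, y1) = self.positions[-2], self.positions[-1]
        return (x1 - x0, y1 - y0)

track = Track()
for cls, pos in [("other", (0, 0)), ("vehicle", (0, 1)),
                 ("bicycle", (0, 2)), ("vehicle", (0, 3))]:
    track.update(cls, pos)

print(track.predicted_motion())   # None: the object never "moves"
```

The positions plainly advance toward the vehicle's path, but as far as the planner is concerned the object is perpetually newborn and stationary.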
The dead
Elaine Herzberg was killed. The NTSB found Uber's safety culture "inadequate"—the program lacked a formal safety plan, dedicated safety staff, and proper operating procedures.
Autonomous systems must handle real-world edge cases, not just designed scenarios. Disabling existing safety systems is unconscionable. A single distracted safety driver is not a substitute for robust software. If your system can't classify what it's about to hit, it must stop.
Boeing 737 MAX MCAS
What happened
Two 737 MAX aircraft crashed within five months because a flight control system called MCAS repeatedly pushed the nose down based on a single faulty sensor, while pilots had no idea the system existed.
- MCAS relied on data from only one of two angle-of-attack sensors. If that sensor failed, MCAS activated on false data.
- MCAS reactivated every 5 seconds after pilots manually corrected, creating a relentless tug-of-war.
- Boeing did not disclose MCAS to pilots or airlines. It was not in the manual. It was not in the training. A software defect silently tied the AOA disagree alert—intended as a standard feature—to an optional indicator display, leaving it non-functional on roughly 80% of MAX aircraft. Boeing knew about this defect for over a year and told no one.
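The redundancy fix is conceptually simple. A simplified sketch: the post-grounding update compares both AoA sensors and inhibits MCAS when they disagree by more than about 5.5 degrees (sensor values below are invented for illustration):

```python
def mcas_aoa_original(left_aoa, right_aoa):
    # Original design (simplified): one sensor drives the system.
    return left_aoa

def mcas_aoa_revised(left_aoa, right_aoa, max_disagree_deg=5.5):
    # Revised design (simplified): cross-check both sensors and
    # refuse to activate on disagreement.
    if abs(left_aoa - right_aoa) > max_disagree_deg:
        return None               # disagree: MCAS must not activate
    return (left_aoa + right_aoa) / 2

# Invented values: a failed left sensor reads absurdly high while the
# right sensor reads a plausible angle of attack.
print(mcas_aoa_original(40.0, 5.0))   # 40.0 -- MCAS fires on garbage
print(mcas_aoa_revised(40.0, 5.0))    # None -- activation inhibited
```

The second function is a few lines longer and removes the single point of failure; the first flew on every MAX delivered before the grounding.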
The dead
189 on Lion Air Flight 610 (October 29, 2018). 157 on Ethiopian Airlines Flight 302 (March 10, 2019). No survivors in either crash. All 737 MAX aircraft were grounded worldwide for 20 months. Boeing paid $2.5 billion in penalties.
Safety-critical systems must not rely on single points of failure. Automation must be transparent to its operators. A known software defect that disables a safety alert, left unpatched for over a year, is not an accident—it is a choice. Boeing prioritized speed-to-market over the lives of 346 people.
Recurring Patterns
Software as sole safety net
The Therac-25 removed hardware interlocks. Boeing MCAS lacked sensor redundancy. When software is the only barrier between the user and catastrophe, the software must be perfect. Software is never perfect.
The automation paradox
Air France 447 and China Airlines 140 demonstrate that the more reliable automation becomes, the less prepared humans are to take over when it fails.
Warnings that confuse
Helios 522's identical alarm sounds. Therac-25's meaningless codes. Air France 447's contradictory stall warnings. A warning system that confuses is worse than none.
Fail-dangerous defaults
Wenzhou's signal system showed green when it should have shown red. The A400M had no pre-flight check for missing engine data. LASCAD launched with 81 known bugs. Systems that fail open instead of fail-safe kill people.
Manufacturer denial
AECL denied Therac-25 could overdose. Boeing concealed MCAS. Toyota fought unintended acceleration claims for years. Air New Zealand blamed the pilots of Flight 901. The dead cannot defend themselves.
Known bugs, deployed anyway
Chinook ZD576's FADEC had 486 known anomalies. LASCAD launched with 81 known bugs. The Patriot's drift was reported two weeks before Dhahran. Deploying known-defective software in safety-critical systems is a decision to accept casualties.
Excess autonomy
The Patriot system in 2003 was given autonomous firing authority it could not exercise responsibly. Uber's self-driving car could not brake autonomously. Too much autonomy and too little autonomy can both be lethal.