Therac-25 Club

A catalog of incidents where software defects or human-machine interface failures directly or indirectly caused human death. These are not abstractions. Each entry represents real people who died because code was wrong.

1,400+
Known dead from the incidents documented below

Air New Zealand Flight 901, Mount Erebus

November 28, 1979 Ross Island, Antarctica 257 dead
Navigation Computer Coordinate Error

What happened

A sightseeing DC-10 flew directly into Mount Erebus, a 12,448-foot active volcano, because its navigation computer had been loaded with the wrong coordinates.

  • The original 1977 route passed over Mount Erebus. When routes were computerized in 1978, a data entry error shifted a key waypoint 27 nautical miles west—inadvertently placing flights over the open water of McMurdo Sound instead. Crews flew this safer but incorrect route for fourteen months.
  • In the early hours before the fatal flight, Air New Zealand's navigation section “corrected” the coordinates back to what they believed was the original route—unknowingly shifting the flight path 27 nautical miles east, directly over Mount Erebus.
  • The crew was never informed of the coordinate change. They descended through cloud expecting to be over water, following a path the computer told them was safe.

The dead

All 237 passengers and 20 crew were killed. Air New Zealand initially blamed the pilots. Justice Peter Mahon's Royal Commission found an “orchestrated litany of lies” by the airline to cover up the navigation error.

When a navigation computer says you are safe, you trust it. A silent coordinate change, invisible to the crew, turned a routine sightseeing flight into a collision course. Data integrity in navigation systems is a life-or-death matter.

Therac-25 Radiation Therapy Machine

1985–1987 Georgia, Ontario, Washington, Texas At least 6 overdosed, 2–6 dead
Race Condition / Integer Overflow

What happened

The Therac-25 was a computer-controlled linear accelerator for radiation therapy. Unlike its predecessors, it removed all hardware safety interlocks and relied entirely on software. Two bugs, compounded by a cryptic operator interface, conspired to kill patients:

  • A race condition allowed an experienced operator who edited parameters quickly (within ~8 seconds) to bypass a critical safety check. The machine would fire a high-current beam configured for X-ray production directly at the patient without the tungsten target in place—roughly 100 times the intended dose.
  • A counter overflow bug: a shared flag used flag = flag + 1 instead of flag = true. Every 256th pass, the one-byte value wrapped to zero and the check that the turntable was in the correct position was silently skipped (the wraparound is sketched in code after this list).
  • The machine displayed cryptic "MALFUNCTION" codes not explained in the manual. Operators routinely pressed proceed, unaware a massive overdose had occurred.
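
A minimal sketch of the wraparound behavior described above, assuming an 8-bit shared counter. The names and structure are illustrative; the actual Therac-25 software was PDP-11 assembly and was never published in full.

    # Illustrative sketch only: a one-byte "check pending" flag that is
    # incremented instead of simply being set to true.
    def setup_test_pass(class3):
        # Intended meaning: "a check is still pending" (should be class3 = True).
        # Incrementing a one-byte value makes it wrap to 0 every 256th pass.
        return (class3 + 1) & 0xFF

    def collimator_check_passes(class3, turntable_in_position):
        if class3 != 0:
            return turntable_in_position   # check runs; a bad position is caught
        return True                        # class3 wrapped to 0: check silently skipped

    class3 = 0
    for n in range(1, 257):
        class3 = setup_test_pass(class3)
        if collimator_check_passes(class3, turntable_in_position=False):
            print(f"pass {n}: malpositioned turntable NOT detected (class3 = {class3})")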

The dead

At least six patients received massive radiation overdoses across four hospitals. At least two died directly from radiation injuries—both at the East Texas Cancer Center, one within weeks. Others suffered severe radiation burns, neurological damage, and chronic pain; total deaths linked to the incidents range from three to six.

AECL, the manufacturer, initially denied the machine could overdose patients and blamed operators. Nancy Leveson and Clark Turner's 1993 investigation became the canonical case study in software safety. The Therac-25 taught us: software must never be the sole safety mechanism in life-critical systems.

Patriot Missile Failure, Dhahran

February 25, 1991 Dhahran, Saudi Arabia 28 dead, 99 wounded
Fixed-Point Truncation Drift

What happened

A Patriot missile battery failed to intercept an incoming Iraqi Scud missile. The root cause was a fixed-point arithmetic error in the weapon control computer.

  • The system tracked time by counting tenths of a second, then multiplying the count by 0.1 to convert to seconds. But 0.1 cannot be represented exactly in the 24-bit fixed-point register, introducing a truncation error of ~0.000000095 seconds per tick.
  • After 100 hours of continuous operation, the accumulated error was ~0.34 seconds. At Scud speeds (~Mach 5), this shifted the tracking gate by ~687 meters; the Patriot looked in the wrong place and never found the missile. The arithmetic is reproduced below.
  • Israeli forces had reported accuracy degradation after 8+ hours two weeks earlier. A software patch was in transit. It arrived at Dhahran on February 26—one day too late.
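
A few lines of arithmetic reproduce the drift. This is a sketch, not the weapon-control code; it chops the binary expansion of 1/10 at the precision usually quoted for the 24-bit register, which yields the ~0.000000095-second error figure from the GAO report.

    from fractions import Fraction

    FRACTIONAL_BITS = 23
    exact = Fraction(1, 10)
    stored = Fraction(int(exact * 2**FRACTIONAL_BITS), 2**FRACTIONAL_BITS)  # chopped, not rounded

    error_per_tick = float(exact - stored)
    print(f"error per 0.1 s tick : {error_per_tick:.2e} s")   # ~9.5e-08 s

    ticks_in_100_hours = 100 * 3600 * 10
    drift = ticks_in_100_hours * error_per_tick
    print(f"clock drift at 100 h : {drift:.3f} s")            # ~0.343 s

    # At Scud closing speeds, a ~0.34 s timing error displaces the tracking gate
    # by several hundred meters; the figure usually cited is ~687 m.
    print(f"gate shift at 1,700 m/s: {drift * 1700:.0f} m")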

The dead

The Scud struck a barracks. 28 U.S. Army soldiers were killed and 99 wounded.

The canonical example of lethal numerical error. Cumulative truncation in a fixed-point register over a long-running system can be fatal. Known bugs must be patched with urgency proportional to risk.

London Ambulance Service LASCAD Failure

October 26–27, 1992 London, England Estimated up to 30 dead
81 Known Bugs at Launch

What happened

The London Ambulance Service deployed a new Computer Aided Dispatch system that collapsed within 36 hours, causing catastrophic delays in emergency response across the entire city.

  • The system went live with 81 known bugs. A memory leak in code handling incident records caused the file server to progressively slow and eventually fail.
  • The software could not handle invalid or incomplete data about ambulance positions. It generated duplicate and phantom calls, lost track of ambulance locations, and could not process corrections when crews pressed wrong buttons on their mobile terminals.
  • Response times increased catastrophically. Documented cases include an 11-year-old girl who died after waiting 53 minutes and a man who died of a heart attack after waiting two hours.

The dead

The exact death toll was never officially established. The official inquiry acknowledged deaths but did not attribute a specific number; media and union estimates at the time suggested up to 30 people may have died due to delayed response. Individual cases were documented.

Deploying safety-critical software with known bugs is gambling with lives. A dispatch system failure doesn't kill anyone directly—it kills by absence, by the ambulance that never arrives. The dead are invisible in the logs.

China Airlines Flight 140

April 26, 1994 Nagoya Airport, Japan 264 dead
Autopilot vs. Pilot Conflict

What happened

An Airbus A300-600R crashed during landing approach when the pilots and the autopilot entered a physical tug-of-war for control of the aircraft.

  • The copilot accidentally triggered the Go-Around lever, engaging go-around thrust. Moments later, attempting to regain the glideslope, he inadvertently re-engaged the autopilot—which was now in go-around mode and began commanding a climb.
  • The pilots pushed the nose down manually to continue the approach. The autopilot simultaneously drove the trimmable horizontal stabilizer to full nose-up over 18 seconds. The two forces worked against each other.
  • The autopilot did not disengage when pilots applied opposing inputs. An Airbus service bulletin addressing exactly this scenario existed but was classified "recommended" rather than "mandatory." China Airlines had not installed it.

The dead

264 of 271 people on board were killed. After the crash, the French DGAC issued an airworthiness directive making the service bulletin mandatory.

Autopilot systems must not silently fight pilot inputs. "Recommended" safety fixes for known design flaws should be mandatory. The human must always be able to override automation clearly and immediately.

Chinook ZD576, Mull of Kintyre

June 2, 1994 Mull of Kintyre, Scotland 29 dead
FADEC Software Anomalies

What happened

An RAF Chinook HC.2 helicopter crashed into a hillside in fog, killing all 29 on board including senior Northern Ireland intelligence officials. The engine control software was known to be deeply flawed.

  • The Chinook HC.2’s Full Authority Digital Engine Control (FADEC) software had been partially reviewed by EDS-SCICON, who examined only 18% of the code (2,897 of 16,254 lines) and found 486 anomalies before abandoning the review entirely.
  • Documented problems included uncommanded engine run-up and run-down, and undemanded flight control movements. A September 1993 MoD memo described one anomaly as “positively dangerous.” A separate internal memo written the day of the crash stated that recommendations regarding FADEC software had “been ignored.”
  • The pilots were blamed for gross negligence. A 2011 Parliamentary review overturned this verdict, finding it unjustified given the known software deficiencies. The software was never proven to have caused this specific crash—but it could not be ruled out.

The dead

All 25 passengers and 4 crew were killed. The two pilots, Flight Lieutenants Jonathan Tapper and Richard Cook, had their names cleared 17 years after their deaths.

486 known anomalies in 18% of the code. The review was abandoned, not completed. Dead pilots cannot defend themselves. When software with known defects is deployed in safety-critical systems, the people who die may also be blamed for the crash.

Panama Radiotherapy Overdoses

2000–2001 Panama City, Panama 28 overdosed, 18–23 dead
Direction-Dependent Calculation

What happened

Treatment planning software (Multidata RTP/2) used to calculate Cobalt-60 radiation doses had a critical flaw in how it handled shielding blocks.

  • The software only allowed four shielding blocks. Doctors needed five. They discovered they could enter all five as a single irregular block with a hole in the middle.
  • The software gave different dose calculations depending on which direction the outline of the hole was drawn. One direction: correct dose. The other direction: approximately double the necessary exposure. One way such direction dependence can arise is sketched after this list.
  • No input validation warned that something was wrong. No sanity check flagged the anomalous results. The error went undetected for seven months.
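
The internals of the Multidata RTP/2 software were never published, so the mechanism below is hypothetical. It illustrates one way a dose calculation can become direction-dependent: an area computed with a signed (shoelace) formula changes sign with the drawing direction if the code never normalizes the contour's orientation.

    def signed_area(points):
        # Shoelace formula: positive if drawn counter-clockwise, negative if clockwise.
        total = 0.0
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
            total += x1 * y2 - x2 * y1
        return total / 2.0

    field = [(0, 0), (10, 0), (10, 10), (0, 10)]   # treatment field outline (hypothetical units)
    block_ccw = [(4, 4), (6, 4), (6, 6), (4, 6)]   # composite shielding block, drawn one way
    block_cw = list(reversed(block_ccw))           # the same block, drawn the other way

    def open_area_naive(field, block):
        # Naive code that subtracts a signed area without checking orientation:
        # the result silently depends on which direction the outline was digitized.
        return signed_area(field) - signed_area(block)

    print(open_area_naive(field, block_ccw))   # 96.0  -> one dose calculation
    print(open_area_naive(field, block_cw))    # 104.0 -> a different dose for the same geometry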

The dead

28 patients received overdoses of +10% to +105%. By 2005, at least 23 of the 28 had died, with at least 18 deaths attributed to radiation effects. Three medical physicists were charged with second-degree murder. Two were convicted and sentenced to four years in prison; the third received a lesser sentence with a fine.

Multidata Systems International was permanently barred from manufacturing medical devices. Medical software must validate all input and flag anomalous results. Workarounds discovered by users can interact with software in ways no one predicted.

Patriot Missile Fratricide, Iraq

March–April 2003 Iraq/Kuwait border 3 dead
Automated Misclassification / Excess Autonomy

What happened

During the 2003 Iraq War, Patriot missile batteries shot down two friendly aircraft because the system’s software misclassified them as incoming missiles.

  • The automated target classification algorithm used overly broad criteria for identifying “Anti-Radiation Missiles.” Friendly aircraft matched the profile closely enough to trigger engagement.
  • The IFF (Identification Friend or Foe) interrogation system had low reliability due to electronic interference between closely-spaced batteries, generating false targets that were correlated with real aircraft.
  • On the night of March 22–23, a Patriot battery shot down an RAF Tornado, killing Flight Lieutenants Kevin Main and David Williams. On April 2, a Patriot battery shot down a US Navy F/A-18C Hornet, killing Lt. Nathan White.

The dead

Three allied aircrew were killed by their own air defense system—two in the Tornado (Main was the pilot, Williams the navigator) and one in the Hornet. The Defense Science Board found the Patriot had been given “too much autonomy” and that its automated functions were a contributing factor in misidentifying friend as foe.

Automated weapons systems that cannot reliably distinguish friend from foe should not be given autonomous firing authority. The same Patriot system that failed to intercept a Scud in 1991 due to a software bug killed allied pilots in 2003 due to a different software flaw.

Helios Airways Flight 522

August 14, 2005 Near Grammatiko, Greece 121 dead
Identical Alarms for Different Failures

What happened

A Boeing 737-300 gradually depressurized after the cabin pressurization system was left in "manual" mode after maintenance. Everyone aboard slowly lost consciousness as the plane flew on autopilot for two hours.

  • At 12,040 feet, the cabin altitude warning horn sounded. This horn produces the exact same sound as the takeoff configuration warning—an alarm that can only trigger on the ground (the shared-horn design is sketched after this list). The captain apparently confused the two and never donned an oxygen mask.
  • As cabin pressure dropped, the crew became hypoxic and lost consciousness. The aircraft continued to its cruising altitude on autopilot.
  • Greek Air Force F-16 fighters intercepted and could see unconscious passengers through the windows. A flight attendant with a UK Commercial Pilot Licence, Andreas Prodromou, reached the cockpit but the engines flamed out from fuel exhaustion.
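
A trivial sketch of the shared-horn problem: two unrelated conditions drive the same output, so the sound alone cannot tell the crew which emergency they are in. The code is illustrative, not Boeing's.

    def warning_horn(takeoff_config_invalid, cabin_altitude_exceeded):
        # One physical horn, two meanings: the output erases which condition fired.
        return takeoff_config_invalid or cabin_altitude_exceeded

    # Climbing through 12,000 ft, the horn sounds; nothing in the sound says why.
    print(warning_horn(takeoff_config_invalid=False, cabin_altitude_exceeded=True))  # True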

The dead

All 121 people on board were killed. In 2011, the FAA required all 737-100 through -500 models to install additional cockpit warning lights to differentiate pressurization problems from takeoff configuration issues.

Different emergencies must have distinct, unambiguous warnings. A single shared alarm sound for multiple failure modes is a dangerous design choice. A warning system that confuses rather than informs is worse than no warning at all.

Epinal Radiotherapy Accident

2004–2005 Epinal, France 24 severely overdosed, at least 12 dead
Procedural / Training Failure

What happened

24 prostate cancer patients received 20–30% more radiation than prescribed after the hospital switched from physical to dynamic wedges without adequate training or verification.

  • In May 2004, the hospital switched wedge types for treatment delivery. The planning software calculated doses differently for each type, but staff continued to enter parameters as if using physical wedges while the machine delivered radiation using dynamic wedges.
  • Staff had not been adequately trained on the new technique. The English-language software manual had not been translated into French.
  • No independent system existed to verify that calculated doses matched delivered values. The systematic 20–28% overdose went undetected for over a year.

The dead

At least 12 patients died from complications attributed to the overdoses across the broader incident, which affected nearly 450 patients in multiple cohorts. Many survivors suffered severe rectal and urinary damage. Two doctors and a radiophysicist were convicted of manslaughter. The two doctors were sentenced to four years (18 months non-suspended) with lifetime practice bans; the radiophysicist received 18 months in prison.

The most severe radiotherapy accident in French history. Changing a treatment technique without retraining staff or verifying output is lethal negligence. Independent dose verification is essential. Software that accepts incompatible parameters without warning is complicit.

Air France Flight 447

June 1, 2009 Mid-Atlantic Ocean 228 dead
Automation Paradox / Contradictory Alerts

What happened

An Airbus A330 crashed into the Atlantic after a cascade of automation failures left the pilots unable to understand what was happening to their aircraft.

  • Ice crystals blocked the pitot tubes. The autopilot disconnected because it could no longer determine airspeed. Flight control law reverted from "Normal" (where the computer prevents exceeding safe parameters) to "Alternate" (where most protections are removed).
  • The stall warning created a perverse feedback loop: it sounded when the pilots did the correct thing (pushing the nose down brought the angle of attack back into the valid range) and fell silent when they did the wrong thing (pulling the nose up drove the angle of attack past the validity threshold, so the data were declared implausible and the warning was suppressed). A simplified sketch of this inhibit logic follows the list.
  • The A330 uses non-coupled side-sticks. Neither pilot could see or feel what the other was doing.
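
A highly simplified sketch of the inhibit behavior described in the BEA report: angle-of-attack data are rejected as invalid at very low measured airspeeds, and the stall warning is suppressed along with them. The thresholds and structure below are illustrative.

    AOA_STALL_THRESHOLD_DEG = 10.0   # illustrative value
    MIN_VALID_AIRSPEED_KT = 60.0     # below this, AoA data are declared invalid

    def stall_warning(aoa_deg, airspeed_kt):
        if airspeed_kt < MIN_VALID_AIRSPEED_KT:
            return False             # data "invalid", warning suppressed
        return aoa_deg > AOA_STALL_THRESHOLD_DEG

    # Deep stall, nose held up: enormous AoA, airspeed reading collapses -> silence.
    print(stall_warning(aoa_deg=40.0, airspeed_kt=45.0))   # False
    # Nose pushed down (the correct action): readings recover into the valid range -> warning sounds.
    print(stall_warning(aoa_deg=15.0, airspeed_kt=90.0))   # True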

The dead

All 228 people on board were killed. The wreckage was not found for nearly two years, at a depth of approximately 3,980 meters. The aircraft had been in a full aerodynamic stall for approximately three and a half minutes.

The automation paradox: the more reliable automation becomes, the less prepared humans are to take over when it fails. Warning systems must not give contradictory signals. The stall warning was technically correct at every individual moment and catastrophically misleading in aggregate.

Toyota Unintended Acceleration

2000–2010 United States, worldwide ~89 dead
Spaghetti Code / Task Death

What happened

Toyota vehicles experienced sudden, unintended acceleration where the throttle would open without driver input and resist braking. For years, Toyota blamed floor mats. Then experts examined the code.

  • Expert Michael Barr spent 20 months reviewing Toyota's source code. An internal Toyota document from 2007 had already described the engine control application as “spaghetti-like.” Barr found 67 functions scoring above 50 on Cyclomatic Complexity (rated “untestable”) and 81,514 MISRA-C coding rule violations.
  • A critical software task controlled throttle, cruise control, and many failsafe functions. If this task died due to stack overflow, buffer overflow, or memory corruption—all possible given the code quality—the throttle could open with no software failsafe.
  • The system's watchdog timer, meant to detect crashes, was poorly implemented and could not detect the death of critical tasks (one way this happens is sketched after this list). The system also lacked protection against single-bit memory flips from cosmic rays or EMI.
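
One common way a watchdog ends up unable to detect task death is being serviced by a periodic timer rather than by the task it is supposed to monitor. Whether this is exactly how Toyota's watchdog was wired is an assumption here; the sketch only illustrates the failure mode.

    import threading, time

    watchdog_fed = threading.Event()

    def timer_isr():
        # A periodic "interrupt" keeps feeding the watchdog regardless of task health.
        while True:
            watchdog_fed.set()
            time.sleep(0.01)

    def throttle_task():
        # The critical task. Simulate its death (stack overflow, memory corruption)
        # by returning immediately: nothing resets the throttle demand afterwards.
        return

    threading.Thread(target=timer_isr, daemon=True).start()
    task = threading.Thread(target=throttle_task, daemon=True)
    task.start()
    task.join()
    time.sleep(0.05)

    print("throttle task alive:", task.is_alive())        # False: the task is dead
    print("watchdog satisfied :", watchdog_fed.is_set())  # True: the death goes undetected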

The dead

NHTSA estimated at least 89 deaths and 57 injuries. Toyota recalled ~9 million vehicles and paid $1.2 billion to the DOJ to settle criminal charges of concealing safety defects.

A jury found Toyota acted with "reckless disregard." Code quality in safety-critical systems is literally a life-or-death matter. A NASA study found "no electronic defect" but didn't review the full source. The independent expert who did found catastrophic flaws.

Wenzhou High-Speed Train Collision

July 23, 2011 Wenzhou, Zhejiang Province, China 40 dead, 172 injured
Fail-Dangerous Signal Design

What happened

A high-speed train rear-ended a stopped train at full speed because the signaling software violated the most fundamental rule in railway safety: when something fails, show red.

  • A lightning strike burned out fuses in the signal assembly. Instead of defaulting to a “stop” indication—the standard fail-safe behavior for railway signaling worldwide—the LKD2-T1 train control system sent an erroneous “track clear” signal to dispatch (the fail-safe contrast is sketched in code after this list).
  • The dispatch center showed the track section containing stopped train D3115 as unoccupied. Following train D301 was authorized to proceed at full speed into the same block.
  • The official investigation found “serious design flaws” in the signaling software. The Railway Research & Design Institute had never organized a formal R&D team for the LKD2-T1 system or conducted comprehensive testing.
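
A sketch of the fail-safe principle at stake, contrasting the required behavior with what the Wenzhou equipment effectively did. The names and structure are illustrative, not from the LKD2-T1 code.

    from enum import Enum

    class Aspect(Enum):
        STOP = "red"
        CLEAR = "green"

    def block_aspect_fail_safe(track_circuit_ok, track_occupied):
        if not track_circuit_ok:      # blown fuse, lightning damage, anything unexpected
            return Aspect.STOP        # fail toward the safe state
        return Aspect.STOP if track_occupied else Aspect.CLEAR

    def block_aspect_fail_dangerous(track_circuit_ok, track_occupied):
        if not track_circuit_ok:
            return Aspect.CLEAR       # what the damaged equipment effectively reported
        return Aspect.STOP if track_occupied else Aspect.CLEAR

    # Lightning-damaged circuit, with a stopped train actually in the block:
    print(block_aspect_fail_safe(track_circuit_ok=False, track_occupied=True))        # Aspect.STOP
    print(block_aspect_fail_dangerous(track_circuit_ok=False, track_occupied=True))   # Aspect.CLEAR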

The dead

40 people were killed and 172 injured. Authorities initially attempted to bury the wreckage before the investigation was complete, provoking public outrage.

Fail-safe is the oldest principle in railway signaling: if anything goes wrong, show a stop signal. The LKD2-T1 system did the opposite. A signal system that shows green when it should show red is not merely broken—it is actively lethal.

Airbus A400M Acceptance Flight Crash

May 9, 2015 Near Seville, Spain 4 dead, 2 seriously injured
Deleted Engine Configuration Data

What happened

A military transport aircraft lost power on three of four engines shortly after takeoff on its first production acceptance flight because critical software configuration files had been accidentally wiped during engine installation.

  • During installation, technicians accidentally deleted the torque calibration parameter files from three of the four engines’ Electronic Control Units. Without this data, the ECUs could not correctly interpret engine sensor readings.
  • Without valid calibration data, the three affected engines’ power became frozen and unresponsive to throttle inputs. When the crew moved the throttles to flight idle attempting to manage the situation, the engines complied—and then locked at idle, unable to respond to any further commands.
  • Airbus was reportedly aware of the risk of calibration data being wiped during installation but had not implemented any safeguard to prevent it or to detect it before flight; the kind of pre-flight check that was missing is sketched below.
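
A sketch of the kind of pre-flight safeguard that was absent: refuse dispatch unless every engine's ECU can show its calibration data. The file name and paths below are hypothetical.

    from pathlib import Path

    REQUIRED_CALIBRATION = "torque_calibration.dat"   # hypothetical file name

    def engines_ready_for_dispatch(ecu_dirs):
        # Refuse dispatch unless every ECU has its calibration data on board.
        missing = [d for d in ecu_dirs if not (Path(d) / REQUIRED_CALIBRATION).is_file()]
        if missing:
            print("NO-GO: calibration data missing for:", ", ".join(missing))
            return False
        return True

    # Engines 1-3 had their calibration files wiped during installation:
    engines_ready_for_dispatch(["/ecu/engine1", "/ecu/engine2", "/ecu/engine3", "/ecu/engine4"])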

The dead

Four of the six crew members were killed. Two survived with serious injuries. Airbus confirmed that “incorrectly installed engine control software” caused the crash.

A known risk with no safeguard is a decision to accept casualties. Engine control software that can be silently wiped during routine installation, with no pre-flight check to detect the absence, is a system designed to fail.

Uber Self-Driving Car Fatality

March 18, 2018 Tempe, Arizona 1 dead
Classification Failure / Disabled Safety

What happened

An Uber self-driving test vehicle struck and killed Elaine Herzberg, 49, as she walked her bicycle across a road at night. She was the first known pedestrian killed by a self-driving vehicle.

  • The system detected something in the road 5.6 seconds before impact but cycled the classification between “other,” “vehicle,” and “bicycle.” Each reclassification reset the object’s predicted path, preventing the system from recognizing an imminent collision (this reset behavior is sketched after this list).
  • The system had no concept of "jaywalking pedestrian" as an object category. It was not designed to identify pedestrians outside of crosswalks.
  • Uber had disabled Volvo's built-in automatic emergency braking to prevent "erratic vehicle behavior." Uber's own system could not initiate emergency braking autonomously.
  • The sole safety driver was watching a video on her phone.
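
A sketch of the tracking behavior the NTSB described: each reclassification discards the object's motion history, so the system never accumulates enough track to predict a crossing path. The structure and names are illustrative, not Uber's code.

    class Track:
        def __init__(self):
            self.classification = None
            self.positions = []            # history used to predict a path

        def update(self, classification, position):
            if classification != self.classification:
                self.positions = []        # reclassification wipes the history
                self.classification = classification
            self.positions.append(position)

        def predicted_motion(self):
            # Needs at least two points of consistent history to estimate motion.
            if len(self.positions) < 2:
                return None
            (x_prev, _), (x_now, _) = self.positions[-2], self.positions[-1]
            return "crossing our path" if x_now < x_prev else "moving away"

    track = Track()
    observations = [                       # steadily closing, but relabeled every cycle
        ("other",   (30.0, 0.0)),
        ("vehicle", (27.5, 0.0)),
        ("bicycle", (25.0, 0.0)),
        ("other",   (22.5, 0.0)),
    ]
    for label, pos in observations:
        track.update(label, pos)
        print(label, "->", track.predicted_motion())   # None every time: no history survives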

The dead

Elaine Herzberg was killed. The NTSB found Uber's safety culture "inadequate"—the program lacked a formal safety plan, dedicated safety staff, and proper operating procedures.

Autonomous systems must handle real-world edge cases, not just designed scenarios. Disabling existing safety systems is unconscionable. A single distracted safety driver is not a substitute for robust software. If your system can't classify what it's about to hit, it must stop.

Boeing 737 MAX MCAS

2018–2019 Java Sea & Ethiopia 346 dead
Single Sensor / Hidden Automation

What happened

Two 737 MAX aircraft crashed within five months because a flight control system called MCAS repeatedly pushed the nose down based on a single faulty sensor, while pilots had no idea the system existed.

  • MCAS relied on data from only one of two angle-of-attack sensors. If that sensor failed, MCAS activated on false data.
  • MCAS reactivated roughly five seconds after each manual trim correction by the pilots, creating a relentless tug-of-war; the single-sensor trigger and the reactivation cycle are sketched after this list.
  • Boeing did not disclose MCAS to pilots or airlines. It was not in the manual. It was not in the training. A software defect silently tied the AOA disagree alert—intended as a standard feature—to an optional indicator display, leaving it non-functional on roughly 80% of MAX aircraft. Boeing knew about this defect for over a year and told no one.
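
A schematic sketch of the two MCAS properties described above: trust in a single angle-of-attack sensor, and reactivation a few seconds after each pilot trim input. The thresholds, timing, and names are illustrative, not Boeing's code.

    AOA_TRIGGER_DEG = 15.0          # illustrative threshold
    REACTIVATION_DELAY_S = 5.0

    def mcas_step(aoa_left_deg, aoa_right_deg, seconds_since_pilot_trim):
        aoa = aoa_left_deg          # only one of the two sensors is ever consulted
        if aoa > AOA_TRIGGER_DEG and seconds_since_pilot_trim >= REACTIVATION_DELAY_S:
            return "trim nose down"
        return "no action"

    # A failed left vane reads 75 degrees while the right vane reads normally:
    for t in (5.0, 0.0, 5.0, 0.0, 5.0):    # pilots trim back, MCAS waits, then fires again
        print(mcas_step(aoa_left_deg=75.0, aoa_right_deg=2.5, seconds_since_pilot_trim=t))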

The dead

189 on Lion Air Flight 610 (October 29, 2018). 157 on Ethiopian Airlines Flight 302 (March 10, 2019). No survivors in either crash. All 737 MAX aircraft were grounded worldwide for 20 months. Boeing paid $2.5 billion in penalties.

Safety-critical systems must not rely on single points of failure. Automation must be transparent to its operators. A known software defect that disables a safety alert, left unpatched for over a year, is not an accident—it is a choice. Boeing prioritized speed-to-market over the lives of 346 people.

Recurring Patterns

Software as sole safety net

The Therac-25 removed hardware interlocks. Boeing MCAS lacked sensor redundancy. When software is the only barrier between the user and catastrophe, the software must be perfect. Software is never perfect.

The automation paradox

Air France 447 and China Airlines 140 demonstrate that the more reliable automation becomes, the less prepared humans are to take over when it fails.

Warnings that confuse

Helios 522's identical alarm sounds. Therac-25's meaningless codes. Air France 447's contradictory stall warnings. A warning system that confuses is worse than none.

Fail-dangerous defaults

Wenzhou's signal system showed green when it should have shown red. The A400M had no pre-flight check for missing engine data. LASCAD launched with 81 known bugs. Systems that fail open instead of fail-safe kill people.

Manufacturer denial

AECL denied Therac-25 could overdose. Boeing concealed MCAS. Toyota fought unintended acceleration claims for years. Air New Zealand blamed the pilots of Flight 901. The dead cannot defend themselves.

Known bugs, deployed anyway

Chinook ZD576's FADEC had 486 known anomalies. LASCAD launched with 81 known bugs. The Patriot's drift was reported two weeks before Dhahran. Deploying known-defective software in safety-critical systems is a decision to accept casualties.

Excess autonomy

The Patriot system in 2003 was given autonomous firing authority it could not exercise responsibly. Uber's self-driving car could not brake autonomously. Too much autonomy and too little autonomy can both be lethal.