Ethics of Engineering Presentation
Therac-25 Accidents
Andrew Prichard
Reyna Rodriguez
Fernando Ocampo
Stephen Tometi
Brendan Schaefer
Temi Etuwewe
History of the Therac Machines
In the early 1970s, Atomic Energy of Canada Limited (AECL) and a French company, CGR, worked together to build new linear accelerators. The products of their work were the Therac-6, a 6 MeV accelerator, and later the Therac-20, a 20 MeV accelerator.
These machines accelerate electrons to create high-energy beams that can destroy tumors with minimal damage to surrounding tissue. In the Therac-6 and Therac-20, software functionality was limited, and the computer interface was added to existing standalone hardware for convenience.
In the mid-1970s, AECL independently developed a new double-pass concept for its next product, the Therac-25. AECL designed the Therac-25 from the beginning to use computer controls. With this design, AECL relied on the computer’s ability to control and monitor the hardware and decided to remove the hardware safety mechanisms used in the Therac-6 and Therac-20. [1][2]
The Therac-25 Disaster
By the mid-1980s, Therac-25 machines were in use in Canada and the US. In 1986, a patient named Ray Cox went to a clinic for a regular treatment.
While setting up the Therac-25, the technician mistakenly entered the wrong commands but quickly corrected the error. The correction was too quick for the machine to process and triggered an error message. The technician cleared the error and proceeded with the dosage, which produced another error message.
The technician repeated the setup process without realizing that the Therac-25 was delivering treatment at maximum dosage despite the error messages. By the end of the procedure, the machine had delivered a dose 125 times higher than intended, and the patient later died from complications of the overdose. This was not an isolated event: similar massive overdoses occurred in other cases, six accidents in all. [2][3]
Ethical Engineering Problems
Safety was an afterthought [5][6]
AECL removed the hardware locks and depended on the software to monitor the status of the machine and the dosage. The hardware locks had been installed in the Therac-6 and Therac-20 to prevent operators from doing something dangerous: if an invalid mode was attempted, the locks would bring everything to a halt. Tasking the software with positioning everything and detecting dangerous situations saved operators a lot of time. [5][6]
Four bugs, as well as other potential bugs, were found in the Therac-25 that could cause radiation overdoses. They were later found in the Therac-20 as well; however, no accidents occurred there because of its hardware locks. [8][9]
When AECL was made aware of the situation after the first accident, it responded three days later, stating that it was impossible for the Therac-25 to operate in electron mode without scanning to spread the beam. [5][6]
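The idea behind the hardware locks described above, an independent check that halts everything when an invalid combination is attempted, can be illustrated with a minimal sketch. This is not AECL code: the names Mode, Turntable, SAFE_PAIRINGS, check_interlock and enable_beam are hypothetical, and the table of safe pairings is an assumption made only for this example.

```python
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"
    XRAY = "xray"


class Turntable(Enum):
    # Hypothetical position names, chosen for this sketch only.
    ELECTRON_SCAN = "electron_scan"
    XRAY_TARGET = "xray_target"
    UNKNOWN = "unknown"  # e.g. sensors disagree or the turntable is still moving


# Assumed table of safe pairings; a real machine would check far more than this.
SAFE_PAIRINGS = {
    (Mode.ELECTRON, Turntable.ELECTRON_SCAN),
    (Mode.XRAY, Turntable.XRAY_TARGET),
}


class InterlockTripped(Exception):
    """Raised when the independent check refuses to enable the beam."""


def check_interlock(requested: Mode, sensed: Turntable) -> None:
    """Halt unless the sensed turntable position is safe for the requested mode."""
    if (requested, sensed) not in SAFE_PAIRINGS:
        raise InterlockTripped(
            f"mode {requested.value} is not safe with turntable at {sensed.value}"
        )


def enable_beam(requested: Mode, sensed: Turntable) -> str:
    """Run the interlock first; only report 'beam on' if it passes."""
    check_interlock(requested, sensed)
    return f"beam on: {requested.value}"


if __name__ == "__main__":
    print(enable_beam(Mode.XRAY, Turntable.XRAY_TARGET))
    try:
        enable_beam(Mode.XRAY, Turntable.ELECTRON_SCAN)
    except InterlockTripped as exc:
        print("halted:", exc)
```

The point of the sketch is that the check depends only on what is sensed, not on what the control software believes it has set up. In the Therac-6 and Therac-20 this role was played by hardware, which is why it kept working even when the software was wrong.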
Causes For Disaster
The Therac-25 software differed from the Therac-20 software, which was limited and served only as a convenience to the hardware. [5][6]
Software controlled many of the critical safety checks, and the hardware locks were removed.
The Therac-25 software was also redesigned so that operators could enter the treatment plan once and simply verify it, along with other convenience features. [5][6]
The manual did not clearly state what any of the malfunction errors meant. [5][6]
Operators ignored the error codes. [5][6]
Patients felt a burning sensation each time. During the fifth accident, “the intercom was working and the operator heard a loud noise followed by moaning from the patient.” [6]
Of the six accidents, two occurred at the East Texas Cancer Center, where the same operator made a mistake entering the mode but quickly fixed the error. As in the first accident at that center, the machine shut down with a “Malfunction 54” message. Each time patients were sent for Therac-25 treatment, the operator saw an error message. The operators were used to “the quirks in the machine” and continued after each error message, not knowing they were delivering more radiation to the patient. It was later found that quickly changing modes resulted in radiation overdose. [5][6]
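How a quick correction can leave a machine in the wrong state can be shown with a simplified, hypothetical model. This is not the actual Therac-25 code: the Console class, the tick-based timing, and the SETUP_TICKS value are assumptions made for illustration. The sketch only assumes that hardware setup takes some time and that the flawed version reads the mode once, when setup begins.

```python
from dataclasses import dataclass

SETUP_TICKS = 8  # assumed length of hardware setup, in arbitrary time steps


@dataclass
class Console:
    restart_on_edit: bool = False  # False models the flawed behavior
    screen_mode: str = ""          # what the operator sees
    hardware_mode: str = ""        # what the hardware is actually set up for
    pending_mode: str = ""         # mode sampled when setup began
    setup_remaining: int = 0

    def enter_mode(self, mode: str) -> None:
        self.screen_mode = mode
        if self.setup_remaining == 0 or self.restart_on_edit:
            # Begin (or restart) setup using the mode now on screen.
            self.pending_mode = mode
            self.setup_remaining = SETUP_TICKS
        # Otherwise the edit lands while setup is running and is silently missed.

    def tick(self) -> None:
        if self.setup_remaining > 0:
            self.setup_remaining -= 1
            if self.setup_remaining == 0:
                self.hardware_mode = self.pending_mode


def simulate(edit_at_tick: int, restart_on_edit: bool = False) -> Console:
    console = Console(restart_on_edit=restart_on_edit)
    console.enter_mode("xray")  # the mistaken first entry
    for t in range(3 * SETUP_TICKS):
        if t == edit_at_tick:
            console.enter_mode("electron")  # the operator's correction
        console.tick()
    return console


if __name__ == "__main__":
    for label, result in [
        ("fast edit", simulate(2)),
        ("slow edit", simulate(10)),
        ("fast edit, restart on edit", simulate(2, restart_on_edit=True)),
    ]:
        print(label, "| screen:", result.screen_mode, "| hardware:", result.hardware_mode)
```

In this model a correction made while setup is still running leaves the screen showing the corrected mode while the hardware stays configured for the original one. A slower correction, or a version that restarts setup on every edit, ends with the two in agreement.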
Decision Making Behind the Disaster
By trusting the software alone and removing any secondary safety controls, AECL was overconfident that the machine was incapable of errors. After the second accident, it sent a service engineer to investigate. The engineer could not reproduce the malfunction, and the engineers suspected the problem was caused by a microswitch used to determine the turntable position. AECL claimed to have “increased the safety of the machine by at least five orders of magnitude” but did not discover why the accidents occurred. [4][6]
How Disaster Continued
AECL denied that the Therac-25 was capable of delivering an overdose, because the frequent error codes paused treatment during the procedure.
The only way a patient could suffer an overdose was if the turntable was in the wrong position; this would cause the patient to be hit by a laser-like beam.
AECL was unable to reproduce the error, so it concluded that the only possible cause was a temporary failure in the three microswitches that sense the turntable’s position.
Fritz Hager, a talented physicist, was able to diagnose the problem. The error was known as “Malfunction 54”, and once he was able to produce the error on command, the application’s code could be corrected. [2][7]
What Went Wrong
Overconfidence in the software led to removing the hardware interlocks, leaving the software solely responsible for preventing accidents. Investigations looked only at hardware for malfunctions.
Unacceptable software engineering practices: there were few specifications, little documentation, and few software tests.
The software did not have any self-checks or other error detection features.
Not enough investigation of the reported accidents. If the manufacturer had thoroughly investigated the cause of the first overdose, it could have prevented the other accidents.
A lack of communication between AECL and the engineers responsible for developing the new machine was a major issue that led to the patient overdoses.
In April 1986, AECL filed a medical device report with the FDA, notifying it of the cause of the two Tyler hospital incidents. The FDA deemed the Therac-25 defective, and AECL’s response was a temporary fix to the device’s software.
In February 1987, the FDA finally required all units of the Therac-25 to be shut down until permanent modifications were made.
The amount of time the FDA and AECL took to treat these software problems as a danger to patients is indicative of a slow response by governing institutions to protect patient health. [1][8]
Ethical Errors
When building the Therac-25, the designers removed critical safety features in favor of a software-only design.
Not all errors and bugs were found when designing and testing the Therac-25, resulting in an operator manual without solutions to many error messages.
Technicians operating the Therac-25 would often ignore error messages and continue their procedures, ultimately overdosing patients.
After multiple cases of patients dying, reports were finally made, but the Therac-25 remained operational for multiple years.
Ethical Course of Action
Once Fritz Hager and other physicists could reproduce the error, the Therac-25 software was investigated and reviewed. The computer was redesigned to handle real-time control of the machine, both its normal operation and its safety system.
Today this could be handled by using one or two microcontrollers with a PC running a GUI front end.
Knowledge of specific incidents involving patients exposed to these machines was completely ignored by the governing agencies responsible for protecting patients.
Problematic results could have been mitigated by effective communication between the FDA and AECL to protect patients’ health. [3][7]
Changes After the Accident
In February 1987, the US and Canada ordered all Therac-25 machines shut down until permanent repairs could be implemented. After six months, AECL issued software patches and hardware updates, which eventually allowed the machine to return to service. After the Therac-25 accidents, the FDA improved its reporting system and augmented its procedures and guidelines to include software. [3][7]
Conclusion
The Therac-25 was developed in the mid-1970s and was used to treat cancer patients with radiation.
A technician using the Therac-25 mistakenly entered the wrong command. The machine gave the patient 125 times the intended radiation dose, and the patient later died from complications of the overdose. It was discovered that the failure was tied to the “Malfunction 54” error.
Thanks to Fritz Hager, who was able to recreate the malfunction, AECL was able to identify it, redesign the hardware, and rework the code for the Therac-25.
References
[1] Medical Devices: Therac-25. http://sunnyday.mit.edu/papers/therac.pdf
[2] An Engineering Disaster: Therac-25. https://tildesites.bowdoin.edu/~allen/courses/cs260/readings/therac.pdf
[3] An Investigation of the Therac-25 Accidents. https://onlineethics.org/cases/therac-25/investigation-therac-25-accidents-abstract
[4] Killed by a Machine: The Therac-25. https://hackaday.com/2015/10/26/killed-by-a-machine-the-therac-25/#more-175082
[5] Killer Bug. Therac-25: Quick-and-Dirty. https://pvs-studio.com/en/blog/posts/0438/
[6] Death and Denial: The Failure of the THERAC-25. http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html
[7] Integrating Ethics Into a Computing Curriculum: A Case Study of the Therac-25. https://onlineethics.org/sites/onlineethics/files/2021-08/Therac.Huff_.pdf