It’s been a week since the tragic Amtrak derailment in my home town of Philadelphia. Being an avid train passenger—commuting to and from DC several times per week—and having taken the ill fated Northeast Regional Train No. 188 on multiple occasions, this has struck close to home. I am posting this blog entry from the café car of Train No. 111, the first Southbound train to commence full Amtrak service since the disaster.
I realize that it is very early, and the National Transportation Safety Board investigation is still ongoing. Speculation—especially by someone like me who is not a transportation expert—would be unproductive at best, and offensive to the victims at worst.’ Perhaps it's my job as a security professional—in which I am paid to find vulnerabilities in syste’s … or perhaps it's the recent spate of news that both cars’ and even commercial airplanes’ heavily computerized control systems can be commandeered, wirelessly, by a remote attacker … or perhaps it is the fact that the crash occurred immediately after national rail safety week and on the eve of a legislative debate on cutting Amtrak funding … or perhaps I’ve just been reading too much Pynchon… but ever since I heard that the train was speeding and that there is no direct evidence incriminating the train operators of negligence (other than the speed), the first thing that popped into my mind was: Software. I haven’t heard anyone (other than well-known security expert Simson Garfinkel) discuss it, so that’s what I’m going to do in the remainder of this post.
One topic that the media has latched onto is Positive Train Control (PTC): a technology that, if only it had been implemented, is purported to have been able to avert the crash. What the media doesn’t say is that the ACS-64 locomotive that was pulling the fateful train was already equipped with PTC. You see, the Advanced Civil Speed Enforcement System (ACSES)—’mtrak's version of PTC for the Northeast Corridor—requires components both on-board the locomotive and wayside (on the tracks). In other words, PTC will only be fully functional if both the locomotive and tracks are upgraded. Portions of track South of Philadelphia and North of New York already have support. In the case of the Philadelphia crash, the locomotive was a new model that had support, but the tracks on which it derailed did not.
I contend that a software bug in the ACSES system should not be ruled out as a potential cause of or contributing factor to the derailment. Let me be clear: I am not claiming that software was a likely cause of the crash. I am neither a transportation expert nor do I have any detailed knowledge of the ACSES implementation. The purpose of this article is to provide enough evidence that software errors are a plausible enough explanation that the possibility should at least become a part of the public discussion.
There is a reasonable precedent of software bugs causing physical catastrophes. For example, a software bug in Toyota’s electronic throttle control system recently caused the massively publicized “unintended acceleration” problem in many of their vehicles, killing at least 89 people as a result. In 2007, a group of six F-22 Raptor fighter jets experienced multiple computer crashes coincident with crossing the international dateline caused by a software bug that did not anticipate that corner case. The planes lost all navigation and communication, and were only able to make it back to land by following their tankers. Vehicles and transportation systems in general are so complex, automated, and computerized these days that a single software bug can wreak havoc.
But how could a system that is intended to provide a safeguard against crashing actually cause a crash?
A relatively recent report to congress by the Federal Railroad Commission on the implementation of PTC states that ACSES has control over the
…event recorder, train line data sensors, horn circuit, brake systems, cab signal system (if equipped), and the Communication Segment.So, presumably, the system has no control over acceleration, just deceleration.
I am perhaps about to delve too far into the sea of speculation, but as James Burke so eloquently demonstrated, a failure in one system can cascade to cause failures in seemingly independent and unconnected others.
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.There is a display in the cab of the locomotive with a speedometer, looking something like this:
CC BY-SA 3.0, from here
When the track is PTC-enabled, there is a second speed readout on the bottom, showing the maximum speed allowed on the track. When the track is not PTC-enabled, the readout looks as pictured here. If the conductor or engineer is relying on that display to gauge the train’s current speed, he or she is thereby relying on the output of ACSES’s algorithms, programming, hardware, and sensors. A failure in any of those pieces could result in an incorrect speed readout, e.g., causing one to believe that the speed of the train were actually slower than in reality. This is similar to how Air France Flight 447 was doomed by an engineering design flaw in its airspeed sensors, which caused a failure in the autopilot software, which reported inaccurate instrumentation to the pilots, who relied upon the incorrect information, making manual piloting errors that caused the plane to crash.
I do not wish the tragedy of Amtrak Train No. 188 on anyone; it could have very easily been me sitting in that café car a week ago. While history has proven that human error is the most frequent cause of these types of accidents, we increasingly need to also look at the software for possible fault.