Saturday, July 13, 2013

Getting Humans Out of the Loop

July 13, 2013, 4:30 p.m.

What Can 'War Games' Teach About Disasters?

In the opening scene of the 1983 movie "War Games," two soldiers in a missile silo, watching over ICBMs targeting Russia, receive what they believe to be orders for an actual launch. Before the end of their countdown, one finds himself unable to turn the key that he believes will cause the deaths of millions.

In fact, it was only a test. A White House official, upon learning that 22% of the missile commanders failed to launch, visits NORAD. Dr. John McKittrick (Dabney Coleman) and other systems engineers at NORAD have concluded that command of the missile silos should be handled by computer. As Dr. McKittrick puts it to the White House official, over the objections of the commanding general, "I think we ought to take the men out of the loop." Ultimately, a supercomputer named WOPR (War Operation Plan Response) replaces the missile commanders and spends its time continuously running simulations of U.S. military encounters with the Russians ("war games").

I won't reveal more about the film's story, in case you haven't yet watched it and I've inspired you to do so.

Last evening, for reasons unknown, I chose to watch it again for what was probably the tenth time during the last 30 years. Maybe it was our recent disasters that inspired me to watch it. Maybe it was the film that caused me to think about those disasters.

As it happens, both disasters occurred on the same day, one week ago, July 6th. And both raise the dilemma confronted by the characters in "War Games": from the standpoint of the safety and reliability of human-machine systems, is it better to have the humans "in the loop" or "out of the loop"?

One of the disasters was the crash of an Asiana Airlines Boeing 777 at the San Francisco airport on July 6, 2013, in which three died and 50 were seriously injured. Matthew L. Wald and Norimitsu Onishi, "In Asiana Crash Investigation, Early Focus Is on the Crew's Actions," New York Times, July 9, 2013, p. A12. A human had taken over the controls, as neither the airport's glide-slope indicator nor the plane's autopilot was activated. [Photo credit: John Green, San Jose Mercury News/Associated Press.]

The other was "A runaway train [that] exploded Saturday [July 6th], killing at least one person [by today, July 13th, the death toll is more like 50] and forcing more than 1,000 people to evacuate from a town in the province of Quebec, the police said. The 73-car train, which included tank cars carrying petroleum, destroyed much of downtown Lac-Mégantic, a town of about 6,000, in a blaze that continued through the day." Ian Austen, "Train Blast Kills at Least One and Forces Evacuations in Canada," New York Times, July 6, 2013. [Photo credit: Paul Chiasson, The Canadian Press/Associated Press.]

In the case of the plane crash,
The crash landing of a South Korean airliner in San Francisco has revived concerns that airline pilots get so little opportunity these days to fly without the aid of sophisticated automation that their stick-and-rudder skills are eroding. . . . [Lee Gang-guk was] flying without the aid of a key part of the airport's instrument landing system, which provides pilots with a glide slope to follow so that the plane isn't too high or low. ["Part of the instrument landing system on Runway 28 Left here had been shut down because of construction. The 777 is built to lock on to the instrument landing system, accepting its signals for lateral and horizontal navigation to land in the correct spot on the tarmac. American pilots use that capability often . . . ." Matthew L. Wald and Norimitsu Onishi, "In Asiana Crash Investigation, Early Focus Is on the Crew's Actions," New York Times, July 9, 2013, p. A12.] . . . And he was manually flying the plane with the autopilot shut off, which other pilots said is not unusual in the last stage of a landing, although some airlines prefer that their pilots use automated landing systems. . . . Overall, automation has . . . been a boon to aviation safety, providing a consistent precision that humans can't duplicate. But pilots and safety officials have expressed concern in recent years that pilots' "automation addiction" has eroded their flying skills to the point that they sometimes don't know how to recover from stalls and other problems. Dozens of accidents in which planes stalled in flight or got into unusual positions from which pilots were unable to recover have occurred in recent years.
Joan Lowy, "Role of Aircraft Automation Eyed in Air Crash," Associated Press, July 9, 2013.

The analysis of the Lac-Mégantic tragedy is a little more nuanced, but raises similar issues:
Revered by NASA rocket engineers and surgeons alike, [renowned U.K.-based safety theorist James] Reason’s most famous legacy is the “Swiss cheese model,” which imagines safety checks to be like slices of cheese.

When a safety system is airtight and closely followed, the slices are cheddar: Rigid and impermeable to error.

Over time, however, as employees grow complacent and safety standards slip, the slices begin to develop Swiss-cheese-like holes through which mistakes are allowed to pass.

If the holes are allowed to multiply, it is only a matter of time before a simple mistake can pass clean through all the layers of cheese and trigger a disaster.

“There is a growing appreciation that large scale disasters … are the result of separate small events that become linked and amplified in ways that are incomprehensible and unpredictable,” wrote the U.S. organizational theorist Karl E. Weick in a 1990 analysis of the 1977 Tenerife air disaster, in which two fully-loaded 747s collided at a Canary Islands airport.

The disaster is now a textbook case of organizational vulnerability: If any one of a myriad of tiny blunders (a stressed crew, foggy weather, a crowded tarmac, botched radio communications and a first officer unwilling to criticize his captain) had been avoided, Tenerife’s 583 victims would still be alive.

Similarly, although the RMS Titanic struck an iceberg — a seeming natural disaster — its maiden voyage was doomed by decades of lax British safety standards. The liner was charging at top speed through a patch of ocean known to be unusually icy, a design flaw rendered its watertight compartments useless if the ship settled too low in the water, and not only was the Titanic famously not carrying nearly enough lifeboats, but the crew lacked any official policy on how to deploy them.

Already, the latticework of errors that caused the Lac-Mégantic disaster has begun to emerge.

For starters, the event that appears to have kicked off the disaster was a locomotive fire that broke out only minutes after engineer Tom Harding had parked the train. And, even if not a single hand brake had been applied, the train should have been held in place by air brakes. Further, the train was parked on a main line instead of a siding equipped with safety features to combat runaway trains. Also, the train was hauling enough crude oil to fill three Olympic swimming pools, yet carried it in a class of railcars highlighted by regulators as being uniquely vulnerable to leaks and explosions.

Most chillingly, this has all seemingly happened before.

In 1996, a string of 20 grain cars rolled free in an Edson, Alta., train yard, accelerating to 50 km/h before smashing head-on into the locomotive of a CN freight train. In the resulting explosion and fire, three crew members were killed.

As possibly at Lac-Mégantic, the ultimate cause of the crash was the faulty application of hand brakes: Train crews had received “little supervision” in how to properly set the brakes and the brakes they did set were almost useless due to missing parts.
Tristin Hopper, "'Complex' Latticework of Errors That Caused Lac-Mégantic Train Disaster Has Just Begun to Emerge," National Post, July 13, 2013.

So what's the answer? The answer is that there is no answer, as succinctly put in the aphorism, "To err is human, to really foul things up requires a computer." When informed that the software to run President Reagan's Strategic Defense Initiative ("Star Wars") program ran to over 100 million lines of code, a computer science professor is said to have responded, "I've never seen a computer program of over three lines of code that worked the first time it was run."

On the other hand, planes -- commercial airliners as well as drones -- can fly themselves. There are reasons to have pilots and flight attendants on flights; but it's not because the plane is incapable of getting you there by itself. We already have cars that can drive themselves. Trains have automatic braking systems. Tractors navigating by GPS can more precisely apply fertilizer, and plant seeds in straighter rows, than sharp-eyed farmers. From the standpoint of safety, we might be better off spending more on computer programmers and less on equipment operators.

"Despite rising fears of technology displacing huge swaths of the workforce, there remain huge classes of jobs that robots (and low-wage foreign workers) still can’t replace in the United States, and won’t replace any time soon. To land the best of those jobs, workers need sophisticated vocabularies, advanced problem-solving abilities and other high-value skills that the U.S. economy does a good job of bestowing on young people from wealthy families — but can’t seem to deliver to poor and middle-class kids. . . . In the past 20 years, almost all the net job gains were in the two areas computers struggle with the most: working with new info (for example, figuring out a customer’s Internet service issues) and solving unstructured problems (such as repairing cars when computer diagnostics can’t pinpoint what’s wrong)." Jim Tankersley, "Have the Robots Come for the Middle Class?" Washington Post, July 12, 2013.

There are many among our skilled labor force who are experienced and conscientious. We wouldn't have the country we live in without their skills and efforts. They are often under-appreciated, under-paid, and working in unsafe conditions.

But relying on humans comes with risks. Employers may cut the workforce below the number necessary to keep equipment properly maintained, replaced when necessary, and watched over when operating. New, young employees may not be adequately trained and experienced. Long hours may increase the likelihood of accidents. (The train wreck and explosion in Lac-Mégantic occurred after 1:00 a.m.; the Asiana crash came at the end of a ten-hour night flight across the Pacific Ocean.) An employee may fail to show up, be impaired by alcohol or other drugs, or be distracted by talking on a cell phone or texting.

"As planemakers build ever-safer jets, it’s often the split-second decisions by humans at the controls that can make the biggest difference between a smooth landing and a flight that ends in disaster. The last moments of Asiana Airlines . . . Flight 214 . . . underscore the stakes in the cockpit even in aircraft as sophisticated as [a Boeing 777], according to safety consultants, retired pilots and aviation scholars following the U.S. investigation. . . . 'Whether it’s a disaster or a close call comes down to the pilot,' said Les Westbrooks, a former commercial and military pilot who now teaches airline operations at Embry-Riddle Aeronautical University. 'Airplanes have incredible automation. But when the human has to exercise judgment, you can’t design around that.'" Mary Jane Credeur, Mary Schlangenstein and Julie Johnsson, "Asiana Crash Shows Lessons of Pilots Trumping Technology," Bloomberg, July 9, 2013.

Of course, there's more to humans in the workplace than their superiority to computers -- when that's the case. If we fully automate everything the computer programmers can automate, which is a lot, we end up with an even more severe unemployment problem than we have now. And we create stores and service centers that seemingly have no employees. After entering a big box store you're on your own with nothing but a hunting license, and when you go to check out, the only reason you get assistance with the self-checkout is that without it frustrated customers would simply give up and leave without paying. Every phone call would be met by nothing but automated answering and voice recognition systems.

But now that computers can beat the world chess champions, and win at "Jeopardy," we had better get used to fellow workers who look a whole lot like robots, and be willing to hand the controls over to them once it becomes obvious that they really can do a better job than we can when landing a plane or driving in bumper-to-bumper freeway traffic.

# # #


Anonymous said...

I enjoyed your most recent blog post. Of course I immediately thought of the culture at my workplace, and a totally preventable accident in which 9 were killed and 80+ injured. In the "real world" those responsible would have been charged with reckless endangerment, public endangerment, and/or criminally negligent manslaughter/homicide -- at a minimum. Not to mention civil charges. As it is, there were absolutely no consequences. Zero. The same was true when management caused my two coworkers' deaths. A formal report said they were primarily responsible for the accident, yet nothing happened to the employees working there.

“Yep, just a bad day at work. A few people died. What're ya gonna do -- am I right or am I right?”

Full immunity -- it's a good thing!

Without any accountability, it is all but guaranteed that we will continue to have these types of preventable accidents, injuries, and deaths.

Nick said...

Advertising Notice

Notice Regarding Advertising: This blog runs an open comments section. All comments related to blog entries have (so far) remained posted, regardless of how critical. Although I would prefer that those posting comments identify themselves, anonymous comments are also accepted.

The only limitation is that advertising posing as comments will be removed. That is why one or more of the comments posted on this blog entry, containing links to unrelated matter, have been deleted.
-- Nick