Friday, March 14, 2014

Systems Failures: Lessons from Aviation

The F-16 was the first fighter aircraft purpose-built to pull 9-g maneuvers, all by Fly-By-Wire

Mysteries pull at the human mind. And few things are as mysterious as a 777 airliner, roughly 200 ft long and weighing around 300 tons at takeoff, disappearing from radar without a trace. Six days into the MH370 disappearance, investigators from more than 10 countries have not physically located the aircraft, but they are chasing clues.

The rush to find answers, even premature ones, is understandable in aviation disasters. An airplane is a highly valuable asset to any nation and a tool for international development and diplomacy. Like trucks and ships, it is an agent of globalization and helps build economies. Behind the passage of a flight across one country's borders into another lie possibly thousands of pages of treaties, policies and communication protocols.

In a post-9/11 world, people are sensitive to the prospect of another Mohamed Atta turning off an airplane's transponder and steering it into buildings of economic importance. The image of an aircraft being used as a suicide missile to spread fundamentalist propaganda remains fresh in the mind.

Aviation Accidents As a Systems Failure

The days since MH370's disappearance have played out, unsurprisingly, much like the Air France 447 saga. In that case, people said lightning had caused the carbon composite body to rupture. Others revisited bomb threats made against the airline months before the incident. A few entertained a catastrophic electrical system failure caused by the thunderstorm it flew through. The aircraft had flown through military territory; had it been accidentally shot down?

Of the various root causes proposed for the deaths of 228 people, few imagined that an airspeed sensor (pitot tube) rigorously designed to pass certification tests would fail at its function. Fault Code "34111506" had drawn first blood.

Discussion of the AF447 ACARS reading shown on French television

But the accident, they say, was still preventable. After ice crystals began to form on the pitot tubes, the instruments in the cockpit began giving erroneous readings. The autopilot disengaged, leaving the aircraft to be flown manually.

Making matters worse, the captain was in the back on his customary rest period at the time, and the least experienced of the three pilots had primary command of the airplane. Neither co-pilot recovered from his disorientation, let alone practiced proper flying technique. The pilot flying failed to recognize the impending stall and would not let go of the side stick. Without sufficient airspeed, the aircraft lost lift and plummeted into the black ocean with three confused pilots and a whole lot of lives on board.

After the series of investigations came to a close, most experts today agree that it was a systems failure. It was a failure that began with international regulatory bodies not mandating the rigorous training that commercial airline pilots need in order to act in the face of rare but dangerous circumstances. It was a failure that involved a sensitive cockpit control stick issue unique to Airbus, explained very well by Capt. "Sully" Sullenberger. Mixed with a cocktail of other occurrences such as bad weather and instrument malfunction, it was the perfect storm for an accident.

The crash of Concorde Flight 4590 was another systems failure. Ground-level entities failed to clear the runway of sharp metal debris that had the potential to do harm. The aircraft's tire ran over that debris, leading to a violent tire disintegration. Tire fragments struck the fuel cell. The fuel cell ruptured and threw fuel into two engine intakes, leading to stall and flameout. The fuel ignited, started a fire and rendered two engines useless. This led quickly to a violent loss of life and property.

Finally, one more quick example from the much celebrated "Cactus" 1549 landing in New York's Hudson River. What everyone knows by now (hopefully) is that the plane hit a flock of Canada geese at around 2800 ft, leading to both engines losing thrust and a decisive moment in which the captain chose to land in the river.

However, a Congressional hearing after the incident saw a Florida Republican make statements about the old and inadequate Air Traffic Control routes for getting planes out of LaGuardia, and raise a complaint that radar screen settings meant to "dumb down" clutter may also block out information such as a flock of birds. There were even remarks about the possibility of an inexperienced and unqualified air traffic controller making routing decisions in the absence of more experienced personnel.

Can you always control the flight path of geese so they don't hit your plane? No. But if takeoff routes from the runway are inflexible, and you combine that with poor information management, you have a problem that could become a safety issue down the line. This highlights a systems challenge.

Review Design Redundancy 

Aircraft are basically complex computers into which redundancies are built. But does that guarantee safety every time? A million times, probably yes. One or two times, possibly no. So how do you widen the Yes-No probability divide so that you know you're operating in a safe region almost all of the time?

That's where a second pair of eyes on your systems design helps.

Consider the example of the F-16 air combat fighter developed in the mid-1970s by General Dynamics. At the time, it was state of the art in fighter jets, incorporating the first "fly-by-wire" mode of operation, which means the mechanical cables that move the flight control surfaces were replaced by servoactuators controlled by electrical signals.

The F-16 development engineers designed quadruply redundant signals to each servoactuator, the reasoning being that the probability of losing all four electrical signals at once was extremely remote and that no single-point failure could cause such a loss.

Well, it turns out that after the design exercise, General Dynamics called together a separate group of engineers to analyze this design. This team found that although the F-16 included quadruple redundancy, should any of the common electrical connector plugs that these signals used fail, or should the harnesses carrying the signals be cut, all signal paths would be lost. The development engineers had missed that one.
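The review team's finding can be put into rough numbers. The sketch below is only an illustration: the function name and the probabilities are made-up assumptions, not F-16 data. It compares the chance of losing all four signals when the channels fail independently with the chance when they also share one element, such as a connector, whose failure takes out every channel at once.

```python
def p_all_signals_lost(p_channel: float, n_channels: int, p_shared: float = 0.0) -> float:
    """Probability that every redundant signal path is lost.

    p_channel  -- assumed failure probability of one channel (illustrative)
    n_channels -- number of redundant channels
    p_shared   -- assumed failure probability of an element common to all
                  channels, e.g. a shared connector (illustrative)
    """
    # All channels failing on their own, assuming independence.
    p_independent = p_channel ** n_channels
    # Total loss happens if the shared element fails OR all channels fail:
    # P(A or B) = P(A) + P(B) - P(A)P(B) for independent events A and B.
    return p_shared + p_independent - p_shared * p_independent


# Hypothetical numbers, chosen only to show the effect.
independent_only = p_all_signals_lost(1e-3, 4)
with_shared_plug = p_all_signals_lost(1e-3, 4, p_shared=1e-6)

print(f"four independent channels: {independent_only:.3e}")
print(f"same, plus a shared plug:  {with_shared_plug:.3e}")
```

With these assumed values the shared plug dominates the result, and adding more channels behind the same plug would barely change it. That is the kind of single-point weakness an independent review is meant to catch.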

Was it a significant increase in cost to go back to the drawing board and correct that design? Yes. But it didn't cost as much as a life.

In the late 1980s, Boeing decided to do away with a heavy and obtrusively large pair of plug-type 9ft x 9ft cargo doors on the starboard side of the 747 jumbo jet's belly. The original thinking behind this design was that since plug doors open inwards and wedge into the passageway, they would provide an extra measure of safety against a breach of the fuselage while the aircraft was pressurized.

But the plug door was heavy and it had wide tracks. The new design would be a lighter gull wing door, that is, one that would swing out and up.

To make it function, they built into it three rotary actuators and a complex system of aluminum C-shaped latches to allow the door to open, close and lock. An operator could depress the close button and have the door shut in about 15 seconds. Manually depressing the latch lock handle in the middle of the door would be the final step in locking it. This final step would also isolate the opening/closing control circuits from electricity. Should the motors malfunction, a worker could manually operate the latching mechanism using a socket drive.
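The intended sequence described above can be sketched as a small state model. This is a deliberately simplified illustration, not Boeing's actual control logic: the class name, method names and state flags are all hypothetical. It captures only the idea that setting the lock handle is supposed to cut power to the open/close circuits.

```python
from dataclasses import dataclass


@dataclass
class CargoDoorModel:
    """Hypothetical, simplified model of the door's intended locking sequence."""
    closed: bool = False
    latched: bool = False
    locked: bool = False

    @property
    def control_power_isolated(self) -> bool:
        # Assumption from the description: setting the lock handle
        # isolates the open/close control circuits from electricity.
        return self.locked

    def press_close(self) -> None:
        """Operator presses the close button; motors shut and latch the door."""
        if self.control_power_isolated:
            return  # circuits are dead, the button does nothing
        self.closed = True
        self.latched = True

    def set_lock_handle(self) -> None:
        """Manual final step; only effective on a closed, latched door."""
        if self.closed and self.latched:
            self.locked = True

    def electrical_open_command(self) -> bool:
        """Return True if an electrical open signal actually moves the door."""
        if self.control_power_isolated:
            return False  # intended behavior: a locked door ignores the signal
        self.latched = False
        self.closed = False
        return True


door = CargoDoorModel()
door.press_close()
door.set_lock_handle()
print(door.electrical_open_command())  # intended design: the signal is ignored
```

In this idealized model a locked door can never be driven open electrically. The safety of the real door rested on that isolation step working every time.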

I obtained the locking sequence from a video released by FAA and it is shown below.

Sequence of events in the opening of 811's cargo door

Unfortunately, trouble brewed after the design entered service. A number of warnings about failing or improperly functioning door systems did not prevent the tragedy that was to come on February 24, 1989. Around 2:00 A.M., on what was a routine flight by United Airlines 811 from Hawaii, a thunderous boom shot 11 million pounds of pressurized cabin air through a gaping 13ft x 15ft hole in the fuselage. In the blink of an eye, nine passengers were sucked out to their deaths.

The pilot heroically landed the plane. A larger disaster was averted, but the finger pointing soon began, with blame directed at 14 separate instances of manual door operation by technicians, which investigators theorized could have damaged the door locks on this particular aircraft.

The actual clues to what happened lay not just at the bottom of the Pacific Ocean.

After recovering the blown-out door, investigators were alerted to a well-timed incident at Kennedy International Airport in June 1991. It was discovered that just after initial door closure on a 747-200, a stray electrical signal was able to rotate the cargo door latches open and move the 800 pound door up! This corresponded exactly with observations from the recovered cargo door, which showed that the latch cams had been moved to their open positions, deforming the C-shaped aluminum latches that would otherwise have stopped the cams from rotating.

Aftermath of United 811 rapid decompression

The door was a complex electromechanical setup, and the weakest link was the faulty electrical wiring that permitted stray signals to actuate the door in flight. Surrounding this design was the larger system of all the maintenance personnel who operated the door day in and day out and expected their actions to be safe. When a few warnings related to those doors arose in the late 80s, all the OEM did was criticize the ground operators rather than pin down where the stray signal came from.

Trying to design for every possible thing that could go wrong is an exercise in futility. Anything more expensive than it needs to be won't sell. But aviation's strategy of building in redundancies has worked out well for a number of years.

Design rarely happens in a vacuum. An engineer would do well to appreciate a systems-level view of design and recognize that all the paperwork, documentation, procedures, eventual use (intended or unintended) and the complex Information Management System that manages it go with that design.

Numerous air safety incidents have taught hard lessons that have changed the industry forever. Though these tragic accidents are rare, the consequences are very damaging. However, reviewing initial designs against well-formed evaluation criteria and carrying the lessons learned into future designs can continue to make aviation safer.

Let us hope for the very best for all the families and people connected to Flight MH370.
