Editor’s Note: A few weeks back I was reading an article about recent plane crashes and disappearances. One of those pieces discussed the concept of triple redundant fail-safe measures to prevent such tragedies. It got me thinking about the necessity of redundancies in complex systems and how that relates to cities. I posed the thought on Twitter and a fellow named Chris Storm chimed in. I asked him to expand on the analogy since he’s an expert in the field. The following is his response. It’s great. I just made one minor edit where he puts the punchline ahead of the joke.
People rarely speak of cities in terms of systems engineering. But just as an aircraft comprises various structural, propulsion, and avionics sub-systems, a city is an organization of multiple, interdependent sub-systems. Both systems may be prone to failures that can propagate throughout and cause the system as a whole to fail. In an aircraft, a failure could cause a loss of functionality or even loss of the aircraft. Likewise, a city can lose its function and, ultimately, its purpose. Unlike an aircraft, however, most cities do not purposely feature triple-redundant sub-systems and other reliability features typical of modern aircraft designs. But what if they did?
* * *
Aerospace engineers aim to prevent failures and improve reliability through the aircraft design process. For example, engineers can use analytical techniques, such as failure mode and effects analysis (FMEA), to demonstrate that an aircraft satisfies various safety and reliability requirements.
Under an FMEA approach, system engineers analyze failures based on the failure’s causes, modes, and effects. A failure cause is the underlying cause or sequence of causes that (ultimately) results in a failure mode. A failure mode is the specific manner by which failure of a component occurs. A failure effect is the immediate consequence of a failure mode. Failure effects may propagate up a system, such that a failure effect in a local component may lead to a failure at the next-higher level and a failure at the total-system level. A failure cause may be associated with multiple potential failure modes, and each failure mode may be associated with different failure effects.
System risk may be a function of (1) the probability of failure causes and (2) the severity of potential failure effects. System risk might be high, for example, if a failure mode has the potential for a catastrophic failure effect on the system, even if the failure cause leading to this failure mode is extremely unlikely to occur. Likewise, system risk might be high if low-severity failure effects are expected to occur frequently. Engineers may reduce this system risk using a myriad of techniques. Examples include:
- Eliminating potential causes of failure.
- Eliminating unacceptable failure modes.
- Limiting how failure modes propagate through the system.
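To make the two ingredients of system risk concrete, here is a minimal Python sketch. It is only an illustration, not a formal FMEA method: the four-step severity scale, the `HIGH_RISK_THRESHOLD` cutoff, and the `risk_level` function are all hypothetical names and numbers chosen for this example. The one rule taken from the discussion above is that a catastrophic effect keeps risk high however unlikely its cause, while frequent low-severity effects can also add up to high risk.

```python
# Hypothetical severity scale, assumed for illustration only.
MINOR, MARGINAL, CRITICAL, CATASTROPHIC = 1, 2, 3, 4

# Hypothetical cutoff on probability * severity, also an assumption.
HIGH_RISK_THRESHOLD = 0.5


def risk_level(probability: float, severity: int) -> str:
    """Classify risk as "high" or "low" from likelihood and severity."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if severity not in (MINOR, MARGINAL, CRITICAL, CATASTROPHIC):
        raise ValueError("unknown severity level")
    # A catastrophic effect keeps risk high no matter how unlikely the cause.
    if severity == CATASTROPHIC:
        return "high"
    # Otherwise, frequent low-severity effects can still add up to high risk.
    return "high" if probability * severity >= HIGH_RISK_THRESHOLD else "low"


if __name__ == "__main__":
    print(risk_level(1e-6, CATASTROPHIC))  # rare cause, catastrophic effect
    print(risk_level(0.9, MINOR))          # frequent cause, minor effect
    print(risk_level(0.01, MINOR))         # rare cause, minor effect
```

Under these assumed numbers, the first two cases come out high and the third comes out low, which mirrors the two ways risk can be high described above.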
Consider a chain on an escalator. Although designers may reduce the likelihood of, or even eliminate, many potential causes of failure, eliminating all causes of failure is simply not feasible. Designers may, however, eliminate failure modes that could result in catastrophic failure effects (injury to passengers) and accept failure modes that could result in non-catastrophic failure effects. Or as Mitch Hedberg famously said, an escalator can never break; it can only become stairs.
Designers may also limit how failure modes propagate throughout a system by providing redundant components and sub-systems. In the escalator example, an escalator chain may include redundant links so that failure of a single link does not result in a failure at the next-higher level (the chain). An escalator may also include multiple chains so that failure at the next-higher level (the chain) does not result in a failure at the total-system level (the escalator).
In some aircraft, redundant capability (even triple-redundancy) may be required for flight-critical sub-systems. For example, consider an aircraft with a fly-by-wire flight control system. Although component and sub-system testing may be suitable for eliminating many potential causes of failure and some unacceptable failure modes, electronic equipment could be subject to random failures that cannot be eliminated through testing and component design. In this example, the potentially-catastrophic failure effect may cause the system risk to be high even though the likelihood of a random failure may be low. Redundancy, however, may prevent such random failures within the flight control system from propagating up and causing catastrophic failure of the aircraft.
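A rough back-of-the-envelope sketch shows why redundancy is so effective against random failures. It assumes, purely for illustration, that each redundant channel fails independently and with the same probability; real systems rarely satisfy that assumption perfectly. The function name `all_channels_fail` and the 1-in-100 figure are hypothetical and not drawn from any actual aircraft.

```python
def all_channels_fail(p_channel: float, channels: int) -> float:
    """Probability that every redundant channel fails at once.

    Assumes independent channels with identical failure probability,
    which is a simplifying assumption made for this sketch.
    """
    if not 0.0 <= p_channel <= 1.0:
        raise ValueError("p_channel must be between 0 and 1")
    if channels < 1:
        raise ValueError("need at least one channel")
    return p_channel ** channels


if __name__ == "__main__":
    p = 0.01  # hypothetical chance that a single channel fails
    for n in (1, 2, 3):
        print(n, "channel(s):", all_channels_fail(p, n))
```

With the assumed 1-in-100 chance per channel, a triple-redundant arrangement loses all three channels together only about once in a million cases, so a random fault in one channel is very unlikely to propagate up to the aircraft level.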
We can change how system risk is analyzed by changing how the “total system” is defined. In the aircraft example, it’s easy to see how the aircraft represents the total system. Escalators, on the other hand, may represent a sub-system within a larger system. After all, escalators are rarely the only means of changing floors in a building. Elevators and stairs also provide this function. Thus, if we define the “total system” as the system for transporting people between floors, then the escalator, elevator, and stairs may represent redundant subsystems that each can provide this function.
This brings us back to city building. If engineers want to improve reliability of a city’s road system, for example, they can reduce congestion by increasing vehicular capacity and add redundancy by providing alternative routes. This reasoning, taken to its extreme, can prompt some city planners to build new highways parallel to existing highways through urban areas (not to mention any names).
But the road network is not the total system in a city, just as an escalator is not the total transportation system in a building. As stated at the outset, a city comprises a multitude of interacting sub-systems, including commerce, residential, entertainment, education, public safety, and transportation systems. Failure effects arising out of one sub-system can be potential causes of failure in other sub-systems.
Therefore, city builders should provide redundancy within the city’s sub-systems to limit how far failures can propagate. Consider a road network within a city’s transportation system. A traffic accident on a highway might be identified as a potential cause of failure. Traffic congestion is an obvious failure mode resulting from this cause of failure. Due to the limited-access nature of highways, the traffic failure propagates across the highway system, and the surrounding surface streets may provide only limited redundancy. The severity of this failure mode may depend on just how important this highway segment is to the overall road system. The system risk increases as more people rely on this particular highway segment for transit.
Unlike a limited-access highway, a grid network of surface streets has inherent redundant capability to handle traffic accidents. But from an individual’s perspective, failure modes exist that eliminate capability of the entire road network. For example, if the individual does not have a car or has a car that is not in working order, then the inherent redundancy of the road network is fairly irrelevant. Mass transit may offer redundant capability to a city’s road network and may even be the primary mode of transportation for some commuters. In some cities, however, the same failure causes may result in failure modes for both the road and mass transit transportation systems. A Texas ice storm, for example, can shut down city streets and prevent light-rail lines from operating. As another example, neighborhood disruptions can shut down local access and prevent both road and rail service.
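The ice storm example is what reliability engineers would call a common-cause failure, and a small sketch shows how much it erodes redundancy. The model below is a simplification assumed for illustration: a shared event disables both systems with probability `p_common`, and otherwise the road and transit systems fail independently. The function name and all of the numbers are hypothetical.

```python
def both_fail(p_road: float, p_transit: float, p_common: float = 0.0) -> float:
    """Probability that road and transit are both unavailable.

    p_common is the assumed chance of a shared event (such as an ice
    storm) that takes out both systems at once. When no shared event
    occurs, the two systems are assumed to fail independently.
    """
    for p in (p_road, p_transit, p_common):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return p_common + (1.0 - p_common) * p_road * p_transit


if __name__ == "__main__":
    # Hypothetical figures: each system is down 10% of the time on its own.
    print("independent only:", both_fail(0.1, 0.1))
    print("with shared cause:", both_fail(0.1, 0.1, p_common=0.05))
```

In this made-up case, a shared cause that strikes only 5 percent of the time raises the chance of losing both systems from about 1 percent to about 6 percent. Redundant systems help most when they do not share failure causes.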
Therefore, to achieve triple redundancy similar to that of a flight-critical aircraft system, a city must enable cycling and walking systems to provide capability similar to that of its road and mass transit systems. For short trips, this goal is very achievable. The closer people live to their jobs, for example, the easier it is for a city to provide a triple-redundant transportation system. Long highway commutes, on the other hand, prevent triple redundancy. Walking and cycling are not viable options for long, daily commutes. Likewise, mass transit is less efficient over longer distances and is unlikely to provide redundant capability to the majority of users.
Chris Storm is an Assistant General Counsel at Bell Helicopter, where he specializes in intellectual property and technology commercialization. Chris advises Bell’s engineers and business leaders on a variety of matters, including Bell Helicopter’s development of the world’s first fly-by-wire commercial helicopter, the “Bell 525 Relentless.” Chris received his B.S. in Aerospace Engineering from The University of Texas and his J.D. from the University of Houston Law Center. Chris also expects to receive an M.S. in Technology Commercialization from The University of Texas in 2016.