How a single flight plan with unexpected waypoint data caused a meltdown of the UK's air traffic control system

  1. onyxleopard

    This part was interesting to me:

    The software was incapable of extracting the UK portion of the ICAO flight plan, even though the flight plan was apparently valid (at least according to IFPS).

    The procedure was very fiddly and failed for a silly reason.

    Waypoint markers are not globally unique. This is a known issue that every other air traffic control authority also has to deal with, so NATS should make sure its systems are robust enough to handle it. NATS says the following about this in the report:

    Although there has been work by ICAO and other bodies to eradicate non-unique waypoint names there are duplicates around the world. In order to avoid confusion latest standards state that such identical designators should be geographically widely spaced. In this specific event, both of the waypoints were located outside of the UK, one towards the beginning of the route and one towards the end; approximately 4000 nautical miles apart.

    When waypoints with the same name are widely spaced, flight plans stay unambiguous: successive waypoints in a flight plan cannot be too far apart, so the intended waypoint is the one that lies close to its neighbours in the route. They also mention possible actions they will take:

    The feasibility of working through the UK state with ICAO to remove the small number of duplicate waypoint names in the ICAO administered global dataset that relate to this incident.
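
    The proximity argument above can be made concrete. The following is a minimal sketch, not the algorithm used by NATS, IFPS or any other real system. It assumes a hypothetical gazetteer mapping each waypoint name to every (latitude, longitude) position that uses it, a known starting position, a spherical Earth, and an arbitrary 1000 NM plausibility limit per leg (`max_leg_nm`). Each name is resolved to the candidate closest to the previously resolved point. The names `AAAAA`, `BBBBB`, `DUPLX` and their coordinates are invented.

```python
import math

# Mean Earth radius in nautical miles (assumption: spherical Earth).
EARTH_RADIUS_NM = 3440.065


def great_circle_nm(a, b):
    """Haversine distance between two (lat, lon) points in degrees, in nautical miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def resolve_route(names, gazetteer, start, max_leg_nm=1000.0):
    """Resolve each (possibly duplicated) name to the candidate position
    nearest the previously resolved point; reject implausibly long legs."""
    resolved = []
    previous = start
    for name in names:
        candidates = gazetteer.get(name)
        if not candidates:
            raise KeyError(f"unknown waypoint {name!r}")
        best = min(candidates, key=lambda p: great_circle_nm(previous, p))
        if great_circle_nm(previous, best) > max_leg_nm:
            raise ValueError(f"no plausible candidate for {name!r}")
        resolved.append((name, best))
        previous = best
    return resolved


# Hypothetical dataset: DUPLX names two points thousands of miles apart.
gazetteer = {
    "AAAAA": [(50.0, -5.0)],
    "BBBBB": [(51.0, -3.0)],
    "DUPLX": [(52.0, -1.0), (10.0, 100.0)],
}
route = resolve_route(["AAAAA", "BBBBB", "DUPLX"], gazetteer, start=(49.0, -6.0))
print(route[-1])  # ('DUPLX', (52.0, -1.0))
```

    Picking the nearest candidate only works because duplicates are meant to be widely spaced, as the report says; a name whose nearest candidate is still implausibly far away is rejected rather than guessed.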

    Waypoint names are clearly chosen to be short and snappy. Here's a sequence from some flight plan I found: KOMAL, ATRAK, SORES, SAKTA, ALMIK, IGORO, ATMED, etc. It's clear that the system has been designed so these names can be communicated quickly, e.g. over radio, and that pilots and air traffic controllers can become familiar with those on the routes they usually fly. Changing the name of a waypoint can be a scary operation. Uniqueness is obviously desirable, but it has to be balanced against other considerations. Including this suggestion in the initial report feels like NATS is trying to shift the blame onto ICAO.

    Furthermore, I don't see why a flight plan can't include the same geographic waypoint several times, for example for leisure flights or military exercises. Taking off and landing at the same airport is definitely a thing (called a "round-robin flight plan"). It doesn't sound like the FPRSA-R algorithm would be very robust to that.
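
    To illustrate why a repeated waypoint is a hazard for naive lookups, here is a hypothetical sketch; it is not FPRSA-R's actual logic, which is not described here. It extracts the part of a route between an assumed entry and exit waypoint. Python's `list.index` returns the first occurrence of a name, so on a round-robin route that passes the same waypoint twice, the naive version can find an "exit" that precedes the "entry". Searching for the exit only after the entry avoids that particular failure. The route and names are invented.

```python
def segment_naive(route, entry, exit_):
    """Fragile: each lookup returns the FIRST occurrence of the name."""
    start = route.index(entry)
    end = route.index(exit_)
    if end < start:
        raise ValueError(f"exit {exit_!r} found before entry {entry!r}")
    return route[start:end + 1]


def segment_ordered(route, entry, exit_):
    """Search for the exit only after the entry, so an earlier occurrence
    of the same name (e.g. on the outbound leg) is ignored."""
    start = route.index(entry)
    end = route.index(exit_, start + 1)
    return route[start:end + 1]


# Hypothetical round-robin route: departs from and returns to HOME,
# passing AAAAA on both the outbound and the inbound leg.
route = ["HOME", "AAAAA", "BBBBB", "CCCCC", "AAAAA", "HOME"]

print(segment_ordered(route, "BBBBB", "AAAAA"))  # ['BBBBB', 'CCCCC', 'AAAAA']
try:
    segment_naive(route, "BBBBB", "AAAAA")
except ValueError as exc:
    print("naive lookup failed:", exc)
```

    The ordered search is still naive if the entry name itself occurs more than once; the point is only how an assumption of uniqueness turns a legal plan into an error.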

    NATS officials are trying to spin this as:

    An air traffic meltdown in Britain was caused by a "one in 15 million" event, the boss of traffic control provider NATS said, as initial findings showed how a single flight plan with two identically labelled markers caused the chaos.

    "This was a one in 15 million chance. We've processed 15 million flight plans with this system up until this point and never seen this before," NATS CEO Martin Rolfe told the BBC, as airlines stepped up calls for compensation for the breakdown. (Reuters)

    The system was put in place in 2018, so what Martin Rolfe is effectively saying is that, at the volume of flight plans it handles, this sort of failure could be expected roughly "once every 5 years", which is apparently an acceptable frequency for a complete air traffic control meltdown.
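
    The arithmetic behind that reading, taking Rolfe's figures at face value (15 million plans over the roughly five years since 2018, and a per-plan failure probability of one in 15 million), works out as follows:

```latex
\begin{aligned}
\text{throughput} &\approx \frac{15 \times 10^{6}\ \text{plans}}{5\ \text{years}}
  = 3 \times 10^{6}\ \text{plans/year} \approx 8{,}200\ \text{plans/day} \\
\text{expected failures per year} &\approx 3 \times 10^{6} \times \frac{1}{15 \times 10^{6}}
  = 0.2 \quad\Longrightarrow\quad \text{about one every } 5\ \text{years}
\end{aligned}
```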

    Ultimately, this seems like a namespace collision issue. The fact that we can't solve this by simply appending numerals to duplicate ADEXP waypoint names is baffling to me. If the namespace is global, even if most human pilots never encounter waypoints beyond their local airspace, then there should be a globally unique set of waypoint identifiers. Besides the waypoint naming standard being inadequate, it also amazes me that safety-critical software is of such low quality that a single thrown exception could bring the whole thing down. That implies the system isn't being analyzed for basic code quality issues.
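
    The two fixes suggested here can be sketched as follows. This is a minimal illustration under stated assumptions, not how ADEXP, ICAO or NATS actually handle identifiers. `make_unique_ids` is a hypothetical helper that keeps names occurring once and appends a numeral to each duplicate in order of appearance, refusing to emit an identifier that clashes with one already in use. `process_all` is a hypothetical batch loop that confines a failure to the single plan that caused it, assuming such a plan can safely be set aside for manual handling.

```python
from collections import Counter


def make_unique_ids(names):
    """Return one identifier per dataset entry. Names that occur once are kept;
    duplicates get a numeric suffix (DUPLX1, DUPLX2, ...) in order of appearance."""
    totals = Counter(names)
    used = set(totals)  # every original name counts as taken
    seen = Counter()
    ids = []
    for name in names:
        if totals[name] == 1:
            ids.append(name)
            continue
        seen[name] += 1
        candidate = f"{name}{seen[name]}"
        if candidate in used:
            raise ValueError(f"suffixed id {candidate!r} clashes with an existing id")
        used.add(candidate)
        ids.append(candidate)
    return ids


def process_all(plans, process_one):
    """Process plans independently: a plan that raises is set aside together
    with its error instead of halting the whole batch."""
    accepted, rejected = [], []
    for plan in plans:
        try:
            accepted.append(process_one(plan))
        except Exception as exc:  # deliberately broad: contain any per-plan failure
            rejected.append((plan, exc))
    return accepted, rejected


print(make_unique_ids(["DUPLX", "AAAAA", "DUPLX"]))  # ['DUPLX1', 'AAAAA', 'DUPLX2']

# Stand-in processor: the plan 0 raises ZeroDivisionError and is set aside.
accepted, rejected = process_all([2, 0, 5], lambda n: 10 // n)
print(accepted, len(rejected))  # [5, 2] 1
```

    Suffixes assigned this way depend on dataset order, so a real scheme would need them allocated once and kept stable. Whether setting a plan aside is safer than halting is a safety-case question the sketch does not answer.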
