Is the Short Authentication String (SAS) vulnerable to an attacker with voice impersonation capabilities?

In practical terms, no. It is a mistake to think this is simply an exercise in voice impersonation (perhaps this could be called the “Rich Little” attack). Although digital signal processing techniques exist for changing a person’s voice, that does not mean a man-in-the-middle (MitM) attacker can safely break into a phone conversation and inject his own SAS at just the right moment. He doesn’t know exactly when or in what manner the users will choose to read the SAS aloud, in what context they will bring it up, which of the two speakers will say it, or whether both will say it. In addition, some methods of rendering the SAS use a list of words, notably the PGP word list, in a manner analogous to how pilots use the NATO phonetic alphabet to convey information. This makes the attack even more complicated, because these words can be worked into the conversation in unpredictable ways. If the session also includes video, the MitM attacker may be further deterred by the difficulty of making the lips sync with the voice-spoofed SAS. Remember that the attacker places a very high value on not being detected, and if he makes a mistake, he doesn’t get to do it over.
To further reduce the likelihood of a voice impersonation attack, we recommend that both parties verbally repeat the SAS if they feel that the call is likely to invite the attention of an especially resourceful opponent who is willing to take risks. We also recommend that, if the user interface permits, the SAS be rendered via the PGP word list instead of base-32 digits.
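To make the two rendering options concrete, here is a minimal Python sketch. It rests on several assumptions not stated above: that the SAS is derived from a 32-bit value, that the base-32 form shows its leftmost 20 bits as four characters of the z-base-32 alphabet, and that the word form shows its leftmost 16 bits as two words, the first taken from an "even" list and the second from an "odd" list. The function names are hypothetical, and the word lists are passed in as parameters rather than reproduced here.

```python
# z-base-32 alphabet (assumed here to be the base-32 variant used for the SAS).
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_base32(sas_value: int) -> str:
    """Render the leftmost 20 bits of a 32-bit SAS value as four base-32 characters."""
    top20 = (sas_value >> 12) & 0xFFFFF
    # Four 5-bit groups, most significant group first.
    return "".join(ZBASE32[(top20 >> shift) & 0x1F] for shift in (15, 10, 5, 0))


def sas_words(sas_value: int, even_words, odd_words) -> str:
    """Render the leftmost 16 bits of a 32-bit SAS value as two words.

    even_words and odd_words are caller-supplied 256-entry lists
    (for example, the two halves of the PGP word list).
    """
    first_byte = (sas_value >> 24) & 0xFF
    second_byte = (sas_value >> 16) & 0xFF
    return f"{even_words[first_byte]} {odd_words[second_byte]}"
```

With placeholder lists such as `[f"even{i}" for i in range(256)]` and `[f"odd{i}" for i in range(256)]`, `sas_words(0x01020000, ...)` yields `"even1 odd2"`, while `sas_base32(0)` yields `"yyyy"`. Two ordinary words are easier to slip into natural speech than four arbitrary characters, which is the property the recommendation above relies on.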
Some people have asked whether, even if the attacker lacks voice impersonation capabilities, it may be unsafe for people who don’t know each other’s voices to depend on the SAS procedure. This is not as much of a problem as it seems. It isn’t necessary that they recognize each other by voice; it’s only necessary that they detect that the voice used for the SAS procedure matches the voice in the rest of the phone conversation.

  • 12-Jun-2017