Some proponents of other VoIP encryption schemes say it offends their sensibilities to see ZRTP negotiate the cryptographic keys in the media stream instead of in the signaling layer, as their own schemes do. They call it a “layer violation”. But to me (and to a number of other protocol designers I’ve talked to), it seems clear that the signaling layer should handle its own key negotiation for signaling authentication, and the media layer should negotiate its own keys for media encryption. Each layer should take care of its own cryptographic needs. If anything, negotiating the media encryption keys in the signaling layer is the real layer violation.
In the same vein, I don’t feel that the VoIP service providers can always be trusted to act with my interests in mind, so I don’t want to involve their SIP servers in my encryption key negotiations. If I want to speak Navajo with my friend on the phone, I shouldn’t have to clear it first with the phone company. It’s just none of their business. And that’s part of what makes ZRTP so broadly appealing.
It’s also worth noting that traditional secure telephones in the PSTN world, such as the AT&T TSD-3600 or the STU-III, did all their key negotiation in the media stream. They used a modem to establish a digital channel over a normal voice-grade phone line, then negotiated their keys and sent an encrypted voice stream, all on that same channel. No one called it a layer violation. This is how secure phones always worked before VoIP came along.