DTLS-SRTP is a derivative of TLS, a protocol designed for web browsers to talk to web servers, using a centrally managed PKI. A client-server protocol like TLS can work well in a client-server environment, but a phone call between two human beings is an ad-hoc peer-to-peer relationship, and the cryptographic key negotiations should reflect that. Instead of recycling a client-server protocol, ZRTP is purpose-built for VoIP. All these cryptographic protocols have a goal of negotiating keys in a way that stops man-in-the-middle (MiTM) attacks. To accomplish this, ZRTP doesn’t need a PKI, and we don’t need help from servers controlled by the phone company. Instead, ZRTP has two humans verbally compare a short authentication string to detect if there is a MiTM. Human beings can readily see if there is a MiTM by direct evidence and common sense. ZRTP harnesses the immense resources at both endpoints, which each have a brain with a hundred trillion synapses and the unique power of human intuition.
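To make the short authentication string idea concrete, here is a minimal sketch of why a verbal comparison exposes a MiTM. It is a simplified illustration, not the actual ZRTP derivation: the toy Diffie-Hellman parameters (`P`, `G`), the function names, and the way the string is cut from a SHA-256 digest are all assumptions made for this example. Real ZRTP uses standardized groups, its own key derivation, and a hash commitment that this sketch omits.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption for illustration only;
# far too small for real use).
P = 2**127 - 1
G = 3

# 32-symbol alphabet used to render 5 bits per character (z-base-32 style).
ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def keypair():
    """Return a random (private, public) Diffie-Hellman key pair."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def shared_secret(my_priv, their_pub):
    """Compute the Diffie-Hellman shared secret."""
    return pow(their_pub, my_priv, P)


def short_auth_string(secret):
    """Hash the shared secret and render its leftmost 20 bits as 4 characters."""
    digest = hashlib.sha256(secret.to_bytes(16, "big")).digest()
    bits = int.from_bytes(digest[:4], "big") >> 12  # keep the leftmost 20 bits
    return "".join(ALPHABET[(bits >> shift) & 31] for shift in (15, 10, 5, 0))


def demo():
    alice_priv, alice_pub = keypair()
    bob_priv, bob_pub = keypair()
    mallory_priv, mallory_pub = keypair()

    # Direct call: both ends derive the same secret, so the strings agree.
    direct = (
        short_auth_string(shared_secret(alice_priv, bob_pub)),
        short_auth_string(shared_secret(bob_priv, alice_pub)),
    )

    # MiTM: Mallory substitutes her public key in both directions, so Alice
    # and Bob each share a *different* secret with Mallory.
    attacked = (
        short_auth_string(shared_secret(alice_priv, mallory_pub)),
        short_auth_string(shared_secret(bob_priv, mallory_pub)),
    )
    return direct, attacked


if __name__ == "__main__":
    direct, attacked = demo()
    print("direct call:", direct)
    print("with MiTM:  ", attacked)
```

In the direct case the two callers read out identical strings. With an attacker in the middle, the two strings are derived from different secrets and will almost always differ, which is what the two humans notice when they compare them aloud.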
DTLS-SRTP tries to repurpose itself to VoIP’s peer-to-peer environment, but it cannot escape its client-server roots, and that’s why it depends so completely on the SIP servers to secure the connection. DTLS-SRTP’s MiTM protection collapses in the absence of end-to-end integrity protection in the SIP layer. The only mechanism for this in SIP (besides S/MIME, which has been around for 6 years without any implementation) is Enhanced SIP Identity (RFC 4474). However, it turns out that if you are using your SIP phone to call a regular phone number, then RFC 4474 doesn’t provide integrity protection, and MiTM protection for DTLS-SRTP collapses. Why? Because for a regular phone number, the SIP identity is of the form sip:+email@example.com asserted by example.com. A MiTM can just remove the RFC 4474 signature, change the a=fingerprint, then re-sign the identity as sip:+firstname.lastname@example.org asserted by example2.com. How does the callee know that this phone number actually originates from example.com and not example2.com? There is no way to tell; hence, DTLS-SRTP has no protection from MiTM attacks in this case. Regular phone numbers will be commonly used as identifiers for SIP phone customers for a long time, so this will continue to be a major security weakness for DTLS-SRTP.
Even if this problem with regular phone numbers is somehow solved, we are still left with the Elephant in the Room: in the final analysis, the security of DTLS-SRTP requires a PKI. The PKI dependency will either be contained within DTLS-SRTP itself or within SIP, because of the DTLS-SRTP dependency on SIP end-to-end integrity. All SIP end-to-end integrity mechanisms require a PKI, and all the complexity and bureaucracy that implies. Many years of experience in the crypto industry lead us to believe that PKI is an inappropriate approach to achieving media security in VoIP.
Some vendors who plan to implement DTLS-SRTP products say they will use self-signed public key certificates if no PKI is available. But a self-signed certificate offers no protection against a MiTM attack. If they don’t use a PKI, and have no other MiTM attack countermeasure, such as key continuity or a short authentication string, it just won’t be a secure phone.
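As a rough illustration of what key continuity could look like for an endpoint that only has self-signed certificates, here is a minimal sketch in the style of SSH's known-hosts check. The class name `KeyContinuityCache`, its return values, and the choice of pinning a SHA-256 fingerprint of the certificate bytes are all assumptions for this example. It is not how any particular DTLS-SRTP product works, and ZRTP's own key continuity is built differently.

```python
import hashlib


class KeyContinuityCache:
    """Remember the certificate fingerprint first seen for each peer.

    Hypothetical helper: a later call that presents a different
    certificate for the same peer is flagged as a possible MiTM.
    """

    def __init__(self):
        self._pins = {}  # peer identifier -> pinned fingerprint (hex)

    def check(self, peer_id, cert_bytes):
        fingerprint = hashlib.sha256(cert_bytes).hexdigest()
        pinned = self._pins.get(peer_id)
        if pinned is None:
            # First contact: nothing to compare against, so this call is
            # unprotected unless verified some other way.
            self._pins[peer_id] = fingerprint
            return "first-contact"
        return "match" if pinned == fingerprint else "mismatch"


if __name__ == "__main__":
    cache = KeyContinuityCache()
    print(cache.check("alice", b"alice-self-signed-cert"))  # first-contact
    print(cache.check("alice", b"alice-self-signed-cert"))  # match
    print(cache.check("alice", b"substituted-cert"))        # mismatch
```

Note that the first call to a given peer gets no protection from this check alone, which is why a self-signed certificate with no additional countermeasure leaves the call open to a MiTM.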
Although far less important than the aforementioned pachyderm, here’s another strike against this protocol: DTLS-SRTP must bear the additional cost of a signature calculation of its own, in addition to the signature calculation the SIP layer uses to achieve its integrity protection. ZRTP needs no signature calculation of its own to leverage the signature calculation carried out in the SIP layer. This may be relevant in low-power mobile platforms, or in highly loaded servers.