Musings of a telco vendor about the cloud thing

Over the last few days, I have been deploying and testing our open-source VoIP soft-switch SPCE v2.2-rc1 on an Amazon EC2 instance in order to provide a ready-to-run AMI for our community.

Coincidentally, I came across this tweet from @martingeddes:

When you hear a vendor selling you “cloud”, remember what they really have on offer is “fog”.

The funny thing is that although I had just managed to get our SPCE to work on “the cloud” (and mind you, I was pretty enthusiastic about it), the tweet finally expressed in one sentence the ambivalence I have felt about it for quite some time now.

What’s the fuss about “Cloud Telephony”?

Cloud Telephony is a pretty hot topic at the moment in the web development world. It started with Ribbit a couple of years ago (and it took me about three years to “get” what they were doing) and gained a lot of traction with the emergence of Twilio and Tropo. Telephony was considered quite a boring topic (who really cared about integrating a Java applet into a website?) until various APIs provided really easy access to telephony features.

So with all this asynchronous access to telephony APIs, a whole lot of telephony applications, tightly integrated into websites, are popping up everywhere on the Internet. It’s really becoming a whole new ecosystem. Remember that you could do the same thing with Asterisk more than five years ago? Admittedly, it’s much easier now.

But honestly, do you really know what’s going on behind the scenes when you issue a request to call somebody?

How traditional ITSPs operate

If you’re looking for broadband offerings, you most likely get a triple-play bundle (Internet, telephony, and mobile or IPTV). Usually, all ISPs except the incumbents offer telephony via IP, even if you don’t know it. They lease the last mile from the incumbent (DSL) or have their own access networks (cable, WiMAX, Wi-Fi), and telephony is just another service on top of their IP network. Most routers, EMTAs, modems etc. provide a phone jack, so you can just reuse your old phone. More than 90% of residential customers don’t care about the underlying technology; they just want to use their phones. And that’s fine.

But do you know what it takes for a telephony switch vendor to get deployed at an ITSP to route those calls? Telephony networks are (for good reason) still renowned for operating at five nines, which means an availability of 99.999% per year. Yeah, that’s roughly five and a quarter minutes of downtime. Per year. And that’s also fine, since you don’t want to choke on something while waiting for your ITSP to get the phone service back up after a software failure.
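If you want to check the five-nines arithmetic yourself: a (non-leap) year has 365 × 24 × 60 = 525,600 minutes, and 0.001% of that is about 5.26 minutes. Here’s a minimal Python sketch of that calculation; the function name is just something I made up for illustration, and it assumes a 365-day year.

```python
# Yearly downtime budget for a given availability target.
# Assumption: a 365-day year (525,600 minutes); leap years add a little.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Return the allowed downtime in minutes per year for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
        print(f"{label:>11} ({a:.3%}): {downtime_minutes_per_year(a):8.2f} min/year")
```

Every additional nine cuts the budget by a factor of ten: roughly 8.8 hours a year at three nines, 53 minutes at four, and about 5 minutes at five.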

Now, to be selected as a vendor by a serious ITSP, you have to disclose your whole system architecture, from the software components and the algorithms for the various load-balancing and fail-over mechanisms down to the hardware being used. The point here is that a buyer can evaluate how well or badly a telephony vendor’s platform is designed compared to its competitors.

How does this apply to the Cloud?

When you, as a vendor, offer services like “Cloud Telephony”, you have control over your own software. The good thing here (e.g. with Amazon EC2) is that you can scale out horizontally quite quickly when it comes to hardware, because new instances can be launched on demand.

The bad thing is that you still need to take care of scalability at the application level. Adding more server instances doesn’t help you much if your application can’t leverage them. And then again, there is not much difference between deploying your software on real hardware and deploying it in “the cloud”: if it scales on the former, it will scale on the latter too. You also don’t have any detailed insight into the underlying software and hardware architecture, since you’re happily decoupled from that problem. Good for you, as long as everything runs fine.
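To give one example of what “scaling at the application level” can mean for a soft-switch: all SIP messages belonging to one call (identified by its Call-ID header) usually have to reach the instance that holds that call’s state, unless the state is shared between instances. Below is a minimal Python sketch that uses rendezvous (highest-random-weight) hashing on the Call-ID to pick an instance. This is my own illustrative choice, not how SPCE or any particular product distributes calls, and the names `pick_instance` and `sip-1` etc. are hypothetical.

```python
import hashlib


def _score(node: str, key: str) -> int:
    # Stable 64-bit score for the (node, key) pair; unlike Python's built-in
    # hash(), SHA-256 gives the same result on every instance and restart.
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_instance(call_id: str, instances: list[str]) -> str:
    """Rendezvous hashing: the instance with the highest score for this
    Call-ID handles every message of that call."""
    if not instances:
        raise ValueError("no instances available")
    return max(instances, key=lambda node: _score(node, call_id))


if __name__ == "__main__":
    pool = ["sip-1", "sip-2", "sip-3"]  # hypothetical instance names
    calls = [f"call-{i}@example.invalid" for i in range(10)]
    before = {c: pick_instance(c, pool) for c in calls}
    # Scale out by one instance: a call only moves if the new instance wins it.
    after = {c: pick_instance(c, pool + ["sip-4"]) for c in calls}
    moved = [c for c in calls if before[c] != after[c]]
    print(f"{len(moved)} of {len(calls)} calls moved, all to sip-4: "
          f"{all(after[c] == 'sip-4' for c in moved)}")
```

The point is that the cloud hands you a fourth instance in minutes, but it’s still your software that has to decide which calls go there, and what happens to the state of calls already in progress.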

But the most important thing is this: your cloud will fail!

For me, the whole EC2 cluster always was, and still is, just a convenient way to quickly launch more server instances. At some point during all this hype I really wondered whether I was missing something, like not being responsible anymore for providing active/active or active/standby services because this is taken care of “in the cloud”. Fortunately, I (and a lot of others) always doubted that. Those who didn’t were punched in the face quite heavily by the latest Amazon outage.
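Why redundancy you build yourself still depends on where you put it can be shown with a simple series-parallel availability model. The sketch below is illustrative only and rests on loudly stated assumptions: instances fail independently, fail-over is instantaneous and always works, and all instances share one dependency (for example a single cloud region) with its own availability. The function name and the numbers are hypothetical.

```python
# Availability of a redundant pool (active/active or active/standby) under a
# simple series-parallel model. Assumptions (illustrative only):
#   - every instance fails independently with the same availability,
#   - fail-over is instantaneous and always succeeds,
#   - all instances depend on one shared component (e.g. a single cloud
#     region) whose own availability is `shared`.

def pool_availability(instance: float, n: int, shared: float = 1.0) -> float:
    """Probability that the shared dependency is up AND at least one of the
    n instances is up."""
    if n < 1:
        raise ValueError("need at least one instance")
    all_down = (1.0 - instance) ** n   # every instance down at the same time
    return shared * (1.0 - all_down)   # series: shared part, then parallel pool


if __name__ == "__main__":
    a = 0.999  # hypothetical per-instance availability
    print(f"1 instance:            {pool_availability(a, 1):.6f}")
    print(f"2 instances:           {pool_availability(a, 2):.6f}")
    print(f"2 instances, 1 region: {pool_availability(a, 2, shared=0.999):.6f}")
```

With independent failures, two instances at 99.9% each get you to 99.9999%. Put both behind one shared dependency that is itself only 99.9% available, and you never get above 99.9% no matter how many instances you launch. That shared dependency is exactly what went down in the Amazon outage.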

So what’s the point?

The term “cloud telephony” actually says nothing at all. For example, with the SPCE running on an Amazon EC2 instance, it means you don’t have to pay up front for the bare metal and its power and cooling. No more, no less. And this applies to any other vendor or service. What if one of their processes crashes? How would the cloud help? Right, it won’t.

If you want to get a reliable service, dig deep into their architecture to find out how it works and how they operate. Just saying “we’re reliable because we use the Cloud” should raise a HUGE red flag.

So if Twilio, Tropo and Co. want to evolve from a pure “End-Customer Approach” to something like a B2B reselling model, then they should be prepared for some serious questions regarding architecture and stuff. Telling our ITSPs to use an additional external service “running in the cloud” on top of our soft-switch (which goes through lots of cycles of acceptance testing at the ITSP before being deployed to real customers) won’t impress them much, although their customers might still appreciate it.

But mind you, if it breaks, it’s the ITSP’s fault, not the fault of the third-party vendor. At least from the end customer’s point of view.