Google Voice Set to Transform the Phone

By Kurt Cagle
March 12, 2009


In 2007, Google acquired GrandCentral, a VOIP-based service that gave users a single phone number that could forward calls to their other phones, record conversations and so forth. This service has been under the radar for some time, but today Google announced the new Google Voice, a free service based upon GrandCentral that will debut to new users in the next several weeks.

The new service actually provides a number of features:

  • With one number, you can call all of the devices that you currently own, so as to reach you whether at home or on the road.
  • If you miss a call, Google will automatically record a voice mail for that call that can be accessed via a web interface.
  • You can also set it up to record all calls going through the service, a potential boon to people like me who do interviews.
  • What is perhaps more intriguing, Google will automatically run these calls through a speech-to-text filter, creating automated transcripts. While it is likely that such transcripts won't be perfect in most cases, the demos indicate that they're sufficient to pick up enough to get the gist of a call (and in conjunction with the audio make it much easier to create clean transcripts).
  • You will likely be able to get RSS/Atom feeds of phone calls as well.

Voice communication has, for the most part, represented the great void in our information record. In pure analog form, voice is remarkably difficult to search even when you have the ability to edit a recording, let alone if you have millions of hours of airtime to look through, and as such the benefits of being able to do so have largely accrued only to large security agencies capable of sitting on phone trunks with ultrafast computers.

The most immediate benefit of VOIP was economic - by breaking up audio into packets and sending them over IP networks, it began to break up the gateway role that traditional phone companies had, and made voice communication simply another mode of communication over broadband, in addition to text and video.

Similarly, speech-to-text capabilities have been evolving handily on desktop systems, but one of the things that many people have discovered over the years is that STT is neither all that terribly useful for communicating directly with a computer (especially in noisy environments) nor for the most part is it all that useful for creating documents. Speaking affects the way that we communicate in different ways than typing does - the latter often providing enough of a latency to make organizing content much easier than attempting to speak a document.

It's when you mix telephony and speech-to-text that things begin to get interesting. First, by moving the processing from a desktop computer to a dedicated server, Google is able to devote more resources to the process of providing meaningful transcriptions - and removing the barrier of having to include STT processors and large user profiles on handhelds. Presumably, such speech profiles are retained for each account, which means that increased use will make profiles more accurate (especially if both participants are using Google Voice).

Yet more significantly, once you have transcribed content - whether voice mail, personal reminders or conversations - you can search that content, you can categorize it, you can syndicate it, and you can work with it as data. The implications of this are, frankly, huge.

For instance, consider a call center for a company. Typically companies are highly dependent upon their operators to provide some context for a service call - what product it was for, what category it fell into, and so forth. Most of the time, this information gets captured at best poorly, especially if the same call operator has to deal with a number of calls in rapid succession. Yet with transcripts, it becomes possible to perform text analysis on the incoming calls, to pick up both obvious keywords and more subtle interaction data, and it becomes possible to perform semantic analysis on this data in ways that you can't with analog speech.
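As a rough illustration of the simplest kind of text analysis described above, the following Python sketch assigns a call transcript to a category by counting keyword hits. The category names, the keyword lists and the `categorize` function are all hypothetical, made up for this example; nothing here reflects how Google Voice or any particular call center system actually works.

```python
import re
from collections import Counter

# Hypothetical categories and keywords; a real call center would tune its own.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "charge", "refund", "payment"},
    "technical": {"error", "crash", "install", "password"},
    "shipping": {"delivery", "package", "tracking", "shipped"},
}


def tokenize(transcript):
    """Lowercase the transcript and split it into simple word tokens."""
    return re.findall(r"[a-z']+", transcript.lower())


def categorize(transcript):
    """Return (category, hits) for the category with the most keyword hits.

    Returns ("uncategorized", 0) when no keyword appears at all.
    Matching is on exact words, so "charged" does not count as "charge".
    """
    words = Counter(tokenize(transcript))
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return ("uncategorized", 0)
    return (best, scores[best])
```

Exact keyword matching is about as crude as this gets, and an imperfect transcript will miss words, but even this level of tagging needs no input from the operator.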

Another (fairly obvious) example - voice messages can be sent across low-bandwidth messaging protocols, while still linking to the audio of the initial calls. Syndicate this content, including not only the message itself but the associated metadata that can be pulled from the records, and you can get notified on SMS, Twitter, via email or in your syndication reader any time someone calls you. Combine this with something like Skype and you can call people back anytime you have access to the Internet across VOIP (or offline of course through handsets), without ever having to go through the tedium of listening to recorded voice messages.
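To make the syndication idea concrete, here is a minimal Python sketch of what a single transcribed voice mail might look like as an Atom entry, with the transcript as the summary and the audio attached as an enclosure link. Google has not published a feed format for the service, so the field choices, the `voicemail_entry` function and the URLs passed to it are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def voicemail_entry(entry_id, caller, received, transcript, audio_url):
    """Build one Atom <entry> for a voice mail and return it as a string.

    `received` is expected to be an RFC 3339 timestamp string.
    The layout is a guess at a plausible feed, not a documented format.
    """
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"Voicemail from {caller}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = received
    # The transcript is small enough to travel over low-bandwidth channels.
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = transcript
    # The audio itself stays on the server and is only linked.
    link = ET.SubElement(entry, f"{{{ATOM_NS}}}link")
    link.set("rel", "enclosure")
    link.set("type", "audio/mpeg")
    link.set("href", audio_url)
    return ET.tostring(entry, encoding="unicode")
```

A feed reader, or a small script relaying to SMS or email, would need only the title and summary to notify you, and could follow the enclosure link when the transcript is unclear.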

Of course, this also makes it possible to filter content - show me only those messages from my wife in the last three days where our eldest daughter and "school" were mentioned, show me only those calls from work, filter out calls from telemarketers based upon a market-speak profile (or possibly just dynamic lookup lists of numbers). Again, many larger organizations have had similar systems for some time, but Google's heft will make such capabilities freely available to everyone.
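A filter of this kind is easy to sketch once messages exist as text. The Python below assumes a hypothetical `Message` record holding a caller label, a timestamp and a transcript; neither the record nor the `filter_messages` function corresponds to any actual Google Voice interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    """Hypothetical record for one transcribed call or voice mail."""
    caller: str
    received: datetime
    transcript: str


def filter_messages(messages, caller=None, since=None, all_terms=()):
    """Return the messages that satisfy every criterion that was given.

    caller:    keep only messages from this caller label
    since:     keep only messages received at or after this time
    all_terms: keep only messages whose transcript contains every term
               (case-insensitive substring match, so "school" also
               matches "preschool")
    """
    results = []
    for message in messages:
        if caller is not None and message.caller != caller:
            continue
        if since is not None and message.received < since:
            continue
        text = message.transcript.lower()
        if not all(term.lower() in text for term in all_terms):
            continue
        results.append(message)
    return results
```

The first query in the paragraph above would then be something like `filter_messages(inbox, caller="wife", since=three_days_ago, all_terms=("Emma", "school"))`, where `inbox`, `three_days_ago` and the daughter's name are placeholders supplied by the caller.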

The feed and filtering capabilities combined in turn also scream "mashups" to me, either as a browser extension or within a web site. It's easy to envision a phone control panel in Firefox that would show your most recent calls, and that could be configured to let you go through a VOIP provider (more on that momentarily) to let you call back with a keypress. This will also serve a more subtle function ... it will continue the process of turning phone calls from being a fully synchronous operation (you must respond to the unknown call) into being a fully asynchronous one (you may return the call at your own convenience).

Moreover, a logical next step (though not necessarily one that will be in Google Voice) would be to provide the reverse text-to-speech option. You could see the message at a glance, hear it for clarification, then type out a reply that Google would run through a text-to-speech avatar and deliver as a call back (assuming you didn't simply send the text over an alternate channel, such as SMS).

Indeed, one interesting spin on this would be to record professional voice actors to analyse their speech patterns, then use the filters taken from those patterns to create audio avatars for use in text-to-speech engines. Your friends would get your responses back voiced as Brad Pitt or Angelina Jolie (or Elmer Fudd, if it came to that).

While this is amusing, Google moving into this space also introduces a number of interesting, and in many cases disturbing, wrinkles. This is a VOIP operation in the purest sense of the word. Google may not own the handsets or the wires, but it's not at all hard to project forward about five years to when WiMAX or LTE becomes prevalent, and where handsets are all tied into these networks.

Couple this with Android (or Android-based open chipsets) being available at relatively low cost, and other sets, such as the iPhone, communicating similarly over IP. If I were a telecommunications executive, I would be panicking at this point. There is no question that Google Voice could be a nuclear bomb that will wipe out most of the existing telecommunications industry in the next five years.

Add onto this the fact that the same data that would be readily searchable by you could, with a court order, adequate cash or some effective hacking, be readily searchable by others. Anyone who's spent any time in social networks knows that they create a data shadow; the data shadow cast by VOIP (which tends to be far more intimate) could be far larger. Presumably, the software recording messages and conversations would provide some clear cue that this was happening, but the potential for recording information that you didn't intend to provide still exists, and the chances that such Freudian slips become part of the record become far higher.

These are, admittedly, many of the same objections that surfaced with the first web-based email systems early in the decade. The potential for abuse by Google is relatively low, given their past history and the consequences that such abuse would have on them, and I suspect that most people will readily accept the possibility of abuse in exchange for the convenience of the service.

I'm looking forward to trying Google Voice - it offers a number of things that I've had to cobble together out of disparate pieces for a while, and to me it represents a medium that we've only just begun to explore.

Kurt Cagle is an online editor for O'Reilly Media. Please feel free to subscribe to his news feed or follow him on Twitter.
