10 July 2008

Voice Chatting Across Continents

EverQuest II is a product that is translated into several languages and runs in several regions around the globe. We want our voice chat to be game-wide, which means making it work in all of those regions. (A Universal Translator would be nice too, but I think that's a few years away.)

Anyway, different languages use different character sets for names and text, so EQII represents all text (names, places, quests, player communication, etc.) as Unicode encoded in UTF-8. However, a common signaling protocol used by voice chat applications (including Vivox's, which we're using for EQII) is SIP (Session Initiation Protocol), a case-insensitive, ASCII-only protocol. This leaves us with a dilemma: how do we represent Unicode player names and channel names in SIP?

Originally we were going to use Base64 encoding, but this has a variety of problems for our application:
  • It requires case sensitivity, but SIP is case insensitive
  • Encoded text is not human readable
  • It inflates the data by about a third (four output characters for every three input bytes)
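
The case-sensitivity problem is easy to see in a few lines of Python. This is a minimal sketch that assumes the worst case, where some hop along the SIP path folds the encoded name to lowercase; the variable names are just for illustration.

```python
import base64

name = "свободными"
encoded = base64.b64encode(name.encode("utf-8")).decode("ascii")

# Simulate a case-insensitive hop that hands the string back lowercased.
folded = encoded.lower()

# The folded text still decodes (every character is a valid Base64 symbol),
# but it decodes to different bytes, so the original name is lost.
recovered = base64.b64decode(folded)

print(len(encoded))                       # 28 characters for 20 bytes of UTF-8
print(recovered == name.encode("utf-8"))  # False
```

Base64 treats "Q" and "q" as different digits, so once the case is folded there is no way to undo it.
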

In the end, we settled on an interesting encoding scheme called Punycode (defined in RFC 3492). Consider the Russian word "свободными" ("free"). In UTF-8 it takes 20 bytes, and our encoded form must fit within a limit of about 60 characters. Standard URL percent encoding needs 60 characters (three per byte), which uses up the entire budget. Base64 needs 28 characters (four for every three bytes, including padding). Punycode needs only 13 characters!

What's more, Punycode keeps ASCII strings human-readable: "Autenil" encoded in Punycode becomes "Autenil-", while "свободными" becomes "90abhqtfebx0i".
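
To make the algorithm a little less magical, here is a from-scratch sketch of a Punycode encoder following the encoding procedure in RFC 3492, along with the size comparison above. It's an illustration written in Python, not code from EQII or Vivox; it leaves out the decoder and the overflow checks the RFC specifies for fixed-width integers (Python integers are unbounded), and the helper names `_digit`, `_adapt` and `punycode_encode` are hypothetical. In practice Python already ships a `punycode` codec (`text.encode("punycode")`), and the sketch should agree with it.

```python
import base64
from urllib.parse import quote

# Bootstring parameters for Punycode (RFC 3492, section 5).
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _digit(d: int) -> str:
    """Map a digit value 0..35 to 'a'..'z' then '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def _adapt(delta: int, numpoints: int, first_time: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(text: str) -> str:
    """Encode a Unicode string as Punycode (no 'xn--' prefix)."""
    code_points = [ord(c) for c in text]
    # Basic (ASCII) code points are copied through unchanged, in order.
    output = [c for c in text if ord(c) < 0x80]
    b = h = len(output)
    if b > 0:
        output.append("-")  # delimiter between the ASCII part and the deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(code_points):
        # Smallest code point that hasn't been inserted yet.
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


word = "свободными"
print(len(word.encode("utf-8")))                    # 20 bytes of UTF-8
print(len(quote(word)))                             # 60 percent-encoded characters
print(len(base64.b64encode(word.encode("utf-8"))))  # 28 Base64 characters
print(punycode_encode(word))                        # 90abhqtfebx0i (13 characters)
print(punycode_encode("Autenil"))                   # Autenil-
print(punycode_encode(word) == word.encode("punycode").decode("ascii"))  # True
```

One design point worth noticing: the digits Punycode emits for non-ASCII characters are case-insensitive (RFC 3492 treats "A" and "a" as the same digit value), so a case-folding hop can't corrupt the encoded part; only the case of copied-through ASCII letters could be lost.
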

I've enjoyed discovering this neat encoding algorithm (and it's been a lifesaver for our voice chat implementation). If you need a compact ASCII representation of Unicode text, Punycode might be the way to go.
