09 February 2010

Evolution of a Streaming Client

It's funny how many things in the game industry start out as "I wish..."
... I wish we had Guild Halls.
... I wish we had Shader 3.0 support.
... I wish we had Battlegrounds.

... I wish our game was easier to download.

It's equally funny how many things are started by people on their own time, just trying to make the game better. That's how EverQuest II's streaming client started out.

Taking an existing game (with 12GB of client assets, no less) and streaming it is no simple task. I started off with a "proof of concept" just to prove that it could be done with EverQuest II. As I got into it, the concept became a full-fledged project. It wasn't officially on the schedule, so it was really a labor of love on my part. After a few weeks of silently working on it, I called the producer into my office and said, "Hey, check this out." Needless to say, he was pretty surprised.

There are three major conversion steps for a streaming system.

Serving the assets


EverQuest II has roughly 500,000 client-side asset files: meshes, textures, collision meshes, shaders, data files, sounds, music, you name it. Have you ever tried putting half-a-million tiny files in a directory? Take my word for it: Don't.

My first inclination was to build a custom server. The server would run off of the PAK files that we already ship with the DVD-based game client. I had grandiose plans about how to track files that clients were downloading and automatically send assets to clients that they didn't know they needed.

But alas, it was not to be. A custom server means that every client would have to be talking to our server. We would have to think about where to place the server geographically, handling varying load characteristics, availability, bandwidth, etc. These were all questions that had already been answered; we didn't need to ask them again and try to come up with our own answers.

What else is great at serving files to a large number of clients all over the world? Web servers! Specifically, HTTP servers. We already used a CDN for patching purposes--we just needed to serve all the game assets individually and on-demand now.

This caused another wrinkle. The client needs a list of all the assets that are available, and it needs to know whether the assets it has previously downloaded are out of date. We call this list the "manifest." The manifest must be fully up-to-date before the client tries to load ANY assets. My custom server knew how to negotiate a manifest with the client in a fairly bandwidth-friendly way because it was smart. CDNs are less smart--they just serve files. EQII's manifest is about 6MB, which you definitely don't want to download every time you run the game. The solution I developed splits the manifest into parts that are available as separate files, plus an overarching CRC file that is requested first. The CRC file is always requested, but it's only about 8KB. Based on comparisons with the CRC file, the client reconstructs the full manifest by downloading only the parts that it needs.
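To make the comparison step concrete, here is a minimal sketch in C++. It assumes the CRC file can be read as a map from manifest part name to a 32-bit checksum; the post doesn't describe the real file formats, so `PartCrcs` and `partsToFetch` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Assumed layout: the small CRC file maps each manifest part to a checksum.
using PartCrcs = std::map<std::string, std::uint32_t>;

// Returns the names of the manifest parts the client has to download:
// parts it has never seen, plus parts whose checksum no longer matches
// the locally cached copy. Everything else is reused from the cache.
std::vector<std::string> partsToFetch(const PartCrcs& local, const PartCrcs& remote)
{
    std::vector<std::string> stale;
    for (const auto& entry : remote) {
        assert(!entry.first.empty());  // every part in the CRC file needs a name
        const auto cached = local.find(entry.first);
        if (cached == local.end() || cached->second != entry.second) {
            stale.push_back(entry.first);
        }
    }
    return stale;
}
```

With a layout like this, a client whose cache is current downloads only the small CRC file, and a client that is one patch behind downloads only the parts that patch touched.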


Requesting the assets


Compared to everything else, serving the assets is the "easy" part. Requesting the assets is far more difficult. You're essentially replacing file system access with a network connection. That sounds a lot easier than it is. File system access is inherently governed by the Operating System and allows any thread to open nearly any file and read data from it. A network connection is a single pipe (or in our case, a collection of pipes) that must have well-defined and tightly controlled access. Any thread that could just expect to read from a file at any point must now be synchronized with other threads requesting assets from a network resource.

Another major difference is that file system access is synchronous from an application's perspective. This means that while waiting for the Operating System to read data from a file, the thread goes to sleep and allows the system to do other things. Generally this happens so quickly that you barely notice, but network connections aren't nearly as fast as your local hard disk. For this reason, we want most of our asset requests to be asynchronous: we send the asset request and go about doing other things until it finishes at some later time.

Unfortunately, it's much easier to do synchronous reads than asynchronous. The EverQuest II client had many synchronous reads that you didn't even notice because the file system is fast enough. If they weren't made asynchronous, a streaming client would appear to 'lock up' while waiting for an asset to be fetched. Obviously, this is undesirable, and nearly unavoidable in some cases.

Furthermore, network connections in games are usually given time by the main thread to do their work (colloquially referred to as "pumping"). That won't work in this system. What if the main thread needs to synchronously load an asset (which still happens occasionally, especially on client startup)? It would be waiting for an asset to finish loading and wouldn't be able to update the network connection that it is effectively waiting on.

Clearly, a system is needed that can pump itself. Any thread can request an asset synchronously or asynchronously, and the network connection continues updating as long as the client is running. The system should be able to determine whether a request for an asset has already been sent, so that we don't waste bandwidth by requesting it again. The system should be able to recognize and quickly send higher priority requests. And, oh yes, let's not forget about failure cases. This piece of technology is the very heart of the streaming client.
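Those requirements can be sketched in a few dozen lines of C++. This is not the EverQuest II implementation, just one way to get the same properties under some assumptions: a dedicated worker thread does the pumping, the download itself is a caller-supplied function (`FetchFn`, standing in for an HTTP request), and priorities are plain integers. `AssetFetcher` and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <exception>
#include <functional>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class AssetFetcher
{
public:
    using Data = std::string;
    using FetchFn = std::function<Data(const std::string&)>;  // stand-in for the download

    explicit AssetFetcher(FetchFn fetch)
        : fetch_(std::move(fetch)), worker_([this] { pump(); })
    {
        assert(fetch_);
    }

    ~AssetFetcher()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // the worker drains queued requests before exiting
    }

    // Callable from any thread. Asynchronous callers keep the future and poll
    // or wait later; synchronous callers call .get() on it right away.
    std::shared_future<Data> request(const std::string& name, int priority)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto pending = inFlight_.find(name);
        if (pending != inFlight_.end()) {
            return pending->second;  // already requested: share that result
        }
        auto job = std::make_shared<Job>();
        job->name = name;
        job->priority = priority;
        job->sequence = nextSequence_++;
        std::shared_future<Data> result = job->promise.get_future().share();
        inFlight_.emplace(name, result);
        queue_.push(job);
        wake_.notify_one();
        return result;
    }

private:
    struct Job
    {
        std::string name;
        int priority;
        unsigned long sequence;
        std::promise<Data> promise;
    };
    struct JobOrder  // highest priority first, then oldest request first
    {
        bool operator()(const std::shared_ptr<Job>& a, const std::shared_ptr<Job>& b) const
        {
            if (a->priority != b->priority) return a->priority < b->priority;
            return a->sequence > b->sequence;
        }
    };

    void pump()  // runs on the worker thread, independent of the main loop
    {
        for (;;) {
            std::shared_ptr<Job> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                job = queue_.top();
                queue_.pop();
            }
            try {
                job->promise.set_value(fetch_(job->name));
            } catch (...) {
                job->promise.set_exception(std::current_exception());  // failure case
            }
            std::lock_guard<std::mutex> lock(mutex_);
            inFlight_.erase(job->name);
        }
    }

    FetchFn fetch_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::priority_queue<std::shared_ptr<Job>, std::vector<std::shared_ptr<Job>>, JobOrder> queue_;
    std::map<std::string, std::shared_future<Data>> inFlight_;
    unsigned long nextSequence_ = 0;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

A synchronous load is then just `fetcher.request( "animation/player_anim1", 10 ).get()`, and because the worker thread pumps itself, the main thread can block on that call without starving the connection. The sketch leaves out a lot: a duplicate request doesn't raise the priority of the one already queued, there is a single connection rather than a collection of them, and there are no retries.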


Storing the assets

Obviously, once an asset has been downloaded, we don't want to waste bandwidth downloading that asset again. It might take minutes to enter a zone for the first time, but we don't want to take that long every time we enter that zone. Therefore, that asset must be stored locally.

A possibility is to store each asset as its own file, but this fails in practice. Operating Systems are not optimized for hundreds of thousands of tiny files. No, these files must be stored in a larger file, packed together and easily accessible.

EQII already has a packed file format. Unfortunately, the way it's set up does not lend itself to modification. When EQII's packed files are written, they're never intended to change. With new assets being downloaded constantly, these files will be changing, and often.

My solution was to develop a new type of asset database specifically suited to our needs. These database files can store a large number of tiny assets, rapidly add and remove assets and quickly retrieve individual assets.
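The post doesn't describe the database format, so the following is only a sketch of the general idea in C++: one large blob (standing in for the file on disk), an index from asset name to its location in the blob, and a free list so that space from removed assets is reused instead of growing the file. `AssetStore`, `Extent` and the first-fit allocation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class AssetStore
{
public:
    // Adds an asset, or replaces it if the name is already stored.
    void add(const std::string& name, const std::string& bytes)
    {
        remove(name);  // replacing an asset frees its old space first
        const std::size_t offset = allocate(bytes.size());
        assert(offset + bytes.size() <= blob_.size());
        blob_.replace(offset, bytes.size(), bytes);
        index_[name] = Extent{offset, bytes.size()};
    }

    // Removes an asset; its bytes stay in the blob but become reusable.
    bool remove(const std::string& name)
    {
        const auto it = index_.find(name);
        if (it == index_.end()) return false;
        if (it->second.size > 0) free_.push_back(it->second);
        index_.erase(it);
        return true;
    }

    // Copies the asset into 'out'; returns false if it is not stored.
    bool get(const std::string& name, std::string& out) const
    {
        const auto it = index_.find(name);
        if (it == index_.end()) return false;
        out = blob_.substr(it->second.offset, it->second.size);
        return true;
    }

    std::size_t fileSize() const { return blob_.size(); }

private:
    struct Extent
    {
        std::size_t offset;
        std::size_t size;
    };

    // First fit: reuse the first free extent that is large enough,
    // otherwise grow the blob at the end.
    std::size_t allocate(std::size_t size)
    {
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i].size >= size) {
                const std::size_t offset = free_[i].offset;
                free_[i].offset += size;
                free_[i].size -= size;
                if (free_[i].size == 0) free_.erase(free_.begin() + i);
                return offset;
            }
        }
        const std::size_t offset = blob_.size();
        blob_.resize(offset + size);
        return offset;
    }

    std::string blob_;                      // stand-in for the packed file
    std::map<std::string, Extent> index_;   // asset name -> location
    std::vector<Extent> free_;              // space left behind by removals
};
```

A real database would also have to persist the index, merge adjacent free extents and survive a crash in the middle of a write, none of which is attempted here.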


Other Considerations

The most difficult part of building a streaming system for an existing client has been trying to change synchronous asset requests into asynchronous. Consider the following simple example:

    // Blocks the calling thread until the animation is available
    Animation* pAnim = pAssetSystem->LoadAnimation( "animation/player_anim1" );
    if ( pAnim )
    {
        // Do something with loaded asset
    }

The above example would need to fetch player_anim1 synchronously. Changing this to be asynchronous might look like the following example:
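
    // Starts the load and returns immediately
    Asset<Animation> anim( &myAnimLoadHandler );
    pAssetSystem->StartLoad( &anim, "animation/player_anim1" );
    ...

    // Called at some later time, once the asset has finished loading
    void AnimLoadHandler::OnLoaded( Asset<Animation>& a )
    {
        // Do something with loaded asset
    }
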
There's much complexity missing from the second example, but the point should be clear: making something asynchronous is much more difficult than making something synchronous.


Conclusion

In a technical sense, the streaming client was one of the most fun projects that I've ever worked on. It was challenging, but the results are a huge payoff.