09 February 2010

Evolution of a Streaming Client

It's funny how many things in the game industry start out as "I wish..."
... I wish we had Guild Halls.
... I wish we had Shader 3.0 support.
... I wish we had Battlegrounds.

... I wish our game was easier to download.

It's equally funny how many things are started by people in their own time just trying to make the game better. That's how EverQuest II's streaming client started out.

Taking an existing game (with 12GB of client assets, no less) and streaming it is no simple task. I started off with a "proof of concept" just to show that it could be done with EverQuest II. As I got into it, the concept became a full-fledged project. It wasn't officially on the schedule, so it was really a labor of love on my part. After a few weeks of silently working on it, I called the producer into my office and said, "Hey, check this out." Needless to say, he was pretty surprised.

There are three major conversion steps for a streaming system.

Serving the assets


EverQuest II has roughly 500,000 client-side asset files: meshes, textures, collision meshes, shaders, data files, sounds, music, you name it. Have you ever tried putting half-a-million tiny files in a directory? Take my word for it: Don't.

My first inclination was to build a custom server. The server would run off of the PAK files that we already ship with the DVD-based game client. I had grandiose plans about how to track files that clients were downloading and automatically send assets to clients that they didn't know they needed.

But alas, it was not to be. A custom server means that every client would have to be talking to our server. We would have to think about where to place the server geographically, handling varying load characteristics, availability, bandwidth, etc. These were all questions that had already been answered; we didn't need to ask them again and try to come up with our own answers.

What else is great at serving files to a large number of clients all over the world? Web servers! Specifically, HTTP servers. We already used a CDN for patching purposes--we just needed to serve all the game assets individually and on-demand now.

This introduced another wrinkle. The client needs a list of all the assets that are available, and it needs to know whether the assets that it has previously downloaded are out of date. We call this list the "manifest." The manifest must be fully up-to-date before the client tries to load ANY assets. My custom server knew how to negotiate a manifest with the client in a fairly bandwidth-friendly way because it was smart. CDNs are less smart--they just serve files. EQII's manifest is about 6MB, which you definitely don't want to download every time you run the game. The solution I developed splits the manifest into parts that are available as separate files, plus an overarching CRC file that is requested first. The CRC file is always requested, but it's only about 8KB. By comparing against the CRC file, the client reconstructs the full manifest, grabbing only the parts that it needs.
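As a rough illustration of the comparison step (a sketch, not the actual EQII code), suppose the CRC file decodes into a table of manifest part names and checksums, and the client keeps the same table for the parts it already has on disk. The names CrcTable and PartsToFetch, and the idea of one CRC per part, are assumptions made for this example; the real file format isn't described here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: manifest part name -> CRC of that part's contents.
using CrcTable = std::map<std::string, std::uint32_t>;

// Returns the manifest parts the client must download: parts listed by the
// server that are missing locally or whose local CRC no longer matches.
std::vector<std::string> PartsToFetch(const CrcTable& server, const CrcTable& local)
{
    std::vector<std::string> stale;
    for (const auto& entry : server)
    {
        const auto found = local.find(entry.first);
        if (found == local.end() || found->second != entry.second)
            stale.push_back(entry.first);
    }
    return stale; // in name order, because std::map iterates in key order
}
```

When nothing has changed since the last run, the result is empty and the small CRC file is the only manifest download.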


Requesting the assets


Compared to everything else, serving the assets is the "easy" part. Requesting the assets is far more difficult. You're essentially replacing file system access with a network connection, which sounds a lot easier than it is. File system access is inherently governed by the operating system, which allows any thread to open nearly any file and read data from it. A network connection is a single pipe (or in our case, a collection of pipes) that must have well-defined and tightly controlled access. Any thread that could previously just read from a file at any point must now be synchronized with other threads requesting assets from a network resource.

Another major difference is that file system access is synchronous from an application's perspective. This means that while waiting for the Operating System to read data from a file, the thread goes to sleep and allows the system to do other things. Generally this happens so quickly that you barely notice, but network connections aren't nearly as fast as your local hard disk. For this reason, we want most of our asset requests to be asynchronous: we send the asset request and go about doing other things until it finishes at some later time.

Unfortunately, it's much easier to do synchronous reads than asynchronous. The EverQuest II client had many synchronous reads that you didn't even notice because the file system is fast enough. If they weren't made asynchronous, a streaming client would appear to 'lock up' while waiting for an asset to be fetched. Obviously, this is undesirable, and nearly unavoidable in some cases.

Furthermore, network connections in games are usually given time by the main thread to do their work (colloquially referred to as "pumping"). That won't work in this system. What if the main thread needs to synchronously load an asset (which still happens occasionally, especially on client startup)? It would be waiting for an asset to finish loading and wouldn't be able to update the network connection that it is effectively waiting on.

Clearly, a system is needed that can pump itself. Any thread can request an asset synchronously or asynchronously, and the network connection continues updating as long as the client is running. The system should be able to tell when a request for an asset has already been sent, so that we don't waste bandwidth by requesting it again. The system should be able to recognize higher priority requests and send them quickly. And, oh yes, let's not forget about failure cases. This piece of technology is the very heart of the streaming client.
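To make those requirements concrete, here is a minimal sketch of a self-pumping request queue. It is not the EQII implementation: the class name AssetFetcher, the FetchFn callback (standing in for the real HTTP download), and the integer priorities are all assumptions made for illustration. A worker thread owns the "connection," duplicate requests share one result, higher priorities are served first, and a failed download is handed to the caller as an exception.

```cpp
#include <cassert>
#include <condition_variable>
#include <exception>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;

class AssetFetcher
{
public:
    // Hypothetical stand-in for the HTTP download; runs on the worker thread.
    using FetchFn = std::function<Bytes(const std::string&)>;

    explicit AssetFetcher(FetchFn fetch)
        : m_fetch(std::move(fetch)), m_worker([this] { Pump(); }) {}

    ~AssetFetcher()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_quit = true;
        }
        m_wake.notify_one();
        m_worker.join();
    }

    // Asynchronous request, callable from any thread. Asking again for an
    // asset that was already requested returns the same shared result.
    std::shared_future<Bytes> Request(const std::string& name, int priority)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto found = m_requests.find(name);
        if (found != m_requests.end())
            return found->second.result;
        Entry& entry = m_requests[name];
        entry.result = entry.promise.get_future().share();
        m_queue.push(Pending{priority, m_nextSeq++, name});
        m_wake.notify_one();
        return entry.result;
    }

    // Synchronous request: blocks the caller, but the worker keeps pumping.
    Bytes RequestSync(const std::string& name, int priority)
    {
        return Request(name, priority).get();
    }

private:
    struct Entry { std::promise<Bytes> promise; std::shared_future<Bytes> result; };
    struct Pending
    {
        int priority; unsigned long seq; std::string name;
        bool operator<(const Pending& other) const
        {
            if (priority != other.priority) return priority < other.priority;
            return seq > other.seq; // equal priority: oldest request first
        }
    };

    void Pump()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            m_wake.wait(lock, [this] { return m_quit || !m_queue.empty(); });
            if (m_quit) return;
            Pending next = m_queue.top();
            m_queue.pop();
            lock.unlock(); // never hold the lock while downloading
            Bytes data;
            std::exception_ptr error;
            try { data = m_fetch(next.name); }
            catch (...) { error = std::current_exception(); }
            lock.lock();
            auto found = m_requests.find(next.name);
            assert(found != m_requests.end());
            if (error) found->second.promise.set_exception(error);
            else found->second.promise.set_value(std::move(data));
            // Finished entries stay in m_requests here for simplicity; a real
            // client would drop them once the asset is stored locally.
        }
    }

    FetchFn m_fetch;
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::priority_queue<Pending> m_queue;
    std::map<std::string, Entry> m_requests;
    unsigned long m_nextSeq = 0;
    bool m_quit = false;
    std::thread m_worker; // declared last so it starts after the other members
};
```

The main thread can call RequestSync without deadlocking because the download never depends on the main loop. The sketch leaves out raising the priority of a request that is already queued, retries, and multiple connections.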


Storing the assets

Obviously, once an asset has been downloaded, we don't want to waste bandwidth downloading that asset again. It might take minutes to enter a zone for the first time, but we don't want to take that long every time we enter that zone. Therefore, that asset must be stored locally.

A possibility is to store each asset as its own file, but this fails in practice. Operating Systems are not optimized for hundreds of thousands of tiny files. No, these files must be stored in a larger file, packed together and easily accessible.

EQII already has a packed file format. Unfortunately, the way it's set up does not lend itself to modification. When EQII's packed files are written, they're never intended to change. With new assets being downloaded constantly, these files will be changing, and often.

My solution was to develop a new type of asset database specifically suited to our needs. These database files can store a large number of tiny assets, rapidly add and remove assets and quickly retrieve individual assets.
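The actual database format isn't described here, but the general idea of packing many tiny assets into one large blob with an index can be sketched as follows. Everything in this example is an assumption made for illustration: the class name AssetStore, the first-fit reuse of freed space, and the fact that the blob lives in memory rather than in a file on disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class AssetStore
{
public:
    // Adds or replaces an asset, reusing freed space before growing the blob.
    void Put(const std::string& name, const std::string& data)
    {
        Remove(name);
        const std::size_t offset = Allocate(data.size());
        std::copy(data.begin(), data.end(), m_blob.begin() + offset);
        m_index[name] = Span{offset, data.size()};
    }

    // Copies the asset into 'out'; returns false if it is not stored.
    bool Get(const std::string& name, std::string& out) const
    {
        const auto found = m_index.find(name);
        if (found == m_index.end()) return false;
        const Span& span = found->second;
        assert(span.offset + span.size <= m_blob.size());
        out.assign(m_blob.begin() + span.offset,
                   m_blob.begin() + span.offset + span.size);
        return true;
    }

    // Drops the index entry and marks the asset's bytes as reusable.
    bool Remove(const std::string& name)
    {
        const auto found = m_index.find(name);
        if (found == m_index.end()) return false;
        if (found->second.size > 0) m_free.push_back(found->second);
        m_index.erase(found);
        return true;
    }

    std::size_t PackedSize() const { return m_blob.size(); }

private:
    struct Span { std::size_t offset; std::size_t size; };

    // First-fit search of the free list; appends to the blob if nothing fits.
    std::size_t Allocate(std::size_t size)
    {
        for (std::size_t i = 0; i < m_free.size(); ++i)
        {
            if (m_free[i].size >= size)
            {
                const std::size_t offset = m_free[i].offset;
                m_free[i].offset += size;
                m_free[i].size -= size;
                if (m_free[i].size == 0) m_free.erase(m_free.begin() + i);
                return offset;
            }
        }
        const std::size_t offset = m_blob.size();
        m_blob.resize(offset + size);
        return offset;
    }

    std::vector<char> m_blob;            // all asset bytes, packed together
    std::map<std::string, Span> m_index; // asset name -> location in the blob
    std::vector<Span> m_free;            // holes left behind by removed assets
};
```

Because lookups go through the index, retrieving one asset never scans the blob, and removing one only edits the index and the free list. A production version would also have to persist the index, merge adjacent holes, and survive a crash in the middle of a write.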


Other Considerations

The most difficult part of building a streaming system for an existing client has been trying to change synchronous asset requests into asynchronous. Consider the following simple example:
    Animation* pAnim = pAssetSystem->LoadAnimation( "animation/player_anim1" );
    if ( pAnim )
    {
        // Do something with loaded asset
    }
The above example would need to fetch player_anim1 synchronously. Changing this to be asynchronous might look like the following example:
    Asset<Animation> anim( &myAnimLoadHandler );
    pAssetSystem->StartLoad( &anim, "animation/player_anim1" );
    ...

    void AnimLoadHandler::OnLoaded( Asset<Animation>& a )
    {
        // Do something with loaded asset
    }
There's much complexity missing from the second example, but the point should be clear: making something asynchronous is much more difficult than making something synchronous.
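For readers who want to run the pattern, here is a toy, self-contained model of it. It is not EQII's API: ToyAssetSystem, its Update function, and the callback signature are invented for this sketch, and the "load" simply completes on the next Update call instead of going over the network.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Animation { std::string name; };

class ToyAssetSystem
{
public:
    using Callback = std::function<void(const Animation&)>;

    // Records the request and returns immediately; nothing is loaded yet.
    void StartLoad(const std::string& path, Callback onLoaded)
    {
        assert(onLoaded); // a request without a handler could never be used
        m_pending.push_back(Request{path, std::move(onLoaded)});
    }

    // Called once per frame. Completes every pending load, fires its
    // callback, and returns how many loads finished.
    std::size_t Update()
    {
        std::vector<Request> ready;
        ready.swap(m_pending); // callbacks may safely call StartLoad again
        for (Request& request : ready)
            request.onLoaded(Animation{request.path});
        return ready.size();
    }

private:
    struct Request { std::string path; Callback onLoaded; };
    std::vector<Request> m_pending;
};
```

Even in this toy, the caller has to cope with the asset not existing between StartLoad and the callback, and anything the callback touches has to stay alive until it fires. Those are exactly the kinds of details the short example above glosses over.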


Conclusion

From a technical standpoint, the streaming client was one of the most enjoyable projects that I've ever worked on. It was challenging, but the payoff has been huge.

6 comments:

Unknown said...

Fascinating to a developer like me, and especially dealing with threading issues.

On that though, what adaptations did you have to make, not just to fetch the asset, but to the game client itself to deal with an asset not being there yet? Placeholder assets, invisible "visual" objects, or what? Obviously before you could take the occasional framerate hiccup and then it was there with the non-streaming client, but since the asset is NOT there until the "OnLoaded()" method runs with the streaming one, what does the engine do in the meantime?

Joshua Kriegshauser said...

Eriol:

With the way that EQII is set up, we can fetch nearly everything that we need during zoning (i.e. the loading screen).

Other things (like an NPC) that we don't have assets for yet generally don't draw until we have assets for them. Their assets are fetched at a higher priority, meaning that they get inserted into the queue to be fetched either immediately or earlier than other pending assets.

We don't do any placeholder systems currently. It wouldn't be too difficult to do a "cloud" or some other effect to act as a placeholder before actual assets are loaded, but we haven't actually seen a need for this yet.

EQII's resource system was already largely set up to load assets asynchronously off of disk at different priorities (and changing priorities as you get closer to items). I extended this to support a network connection and fixed a few places that were still loading synchronously.

There is one case that I can think of where we actually do lock up the client waiting for a synchronous request: world collision meshes less than 5 meters away. We do this to prevent your character from falling through the world. In testing, it took very low bandwidth and very high speed in-game movement to actually get to this point, though.

Eric Heimburg said...

Hey, this is a pretty interesting read! This is a pretty huge technical achievement that many other games would do well to try to emulate. Could you estimate how many hours or days of work it took you to complete?

Anonymous said...

Nice overview of what you had to go through.

I will say that I recently reinstalled the client. I started with the streaming client as that is all we can grab from the web site now. After playing just a bit like that, I could see that I did not want to constantly be hit with those zone load DLs until the entire set of assets finally came down. I actually tracked down my disks and installed those (over internal network, ugh!) to use the 'old' patcher.

It was hard to tell how the new client works when playing the game. Based on other games, I somewhat expected the assets to stream in the background for *all* zones, regardless of where I was. Of course, current zones get priority. From the write up, it looks like only the current zone is loaded when you zone. Any thought to having the rest of the zones start downloading while you are playing and before you zone in there? Just think of a new player in someplace like Goro... They could spend hours in that zone. Several other 'connecting' zones could easily be downloaded so that you are not hit with download times when moving around.

Just a thought/wish list item for you :)

Dan Kegel said...

Hi Joshua. I'm curious - is there any way to precache the world? Let's say I'm about to take my laptop on a trip where I'll only have tethered, low bandwidth access, and I want to avoid having to download any assets while I'm playing. Can I ask the client to somehow cache everything in a few zones?

Joshua Kriegshauser said...

Hi Dan,

There's not really any way to have your client sync the assets for particular zones except by actually going there.

The client will try and fetch zone geometry for zones that you either have characters in or for zones that the current zone is connected to. However, the client doesn't know what NPCs are in the adjacent zones, so it can't fetch assets for the NPCs, only the zone geometry.