The Joshua Tree: March 2008

30 March 2008

nVidia 8800 Frustrations

We've had several EQ2 players complaining about stuttering issues with their nVidia GeForce 8800s. Enough that I requested a machine with the card in it from our Compatibility Lab so that we could diagnose the problem. We were able to reproduce the problem right away. The programmer who looked into the problem used MS PIX to isolate the cause and discovered that some Release() calls were taking over 100ms! So, it appears that sometimes when lots of dynamic allocation (and freeing) is happening, returning resources to Direct3D can take WAY too long.

It seems EQ2 isn't the only game affected. Players of UT2003, UT2004, Team Fortress 2 and Test Drive Unlimited are reporting problems too. Even a Mac user reports issues using applications. Curiously, some games seem to be largely unaffected, but I don't know why.

With so many games affected but comparable- and lower-end cards not having issues, I suspect the card and/or the drivers. The data we collected from PIX regarding Release() supports this theory. Furthermore, we've been able to adapt one of the Direct3D sample apps to demonstrate this issue. The fact that newer cards seem to work better and some people flat out don't have the issue suggests that it might be hardware, but I'm just speculating. nVidia is usually hailed as one of the better hardware/driver providers, but recently reports are surfacing that paint a different picture (pun only slightly intended).

So, assuming nVidia doesn't (or can't) fix this problem, where do we go from here? If the problem is specifically limited to Release() calls, we could cache textures and vertex and index buffers instead of releasing them, but this has its own set of problems. We've tried to open contact with nVidia, ~~but so far have been unable to get a response~~ (update).

If you're using a GeForce 8800-series card with EQ2 and have stuttering problems, my apologies. We're working on the problem.

Update 4/4/2008: See post
Update 4/16/2008: See post
Update 7/1/2008: See post

26 March 2008

Infinite Loops Are Bad

When working on an MMO server, one thing that you're never going to get rid of is crashes. We generally don't like to admit it, but it's true. Crashes can be caused by many things including bad software, bad hardware, bad programmers, bad state, bad anything. Linux (and Windows) MMO server developers typically have crash recovery down to a science: dump a core file (or minidump under Windows), mail it off to the programmers and restart the process.

Worse than crashes (and hopefully less frequent) is a little problem known as Infinite Loops. But if you have a server process lock up, how do you get useful information in order to fix it? We want to treat Infinite Loops like crashes and get a core file or minidump that shows where the lockup occurred and hopefully why.

The concept behind detecting an infinite loop is trivial: start the main game loop as a separate thread and have it increment a value over time (say, every main loop iteration). The initial thread then watches that value. If the value goes long enough without changing, your process can be considered to have locked up. I'll leave it as an exercise for the reader.
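One possible shape for that exercise, as a minimal sketch rather than anything EQ2 actually ships: the names g_heartbeat, tickHeartbeat and watchForLockup, and the timeout and poll parameters, are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical names throughout; this is not EQ2's actual code.
// Counter that the game thread bumps once per main-loop iteration.
std::atomic<unsigned long> g_heartbeat{ 0 };

// Call this from the main game loop on every iteration.
void tickHeartbeat()
{
    g_heartbeat.fetch_add( 1, std::memory_order_relaxed );
}

// Polls the heartbeat once per `poll` interval. Returns true as soon as the
// value has gone `timeout` without changing (a presumed lockup), or false
// once `stop` is set (normal shutdown).
bool watchForLockup( std::chrono::milliseconds timeout,
                     std::chrono::milliseconds poll,
                     const std::atomic<bool>& stop )
{
    assert( poll.count() > 0 );

    using Clock = std::chrono::steady_clock;
    unsigned long last = g_heartbeat.load( std::memory_order_relaxed );
    Clock::time_point lastChange = Clock::now();

    while ( !stop.load( std::memory_order_relaxed ) )
    {
        std::this_thread::sleep_for( poll );
        const unsigned long now = g_heartbeat.load( std::memory_order_relaxed );
        if ( now != last )
        {
            // The game loop made progress; restart the timeout window.
            last = now;
            lastChange = Clock::now();
        }
        else if ( Clock::now() - lastChange >= timeout )
        {
            return true; // no progress for too long: treat as locked up
        }
    }
    return false;
}
```

The initial thread would call watchForLockup after starting the game thread and, on a true result, move on to getting the stuck thread to drop its state. The timeout needs to be comfortably longer than the slowest legitimate main-loop iteration, or a long save or zone load would be mistaken for a lockup.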

So how does our infinite loop detection thread get the other thread to drop useful information?

In Linux (and most other flavors of *nix) it's really easy. Just do a pthread_kill() to the locked-up game thread with a signal that drops a core (SIGABRT usually works nicely) and then _exit().

For Windows, there's a little more code to write and you have to have dbghelp.dll. Basically, just write a minidump file. The normal minidump file should have information on all threads, so even though you're writing it from the main thread doing the infinite loop detection, it will include all the necessary info about your locked-up thread.

Here's a very simple sample of a Linux program for making a thread drop a core:

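#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>

// Stand-in for a locked-up game thread: spins forever.
void* threadMain( void* )
{
        while ( 1 )
                ;
        return 0;
}


int main( int argc, char** argv )
{
        pthread_t id;
        // pthread_create returns an error number instead of setting errno,
        // so report it with strerror rather than perror.
        int err = pthread_create( &id, 0, threadMain, 0 );
        if ( 0 != err )
        {
                fprintf( stderr, "pthread_create failed: %s\n", strerror( err ) );
                return 1;
        }

        // Pretend it took us a second to notice the lockup.
        sleep( 1 );

        // Deliver SIGABRT to the stuck thread. The default action
        // terminates the whole process and dumps core.
        pthread_kill( id, SIGABRT );
        // Bail out without running atexit handlers if we are still alive.
        _exit( 0xabcd );
}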

Why am I writing about this? Previously EQ2's infinite loop detection would essentially do this:

assert( 0 && "INFINITE LOOP DETECTED" );

Which would crash the process and cause it to be restarted, but doesn't give us any useful information about where or why it locked up. In a past life when working on UO we didn't even have any infinite loop detection, so a locked-up server would just blackhole people and prevent the shard from doing synchronized backups. On EQ2 we don't typically see a lot of infinite loops at this point, but it's recently been changed to give us more information. And that's not something that you're likely to see in the patch notes ;)

Edit 3/29/08 10:23 PM:
Response based on a comment by KC (see comment below). Unfortunately, blogger doesn't support <pre> or <code> tags :P
Infinite Loops in EQ2 are usually caused by programming errors. Our script system (Lua) has an instruction limit that the EQ2 team added so Designers can't cause infinite loops. An unending dialog cycle (if I'm interpreting you correctly) wouldn't be an infinite loop since it requires user interaction (the server is waiting on the client before serving up the next dialog). The simplest infinite loop would be something like this:

while(true){/*do nothing*/}

This just "does nothing" forever without giving the server the chance to do anything else. We actually have a dev-only command that does this for testing the infinite loop handler :)

The last time we actually had a legitimate infinite loop issue, the code looked something like this:

for ( unsigned i = 0; i != kiCount; /*increment in loop*/ )
{
    if ( shouldSkip( i ) )
    {
        // oops, forgot to increment i!
        continue;
    }
    // real logic
    ++i;
}
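Where the real logic allows it, one way to make this class of bug harder to write is to keep the increment in the for header so that continue can't bypass it. A minimal sketch: shouldSkip and kiCount are the names from the fragment above, but their definitions here are hypothetical stand-ins, and countProcessed is invented purely so the loop has something to return.

```cpp
#include <cassert>

// Hypothetical stand-ins for the names used in the fragment above.
const unsigned kiCount = 10;

bool shouldSkip( unsigned i )
{
    return i % 3 == 0; // arbitrary rule, for illustration only
}

// Returns how many indices reached the "real logic".
unsigned countProcessed()
{
    unsigned processed = 0;
    // The increment lives in the for header, so `continue` cannot bypass it.
    for ( unsigned i = 0; i != kiCount; ++i )
    {
        if ( shouldSkip( i ) )
        {
            continue;
        }
        // real logic would go here
        ++processed;
    }
    assert( processed <= kiCount );
    return processed;
}
```

This only works when every iteration advances by exactly one. Loops that really do need a conditional increment (erasing from a container while iterating, for example) still have to increment on every path by hand.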

21 March 2008

EQ2 Devs playing on Test

Greg "Rothgar" Spence started a thread on the EQ2 forums asking which classes the devs should create characters of and play.

I started a Dwarf Ranger named "Autenil" in Greater Faydark on Test server.

Feel free to message me if you see me in-game.

12 March 2008

Voice chat is coming to EverQuest II

A few days ago, SOE and Vivox announced an agreement to bring voice chat and more to SOE's lineup of games.

Personally, I'm very excited about this. And it's not just because I'm the one working on it for EverQuest II. Some of the features mentioned in the press release will be interesting, such as the voice font feature (which allows you to change your voice in real time) and the ability to call into guild chat with a regular phone.

07 March 2008

Memory Usage and the EverQuest II Client

3D graphics tend to use a good amount of memory, which increases with their level of detail, texture size, complexity of geometry, etc. Combine that with sound, scenegraphs and the other things that make up a game client, and you can use a lot of memory.

The EQ2 client is pretty memory hungry, but we're making strides in changing how that works a little bit. Currently, we have several reports of issues with Vista. While it's not technically a supported OS for EQ2, we still want to ensure that people can play the game. Unfortunately, either Vista's DX9 compatibility or Vista drivers seem to be using a LOT more memory than they did on Windows XP.

To mitigate this problem, there are a few things we're doing.

/3GB switch and 64-bit support - Most Windows applications are not aware of large addresses (i.e. memory addresses over 2GB). There is a linker switch (/LARGEADDRESSAWARE in Visual C++) that tells Windows that your application can use addresses over 2GB. This has a dual benefit: for users on 64-bit Windows, the EQ2 client will have access to nearly the entire 4GB address space. For users on 32-bit Windows, the /3GB switch can be added to your boot.ini file to tell Windows to allow up to 3GB of user address space. While this change may prevent out-of-memory crashes, it doesn't solve the original problem that the client uses too much memory.
Texture Downsampling - This system picks textures that are far enough away and unloads the highest quality mip level. This can lead to significant memory savings as well as saving texture memory on the video card, without significantly degrading the quality of the graphics.
Non-drawn geometry unloading - Since the game loads assets basically within a radius of the camera, sometimes things get loaded but are occluded and used rarely. Things that haven't been drawn in a long time don't need to be kept in memory and can be unloaded. This saves both system memory and video memory since we don't need to hold onto textures and vertex/index buffers that aren't being used. Generally, the performance cost is negligible. Loading things back in has some cost, but assets have a better chance of being in the disk cache, which greatly reduces the load time.
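To put a rough number on the texture downsampling savings: each mip level holds a quarter of the texels of the level above it, so the highest quality level accounts for roughly three quarters of a full mip chain. Here is a sketch of that arithmetic, assuming an uncompressed, square, power-of-two texture; the function names and the bytes-per-texel parameter are illustrative assumptions, not anything from the EQ2 client.

```cpp
#include <cassert>
#include <cstdint>

// Bytes used by a full mip chain of a square texture, from topSize x topSize
// down to 1x1. Assumes an uncompressed format and a power-of-two size.
std::uint64_t mipChainBytes( std::uint32_t topSize, std::uint32_t bytesPerTexel )
{
    assert( topSize != 0 && ( topSize & ( topSize - 1 ) ) == 0 ); // power of two
    std::uint64_t total = 0;
    for ( std::uint64_t size = topSize; size >= 1; size /= 2 )
    {
        total += size * size * bytesPerTexel;
    }
    return total;
}

// Bytes freed by unloading only the highest quality (largest) mip level.
std::uint64_t topMipBytes( std::uint32_t topSize, std::uint32_t bytesPerTexel )
{
    return static_cast<std::uint64_t>( topSize ) * topSize * bytesPerTexel;
}
```

For a 1024x1024 texture at 4 bytes per texel, the full chain comes to 5,592,404 bytes, of which 4,194,304 (about 75%) is the top level alone. What remains after unloading it is exactly the chain of a 512x512 texture.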

The systems mentioned above also won't come into play until you hit a certain amount of memory in use. There are other things that we're looking at too, but the changes mentioned above appear to have the most bang for the buck.
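Put together, non-drawn unloading gated by a memory threshold might look roughly like the following. This is a hypothetical sketch, not the EQ2 client's code: Asset, unloadStale and every parameter are invented for illustration, and a real client would release textures and vertex/index buffers where the sketch only flips a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for one loaded asset.
struct Asset
{
    std::size_t   bytes;          // memory held while loaded
    std::uint64_t lastDrawnFrame; // frame number when it was last drawn
    bool          loaded;
};

// Unloads assets that have not been drawn for more than `maxIdleFrames`,
// but only while memory in use is still above `thresholdBytes`.
// Returns the number of bytes freed.
std::size_t unloadStale( std::vector<Asset>& assets,
                         std::size_t bytesInUse,
                         std::size_t thresholdBytes,
                         std::uint64_t currentFrame,
                         std::uint64_t maxIdleFrames )
{
    std::size_t freed = 0;
    for ( Asset& a : assets )
    {
        if ( bytesInUse <= thresholdBytes + freed )
        {
            break; // at or under the threshold: leave everything else loaded
        }
        assert( a.lastDrawnFrame <= currentFrame );
        if ( a.loaded && currentFrame - a.lastDrawnFrame > maxIdleFrames )
        {
            a.loaded = false; // a real client would release buffers here
            freed += a.bytes;
        }
    }
    assert( freed <= bytesInUse );
    return freed;
}
```

Stopping as soon as usage drops back to the threshold keeps the system idle on machines with memory to spare, so only players who are actually close to running out pay the reload cost.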