The Joshua Tree

14 November 2008

EQII Dev Blog: On Your Mark, Get Set...

I wrote a new blog post for the EQII Dev Blog:

We on the EverQuest II team are excited! We’re less than a week away from launching our Fifth expansion, The Shadow Odyssey! For some on the team, this will be the first title that they’re credited on. For others, it’s another box on the shelf that joins a vast array of excellent titles bearing their name. Everyone on the team shares feelings of joy, triumph and pride as we prepare for next Tuesday. Very soon, you will be able to experience the fruits of our labor over the past year.

11 November 2008

Braid

If you have an Xbox-360 and you haven't tried out Braid, you're missing a very fun game. Get on Xbox Live Arcade and at least try out the free trial, and then go buy the game.

Braid (by Jonathan Blow) is a platform-based puzzle game (with many throwbacks to the original Super Mario game) that uses various methods of controlling time to solve puzzles. It challenges how you think about time by providing a different mechanic throughout each of five levels. The gameplay is unique, the music is captivating and the art is vibrant and rich. The end of the game is mind-blowing (I'll admit that I didn't "get it" the first time I played through the last level).

I can't say enough great things about this awesome game. Go check it out for yourself.

04 November 2008

Americans: Go Vote!

Today is Election Day.

Regardless of what you believe or who you're supporting, this election will make history.

Be a part of it. Go out and vote!

Sure, it's not easy sometimes. The lines can be long. Voting machines can malfunction and cause delays. Weather can be an issue.

But it's a right and a responsibility. We The People of the United States of America have this privilege to choose those that govern over us. Use it.

Go Vote.

06 October 2008

unique_vector

Occasionally it's fun to share a programming experience from work. Last week I found myself having to make changes to a tool that builds a particular client data file for EverQuest II. I discovered that this tool would build unique arrays by searching linearly for the content and adding it if it didn't exist. Some of these arrays would get very large (on the order of 30,000 items or more) and the number of checks even larger (200,000+). There are much faster ways to search for a unique item, but we had the constraint of needing to assign a permanent index for the item (i.e. its position in an array).

I wrote out a quick nifty STL-like helper class to accomplish the same behavior, but much faster. The class definition looks like this (note that I used unsigned instead of defining my own size_type for brevity):

template < typename T >
class unique_vector
{
    typedef std::vector< const T* > Tvector;
    typedef std::map< T, unsigned > Tmap;

            Tmap    m_map;
    mutable Tvector m_vector;
    mutable bool    m_dirty;

    void m_build() const;

public:
    unique_vector();

    unsigned size() const;
    unsigned add( const T& );
    const T& operator [] ( unsigned ) const;
};

This stitches together a map (to achieve uniqueness and provide faster lookup time) and a vector to provide index-based lookups. This is mainly achieved through two functions. Let's take a look at the add() function:

template < typename T >
inline unsigned unique_vector<T>::add( const T& t )
{
    unsigned index = m_map.size();
    std::pair< Tmap::iterator, bool > result =
        m_map.insert( std::make_pair( t, index ) );
    m_dirty = true;
    return (*result.first).second;
}

This function relies on the fact that std::map::insert() fails if the key (t in this case) is already present in the map, but returns an iterator to the existing item. This function basically tries to add the item to the map with the next index (using m_map.size() as the next index so the first item will have index 0, next will have index 1 and so on). If it's not there, insert() will succeed and return an iterator to the new record (using the provided index). If it's already there, insert() fails but returns the existing item. The function returns the index associated with the record, whether it's the new index from a freshly inserted item or the old index from a pre-existing item.

Notice that the m_dirty flag is set to true. This indicates that the vector needs to be rebuilt. The operator[]() calls m_build() if m_dirty is set to true:

template < typename T >
inline void unique_vector<T>::m_build() const
{
    m_vector.resize( m_map.size() );
    for ( Tmap::const_iterator iter( m_map.begin() );
          iter != m_map.end();
          ++iter )
    {
        m_vector[ (*iter).second ] = &(*iter).first;
    }
    m_dirty = false;
}

This function quickly builds the vector using the unique indexes stored in the map as array indexes and pointers to the objects used as map keys. The function is const (and data members mutable) to allow it to be called from the const operator[]() function.

So, how much of an improvement was it? The tool uses complex structures as T for this template, but I ran a simple test using 10,000 pre-allocated std::string objects and 100,000 attempted adds. On my AMD 64 4000+ with 2GB RAM, the linear version took 5,077.160 ms (5 seconds). The unique_vector ran the same test in 49.972 ms. Algorithm changes almost always trump when optimizing.

P.S. In case you were wondering, here are the remaining very simple functions:

template < typename T >
inline unique_vector<T>::unique_vector() :
    m_dirty( false )
{}

template < typename T >
inline unsigned unique_vector<T>::size() const
{
    return m_map.size();
}

template < typename T >
inline const T& unique_vector<T>::operator [] ( unsigned u ) const
{
    if ( m_dirty ) m_build();
    return *m_vector[ u ];
}

27 September 2008

EQII Getting Multicore Support... Wait, what?!

An interesting note showed up in some recent EQII Test Patch Notes. Yes, it's true. We're working on "Multicore support" for EQII and plan to release the first iteration with Game Update 49. "Multicore Support" is a fan-friendly way of supporting multiple CPUs by way of multiple threads of execution in the client.

I thought I'd take some time to explain what led up to this and offer some explanation of the difficulties of threading. EQII was released in late 2004, but development actually started about five years earlier. In 2004, multiple-CPU machines were generally very expensive and reserved for servers and high-end workstations; the mainstream machines were single-CPU and relied upon increasing clock speeds to keep getting faster. Late in the development of EQII, Intel was starting to introduce hyper-threaded processors. Even though some Operating Systems would detect a hyper-threaded processor as two CPUs, it was really just one CPU. The game targeted computers more advanced for the day (in some ways) and launched largely with the expectation that CPUs would just get faster and faster.

However, clock speed increases started decelerating. Now we find ourselves getting machines that have smaller clock speed increases but more CPUs. Dual- and Quad-core machines are becoming quite commonplace and AMD has even introduced a Triple-core chip.

When an MMO is under development, the core engine design is towards the front of the project so that the tool chains and art pipelines can be created, allowing more people to be more productive. It is then refined over the course of the project, but single-threaded vs. multi-threaded (multicore) is something that should be decided early in the engine design. The reason for this is simple: multi-threaded programming can be significantly different than single-threaded programming and, unless you've done it every day for 20 years, decidedly NOT easier than network code.

Let me explain how single- and multi-threaded programming are different with a simple example. Single-threaded programming is like driving down the single lane of a highway. You're going fast and don't really have to stop for anything. Multi-threaded programming is like two (or more, but we'll use two for my example) highways: one running north-south and the other running east-west. At some point they have intersections. What happens if there are no traffic lights? Disaster. It's the same with multi-threaded programming. You have to find all of the intersections (programmers call them critical sections) and put traffic lights (synchronization primitives) on them. Intersections on a highway are easy to see; unfortunately they're not as easy in code. The other thing to consider is that you have to stop at the intersection and wait for your turn. It's the same with threads. That's also one of the reasons why adding a second CPU doesn't give you 100% more performance.

The first attempt at having additional cores help out was using a technology very similar to OpenMP. I took loops that iterated over lots of things (particles, vertex transformations, degenerate triangles in shadow meshes, etc) and parallelized them. The way this works is like this: Say you have 1000 numbers and you just have to add one to each of them. A single thread would step through all 1000 numbers adding one to each of them. OpenMP allows you to take those 1000 numbers and divide them up evenly among all processors. Unfortunately, this didn't give us any sort of meaningful performance gain as any gains were outdone by the time it takes to synchronize and hand off the data to other threads.

The second attempt became to take the specific system where we're spending most of our CPU time (animation and vertex transformation) and dividing it up in large chunks easily handed off to another thread. This turned out to work very well and netted at 10-15% frame rate gain in populous places. Another bonus was that it was mostly easy to integrate into the single-threaded engine. The way it works is by trying to do animations before the main game thread would. The first time something is animated in the main thread it gets added to a list of items that need animation. The next frame, the animation thread sees these items and can animate them before the main thread needs them. In some cases, the main thread needs one before the animation thread has gotten around to it. No worries! If something hasn't been animated yet, the main thread can take care of it. If the animation thread comes across something that the main thread is using, it just skips it and goes on to the next item. With how useful this system is, the drawbacks are that we only do animation in a separate thread and only one extra thread can really only make use of one extra processor (so Quad-core doesn't have a great advantage over Dual-core [yet], at least as far as EQII is concerned).

So what about the future? There's plenty of systems that we can offload in a similar manner to other threads: shadows, particles, maybe even scenegraph traversals or collision! However, just creating theads for each subtask isn't really the best way to help out. What I'd rather do is create individual tasks for each item: Animate, ExecuteParticleSystem, ComputeShadow, etc. that can be doled out to worker threads similar to Intel's Threading Building Blocks. This would give us the best fit to each processor type and support even higher processor counts in the future! Now that the proof-of-concept has been shown to work and the fan response so far is very favorable, all we need is time.

I hope this post has given you a little bit of an idea why multicore support is not simple, especially on a game engine that was not particularly designed with threading in mind.

13 September 2008

TV-B-Gone!

A few days ago I read an article about (and subsequently ordered) a kit to build Mitch Altman's TV-B-Gone. I haven't had the opportunity to build anything electronic in quite a while, so I jumped at the chance.

It came yesterday and I enlisted the help of my 6-year-old son to put it together today.

It's a fairly simple kit with an 8-pin microcontroller, 4 transistors, 2 capacitors, 5 resistors and 5 LEDs (oh, and a push button).

Assembly was a little slow as my son wanted to know what each part did and was worried about his personal safety considering that he equates all electricity with being shocked. I enjoyed explaining it to him and asking him to show me where the parts should go.

I whipped out the old soldering iron and tools and went to town. Putting together electronic kits reminded me of my childhood days assembling HeathKits and tinkering with my 200-in-1 electronics lab (and later, real breadboards and etching my own circuit boards). Ah, the good old days. I'm not exactly sure when I decided to switch from hardware to software, and while I love programming, it's always nice to smell the solder once in a while.

Here's a short video of the initial test of the product:

Needless to say, eating out should be a little more enjoyable now!

STL: A Love-and-Hate Relationship

I'm a big fan of the STL and specifically the containers. Usually. The interface generally makes sense, code is written around them fairly quickly and they have good interoperability. However, there are some problems that really start to become apparent on larger projects. Consider these search results on EQII's codebase:

"std::map" - 2460
"std::set" - 1574
"std::list" - 393

The EQII codebase has nearly 3 million lines of just C/C++/C# (client, server, tools, not counting server lua script and data files) and uses STL containers liberally. If I had to do it all over again, I believe I'd forgo using STL containers for a few reasons:

Memory Usage - STL allocators typically hold on to memory forever and reuse it as necessary. However, EQII has a very good memory manager with a constant-time free-able small block allocator (written by yours truly), reporting, leak detection, etc. Writing custom allocators for STL is a major pain (especially on an established codebase) and I'd really rather not hack STLport or the gnu implementation to use our memory manager.
Compile Times - The more templated code you have the longer the compile and link times. I've talked previously about how we replaced std::vector with our own class to save compile times. Also, have you looked at STL implementation code? It's bloated and ugly, which brings me to my next point...
Debugging - Trying to dig through a container in a debugger watch window (or GDB) is a pain. Sometimes it's even a matter of obscure memory offsets and you have to calculate addresses to see values in a map.
Additional Functionality - If you have a vector that you're not using anymore and you want to free the memory for it, clear() doesn't cut it. The code to do it is neither trivial nor intuitive: std::vector< T >().swap( myVec ). Adding functions isn't easy as the STL containers are purposely non-virtual.

I'll probably look for a different solution for containers for my next project. Home-grown containers can emulate the STL containers and provide all that I desire above. They can even be compatible with STL functors and algorithms too.