27 September 2008

EQII Getting Multicore Support... Wait, what?!

An interesting note showed up in some recent EQII Test Patch Notes. Yes, it's true: we're working on "Multicore support" for EQII and plan to release the first iteration with Game Update 49. "Multicore support" is a fan-friendly name for supporting multiple CPUs by running multiple threads of execution in the client.

I thought I'd take some time to explain what led up to this and describe some of the difficulties of threading. EQII was released in late 2004, but development actually started about five years earlier. In 2004, multiple-CPU machines were generally very expensive and reserved for servers and high-end workstations; mainstream machines had a single CPU and relied on ever-increasing clock speeds to keep getting faster. Late in EQII's development, Intel was starting to introduce hyper-threaded processors. Even though some operating systems would detect a hyper-threaded processor as two CPUs, it was really just one physical core. The game targeted computers that were (in some ways) advanced for the day, and it launched largely on the expectation that CPUs would just keep getting faster.

However, clock speed increases started slowing down. Now we're getting machines with smaller clock speed increases but more CPUs. Dual- and quad-core machines are becoming quite commonplace, and AMD has even introduced a triple-core chip.

When an MMO is under development, the core engine design happens near the front of the project so that the tool chains and art pipelines can be built, allowing more people to be more productive. The engine is then refined over the course of the project, but single-threaded vs. multi-threaded (multicore) is something that should be decided early in the engine design. The reason is simple: multi-threaded programming can be significantly different from single-threaded programming and, unless you've done it every day for 20 years, it is decidedly NOT easier than network code.

Let me explain how single- and multi-threaded programming differ with a simple example. Single-threaded programming is like driving down a single-lane highway. You're going fast and don't really have to stop for anything. Multi-threaded programming is like having two (or more, but we'll use two for this example) highways: one running north-south and the other running east-west. At some point they cross at intersections. What happens if there are no traffic lights? Disaster. It's the same with multi-threaded programming. You have to find all of the intersections (programmers call them critical sections) and put traffic lights (synchronization primitives) on them. Intersections on a highway are easy to see; unfortunately, they're not as easy to spot in code. The other thing to consider is that you have to stop at the intersection and wait for your turn. It's the same with threads, and that's one of the reasons why adding a second CPU doesn't give you 100% more performance.
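Here is the traffic-light idea as a minimal Python sketch (illustrative only, not EQII code; `run_traffic` and its parameters are made up for this example). The shared `count` is the intersection, and a `threading.Lock` is the traffic light that makes each thread wait its turn.

```python
import threading


def run_traffic(num_threads: int = 2, cars_per_thread: int = 10_000) -> int:
    """Let several threads update one shared counter through a lock."""
    count = 0
    light = threading.Lock()  # the "traffic light": a synchronization primitive

    def drive() -> None:
        nonlocal count
        for _ in range(cars_per_thread):
            with light:  # wait for a green light before entering
                count += 1  # critical section: read-modify-write on shared data

    threads = [threading.Thread(target=drive) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return count


if __name__ == "__main__":
    print(run_traffic())  # 20000
```

Without the `with light:` line, two threads can read the same old value of `count` and one increment can be lost. With it, the total is always correct, but every thread spends time waiting at the light, which is exactly the overhead described above.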

Our first attempt at having additional cores help out used a technology very similar to OpenMP. I took loops that iterated over lots of things (particles, vertex transformations, degenerate triangles in shadow meshes, etc.) and parallelized them. Here's how that works: say you have 1000 numbers and you just have to add one to each of them. A single thread would step through all 1000 numbers, adding one to each. OpenMP lets you take those 1000 numbers and divide them evenly among all processors (with four processors, each one handles 250). Unfortunately, this didn't give us any meaningful performance gain, because whatever we gained was outweighed by the time it takes to synchronize and hand the data off to other threads.
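As a rough illustration of the "divide the 1000 numbers evenly" idea, here is a Python sketch that splits an index range into contiguous chunks and hands one chunk to each worker thread. It mimics what an OpenMP `#pragma omp parallel for` does in C/C++, but it is not OpenMP, and `parallel_add_one` is a hypothetical name. Note that CPython's global interpreter lock keeps pure-Python arithmetic from running in parallel, so this shows the shape of the work split, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_add_one(values: list[int], workers: int = 4) -> list[int]:
    """Split `values` into `workers` contiguous chunks and add one to each element."""
    result = list(values)
    n = len(result)
    # Divide the index range as evenly as possible: the first `n % workers`
    # chunks get one extra element (1000 items / 4 workers -> 250 each).
    base, extra = divmod(n, workers)
    bounds = []
    start = 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end

    def work(lo: int, hi: int) -> None:
        # Each worker touches only its own slice, so no lock is needed here.
        for i in range(lo, hi):
            result[i] += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Handing off the chunks and waiting for all of them is the
        # synchronization cost that a cheap loop body can't pay for.
        futures = [pool.submit(work, lo, hi) for lo, hi in bounds]
        for f in futures:
            f.result()  # wait for each chunk; re-raises any worker exception
    return result


if __name__ == "__main__":
    print(parallel_add_one(list(range(1000)))[:3])  # [1, 2, 3]
```

For a loop this cheap, dispatching the chunks and waiting for them to finish can easily cost more than the work itself, which matches the experience described above.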

The second attempt was to take the specific system where we spend most of our CPU time (animation and vertex transformation) and divide it into large chunks that are easily handed off to another thread. This turned out to work very well and netted a 10-15% frame rate gain in populous places. Another bonus was that it was mostly easy to integrate into the single-threaded engine. It works by trying to do animations before the main game thread needs them. The first time something is animated in the main thread, it gets added to a list of items that need animation. On the next frame, the animation thread sees these items and can animate them before the main thread needs them. In some cases, the main thread needs one before the animation thread has gotten around to it. No worries! If something hasn't been animated yet, the main thread can take care of it. If the animation thread comes across something that the main thread is using, it just skips it and moves on to the next item. Useful as this system is, it has drawbacks: only animation runs in a separate thread, and one extra thread can really only make use of one extra processor (so quad-core doesn't have a great advantage over dual-core [yet], at least as far as EQII is concerned).
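Here's a minimal Python sketch of that hand-off, assuming one lock per animated item and a pending list that the main thread builds during one frame and hands to the animation thread for the next, so the two threads never modify the same list. `Item`, `animate`, `animation_thread_pass`, and `main_thread_use` are hypothetical names, not EQII's actual code. The key detail is that the animation thread uses a non-blocking `acquire`, so it skips busy items instead of waiting, while the main thread blocks and animates anything the worker hasn't reached.

```python
import threading


class Item:
    """Hypothetical stand-in for an animated object (not an EQII type)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lock = threading.Lock()  # held by whichever thread is using the item
        self.animated_frame = -1  # last frame this item was animated for

    def animate(self, frame: int) -> None:
        self.animated_frame = frame  # placeholder for real skinning/vertex work


def animation_thread_pass(pending: list[Item], frame: int) -> int:
    """Animation-thread side: try to animate last frame's items ahead of time.

    Items the main thread currently holds are skipped rather than waited on.
    Returns how many items this pass animated.
    """
    done = 0
    for item in pending:
        if not item.lock.acquire(blocking=False):
            continue  # main thread is using it: skip to the next item
        try:
            if item.animated_frame != frame:
                item.animate(frame)
                done += 1
        finally:
            item.lock.release()
    return done


def main_thread_use(item: Item, frame: int, next_pending: list[Item]) -> bool:
    """Main-thread side: ensure `item` is animated for `frame`, then use it.

    Returns True if the main thread had to animate the item itself.
    """
    with item.lock:  # the main thread waits if the worker is mid-animation
        fell_back = item.animated_frame != frame
        if fell_back:
            item.animate(frame)  # worker hasn't reached it yet: no worries
        if item not in next_pending:
            next_pending.append(item)  # worker can get ahead of it next frame
        # ... draw or otherwise use the animated item here ...
    return fell_back


if __name__ == "__main__":
    items = [Item(f"npc{i}") for i in range(8)]
    pending: list[Item] = []
    for frame in range(3):
        worker_list, pending = pending, []  # last frame's list goes to the worker
        worker = threading.Thread(target=animation_thread_pass,
                                  args=(worker_list, frame))
        worker.start()
        fallbacks = sum(main_thread_use(it, frame, pending) for it in items)
        worker.join()
        print(f"frame {frame}: main thread animated {fallbacks} item(s) itself")
```

On the first frame the main thread has to animate everything itself; after that, how many items it falls back on depends on how far the worker gets first, which is why the gain is real but bounded.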

So what about the future? There are plenty of systems that we can offload to other threads in a similar manner: shadows, particles, maybe even scenegraph traversals or collision! However, just creating threads for each subtask isn't really the best way to help out. What I'd rather do is create individual tasks for each item (Animate, ExecuteParticleSystem, ComputeShadow, etc.) that can be doled out to worker threads, similar to Intel's Threading Building Blocks. This would give us the best fit for each processor configuration and support even higher processor counts in the future! Now that the proof-of-concept has been shown to work and the fan response so far has been very favorable, all we need is time.
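The task-based approach can be sketched in Python as a small worker pool: each piece of work (an Animate, ExecuteParticleSystem, or ComputeShadow call for one object) becomes a task on a shared queue, and one worker thread per logical CPU pulls tasks until the queue is drained. This is a simplified stand-in, assuming tasks are independent and don't raise; Intel's Threading Building Blocks uses a more sophisticated work-stealing scheduler, and `TaskPool`, `submit`, and `wait` are hypothetical names, not TBB's API.

```python
import os
import queue
import threading
from typing import Callable, Optional

Task = Callable[[], None]


class TaskPool:
    """Minimal shared-queue worker pool (no work stealing, unlike TBB)."""

    def __init__(self, num_workers: Optional[int] = None) -> None:
        # Default to one worker per logical CPU so more cores mean more workers.
        self.num_workers = num_workers or os.cpu_count() or 1
        self.tasks: "queue.Queue[Optional[Task]]" = queue.Queue()
        self.workers = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(self.num_workers)]
        for w in self.workers:
            w.start()

    def _run(self) -> None:
        while True:
            task = self.tasks.get()  # blocks until a task (or sentinel) arrives
            try:
                if task is None:
                    return  # shutdown sentinel
                task()  # tasks are assumed not to raise in this sketch
            finally:
                self.tasks.task_done()

    def submit(self, task: Task) -> None:
        self.tasks.put(task)

    def wait(self) -> None:
        """Block until every submitted task has finished (e.g. end of a frame)."""
        self.tasks.join()

    def shutdown(self) -> None:
        for _ in self.workers:
            self.tasks.put(None)  # one sentinel per worker
        for w in self.workers:
            w.join()


if __name__ == "__main__":
    done: list[str] = []
    done_lock = threading.Lock()

    def make_task(kind: str, obj: str) -> Task:
        def task() -> None:
            # Placeholder for real Animate / ExecuteParticleSystem / ComputeShadow work.
            with done_lock:
                done.append(f"{kind}({obj})")
        return task

    pool = TaskPool(num_workers=4)
    for obj in ("orc", "gnoll", "campfire"):
        for kind in ("Animate", "ExecuteParticleSystem", "ComputeShadow"):
            pool.submit(make_task(kind, obj))
    pool.wait()
    pool.shutdown()
    print(len(done))  # 9
```

Because the work is split into many small tasks rather than one thread per subsystem, the same code can keep more cores busy as core counts grow.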

I hope this post has given you a little bit of an idea why multicore support is not simple, especially on a game engine that was not particularly designed with threading in mind.


John Loch said...

Fascinating stuff, and you explained it very well for someone like me who's not a programmer.

Anonymous said...

Thanks for the info! Can you comment on rumors that SOE just hired a "graphics" programmer to potentially modernize EQ2 to make use of the 2/3 of modern video cards that are not being used? Multi-threading is a great start, but until EQ2 knows how to use "2008" video card tech and not just "2004", we will always be having these issues.


Anonymous said...

BTW, what I mean by 2/3 not being used is not that EQ2 doesn't support 2/3 of the video cards out there... it's that on any modern video card EQ2 will only use about 1/3 of its features, because the new stuff wasn't around when EQ2 came out.

Example: EQ2 runs about the same on a $500 video card as it does on a $50 video card. Everything right now depends on the CPU, and while opening up new cores for use is a very good thing... some things that EQ2 does on the CPU should now be done on the video card.

Anonymous said...

Hey Josh - I'm an avid EverQuest 2 player and have been since the original game release back in 2004. I actually had a big LAN party with 12 other friends and stayed up all night leveling my little Fury. I've tried other games (WoW, AoC, etc.), but I love EQ2.

I've been following the patch notes for a long time, reading your blog for a while, and keeping track of all the changes the SOE dev team has seen but not really saying much.

I've gotta say, you guys have been under fire from SOE fans about multi-core support for a while now. I know how complicated rewriting this kind of code can be (I work in the I.T. field) and it's good to see you guys taking this huge project on. It's good because it promotes the longevity of a really fun game that I'm sure a lot of fans would like to see stick around. :)

Good work on the multi-threaded programming, keep it up. I like the Threading Building Blocks idea. As I'm sure you're aware, the future of the industry really is 64-bit, multi-core CPUs with multi-threaded applications.

You're probably painfully aware of the graphics/CPU complaints from many fans. With your new graphics guy, I hope we can see more GPU-based rendering in the coming patches. :)

Mikul said...

General error from beta login:

<alert> E:\dev\eq2\game\client\src\main.cpp (310):
Failed to set affinity to CPU 3

What does this one actually mean? I have a Quad 6800 Extreme Kent.

The odd thing is I don't get the same problem on the Test login. Are there client improvements between the two?