An interesting note showed up in some recent EQII Test Patch Notes. Yes, it's true. We're working on "Multicore support" for EQII and plan to release the first iteration with Game Update 49. "Multicore Support" is a fan-friendly way of supporting multiple CPUs by way of multiple threads of execution in the client.
I thought I'd take some time to explain what led up to this and offer some explanation of the difficulties of threading. EQII was released in late 2004, but development actually started about five years earlier. In 2004, multiple-CPU machines were generally very expensive and reserved for servers and high-end workstations; the mainstream machines were single-CPU and relied upon increasing clock speeds to keep getting faster. Late in the development of EQII, Intel was starting to introduce hyper-threaded processors. Even though some Operating Systems would detect a hyper-threaded processor as two CPUs, it was really just one CPU. The game targeted computers more advanced for the day (in some ways) and launched largely with the expectation that CPUs would just get faster and faster.
However, clock speed increases started decelerating. Now we find ourselves getting machines that have smaller clock speed increases but more CPUs. Dual- and Quad-core machines are becoming quite commonplace and AMD has even introduced a Triple-core chip.
When an MMO is under development, the core engine design is towards the front of the project so that the tool chains and art pipelines can be created, allowing more people to be more productive. It is then refined over the course of the project, but single-threaded vs. multi-threaded (multicore) is something that should be decided early in the engine design. The reason for this is simple: multi-threaded programming can be significantly different than single-threaded programming and, unless you've done it every day for 20 years, decidedly NOT easier than network code.
Let me explain how single- and multi-threaded programming are different with a simple example. Single-threaded programming is like driving down the single lane of a highway. You're going fast and don't really have to stop for anything. Multi-threaded programming is like two (or more, but we'll use two for my example) highways: one running north-south and the other running east-west. At some point they have intersections. What happens if there are no traffic lights? Disaster. It's the same with multi-threaded programming. You have to find all of the intersections (programmers call them critical sections) and put traffic lights (synchronization primitives) on them. Intersections on a highway are easy to see; unfortunately they're not as easy in code. The other thing to consider is that you have to stop at the intersection and wait for your turn. It's the same with threads. That's also one of the reasons why adding a second CPU doesn't give you 100% more performance.
The first attempt at having additional cores help out was using a technology very similar to OpenMP. I took loops that iterated over lots of things (particles, vertex transformations, degenerate triangles in shadow meshes, etc) and parallelized them. The way this works is like this: Say you have 1000 numbers and you just have to add one to each of them. A single thread would step through all 1000 numbers adding one to each of them. OpenMP allows you to take those 1000 numbers and divide them up evenly among all processors. Unfortunately, this didn't give us any sort of meaningful performance gain as any gains were outdone by the time it takes to synchronize and hand off the data to other threads.
The second attempt became to take the specific system where we're spending most of our CPU time (animation and vertex transformation) and dividing it up in large chunks easily handed off to another thread. This turned out to work very well and netted at 10-15% frame rate gain in populous places. Another bonus was that it was mostly easy to integrate into the single-threaded engine. The way it works is by trying to do animations before the main game thread would. The first time something is animated in the main thread it gets added to a list of items that need animation. The next frame, the animation thread sees these items and can animate them before the main thread needs them. In some cases, the main thread needs one before the animation thread has gotten around to it. No worries! If something hasn't been animated yet, the main thread can take care of it. If the animation thread comes across something that the main thread is using, it just skips it and goes on to the next item. With how useful this system is, the drawbacks are that we only do animation in a separate thread and only one extra thread can really only make use of one extra processor (so Quad-core doesn't have a great advantage over Dual-core [yet], at least as far as EQII is concerned).
So what about the future? There's plenty of systems that we can offload in a similar manner to other threads: shadows, particles, maybe even scenegraph traversals or collision! However, just creating theads for each subtask isn't really the best way to help out. What I'd rather do is create individual tasks for each item: Animate, ExecuteParticleSystem, ComputeShadow, etc. that can be doled out to worker threads similar to Intel's Threading Building Blocks. This would give us the best fit to each processor type and support even higher processor counts in the future! Now that the proof-of-concept has been shown to work and the fan response so far is very favorable, all we need is time.
I hope this post has given you a little bit of an idea why multicore support is not simple, especially on a game engine that was not particularly designed with threading in mind.