14 April 2008

Leaking memory by the kilo[byte]

I finally got around to looking into reports of a memory leak in Veeshan's Peak mostly dealing with lava bombs on a bridge.

Sure enough, there's a leak. After a 30 minute meeting my test client had grown half a gig. Just while watching lava bombs bash a bridge. (Well, this is a debug-build client which uses more memory and doesn't have any optimizations, so that number is a little inflated)

Knowing that you have a leak is just the beginning. Sometimes, finding it (or fixing it) can be a pain.

We have some pretty decent tools built into the EQ2 framework for dealing with memory issues. I'm pretty proud of them. Given that, it didn't take long to find out that we were leaking particles, hundreds of them per frame.

EQ2 uses a lot of reference-counted objects to reduce memory leaks. Ironically, they can also contribute to memory leaks when you have a reference cycle (or circular reference). A reference cycle is basically when two objects hold references to each other. Unless something breaks the cycle, the reference counts won't decrease allowing both objects to be destroyed. Finding reference cycles can be difficult too since it's easy to determine that something is holding a reference to an object, but it's not always so easy to know which object is holding the reference.
... Until I instrumented our smart-pointer class to do that without having to change the many thousands of places where smart-pointers are used. The concept is pretty simple: The smart-pointer remembers the line of code (code address) where it increments the reference. The base object (with the reference counting support) remembers which smart-pointers are pointing to it. This allows me to know which line of code caused the excess reference counts. In this case, the smart-pointer is no longer a simple class that adds and removes references; it gets a little heavier. The good news is that you only need that extra weight when you're trying to find memory leaks. I'll leave this as an exercise for the reader.

So, case in point, we definitely had a reference cycle with some particles. Basically particles would reference scenegraph nodes that owned them. The fix is essentially to not hold a reference if it would cause a cycle, but we still need to hold a reference if there wouldn't be a cycle.

Therefore, without further ado, the fix:

Now, any excuse to use placement new gets me a little giddy. Plus I want to keep the Particle object size small because we always have a lot of them.
union
{
SceneGraphNode* pRawNodePointer;
char space[ sizeof( SmartPointer< SceneGraphNode > ) ];
} m_nodePointer;

Since these two members are bound by a union, I can only use one at a time. If I want to use a smart pointer, I have to construct it using placement new:
new (m_nodePointer.space) SmartPointer< SceneGraphNode >( pNode );

If I want to switch and use the raw pointer, I have to destruct the smart pointer:
reinterpret_cast< SmartPointer< SceneGraphNode >* >( m_nodePointer.space )->~SmartPointer< SceneGraphNode >();
m_nodePointer.pRawNodePointer = pNode;

VoĆ­la! All the particles get cleaned up now. This fix should be making its way onto live servers pretty soon.

Update 04/17/08: This fix is now on live servers!

15 comments:

Toldain said...

Placement new and unions. Classic stuff.

I'm curious about something. Does your code only apply to VP, or might there be other places in the game affected?

I don't really know how you structure the game, but I would be surprised if you had C++ code specific to one zone. I'd expect more that zone and mob specifics were in some scripting language that makes it easier for the level/encounter designers and the artists.

Which suggests to me that your fix might have an impact on memory usage in some other places, too, where a similar problem exists but on a smaller scale.

Joshua Kriegshauser said...

Toldain:

It will definitely have an affect on other particle effects throughout the game. I haven't checked directly, but this fix could also help people who wear armor with particle effects as some of them may be leaking too.

We do use scripting and data-driven architecture wherever possible. We have very little (if any) zone-specific C++ code.

Tiffany said...

The concept is pretty simple...

o_O

*watches the whole rest of the post fly over head*

Hehehe ;) Whatever you're doing, I'm sure it's awesome.

Anonymous said...

So basically the leak was that you were assigning values to the 'raw pointer' union member, without deconstructing the 'smart_pointer'?

If that is the case then you need to clean up the code from these hacks a bit - there is a reason that the C++ standard does not allow unions to contain members with constructors, destructors or assignment operators. Yet, you effectively force it to contain a class with the placement new - a class whose destructor will never be called automatically.

Fire the code review guy..;)

Joshua Kriegshauser said...

Oswaldor:

No, the leaks were because everything was a smart pointer. Smart pointers were referencing objects that cross-referenced themselves.

The union/placement-new method is a way for the client to decide whether or not to use a raw or smart pointer to eliminate reference cycles. It's cleaning up the memory leak, not causing new ones.

Nick McLaren said...

Alright, let me see if I have this straight... so the smart pointer is used when the base object being pointed to doesn't do a reference count (so it doesn't create a cycle), and the standard pointer only in the case when the target base object DOES a reference count?

I'm missing how you adjusted the smart pointer class to do this. It seems to me you have to explicitly change lines of code in the client to use the standard pointer instead of the smart pointer, in the cases where you know a cycle is being created. Unless you have the client checking for a reference count (or lack thereof) before instantiating the smart pointer.

Ahh.. so much to learn.. so little time. This blog is truly fascinating! Thanks Josh! =)

Joshua Kriegshauser said...

If there are no cycles, the smart pointer component is constructed and used. If a reference cycle is detected, then the raw pointer component is used.

We use smart pointers EVERYWHERE in the client. The particle system is the only area where reference cycles are known to exist, so the changes that I mentioned to use placement new are limited there. The change to the smart pointer doesn't affect release builds at all, but debug builds can be instrumented to give us more information about cycles.

Anonymous said...

So the union struct is something you implemented to fix the reference cycle issue?

Uggh - why would you use such a dangerous construct for that? You save 4 bytes (assuming 32-bit pointers) when using the smartpointer but loose sizeof(smartpointer)-4 bytes when you use the raw pointer. Considering the memory hogging of the client this is nothing.
Plus you will have dangles if the raw pointer part of the union is assigned without the smartpointer being explicitely destructed - kinda defeats the purpose of the smartpointer if you have to explicitely destruct it - yes I know you use it elsewhere as well, but surely there must be a more elegant method to solve this.

Toldain said...

Thanks for the reply, Joshua. I'm really glad to hear it. I have at least one friend that seems to have memory leak problems on a regular basis, maybe this will improve things.

Toldain said...

For what it's worth, I think the most likely hazard from this form of unions is a dangling pointer, which is no worse than the unsolved problem.

Given that this was done on particles, there can be a lot of them active at any time, just start casting some spells, in a raid.

I'm guessing that the staff is aware of how much memory the client uses, and pretty phobic about adding to that.

The only other solution I could think of in 5 mins was to get rid of the refcounting in the particles altogether, and destruct them by hand. But I can't tell if that would work without seeing more of their code. If the particles are added and removed by, say, point-of-view pruning of the scene graph, it's just not going to work.

Honestly, it's an ideal candidate for trying garbage collection, assuming you have enough control over when and how long the collector runs.

Joshua Kriegshauser said...

Oswaldor:
Any construct can be dangerous if used in the wrong way. Likewise, there's a right way to use "dangerous" constructs. In this case, the union can be the private member of a class with functions for controlling its behavior. Add a little documentation and you've got safe, maintainable use of a "dangerous" construct. I just didn't go into that much detail in the blog post.

In the case of memory usage, we can have up to a couple hundred thousand particles constructed at any given time, and I never mentioned that there was only one smart pointer in each particle that is in danger of reference cycles (there's more). Additionally, it's very little extra work to use a union and save [potentially significant] space.

Our smart pointers are 4 bytes (same as regular pointers), unless they're instrumented to find reference cycles (special debug builds only, size goes up to 12 bytes).

Anonymous said...

Thanks a bunch Josh! I'm actually deployed overseas, but I do have a shoddy isp I can use where I'm at. I tried it back in February, but the memory leak in RoK areas and lag due to distance drove me nuts! And I cancelled my subscription. I was going to wait till I get back to the states, but I think I may try it out again.

Again, a BIG! kudos for ya on this one! Now if we can only convince you guys to support DX10 in the next expansion =). Yeah, I know that's a big leap, but it would be cool and would give EQ2 life for another 5 years+.

KC said...

Fascinating and in-depth. Its very encouraging to see such passion for coding, especially when the coding drives a game we all love :)

Mark Storer said...

So this one smart pointer/raw pointer union always points to the owning scene graph?

Sounds like a classic place to use boost::shared_ptr and boost::weak_ptr.

Or a memory pool. You just dump the whole pool as a block at the end of each frame. Of course if you're not freeing them en masse (as suggested by Toldain), that solution goes right out the window.

OTOH, using a block like that means that every particle could use in-place new (squee!), but NOT delete them.

I learned that trick reading the GarageGames forums & code several years ago. Torque/Tribes 2 used a "per frame" memory block that wasn't even altered at the end of the frame. No tracking information. Just "where's the next free memory coming from" and "where's the end of the frame block". The only cleanup was at the end of the frame, they moved the "next free" pointer back to the beginning.

So long as you don't need destructors, your memory management can be insanely fast.

Crazy fast: O(n) total time spent allocating objects where N is the number of objects. Of course the actual time spent is trivial: a function call/return, an add, and an assignment.

O(1) total time spent cleaning up objects per frame. Again, the actual execution time microscopic.

The trick is finding the right size for the per-frame memory block in the first place. IIRC, the Dynamics/Garage Games guys used a hard-coded buffer size.

I wonder if it was 640kb.

bluetomato said...

Can you share some example to detect memory leaks in cyclic smart pointers?I am also facing such issue and changing everywhere in code where smart pointers are used is very cumbersome.