19 December 2008

Memory-Mapping Files for Fun and Profit

It's amazing that I still meet programmers who don't understand the benefits of memory-mapping files. If you're unaware of what that means, let me explain. Traditional file access involves several system calls: opening the file, seeking to the desired position, and then reading from or writing to the file. System calls can be slow because the operating system has to switch to kernel mode and may call into drivers to perform the IO operations. Memory-mapping a file usually takes just one call once the file is open, after which you can access the file as if the entire file had been read into a contiguous block of memory. But here is the greatest benefit (at least to me as an MMO server programmer): all processes mapping the same file can share the same physical memory for its pages.
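To make the contrast concrete, here is a minimal sketch in Python (chosen only because its mmap module wraps mmap on Linux and MapViewOfFile on Windows). The function names and the make_demo_file helper are made up for illustration. Note that slicing a Python mapping copies the bytes out, whereas in C or C++ you would simply get a pointer into the mapping.

```python
import mmap
import os
import tempfile


def read_with_syscalls(path, offset, length):
    """Traditional access: open the file, seek, then read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_with_mmap(path, offset, length):
    """Map the whole file once, then index it like ordinary memory."""
    with open(path, "rb") as f:
        # A length of 0 maps the entire file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


def make_demo_file(size):
    """Hypothetical helper: write `size` bytes of predictable data to a temp file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 251 for i in range(size)))
    return path


if __name__ == "__main__":
    path = make_demo_file(100_000)
    try:
        # Both routes must return the same bytes.
        assert read_with_syscalls(path, 4321, 32) == read_with_mmap(path, 4321, 32)
    finally:
        os.remove(path)
```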

Take the EverQuest II server for instance. The (binary) data file that describes nearly everything except collision geometry runs about 460MB. Every zone server uses that same file and we run several zone servers on one physical machine. With memory-mapping, the OS keeps at most 460MB of the file resident in memory (assuming the entire file is used) just once, and all zone server instances share those same pages without each instance having to read the file into its own memory.

Like anything, there are drawbacks. Most modern operating systems use demand paging, so if you try to access a page of memory representing a part of the file that hasn't been loaded yet, a page fault is triggered and the OS has to perform kernel-mode IO to load that page from the file. Depending on how much memory is available to your application (and how spread out your accesses are), this could trigger a very high number of page faults.

Another drawback is address space. A 32-bit application has at most 4GB of address space, and usually less in practice (a 32-bit Windows process gets 2GB of user address space by default). If you map entire large files into memory, you are consuming this address space. This has become more and more important on the EQII servers. As I mentioned, our data file is 460MB. Since shortly after the game launched, this file has always been fully mapped into memory. That was fine back then, as the file was much smaller than it is today. Now, after launching our fifth expansion, the data file takes up nearly one quarter of our allotted address space (and is not getting any smaller).

Fortunately, the APIs for Linux (mmap) and Windows (MapViewOfFile) allow an offset and a length to be specified. This opens the door to partial memory-mapped files. However, there are some caveats. The offset cannot be just any offset; it must be aligned. On Linux the offset must be a multiple of the page size (found by getpagesize), but on Windows the offset must be a multiple of the "allocation granularity" (found by calling GetSystemInfo). The length of the mapping usually does not need to be aligned, but if you're planning a generic partial mapping solution, it's probably best to use the same alignment. Another thing to consider is that mapping parts of files as you need them will spread them all over your address space; you cannot assume that page 1 and page 2 of a file will be placed next to each other in memory if you map them separately.
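The alignment arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the EQII code: it uses Python's mmap module, whose mmap.ALLOCATIONGRANULARITY constant reports the required offset alignment on the current platform, and the names aligned_window and read_partial are invented for this example.

```python
import mmap
import os
import tempfile

# Required offset alignment: the page size on Linux,
# the allocation granularity on Windows.
GRANULARITY = mmap.ALLOCATIONGRANULARITY


def aligned_window(offset, length, granularity=GRANULARITY):
    """Return (map_offset, map_length, delta) covering [offset, offset + length)."""
    map_offset = (offset // granularity) * granularity  # round offset down
    delta = offset - map_offset                          # where the data starts in the view
    # Round the mapped length up to a whole number of granules.
    map_length = -(-(delta + length) // granularity) * granularity
    return map_offset, map_length, delta


def read_partial(path, offset, length):
    """Map only the aligned window around the request and return its bytes."""
    file_size = os.path.getsize(path)
    if offset < 0 or length <= 0 or offset + length > file_size:
        raise ValueError("request outside file")
    map_offset, map_length, delta = aligned_window(offset, length)
    # The mapping may not extend past the end of the file.
    map_length = min(map_length, file_size - map_offset)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), map_length, access=mmap.ACCESS_READ,
                       offset=map_offset) as view:
            return view[delta:delta + length]


if __name__ == "__main__":
    data = bytes(i % 251 for i in range(3 * GRANULARITY + 123))
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    try:
        # An unaligned request that crosses a granule boundary.
        assert read_partial(path, GRANULARITY - 4, 8) == data[GRANULARITY - 4:GRANULARITY + 4]
    finally:
        os.remove(path)
```

The caller asks for any offset and length; the helper rounds the offset down, remembers the difference, and rounds the length up so that the same alignment is used on both ends.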

Here are some of the design goals for my partial memory-mapping implementation:
  • Phase sections out as they are no longer used (start-up data is only needed at start-up)
  • Reduce thrashing (phasing out a section and then mapping it back in)
  • Similar performance to the current full-file implementation
  • Reduce memory usage!
  • Ability to permanently map/unmap sections (i.e. sections that don't phase out)
  • Ability to automatically re-size sections on demand

Allow me to touch a bit on that last point. For my interface, given an offset and a length, I wanted to return a pointer to contiguous data rather than copy data into a provided buffer. Say some previous access forced you to load page 10 (but not page 11) and now you want to read some data that spans pages 10 and 11. This presents a problem for the contiguous data interface. I can't just map in page 11 and hope that it ends up next to page 10. (I will point out that there are 'hints' that can be passed to the OS APIs to attempt address space positioning, but these are discouraged.) Do you leave page 10 mapped and create a new mapping that combines pages 10 and 11? Since my interface always required an offset and a length for every access, I could safely unmap the existing page 10 and map pages 10 and 11 together. Requests for data on page 10 would now refer to the joint mapping. However, this did greatly expand the number of test cases I needed and complicated the logic.
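The remapping idea can be sketched roughly like this. It is a simplified illustration, not the production code: the Section and PartialMap classes are hypothetical, the page unit is taken from Python's mmap.ALLOCATIONGRANULARITY, and get() returns a copy of the bytes because a Python mapping cannot hand out a raw pointer the way the C APIs do. The bookkeeping is the point: when a request touches pages that are not all covered by one existing section, the touched sections are unmapped and their union is mapped as a single section.

```python
import mmap
import os
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY  # alignment unit for mapping offsets


class Section:
    """One mapped run of whole pages, first_page..last_page inclusive."""

    def __init__(self, fileno, first_page, last_page, file_size):
        self.first_page = first_page
        self.last_page = last_page
        self.start = first_page * PAGE
        end = min((last_page + 1) * PAGE, file_size)  # clamp at end of file
        self.view = mmap.mmap(fileno, end - self.start,
                              access=mmap.ACCESS_READ, offset=self.start)


class PartialMap:
    """Hypothetical interface: every access names an offset and a length."""

    def __init__(self, path):
        self.file = open(path, "rb")
        self.size = os.path.getsize(path)
        self.sections = []  # kept non-overlapping

    def get(self, offset, length):
        if offset < 0 or length <= 0 or offset + length > self.size:
            raise ValueError("request outside file")
        first = offset // PAGE
        last = (offset + length - 1) // PAGE
        touching = [s for s in self.sections
                    if s.first_page <= last and s.last_page >= first]
        if (len(touching) == 1 and touching[0].first_page <= first
                and touching[0].last_page >= last):
            section = touching[0]  # already covered by one mapping
        else:
            # Unmap every section the request touches and map their union
            # as one section, so the requested bytes are contiguous.
            lo = min([first] + [s.first_page for s in touching])
            hi = max([last] + [s.last_page for s in touching])
            for s in touching:
                s.view.close()
                self.sections.remove(s)
            section = Section(self.file.fileno(), lo, hi, self.size)
            self.sections.append(section)
        rel = offset - section.start
        return section.view[rel:rel + length]  # a copy; C would return a pointer

    def pages(self):
        return sorted((s.first_page, s.last_page) for s in self.sections)

    def close(self):
        for s in self.sections:
            s.view.close()
        self.sections = []
        self.file.close()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(4 * PAGE))
    pm = PartialMap(path)
    pm.get(PAGE + 10, 20)       # maps page 1 on its own
    pm.get(2 * PAGE - 8, 16)    # spans pages 1 and 2, so they are remapped together
    assert pm.pages() == [(1, 2)]
    pm.close()
    os.remove(path)
```

Unmapping the old section is only safe here for the same reason given above: callers never hold on to a result across calls, so nothing refers to the old mapping once the joint one replaces it.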

This nifty little piece of technology is currently running on our live game servers and doing very well. Instead of 460MB, we now keep a pool of about 20-30MB mapped from the file and generally enjoy a 0.1% or lower miss rate. Best of all, we still have all the benefits of memory-mapped files.
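For readers wondering how a bounded pool and a miss rate fit together, here is one possible sketch. The eviction policy is an assumption on my part for this example: it uses plain least-recently-used eviction over fixed-size chunks, which is not necessarily what the live servers do, and the MappedPool class and its methods are invented names.

```python
import mmap
import os
import tempfile
from collections import OrderedDict

CHUNK = mmap.ALLOCATIONGRANULARITY  # fixed chunk size, also the offset alignment


class MappedPool:
    """Keep at most max_chunks chunks of a file mapped (LRU is an assumed policy)."""

    def __init__(self, path, max_chunks):
        self.file = open(path, "rb")
        self.size = os.path.getsize(path)
        self.max_chunks = max_chunks
        self.chunks = OrderedDict()  # chunk index -> mapping, least recent first
        self.hits = 0
        self.misses = 0

    def _chunk(self, index):
        view = self.chunks.get(index)
        if view is not None:
            self.hits += 1
            self.chunks.move_to_end(index)  # mark as most recently used
            return view
        self.misses += 1
        if len(self.chunks) >= self.max_chunks:
            # Phase out the least recently used chunk to stay within budget.
            _, oldest = self.chunks.popitem(last=False)
            oldest.close()
        start = index * CHUNK
        length = min(CHUNK, self.size - start)  # last chunk may be short
        view = mmap.mmap(self.file.fileno(), length,
                         access=mmap.ACCESS_READ, offset=start)
        self.chunks[index] = view
        return view

    def read_byte(self, offset):
        if not 0 <= offset < self.size:
            raise IndexError("offset outside file")
        return self._chunk(offset // CHUNK)[offset % CHUNK]

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0

    def close(self):
        for view in self.chunks.values():
            view.close()
        self.chunks.clear()
        self.file.close()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(4 * CHUNK))
    pool = MappedPool(path, max_chunks=2)
    for _ in range(1000):
        pool.read_byte(10)  # repeated access to one hot chunk
    assert pool.misses == 1 and pool.hits == 999
    pool.close()
    os.remove(path)
```

A low miss rate in a scheme like this depends on accesses clustering in a small working set, which is also what keeps thrashing down.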

2 comments:

Anonymous said...

I'm not sure we can expect everyone to appreciate memory-mapped files when colleges are pumping out drones who only know Java and some HTML tags.

inacio.carozzi said...

Hi Joshua

Could I have a copy of this piece of code ( multiple mapped files ) ?

Thanks in advance

Inacio Carozzi