24 October 2009

Keep Designer Data in Files (not Databases)

Whoa! Did I just fall off the deep end?! Designer Data in ... files?! Data... doesn't that belong in a database?! There are those who say that it does, but I disagree, at least when talking about development.

In MMO terms, Designer Data is data that our Design team creates: Quests, Spells, NPCs, Paths, Items, Recipes, Locations, etc. Tons and tons of data that, together with art, is really the lifeblood of the game.

Designer Data is to Designers as Source Code is to Programmers. It's constantly changing and growing. Different people own different parts of it. Everyone on the dev team needs all of it and as up-to-date as possible to do their current tasks. Servers need it in a highly-optimized form to run the game. There are many parallels between Designer Data and Source Code.

Which brings me to my first major point. As files, Designer Data can be stored in Source Control (we use Perforce, and I love it). This gives an amazing amount of incredibly important functionality and basically for free:
  • Revision history - As text (EQII uses XML) files, anyone on the team can go back and see (through the same tool that everyone must use) all previous changes to a file. Anyone can see exactly what changed between each version or between multiple versions and who made the changes. Also, you can go back in time and grab previous file revisions.
  • Changelists - Most source control systems group together file changes. Perforce calls these Changelists. Grouping files together is an effective tool for seeing relative changes based on concept. One changelist might be labeled "Zone 1 population" and contain NPCs, Quests that those NPCs give out, Paths that those NPCs follow, etc. If a Changelist breaks the game, it can be rolled back wholesale (and you can see who made it and go pound them).
  • Integration - Changes are made in branches and we have branches for each distribution of the game. As a group of changes become ready to go Live, they are integrated into the next distribution (Main goes to QA, QA goes to Staged, Staged goes to Live). With Code and Data both in the same system, integration to the next distribution becomes a one-step process.
  • Code/Design in sync - Since Source Code and Design Data are both stored in the same system, they are always in sync. If a Programmer makes changes that affect both (which often happens), he can rest assured that everyone will easily get the latest in both.
  • Sandboxing - Designers are only affecting their own local version of design data until they check it in. They can do testing and make tweaks before checking in something that could potentially break everyone else.
  • Blame - It's so easy to see who made a change (and confront them if necessary).
If you were doing Design development in a database, you would have to develop your own solution around each of these (or use database tools that not everyone would have access to). And since the database is always up-to-date, and everyone is generally working out of the same database, problems could arise from changing formats or unexpected data.

Furthermore, Designer Data in files are free to have complex formats. For instance, in EQII the Designer Data is object oriented. Data definitions specify that a Character inherits from Entity and Entity inherits from a Base type. This also allows us to do things like have a base Predicate type and more complicated types like CharacterHasQuestPredicate that inherits from Predicate but has additional data members that only make sense for CharacterHasQuestPredicate (like the Quest name). In a database, you might have a Predicate table with a type column that is a number. You would have to look up what that number means from somewhere or write special tools that understand the relationship. Additionally, that Predicate table would have to include all of the options that any types could have in a very generic fashion. So you might have columns called "StringParam1" and "NumParam1". In the database then, you might have a row with type set to 12 and StringParam1 set to a quest name (or more likely NumParam1 set to a unique ID for a particular Quest). If you were looking at raw data, what would you rather see:
<object name="CharacterHasQuestPredicate">
  <string field="sQuest">quests/heritage/dwarven_work_boots</string>
</object>
Or:
Type     IntParam1    IntParam2
12 1243 <null>

The EQII Designer Data tool is very generic as it knows how to read the game's data definition. To actually add a whole new data type (Achievement for instance) actually takes zero changes to the Designer Data tool. It just takes writing a small text file that describes the Data Definition such as this one for our aforementioned CharacterHasQuestPredicate:
<objectdef name="CharacterHasQuestPredicate" inherits="Predicate">
  <fields>
    <fielddef name="sQuest" type="String" require_dir="quests/" default="" />
  </fields>
</objectdef>
However, reading XML files isn't the fastest thing to do. Server startup is fairly slow when reading from XML files, but EQII has a utility that converts the myriad XML files into a single file that contains all of the data in a highly optimized binary format (I've actually written about this file before). This file is mapped into virtual memory allowing all server processes running on the same physical machine to share one copy of the file in memory. For development purposes, developers can run their own servers against the XML data (and both internal development servers and rapidly changed external servers such as a Beta server could run against XML as well for fast turn around). Plus, there are enough options for converting XML data to databases that you could still use XML files for development and run production servers against a database.

The only major shortcoming of keeping Designer Data in files as opposed to a database is for searching and updating large amounts of data--things that a database is designed to do.

For searching data, EQII actually has a very workable solution. The EQII team has developed a system called VooDLe (a name playing off of Google and VDL--the internal name for EQII's data library). This is a very simple web solution that syncs to the latest data and indexes it for searching. It also detects file references and generates links. However, it doesn't help if you'd want to find all of the shields that have a block value higher than 300. For this, you can search VooDLe for the field that declares shield block value and quickly scan through the results. (Rarely is searching for a numeric value so cut and dried in EQII though with level scalars and combat scalars that affect everything.)

Updating large amounts of data is actually something that (shouldn't) happen very often. Furthermore, updating large amounts of data is error-prone and should be limited. You might even call this an inherent benefit to having designer data in files. When this is necessary, it's possible with a simple Perl script. If something does go wrong (even later) the Changelist can easily be reverted.

Based on all this, I feel it's more beneficial to a development team to use files for Designer Data during development rather than a database.

7 comments:

L said...

<3 XML
<3 VooDLe

Unknown said...

Yes. As a 6.5 year veteran of EQII's design team, I fully endorse this blog.

That is all.

Sean said...

Hear, hear. Change control would be reason enough for this in my mind.

Domino said...

Good grief yes. AMEN! Having worked with both types of data structure, I can say without doubt that keeping the designer data as isolated as possible is definitely best for both the designers and the game.

When I've had to work out of a database I was absolutely terrified that one mis-click, one misplaced keystroke, and I could be changing thousands of records all across the database, potentially without ever realizing it. It's not possible to avoid, it's human error, and you want to isolate that as much as possible, not stick it in a relational database where one error can be magnified a thousandfold with one keystroke, or one well-intentioned apprentice who doesn't understand relational databases can accidentally change something that affects every single entity in game.

And change control, and rollbacks? A nightmare I don't even want to think about.

Yes, provide good reporting tools, and tools like VooDLe are a godsend, but as a designer I beg you to save me from myself and keep my data, and my mistakes, as separate as possible from anything else they can mess up!

Joe Ludwig said...

Was there ever an option? Putting designer data anywhere BUT files seems like a bad idea.

Taylor Steil said...

I think if you guys had advanced level knowledge of SQL and Oracle your opinions about XML vs databases would be considerably different.

Oracle has all of what you're talking about- change control, tools to help less technical folk manage the data, and you can easily create different schemas for different environments. With the right architecture and tools it should be trivial to move data between schemas.


Case in point- today I wanted to update a few hundreds rows:
* add 23h 59m 59s to the date
* change one of the values from null to 1 or 0 based on another column
It only took about 2 minutes to do this via SQL.
There's no way you can update XML with that same speed and efficiency via writing Perl scripts.


Another huge advantage to having design data in a database is when you have to use it to tie the design data into your Business Intelligence reports. You'd be creating a ton of extra work if you had to take the XML design data and load that into Oracle just to make a simple report.

Lyndro said...

Yes Taylor, if every designer on your team has mastery of SQL then you can change and update data is much easier. I picked up a good bit of SQL while I was working on EQ with their database.

Creating Perl scripts to generate data ended up being not that much slower, in the long run, than entering through a database. The data in both case needs to be generated in some format or another. On both EQ and EQII we used excel for creating large amounts of data. On EQ we pasted the data into the database, on EQ II we ran a Perl script. Generating the Perl script had some overhead, but they were tools, and once generated could be reused.

For generating reports, whatever tools are used for searching data (VooDLe for example) should probably be used for generating reports, since the infrastructure is there.

As a designer I don't think it matters all that much what the backend structure of the data is. A good designer will learn to work with it when needed, what is important is the front end. Especially on a MMO where massive amounts of data needs to be generated in a fairly small window. Good tools should totally occlude whatever data structure you have for 99% of your work. If you have designers populating dungeons by writing down coordinates and typing them in by hand, whether they are entering them into Textpad for a perl script or Excel to paste into a database, then you're doing it wrong.

And for the specific example, I've seen a programmer on EQ II accomplish very similar tasks with Perl scripts in about 5 minutes (So yes, not 2) using the indexed data from VooDLe.