October 21, 2010

It Works!

Near the end of my last post I speculated about some changes that might bring performance in Aurora to where it ought to be. Well, I made the changes, and they worked. Performance is as zippy as ever, and I'm happy with where it is.

I also got some custom sounds that I contracted out. It was my first time paying for an asset, so I was a little apprehensive, but it worked out nicely. I made some modifications to the sounds on my own, and not all of them fit quite as well as I'd like, but I'm satisfied. I started experimenting with dynamic pitch adjustments (which XNA helpfully provides), and the results have been quite good so far.

On a more unfortunate note, I have to delay my schedule for Aurora. I signed up to compete in the US-specific Imagine Cup competition in the Fall term. It's going to consume my next month entirely, and I have a team this time around (hooray!), so I have serious obligations to fulfill. I'll put together yet another beta version of Aurora in its current state to submit to the IGF Student Showcase and save the full release for later.

October 10, 2010

Upgrading (?)

I've made three questionable upgrades to Aurora since my last blog post.

The first upgrade was to move the entire project to XNA 4.0 and .NET 4.0. I just bought and installed a new desktop computer running Windows 7. Naturally, I installed the latest Visual Studio (2010). Naturally, it only integrates with the latest version of XNA. Naturally, the latest version of XNA breaks a bunch of old code. It's all for the best, I'm sure, but I spent the first few hours of development on my new machine just trying to get the game running and get the newly premultiplied alpha values to work like the old system did.

The second upgrade was to start using LinkedLists instead of regular array-based Lists. There was absolutely no reason that two different cheap operations (adding an element to a List and making a new empty List) would ever consume 40% of my CPU time when equally frequent, more intensive calculations (like distances) took 2%. But one of my roommates suggested that maybe the allocation of new Lists and resizing of Lists during the Add functions could be triggering the automated garbage collection. And since I was constantly allocating and deleting small bits of data, that could add a ton of overhead. Using LinkedLists and just moving nodes around should be much cheaper.
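The game itself is C#/XNA, but the node-reuse idea behind the switch is language-agnostic. Here is a minimal Python sketch (class and method names are my own, not Aurora's code) of why relinking linked-list nodes avoids the constant allocation that can trigger the garbage collector:

```python
class Node:
    """A reusable list node; moving it between lists allocates nothing."""
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Bare-bones singly linked list with O(1) push/pop at the head."""

    def __init__(self):
        self.head = None

    def push(self, node):
        # Relink an existing node: no new object, so no GC pressure.
        node.next = self.head
        self.head = node

    def pop(self):
        node = self.head
        self.head = node.next
        node.next = None
        return node


# Moving a unit between a "far" and a "near" list reuses the same node,
# where an array-based List would be allocating and resizing instead.
far, near = LinkedList(), LinkedList()
far.push(Node("unit-42"))
node = far.pop()
near.push(node)  # same node object, just relinked
```

The contrast with array-backed lists is that Add can force a resize (a fresh allocation plus a copy), and discarding a whole list each frame leaves garbage behind; relinked nodes do neither.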

The problem came when I implemented this in such a way that my units were no longer aggressively culling their list of enemies that they were "locked on" to. (Units lock on at 100 pixels and then speed towards their targets until a real collision occurs at 10.) So on every frame they would lock on to the same enemies over and over, speeding towards them at an accelerating rate, totally ruining the smooth, flowing feel I liked so much. The best solution to this will probably involve making a Dictionary of locked-on-units with keys based on some unique ID for each unit in the game.

The third upgrade was to move to a grid system of collision detection, as my girlfriend Stephanie suggested after my last blog post. I had considered this system before, but threw the idea away due to imagined complications. Since then, Stephanie and my roommates (better programmers than me) have convinced me that the grid system is superior. My collision detection problem is always going to be fundamentally an O(n^2) problem at close distances, but my old solution was ultimately O(n^2) at all distances (albeit with some aggressive optimizations). The grid system is O(1) at great distances.
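This kind of grid is often called a uniform grid or spatial hash. A minimal Python sketch of the broad phase (the cell size and tuple representation are my simplifications, not Aurora's code): bin every unit into a cell at least as wide as the largest interaction radius, then a unit only ever tests its own cell and the eight neighbors.

```python
from collections import defaultdict

CELL = 100  # cell width >= lock-on radius, so 3x3 cells cover all candidates


def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))


def build_grid(units):
    """Bin units (here just (x, y) tuples) into cells: O(n) per frame."""
    grid = defaultdict(list)
    for u in units:
        grid[cell_of(u[0], u[1])].append(u)
    return grid


def nearby(grid, x, y):
    """Candidates from this cell and its 8 neighbors. Distant units are
    never even visited, which is the O(1)-at-great-distances property."""
    cx, cy = cell_of(x, y)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            yield from grid.get((cx + dx, cy + dy), ())


grid = build_grid([(5, 5), (40, 60), (950, 950)])
```

Only the candidates that survive this broad phase need a real distance check, so the n^2 cost is confined to units that are actually clumped together.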

Naturally, the first implementation of this destroyed the game's performance. Naturally, that was caused by my mistakes in the implementation.

My best guess right now is that the problem lies in the way the grid stores units. It currently keeps one LinkedList for all units in each square. When a square has 4000 units of the same team all together (as the stress test starts), each unit checks each other unit to see if it is friendly, then does nothing. Doing nothing is cheap, but just checking 4000^2 times every frame might be enough to bring the game to a crawl. It looks like I'll have to set up an array of LinkedLists (ugh) so that each faction can weed out friendly units in one step.
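Splitting each cell's storage by faction might look like this Python sketch (the three-faction count comes from the earlier post; everything else is a hypothetical simplification). With per-faction buckets, a unit iterates only over hostile factions and never touches its 4000 friendly neighbors at all:

```python
from collections import defaultdict

NUM_FACTIONS = 3  # the game has three factions


def build_faction_grid(units, cell=100):
    """grid[(cx, cy)] is an array of per-faction lists.
    Units are (x, y, faction) tuples in this sketch."""
    grid = defaultdict(lambda: [[] for _ in range(NUM_FACTIONS)])
    for u in units:
        cx, cy = int(u[0] // cell), int(u[1] // cell)
        grid[(cx, cy)][u[2]].append(u)
    return grid


def enemies_in_cell(grid, cx, cy, my_faction):
    """Skip the friendly bucket entirely: the 4000^2 friendly checks
    become zero work instead of a per-pair comparison."""
    buckets = grid.get((cx, cy))
    if buckets is None:
        return []
    out = []
    for faction in range(NUM_FACTIONS):
        if faction != my_faction:
            out.extend(buckets[faction])
    return out


g = build_faction_grid([(1, 1, 0), (2, 2, 0), (3, 3, 1)])
```

The friendly-vs-hostile decision moves from a per-pair check (4000^2 of them) to a per-bucket decision (at most three per cell).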

I'm really hoping that these changes will boost performance to its practical best, so that at the very least I shift the bottleneck to the graphics card on most systems. I'm working with a new profiler and an improved system, so it's hard to compare these results to what I had before, but I'm a little worried about the profiling tests I've done already. With the game running at 1fps in a stress test, the profiler reported only a little over half of the time being spent directly in my collision detection sections. Even if that were reduced to zero, wouldn't that only double the current performance? Will that be enough?
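That back-of-envelope worry is just Amdahl's law: if a fraction p of frame time goes to collision detection, eliminating it entirely gives at best a 1/(1-p) overall speedup. A two-line Python check of the numbers:

```python
def best_case_speedup(fraction_removed):
    """Amdahl's law for the limiting case: the removed fraction of the
    work drops to zero and everything else is unchanged."""
    return 1.0 / (1.0 - fraction_removed)


# Half the frame in collision detection -> at best 2x overall:
print(best_case_speedup(0.5))   # 2.0
# "A little over half", say 55% -> still only ~2.2x:
print(best_case_speedup(0.55))  # ~2.22
```

So at a measured 1fps, even a perfect fix to collision detection alone would land around 2fps; the other half of the frame has to get cheaper too.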

In any case, I'll keep working on it, and hopefully I'll be able to add the new levels in time for an early November release. I've got plenty of possible resources and plenty of intelligent people to lean on, so I've got no excuses for failure.

October 3, 2010

Profiling Away

I'm continuing to make good progress, although I took a break from my TODO list to pull out NProf again and do some performance work. Performance has improved ten times over since the first version of Aurora, but it still chugs a bit when there are a lot of units on the screen. I like having a lot of units on the screen, so I wanted to see if there was anything I could do to fix this.

I changed some initialization settings in the code so that there would be over 3000 units for each player on screen. I thought before that performance was being limited by the sprite drawing, which can't really be helped at this point. To my surprise, though, the stress test showed 80% of computation time being spent outside the Draw function, and over 65% being spent doing proximity detection for the game's units.

Now, I knew when starting this project that my proximity detection would be the most difficult part to program efficiently. I've got three factions, each with thousands of units, with each unit floating freely on a plane, and each of which needs to know when it is within a short radius of any other. That's n^2 right off the bat, and that's bad.

My algorithm so far has been for each unit to keep a list of all other units along with the next possible frame on which that enemy unit might collide with it. When that frame comes, the unit calculates the distance, checks for a collision, and recalculates the next possible frame of collision. That way, units that are far apart check for collisions with each other only every thousand frames or so, while units nearby check every frame.
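The "next possible frame" calculation rests on a maximum closing speed: two units a distance d apart, each moving at most MAX_SPEED pixels per frame, close the gap by at most 2*MAX_SPEED per frame. A Python sketch of that bound (MAX_SPEED and the function name are hypothetical; the 10-pixel collision radius is from the post):

```python
import math

MAX_SPEED = 2.0        # hypothetical max pixels per frame for any unit
COLLIDE_RADIUS = 10.0  # real collision distance from the post


def next_check_frame(current_frame, distance):
    """Earliest frame at which two units this far apart could possibly
    collide, assuming both close head-on at MAX_SPEED. Checking again on
    that frame can never miss a collision; it only arrives a bit early."""
    gap = distance - COLLIDE_RADIUS
    if gap <= 0:
        return current_frame  # already in collision range: check now
    frames = math.floor(gap / (2 * MAX_SPEED))
    return current_frame + max(1, frames)


# Units 4010 px apart need no re-check for about a thousand frames:
print(next_check_frame(0, 4010.0))  # 1000
```

This is why far-away pairs can safely go a thousand frames between checks: the bound is conservative, so a re-check is always scheduled no later than the earliest possible collision.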

That worked pretty well, but it's still stressing the game at high unit counts, and all the work with dictionaries and lists is holding back performance. The specific problem is that
For every unit:
    For every enemy unit:
        I have to check for collision, and if there is none,
        I find the next frame to check and look up whether that frame is already accounted for, and
        either make a brand new list of enemy units for that upcoming frame, or else
        retrieve the existing list and add this enemy unit.
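In Python terms, the "make a brand new list or retrieve the existing one" branch at the bottom of that loop is exactly what dict.setdefault collapses into a single hash lookup. A hypothetical sketch of the frame-bucket dictionary (names are mine, not Aurora's):

```python
def schedule(checks, frame, enemy):
    """checks maps frame number -> list of enemies to re-test that frame.
    setdefault does the 'create or retrieve' branch in one lookup."""
    checks.setdefault(frame, []).append(enemy)


checks = {}
schedule(checks, 120, "enemy-a")
schedule(checks, 120, "enemy-b")  # reuses the existing frame-120 list
```

The C# analogue would be Dictionary.TryGetValue followed by a single Add on a miss, but the structural point is the same: one lookup per enemy instead of a lookup, a branch, and a second lookup or insertion.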

Profiling reveals that during the stress test the game spent over 25% of its time on the one line of code that corresponds to the last line above, a further 15% on the second-to-last line, and a further 10% on the third-to-last. I'm spending half of my stress-test CPU time on an if-else statement that shows up as four lines of code in my game. If I can streamline that process, I might be able to knock out yet another bottleneck. And I think I can do just that.

I think part of the problem right now is that A) there are so many possible upcoming frames to check for (when we decide whether to make a new list of enemies to check or to add to it), and B) when a group of, say, 100 enemy stars clump together on a single frame-to-be-checked, the game retrieves that frame's list, adds a single star, and puts it back 100 times rather than collecting them all first.

One way that has occurred to me to get around this is to use the current algorithm for all units within, say, 100 frames, but beyond that to check enemy units only on 100-frame increments, up to a certain known limit. That way I can create a few relevant lists and add them to the dictionary all at once, and the places where I can add them are already known. It would require more actual distance checks, but it would allow much easier look-up and it would make recording the checks much easier.
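The 100-frame quantization could be sketched like this (the horizon value and rounding rule are my guesses at the details, not Aurora's code). Rounding the far-future check *down* to a multiple of 100 keeps the scheme safe: a check can arrive early, never late.

```python
HORIZON = 100  # within this many frames, schedule exactly; beyond, quantize


def bucket_frame(current_frame, frames_until_check):
    """Quantize far-future checks to 100-frame increments, so the
    dictionary only ever holds a small, predictable set of far keys."""
    if frames_until_check <= HORIZON:
        return current_frame + frames_until_check
    # Round down so a check is never scheduled later than the earliest
    # possible collision frame.
    return current_frame + (frames_until_check // HORIZON) * HORIZON


print(bucket_frame(0, 50))    # 50: near-term checks stay exact
print(bucket_frame(0, 730))   # 700: far checks snap to a 100-frame slot
```

With only a handful of known slots beyond the horizon, the far-future lists can be created and inserted into the dictionary in one batch, trading a few extra distance checks for much cheaper bookkeeping, as the paragraph above describes.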