October 3, 2010

Profiling Away

I'm continuing to make good progress, although I took a break from my TODO list to pull out NProf again and do some performance work. Performance has improved ten times over since the first version of Aurora, but it still chugs a bit when there are a lot of units on the screen. I like having a lot of units on the screen, so I wanted to see if there was anything I could do to fix this.

I changed some initialization settings in the code so that there would be over 3000 units for each player on screen. I thought before that performance was being limited by the sprite drawing, which can't really be helped at this point. To my surprise, though, the stress test showed 80% of computation time being spent outside the Draw function, and over 65% being spend doing proximity detection for the game's units.

Now, I knew when starting this project that my proximity detection would be the most difficult part to program efficiently. I've got three factions, each with thousands of units, with each unit floating freely on a plane, and each of which needs to know when it is within a short radius of any other. That's n^2 right off the bat, and that's bad.

My algorithm so far has been for each unit to keep a list of all other units along with the next possible frame that that enemy unit might collide with it. When that frame comes, the unit calculates the distance, checks for a collision, and recalculates the next possible frame of collision. That way, units that are far away are checking for collision with each other only every thousand frames or so, while units nearby are checking every frame.

That worked pretty well, but it's still stressing the game at high unit counts, and all the work with dictionaries and lists is holding back performance. The specific problem is that
For every unit:
For every enemy unit:
I have to check for collision and if there is none,
I find the next frame to check and look up to see if that frame is already accounted for, and
Either make a brand new list of enemy units for that upcoming frame, or else
Retrieve the existing list and add this enemy unit

Profiling reveals that during the stress test the game spent over 25% of its time on the one line of code that corresponds to the last line above. A further 15% was spent on the second-to-last line above. A further 10% on the third-to-last line above. I'm spending a half of my stress test CPU time on an if-else statement that shows up as four lines of code in my game. If I can streamline that process, I might be able to knock out yet another bottleneck. And I think I can do just that.

I think part of the problem right now is that A) there are so many possible upcoming frames to check for (when we decide whether to make a new list of enemies to check or to add to it), and B) when a group of, say, 100 enemy stars clump together on a single frame-to-be-checked, the game retrieves that frame's list, adds a single star, and puts it back 100 times rather than collecting them all first.

One way that has occurred to me to get around this is to use the current algorithm for all units within, say, 100 frames, but beyond that to check enemy units only on 100-frame increments, up to a certain known limit. That way I can create a few relevant lists and add them to the dictionary all at once, and the places where I can add them are already known. It would require more actual distance checks, but it would allow much easier look-up and it would make recording the checks much easier.

2 comments:

SAT said...

It seems like there should be a way that you don't have to have each unit store every other unit. To me that seems like a lot of unnecessary storage.

Maybe you could just have one grid of every possible space and keep track of which units are occupying those spaces? Then rather than checking whether every single enemy unit is about to collide with you, you could just check whether every other unit in your square is an enemy unit.

And to further enhance the grid idea, instead of iterating through every possible square, perhaps you could maintain a list of the indices of squares that are occupied, so you'd only have to check squares that matter.

Does that make sense? Or seem like it would be an improvement?

E McNeill said...

1. Yes, it's unnecessary to store every other unit. However, I've got plenty of memory, and not a lot of CPU power to spare. If I didn't store them, I'd have to check every enemy unit every frame. Offloading work into memory seemed like a good way to sidestep that bottleneck.

2. I thought a little bit about using a grid system like this, but I figured it would be equally inefficient and more difficult to program. Some of the issues:

A) Maintaining the grid requires checking every frame to see if each unit has moved into a different square.

B) A unit on the right edge of one square would have to detect its proximity to units on the left edge of the square next to it. You'd either have to have each unit check its own square and the eight surrounding squares, or you'd have to make a grid of overlapping squares and allow units to occupy more than one at a time.

C) I'm not sure how to determine the right size for the grid. If the squares are too small, you have to check a lot of them, and units move between squares frequently. If the squares are too large, they can include a whole bunch of units, all of which are checking each other every frame.


On the other hand, problem A might ultimately be pretty cheap (four simple bounds checks to see if it's in the right square), problem B might be excusable (O(9n) still beats O(n^2)), and problem C might have an easy answer (make the squares as small as practical and don't worry about memory). So maybe this would be the best system after all...