Unroll A Roll A Roll on A Roll - AMORTIZE!


Herein we describe an optimization technique similar to putting a processing loop into a background thread, but with (arguably) better efficiency, reliability, and giving control directly to the owning thread (rather than indirectly via operating system prioirity calls.)

Ahem, got that?

Introducing: A Slow Routine

Let's define a simple method, it works on a 2D image, called process_image (image* I):

void process_image (image* I) {

for (int y = 0; y < I->h; ++y) {

for (int x = 0; x < I->w; ++x) {

I->set_pixel (x, y, raytrace (x, y));

}

}

}

Here the "raytrace" method isn't important-- it's just something complex that we need to do.

Now Imagine It In A Realtime Application

Next, we imagine this little routine running inside your main game loop.

For instance, update () calls sprite::update () for something, which then checks if it has an image, and if not it calls process_image (I) on it's own bitmap in order to raytrace itself (nevermind the specifics of whether or not this is sensible.)

This only happens once, let's say, when a sprite first is loaded, or once in awhile, say when it changes it's animation.

But there is a problem-- calls to process_image () take way too long!

In particular, the framerate hiccups noticably whenever we create a new sprite. The rest of the time, the game runs smoothly, but here we have a complicated thing to do, and it's all happening within one frame, and there just isn't enough space in that frame, in realtime, to avoid a hiccup.

Amortize!

Computers run in sequential time, making them run in real time is an interesting and important problem for games.

One way to address a problem like this, is amortize.

When we say "amortize" what we mean is, something that would normally cost a whole bunch all at once, well, divide it out across a longer period of time. In our case, we aren't calculating the cost of a new bagel machine over it's expected lifetime of 36 months, instead we are spreading the cost of process_image () over, say, 36 frames.

Unroll

Here's how.

  1. Many local variables (in this case x, y) need to become more global, i.e., exist in a persistant context. Usually this means making them object members.
  2. We rewrite the method to execute for a certain number of Cycles; it will process a certain number of pixels (Cycles) and then, no matter if it's finished or not, it will return.
  3. We use our variables from step 1- to pick up where we left off.
  4. We have a special variable that we set to know everything has finished off.

Example Code

Typically our variables "cur_x" and "cur_y" below would exist inside a class declaration, i.e., be object members, but let's ignore that for a second to make this example simpler:

int cur_x, cur_y;

int state = 0; // 0 == none, 1 == processing, 2 == done

int process_image_partial (image* I, int Cycles) {

if (0 == state) {

// state 0 means, "initialize things"; if we set state to 0 and then call this

// method, it will initialize whatever member or global variables it needs, and

// set itself up to start processing.

cur_x = 0;

cur_y = 0;

state = 1;

}

else if (1 == state) {

// this is the "processing" state; start where we left off (cur_x, cur_y) and iterate

// as long as we have cycles to do it.

bool BreakFlag = false;

for (int y = cur_y; y < I->h; ++y) {

for (int x = cur_x; x < I->w; ++x) {

// check if we can process any more cycles, and then decrement;

// it's best to do it before processing, and in this order, or

// we will end up processing the same pixel twice

if (Cycles <= 0) {

// save so that next frame we start off here

cur_x = x;

cur_y = y;

BreakFlag = true;

break;

}

--Cycles;

// process this pixel, assuming we have cycles

I->set_pixel (x, y, raytrace (x, y));

}

if (BreakFlag) break;

// we finished a line; the next one should start at 0

cur_x = 0;

}

// if we made this far, and didn't break, it means we are done!

if (!BreakFlag) {

state = 2; // set state to finished, so calling routine can tell

}

}

// it is much more convenient to just return unused cycles, and have our calling routine

// call us a number of times until they are used up, than to try and write this routine

// so that it always uses all cycles. The calling routine can normally just ignore any

// unused cycles, anyhow, if it's not worried about taking a few extra frames to finish.

return Cycles;

}

The Calling Routine

The calling routine for this would look something like this:

function sprite::update (double T) {

// presumably, something happens (e.g., our creation) which necessitates us to call

// process image; if so, set it up here

if (needs_process_image) {

// reset it, no matter what's going on; pretend that state, cur_x, and cur_y are

// now members of sprite, not globals.

state = 0;

process_image_partial (image, 1);

needs_process_image = false; // clear this out, it's been initiated

}

// are we processing image? just raytrace 10 pixels per frame

if (1 == state) {

process_imgae_partial (image, 10);

}

// check to see if we finished

if (2 == state) {

do_something_useful_with_image_now_that_image_is_complete_and_ready (image);

// reset state so we don't keep executing this little section.

state = 0;

}

}

Pros and Cons

This code is complex, but the benefit is huge.

BENEFITS
  • You get very precise control over how long process_image_partial () takes each frame.
  • You don't have to rely on the operating system to allocate a background thread, which involves it's own set of uncertainties.
  • You can easily adjust your "cycles" budget later on, as needed. So later on you can just basically instantly "spend less time" doing this by decreasing cycles.
  • And it's very efficient, really.
DRAWBACKS
  • The code is complex, to be sure.
  • For more complex source algorithms, it's even more complex.
  • You need some object state in the object so you can keep track of where you are.
  • You lose some flexibility if you want to change the algorithm later on (this isn't as major as it may seem, however, since the core structure of the algorithm does not change.)
  • The last drawback is that you don't get instantaneous results; for instance, maybe an animation or effect takes a few frames before it starts to play. However in a game this often doesn't actually matter.

The complexity can be somewhat mitigated by breaking the process_image_partial () method or similar into multiple function calls. You can sort of avoid using object state, if that's a burden, by just passing the necessary variables in as parameters, but you'll still need that object state to be somewhere to keep that information on where you are at stored frame-to-frame.

Finally, there is just no real way around it-- each time you use it, you'll likely need to write it from scratch. Each situation will require you to think about it fresh. For instance, would it be enough to have each row of each image be considered a "cycle", rather than each pixel? You could save yourself some work.

Amortization is Useful

It's very rewarding to implement this and see framerate go from a hiccupland to butter-smooth.

It's not the right approach for most slowdown situations, of course. I'd recommend optimizing in other ways, first, as a general rule. And if you are waiting for file or other IO, use an actual thread (don't use this to load data from files or get network data, please.) But if you hadn't thought of this technique before, keep it on hand for when you need it.

Amortization is a genuinely useful, powerful and effective technique. It's a specialized tool, and is somewhat difficult to employ-- but it works very, very well, when you need it.

2012-09-28


◀ Back