
Building a 3D Engine in Perl, Part 4
by Geoff Broadwell

Too Fast

There's yet another problem; this time, one that will require a change to the frame rate calculations. The frame rate shown in the above screenshots is either 333 or 500, but nothing else. On this system, the frames take between two and three milliseconds to render, but because SDL can only provide one-millisecond resolution, the time delta for a single frame will appear to be exactly either .002 second or .003 second. 1/.002=500, and 1/.003=333, so the display is a blur, flashing back and forth between the two possible values.
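To make the flicker concrete, here is a minimal sketch of the effect. It assumes every frame really takes 2.5 milliseconds and that the clock simply truncates to whole milliseconds. The per_frame_fps helper is a hypothetical name used only for this illustration and is not part of the engine.

```perl
use strict;
use warnings;

# Hypothetical helper: report the per-frame FPS an engine would see when
# every frame really takes $true_ms milliseconds but the clock only
# returns whole milliseconds.
sub per_frame_fps {
    my ($true_ms, $frames) = @_;
    my @readings;
    my $last_tick = 0;
    for my $frame (1 .. $frames) {
        my $tick = int($frame * $true_ms);      # clock truncates to whole ms
        my $d_ms = ($tick - $last_tick) || 1;   # guard against a zero delta
        push @readings, int(1000 / $d_ms);
        $last_tick = $tick;
    }
    return @readings;
}

print join(' ', per_frame_fps(2.5, 6)), "\n";   # 500 333 500 333 500 333
```

The true rate in this model is 400 frames per second, yet no single-frame reading ever shows it.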

To get a more representative (and easier-to-read) value, the code must average frame rate over a number of frames. Doing this will allow the total measured time to be long enough to drown out the resolution deficiency of SDL's clock.

The first thing I needed was a routine to initialize the frame rate data to carry over multiple frames:

sub init_fps
{
    my $self = shift;

    $self->{stats}{fps}{cur_fps}    = 0;
    $self->{stats}{fps}{last_frame} = 0;
    $self->{stats}{fps}{last_time}  = $self->{world}{time};
}

The new stats structure in the engine object will hold any statistics that the engine gathers about itself. To calculate FPS, the engine needs to remember the last frame for which it took a timestamp, as well as the timestamp for that frame. Because the engine calculates the frame rate only every few frames, it also saves the last calculated FPS value so that it can render it as needed. The init_fps call, as usual, goes at the end of init:

$self->init_fps;

The new update_fps routine now calculates the frame rate:

sub update_fps
{
    my $self      = shift;

    my $frame     = $self->{state}{frame};
    my $time      = $self->{world}{time};

    my $d_frames  = $frame - $self->{stats}{fps}{last_frame};
    my $d_time    = $time  - $self->{stats}{fps}{last_time};
    $d_time     ||= 0.001;

    if ($d_time >= .2) {
        $self->{stats}{fps}{last_frame} = $frame;
        $self->{stats}{fps}{last_time}  = $time;
        $self->{stats}{fps}{cur_fps}    = int($d_frames / $d_time);
    }
}

update_fps starts by gathering the current frame number and timestamp, and calculating the deltas from the saved values. Again, $d_time must default to 0.001 second to avoid possible divide-by-zero errors later on.

The if statement checks to see if enough time has gone by to result in a reasonably accurate frame rate calculation. If so, it sets the last frame number and timestamp to the current values and the current frame rate to $d_frames / $d_time.
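To see the averaging at work outside the engine, the following sketch runs the same logic against a simulated clock. The stripped-down $self hash, the update_fps_sketch name, and the 2.5-millisecond frame time are assumptions made for this illustration; only the fields that update_fps touches are modeled.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the engine object; only the fields used by
# the FPS code are present.
my $self = {
    state => { frame => 0 },
    world => { time  => 0 },
    stats => { fps   => { cur_fps => 0, last_frame => 0, last_time => 0 } },
};

# Same logic as update_fps, written as a plain function that also
# returns true whenever it recalculates the frame rate.
sub update_fps_sketch {
    my $self     = shift;
    my $fps      = $self->{stats}{fps};
    my $frame    = $self->{state}{frame};
    my $time     = $self->{world}{time};

    my $d_frames = $frame - $fps->{last_frame};
    my $d_time   = ($time - $fps->{last_time}) || 0.001;

    return 0 unless $d_time >= .2;

    $fps->{last_frame} = $frame;
    $fps->{last_time}  = $time;
    $fps->{cur_fps}    = int($d_frames / $d_time);
    return 1;
}

# Simulate one second of 2.5 ms frames on a clock that truncates to
# whole milliseconds, recording each recalculated FPS value.
my @readings;
for my $frame (1 .. 400) {
    $self->{state}{frame} = $frame;
    $self->{world}{time}  = int($frame * 2.5) / 1000;
    push @readings, $self->{stats}{fps}{cur_fps}
        if update_fps_sketch($self);
}

print "@readings\n";
```

Under these assumptions the readings should stay close to the true rate of 400 frames per second instead of jumping between 333 and 500, because a one-millisecond error is small compared to a 200-millisecond window.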

The update_fps call must occur early in the main_loop, but after the engine has determined the new frame number and timestamp. main_loop now looks like this:

sub main_loop
{
    my $self = shift;

    while (not $self->{state}{done}) {
        $self->{state}{frame}++;
        $self->update_time;
        $self->update_fps;
        $self->do_events;
        $self->update_view;
        $self->do_frame;
    }
}

The final change needed to enable the new, more accurate display is in draw_fps. The $d_time lookup goes away, and the $fps calculation turns into a simple retrieval of the current value from the stats structure:

my $fps  = $self->{stats}{fps}{cur_fps};

The more accurate calculation now makes it easy to see the difference between the frame rate for a simple view (Figure 7):

Figure 7. Frame rate for a simple view

and the frame rate for a more complex view (Figure 8).

Figure 8. Frame rate for a complex view

Is the New Display a Bottleneck?

The last thing to do is to check that the shiny new frame rate display is not itself a major bottleneck. The easiest way to do that is to turn benchmark mode back on in init_conf:

    benchmark => 1,

After doing that, I ran the engine under dprofpp again, and then analyzed the results, just as I had earlier:

$ dprofpp -Q -p step075

Done.
$ dprofpp -I -g main::main_loop
Total Elapsed Time = 3.943764 Seconds
  User+System Time = 1.063773 Seconds
Inclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 100.       -  1.064      1        - 1.0638  main::main_loop
 94.6   0.006  1.007    384   0.0000 0.0026  main::do_frame
 85.2   0.019  0.907    384   0.0000 0.0024  main::draw_frame
 50.7   0.205  0.540    384   0.0005 0.0014  main::draw_view
 16.8   0.073  0.179    384   0.0002 0.0005  main::draw_fps
 15.4   0.095  0.164    384   0.0002 0.0004  main::set_projection_2d
 11.6   0.045  0.124    384   0.0001 0.0003  main::draw_axes
 10.9   0.116  0.116   2688   0.0000 0.0000  SDL::OpenGL::CallList
 8.74   0.013  0.093    384   0.0000 0.0002  main::end_frame
 7.52   0.003  0.080    384   0.0000 0.0002  SDL::App::sync
 7.24   0.077  0.077    384   0.0002 0.0002  SDL::GLSwapBuffers
 4.89   0.052  0.052   3072   0.0000 0.0000  SDL::OpenGL::PopMatrix
 4.70   0.023  0.050    384   0.0001 0.0001  main::update_view
 3.67   0.039  0.039   3456   0.0000 0.0000  SDL::OpenGL::GL_LIGHTING
 3.48   0.037  0.037    384   0.0001 0.0001  SDL::OpenGL::Begin

As it currently stands, draw_view takes half of the run time of main_loop, and set_projection_2d and draw_fps together take about a third of it. Is that good or bad news?

draw_view is so quick now because I've just optimized it. Now that it's running so fast again, I can afford to add more features and perhaps make a more complex scene, either of which will make draw_view take a larger percentage of the time again. Also, set_projection_2d is necessary for any in-window statistics, debugging, or HUD (heads-up display) anyway, so the time spent there will not go to waste.

That leaves draw_fps, taking about one sixth of main_loop's run time. That's perhaps a bit larger than I'd like, but not large enough to warrant additional effort yet. I'll save my energy for the next set of features.
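The fractions quoted above can be checked from the CumulS column of the dprofpp listing. This is only a quick sketch: the share_of_main_loop helper is a hypothetical name, and because the listing rounds CumulS to three decimals, the results can differ from the %Time column in the last digit.

```perl
use strict;
use warnings;

# Cumulative seconds (CumulS) copied from the dprofpp listing.
my %cumul = (
    main_loop         => 1.064,
    draw_view         => 0.540,
    draw_fps          => 0.179,
    set_projection_2d => 0.164,
);

# Hypothetical helper: a routine's share of main_loop time, in percent.
sub share_of_main_loop {
    my ($name) = @_;
    return 100 * $cumul{$name} / $cumul{main_loop};
}

printf "%-18s %5.1f%%\n", $_, share_of_main_loop($_)
    for qw(draw_view draw_fps set_projection_2d);
```

By this arithmetic draw_fps comes out near 17 percent, or about one sixth, and draw_fps plus set_projection_2d lands at roughly 32 percent, or about a third.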

Conclusion

In this article, I covered several concepts relating to engine performance: adding a benchmark mode; profiling with dprofpp; using display lists to optimize slow, repetitive rendering tasks; and using display lists, bitmapped fonts, and averaging to produce a smooth frame rate display. I also added a stub for a triggered events subsystem, which I'll come back to in a future article.

With these performance improvements, the engine is ready for the next new feature, textured surfaces, which will be the main topic for the next article.

Until then, enjoy yourself and have fun hacking!