Recently in Games Category

Lightning Strikes Four Times

by Mike Friedman

Good software design principles tell us that we should work to separate unrelated concerns. For example, the popular Model-View-Controller (MVC) pattern is common in web application designs. In MVC, separate modular components form a model, which provides access to a data source, a view, which presents the data to the end user, and a controller, which implements the required features.

Ideally, it's possible to replace any one of these components without breaking the whole system. A templating engine that translates the application's data into HTML (the view) could be replaced with one that generates YAML or a PDF file. The model and controller shouldn't be affected by changing the way that the view presents data to the user.

Other concerns are difficult to separate. In the world of aspect-oriented programming, a crosscutting concern is a facet of a program which is difficult to modularize because it must interact with many disparate pieces of your system.

Consider an application that logs copious trace data when in debugging mode. In order to ensure that it is operating correctly, you may want to log when it enters and exits each subroutine. A typical way to accomplish this is by conditionally executing a logging function based on the value of a constant, which turns debugging on and off.

    use strict;
    use warnings;

    use constant DEBUG => 1;

    sub do_something { 
        log_message("I'm doing something") if DEBUG;

        # do something here

        log_message("I'm done doing something") if DEBUG;
    }

This solution is simple, but it presents a few problems. Perhaps most strikingly, it's simply a lot of code to write. For each subroutine that you want to log, you must write two nearly identical lines of code. In a large system with hundreds or thousands of subroutines, this gets tedious fast, and can lead to inconsistently formatted messages as every copy-paste-edit cycle tweaks them a little bit.

Further, it offends the separation-of-concerns goal of an MVC design, because every component must talk to the logging system directly.

One way to improve this technique is to automatically wrap every interesting subroutine in a special logging function. There are a few ways to go about this. One of the simplest is to use subroutine attributes to install a dynamically generated wrapper.

Attributes

Perl 5.6 introduced attributes, which allow you to add arbitrary metadata to a variable. Attributes can be attached both to package variables (including subroutines) and to lexical variables. Since Perl 5.8, attributes on lexical variables apply at runtime, while attributes on package variables activate at compile time.

The interface to Perl attributes is via the attributes pragma. (The older attrs is deprecated.) The CPAN module Attribute::Handlers makes working with attributes a bit easier. Here's an example of how you might rewrite the logging system using an attribute handler.

    use strict;
    use warnings;

    use constant DEBUG => 1;

    use Attribute::Handlers;

    sub _log : ATTR(CODE) {
        my ($pkg, $sym, $code) = @_;

        if( DEBUG ) {
            my $name = *{ $sym }{NAME};

            no warnings 'redefine';

            *{ $sym } = sub {
                log_message("Entering sub $pkg\:\:$name");
                my @ret = $code->( @_ );
                log_message("Leaving sub $pkg\:\:$name");
                return @ret;
            };
        }
    }

    sub do_something : _log {
        print "I'm doing something.\n";
    }

Attributes are declared by placing a colon (:) and the attribute name after a variable or subroutine declaration. Optionally, the attribute can receive some data as a parameter; Attribute::Handlers goes to great lengths to massage the passed data for you if necessary.
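
For instance, here's a minimal sketch (separate from the _log example above) of a handler that accepts a parameter. Attribute::Handlers passes the parsed data as the handler's fifth argument; the Loggable name and its level parameter are invented for illustration:

    use Attribute::Handlers;

    # Hypothetical: a handler whose attribute takes a log level.
    sub Loggable : ATTR(CODE) {
        my ($pkg, $sym, $code, $attr, $data) = @_;

        # Depending on how the data parses, it may arrive as an
        # array reference or as a plain scalar.
        my $level = ref $data eq 'ARRAY' ? $data->[0] : $data;
        print "Will log ", *{ $sym }{NAME}, " at level $level\n";
    }

    sub fetch_data : Loggable('debug') { }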

To set up an attribute handler, the code declares a subroutine, _log, with the ATTR attribute, passing the string CODE as a parameter. Attribute::Handlers provides ATTR, and the CODE parameter tells it that the new handler only applies to subroutines.

During compile time, any subroutine declared with the _log attribute causes Perl to call the attribute handler with several parameters. The first three are the package in which the subroutine was compiled, a reference to the typeglob where its symbol lives, and a reference to the subroutine itself. These are sufficient for now.

If the DEBUG constant is true, the handler sets to work wrapping the newly compiled subroutine. First, it grabs its name from the typeglob, then it adds a new subroutine to its spot in the symbol table. Because the code redefines a package symbol, it's important to turn off warnings for symbol redefinitions within this block.

Because the new function is a lexical closure over $pkg, $name, and most importantly $code, it can use those values to construct the logging messages and call the original function.

All of this may seem like a lot of work, but once it's done, all you need to do to enable entry and exit logging for any function is to simply apply the _log attribute. The logging messages themselves get manufactured via closures when the program compiles, so we know they'll always be consistent. If you want to change them, you only have to do it in one place.

Best of all, because attribute handlers get inherited, if you define your handler in a base class, any subclass can use it.
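
A minimal sketch of that arrangement, reusing the _log handler defined above:

    package MyBase;
    use Attribute::Handlers;

    sub _log : ATTR(CODE) {
        # wrapping code as shown earlier
    }

    package MyApp;
    use base 'MyBase';

    # The handler is found through @ISA, so the subclass
    # can apply _log without redefining it.
    sub do_something : _log {
        # ...
    }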

Caveats

Although this is a powerful technique, it isn't perfect. The code will not properly wrap anonymous subroutines, and it won't necessarily propagate calling context to the wrapped functions. Further, using this technique increases the number of subroutine dispatches that your program must execute at runtime and, depending on your program's structure, may noticeably deepen your call stack. If blinding speed is a major design goal, this strategy may not be for you.

Going Further

Other common crosscutting concerns are authentication and authorization systems. Subroutine attributes can wrap functions in a security check that refuses to dispatch to the wrapped function for callers without the proper credentials.
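
Here's a sketch of how that might look, following the same wrapping pattern as _log. The current_user_can routine is a hypothetical credential check, not a real API:

    use Attribute::Handlers;

    sub restricted : ATTR(CODE) {
        my ($pkg, $sym, $code) = @_;
        my $name = *{ $sym }{NAME};

        no warnings 'redefine';

        *{ $sym } = sub {
            die "Permission denied for $name\n"
                unless current_user_can($name);    # hypothetical check
            return $code->( @_ );
        };
    }

    sub delete_all_records : restricted {
        # destructive work goes here
    }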

Perl Outperforms C with OpenGL

by Bob Free

Desktop developers often assume that compiled languages always perform better than interpreted languages such as Perl.

Conversely, most LAMP online service developers are familiar with mechanisms for preloading the Perl interpreter and modules (such as Apache mod_perl and ActivePerl/ISAPI), and know that Perl performance often approaches that of C/C++.

However, few 3D developers think of Perl when it comes to performance. They should.

GPUs are increasingly taking the load off of CPUs for number-crunching. Modern GPGPU processing loads C-like programs and large data arrays onto the GPU, where processing executes independently of the CPU. As a result, the overall contribution of CPU-bound code diminishes, and the difference between Perl and C becomes statistically insignificant in terms of GPU performance.

The author has recently published an open source update to CPAN's OpenGL module, adding support for GPGPU features. With this release, he has also posted OpenGL Perl versus C benchmarks--demonstrating cases where Perl outperforms C for OpenGL operations.

What Is OpenGL?

OpenGL is an industry-standard, cross-platform language for rendering 3D images. Originally developed by Silicon Graphics Inc. (SGI), it is now in wide use for 3D CAD/GIS systems, game development, and computer graphics (CG) effects in film.

With the advent of Graphics Processing Units (GPUs), realistic, real-time 3D rendering has become common--even in game consoles. GPUs are designed to process large arrays of data, such as 3D vertices, textures, surface normals, and color spaces.

It quickly became clear that the GPU's ability to process large amounts of data could expand well beyond just 3D rendering, and could be applied to General Purpose GPU (GPGPU) processing. GPGPUs can process complex physics problems, deal with particle simulations, provide database analytics, etc.

Over the years, OpenGL has expanded to support GPGPU processing, making it simple to load C-like programs into GPU memory for fast execution, to load large arrays of data in the form of textures, and to quickly move data between the GPU and CPU via Frame Buffer Objects (FBO).
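
As a rough sketch of that workflow's flavor in Perl (the _p-suffixed wrappers, which accept Perl lists, follow POGL's naming convention, but treat the exact names and signatures here as assumptions and consult the module's documentation):

    use OpenGL qw( :all );

    # Allocate a texture and upload a Perl array of floats as a
    # 16x16 RGBA image -- one way to hand a data array to the GPU.
    my ($tex) = glGenTextures_p(1);
    glBindTexture(GL_TEXTURE_2D, $tex);

    my @data = map { $_ / 1023 } 0 .. 1023;    # 16 * 16 * 4 components
    glTexImage2D_p(GL_TEXTURE_2D, 0, GL_RGBA, 16, 16, 0,
                   GL_RGBA, GL_FLOAT, @data);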

While OpenGL is in itself a portable language, it provides no interfaces to operating system (OS) display systems. As a result, Unix systems generally rely on an X11-based library called GLX; Windows relies on a WGL interface. Several libraries, such as GLUT, help to abstract these differences. However, as OpenGL added new extensions, OS vendors (Microsoft in particular) provided different methods for accessing the new APIs, making it difficult to write cross-platform GPGPU code.

Perl OpenGL (POGL)

Bob Free of Graphcomp has just released a new, portable, open source Perl OpenGL module (POGL 0.55).

This module adds support for 52 new OpenGL extensions, including many GPGPU features such as Vertex Arrays, Frame Buffer Objects, Vertex Programs, and Fragment Programs.

In terms of 3D processing, these extensions allow developers to perform real-time dynamic vertex and texturemap generation and manipulation within the GPU. This module also simplifies GPGPU processing by moving data to and from the CPU through textures, and loading low-level, assembly-like instructions to the GPU.

POGL 0.55 is a binary Perl module (written in C via XS) that has been tested on Windows (NT/XP/Vista) and Linux (Fedora 6, Ubuntu/Dapper). Source and binaries are available via SVN, PPM, tarball, and ZIP at the POGL homepage.

POGL OS Performance

The POGL homepage includes initial benchmarks comparing POGL on Vista, Fedora, and Ubuntu. These tests show that static texture rendering on an animated object was 10x faster on Fedora than on Vista; Ubuntu was 15x faster (using the same nVidia cards, drivers, and machine).

A subsequent, tighter benchmark eliminated UI and FPS counters, and focused more on dynamic texturemap generation. These results, posted on the OpenGL C versus Perl benchmarks page, show comparable numbers for Fedora and Ubuntu, with both outperforming Vista by about 60 percent.

Note: a further performance improvement on these benchmarks may be possible through the use of GPU vertex arrays.

Perl versus C Performance

These benchmarks also compare Perl against C code. They found no statistical difference between overall Perl and C performance on Linux. Inexplicably, Perl frequently outperformed C on Vista.

In general, C performed better than Perl on Vertex/Fragment Shader operations, while Perl outperformed C on FBO operations. In this benchmark, overall performance was essentially equal between Perl and C.

The similarity in performance is explained by several factors:

  • GPU is performing the bulk of the number-crunching operations
  • POGL is a compiled C module
  • Non-GPU operations are minimal

In cases where code dynamically generates or otherwise modifies the GPU's vertex/fragment shader code, it is conceivable that Perl would perform even better than C, due to Perl's optimized string handling.

Perl Advantages

Given that GPGPU performance will be a wash in most cases, the primary reason for using a compiled language is to obfuscate source for intellectual property (IP) reasons.

For server-side development, there's really no reason to use a compiled language for GPGPU operations, and several reasons to go with Perl:

  • Perl OpenGL code is more portable than C, and requires fewer lines of code
  • Numerous imaging modules for loading GPGPU data arrays (textures)
  • Portable, open source modules for system and auxiliary functions
  • Perl (under mod_perl/ISAPI) is generally faster than Java
  • It is easier to port Perl to/from C than Python or Ruby
  • As of this writing, there is no FBO support in Java, Python, or Ruby

There is a side-by-side code comparison between C and Perl posted on the above benchmark page.

Desktop OpenGL/GPU developers may find it faster to prototype code in Perl (e.g., simpler string handling and garbage collection), and then port their code to C later (if necessary). Developers can code in one window and execute in another--with no IDE, no compiling--allowing innovators/researchers to do real-time experiments with new shader algorithms.

Physicists can quickly develop new models; researchers and media developers can create new experimental effects and reduce their time to market.

Summary

Performance is not a reason to use C over Perl for OpenGL and GPGPU operations, and there are many cases where Perl is preferable to C (or Java/Python/Ruby).

By writing your OpenGL/GPU code in Perl, you will likely:

  • Reduce your R&D costs and time to market
  • Expand your platform/deployment options
  • Accelerate your company's GPGPU ramp up

Using Test::Count

by Shlomi Fish

A typical Test::More test script contains several checks. It is preferable to keep track of the number of checks that the script is running (using use Test::More tests => $NUM_CHECKS or plan tests => $NUM_CHECKS), so that if some checks do not run (for whatever reason), the test script will still fail when run by the harness.
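
For example, a script that plans three checks but only runs two will fail under the harness, even though every check that did run passed:

use Test::More tests => 3;

ok(1, "first check");
ok(1, "second check");
exit; # the third check never runs; the harness notices and fails the script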

If you add more checks to a test file, you have to remember to update the plan. But how do you keep track of how many tests should run? I've already encountered a case where a DBI-related module had a different number of tests with an older version of DBI than with a more recent one.

Enter Test::Count. Test::Count originated from a Vim script I wrote to keep track of the number of tests by using meta-comments such as # TEST (for one test) or # TEST*3*5 (for 15 tests). However, there was a limit to what I could do with Vim's scripting language, as I wanted a richer syntax for specifying the tests as well as variables.

Thus, I wrote the Test::Count module and placed it on CPAN. Test::Count::Filter acts as a filter, counts the tests, and updates the plan. Here's an example, taken from code I wrote for a Perl Quiz of the Week:

#!/usr/bin/perl -w

# This file implements various functions to remove
# all periods ("."'s) except the last from a string.

use strict;

use Test::More tests => 5;
use String::ShellQuote;

sub via_split
{
    my $s = shift;
    my @components = split(/\./, $s, -1);
    if (@components == 1)
    {
        return $s;
    }
    my $last = pop(@components);
    return join("", @components) . "." . $last;
}

# Other functions snipped, along with the definition of @funcs,
# which holds the names of the functions under test.

# TEST:$num_tests=9
# TEST:$num_funcs=8
# TEST*$num_tests*$num_funcs
foreach my $f (@funcs)
{
    my $ref = eval ("\\&$f");
    is($ref->("hello.world.txt"), "helloworld.txt", "$f - simple"); # 1
    is($ref->("hello-there"), "hello-there", "$f - zero periods"); # 2
    is($ref->("hello..too.pl"), "hellotoo.pl", "$f - double"); # 3
    is($ref->("magna..carta"), "magna.carta", "$f - double at end"); # 4
    is($ref->("the-more-the-merrier.jpg"),
       "the-more-the-merrier.jpg", "$f - one period"); # 5
    is($ref->("hello."), "hello.", "$f - one period at end"); # 6
    is($ref->("perl.txt."), "perltxt.", "$f - period at end"); # 7
    is($ref->(".yes"), ".yes", "$f - one period at start"); # 8
    is($ref->(".yes.txt"), "yes.txt", "$f - period at start"); # 9
}

Filtering this script through Test::Count::Filter provides the correct number of tests. I then add this to my .vimrc:

function! Perl_Tests_Count()
    %!perl -MTest::Count::Filter -e 'Test::Count::Filter->new({})->process()'
endfunction

autocmd BufNewFile,BufRead *.t map <F3> :call Perl_Tests_Count()<CR>

Now I can press F3 to update the number of checks.

Test::Count supports +, -, *, and /, as well as parentheses, so it is expressive enough for most needs.
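
For instance, these meta-comments (an invented illustration of the syntax) declare a variable and then use an arithmetic expression; the filter would count 2*$n+1 = 11 checks for the block that follows:

# TEST:$n=5
# TEST*(2*$n+1)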

Acknowledgements

Thanks to mrMister from Freenode for going over earlier drafts of this article and correcting some problems.

What's In that Scalar?

by brian d foy

Scalars are simple, right? They hold single values, and you don't even have to care what those values are because Perl figures out if they are numbers or strings. Well, scalars show up just about everywhere, and they're much more complicated than single values. I could have undef, a number or string, or a reference. That reference can be a normal reference, a blessed reference, or even a hidden reference as a tied variable.

Perhaps I have a scalar variable that should be an object (a blessed reference, which is a single value), but before I call a method on it I want to ensure that it is one, to avoid the "unblessed reference" error that kills my program. I might try the ref built-in to get the class name:

   if( ref $maybe_object ) { ... }

There's a bug there. ref returns an empty string if the scalar isn't a reference. It might also return 0, a false value--and yes, some Perl people have figured out how to create a package named 0 just to mess with this. I might think that checking for defined-ness would work:

   if( defined ref $maybe_object ) { ... }

... but the empty string is also defined. What I really want is to rule out the one value that means it's not a reference at all:

   unless( '' eq ref $maybe_object ) { ... }

This still doesn't tell me if I have an object. I know it's a reference, but maybe it's a regular data reference. The blessed function from Scalar::Util can help:

   if( blessed $maybe_object ) { ... }

This almost has the same problem as ref. blessed returns the package name if it's an object, and undef otherwise. I really need to check for defined-ness.

   if( defined blessed $maybe_object ) { ... }

Even if blessed returns undef, I still might have a hidden object. If the scalar is a tied variable, there's really an object underneath it, although the scalar acts as if it's a normal variable. Although I normally don't need to interact with the secret object, the tied built-in returns the secret object if there is one, and undef otherwise.

        my $secret_object = tied $maybe_tied_scalar;

        if( defined $secret_object ) { ... }

Once I have the secret object in $secret_object, I treat it like any other object.
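
For instance, here's a little sketch using the standard Tie::StdScalar class that ships with Tie::Scalar:

   use Tie::Scalar;

   tie my $scalar, 'Tie::StdScalar';
   $scalar = 'starting value';            # goes through STORE

   my $secret_object = tied $scalar;      # a Tie::StdScalar object
   print ref $secret_object, "\n";        # prints "Tie::StdScalar"
   print $secret_object->FETCH, "\n";     # prints "starting value"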

Now I'm sure I have an object, but that doesn't mean I know which methods I can call. The isa function in the UNIVERSAL package supposedly can figure this out for me. It tells me if a class is somewhere in an object's inheritance tree. I want to know if my object can do what a Horse can do, even if I have a RaceHorse:

   if( UNIVERSAL::isa( $object, 'RaceHorse' ) ) {
           $object->method;
           }

...what if the RaceHorse class is just a factory for objects in some other class that I'm not supposed to know about? I'll make a new object as a prototype just to get its class name:

   if( UNIVERSAL::isa( $object, ref RaceHorse->new() ) ) {
           $object->method;
           }

A real object-oriented programmer doesn't care what sort of object it is as long as it can respond to the right method. I should use can instead:

   if( UNIVERSAL::can( $object, $method ) ) {
           $object->$method;
           }

This doesn't always work either. can only knows about defined subroutine names, and only looks in the inheritance tree for them. It can't detect methods from AUTOLOAD or traits. I could override the can method to handle those, but I have to call it as a method (this works for isa too):

   if( $object->can( $method ) ) {
           $object->$method;
           }
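
As an aside, a sketch of such an override might look like the following; the Horse class, its catch-all AUTOLOAD, and the %AUTOLOADED registry are all invented for illustration:

   package Horse;

   our %AUTOLOADED = map { $_ => 1 } qw( gallop trot );

   sub AUTOLOAD {
       our $AUTOLOAD;
       my $name = ( split /::/, $AUTOLOAD )[-1];
       return if $name eq 'DESTROY';
       print "This horse does $name\n";
   }

   sub can {
       my( $self, $method ) = @_;
       return $self->SUPER::can( $method ) || (
           $AUTOLOADED{$method}
               ? sub { my $s = shift; $s->$method( @_ ) } # dispatches via AUTOLOAD
               : undef
           );
   }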

What if $object wasn't really an object? I just called a method on a non-object! I'm back to my original problem, but I don't want to use all of those tests I just went through. I'll fix this with an eval, which catches the error for non-objects:

   if( eval{ $object->can( $method ) } ) {
           $object->$method;
           }

...but what if someone installed a __DIE__ handler that simply exit-ed instead of die-ing? Programmers do that sort of thing forgetting that it affects the entire program.

   $SIG{__DIE__} = sub { exit };

Now, when the code inside my eval dies, the __DIE__ handler says exit, so the program stops without an error instead of giving my eval a chance to catch it. I have to localize the __DIE__ handler:

   if( eval{ local $SIG{__DIE__}; $object->can( $method ) } ) {
           $object->$method;
           }

If I'm the guy responsible for the __DIE__ handler, I could use $^S to see if I'm in an eval:

   $SIG{__DIE__} = sub { $^S ? die : exit };

That's solved it, right? Not quite. Why do all of that checking? I can just call the method and hope for the best. If I get an error, so be it:

   my $result = eval { $object->method };

Now I have to wrap all of my method calls in an eval. None of this would really be a problem if Perl were an object language. Or is it? The autobox module makes Perl data types look like objects:

   use autobox;

   sub SCALAR::println { print $_[0], "\n" }

   'Hello World'->println;

That works because it uses a special package SCALAR, although I need to define methods in it myself. I'll catch unknown methods with AUTOLOAD:

   sub SCALAR::AUTOLOAD {}

Or, I can just wait for Perl 6 when these things get much less murky because everything is an object.

Building a 3D Engine in Perl, Part 4


This article is the fourth in a series aimed at building a full 3D engine in Perl. The first article started with basic program structure and worked up to displaying a simple depth-buffered scene in an OpenGL window. The second article followed with a discussion of time, view animation, SDL events, keyboard handling, and a nice chunk of refactoring. The third article continued with screenshots, movement of the viewpoint, simple OpenGL lighting, and subdivided box faces.

At the end of the last article, the engine was quite slow. This article shows how to locate the performance problem and what to do about it. Then it demonstrates how to apply the same new OpenGL technique a different way to create an on-screen frame rate counter. As usual, you can follow along with the code by downloading the sample code.

SDL_perl Developments

First, there is some good news--Win32 users are no longer left out in the cold. Thanks to Wayne Keenan, SDL_perl 1.x now fully supports OpenGL on Win32, and prebuilt binaries are available. There are more details at the new SDL_perl 1.x page on my site; browse the Subversion repository at svn.openfoundry.org/sdlperl1.

If you'd like to help in the efforts to improve SDL_perl 1.x, please come visit the SDL_perl 1.x page, check out the code and send me comments or patches, or ping me in #sdlperl on irc.freenode.net.

Benchmarking the Engine

As I mentioned in the introduction, when last I left off, the engine pretty much crawled. It's time to figure out why, and what to do about it. The right tool for the first job is a profiler, which watches a running program and keeps track of the performance of each part of it. Perl's native profiler is Devel::DProf, driven by the dprofpp tool; together they track time spent and call count for every subroutine in the program. Examining these numbers will reveal whether the engine spends most of its time in one routine, which will then be the focus for optimization.

It's best if these numbers are relatively repeatable from run to run, making it easy to compare profiles before and after a change. For a rendering engine, the easiest solution is a benchmark mode. In benchmark mode, the engine runs for a set period of time or number of frames, displaying a predefined scene or sequence. I chose to enable benchmark mode with a new setting in init_conf:

benchmark => 1,

The engine already displays a constant scene as long as the user doesn't press any keys; the remaining requirement is to quit after a set period.

In previous articles I've simply hardcoded an out-of-time check into the rendering loop, but this time I opted for a more general approach, using triggered events. Engine events so far have always come from SDL in response to external input, such as key presses and window close events. In contrast, the engine itself produces triggered events in response to changes in the state of the simulated world, such as a player attempting to open a door or attack an enemy.

To gather these events, I added two new lines to the beginning of do_events; the opening lines are now:

sub do_events
{
    my $self = shift;

    my $queue     = $self->process_events;
    my $triggered = $self->triggered_events;
    push @$queue, @$triggered;

After processing the SDL events with process_events and stuffing the resulting commands into the $queue, do_events calls triggered_events to gather commands from any pending internally generated events and adds them to the $queue. triggered_events can be pretty simple for now:

sub triggered_events
{
    my $self = shift;

    my @queue;
    push @queue, 'quit' if $self->{conf}{benchmark} and
                           $self->{world}{time} >= 5;
    return \@queue;
}

This is pretty much a direct translation of the old hardcoded timeout code to the command queue concept. Normally triggered_events simply returns an empty arrayref, indicating no events were triggered, and therefore no commands generated. Benchmark mode adds a quit command to the queue as soon as the world time reaches 5 seconds. Normal command processing in do_events will take care of the rest.

dprofpp is Your (Obtuse) Friend

With benchmark mode enabled, the engine runs under dprofpp. The first step is to collect the profile data:

dprofpp -Q -p step065

-p step065 tells dprofpp to profile the program named step065, and -Q tells it to quit after collecting the data. dprofpp ran step065, collected the profile data, and stored it in a specially formatted text file named tmon.out in the current directory.

To turn the profile data into human-readable output, I used dprofpp without any arguments. It crunched the collected data for a while and finally produced this:

$ dprofpp
Exporter::Heavy::heavy_export_to_level has 4 unstacked calls in outer
Exporter::export_to_level has -4 unstacked calls in outer
Exporter::export has -12 unstacked calls in outer
Exporter::Heavy::heavy_export has 12 unstacked calls in outer
Total Elapsed Time = 4.838377 Seconds
  User+System Time = 1.498377 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 88.1   1.320  1.320      1   1.3200 1.3200  SDL::SetVideoMode
 38.1   0.571  0.774    294   0.0019 0.0026  main::draw_quad_face
 16.0   0.240  0.341      8   0.0300 0.0426  SDL::OpenGL::BEGIN
 13.0   0.195  0.195  64722   0.0000 0.0000  SDL::OpenGL::Vertex
 11.3   0.170  0.170      1   0.1700 0.1700  DynaLoader::dl_load_file
 9.34   0.140  0.020     12   0.0116 0.0017  Exporter::export
 6.67   0.100  0.100   1001   0.0001 0.0001  SDL::in
 4.00   0.060  0.060      1   0.0600 0.0600  SDL::Init
 3.34   0.050  0.847      8   0.0062 0.1059  main::BEGIN
 2.00   0.030  0.040      5   0.0060 0.0080  SDL::Event::BEGIN
 1.80   0.027  0.801     49   0.0005 0.0163  main::draw_cube
 1.47   0.022  0.022   2947   0.0000 0.0000  SDL::OpenGL::End
 1.33   0.020  0.020      1   0.0200 0.0200  warnings::BEGIN
 1.33   0.020  0.020     16   0.0012 0.0012  Exporter::as_heavy
 1.33   0.020  0.209      5   0.0040 0.0418  SDL::BEGIN

There are several problems with this output. The numbers are clearly silly (88 percent of its time spent in SDL::SetVideoMode?), the statistics for the various BEGIN blocks are inconsequential to the task and in the way, and the error messages at the top are rather disconcerting. To fix these issues, dprofpp has the -g option, which tells dprofpp to only display statistics for a particular routine and its descendants:

$ dprofpp -g main::main_loop
Total Elapsed Time = 4.952042 Seconds
  User+System Time = 0.812051 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 70.3   0.571  0.774    294   0.0019 0.0026  main::draw_quad_face
 24.0   0.195  0.195  64722   0.0000 0.0000  SDL::OpenGL::Vertex
 3.32   0.027  0.801     49   0.0005 0.0163  main::draw_cube
 2.71   0.022  0.022   2947   0.0000 0.0000  SDL::OpenGL::End
 1.23   0.010  0.010     49   0.0002 0.0002  SDL::OpenGL::Rotate
 1.11   0.009  0.009      7   0.0013 0.0013  main::prep_frame
 1.11   0.009  0.009     70   0.0001 0.0001  SDL::OpenGL::Color
 0.25   0.002  0.002   2947   0.0000 0.0000  SDL::OpenGL::Begin
 0.00       - -0.000      1        -      -  main::action_quit
 0.00       - -0.000      2        -      -  SDL::EventType
 0.00       - -0.000      2        -      -  SDL::Event::type
 0.00       - -0.000      7        -      -  SDL::GetTicks
 0.00       - -0.000      7        -      -  SDL::OpenGL::Clear
 0.00       - -0.000      7        -      -  SDL::OpenGL::GL_NORMALIZE
 0.00       - -0.000      7        -      -  SDL::OpenGL::GL_SPOT_EXPONENT

You may have noticed that I specified main::main_loop instead of just main_loop. dprofpp always uses fully qualified names and will give empty results if you use main_loop without the main:: package qualifier.

In this exclusive times view, the percentages in the first column and the row order depend only on the runtime of each routine, without respect to its children. Using just this view, I might have tried to optimize draw_quad_face somehow, as it appears to be the most expensive routine by a large margin. That's not the best approach, however, as an inclusive view (-I) shows:

$ dprofpp -I -g main::main_loop
Total Elapsed Time = 4.952042 Seconds
  User+System Time = 0.812051 Seconds
Inclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 100.       -  0.814      7        - 0.1163  main::do_frame
 99.9       -  0.812      1        - 0.8121  main::main_loop
 99.7       -  0.810      7        - 0.1158  main::draw_view
 99.2       -  0.806      7        - 0.1151  main::draw_frame
 98.6   0.027  0.801     49   0.0005 0.0163  main::draw_cube
 95.3   0.571  0.774    294   0.0019 0.0026  main::draw_quad_face
 24.0   0.195  0.195  64722   0.0000 0.0000  SDL::OpenGL::Vertex
 2.71   0.022  0.022   2947   0.0000 0.0000  SDL::OpenGL::End
 1.23   0.010  0.010     49   0.0002 0.0002  SDL::OpenGL::Rotate
 1.11   0.009  0.009     70   0.0001 0.0001  SDL::OpenGL::Color
 1.11   0.009  0.009      7   0.0013 0.0013  main::prep_frame
 0.25   0.002  0.002   2947   0.0000 0.0000  SDL::OpenGL::Begin
 0.00       - -0.000      1        -      -  main::action_quit
 0.00       - -0.000      2        -      -  SDL::EventType
 0.00       - -0.000      2        -      -  SDL::Event::type

In this view, draw_quad_face looks even worse, because the first column now includes the time taken by all of the OpenGL calls inside of it, including tens of thousands of glVertex calls. It seems that I should do something to speed it up, but at this point it's not entirely clear how to simplify it or reduce the number of OpenGL calls it makes (other than reducing the subdivision level of each face, which would reduce rendering quality).

Actually, there's a better option. The real problem is that draw_cube dominates the execution time, and draw_quad_face dominates that. How about not calling draw_cube (and therefore draw_quad_face) at all during normal rendering? It seems extremely wasteful to have to tell OpenGL how to render a cube face dozens of times each frame. If only there were a way to tell OpenGL to remember the cube definition once, and just refer to that definition each time the engine needs to draw it.

Display Lists

I expect no one will find it surprising that OpenGL provides exactly this function, with the display lists facility. A display list is a list of OpenGL commands to execute to perform some function. The OpenGL driver stores it (sometimes in a mildly optimized format) and further code refers to it by number. Later, the program can request that OpenGL run the commands in some particular list as many times as desired. Lists can even call other lists; a bicycle model might call a wheel display list twice, and the wheel display list might itself call a spoke display list dozens of times.
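
In SDL_perl's OpenGL binding, that kind of composition could look something like this sketch, where draw_spoke is an assumed routine that issues ordinary OpenGL drawing calls:

my $base = glGenLists(2);
my ($spoke_dl, $wheel_dl) = ($base, $base + 1);

# Compile the spoke once...
glNewList($spoke_dl, GL_COMPILE);
draw_spoke();
glEndList;

# ...then compile a wheel that calls the spoke list 32 times.
glNewList($wheel_dl, GL_COMPILE);
for my $i (0 .. 31) {
    glPushMatrix;
    glRotate(360 * $i / 32, 0, 0, 1);
    glCallList($spoke_dl);
    glPopMatrix;
}
glEndList;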

I added init_models to create display lists for each shape I want to model:

sub init_models
{
    my $self = shift;

    my %models = (
        cube => \&draw_cube,
    );
    my $count  = keys %models;
    my $base   = glGenLists($count);
    my %display_lists;

    foreach my $model (keys %models) {
        glNewList($base, GL_COMPILE);
        $models{$model}->();
        glEndList;

        $display_lists{$model} = $base++;
    }

    $self->{models}{dls} = \%display_lists;
}

%models associates each model with the code needed to draw it. Because the engine already knows how to draw a cube, I simply reused draw_cube here. The next two lines begin the work of building the display lists. The code first determines how many display lists it needs and then calls glGenLists to allocate them. OpenGL numbers the allocated lists in sequence, returning the first number in the sequence (the list base). For example, if the code had requested four lists, OpenGL might have numbered them 1051, 1052, 1053, and 1054, and would then return 1051 as the list base.

For each defined model, init_models calls glNewList to tell OpenGL that it is ready to compile a new display list at the number $base. OpenGL then prepares to convert any subsequent OpenGL calls to entries in the list, rather than rendering them immediately. If I had chosen GL_COMPILE_AND_EXECUTE instead of GL_COMPILE, OpenGL would perform the rendering and save the calls in the display list at the same time. GL_COMPILE_AND_EXECUTE is useful for on-the-fly caching when code needs active rendering anyway. Because init_models is simply precaching the rendering commands and nothing should render while this occurs, GL_COMPILE is the better choice.

The code then calls the drawing routine, which conveniently submits all of the OpenGL calls needed for the new list. The call to glEndList then tells OpenGL to stop recording entries in the display list and return to normal operation. The model loop then records the display list number used by the current model in the %display_lists hash, and increments $base for the next iteration. After processing all of the models, init_models saves %display_lists into a new structure in the engine object.

init calls init_models just before init_objects:

$self->init_models;
$self->init_objects;

With this initialization in place, the next step was to change draw_view to draw from either a model or a draw routine. To do this, I replaced the $o->{draw}->() call with:

    if ($o->{model}) {
        my $dl = $self->{models}{dls}{$o->{model}};
        glCallList($dl);
    }
    else {
        $o->{draw}->();
    }

If the object has an associated model, draw_view looks up the display list in the hash created by init_models, and then calls the list using glCallList. Otherwise, draw_view falls back to calling the object's draw routine as before. A quick run confirmed that the fallback works and adding init_models didn't break anything, so it was safe to change init_objects to use models instead of draw routines for the cubes. This involved replacement of just three lines--I changed each copy of:

        draw        => \&draw_cube,

to:

        model       => 'cube',

Suddenly, the engine was much faster and more responsive. A dprofpp run confirmed this:

$ dprofpp -Q -p step068

Done.
$ dprofpp -I -g main::main_loop
Total Elapsed Time = 4.053240 Seconds
  User+System Time = 0.973250 Seconds
Inclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 99.9       -  0.973      1        - 0.9733  main::main_loop
 86.5   0.024  0.842    413   0.0001 0.0020  main::do_frame
 58.1   0.203  0.566    413   0.0005 0.0014  main::draw_view
 56.9   0.016  0.554    413   0.0000 0.0013  main::draw_frame
 20.1   0.196  0.196    413   0.0005 0.0005  SDL::GLSwapBuffers
 19.3       -  0.188    413        - 0.0005  SDL::App::sync
 18.4       -  0.180    413        - 0.0004  main::end_frame
 16.7   0.163  0.163   2891   0.0001 0.0001  SDL::OpenGL::CallList
 9.14   0.028  0.089    413   0.0001 0.0002  main::do_events
 8.53   0.035  0.083    413   0.0001 0.0002  main::prep_frame
 6.68   0.008  0.065    413   0.0000 0.0002  main::process_events
 5.03   0.049  0.049   3304   0.0000 0.0000  SDL::OpenGL::GL_LIGHTING
 4.93   0.002  0.048    413   0.0000 0.0001  SDL::Event::pump
 4.73   0.046  0.046    413   0.0001 0.0001  SDL::PumpEvents
 4.11   0.012  0.040    413   0.0000 0.0001  main::update_time

Note that I had to run dprofpp -Q -p again with the new code before doing the analysis, or dprofpp would have just reused the old tmon.out.

The first thing to note in this report is that previously the engine only managed seven frames (calls to do_frame) before timing out, but now managed 413 in the same time! Secondly, as intended, main_loop never calls draw_cube, having replaced all such calls with calls to glCallList. Because of this it is no longer necessary to do many thousands of low-level OpenGL calls to draw the scene each frame, with the attendant Perl and XS overhead. Instead, the OpenGL driver handles all of those calls internally, with minimal overhead.

This has the added advantage that it is now feasible to run the engine on one computer and display the window on another, as the OpenGL driver on the displaying computer saves the display lists. Once init_models compiles the display lists, they are loaded into the display driver, and future frames require minimal network traffic to handle glCallList. (Adventurous users running X can do this by logging in locally to the display computer, sshing to the computer that has the engine and SDL_perl on it, and running the program there. If your ssh has X11 forwarding turned on, your reward should be a local window. And there was much rejoicing.)

An FPS Counter

The measurements that dprofpp performs have enough overhead to significantly reduce the engine's apparent performance. (Even old hardware can do better than 80-100 FPS with this simple scene.) The overhead is necessary to get a detailed analysis, but when it comes time to show off, most users want to have a nice frame rate display showing the performance of the engine running as fast as it can.

Making a frame rate display requires the ability to render text in front of the scene. The necessary pieces of that are:

  1. A font containing glyphs for the characters to display (at least 0 through 9).
  2. A font reader to load the font from a file into memory as bitmaps.
  3. A converter from raw bitmaps to a format that OpenGL can readily display.
  4. A way to render the proper bitmaps for a given string.
  5. A way to calculate the current frame rate.

The Numbers Font

There are hundreds of freely available fonts, but most of them are available only in fairly complex font formats such as TrueType and Type 1. Some versions of SDL_perl support these complex font formats, but this support has historically been frustratingly buggy or incomplete.

Given the relatively simple requirement (render a single integer), I chose instead to create a very simple bitmapped font format just for this article. The font file is numbers-7x11.txt in the examples tarball. It begins as follows:

7x11

30
..000..
.0...0.
.0...0.
0.....0
0.....0
0.....0
0.....0
0.....0
.0...0.
.0...0.
..000..

31
...0...
..00...
00.0...
...0...
...0...
...0...
...0...
...0...
...0...
...0...
0000000

The first line indicates the size of each character cell in the font; in this case, seven columns and 11 rows. The remaining chunks each consist of the character's codepoint in hex followed by a bitmap represented as text--. represents a transparent pixel, and 0 represents a rendered pixel. Empty lines separate chunks.

The Font Reader

To read the glyph definitions into bitmaps, I first added read_font_file:

sub read_font_file
{
    my $self = shift;
    my $file = shift;

    open my $defs, '<', $file
        or die "Could not open '$file': $!";
    local $/ = '';

    my $header  = <$defs>;
    chomp($header);
    my ($w, $h) = split /x/ => $header;

    my %bitmaps;
    while (my $def = <$defs>) {
        my ($hex, @rows) = grep /\S/ => split /\n/ => $def;

        @rows = map {tr/.0/01/; pack 'B*' => $_} @rows;

        my $bitmap           = join '' => reverse @rows;
        my $codepoint        = hex $hex;

        $bitmaps{$codepoint} = $bitmap;
    }

    return (\%bitmaps, $w, $h);
}

read_font_file begins by opening the font file for reading. It next requests paragraph slurping mode by setting $/ to ''. In this mode, Perl automatically breaks up the font file at empty lines, with the header first followed by each complete glyph definition as a single chunk. Next, the routine reads the header, chomps it, and splits the cell size definition into width and height.

With the preliminaries out of the way, read_font_file creates a hash to store the finished bitmaps and enters a while loop over the remaining chunks of the font file. Each glyph definition is split into a hex number and an array of bitmap rows; using grep /\S/ => ignores any trailing blank lines.

The next line converts textual rows to real bitstrings. First, each transparent pixel (.) becomes 0, and each rendered pixel (0) turns into a 1. Feeding the resulting binary text string to pack 'B*' converts the binary into an actual bitstring, with the bits packed in starting from the high bit of each byte (as OpenGL prefers). The resulting bitstrings are stored back in @rows.
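
To see the conversion in miniature (with the values worked out by hand):

my $row = '..000..';             # one row of the "0" glyph
(my $bits = $row) =~ tr/.0/01/;  # now '0011100'
my $packed = pack 'B*', $bits;   # one byte, 0b00111000, padded with
                                 # zero bits on the right
printf "%08b\n", ord $packed;    # prints 00111000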

Because OpenGL prefers bitmaps to start at the bottom and go up, the code reverses @rows before joining to create the finished bitmap. The hex operator converts the hex number to decimal to be the key for the newly created bitmap in the %bitmaps hash.

After parsing the whole font file, the function returns the bitmaps to the caller, along with the cell size metrics.

Speaking OpenGL's Language

The bitmaps produced by read_font_file are simply packed bitstrings, in this case 11 bytes long (one byte per seven-pixel row). Before using them to render strings, the engine must first load these bitmaps into OpenGL. This happens in the main init_fonts routine:

sub init_fonts
{
    my $self  = shift;

    my %fonts = (
        numbers => 'numbers-7x11.txt',
    );

    glPixelStore(GL_UNPACK_ALIGNMENT, 1);

    foreach my $font (keys %fonts) {
        my ($bitmaps, $w, $h) = 
            $self->read_font_file($fonts{$font});

        my @cps    = sort {$a <=> $b} keys %$bitmaps;
        my $max_cp = $cps[-1];
        my $base   = glGenLists($max_cp + 1);

        foreach my $codepoint (@cps) {
            glNewList($base + $codepoint, GL_COMPILE);
            glBitmap($w, $h, 0, 0, $w + 2, 0,
                     $bitmaps->{$codepoint});
            glEndList;
        }

        $self->{fonts}{$font}{base} = $base;
    }
}

init_fonts opens with a hash associating each known font with a font file; at the moment, only the numbers font is defined. The real work begins with the glPixelStore call, which tells OpenGL that the rows for all bitmaps are tightly packed along one-byte boundaries, rather than padded so that each row begins at an even two-, four-, or eight-byte memory location.

The main font loop starts by calling read_font_file to load the bitmaps for the current font into memory. The next line sorts the codepoints into @cps, and the following line finds the maximum codepoint by simply taking the last one in @cps.

The glGenLists call allocates display lists for codepoints 0 through $max_cp, which will have numbers from $base through $base + $max_cp. For each codepoint defined by the font, the inner loop uses glNewList to start compiling the appropriate list, glBitmap to load the bitmap into OpenGL, and finally, glEndList to finish compiling the list.

The glBitmap call has six parameters aside from the bitmap data itself ($bitmaps->{$codepoint}). The first two are the width and height of the bitmap in pixels, which read_font_file conveniently provides. The next two define the origin for the bitmap, counted from the lower-left corner. Bitmap fonts use a non-zero origin for several purposes, generally when the glyph extends farther left or below the "normal" lower-left corner. This may be because the glyph has a descender (a part of the glyph that descends below the general line of text, as with the lowercase letters "p" and "y"), or perhaps because the font leans to the left. The simple code in init_fonts assumes none of these special cases apply and sets the origin to (0,0).

The last two parameters are the X and Y increments, the distances that OpenGL should move along the X and Y axes before drawing the next character. Left-to-right languages use fonts with positive X and zero Y increments; right-to-left languages use negative X and zero Y. Top-to-bottom languages use zero X and negative Y. The increments must include both the width/height of the character itself and any additional distance needed to provide proper spacing. In this case, the rendering will be left to right. I wanted two extra pixels for spacing, so I set the X increment to width plus two, and the Y increment to zero.

The last line of the outer loop simply saves the list base for the font to make it available later during rendering.

init calls init_fonts as usual, just after the call to init_time:

$self->init_fonts;

Text Rendering

The hard part is now done: parsing the font file and loading the bitmaps into OpenGL. The new draw_fps routine calculates and renders the frame rate:

sub draw_fps
{
    my $self   = shift;

    my $base   = $self->{fonts}{numbers}{base};
    my $d_time = $self->{world}{d_time} || 0.001;
    my $fps    = int(1 / $d_time);

    glColor(1, 1, 1);
    glRasterPos(10, 10, 0);
    glListBase($base);
    glCallListsScalar($fps);
}

The routine starts by retrieving the list base for the numbers font, retrieving the world time delta for this frame, and calculating the current frames per second as one frame in $d_time seconds. It takes a little care to make sure $d_time is non-zero, even if the engine is running so fast that it renders a frame in less than a millisecond (the precision of SDL time handling); otherwise, the $fps calculation would die with a divide-by-zero error.

The OpenGL section begins by setting the current drawing color to white with a call to glColor. The next line sets the raster position, the window coordinates at which to place the origin of the next bitmap. After rendering each bitmap, the raster position is automatically updated using the bitmap's X and Y increments so that the bitmaps will not overlap each other. In this case, (10, 10, 0) sets the raster position ten pixels up and right from the lower-left corner of the window, with Z=0.

The next two lines together actually call the appropriate display list in our bitmap font for each character in the $fps string. glCallListsScalar breaks the string into individual characters and calls the display list with the same number as the codepoint of the character. For example, for the "5" character (at codepoint 53 decimal), glCallListsScalar calls display list 53. Unfortunately, there's no guarantee that display list 53 actually will display a "5," because the font's list base may not be 0. If the font had a list base of 1500, for example, the code would need to call display list 1500+53=1553 to display the "5."

Rather than make the programmer do this calculation manually every time, OpenGL provides the glListBase function, which sets the list base to use with glCallLists. After the glListBase call above, OpenGL will automatically offset every display list number specified with glCallLists by $base.

You may have noticed that in the code I use glCallListsScalar, but the previous paragraph referred to glCallLists instead. glCallListsScalar is actually an SDL_perl extension (not part of core OpenGL) that provides an alternate calling convention for glCallLists in Perl. Internally, SDL_perl implements both Perl routines using the same underlying C function in OpenGL (glCallLists). SDL_perl provides two different calling conventions because Perl treats a string and an array of numbers as two different things, while C treats them as essentially the same.

If you want to render a string, and all of the characters in the string have codepoints <= 255 decimal (single-byte character sets, and the ASCII subset of most variable-width character sets), you can use glCallListsScalar, and it will do the right thing for you:

glCallListsScalar($string);

If you simply want to render several display lists with a single call, and you're not trying to render a string, use the standard version of glCallLists:

glCallLists(@lists);

If you need to render a string, but it contains characters above codepoint 255, you have to use a more complex workaround:

glCallLists(map ord($_) => split // => $string);

Because the FPS counter merely renders ASCII digits, the first option works fine.

draw_frame now ends with a call to draw_fps, like so:

sub draw_frame
{
    my $self = shift;

    $self->set_projection_3d;
    $self->set_eye_lights;
    $self->set_view_3d;
    $self->set_world_lights;
    $self->draw_view;
    $self->draw_fps;
}

For now, I decided to turn off benchmark mode by changing the config setting in init_conf to 0:

    benchmark => 0,

With the font handling in place, and draw_fps called each frame to display the frame rate in white in the lower-left corner, everything should be grand, as Figure 1 shows.

Figure 1. Drawing the frame rate

Oops. There's no frame rate display. Actually, it's there, just very faint. If you look very carefully (or turn your video card's gamma up very high), you can just make out the frame rate display near the top of the window, above the big white box on the right. There are (at least) two problems--the text is too dark and it's in the wrong place.

The first problem is reminiscent of the dark scene in the last article, after enabling lighting but before defining any lights. Come to think of it, there's not much reason to have lighting enabled just to display stats, but the last object rendered by draw_view left it on. To make sure lighting is off, I added a set_lighting_2d routine, which draw_frame now calls just before calling draw_fps:

sub set_lighting_2d
{
    glDisable(GL_LIGHTING);
}

Figure 2. The unlit frame rate

Figure 2 is much better! With lighting turned off, the frame rate now renders in bright white as intended. The next problem is the incorrect position. Moving and rotating the viewpoint shows that while the digits always face the screen, their apparent position moves around (Figure 3).

Figure 3. A moving frame rate

It turns out that the current modelview and projection matrices transform the raster position set by glRasterPos, just like the coordinates from a glVertex call. That means OpenGL reuses whatever state the modelview and projection matrices are in.

To get unaltered window coordinates, I need to use an orthographic projection (no foreshortening or other non-linear effects) matching the window dimensions. I also need to set an identity modelview matrix (so that the modelview matrix won't transform the coordinates at all). All of this happens in set_projection_2d, called just before set_lighting_2d in draw_frame:

sub set_projection_2d
{
    my $self = shift;

    my $w    = $self->{conf}{width};
    my $h    = $self->{conf}{height};

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity;
    gluOrtho2D(0, $w, 0, $h);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity;
}

This routine first gathers the window width and height from the configuration hash. It then switches to the projection matrix (GL_PROJECTION) and restores the identity state before calling gluOrtho2D to create an orthographic projection matching the window dimensions. Finally, it switches back to the modelview matrix (GL_MODELVIEW) and restores its identity state as well. The frame rate now renders at the intended spot near the lower-left corner (Figure 4).

Figure 4. The frame rate in the correct position

There is another more subtle rendering problem, however, which you can see by moving the viewpoint forward a bit (Figure 5).

Figure 5. Frame rate depth problems

Notice how the "5" is partially cut off. The problem is that OpenGL compares the depth of the pixels in the thin yellow box to the depth of the pixels in the frame rate display, and finds that some of the pixels in the 5 are farther away than the pixels in the box. In effect, part of the 5 draws inside the box. In fact, moving the viewpoint slightly to the left from this point will make the frame rate disappear altogether, hidden by the near face of the yellow box.

That's not very good behavior from a statistics display that should appear to hover in front of the scene. The solution is to turn off OpenGL's depth testing, using a new line at the end of set_projection_2d:

glDisable(GL_DEPTH_TEST);

With this change, you can move the view anywhere without fear that the frame rate will be cut off or disappear entirely (Figure 6).

Figure 6. Position-independent frame rate

Too Fast

There's yet another problem; this time, one that will require a change to the frame rate calculations. The frame rate shown in the above screenshots is either 333 or 500, but nothing else. On this system, the frames take between two and three milliseconds to render, but because SDL can only provide one-millisecond resolution, the time delta for a single frame will appear to be exactly either .002 second or .003 second. 1/.002=500, and 1/.003=333, so the display is a blur, flashing back and forth between the two possible values.

To get a more representative (and easier-to-read) value, the code must average frame rate over a number of frames. Doing this will allow the total measured time to be long enough to drown out the resolution deficiency of SDL's clock.

The first thing I needed was a routine to initialize the frame rate data to carry over multiple frames:

sub init_fps
{
    my $self = shift;

    $self->{stats}{fps}{cur_fps}    = 0;
    $self->{stats}{fps}{last_frame} = 0;
    $self->{stats}{fps}{last_time}  = $self->{world}{time};
}

The new stats structure in the engine object will hold any statistics that the engine gathers about itself. To calculate FPS, the engine needs to remember the last frame for which it took a timestamp, as well as the timestamp for that frame. Because the engine calculates the frame rate only every few frames, it also saves the last calculated FPS value so that it can render it as needed. The init_fps call, as usual, goes at the end of init:

$self->init_fps;

The new update_fps routine now calculates the frame rate:

sub update_fps
{
    my $self      = shift;

    my $frame     = $self->{state}{frame};
    my $time      = $self->{world}{time};

    my $d_frames  = $frame - $self->{stats}{fps}{last_frame};
    my $d_time    = $time  - $self->{stats}{fps}{last_time};
    $d_time     ||= 0.001;

    if ($d_time >= .2) {
        $self->{stats}{fps}{last_frame} = $frame;
        $self->{stats}{fps}{last_time}  = $time;
        $self->{stats}{fps}{cur_fps}    = int($d_frames / $d_time);
    }
}

update_fps starts by gathering the current frame number and timestamp, and calculating the deltas from the saved values. Again, $d_time must default to 0.001 second to avoid possible divide-by-zero errors later on.

The if statement checks to see if enough time has gone by to result in a reasonably accurate frame rate calculation. If so, it sets the last frame number and timestamp to the current values and the current frame rate to $d_frames / $d_time.

The update_fps call must occur early in the main_loop, but after the engine has determined the new frame number and timestamp. main_loop now looks like this:

sub main_loop
{
    my $self = shift;

    while (not $self->{state}{done}) {
        $self->{state}{frame}++;
        $self->update_time;
        $self->update_fps;
        $self->do_events;
        $self->update_view;
        $self->do_frame;
    }
}

The final change needed to enable the new more accurate display is in draw_fps; the $d_time lookup goes away and the $fps calculation turns into a simple retrieval of the current value from the stats structure:

my $fps  = $self->{stats}{fps}{cur_fps};

The more accurate calculation now makes it easy to see the difference between the frame rate for a simple view (Figure 7):

Figure 7. Frame rate for a simple view

and the frame rate for a more complex view (Figure 8).

Figure 8. Frame rate for a complex view

Is the New Display a Bottleneck?

The last thing to do is to check that the shiny new frame rate display is not itself a major bottleneck. The easiest way to do that is to turn benchmark mode back on in init_conf:

    benchmark => 1,

After doing that, I ran the engine under dprofpp again, and then analyzed the results, just as I had earlier:

$ dprofpp -Q -p step075

Done.
$ dprofpp -I -g main::main_loop
Total Elapsed Time = 3.943764 Seconds
  User+System Time = 1.063773 Seconds
Inclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 100.       -  1.064      1        - 1.0638  main::main_loop
 94.6   0.006  1.007    384   0.0000 0.0026  main::do_frame
 85.2   0.019  0.907    384   0.0000 0.0024  main::draw_frame
 50.7   0.205  0.540    384   0.0005 0.0014  main::draw_view
 16.8   0.073  0.179    384   0.0002 0.0005  main::draw_fps
 15.4   0.095  0.164    384   0.0002 0.0004  main::set_projection_2d
 11.6   0.045  0.124    384   0.0001 0.0003  main::draw_axes
 10.9   0.116  0.116   2688   0.0000 0.0000  SDL::OpenGL::CallList
 8.74   0.013  0.093    384   0.0000 0.0002  main::end_frame
 7.52   0.003  0.080    384   0.0000 0.0002  SDL::App::sync
 7.24   0.077  0.077    384   0.0002 0.0002  SDL::GLSwapBuffers
 4.89   0.052  0.052   3072   0.0000 0.0000  SDL::OpenGL::PopMatrix
 4.70   0.023  0.050    384   0.0001 0.0001  main::update_view
 3.67   0.039  0.039   3456   0.0000 0.0000  SDL::OpenGL::GL_LIGHTING
 3.48   0.037  0.037    384   0.0001 0.0001  SDL::OpenGL::Begin

As it currently stands, draw_view takes half of the run time of main_loop, and set_projection_2d and draw_fps together take about a third of it. Is that good or bad news?

draw_view is so quick now because I've just optimized it. Now that it's running so fast again, I can afford to add more features and perhaps make a more complex scene, either of which will make draw_view take a larger percentage of the time again. Also, set_projection_2d is necessary for any in-window statistics, debugging, or HUD (heads up display) anyway, so the time spent there will not go to waste.

That leaves draw_fps, taking about one sixth of main_loop's run time. That's perhaps a bit larger than I'd like, but not large enough to warrant additional effort yet. I'll save my energy for the next set of features.

Conclusion

During this article, I covered several concepts relating to engine performance: adding a benchmark mode; profiling with dprofpp; using display lists to optimize slow, repetitive rendering tasks; and using display lists, bitmapped fonts, and averaging to produce a smooth frame rate display. I also added a stub for a triggered events subsystem, which I'll come back to in a future article.

With these performance improvements, the engine is ready for the next new feature, textured surfaces, which will be the main topic for the next article.

Until then, enjoy yourself and have fun hacking!

Building a 3D Engine in Perl, Part 3


This article is the third in a series aimed at building a full 3D engine in Perl. The first article started with basic program structure and worked up to displaying a simple depth-buffered scene in an OpenGL window. The second article followed with a discussion of time, view animation, SDL events, keyboard handling, and a nice chunk of refactoring.

Editor's note: see also the next article in the series, profiling your application.

Later in this article, I'll discuss movement of the view position, continue the refactoring work by cleaning up draw_view, and begin to improve the look of our scene using OpenGL lighting and materials. Before I cover that, I'll address a couple of common requests from your feedback on the previous articles: screenshots and help with porting issues. If you're having problems running SDL_Perl and the sample code from these articles on your system, or if you might be able to help the Mac OS X and Win32 readers, take a look at the next section. Otherwise, skip down to the Screenshots section, where the main article begins.

Known Porting Issues

General

Some versions of SDL_Perl require that the program load SDL::Constants to recognize SDL_QUIT and other constants. As this change should be transparent to other users, I have merged that into the latest version of the sample code, retroactive to the first use of an SDL constant.
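
If you're patching an older copy of the code by hand, loading the module should amount to a single line next to the other use statements (assuming, as the reports suggest, that it exports the constants by default):

use SDL::Constants;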

FreeBSD

See the suggestions at the beginning of the second article.

Mac OS X

I spent some time researching the porting issues on Mac OS X but am as yet unable to figure out a simple procedure for building SDL_Perl from scratch. Recent emails on the sdl-devel mailing list seem to indicate that Mac OS X builds for recent SDL_Perl sources are problematic right now, but older releases seem to be even worse. There have been some packaging attempts in the past, but none that I have found so far install a fully configured set of SDL_Perl libraries into the system perl. I'm no Mac porting expert, so I appreciate any help on this; please post a comment in this month's article discussion if you have a suggestion or solution.

Slackware

According to comments by Federico (ironfede) in last month's article discussion, Slackware ships with a version of SDL_Perl that requires SDL::Constants. This is not an issue for the current version of the sample code, which I fixed as mentioned above in the General issues paragraph.

Win32

Win32 porting went much as Mac OS X porting did. I was quite excited when chromatic pointed me to some old Win32 PPM packages, but sadly they don't include a working version of SDL::OpenGL. Building manually was "interesting" at best, as I have no access to a Microsoft compiler and precious little experience using gcc under Win32. As with the Mac folks, I appreciate any help from the readers. Please post a comment in this month's article discussion if you have a suggestion or solution for your fellow readers.

Screenshots

Thankfully, screenshots are much easier to handle than porting issues. I'd like the user to be able to take a screenshot whenever desired. The obvious way to accomplish that is to bind the screenshot action to a key; I chose function key F4 at random. First I added it to the bind hash:

        bind   => {
            escape => 'quit',
            f4     => 'screenshot',
            left   => '+yaw_left',
            right  => '+yaw_right',
            tab    => '+look_behind',
        }

The new key must have an action routine, so I altered that lookup hash as well:

    $self->{lookup}{command_action} = {
          quit         => \&action_quit,
          screenshot   => \&action_screenshot,
        '+yaw_left'    => \&action_move,
        '+yaw_right'   => \&action_move,
        '+look_behind' => \&action_move,
    };

I need to wait until drawing completes for the entire scene before I can take a snapshot, but event processing happens before drawing begins. To work around this, I set a state variable marking that the user has requested a screenshot, rather than perform the screenshot immediately:

sub action_screenshot
{
    my $self = shift;

    $self->{state}{need_screenshot} = 1;
}

The code checks this state variable in a new line at the end of end_frame, after the drawing has completed and it has synced the screen with the image written into OpenGL's color buffer:

sub end_frame
{
    my $self = shift;

    $self->{resource}{sdl_app}->sync;
    $self->screenshot if $self->{state}{need_screenshot};
}

The screenshot routine is surprisingly short but dense:

sub screenshot
{
    my $self = shift;

    my $file = 'screenshot.bmp';
    my $w    = $self->{conf}{width};
    my $h    = $self->{conf}{height};

    glReadBuffer(GL_FRONT);
    my $data = glReadPixels(0, 0, $w, $h, GL_BGR,
                            GL_UNSIGNED_BYTE);
    SDL::OpenGL::SaveBMP($file, $w, $h, 24, $data);

    $self->{state}{need_screenshot} = 0;
}

The routine starts by specifying a filename for the screenshot and gathering the width and height of the screen. The real work begins with the call to glReadBuffer. Depending on the OpenGL driver, the hardware, and a number of advanced settings, OpenGL may have provided several color buffers in which to draw and read images. In fact, the default behavior on most systems is to draw onto one buffer, known as the back buffer, and display a separate buffer, known as the front buffer. After completing the drawing for each frame, the SDL::App::sync call moves the image from the back buffer to the front buffer so the user can see it. Behind the scenes, OpenGL generally handles this in one of two different ways, depending on the underlying implementation. Software OpenGL implementations, such as Mesa, copy the data from the back buffer to the front buffer. Hardware-accelerated systems can swap internal pointers so that the back buffer becomes the front buffer and vice versa. As you can imagine, this is much faster.

This extra work brings a great benefit. Without double buffering, as soon as one frame completes, the next frame immediately clears the screen to black and starts drawing again from scratch. Depending on the relative speed difference between the user's monitor and the application, this would probably appear to the user as a flickering, dark, perpetually half-drawn scene. With double buffering, this problem is almost gone. The front buffer shows a solid stable image while all of the drawing is done on the back buffer. Once the drawing completes, it takes at most a few milliseconds to sync up and start displaying the new frame. To the human eye, the animation appears solid, bright, and (hopefully) smooth.

In this case, I want to make sure that I take a screenshot of exactly the same image the user sees, so I tell OpenGL that I want to read the image in the front buffer (GL_FRONT).

At this point, it's safe to read the image data into a Perl buffer in the proper format. The first four arguments to glReadPixels specify the lower-left corner and size of the sub-image to read. The next two arguments together tell OpenGL what format I would like for the data. I specify that I want to read the entire window and that I want the data in the correct format for a BMP file--one unsigned byte for each of the red, green, and blue color channels for each pixel, but in reverse order.

Once I have the data from OpenGL I use the SDL_Perl utility routine SaveBMP to save the image into a file. The arguments are the filename, image width, image height, color depth (24 bits per pixel), and data buffer. Finally, the routine resets the need_screenshot state flag and returns.

At this point you should be able to take a screenshot each time you press the F4 key. Of course, I'd like to show several screenshots during this article as the code progresses. The current code overwrites the previous screenshot file every time I request a new one. Because I number each runnable version of the code, I used a quick workaround resulting in a different screenshot filename for each code step. I first load one of the core Perl modules to strip directories from a path:

use File::Basename;

Then I use the filename of the script itself as part of my screenshot filename:

    my $file = basename($0) . '.bmp';

This may be all you need for your application, or you may want to add some code to number each file uniquely. This code is enough to fix my problem, so I've left the more powerful version as an exercise for the reader.
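
If you do want unique names, here is a minimal sketch of one way to do it; the screenshot_filename helper and the screenshot_count state field are my own inventions for illustration, not part of the article's code:

sub screenshot_filename
{
    my $self = shift;

    my $base = basename($0);
    my $n    = ++$self->{state}{screenshot_count};

    # Skip over files left behind by previous runs.
    $n++ while -e "$base-$n.bmp";
    $self->{state}{screenshot_count} = $n;

    return "$base-$n.bmp";
}

screenshot would then use my $file = $self->screenshot_filename; in place of the fixed name.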

Here then is the first screenshot:

The observant reader will notice that this image is not a BMP file; it's a PNG image, which is both much smaller than a BMP and more friendly to web standards. There are many tools available that can perform this conversion. Any good image editor can do it. In this case that's overkill--I instead used the convert program from the ImageMagick suite of utilities:

convert step042.bmp step042.png

Moving the Viewpoint

That view is more than a tad overplayed. The user can't even move the viewpoint to see the back or sides of the scene. It's time to change that. I started by defining some new key bindings:

        bind   => {
            escape => 'quit',
            f4     => 'screenshot',
            a      => '+move_left',
            d      => '+move_right',
            w      => '+move_forward',
            s      => '+move_back',
            left   => '+yaw_left',
            right  => '+yaw_right',
            tab    => '+look_behind',
        }

I then updated the command_action lookup hash to handle these as movement keys:

    $self->{lookup}{command_action} = {
          quit          => \&action_quit,
          screenshot    => \&action_screenshot,
        '+move_left'    => \&action_move,
        '+move_right'   => \&action_move,
        '+move_forward' => \&action_move,
        '+move_back'    => \&action_move,
        '+yaw_left'     => \&action_move,
        '+yaw_right'    => \&action_move,
        '+look_behind'  => \&action_move,
    };

init_view needs to initialize two more velocity components and matching deltas:

    $self->{world}{view} = {
        position    => [6, 2, 10],
        orientation => [0, 0, 1, 0],
        d_yaw       => 0,
        v_yaw       => 0,
        v_forward   => 0,
        v_right     => 0,
        dv_yaw      => 0,
        dv_forward  => 0,
        dv_right    => 0,
    };

action_move needs a new movement speed to match the existing yaw speed and some additions to %move_update:

    my $speed_move       = 5;
    my %move_update      = (
        '+yaw_left'     => [dv_yaw     =>  $speed_yaw ],
        '+yaw_right'    => [dv_yaw     => -$speed_yaw ],
        '+move_right'   => [dv_right   =>  $speed_move],
        '+move_left'    => [dv_right   => -$speed_move],
        '+move_forward' => [dv_forward =>  $speed_move],
        '+move_back'    => [dv_forward => -$speed_move],
        '+look_behind'  => [d_yaw      =>  180        ],
    );

So far, the changes are mostly hash updates instead of procedural code; that's a good sign that the existing code design has some more life left. When conceptually simple changes require significant code modification, especially special cases or repetitive blocks of code, it's time to look for a refactoring opportunity. Thankfully, these changes are in initialization and configuration rather than special cases.

One routine that requires a good bit of new code is update_view. I added these lines to the end:

    $view->{v_right}        += $view->{dv_right};
    $view->{dv_right}        = 0;
    $view->{v_forward}      += $view->{dv_forward};
    $view->{dv_forward}      = 0;

    my $vx                   =  $view->{v_right};
    my $vz                   = -$view->{v_forward};
    $view->{position}[0]    += $vx * $d_time;
    $view->{position}[2]    += $vz * $d_time;

That routine is beginning to look a bit repetitious and has several copies of very similar lines of code, so it goes on the list of places to refactor in the future. There are not yet enough cases to make the best solution obvious, so I'll hold off for a bit.

The new code starts by applying the new velocity deltas in the same way that it updates v_yaw earlier in the routine. It converts the right and forward velocities to velocities along the world axes by noting that the view starts out with "forward" parallel to the negative Z axis and "right" parallel to the positive X axis. It then multiplies the X and Z velocities by the time delta to arrive at a position change, which it adds into the current view position.

This version of the code works fine as long as the user doesn't rotate the view. When the view rotates, "forward" and "right" don't match the new view directions. They still point down the -Z and +X axes respectively, which can prove very disorienting for large rotations. The solution is a bit of trigonometry. The idea is to treat the initial X and Z velocities as components of the total velocity vector, and rotate that vector through the same angle that the user rotated the view:

    my $vx                   =  $view->{v_right};
    my $vz                   = -$view->{v_forward};
    my $angle                = $view->{orientation}[0];
    ($vx, $vz)               = rotate_xz($angle, $vx, $vz);
    $view->{position}[0]    += $vx * $d_time;
    $view->{position}[2]    += $vz * $d_time;

The two middle lines are the new ones. They call rotate_xz to do the vector rotation work and then set $vx and $vz to the returned components of the rotated velocity vector. rotate_xz is:

sub rotate_xz
{
    my ($angle, $x, $z) = @_;

    my $radians = $angle * PI / 180;
    my $cos     = cos($radians);
    my $sin     = sin($radians);
    my $rot_x   =  $cos * $x + $sin * $z;
    my $rot_z   = -$sin * $x + $cos * $z;

    return ($rot_x, $rot_z);
}

After converting the angle from degrees to radians, the code calculates and saves the sine and cosine of the angle. It then calculates the rotated velocity components given the original unrotated components. Finally, it returns the rotated components to the caller.
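
For reference, the two lines computing $rot_x and $rot_z are exactly the following matrix product, which rotates the vector (x, z) through the angle θ in the XZ plane:

\[
\begin{pmatrix} x' \\ z' \end{pmatrix} =
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ z \end{pmatrix}
\]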

I'll skip the derivation here (you're welcome), but if you're curious about how and why this calculation performs a rotation, there are numerous books that explain the wonders of vector mathematics in amazing detail. O'Reilly's Physics for Game Developers, by David M. Bourg, includes a high-level discussion of rotation. Charles River Media's Mathematics for 3D Game Programming & Computer Graphics, by Eric Lengyel, includes a deeper discussion though I, for one, have college math flashbacks every time I read it. Speaking of which, any college textbook on linear algebra should include as much detail as you desire.

This code requires a definition for PI, provided by the following line near the top of the program, right after requesting warnings from Perl:

use constant PI => 4 * atan2(1, 1);

The constant module evaluates possibly complex expressions like this one once, during the compile phase, and inlines the results as true constants, so there is no runtime cost. The above calculation takes advantage of a standard trig identity--atan2(1, 1) is pi/4--to derive a value for PI accurate to as many digits as the system can deliver.

update_view now does the right thing, no matter what angle the view is facing. It doesn't take long to find a more interesting view:

Let There Be Lighting!

Okay, so maybe that's not much more interesting, admittedly. This scene needs a little mood lighting instead of the flat colors I've used so far (especially because they make it hard to see the shape of each object clearly). As a first step, I turned on OpenGL's lighting system with a new line at the end of prep_frame:

    glEnable(GL_LIGHTING);

Far from lighting the scene, the view is now almost black. If you look very carefully and your monitor and room lighting are forgiving, you should be able to just make out the objects, which are very dark gray on the black background. In order to see anything, I must enable both GL_LIGHTING and one or more lights to provide light to the scene. Without a light, the objects are dark gray instead of true black because OpenGL, by default, applies a very small amount of light to the entire scene, known as ambient light. To show the objects more brightly, I turned on the first OpenGL light with another new line at the end of prep_frame:

    glEnable(GL_LIGHT0);

Now the objects are brighter, but they're still just gray. When calculating colors with lighting enabled, OpenGL uses a completely different set of parameters from the colors used when lighting is disabled. Together these new parameters make up a material. Complex interactions between the parameters that make up a material can result in very interesting color effects, but in this case, I'm not trying to create a complex effect. I want my objects to have their old colors back without worrying about the full complexity that materials provide. Thankfully, OpenGL provides a way to state that the current material should default to the current color. To do this, I add yet another line to the end of prep_frame:

    glEnable(GL_COLOR_MATERIAL);

At this point, the objects once again have color, but each of the faces is still the same shade rather than appearing to be lit by a single light source somewhere. The problem is that OpenGL does not know whether each face points toward or away from the light and, if so, by how much. The angle between the face and the light determines how much light falls on the surface and, therefore, how bright it should appear. It is possible to calculate the angle of each face in my scene from the location of its vertices, but this is not always the right thing to do (especially when dealing with curved surfaces), so OpenGL does not calculate this internally. Instead, the program needs to do the direction calculations and tell OpenGL the result, known as the normal vector.

Luckily, in draw_cube the faces align with the coordinate axes so that each face points down one of them (positive or negative X, Y, or Z). I don't have to do any calculation here, just tell OpenGL which normal vector to associate with each face:

sub draw_cube
{
    # A simple cube
    my @indices = qw( 4 5 6 7   1 2 6 5   0 1 5 4
                      0 3 2 1   0 4 7 3   2 3 7 6 );
    my @vertices = ([-1, -1, -1], [ 1, -1, -1],
                    [ 1,  1, -1], [-1,  1, -1],
                    [-1, -1,  1], [ 1, -1,  1],
                    [ 1,  1,  1], [-1,  1,  1]);
    my @normals = ([0, 0,  1], [ 1, 0, 0], [0, -1, 0],
                   [0, 0, -1], [-1, 0, 0], [0,  1, 0]);

    glBegin(GL_QUADS);

    foreach my $face (0 .. 5) {
        my $normal = $normals[$face];
        glNormal(@$normal);

        foreach my $vertex (0 .. 3) {
            my $index  = $indices[4 * $face + $vertex];
            my $coords = $vertices[$index];
            glVertex(@$coords);
        }
    }
    glEnd;
}

The new lines are the definition of the @normals array and the two lines at the top of the $face loop that select the correct normal for each face and pass it to OpenGL using glNormal.

The boxes are now shaded reasonably and it's clear that the light is coming from somewhere behind the viewer; the front faces are brighter than the sides. Unfortunately, the axes are now dark again:

I did not specify any normal for the axis lines because the concept doesn't make a whole lot of sense for lines or points. However, with lighting enabled, OpenGL needs a set of normals for every lit object, so it goes back to the current state and uses the most recently defined normal. For the very first frame this is the default normal, which happens to point towards the default first light, but for succeeding frames it will be the last normal set in draw_cube. The latter definitely does not point toward the light, and the axes end up dark.

I'd rather the axis lines didn't take part in lighting calculations at all and kept their original bright colors, regardless of any lighting (or lack thereof) in the scene. To do this, I removed the line that enables GL_LIGHTING in prep_frame and inserted two new lines near the top of draw_view:

sub draw_view
{
    glDisable(GL_LIGHTING);

    draw_axes();

    glEnable(GL_LIGHTING);

Now lighting is off before drawing the axis lines and back on afterward. The axis lines have bright colors again, but rotating the view exposes a new problem. When the view rotates, the direction of the light changes as well:

Because of the way that OpenGL calculates light position and direction, any lights defined before the view is set are fixed to the viewer like the light on a miner's helmet. To fix a light relative to the simulated world, define the light instead after setting the view. I removed the line enabling GL_LIGHT0 in prep_frame and moved it to the new routine set_world_lights:

sub set_world_lights
{
    glEnable(GL_LIGHT0);
}

I then updated draw_frame to call the new routine after setting the view:

sub draw_frame
{
    my $self = shift;

    $self->set_projection_3d;
    $self->set_view_3d;
    $self->set_world_lights;
    $self->draw_view;
}

Unfortunately, this doesn't work. OpenGL only updates its internal state with the light's position and direction when they change explicitly, not when the light is enabled or disabled. I've never set the light's parameters explicitly, so the original default still stands. This issue is easy to fix with another line in set_world_lights:

sub set_world_lights
{
    glLight(GL_LIGHT0, GL_POSITION, 0.0, 0.0, 1.0, 0.0);

    glEnable(GL_LIGHT0);
}

In one of the few OpenGL interface decisions that actively annoys me, the new line sets the direction of the light, not its position. OpenGL defines all lights as one of two types: directional or positional. OpenGL assumes directional lights are very far away so that anywhere in the scene the direction from the light to each object is effectively the same. Positional lights are nearer and OpenGL must calculate the direction from the light to every vertex of every object in the scene independently. As you can imagine, this is much slower, but produces more interesting lighting effects.

The key to choosing between these two types is the last parameter of the glLight call above. If this parameter is 0, the light is directional and the other three coordinates specify the direction from which the light comes. In this case, I've specified that the light should come from the +Z direction. If the last parameter is 1, then OpenGL makes the light positional and uses the other three coordinates to set the light's position within the scene. For now, I'll skip the gory details of what happens when a value other than 0 or 1 is used, but in short, the light will be positional and extra calculations determine the actual position used. Most of the time it's best to ignore that case.
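
To make the two cases concrete, here they are side by side (GL_LIGHT2 is just an illustrative slot; only the last argument differs in kind):

    # Directional: light arrives from the +Z direction (w = 0.0)
    glLight(GL_LIGHT2, GL_POSITION, 0.0, 0.0, 1.0, 0.0);

    # Positional: light sits at the point (0, 5, 0) in the scene (w = 1.0)
    glLight(GL_LIGHT2, GL_POSITION, 0.0, 5.0, 0.0, 1.0);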

You may wonder why I explicitly specified 0.0 and 1.0 instead of 0 and 1. This is a workaround for a bug in glLight in some versions of SDL_Perl when it is presented with integer arguments instead of floating-point arguments.

With this line added, the light now stays fixed in the world, even when the user moves and rotates the view:

A Lantern

Of course, sometimes a light connected to the viewer is exactly the intention. For example, perhaps the desired effect is for the player to hold a lantern or flashlight to light dark places. Both of these are localized light sources that light nearby objects quite a bit, but distant objects only a little. The primary difference between them is that a flashlight and certain types of lanterns cast light primarily in one direction, often in a cone. Most lanterns, torches, and similar light sources cast light in all directions (barring shadows from handles, fuel tins, and the like).

Non-directed light is a little simpler to implement, so I'll start with lantern light. I wanted the light rooted at the viewer's position, so I defined the light before setting the view:

sub draw_frame
{
    my $self = shift;

    $self->set_projection_3d;
    $self->set_eye_lights;
    $self->set_view_3d;
    $self->set_world_lights;
    $self->draw_view;
}

I refer to viewer-fixed lights as eye lights because OpenGL refers to the coordinate system it uses for lights as eye coordinates, and a light defined this way as maintaining a particular position "relative to the eye." Here's set_eye_lights:

sub set_eye_lights
{
    glLight(GL_LIGHT1, GL_POSITION, 0.0, 0.0, 1.0, 0.0);

    glEnable(GL_LIGHT1);
}

Here I set the second light exactly the same way I set the first. Note that it doesn't matter that I actually define the second light in my program before the first. Each OpenGL light is independently numbered and always keeps the same number, rather than acting like a stack or queue numbered by order of use.

Sadly, the new code doesn't seem to have any effect at all. There really is a new light shining on the scene, but unlike GL_LIGHT0, which defaults to bright white, all of the other lights default to black and contribute no new light to the scene. The solution is to set another parameter of the light:

sub set_eye_lights
{
    glLight(GL_LIGHT1, GL_POSITION, 0.0, 0.0, 1.0, 0.0);
    glLight(GL_LIGHT1, GL_DIFFUSE,  1.0, 1.0, 1.0, 1.0);

    glEnable(GL_LIGHT1);
}

The front faces of each object should appear considerably brighter. Moving around the scene shows that the eye light brightens a surface only dimly lit by the world light:

If you watch carefully, however, you'll notice that the lighting varies by the view rotation--not position. I defined the light as directional with the light coming from behind the viewer, rather than positional, with the light coming from the viewer directly. I hinted at the fix earlier--changing the GL_POSITION parameter as follows:

    glLight(GL_LIGHT1, GL_POSITION, 0.0, 0.0, 0.0, 1.0);

The light now comes from (0, 0, 0) in eye coordinates, right at the viewpoint. Moving around and rotating shows that this version has the intended effect.

The simulated lantern still shines as brightly on far-away objects as it does on near ones. A real lantern's light falls off rapidly with distance from the lantern. OpenGL can do this with another setting:

sub set_eye_lights
{
    glLight(GL_LIGHT1, GL_POSITION, 0.0, 0.0, 0.0, 1.0);
    glLight(GL_LIGHT1, GL_DIFFUSE,  1.0, 1.0, 1.0, 1.0);
    glLight(GL_LIGHT1, GL_LINEAR_ATTENUATION, 0.5);

    glEnable(GL_LIGHT1);
}

This call tells OpenGL to include a dimming term in its equations proportional to the distance between the light and the object. Physics-minded readers will point out that physically accurate dimming is proportional to the square of the distance, and OpenGL does allow this using GL_QUADRATIC_ATTENUATION. However, a host of factors (including the lighting equations that OpenGL uses and the non-linear effects of the graphics hardware, monitor, and human eye) make this more accurate dimming look rather odd. Linear dimming turns out to look better in many cases, so that's what I used here. It is also possible to combine different dimming types, so that the dimming appears linear for nearby objects and quadratic for distant ones, which you may find a better tradeoff. The 0.5 setting tells OpenGL how strong the linear dimming effect should be for my scene.
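
Combining the terms might look something like this sketch; the coefficient values here are pure guesswork, to be tuned by eye. OpenGL folds all three into a single dimming factor of 1 / (kc + kl*d + kq*d^2), where d is the distance from the light to the vertex:

    glLight(GL_LIGHT1, GL_CONSTANT_ATTENUATION,  1.0);
    glLight(GL_LIGHT1, GL_LINEAR_ATTENUATION,    0.25);
    glLight(GL_LIGHT1, GL_QUADRATIC_ATTENUATION, 0.05);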

Moving around the scene, you should be able to see the relatively subtle dimming effect in action. Don't be afraid to leave it subtle instead of turning the dimming effect way up. Some moods call for striking lighting effects, while others call for lighting effects that the viewer notices only subconsciously. In some visualization applications, lighting subtlety is a great virtue, allowing the human visual system's amazing processing power to come to grips with a complex scene without being overwhelmed.

A Flashlight

I really happen to like the way a flashlight casts its cone of light, so I converted the omnidirectional light of the lantern to a directed cone. OpenGL refers to this type of light as a spotlight and includes several light parameters to define them. The first change is a new setting in set_eye_lights:

    glLight(GL_LIGHT1, GL_SPOT_CUTOFF, 15.0);

This sets the angle between the center of the light beam and the edges of the cone. OpenGL accepts either 180 degrees (omnidirectional) or any value between 0 and 90 degrees (from a laser beam to a hemisphere of light). In this case, I chose a centerline-to-edge angle of 15 degrees, making a nice 30-degree-wide cone of light.

This change indeed limits the cone of light, but also reveals an ugly artifact. Move to a point just in front of the left front corner of the white cube and rotate the view to pan the light across the yellow box. You'll see the light jump nastily from corner to corner, even disappearing entirely in between. Even when a corner is lit, the shape of the light is not very conelike:

OpenGL's standard lighting model only performs the lighting calculations at each vertex, interpolating the results in between. For models that have many small faces and a resulting high density of vertices, this works relatively well. It breaks down nastily in scenes containing objects with large faces and few vertices, especially when a positional light is close to an object. Spotlights make the problem even more apparent, as they can easily shine between two vertices without lighting either of them; the polygon then appears uniformly dark.

Ode to Rush

Advanced OpenGL functionality paired with recent hardware can solve this problem with per-pixel lighting calculations. Older hardware can fake it with light maps and similar tricks. Rather than using advanced functionality, I'll use a simpler method for improving the lighting, known as subdivisions. (Those of you scratching your heads over the Rush reference can now breathe a collective sigh of relief.) Subdivisions have their own problems, as I'll show later, but those issues explain a lot about the design of graphics APIs, so they're worth a look.

As the name implies, the basic idea is to subdivide each face into many smaller faces, each with its own set of vertices. For curved objects such as spheres and cylinders, this is essential so that nearby objects appear to curve smoothly. For objects with large flat faces, such as boxes and pyramids, this merely has the side effect of forcing the per-vertex lighting calculations to be done many times across each face.

Before I can use subdivided faces, I need to prepare by refactoring draw_cube:

sub draw_cube
{
    # A simple cube
    my @indices = qw( 4 5 6 7   1 2 6 5   0 1 5 4
                      0 3 2 1   0 4 7 3   2 3 7 6 );
    my @vertices = ([-1, -1, -1], [ 1, -1, -1],
                    [ 1,  1, -1], [-1,  1, -1],
                    [-1, -1,  1], [ 1, -1,  1],
                    [ 1,  1,  1], [-1,  1,  1]);
    my @normals = ([0, 0,  1], [ 1, 0, 0], [0, -1, 0],
                   [0, 0, -1], [-1, 0, 0], [0,  1, 0]);

    foreach my $face (0 .. 5) {
        my $normal = $normals[$face];
        my @corners;

        foreach my $vertex (0 .. 3) {
            my $index  = $indices[4 * $face + $vertex];
            my $coords = $vertices[$index];
            push @corners, $coords;
        }
        draw_quad_face(normal    => $normal,
                       corners   => \@corners);
    }
}

Instead of performing the OpenGL calls directly in draw_cube, it now calls draw_quad_face. For each large face it creates a new @corners array filled with the vertex coordinates of the corners of that face. It then passes that array and the face normal to draw_quad_face, defined as follows:

sub draw_quad_face
{
    my %args    = @_;
    my $normal  = $args{normal};
    my $corners = $args{corners};

    glBegin(GL_QUADS);
    glNormal(@$normal);

    foreach my $coords (@$corners) {
        glVertex(@$coords);
    }
    glEnd;
}

This function performs exactly the OpenGL operations that draw_cube used to do. I've also used a different argument-passing style for this routine than I have previously. In this case, I pass named arguments because I know that I will add at least one more argument very soon and that there's a pretty good chance I'll want to add more later. When the arguments to a routine are likely to change over time, and especially when callers might want to specify only a few arguments and allow the rest to take on reasonable defaults, named arguments are usually a better choice. The arguments can either be a hashref or a list stuffed into a hash. This time, I chose the latter method.
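
For comparison, the hashref style (which I didn't choose) would look something like this sketch:

    # Caller passes a single hash reference ...
    draw_quad_face({ normal => $normal, corners => \@corners });

sub draw_quad_face
{
    # ... and the routine dereferences it.
    my ($args)  = @_;
    my $normal  = $args->{normal};
    my $corners = $args->{corners};

    # ... body unchanged ...
}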

After refactoring comes testing, and a quick run showed that everything worked as expected. Safe in that knowledge, I rewrote draw_quad_face to subdivide each face:

sub draw_quad_face
{
    my %args    = @_;
    my $normal  = $args{normal};
    my $corners = $args{corners};
    my $div     = $args{divisions} || 10;
    my ($a, $b, $c, $d) = @$corners;

    # NOTE: ASSUMES FACE IS A PARALLELOGRAM

    my $s_ab = calc_vector_step($a, $b, $div);
    my $s_ad = calc_vector_step($a, $d, $div);

    glNormal(@$normal);
    for my $strip (0 .. $div - 1) {
        my @v = ($a->[0] + $strip * $s_ab->[0],
                 $a->[1] + $strip * $s_ab->[1],
                 $a->[2] + $strip * $s_ab->[2]);

        glBegin(GL_QUAD_STRIP);
        for my $quad (0 .. $div) {
            glVertex(@v);
            glVertex($v[0] + $s_ab->[0],
                     $v[1] + $s_ab->[1],
                     $v[2] + $s_ab->[2]);

            $v[0] += $s_ad->[0];
            $v[1] += $s_ad->[1];
            $v[2] += $s_ad->[2];
        }
        glEnd;
    }
}

The new routine starts by adding the new optional argument divisions, which defaults to 10. This specifies how many subdivisions the face should have both "down" and "across"; the actual number of sub-faces is the square of this number. For the default 10 divisions, that comes to 100 sub-faces for each large face, so each cube has 600 sub-faces.

The next line labels the corners in counterclockwise order. This puts corner A diagonally across from corner C, with B on one side and D on the other.

As the comment on the next line indicates, I've simplified the math considerably by assuming that the face is at least a parallelogram. With this simplification, I can calculate the steps for one division along sides AB and AD and use these steps to position every sub-face across the entire large face.

I can't just calculate the step as a simple distance to move, because I have no idea which direction each edge is pointing and wouldn't know which way to move for each step. Instead, I calculate the vector difference between the vertices at each end of the edge and divide that by the number of divisions. The code does the same calculation twice, so I've extracted it into a separate routine:

sub calc_vector_step
{
    my ($v1, $v2, $div) = @_;

    return [($v2->[0] - $v1->[0]) / $div,
            ($v2->[1] - $v1->[1]) / $div,
            ($v2->[2] - $v1->[2]) / $div];
}

Returning to draw_quad_face, it stores the vector steps in $s_ab (the step along the AB side) and $s_ad (the step along the AD side). Next it sets the current normal, which for a flat face remains the same across its entirety.

Finally, I can begin to define the sub-faces themselves. I've taken advantage of the OpenGL quad strip primitive to draw the sub-faces as a series of parallel strips extending from the AB edge to the CD edge. For each strip, I first need to calculate the location of its starting vertex. I know this is on the AB edge, so the code starts at A and adds an AB step for each completed strip. For the first strip, this puts the starting vertex at A. For the last strip, the starting vertex will be one step (one strip width) away from B. It initializes the current vertex @v with the starting vertex and will keep it updated as it moves along each strip.

It then begins a strip of quads with glBegin(GL_QUAD_STRIP). To define the strip, I've specified the locations of each pair of vertices across from each other along its length. For each pair, it uses the current vertex and a calculated vertex one step further along the AB direction. The code then moves the current vertex one step along the length of the strip (the AD direction). Once the strip is complete, it ends it with glEnd and loops again for the next strip.

All of this complexity makes quite a visual difference:

It's clear that the light has a definite shape to it, but the lighting is so jagged that it's distracting. One way to fix this is to increase the number of divisions, making smaller sub-faces. This requires a simple addition to the draw_quad_face call in draw_cube:

        draw_quad_face(normal    => $normal,
                       corners   => \@corners,
                       divisions => 30);

The result is quite a bit less jagged:

Unfortunately, the jaggies are smaller but still obviously there--and the closer the viewer is to an object the bigger they appear. There are also nine times as many sub-faces to draw (30/10 squared) and the program now runs considerably slower. If you're lucky enough to have a recent system with fast video hardware and don't notice the slowdown, use 100 or so for the number of divisions. You'll probably see it.

Softening the Edges

Clearly, increasing the number of subdivisions only goes so far to improve the rendering, while simultaneously costing dearly in performance. I'll try a different tack and go back to what I know about a flashlight. Most flashlights cast a beam that is brighter in the center than at the edge. (Some have a dark circle in the very center, but I'm ignoring that for now.) I can take advantage of this to create a more accurate image and also soften the large jaggies considerably. First, I backed out my change to the draw_quad_face call:

        draw_quad_face(normal    => $normal,
                       corners   => \@corners);

Then I changed one spotlight parameter for the flashlight in set_eye_lights and added another:

    glLight(GL_LIGHT1, GL_SPOT_CUTOFF,   30.0);
    glLight(GL_LIGHT1, GL_SPOT_EXPONENT, 80.0);

With the change to GL_SPOT_CUTOFF, I've widened the beam to twice its original angle. At the same time, I've told OpenGL to make it quite a bit dimmer at the edges using GL_SPOT_EXPONENT, hopefully hiding any jaggies. The new parameter has a somewhat confusing name that refers to the details of the equation that determines the strength of the off-center dimming effect. In a theme seen throughout the mathematics of computer graphics, the dimming is a function of the cosine of the angle between the center line and the vertex being lit. In fact, the dimming factor is the cosine raised to the exponent specified by GL_SPOT_EXPONENT. Why use the cosine of the angle? It turns out to be cheap to calculate--cheaper than calculating the angle itself--and also gives a nice smooth effect.
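
In equation form, the dimming factor applied at each vertex is:

\[
\text{factor} = (\cos\theta)^{e}
\]

where θ is the angle between the beam's center line and the direction from the light to the vertex, and e is the GL_SPOT_EXPONENT value.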

With luck, the new beam will appear about the same width to the eye as the old one:

Good enough. The image looks better without the massive performance strain of high subdivision levels.

Refactoring Drawing

There's still something not right, but it will take a few more objects in the scene to show it. draw_view is already a repetitive hardcoded mess and it's been on the "to be refactored" list for a while, so now seems a good time to clean it up before I add to the mess.

draw_view performs a series of transformations and state settings for each object drawn. I want to move to a more data-driven design, with each object in the simulated world represented by a data structure specifying the needed transformations and settings. Eventually, these structures may become full-fledged blessed objects, but I'll start simple for now.

I initialized the data structures in init_objects:

sub init_objects
{
    my $self = shift;

    my @objects = (
        {
            draw        => \&draw_axes,
        },
        {
            lit         => 1,
            color       => [ 1, 1,  1],
            position    => [12, 0, -4],
            scale       => [ 2, 2,  2],
            draw        => \&draw_cube,
        },
        {
            lit         => 1,
            color       => [ 1, 1, 0],
            position    => [ 4, 0, 0],
            orientation => [40, 0, 0, 1],
            scale       => [.2, 1, 2],
            draw        => \&draw_cube,
        },
    );

    $self->{world}{objects} = \@objects;
}

Each hash includes the arguments to the various transformations to apply to it, along with a reference to the routine that actually draws the object and a flag indicating whether the object should be subject to OpenGL lighting. The object array then becomes a new part of the world hash for easy access later.

I called this routine at the end of init as usual:

    $self->init_objects;

I also replaced draw_view with a version that interprets the data into a series of OpenGL calls:

sub draw_view
{
    my $self    = shift;

    my $objects = $self->{world}{objects};

    foreach my $o (@$objects) {
        $o->{lit} ? glEnable (GL_LIGHTING)
                  : glDisable(GL_LIGHTING);

        glColor(@{$o->{color}})        if $o->{color};

        glPushMatrix;

        glTranslate(@{$o->{position}}) if $o->{position};
        glRotate(@{$o->{orientation}}) if $o->{orientation};
        glScale(@{$o->{scale}})        if $o->{scale};

        $o->{draw}->();

        glPopMatrix;
    }
}

The new routine iterates over the world object array, performing each requested operation. It either skips or defaults any unspecified values. First up is the choice to enable or disable GL_LIGHTING, followed by setting the current color if requested. The code next checks for and applies the usual transformations and finally, calls the object draw routine.

For simplicity and robustness, I've unconditionally wrapped the transformations and draw routine in a matrix push/pop pair rather than trying to detect whether they need the push and pop. OpenGL implementations tend to be highly optimized with native code, and any detection I did would be Perl. Chances are good that such an "optimization" would instead slow things down. This way, my code stays cleaner and even a misbehaving draw routine that performed transformations internally without cleaning up afterwards will not affect the next object drawn.

A quick test showed that this refactored version still worked. Now I could add a few more objects to demonstrate the remaining lighting issue. I specified several more boxes programmatically by inserting a new loop before the end of init_objects:

    foreach my $num (1 .. 5) {
        my $scale =   $num * $num / 15;
        my $pos   = - $num * 2;
        push @objects, {
            lit         => 1,
            color       => [ 1, 1,  1],
            position    => [$pos, 2.5, 0],
            orientation => [30, 1, 0, 0],
            scale       => [1, 1, $scale],
            draw        => \&draw_cube,
        };
    }

    $self->{world}{objects} = \@objects;
}

For each box, just two parameters vary: position and Z scale. I chose the position to set each box next to the last, progressing along the -X axis. The scale is set so that the height and width of each box remains the same, but the depths vary from very shallow for the first box to fairly deep for the last.

The loop specifies five boxes in total and begins by calculating the X position and Z scaling (depth) for the current box. The next few lines simply create a new hash for the new box and push it onto the object array.
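
Worked through, the first box ($num = 1) gets a Z scale of 1/15, or about 0.07, and sits at X position -2; the last ($num = 5) gets a Z scale of 25/15, or about 1.67, at X position -10.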

Finally, there was one last change--the bright world light overwhelms the problematic effect from the flashlight. This is an easy fix; I commented out the line that enables it:

sub set_world_lights
{
    glLight(GL_LIGHT0, GL_POSITION, 0.0, 0.0, 1.0, 0.0);

#     glEnable(GL_LIGHT0);
}

By panning to the left across the scene until the viewpoint is in front of the new boxes, the problem becomes obvious:

The brightness of the lighting varies immensely depending on the depth of the box! This rather unintuitive outcome is an unfortunate side effect of how OpenGL must handle normals. A normal specifies the direction of the surface associated with a vertex. If a rigid object rotates, its surfaces rotate, so all of its normals must rotate as well. OpenGL handles this by transforming normal coordinates as it would vertex coordinates. This runs into trouble with any transformations other than translation and rotation. OpenGL calculations assume that normals are normalized (have unit length). Scaling the normal breaks this assumption and results in the effect seen above.

To fix this, I told OpenGL that normals may not have unit length and that OpenGL must normalize them before other calculations are performed. This is not the default behavior because of the performance cost of normalizing each vector. An application that can ensure normals are always unit length after transformation can keep the default and run a little faster. I want to allow arbitrary scaling of objects, so I enabled automatic normalization with another line at the end of prep_frame:

    glEnable(GL_NORMALIZE);

That fixed the problem:

With that bug killed, I reenabled the world light by uncommenting the glEnable line in set_world_lights:

sub set_world_lights
{
    glLight(GL_LIGHT0, GL_POSITION, 0.0, 0.0, 1.0, 0.0);

    glEnable(GL_LIGHT0);
}

Conclusion

During this article I've moved pretty quickly, covering screenshots, movement of the viewpoint, the beginnings of lighting in OpenGL, and subdivided faces for the boxes. Along the way, I took the chance to refactor draw_view into a more data-driven design and made the scene a little more interesting.

Unfortunately, these new changes have slowed things down quite a bit. OpenGL has several features that can improve performance considerably. Next time, I'll talk about one of the most powerful of these: display lists. I'll also introduce basic font handling and run with the performance theme by adding an FPS display to the engine.

Until next time, have fun and keep hacking!
