Sign In/My Account | View Cart  
advertisement


Listen Print

Proxy Objects

Building Garbage Collected Circular References

by Matt Sergeant
August 07, 2002

In your time as a Perl programmer, it becomes almost inevitable that at some point you will have to manage in-memory tree structures of some sort. When you do this, it becomes important to be aware of how Perl manages memory, and when you might come up against a situation where Perl will not free its memory -- a situation that can happen easily ... as we'll see below.

In writing the XML::XPath module, (a library that implements some of the XML Document Object Model, or DOM) I came across a particular problem with Perl's memory-management mechanism. In this article, I will detail the problem and demonstrate a technique for building data structures that do not exhibit this problem.

The problem is circular references.

What Is a Circular Reference?

Circular references are simply self-referential data structures -- a complex data structure that at some point contains a reference to a part of itself further up in the hierarchy. They are a useful idiom when you need two parts of a structure to be able to refer to each other -- often we see them used in parent/child relationships, where a child object might need access to the methods in the parent:

figure 1

The case we're concerned with here is an XML tree, where we have a root node, and each node can have one or more children. In an XML DOM, the child nodes need the ability to refer back to their parents:

my $parent_name = $node->getParentNode()->getNodeName();

While it is often extremely useful to encode this type of relationship in your code, it is rather problematic for Perl.

Reference Counting

Before we see why circular references are problematic, first we need to understand how Perl's memory management works. In order to return memory for use by other parts of your program when it becomes available, Perl uses a technique called "reference counted garbage collection".

By using a count on each variable within Perl of the number of references to that variable (the number of other things in your program that refer to it), Perl's garbage collector can ensure timely destruction of lexical variables (those created with my). Each time the Perl interpreter sees something referencing that variable, it increments the variable's internal reference count. When the reference count goes down to zero, that variable's memory is freed, and its destructor is called.

You can see the reference count of a variable using the Devel::Peek module:

use Devel::Peek;
my $x = "Hello";
my $y = \$x;
Dump($x);

Which outputs:

SV = PV(0x804b85c) at 0x8057620
  REFCNT = 2
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x805d800 "Hello"\0
  CUR = 5
  LEN = 6

The important field for these purposes is the REFCNT field, which shows us there are two references to our variable - one for the main copy of the variable (the one we see as $x) and one for the reference that $y holds.

Reference Counting with Circular References

Reference counting works well until you need to build self-referencing data structures. Let's look again at the design of our XML DOM library, where children need access to their parent nodes. The DOM specification requires you to maintain a two-way relationship between the parent and the child nodes. This leads to our circular reference -- the parent holds a reference to the child, and the child holds a reference to the parent.

Circular References in Detail

Circular references occur when a variable either directly or indirectly refers to itself somehow. We can easily show this using hash references and Devel::Peek again:

use Devel::Peek;
for (1..1) { # enter scope here
  my %x = ();
  my %y = ();
  $x{child} = \%y;
  $y{parent} = \%x;
  Dump(\%x);
} # leave scope here

Now we can see that both variables have a refcount of 2 (the 3 for %x is because we have to take an extra reference in order to Dump() it):

...
SV = PVHV(0x8060b40) at 0x805702c
  REFCNT = 3               ### This is %x
  ...
  Elt "child" HASH = 0x77420b6
  SV = RV(0x8063008) at 0x804b494
    REFCNT = 1
    FLAGS = (ROK)
    RV = 0x8056fe4
    SV = PVHV(0x8060ba0) at 0x8056fe4
      REFCNT = 2           ### This is %y
      ...
      Elt "parent" HASH = 0xcc03940
      SV = RV(0x806300c) at 0x8056f84
        REFCNT = 1
        FLAGS = (ROK)
        RV = 0x805702c

Now problems arise. When %y goes out of scope, it cannot be completely garbage collected, because its reference count was 2, not 1. %x also cannot be garbage collected (if it could, then it would free the extra reference to %y) for the same reason. So we end up with two zombie variables, which can neither be garbage collected nor freed back to the system. This is why we get what appears to be memory leaks when we use circular references. Had we modified our for () loop to repeat more than once, we would see our memory usage steadily growing.

Rather than proving this by watching our memory grow though, we can demonstrate what his happening with objects. We can create a simple object and output something in its DESTROY method:

package CircObj;
sub new { bless {}, shift }
sub parent { my $self = shift; @_ ? $self->{parent} = 
    shift : $self->{parent} }
sub child { my $self = shift; @_ ? $self->{child} = 
    shift : $self->{child} }
sub DESTROY { warn("CircObj::DESTROY\n") }

Now a normal instance of this will output the destroy method as soon as its scope exits:

{
  my $x = CircObj->new();
  warn("Leaving scope\n");
}
warn("Scope left\n");

Results in the following output:

Leaving scope
CircObj::DESTROY
Scope left

However, if we fill in our circular references, then we get a different result:

for (1..1) {
  my $parent = CircObj->new;
  my $child = CircObj->new;
  $parent->child($child);
  $child->parent($parent);
  warn("Leaving scope\n");
}
warn("Scope left\n");

And we see the output:

Leaving scope
Scope left
CircObj::DESTROY
CircObj::DESTROY

What this means is our variables are only getting DESTROYed when the Perl interpreter does its global cleanup -- at the time our program exits, not in the timely manner we are used to with normal objects. This may not be too problematic for some scripts, but if you're running any sort of large program, or a long-running persistent interpreter like mod_perl, then you will see your memory steadily grow as you do this repeatedly.

A common way to "fix" this problem is to use a manual destructor -- a method or function call that breaks the circle somehow. In our original hashes example, this is as simple as adding delete $x{child} to our code (deleting $y{parent} is optional then, as the circle is already broken), and in the object example we can supply a destroy() method that the user can call before exiting the scope. But neither of those options is terribly user friendly, and I believe in letting people who use my modules to expect it to "just work".

Pages: 1, 2, 3

Next Pagearrow