Beginning PMCs

One of the best things about Parrot is that it’s not just for Perl implementors. Parrot 0.0.3 came with support for extensible data types that can be used to implement the types used in your favorite language. The mechanism by which these types are extensible is called the PMC.

The PMC, or Parrot Magic Cookie, type is a special data container for user-defined data types. Because these user-defined types are essentially implementations of a set of methods, we refer to them as PMC classes. Currently, the legal PMC classes are the PerlInt, PerlNum, PerlString, PerlArray, and PerlHash types. The PerlInt, PerlNum and PerlString data types combine to form the PerlScalar data type.

PMC registers, unlike the basic Integer, Number, and String registers, must be specially allocated with the new P0, PMCType instruction. Other operations like set P0,5 are handled by special functions that are implemented by the PMC class. The rest of this article is about how to create your own PMC class implementation, alongside the PerlInt and PerlHash data types.

For our example, we’re going to implement a simple queue data structure. Our queue will hold a sequence of integers; the queue will grow when an integer is assigned to it, and will shrink when an integer is read from it. We’ll use the PerlInt class as a basis, so it may be helpful to look at some examples of operations that use it:

  new P0, PerlInt  # Create a new PMC in the 'PerlInt' class
  set P0, 1234     # Set the value of the PMC to 1234
  set P0, "4567"   # Set the value of the PMC to 4567
  set P0, 12.34    # Set the value of the PMC to 12
  set I0, P0       # Set I0 to the current value of the PMC
  print P0
  print "\n"

Note that no special instructions like set_string or set_float were required to assign data of different types to the PMC. Each instruction does the Right Thing given the initial type of the PMC. This has several important consequences when designing new data types, the largest of which is that it generally isn’t necessary to add special instructions to access data contained within a PMC.
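
To make the idea concrete, here is a minimal C sketch of how one integer-like class could accept integers, floats, and strings through separate setter functions, so that a single set instruction can serve all three. This is not Parrot’s source: the IntBox type and the intbox_* function names are invented for illustration, and the conversions simply use C’s own cast and strtol behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an integer-valued PMC; not Parrot's real struct. */
typedef struct {
    long int_val;
} IntBox;

/* set P0, 1234: store the native integer directly. */
void intbox_set_integer(IntBox *box, long value) {
    box->int_val = value;
}

/* set P0, 12.34: truncate the float toward zero, as a C cast does. */
void intbox_set_number(IntBox *box, double value) {
    box->int_val = (long)value;
}

/* set P0, "4567": parse the leading decimal digits of the string. */
void intbox_set_string(IntBox *box, const char *value) {
    assert(value != NULL);
    box->int_val = strtol(value, NULL, 10);
}

/* set I0, P0: read the value back as a native integer. */
long intbox_get_integer(const IntBox *box) {
    return box->int_val;
}
```

The caller never names the conversion it wants; the choice of setter follows from the type of the value being assigned, which is the same effect the assembly listing above relies on.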

On the other side, this means that PMCs should attempt to behave rationally in all situations. It’s not an onerous requirement, but in some cases, rational behavior is hard to define. Queues are fairly simple to define though, in terms of behavior. A queue has one way to get data in and one way to get data out.

Since we can use one instruction for multiple classes, we’ll use set Pn,In to add an integer to the queue, and set In,Pn to get an element out of the queue. The last operation we need to perform on a queue is to determine whether the queue is empty. The PerlArray class uses set In,Pn to return the length of the array into In, but we’ve already decided to use that to get an integer out of the queue.

Instead of using set In,Pn to determine how many elements are in the queue, all we really need to know is whether the queue is empty or in use. For that, we can use the handy boolean operator, if Pn,In. Here, the integer is actually the number of instructions to skip over if the condition is true. A PMC counts as true when its get_bool member returns a nonzero value, so we’ll have the queue report true, and therefore branch, while it still holds elements.
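
A relative branch of this kind can be sketched in a few lines of C. The toy machine below is purely hypothetical: the opcode numbers, the three-cell layout of OP_IF, and the choice to count the offset in array cells from the start of the branching instruction are all assumptions of the sketch, not a description of Parrot’s bytecode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-opcode machine; numbering and layout are invented. */
enum { OP_IF = 0, OP_END = 1 };

/*
 * Runs `code` from index 0 and returns the index of the OP_END that
 * stopped execution. OP_IF occupies three cells: the opcode, a
 * condition flag, and a relative offset.
 */
long run(const long *code) {
    long pc = 0;
    assert(code != NULL);
    for (;;) {
        if (code[pc] == OP_IF) {
            if (code[pc + 1] != 0) {
                pc += code[pc + 2];  /* taken: move by the offset */
            } else {
                pc += 3;             /* not taken: fall through to the next op */
            }
        } else {
            return pc;               /* OP_END (or anything unknown) stops */
        }
    }
}
```

With the program { OP_IF, 1, 4, OP_END, OP_END } the branch is taken and execution stops at index 4; changing the condition cell to 0 makes it fall through and stop at index 3.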

So, our IntQueue data type will implement three instructions. First, the set Pn,In instruction will add an integer to the queue. Second, if Pn,In will branch to the given offset while the queue still holds at least one integer. Finally, set In,Pn will dequeue the oldest integer in the queue and place it into the given integer register.

Some sample source using IntQueue may come in handy at this time:

  new P0, IntQueue   # Create the queue
  set P0, 7          # Enqueue 7
  set P0, -43        # Enqueue -43
  set I0, P0         # Dequeue 7
  print I0           # Should print '7'.
  if P0, QUEUE_NOT_EMPTY # Branch to 'QUEUE_NOT_EMPTY': -43 is still queued

Core Operations

Before forging ahead with the IntQueue, let’s take a look at the core operations file. Within your CVS tarball, open parrot/core.ops and search for the set operations. While there are files such as parrot/core_ops.c and parrot/Parrot/OpLib/core_ops.pm, this is the master file. Changes in parrot/core_ops.c will be overwritten the next time you build, so make your edits to parrot/core.ops.

Having said that, let’s look at a sample PMC operation.

  inline op set(out PMC, in NUM) {
    $1->vtable->set_number(interpreter,$1,$2);
    goto NEXT();
  }

Since core.ops is preprocessed into both a Perl source file and a C source file, the syntax is, of necessity, a mixture of Perl and C. The ‘inline’ declaration is a hint to the JIT compiler, which is beyond the scope of this article. Parameters also have hints for the JIT compiler, but the most important bits here are the PMC and NUM tags, because these let the compiler know what types this operation can take.

When preprocessing into Perl, the prototype is the only piece of interest, as the assembler only needs to know the name and parameter list in order to build the assembly code.

C preprocessing is a bit more complicated, but still fairly straightforward. Tokens like $2 are replaced with the appropriate code to access the declared parameter, and a few keywords like NEXT() are replaced with code to return the next instruction in the stream.
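
As a rough illustration of that substitution, the following C sketch shows what a preprocessor could plausibly emit for a simple integer-to-integer set op. Everything here is a simplified guess: the Interp struct, the op_set_i_i name, and the three-cell instruction layout are assumptions made for the example and are not taken from the generated parrot/core_ops.c.

```c
#include <assert.h>

/* Simplified stand-ins; these are not Parrot's real headers. */
typedef long opcode_t;

typedef struct {
    long   int_reg[32];   /* I0..I31 */
    double num_reg[32];   /* N0..N31 */
} Interp;

/*
 * Hand-written guess at the expansion of
 *
 *     inline op set(out INT, in INT) { $1 = $2; goto NEXT(); }
 *
 * cur_opcode[0] is the opcode itself; cur_opcode[1] and cur_opcode[2]
 * hold the register numbers that $1 and $2 stand for.
 */
opcode_t *op_set_i_i(opcode_t *cur_opcode, Interp *interpreter) {
    assert(cur_opcode[1] >= 0 && cur_opcode[1] < 32);
    assert(cur_opcode[2] >= 0 && cur_opcode[2] < 32);
    /* "$1 = $2" becomes an indexed register access */
    interpreter->int_reg[cur_opcode[1]] = interpreter->int_reg[cur_opcode[2]];
    /* "goto NEXT()" becomes: skip the opcode plus its two operands */
    return cur_opcode + 3;
}
```

The function returns a pointer to the following instruction, which is one simple way for an interpreter loop to learn where to continue.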

With the exception of those tags, the rest of the code is pure C, with access to all of the Parrot internals. Of course, you shouldn’t access such things as the register internals, but the rest of the C API is available, the most common APIs being located in parrot/include/parrot/string.h and parrot/include/parrot/key.h, the latter primarily being used for aggregate data structures.

The preprocessor, while slightly confusing, is much more flexible than the system of nested CPP macros that Perl currently uses, and hopefully easier to understand.

Virtual Tables

The code above used a curious construct:

  $1->vtable->set_number(interpreter,$1,$2);

Parameter $1 is a PMC, and since these are user-defined types, the code simply can’t assign $2 to $1, as the non-PMC operations would do. Instead, each PMC has a table of function pointers assigned to it, and the interpreter calls the appropriate function.

For example, assuming that the P0 register is being initialized by the new P0,IntQueue instruction, the above code would run the set_number member of the IntQueue class. Since the type of P0 is decided on at runtime, the dispatch mechanism is completely independent of the parameter type. What this means in the case of the IntQueue type is that no modifications need to be made to the parrot/core.ops file.
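
The dispatch can be sketched in plain C with a struct of function pointers. The Cell type, the two toy classes, and op_set below are hypothetical names chosen for the example; Parrot’s real vtable has many more members and also passes the interpreter to each one.

```c
#include <assert.h>
#include <stddef.h>

struct Cell;  /* forward declaration so the vtable can mention it */

/* Hypothetical two-entry vtable. */
typedef struct {
    void (*set_integer_native)(struct Cell *self, long value);
    long (*get_integer)(struct Cell *self);
} VTable;

typedef struct Cell {
    const VTable *vtable;  /* chosen once, when the object is created */
    long value;
} Cell;

/* Class 1: remembers only the last value assigned. */
void last_set(Cell *self, long value) { self->value = value; }
/* Class 2: accumulates every value assigned. */
void sum_set(Cell *self, long value) { self->value += value; }
/* Getter shared by both classes. */
long cell_get(Cell *self) { return self->value; }

const VTable last_vtable = { last_set, cell_get };
const VTable sum_vtable  = { sum_set,  cell_get };

/* The "op": identical code for every class, like the body of set(). */
void op_set(Cell *cell, long value) {
    assert(cell != NULL && cell->vtable != NULL);
    cell->vtable->set_integer_native(cell, value);
}
```

Calling op_set twice with 3 and then 4 leaves a Cell built on last_vtable holding 4 and one built on sum_vtable holding 7, even though the calling code is the same in both cases.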

Parrot Class Files

parrot/classes contains all of the PMC classes used by Parrot. Like the parrot/core.ops file, this too is preprocessed before final compilation, so all edits should be made to the parrot/classes/*.pmc files.

Creating a new class file from scratch is somewhat daunting, so we’ll base IntQueue on an existing class file. While IntQueue is an aggregate type like PerlHash, its interface matches PerlInt most closely, in that it only deals with one element at a time.

Start by copying parrot/classes/PerlInt.pmc to parrot/classes/IntQueue.pmc, and replace all instances of PerlInt with IntQueue. There will be some additional C code necessary that will be available in the sample source at the end of the article, but not discussed beyond the API.

Registering parrot/classes/IntQueue.pmc is done in two files. Add the appropriate lines to parrot/global_setup.c to initialize the new PMC type, and add the new vtable entry to parrot/include/parrot/pmc.h. This is only done in the case of types that are intended to be part of Parrot itself; when Parrot has the ability to dynamically load PMC classes at runtime, a more flexible mechanism will be devised for registering classes, but for now, we’ll pretend that IntQueue is going to be a core interpreter data type.
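
The general shape of such a registration can be sketched as an enumeration of class ids plus a table filled in at startup. This is a hypothetical reduction, not the contents of either file: apart from enum_class_max, which appears in the new op shown later in this article, the identifiers below (ClassInfo, class_table, global_setup_sketch, is_valid_class, lookup_class) are invented for illustration.

```c
#include <assert.h>

/* Invented class ids; adding a core class means adding an entry here. */
enum {
    enum_class_PerlInt,
    enum_class_IntQueue,   /* the new entry */
    enum_class_max         /* always last: the number of known classes */
};

typedef struct {
    const char *name;
} ClassInfo;

/* One slot per class id. */
ClassInfo class_table[enum_class_max];

/* Fills in the table once, at interpreter startup. */
void global_setup_sketch(void) {
    class_table[enum_class_PerlInt].name  = "PerlInt";
    class_table[enum_class_IntQueue].name = "IntQueue";
}

/* Returns nonzero when `id` names a registered class. */
int is_valid_class(long id) {
    return id >= 0 && id < enum_class_max;
}

/* Looks up a class by id; the caller must pass a valid id. */
const ClassInfo *lookup_class(long id) {
    assert(is_valid_class(id));
    return &class_table[id];
}
```

Keeping enum_class_max as the final enumerator means the range check stays correct whenever a new class id is inserted before it.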

Within parrot/core.ops, the instructions the IntQueue type uses look like this:

  op new(out PMC, in INT) {
    PMC* newpmc;
    if ($2 <0 || $2 >= enum_class_max) {
      abort(); /* Deserve to lose */
    }
    newpmc = pmc_new(interpreter, $2);
    $1 = newpmc;
    goto NEXT();
  }

  inline op set(out PMC, in INT) {
    $1->vtable->set_integer_native(interpreter, $1, $2);
    goto NEXT();
  }

  inline op set(out INT, in PMC) {
    $1 = $2->vtable->get_integer(interpreter, $2);
    goto NEXT();
  }

  op if(in PMC, in INT) {
    if ($1->vtable->get_bool(interpreter, $1)) {
      goto OFFSET($2);
    }
    goto NEXT();
  }

Naturally, each of these calls a PMC vtable entry, and each of those entries has to be implemented. As of this writing, the appropriate vtable entries as they are in parrot/classes/perlint.pmc look like this:

    void init () { /* This is called from pmc_new() */
        SELF->cache.int_val = 0;
    }

    void set_integer_native (INTVAL value) {
        SELF->cache.int_val = value;
    }

    INTVAL get_integer () {
        return SELF->cache.int_val;
    }

    BOOLVAL get_bool () {
        return pmc->cache.int_val != 0;
    }

Any code before the pmclass declaration in a parrot/classes/*.pmc file is literally copied into the C source, so we’ll use this area to store our data structures and APIs. In order to make matters simple, we’ll assume that the following API is available for our use:

  static CONTAINER* new_container ( void );
  static void enqueue ( CONTAINER* container, INTVAL value );
  static INTVAL dequeue ( CONTAINER* container );
  static INTVAL queue_length ( CONTAINER* container );

The API should be fairly straightforward to use. Initializing the container is done with new_container, which returns a pointer to our new queue data type. Adding a new queue element is done with enqueue, and deleting an element is done with dequeue. The queue’s length can be found with queue_length.
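
For readers who want something to experiment with, here is one possible implementation of that API as a singly linked list. It is only a sketch under stated assumptions: INTVAL is typedef’d to long, memory comes from plain malloc rather than any Parrot allocator, the NODE type is invented, and errors are handled with assert instead of real recovery. The version in the Parrot tree may look quite different.

```c
#include <assert.h>
#include <stdlib.h>

typedef long INTVAL;  /* assumption: Parrot supplies its own INTVAL */

/* Singly linked FIFO: enqueue at the tail, dequeue from the head. */
typedef struct node {
    INTVAL value;
    struct node *next;
} NODE;

typedef struct {
    NODE *head;
    NODE *tail;
    INTVAL length;
} CONTAINER;

static CONTAINER* new_container ( void ) {
    CONTAINER *container = malloc(sizeof *container);
    assert(container != NULL);   /* sketch only: no graceful recovery */
    container->head = NULL;
    container->tail = NULL;
    container->length = 0;
    return container;
}

static void enqueue ( CONTAINER* container, INTVAL value ) {
    NODE *node = malloc(sizeof *node);
    assert(node != NULL);
    node->value = value;
    node->next = NULL;
    if (container->tail != NULL) {
        container->tail->next = node;
    } else {
        container->head = node;  /* first element of an empty queue */
    }
    container->tail = node;
    container->length++;
}

static INTVAL dequeue ( CONTAINER* container ) {
    NODE *node = container->head;
    INTVAL value;
    assert(node != NULL);        /* caller must not dequeue an empty queue */
    value = node->value;
    container->head = node->next;
    if (container->head == NULL) {
        container->tail = NULL;
    }
    free(node);
    container->length--;
    return value;
}

static INTVAL queue_length ( CONTAINER* container ) {
    return container->length;
}
```

Enqueuing 7 and then -43 and dequeuing twice returns 7 first and -43 second, matching the first-in, first-out order used in the assembly sample earlier.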

The CONTAINER data type has to be stored somewhere, and we look into parrot/include/parrot/pmc.h to find out where to store it. We find the definition of the PMC structure to be:

    struct PMC {
      VTABLE *vtable;
      INTVAL flags;
      DPOINTER *data;
      union {
        INTVAL int_val;
        FLOATVAL num_val;
        DPOINTER *struct_val;
      } cache;
      SYNC *synchronize;
    };

There are two areas where we can store data: data is used as a general dumping ground for a data type’s internal data structures, and the cache union is used for fast access to simpler data structures. data is the right place to hang our CONTAINER structure.
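
The difference between the two storage areas can be sketched with a cut-down version of the structure. MiniPMC, Payload, and the three functions below are hypothetical and exist only for this example; void * stands in for DPOINTER *, and the vtable, flags, and synchronize members are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Cut-down stand-in for struct PMC. */
typedef struct {
    void *data;            /* stand-in for DPOINTER *data */
    union {
        long   int_val;
        double num_val;
        void  *struct_val;
    } cache;
} MiniPMC;

/* Hypothetical aggregate state that is too large for the cache union. */
typedef struct {
    long items[8];
    int  count;
} Payload;

/* A scalar class keeps its whole state inline in the cache. */
void scalar_init(MiniPMC *pmc) {
    pmc->data = NULL;
    pmc->cache.int_val = 0;
}

/* An aggregate class allocates its state and hangs it off `data`.
   Returns 0 on success and -1 if the allocation fails. */
int aggregate_init(MiniPMC *pmc) {
    Payload *payload = malloc(sizeof *payload);
    if (payload == NULL) {
        return -1;
    }
    payload->count = 0;
    pmc->data = payload;
    pmc->cache.int_val = 0;
    return 0;
}

/* Releases what aggregate_init allocated. */
void aggregate_destroy(MiniPMC *pmc) {
    assert(pmc != NULL);
    free(pmc->data);
    pmc->data = NULL;
}
```

A scalar needs no allocation at all, while an aggregate pays for one heap block and reaches it through a cast of the data pointer.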

Like most of the other files within Parrot, the IntQueue class is also preprocessed. The major preprocessing done here is to replace the SELF tag with a reference to the current PMC. In the rare case that you need a reference to the current interpreter, that tag is INTERP.

Initializing the IntQueue class is done with the init member. Since we’re storing our queue in the data, we’ll let the new_container function hand us a pointer to our new queue, and save that.

    void init () {
        SELF->data = new_container();
    }

Getting an integer out of the queue is done with the get_integer member. This isn’t meant to be production-quality, so we won’t worry about error checking; we’ll simply return the integer from the container.

    INTVAL get_integer () {
        return dequeue((CONTAINER*)SELF->data);
    }

Adding an integer to the queue is done with the set_integer_native member. We’ll simply use the enqueue function to place the integer onto the queue like so:

    void set_integer_native (INTVAL value) {
        enqueue((CONTAINER*)SELF->data, value);
    }

The final operation we need to support is determining whether the queue is empty, and we use the queue_length function for that. The PMC member function that does this is get_bool, which returns true while the queue holds at least one element, and the code is pretty straightforward:

    BOOLVAL get_bool () {
        return queue_length((CONTAINER*)SELF->data) != 0;
    }

This code has been checked in to the Parrot CVS, so feel free to look at the full version there. We’ve now walked through the major files needed to implement a Parrot Magic Cookie. Next time, we’ll explore the functions needed to implement aggregate data types like hashes and arrays, and learn about the new garbage collection system.

In the meantime, if you want to play with implementing your own data types for Parrot, then take a look at docs/vtables.pod in the Parrot source tree for more information about the members that you can implement and how to design your own classes from scratch.
