XS utility routines that are good to know

In the previous tutorial, we learned how to write our own functions in XS, how to process multiple arguments, and return different values, including undef.

In this tutorial I’m going to cover some useful routines for common cases you’ll encounter when programming in XS. One that you’ve already seen is SvOK which can tell you if a scalar is defined or not. Here are the new areas I’ll discuss:

  • Scheduling XS code to run at startup
  • Handling tied variables
  • Unicode tools

When writing XS code, these are things you’ll often want to be aware of, and know how to handle.

Module Code

As before, we’ll define the module code to load our XS. This is all that’s required:

package XS::Tutorial::Three;
require XSLoader;


That should be saved as lib/XS/Tutorial/Three.pm.

XS Code

The top of the XS file will look similar to the previous chapter:

#define PERL_NO_GET_CONTEXT // we'll define thread context if necessary (faster)
#include "EXTERN.h"         // globals/constant import locations
#include "perl.h"           // Perl symbols, structures and constants definition
#include "XSUB.h"           // xsubpp functions and macros

MODULE = XS::Tutorial::Three  PACKAGE = XS::Tutorial::Three

Remember to append any XS code after the PROTOTYPES line. This should be saved as lib/XS/Tutorial/Three.xs.

Scheduling XS code to run at startup

Sometimes you’ll need to run some code before your XS functions can work. For example, libpostal has startup routines which populate data structures that must be called before the library can be used.

You could code this in a “lazy” way, that is, inside the XS function, check to see if the init code has been run, and if not, run it before executing the rest of the function code.

However XS offers another way to do it by using the BOOT keyword. Any C code included below the keyword, will be executed during the startup process:

printf("We're starting up!\n");

The boot section is terminated by the first empty line after the keyword.

Handling tied variables

Tied variables are special variables that execute custom code when they are interacted with. But you never use them, so why worry about them? The thing is if you’re writing code to be used by others, you can’t be sure that a caller won’t pass a tied variable to one of your XS functions. And unlike regular Perl, XS does not execute tied code automatically.

XS does provide functions for working with tied variables though. One you’ll see in a lot of XS code is SvGETMAGIC. Imagine your function is passed a tied variable; it’s value will be undefined in XS, until you call mg_get (“magic get”) on it, which calls FETCH.

Unfortunately, mg_get can only be called on tied scalars so you don’t want to call it on a regular scalar. That’s where SvGETMAGIC comes in: if the scalar is tied, it will call mg_get, if not, nothing will happen.

Here’s how you might use it:

get_tied_value(SV *foo)

  /* call FETCH() if it's a tied variable to populate the sv */

This code declares an XS function called get_tied_value, which accepts a scalar variable, and calls SvGETMAGIC on it, returning the value, by pushing it onto the stack.


You might be wondering why functions dealing with tied variables are named “magic” or “mg”. The reason is that tied behavior for each variable is implemented via a pointer to a magic virtual table which is a structure containing function pointers to the tied behavior.

Often the Perl C API will provide mg (“magic”) and nomg (“non magic”) variants of functions, so you can decide if you want to trigger the tied behavior.

UTF-8 tools

Perl has loads of tools for managing UTF-8 encoded text, but with XS you’re working in C, which does not. Start thinking about basic types like char and common assumptions in C code, and you’ll realize that multibyte characters can wreak havoc unless you handle them correctly.

Fortunately, the Perl C API does provide functions for managing UTF-8 data that can help. Here are a couple of examples.

Perl scalars have a UTF-8 flag, which is turned on when the scalar contains decoded UTF-8 data. We can detect it with SvUTF8:

is_utf8(SV *foo)
  /* if the UTF-8 flag is set return 1 "true" */
  if (SvUTF8(foo)) {
  /* else return undef "false" */
  else {

This declares an XS function called is_utf8 which accepts a scalar and returns true if the UTF-8 flag is set, or false if it isn’t.

Imagine you have some C code that only works with ASCII text, that is, single byte characters. You can detect incoming scalars that have the UTF-8 flag turned on with SvUTF8, but what do you do about ones that have the flag?

You could croak immediately, throwing an exception. Or you could try to downgrade the scalar to be non UTF-8 as the string may be marked as UTF-8 but only contain ASCII compatible characters (decimal values 0-127).

is_downgradeable(SV *foo)
  /* if the UTF-8 flag is set and the scalar is not downgrade-able return
     undef */
  if (SvUTF8(foo) && !sv_utf8_downgrade(foo, TRUE)) {
  /* else return 1 */
  else {

This function returns false if the scalar contains data that is not downgrade-able to ASCII, otherwise it returns true. It does that by using the sv_utf8_downgrade function, which accepts the scalar and a boolean value indicating if it’s ok to fail. As the second argument is TRUE, the function simply returns false if the scalar is not downgrade-able (otherwise it would croak).


Cover image © Steve Buissinne


David Farrell

David is a professional programmer who regularly tweets and blogs about code and the art of programming.

Browse their articles


Something wrong with this article? Help us out by opening an issue or pull request on GitHub