Recently in Security Category

Elements of Access Control

Why Gates?

In a perfect world we wouldn't do things we should not. However the world is not like this; people do forbidden things sometimes. This also applies to computer systems used by more than one person. Almost everyone has tried to read someone else's email, view accounting department salary reports, or something else, or access otherwise hidden data.

I know you have never done this, but many people have.

In Construction

The simplest way to allow or forbid a user account to do something is to check if the account is in a list of permitted accounts somewhere. If you assume that everything is forbidden unless explicitly allowed, the access function can be as simple as:

  # access_check() return 1 or undef
  sub access_check
  {
    my $user_id     = shift;
    my @allow_users = @_;

    my %quick_allow = map { $_ => 1 } @allow_users;

    return $quick_allow{ $user_id };
  }

  my @allowed = ( 11, 12, 23, 45 );

  print "User 23 allowed\n" if access_check( 23, @allowed );
  print "User 13 allowed\n" if access_check( 13, @allowed );
  print "User 99 allowed\n" if access_check( 99, @allowed );

  # only "User 23 allowed" will be printed

Usually access control can be almost as simple as this function. Using user IDs for access control is simple, but tends to be hard to maintain. The problem appears with systems with many users or with public systems where new users may be created at any point. Access lists may become very large for each operation, which needs access controls.

One solution to this problem is access groups. Each user may be a member of several groups. The access check will pass if the user is a member of a group with permission for the required operation. This middle level in the access check isolates users from the access check directly. It also helps the system's design--you can associate preset access groups with all controlled operations at their point of creation. Subsequently created users only need to be attached to one or more of those groups:

  # mimic real system environment:
  # %ALL_USER_GROUPS represents "storage" that contains all
  # groups that each user is attached to
  my %ALL_USER_GROUPS = (
                    23 => [ qw( g1  g4 ) ],
                    13 => [ qw( g3  g5 ) ],
                    );
  # user 23 is in groups g1 and g4
  # user 13 -- in g3 and g5

  # return list of user's groups. read data from storage or
  # from %ALL_USER_GROUPS in this example
  sub get_user_groups
  {
    my $user_id     = shift;

    return @{ $ALL_USER_GROUPS{ $user_id } || [] };
  }

  # access_check_grp() return 1 or 0
  sub access_check_grp
  {
    my $user_id     = shift;
    my @allow_users = @_;

    my %quick_allow = map { $_ => 1 } @allow_users;

    my @user_groups = get_user_groups( $user_id );

    for my $group ( @user_groups )
    {
      # user groups is listed, allow
      return 1 if $quick_allow{ $group };
    }

    # user group not found, deny
    return 0;
  }

  # this groups list is static and will not be altered
  # when users are added or removed from the system
  my @allowed = qw( g1  g2  g7  g9 );

  print "User 23 allowed\n" if access_check_grp( 23, @allowed );
  print "User 13 allowed\n" if access_check_grp( 13, @allowed );
  print "User 99 allowed\n" if access_check_grp( 99, @allowed );

  # only "User 23 allowed" will be printed

Storage

Probably the most popular storage for system data nowadays is the SQL database. Here is a simple example of how to store users, groups, and mapping between them. Three tables are required:

  SQL CREATE statements:

  create table user  ( id integer primary key, name char(64), pass char(64) );
  create table group ( id integer primary key, name char(64) );
  create table map   ( user integer, group integer );

  TABLE USER:

   Column |     Type      | Modifiers
  --------+---------------+-----------
   id     | integer       | not null
   name   | character(64) |
   pass   | character(64) |

  TABLE GROUP:

   Column |     Type      | Modifiers
  --------+---------------+-----------
   id     | integer       | not null
   name   | character(64) |

  TABLE MAP:

   Column |  Type   | Modifiers
  --------+---------+-----------
   user   | integer |
   group  | integer |

Let's fill those tables with some data:

  letme=# select id, name from user;
   id |       name
  ----+------------------
    1 | Damian
    2 | Clive
    3 | Lana
  (3 rows)

  letme=# select * from group;
   id |       name
  ----+------------------
    1 | Admin
    2 | Users
    3 | Moderators
  (3 rows)

  letme=# select * from map;
   user | group
  -----+-----
     1 |   1
     1 |   2
     3 |   2
     3 |   3
     2 |   2
  (4 rows)

Users in this example are attached to those groups:

  Damian: Users, Admin
  Clive:  Users
  Lana:   Users, Moderators

Run-Time

Applications apply access control after user login. You can combine it with the login procedure--for example to allow only specific group of users to connect on weekends. Even so, the access check occurs only after the login succeeds, that is, when the username and password are correct.

A simple approach for loading required access info is:

  • Login, check username and password
  • For unsuccessful login, deny access, print message, etc.
  • For successful login, load group list for the user from database
  • Check for required group(s) for login

    This may deny login, print a message, or continue.

  • User logged in, continue

    All access checks for operations happen after this point.

The run-time storage for a user's groups can be simple hash. It can be either global or inside the user session object, depending on your system design. I've used a global hash here for simplicity of the examples, but if you copy and paste this code, remember that it is mandatory for you to clear and recreate this global hash for every request right after the login or user session changes! You can also use some kind of session object to drop all user data at the end of the session, but this is just an option, not the only correct or possible way.

(Also, a truly robust system would store a well-hashed version of the password, not the literal password, but that's a story for a different article.)

  #!/usr/bin/perl
  use strict;
  use DBI;
  use Data::Dumper;

  our $USER_NAME;
  our $USER_ID;
  our %USER_GROUPS;

  my $DBH = DBI->connect( "dbi:Pg:dbname=letme", "postgres", "",
      { AutoCommit => 0 } );

  # this is just an example!
  # username and password acquiring depends on the specific application
  user_login( 'Damian', 'secret4' );

  print "User logged in: $USER_NAME\n";
  print "User id:        $USER_ID\n";
  print "User groups:    " . join( ', ', keys %USER_GROUPS ) . "\n";

  sub user_login
  {
    my $user_name = shift;
    my $user_pass = shift;

    $USER_NAME   = undef;
    $USER_ID     = undef;
    %USER_GROUPS = ();

    # both name and password are required
    die "Empty user name"     if $user_name eq '';
    die "Empty user password" if $user_pass eq '';

    eval
    {
      my $ar = $DBH->selectcol_arrayref(
          'SELECT ID FROM USER WHERE NAME = ? AND PASS = ?',
                                        {},
                                        $user_name, $user_pass );

      $USER_ID   = shift @$ar;

      die "Wrong user name or password" unless $USER_ID > 0;

      $USER_NAME = $user_name;

      # loading groups
      my $ar = $DBH->selectcol_arrayref( 'SELECT GROUP FROM MAP WHERE USER = ?',
                                        {},
                                        $USER_ID );

      %USER_GROUPS = map { $_ => 1 } @$ar;
    };
    if( $@ )
    {
      # something failed, it is important to clear user data here
      $USER_NAME   = undef;
      $USER_ID     = undef;
      %USER_GROUPS = ();

      # propagate error
      die $@;
    }
  }

If Damian's password is correct, this code will print:

  User logged in: Damian
  User id:        1
  User groups:    1, 2

The group access check function now is even simpler:

  sub check_access
  {
    my $group = shift;
    return 0 unless $group > 0;
    return $USER_GROUPS{ $group };
  }

Sample code for an access check after login will be something like:

  sub edit_data
  {
    # require user to be in group 1 (admin) to edit data...
    die "Access denied" unless check_access( 1 );

    # user allowed, group 1 check successful
    ...
  }

or

  if( check_access( 1 ) )
  {
    # user ok
  }
  else
  {
    # access denied
  }

Access Instructions

The next problem is how to define which groups can perform specific operations. Where this information is static (most cases), you can store group lists in configuration (text) files:

  LOGIN: 2
  EDIT:  1

That is, the EDIT operation needs group 1 (admin) and LOGIN needs group 2 (all users).

Another example is to allow only administrators to log in during weekends:

  # all users for mon-fri
  LOGIN_WEEKDAYS: 2

  # only admin for sat-sun
  LOGIN_WEEKENDS: 1

Administrators will be in both groups (1, 2), so they will be able to log in anytime. All regular users cannot login on weekends.

This group list includes a moderators group. It could be useful to allow moderators do their job on weekends as well, implying an OR operation:

  # only admin or moderators for sat-sun
  LOGIN_WEEKENDS: 1, 3

This named set of groups is a policy.

For now, there's only one level in the policy and an OR operation between groups in a list. Real-world policies may be more complex. However there is no need to overdesign this. Even large systems may work with just one more level. Here's an AND operation:

  LOGIN_WEEKENDS: 1+3, 4, 1+5+9

This policy will match (allowing login on weekend days) only for users in the following groups:

     1 AND 3
  OR 4
  OR 1 AND 5 AND 9

The login procedure must match the LOGIN_WEEKENDS policy before allowing user to continue with other operations. Thus, you need a procedure for reading policy configuration files:

  our %ACCESS_POLICY;

  sub read_access_config
  {
    my $fn = shift; # config file name

    open( my $f, $fn );
    while( <$f> )
    {
      chomp;
      next unless /\S/; # skip whitespace
      next if  /^[;#]/; # skip comments

      die "Syntax error: $_\n" unless /^\s*(\S+?):\s*(.+)$/;
      my $n = uc $1; # policy name: LOGIN_WEEKENDS
      my $v =    $2; # groups lsit: 1+3, 4, 1+5+9

      # return list of lists:
      # outer list uses comma separator, inner lists use plus sign separator
      $ACCESS_POLICY{ $n } = access_policy_parse( $v );
    }
    close( $f );
  }

  sub access_policy_parse
  {
    my $policy = shift;
    return [ map { [ split /[\s\+]+/ ] } split /[\s,]+/, $policy ];
  }

For the LOGIN_WEEKENDS policy, the resulting value in %ACCESS_POLICY will be:

  $ACCESS_POLICY{ 'LOGIN_WEEKENDS' } =>

                [
                  [ '1', '3' ],
                  [ '4' ],
                  [ '1', '5', '9' ]
                ];

To match this policy, a user must be in every groups listed in any of the inner lists:

  sub check_policy
  {
    my $policy = shift;

    my $out_arr = $ACCESS_POLICY{ $policy };
    die "Invalid policy name; $policy\n" unless $out_arr;

    return check_policy_tree( $out_arr );
  }

  sub check_policy_tree
  {
    my $out_arr = shift;

    for my $in_arr ( @$out_arr )
    {

      my $c = 0; # matching groups count
      for my $group ( @$in_arr )
      {
        $c++ if $USER_GROUPS{ $group };
      }

      # matching groups is equal to all groups count in this list
      # policy match!
      return 1 if $c == @$in_arr;
    }

    # if this code is reached then policy didn't match
    return 0;
  }

The example cases will become:

  sub user_login
  {
      # login checks here
      ...

      # login ok, check weekday policy
      my $wday = (localtime())[6];

      my $policy;
      if( $wday == 0 or $wday == 6 )
      {
        $policy = 'LOGIN_WEEKEND';
      }
      else
      {
        $policy = 'LOGIN_WEEKDAY';
      }

      die "Login denied" unless check_policy( $policy );
  }

  sub edit_data
  {
    # require user to be in group 1 (admin) to edit data...
    die "Access denied" unless check_policy( 'EDIT' );

    # user allowed, 'EDIT' policy match
    ...
  }

Now you have all the parts of a working access control scheme:

  • Policy configuration syntax
  • Policy parser
  • User group storage and mapping
  • User group loading
  • Policy match function

This scheme may seem complete, but it lacks one thing.

Data Fences

In a multiuser system there is always some kind of ownership on the data stored in the database. This means that each user must see only those parts of the data that his user groups own.

This ownership problem solution is separate from the policy scheme. Each row must have one or more fields filled with groups that have access to the data. Any SQL statements for reading data must also check for this field:

  my $rg  = join ',', grep { $USER_GROUPS{ $_ } } keys %USER_GROUPS;
  my $ug  = join ',', grep { $USER_GROUPS{ $_ } } keys %USER_GROUPS;
  my $sql = "SELECT * FROM TABLE_NAME
             WHERE READ_GROUP IN ( $rg ) AND UPDATE_GROUP IN ( $ug )";

The result set will contain only rows with read and update groups inside the current user's group set. Sometimes you may need all of rows with the same read group for display, even though some of those rows have update restrictions the user does not meet. This case will use only the READ_GROUP field for select and will cut off users when they try to update the record without permission:

  my $rg  = join ',', grep { $USER_GROUPS{ $_ } } keys %USER_GROUPS;
  my $sql = "SELECT * FROM TABLE_NAME WHERE READ_GROUP IN ( $rg )";

  $sth = $dbh->prepare( $sql );
  $sth->execute();
  $hr = $sth->fetchrow_hashref();

  die "Edit access denied" unless check_access( $hr->{ 'UPDATE_GROUP' } );

When access checks are explicitly after SELECT statements it is possible to store full policy strings inside CHAR fields:

  $hr = $sth->fetchrow_hashref();

  die "Edit access denied" unless check_policy_record( $hr, 'UPDATE_GROUP' );

  sub check_policy_record
  {
      my $hr     = shift; # hash with record data
      my $field  = shift; # field containing policy string

      my $policy = $hr->{ $field };
      my $tree   = access_policy_parse( $policy );

      return check_policy_tree( $tree );
  }

In the Middle of Nowhere

This access control scheme is simple and usable as described. It does not cover all possible cases of access control, but every application has its own unique needs. In certain cases, you can push some of these access controls to lower levels -- your database, for example -- depending on your needs. Good luck with building your own great wall!

Preventing Cross-site Scripting Attacks

Introduction

The cross-site scripting attack is one of the most common, yet overlooked, security problems facing web developers today. A web site is vulnerable if it displays user-submitted content without checking for malicious script tags.

Luckily, Perl and mod_perl provide us with easy solutions to this problem. We highlight these built-in solutions and also a introduce a new mod_perl module: Apache::TaintRequest. This module helps you secure mod_perl applications by applying perl's powerful "tainting" rules to HTML output.

What is "Cross-Site Scripting"?

Lately the news has been full of reports on web site security lapses. Some recent headlines include the following grim items: Security problems open Microsoft's Wallet, Schwab financial site vulnerable to attack, or New hack poses threat to popular Web services. In all these cases the root problem was caused by a Cross-Site Scripting attack. Instead of targeting holes in your server's operating system or web server software, the attack works directly against the users of your site. It does this by tricking a user into submitting web scripting code (JavaScript, Jscript, etc.) to a dynamic form on the targeted web site. If the web site does not check for this scripting code it may pass it verbatim back to the user's browser where it can cause all kinds of damage.

Consider the following URL:

http://www.example.com/search.pl?text=<script>alert(document.cookie)</script>

If an attacker can get us to select a link like this,and the Web application does not validate input, then our browser will pop up an alert showing our current set of cookies.This particular example is harmless; an attacker can do much more damage, including stealing passwords, resetting your home page, or redirecting you to another Web site.

Even worse, you might not even need to select the link for this to happen. If the attacker can make your application display a chunk of html, you're in trouble. Both the IMG and IFRAME tags allow for a new URL to load when html is displayed. For example the following HTML chunk is sent by the BadTrans Worm. This worm uses the load-on-view feature provided by the IFRAME tag to infect systems running Outlook and Outlook Express.


  --====_ABC1234567890DEF_====
  Content-Type: multipart/alternative;
           boundary="====_ABC0987654321DEF_===="

  --====_ABC0987654321DEF_====
  Content-Type: text/html;
           charset="iso-8859-1"
  Content-Transfer-Encoding: quoted-printable


  <HTML><HEAD></HEAD><BODY bgColor=3D#ffffff>
  <iframe src=3Dcid:EA4DMGBP9p height=3D0 width=3D0>
  </iframe></BODY></HTML>
  --====_ABC0987654321DEF_====--

  --====_ABC1234567890DEF_====
  Content-Type: audio/x-wav;
           name="filename.ext.ext"
  Content-Transfer-Encoding: base64
  Content-ID: <EA4DMGBP9p>

This particular example results in executable code running on the target computer. The attacker could just as easily insert HTML using the URL format described earlier, like this:

<iframe src="http://www.example.com/search.pl?text=<script>alert(document.cookie)</script>">

The "cross-site" part of "cross-site scripting" comes into play when dealing with the web browser's internal restrictions on cookies. The JavaScript interpreter built into modern web browsers only allows the originating site to access it's own private cookies. By taking advantage of poorly coded scripts the attacker can bypass this restriction.

Any poorly coded script, written in Perl or otherwise, is a potential target. The key to solving cross-site scripting attacks is to never, ever trust data that comes from the web browser. Any input data should be considered guilty unless proven innocent.

Solutions

There are a number of ways of solving this problem for Perl and mod_perl systems. All are quite simple, and should be used everywhere there might be the potential for user submitted data to appear on the resulting web page.

Consider the following script search.pl. It is a simple CGI script that takes a given parameter named 'text' and prints it on the screen.


        #!/usr/bin/perl
        use CGI;

        my $cgi = CGI->new();
        my $text = $cgi->param('text');

        print $cgi->header();
        print "You entered $text";

This script is vulnerable to cross-site scripting attacks because it blindly prints out submitted form data. To rid ourselves of this vulnerability we can either perform input validation, or insure that user-submitted data is always HTML escaped before displaying it.

We can add input validation to our script by inserting the following line of code before any output. This code eliminates everything but letters, numbers, and spaces from the submitted input.


        $text =~ s/[^A-Za-z0-9 ]*/ /g;

This type of input validation can be quite a chore. Another solution involves escaping any HTML in the submitted data. We can do this by using the HTML::Entities module bundled in the libwww-perl CPAN distribution. The HTML::Entities module provides the function HTML::Entities::encode(). It encodes HTML characters as HTML entity references. For example, the character < is converted to &lt;, " is converted to &quot;, and so on. Here is a version of search.pl that uses this new feature.


        #!/usr/bin/perl
        use CGI;
        use HTML::Entities;

        my $cgi = CGI->new();
        my $text = $cgi->param('text');

        print $cgi->header();
        print "You entered ", HTML::Entities::encode($text);

Solutions for mod_perl

All of the previous solutions apply to the mod_perl programmer too. An Apache::Registry script or mod_perl handler can use the same techniques to eliminate cross-site scripting holes. For higher performance you may want to consider switching calls from HTML::Entities::encode() to mod_perl's much faster Apache::Util::escape_html() function. Here's an example of an Apache::Registry script equivilant to the preceding search.pl script.


        use Apache::Util;
        use Apache::Request;

        my $apr = Apache::Request->new(Apache->request);

        my $text = $apr->param('text');

        $r->content_type("text/html");
        $r->send_http_header;
        $r->print("You entered ", Apache::Util::html_encode($text));

After a while you may find that typing Apache::Util::html_encode() over and over becomes quite tedious, especially if you use input validation in some places, but not others. To simplify this situation consider using the Apache::TaintRequest module. This module is available from CPAN or from the mod_perl Developer's Cookbook web site.

Apache::TaintRequest automates the tedious process of HTML escaping data. It overrides the print mechanism in the mod_perl Apache module. The new print method tests each chunk of text for taintedness. If it is tainted the module assumes the worst and html-escapes it before printing.

Perl contains a set of built-in security checks know as taint mode. These checks protect you by insuring that tainted data that comes from somewhere outside your program is not used directly or indirectly to alter files, processes, or directories. Apache::TaintRequest extends this list of dangerous operations to include printing HTML to a web client. To untaint your data just process it with a regular expression. Tainting is the Perl web developer's most powerful defense against security problems. Consult the perlsec man page and use it for every web application you write.

To activate Apache::TaintRequest simply add the following directive to your httpd.conf.

       PerlTaintCheck on    

This activates taint mode for the entire mod_perl server.

The next thing we need to do modify our script or handler to use Apache::TaintRequest instead of Apache::Request. The preceding script might look like this:


        use Apache::TaintRequest;

        my $apr = Apache::TaintRequest->new(Apache->request);

        my $text = $apr->param('text');

        $r->content_type("text/html");
        $r->send_http_header;

        $r->print("You entered ", $text);

        $text =~ s/[^A-Za-z0-9 ]//;
        $r->print("You entered ", $text);

This script starts by storing the tainted form data 'text' in $text. If we print this data we will find that it is automatically HTML escaped. Next we do some input validation on the data. The following print statement does not result in any HTML escaping of data.

Tainting + Apache::Request.... Apache::TaintRequest

The implementation code for Apache::TaintRequest is quite simple. It's a subclass of the Apache::Request module, which provides the form field and output handling. We override the print method, because that is where we HTML escape the data. We also override the new method -- this is where we use Apache's TIEHANDLE interface to insure that output to STDOUT is processed by our print() routine.

Once we have output data we need to determine if it is tainted. This is where the Taint module (also available from CPAN) becomes useful. We use it in the print method to determine if a printable string is tainted and needs to be HTML escaped. If it is tainted we use the mod_perl function Apache::Util::html_escape() to escape the html.


package Apache::TaintRequest;

use strict;
use warnings;

use Apache;
use Apache::Util qw(escape_html);
use Taint qw(tainted);

$Apache::TaintRequest::VERSION = '0.10';
@Apache::TaintRequest::ISA = qw(Apache);

sub new {
  my ($class, $r) = @_;

  $r ||= Apache->request;

  tie *STDOUT, $class, $r;

  return tied *STDOUT;
}


sub print {
  my ($self, @data) = @_;

  foreach my $value (@data) {
    # Dereference scalar references.
    $value = $$value if ref $value eq 'SCALAR';

    # Escape any HTML content if the data is tainted.
    $value = escape_html($value) if tainted($value);
  }

  $self->SUPER::print(@data);
}

To finish off this module we just need the TIEHANDLE interface we specified in our new() method. The following code implements a TIEHANDLE and PRINT method.


sub TIEHANDLE {
  my ($class, $r) = @_;

  return bless { r => $r }, $class;
}

sub PRINT {
  shift->print(@_);
}

The end result is that Tainted data is escaped, and untainted data is passed unaltererd to the web client.

Conclusions

Cross-site scripting is a serious problem. The solutions, input validation and HTML escaping are simple but must be applied every single time. An application with a single overlooked form field is just as insecure as one that does no checking whatsoever.

To insure that we always check our data Apache::TaintRequest was developed. It builds upon Perl's powerful data tainting feature by automatically HTML escaping data that is not input validated when it is printed.

Resources

Symmetric Cryptography in Perl

Having purchased the $250 cookie recipe from Neiman-Marcus, Alice wants to send it to Bob, but keep it away from Eve, who snoops on everyone's network traffic from the cubicle down the hall.

How can Perl help her?

Ciphers

Cryptographic algorithms, or ciphers, offer Alice one way to protect her data. By encrypting the recipe before sending it over the network, she can render it useless to anyone but Bob, who alone possesses the secret information required to decrypt it.

Ciphers were once closely guarded secrets, but relying on the secrecy of an algorithm is a risky proposition. If your security were somehow compromised, adversaries could read all of your past messages, and (if you ever discovered the breach) you must find an entirely different algorithm to use in future.

Modern ciphers, usually publicly known and widely studied, rely on the secrecy of a key instead. They encrypt the same plaintext differently for each key; to decrypt a ciphertext, you must know the key used to produce it. New keys are easy to generate, so the compromise of a single key is a smaller problem. Although messages encrypted with the stolen key are rendered readable, the algorithm itself can be reused.

Algorithms that use the same key for both encryption and decryption are called symmetric ciphers. To use such an algorithm, Alice and Bob must agree on a key to use before they can exchange messages. Since decryption depends only on the knowledge of this key, they must ensure that they share the key by a secure channel that Eve cannot access (Alice could whisper the key into Bob's ear over dinner, for example).

Most well-known symmetric ciphers are block ciphers. The plaintext to be encrypted must be split into fixed-length blocks (usually 64 or 128 bits long) and fed to the cipher one at a time. The resulting blocks (of the same length) are concatenated to form the ciphertext.

The ciphers in widespread use today vary in strength, key length, block size and their approach to encrypting data. Some of the popular ciphers (IDEA, Twofish, Rijndael) are implemented by eponymous modules in the Crypt:: namespace on the CPAN (Crypt::IDEA and so on).

To decide which cipher to use for a particular application, one must consider the strength and speed required, and the computational resources available. The decision cannot be made without research, but IDEA is often considered the best practical choice for a general purpose cipher.

Keys

Symmetric ciphers usually use randomly generated keys (typically between 64 and 256 bits in length), and computers are notoriously bad at truly random number generation. Fortunately, many modern systems have some support for the generation of cryptographically secure random numbers, ranging from expensive hardware to device drivers that gather entropy from the timing delay between interrupts.

Crypt::Random, available from the CPAN, is a convenient interface to the /dev/random device on many Unix systems. Once installed, it is simple to use:

    use Crypt::Random qw( makerandom );
    $key = makerandom( Size => 128, Strength => 1);

For cryptographic key generation, the Strength parameter should always be 1. The Size in bits of the desired key depends on the cipher you want to use the key with. Typical symmetric key sizes range from 128 to 256 bits.

How Can I Use These in Perl?

The Crypt modules all support the same simple interface: new($key) creates a cipher object, and the encrypt() and decrypt() methods operate on single blocks of data. The responsibility for key generation and sharing, providing suitable blocks, and the transmission of the ciphertext, lies with the user. In the examples below, we will use the Crypt::Twofish module. Twofish is a free, unpatented 128-bit block cipher with a 128, 192, or 256-bit key.

    use Crypt::Twofish;
    # Create a new Crypt::Twofish object with the 128-bit key generated
    # above.
    $cipher = Crypt::Twofish->new($key);
    # Encrypt a block full of 0s...
    $ciphertext = $cipher->encrypt(pack "H*", "00"x16);
    # And then decrypt the result.
    print unpack "H*", $cipher->decrypt($ciphertext);

The implementation raises an important issue: What does one do with the second chunk of an 18-byte file? Twofish cannot operate on anything less than a 16-byte block, so padding must be added to the end of the last block to make it 16 bytes long. NULs (\000) are usually used to pad the block, but the value used doesn't matter, because the padding is removed after the ciphertext is decrypted.

Alice can now use this code to encrypt her recipe:

    # Assume that $key contains a previously-generated key, and that
    # PLAINTEXT and CIPHERTEXT are filehandles opened for reading and
    # writing respectively.
    $cipher = Crypt::Twofish->new($key);
    while (read(PLAINTEXT, $block, 16)) {
        $len   = length $block;
        $size += $len;
        # Add padding if necessary
        $block .= "\000"x(16-$len) if $len < 16;
        $ciphertext .= $cipher->encrypt($block);
    }
    # Record the size of the plaintext, so that the recipient knows how
    # much padding to remove.
    print CIPHERTEXT "$size\n";
    print CIPHERTEXT $ciphertext;

The output of this program can be safely sent across the network to Bob, perhaps as an e-mail attachment. Bob, having received the secret key by some other means, can then use the following code to decrypt the message:

    $cipher = Crypt::Twofish->new($key);
    $size = <CIPHERTEXT>;
    while (read(CIPHERTEXT, $ct, 16)) {
        $pt .= $cipher->decrypt($ct);
    }
    # Write only $size bytes of the output; ignore padding.
    print PLAINTEXT substr($pt, 0, $size);

This is really all we need for symmetric cryptography in Perl. Using a different cipher is simply a matter of installing another module and changing the ``Twofish'' above. From a cryptographic perspective, however, there are still some problems we must consider.

Cipher Modes

The code above uses the Twofish cipher in Electronic Code Book (ECB) mode, meaning that nth ciphertext block depends only on the key and the nth plaintext block. For a particular key, one could build an exhaustive table (or Code Book) of plaintext blocks and their ciphertext counterparts. Then, instead of actually encrypting the plaintext, one could simply look at the relevant entries in the table to find the ciphertext.

Because of the highly repetitive nature of most texts, plaintext blocks and their corresponding blocks in the ciphertext tend to be repeated quite often. Further, it is often possible to make informed guesses about parts of the plaintext (Eve knows, for example, that Alice's messages all have a long Tolkien quote in the signature).

Given enough patience and ciphertext, Eve can start to build a code book that maps ciphertext blocks to plaintext ones. Then, without knowing either the algorithm or the key, she could simply look up the relevant blocks in the intercepted ciphertext and write down large parts of the original plaintext!

Several new cipher modes have been invented to address this problem. One of the most generally useful ones is Cipher Block Chaining. CBC starts by generating a random block (called an Initializ,ation Vector, or IV) and encrypting it. The first plaintext block is XORed with the encrypted IV before being encrypted. Thereafter, each block is XORed with the ciphertext of the block preceding it, and then encrypted.

Here, each ciphertext block depends on the preceding ciphertext block, and the plaintext blocks so far. Thus, the blocks must be decrypted in order, and none of the patterns displayed by ECB are present. The IV itself does not need to be kept secret, and is usually transmitted with the ciphertext like $size above.

Decryption of the ciphertext proceeds in the opposite order. The first ciphertext block is decrypted and XORed with the IV to form the first plaintext block, and each ciphertext block thereafter is XORed with the previous one to form a plaintext block. Other modes are similar in intent, but vary in detail, including the way errors in transmission affect the ciphertext, and the amount of feedback or dependency on previous blocks.

Alice and Bob could alter their code to perform cipher block chaining, but the handy Crypt::CBC module can save them the trouble. The module, available from the CPAN, is used in conjunction with a symmetric cipher module (like Crypt::Twofish). It handles padding, IV generation and all other details. The user only needs to specify a key, and the data to be encrypted or decrypted.

Thus, Alice could just do:

    use Crypt::CBC;
    $cipher = new Crypt::CBC ($key, 'Twofish');
    undef $/; $plaintext = <PLAINTEXT>;
    print CIPHERTEXT $cipher->encrypt($plaintext);

And Bob could do:

    use Crypt::CBC;
    $cipher = new Crypt::CBC ($key, 'Twofish');
    undef $/; $ciphertext = <CIPHERTEXT>;
    print PLAINTEXT $cipher->decrypt($ciphertext);

Much simpler!


Asymmetric Cryptography

Asymmetric (or public-key) ciphers use a pair of mathematically related keys, and the algorithms are so designed that data encrypted with one half of the key pair can only be decrypted by the other. Bob can generate a key pair and keep one half secret, while publishing the other half. Alice can then encrypt the recipe with Bob's public key, knowing that it can only be decrypted with the secret half. Although this eliminates the need to share keys over a secure channel, it has its problems, too. For one, most public key encryption schemes require much longer keys (often 2048 bits or more) and are much slower.

The Crypt namespace contains modules for public key cryptography as well. Crypt::RSA is a portable implementation of the (now free) RSA algorithm, one of the most widely studied public-key encryption schemes. There are interfaces to various versions of PGP (Crypt::PGP2, Crypt::PGP5, Crypt::GPG), as well as implementations of public-key based signature algorithms (Crypt::DSA).


Cryptanalysis

Unfortunately, our implicit assumption that the ciphertext is useless to Eve is not always true. Depending on the information and resources that are available to her, she can try various means to retrieve the recipe. The simplest strategy is to try and guess the key Alice used. This is known as a brute-force attack, and involves repeatedly generating random keys and trying to decrypt the ciphertext with each one.

The effectiveness of this approach depends on the size of the key: the longer it is, the more possible keys there are, and the more guesses will be required, on average, to find the right one. Thus, the only possible defense is to use a key long enough to make a key search computationally impractical.

How long is a safe key? DES with 56-bit keys was recently cracked in a little less than a day, but the 128-bit keyspace (range of possible keys) is 4 * 10**21 times larger still. Although computing power is becoming cheaper, it seems likely that 128-bit keys will be safe from brute-force attacks for many years to come.

Of course, there are far more sophisticated attacks that they may be vulnerable to. As we saw in the description of ECB, cryptanalysts can often exploit patterns in the plaintext (long signatures, repeated phrases) or ciphertext (repeated blocks) to great advantage, or they may look for weaknesses (or exploit known ones) in the algorithm. Often, a combination of such techniques reduces the potential keyspace enough that a brute-force attack becomes practical.

Cryptanalysis and cryptographic techniques advanced hand-in-hand; new ciphers are designed to withstand old attacks, and newer attacks are attempted all the time. This makes it very important to stay abreast of current advances in cryptographic technology if you are serious about protecting your data for long periods of time.


Further Resources

google.com
The Great God Google knows enough about cryptography-related material to keep you occupied for a considerable amount of time.

perl-crypto-subscribe@perl.org
The perl-crypto mailing list, although not very active at the moment, is intended for discussion of all aspects (both user and developer level) of cryptography with Perl.

Crypt::Twofish
The Crypt::Twofish module, which has benefited from the inputs of several people, is a good example of how to write a portable cipher implementation.

Visit the home of the Perl programming language: Perl.org

Sponsored by

Powered by Movable Type 5.02