Recently in Networking Applications Category

Build a Wireless Gateway with Perl

You have set up and configured various wireless access devices, but could not find one that included all of the features you needed. You could wait for a firmware upgrade from your manufacturer, hoping that they will include the features you want. However, your chances of finding all of your issues addressed by a new firmware package, if it ever comes out, are slim to none. Now is the time to roll up your sleeves and build your own wireless access gateway from scratch. Don't let this idea scare you; this is all possible thanks to the open source world.

This article introduces an open source project called AWLP (Alptekin's Wireless Linux Project), which turns a PC with an appropriate wireless LAN card (Prism2/2.5/3) into a full-featured, web-managed wireless access gateway. That old Pentium 120 machine in your basement might march back up the stairs shortly.

Building your own wireless access device is nothing new. Around three years ago, Jouni Malinen released HostAP, a Linux driver for wireless LAN cards with Intersil's Prism2/2.5/3 chipset. When operated in a special mode called Master, the host computer acts as a wireless access point. HostAP does its job and does it well, but it is command line only, and that's not suitable for everyone. A complete solution that potentially competes with off-the-shelf devices needs more features, including DHCP, firewall, DNS, and NTP, along with a web interface for configuration. This is the de facto standard nowadays.

To use a car analogy, this is like building a custom car; you want to be able to put in a new clutch, change the suspension, or try a new braking system and see how it performs. In the case of the gateway, you might want to implement an outgoing port-based filter in addition to the incoming one in the firewall. You may want to try a different DNS server or develop a special module that will block MAC addresses through ACLs after exceeding a certain amount of bandwidth in a short period of time.

For AWLP, the chassis is GNU/Linux Slackware, the engine is Host AP Driver, the transmission is Host AP Utilities, and so on. AWLP code, written in Perl, is the part that makes all these components work together in harmony as a wireless access gateway. After you set up AWLP, you will have a functioning, preconfigured, wireless access gateway to start with. Then you can start modifying the Perl code and configuration files to test and implement the extra capabilities.

Hardware Requirements

You need a dedicated machine for this task. Running with less than 32 MB of RAM will be painful; the system is more RAM-intensive than it is CPU-intensive so you can run it even on a 486 processor--but be aware that most 486 machines have only ISA slots. You might have problems finding ISA 10/100 Mbps Ethernet cards and ISA wireless LAN cards on the market. An old Pentium machine with at least 1GB HDD is ideal for this task.

If you plan on running your wireless device on a 24/7 basis, and if availability and portability are major concerns, you might consider dedicated hardware. Check out OpenBrick Community for compact hardware platforms with HDD support.

In addition to the computer, you also need an Ethernet card compatible with Linux, a PCI or PCMCIA wireless LAN card that has a Prism2/2.5/3 chipset, and a PCI-to-PCMCIA converter, depending on your choice of wireless LAN card and the slots on your board. Refer to the AWLP Hardware Compatibility section for in-depth information on choosing these three components.

Installing AWLP

There are three phases of installation:

  1. Custom installing Slackware with tagfiles
  2. Upgrading packages in Slackware
  3. Installing AWLP code and configuration files

The third step is the easiest one. The first step, installing Slackware with tagfiles provided by the AWLP package, will take most of your time and effort. You must have Slackware installation discs on hand. Consult the Slackware Linux Project site to obtain them. In order to keep this article readable, I again refer you to the related site, the step-by-step AWLP Installation Instructions. After completing the first and second phases of the installation instructions, you can install the AWLP code and the configuration files that will make everything work together.

I highly recommend that you take a disk image after successfully installing AWLP and before you start to modify the code, especially if you have installed it on a slow machine. Installing the AWLP code and configuration files takes no more than a couple of minutes, but in order to reach this step, you must have custom-installed Slackware with the tagfiles provided, and this might take a couple of hours on an old (<200MHz) Pentium machine. Taking a disk image and restoring from that image when needed will be a lot easier and quicker than starting afresh from Step 1.

Under the Hood

AWLP uses several configuration files:

  • DHCP, the Dynamic Host Configuration Protocol: /etc/dhcpd.conf
  • Apache Web Server: /etc/apache/httpd.conf
  • DNS, the Domain Name System: /etc/named.conf, /var/named/caching-example/localhost.zone, /var/named/caching-example/named.ca, and /var/named/caching-example/named.local
  • NTP, the Network Time Protocol: /etc/ntp.conf
  • Firewalling: /etc/rc.d/rc.firewall

In addition to these standard configuration files, there are some custom configuration files such as /etc/awlp/oui_filtered.txt. This file contains the filtered version of oui.txt, representing the IEEE Organizationally Unique Identifiers. It makes it possible to find the company manufacturing a specific Ethernet card, wired or wireless, using the first 24 bits of MAC address. In order to have an in-depth knowledge of all the configuration files and their layouts, I encourage you to examine installer.sh from the AWLP package.
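As a rough illustration of the OUI lookup described above, here is a minimal sketch in Perl. The layout of /etc/awlp/oui_filtered.txt is not shown in this article, so the sketch uses a small in-memory hash instead; the hash contents, the vendor names, and the helper names oui_prefix and vendor_of are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical vendor table keyed by OUI prefix. It stands in for
# whatever parsing /etc/awlp/oui_filtered.txt actually requires.
my %vendor_for = (
    '00:11:22' => 'Example Vendor A',
    'AA:BB:CC' => 'Example Vendor B',
);

# Return the first 24 bits (three octets) of a MAC address in
# upper-case, colon-separated form, or nothing if the input does
# not contain exactly 12 hex digits.
sub oui_prefix {
    my ($mac) = @_;
    (my $hex = uc $mac) =~ s/[^0-9A-F]//g;    # drop anything that is not a hex digit
    return unless length($hex) == 12;         # a MAC address is 48 bits
    return join ':', substr($hex, 0, 6) =~ /(..)/g;
}

# Map a MAC address to a vendor name, falling back to 'unknown'.
sub vendor_of {
    my ($mac) = @_;
    my $oui = oui_prefix($mac);
    return 'unknown' unless defined $oui;
    return $vendor_for{$oui} // 'unknown';
}

print vendor_of('00-11-22-33-44-55'), "\n";
print vendor_of('aa:bb:cc:dd:ee:ff'), "\n";
```

Normalizing the address before the lookup means the same table works whether the MAC arrives with colons, dashes, or no separators at all.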

AWLP code resides in /var/www/cgi-bin/awlp. The core of the AWLP code is index.pl. It does all of the error checking, manipulation, and modification. The web server, Apache 1.3.33, runs as the user and group apache. In order to manipulate configuration files through the web browser, the related configuration files have suid and guid permissions set. This strategy is definitely more secure than running the web server as root.

The part that interacts with HostAP command line tools and utilities, along with standard Linux tools and utilities, is engines1.pl. It serves as an include file and contains various subroutines to do the work. engines2.pl also contains various subroutines, but these usually do sorting, searching, and conversion. radar.pl provides status checking and monitoring functionality. It's not crucial to the operation of the wireless access gateway, but it definitely does add value because watching and monitoring how your device is performing is the key factor to the success of your implementation.

extras.pl provides ASCII to HEX conversion, DHCP lease table display, and several other non-core functions. As the name implies, error_messages.pl has all the error messages and their descriptions. global_configuration.pl has pretty much all the configuration variables ranging from critical to non-critical. You must understand the inner workings of the system in order to change the configuration variables up to $MAIN_TITLE. $MAIN_TITLE and the following variables are also necessary, but for mostly cosmetic purposes, so you can customize them without worrying about accidentally disabling needed features.

Sample Modification

Now you know what AWLP is and what it can do. What about all these modifications and customizations? It is time to walk through step-by-step instructions for adding a feature to AWLP. As of AWLP 1.0, you can configure the DHCP server only by manually changing /etc/dhcpd.conf. The feature I want to demonstrate will simply show the contents of the file. Once you have familiarity with the code, you can improve it to modify the contents of dhcpd.conf.

The Apache web server runs as the user and group apache. The result of ls -l on /etc/dhcpd.conf shows:

# ls -l /etc/dhcpd.conf
-rwxrwx---  1 root apache 618 Mar 5 12:41 /etc/dhcpd.conf*

Because the file belongs to the apache group and that group has read and write permission, the script will be able to read and modify the /etc/dhcpd.conf file. Open the index.pl file. At the top, there are two configuration variables: @MainPageLinksAction and @MainPageLinksName. These two control the links on the left side. To add an additional link for DHCP, add DHCP to both arrays.
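A minimal sketch of that change follows. The existing entries shown here are hypothetical, since the real contents of the two arrays in index.pl are not reproduced in this article; only the two push lines matter.

```perl
use strict;
use warnings;

# Hypothetical existing entries; the real arrays in index.pl differ.
my @MainPageLinksAction = ('Status', 'Administration');
my @MainPageLinksName   = ('Status', 'Administration');

# Add the new link to both arrays so they stay the same length.
push @MainPageLinksAction, 'DHCP';   # presumably the value matched against $FORM_Action1
push @MainPageLinksName,   'DHCP';   # presumably the label shown in the left-hand menu

print "$MainPageLinksName[$_] -> $MainPageLinksAction[$_]\n"
    for 0 .. $#MainPageLinksName;
```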

Next, find the line that says:

elsif ($FORM_Action1 eq "Administration") {

Just before this line, add the following lines of code, then save and close the file.

elsif ($FORM_Action1 eq "DHCP") {

    my $DHCPConfigFileContent;

    # Declare the lease-range variables before their first use. They keep
    # the "N/A" placeholder unless the regular expression below matches.
    my ($DHCPLeaseRangeStart, $DHCPLeaseRangeEnd) = ("N/A", "N/A");

    if (open(my $fh, '<', "/etc/dhcpd.conf")) {
        local $/;    # slurp mode: read the whole file in one go
        $DHCPConfigFileContent = <$fh>;
        close($fh);
    }

    unless ($DHCPConfigFileContent) {
        $DHCPConfigFileContent = "/etc/dhcpd.conf could not be read or it is empty!";
    }

    # Pick up the first "range <start> <end>" statement, case-insensitively
    if ($DHCPConfigFileContent =~ m/Range\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)/i) {
        $DHCPLeaseRangeStart = $1;
        $DHCPLeaseRangeEnd   = $2;
    }

    $Right_Plane_Output .= <<HTMLCODE;
    <TABLE ALIGN=CENTER CELLSPACING=3 CELLPADDING=3 BGCOLOR="#000000">
    <TR BGCOLOR="#FFFFFF">
            <TD ALIGN=LEFT>
            <font face="Helvetica, Arial, Sans-serif, Verdana" size="2">
            <TABLE CELLSPACING=0 CELLPADDING=0>
            <TR>
                    <TD>
                    <font face="Helvetica, Arial, Sans-serif, Verdana" size="2">
                    <PRE><CODE>
                    ${DHCPConfigFileContent}
                    </CODE></PRE>

                    <BR><BR><BR>
                    <font color="#FF0000">Lease Range:</font> 
                    <B>${DHCPLeaseRangeStart} to ${DHCPLeaseRangeEnd}</B>
                    <BR><BR><BR>
                    </TD>
            </TR>
            </TABLE>
            </TD>
    </TR>
    </TABLE>
HTMLCODE

}

Once you do this modification, there will be a new menu section on the left side with the name DHCP. Clicking the DHCP link on the left will show you the contents of /etc/dhcpd.conf and Lease Range values, which come from the contents of the file through a simple regular expression construct.
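To try the regular expression outside the web interface, a small standalone script such as the following may help. It is only a sketch: the sample configuration text and its addresses are made up for the example, and lease_range is a hypothetical helper name, not part of AWLP.

```perl
use strict;
use warnings;

# Made-up sample in ISC dhcpd.conf style; the addresses are illustrative.
my $sample = <<'CONF';
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;
    option routers 192.168.1.1;
}
CONF

# Return (start, end) of the first range statement, or ("N/A", "N/A").
sub lease_range {
    my ($text) = @_;
    if ($text =~ m/Range\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)/i) {
        return ($1, $2);
    }
    return ('N/A', 'N/A');
}

my ($start, $end) = lease_range($sample);
print "Lease Range: $start to $end\n";
```

Note that the pattern only captures the first range statement it finds, so a configuration with several subnets would need a loop with the /g modifier.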

Conclusion

You will have a functioning wireless access gateway once you install AWLP. The example above shows how easy it is to add features to this software-based wireless access gateway, provided that you are familiar with Perl. However, to accomplish more useful modifications tailored to your needs, you should examine index.pl, the other core and include scripts, and the configuration files.


Cross-Language Remoting with mod_perlservice

Mod_perlservice? What is That?

Mod_perlservice is a cool, new way to do remoting -- sharing data between server and client processes -- with Perl and Apache. Let's start by breaking that crazy name apart: mod + perl + service.

Mod means that it's a module for the popular and ubiquitous Apache HTTP Server. Perl represents the popular and ubiquitous programming language. Service is the unique part. It's the new ingredient that unifies Apache, Perl, and XML into an easy-to-use web services system.

With mod_perlservice, you can write Perl subs and packages on your server and call them over the internet from client code. Clients can pass scalars, arrays, and hashes to the server-side subroutines and obtain the return value (scalar, array, or hash) back from the remote code. Some folks refer to this functionality as "remoting" or "RPC," so if you like you can say mod_perlservice is remoting with Perl and Apache. You can write client programs in a variety of languages; libraries for C, Perl, and Flash Action Script are all ready to go.

Now that you know what mod_perlservice is, let's look at why it is. I believe that mod_perlservice has a very clean, easy-to-use interface when compared with other RPC systems. Also, because it builds on the Apache platform it benefits from Apache's ubiquity, security, and status as a standard. Mod_perlservice sports an embedded Perl interpreter to offer high performance for demanding applications.


How Can I Use mod_perlservice?

Mod_perlservice helps create networked applications that require client-server communication, information processing, and sharing. Mod_perlservice is for applications, not for creating dynamic content for your HTML pages. However, you surely can use it for Flash remoting with Perl. Here are some usage examples:

  • A desktop application (written using your favorite C++ GUI library) that records the current local air temperature and sends it to an online database every 10 minutes. Any client can query the server to obtain the current and historical local air temperature of any other participating client.
  • A Flash-based stock portfolio management system. You can create model stock portfolios and retrieve real-time stock quote information and news.
  • A command-line utility in Perl that accepts English sentences on standard input and outputs the sentences in French. Translation occurs in server-side Perl code. If the sentence is idiomatic and the translation is incorrect, the user has the option of sending the server a correct translation to store in an online idiom database.

How Do I Start?

Let's move on to the fun stuff and set up a working installation. Before we begin, make sure you have everything you need! You need Apache HTTPD, Perl, Expat, mod_perlservice, and a mod_perlservice client library (Perl Client | C Client | Flash Client). You must download a client library separately, as the distribution does not include any clients! In your build directory:

myhost$ tar -xvzf mod_perlservice.tar.gz 
myhost$ cd mod_perlservice
myhost$ ./configure
myhost$ make
myhost$ make install

If everything goes to plan, you'll end up with a fresh mod_perlservice.so in your Apache modules directory (usually /etc/apache/modules). Now it's time to configure Apache to use mod_perlservice. cd into your Apache configuration directory (usually /etc/apache/conf). Add the following lines to the file apache.conf (or httpd.conf, if you have only a single configuration file):

LoadModule perlservice_module modules/mod_perlservice.so
AddModule mod_perlservice.c

Add the following lines to commonapache.conf if you have it, or to httpd.conf if you don't:

<IfModule mod_perlservice.c>
<Location /perlservice>
   SetHandler mod_perlservice
   Allow From All
   PerlApp myappname /my/app/dir
   #Examples
   PerlApp stockmarket /home/services/stockmarket
   PerlApp temperature /home/services/temperature
</Location>
</IfModule>

Pay close attention to the PerlApp directive. For every mod_perlservice application you want to run, you need a PerlApp directive. If I were creating a stock market application, I might create a directory: /home/services/stockmarket and add the following PerlApp directive:

PerlApp stockmarket /home/services/stockmarket

This tells mod_perlservice to host an application called stockmarket with the Perl code files located in the /home/services/stockmarket directory. You may run as many service applications as you wish and you may organize them however you wish.

With the configuration files updated, the next step is to restart Apache:

myhost$ /etc/init.d/apache restart
or
myhost$ apachectl restart

Now if everything went as planned, mod_perlservice should be installed. Congratulations!

An Example

Let's create that stock portfolio example mentioned earlier. It won't support real-time quotes, but will instead create a static database of common stock names and historical prices. The application will support stock information for General Electric (GE), Red Hat (RHAT), Coca-Cola (KO), and Caterpillar (CAT).

The application will be stockmarket and will keep all of the Perl files in the stock market application directory (/home/services/stockmarket). The first file will be quotes.pm, reading as follows:

our $lookups = {
    "General Electric" => "GE",
    "Red Hat"          => "RHAT",
    "Coca Cola"        => "KO",
    "Caterpillar Inc"  => "CAT"
};
our $stocksymbols = {
    "GE" => {
        "Price"            => 33.91,
        "EarningsPerShare" => 1.544
    },
    "RHAT" => {
        "Price"            => 14.96,
        "EarningsPerShare" => 0.129
    },
    "KO"   => {
        "Price"            => 42.84,
        "EarningsPerShare" => 1.984
    },
    "CAT" => {
        "Price"            => 75.74,
        "EarningsPerShare" => 4.306
    }
};

package quotes;

sub lookupSymbol {
    my $companyname = shift;
    return $lookups->{$companyname};
}

sub getLookupTable {
    return $lookups;
}

sub getStockPrice {
    my $stocksymbol = shift;
    return $stocksymbols->{$stocksymbol}->{"Price"};
}
sub getAllStockInfo {
    my $stocksymbol = shift;
    # $stocksymbols is a hash reference, so dereference it with ->
    return $stocksymbols->{$stocksymbol};
}
1;

That's the example of the server-side program. Basically, two static "databases" ($lookups and $stocksymbols) provide information about a limited universe of stocks. The above methods query the static databases; the behavior should be fairly self-explanatory.

You may have as many .pm files in your application as you wish and you may also define as many packages within a .pm file as you wish. An extension to this application might be a file called news.pm that enables you to fetch current and historical news about your favorite stocks.

Now let's talk some security. As it stands, this code won't work; mod_perlservice will restrict access to any file and method you don't explicitly export for public use. Use the .serviceaccess file to export things. Create this file in each application directory you declare with mod_perlservice or you'll have no access. An example file might read:

<ServiceAccess>
  <AllowFile name="quotes.pm">
    Allow quotes::*
  </AllowFile>
</ServiceAccess>

In the stock market example, this file should be /home/services/stockmarket/.serviceaccess. Be sure that the apache user does not own this file; that could be bad for security. This file allows access to the file quotes.pm and allows public access to all (*) the methods in package quotes.

If I wanted to restrict access to getStockPrice only, I would write Allow quotes::getStockPrice. After that, I could add access to lookupSymbol with Allow quotes::lookupSymbol. To make quotes.pm public carte blanche, use Allow *. You won't need to restart Apache when you make changes to this file as it reloads automatically.

Client Code

Well, so far I've only shown you half the story. It's time to create some client-side code. This client example uses the Flash "PerlService" library, just one of the client-side interfaces to mod_perlservice. The Flash client works well for browser interfaces while the Perl and C clients can create command-line or GUI (e.g., GTK or Qt) applications. This article is on the web, so we'll give the Flash interface a spin and then go through an example in Perl.

The first code smidgen should go in the first root frame of your Flash application. It instantiates the global PerlService object and creates event handlers for when remote method calls return from the server. The event handlers output the requested stock information to the display box.

#include "PerlService-0.0.2.as"
// Create a global PerlService object
// Tell the PerlService object about the remote code we want to use:
// arg1) host: www.ivorycity.com
// arg2) application: stockmarket
// arg3) file: quotes.pm
// arg4) package: quotes
_global.ps = new PerlService("www.ivorycity.com","stockmarket","quotes.pm","quotes");
// First declare three callback functions to handle return values
function onStockPrice(val) {
    output.text = "StockPrice: " + symbolInput.text + " " + val + "\n" + output.text;
}

function onAllStockInfo(val) {
    output.text = "Stock Info: " + allInfoInput.text + "\n" + "\tPrice: "
                  + val.Price + "\n" + "\tEarnings Per Share: "
                  + val.EarningsPerShare + "\n" + output.text;
}

function onLookupSymbol(val) {
    output.text = "Lookup Result: " + symbolInput.text + " " + val + "\n"
                  + output.text;
}

// Register callback handlers for managing return values from remote methods
// i.e., onStockPrice receives the return value from remote method getStockPrice

ps.registerReplyHandler( "getStockPrice", onStockPrice );
ps.registerReplyHandler( "getAllStockInfo", onAllStockInfo );
ps.registerReplyHandler( "lookupSymbol", onLookupSymbol );

Now for the code that makes things happen. The following code attaches to three separate buttons. When clicked, the buttons call the remote Perl methods using the global PerlService object. Flash Action Script is an event-driven system, so click event-handlers will call the remote code and return event-handlers will do something with those values.

Figure 1. Button and code associations.

When a user presses Button 1, call the remote method getStockPrice and pass the text in the first input box as an argument.

on (release) {
	ps.getStockPrice(box1.text);
}

When the user presses Button 2, call the remote method getAllStockInfo and pass the text in the second input box as an argument.

on (release) {
	ps.getAllStockInfo(box2.text);
}

When the user presses Button 3, call the remote method lookupSymbol and pass the text in the third input box as an argument.

on (release) {
	ps.lookupSymbol(box3.text);
}

That's the entire Flash example.

Perl Client

Not everyone uses Flash, especially in the Free Software community. The great thing about mod_perlservice is that everyone can join the party. Here's a Perl Client that uses the same server-side stock market API.


use PService;

my $hostname = "www.ivorycity.com";
my $appname  = "stockmarket";
my $filename = "quotes.pm";
my $package  = "quotes";              

#Create the client object with following arguments:      
#1) The host you want to use
#2) The application on the host
#3) The perl module file name
#4) The package you want to use

my $ps = PSClient->new( $hostname, $appname, $filename, $package );

# Just call those remote methods and get the return value
my $price  = $ps->getStockPrice("GE");
my $info   = $ps->getAllStockInfo("RHAT");
my $lookup = $ps->lookupSymbol("Coca Cola");                   

#Share your exciting new information with standard output
print "GE Price: " . $price . "\n";
print "Red Hat Price: " . $info->{Price} . "\n";
print "Red Hat EPS: " . $info->{EarningsPerShare} . "\n";
print "Coca-Cola's ticker symbol is " . $lookup . "\n";

Using the PSClient object to call remote methods might feel a little awkward if you expect to call them via quotes::getStockPrice(), but think of the $ps instance as a proxy class to your remote methods, if you like.

If things don't work, use print $ps->get_errmsg(); to print an error message. get_errmsg() is a local reserved function, so it doesn't call the server. It's one of a few reserved functions detailed in the Perl client reference.

As you can see, it requires much less work to create an example with the Perl client. You simply instantiate the PSClient object, call the remote methods, and do something with the return values. That's it. There is no protocol decoding, dealing with HTTP, CGI arguments, or any of the old annoyances. Your remote code may as well be local code.

Thanks for Taking the Tour

That's mod_perlservice. I'm sure many of you who are developing client-server applications can see the advantages of this system. Personally, I've always found the existing technologies to be inflexible and/or too cumbersome. The mod_perlservice system offers a clean, simple, and scalable interface that unites client-side and server-side code in the most sensible way yet.

What's next? mod_parrotservice!

Implementing Flood Control

According to Merriam-Webster Online, "flood" means:

1: a rising and overflowing of a body of water especially onto normally dry land;

2: an overwhelming quantity or volume.

Computer software faces very similar situations when an unpredictable and irregular flow of events exceeds what the system can comfortably handle. Such situations are uncomfortable for users, either slowing systems down or causing other undesired effects.

Floods can occur from accessing web pages, requesting information from various sources (ftp lists, irc services, etc.), receiving SMS notification messages, and email processing. It is obvious that it is not possible to list all flood cases.

"Flood control" is a method of controlling the processing-rate of a stream of events. It can reject or postpone events until there are available resources (CPU, time, space, etc.) for them. Essentially the flood control restricts the number of events processed in a specific period of time.

Closing the Gates

To maintain flood control, you must calculate the flood ratio, which is:

\[ \mathit{fr} = \frac{\mathit{ec}}{\mathit{tp}} \]
Figure 1. Flood ratio equation.

fr flood ratio
ec event count
tp time period for ec

To determine if a flood is occurring, compare the flood ratio to the fixed maximum (threshold) ratio. If the result is less than the threshold, there's no flood. Accept the event. If the result is higher, refuse or postpone the event.

\[ \frac{\mathit{ec}}{\mathit{tp}} \le \frac{\mathit{fc}}{\mathit{fp}} \quad \text{(no flood)} \]
Figure 2. Comparing the ratios.

ec event count
tp time period for ec
fc fixed event count (max)
fp fixed time period for fc
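The comparison can be sketched in a few lines of Perl. This is an illustration rather than the article's final code: is_flood is a hypothetical helper name, and it treats a ratio exactly equal to the threshold as acceptable, which is an assumption. Cross-multiplying the two ratios avoids a division by zero when tp is 0.

```perl
use strict;
use warnings;

# True (1) when ec events in tp time units exceed the threshold of
# fc events per fp time units, false (0) otherwise.
# ec/tp > fc/fp is rewritten as ec*fp > fc*tp (tp and fp are non-negative).
sub is_flood {
    my ($ec, $tp, $fc, $fp) = @_;
    return $ec * $fp > $fc * $tp ? 1 : 0;
}

# Threshold: 4 events per 3 minutes
print is_flood(3, 3, 4, 3) ? "flood\n" : "ok\n";   # 3 events in 3 minutes
print is_flood(5, 3, 4, 3) ? "flood\n" : "ok\n";   # 5 events in 3 minutes
```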

It is possible to keep an array of timestamps of all events. Upon receipt of a new event, calculate the time period since the oldest event to use as the current count/time ratio. This approach has two drawbacks. The first is that it uses more and more memory to hold all of the timestamps. The second is that a long quiet period can hide a later burst. Suppose that you want to allow only two events per minute. Someone can trigger a single event, wait half an hour, and finally flood you with another 58 requests. At this point the ratio will be 59 events in 30 minutes, or about 1.97/min., still below the 2/min. limit.
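The arithmetic behind that second drawback is easy to check. The following sketch simply recomputes the long-run average for the scenario described: one early event, a half-hour pause, then a burst of 58 events.

```perl
use strict;
use warnings;

my $limit   = 2;        # allowed events per minute
my $events  = 1 + 58;   # one early event plus the later burst
my $minutes = 30;       # time elapsed since the oldest stored timestamp

# Average over the whole history, which is what the naive check compares
my $average = $events / $minutes;

printf "average: %.2f events/min (limit %d)\n", $average, $limit;
print "naive check ", ($average < $limit ? "accepts" : "rejects"), " the burst\n";
```

Because the average is taken over the entire history, the quiet half hour dilutes the burst and the check lets it through.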

A better approach is to keep a sliding window either of events (fc) or time period (fp).

The time-period window requires an array of the most recent events, and the size of that array is not known in advance. (The specific time units are not important, but the following examples use minutes.)


               past                                   now
    Timeline:  1----2----3----4----5----6----7----8----9---> (min)
    Events:    e1      e2 e3         e4     e5 e6     e7

This timeline shows event timestamps. To calculate the flood ratio, count the events that fall inside the current time window of size fp, then check the result against a threshold of four events in three minutes:

Time now:      9
Time window:   from 9-3 = 6 to now(9), so window is 6-9
Oldest event:  e5 (not before 6)
Event count:   3 (in 6-9 period)
Flood ratio:   3/3

This ratio of 3/3 is below the flood threshold of 4/3, so at this moment there is no flood. Perform this check whenever a new event arrives; in this example, that event is e7. After each check, you can safely remove all events older than the time window to reduce memory consumption.
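A minimal sketch of the time-window check might look like this. The timestamps are approximate readings from the timeline above and are therefore assumptions, as is the helper name window_count.

```perl
use strict;
use warnings;

# Drop events older than the window [now - fp, now] and return how
# many remain. The array is pruned in place to limit memory use.
sub window_count {
    my ($events, $now, $fp) = @_;
    @$events = grep { $_ >= $now - $fp } @$events;
    return scalar @$events;
}

# Approximate timestamps (minutes) for e1..e7, read off the timeline
my @events = (1, 2.7, 3.2, 5.4, 6.8, 7.4, 9);

my ($fc, $fp) = (4, 3);                       # threshold: 4 events per 3 minutes
my $ec = window_count(\@events, 9, $fp);      # check at e7, time 9

print "$ec events in the last $fp minutes: ",
      ($ec <= $fc ? "no flood" : "flood"), "\n";
```

Since the window length equals fp, comparing the ratios reduces to comparing the event count in the window with fc.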

The other solution requires a fixed array of events with size fc. With the same threshold of four events in three minutes:


               past                  now
    Timeline:  <--5----6----7----8----9---> (min)
    Events:      e4        e5 e6     e7

The event array (window) is size 4. To check for a flood at e7, we use this:

Window size "fc": 4
First event time: e4 -> 5
Last  event time: e7 -> 9
Time period "tp": 9-5 = 4
Flood ratio is:   4/4

The ratio of 4/4 is also below the threshold of 4/3, so it's OK to accept event e7. When you must check a new event, add it to the end of the event array (window) and remove the oldest one. If the new event would cause a flood, remember to reverse these operations.
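The add, check, and undo sequence can be sketched as follows. The helper names window_ok and try_event and the sample timestamps are assumptions for illustration. With a full window of fc events, the test fc/tp <= fc/fp simplifies to tp >= fp.

```perl
use strict;
use warnings;

# A full window of fc events passes when it spans at least fp time units.
sub window_ok {
    my ($window, $fc, $fp) = @_;
    return 1 if @$window < $fc;                 # too few events for a real check
    my $tp = $window->[-1] - $window->[0];      # newest minus oldest timestamp
    return $tp >= $fp ? 1 : 0;
}

# Add the event, drop the oldest if the window overflows, then check.
# On failure, reverse both operations and report rejection.
sub try_event {
    my ($window, $time, $fc, $fp) = @_;
    push @$window, $time;
    my $dropped = @$window > $fc ? shift(@$window) : undef;
    return 1 if window_ok($window, $fc, $fp);
    pop @$window;
    unshift @$window, $dropped if defined $dropped;
    return 0;
}

my @window = (3.2, 5, 6.8, 7.4);                # assumed timestamps for e3..e6
print try_event(\@window, 9,   4, 3) ? "e7 accepted\n" : "e7 rejected\n";
print try_event(\@window, 9.5, 4, 3) ? "e8 accepted\n" : "e8 rejected\n";
```

In this run the event at time 9 is accepted because the window then spans 4 minutes, while the event at 9.5 would shrink the span to 2.7 minutes and is rolled back.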

If the flood check fails, you can find a point in the future when this check will be OK. This makes it possible to return some feedback information to the user indicating how much time to wait before the system will accept the next event:

\[ \frac{\mathit{ec}}{\mathit{now} - \mathit{ot}} = \frac{\mathit{fc}}{\mathit{fp}} \]
Figure 3. Time until next event equation.

ec  event count (requests received, here equal to fc)
fc  fixed event count (max)
fp  fixed time period for fc
now the future time point we search for
ot  oldest event time point in the array (event timestamp)

\[ \mathit{now} = \mathit{ot} + \frac{\mathit{ec} \cdot \mathit{fp}}{\mathit{fc}} \]
Figure 4. Simplified time until next event equation.

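\[ \mathit{wait} = \mathit{now} - \mathit{time} = \mathit{ot} + \frac{\mathit{ec} \cdot \mathit{fp}}{\mathit{fc}} - \mathit{time} \]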
Figure 5. The time-to-wait equation.

time the actual current time (time of the new event)
wait time period to wait before next allowed event

If wait is positive, then this event should be either rejected or postponed. When wait is 0 or negative, it's OK to process the event immediately.
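As a quick worked example, assume the relation wait = ot + ec * fp / fc - time, built from the symbols defined above. The sketch below plugs in the 4-events-per-3-minutes threshold with the oldest event at minute 5; wait_time is a hypothetical helper name.

```perl
use strict;
use warnings;

# Time to wait before the next event may be accepted.
# Positive: reject or postpone. Zero or negative: accept now.
sub wait_time {
    my ($ot, $ec, $fc, $fp, $time) = @_;
    return $ot + $ec * $fp / $fc - $time;
}

# Oldest event at minute 5, full window of 4 events, threshold 4 per 3 minutes
for my $time (7, 8, 9) {
    my $wait = wait_time(5, 4, 4, 3, $time);
    printf "new event at minute %d: wait = %d (%s)\n",
        $time, $wait, $wait > 0 ? "postpone" : "accept";
}
```

With these numbers the earliest acceptable time is minute 8, so an event at minute 7 must wait one minute and an event at minute 9 goes straight through.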

The Code

In the following implementation I'll use a slightly modified version of the sliding window of events. To avoid removing the last event and eventually replacing it after a failed check, I decided to check the current flood ratio with the existing events array and with the time of the new one:


               past                  now
    Timeline:  <--5----6----7----8----9---> (min)
    Events:      e3   e4   e5 e6    (e7)

    Window size fc: 4 (without e7)
    First event time: e3 -> 5
    Last event time: e7 -> 9
    Time period tp: 9-5 = 4
    Flood ratio is:   4/4

This seems a bit strange at first, but it works exactly as needed. The check is performed as if e6 is timed as e7, which is the worst case (the biggest time period for the fixed event window size). If the check passes, then after removing e3, the flood ratio will always be below the threshold!

Following this description I wrote a function to call for each request or event that needs flood control. It receives a fixed, maximum count of requests (the events window size) and a fixed time period. It returns how much time must elapse until the next allowed event, or 0 if it's OK to process the event immediately.

This function should be generic, so it needs some kind of event names. To achieve this there is a third argument -- the specific event name for each flood check.

Here is the actual code:

# this package hash holds flood arrays for each event name
# hash with flood keys, this is the internal flood check data storage
our %FLOOD;

sub flood_check
{
  my $fc = shift; # max flood events count
  my $fp = shift; # max flood time period for $fc events
  my $en = shift; # event name (key) which identifies flood check data

  $FLOOD{ $en } ||= [];   # make empty flood array for this event name
  my $ar = $FLOOD{ $en }; # get array ref for event's flood array
  my $ec = @$ar;          # events count in the flood array
  
  if( $ec >= $fc ) 
    {
    # flood array has enough events to do real flood check
    my $ot = $$ar[0];      # oldest event timestamp in the flood array
    my $tp = time() - $ot; # time period between current and oldest event
    
    # now calculate time in seconds until next allowed event
    my $wait = int( ( $ot + ( $ec * $fp / $fc ) ) - time() );
    if( $wait > 0 )
      {
      # positive number of seconds means flood in progress
      # event should be rejected or postponed
      return $wait;
      }
    # negative or 0 seconds means that event should be accepted
    # oldest event is removed from the flood array
    shift @$ar;
    }
  # flood array is not full or oldest event is already removed
  # so current event has to be added
  push  @$ar, time();
  # event is ok
  return 0;
}

I've put this on the CPAN as Algorithm::FloodControl.

To test it, I wrote a simple program that accepts text, line by line, from standard input and prints each accepted line or the amount of time before the program will accept the next line.

#!/usr/bin/perl
use strict;
use Algorithm::FloodControl;

while(<>)
  {
  # time is used to illustrate the results
  my $tm = scalar localtime;
  
  # exit on `quit' or `exit' strings
  exit if /exit|quit/i;
  
  # FLOOD CHECK: allow no more than 2 same lines in 10 seconds
  # here I use the actual data for flood event name!
  my $lw = flood_check( 2, 10, $_ );
  
  if( $lw ) # local wait time
    {
    chomp;
    print "WARNING: next event allowed in $lw seconds (LOCAL CHECK for '$_')\n";
    next;
    }
  print "$tm: LOCAL  OK: $_";
  
  # FLOOD CHECK: allow no more than 5 lines in 60 seconds
  my $gw = flood_check( 5, 60, 'GLOBAL' );
  
  if( $gw ) # global wait time
    {
    print "WARNING: next event allowed in $gw seconds (GLOBAL CHECK)\n";
    next;
    }
  print "$tm: GLOBAL OK: $_";
  }

I named this floodtest.pl. The results of the test were (">" marks my input lines):

cade@aenea:~$ ./floodtest.pl 
> hello
Wed Feb 17 08:25:35 2004: LOCAL  OK: hello
Wed Feb 17 08:25:35 2004: GLOBAL OK: hello
> hello
Wed Feb 17 08:25:38 2004: LOCAL  OK: hello
Wed Feb 17 08:25:38 2004: GLOBAL OK: hello
> hello
WARNING: next event allowed in 5 seconds (LOCAL CHECK for 'hello')
> bye
Wed Feb 17 08:25:43 2004: LOCAL  OK: bye
Wed Feb 17 08:25:43 2004: GLOBAL OK: bye
> hello
Wed Feb 17 08:25:45 2004: LOCAL  OK: hello
Wed Feb 17 08:25:45 2004: GLOBAL OK: hello
> see you
Wed Feb 17 08:25:48 2004: LOCAL  OK: see you
Wed Feb 17 08:25:48 2004: GLOBAL OK: see you
> next time
Wed Feb 17 08:25:52 2004: LOCAL  OK: next time
WARNING: next event allowed in 43 seconds (GLOBAL CHECK)
> one more try?
Wed Feb 17 08:26:09 2004: LOCAL  OK: one more try?
WARNING: next event allowed in 26 seconds (GLOBAL CHECK)
> free again
Wed Feb 17 08:26:31 2004: LOCAL  OK: free again
WARNING: next event allowed in 4 seconds (GLOBAL CHECK)
> free again
Wed Feb 17 08:26:42 2004: LOCAL  OK: free again
Wed Feb 17 08:26:42 2004: GLOBAL OK: free again

You can see that I could not enter "hello" three times during the first 10 seconds, but I still managed to enter one more "hello" a bit later (the 10-second flood window for the "hello" line had passed) and two other lines before the global flood check triggered (5 lines per minute). Once more than 60 seconds had passed since the first accepted line, floodtest.pl finally accepted my sixth line, "free again."
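The waiting times in the transcript can be reproduced by hand from the formula in flood_check(). The following is only a sketch: wait_time() is a hypothetical helper (it is not part of Algorithm::FloodControl) that repeats the same arithmetic with the current time passed in explicitly. Timestamps are written as seconds after 08:25:00. The rejected third "hello" has no timestamp in the output; since time() returns whole seconds, the reported 5 seconds implies it was entered at 08:25:40.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same arithmetic as in flood_check(), with the current time as argument.
# wait_time() is a hypothetical helper used only for this illustration.
sub wait_time
{
  my $ot  = shift; # oldest event timestamp in the flood array
  my $ec  = shift; # events count in the flood array
  my $fp  = shift; # flood time period
  my $fc  = shift; # max flood events count for $fp
  my $now = shift; # current time

  my $wait = int( ( $ot + ( $ec * $fp / $fc ) ) - $now );
  return $wait > 0 ? $wait : 0;
}

# third "hello": two events stored, oldest at :35, entered at :40
print wait_time( 35, 2, 10, 2, 40 ), "\n"; # local check, prints 5

# "next time": five events stored, oldest at :35, entered at :52
print wait_time( 35, 5, 60, 5, 52 ), "\n"; # global check, prints 43
```

Because flood_check() removes the oldest event before adding a new one, a full flood array always holds exactly $fc events, so the expression reduces to "oldest timestamp plus the period, minus now".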

The next sections show how to use flood control in several applications. These examples are not exhaustive but are very common, so they will work as templates for other cases.

My Scores Please?

Imagine an IRC bot (robot) which can report scores from the local game servers. Generally this bot receives requests from someone inside an IRC channel (a chat room, for those of you who haven't used IRC) and reports current scores back to the channel. If the bot becomes very popular, people will start requesting scores just for fun, more often than is useful, so there's a clear need for flood control.

I'd prefer to allow any user to request scores no more than twice per minute, but at the same time I want to allow 10 requests total every two minutes:

sub scores_request
{
    my $irc     = shift; # the IRC connection which I communicate over
                         # this is a Net::IRC::Connection object
    my $channel = shift; # the channel where "scores" are requested
    my $user    = shift; # the user who requested scores

    # next line means: do flood check for $user and if it is ok, then
    #                  check for global flood. this is usual Perl idiom.
    my $wait = flood_check( 2, 60, $user ) || flood_check( 10, 120, '*' );
    if( $wait ) # can be 0 or positive number so this check is simple
      {
      # oops flood detected, report this personally to the user
      $irc->notice( $user, "please wait $wait seconds" );
      }
    else
      {
      # it is ok, there is no flood, print scores back to the channel
      $irc->privmsg( $channel, get_scores() );
      }
}

This code uses the Net::IRC module, so if you want to know the details of the notice() and privmsg() methods, check the module documentation.

This is a good example of combining events, but it works correctly only if the second flood ratio (in this case 10/120) is greater than the first one (2/60). Otherwise you should extend the flood_check() function to take an array of events and check them all in one loop, so that if any of them fails, the internal storage is not updated for any of them. Perhaps Algorithm::FloodControl will have such a feature in the future.
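To see why the order matters, note that with the || idiom a request that passes the per-user check is already recorded for that user, even if the global check then rejects it. A minimal sketch of the extension mentioned above might look like the following. Everything in it is an assumption made for illustration: flood_check_all() and the %EVENTS hash are hypothetical and not part of Algorithm::FloodControl, and the optional timestamp argument exists only to make the example repeatable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# private storage for this sketch only; it is not the %FLOOD hash
# used inside Algorithm::FloodControl
my %EVENTS;

# takes an array ref of [ max count, time period, event name ] triples
# and an optional timestamp (defaults to the current time)
sub flood_check_all
{
  my $checks = shift;
  my $now    = @_ ? shift : time();

  # first pass: look only, modify no flood array
  my $max_wait = 0;
  for my $c ( @$checks )
    {
    my ( $fc, $fp, $en ) = @$c;
    my $ar = $EVENTS{ $en } ||= [];
    my $ec = @$ar;
    next if $ec < $fc; # flood array is not full yet
    my $wait = int( ( $$ar[0] + ( $ec * $fp / $fc ) ) - $now );
    $max_wait = $wait if $wait > $max_wait;
    }
  # at least one check failed, so the event is recorded nowhere
  return $max_wait if $max_wait > 0;

  # second pass: every check passed, record the event everywhere
  for my $c ( @$checks )
    {
    my ( $fc, $fp, $en ) = @$c;
    my $ar = $EVENTS{ $en };
    shift @$ar if @$ar >= $fc; # drop the oldest event of a full array
    push  @$ar, $now;
    }
  return 0;
}

# one user asks three times within a few seconds
my @checks = ( [ 2, 60, 'ann' ], [ 10, 120, '*' ] );
print flood_check_all( \@checks, 100 ), "\n"; # accepted
print flood_check_all( \@checks, 101 ), "\n"; # accepted
print flood_check_all( \@checks, 102 ), "\n"; # rejected by the user check
```

The function returns the longest of the waiting times, so the caller can tell the user when all of the checks would pass again.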

Another common case is to limit the execution of resource-consuming web scripts (CGI).

(Don't) Flood the Page!

If you want to limit CGI-script execution you will hit a problem: you must save and restore the flood-control internal data between script invocations. For this reason the Algorithm::FloodControl module exports another function called flood_storage, which can get or set the internal data.

In this example I'll use two other modules, Storable and LockFile::Simple. I use the first to save and restore the flood-control data to and from a disk file and the second to lock this file to avoid corruption if two or more instances of the script run at the same time:

#!/usr/bin/perl
use strict;
use Storable qw( store retrieve );
use LockFile::Simple qw( lock unlock );
use Algorithm::FloodControl;

# this is the file that should keep the flood data though /tmp is not
# the perfect place for it
my $flood_file = "/tmp/flood-cgi.dat";

# this is required so the web browser will know what is expected
print "Content-type: text/plain\n\n";

# I wanted to limit the script executions per remote IP so I have to
# read it from the web server environment
my $remote_ip = $ENV{'REMOTE_ADDR'};

# first of all--lock the flood file
lock( $flood_file );

# now read the flood data if flood file exists
my $FLOOD = -r $flood_file ? retrieve( $flood_file ) : undef;

# load flood data into the internal storage
flood_storage( $FLOOD ) if $FLOOD;

# do the actual flood check: max 5 times per minute for each IP
# this is the place where more checks can be done
my $wait = flood_check( 5, 60, "TEST_CGI:$remote_ip" );

# save the internal data back to the disk
store( flood_storage(), $flood_file );

# and finally unlock the file
unlock( $flood_file );

if( $wait )
  {
  # report flood situation
  print "You have to wait $wait seconds before requesting this page again.\n";
  exit;
  }

# there is no flood, continue with the real work here
print "Hello, this is main script here, local time is:\n";
print scalar localtime;
print "\n...\n";

There are various issues to consider, such as the save/restore method, time required, and locking, but in any case the scheme will be similar.
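As one possible variation on the locking part, the same load, check, and save sequence can be wrapped in an advisory lock taken with Perl's built-in flock(). This is a sketch under assumptions, not a replacement for the script above: with_lock() is a hypothetical helper, the lock is held on a separate "$file.lock" file, flock() is assumed to work on the filesystem in use, and a plain counter stands in for the flood data so that the example depends only on core modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl      qw( :flock );
use Storable   qw( store retrieve );
use File::Temp qw( tempdir );

# runs the given code while holding an exclusive lock on $lock_file
# with_lock() is a hypothetical helper written for this example
sub with_lock
{
  my $lock_file = shift;
  my $code      = shift;

  open( my $lh, '>>', $lock_file ) or die "cannot open $lock_file: $!";
  flock( $lh, LOCK_EX )            or die "cannot lock $lock_file: $!";
  my $result = $code->();
  close( $lh ); # closing the handle releases the lock
  return $result;
}

# a temporary directory keeps the example away from real data
my $file = tempdir( CLEANUP => 1 ) . '/flood.dat';

for ( 1 .. 3 )
  {
  with_lock( "$file.lock", sub
    {
    # load, modify and save while no other instance can interfere
    my $data = -r $file ? retrieve( $file ) : {};
    $data->{'hits'}++;
    store( $data, $file );
    } );
  }

print retrieve( $file )->{'hits'}, "\n"; # prints 3
```

In the CGI script, the body of the callback would be the retrieve, flood_storage(), flood_check(), and store steps shown earlier.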

Beep, Beep, Beep ...

In this last example I'll describe a small program, a variation of which I use for (email) SMS notifications. I wanted to avoid scanning large mail directories so I made my email filter copy incoming messages into a separate folder. The program scans this copy folder every 10 minutes for new messages. If there are any, it sends a notification for each one to my mobile phone and removes the copy of the message.

#!/usr/bin/perl
use strict;
use Algorithm::FloodControl;

our $MAIL_ROOT = '/home/cade/mail';
our @SCAN = ( 
              { # this is my personal mail, I'd like to be notified often
                FOLDER  => 'Personal2', # directory (mail folder) to scan
                FC      => 20,          # fixed event count
                FP      => 60*60,       # fixed time period, 1 hour
              }, 
              { # this is a mailing list, I don't need frequent notifications
                FOLDER  => 'AList2',    # directory (mail folder) to scan
                FC      => 3,           # fixed event count
                FP      => 20*60,       # fixed time period, 20 minutes
              }
            );
while(1)
  {
  process_folder( $_ ) for @SCAN;
  sleep(10*60); # sleep 10 minutes
  }

sub process_folder
{
  my $hr     = shift; # this is hash reference
  my $fc     = $hr->{ 'FC' };
  my $fp     = $hr->{ 'FP' };
  my $folder = $hr->{ 'FOLDER' };
  
  my @msg = glob "$MAIL_ROOT/$folder/*";
  return unless @msg; # no messages found
  for( @msg )
    {
    # there are new messages, so flood check is required
    my  $wait = flood_check( $fc, $fp, $folder );
    if( $wait )
      {
      # skip this pass if non-zero wait time is received for this folder
      print "FLOOD! $wait seconds required.\n";
      return;
      }
    send_sms( $folder, $_ );
    }
}

sub send_sms
{
  my $folder = shift;
  my $file   = shift;
  # implementation of this function is beyond the scope of this example
  # so I'll just point out that it extracts the subject line from the
  # message file and sends (over a local SMS gateway) text including the
  # folder name, time and subject
  unlink( $file );
  print "SMS: $folder, $file\n";
}

As you can see, this code -- while implementing a totally different task -- has exactly the same flood check as in the previous two examples.

Conclusion

I said in the beginning that flood control has a vast field of applications. There are many cases where it is appropriate or even necessary. There is no excuse to avoid such checks; implementing them is not hard at all.
