Using Amazon S3 from Perl
by Abel Lin
April 08, 2008
Data management is a critical and challenging aspect of any online resource. With data sizes growing exponentially and rich media growing in popularity, even small online resources must effectively manage and distribute a significant amount of data. Moreover, the peace of mind that comes with an additional offsite data storage resource is invaluable to everyone involved.
At SundayMorningRides.com, we manage a growing inventory of GPS and general GIS (Geographic Information Systems) data and web content (text, images, videos, etc.) for the end users. In addition, we must also effectively manage daily snapshots and backups, as well as multiple development versions of our web site and supporting software. For any small organization, this can add up to significant costs -- not only as an initial monetary investment but also in terms of ongoing labor costs for maintenance and administration.
Amazon Simple Storage Service (S3) was released specifically to address the problem of data management for online resources -- with the aim of providing "reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites." Amazon S3 provides a web service interface that allows developers to store and retrieve any amount of data. S3 is attractive to companies like SundayMorningRides.com because it frees us from the upfront and ongoing costs of purchasing, administering, maintaining, and scaling our own storage servers.
This article covers Perl, REST, and the Amazon S3 REST module, walking through the development of a collection of Perl-based tools for interacting with Amazon S3 from the UNIX command line. I'll also show how to set access permissions so that you can serve images or other data directly to your site from Amazon S3.
A Bit on Web Services
Web services have become the de facto method of exposing information and, well, services via the Web. Intrinsically, web services provide a means of interaction between two networked resources. Amazon S3 is accessible via either Simple Object Access Protocol (SOAP) or representational state transfer (REST).
The SOAP interface organizes features into custom-built operations, similar to remote objects when using Java Remote Method Invocation (RMI) or Common Object Request Broker Architecture (CORBA). Unlike RMI or CORBA, SOAP uses XML embedded in the body of HTTP requests as the application protocol.
Like SOAP, REST also uses HTTP for transport. Unlike SOAP, REST operations are the standard HTTP operations -- GET, POST, PUT, and DELETE. I think of REST operations in terms of the CRUD semantics associated with relational databases: POST is Create, GET is Retrieve, PUT is Update, and DELETE is Delete. The mapping is loose, though: S3 itself uses PUT to create buckets and to create or overwrite objects.
"Storage for the Internet"
Amazon S3 represents the data space in three core concepts: objects, buckets, and keys.
- Objects are the base-level entities within Amazon S3. They consist of both object data and metadata. This metadata is a set of name-value pairs defined in the HTTP headers.
- Buckets are collections of objects. There is no limit to the number of objects in a bucket, but each developer is limited to 100 buckets.
- Keys are unique identifiers for objects.
Without wading through the details, I tend to think of buckets as folders, objects as files, and keys as filenames. The purpose of this abstraction is to create a unique HTTP namespace for every object.
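To make the namespace idea concrete, here is a minimal sketch of how a bucket name and a key combine into a URL. The helpers (escape_key, path_style_url, virtual_hosted_url) and the sample key are hypothetical names chosen for illustration and are not part of any S3 module; the two URL layouts are the path-style and virtual-hosted forms that S3 accepts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not part of the S3::* modules.

# Percent-encode everything except unreserved characters and "/".
sub escape_key {
    my ($key) = @_;
    utf8::encode($key);    # work on bytes so non-ASCII keys encode correctly
    $key =~ s/([^A-Za-z0-9\-_.~\/])/sprintf("%%%02X", ord($1))/ge;
    return $key;
}

# Path-style: the bucket is the first path segment.
sub path_style_url {
    my ($bucket, $key) = @_;
    return 'http://s3.amazonaws.com/' . $bucket . '/' . escape_key($key);
}

# Virtual-hosted style: the bucket is part of the host name.
sub virtual_hosted_url {
    my ($bucket, $key) = @_;
    return 'http://' . $bucket . '.s3.amazonaws.com/' . escape_key($key);
}

# The bucket "foo" and the key below are sample values only.
print path_style_url('foo', 'rides/big sur.gpx'), "\n";
print virtual_hosted_url('foo', 'rides/big sur.gpx'), "\n";
```

Because bucket names are global and keys are unique within a bucket, the pair is enough to address any object.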
I'll assume that you have already signed up for Amazon S3 and received your Access Key ID and Secret Access Key. If not, please do so.
Please note that the S3::* modules aren't the only Perl modules available for connecting to Amazon S3. In particular, Net::Amazon::S3 hides a lot of the details of the S3 service for you. For now, I'm going to use a simpler module to explain how the service works internally.
Connecting, Creating, and Listing Buckets
Connecting to Amazon S3 is as simple as supplying your Access Key ID and your Secret Access Key to create a connection, called here $conn. Here's how to create a bucket, list its contents, and list all of your buckets.
#!/usr/bin/perl
use strict;
use warnings;

use S3::AWSAuthConnection;
use S3::QueryStringAuthGenerator;
use Data::Dumper;

my $AWS_ACCESS_KEY_ID     = 'YOUR ACCESS KEY';
my $AWS_SECRET_ACCESS_KEY = 'YOUR SECRET KEY';

# Every request is made through this connection object.
my $conn = S3::AWSAuthConnection->new($AWS_ACCESS_KEY_ID,
                                      $AWS_SECRET_ACCESS_KEY);

my $BUCKET = "foo";

print "creating bucket $BUCKET \n";
print $conn->create_bucket($BUCKET)->message, "\n";

# Note: Dumper also receives the trailing "\n", so the newline shows up
# as the last $VARn in the output.
print "listing bucket $BUCKET \n";
print Dumper @{$conn->list_bucket($BUCKET)->entries}, "\n";

print "listing all my buckets \n";
print Dumper @{$conn->list_all_my_buckets()->entries}, "\n";
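Behind the connection object, the two keys are used to sign each REST request. The following is a minimal sketch of that signing step as S3's REST authentication scheme describes it, not the module's actual code. The sign_request helper, the date, and the /foo/key resource are placeholder values chosen for illustration, and the sketch assumes a request with no x-amz-* headers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha1);
use MIME::Base64 qw(encode_base64);

# Placeholder credentials, as in the listing above.
my $AWS_ACCESS_KEY_ID     = 'YOUR ACCESS KEY';
my $AWS_SECRET_ACCESS_KEY = 'YOUR SECRET KEY';

# Hypothetical helper: build the Authorization header value for a request
# that carries no x-amz-* headers.
sub sign_request {
    my ($method, $content_md5, $content_type, $date, $resource) = @_;

    # The string to sign is the verb, three header values, and the
    # resource path, joined by newlines.
    my $string_to_sign = join "\n",
        $method, $content_md5, $content_type, $date, $resource;

    # HMAC-SHA1 keyed with the secret key, then Base64 (no trailing newline).
    my $signature = encode_base64(
        hmac_sha1($string_to_sign, $AWS_SECRET_ACCESS_KEY), '');

    return 'AWS ' . $AWS_ACCESS_KEY_ID . ':' . $signature;
}

# Sample request: GET the object "key" in bucket "foo".
my $header = sign_request('GET', '', '',
    'Tue, 18 Dec 2007 22:08:09 GMT', '/foo/key');
print "Authorization: $header\n";
```

The Secret Access Key never travels over the wire; only the Access Key ID and the computed signature do, which lets S3 recompute the signature on its side and compare.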
Because every S3 action takes place over HTTP, it is good practice to check for a 200 response.
my $response = $conn->create_bucket($BUCKET);
if ($response->http_response->code == 200) {
    # Good
} else {
    # Not Good
}
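A strict comparison with 200 works for the calls shown here, but some S3 operations answer with other success codes (deleting an object, for example, returns 204). A minimal sketch of a more forgiving check follows. Here check_response is a hypothetical helper, and the Stub::* packages only stand in for a real S3 response so that the example runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 for any 2xx status, warn and return 0 otherwise.
# It relies only on http_response->code and http_response->message.
sub check_response {
    my ($response, $action) = @_;
    my $http = $response->http_response;
    return 1 if $http->code >= 200 && $http->code < 300;
    warn(sprintf("%s failed: %d %s\n", $action, $http->code, $http->message));
    return 0;
}

# Stand-ins for a real response, so the sketch runs offline.
{
    package Stub::HTTP;
    sub new     { my ($class, %args) = @_; return bless { %args }, $class }
    sub code    { return $_[0]{code} }
    sub message { return $_[0]{message} }
}
{
    package Stub::Response;
    sub new {
        my ($class, %args) = @_;
        return bless { http => Stub::HTTP->new(%args) }, $class;
    }
    sub http_response { return $_[0]{http} }
}

my $ok  = Stub::Response->new(code => 200, message => 'OK');
my $bad = Stub::Response->new(code => 403, message => 'Forbidden');

print "create_bucket succeeded\n" if check_response($ok, 'create_bucket');
print "list_bucket did not succeed\n" unless check_response($bad, 'list_bucket');
```

With a live connection, the call would look like check_response($conn->create_bucket($BUCKET), 'create_bucket').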
As you can see from the script's output below, each entry comes back as a hash. I've used Data::Dumper as a convenient way to view the contents. If you are running this for the first time, the bucket listing will be empty.
listing bucket foo
$VAR1 = {
          'Owner' => {
                       'ID' => 'xxxxx',
                       'DisplayName' => 'xxxxx'
                     },
          'Size' => '66810',
          'ETag' => '"xxxxx"',
          'StorageClass' => 'STANDARD',
          'Key' => 'key',
          'LastModified' => '2007-12-18T22:08:09.000Z'
        };
$VAR4 = '
';
listing all my buckets
$VAR1 = {
          'CreationDate' => '2007-11-28T17:31:48.000Z',
          'Name' => 'foo'
        };
';
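Data::Dumper is handy for exploration, but a script will usually want individual fields. As a small sketch, the loop below walks a list of entries shaped like the output above. The @entries array is sample data copied from that output, standing in for @{ $conn->list_bucket($BUCKET)->entries }, and format_entry is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data shaped like one element of list_bucket(...)->entries.
my @entries = (
    {
        Key          => 'key',
        Size         => '66810',
        LastModified => '2007-12-18T22:08:09.000Z',
        StorageClass => 'STANDARD',
    },
);

# Hypothetical helper: format one entry as a single report line.
sub format_entry {
    my ($entry) = @_;
    return sprintf("%-20s %10d bytes  %s",
        $entry->{Key}, $entry->{Size}, $entry->{LastModified});
}

# Each entry is a hash reference, so fields are read with ->{Name}.
print format_entry($_), "\n" for @entries;
```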

