Web Basics with LWP
by Sean M. Burke

Accessing HTTPS URLs

When you access an HTTPS URL, it'll work for you just like an HTTP URL would--if your LWP installation has HTTPS support (via an appropriate Secure Sockets Layer library). For example:


  use LWP 5.64;
  my $url = 'https://www.paypal.com/';   # Yes, HTTPS!
  my $browser = LWP::UserAgent->new;
  my $response = $browser->get($url);
  die "Error at $url\n ", $response->status_line, "\n Aborting"
   unless $response->is_success;
  print "Whee, it worked!  I got that ",
   $response->content_type, " document!\n";

If your LWP installation doesn't have HTTPS support set up, then the response will be unsuccessful, and you'll get this error message:


  Error at https://www.paypal.com/
   501 Protocol scheme 'https' is not supported
   Aborting at paypal.pl line 7.   [or whatever program and line]

If your LWP installation does have HTTPS support installed, then the response should be successful, and you should be able to consult $response just like with any normal HTTP response.

For information about installing HTTPS support for your LWP installation, see the helpful README.SSL file that comes in the libwww-perl distribution.
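
If you would rather find out before sending a request, one option is to ask the user agent whether it can handle the scheme at all. What follows is only a minimal sketch built on LWP::UserAgent's is_protocol_supported method; the helper name scheme_ok is made up for illustration and is not part of LWP.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# scheme_ok is a hypothetical helper, not part of LWP.
# Returns 1 if this LWP installation can handle the given URL scheme,
# and 0 otherwise.
sub scheme_ok {
    my ($browser, $scheme) = @_;
    return $browser->is_protocol_supported($scheme) ? 1 : 0;
}

my $browser = LWP::UserAgent->new;
if (scheme_ok($browser, 'https')) {
    print "HTTPS support is available.\n";
} else {
    print "No HTTPS support here; see README.SSL.\n";
}
```

Checking up front like this lets a program give a friendlier message than the 501 response shown above, but testing $response->is_success after the request is still needed for every other kind of failure.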

Getting Large Documents

When you're requesting a large (or at least potentially large) document, a problem with the normal way of using the request methods (like $response = $browser->get($url)) is that the response object has to hold the whole document in memory. If the response is a 30-megabyte file, that is likely to be quite an imposition on the process's memory usage.

A notable alternative is to have LWP save the content to a file on disk, instead of saving it up in memory. This is the syntax to use (here $ua is an LWP::UserAgent object, just like $browser in the earlier examples):


  $response = $ua->get($url,
                         ':content_file' => $filespec,
                      );

For example,


  $response = $ua->get('http://search.cpan.org/',
                         ':content_file' => '/tmp/sco.html'
                      );

When you use this :content_file option, the $response will have all the normal header lines, but $response->content will be empty.

Note that this ":content_file" option isn't supported under older versions of LWP, so you should consider adding use LWP 5.66; to check the LWP version, if you think your program might run on systems with older versions.
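
Putting those pieces together, here is one way the save-to-disk request might be wrapped up with the version check and the usual error test. This is a sketch, not the only way to do it: fetch_to_file is a hypothetical helper name, and reporting the saved size with -s is just one way to confirm that something landed on disk.

```perl
use strict;
use warnings;
use LWP 5.66;             # ':content_file' needs at least this version
use LWP::UserAgent;

# fetch_to_file is a hypothetical helper, not part of LWP.
# Saves the document at $url into $filespec and returns the number
# of bytes written; dies if the request is unsuccessful.
sub fetch_to_file {
    my ($ua, $url, $filespec) = @_;
    my $response = $ua->get($url, ':content_file' => $filespec);
    die "Error at $url\n ", $response->status_line, "\n Aborting"
        unless $response->is_success;
    # The body went to disk, not into $response->content, so ask the
    # filesystem how much arrived.
    my $size = -s $filespec;
    return defined $size ? $size : 0;
}
```

A call would look like fetch_to_file($ua, 'http://search.cpan.org/', '/tmp/sco.html'), where $ua is an LWP::UserAgent object.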

If you need to be compatible with older LWP versions, then use this syntax, which does the same thing:


  use HTTP::Request::Common;   # exports the GET() function
  $response = $ua->request( GET($url), $filespec );

Resources

Remember, this article is just the most rudimentary introduction to LWP--to learn more about LWP and LWP-related tasks, you really must read from the following:

  • LWP::Simple: Simple functions for getting, heading, and mirroring URLs.

  • LWP: Overview of the libwww-perl modules.

  • LWP::UserAgent: The class for objects that represent "virtual browsers."

  • HTTP::Response: The class for objects that represent the response to an LWP request, as in $response = $browser->get(...).

  • HTTP::Message and HTTP::Headers: Classes that provide more methods to HTTP::Response.

  • URI: Class for objects that represent absolute or relative URLs.

  • URI::Escape: Functions for URL-escaping and URL-unescaping strings (like turning "this & that" to and from "this%20%26%20that").

  • HTML::Entities: Functions for HTML-escaping and HTML-unescaping strings (like turning "C. & E. Brontë" to and from "C. &amp; E. Bront&euml;").

  • HTML::TokeParser and HTML::TreeBuilder: Classes for parsing HTML.

  • HTML::LinkExtor: Class for finding links in HTML documents.

  • And last but not least, my book Perl & LWP.
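
As a quick taste of two of the modules listed above, here is a small sketch of the escaping functions from URI::Escape and HTML::Entities applied to the sample strings mentioned in the list. The variable names are arbitrary, and the accented letter is written as the escape "\xEB" so that the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use URI::Escape    qw(uri_escape uri_unescape);
use HTML::Entities qw(encode_entities decode_entities);

# URL-escaping: the spaces and the "&" become %XX sequences.
my $escaped = uri_escape("this & that");
print "$escaped\n";                    # this%20%26%20that
print uri_unescape($escaped), "\n";    # this & that

# HTML-escaping: "&" and the e-with-diaeresis become named entities.
my $name    = "C. & E. Bront\xEB";
my $encoded = encode_entities($name);
print "$encoded\n";                    # C. &amp; E. Bront&euml;

# Decoding turns the entities back into the original characters.
my $decoded = decode_entities($encoded);
```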


Copyright ©2002, Sean M. Burke. You can redistribute this document and/or modify it, but only under the same terms as Perl itself.