Embedding Web Servers

by Robert Spier
September 18, 2002

As with most of my previous articles, this one grew out of a project at my $DAY_JOB. The project du jour involves large dependency graphs, often containing thousands of nodes and edges. Some of the relationships are automatically generated and can be quite complicated. When something goes wrong, it's useful to be able to visualize the graph.

Simple Graph:

[Figure: a simple graph]

We use GraphViz for rendering the graph, but it falls down on huge graphs. They turn into an unreadable mess of thick lines -- less than useful. To work around this, we trim down the graph to just a segment, centered around one node, and display only n inputs or outputs.

This works great, except that the startup time to create the graph data can be quite long, because of all the graph processing that is necessary to make sure the information is up to date. (The actual graph rendering is quite fast for small graphs.)

The solution? Process the data once, and render it multiple times, using, yes, you guessed it, a Web interface!

Mechanics of HTTP

The Hypertext Transfer Protocol (HTTP) is the protocol on which most of the Web thrives. It is a simple client/server protocol that runs over TCP/IP sockets.

Extremely oversimplified, it looks like this:

  • Client sends request to server: "Send me document named X"
  • Server responds to client: "Here's the data you asked for" (or "Sorry! I don't know what you mean.")
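
The exchange above can be sketched as a single function, with no networking at all. This is only an illustration in Python; the DOCUMENTS table and the handle_request name are made up for the example and are not part of any real server.

```python
# Hypothetical document store; the name and contents are invented for illustration.
DOCUMENTS = {"X": "contents of document X"}


def handle_request(name):
    """Answer a request of the form 'send me document named <name>'.

    Returns a (found, data) pair: the document if we have it,
    otherwise an apology.
    """
    if name in DOCUMENTS:
        return True, DOCUMENTS[name]
    return False, "Sorry! I don't know what you mean."


print(handle_request("X"))  # a document we have
print(handle_request("Y"))  # a document we don't have
```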

In practice, it's not much more complicated. We will use wget to examine a sample HTTP request:

wget -dq http://www.perl.org/index.shtml


 ---request begin---
 GET /index.shtml HTTP/1.0
 User-Agent: Wget/1.8.1
 Host: www.perl.org
 Accept: */*
 Connection: Keep-Alive
 
 ---request end---
 HTTP/1.1 200 OK
 Date: Tue, 13 Aug 2002 18:12:23 GMT
 Server: Apache/2.0.40 (Unix)
 Accept-Ranges: bytes
 Content-Length: 10494
 Keep-Alive: timeout=15, max=100
 Connection: Keep-Alive
 Content-Type: text/html; charset=ISO-8859-1
 
 <... data downloaded to a file by wget...>
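
To make the "Name: value" layout of those header lines concrete, here is a rough Python sketch that turns a header block into a dictionary. The parse_headers function is a hypothetical helper written for this example; it skips any line without a colon (such as the status line) and stops at the blank line that ends the headers.

```python
def parse_headers(block):
    """Parse 'Name: value' lines into a dict keyed by lowercased header name."""
    headers = {}
    for line in block.splitlines():
        if not line.strip():
            break  # a blank line marks the end of the header section
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, e.g. the status line; not a header
        headers[name.strip().lower()] = value.strip()
    return headers


sample = """HTTP/1.1 200 OK
Server: Apache/2.0.40 (Unix)
Content-Length: 10494
Content-Type: text/html; charset=ISO-8859-1
"""
print(parse_headers(sample))
```

Header names are lowercased here because HTTP treats them as case-insensitive.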

There are a lot of things here that we don't care about in a simple server, so let's boil it down to the guts.

Request:


 GET /index.shtml HTTP/1.0

GET is the type of HTTP action. There are others, but they're beyond the scope of this article.

/index.shtml is the name of the page to retrieve.

HTTP/1.0 is the protocol version supported by your client.
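
Pulling those three pieces out of a request line takes very little code. The following is a minimal Python sketch; RequestLine and parse_request_line are names invented for this example.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str   # the HTTP action, e.g. GET
    path: str     # the page to retrieve, e.g. /index.shtml
    version: str  # the protocol version, e.g. HTTP/1.0


def parse_request_line(line):
    """Split a request line into method, path and protocol version."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError("malformed request line: %r" % line)
    return RequestLine(*parts)


print(parse_request_line("GET /index.shtml HTTP/1.0"))
```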

Response:


 HTTP/1.1 200 OK
 Content-Type: text/html
 
 <data>

The first line is the status response. It includes the HTTP protocol version supported by the server, followed by the status code and a short text string defining the status.

For this article, we'll just care about status code 200 (everything is OK; here's the data) and status code 404 (not found).
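
The status line can be taken apart the same way. In this Python sketch, parse_status_line is a hypothetical helper; it splits on the first two spaces only, because the status text may itself contain spaces ("Not Found").

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line into (protocol version, status code, status text)."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2:
        raise ValueError("malformed status line: %r" % line)
    version, code = parts[0], int(parts[1])
    text = parts[2] if len(parts) > 2 else ""
    return version, code, text


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```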

The next line is the MIME content type. This is required so that the Web browser knows how to display the data.

Common Content-Types:


        text/html       an HTML document
        text/plain      a plain text document
        image/jpeg      a JPEG image
        image/gif       a GIF image

After the above "header" section, there must be a blank line, and then the bytes containing the data. There's a lot more information that can go into the header block, but for the simple applications we will be developing, it is not needed.
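
Putting the pieces together, a complete minimal response is just a status line, a Content-Type header, a blank line, and the data. The Python sketch below assembles one; build_response and its small table of status texts are assumptions made for this example. HTTP ends each header line with a carriage return and line feed ("\r\n"), which is why the blank line shows up as "\r\n\r\n".

```python
# Status texts for the two codes this article cares about.
STATUS_TEXT = {200: "OK", 404: "Not Found"}


def build_response(status, content_type, body):
    """Assemble status line, Content-Type header, blank line, then the body bytes."""
    head = "HTTP/1.0 %d %s\r\nContent-Type: %s\r\n\r\n" % (
        status, STATUS_TEXT[status], content_type)
    return head.encode("ascii") + body


print(build_response(200, "text/html", b"<h1>Hello</h1>"))
print(build_response(404, "text/plain", b"not found"))
```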

You can use a telnet client to retrieve data from any Web server. You need to be careful, though: many modern Web servers are virtually hosted, which means they require the Host: header in the request to retrieve the appropriate data.
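
What you would type into telnet can also be sent from a few lines of Python over a plain socket. This is a sketch under assumptions, not a robust client: build_get_request and fetch are hypothetical helpers, the request is HTTP/1.0 so that the server closes the connection when it is done, and the Host: header is always included for the benefit of virtually hosted servers.

```python
import socket


def build_get_request(host, path="/"):
    """Build the bytes of a minimal GET request, including the Host: header."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, port=80, path="/", timeout=10.0):
    """Send the request and return the raw response (headers and data) as bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection: response complete
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_get_request("www.perl.org", "/index.shtml").decode("ascii"))
```

Calling fetch("www.perl.org", 80, "/index.shtml") would send the same kind of request as the wget example, assuming the host is reachable from your machine.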
