3. Form Processing

One of the most popular uses for CGI programs is to process information from HTML forms. This chapter gives you an extremely brief overview of HTML and forms. Next, you see how form information is sent to CGI programs. After this introduction to form processing, the chapter develops a Guest book application.

3.1. A Brief Overview of HTML

HTML, or Hypertext Markup Language, is used by web programmers to describe the contents of a web page. It is not a programming language. You simply use HTML to indicate what a certain chunk of text is - such as a paragraph, a heading or specially formatted text. All HTML directives are specified using matched sets of angle brackets and are usually called tags. For example, <B> means that the following text should be displayed in bold. To stop the bold text, use the </B> directive. Most HTML directives come in pairs and surround the affected text.

HTML documents need to have certain tags in order for them to be considered "correct". The <HEAD>..</HEAD> set of tags surrounds the header information for each document. Inside the header, you can specify a document title with the <TITLE>..</TITLE> tags.

Tip
HTML tags are case-insensitive. For example, <TITLE> is the same as <title>. However, using all uppercase letters in the HTML tags makes HTML documents easier to understand because you can pick out the tags more readily.

After the document header, you need to have a set of <BODY>..</BODY> tags. Inside the document's body, you specify text headings by using a set of <H1>..</H1> tags. Changing the number after the H changes the heading level. For example, <H1> is the first level, <H2> is the second level, and so on.

You can use the <P> tag to indicate paragraph endings or use the <BR> tag to indicate a line break. The <B>..</B> and <I>..</I> tags are used to indicate bold and italic text.

The text and tags of the entire HTML document must be surrounded by a set of <HTML>..</HTML> tags. For example:

<HTML>
<HEAD><TITLE>This is the Title</TITLE></HEAD>
<BODY>
<H1>This is a level one header</H1>
<P>This is the first paragraph.
<P>This is the second paragraph and it has <I>italic</I> text.
<H2>This is a level two header</H2>
<P>This is the third paragraph and it has <B>bold</B> text.
</BODY>
</HTML>

Most of the time, you will be inserting or modifying text inside the <BODY>..</BODY> tags.

That's enough about generic HTML. The next section discusses Server-Side Includes. Today, Server-Side Includes are replacing some basic CGI programs, so it is important to know about them.

3.2. Server-Side Includes

One of the newest features to be added to web servers is Server-Side Includes, or SSI. SSI is a set of functions built into web servers that gives HTML developers the ability to insert data into HTML documents using special directives. This means that you can have dynamic documents without needing to create full CGI programs.

The inserted information can take the form of a local file or a file referenced by a URL. You can also include information from a limited set of variables - similar to environmental variables. Finally, you can execute programs that can insert text into the document.

Note
The only real difference between CGI programs and SSI programs is that CGI programs must output an HTTP header as their first line of output. See "HTTP Headers" in [](./cgi.md), for more information.

Most web servers need the file extension to be changed from html to shtml in order for the server to know that it needs to look for Server-Side directives. The file extension is dependent on server configuration, but shtml is a common choice.

All SSI directives look like HTML comments within a document. This way, the SSI directives will simply be ignored on web servers that do not support them.

Table 20.1 shows a partial list of SSI directives supported by the webSite server from O'Reilly. Not all web servers will support all of the directives in the table. You need to check the documentation of your web server to determine what directives it will support.

Note
Table 20.1 shows complete examples of SSI directives. You need to modify the examples so that they work for your web site.

Table 20.1 - A Partial List of SSI Directives

| Directive | Description |
| --- | --- |
| `<!--#config timefmt="..." -->` | Changes the format used to display dates. |
| `<!--#config sizefmt="..." -->` | Changes the format used to display file sizes. You may also be able to specify bytes (to display file sizes with commas) or abbrev (to display the file sizes in kilobytes or megabytes). |
| `<!--#config errmsg="..." -->` | Changes the format used to display error messages caused by wayward SSI directives. Error messages are also sent to the server's error log. |
| `<!--#echo var=? -->` | Displays the value of the variable specified by ?. Several of the possible variables are mentioned in this table. |
| `<!--#echo var="DOCUMENT_NAME" -->` | Displays the full path and filename of the current document. |
| `<!--#echo var="DOCUMENT_URI" -->` | Displays the virtual path and filename of the current document. |
| `<!--#echo var="LAST_MODIFIED" -->` | Displays the last time the file was modified. It will use this format for display: 05/31/96 16:45:40. |
| `<!--#echo var="DATE_LOCAL" -->` | Displays the date and time using the local time zone. |
| `<!--#echo var="DATE_GMT" -->` | Displays the date and time using GMT. |
| `<!--#exec cgi="..." -->` | Executes a specified CGI program. It must be activated to be used. You can also use a cmd= option to execute shell commands. |
| `<!--#flastmod virtual="..." -->` | Displays the last modification date of the specified file given a virtual path. |
| `<!--#flastmod file="..." -->` | Displays the last modification date of the specified file given a relative path. |
| `<!--#fsize virtual="..." -->` | Displays the size of the specified file given a virtual path. |
| `<!--#fsize file="..." -->` | Displays the size of the specified file given a relative path. |
| `<!--#include virtual="..." -->` | Displays a file given a virtual path. |
| `<!--#include file="..." -->` | Displays a file given a relative path. The relative path can't start with the ../ character sequence or the / character to avoid security risks. |

SSI provides a fairly rich set of features to the programmer. You might use SSI if you had an existing set of documents to which you wanted to add modification dates. You might also have a file you want to include in a number of your pages - perhaps to act as a header or footer. You could just use the SSI include command on each of those pages, instead of copying the document into each page manually. When available, Server-Side Includes provide a good way to make simple pages more interesting.

Before Server-Side Includes were available, a CGI program was needed in order to automatically generate the last modification date text or to add a generic footer to all pages.
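To make the comparison concrete, here is a minimal sketch of the kind of CGI program that was once needed just to show a modification date. The helper name lastModified() and the choice to report on the script's own file are assumptions made for illustration; the date layout follows the 05/31/96 16:45:40 format mentioned in Table 20.1.

```perl
#!/usr/bin/perl -w
use strict;

# Return a file's last modification time formatted like 05/31/96 16:45:40,
# or undef if the file cannot be examined.
sub lastModified {
    my($file)  = shift;
    my($mtime) = (stat($file))[9];     # element 9 of stat() is the mtime
    return(undef) unless defined($mtime);

    my($sec, $min, $hour, $mday, $mon, $year) = (localtime($mtime))[0..5];

    # localtime() numbers months from 0 and counts years from 1900.
    return(sprintf("%02d/%02d/%02d %02d:%02d:%02d",
                   $mon + 1, $mday, $year % 100, $hour, $min, $sec));
}

# Hypothetical document: for this sketch the script reports on itself.
my($document) = $0;
my($stamp)    = lastModified($document);

print("Content-type: text/html\n\n");
print("<HTML><HEAD><TITLE>Footer Demo</TITLE></HEAD><BODY>\n");
print("<P>Page content goes here.\n");
print("<HR><I>Last modified: ", defined($stamp) ? $stamp : "unknown", "</I>\n");
print("</BODY></HTML>\n");
```

With SSI, the whole script collapses into a single directive placed in the page itself, which is why simple programs like this one are the first to be replaced.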

Your particular web server might have additional directives that you can use. Check the documentation that came with it for more information.

Tip
If you'd like more information about Server-Side Includes, check out the following web site:

http://www.sigma.net/tdunn/
Tim Dunn has created a nice site that documents some of the more technical aspects of web sites.

Caution
I would be remiss if I didn't mention the down side of Server-Side Includes. They are very processor intensive. If you don't have a high-powered computer running your web server and you expect to have a lot of traffic, you might want to limit the number of documents that use Server-Side Includes.

3.3. HTML Forms

HTML forms are designed to let a web page designer interact with users by letting them fill out a form. The form can be composed of elements such as input boxes, buttons, checkboxes, radio buttons, and selection lists. All of the form elements are specified using HTML tags surrounded by a set of <FORM>..</FORM> tags. You can have more than one form per HTML document.

There are several modifiers or options used with the <FORM> tag. The two most important are METHOD and ACTION:

  • METHOD - Specifies the manner in which form information is passed to the CGI scripts. The normal values are either GET or POST. See "Handling Form Information" later in this chapter.

  • ACTION - Specifies the URL of the CGI script that will be invoked when the submit button is clicked. You could also specify an email address by using the mailto: notation. For example, sending mail would be accomplished by ACTION="mailto:medined@mtolive.com" and invoking a CGI script would be accomplished by ACTION="/cgi-bin/feedback.pl".

Most field elements are defined using the <INPUT> tag. Like the <FORM> tag, <INPUT> has several modifiers. The most important are:

  • CHECKED - Specifies that the checkbox or radio button being defined is selected. This modifier should only be used when the element type is checkbox or radio.

  • NAME - Specifies the name of a form element. Most form elements need to have unique names. You'll see in the "Handling Form Information" section later in this chapter that your CGI script will use the element names to access form information.

  • MAXLENGTH - Specifies the maximum number of characters that the user can enter into a form element. If MAXLENGTH is larger than SIZE, the user will be able to scroll to access text that is not visible.

  • TYPE - Specifies the type of input field. The most important field types are checkbox, hidden, password, radio, reset, submit, and text.

  • SIZE - Specifies the size of an input field.

  • VALUE - Specifies the default value for a field. The VALUE modifier is required for radio buttons.

Let's look at how to specify a plain text field:

This HTML line specifies an input field with a default value of WasWaldo. The input box will be 25 characters long although the user can enter up to 50 characters.
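Because CGI scripts often write their own forms, a line like the one just described can also be produced from Perl. The sketch below assumes a hypothetical helper named inputTag() and placeholder field names (lastName, secret, orange); only the WasWaldo default, the size of 25, and the maximum length of 50 come from the description above.

```perl
#!/usr/bin/perl -w
use strict;

# Build an <INPUT> tag from a list of modifier/value pairs, kept in the
# order given. A modifier whose value is undef (such as CHECKED) is
# written without a value. Values are not HTML-escaped in this sketch.
sub inputTag {
    my(@pairs) = @_;
    my($tag)   = "<INPUT";

    while (@pairs) {
        my($modifier) = shift(@pairs);
        my($value)    = shift(@pairs);
        $tag .= defined($value) ? " $modifier=\"$value\"" : " $modifier";
    }
    return("$tag>");
}

# A text field: 25 characters wide, accepting up to 50 characters.
print(inputTag(TYPE => 'text', NAME => 'lastName', VALUE => 'WasWaldo',
               SIZE => 25, MAXLENGTH => 50), "\n");

# A password field and a pre-selected checkbox.
print(inputTag(TYPE => 'password', NAME => 'secret', SIZE => 10), "\n");
print(inputTag(TYPE => 'checkbox', NAME => 'orange', CHECKED => undef), "\n");
```

The first print statement emits `<INPUT TYPE="text" NAME="lastName" VALUE="WasWaldo" SIZE="25" MAXLENGTH="50">`.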

At times, you may want the user to be able to enter text without that text being readable. For example, passwords need to be protected so that people passing behind the user can't secretly steal them. In order to create a protected field, use the password type.

Caution
The password input option still sends the text through the Internet without any encryption. In other words, the data is still sent as clear text. The sole function of the password input option is to ensure that the password is not visible on the screen at the time of entry.

The <INPUT> tag is also used to define two possible buttons - the submit and reset buttons. The submit button sends the form data to a specified URL - in other words to a CGI program. The reset button restores the input fields on the forms to their default states. Any information that the user had entered is lost. Frequently, the VALUE modifier is used to change the text that appears on the buttons. For example:

Hidden fields are frequently used as sneaky ways to pass information into a CGI program. Even though the fields are hidden, the field name and value are still sent to the CGI program when the submit button is clicked. For example, if your script generated an email form, you might include a list of email addresses that will be carbon-copied when the message is sent. Since the form user doesn't need to see the list, the field can be hidden. When the submit button is clicked, the hidden fields are still sent to the CGI program along with the rest of the form information.

The last two input types are checkboxes and radio buttons. Checkboxes let the user indicate either of two responses. Either the box on the form is checked or it is not. The meaning behind the checkbox depends entirely on the text that you place adjacent to it. Checkboxes are used when users can check off as many items as they'd like. For example:

Do you like the color Orange?
Do you like the color Blue?

Radio buttons force the user to select only one of a list of options. Using radio buttons for a large number of items (say, over five) is not recommended because they take up too much room on a web page. For example, one group of radio buttons might offer Windows 95, Windows NT, UNIX, and OS/2, while a second group, labeled CPU Type, offers Intel Pentium, DEC Alpha, and Unknown.

You should always provide a default value for radio buttons because it is assumed that one of them must be selected. Quite often, it is appropriate to provide a "none" or "unknown" radio button (like the "CPU Type" in the above example) so that the user won't be forced to pick an item at random.
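The advice about defaults can be sketched in Perl. The radioGroup() helper and the field name cpu are hypothetical; the option labels are the ones from the CPU Type group above. The helper marks exactly one button as CHECKED so that the group always starts with a selection.

```perl
#!/usr/bin/perl -w
use strict;

# Return the HTML for a group of radio buttons that share one NAME.
# The button whose value equals $default is marked CHECKED.
sub radioGroup {
    my($name, $default, @values) = @_;
    my($html) = "";

    foreach my $value (@values) {
        my($checked) = ($value eq $default) ? " CHECKED" : "";
        $html .= "<INPUT TYPE=radio NAME=\"$name\" VALUE=\"$value\"$checked>$value\n";
    }
    return($html);
}

print("CPU Type:<BR>\n");
print(radioGroup('cpu', 'Unknown', 'Intel Pentium', 'DEC Alpha', 'Unknown'));
```

Because every button carries the same NAME, the browser lets the user pick only one of them, and the VALUE of the chosen button is what the CGI program receives.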

Another useful form element is the drop-down list input field specified by the <SELECT>..</SELECT> set of tags. This form element provides a compact way to let the user choose one item from a list. The options are placed inside the <SELECT>..</SELECT> tags. For example,

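You can use the SELECTED modifier to make one of the options the default. Drop-down lists are very useful when you have three or more options to choose from. If you have fewer, consider using radio buttons.

Multi-line text boxes are created with the <TEXTAREA>..</TEXTAREA> set of tags. Any default text is placed inside the <TEXTAREA>..</TEXTAREA> tags.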

The user's web browser will automatically provide scroll bars as needed. However, the text box will probably not word-wrap. In order to move to the next line, the user must press the enter key.

Note
If you'd like a more advanced introduction to HTML forms, try this web site:

http://robot0.ge.uiuc.edu/~carlosp/cs317/ft.1.html

3.4. Handling Form Information

There are two ways for your CGI program to receive form information - the GET method and the POST method. The transfer mechanism is specified in the <FORM> tag using the METHOD modifier. For example, a tag such as <FORM METHOD=GET ACTION="/cgi-bin/feedback.pl"> tells the client web browser to send the form information back to the server using the GET method.

The GET method appends all of the form data to the end of the URL used to invoke the CGI script. A question mark is used to separate the original URL (specified by the ACTION modifier in the <FORM> tag) and the form information. The server software then puts this information into the QUERY_STRING environment variable for use in the CGI script that will process the form.

The GET method can't be used for larger forms because some web servers limit the length of the URL portion of a request. (Check the documentation on your particular server.) This means that larger forms might blow up if submitted using the GET method. For larger forms, the POST method is the answer.

The POST method sends all of the form information to the CGI program using the STDIN filehandle. The web server will set the CONTENT_LENGTH environment variable to indicate how much data the CGI program needs to read.

The rest of this section develops a function capable of reading both types of form information. The goal of the function is to create a hash that has one entry for each input field on the form.

The first step is simply to read the form information. The method used to send the information is stored in the REQUEST_METHOD environment variable. Therefore, we can examine it to tell if the function needs to look at the QUERY_STRING environment variable or the STDIN filehandle. Listing 20.1 contains a function called getFormData() that places the form information in a variable called $buffer regardless of the method used to transmit the information.

Pseudocode

Define the getFormData() function.

Initialize a buffer.

If the GET method is used, copy the form information into the buffer.

If the POST method is used, read the form information into the buffer.

Listing 20.1-20LST01.PL - The First Step is to Get the Form Information.


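sub getFormData {
    my($buffer) = "";

    # GET requests carry the form data in the URL's query string.
    if ($ENV{'REQUEST_METHOD'} eq 'GET') {
        $buffer = $ENV{'QUERY_STRING'};
    }
    # POST requests deliver CONTENT_LENGTH bytes on STDIN.
    else {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
}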

Tip
Since a single function can handle both the GET and POST methods, you really don't have to worry about which one to use. However, because of the limitation regarding URL length, I suggest that you stick with the POST method.

I'm sure that you find this function pretty simple. But you might be wondering what information is contained in the $buffer variable.
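One way to find out is to run the code from the command line while pretending to be the web server. The sketch below is a variation on Listing 20.1, not the listing itself: the name readFormBuffer() is made up, and unlike getFormData() it returns the buffer so that the caller can print it.

```perl
#!/usr/bin/perl -w
use strict;

# Variation on Listing 20.1 that returns the raw form data so that it
# can be inspected. The name readFormBuffer() is an assumption.
sub readFormBuffer {
    my($buffer) = "";

    if ($ENV{'REQUEST_METHOD'} eq 'GET') {
        $buffer = $ENV{'QUERY_STRING'};
    }
    else {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    return($buffer);
}

# Set the variables that a web server would normally provide for a
# GET request, then look at what the function hands back.
$ENV{'REQUEST_METHOD'} = 'GET';
$ENV{'QUERY_STRING'}   = 'name=Rolf+D%27Barno&age=34';

print(readFormBuffer(), "\n");    # prints name=Rolf+D%27Barno&age=34
```

Setting the environment by hand like this is also a convenient way to debug a CGI script before installing it on a server.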

Form information is passed to a CGI program in name=value format and each input field is delimited by an ampersand (&). For example, if you have a form with two fields - one called name and one called age - the form information would look like this:

name=Rolf+D%27Barno&age=34

Can you see the two input fields? First, split up the information using the & as the delimiter:

name=Rolf+D%27Barno
age=34

Next, split up the two input fields based on the = character:

Field Name: name    Field Value: Rolf+D%27Barno
Field Name: age     Field Value: 34

Remember the section on URL encoding from Chapter 19? You see it in action in the name field. The name is really Rolf D'Barno. However, with URL encoding, spaces are converted to plus signs and some characters are converted to their hexadecimal ASCII equivalents. If you think about how a single quote might be mistaken for the beginning of an HTML value, you can understand why the ASCII equivalent is used.
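The splitting and decoding steps just described can be tried out directly. This is a small worked example rather than the chapter's own code; the helper name decodeValue() is an assumption, and it performs the same two conversions that URL encoding applies in reverse.

```perl
#!/usr/bin/perl -w
use strict;

# Undo URL encoding: plus signs become spaces and each %XX sequence
# becomes the character with that hexadecimal code.
sub decodeValue {
    my($str) = shift;
    $str =~ tr/+/ /;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return($str);
}

my($buffer) = 'name=Rolf+D%27Barno&age=34';

foreach my $pair (split(/&/, $buffer)) {
    my($key, $value) = split(/=/, $pair, 2);
    print("Field Name: ", decodeValue($key),
          "    Field Value: ", decodeValue($value), "\n");
}
```

The first line printed shows the value as Rolf D'Barno: the plus sign has become a space and %27 has become a single quote.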

Let's add some features to the getFormData() function to split up the input fields and store them in a hash variable. Listing 20.2 shows the new version of the getFormData() function.

Pseudocode

Declare a hash variable to hold the form's input fields.

Call the getFormData() function.

Define the getFormData() function.

Declare a local variable to hold the reference to the input field hash.

Initialize a buffer.

If the GET method is used, copy the form information into the buffer.

If the POST method is used, read the form information into the buffer.

Iterate over the array returned by the split() function.

Decode both the input field name and value.

Create an entry in the input field hash variable.

Define the decodeURL() function.

Get the encoded string from the parameter array.

Translate all plus signs into spaces.

Convert characters coded as hexadecimal digits into regular characters.

Return the decoded string.

Listing 20.2-20LST02.PL - The First Step is to Get the Form Information.


my(%frmFlds);

getFormData(\%frmFlds);

sub getFormData {
    my($hashRef) = shift;
    my($buffer) = "";

    if ($ENV{'REQUEST_METHOD'} eq 'GET') {
        $buffer = $ENV{'QUERY_STRING'};
    }
    else {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }

    # Each name=value pair is separated by an ampersand.
    foreach (split(/&/, $buffer)) {
        my($key, $value) = split(/=/, $_);
        $key   = decodeURL($key);
        $value = decodeURL($value);
        $hashRef->{$key} = $value;
    }
}

sub decodeURL {
    my($str) = shift;
    $str =~ tr/+/ /;                           # plus signs become spaces
    $str =~ s/%(..)/pack('C', hex($1))/eg;     # %XX becomes one character
    return($str);
}

The getFormData() function could be considered complete at this point. It correctly reads from both the GET and POST transmission methods, decodes the information, and places the input fields into a hash variable for easy access.

There are some additional considerations of which you need to be aware. If you simply display the information that a user entered, there are some risks involved that you may not be aware of. Let's take a simple example. What if the user enters <B>Rolf</B> in the name field and you subsequently displayed that field's value? Yep, you guessed it, Rolf would be displayed in bold! For simple formatting HTML tags this is not a problem, and may even be a feature. However, if the user entered an SSI tag, he or she may be able to take advantage of a security hole - remember the exec directive?

You can thwart would-be hackers by converting every instance of < to &lt; and of > to &gt;. The HTML standard allows for certain characters to be displayed using symbolic codes. This allows you to display a < character without the web browser thinking that a new HTML tag is starting.
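As a minimal sketch of that conversion, the hypothetical escapeHTML() helper below replaces the two angle brackets with their symbolic codes. It also escapes the ampersand, which goes slightly beyond what the text asks for; that step is included so that input which already contains something like &lt; is displayed literally rather than being turned into a bracket by the browser.

```perl
#!/usr/bin/perl -w
use strict;

# Replace the characters that start and end HTML tags with their
# symbolic codes so that user input is displayed rather than interpreted.
sub escapeHTML {
    my($text) = shift;
    $text =~ s/&/&amp;/g;    # must come first, or &lt; would be re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return($text);
}

print(escapeHTML('<B>Rolf</B>'), "\n");    # prints &lt;B&gt;Rolf&lt;/B&gt;
```

A browser shows the escaped string as the literal text <B>Rolf</B> instead of a bold name, and an SSI directive typed into a form field is likewise reduced to harmless text.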

If you'd like to give users the ability to retain the character formatting HTML tags, you can test for each tag that you want to allow. When an allowed tag is found, convert it back to using the normal < and > characters.

You might want to check for users entering a series of <P> tags in the hopes of generating pages and pages of blank lines. Also, you might want to convert line endings (produced when the user presses the enter key) into spaces so that they are ignored and the text will wrap normally when displayed by a web browser. One small refinement of eliminating the line endings could be to convert two consecutive newlines into a paragraph (<P>) tag.

When you put all of these new features together, you wind up with a getFormData() function that looks like Listing 20.3.

Pseudocode

Declare a hash variable to hold the form's input fields.

Call the getFormData() function.

Define the getFormData() function.

Declare a local variable to hold the reference to the input field hash.

Initialize a buffer.

If the GET method is used, copy the form information into the buffer.

If the POST method is used, read the form information into the buffer.

Iterate over the array returned by the split() function.

Decode both the input field name and value.

Compress multiple <P> tags into one.

Convert < into &lt; and > into &gt;, stopping HTML tags from being interpreted.

Turn back on the bold and italic HTML tags.

Remove unneeded carriage returns.

Convert two newlines into an HTML paragraph tag.

Convert single newlines into spaces.

Create an entry in the input field hash variable.

Define the decodeURL() function.

Get the encoded string from the parameter array.

Translate all plus signs into spaces.

Convert characters coded as hexadecimal digits into regular characters.

Return the decoded string.

Listing 20.3-20LST03.PL - The First Step is to Get the Form Information.


my(%frmFlds);

getFormData(\%frmFlds);

sub getFormData {
    my($hashRef) = shift;
    my($buffer) = "";

    if ($ENV{'REQUEST_METHOD'} eq 'GET') {
        $buffer = $ENV{'QUERY_STRING'};
    }
    else {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }

    foreach (split(/&/, $buffer)) {
        my($key, $value) = split(/=/, $_);
        $key   = decodeURL($key);
        $value = decodeURL($value);

        $value =~ s/(<P>\s*)+/<P>/g;     # compress multiple <P> tags.
        $value =~ s/</&lt;/g;            # turn off all HTML tags.
        $value =~ s/>/&gt;/g;
        $value =~ s/&lt;b&gt;/<b>/ig;    # turn on the bold tag.
        $value =~ s!&lt;/b&gt;!</b>!ig;
        $value =~ s/&lt;i&gt;/<i>/ig;    # turn on the italic tag.
        $value =~ s!&lt;/i&gt;!</i>!ig;
        $value =~ s!\cM!!g;              # Remove unneeded carriage returns.
        $value =~ s!\n\n!<P>!g;          # Convert 2 newlines into paragraph.
        $value =~ s!\n! !g;              # Convert newline into spaces.
        $hashRef->{$key} = $value;
    }
}

sub decodeURL {
    my($str) = shift;
    $str =~ tr/+/ /;                           # plus signs become spaces
    $str =~ s/%(..)/pack('C', hex($1))/eg;     # %XX becomes one character
    return($str);
}

Caution
Tracking security problems seems like a never-ending task but it is very important, especially if you are responsible for a web server. As complicated as the getFormData() function is, it is still not complete. The

The only thing you might need to change in order for this form to work is the ACTION modifier in the <FORM> tag. The directory where you place the CGI program might not be /cgi-bin. The addgest.htm file will generate a web page that looks like the following figure.

Fig. 20.2 - The Add Entry Form

The CGI program in Listing 20.5 is invoked when a visitor clicks on the submit button of the Add Entry HTML form. This program will process the form information, save it to a data file and then create a web page to display all of the entries in the data file.

Pseudocode

Turn on the warning option.

Turn on the strict pragma.

Declare a hash variable to hold the HTML form field data.

Get the local time and pretend that it is one of the form fields.

Get the data from the form.

Save the data into a file.

Send the HTTP header to the remote web browser.

Send the start of page and header information.

Send the heading and request a horizontal line.

Call the readFormData() function to display the Guest book entries.

End the web page.

Define the getFormData() function.

Declare a local variable to hold the reference to the input field hash.

Initialize a buffer.

If the GET method is used, copy the form information into the buffer.

If the POST method is used, read the form information into the buffer.

Iterate over the array returned by the split() function.

Decode both the input field name and value.

Compress multiple <P> tags into one.

Convert < into &lt; and > into &gt;, stopping HTML tags from being interpreted.

Turn back on the bold and italic HTML tags.

Remove unneeded carriage returns.

Convert two newlines into an HTML paragraph tag.

Convert single newlines into spaces.

Create an entry in the input field hash variable.

Define the decodeURL() function.

Get the encoded string from the parameter array.

Translate all plus signs into spaces.

Convert characters coded as hexadecimal digits into regular characters.

Return the decoded string.

Define the zeroFill() function - turns "1" into "01".

Declare a local variable to hold the number to be filled.

Declare a local variable to hold the string length that is needed.

Find difference between current string length and needed length.

If the string is big enough (like "12") then return it.

If the string is too small, prefix it with some zeroes.

Define the saveFormData() function.

Declare two local variables to hold the hash and file name.

Open the file for appending.

Store the contents of the hash in the data file.

Close the file.

Define the readFormData() function.

Declare a local variable to hold the file name.

Open the file for reading.

Iterate over the lines of the file.

Split the line into four variables using ~ as delimiter.

Print the Guest book entry using a minimal amount of HTML tags.

Use a horizontal rule to separate entries.

Close the file.

Listing 20.5-20LST05.PL - A CGI Program to Add a Guest book Entry and Display a Guest book HTML Page


#!/usr/bin/perl -w
use strict;

my(%fields);
my($sec, $min, $hour, $mday, $mon, $year) = (localtime(time))[0..5];
my($dataFile) = "data/gestbook.dat";

$mon  = zeroFill($mon + 1, 2);    # localtime() numbers months from 0.
$hour = zeroFill($hour, 2);
$min  = zeroFill($min, 2);
$sec  = zeroFill($sec, 2);
$year += 1900;                    # localtime() counts years from 1900.
$fields{'timestamp'} = "$mon/$mday/$year, $hour:$min:$sec";

getFormData(\%fields);
saveFormData(\%fields, $dataFile);

print("Content-type: text/html\n\n");
print("<HTML>\n");
print("<HEAD><TITLE>Guestbook</TITLE></HEAD>\n");
print("<BODY>\n");
print("<H1>Guestbook</H1>\n");
print("<HR>\n");
readFormData($dataFile);
print("</BODY>\n");
print("</HTML>\n");

sub getFormData {
    my($hashRef) = shift;
    my($buffer) = "";

    if ($ENV{'REQUEST_METHOD'} eq "GET") {
        $buffer = $ENV{'QUERY_STRING'};
    }
    else {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }

    foreach (split(/&/, $buffer)) {
        my($key, $value) = split(/=/, $_);
        $key   = decodeURL($key);
        $value = decodeURL($value);

        $value =~ s/(<P>\s*)+/<P>/g;     # compress multiple <P> tags.
        $value =~ s/</&lt;/g;            # turn off all HTML tags.
        $value =~ s/>/&gt;/g;
        $value =~ s/&lt;b&gt;/<b>/ig;    # turn on the bold tag.
        $value =~ s!&lt;/b&gt;!</b>!ig;
        $value =~ s/&lt;i&gt;/<i>/ig;    # turn on the italic tag.
        $value =~ s!&lt;/i&gt;!</i>!ig;
        $value =~ s!\cM!!g;              # Remove unneeded carriage returns.
        $value =~ s!\n\n!<P>!g;          # Convert 2 newlines into paragraph.
        $value =~ s!\n! !g;              # convert newline into space.

        $hashRef->{$key} = $value;
    }

    $fields{'comments'} =~ s!\cM!!g;
    $fields{'comments'} =~ s!\n\n!<P>!g;
    $fields{'comments'} =~ s!\n!<BR>!g;
}

sub decodeURL {
    my($str) = shift;
    $str =~ tr/+/ /;                           # plus signs become spaces
    $str =~ s/%(..)/pack('C', hex($1))/eg;     # %XX becomes one character
    return($str);
}

sub zeroFill {
    my($temp) = shift;
    my($len)  = shift;
    my($diff) = $len - length($temp);

    return($temp) if $diff <= 0;
    return(('0' x $diff) . $temp);
}

sub saveFormData {
    my($hashRef) = shift;
    my($file) = shift;

    open(FILE, ">>$file") or die("Unable to open Guestbook data file.");
    print FILE ("$hashRef->{'timestamp'}~");
    print FILE ("$hashRef->{'name'}~");
    print FILE ("$hashRef->{'email'}~");
    print FILE ("$hashRef->{'comments'}");
    print FILE ("\n");
    close(FILE);
}

sub readFormData {
    my($file) = shift;

    open(FILE, "<$file") or die("Unable to open Guestbook data file.");
    while (<FILE>) {
        my($timestamp, $name, $email, $comments) = split(/~/, $_);

        print("$timestamp: <B>$name</B> <A HREF=mailto:$email>$email</A>\n");
        print("<OL><I>$comments</I></OL>\n");
        print("<HR>\n");
    }
    close(FILE);
}

This program introduces no new Perl tricks so you should be able to easily understand it. When the program is invoked, it will read the form information and then save the information to the end of a data file. After the information is saved, the program will generate an HTML page to display all of the entries in the data file.
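The timestamp is the one place where the listing does more work than necessary. As an alternative to calling zeroFill() on each component, the sketch below builds the same kind of string with strftime() from the standard POSIX module. This is a substitute technique, not the approach the listing takes, and the helper name timestamp() is an assumption.

```perl
#!/usr/bin/perl -w
use strict;
use POSIX qw(strftime);

# Format a broken-down time (the list returned by localtime() or
# gmtime()) as MM/DD/YYYY, HH:MM:SS. strftime() takes care of the
# zero padding, the 0-based month, and the years-since-1900 offset.
sub timestamp {
    my(@parts) = @_;
    return(strftime("%m/%d/%Y, %H:%M:%S", @parts));
}

print(timestamp(localtime(time)), "\n");
```

Passing the list from localtime() straight through avoids the two easy mistakes with its raw values: months are numbered from 0 and years are counted from 1900.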

While the program in Listing 20.5 works well, there are several things that can improve it:

  • Error Handling - instead of simply dying, the program could generate an error page that indicates the problem.

  • Field Validation - blank fields should be checked for and warned against.

  • Guest book display - visitors should be able to see the Guest book without needing to add an entry.
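Field validation, for instance, can be isolated in a small helper. The sketch below assumes a hypothetical function named blankFields() that takes the form hash and a list of required field names; it is one possible shape for the check, not the code used in the listings.

```perl
#!/usr/bin/perl -w
use strict;

# Return the names of the required fields that are missing, empty,
# or made up of nothing but whitespace.
sub blankFields {
    my($hashRef, @required) = @_;
    my(@missing);

    foreach my $name (@required) {
        my($value) = $hashRef->{$name};
        push(@missing, $name) if !defined($value) or $value !~ /\S/;
    }
    return(@missing);
}

my(%fields)  = ('name' => 'Rolf', 'email' => '', 'comments' => '   ');
my(@missing) = blankFields(\%fields, 'name', 'comments');

print("Please fill in: @missing\n") if @missing;    # prints Please fill in: comments
```

Collecting all of the blank fields before reporting lets the visitor fix every problem in one pass instead of being sent back to the form once per field.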

The CGI program in Listing 20.6 implements these new features. If you add ?display to the URL of the script, the script will simply display the entries in the data file. If you add ?add to the URL of the script, it will redirect the client browser to the addgest.htm web page. If no additional information is passed with the URL, the script will assume that it has been invoked from a form and will read the form information. After saving the information, the Guest book page will be displayed.

A debugging routine called printENV() has been added to this listing. If you have trouble getting the script to work, you can call the printENV() routine in order to display all of the environment variables and any form information that was read. Place the call to printENV() right before the </BODY> tag of a web page. The displayError() function calls the printENV() function so that the error page can show as much information as possible when a problem arises.
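To give a feel for what such a routine might do, here is a hypothetical stand-in named envReport(). It is not the printENV() routine from the listing: the name, the hash-reference argument, and the decision to return a string rather than print directly are all assumptions made so that the sketch can be run on its own.

```perl
#!/usr/bin/perl -w
use strict;

# Make a string safe to show inside an HTML page.
sub escapeHTML {
    my($text) = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return($text);
}

# Build an HTML fragment listing every environment variable followed
# by every form field in the hash that $fieldsRef refers to.
sub envReport {
    my($fieldsRef) = shift;
    my($html) = "<H2>Environment Variables</H2>\n<PRE>\n";

    foreach my $name (sort(keys(%ENV))) {
        $html .= escapeHTML("$name = $ENV{$name}") . "\n";
    }

    $html .= "</PRE>\n<H2>Form Fields</H2>\n<PRE>\n";
    foreach my $name (sort(keys(%{$fieldsRef}))) {
        my($value) = defined($fieldsRef->{$name}) ? $fieldsRef->{$name} : '';
        $html .= escapeHTML("$name = $value") . "\n";
    }
    return($html . "</PRE>\n");
}

my(%fields) = ('name' => 'Rolf', 'comments' => 'Nice site!');
print(envReport(\%fields));
```

Escaping the values matters even in a debugging aid, because the query string and form fields come straight from the visitor.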

Pseudocode

Turn on the warning option.

Turn on the strict pragma.

Declare a hash variable to hold the HTML form field data.

Get the local time and pretend that it is one of the form fields.

Get the data from the form.

Was the program invoked with added URL information?

if the display command was used, display the Guest book.

if the add command was used, redirect to the Add Entry page.

otherwise display an error page.

If no extra URL information, check for blank fields.

if blank fields, display an error page.

Save the form data.

Display the Guest book.

Exit the program.

Define the displayError() function.

Display an error page with a specified error message.

Define the displayPage() function.

Read all of the entries into a hash.

Display the Guest book.

Define the readFormData() function.

Declare local variables for a file name and a hash reference.

Open the file for reading.

Iterate over the lines of the file.

Split the line into four variables using ~ as delimiter.

Create a hash entry to hold the Guest book information.

Close the file.

Define the getFormData() function.

Declare a local variable to hold the reference to the input field hash.

Initialize a buffer.

If the GET method is used, copy the form information into the buffer.

If the POST method is used, read the form information into the buffer.

Iterate over the array returned by the split() function.

Decode both the input field name and value.

Compress multiple <P> tags into one.

Convert < into &lt; and > into &gt;, stopping HTML tags from being interpreted.

Turn back on the bold and italic HTML tags.

Remove unneeded carriage returns.

Convert two newlines into an HTML paragraph tag.

Convert single newlines into spaces.

Create an entry in the input field hash variable.

Define the decodeURL() function.

Get the encoded string from the parameter array.

Translate all plus signs into spaces.

Convert characters coded as hexadecimal digits into regular characters.

Return the decoded string.

Define the zeroFill() function - turns "1" into "01".

Declare a local variable to hold the number to be filled.

Declare a local variable to hold the string length that is needed.

Find difference between current string length and needed length.

If the string is big enough (like "12") then return it.

If the string is too small, prefix it with some zeroes.

Define the saveFormData() function.

Declare two local variables to hold the hash and file name.

Open the file for appending.

Store the contents of the hash in the data file.

Close the file.

Listing 20.6-20LST06.PL - A More Advanced Guest Book


#!/usr/bin/perl -w
use strict;

my(%fields);
my($sec, $min, $hour, $mday, $mon, $year) = (localtime(time))[0..5];
my($dataFile) = "data/gestbook.dat";

$mon  = zeroFill($mon + 1, 2);   # localtime() numbers months 0-11.
$year += 1900;                   # localtime() counts years from 1900.
$hour = zeroFill($hour, 2);
$min  = zeroFill($min, 2);
$sec  = zeroFill($sec, 2);
$fields{'timestamp'} = "$mon/$mday/$year, $hour:$min:$sec";

getFormData(\%fields);

if ($ENV{'QUERY_STRING'}) {
    if ($ENV{'QUERY_STRING'} eq 'display') {
        displayPage();
    }
    elsif ($ENV{'QUERY_STRING'} eq 'add') {
        print("Location: /addgest.htm\n\n");
    }
    else {
        displayError("Unknown Command: <B>$ENV{'QUERY_STRING'}</B>");
    }
}
else {
    if (length($fields{'name'}) == 0) {
        displayError("Please fill in the name field.<BR>\n");
    }
    if (length($fields{'comments'}) == 0) {
        displayError("Please fill in the comments field.<BR>\n");
    }
    saveFormData(\%fields, $dataFile);
    displayPage();
}

exit(0);

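sub displayError {
    print("Content-type: text/html\n\n");
    print("<HTML>\n");
    print("<HEAD><TITLE>Guestbook Error</TITLE></HEAD>\n");
    print("<BODY>\n");
    print("<H1>Guestbook</H1>\n");
    print("<HR>\n");
    print("@_<BR>\n");               # the error message passed by the caller.
    print("<HR>\n");
    printENV();
    print("</BODY>\n");
    print("</HTML>\n");
    exit(0);
}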

sub displayPage {
    my(%entries);

    readFormData($dataFile, \%entries);

    print("Content-type: text/html\n\n");
    print("<HTML>\n");
    print("<HEAD><TITLE>Guestbook</TITLE></HEAD>\n");
    print("<BODY>\n");
    print("<TABLE><TR><TD VALIGN=top><H1>Guestbook</H1></TD>\n");

    print("<TD VALIGN=top><UL><LI><A HREF=\"/cgi-bin/gestbook.pl?add\">Add an Entry</A>\n");
    print("<LI><A HREF=\"/cgi-bin/gestbook.pl?display\">Refresh</A></UL></TD></TR></TABLE>\n");
    print("<HR>\n");

    # The hash keys are the timestamps, so sorting the keys orders the entries.
    foreach (sort(keys(%entries))) {
        my($arrayRef) = $entries{$_};
        my($timestamp, $name, $email, $comments) = ($_, @{$arrayRef});

        print("$timestamp: <B>$name</B> <A HREF=\"mailto:$email\">$email</A>\n");
        print("<OL>$comments</OL>\n");
        print("<HR>\n");
    }
    print("</BODY>\n");
    print("</HTML>\n");
}

sub readFormData {
    my($file)    = shift;
    my($hashRef) = shift;

    open(FILE, "<$file") or displayError("Unable to open Guestbook data file.");
    while (<FILE>) {
        chomp;    # drop the newline that saveFormData() appends.
        # The limit of 4 keeps any tildes in the comments inside the last field.
        my($timestamp, $name, $email, $comments) = split(/~/, $_, 4);

        $hashRef->{$timestamp} = [ $name, $email, $comments ];
    }
    close(FILE);
}

sub getFormData {
    my($hashRef) = shift;
    my($buffer)  = "";

    if ($ENV{'REQUEST_METHOD'} eq "GET") {
        $buffer = $ENV{'QUERY_STRING'};
    }
    else {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }

    foreach (split(/&/, $buffer)) {
        my($key, $value) = split(/=/, $_);
        $key   = decodeURL($key);
        $value = decodeURL($value);

        $value =~ s/(<P>\s*)+/<P>/g;   # compress multiple <P> tags.
        $value =~ s/</&lt;/g;          # turn off all HTML tags.
        $value =~ s/>/&gt;/g;
        $value =~ s/&lt;b&gt;/<b>/ig;  # turn on the bold tag.
        $value =~ s!&lt;/b&gt;!</b>!ig;
        $value =~ s/&lt;i&gt;/<i>/ig;  # turn on the italic tag.
        $value =~ s!&lt;/i&gt;!</i>!ig;
        $value =~ s!\cM!!g;            # Remove unneeded carriage returns.
        $value =~ s!\n\n!<P>!g;        # Convert 2 newlines into paragraph.
        $value =~ s!\n! !g;            # convert newline into space.
        $hashRef->{$key} = $value;
    }
}

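sub decodeURL {
    my($str) = shift;
    $str = "" unless defined($str);    # a field sent without a value.

    $str =~ tr/+/ /;                                       # plus signs are encoded spaces.
    $str =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;     # %XX is a hexadecimal character code.
    return($str);
}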

sub zeroFill {
    my($temp) = shift;
    my($len)  = shift;
    my($diff) = $len - length($temp);

    return($temp) if $diff <= 0;
    return(('0' x $diff) . $temp);
}

sub saveFormData {
    my($hashRef) = shift;
    my($file)    = shift;

    open(FILE, ">>$file") or die("Unable to open Guestbook data file.");
    print FILE ("$hashRef->{'timestamp'}~");
    print FILE ("$hashRef->{'name'}~");
    print FILE ("$hashRef->{'email'}~");
    print FILE ("$hashRef->{'comments'}");
    print FILE ("\n");
    close(FILE);
}

sub printENV {
    print "The Environment report<BR>\n";
    print "----------------------<BR><PRE>\n";
    print "REQUEST_METHOD:  *$ENV{'REQUEST_METHOD'}*\n";
    print "SCRIPT_NAME:     *$ENV{'SCRIPT_NAME'}*\n";
    print "QUERY_STRING:    *$ENV{'QUERY_STRING'}*\n";
    print "PATH_INFO:       *$ENV{'PATH_INFO'}*\n";
    print "PATH_TRANSLATED: *$ENV{'PATH_TRANSLATED'}*</PRE>\n";

    if ($ENV{'REQUEST_METHOD'} eq 'POST') {
        print "CONTENT_TYPE:    $ENV{'CONTENT_TYPE'}<BR>\n";
        print "CONTENT_FILE:    $ENV{'CONTENT_FILE'}<BR>\n";
        print "CONTENT_LENGTH:  $ENV{'CONTENT_LENGTH'}<BR>\n";
    }
    print("<BR>");

    foreach (sort(keys(%ENV))) {
        print("$_: $ENV{$_}<BR>\n");
    }
    print("<BR>");

    foreach (sort(keys(%fields))) {
        print("$_: $fields{$_}<BR>\n");
    }
    print("<BR>");
}

One of the major changes between Listing 20.5 and Listing 20.6 is in the readFormData() function. Instead of printing the Guest book data directly, the function now creates hash entries for it. This change was made so that an error page can be generated if the data file cannot be opened. Otherwise, the error message would have appeared in the middle of the Guest book page, confusing visitors.
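The record format that readFormData() turns into hash entries can be tried in isolation. The following is a minimal sketch, assuming the same tilde-delimited layout that the listing writes (timestamp, name, email, comments). The parseEntry() helper and the sample record are hypothetical and are not part of Listing 20.6.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tilde-delimited Guest book record
# into a timestamp and a reference to the remaining three fields.
sub parseEntry {
    my($line) = shift;

    chomp($line);
    # The limit of 4 keeps any tildes in the comments inside the last field.
    my($timestamp, $name, $email, $comments) = split(/~/, $line, 4);
    return($timestamp, [ $name, $email, $comments ]);
}

# Made-up sample record, shaped like a line of the data file.
my(%entries);
my($key, $ref) = parseEntry("03/09/1996, 14:05:07~Jane~jane\@example.com~Nice site!\n");
$entries{$key} = $ref;

print("$key: $entries{$key}->[0] wrote \"$entries{$key}->[2]\"\n");
```

Because the entries are collected first and printed later, a failure while reading can still be reported on a page of its own.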

A table was used to add two hypertext links to the top of the web page. One link will let visitors add a new entry and the other refreshes the page. If a second visitor has added a Guest book entry while the first visitor was reading, refreshing the page will display the new entry.

3.8. Summary

This chapter introduced you to HTML forms and form processing. You learned that HTML tags provide guidelines about how the content of a document is structured. For example, the <P> tag indicates a new paragraph is starting and the <H1>..</H1> tags indicate a text heading.

A "correct" HTML document will be entirely enclosed inside of a set of <HTML>..</HTML> tags. Inside the <HTML> tag are <HEAD>..</HEAD> (surrounds document identification information) and <BODY>..</BODY> (surrounds document content information) tags.

After the brief introduction to HTML, you read about Server-Side Includes. They are used to insert information into a document at the time that the page is sent to the web browser. This lets the document designer create dynamic pages without needing CGI programs. For example, you can display the last modification date of a document, or include another document such as a standard footer file.

Next, HTML forms were discussed. HTML forms display input fields that query the visitor to your web site. You can display input boxes, checkboxes, radio buttons, selection lists, submit buttons and reset buttons. Everything inside a set of <FORM>..</FORM> tags is considered one form. You can have multiple forms on a single web page.

The <FORM> tag takes two modifiers. The ACTION modifier tells the web browser the name of the CGI program that gets invoked when the form's submit button is clicked. The METHOD modifier determines how the form information should be sent to the CGI program. If the GET method is used, the information from the form's fields will be available in the QUERY_STRING environment variable. If the POST method is used, the form information will be available via the STDIN file handle.
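The difference between the two methods can be sketched as follows. This is only an illustration under stated assumptions: rawFormData() is a hypothetical helper, the hash passed to it stands in for %ENV, and an in-memory file handle stands in for STDIN so the example can run outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form data.
# $env is a reference to a hash shaped like %ENV; $in is a readable file handle.
sub rawFormData {
    my($env, $in) = @_;
    my($buffer) = "";

    if (($env->{'REQUEST_METHOD'} // '') eq 'GET') {
        # GET: the data arrives in the QUERY_STRING environment variable.
        $buffer = $env->{'QUERY_STRING'} // '';
    }
    else {
        # POST: CONTENT_LENGTH says how many bytes to read from the handle.
        read($in, $buffer, $env->{'CONTENT_LENGTH'} // 0);
    }
    return($buffer);
}

my($get) = rawFormData({ 'REQUEST_METHOD' => 'GET',
                         'QUERY_STRING'   => 'name=Jane&comments=Hi' }, \*STDIN);

# A string opened as a file stands in for STDIN here.
my($body) = 'name=Jane&comments=Hi';
open(my $fh, '<', \$body) or die("Unable to open in-memory handle.");
my($post) = rawFormData({ 'REQUEST_METHOD' => 'POST',
                          'CONTENT_LENGTH' => length($body) }, $fh);
close($fh);

print("GET:  $get\n");
print("POST: $post\n");
```

Either way the program ends up with the same encoded string, which is why the decoding steps that follow do not need to know which method was used.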

The getFormData() function was developed to process form information and make it available via a hash variable. This function is the first line of defense against hackers. By investing time developing this function to close security holes, you are rewarded by having a safer, more stable web site.
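The tag-neutralizing step at the heart of that defense can be sketched on its own. The sanitizeValue() helper below is hypothetical. It also escapes ampersands, which is an assumption added here and is not something the chapter's listing does.

```perl
use strict;
use warnings;

# Hypothetical helper: neutralize HTML in a form value, then
# re-enable only the bold and italic tags.
sub sanitizeValue {
    my($value) = shift;

    $value =~ s/&/&amp;/g;                       # assumption: escape ampersands first.
    $value =~ s/</&lt;/g;                        # turn off all HTML tags.
    $value =~ s/>/&gt;/g;
    $value =~ s!&lt;(/?)([bi])&gt;!<$1$2>!ig;    # turn bold and italic back on.
    return($value);
}

print(sanitizeValue('<b>Hi</b> <script>alert(1)</script>'), "\n");
# prints: <b>Hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping everything first and then allowing a short list of harmless tags is easier to reason about than trying to list every dangerous tag.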

Debugging a CGI script takes a little bit of preparation. First, create a batch or shell file that defines the environment variables that your CGI program needs. Then, create a test input file if you are using the POST method. Lastly, execute the CGI program from the command line using redirection to point STDIN to your test input file.
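The same preparation can also be done from a small Perl driver instead of a batch or shell file. The sketch below rests on several assumptions: the test data is made up, the one-line program run with -e is only a stand-in for your CGI script, and the list form of a piped open() is assumed to be available, as it is on Unix-like systems.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed test data; a real test would use the fields of your own form.
my($input) = 'name=Jane&comments=Hello';

# Create the test input file.
my($fh, $inputFile) = tempfile(UNLINK => 1);
print $fh ($input);
close($fh);

# Define the environment variables that the CGI program needs.
$ENV{'REQUEST_METHOD'} = 'POST';
$ENV{'CONTENT_LENGTH'} = length($input);

# A one-line stand-in for the CGI program. To test a real script,
# replace the -e arguments with the path to that script.
my(@cmd) = ($^X, '-e',
    'read(STDIN, $b, $ENV{CONTENT_LENGTH}); print "got: $b\n"');

# Point STDIN at the test input file; the child process inherits it.
open(STDIN, '<', $inputFile) or die("Unable to redirect STDIN.");

# Run the program and capture whatever it prints.
open(my $out, '-|', @cmd) or die("Unable to run the CGI program.");
my($result) = join('', <$out>);
close($out);

print($result);
```

Capturing the output in a variable makes it possible to compare it against what you expect, which is handy when the same test is run after every change.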

Next, a Guest book application was presented. This application used an HTML form to gather comments from a user. The comments are saved to a database. Then, all of the comments stored in the database are displayed. The first version of the Guest book required the user to add an entry before seeing the contents of the Guest book. The second version of the Guest book let users view the contents without this requirement. In addition, better error checking and new features were added.

The [next chapter](./web-servers.md) explores web server log files and ways to automatically create web pages.

3.9. Review Questions

  1. What does the acronym HTML stand for?

  2. What are the .. set of tags used for?

  3. What is the down side of using SSI directives?

  4. Can an HTML form have two submit buttons?

  5. Why should all angle brackets be replaced in form information?

  6. How much text can be entered into a