Common Gateway Interface (CGI), Environmental Variables and URL-Encoding
CGI (Common Gateway Interface)
Common Gateway Interface (CGI) is a standard for interfacing external programs with information servers on the Internet. So what does this mean? Basically, CGI is distinguished from a plain HTML document in that the plain HTML document is static, while a CGI program executes in real time to output dynamic information. A program that implements CGI is executable, while the plain HTML document exists as a constant text file that doesn't change. CGI, then, obtains information from users and tailors pages to their needs.
While there are newer ways to perform the same kinds of actions that traditionally have been implemented with CGI, CGI is older and, in many ways, more versatile. It is for this reason that, over time, the term CGI has become generalized to refer to any program that runs on a Web server and interacts with a browser.
For example, if you wanted to allow people from all over the world to query some database you had developed, you could create an executable CGI script that would transmit information to the database engine, receive the results, and display them in the user's Web browser. The user could not directly access the database without some gateway to allow access. This link between the database and the user is the "gateway," which is where the name of the CGI standard originated.
A CGI script can be written in any language that allows it to be executed on the server (e.g., C/C++, Fortran, Perl, Tcl, Visual Basic, AppleScript, Python, and Unix shells), but by far the most common language for CGI scripting is Perl, followed by C/C++. A CGI program written in a scripting language is easier to debug, modify, and maintain than the typical compiled program, so many people prefer scripting languages for this reason. Alternatively, such programs can be written with other server-side technologies such as ASP, JSP, and PHP.
How CGI Scripts Work
In very basic terms, a CGI program must interpret the information sent to it, process the information in some way, and generate a response that will be sent back to the client. The following describes how a request reaches the web server and how a CGI script handles that request and dynamically produces the web page.
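The three steps (interpret, process, respond) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `handle_request` and the `name` query parameter are hypothetical and are not part of the CGI standard.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ):
    """Interpret the request, process it, and return the full CGI response text."""
    # 1. Interpret: pull the (hypothetical) "name" parameter from the query string.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]

    # 2. Process: build the page body, escaping user input before echoing it.
    body = "<html><body><h1>Hello, {}!</h1></body></html>".format(html.escape(name))

    # 3. Respond: a CGI response is a header block, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a web server, the request details arrive in os.environ.
    sys.stdout.write(handle_request(os.environ))
```

Passing the environment in as a parameter, rather than reading `os.environ` inside the function, keeps the logic easy to try out without a web server.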
One of the methods that the web server uses to pass information to a CGI script is through environment variables. These are created and assigned appropriate values within the environment that the server spawns for the CGI script. Many of them contain important information that most CGI programs need to take into account.
This list highlights some of the most commonly used ones, along with a brief description and notes on possible uses for them. This list is by no means a complete reference; many servers pass their own extra variables, or use different names for some, so it is best to check your server's documentation. The purpose of this list is only to suggest some common good uses for some of the server-passed information.
CONTENT_LENGTH
The length (in bytes) of the input stream that is being passed through standard input.
This is needed when a script is processing input with the POST method, in order to read the correct number of bytes from the standard input. Some servers end the input string with EOF, but this is not guaranteed behavior, so, to be sure that you read the correct input length, you can do something like read(STDIN, $input, $ENV{CONTENT_LENGTH}) in Perl.
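The same idea, reading exactly CONTENT_LENGTH bytes rather than waiting for EOF, might look as follows in Python. The helper name `read_post_body` is an assumption made for this sketch, and an in-memory stream stands in for standard input.

```python
import io
import sys


def read_post_body(environ, stream=None):
    """Read exactly CONTENT_LENGTH bytes of request body from the input stream."""
    if stream is None:
        stream = sys.stdin.buffer  # binary stdin: CONTENT_LENGTH counts bytes
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0  # treat a malformed value as "no body"
    return stream.read(length) if length > 0 else b""


# Demonstration with an in-memory stream standing in for standard input.
fake_stdin = io.BytesIO(b"qs=hello+world&extra-bytes-not-part-of-body")
print(read_post_body({"CONTENT_LENGTH": "14"}, fake_stdin))  # b'qs=hello+world'
```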
DOCUMENT_ROOT
The directory over which all www document paths are resolved by the server.
Sometimes it is useful to know the server's document root, in order to compose absolute file paths when all the script is being given as a parameter is the relative path of the file within the www directory. It is also good practice to have your script resolve paths in this way, both for security reasons and for portability. Another common use is to figure out what the URL of a file will be if you only know the absolute path and the hostname. (There is another variable for finding out the hostname.)
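Both uses can be sketched in Python as below. The function names `resolve_under_root` and `url_for` are hypothetical, and the URL helper assumes plain `http` on the default port.

```python
import os


def resolve_under_root(document_root, relative_path):
    """Map a path relative to the document root to an absolute file path.

    Raises ValueError if the result would fall outside the document root.
    """
    root = os.path.realpath(document_root)
    candidate = os.path.realpath(os.path.join(root, relative_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the document root")
    return candidate


def url_for(document_root, absolute_path, hostname):
    """Derive the URL of a file from its absolute path and the host name."""
    rel = os.path.relpath(os.path.realpath(absolute_path),
                          os.path.realpath(document_root))
    return "http://{}/{}".format(hostname, rel.replace(os.sep, "/"))
```

In a CGI script the first argument would typically be `os.environ["DOCUMENT_ROOT"]`. Rejecting paths that leave the root is what gives the security benefit: a parameter such as `../etc/passwd` is refused instead of being opened.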
HTTP_REFERER
The URL that referred the web client to the script (via a link or redirection). Typed URLs and bookmarks usually result in this variable being left blank.
In many cases a script may need to behave differently depending on the referer. For example, you may want to restrict your counter script to operate only if it is called from one of your own pages, to prevent someone from using it from another web page without your permission. Or even, the referer may be the actual data that the script needs to process. Extending the example above you might also like to install your counter to many pages, and have the script figure out from the referer which page generated the call and increment the appropriate count, keeping a separate count for each individual URL.
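A per-page counter of this kind could be sketched as follows. The host name in `ALLOWED_HOSTS` and the in-memory `counts` store are assumptions for illustration; a real counter would persist its data in a file or database.

```python
from collections import Counter
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.example.com"}  # hypothetical: hosts whose pages may call the counter
counts = Counter()  # in-memory only, for illustration


def count_hit(environ):
    """Increment and return the count for the referring page, or None if rejected."""
    referer = environ.get("HTTP_REFERER", "")
    host = urlsplit(referer).hostname
    if host not in ALLOWED_HOSTS:
        return None  # blank referer (typed URL, bookmark) or a foreign page
    counts[referer] += 1
    return counts[referer]
```

Keep in mind that the referer is supplied by the client and can be forged, so a check like this discourages casual reuse but is not a security boundary.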
HTTP_USER_AGENT
The name/version of the client issuing the request to the script.
Like with referrers, one might need to implement behaviors that vary with the client software used to call the script. A redirection script could make use of this information to point the client to a page optimized for a specific browser, or you may want to have it block requests from specific clients, like robots or clients that are known not to support appropriate features used by what the script would normally output.
QUERY_STRING
Contains query information passed via the calling URL, following a question mark after the script location.
QUERY_STRING is the equivalent of the content passed through STDIN in POST, but for scripts called with the GET method. Query arguments are written in this variable in their URL-encoded form, just as they appear in the calling URL. You can process this string to extract useful parameters for the script.
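As a small illustration, Python's standard `urllib.parse.parse_qs` splits the string into parameters and decodes them in one step. The helper name `query_params` and the sample query string are made up for this sketch.

```python
from urllib.parse import parse_qs


def query_params(environ):
    """Return the decoded query parameters as a dict of name -> list of values."""
    return parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)


params = query_params({"QUERY_STRING": "qs=This+is+a+test+%26+more&lang=en&lang=fr&debug="})
print(params["qs"][0])   # This is a test & more
print(params["lang"])    # ['en', 'fr']
print(params["debug"])   # ['']
```

Each value is a list because a name may appear more than once in the query string.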
REMOTE_ADDR
The IP address from which the client is issuing the request.
This can be useful either for logging accesses to the script (for example, a voting script might want to log voters in a file by their IP in order to prevent them from voting more than once) or to block or behave differently for particular IP addresses. (This might be a requirement in a script that has to be restricted to your local network, and that may perform different tasks for each known host.)
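Both ideas can be sketched with Python's standard `ipaddress` module. The subnet `192.168.1.0/24` and the in-memory `voters` set are assumptions for illustration; a real voting script would keep its log in a file, as described above.

```python
import ipaddress

LOCAL_NETWORK = ipaddress.ip_network("192.168.1.0/24")  # hypothetical local subnet
voters = set()  # in-memory stand-in for a log file of voter addresses


def register_vote(environ):
    """Record a vote for the client's address; return False on a repeat vote."""
    addr = environ.get("REMOTE_ADDR", "")
    if addr in voters:
        return False
    voters.add(addr)
    return True


def is_local(environ):
    """Return True if the request comes from the assumed local network."""
    try:
        return ipaddress.ip_address(environ.get("REMOTE_ADDR", "")) in LOCAL_NETWORK
    except ValueError:
        return False  # missing or malformed address
```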
REMOTE_HOST
The name of the host from which the client issues the request.
Used just like REMOTE_ADDR above, except that this is the hostname of the remote machine (if it is known via reverse lookup).
REQUEST_METHOD
The method used for the request. (usually GET, POST or HEAD)
It is wise to have your script check this variable before doing anything. You can determine where the input will be (STDIN for POST, QUERY_STRING for GET) or choose to permit operation only under one of the two methods. Also, it is a good idea to exit with an explanatory error message if the script is called from the command-line accidentally, in which case the variable is not defined.
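A sketch of such a check in Python follows. The function name `read_raw_input` is hypothetical, and only GET and POST are handled here.

```python
import io
import sys


def read_raw_input(environ, stdin=None):
    """Return the raw URL-encoded input string, chosen by request method."""
    method = environ.get("REQUEST_METHOD")
    if method is None:
        # Not defined: most likely started from the command line, not by a server.
        raise RuntimeError("This script must be run by a web server as a CGI program.")
    if method == "GET":
        return environ.get("QUERY_STRING", "")
    if method == "POST":
        stream = stdin if stdin is not None else sys.stdin.buffer
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stream.read(length).decode("ascii")
    raise ValueError("Unsupported request method: " + method)


print(read_raw_input({"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1"}))  # a=1
print(read_raw_input({"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "3"},
                     io.BytesIO(b"b=2")))  # b=2
```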
SCRIPT_NAME
The virtual path from which the script is executed.
This is very useful if your script will output HTML code that contains calls to itself. Having the script determine its virtual path (and hence, along with SERVER_NAME and SERVER_PORT, its full URL) is much more portable than hard coding it in a configuration variable. Also, if you like to keep a log of all script accesses in some file, and want to have each script report its name along with the calling parameters or time, it is very portable to use SCRIPT_NAME to print the path of the script.
SERVER_NAME
The web server's hostname or IP address.
Much like SCRIPT_NAME, this value can be used to create more portable scripts in case they need to assemble URLs on the local machine. In scripts that are made publicly accessible on a system with many virtual hosts, this can provide the ability to have different behaviours depending on the virtual server that is calling the script.
SERVER_PORT
The web server's listening port.
This complements SERVER_NAME above in forming URLs to the local system. It is a commonly overlooked aspect, but it will make your script more portable if you keep in mind that not all servers run on the default port, and those that do not need an explicit port reference in the server address part of the URL.
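Putting SERVER_NAME, SERVER_PORT, and SCRIPT_NAME together, a script might assemble its own URL roughly as follows. The function name `self_url` is hypothetical, and the sketch assumes plain `http`, so the port is omitted only when it is 80.

```python
def self_url(environ):
    """Build the script's own URL from the server-supplied variables."""
    host = environ["SERVER_NAME"]
    port = environ.get("SERVER_PORT", "80")
    script = environ.get("SCRIPT_NAME", "")
    # Assumption: plain http, whose default port is 80.
    netloc = host if port == "80" else "{}:{}".format(host, port)
    return "http://{}{}".format(netloc, script)


print(self_url({"SERVER_NAME": "www.example.com",
                "SERVER_PORT": "8080",
                "SCRIPT_NAME": "/cgi-bin/counter.cgi"}))
# http://www.example.com:8080/cgi-bin/counter.cgi
```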
URL-Encoding
URL encoding is normally performed to convert data passed via HTML forms, because such data may contain special characters, such as "/", ".", "#", and so on, which could either have a special meaning in a URL or not be valid in a URL at all.
Example:
One of the most common encounters with URL encoding is when dealing with <form>s. Form methods (GET and POST) perform URL encoding implicitly. Websites use the GET and POST methods to pass parameters between HTML pages.
As an example, the form below passes a string that will be URL encoded.
<form method="GET" action="example.html">
<input type="text" name="qs" size="50" value="This is MCA 5th Semester & We rocks at CGC">
<input type="submit">
</form>
When rendered in a browser, this code displays a text box pre-filled with the string above, followed by a submit button.
This is what you are going to see after the question mark, i.e. in the query string, when you click the submit button:
?qs=This+is+MCA+5th+Semester+%26+We+rocks+at+CGC
So all spaces are replaced with '+' and the '&' is replaced with '%26'. This is an example of URL encoding. Likewise, if you type the following characters in that text box:
$ & < /> ? ; # : = , " ' ~ + %
you are going to get:
?qs=%24+%26+%3C+%2F%3E+%3F+%3B+%23+%3A+%3D+%2C+%22+%27+%7E+%2B+%25
This is because these characters either have a special meaning in a URL or are not permitted in one, so each is encoded as its equivalent %## ASCII code.
As you can see, when a character is URL-encoded, it is converted to %XY, where X and Y are hexadecimal digits. Together, XY is the character's ASCII code in hexadecimal; for example, '&' has ASCII code 38, which is 26 in hexadecimal, hence %26.
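The %XY form can be reproduced by hand: take the character's ASCII code and write it as two hexadecimal digits. The short Python sketch below does this for a single character (the helper `percent_encode_char` is made up for illustration) and then uses the standard `urllib.parse.quote_plus` to encode the form string from the example.

```python
from urllib.parse import quote_plus


def percent_encode_char(ch):
    """Encode one ASCII character as %XY, where XY is its code in hexadecimal."""
    return "%{:02X}".format(ord(ch))


print(ord("&"))                  # 38
print(percent_encode_char("&"))  # %26
print(quote_plus("This is MCA 5th Semester & We rocks at CGC"))
# This+is+MCA+5th+Semester+%26+We+rocks+at+CGC
```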
What Should be URL Encoded?
As a rule of thumb, any non-alphanumeric character should be URL encoded. This of course applies to characters that are to be interpreted as is (i.e., that are not intended to have special meanings). In such cases, there is no harm in URL-encoding the character, even if it does not actually need to be URL-encoded.
Some Common Special Characters
Here's a table of some often used characters and their URL encodings.
Character    URL Encoded
;            %3B
?            %3F
/            %2F
:            %3A
#            %23
&            %26
=            %3D
+            %2B
$            %24
,            %2C
<space>      %20 or +
%            %25
<            %3C
>            %3E
~            %7E
Note that because the <space> character is very commonly used, a special code (the "+" sign) has been reserved as its URL encoding. Thus the string "A B" can be URL encoded as either "A%20B" or "A+B".
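The two encodings of the space character correspond to two pairs of functions in Python's standard `urllib.parse` module, shown here as a brief illustration. Note that a literal plus sign must itself be encoded as %2B, otherwise a form decoder will read it as a space.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

print(quote("A B"))            # A%20B
print(quote_plus("A B"))       # A+B

# Both forms decode back to the same string with the form-style decoder.
print(unquote_plus("A%20B"))   # A B
print(unquote_plus("A+B"))     # A B

# The plain decoder leaves "+" alone, and a real plus sign travels as %2B.
print(unquote("A+B"))          # A+B
print(quote_plus("A+B"))       # A%2BB
```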