Common Gateway Interface (CGI), Environmental Variables and URL-Encoding Computer Science Engineering (CSE) Notes | EduRev

The document Common Gateway Interface (CGI), Environmental Variables and URL-Encoding Computer Science Engineering (CSE) Notes | EduRev is a part of the Computer Science Engineering (CSE) Course Networking Basics.

Common Gateway Interface (CGI), Environmental Variables and URL-Encoding

CGI (Common Gateway Interface)

Common Gateway Interface (CGI) is a standard for interfacing external programs with information servers, such as Web servers, on the Internet. So what does this mean? A plain HTML document is static: it exists as a constant text file that doesn’t change. A program that implements CGI, by contrast, is executable and runs in real time to output dynamic information. CGI, then, obtains information from users and tailors pages to their needs. While there are newer ways to perform the kinds of actions that traditionally have been implemented with CGI, CGI is older and, in many ways, more versatile. It is for this reason that, over time, the term CGI has become generalized to refer to any program that runs on a Web server and interacts with a browser.

For example, if you wanted to allow people from all over the world to query some database you had developed, you could create an executable CGI script that transmits information to the database engine, receives the results, and displays them in the user’s Web browser. The user could not directly access the database without some gateway to allow access. This link between the database and the user is the “gateway,” which is where the name of the CGI standard originated.

A CGI script can be written in any language that allows it to be executed on the server (e.g., C/C++, Fortran, PERL, TCL, Visual Basic, AppleScript, Python, and Unix shells), but by far the most common language for CGI scripting is PERL, followed by C/C++. A CGI script written in an interpreted language is easier to debug, modify, and maintain than the typical compiled program, so many people prefer scripting languages for this reason. Alternatively, such server-side programs can be written with other technologies, such as ASP, JSP, and PHP.

 

How CGI Scripts Work

 


 

In very basic terms, a CGI program must interpret the information sent to it, process the information in some way, and generate a response that will be sent back to the client. The following sequence shows how a request reaches the Web server and how a CGI script handles the request and dynamically produces the Web page.

 

  1. The Web surfer fills out a form and clicks, “Submit.” The information in the form is sent over the Internet to the Web server.

  2. The Web server “grabs” the information from the form and passes it to the CGI software.

  3. The CGI software performs whatever validation of this information that is required. For instance, it might check to see if an e-mail address is valid. If this is a database program, the CGI software prepares a database statement to add, edit, or delete information from the database.

  4. The CGI software then executes the prepared database statement, which is passed to the database driver.

  5. The database driver acts as a middleman and performs the requested action on the database itself.

  6. The results of the database action are then passed back to the database driver.

  7. The database driver sends the information from the database to the CGI software.

  8. The CGI software takes the information from the database and manipulates it into the format that is desired.

  9. If any static HTML pages need to be created, the CGI program accesses the Web server computer’s file system and reads, writes, and/or edits files.

  10. The CGI software then sends the result it wants the Web surfer’s browser to see back to the Web server.

  11. The Web server sends the result it got from the CGI software back to the Web surfer’s browser.
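The interpret, process, and respond cycle described above can be sketched in Python, one of the languages mentioned earlier. This is only a minimal illustration, not a complete CGI program: the function name handle_request, the form field "name", and the simulated request are assumptions made for the example, and the database steps are left as a comment. The input is treated as text, which is adequate for URL-encoded form data.

```python
import io
from html import escape
from urllib.parse import parse_qs


def handle_request(environ, stdin, stdout):
    """Interpret the request, process it, and write a response."""
    # Interpret: form data arrives in QUERY_STRING (GET) or on standard input (POST).
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    form = parse_qs(raw)

    # Process: a real script might validate the input and query a database here.
    name = form.get("name", ["stranger"])[0]  # "name" is a hypothetical field

    # Respond: a header, a blank line, then the body that the server relays back.
    stdout.write("Content-Type: text/html\r\n\r\n")
    stdout.write("<p>Hello, " + escape(name) + "!</p>\n")


# Simulated GET request, so the sketch can be tried without a Web server.
fake_environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Ada+L"}
output = io.StringIO()
handle_request(fake_environ, io.StringIO(""), output)
response = output.getvalue()
```

Under a real server, the same function would be called with os.environ, sys.stdin, and sys.stdout.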

 

One of the methods that the web server uses to pass information to a CGI script is through environmental variables. These are created and assigned appropriate values within the environment that the server spawns for the CGI script. Many of them contain important information that most CGI programs need to take into account.

This list highlights some of the most commonly used ones, along with a brief description and notes on possible uses for them. It is by no means a complete reference; many servers pass their own extra variables, or use different names for some, so check your server's documentation. The purpose of this list is only to suggest some good common uses for the server-passed information.

 

CONTENT_LENGTH

 

The length (in bytes) of the input stream that is being passed through standard input.

This is needed when a script is processing input with the POST method, in order to read the correct number of bytes from the standard input. Some servers end the input string with EOF, but this is not guaranteed behavior, so, to be sure that you read the correct input length, you can do something like this (in Perl): read(STDIN, $input, $ENV{CONTENT_LENGTH})
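The same idea can be sketched in Python. The helper name read_post_body is an assumption for illustration; it reads exactly CONTENT_LENGTH bytes from a binary stream and does not rely on EOF. The stream here is simulated with io.BytesIO.

```python
import io


def read_post_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes from a binary input stream."""
    # A missing or empty CONTENT_LENGTH is treated as "no body".
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stream.read(length)


# Simulated input: only the first 7 bytes belong to the request body.
body = read_post_body({"CONTENT_LENGTH": "7"}, io.BytesIO(b"a=1&b=2TRAILING"))
```

In a real CGI script the call would be read_post_body(os.environ, sys.stdin.buffer).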

 

DOCUMENT_ROOT

 

The directory against which the server resolves all www document paths.

Sometimes it is useful to know the server's document root in order to compose absolute file paths when all the script is given as a parameter is the relative path of the file within the www directory. It is also good practice to have your script resolve paths in this way, both for security reasons and for portability. Another common use is to figure out what the URL of a file will be if you only know the absolute path and the hostname (there is another variable, SERVER_NAME, for finding out the hostname).
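As a rough sketch of resolving a relative path against the document root, the Python helper below (resolve_under_root is a hypothetical name) joins the two and refuses any result that falls outside the root, which is one way to address the security concern mentioned above. The paths in the example are made up.

```python
import os


def resolve_under_root(document_root, relative_path):
    """Return the absolute path of relative_path inside document_root."""
    root = os.path.realpath(document_root)
    # Strip leading slashes so "/img/x.png" is treated as relative to the root.
    candidate = os.path.realpath(os.path.join(root, relative_path.lstrip("/")))
    # Reject paths such as "../../etc/passwd" that climb out of the root.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError("path escapes the document root")
    return candidate


# In a CGI script the first argument would be os.environ["DOCUMENT_ROOT"].
logo_path = resolve_under_root("/var/www/html", "img/logo.png")
```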

 

HTTP_REFERER

The URL of the page that referred the web client to the script (via a link or redirection). Typed URLs and bookmarks usually result in this variable being left blank.

In many cases a script may need to behave differently depending on the referer. For example, you may want to restrict your counter script to operate only if it is called from one of your own pages, to prevent someone from using it from another web page without your permission. The referer may even be the actual data that the script needs to process. Extending the example above, you might install your counter on many pages and have the script figure out from the referer which page generated the call and increment the appropriate count, keeping a separate count for each individual URL.
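The counter example can be sketched as follows. The host name www.example.com and the function name count_hit are assumptions for illustration, and counts are kept in memory and keyed by the path of the referring page, whereas a real counter would store them in a file or database.

```python
from collections import Counter
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.example.com"}  # hypothetical: your own site


def count_hit(environ, counts):
    """Increment the count for the referring page if it is one of ours."""
    parts = urlsplit(environ.get("HTTP_REFERER", ""))
    # A blank referer has no hostname, so it is rejected as well.
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    counts[parts.path or "/"] += 1
    return True


counts = Counter()
own_page = count_hit({"HTTP_REFERER": "http://www.example.com/a.html"}, counts)
foreign_page = count_hit({"HTTP_REFERER": "http://other.example.net/x.html"}, counts)
```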

HTTP_USER_AGENT

The name/version of the client issuing the request to the script.

As with referers, one might need to implement behaviors that vary with the client software used to call the script. A redirection script could make use of this information to point the client to a page optimized for a specific browser, or you may want to have it block requests from specific clients, like robots or clients known not to support features that the script's normal output relies on.
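A very small sketch of blocking by client name is shown below. The keyword list and the function name is_blocked_agent are assumptions for illustration; real robots identify themselves in many different ways, so a simple substring match like this is only a starting point.

```python
BLOCKED_KEYWORDS = ("bot", "crawler", "spider")  # hypothetical list


def is_blocked_agent(environ):
    """Return True if the User-Agent contains one of the blocked keywords."""
    agent = environ.get("HTTP_USER_AGENT", "").lower()
    return any(keyword in agent for keyword in BLOCKED_KEYWORDS)


robot = is_blocked_agent({"HTTP_USER_AGENT": "ExampleBot/1.0"})
browser = is_blocked_agent({"HTTP_USER_AGENT": "Mozilla/5.0"})
```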

 

QUERY_STRING

Contains query information passed via the calling URL, following a question mark after the script location.

QUERY_STRING is the equivalent of the content passed through STDIN in POST, but for scripts called with the GET method. Query arguments are written in this variable in their URL-encoded form, just as they appear in the calling URL. You can process this string to extract useful parameters for the script.
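One way to process the string in Python is the standard library function urllib.parse.parse_qs, which splits the parameters and decodes them. The wrapper name query_params and the sample query string are assumptions for illustration.

```python
from urllib.parse import parse_qs


def query_params(environ):
    """Decode QUERY_STRING into a dict mapping each name to a list of values."""
    return parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)


# A parameter that appears twice yields two values in its list.
params = query_params({"QUERY_STRING": "qs=A+B%26C&tag=x&tag=y"})
```

Each value is a list because the same parameter name may appear more than once in a query string.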

 

REMOTE_ADDR

The IP address from which the client is issuing the request.

This can be useful either for logging accesses to the script (for example, a voting script might log voters in a file by their IP address in order to prevent them from voting more than once) or for blocking or behaving differently for particular IP addresses (this might be a requirement in a script that has to be restricted to your local network, and maybe perform different tasks for each known host).
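The voting example could look roughly like this. The function name accept_vote is hypothetical, the addresses are made up, and the set of seen addresses is held in memory here, whereas the text above describes logging them to a file.

```python
def accept_vote(environ, seen_addresses):
    """Accept a vote only the first time an IP address is seen."""
    address = environ.get("REMOTE_ADDR")
    if not address or address in seen_addresses:
        return False
    seen_addresses.add(address)
    return True


seen = set()
first = accept_vote({"REMOTE_ADDR": "203.0.113.7"}, seen)
second = accept_vote({"REMOTE_ADDR": "203.0.113.7"}, seen)
```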

 

REMOTE_HOST

The name of the host from which the client issues the request.

This is used just like REMOTE_ADDR above, except that it holds the hostname of the remote machine (if it is known via reverse lookup).

 

REQUEST_METHOD

The method used for the request. (usually GET, POST or HEAD)

It is wise to have your script check this variable before doing anything. You can determine where the input will be (STDIN for POST, QUERY_STRING for GET) or choose to permit operation only under one of the two methods. Also, it is a good idea to exit with an explanatory error message if the script is called from the command-line accidentally, in which case the variable is not defined.
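A sketch of this check is shown below. The function name input_source and the wording of the error messages are assumptions for illustration; HEAD is treated like GET here because it carries no request body.

```python
def input_source(environ):
    """Say where the script's input is found, based on REQUEST_METHOD."""
    method = environ.get("REQUEST_METHOD")
    if method is None:
        # Not set when the script is run from the command line.
        raise RuntimeError("This script must be run by a web server as a CGI program.")
    if method in ("GET", "HEAD"):
        return "QUERY_STRING"
    if method == "POST":
        return "STDIN"
    raise RuntimeError("Unsupported request method: " + method)


get_source = input_source({"REQUEST_METHOD": "GET"})
post_source = input_source({"REQUEST_METHOD": "POST"})
```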

 

SCRIPT_NAME

The virtual path from which the script is executed.

This is very useful if your script will output HTML code that contains calls to itself. Having the script determine its virtual path (and hence, along with SERVER_NAME and SERVER_PORT, its full URL) is much more portable than hard-coding it in a configuration variable. Also, if you like to keep a log of all script accesses in some file, and want to have each script report its name along with the calling parameters or time, it is very portable to use SCRIPT_NAME to print the path of the script.

 

SERVER_NAME

The web server's hostname or IP address.

Much like SCRIPT_NAME, this value can be used to create more portable scripts in case they need to assemble URLs on the local machine. In scripts that are made publicly accessible on a system with many virtual hosts, this can provide the ability to have different behaviours depending on the virtual server that's calling the script.

 

SERVER_PORT

The web server's listening port.

This complements SERVER_NAME above in forming URLs to the local system. It is a commonly overlooked aspect, but it will make your script portable if you keep in mind that not all servers run on the default port, and those that do not need an explicit port reference in the server address part of the URL.
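Putting SERVER_NAME, SERVER_PORT, and SCRIPT_NAME together, a script can assemble its own URL. In the sketch below, the function name self_url is hypothetical, and the scheme is passed in as a parameter because none of the variables listed here provides it; the port is omitted only when it is the default for that scheme.

```python
DEFAULT_PORTS = {"http": "80", "https": "443"}


def self_url(environ, scheme="http"):
    """Build the script's own URL from the server-supplied variables."""
    host = environ["SERVER_NAME"]
    port = environ.get("SERVER_PORT", DEFAULT_PORTS[scheme])
    # Only mention the port when it differs from the scheme's default.
    netloc = host if port == DEFAULT_PORTS[scheme] else host + ":" + port
    return scheme + "://" + netloc + environ.get("SCRIPT_NAME", "")


url = self_url({
    "SERVER_NAME": "www.example.com",
    "SERVER_PORT": "8080",
    "SCRIPT_NAME": "/cgi-bin/count.cgi",
})
```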

 

URL-Encoding

URL encoding is normally performed to convert data passed via HTML forms, because such data may contain special characters, such as "/", ".", "#", and so on, which could:

  1. have special meanings;
  2. not be valid characters for a URL;
  3. be altered during transfer.

For instance, the "#" character needs to be encoded because it has the special meaning of an HTML anchor. The <space> character also needs to be encoded because it is not allowed in a valid URL. Also, some characters, such as "~", might not transport properly across the Internet.

 

Example:
One of the most common encounters with URL encoding is when dealing with <form>s. Form submissions (with both the GET and POST methods) perform URL encoding implicitly. Websites use the GET and POST methods to pass parameters between HTML pages.

As an example, the form below passes a string that will be URL encoded.

<form method="GET" action="example.html">

  <input type="text" name="qs" size="50" value="This is MCA 5th Semester & We rocks at CGC">
  <input type="submit">

</form>

When rendered in a browser, this code displays a text box pre-filled with the string above and a submit button.

This is what you are going to see after the question mark, i.e. in the query string, when you click the submit button:

?qs=This+is+MCA+5th+Semester+%26+We+rocks+at+CGC

So all spaces are replaced with ‘+’ and the ‘&’ is replaced with %26. This is an example of URL encoding. So if you type the following characters in that text box:

$ & < > ? ; # : = , " ' ~ + %

you are going to get ?qs=%24+%26+%3C+%3E+%3F+%3B+%23+%3A+%3D+%2C+%22+%27+%7E+%2B+%25

This is because these characters are not permissible as literal data in URLs, so each one is encoded as its equivalent %## ASCII code.

As you can see, when a character is URL-encoded, it is converted to %XY, where X and Y are hexadecimal digits that together give the character's ASCII code.
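The form example can be reproduced in Python with the standard library function urllib.parse.quote_plus, which applies the same form-style encoding. The helper percent_code is a hypothetical name added only to show where a %XY code comes from: it is the ASCII code of the character written as two hexadecimal digits.

```python
from urllib.parse import quote_plus


def percent_code(character):
    """Return the %XY code of a single ASCII character."""
    # ord() gives the ASCII code; {:02X} prints it as two uppercase hex digits.
    return "%{:02X}".format(ord(character))


encoded = quote_plus("This is MCA 5th Semester & We rocks at CGC")
ampersand = percent_code("&")   # ASCII 38 is 26 in hexadecimal
semicolon = percent_code(";")   # ASCII 59 is 3B in hexadecimal
```

Note that a library may leave some characters unencoded where a browser would encode them, so the output for other inputs can differ slightly from what a form submission produces.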

What Should be URL Encoded?


As a rule of thumb, any non-alphanumeric character should be URL encoded. This of course applies to characters that are to be interpreted as is (i.e., are not intended to have special meanings). In such cases, there is no harm in URL-encoding the character, even if it does not actually need to be URL-encoded.
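The rule of thumb can be sketched as a small encoder that percent-encodes every byte that is not an ASCII letter or digit. The function name encode_all_non_alnum is an assumption for illustration, and non-ASCII text is encoded as UTF-8 bytes here. Decoding the result with urllib.parse.unquote recovers the original string, which illustrates that encoding a character unnecessarily does no harm.

```python
from urllib.parse import unquote


def encode_all_non_alnum(text):
    """Percent-encode every byte that is not an ASCII letter or digit."""
    pieces = []
    for byte in text.encode("utf-8"):
        character = chr(byte)
        if character.isascii() and character.isalnum():
            pieces.append(character)
        else:
            pieces.append("%{:02X}".format(byte))
    return "".join(pieces)


over_encoded = encode_all_non_alnum("a-b.c")  # "-" and "." rarely need encoding
round_trip = unquote(over_encoded)
```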

Some Common Special Characters


Here's a table of some often-used characters and their URL encodings.

Character     URL Encoded
;             %3B
?             %3F
/             %2F
:             %3A
#             %23
&             %26
=             %3D
+             %2B
$             %24
,             %2C
<space>       %20 or +
%             %25
<             %3C
>             %3E
~             %7E

 

Note that because the <space> character is very commonly used, a special code (the "+" sign) has been reserved as its URL encoding. Thus the string "A B" can be URL encoded as either "A%20B" or "A+B".
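Both encodings of the space can be seen with Python's urllib.parse functions: quote produces %20, quote_plus produces the "+" form, and unquote_plus decodes either one. The variable names are just for illustration. Because "+" stands for a space in this scheme, a literal plus sign has to be sent as %2B.

```python
from urllib.parse import quote, quote_plus, unquote_plus

with_percent = quote("A B")         # space becomes %20
with_plus = quote_plus("A B")       # space becomes +
literal_plus = quote_plus("A+B")    # a real plus sign becomes %2B

# Both encoded forms decode to the same original string.
decoded = (unquote_plus("A%20B"), unquote_plus("A+B"))
```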
