In the following document, several conventions are used to indicate various things.
code and keywords
-
This is used for web site addresses and keywords that you would type
in.
In the following section, I hope to briefly explain some of the more common world wide web technologies and how they fit together to create a complete web page. You may be familiar with some or all of this, so feel free to skim. My intention is simply to present a big picture look at what is involved in a web page.
Let's start with something everyone knows. When you go to a web
site, you need to know its URL (Uniform Resource Locator),
or address. The Internet is composed of thousands of computers
networked together. Through a system called DNS that I won't get
into, many computers have easy to remember names. For example, you
could go to www.google.com. When you type that into your
browser, your browser first finds which computer the Google web site
is on, then asks that computer for a copy. It then displays that page
on the screen. What people don't often think about is the http://
that precedes the address. This is what defines the protocol
that your browser will use in getting the web page at www.google.com.
In this case, it uses the HyperText Transfer Protocol. Essentially,
it is asking for a web page. Other possible protocols include FTP
(File Transfer Protocol), HTTPS (HTTP Secure), Telnet, and file—which
allows you to view files from your computer in your browser.
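As a small illustration of the protocol and address parts described above, here is a minimal Python sketch that splits a URL with the standard library. The function name describe_url and the ftp.example.org address are made up for this example.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its protocol, the computer's name, and the path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http" or "ftp"
        "host": parts.hostname,     # the name that DNS looks up
        "path": parts.path,         # which document is being asked for
    }


print(describe_url("http://www.google.com/"))
# ftp.example.org is a hypothetical address used only for illustration.
print(describe_url("ftp://ftp.example.org/pub/readme.txt"))
```

The part before :// is the protocol, and the part after it, up to the next slash, is the computer name.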
Now we get to the actual web page itself. Here, it is useful to use client/server terminology. The client/server model is useful in many areas, but I will stick to its application to web pages. In the simplest case, the server has information, and the client requests the information. For web pages, the computer with the web page on it is the server, and your home PC is the client. At the application level, your browser requests the web page from the server program. This separation between server and client is important. Some web technologies are on the client, and some are on the server. The interaction between the client and the server is also notable.
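To make the request and response concrete, here is a rough Python sketch that runs a tiny server and a client in the same program. Everything in it (the PAGE content, the PageHandler class, the fetch_page_once function) is invented for illustration. A real web server such as Apache does far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# The "information" the server has: one small, fixed web page.
PAGE = b"<html><body><p>Hello from the server</p></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with the same page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_page_once():
    """Start the server, act as the client for one request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Client side: this is roughly what a browser does before displaying a page.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        body = response.read()
        connection.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_page_once()
print(status, body.decode("ascii"))
```

The client only ever sees what the server chooses to send back, which is the separation the paragraph above describes.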
Anyone who has been to a web page has some idea what goes on client-side. You open up your browser, put in an address, and off you go, surfing the world wide web. What is less commonly known is that there is a great amount of disparity between browsers, which causes web developers no end of grief. Some of the major browsers that are available include Mozilla, Internet Explorer, Konqueror, Opera, Netscape, and Safari. The largest element on the client side is HTML (HyperText Markup Language). Every web page is written in HTML. The tricky part is how each browser displays that HTML. I won't go into it, but it is important to remember that even though a page looks good in your browser, it might look awful in someone else's browser. One of the latest web developments is the spread of CSS (Cascading Style Sheets). Again, browsers do not necessarily support CSS in a uniform or complete fashion. Briefly, CSS is used to change how a web page looks. First came CSS1, which just affects text. Then came CSS2, which can be used for layout. CSS2 has not been widely accepted for layout, but it is the future of web layout. Some other technologies that are on the client side include JavaScript, Java applets, and Shockwave or Flash elements. JavaScript is embedded in the HTML and can be used for nice looking and useful interactive elements. Menus that scroll out are often done with JavaScript. Java applets, Shockwave, and Flash elements are external files that are downloaded and embedded in the web page. Java applets are actually little programs that can do almost anything, but they are usually not practical or needed in most circumstances. Shockwave and Flash can provide very nice interfaces and display graphical content well, but they require expensive software to create and result in large files, which doesn't work well for those with dial-up Internet access.
What happens on the server is generally invisible. This is good
because it is secure. If you have information that you only want
certain people to see, you can keep it on the server and make sure
only the right people can see it. The server can also do a lot of
processing that would be impractical for the client. Considering
Google again, the Google servers store millions of records and
return results for clients' searches. It would be entirely
impractical for Google's entire database to be present on everyone's
computer and have everyone search through that database
themselves. If that happened, searches would start taking
hours instead of the 0.41-ish seconds that Google takes. What happens
on the server side is what affects whether a web page is static or
dynamic. This is a static web page. When you want it, you
ask the server for it, and the server returns exactly this page. In
the case of a dynamic web page, the server must do a bit
more work. Also, there are a lot of ways that it can do that work.
One way is through Microsoft's ASP
(Active Server Pages). Another is Perl
scripts. With Java you can create
JavaBeans. You
can also embed PHP into HTML, which
then gets parsed by the server and passed to the client. In short,
there are a lot of ways for a server to work before passing an HTML
document to the client. Now I'd like to talk a little bit about the
actual server application. There are a couple of common web servers.
Microsoft's server is called Internet
Information Services (IIS). Probably the most widespread server
is the Apache web server.
Apache is a free program available on most platforms including
Windows, Linux, and OS X. Everything on the server side goes through
the web server. Often the work that must be done before a web page is
ready requires an external program or perhaps a plugin, but this is
all managed by the web server. Another program that can be present on
the server side is a database server. For example, a client requests
a webpage that has an element in a database. The web server will run
a script through Perl, which will call the database server to return
a piece of data. The script will then return an HTML document that
the webserver will send back to the client. Some common free database
servers include MySQL and PostgreSQL. Perl is a programming language
that works pretty well for a lot of dynamic content. A recent and
very good contender in the dynamic content arena is PHP. PHP code is
embedded in HTML. When a client requests, say, index.php, the server
will pass it to the PHP interpreter, which will leave all the
HTML as it is, but run the PHP code
which will result in a complete HTML document.
The server then sends it to the client. There are two ways to get a
dynamic webpage. The first is to simply have a page that is
constructed dynamically on the server with no input from the client.
The other way involves the client passing the server
parameters via CGI (Common Gateway Interface). The way to send
parameters via CGI from the client is as follows: using Google
once again, I want to search for "stuff". I go to the
website and search for "stuff". Looking at the URL of the
site it returns, I find
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=stuff&btnG=Search.
We know that http://
is the protocol and www.google.com
is the address, but what is all that other stuff? Well, search
is the CGI script. The "?" indicates that parameters are to
follow. Then there is a series of name=value pairs
separated by ampersands. Without some further investigation,
I'm not sure what everything is, but I see q=stuff, which
looks like stuff is assigned to q. Then, the script takes that, and
all of the other inputs, and returns the site based on that search. I
can reasonably assume that instead of going through the main Google
page, I can simply replace "stuff" with whatever search I
want and get the results that way. Another factor is the lack of a
continuous connection between the
client and the server. As it works, the client sends a request. The
server returns a document. Then the server promptly forgets the
client. When you use Hotmail, Yahoo Mail, or anything that involves
logging in, there needs to be some way for the server to know that
the requests are still coming from you, and in fact who "you"
are. When you click "next" to see the next email in your
inbox, how is the server to know who is asking for a next email, and
what the current email is? This disconnected architecture is
inherently flawed in maintaining an idea of state, but as you can
see from the current state of the web, there are workarounds. This is
one place where cookies come into play. In summary, the
solutions involve both the client and the server. In closing, it is
clear that while it is blissfully invisible to the end user, an
awful lot can go on server-side.
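The Google search URL discussed above can be taken apart with a few lines of Python. This is only a sketch of how the parameters after the "?" are laid out. The helper names split_request and with_search_term are invented here, and nothing is known about how Google's own script reads these values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

SEARCH_URL = ("http://www.google.com/search"
              "?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=stuff&btnG=Search")


def split_request(url):
    """Return the script path and a dictionary of the parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps parameters such as "lr=" that have no value.
    return parts.path, parse_qs(parts.query, keep_blank_values=True)


def with_search_term(url, term):
    """Build the same URL with a different value assigned to q."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    params["q"] = [term]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


script, params = split_request(SEARCH_URL)
print(script)        # the part the text calls the CGI script
print(params["q"])   # the value assigned to q
print(with_search_term(SEARCH_URL, "web pages"))
```

Each parameter value comes back as a list because the same name may appear more than once in a query string.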
In this next section I hope to explain in more detail some of the specific technologies that are key to creating a web page. There are a vast number of online resources on web development, so I am not trying to be comprehensive. I simply want to provide a solid theoretical foundation from which further research will make anything possible.
The essential element of an HTML document is the text. If you
write some text in an HTML document, it will appear on the screen.
What distinguishes an HTML document from a plain text document is the
presence of tags. Tags change the way that the text appears.
For example, if I write <em>word</em>, the
result in a browser will be the word shown with emphasis (usually italics). As you can see, tags are
enclosed in < and >. Any text will be printed on the screen
unless it is enclosed in a tag. The text after the start tag
is affected by the tag, and the text stops being affected after the
end tag. An end tag is the same as the start tag, but with a
"/" immediately after the first "<". There are
a couple of different types of tags. Some tags are empty.
They do not affect the following text and they do not have an end tag.
<br> is such a tag. Some tags do not require an
end tag, but for the sake of clean, easily understandable HTML, the
end tags should be included. Many tags accept attributes. Attributes
are included after the name of the tag but
still within the tag. The basic structure is as follows:
<tag opt1="stuff" opt2="stuff stuff2 stuff3">My Content</tag>.
Attributes are always in the start tag.
An element is the start tag, the content between the tags,
and the end tag. There are two types of elements. Block-level
elements distinguish blocks of text. They usually start on a new line
and can contain other block-level elements or inline
elements. Inline elements can only contain text or other inline
elements. A practical example is an essay. This essay is composed of,
say, five paragraphs. Each of these paragraphs would be in a
block-level element. In this case, there is a nice tag for that: the
paragraph, or p, tag. Now, within each paragraph, there
can be additional formatting. If a word is italicized,
it can be enclosed in the i (for italic) tag. Each
paragraph begins on a new line, but the italic words are simply
italic.
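To see start tags, attributes, text, and end tags as separate pieces, here is a small Python sketch built on the standard html.parser module. The TagLister class, the list_events function, and the class="intro" attribute are made up for illustration. This is not how a browser is implemented.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record each start tag, piece of text, and end tag in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Attributes only ever arrive with the start tag.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def list_events(markup):
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.events


# A block-level p element containing an inline i element.
for event in list_events('<p class="intro">Some <i>italic</i> words.</p>'):
    print(event)
```

The i element starts and ends inside the p element, which is the nesting of inline elements within block-level elements described above.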
You may wonder what you would do if you wanted to use a "<"
in your text. The answer is entities. A
special character entity begins with a "&" and ends
with a ";". For example, "<" is written as &lt;.
There are many entities that you can use.
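If you generate HTML from a program, the conversion to entities is usually done for you. As a brief sketch, Python's standard html module can do it. The function name to_html_text is invented for this example.

```python
import html


def to_html_text(text):
    """Replace the characters that HTML treats specially with entities."""
    # quote=False leaves quotation marks alone; only &, < and > are replaced.
    return html.escape(text, quote=False)


print(to_html_text("if a < b & b > c"))
# Going the other way turns entities back into plain characters.
print(html.unescape("&lt;em&gt;"))
```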
White space is another interesting issue in HTML. In short, it has
very little effect. Any amount of white space is treated as one
space. I could start writing something in HTML, then press <enter>
a dozen times, and continue writing. When viewed in a browser, it
would look like one space. One advantage of this is that you can
construct HTML documents to be readable using indentation. The
disadvantage (or maybe it's an advantage!) is that all of the
formatting is done with HTML.
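The white space rule can be imitated in a line of Python. This is only an approximation of what a browser does: the function name collapse_whitespace is made up, and the sketch also drops white space at the very start and end of the text.

```python
def collapse_whitespace(text):
    """Treat any run of spaces, tabs, and newlines as a single space."""
    return " ".join(text.split())


source = """I could start writing something,



   and continue writing."""

print(collapse_whitespace(source))
```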
You can put comments in HTML. This is useful if you want to write
complex HTML but help other people understand what you are doing. The
format of a comment is: <!-- This is a comment -->
Letting users input text for searches or logging in is handled by
form elements filled with such elements as textarea and button.
Here is the basic structure of an HTML document.
The first line states what version of HTML is used in the
following document. HTML documents will work without this statement,
but it should be there to let the browser know what to do with the
rest of the document. HTML 4.0 Transitional should work
fine in most cases. The next line is the general wrapper tag for the
entire HTML document. The rest of the document is split into two
parts. The head element contains general information
about the document. In this case, the title of the document is given
using the title element. Client-side scripts as well as
meta data and style sheets go in the head element.
Finally, the body element contains all of the content of
the document. Everything that you have read so far is in the body.
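The structure described above can be sketched as follows. The skeleton held in the SKELETON string is a reconstruction from that description, not a copy of this document's source. The title text, the single paragraph, and the OutlineParser helper are all made up for illustration.

```python
from html.parser import HTMLParser

# An assumed minimal document: version statement, html wrapper, head, body.
SKELETON = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
  <head>
    <title>My Page</title>
  </head>
  <body>
    <p>All of the content goes here.</p>
  </body>
</html>
"""


class OutlineParser(HTMLParser):
    """Collect the version statement and an indented outline of the elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.doctype = None
        self.outline = []

    def handle_decl(self, decl):
        self.doctype = decl  # the text of the first line, without the <! and >

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(markup):
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.doctype, parser.outline


doctype, elements = outline(SKELETON)
print(doctype)
print("\n".join(elements))
```

The outline shows html wrapping everything, with head and body as its two parts.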
CSS (Cascading Style Sheets) is a recent addition to web page design. The idea behind CSS is to separate content from presentation. If you look at the source of this document, you'll see that I start out with a style sheet in the head, and then all of the content is pretty much straightforward. All of the HTML tags I use in this document simply define the structure of the document. All of the colors and fonts are defined in the CSS style sheet. The design of a web site can be broken into three parts. The most obvious part is the content. The content is all of the words and the pictures that are present on a web site. Now, the content is meaningless if there is no structure. In an essay, the structural elements would include a name, date, class, etc. Then there would be a title. Then you have the paragraphs in the essay. Without this structure, you only have a string of words. Finally, you have the layout. Again using the essay example, the layout includes what font you use, what your page margins are, whether your name is in the upper right or upper left of the page, whether your title is in bold, etc. So in a web page, HTML defines the structure and CSS defines the layout. Since these are separate, the layout can change drastically without having to even touch the content and structure. Instead of getting into details, here is some good information on CSS.
There is a vast number of resources on the web. Here are some good ones.