Conventions

In the following document, several conventions are used to indicate various things.

Overview of Web Technologies

In the following section, I hope to briefly explain some of the more common World Wide Web technologies and how they fit together to create a complete web page. You may be familiar with some or all of this, so feel free to skim. My intention is simply to present a big-picture look at what is involved in a web page.

URLs

Let's start with something everyone knows. When you go to a web site, you need to know its URL (Uniform Resource Locator), or address. The Internet is composed of millions of computers networked together. Through a system called DNS that I won't get into, many computers have easy-to-remember names. For example, you could go to www.google.com. When you type that into your browser, your browser first finds which computer the Google web site is on, then asks that computer for a copy of the page. It then displays that page on the screen. What people don't often think about is the http:// that precedes the address. This defines the protocol that your browser will use to get the web page at www.google.com. In this case, it uses the HyperText Transfer Protocol. Essentially, it is asking for a web page. Other possible protocols include FTP (File Transfer Protocol), HTTPS (HTTP Secure), Telnet, and file, which allows you to view files from your own computer in your browser.
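As a rough illustration of the parts of a URL described above, here is a small Python sketch that splits an address into its protocol and host name using the standard urllib.parse module. The function name describe_url is made up for this example, and the ftp://example.org address is only a placeholder.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its protocol, host name, and path."""
    parts = urlsplit(url)
    return {"protocol": parts.scheme, "host": parts.hostname, "path": parts.path}


# The address used in the text above.
info = describe_url("http://www.google.com/")
print(info["protocol"], info["host"])

# A placeholder address that uses a different protocol.
other = describe_url("ftp://example.org/file.txt")
print(other["protocol"], other["host"], other["path"])
```

The same function works for any of the protocols listed above, because the protocol is simply whatever comes before the "://".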

Web pages

Now we get to the actual web page itself. Here, it is useful to use client/server terminology. The client/server model is useful in many areas, but I will stick to its application to web pages. In the simplest case, the server has information, and the client requests the information. For web pages, the server is a computer with a web page on it, and the client is your home PC. At the application level, your browser requests the web page from the server program. This separation between server and client is important. Some web technologies run on the client, and some run on the server. The interaction between the client and the server also matters.
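To make the request and response idea concrete, here is a toy Python sketch that runs entirely in memory, with no real network involved. The PAGES dictionary, the function names, and the /index.html path are all invented for illustration. A real browser and web server exchange much more information than this, but the first line of the request does have this general shape.

```python
# Hypothetical content held by the "server".
PAGES = {"/index.html": "<html><body>Hello</body></html>"}


def build_request(host, path):
    """The text a client (browser) sends to ask for a page over HTTP."""
    return "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, host)


def handle_request(request_text):
    """A toy server: read the requested path and return a status and a body."""
    request_line = request_text.split("\r\n")[0]
    method, path, version = request_line.split(" ")
    if method == "GET" and path in PAGES:
        return "200 OK", PAGES[path]
    return "404 Not Found", ""


status, body = handle_request(build_request("www.example.com", "/index.html"))
print(status)
print(body)
```

The client only ever sees the status and the document that come back. Everything inside handle_request stays on the server.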

Client-side

Anyone who has been to a web page has some idea what goes on client-side. You open up your browser, put in an address, and off you go, surfing the World Wide Web. What is less commonly known is that there is a great amount of disparity between browsers, which causes web developers no end of grief. Some of the major browsers that are available include Mozilla, Internet Explorer, Konqueror, Opera, Netscape, and Safari.

The largest element of the client side is HTML (HyperText Markup Language). Every web page is written in HTML. The tricky part is how each browser displays that HTML. I won't go into it, but it is important to remember that even though a page looks good in your browser, it might look awful in someone else's browser.

One of the latest web developments is the spread of CSS (Cascading Style Sheets). Again, browsers do not necessarily support CSS in a uniform or complete fashion. Briefly, CSS is used to change how a web page looks. First came CSS1, which mainly affects text. Then came CSS2, which can be used for layout. CSS2 has not been widely accepted for layout, but it is the future of web layout.

Some other technologies on the client side include JavaScript, Java applets, and Shockwave or Flash elements. JavaScript is embedded in the HTML and can be used for nice-looking and useful interactive elements. Menus that scroll out are often done with JavaScript. Java applets, Shockwave, and Flash elements are external files that are downloaded and embedded in the web page. Java applets are actually little programs that can do almost anything, but they are usually not practical or needed in most circumstances. Shockwave and Flash can provide very nice interfaces and display graphical content well, but they require expensive software to create and result in large files, which doesn't work well for those with dial-up Internet access.

Server-side

What happens on the server is generally invisible to the client. This is good for security: if you have information that you only want certain people to see, you can keep it on the server and make sure only the right people can see it. The server can also do a lot of processing that would be impractical for the client. Considering Google again, the Google servers store millions of records and return the results of clients' searches. It would be entirely impractical for Google's entire database to be present on everyone's computer and have everyone search through that database themselves. If that happened, searches would start taking hours instead of the 0.41-ish seconds that Google takes.

What happens on the server side is what determines whether a web page is static or dynamic. This is a static web page. When you want it, you ask the server for it, and the server returns exactly this page. In the case of a dynamic web page, the server must do a bit more work, and there are a lot of ways that it can do that work. One way is through Microsoft's ASP (Active Server Pages). Another is Perl scripts. With Java you can create JavaBeans. You can also embed PHP into HTML, which then gets processed by the server and passed to the client. In short, there are a lot of ways for a server to work before passing an HTML document to the client.

Now I'd like to talk a little bit about the actual server application. There are a couple of common web servers. Microsoft's server is called Internet Information Services (IIS). Probably the most widespread server is the Apache web server. Apache is a free program available on most platforms including Windows, Linux, and OS X. Everything on the server side goes through the web server. Often the work that must be done before a web page is ready requires an external program or perhaps a plugin, but this is all managed by the web server.

Another program that can be present on the server side is a database server. For example, a client requests a web page that has an element stored in a database. The web server will run a script through Perl, which will call the database server to return a piece of data. The script will then return an HTML document that the web server will send back to the client. Some common free database servers include MySQL and PostgreSQL. Perl is a programming language that works pretty well for a lot of dynamic content. A recent and very good contender in the dynamic content arena is PHP. PHP code is embedded in HTML. When a client requests, say, index.php, the server will pass it to the PHP interpreter, which will leave all the HTML as it is but run the PHP code, resulting in a complete HTML document. The server then sends it to the client.

There are two ways to get a dynamic web page. The first is to simply have a page that is constructed dynamically on the server with no input from the client. The other way involves the client passing the server parameters via CGI (Common Gateway Interface). The way to send parameters via CGI from the client is as follows. Using Google once again, I want to search for "stuff". I go to the web site and search for "stuff". Looking at the URL of the page it returns, I find http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=stuff&btnG=Search. We know that http:// is the protocol and www.google.com is the address, but what is all that other stuff? Well, search is the CGI script. The "?" indicates that parameters are to follow. Then there is a series of name=value pairs separated by ampersands. Without some further investigation, I'm not sure what everything is, but I see q=stuff, which looks like stuff is assigned to q. The script then takes that, and all of the other inputs, and returns the page based on that search. I can reasonably assume that instead of going through the main Google page, I can simply replace "stuff" in the URL with whatever search I want and get the results that way.
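The way the search URL breaks down can be checked with a short Python sketch using the standard urllib.parse module. The results_page function is a purely hypothetical stand-in for a server-side script. It has nothing to do with how Google actually works, and only shows a page being built from the q parameter.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def query_parameters(url):
    """Return the name=value pairs that follow the "?" as a dictionary."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps parameters such as "lr=" that have no value.
    pairs = parse_qs(query, keep_blank_values=True)
    # parse_qs returns a list of values per name; take the first of each.
    return {name: values[0] for name, values in pairs.items()}


def results_page(url):
    """Hypothetical server-side script: build an HTML page from q."""
    term = query_parameters(url).get("q", "")
    return "<html><body><p>You searched for: {}</p></body></html>".format(escape(term))


url = "http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=stuff&btnG=Search"
params = query_parameters(url)
print(urlsplit(url).path)  # the script being called
print(params)
print(results_page(url))
```

Changing the value after q= in the URL changes what results_page returns, which is the same trick as editing the search term directly in the address bar.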
Another factor is the lack of a continuous connection between the client and the server. As it works, the client sends a request. The server returns a document. Then the server promptly forgets the client. When you use Hotmail, Yahoo Mail, or anything that involves logging in, there needs to be some way for the server to know that the requests are still coming from you, and in fact who "you" are. When you click "next" to see the next email in your inbox, how is the server to know who is asking for the next email, and what the current email is? This disconnected architecture is inherently poor at maintaining an idea of state, but as you can see from the current state of the web, there are workarounds. This is one place where cookies come into play. In summary, the solutions involve both the client and the server.

In closing, it is clear that while it is blissfully invisible to the end user, an awful lot can go on server-side.
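One common workaround can be sketched in Python with the standard http.cookies module. This is only an assumption about how a webmail site might do it, not a description of any real service. The SESSIONS dictionary, the function names, and the cookie name "session" are all invented for the example. The server hands the client a random ID in a cookie, the client sends that cookie back with every request, and the server uses the ID to look up what it knows about that client.

```python
import uuid
from http.cookies import SimpleCookie

# Server-side memory: session ID -> what the server knows about that visitor.
SESSIONS = {}


def log_in(username):
    """Create a session and return the cookie text to send to the client."""
    session_id = uuid.uuid4().hex
    SESSIONS[session_id] = {"user": username, "current_email": 0}
    cookie = SimpleCookie()
    cookie["session"] = session_id
    return cookie["session"].OutputString()  # looks like "session=<id>"


def next_email(cookie_header):
    """Handle a click on "next": identify the client from its cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    session = SESSIONS[cookie["session"].value]
    session["current_email"] += 1
    return session["user"], session["current_email"]


header = log_in("alice")          # the client stores this cookie
first = next_email(header)        # and sends it back with each request
second = next_email(header)
print(first, second)
```

The server still forgets the connection after every request. The only thing that ties the requests together is the ID that the client keeps sending.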

Specific Technologies

In this next section I hope to explain in more detail some of the specific technologies that are key to creating a web page. There are a vast number of online resources pertaining to web development, so I am not trying to be comprehensive. I simply want to provide a solid theoretical foundation from which further research will make anything possible.

HTML

Tags and Elements

The essential element of an HTML document is the text. If you write some text in an HTML document, it will appear on the screen. What distinguishes an HTML document from a plain text document is the presence of tags. Tags change the way that the text appears. For example, if I write <em>word</em>, the browser will display the word with emphasis, usually in italics. As you can see, tags are enclosed in < and >. Any text will be printed on the screen unless it is inside a tag's angle brackets. The text after the start tag is affected by the tag, and the text stops being affected after the end tag. An end tag is the same as the start tag, but with a "/" immediately after the first "<".

There are a couple of different types of tags. Some tags are empty. They do not affect the following text and they do not have an end tag. <br> is such a tag. Other tags have an optional end tag, but for the sake of clean, easily understandable HTML, the end tag should be included.

Many tags accept attributes. Attributes are included after the name of the tag but still within the tag. The basic structure is as follows: <tag opt1="stuff" opt2="stuff stuff2 stuff3">My Content</tag>. Attributes are always in the start tag.

An element is the start tag, the content between the tags, and the end tag. There are two types of elements. Block-level elements distinguish blocks of text. They usually start on a new line and can contain other block-level elements or inline elements. Inline elements can only contain text or other inline elements. A practical example is an essay. This essay is composed of, say, five paragraphs. Each of these paragraphs would be in a block-level element. In this case, there is a nice tag for that: the paragraph, or p, tag. Now, within each paragraph, there can be additional formatting. If a word is italicized, it can be enclosed in the i (for italic) tag. Each paragraph begins on a new line, but the italic words simply appear in italics within the line.
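To see start tags, attributes, end tags, and text as separate pieces, here is a small Python sketch built on the standard html.parser module. The class name TagLister, the helper list_events, and the sample paragraph are made up for this illustration. It only lists the pieces in the order they appear, and is not how a browser actually renders a page.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record start tags (with attributes), end tags, and text in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def list_events(html_text):
    """Parse a piece of HTML and return the recorded pieces."""
    parser = TagLister()
    parser.feed(html_text)
    parser.close()
    return parser.events


# A block-level p element containing an inline em element.
events = list_events('<p class="intro">An <em>important</em> word</p>')
for event in events:
    print(event)
```

Note that the class attribute shows up only with the start tag of p, and that the em element sits entirely inside the p element.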

Miscellaneous Notes

You may wonder what you would do if you wanted to use a "<" in your text. The answer is entities. A special character entity begins with a "&" and ends with a ";". For example, "<" is &lt;. There are many entities that you can use.
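If the text is being generated by a program, the conversion to entities does not have to be done by hand. As a quick sketch, Python's standard html module has escape and unescape functions for this. The sample sentence is just an example.

```python
from html import escape, unescape

text = "if a < b & b > c"

# Replace the characters that have special meaning in HTML with entities.
encoded = escape(text, quote=False)
print(encoded)

# Going the other way turns the entities back into plain characters.
decoded = unescape(encoded)
print(decoded)
```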

White space is another interesting issue in HTML. In short, it has very little effect. Any amount of white space is treated as one space. I could start writing something in HTML, then press <enter> a dozen times, and continue writing. When viewed in a browser, it would look like one space. One advantage of this is that you can use indentation to make HTML documents more readable. The disadvantage (or maybe it's an advantage!) is that all of the formatting has to be done with HTML.
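The effect can be approximated with a one-line Python function. This is only a rough model of what a browser does with ordinary text, and the name collapse_whitespace is made up for this sketch.

```python
import re


def collapse_whitespace(text):
    """Treat any run of spaces, tabs, and newlines as a single space."""
    return re.sub(r"\s+", " ", text)


source = "I start writing,\n\n\n\n\n      and then continue."
print(collapse_whitespace(source))
```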

You can put comments in HTML. This is useful if you want to write complex HTML but help other people understand what you are doing. The format of a comment is: <!-- This is a comment -->

Letting users input text for searches or logging in is handled by form elements, which contain controls such as textarea and button.

Structure of an HTML document

Here is the basic structure of an HTML document.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
  <head>
    <title>My Webpage</title>
    <!-- other head elements -->
  </head>
  <body>
    <!-- all content -->
  </body>
</html>

The first line states what version of HTML is used in the following document. HTML documents will work without this statement, but it should be there to let the browser know what to do with the rest of the document. HTML 4.0 Transitional should work fine in most cases. The next line is the general wrapper tag for the entire HTML document. The rest of the document is split into two parts. The head element contains general information about the document. In this case, the title of the document is given using the title element. Client-side scripts as well as metadata and style sheets go in the head element. Finally, the body element contains all of the content of the document. Everything that you have read so far is in the body.

CSS

CSS (Cascading Style Sheets) is a recent addition to web page design. The idea behind CSS is to separate content from presentation. If you look at the source of this document, you'll see that I start out with a style sheet in the head, and then all of the content is pretty much straightforward. All of the HTML tags I use in this document simply define the structure of the document. All of the colors and fonts are defined in the CSS style sheet.

The design of a web site can be broken into three parts. The most obvious part is the content. The content is all of the words and the pictures that are present on a web site. Now, the content is meaningless if there is no structure. In an essay, the structural elements would include a name, date, class, etc. Then there would be a title. Then you have the paragraphs in the essay. Without this structure, you only have a string of words. Finally, you have the layout. Again using the essay example, the layout includes what font you use, what your page margins are, whether your name is in the upper right or upper left of the page, whether your title is in bold, etc.

So in a web page, HTML defines the structure and CSS defines the layout. Since these are separate, the layout can change drastically without having to even touch the content and structure. Instead of getting into details, here is some good information on CSS.

Links

There is a vast number of resources on the web. Here are some good ones.
