About HTML

Writing HTML is exciting, often frustrating and usually very time-consuming. The following are some of the rules I have learned the hard way.

File structure

Organize the files properly.

Content maps onto files and directories: subsections map onto subdirectories, and subsubsections map onto subsubdirectories. Full stop. Don't fragment the text and don't use deeply nested directories.

Put the most important page in each directory in a file with a fixed name, e.g. main.html. Links from one directory to another point to main.html, not to the less important files. If the links turn into spaghetti, the content does not properly match the files.

Put all images used as icons or decoration into a single directory. All instances of an icon in the HTML code should point to the same physical file in the icon directory. This produces a uniform style and makes it possible to change the style with a minimum of effort.

Use short filenames, max 8 characters. The filenames should be speakable. Before you know it, people will call and ask you to spell out "double-u double-u double-u dot .."

All directory names should be unique. While standardized filenames help the links point to the proper file, unique directory names make it possible to track down and fix file-not-found bugs.

Filenames and usage should be parallel: somefile.gif, somefile.txt and somefile.dat are related to somefile.html, not to otherfile.html.

Before releasing the HTML, make sure that your filenames are stable. The users will create links to just about any file and complain if you ever change the filename or move the file.

For security reasons, never create HTML links or, even worse, UNIX soft or hard links that point from your WWW directory tree to files or directories outside this tree.
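
Soft links that leave the tree can be found mechanically. The following is a minimal Python sketch, not a definitive tool: the function names escapes_tree and find_escaping_links are made up for illustration, and a UNIX file system is assumed. Hard links cannot be found this way, because a hard link is indistinguishable from the file itself.

```python
import os


def escapes_tree(root, path):
    """Return True if path, with soft links resolved, lies outside root."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(path)
    # commonpath() returns the longest shared leading directory.
    return os.path.commonpath([real_root, real_path]) != real_root


def find_escaping_links(root):
    """Walk root and list the soft links whose targets lie outside root."""
    offenders = []
    # os.walk() does not follow soft links to directories by default,
    # so such links show up in dirnames and are inspected, not entered.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full) and escapes_tree(root, full):
                offenders.append(full)
    return offenders
```

Calling find_escaping_links on the top of the WWW directory tree returns the offending links; an empty list means none were found.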

Source

Always use the structure:

<HTML>
<HEAD>
<TITLE>The title </TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">


</BODY>
</HTML>

Don't be creative outside the body. Some browsers treat text outside the body in surprising ways.

Writing HTML

Before you consider writing your pages in another language and using a program to convert to HTML, check two things:
  1. Some converters generate HTML which is full of graphics. The time it takes a browser to load a page may be unacceptable.
  2. Some converters generate HTML which is too complex. If the licence expires, the vendor folds or you upgrade to incompatible hardware, you may not be able to work from the HTML.

If at all possible, use straight HTML. Don't even think about creating fancy lettering by representing each character or each word as an image. Not only will the page load extremely slowly, but different browsers may position the images slightly differently and the result may be unreadable.

Before you consider using an HTML editor, make the following test:
Use the HTML editor to create a reasonably complex page. Exit the HTML editor. Use a conventional text editor to create a few syntax errors in the file, e.g. delete the <UL> from a <UL> <LI></UL> construct. Exit the text editor and enter the HTML editor. Some HTML editors will very aggressively edit the file until it appears syntactically correct. The result may have no resemblance to the original file.

Math

Although straight HTML is preferable with respect to performance, you may have to represent even simple equations as images. Browsers consider the points between HTML tags good places to break the line, and E=mc<SUP>2</SUP> may end up with "E=mc" on one line and "2" at the beginning of the next.

To include equations in HTML write the math as LaTeX:

\documentclass[12pt]{article}
\pagestyle{empty}
\begin{document}
\large
\begin{displaymath}
\sf
E=mc^2
\end{displaymath}
\end{document}

Store this in a file, here einstein.tex, and convert it to GIF:

        latex einstein
        dvips einstein -o einstein.ps
        gs -q -dNOPAUSE -dBATCH -sDEVICE=ppmraw -sOutputFile=- einstein.ps | \
                pnmcrop | \
                ppmtogif > einstein.gif

Then include it as an image: <IMG SRC="einstein.gif" ALT="E=mc^2">

Including HTML

Don't. The server is busy enough without having to process the pages before serving them to the world. Further, if and when security-related problems are discovered in the server, the server will be patched or replaced without notice, and pages that depend on server-side processing may stop working.

If you want to include the same HTML-elements in a few different files, use an editor.

If you want to include the same HTML elements in many different files, write a Perl script which will generate the files.

If you want to include the same page in different places in your project, use a soft link.
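
The script approach for many files could be sketched as follows. The text suggests Perl; this sketch uses Python instead, and the names PAGE, build_page and write_pages, as well as the layout of the shared footer, are hypothetical. Each body is wrapped in the fixed page structure from the Source section, so the shared elements live in one place.

```python
import os

# Shared skeleton: edit it here and regenerate to change every page.
PAGE = """<HTML>
<HEAD>
<TITLE>{title}</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
{body}
<HR>
<ADDRESS>{address}</ADDRESS>
</BODY>
</HTML>
"""


def build_page(title, body, address):
    """Return one complete page with the shared elements filled in."""
    return PAGE.format(title=title, body=body, address=address)


def write_pages(pages, outdir, address):
    """Write {filename: (title, body)} into outdir; return the paths written."""
    written = []
    for filename, (title, body) in sorted(pages.items()):
        path = os.path.join(outdir, filename)
        with open(path, "w") as handle:
            handle.write(build_page(title, body, address))
        written.append(path)
    return written
```

A call such as write_pages({"main.html": ("Main page", "<P>Welcome.")}, ".", "webmaster@example.com") regenerates the files; the address shown is a placeholder.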

Title

Always supply a <TITLE>title</TITLE>. When the user creates a bookmark, the browser will use the title to label the bookmark.

Tags

Write tags in uppercase, e.g. <P>, to make them stand out in the source.

Use indentation to format the source for lists and tables. Finding and fixing an error is much easier if the source has at least some resemblance to the output.

<H1>, <H2> ... <H6>, <OL>, <UL>, <P> and <HR> end the current paragraph and should each be on a line by themselves in the source.

Address

Always supply the email address of the webmaster between <ADDRESS> and </ADDRESS> at the bottom of the page. This allows automatic checking and updating of the webmaster address.
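
What automatic checking and updating might look like is sketched below in Python. This is only an illustration under stated assumptions: the helper names webmaster_address and replace_webmaster_address are invented here, and the pattern assumes a plain <ADDRESS> tag without attributes.

```python
import re

# Matches the text between <ADDRESS> and </ADDRESS>, in either letter case.
ADDRESS_RE = re.compile(r"<ADDRESS>(.*?)</ADDRESS>", re.IGNORECASE | re.DOTALL)


def webmaster_address(html):
    """Return the text of the last ADDRESS element, or None if there is none."""
    matches = ADDRESS_RE.findall(html)
    return matches[-1].strip() if matches else None


def replace_webmaster_address(html, new_address):
    """Return html with the content of every ADDRESS element replaced."""
    # A function is used as the replacement so that backslashes in
    # new_address are not interpreted as regular-expression escapes.
    return ADDRESS_RE.sub(
        lambda match: "<ADDRESS>" + new_address + "</ADDRESS>", html
    )
```

A page for which webmaster_address returns None is missing its address and can be flagged.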

Links

Be very careful about uppercase and lowercase in the filenames in links. Browsers and servers interact in strange ways, and filenames on the WWW are neither reliably case-sensitive nor reliably case-insensitive.
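
Case problems in links can be caught before the pages are released. The Python sketch below is one possible check, not a complete link checker: the name case_mismatches is hypothetical, only quoted links to files in the current directory are examined, and the list of filenames is assumed to come from a directory listing.

```python
import re

# Quoted HREF values without ':', '/', '#' or '?', i.e. plain filenames
# in the current directory. Absolute and cross-directory links are skipped.
HREF_RE = re.compile(r'HREF="([^"#?:/]+)"', re.IGNORECASE)


def case_mismatches(html, filenames):
    """Return (link, filename) pairs that differ only in letter case."""
    exact = set(filenames)
    by_lower = {name.lower(): name for name in filenames}
    problems = []
    for target in HREF_RE.findall(html):
        if target not in exact and target.lower() in by_lower:
            problems.append((target, by_lower[target.lower()]))
    return problems
```

Every pair returned is a link that works only on a server that ignores case.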

When you create links to files in the current directory, create the simplest possible form:
<A HREF="somefile.html">dyt dyt</A>
Don't forget the quotes around the filename. Some browsers react in surprising ways if one or both quotes are missing.

When linking to a file in a different directory, relative links, e.g.
<A HREF="../../somefile.html">dyt dyt</A>
will cause problems when the user accesses the file through a bookmark or a search engine. However, absolute links, e.g.
<A HREF="http://www.somewhere.com/subdir/somefile.html">dyt dyt</A>
cause performance problems.

As a compromise, make one absolute link from each page back to the main page and make all other links relative.

Links to external sites are always absolute, but don't make the external site come up in a new browser window. Taking control over the user's display by starting a new browser window is both bad judgement and bad taste.

Don't underline text: underlined text looks like a link.

If the color of the links reduces the readability of the text, consider using LINK="#000000" VLINK="#000000" in the <BODY> tag.

Comments

Don't use comments. If you don't want people to see the text of the comment, they will use "View source" to do so. Except, of course, for the guy the comment was intended for.

If comments are necessary, put the comments to filename.html in the file filename.txt and then chmod go-rw filename.txt

Frames

Avoid frames. Frames defeat bookmarking and printing, and interact in strange ways with external links.

Images

When an image is an anchor, use BORDER=0 to avoid double frames around the image.

If an image is a clickable map, make sure that this is advertised. If not, the user may not discover that the image is linked to more than one page. The user may even fail to discover that the image is clickable.

Color

Make text appear black on white. You may need to include graphics, and you will probably want to control whether the background colors of the page and of the graphics differ.

Background color may defeat printing.

Don't insult color-blind people by using red and green as alternatives without additional visual cues.

Checking

Use weblint to check for syntactic problems.

Browse your pages using Netscape and MSIE and check that the pages look similar.

Browse your pages using a very narrow and a very wide window to check for implicit assumptions about the size of the window.

Robots

If you don't want the material included in various search engines, ask the webmaster to apply robot exclusion. This can be done at the directory level.
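
Robot exclusion is commonly expressed in a robots.txt file at the top of the server. As a hedged illustration, the Python sketch below holds a hypothetical exclusion file that closes one directory and uses the standard urllib.robotparser module to confirm what a well-behaved robot would be allowed to fetch. The directory name /private/ and the helper name allowed are assumptions, not part of any real setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, url, agent="*"):
    """Return True if the exclusion rules permit agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, allowed(ROBOTS_TXT, "/main.html") is true and allowed(ROBOTS_TXT, "/private/notes.html") is false. The rules are only a request, and a robot is free to ignore them.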

Avoiding links to a file is not enough to prevent it from being found by robots. Some robots take a wholesale approach to searching.

If you have sensitive information, store it outside the WWW directory for the system and outside the WWW directories for users. UNIX will then severely restrict access by the WWW server, while local users can browse the files without going through the server.

From time to time, use various search engines to search for your own files. Check that the files you want the engine to know about appear near the top of the result. Also check that the files you don't want the engine to know about do not appear in the result at all.


Author: Per Stoltze