Writing HTML is exciting, often frustrating and usually very time-consuming. The following are some of the rules I have learned the hard way.
Content maps onto files and directories, subsections map onto subdirectories, subsubsections map onto subsubdirectories. Full stop. Don't fragment the text and don't use deeply nested directories.
Put the most important page in each directory in a file with a fixed name, e.g. main.html. Links from one directory to another point to main.html, not to the less important files. If the links turn into spaghetti, the content does not properly match the files.
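The fixed-name rule can be checked mechanically. The following is a minimal sketch, assuming the fixed name is main.html as in the example above; the function name dirs_missing_main is made up for illustration.

```python
import os

def dirs_missing_main(root, main_name="main.html"):
    """Return every directory under root (root included) that has no main_name file."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if main_name not in filenames:
            missing.append(dirpath)
    return sorted(missing)
```

Directories that hold only images, such as an icon directory, will show up in the result and can simply be ignored.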
Put all images used as icons or decoration into a single directory. All instances of an icon in the HTML code point to the same physical file in the icon directory. This generates a uniform style and makes it possible to change the style with a minimum of effort.
Use short filenames, at most 8 characters. The filenames should be speakable: before you know it, people will call and ask you to spell out double-u double-u double-u dot ...
All directory names should be unique. While standardized filenames help links point to the proper file, unique directory names make it possible to track down and fix file-not-found bugs.
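Uniqueness of directory names is easy to lose track of by hand. Here is a small sketch, not a definitive tool, that lists every directory name used more than once under a tree; the function name duplicate_dir_names is an assumption made for this example.

```python
import os
from collections import defaultdict

def duplicate_dir_names(root):
    """Map each directory name that occurs more than once to the paths using it."""
    seen = defaultdict(list)
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            seen[name].append(os.path.join(dirpath, name))
    # Keep only the names that are shared by two or more directories.
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

An empty result means every directory name in the tree is unique.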
Filenames and usage should be parallel: somefile.gif, somefile.txt and somefile.dat are related to somefile.html, not to otherfile.html.
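One way to keep filenames parallel is to look for auxiliary files that have no page of the same name beside them. The sketch below assumes that pages end in .html and that a companion file lives in the same directory as its page; both are assumptions, and orphaned_companions is a hypothetical name.

```python
import os

def orphaned_companions(root, page_ext=".html"):
    """Return non-page files whose base name matches no page in the same directory."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Base names of all pages in this directory, e.g. {"somefile"}.
        pages = {os.path.splitext(n)[0] for n in filenames
                 if os.path.splitext(n)[1] == page_ext}
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext != page_ext and stem not in pages:
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)
```

Everything in a shared icon directory will be reported, since icons deliberately belong to no single page.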
Before releasing the HTML, make sure that your filenames are stable. The users will create links to just about any file and complain if you ever change the filename or move the file.
For security reasons, never create HTML links or, even worse, UNIX soft or hard links that point from your WWW directory tree to files or directories outside this tree.
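Soft links that leave the tree can be found automatically. The following sketch walks a directory tree and reports every symbolic link whose resolved target lies outside it. The function name is made up for this example, and hard links are not covered, since they cannot be told apart from ordinary files this way.

```python
import os

def links_escaping_tree(root):
    """Return (link, target) pairs for symlinks under root that resolve outside root."""
    root_real = os.path.realpath(root)
    escaping = []
    # os.walk does not follow symlinks by default, so the walk itself stays inside root.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                if os.path.commonpath([root_real, target]) != root_real:
                    escaping.append((path, target))
    return sorted(escaping)
```

Running it on the top of the WWW directory tree before each release gives a quick list of links to remove.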
<HTML>
<HEAD>
<TITLE>The title </TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
</BODY>
</HTML>
Don't be creative outside the body. Some browsers treat text outside the body in surprising ways.
If at all possible, use straight HTML. Don't even think about creating fancy lettering by representing each character or each word as an image. Not only will the page load extremely slowly, but different browsers may position the images slightly differently and the result may be unreadable.
Before you consider using an HTML editor, make the following
test:
Use the HTML editor to create a reasonably complex
page. Exit the HTML editor. Use a conventional text editor to create
a few syntax errors in the file, e.g. delete the <UL> from a
<UL> <LI></UL> construct.
Exit the text editor and reopen the file in the HTML editor.
Some HTML editors will very aggressively edit the file until it
appears syntactically correct. The result may bear no resemblance
to the original file.
To include equations in HTML write the math as LaTeX:
\documentstyle[12pt]{article}
\pagestyle{empty}
\begin{document}
\large
\begin{displaymath}
\sf E=mc^2
\end{displaymath}
\end{document}
store in a file, here einstein.tex, convert to gif
latex einstein
dvips einstein
gs -q -dNOPAUSE -sDEVICE=ppmraw -sOutputFile=- einstein.ps | \
pnmcrop | \
ppmtogif > einstein.gif
and include as an image:
<IMG SRC="einstein.gif" ALT="E=mc^2">
If you want to include the same HTML-elements in a few different files, use an editor.
If you want to include the same HTML elements in many different files, write a Perl script that generates the files.
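As an illustration of the generator approach, here is a minimal sketch in Python rather than Perl. The shared elements (the BODY colors and a footer link back to main.html), the template layout and the function names are all assumptions chosen for the example.

```python
import os
from string import Template

# Shared skeleton; $title and $body are the only parts that differ per page.
PAGE = Template("""<HTML>
<HEAD>
<TITLE>$title</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
$body
<HR>
<A HREF="main.html">Back to the main page</A>
</BODY>
</HTML>
""")

def render_page(title, body):
    """Return one complete page built from the shared skeleton."""
    return PAGE.substitute(title=title, body=body)

def write_pages(pages, directory="."):
    """Write each {filename: (title, body)} entry as a file in directory."""
    for filename, (title, body) in pages.items():
        with open(os.path.join(directory, filename), "w") as out:
            out.write(render_page(title, body))
```

Changing the shared footer then means editing the template once and regenerating, instead of editing every file.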
If you want to include the same page in different places in your project, use a soft-link.
Use indentation to format the source for lists and tables. Finding and fixing an error is much easier if the source has at least some resemblance to the output.
<H1>, <H2> ... <H6>, <OL>, <UL>, <P> and <HR> end the current paragraph and should each be on a line by themselves in the source.
When you create links to files in the current directory, use the
simplest possible form:
<A HREF="somefile.html">dyt dyt</A>
Don't forget the quotes around the filename. Some browsers react
in surprising ways if one or both quotes are missing.
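Missing quotes are easy to overlook when reading the source. The sketch below scans a page for HREF values that lack the opening quote, the closing quote, or both. It is a rough check built on a regular expression, not an HTML parser, and it assumes that link targets contain no spaces; the names are made up for this example.

```python
import re

# Capture everything after HREF= up to the next whitespace or '>'.
HREF_VALUE = re.compile(r'href\s*=\s*([^\s>]*)', re.IGNORECASE)

def badly_quoted_hrefs(html):
    """Return the raw HREF values that are not wrapped in a matching pair of quotes."""
    bad = []
    for value in HREF_VALUE.findall(html):
        quoted = len(value) >= 2 and value[0] in "\"'" and value[-1] == value[0]
        if not quoted:
            bad.append(value)
    return bad
```

For example, a page containing HREF=a.html and HREF="b.html (no closing quote) yields both values, while a properly quoted HREF="c.html" is not reported.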
When linking to a file in a different directory,
relative links, e.g.
<A HREF="../../somefile.html">dyt dyt</A>
will cause problems when the user accesses the file through a
bookmark or a search engine.
However, absolute links, e.g.
<A HREF="http://www.somewhere.com/subdir/somefile.html">dyt dyt</A>
cause performance problems.
As a compromise, make one absolute link from each page back to the main page and make all other links relative.
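Whether a page follows this compromise can be checked by sorting its links into absolute and relative ones. The following is a sketch under stated assumptions: the main page URL is passed in by the caller, links to other hosts are treated as external and left alone, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def classify_links(html):
    """Return (absolute, relative) lists of link targets; absolute means it has a scheme."""
    parser = LinkCollector()
    parser.feed(html)
    absolute = [h for h in parser.hrefs if urlsplit(h).scheme]
    relative = [h for h in parser.hrefs if not urlsplit(h).scheme]
    return absolute, relative

def follows_compromise(html, main_url):
    """True if the only absolute link to our own host is a single link to main_url."""
    site = urlsplit(main_url).netloc
    absolute, _relative = classify_links(html)
    internal = [h for h in absolute if urlsplit(h).netloc == site]
    return internal == [main_url]
```

With a hypothetical main page at http://www.somewhere.com/main.html, a page passes if that URL is its only absolute link into www.somewhere.com and every other internal link is relative.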
Links to external sites are always absolute, but don't make the external site come up in a new browser. Taking control over the user's display by starting a new browser is both bad judgement and bad taste.
Don't underline text: underlined text looks like a link.
If the color of the links reduces the readability of the text, consider using LINK="#000000" VLINK="#000000" in the <BODY> tag.
If comments are necessary, put the comments on filename.html in the file filename.txt and then run chmod go-rw filename.txt so that group and other users can neither read nor write it.
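When many comment files are involved, the permission change can be scripted. This is a small sketch of what chmod go-rw does, written in Python; the function name is an assumption, and the code only clears the group and other read and write bits, leaving all other permission bits as they were.

```python
import os
import stat

def remove_group_other_rw(path):
    """Clear read and write permission for group and others, like `chmod go-rw path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    blocked = stat.S_IRGRP | stat.S_IWGRP | stat.S_IROTH | stat.S_IWOTH
    os.chmod(path, mode & ~blocked)
```

Calling remove_group_other_rw("filename.txt") on a file with mode 664 leaves it at 600.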
If an image is a clickable map, make sure that this is advertised. If not, the user may not discover that the image is linked to more than one page. The user may even fail to discover that the image is clickable.
Background colors may defeat printing: a page that is readable on screen can come out unreadable on paper.
Don't insult color-blind people by using red and green as alternatives without additional visual cues.
Browse your pages using Netscape and MSIE and check that the pages look similar.
Browse your pages using a very narrow and a very wide window to check for implicit assumptions about the size of the window.
Avoiding links to a file is not enough to prevent it from being found by robots. Some robots take a wholesale approach to searching.
If you have sensitive information, store it outside the WWW directory for the system and outside the WWW directories for users. UNIX will then severely restrict access by the WWW server, while local users can browse the files without going through the server.
From time to time, use various search engines to search for your own files. Check that the files you want the engine to know about appear near the top of the result. Also check that the files you don't want the engine to know about do not appear in the result at all.