Intro to HTML: Here Are the Basics

Web pages are documents written in HTML, which stands for HyperText Markup Language. HyperText is simply “text that contains links to other text.” A markup language is a system for annotating text. So when we’re writing HTML, we’re simply “annotating text which happens to contain links.” Often you’ll hear the terms “document” and “web page” used interchangeably.

HTML annotations are made by using tags. The structure of a tag is explained in the figure below, using the anchor tag (for creating links) as an example.

Parts of an HTML tag explained.

You can see that to annotate a piece of text in a document, you enclose it within an opening and a closing tag. Tags start and end with angle brackets (the less-than and greater-than signs). The opening tag can include attributes that take the form of name-value pairs. The name of an attribute is followed by an equals sign, then the value for that attribute, which is enclosed in quotation marks. In the example above the href attribute is given the value of http://google.com. The closing tag </a> includes a forward slash just after the first angle bracket and just before the name of the tag that’s being closed. By convention, write tag names in lower case; browsers don’t care about the case of HTML tag names, but lower case is what you’ll see almost everywhere.
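In case the figure isn’t visible in your copy of this text, here is a rough reconstruction of the anchor tag it describes. The markup is pieced together from the description above (the href value and the “Google Homepage” link text), so treat it as a sketch rather than an exact copy of the figure:

```html
<!-- Opening tag with one attribute (href), the enclosed text, then the closing tag -->
<a href="http://google.com">Google Homepage</a>
```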

A tag is a description of the text within it. In the example above the <a> tag says, “The words ‘Google Homepage’ link to this other document, which can be found over at http://google.com.” Other tags might say, “This is a paragraph”, “this is the document’s title”, or “Holy crap! Loads of data here, this is a table.” HTML defines many tags that you can use to “mark up” (or annotate) your document.

Together with its attributes and the text inside it, a tag is often referred to as an element and you’ll hear the terms “tag” and “element” used interchangeably.

Tags in HTML may be nested. In fact, your entire document should be enclosed within an <html> tag. Nested in there are two other elements, <head> and <body>. A very boring HTML document using these three tags would look like this.

<html>
  <head></head>
  <body></body>
</html>

The top-level (or outermost) <html> element says “Herein lies an HTML document.” Both the <head> and <body> tags will have other tags nested within them, each one with a specific purpose. Here are descriptions of the most common ones.

A Long Aside About HTML Validity

Browsers have been notoriously forgiving of malformed HTML. Originally this was because the browser makers were afraid that if they showed error messages when encountering broken HTML, users would blame the browser instead of the document’s author and use another browser instead. So the early browsers were built to guess what the author intended and show that. If there’s a lesson in there for you, it’s that you don’t have to be perfect. Your end users will likely be able to read what you wrote even if you screwed up.

BUT...

There is a high likelihood that the formatting you intended will be all kinds of screwed up, and if your page is sufficiently complex this might be so confusing to your end users that they might not do what you want them to. As a bonus, different browsers will bork things up in bafflingly different ways. It’s worth learning how to write HTML properly to avoid as much of this potential confusion as possible.

There are a couple things you can do here:

  1. Make sure your tags are properly closed and properly nested
  2. Make sure that your attribute values are properly quoted

First, make sure to close your tags. Most elements require a closing tag. Forgetting to close a tag (like the emphasis tag in the example below) might render the rest of your document in italics:

<!-- Un-closed emphasis tag -->
<p>There were a <em>lot of people at the party.</p>
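For comparison, a corrected version might look like the sketch below. Which word was meant to be emphasized is a guess on my part; the point is only that the <em> element is closed before the paragraph ends:

```html
<!-- Emphasis tag closed before the paragraph's closing tag -->
<p>There were a <em>lot</em> of people at the party.</p>
```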

An element’s closing tag should not come before the closing tags of any of the elements it contains (its children), and a parent element should not contain a closing tag unless it also contains the corresponding opening tag. The following is an example of improperly nested tags:

<!-- Improperly nested strong tag (it overlaps the list item tag) -->
<li>Some <strong> whisky</li></strong>
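A properly nested version of the same markup would probably look like this, with the child element closed before its parent:

```html
<!-- Properly nested: the strong tag opens and closes inside the list item -->
<li>Some <strong>whisky</strong></li>
```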

To properly quote an attribute, you just need to make sure that the value is surrounded by matching single- or double-quotes. It’s a problem if you leave a quote off either end, or if they don’t match:

<!-- Mismatched quotes -->
<a href="example.com'>example.com</a>

<!-- Missing end-quote -->
<p style='border-width: 1px>The arctic fox shows some incredible adaptation for living in the cold…</p>
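Here is a sketch of how those two examples could be repaired. Either kind of quote works, as long as the opening and closing quotes match. The http:// prefix in the link is my own addition (an assumption about what the link was meant to point at), not part of the original example:

```html
<!-- Matching double quotes around the attribute value -->
<a href="http://example.com">example.com</a>

<!-- Matching single quotes, with the end-quote restored -->
<p style='border-width: 1px'>The arctic fox shows some incredible adaptation for living in the cold…</p>
```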

There are tools to help you check whether your markup is valid or not. It’s beneficial to use these when working on a project. Validators (or linters) can help point out subtle errors while you’re learning. I use them to troubleshoot problems when something has gone wrong but I’m not sure what it was.

The <body> and (some of) its Children

There should only be one <body> element in a document. It encloses all the content your readers are going to see.

Paragraph: <p>

The most common HTML element is the paragraph:

<p>My dog has fleas. Like, a lot of fleas. Like, the thing's not allowed in the house anymore. Josh let him in the other day and he got up on mom and dad's bed, and they were itching for a week.</p>

Headings: <h1>, <h2>, <h3>, <h4>, <h5> and <h6>:

Headings are used to signal sections of a document. The numbers in the names of these tags denote their rank, with the <h1> tag being the highest rank and the <h6> being the lowest. By default, browsers render headings in bold, with the <h1> tag largest and each successive rank smaller than the one preceding it. The title, headings and sub-headings on this page are rendered with different heading tags.

<h1>I'm the firstest, most importantest section</h1>
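To show how the ranks work together, here is a small sketch of a document outline. The heading text is made up for illustration; what matters is that lower-ranked headings sit under the higher-ranked heading they belong to:

```html
<h1>Taking Care of Your Dog</h1>
<p>An overview of the whole document goes here.</p>

<h2>Dealing with Fleas</h2>
<p>A section of the document.</p>

<h3>Washing the Bedding</h3>
<p>A sub-section of the section above.</p>

<h2>Feeding</h2>
<p>Another section, at the same rank as the fleas section.</p>
```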

Ordered List: <ol> and List Item: <li>:

Use these elements when you want to present a list and the ordering of the items matters. The <ol> tag defines the list as a whole. Individual list items are nested within the <ol>, and are contained in <li> tags. A list of instructional steps for operating a DVD player might be marked up like this:

<ol>
  <li>Remove DVD player from the box</li>
  <li>Plug the DVD player's electrical cable into an outlet</li>
  <li>Plug the HDMI cable into the appropriate ports on the television set and DVD player</li>
  <li>Press the DVD player's power button</li>
</ol>

which the browser would display as:

  1. Remove DVD player from the box
  2. Plug the DVD player’s electrical cable into an outlet
  3. Plug the HDMI cable into the appropriate ports on the television set and DVD player
  4. Press the DVD player’s power button

Note that the browser will automatically generate the item numbering. This is handy if you later decide you need to re-order the items or insert a new one somewhere in the middle of the list.
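As a quick illustration, suppose you later add a step near the top of the list. The new step below is hypothetical; notice that no numbers appear anywhere in the markup, so there is nothing to renumber by hand:

```html
<ol>
  <li>Remove DVD player from the box</li>
  <!-- Newly inserted step (made up for this example) -->
  <li>Remove the protective plastic from the DVD player</li>
  <li>Plug the DVD player's electrical cable into an outlet</li>
  <li>Plug the HDMI cable into the appropriate ports on the television set and DVD player</li>
  <li>Press the DVD player's power button</li>
</ol>
```

The browser would number these items 1 through 5 on its own.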

Unordered List: <ul> and List Item: <li>:

An unordered list works the same way as an ordered list does, only using the <ul> tag. It’s simply used for lists where the order doesn’t matter. List items are still marked up with the <li> tag. A grocery list might be written like so:

<ul>
  <li>Cheese</li>
  <li>Tortilla Chips</li>
  <li>Salsa</li>
  <li>Guacamole</li>
</ul>

Browsers have many options for what to render at the beginning of each list item, but by default they use a simple bullet:

  • Cheese
  • Tortilla Chips
  • Salsa
  • Guacamole

Emphasis: <em> and Strong: <strong>:

Emphasis in HTML can be noted by wrapping the <em> tag around the text you’d like to emphasize. Traditionally, browsers have rendered emphasized text in italics. Text within <strong> tags is generally rendered in bold:

<p>My dog has fleas. Like, a <em>lot</em> of fleas. Like, the thing's not allowed in the house anymore. Josh let him in the other day and he got up on mom and dad's bed. They were itching for a <strong>week</strong>.</p>

Which would render like this:

My dog has fleas. Like, a *lot* of fleas. Like, the thing’s not allowed in the house anymore. Josh let him in the other day and he got up on mom and dad’s bed. They were itching for a **week**.

Image: <img>:

Images are included in your HTML document by providing the URL of the image in the src attribute of the <img> tag:

<img src="http://example.com/images/dove.jpg" alt="White dove flying above castle ramparts" height="50" width="50">

The src attribute is required, and holds a path or URL where the browser can find the image. The alt attribute provides a text description of the image; it is what screen reading applications read aloud to visually impaired people, and what the browser shows if the image fails to load. (The title attribute, by contrast, only supplies optional tooltip text.) It’s also a best practice to include the image’s dimensions in the height and width attributes, written as plain numbers of pixels with no unit. This lets the browser reserve space for the image before it has finished loading.

Note

Paths and URLs

In HTML, links and references to other files can be either relative or absolute. The src attribute in the image tag above is an example of an absolute path; the browser has been given the entire address of the image to load. A convenient shorthand for absolute paths is to omit the beginning of the URL all the way up to (but not including) the first forward slash after the domain name. A forward slash at the beginning implies that the browser should look for this image file on the same domain as the current document. So the example above could be shortened to:

<img src="/images/dove.jpg" alt="White dove flying above castle ramparts" height="50" width="50">

as long as the document that includes this tag is also found on the example.com domain.

In the example above, the browser would attempt to find dove.jpg at http://example.com/images/dove.jpg. The single forward slash is a convenient shorthand for the domain name. It is possible to include files and other images from domains that you don’t control. Some administrators are OK with you doing that, but it’s very rude to do so without permission.

A relative URL does not begin with a forward slash (and does not include the domain name), e.g.:

<img src="images/dove.jpg">

This would instruct the browser to begin looking in the same folder that the current document is in, i.e. to start looking relative to the current document. For example, if the document that includes this tag is found at http://example.com/castles/buckingham.html, then the browser would look for dove.jpg inside /castles/images.
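To put the three forms side by side, here is a sketch that assumes the document containing these tags lives at http://example.com/castles/buckingham.html, as in the example above. The comments show where the browser would look in each case; the short alt text is only there to keep the markup complete:

```html
<!-- Absolute URL: loaded from exactly this address -->
<img src="http://example.com/images/dove.jpg" alt="White dove">

<!-- Leading slash: looked up from the top of the current domain,
     i.e. http://example.com/images/dove.jpg -->
<img src="/images/dove.jpg" alt="White dove">

<!-- Relative: looked up from the current document's folder,
     i.e. http://example.com/castles/images/dove.jpg -->
<img src="images/dove.jpg" alt="White dove">
```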

Named Character References (Entities):

Often you’ll want to include symbols in your content that have special meaning in HTML, like the greater-than or less-than symbols. Because these symbols denote tags in HTML, browsers do confusing things when you include them directly in the text of a document. To get around this, HTML provides named character references. Often called entities, these references allow you to include characters in your content that the browser might otherwise interpret as part of a tag. The most common ones you’d need to know are:

  • &gt; which denotes the greater-than sign: >
  • &lt; which denotes the less-than sign: <
  • &amp; which denotes the ampersand: &
  • &quot; which denotes the double-quote: "

So if you wanted to quote a mathematical inequality for your readers, you’d do this:

<p>and I quote, &quot;5 &gt; 3&quot;</p>

Which outputs:

and I quote, "5 > 3"

There are many more named character references, but the above four are crucial because of their special meaning in HTML. Typing the character in your content when you should have used the entity is likely to cause errors in the way your page displays to users.

These are the most common and most basic elements you’ll use when marking up content for your site’s visitors. The next chapter will cover some more elements and talk more in-depth about HTML validity.
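One place this comes up a lot is when you write about HTML itself. Here is a small sketch (the sentence is made up) that uses entities so the browser shows the tag names as text instead of treating them as real tags:

```html
<!-- &lt; and &gt; stand in for the angle brackets, &amp; for the ampersand -->
<p>A paragraph starts with &lt;p&gt; and ends with &lt;/p&gt;. Tags &amp; attributes are described above.</p>
```

A browser would display this as: A paragraph starts with <p> and ends with </p>. Tags & attributes are described above.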

The <head> and its Children

The <head> element should be the first element within an <html> tag of a document. The contents of this tag are not displayed to the user, but rather are a collection of data about the document itself. The elements you’ll use most often in the head of a document are:

Document Title: <title>

This tag defines the title of the document, e.g.:

<title>The elements of HTML</title>

The contents of the <title> element are usually displayed in the browser’s title bar or the browser tab the document is being viewed in. It has no attributes.

Meta: <meta>

The meta element is generic. It’s also a little different from most of the tags we’ve seen; it’s self-closing, i.e., it doesn’t have a closing tag (the <img> tag works the same way). Instead of including the information between an opening and closing tag, all the information is contained within a pair of attributes: name and content. The name attribute acts like the name part of any other tag: it describes what the tag contains. The content attribute contains the actual information. The two most common values for the name attribute are:

description:
A description is a short, accurate summary of the contents of the document.
author:
The name of the document’s author

An example of two meta tags using these values for the name attribute, along with matching content:

<meta name="author" content="Trevor Hunsaker, from http://saturdayplace.com">

<meta name="description" content="A short summary of the common tags used in creating HTML documents">
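To see where these pieces fit, here is a sketch of a small but complete document that reuses the title and meta tags from above. The contents of the body are made up for illustration:

```html
<html>
  <head>
    <!-- Data about the document; none of this appears in the page itself -->
    <title>The elements of HTML</title>
    <meta name="author" content="Trevor Hunsaker, from http://saturdayplace.com">
    <meta name="description" content="A short summary of the common tags used in creating HTML documents">
  </head>
  <body>
    <!-- Everything the reader actually sees goes here -->
    <h1>The elements of HTML</h1>
    <p>This paragraph is displayed in the browser window.</p>
  </body>
</html>
```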

About Indentation in the examples

The examples above include indentation and line breaks to make the code easier to read, but in HTML these are totally optional. If you like, you can write everything on one line. The code would be perfectly valid, and browsers wouldn’t have any trouble with it. But doing so would make the markup hard to understand when you want to come back to it and make a change. Readability and maintainability are key concepts when it comes to writing any kind of code, including HTML.
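As a rough comparison, here is the grocery list from earlier written both ways. A browser should display the two versions identically, but the second is much easier to read and edit:

```html
<!-- Everything on one line: valid, but hard to scan -->
<ul><li>Cheese</li><li>Tortilla Chips</li><li>Salsa</li><li>Guacamole</li></ul>

<!-- The same list with line breaks and indentation -->
<ul>
  <li>Cheese</li>
  <li>Tortilla Chips</li>
  <li>Salsa</li>
  <li>Guacamole</li>
</ul>
```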

Exercise: Create an HTML document

Create an HTML document from scratch.

Hint

On Windows

Right-click anywhere you’d like to create the file, and select “New > Text File”. When prompted for a name, you’re free to call it anything you like (I’d recommend leaving spaces out of file names; I’ll explain why later). Make sure to change the extension from .txt to .html.

If your computer isn’t configured to show file extensions (the part after the dot in the file name), Microsoft has instructions on their site.

Hint

On OS X

TODO: Need to research this, as I haven’t got a Mac and haven’t used one since college. (If anyone reading this has actually gotten this far and wants to type out some short instructions for me to include here, I’ll include your name as a contributor.)

Once you’ve got your document created, open it in a text editor. Notepad will suffice on Windows, TextEdit (switched to plain text mode) on OS X. Do not use a word processor (like Microsoft Word™); these will insert steaming piles of garbage into your files that you won’t want to deal with later. Do not use anything that will create Rich Text files with formatting. We don’t want formatting of any kind here.

Pretend you’re writing a manual about how to accomplish something from your job, and you’d like to put it on the company’s internal documentation server for anyone in the company to read. Pick a task that you’d document and mark it up using the tags described above.

If you’re still a student, pick something you know how to do really well, and use HTML to create a document that would teach someone else.

Once you think you’ve completed it, open the document in a web browser (keep it open in your text editor too). The simplest way to do this is to press the control-O keyboard shortcut in your browser (command-O on a Mac), then select your file. Take a look at how it’s displayed in the browser. If the formatting doesn’t look the way you expected, go back to your text editor and see if you can find any tags you forgot to close, or any that were nested improperly. Make any necessary changes, and save the file. Then switch back to your browser and refresh the document (F5 is the usual shortcut for refresh on Windows, command-R on a Mac). Switch back and forth between your text editor and your browser until you’ve gotten everything looking the way you expect.
