Why HTML5 is more semantic

December 17, 2012

Anyone involved in Web design or development will have come across the term "semantic" with reference to HTML5 and the Web in general. This often problematic term is understandably confusing to many of us, particularly since there is a distinct lack of consensus on its definition in certain contexts.

In this article, we will explore what makes HTML5 more semantic than its predecessors, outlining what this means for Web development now and in the future.

Semantics is about meaning

The concept of semantics comes from the field of linguistics dedicated to the study of meaning. With natural languages such as English, we distinguish between syntax (or grammar) and meaning. If you think of a sentence, meaning has to do with how people interpret it:

"The man threw the computer through the window."

Semantics relates to the aspect of the sentence that allows people reading it to understand the message contained within it. Together with syntax, semantics is a big part of what facilitates communication via language. When we talk about semantics in relation to HTML, we are talking about communication between computer programs, not humans. Semantic HTML is essentially aimed at enhancing the extent to which applications can process, or interpret, Web content. For example, consider the following Web page excerpt containing some of the longer-standing HTML structures:

<p>The man threw the computer through the window.</p>
<img src="broken_window.png" alt="Broken Window" />

The elements (and attributes) give the browser information about how to present the content to the user. Paragraph elements will by default be displayed with whitespace above and below them, image elements are displayed using the image file referenced by the src attribute, and so on. When the browser encounters each of these elements, it renders the content in a particular way that is ultimately determined by the tags used.

HTML structures already have meaning

It is important to understand that HTML5 does not introduce semantics to HTML for the first time. HTML already had a level of semantics built-in. The existing HTML structures are meaningful to varying degrees. If you look at this familiar HTML element as included in the above excerpt you'll see what I mean:

<img src="broken_window.png" alt="Broken Window" />

Although it's abbreviated, the name of the element img indicates something meaningful about the content of the tag, i.e. that it's an image. In this way, you can think of the semantic aspect of HTML as being similar to metadata, in that the element tag and attribute names describe the data (the data in a web page being the element and attribute content).

Remember when we started separating content from style?

Some of the structures we've used in HTML tell the browser how to style the content items in a page. As time has passed, we have been encouraged to separate the formatting of a page from its content.

For example, many of us replaced the i tag with em for emphasized text. The em tag is more meaningful and does not tell the browser exactly how to display the text inside the element. (HTML5 keeps i, but redefines it to mark text in an alternate voice or mood, such as a foreign phrase or technical term, rather than to request italics.) The purpose of using em rather than i is to convey information about the nature of the content item, rather than information about styling it. Using em does of course affect the style, which is often the main reason we use it, but it leaves the details of the style up to the browser and/or CSS code, ideally kept separate from the page markup.
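As a rough sketch of that separation, the small page below marks up the same word both ways and then restyles em from a stylesheet. The sentence is borrowed from the earlier excerpt; the page title and the CSS rule are assumptions made purely for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Emphasis example</title>
  <style>
    /* Illustrative rule: the markup still says "emphasis",
       while the stylesheet decides what emphasis looks like */
    em {
      font-style: normal;
      font-weight: bold;
    }
  </style>
</head>
<body>
  <!-- Presentational habit: tag chosen for how the text should look -->
  <p>The man threw the <i>computer</i> through the window.</p>

  <!-- Semantic alternative: tag says the word is emphasized -->
  <p>The man threw the <em>computer</em> through the window.</p>
</body>
</html>
```

With this stylesheet the emphasized word is shown in bold rather than italics, yet the markup itself has not changed, which is the point of keeping meaning and presentation apart.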

Semantic HTML5 is a bigger step in this process. The ultimate goal is to create a system in which applications have access to a greater level of meaning — this isn't AI though, it's just about including descriptive information about data items within the code structures modeling them.

Isn't this similar to XML?

If you have used XML in the past, you will have some familiarity with the concepts in semantic markup. For example, when you design an XML document (or schema) for a dataset, you choose elements and attributes to model items within the data. Ideally the element and attribute names define the data items in a meaningful way:

<news_story piece_id="12345">
 <journalist>Jim Smith</journalist>
 <posted>23 November 2012</posted>
</news_story>

The developer here has chosen names that intuitively describe the data values being modeled. With HTML5 you cannot choose your own elements, as it is not freely extensible. The structures chosen for it simply have more inherent meaning when compared to previous versions.
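For comparison, here is one way the same made-up news story might be expressed using only elements that HTML5 defines. This is a sketch rather than a prescribed mapping: the headline and the id value are hypothetical, and other element choices would be equally reasonable.

```html
<!-- The XML news story above, modeled with fixed HTML5 elements -->
<article id="story-12345">
  <!-- Hypothetical headline; the XML example did not include one -->
  <h2>Man in Window Outburst</h2>
  <p>By Jim Smith</p>
  <!-- The datetime attribute gives programs an unambiguous date -->
  <p>Posted <time datetime="2012-11-23">23 November 2012</time></p>
</article>
```

The element names are no longer tailored to the dataset, but each one still tells a program something about the content it wraps.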

By the way, there are different types of meaning

We've talked about meaning, but in fact there are different ways in which an element or other code excerpt can be meaningful.

The img tag is meaningful in that it tells something about the element content, describing what it is.

Some of the new HTML5 elements such as header and footer are meaningful in that they indicate something about the role or purpose of the element within the overall structure of a page.

How does all of this relate to HTML5 code then?

So what does this enhanced meaningful aspect of HTML5 entail? Essentially HTML5 has some new elements with which you can include more semantic information in your page markup. There are a load of new elements, only a few of which we will look at here. The header tag indicates information about the content of the element and about its role within the page structure:

<header>
 <h1>Man in Window Outburst</h1>
</header>

The header element can contain other elements and tends to include at least one heading element. The footer tag is similar, with the tag again expressing something meaningful about the content of the element and its relation to the rest of the page:

<footer>
 <p>The information on this website is nothing but lies.</p>
</footer>

The nav tag describes the purpose of a page section, i.e. that it contains navigation links:

<nav>
 <ul>
  <li><a href="news.php">News</a></li>
  <li><a href="weather.php">Weather</a></li>
  <li><a href="entertainment.php">Entertainment</a></li>
 </ul>
</nav>

The section element typically holds a group of items on the same theme, often together with a header. The section element has a fairly abstract meaning, but it is meaningful nonetheless:

<section id="overview">
 <h2>What happened</h2>
 <p>Police officers apprehended the man at 3.30pm...</p>
 <img src="images/arrest.jpg" alt="The Arrest"/>
</section>

The article element is similar, used to define an item that is self-contained:

<article>
 <h2>The Law</h2>
 <p>The law on throwing items through windows is very clear...</p>
</article>

An aside tag indicates the role of an element relative to its context within the page, as in the following extended version of the article code above:

<article>
 <h2>The Law</h2>
 <p>The law on throwing items through windows is very clear...</p>
 <aside>In 1998 four people were arrested for throwing a server machine...</aside>
</article>

These are just a few of the new HTML5 elements offering semantic improvements; others include media and user input elements as well as additional attributes. The inclusion of microdata in HTML5 also provides increased scope for including semantic information in Web pages and applications. As you can see, some of these new elements are meaningful in terms of both content and structure.
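As a hedged illustration of those other additions, the fragment below combines microdata, a media element and a typed input. The file name, form action, field names and the choice of the schema.org NewsArticle vocabulary are all assumptions made for this sketch, not something prescribed here.

```html
<!-- Microdata: itemscope/itemtype/itemprop attach machine-readable labels -->
<article itemscope itemtype="https://schema.org/NewsArticle">
  <h2 itemprop="headline">Man in Window Outburst</h2>
  <p>
    By
    <span itemprop="author" itemscope itemtype="https://schema.org/Person">
      <span itemprop="name">Jim Smith</span>
    </span>
  </p>
  <p>
    Posted
    <time itemprop="datePublished" datetime="2012-11-23">23 November 2012</time>
  </p>

  <!-- Media element: the tag itself says this content is video -->
  <video src="media/window.mp4" controls>
    Your browser does not support the video element.
  </video>
</article>

<!-- User input: type="email" describes what kind of value is expected -->
<form action="subscribe.php" method="post">
  <label for="reader-email">Email address</label>
  <input type="email" id="reader-email" name="email" required>
  <button type="submit">Subscribe</button>
</form>
```

In each case the extra information sits in the element or attribute names, so a program reading the page does not have to guess what the content represents.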

Think about some of the older tags (many of which are still around), such as div. The div element is simply a chunk of a page - the tag name tells us absolutely nothing about the content of the element or its role within the page. In other words, the tag conveys very little meaning. Lots of the long-standing tags convey either virtually no meaning at all, or in some cases generic, loosely defined meaning. Each item in a Web page was contained within one of a set of very general element categories. The key to making something meaningful is being specific. The new HTML5 tags allow us to define Web content using more specific terms.

Were you already adding meaning to your markup?

If you've been creating Web pages for a reasonable length of time, some of the new HTML5 elements may ring some bells for you. In reality, developers were already building a level of meaning into their pages using the element attributes, particularly class and ID. For example, if you've ever given an element a class or ID attribute of "footer" or "header" you are certainly not alone. With HTML5, this meaning is conveyed in the markup itself rather than in attribute values. If you used these attributes to implement particular styling properties, you were effectively doing something manually that is built into HTML5 out of the box — and with semantic elements there are additional benefits…
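To make the comparison concrete, here is a sketch of the same header and footer written both ways. The id and class values in the first version are the kind of developer-invented names described above, chosen here only as an example.

```html
<!-- Before: meaning carried only by attribute values the developer invented -->
<div id="header">
  <h1>Man in Window Outburst</h1>
</div>
<div class="footer">
  <p>The information on this website is nothing but lies.</p>
</div>

<!-- After: the same meaning carried by the element names themselves -->
<header>
  <h1>Man in Window Outburst</h1>
</header>
<footer>
  <p>The information on this website is nothing but lies.</p>
</footer>
```

A stylesheet that previously targeted #header or .footer could target the header and footer elements directly, and other programs no longer need to know which names a particular developer happened to pick.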

Why are we doing all of this?

OK, this is all very well but you'd be forgiven for asking why we're going to all of this trouble for something that seems essentially conceptual/academic. Well, you can rest assured that there are good reasons for moving in a more semantic direction. As we have seen, HTML5 semantics allows us to create markup code that describes content items. This descriptive aspect of the code allows other programs to make more effective use of the content, with various applications:

Searchability is set to be transformed by the advance of Web semantics. Semantic markup makes content/data more searchable. Web pages are of course not just viewed in the Web browser; they are also processed by other programs such as search engine robots. Since semantic markup is designed to let applications interpret Web pages in more meaningful ways, this should ultimately improve the quality of search/query functions by a significant amount. In Tim Berners-Lee's often quoted "dream" for the Web, computers would be able to analyze all of the data online. That might be a long way off, but the semantic thrust of HTML5 is motivated by that type of long-term goal.
Accessibility is one of the key advantages of semantic markup. Accessibility tools can benefit hugely from more meaningful access to Web content. Such tools include browser add-ons for users with restricted vision or hearing, learning difficulties and so on. Semantic markup makes it more feasible for an application to process Web content and then communicate the original message to the user in a way that suits their needs. This concept extends beyond accessibility and into the realms of device flexibility, through techniques such as responsive design. The result is a more inclusive approach to delivering Web content.
Consistency should be a real beneficiary of semantic HTML5. Semantic markup improves consistency, as content items are more logically assignable to particular element types. This is in contrast with the older models, in which items could often just as logically be contained in any of a range of different element types. Choosing one wasn't an indicator of the nature of the content or its role within the page; it was just a reflection of the developer's choice. With semantic markup, the more specific level of meaning makes these choices less free but the results inherently more reliable when it comes to interpretation, either by the browser or by other applications.

Developers drive the progress of Web technologies

When I was at uni (a number of years ago) I remember a lecturer telling us that the field of academic research was going to be revolutionized by advancements in search. He was talking about the Semantic Web — needless to say it hasn't happened just yet. Taking any sort of focused new direction with something as diverse and erratic as the World Wide Web is always going to be a tough task. However, by getting on board with the idea of semantic markup at least, we as developers can act to influence the movement towards a future Web that is more accessible, searchable and consistent, for all users.

Do you use HTML5's semantic elements? Does focusing on semantics produce a higher-quality product? Let us know what you think in the comments.


Susan Smith

Sue Smith is a Web/software developer and technical writer – see BeNormal Development for details. Currently focusing on mobile application development for the Android platform and HTML5, Sue specialises in writing educational material on programming topics for Web publication. Follow Sue on Twitter @BrainDeadAir or email [email protected].