Chapter 5
Site Structure

I want to know what things are for, how they work, what they can or should be made of, before I even begin to think what they should look like.
—Jonathan Ive

Much as foundational concrete and piles define the stability and longevity of buildings, so the structural underpinnings of web sites affect their success in ways that, though not visible on the surface, are ultimately far more important than color and typography. Site structure determines how well sites work in the broader context of the web, and on all the various mobile and desktop screens we use today. The methods you use to mark up pages determine whether they can be read well by software and indexed well by search engines. The logic and stability of the underlying files and directories on which your web site rests affect its functionality, as well as its potential for growth and expansion.

The content management system you choose will affect your web design choices for years, but will also bring power and flexibility that you could never practically achieve with static HTML methods. Attention to these behind-the-scenes structural components from the start produces a web site that will hold up over time, work effectively within the larger web environment, and adapt and grow as needed.

Components of a web site

Here we describe the main technologies that make up the platform on which we structure web sites. We provide further details and best practice recommendations for specific site components in subsequent chapters.

Although most web sites are now built using a web content management system (CMS) that will insulate you from much of the code that makes up your site, it is still important to have a solid understanding of basic code and principles. It is possible to become a good musician without being able to read music, but the inability to read musical charts will be a lifelong impediment to becoming a great musician. So it is with web technologies: many of the foundational concepts and capabilities of the web will remain a mystery to you if you don’t understand at least basic HTML.

Using hypertext markup language (HTML)

Creating web content used to be easier. All you had to do was check your HTML code against two or three major desktop web browsers and you were all set. Today web content is accessed using many kinds of desktop applications in addition to the major web browsers. It is displayed on mobile computing devices of all kinds, from tablets and phones to smaller “wearable” devices like fitness bands and the Apple Watch, and it is read aloud by screen readers. Many kinds of cameras, household appliances, televisions, medical devices, and other “smart” products now depend on web content and communications. Web content is also read by search engines and other computing systems that extract meaning and context from how the content is marked up in HTML. All of this makes it much more critical that you understand the core principles of what makes good HTML, and the most critical concept to understand is semantic markup.

The following section presents some of the core principles of HTML content markup, but presenting the full breadth of HTML markup is beyond the scope of this book, and we strongly urge you to consider one of the basic HTML books we recommend at the end of this chapter. The most current version of HTML at this writing is HTML5, and the examples we present here use HTML5 conventions.

Semantic markup

Proper use of HTML is the key to getting maximum flexibility and return on your investment in web content. From its earliest origins, HTML was designed to distinguish clearly between a document’s hierarchical outline structure (Headline 1, Headline 2, paragraph, list, and so on) and the visual presentation of the document (boldface, italics, font, type size, color, and so on). HTML markup is considered semantic when standard HTML tags are used to convey meaning and content structure, not simply to make text look a certain way in a browser.

This semantic approach to web markup is a central concept underlying efficient web coding, information architecture, universal usability, search engine visibility, and maximum display flexibility. Consider this simple piece of HTML coding:

<h1>This is the most important headline</h1>
<p>This is ordinary paragraph text within the body of the document, where certain words and phrases may be <em>emphasized</em> to mark them as <strong>particularly important</strong>.</p>
<h2>This is a headline of secondary importance to the headline above</h2>
<p>Any time you list related things, the items should be marked up in the form of a list:</p>
<ul>
<li>A list signals that a group of items are conceptually related to one another</li>
<li>Lists may be ordered (numbered or alphabetic) or unordered (bulleted items)</li>
<li>Lists may also be menus or lists of links for navigation</li>
<li>Cascading style sheets can make lists look many different ways</li>
</ul>

Even in the simple example above, a search engine would be able to distinguish the importance and priority of the headlines, discover which keywords were salient, and identify conceptually related items in list form. A cascading style sheet (CSS) designed to respond to various screen sizes could display the headlines and text in fonts appropriate for small mobile screens, and a screen reader would know where and how to pause or change voice tone to convey the content structure to a blind listener. All of this flows from the semantic structure embedded in the HTML: the ranking of headlines by importance, the emphasis of certain keywords, and a list markup that signals a group of related items.

Document structure

Properly structured HTML documents contain a small set of standard structural elements. In properly formed HTML, all web page code is contained within two basic elements:

  1. head (<head>…</head>)
  2. body (<body>…</body>)

In the past these basic divisions in the structure of page code were there primarily for good form: strictly correct but functionally optional and invisible to the user. In today’s much more complex and ambitious World Wide Web, in which intricate page code, many different display possibilities, elaborate style sheets, and interactive scripting are now the norm, it is crucial to structure the divisional elements properly.

The <head> area is where your web page declares its code standards and document type to the display device (web browser, mobile phone, tablet, and so on) and where the all-important page title resides. The page head area also can contain links to external style sheets and JavaScript code that may be shared by many pages in your site. Both JavaScript coding listings and CSS style code may be complex and lengthy these days, so often web designers and CMS programs keep long code listings in separate files that are linked to the HTML file. This shared code arrangement simplifies the code required in each HTML file, and (most important) allows a single CSS or JavaScript file to be shared across all the pages of your site.

The <body> area encompasses all page content and is important for CSS control of visual styles, programming, and semantic content markup. Areas within the body of the page are usually functionally segmented with division (<div>) or span (<span>) tags. For example, most web pages have header, footer, content, and navigation areas, all designated with named <div> tags that can be addressed and visually styled with CSS.

The HTML document type declaration specifies which version of HTML and which standards the document conforms to, and it is crucial in evaluating the quality and technical validity of the HTML markup and CSS. Your web development technical team should be able to tell you which version of HTML will be used for page coding (for example, XHTML or HTML5) and which document type declaration will be used in your web site. HTML5 is the current basic standard for web page markup. The older XHTML standard is similar to HTML5, but XHTML has more exacting markup requirements. Although XHTML is still common on the web, there are powerful advantages in using HTML5 as your standard for page markup, including a far simpler document type declaration and new structural elements such as <header>, <nav>, <article>, and <footer> that make semantic markup easier.
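
To make these divisions concrete, here is a minimal sketch of an HTML5 document (the file names and division names are hypothetical placeholders):

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Page title goes here</title>
<!-- shared style and script files linked from every page -->
<link rel="stylesheet" href="css/site-styles.css">
<script src="js/site-behaviors.js"></script>
</head>
<body>
<div id="header">Site logo and global navigation</div>
<div id="content">Main page content</div>
<div id="footer">Contact information and footer links</div>
</body>
</html>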

Content structure

Semantic markup is a fancy term for commonsense HTML usage: if you write a headline, mark it with a headline tag (<h1>, <h2>). If you write basic paragraph text, place the text between paragraph tags (<p>, </p>). If you wish to emphasize an important phrase, mark it with strong emphasis (<strong>, </strong>). If you quote another writer, use the <blockquote> tag to signal that the text is a quotation. Never choose an HTML tag based on how it looks in a particular web browser. You can adjust the visual presentation of your content later with CSS to get the look you want for headlines, quotations, emphasized text, and other typography.
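
For example, a quotation belongs in a <blockquote> element, and its visual treatment can be adjusted later in the style sheet (a minimal sketch; the style values are placeholders, not recommendations):

<!-- in the HTML: mark the quotation semantically -->
<blockquote>
<p>Text quoted from another writer goes here.</p>
</blockquote>

/* in the CSS: adjust the visual presentation */
blockquote { font-style: italic; margin-left: 2em; }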

A few exclusively visual HTML tags such as <b> (boldface) and <i> (italics) persist in HTML because these visual styles are sometimes needed to support purely visual typographic conventions, such as italicizing a scientific name (for example, Homo sapiens). If you use semantically meaningless tags like <b> or <i>, ask yourself whether a properly styled emphasis (<em>) or strong emphasis (<strong>) tag would convey more meaning.

HTML also contains semantic elements that are not visible to the reader but can be enormously useful behind the scenes with a team of site developers. Elements such as classes, ids, divisions, spans, and meta tags can make it easier for team members to understand, use, visually style, and programmatically control page elements. Many style sheet and programming techniques require careful semantic naming of page elements in order to make your content more universally accessible and flexible.

Web page files don’t contain graphics or audiovisual material directly but use image or other pointer links to incorporate graphics and media into the final assembly of the web page in the browser. These links, and the alternate text (“alt” text) or long description (“longdesc”) links they contain, are critical for universal usability and search engine visibility. Web users don’t just search for text. Search engines use the alternate text descriptions to label images with keywords, and visually impaired users depend on alternate text to describe the content of images. Proper semantic markup will ensure that your audiovisual media are maximally available to everyone in your audience and to search engines.
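
For example, every meaningful image should carry alternate text that describes its content (the file name and description here are hypothetical):

<img src="images/tabby-cat.jpg" alt="A gray tabby cat sleeping on a sunny windowsill">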

Set careful markup and editorial standards based on semantic markup techniques and standard HTML document types, and adhere to those standards throughout the development process. Today’s web environment is a lot more than just Google Chrome or Firefox on a desktop computer—hundreds of kinds of mobile computing devices are now in use, and new ways of viewing and using web content are being invented every day. Following semantic web markup practices and using carefully validated page code and style sheets is your best strategy for ensuring that your web content will be broadly useful and visible into the future.

Using cascading style sheets (CSS)

Cascading style sheets allow web publishers to retain the enormous benefits of using semantic HTML to convey logical document structure and meaning while giving graphic designers complete control over the visual display details of each HTML element. CSS works just like the style sheets in a word-processing program such as Microsoft Word. In Word, you can structure your document with ranked headlines and other styles and then globally change the visual look of each instance of a headline just by changing its style. CSS works the same way, particularly if you use linked external style sheets that every page in your web site shares. For example, if all of your pages link to the same master CSS file, you could change the font, size, and color of every <h1> heading in your site just by changing the <h1> style in your master style sheet.
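
A minimal sketch of that arrangement (the file name and style values are hypothetical): every page links to the same master style sheet, so a single rule controls every <h1> heading in the site.

<!-- in the head of every page -->
<link rel="stylesheet" href="css/master-styles.css">

/* in master-styles.css */
h1 { font-family: Georgia, serif; font-size: 2em; color: #336699; }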

Many users of cascading style sheets know how to change the look of standard HTML components with CSS but don’t pay much attention to the powerful cascade features of CSS. CSS is an extendable system, in which a related set of CSS instructions spread across multiple CSS files can cascade from very general style and layout instructions shared by all of your pages to extremely specific styles that only a handful of pages in your site may share.

CSS cascade hierarchy

CSS has multiple hierarchical levels that cascade in importance and priority, from general CSS code shared by all pages, to code that is contained in a particular page file, to code that is embedded in specific HTML tags. General page code overrides shared site code, and CSS code embedded in HTML tags overrides general page code. This hierarchical cascade of CSS priorities allows you to set very general styles for your whole site while also permitting you to override the styles where needed with specific section or page styles.
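
A minimal sketch of that priority order (the colors and file name are hypothetical):

/* in the shared site-wide file, site.css: every h1 is dark gray */
h1 { color: #333333; }

<!-- in the head of one particular page: h1 is blue on this page only -->
<style>
h1 { color: #006699; }
</style>

<!-- embedded in a single tag: this one headline overrides both rules above -->
<h1 style="color: #990000;">This headline appears in dark red</h1>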

Shared CSS across many pages

Multiple CSS files can work together across a site. This modular arrangement, in which every page links to master CSS files that control styles throughout the site, is the heart of the cascade system. The advantages are obvious: if all your pages share the same master CSS file, you can change the style of any component in the master CSS file, and every page of your site will show the new style. For example, if you tweak the typographic style of your <h1> headings in the master file, every <h1> heading throughout the site will change to reflect the new look.

In a complex site, page designers often link groups of CSS files to style the site. Splitting the code in this way has practical advantages: site CSS can run to hundreds of lines, and it is often easier to keep the basic page layout styles in one file and the master site typography styles in another. Each page simply links to both files, and the master layout and typography styles control all the pages in your site.
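
For example, the head of every page might link to a layout file and a typography file (hypothetical file names):

<link rel="stylesheet" href="css/layout.css">
<link rel="stylesheet" href="css/typography.css">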

The powerful advantage of the “cascade” in CSS is built into the themes used by CMS programs like WordPress and Drupal, but both programs also allow you to create custom variations on the theme styling by adding a style sheet that tweaks the specific styling of the theme you are using. For example, you may be very happy with the overall look and feel of your WordPress theme but wish that all of the headlines used the Tahoma font instead of the theme’s built-in Arial. Most WordPress themes allow you to add your own CSS to customize various aspects of the theme: in the WordPress Dashboard, go to Appearance > Customize > CSS to reach the custom CSS listing. To change all the headers in your site to Tahoma, you could add this line to the custom CSS listing:

h1, h2, h3, h4, h5, h6 { font-family: Tahoma, sans-serif; }

Thanks to the “cascade” in style sheets, you do not have to specify every aspect of every header styling (font size, weight, color, spacing, and so on) in your custom sheet, because through the style cascade your custom Tahoma headers inherit all those other header properties from the master theme style sheet. So your headers all change to the Tahoma font, but all other aspects of the theme header’s size and styling remain the same. The Drupal CMS has a similar system that allows you to tweak the CSS of a Drupal theme through a CSS “Injector” module for custom CSS code.

Media style sheets and responsive CSS styling

Another advantage of CSS is the ability to provide context-appropriate designs using media style sheets that are specific for display screens or printing on paper. With media style sheets, it’s possible to adapt a page layout specifically for printing on paper. Print style sheets often drop header and sidebar navigation elements and strip away the web page framing to emphasize page content. Print styling can also make the full URL of embedded links visible to the reader of the printed document, so that a reader who wants to follow a link has the URL as a reference.
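
A minimal print style sheet sketch (the class names are hypothetical; printing the URL after each link uses standard CSS generated content):

@media print {
.site-header, .left_sidebar { display: none; } /* drop navigation and framing when printing */
a[href]::after { content: " (" attr(href) ")"; } /* show the full URL after each link */
}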

Similarly, “responsive” CSS styling customizes the presentation of navigation and content based on the size of the user’s screen, using CSS3 media queries to determine the maximum or minimum width of the user’s display screen. In this simple example we use a media query statement (@media) to hide a left navigation sidebar on small mobile screens:

<style>
@media (max-width: 600px) {
.left_sidebar { display: none; }
}
</style>

See the next chapter, Page Structure, for more information on responsive web design techniques.


Native, web, or hybrid

With the rise of mobile device usage come interesting questions about how best to provide a good user experience for your audience. If people are going to access your services primarily on mobile devices, you have an important choice to make about how best to deliver your product. Currently there are three choices: a native app built for a specific mobile operating system, a mobile-friendly web site, or a hybrid app that wraps web content inside a native application shell.

Regardless of approach, most enterprises need a mobile-friendly web site, if only to provide a web “storefront” where people can learn about and access native and hybrid apps that are available only in app stores. Read about how to use responsive design techniques to create mobile-first and mobile-friendly web sites in Chapter 6, Page Structure.


Interactive scripting

JavaScript is a language commonly used to create interactive behaviors on web pages. JavaScript is also a key technology in web page content delivery strategies such as Ajax, and in widely used code libraries like jQuery. In most circumstances JavaScript code belongs in the “head” area of your web page, but if your code is complex and lengthy, your “real” page content will be pushed dozens of lines down below the code and may not be found by search engines. If you use page-level JavaScript (also called client-side scripting), you should place all but the shortest bits of code in a linked JavaScript file. This way you can use lengthy, complex JavaScript without risking your search ranking.
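
For example, instead of embedding dozens of lines of script in the page head, link to a single external file (the file name is hypothetical):

<!-- in the page head: all interactive code lives in one shared file -->
<script src="js/site-behaviors.js"></script>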

Other document formats

The web supports document formats other than HTML. PDF (Portable Document Format) is a widely used format that provides functionality and paper-oriented formatting not available in basic HTML. PDF files are often favored for documents that originated in word-processing and page layout programs, in order to retain the appearance of the original document. In general, the best approach is to offer documents as plain HTML, because the markup offers greater flexibility and is designed to enable universal usability. At times, however, the additional features and functionality offered by these other formats are essential; in this case, be sure to use the software’s accessibility features. Adobe in particular has made efforts to incorporate accessibility features into its web formats by supporting semantic markup, text equivalents, and keyboard accessibility. Major search engines like Bing and Google can “read” and index the content of PDF files, but many mobile devices don’t display PDF files well on small screens.

Building a solid structure

Well-designed sites contain modular elements that are used repeatedly across many dozens or hundreds of pages. These elements may include the global navigation links and graphics for the page header, or the contact information and mailing address of your enterprise. It makes no sense to repeat the text and HTML code that make up such standard page components in each file. Instead, use a single file containing the standardized element that repeats across hundreds of pages: when you change that one file, every page in your site containing that component automatically updates. HTML, CSS, and current web servers offer the power and flexibility of reusable modular components, and most large, sophisticated sites are built from dozens of reusable components.
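
For example, many web servers support simple server-side includes, so a shared page header can live in one file and be pulled into every page (a sketch only; the file name is hypothetical, and includes must be enabled on your server):

<!--#include virtual="/includes/global-header.html" -->

Content management systems accomplish the same thing with templates, assembling shared headers, footers, and navigation from a single source.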

Browser variations

Web browsers have become much more consistent in following web standards for HTML and CSS, but typography, forms, positioning, and alignment sometimes work slightly differently in each brand or operating system version of a web browser, and the proliferation of mobile web browsers has added complexity. The subtle variations between browsers often pass unnoticed or make little difference to the function or aesthetics of sites, but in precise or complex web page layouts the browser differences can lead to nasty surprises. Never trust the implementation of HTML5, CSS, JavaScript, Java, or any browser plug-in architecture such as Adobe Flash until you have seen your web pages displayed and working reliably in each major desktop and mobile web browser and on the major operating systems (Microsoft Windows, Apple Macintosh, Apple’s mobile iOS, and Google’s Android mobile OS).

Check your web logs or use a service such as Google Analytics to be sure that you understand which browser brands, browser versions, and operating systems (Mac, Windows, mobile) are most common among your particular readership. If you encounter a discrepancy in how your pages render in different browsers, you can confirm that you are using valid HTML and CSS code by using a code validation service such as those from the W3C (for HTML, validator.w3.org; for CSS, jigsaw.w3.org/css-validator). Not every browser supports every feature of CSS3 (the most current version at this writing), particularly if that feature is seldom used or has recently been added to the official CSS3 standards. For example, although drop-shadowed text is a valid CSS3 option, not every browser supports it.

File names

Web pages are a constellation of files delivered to and assembled by the browser into the coherent page we see on our screens. Attention to file and directory names is essential to keeping track of the myriad pages and supporting files that make up a web site.

Never use technical or numeric gibberish to name a component when a plain-language name will do. In the early days of personal computing, clumsy systems like MS-DOS and old versions of Microsoft Windows imposed an “eight-dot-three” file name convention that forced users to make up cryptic codes for file and directory names (for example, “whtevr34.htm”). No word spaces and few nonalphanumeric characters were allowed in file names, so technologists often used characters like the underscore to add legibility to cryptic file names (for example, “cats_003.htm”).

Habits developed over decades can be hard to break, and looking into the file structure of another team’s web site can sometimes feel like cracking the German Enigma codes of World War II. Current file name conventions in Windows, Macintosh, and Linux systems are much more flexible, and there’s no reason to impose cryptic names on your team members, site users, and colleagues who may one day have to figure out how you constructed your site.

Most CMS programs like WordPress or Drupal will allow you to use “friendly” URL naming conventions that have two advantages: they are easier for people to make sense of, and they contribute to the relevance rankings in search engines like Bing and Google.

It’s pretty easy to figure out the page content of a “friendly” WordPress URL just by reading it.
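
For example (these are hypothetical URLs, not from a real site):

http://example.com/2016/05/choosing-a-content-management-system/
http://example.com/products/garden-tools/pruning-shears/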

There’s an old saying in programming that when you use plain-English labels and add explanatory comments to your code, the person you are probably doing the biggest favor for is yourself, three years from now. Three years from now, will you know what’s in a site directory called “x83_0002”?

Use plain-language names for all of your files and directories, separating the words with “breaking” hyphen characters. This system is easy to read and understand, and since conventional word spaces are not allowed, the hyphens “break” the file name into individual words or number strings that can be analyzed by search engines and will contribute to the search rankings and content relevance of your pages. We recommend this convention for directory names, too. And always try to mirror the visible structure of your site’s content organization in the directory and file structure you set up on the web server.
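
For example, compare a plain-language, hyphenated path with the cryptic style described above (both paths are hypothetical):

/about/annual-reports/annual-report-2016.html
/x83_0002/whtevr34.htm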

Content management systems

A web content management system (CMS) is server-based software that simplifies, structures, and manages the creation and delivery of web content, providing a graphical user interface that allows users to create web pages and other web information without having to learn HTML, CSS, or other kinds of web coding. Content management systems offer powerful advantages over old-style hand-coded static web pages. The key to understanding the advantages of a CMS is the separation of content from presentation code. This separation of content and form is much more flexible than static HTML web pages, where content is embedded in one fixed format of HTML markup and CSS page styling. In a CMS the content is drawn from a web database and can be presented in many different kinds of templates, in many different arrangements, and for many kinds of display devices, including desktop computers, laptops, tablets, and other mobile devices.

Users of a CMS typically do not need to know any more than rudimentary HTML or CSS code, and can use the CMS much the way they use a word-processing program. Editorial workflow, collaborative content creation, and permissions to publish content are handled by the CMS, facilitating the work of creating and maintaining content. CMS-based content is also enormously more flexible than static web content, because elements of content are not locked into fixed web pages but can be assembled in multiple ways, in various kinds of pages and formats, without having to duplicate content. The CMS can also handle site administration tasks, such as taking the whole site offline for maintenance.


Popular content management systems

Virtually all large business, e-commerce, and other enterprise sites are now created and delivered using some form of CMS. Popular open-source content management systems like WordPress, Drupal, and Joomla are the most widely used CMS programs for individuals, governments, universities, and small to medium-sized businesses. Large business, government, news, and e-commerce sites typically use more complex commercial CMS programs like OpenText CMS (formerly Red Dot), Ingeniux CMS, or Ektron CMS. Commercial CMS products typically can handle much larger volumes of content, and offer sophisticated systems to support e-commerce, large-volume credit card transactions, and other business and financial functions.

Here we’ll mostly use two open-source products, WordPress and Drupal, as examples: both are free and open source, both are very widely used and well documented, and together they span the range from simple blog-style sites to complex institutional sites.

We strongly suggest that you spend a bit of time with both Drupal and WordPress in their hosted versions, which is the fastest way to get acquainted with the world of content management systems.


Establishing an editorial workflow

One of the central features of a content management system is to formalize editorial roles and create an organized workflow for content creation and publication. If you are not already working within an experienced editorial department, these defined roles, responsibilities, access, and publication privileges may seem a bit alien and excessively complex. In many small businesses the person who manages the web site is chief cook, bottle washer, and all-around web task manager rolled into one. But even if you are the sole person managing a web site, the workflow features of a content management system can still help you manage your site, by referring draft web pages to colleagues and content experts for review before publication, and through scheduling features that allow you to choose exactly when a particular page becomes “live” on the site.

For collaborative content creation groups a CMS offers powerful advantages to carefully structure and formalize editorial work and publication procedures. In business and government sites the ability to post content is often more complex than just meeting quality standards and stated business objectives—proposed content for the site may also need to pass formal reviews by the legal department, product managers, or senior managers before publication. Without workflow features these multiple cycles of approval can be a nightmare round-robin of emails, faxes with handwritten markup, and lots of telephone calls. Good workflow features can manage the process from initial content creation by a writer, through review by an editor, to review by other company executives, to final publication, with each step triggering emails or other CMS-based notifications to all participants along the way, so there are no nasty surprises when the new content goes online.

Writers, designers, photographers, and other media experts can create new content collaboratively with their peers, without deep technical knowledge of the web or HTML markup. Subject matter experts can review material and comment on the content, without being able to complicate the web site by accidentally publishing material that is not ready for public view. Editors can verify that the final versions of content are ready for public release, and immediately post the new content, or hold it for publication at a particular date and time in the future.

Roles and responsibilities

The ability to specify roles and levels of access to unpublished content is an important feature of content management systems. Some organizations have simple arrangements of writers, an editor, reviewers who check the accuracy of content, and a designated person—perhaps a senior editor or department manager—who has the formal authority to publish new content on the web site. Writers typically have the ability to add unpublished content to a CMS. The new content is available to other editorial team members but not visible on the public site until publication. Designated reviewers or content experts have the ability to see unpublished content, but may or may not have the ability to annotate or change the content. Editors can see and change the work of writers and reviewers, but may or may not have the authority to publish the new content to the live web site. Finally, publishers have the ability to see and change all unpublished content, and the ability to publish new content, remove older content, and administer other aspects of the site, such as taking the whole site offline for maintenance or major changes.

Workflow and notifications help avoid process “choke points,” where publication is held up because one of the team members or reviewers didn’t know he or she was supposed to do something at a particular time, or team members don’t know what the exact status of a piece of content is (edited, reviewed, ready to publish?). The CMS helps manage the traffic problem in workflows by sending notifications to authors and editors at each stage of the process, so that everyone can see the current status of unpublished content. Workflow processes can also help by reminding everyone on the team to add metadata to content, such as keywords for search engine optimization (SEO), or alternate text for images that both aids visually impaired readers and improves the SEO of content by accurately describing the content of images.

CMS as part of content strategy

Advanced content management systems like Drupal give you so many options to structure and display your content that careful strategic planning is required, both to assess your existing and required new content and to design an efficient system to produce, structure, and label that content accurately. “Content strategy” differs from more conventional editorial processes in that it looks well beyond the production of appropriate text, graphics, and photography. It considers how best to organize the content within the information blocks, views, and taxonomies of a CMS, and then structures the requirements for content production so that writers and editors understand the eventual contexts within which their writing will appear and know how to categorize and label (“tag”) the content for efficient entry into the CMS database.

Among the primary “deliverables” from a content strategy project are content production templates that not only describe the general intent for a new piece of content and its intended audiences and uses but also instruct the writers to suggest the best categories, subcategories, and keywords to be used within the CMS. See Chapter 1, Strategy, for an overview of content strategy.

Choosing a CMS

Choosing a web content management system is a consequential decision that should be made only after careful research on the features and advantages of various CMS products, as well as your own business goals and current and future needs. One immediate division in the CMS marketplace is price: open-source CMS products like WordPress, Drupal, and Joomla are free for downloading. Commercial proprietary content management products can cost thousands of dollars just for software licensing alone. Here are a few realities to consider in choosing a CMS.

“Free” systems like WordPress and Drupal may become deceptively costly if you try to build a complex site with them. Most major CMS installations require extensive customizations, server hardware and software configurations, and custom programming and template development. Even if you choose a hosted solution like WordPress.com or Drupal Gardens, you’ll still have a great deal of work to do in customizing your site, developing themes suitable for your needs, and setting up your content structures. If you have a small site, limited content, straightforward theme needs, and an experienced editorial crew, you can get a moderately sized WordPress site launched in a week or two, and a Drupal site almost as quickly. Since these tools are so widely used, there are numerous tutorials and how-to guides available.

But most business, government, and education web sites are more complex, and in a months-long process of bringing up a major new business site, the initial costs of the CMS software might be just a small line in the budget compared to the personnel time, hardware and services costs, custom code development, and content creation required for the site.

On the other hand, expensive commercial CMS products aren’t automatically better than open-source products. Many major business and consumer sites use open-source CMS software. Products like Drupal, WordPress, and Joomla have improved tremendously over the past decade and now rival all but a few top commercial CMS products in range of features and performance. The open-source technical community is far larger than the support groups and specialists of particular commercial CMS products, and it is much easier to find server expertise for Drupal, WordPress, PHP, and Linux-Apache-MySQL-PHP (LAMP) than for unique proprietary CMS systems running on more exotic server configurations. WordPress alone powers about 23 percent of the top ten million web sites. Even the largest commercial CMS vendors have tiny installed bases (less than 0.5 percent of the top ten million web sites) and small user groups compared with open-source products, so if you go with a commercial CMS product, you’ll have to grow your own staff experts or pay a significant premium for experienced people who already know your proprietary CMS.

Commercial CMS products may offer much deeper capabilities in workflow design, access management, e-commerce features, and integration with other corporate or enterprise systems, and they may include capable digital asset management systems (DAMs), which are critical for organizations that need to manage huge collections of graphics or other media files.

WordPress

WordPress has by far the friendliest and most polished user interface of any of the major web content management systems. It is also the most widely used CMS on the web today, with more than forty-six million downloads of the software, powering about 46 percent of all web sites that use a CMS. You can download and install WordPress on your own server or personal computer from WordPress.org, and many web hosting services like Rackspace and Media Temple offer “one-click” installations of WordPress on your hosted server. The simplest way to start with WordPress is to use the hosted version at WordPress.com, which handles all server issues; you just use the software. Simple WordPress.com sites are free, and more complex sites with custom domain names, more advanced themes, and other features are modestly priced.

Both WordPress and Drupal were initially created to support web logs (blogs), but WordPress has remained closer to its blogging roots, particularly in the simplicity with which it handles content creation and workflow. WordPress is justly renowned for ease of use, and many smaller sites or web teams just don’t require the complex content structuring capabilities or workflow features offered by Drupal. If your content needs are not complex, and you want to get up and running quickly with a site that easily mixes short pages of text and graphics, WordPress may be the perfect solution for you.

WordPress is extensible via plug-ins—add-on pieces of software that bring new features and capabilities to the core WordPress functionality—particularly for adding in more advanced CMS features. UltimateCMS and White Label CMS are two such plug-ins, but a word of caution: the core virtue of WordPress is simplicity. If you have tried WordPress and quickly hit its limits as a content organizing tool, you may be better off considering Drupal instead. Drupal is harder to get started with, but it will take you much farther if you have complex content needs.

Drupal

Drupal is a powerful web content management framework that can support a wide range of sites, from simple blogs up to major institutional sites with thousands of pages of content and complex information architecture needs. All this power and flexibility comes at the price of initial ease of use. Drupal has a reputation for being considerably less “friendly” than WordPress, with a much less polished user interface. However, the latest version of Drupal (version 7 at this writing) has made giant strides toward a less intimidating interface for new users, and the coming Drupal 8 is also focused heavily on making Drupal easier to use. About 7–8 percent of sites using a CMS use Drupal, but this probably understates Drupal’s core market of medium-sized institutional and commercial web sites. For example, Drupal is the dominant CMS in higher education, with about 27 percent of the market.

Drupal includes powerful tools for structuring content and creating taxonomies (controlled vocabularies for sorting and labeling content), and offers lots of flexibility in designing workflow roles and editing access privileges; moreover, Drupal has a huge and active user and developer community, far larger than that of any equivalent commercial CMS product. Modular structure makes Drupal attractive to experienced PHP developers, because the basic Drupal software core can be easily extended with code modules that add new functionality.

Organizing content and functionality

Every web content management system has a unique internal structure, and many of the complex commercial CMS products also have exacting server hardware and operating system requirements. However, all CMS software is structured in layers of functionality, built from the base server operating system and configuration up to the graphic and presentational layers that users actually see on your web pages. Here we have used the open-source product Drupal as an example of a moderately complex and capable CMS with a lot of built-in tools for organizing and structuring content, for organizing editorial workflow and access privileges, and for the visual display of your content and interactive functionality. While the exact details of CMS structure and organizational details vary from system to system, Drupal makes a great (and free) introduction to CMS concepts, even if you expect to later move on to a more complex commercial product for large enterprise needs.

Blocks

Blocks are areas of content or interactive functionality within page layout regions. Think of blocks as predesigned modular units or “building blocks” that can be placed into a page-layout template to add predefined bits of functionality to the page. For example, a user login area could be a block, as could a user poll, a search entry form, or a particular kind of navigation link layout. In Drupal blocks are often the visual interfaces of add-on modules that extend the basic capabilities of the CMS. Blocks can be configured to appear a number of different ways, and a CMS administrator can decide where the block should appear within a page region (header, left sidebar, footer).

Views

Views allow you to arrange specific kinds of text and visual content in a number of different ways. For example, one commonly seen “view” in Drupal business sites is a directory of people in a department. The directory draws specific bits of content from the CMS database to assemble a brief profile of each person, which might include a link to a photo, the person’s name and title, contact information, email address, and so on. You establish a “view” that repeats this basic setup for everyone within your department, producing a department directory on a page. A related profile view might appear when you click on a particular person’s name, linking to a view that shows a larger version of the same portrait photo, the person’s basic contact information, and perhaps additional information on his or her roles and current projects.

Each of the two views (directory, profile) draws information from the same database sources but displays the information in a different way: a compact listing for a directory, a much larger layout for a staff profile. In this way views allow you to reuse content flexibly. For example, a listing of company services view might include the photos of the people to contact about those services. The contact photos and information in the services view would be drawn from the same database listings as the company directory and staff profiles—three very different page views drawing upon the same sources of information.

Taxonomy

CMS taxonomy systems allow you to precisely structure and control the information used to label your content: metainformation such as the general and specific categories the content might fit into, and keywords and other descriptors about the content. The English language is enormous, rich enough that several people can accurately describe the same piece of content without using any of the same words. Without a shared and controlled vocabulary of terms, it is impossible to organize complex content. “Heart attack” and “myocardial infarction” both describe the same medical event, but unless there is general agreement about how to label heart attacks, you’ll end up with multiple categories holding redundant information, or even worse, with mislabeled information that is essentially lost to site users even though it is in the database.

Taxonomies allow you to create controlled lists of vocabulary terms for how your content will be labeled and categorized, and also to add additional keywords that provide detail beyond the controlled vocabulary terms. The categories and subcategories you create with your taxonomy system often then become the basis for your site navigation.

A successful taxonomy is flexible, both describing the current site content and anticipating future content. As you add new content to the site, revisit your taxonomies to ensure that they include all needed categories and tags.

Creating themes and templates

Themes control most of the visual aspects of your CMS-based web site, including the overall page layout, the typography, the color scheme, graphics, and other visual details of your pages and page headers, and positioning of content organizational elements such as regions, blocks, and views. Most WordPress or Drupal themes come with several different “page types” or “page templates”—the terminology varies from one CMS to another but the basic concepts are the same. Page types like the home page, basic content web pages, blog-style posting pages, and image galleries are common page varieties within themes.

Theme regions divide the page into familiar layout conventions like headers, footers, and columns (see fig. 5.10a). More complex themes may provide many more subdivisions of the page regions (see fig. 5.10b and c). While you will rarely use all of the regions and subregions of a complex theme on a single page, this network of potential regions gives you enormous flexibility in where you can place content, blocks, and views on the page, all without having to know or write HTML or CSS code.

Many of the basic types of pages on your site are created semiautomatically by the theme you choose. As you enter content, you determine what basic kind of initial presentation you want (conventional web page, blog post, and so on), and the template lays out the content for you within your chosen theme. You can easily change the formatting of content later if you choose to, or even change the whole theme of your site later if you find another theme that suits you better.

More sophisticated CMS programs like Drupal allow you to go much farther than basic page types. Through the use of content regions, blocks, and views you can develop a wide variety of page layouts. This flexibility requires you to know your CMS program pretty well, but in systems like WordPress or Drupal you can usually achieve great page layout flexibility without advanced HTML/CSS or programming skills.

Custom themes

Most themes for WordPress or Drupal allow you to add CSS code that supersedes the built-in theme CSS to customize various visual or typographic aspects of the theme. For example, with relatively modest CSS code additions you could change the background graphic of a theme, modify the colors of specific elements of the theme, or even change the typography of all of the headers in a theme.
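
A minimal sketch of that kind of customization (the selectors, colors, and image path are hypothetical and will vary with the theme):

/* added through the theme's custom CSS field */
body { background: #f4f1ea url("images/paper-texture.png") repeat; }
a { color: #8b5a2b; }
h1, h2, h3 { font-family: Georgia, serif; }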

If you are more ambitious or have exacting requirements for a site layout theme, there are several flexible, modifiable Drupal themes created specifically for customization, so you don’t have to build a custom theme entirely from scratch. For example, Drupal’s “Zen” theme is a highly customizable theme framework designed to provide a rich visual and layout toolkit for more advanced Drupal users who are experienced with HTML and CSS.

Before tackling custom CMS theme development—or more likely, hiring a theme developer to create one for you—you should thoroughly investigate the secondary marketplace for Drupal, WordPress, Joomla, and commercial CMS themes. There are literally hundreds of sophisticated themes you can purchase for relatively modest costs, and many dozens more open-source themes that you can start with for free and then customize. When investigating themes, don’t get too distracted by the superficial aspects of colors, graphics, and typography. These are certainly important, but particularly in a professionally developed commercial theme you should also be looking in detail at the theme’s region layouts, prebuilt options for menu building, page types, and options to further customize the theme with your own custom CSS, blocks, and views. The theme should be adaptable to your particular content taxonomy, graphic and multimedia needs, and any e-commerce functions you plan for your site. Carefully review the support documentation that should accompany any professionally developed theme, and if possible, ask the developer for examples of how the theme has been successfully applied in existing sites.

Search engine optimization

When the web first rose to popularity in the 1990s, people spoke of browsing or surfing the web, and users located interesting sites primarily by finding and clicking on links listed on major web directory sites like Yahoo! and Netscape. As the size of the web has exploded over the past decade (Google now indexes well over thirty trillion web pages), browsing through sites by following web links has become an increasingly inefficient means to initiate a search for new or specific information. You may still browse the home page of the New York Times or a personal portal page like MyYahoo!, but if you need anything more specific, you will probably go straight to a search engine such as Bing or Google.

The way your pages appear to the automated software that search engines use to “crawl” links between web pages and create search indexes has become the most important factor in whether users will find the information you publish on the web. SEO isn’t difficult and will make your site better structured and more accessible. If your site uses proper HTML structural markup and all of your pages are linked together well, you’ve already done at least 80 percent of the work to make your site as visible as possible to search engines.

Search optimization techniques are not the magic sauce that will automatically bring your site to the top of Google’s page rankings, however. Nor is SEO a cure-all for an ineffective site—it can increase the traffic volume to your site and make things easier to find, but it can’t improve the quality of your site content. SEO techniques ensure that your site is well formed and lessen the possibility that you have inadvertently hidden important information while constructing your site. Over the long run, though, only good content that is popular with readers and has many reference links from other highly ranked web sites will get you to the first page of Google or Bing’s search results.

A note on language: In any discussion on SEO you’ll hear a lot about “keywords,” the words that users type into search engines to find relevant web sites. Keywords could literally be single words (for example, “Honda”), but more often they are actually multiword key phrases like “2015 Honda Accord.” For the sake of brevity we’ll refer to both keywords and key phrases as “keywords.”

Understanding search

Most patterns of web site use follow what are widely known as long-tail distributions. That is, a few items are overwhelmingly popular, and everything else gets relatively little attention. If you rank the popularity of every web page in your site, you will typically see a long-tailed curve, in which the home page and a few other popular pages get lots of views, and most other pages get much less traffic. This long-tailed distribution pattern in popularity is true for products in stores, books for sale at Amazon, songs to download on iTunes, or Blu-ray discs for sale at Walmart.

Although Wired magazine’s Chris Anderson popularized the concept of the “long-tail” distribution for many things on the Internet, interface expert Jakob Nielsen first used Zipf curves (the formal mathematical term for long-tail phenomena) to describe the distribution patterns seen in web site usage. Long-tail usage patterns are fundamental to explaining why web search has become the most popular tool for finding information on the web, whether you are making a general Internet search or merely searching your company’s internal web site. Once users get past the home page and major subdivisions of a large site, they are unlikely to browse their way through all the links that may be required to find a specific page, even if every link is well organized, intuitively labeled, and working properly.

Search engine components

Links and individual web pages are the primary elements of web search. Search engines find web pages by following web links from one page to another. Search engine companies use an automated process to find and follow web links and to analyze and index web page content for topical relevance. These automated search programs are collectively called web crawlers. This emphasis on links and pages is crucial to understanding how web search works: crawlers can find your pages only if links exist for them to follow, and search engines do not rank web sites—they rank the popularity and content relevance of individual web pages. Since the home page of a site is almost always the most popular page on a site, the home page is usually the first page listed on a search engine result page (SERP). Each page of your site needs to be optimized for search and well linked to other pages, because from a search engine’s point of view, each web page stands alone.

Search engine crawlers

Search engine crawlers can only analyze the text, the web links, and some of the HTML markup code of your web page and then make inferences about the nature, quality, and topical relevance of your pages based on statistical analysis of the words on each page.

Text embedded within graphics or photographs, the contents of audio and video files that lack text transcripts or captions, and content that is displayed only through JavaScript or browser plug-ins are generally not visible to most search engines.

Search crawlers may also bypass pages that are blocked by a robots.txt file, pages that require logins, form submissions, or session cookies to reach, and pages that are not linked to from anywhere else on the site.

In addition to making your pages less searchable, these poor practices make your site less accessible, particularly to people who use screen reader software to access web content. SEO, valid HTML markup of content, and universal usability make a wonderful confluence of worthy objectives: by using the best web practices for content markup and organizing your content and links with care, your site will be both more visible to search and more accessible to all users. Commercial SEO products like Moz Pro Tools (moz.com/tools) can help you do detailed analysis of the state of your current site SEO, and can make detailed suggestions for improving your search rankings. A subscription to Moz Tools is not inexpensive, but you might consider using the service for a few months to give you good data while you create new content or overhaul your existing site for SEO. A thirty-day trial subscription to Moz Tools is free.

Search engine rankings

So what exactly are the rules for good search rankings? We can’t tell you, and the search engine companies won’t give you any exact formulas for high search rankings either. If search engines like Google and Yahoo! told everyone how they rank pages and how they detect and ban search-scamming techniques, unscrupulous web publishers would instantly start to game the system, and soon we’d all be back to the pre-Google 1990s, when general web search had become almost useless. What we can say is what the search engines themselves gladly tell web content developers: create compelling page content with proper structural markup and good linkages to other pages and sites. Don’t hide your content with poor page development techniques, and your pages will rank well in any search engine.

Current search engines use a combination of two information sources to rank the relevance of a web page to any given search term: internal content factors (the words, markup, and structure of the page itself) and external factors (the number and quality of links to the page from other pages and sites).

When web search services became popular in the 1990s, early search engines used internal content factors almost exclusively to rate the relevance and ranking of web pages. Search rankings were thus childishly easy to manipulate. By inserting dozens of hidden keywords on a page, for example, an aggressive web page author could make a page seem richer in popular topic relevance than other web pages (“sex, sex, sex, sex”). By the late 1990s even the largest search engines were considered only marginally useful in locating the best sources of information on a given topic, and the top-ranked sites were often those that used the most effective manipulation techniques to bias the early web search engines.

The innovation that transformed web search in the late 1990s was Google’s heavy use of external page factors to weigh pages’ relevance and usefulness. Google’s algorithms balance external ranking factors with statistical analysis of the page text to determine relevance and search ranking. Google’s fundamental idea is similar to peer citations in academic publications. Every year thousands of science papers are published, so how can you tell which articles are the best? You look for those that are most frequently cited (“linked”) by other published papers. Important science papers get cited a lot. Useful web sites get linked to a lot, and every link is a popularity vote that increases the results ranking of a site.

Using keywords and key phrases

People find your content by entering keywords or key phrases into search engines like Google or Bing. Keyword targeting is a key concept in both SEO and online search advertising, which are basically two sides of the same coin. The searcher wants to find your site, and you want to be found, so you might “buy” keywords in Google’s AdWords auction to get more prominent listings on SERPs. In either case, you need a keen understanding of the words or phrases that best describe your site, its content, and any products you have for sale.

You start the process of optimizing your site for search by generating lists of keywords or key phrases that best describe your content or products. The Google AdWords Keyword Planner (see adwords.google.com/KeywordPlanner) is ideal for researching keywords and phrases in your industry. Although the Planner is designed for businesses that buy search terms in Google's AdWords keyword auctions, anyone can set up an account and use it for free to analyze search terms relevant to a particular site, business, or general industry, in a specific geographic region, city, or town. Most important, the Planner can help you identify the most common search terms that will bring a potential customer to your web site, so that you can optimize your page content for those target keywords. As you evaluate your search rankings and keywords over time, it may be worth occasionally buying keywords you want to investigate in Google's AdWords auction, as this will give you even more detailed data on whether your primary keywords are actually driving sales or visits to your site.

As you evaluate possible keywords to emphasize on your pages, remember that you are not your audience. Avoid "insider" jargon and professional language for sites aimed at the general public. For example, in medicine it's common to refer to people with M.D.s as "physicians"; in the real world, people call M.D.s "doctors." If possible, do some focus groups or user research with representative members of your audience to see how they might search for the information on your site. Also, once you have your keyword list for each page, look for logical synonyms or alternate language that means the same thing as your primary keywords. Alternate language helps you anticipate other relevant words or phrases readers may search for, and it also helps you avoid repeating the same words or phrases too many times in page content or headings.

Like the popularity of pages within a web site, keywords for a given topic typically follow a long-tail distribution pattern: a few words are searched thousands of times, but most terms appear much less frequently. Long-tail keywords tend to be much more specific and relevant. For example, if you start your web research to buy a new car with the term "auto sales," the SERP listings you get are likely to be too general. The phrase "auto sales" is popular and widely used, but not terribly useful if you already know that you are interested in Honda models. It's much better to narrow your search to something more specific, like "new Honda Accord Connecticut." If you sell Honda autos, there is little point in optimizing your site content for nebulous but frequently used keywords like "auto sales." Even though the terms are used less frequently, you are much better off optimizing your site content for more specific keywords and phrases like particular models of Honda cars, listed on pages that prominently feature your address and other contact information.

General considerations

Once you have your lists of target keywords or phrases, look at each page of your existing site and see how each major page element, particularly the page title, headings, body text, and links, either supports your keyword strategy or needs to be rewritten to include the most relevant keywords for the content on the page.
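
To make this concrete, a dealer optimizing a page for the key phrase "new Honda Accord Connecticut" might place the phrase in the page title, the main heading, and the opening body text, roughly like this (the dealer name and descriptive text are invented for illustration):

    <head>
        <title>New Honda Accord Sales in Hartford, Connecticut | Example Honda</title>
        <meta name="description" content="New Honda Accord models in stock at Example Honda in Hartford, Connecticut.">
    </head>
    <body>
        <h1>New Honda Accord Models in Hartford, Connecticut</h1>
        <p>Example Honda stocks the full range of new Honda Accord sedans and coupes.</p>
    </body>

Note that the key phrase appears once in each major element; as discussed below, repeating it many more times would do more harm than good.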

The language analysis software used by Google and Bing is sophisticated: it compares your page against the normal patterns of language seen in millions of example pages, and against the statistical profile of pages that are popular (lots of traffic, many clicks from search results pages). If your page falls significantly outside these normal statistical parameters for language related to your keywords, your content may be de-ranked for not being relevant to a user's search keywords. If your content contains an unusual pattern of repeated keywords, your page may be de-ranked for trying to game the search engine into a higher relevance ranking. The general rule for major keyword repetition is five to seven times in the equivalent of about a letter-sized page of text (roughly one thousand words). Repeating the same keyword or phrase more often than that could harm your search ranking: the repeats produce abnormal language statistics, which suggest that you are trying to game the system by artificially loading the page with keywords (a practice called "keyword spamming").


Google and mobile-friendly sites

Google started favoring mobile-friendly sites in its search rankings in April 2015, with the objective of making it easier for users to find web sites that are optimized for mobile devices. The algorithm also surveys content from mobile apps installed on the user’s device, in an effort to provide useful search results that are relevant to and actionable by Google search users. Google’s Guide to Mobile Friendly Web sites (developers.google.com/webmasters/mobile-sites/) and Introduction to App Indexing (developers.google.com/app-indexing/) resources provide tools and best practices to help web site and mobile app developers ensure that their sites and apps are indexed appropriately and appear among the search results when users are looking for related content.
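
Mobile friendliness involves much more than markup, but one basic element of most mobile-optimized pages is a viewport meta tag in the document head, which tells mobile browsers to scale the page to the width of the device. A minimal sketch:

    <head>
        <meta name="viewport" content="width=device-width, initial-scale=1">
    </head>

Google's mobile guide covers this and many other criteria, such as legible text sizes and adequately spaced touch targets, in detail.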


Links

Selecting text on your web page and linking that text to other relevant web pages is a powerful semantic statement: it establishes that the linked text is important and highly relevant to the content on the page, and thus associates the page with other relevant content. On a typical web page there are two basic kinds of links: navigational links in menus, headers, and footers that tie the site together, and content links embedded within the text of the page.

Web crawlers use navigational links to evaluate the general structure of your site and its major content topics. When a crawler hits a link, it looks at the linked text for keyword relevance, but it also evaluates the page the link points to. Ideally both pages should share similar keywords, as this reinforces the relevance ranking, especially if the link points to a page that is already ranked high by Google or Bing for similar keywords. Links like this signal to the search crawler that your content is well matched to the keywords in your page title, headers, and other page elements. This also means you should avoid linking to pages outside your site that are only parenthetically related to the content of your page: gratuitous links to pages that seem unrelated will frustrate both your human readers and web search crawlers.

The fact that you have selected and linked a word or phrase signals to the search crawler that the linked text is important. This is why it is critical to avoid uninformative phrases like "click here" or "link" in your page links: "click here" does nothing to inform your readers or the search crawler about either the on-page content or the content you are pointing to. Always use descriptive keywords and phrases in your link text, and use the link title attribute to give the reader more information about where the link leads.
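
For example (the file name and page content here are invented), compare an uninformative link with a descriptive one:

    <!-- Uninformative: says nothing about the destination -->
    <p>To see our seminar schedule, <a href="seminars.html">click here</a>.</p>

    <!-- Descriptive link text, with a title attribute for additional context -->
    <p>See the <a href="seminars.html" title="Schedule and registration for fall seminars">fall seminar schedule</a> for dates and registration.</p>

The second version tells both the reader and the search crawler exactly what the linked page is about.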

Schemas

Web search engines are pretty good at finding and interpreting information like addresses, but you can add extra markup to your web pages that makes it easier for search engines to interpret your key business information. "Schemas" (see schema.org) are established sets of extra markup you can add to your page's HTML that label parts of your page content in carefully structured arrangements that search engines can find and read. Schema information won't be visible to visitors to your site, but it can be read by search engines and other kinds of Internet directories. Schemas can go well beyond your basic "NAP" information (name, address, phone): there are existing schemas for places, restaurants, menus, local businesses, events, schools and other organizations, and many more.
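
As a brief sketch of what schema markup looks like, here is a hypothetical local business's "NAP" information labeled with schema.org microdata attributes (the business name, address, and phone number are invented; schema.org documents many more types and properties, as well as alternative formats for supplying the same information):

    <div itemscope itemtype="https://schema.org/LocalBusiness">
        <span itemprop="name">Example Honda</span>
        <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
            <span itemprop="streetAddress">100 Main Street</span>,
            <span itemprop="addressLocality">Hartford</span>,
            <span itemprop="addressRegion">CT</span>
        </div>
        Phone: <span itemprop="telephone">860-555-0100</span>
    </div>

To a site visitor this renders as ordinary name, address, and phone text; the extra attributes are read only by software.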

Local search

You have probably noticed that SERPs have become much more informative about local businesses: enter "Whole Foods hours" and you get not just the hours but the locations of the stores nearest to you. For "brick and mortar" local businesses, search engine optimization boils down to four major factors: relevance, distance, reputation, and mobile search.

Code optimization

Beyond your content and keywords, there are a few technical issues to look at on both your web pages and your web server to be sure that your site is regularly crawled by search engines, and that technical problems with your code or your server hardware don’t harm your search result rankings.

Search engines penalize sites with lots of poorly formed HTML code, broken links, and haphazard patterns of linking within the site. Broken links are especially important to fix, because web crawlers find pages only by following links. Be sure that your links work, and use the Google and Bing webmaster tools to see whether the search engine crawlers have flagged your site for broken links or poor HTML code.

The webmaster tool sets will help you evaluate how your whole site is crawled. If you have specific pages that need to be fixed or improved, you can also use the W3C's HTML code validator (validator.w3.org) to be sure you have found and fixed your code issues.
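
As a small, contrived example of the kind of problem the validator catches, the first fragment below has misnested and unclosed elements, and the second is the corrected markup:

    <!-- Poorly formed: misnested and unclosed elements -->
    <p>Register for our <b>fall <i>seminars</b> today!

    <!-- Valid, well-formed markup -->
    <p>Register for our <b>fall <i>seminars</i></b> today!</p>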

Server optimization

Web sites that are slow to respond frustrate both human readers and web search crawlers. Google's crawlers now index more than 30 trillion web pages and produce about 100 billion crawler sessions a month. With all that web hopping to do, neither search crawlers nor human readers are likely to linger over your slow-loading pages. If your pages seem to be loading very slowly, use a tool like Google's PageSpeed Insights (developers.google.com/speed/pagespeed/insights) to check load speed. Slow page loading can be a particular problem on modest servers running a CMS like Drupal or WordPress (or most other kinds of CMSs). Most CMS programs create pages by dynamically assembling each page every time it is requested from the web server. All that assembly takes time and server cycles, and if your server is not up to the task, your pages will be slow to load.

There are potential fixes for slow-loading pages that do not involve getting a new server. Many CMS-based sites use a server cache to help reduce the load on the CMS. Each time a page is requested from the server, the server keeps a copy of the assembled page in a memory cache, and if the page is requested again, the server sends the cached version, reducing the load on the CMS itself. This caching of popular pages can greatly increase the apparent loading speed of high-traffic pages like the home page. Most caching software has a time limit for sending cached pages—typically several minutes, sometimes longer. After the time limit expires, the server retrieves a new version of the requested page and places the updated version in the cache. This way pages that actively change will never be more than a few minutes out of date. Check with your webmaster or web server administrator to see whether you are using a caching scheme with your CMS.

Overall server reliability also affects search engine rankings. If a crawler has previously indexed your site and revisits only to find the server down and the site dark, this will decrease your search rankings because it makes your site look unreliable. A professionally managed web server should be operating well over 99.5 percent of the time. While this may sound almost perfect, a 99.5 percent availability rate means that your server could be down almost forty-four hours a year (0.5 percent of the roughly 8,760 hours in a year), a major problem for e-commerce or other high-traffic sites. Get statistics from your IT department or web hosting service on downtime rates over the past year.

Submitting a site for indexing

For new sites, by far the best way to get listed in the major search engines is to get links from existing sites that point to your new site: through news releases, by contacting local business directory sites, or simply by asking related but noncompetitive organizations to mention your new site in a brief news piece or in their "Resources" or "Related sites" listings. The largest search engines also offer pages that allow you to submit the URL for a new web site, but there is no guarantee that the search crawlers will find your site immediately. Normally the process takes just a day or two, but it can take several weeks or more for web search crawlers to visit your new site and index it for the first time.

Site maps

In the context of search optimization, the term "site map" has several meanings: it can refer to a planning diagram of a site's structure, to an HTML page that lists and links to the major pages of a site for human readers, or to an XML site map file that lists a site's URLs for search crawlers.

Creating an XML site map requires some technical steps, but if you’ve had a bit of experience with HTML markup, the process is straightforward and the instructions at sitemaps.org are thorough. An XML site map is just a carefully structured plain-text file that you can submit to Google or Bing using either of their webmaster tools sites.
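
A minimal site map in the sitemaps.org XML format, using a hypothetical example.com address, looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <url>
            <loc>http://www.example.com/seminars.html</loc>
            <lastmod>2016-09-01</lastmod>
            <changefreq>monthly</changefreq>
            <priority>0.8</priority>
        </url>
        <!-- one url entry for each page you want crawled -->
    </urlset>

Only the loc element is required for each url entry; lastmod, changefreq, and priority are optional hints to the crawler.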

Recommended Reading

Figures from Chapter 5: Site Structure on Flickr