When the web first rose to popularity in the 1990s, people spoke of browsing or surfing the web, and users located interesting sites primarily by finding and clicking on links listed on major web directory sites like Yahoo! and Netscape. As the size of the web has exploded over the past decade (Google now indexes well over twenty billion web pages), browsing through sites by following web links has become an increasingly inefficient means to initiate a search for new or very specific information. You may still browse the home page of the New York Times or a personal portal page like iGoogle or MyYahoo!, but if you need anything more specific, you will probably go straight to a search engine such as Google.
The way your pages appear to the automated “spider” software that search engines use to “crawl” links between web pages and create search indexes has become the most important factor in whether users will find the information you publish on the web. Search engine optimization isn’t difficult and will make your site better structured and more accessible. If your site uses proper HTML structural markup, you’ve already done 80–90 percent of the work to make your site as visible as possible to search engines.
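As a rough sketch of what proper structural markup looks like (the page topic, headings, and text here are only placeholders), a well-structured page identifies its content with a descriptive title, a single top-level heading, and subheadings that mirror the outline of the content:

<html lang="en">
<head>
<title>Bengal Tiger Habitat</title>
</head>
<body>
<h1>Bengal Tiger Habitat</h1>
<p>An opening paragraph that states the main topic of the page in plain language.</p>
<h2>Range and Territory</h2>
<p>Supporting content organized under descriptive subheadings rather than generic styled text.</p>
</body>
</html>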
Search optimization techniques are not the magic sauce that will automatically bring your site to the top of Google’s page rankings, however. SEO isn’t a cure-all for an ineffective site—it can increase the traffic volume to your site and make things easier to find, but it can’t improve the quality of your site content. SEO techniques ensure that your site is well formed and lessen the possibility that you have inadvertently hidden important information while constructing your site. Over the long run, though, only good content and many reference links from other highly ranked web sites will get you to the first page of Google search results and keep you there.
Most patterns of web site use follow what is widely known as a long-tailed distribution. That is, a few items are overwhelmingly popular, and everything else gets relatively little attention. If you rank the popularity of every web page in your site, you will typically see a long-tailed curve, in which the home page and a few other pages get lots of views and most others get much less traffic. This long-tailed pattern of popularity holds for products in stores, books for sale at Amazon, songs to download on iTunes, and DVDs for sale at Wal-Mart.
Although Wired magazine’s Chris Anderson popularized the concept of the long tail on the Internet, interface expert Jakob Nielsen first used Zipf curves (the formal mathematical term for long-tailed phenomena) to describe the distribution patterns seen in web site usage. Long-tailed usage patterns are fundamental to explaining why web search has become the most popular tool for finding information on the web, whether you are making a general Internet search or merely searching your company’s internal web site. Once users get past the home page and major subdivisions of a large site, they are unlikely to browse their way through all the links that may be required to find a specific page, even if every link is well organized, intuitively labeled, and working properly (fig. 5.8).
Links and individual web pages are the primary elements of web search. Search engines find web pages by following web links from one page to another. Search engine companies use an automated process to find and follow web links and to analyze and index web page content for topical relevance. These automated search programs are collectively called web spiders or web crawlers. This emphasis on links and pages is crucial to understanding how web search works: crawlers can find your pages only if they have links to follow to find them, and search engines do not rank web sites—they rank the popularity and content relevance of individual web pages. Each page of your site needs to be optimized for search and well linked to other pages, because from a search engine’s point of view, each page stands alone.
How exactly do search engines rank pages? We can’t tell you, and the search engine companies won’t give you the formulas either. If search engines like Google and Yahoo! told everyone how they rank pages and how they detect and ban search-scamming techniques, unscrupulous web publishers would instantly start to game the system, and soon we’d all be back to the pre-Google 1990s, when general web search had become almost useless. What we can say is what the search engines themselves gladly tell web content developers: create compelling page content with proper structural markup and good linkages to other pages and sites. Don’t hide your content with poor page development techniques, and your pages will rank well in any search engine.
Current search engines use a combination of two information sources to rank the relevance of a web page to any given search term: internal factors, based on statistical analysis of the words and markup on the page itself, and external factors, based on the number and quality of links to the page from other web pages.
When web search services became popular in the 1990s, early search engines used internal content factors almost exclusively to rate the relevance and ranking of web pages. Rankings were thus childishly easy to manipulate. By inserting dozens of hidden keywords on a page, for example, an aggressive web page author could make a page seem far more relevant to popular search topics than it really was (“sex, sex, sex, sex”).
By the late 1990s even the largest search engines were considered only marginally useful in locating the best sources of information on a given topic, and the top-ranked sites were often those that used the most effective manipulation techniques to bias the search engines. The innovation that transformed web search in the late 1990s was Google’s heavy use of external page factors to weigh the relevance and usefulness of pages.
Google’s algorithms balance external ranking factors with statistical analysis of the page text to determine relevance and search ranking. Google’s fundamental idea is similar to peer citations in academic publications. Every year thousands of science papers are published: How can you tell which articles are the best? You look for those that are most frequently cited (“linked”) by other published papers. Important science papers get cited a lot. Useful web sites get linked a lot.
Search engine crawlers can analyze only the text, the web links, and some of the HTML markup code of your web page; from these they make inferences about the nature, quality, and topical relevance of your pages, based on statistical analysis of the words on each page.
The following are not visible to a search engine: the text within graphics and photographs, the content of audio and video files, and content embedded in Flash or other plug-in media.
The following may cause search crawlers to bypass a web page: links that exist only within JavaScript or Flash navigation menus, pages that can be reached only through a form submission or login, and pages excluded by a robots.txt file.
In addition to making your pages less searchable, these poor practices make your site less accessible, particularly to people who use screen reader software to access web content. SEO, structural markup of content, and universal usability are a wonderful confluence of worthy objectives: by using the best web practices for content markup and organizing your content and links with care, your site will be both more visible to search and more accessible to all users.
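A small illustration of that overlap (the file name and alt text here are hypothetical): words drawn as pixels inside an image are invisible to both search crawlers and screen readers, but a text alternative in the alt attribute is available to both.

<!-- Without a text alternative, any words drawn inside the graphic are lost to crawlers and screen readers. -->
<img src="tiger-banner.jpg" alt="" />
<!-- With alt text, the same image contributes to both search relevance and accessibility. -->
<img src="tiger-banner.jpg" alt="Bengal tiger habitat map" />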
Write for readers, not for search engines. The most popular sites for a given topic got that way by providing a rich mix of useful, interesting information for readers. Think about the keywords and phrases you would use to find your own web pages, and make a list of those words and phrases. Then go through each major page of your site and look at the page titles, content headers, and page text to see if your title and headers accurately reflect the content and major themes of each page.
Put yourself in the user’s place: How would a user find this page in your site? What keywords would they use to look for this content? Remember, search engines have no sense of context and no idea what other related content lies on other pages on your site. To a search engine crawler, every page stands alone. Every page of your site must explain itself fully with accurate titles, headers, keywords, informative linked text, and navigation links to other pages in your site.
The ideal optimized web page has a clear editorial content focus, with the key words or phrases present in these elements of the page, in order of importance:

- Page title (the <title> element)
- Major headings (<h1>, <h2>, and so on)
- Body text, particularly the first paragraphs of the page
- Text within links
- Meta tags (the description and keywords tags)
- File and directory names

Note that the singular and plural forms of words count as different keywords (“tiger” and “tigers” are not the same keyword), and adjust your keyword strategy accordingly. Search engines are not sensitive to letter case, so “Tiger” and “tiger” are exactly equivalent. Also think about context when you work out your content keywords. Search engines are the dumbest readers on the web: they don’t know anything about anything, and they bring no context or knowledge of the world to the task of determining relevance. A search crawler doesn’t know that Bengal tigers are carnivores, that they are large cats of the genus Panthera, or that they are also called Royal Bengal Tigers. Your optimized page on Bengal tigers might therefore use all of those terms (“Bengal tiger,” “Bengal tigers,” “Royal Bengal Tiger,” “Panthera,” “large cat,” “carnivore”) as keywords and phrases, because a user could search with any of them.
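As a brief sketch of where those terms might appear in the most important page elements (the title, description text, and headings shown here are hypothetical), an optimized Bengal tiger page could look like this:

<title>Bengal Tiger Habitat and Range</title>
<meta name="description" content="The habitat and range of the Bengal tiger (genus Panthera), a large carnivorous cat also known as the Royal Bengal Tiger." />
<h1>Bengal Tiger Habitat</h1>
<p>The Bengal tiger, or Royal Bengal Tiger, is a large carnivorous cat of the genus Panthera.</p>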
If your site has been on the web long enough to get indexed by Google or Yahoo!, use your chosen keywords to do searches in the major search engines and see how your site ranks in the search results for each phrase. You can also use the web itself to find data on the keywords and phrases that readers are currently using to find your site. Both Google and Yahoo! offer many tools and information sources to webmasters who want more data on how their site is searched, what keywords readers use to find their site, and how their site ranks for a given keyword or phrase.
Even in well-written content with a tight topical focus, the primary topical keywords are normally a small percentage of the words on the page, typically 5 to 8 percent. Because of the widespread practice of “keyword spamming” (adding hidden or gratuitous repetitions of keywords on a page to make the content seem more relevant), search engines are wary of pages where particular keywords appear with frequencies of over 10 percent of the word count. It is important to make sure your keywords appear in titles and major headings, but don’t load in meaningless repetitions of your keywords: you’ll degrade the quality of your pages and could lose search ranking because of the suspiciously high occurrence of your keywords.
There is some evidence that placing your keywords near the top and left edges of the page will (slightly) benefit your overall ranking, because, on average, those areas of pages are the most likely to contain important content keywords. The top and left edges also fall within the heaviest reader scanning zones as measured by eye-tracking research, so there are human interface advantages to getting your keywords into this page zone, too. For optimal headings, try to use your keywords early in the heading language. Search crawlers may not always scan the full text of very long web pages, so if you have important content near the bottom of long pages, consider creating a page content menu near the top of the page. This will help readers of long pages and will give you an opportunity to use keywords near the page top that might otherwise be buried at the bottom (fig. 5.9).
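One simple way to build such a menu (a sketch only; the headings and anchor names are hypothetical) is a short list of same-page links near the top of the page that point to id anchors on the major headings below:

<h1>Bengal Tiger Habitat</h1>
<ul>
<li><a href="#range">Range and territory</a></li>
<li><a href="#habitat-loss">Habitat loss</a></li>
</ul>
<!-- ...page content... -->
<h2 id="range">Range and Territory</h2>
<!-- ...more content... -->
<h2 id="habitat-loss">Habitat Loss</h2>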
Use your major topical keywords in your file and site directory names. This helps a bit with search optimization, and it makes the organization of your site much more understandable to both your users and your partners on the web site development team. Always use hyphens in web file names, since hyphens are “breaking” characters that divide words from each other. For example, in the file name “bengal-tiger-habitat.html” a search engine will see the words “Bengal,” “tiger,” and “habitat” because the words are separated by hyphens. If you use “nonbreaking” underscore characters as dividers or run the words together, the file name is seen as one long nonsense word that won’t contribute to page ranking: to a search engine, “bengal_tiger_habitat.html” and “bengaltigerhabitat.html” are effectively the same, and neither is ideal.
Both your readers and search crawlers can easily make sense of plain-language directory and file names in your urls:
something.edu/cats/tigers/bengal-tiger-habitat.html
Requesting links from established, high-traffic web sites is crucial to search optimization, particularly for new web sites. These links weigh heavily in search engine rankings, so they are well worth the effort to establish. If you work within a larger company or enterprise, start by contacting the people responsible for your primary company web site and make sure that your new site is linked from any site maps, index pages, or other enterprise-wide directory of major pages. Although it may not always be possible, the ideal link would be from your company’s home page to your new site. Smart company web managers often reserve a spot on the home page for such “what’s new” link requests because they know how to leverage their existing search traffic on the home page to promote a new site. The link does not have to be permanent: a few weeks of visibility after your site launches and gets an initial pass from the major search crawlers will be enough to get you started.
Your company’s central web organization will also likely be responsible for any local web search capabilities, and you want to be sure they are aware of your new web site, particularly if it is housed on a new web server. Create a standard press release or email announcement, and send the announcement to colleagues, professional associations, partner companies, and the local press, requesting that related sites link to your site. The more links your site gets from established, high-traffic sites that already rank well in Google or Yahoo!, the faster your site will climb the search results rankings.
By far the best way to get your site listed in the major search engines is to request links from other existing sites that point to your new site. The two largest search engines offer pages that allow you to submit the URL for a new web site, but there is no guarantee that the search crawlers will find your site immediately. It could take several weeks or more for them to visit your new site and index it for the first time.
Meta tags are a great intellectual notion that has largely fallen victim to human nature. The basic idea is excellent: use a special HTML header designation called a “meta” tag to hold organized bits of meta-information (that is, information about information) to describe your page and site, who authored the page, and what the major content keywords are for your page. The information is there to describe the page to search engines but is not visible to the user unless he or she uses the browser “View Source” option to check the HTML code. Unfortunately, in the 1990s, search scammers began to use meta tags as a means to load dozens or even hundreds of hidden keywords onto a page, often in many repetitions, to bias the results of web searches. Because of these fraudulent practices, recent generations of search engine software either ignore meta tags or give them little weight in overall search rankings. Current search crawlers will also down-rank or ban pages that abuse meta tags, so the practice has become pointless.
Should you use meta tags on your pages? We think they are still a useful structured means to provide organized information about your site. And although search engines may not give heavy ranking weight to meta tag information, most search engines will grab the first dozen or so words of a “description” meta tag as the descriptive text that accompanies your page title in the search results listing.
The basic meta tags are useful, straightforward to fill out, and cover the essential information you might want to provide about your page to a search engine:
<meta name="author" content="Patrick J. Lynch" />
<meta name="description" content="Personal web site of artist, author, designer and photographer Patrick J. Lynch." />
<meta name="keywords" content="web design, web style guide, yale university, patrick j. lynch" />
The bottom line on meta tags: they never hurt, they might help a little, and they are a simple way to supply structured meta-information about your page content.
Basic navigation links are an important part of search optimization, because only through links can search crawlers find your individual pages. In designing your basic page layout and navigation, be sure you have incorporated links to your home page, to other major subdivisions of your site, and to the larger organization or company you work in. Remember, each link you create not only gives a navigation path to users and search engine crawlers but associates your local site with larger company or other general Internet sites that have much higher user traffic than your site. The more you use links to knit your site into your local enterprise site and related external sites, the better off you’ll be for search visibility.
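A minimal sketch of such navigation links, reusing the hypothetical directory structure from the earlier file-naming example (a link to your parent organization’s site would be added in the same way):

<ul>
<li><a href="/">Home</a></li>
<li><a href="/cats/">Cats</a></li>
<li><a href="/cats/tigers/">Tigers</a></li>
</ul>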
In the context of search optimization, the term “site map” has two meanings: a visible site map page that links to the major sections and pages of your site for readers (and gives search crawlers a rich page of links to follow), and an XML Sitemap file that lists the pages of your site for search engine crawlers.
You should use both kinds of site maps to ensure maximum visibility of your site content.
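For the second kind, a minimal XML Sitemap file (a sketch only, reusing the hypothetical URL from earlier; the format is defined by the Sitemap protocol documented at sitemaps.org) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://something.edu/cats/tigers/bengal-tiger-habitat.html</loc>
</url>
<!-- Add one url entry for each page you want search crawlers to find. -->
</urlset>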