Web Site Audit
Sunday, 23 December 2007
Search Engine Optimization (SEO), in the simplest terms, means writing pages to appeal to search engines. Too often, the only concern of web designers and their clients is a page that is attractive to human visitors. Little thought is given to the organization or efficiency of the code itself, so long as the end result looks good in the browser.
Search engines, however, place no value on a page’s aesthetics and look directly to the code of the page to determine its quality. We have the knowledge and experience to see the page in the same light.
Our web site audit assesses the pages of your site for more than a dozen factors directly related to their rankings in the search engines. From highly technical server settings to XML Sitemaps, we’ll perform an in-depth analysis of your site and provide you with an easy-to-understand report on its condition. Should you decide you want to fix any issues uncovered by the audit, we can do that, too.
If you are interested in learning more about how your web site works and what can be done to improve your pages’ organic rankings in Google and the other search engines, please contact us by filling out the form at the bottom of this page.
A glossary of terms and more detailed explanations of the areas of concern are provided below.
Server:
Basic, publicly available information about the server running the web site is reported.
DOCTYPE:
Web pages are written in a ‘markup language’ called HTML, short for Hypertext Markup Language. It provides a means to describe the structure of text-based information in a document - by denoting certain text as headings, paragraphs, lists, and so on - and to supplement that text with interactive forms, embedded images, and other objects. HTML is written in the form of labels (known as tags), surrounded by less-than (<) and greater-than signs (>).
As the technology of the web evolves, new versions of HTML are developed to improve upon older versions. Most of the time, the latest version will be the best choice. A web page identifies which version of HTML it uses with a line of code called a DOCTYPE declaration, which refers to a Document Type Definition (DTD).
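As a rough illustration of how a script might read the DOCTYPE declaration from a page, consider the Python sketch below. The class name and the sample page are assumptions made for this example; they are not part of the audit itself.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Records the DOCTYPE declaration of a page, if it has one."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # Called for <!...> declarations; keep only the first DOCTYPE.
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


# Hypothetical sample page declaring HTML 4.01 Strict.
SAMPLE_PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Example</title></head><body></body></html>"
)

finder = DoctypeFinder()
finder.feed(SAMPLE_PAGE)
print(finder.doctype)
```

A page with no declaration leaves finder.doctype set to None, which is itself worth reporting.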
Title Tag:
The title tag is a required element in an HTML document and is among the most important parts of a web page because it contains the most heavily weighted text on the page. You can view the contents of the title tag in the title bar at the top of your browser window.
The title serves a number of purposes. For many potential visitors, it is their first exposure to your site, because the search engines display page titles on their results pages as the links that visitors click to reach each site. Effectively written titles will therefore increase your traffic, while poorly written titles will result in potential visitors passing over your page. It is important to find a balance between a title that humans will find compelling and one that incorporates targeted keywords for the search engines.
The audit includes a list of all spiderable pages on your site and their corresponding title tags. You should review this list carefully.
An excellent explanation of the title tag can be found at http://www.seologic.com/faq/title-tags.php
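A list of pages and their titles, like the one described above, could be assembled along the following lines. This is a minimal Python sketch; the page paths, the sample markup, and the names TitleExtractor and page_title are all hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Hypothetical pages standing in for a crawled site.
SAMPLE_PAGES = {
    "/index.html": "<html><head><title>Blue Widgets | Example Co.</title></head><body></body></html>",
    "/about.html": "<html><head><title>About Example Co.</title></head><body></body></html>",
}

for path, html in SAMPLE_PAGES.items():
    print(f"{path}: {page_title(html)}")
```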
Meta Description Tag:
The description meta tag is optional. In terms of SEO, the tag is of little importance to Google, is of moderate importance to MSN/Live Search, and is still important to Yahoo!. Also, Yahoo! may display the contents of the description tag on its results pages, but Google usually creates excerpts of body text near the searched-for terms. For this reason, including a meta description tag with a concise summary of the content is a valuable part of a page. The description should be no more than 150 characters (including spaces) in length.
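The 150-character guideline is easy to check mechanically. The Python sketch below shows one way to do so; the function name, the result labels, and the sample markup are assumptions for illustration only.

```python
from html.parser import HTMLParser

MAX_DESCRIPTION_LENGTH = 150  # limit suggested in the text above


class MetaDescriptionExtractor(HTMLParser):
    """Captures the content of the first description meta tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""


def check_description(html):
    """Return 'missing', 'too long', or 'ok' for a page's description."""
    parser = MetaDescriptionExtractor()
    parser.feed(html)
    if parser.description is None:
        return "missing"
    if len(parser.description) > MAX_DESCRIPTION_LENGTH:
        return "too long"
    return "ok"


# Hypothetical page head with a short description.
SAMPLE_HEAD = (
    '<head><meta name="description" '
    'content="Hand-made blue widgets, shipped worldwide."></head>'
)
print(check_description(SAMPLE_HEAD))
```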
Meta Keywords Tag:
The keywords meta tag is optional. Some search engines still consider its contents while many others ignore it completely. Because it has been and may still be used by Google to identify ‘spam’ pages, careful attention must be paid to the keywords in this tag. Omitting the tag entirely is acceptable, and doing so will reduce the risk that Google will use it to incorrectly identify a page as spam.
H1 Tag:
The H1 tag indicates the top-level heading of a page, and there should be only one H1 tag per page. The tag should contain text of primary importance and relevance to the other content of the page, comparable to the headline of a newspaper. Search engines place substantial value on the text within the H1 tag.
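Checking how many H1 headings a page contains can be sketched in a few lines of Python. The names H1Collector and h1_headings, and the sample markup, are hypothetical.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Gathers the text of every <h1> element on a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings[-1] += data


def h1_headings(html):
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.headings]


# Hypothetical page that uses two H1 headings.
SAMPLE_BODY = "<body><h1>Blue Widgets</h1><p>Intro</p><h1>More Widgets</h1></body>"

found = h1_headings(SAMPLE_BODY)
if len(found) != 1:
    print(f"Expected one H1 heading, found {len(found)}: {found}")
```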
Absolute and Relative URLs:
Links to pages on the same domain can be written in two ways, using either absolute or relative URLs. For example, a link using an absolute URL would be written as http://example.com/bar.html while the same link using a relative URL would be written as bar.html.
Google has recommended using absolute URLs in all of your internal linking. It has been suggested that PageRank only passes through absolute URLs, which would make it possible to channel PageRank by using the different methods to link to different pages. For example, you could use a relative URL in links to your privacy policy and an absolute URL in links to your index page. If a site uses all absolute URLs, it may be needlessly bleeding PageRank to unimportant pages.
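To make the distinction between the two link styles concrete, the Python sketch below classifies a link as absolute or relative and shows the address a browser or spider would resolve it to. The base page foo.html and the helper is_absolute are assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page on which the links appear.
BASE_URL = "http://example.com/foo.html"


def is_absolute(href):
    """An absolute URL names its own scheme and host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


for href in ("bar.html", "http://example.com/bar.html"):
    kind = "absolute" if is_absolute(href) else "relative"
    print(f"{href} is {kind}; resolves to {urljoin(BASE_URL, href)}")
```

Both links resolve to http://example.com/bar.html; only the way they are written in the code differs.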
Validation:
Using valid HTML code in web pages is the first step toward maximizing compatibility with the greatest number of browsers and other user agents, such as PDAs and cell phones. Using valid HTML means having HTML code that correctly follows one of the DTDs of the HTML specification. You can check whether a page uses valid HTML by passing the page through an HTML validator, a program that checks the correctness of your document against the declared DOCTYPE. The most widely used validator is located at http://validator.w3.org/.
Validation is not a guarantee of the quality of the code, and a valid page may look very different in different browsers. But validation is the fastest and most convenient metric available to the end-user of a page who is concerned about the degree of care with which a page has been created. In our opinion, a page isn’t finished until it validates.
Learn more at http://validator.w3.org/
External CSS:
A modern web page is typically composed of two files, an HTML file and a CSS file, which allows the content elements to be separated from the presentational elements. The HTML file contains the tags (the basic structural elements, such as the title tag and the H1 tag) and the content (the text) of the page. The CSS file contains all of the presentational information, such as the color, size, and font face of the text, the amount of space between each line, and the background color or image of the page. While the two files can be combined, it’s preferable to keep them separate. The external CSS file is downloaded once and then cached to reduce bandwidth consumption and page load times.
A CSS file can be validated in the same way as HTML.
Learn more at http://jigsaw.w3.org/css-validator/
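One way to see how well a page separates presentation from content is to count where its style rules live. The Python sketch below does this under simple assumptions; the class name and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class StyleUsageCounter(HTMLParser):
    """Counts external style sheets, embedded blocks, and inline styles."""

    def __init__(self):
        super().__init__()
        self.external_sheets = 0    # <link rel="stylesheet" href="...">
        self.embedded_blocks = 0    # <style> ... </style>
        self.inline_attributes = 0  # style="..." on any element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            self.external_sheets += 1
        if tag == "style":
            self.embedded_blocks += 1
        if "style" in attributes:
            self.inline_attributes += 1


# Hypothetical page mixing all three ways of applying CSS.
SAMPLE_PAGE = """\
<html><head>
<link rel="stylesheet" type="text/css" href="/site.css">
<style type="text/css">p { color: red; }</style>
</head><body><p style="margin: 0">Hello</p></body></html>
"""

counter = StyleUsageCounter()
counter.feed(SAMPLE_PAGE)
print(counter.external_sheets, counter.embedded_blocks, counter.inline_attributes)
```

A page that keeps all of its presentation in an external file would report zero embedded blocks and zero inline attributes.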
Tableless Layout:
In the early days of the web, when HTML was a more primitive and limited language and before the advent of CSS, web pages were often structured using tables. Graphic designers looking for ways to precisely control the visual appearance of web pages used tables to create page layouts that were dependably identical in all browsers.
This table-based method causes a number of problems, however. Complex pages are typically designed with tables nested within tables, resulting in large HTML documents that require more bandwidth than documents with simpler formatting. In addition, a web browser usually has to download all of the content within a table before displaying it on a page, resulting in slower-seeming load times. Furthermore, when a table-based layout is linearized, for example, when being parsed by a screen reader or a search engine, the resulting order of the content can be somewhat jumbled and confusing. This can negatively affect how the search engines prioritize the page’s content.
CSS was developed to improve the separation between design and content and move back towards a semantic organization of content on the web. With CSS, the placement of text on a page doesn’t necessarily correspond to where that text exists in the code. This means that you can display the navigation above the text on the page, but put the text before the navigation in the code - an impossible situation with table-based layouts. This gives CSS-based layouts a terrific advantage over table-based layouts in terms of front-loading important content near the top of the document.
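Tables nested within tables are a telltale sign of a table-based layout. The Python sketch below measures how deeply tables are nested on a page; the names TableDepthMeter and max_table_depth and the sample markup are assumptions, and a single table holding genuinely tabular data is not a problem in itself.

```python
from html.parser import HTMLParser


class TableDepthMeter(HTMLParser):
    """Tracks the deepest level of <table> nesting seen on a page."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            self.max_depth = max(self.max_depth, self._depth)

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1


def max_table_depth(html):
    meter = TableDepthMeter()
    meter.feed(html)
    return meter.max_depth


# Hypothetical layout with one table nested inside another.
NESTED_LAYOUT = (
    "<table><tr><td>"
    "<table><tr><td>Navigation</td></tr></table>"
    "</td><td>Content</td></tr></table>"
)
print(max_table_depth(NESTED_LAYOUT))
```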
External JavaScript:
JavaScript is code that is executed client-side, by browsers, and is often used to change parts of web pages according to user input. A common use of JavaScript is in drop-down or fly-out menus, where parts of the page move when clicked.
JavaScript should be placed in files external to the HTML, rather than added to the code of the page. Using external JavaScript files decreases the ‘weight’, or size, of the page, and allows for lower bandwidth consumption and faster page load times. Moving JavaScript into external files also increases the ratio of content to code, a factor considered by search engines.
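The effect on the ratio of content to code can be illustrated with a small Python sketch. Search engines do not publish a formula for this ratio, so the definition used here (visible, non-whitespace characters divided by the total size of the HTML) is purely an assumption, as are the function name and the two sample pages.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Collects page text, skipping the contents of script and style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def content_ratio(html):
    """Visible non-whitespace characters divided by total HTML length."""
    collector = VisibleTextCollector()
    collector.feed(html)
    collector.close()
    visible = len("".join("".join(collector.chunks).split()))
    return visible / len(html) if html else 0.0


# The same hypothetical page, with the menu script inline and external.
INLINE_PAGE = """<html><body><script type="text/javascript">
function toggleMenu(id) {
  var menu = document.getElementById(id);
  menu.style.display = (menu.style.display == "none") ? "block" : "none";
}
</script><p>Blue widgets for sale.</p></body></html>"""

EXTERNAL_PAGE = """<html><body><script type="text/javascript" src="/menu.js">
</script><p>Blue widgets for sale.</p></body></html>"""

print(f"inline:   {content_ratio(INLINE_PAGE):.2f}")
print(f"external: {content_ratio(EXTERNAL_PAGE):.2f}")
```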
Canonicalization (www redirect):
Canonicalization is the process of selecting a single URL from several possible choices. For example, most people would consider these URLs the same:
- www.example.com
- example.com/
- www.example.com/index.html
And most of the time, a visitor would see the same page for each URL. But technically, all of these URLs are different. A web server could return completely different content for all the URLs above, and so Google sees a page at any one of these URLs as distinct and separate from a page at any of the other URLs.
One of the factors that influences PageRank is the number of incoming links to a page. A problem arises when a page has incoming links to multiple URLs, for example, to both www.example.com/mypage.html and example.com/mypage.html. Google may determine that this page is actually two distinct pages, and so each of the pages starts to accrue its own PageRank, which is disadvantageous. To consolidate PageRank, it is recommended that a site’s owner first pick one URL and use that URL consistently across the entire site. On Apache servers, an .htaccess file is then used to cause the server to rewrite the non-favored URL to match the favored URL, eliminating the ‘other’ page.
In practice, when a browser or spider follows a link to the page at www.example.com/mypage.html, the server tells the requestor that the page has moved permanently and redirects it to example.com/mypage.html. This ensures that the search engines are only ever served one page and that page gets the PageRank credit for all the inbound links, no matter how the links are written.
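The decision the server makes can be modeled in a few lines. The Python sketch below is not the .htaccess configuration itself; it only illustrates the logic, assuming that the non-www host is the favored one and that the server answers with a permanent (HTTP 301) redirect, which is one common way to implement this. The function name and constants are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the site owner has chosen the non-www host as the favored one.
PREFERRED_HOST = "example.com"
ALTERNATE_HOSTS = {"www.example.com"}


def canonical_response(url):
    """Return (status, location) as a canonicalizing server might."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALTERNATE_HOSTS:
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
        )
        return 301, target  # permanent redirect to the favored URL
    return 200, url  # already canonical: serve the page


print(canonical_response("http://www.example.com/mypage.html"))
print(canonical_response("http://example.com/mypage.html"))
```

Whichever way an inbound link is written, the requestor ends up at the same single URL.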
XML Sitemap:
An XML Sitemap is an XML file that lists the URL of each page of a site, along with additional metadata about each page (when it was last updated, how often it usually changes, and how important it is relative to other pages in the site) so that search engines can more intelligently crawl the site.
Web crawlers normally discover pages of a web site by crawling the links within the site. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all the URLs in the sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but it helps robots do a better job of spidering your site.
XML Sitemaps should not be confused with HTML sitemaps. HTML sitemaps are just web pages that help human visitors navigate a site, while XML Sitemaps are only ever seen by web robots.
Learn more at http://www.sitemaps.org/
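To show what such a file contains, the Python sketch below builds a small XML Sitemap using the element names defined by the Sitemap protocol (urlset, url, loc, lastmod, changefreq, priority). The pages, dates, and values listed are invented for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages: (URL, last modified, change frequency, priority).
PAGES = [
    ("http://example.com/", "2007-12-23", "weekly", "1.0"),
    ("http://example.com/privacy.html", "2007-01-15", "yearly", "0.1"),
]


def build_sitemap(pages):
    """Return the Sitemap XML for the given pages as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap(PAGES))
```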
robots.txt File:
A robots.txt file is a text file that resides in the root directory of your site and contains instructions for the robots about whether they are allowed to crawl your site and which folders or file types, if any, are off-limits to them. A robots.txt file can also point robots to your XML Sitemap.
An example of a robots.txt file that excludes problem robots: http://www.webmarketingnow.com/robots.txt
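The way a well-behaved robot interprets these instructions can be demonstrated with the robots.txt parser in Python's standard library. The file contents below are a made-up example, and "BadBot" and "GoodBot" are invented robot names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is an invented name for a problem robot.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: http://example.com/sitemap.xml
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# The problem robot is shut out of the whole site.
print(rules.can_fetch("BadBot", "http://example.com/index.html"))
# Any other robot may crawl the site, except the /private/ folder.
print(rules.can_fetch("GoodBot", "http://example.com/index.html"))
print(rules.can_fetch("GoodBot", "http://example.com/private/notes.html"))
# Sitemap locations listed in the file (requires Python 3.8 or later).
print(rules.site_maps())
```

Keep in mind that robots.txt is only a set of instructions; a robot that chooses to ignore them is not physically prevented from crawling.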
Google Webmaster Tools:
Google Webmaster Tools is an online suite of tools that provides website owners with a free and easy way to make their sites more Google-friendly. The tools can show you how Google views a site, help you diagnose problems, and let you share info with Google to help improve your site’s visibility.
Learn more at https://www.google.com/webmasters/tools/docs/en/about.html
