Ecommerce SEO

CHAPTER 2

Website Architecture

Length: 10,291 words

Estimated reading time: 1 hour, 10 minutes

This e-commerce SEO guide has almost 400 pages of advanced, actionable insights into on-page SEO for ecommerce. This is the 2nd out of 8 chapters.

Written by an ecommerce SEO consultant with 20 years of research and practical experience, this comprehensive SEO resource will teach you how to identify and address all of the SEO issues specific to ecommerce websites, in one place.

The strategies and tactics described in this guide have been successfully implemented on top 10 online retailers, small & medium businesses, and mom and pop stores.

In this chapter, we will explore the concepts behind building optimized ecommerce website architectures.

Having a great site architecture means making products and categories findable on your website, in a way that users and search engines can reach them as efficiently as possible.

There are two concepts you should be aware of regarding site architecture:

  • Efficient crawling and indexing. This refers to the technical architecture or TA.
  • Classifying, labeling and organizing content. This refers to information architecture or IA.

Together, information and technical architecture form the site architecture (SA). A good understanding of these two concepts will help you build websites that are friendly to both search engines and users.

It is important to differentiate between information and technical architecture:

  • Information architecture is the process of classifying and organizing content on a website while providing user-friendly access to that content, via navigation. This process is done (or should be done) by information architects.
  • Technical architecture is the process of designing the technical and functional aspects of a site. This is mostly done by web developers.

Keep in mind that SEO involves both information and technical architecture knowledge.

Information architecture

The Information Architecture Institute’s definition of IA is:

  • The structural design of shared information environments.
  • The art and science of organizing and labeling websites, intranets, online communities and software to support usability and findability.
  • An emerging community of practice focused on bringing principles of design and architecture to the digital landscape.

This definition shows that information architecture goes beyond websites, and it hints at its complexity. It also reveals how flexible and theoretical information architecture is.

From an ecommerce standpoint, let’s oversimplify the definition of information architecture to this single sentence:

The classification and organization of content and online inventory.

You should be familiar with two other important information architecture concepts: taxonomy and ontology. While these names might be intimidating, the concepts are easy to understand.

Taxonomy is the classification of topics into a hierarchical structure. For ecommerce, this translates into assigning items to one or more categories. Ecommerce taxonomies are usually vertical, “tree-like” structures. A website’s taxonomy is often referred to as its hierarchy. To visualize a taxonomy, think of breadcrumbs.

Figure 1 – Notice how the breadcrumbs above mimic the website taxonomy. In our examples, one branch of the taxonomy “tree” leads to Duvet & Comforter Covers, and the other to Aloe Vera Gels.

The structures depicted in these two screencaps are ordered using a parent-child relationship, from broader to narrower topics, and they are called taxonomies. One way to create ecommerce taxonomies is to use a controlled vocabulary, which is a restricted list of terms, names, labels, and categories. Usually, it is the information architects who develop these vocabularies.
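To make the parent-child idea concrete, here is a minimal sketch of a taxonomy stored as a child-to-parent mapping, with a breadcrumb trail derived by walking up the tree. The intermediate category names (Bedding, Skin Care) and the function name are assumptions made for illustration; only the two leaf categories come from the examples above.

```python
# Hypothetical taxonomy stored as child -> parent. Only the two leaf names
# appear in the text; the intermediate categories are invented for illustration.
PARENT = {
    "Bedding": "Home",
    "Duvet & Comforter Covers": "Bedding",
    "Skin Care": "Home",
    "Aloe Vera Gels": "Skin Care",
}


def breadcrumb(category, parent=PARENT):
    """Walk up the parent links and return the trail from the root down."""
    trail = [category]
    while trail[-1] in parent:
        trail.append(parent[trail[-1]])
    return list(reversed(trail))


print(" > ".join(breadcrumb("Aloe Vera Gels")))
```

Each category has exactly one parent in this sketch, which is what makes it a strict tree; items that sit in several categories would need a different structure.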

In terms of SEO, you should use semantic markup to help search engines understand taxonomies. One such markup can be applied to your site breadcrumbs.

Search engines use Microdata or RDFa markup to generate breadcrumb rich snippets similar to this one:

Figure 2 – Search engines can sometimes display the website taxonomy directly in the search engine results pages (aka SERPs).

We will discuss breadcrumbs in detail later in this book, but briefly, this is what the source code for the previous rich snippet example looks like:

Figure 3 – The highlighted text shows the Breadcrumb vocabulary markup.
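As a rough illustration of breadcrumb markup, the sketch below renders a breadcrumb trail as Microdata using the schema.org BreadcrumbList and ListItem types. This is an assumption about which vocabulary to use, and it is not necessarily the exact markup shown in the figure. The function name and the sample trail are hypothetical.

```python
from html import escape


def breadcrumb_microdata(trail):
    """Render (name, url) pairs as schema.org BreadcrumbList microdata."""
    items = []
    for position, (name, url) in enumerate(trail, start=1):
        items.append(
            '  <li itemprop="itemListElement" itemscope '
            'itemtype="https://schema.org/ListItem">\n'
            f'    <a itemprop="item" href="{escape(url, quote=True)}">'
            f'<span itemprop="name">{escape(name)}</span></a>\n'
            f'    <meta itemprop="position" content="{position}" />\n'
            "  </li>"
        )
    return (
        '<ol itemscope itemtype="https://schema.org/BreadcrumbList">\n'
        + "\n".join(items)
        + "\n</ol>"
    )


# Hypothetical two-level trail on an example domain.
print(breadcrumb_microdata([
    ("Office Furniture", "https://mysite.com/office-furniture/"),
    ("Storage", "https://mysite.com/office-furniture/storage/"),
]))
```

The position values start at 1 and follow the order of the trail, so the markup mirrors the taxonomy from broader to narrower.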

The second information architecture concept you need to be aware of is ontology. It means the relationships between taxonomies.

If an ecommerce hierarchy can be visualized as an inverted tree, with the home page at the top, then an ontology is the forest showing relationships between trees. An ontology might encompass various taxonomies, with each taxonomy organizing topics into a particular hierarchy.

Simply put, an ontology is a more complex type of taxonomy, containing richer information about the content and the items on a website. We are just at the beginning of building ontology-driven sites, and one standard ontology vocabulary for ecommerce is GoodRelations.

The Semantic Web aims to help artificial intelligence agents such as search engine bots crawl through and categorize information more efficiently. It is also designed to help identify relationships between items and categories (e.g., relationships between manufacturers, dealers, and prices).

Figure 4 – Related Categories or Related Products can be considered a form of ontology.

If you are not an information architect or a business analyst, you probably will not be involved in identifying related categories and products, but it is important to know these terms in your discussions with information architects.

Sometimes, related categories and products are automatically identified by the ecommerce platform, or by specialized software.

Why is information architecture important for search engines?

A correctly designed information architecture will result in tiered website architecture. A good architecture has an internal linking structure that will allow child pages (pages that can link upwards in the hierarchy, such as product detail pages or blog posts) to support the more important parent pages (upper-level pages that link down in the vertical hierarchy, such as category and subcategory pages).

Figure 5 – Pages that link to each other at the same level of the hierarchy are called siblings. They share the same parent.

With correct internal linking, a blog article such as “Top 5 New Features of Canon Rebel T5i DSLR” will support the product detail page Canon Rebel T5i DSLR. That product page will support the category Digital Cameras, which will further support the top-level category Electronics.

Figure 6 – This pyramid-like structure is a very common architecture for ecommerce.

One of the questions that often comes up when deciding on the hierarchy is “What is the best number of levels to reach a product detail page?”

The famous three-click rule, which suggests that every page on a website should take no more than three clicks to access, is OK to use as a guide, but do not get stuck on it. If you need a fourth level in the hierarchy, that is perfectly fine.

Information architects, business analysts or merchandising teams can help identify relationships between categories, subcategories, and products. Based on these findings, you will decide on rules for an internal linking strategy. Such rules can include:

  • Only highly related categories will interlink.
  • Categories will link only to their parents.
  • Subcategories will link to related subcategories or categories.
  • Product pages will only link to related products in the same category, and parent categories.

A proper website architecture will help your website rank for the so-called head terms. For ecommerce websites, these are usually the category pages, at all levels of the hierarchy. However, internal linking is not enough for a subcategory page to reach the top of the search engine results pages for category-related search queries.

Because head terms are usually competitive, a page targeting such terms should also include:

  • Relevant and useful content. This means that your listing pages should display more than just a list of items. For product detail pages you need to present more than just product pictures and pricing.
  • Backlinks from related trusted external websites.

Additionally, proper information architecture means good usability. Great usability and content create an excellent user experience, which then leads to an increased dwell time (which is good for SEO).

Dwell time is the amount of time that a searcher spends on a page before returning to the SERPs. The longer this time is, the better.

Pogo-sticking means going back and forth between a SERP and the web pages listed in the results. For example, you search for something, click on the first result, you are not happy, you go back to the SERP. Then you click on the second result, you are still not happy, and you go back to the SERP again, and so forth, until you find what you are looking for, or until you refine your search query.

A SERP bounce happens when a search engine user clicks on your page in the SERPs and then goes back to the results without interacting with any page elements.

Note that a high SERP bounce is not inherently bad for SEO, but a low dwell time might be. An increased dwell time sends quality signals because it hints to search engines that your page is relevant for a given search query.

Navigation (primary, secondary, breadcrumb, or contextual) is also one of the critical components of website architecture. Navigation is jointly crafted by various members of the business, led by the information architect. Given that the primary navigation will be present on almost every page of the website, it influences how authority and link signals (i.e., PageRank and anchor text) are passed to other pages.

Fortunately, there are ways to give users what they want (findability, discoverability, and usability), and at the same time, guide search engine bots towards what you want them to discover, crawl and index.

How can SEO add value to IA?

Remember, information architecture is not about technical issues, but about organizing digital inventory and content. So, while SEO has a key role to play in information architecture, it should not dictate how information is labeled and organized. Information architecture is about making content easy to find and helpful for users. However, because most SEOs are biased towards marketing and technology rather than user experience and usability, it is advisable to involve both an information architect and an SEO consultant when working out the information architecture.

Try to involve the SEO person from the initial stages of the information architecture process, to provide suggestions and feedback from a search engine standpoint, and to contribute to the overall site architecture discussion. Once the information architect designing the draft information architecture listens to what the SEO has to say, he or she can brainstorm with the other teams about how to implement the SEO recommendations with minimal changes to the initial information architecture format.

Many times, technology and marketing teams will dismiss a certain information architecture just because it does not have traffic potential. Do not make that mistake. When optimizing for search engines and their users, you should listen to what other teams in the business have to say and only then suggest solutions.

As mentioned, SEO’s role is to provide consultancy from the perspective of search engines. Let’s look at a few areas where SEO input is valuable.

The concept of flat architecture

In a flat architecture, deep pages – which are pages at the lower levels of the website hierarchy (usually the product detail pages) – are accessible to users and search engine bots within a balanced number of clicks for users (or hops, for bots).

Figure 7 – This figure depicts what flat website architecture looks like.

The opposite of flat architecture is the so-called deep architecture, and it may look like the diagram below:

Figure 8 – In a deep-architecture model, pages are mostly linked in a vertical structure.

We will use math to illustrate the concept of flat architecture:

  • At level 0 (home page), you link to 100 category pages; 100^1=100 pages linked.
  • From each page at level 1 (the category pages), you link to 100 subcategory pages; 100^2=10,000 subcategory URLs.
  • From each page at level 2 (the subcategory pages), you link to 100 product pages; 100^3=one million product page URLs.

In three “clicks” search engines can reach and crawl (and eventually index) one million pages.

Note: the 100 links-per-page example was used as a guide only. In practice you can have more or fewer links, depending on your site authority.
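The arithmetic above can be checked with a few lines of code. This is a minimal sketch that assumes every page links to the same number of new pages and that no two pages link to the same URL; the function names are invented for illustration.

```python
def pages_at_depth(links_per_page, depth):
    """Pages first reached at exactly `depth` clicks, assuming no overlap."""
    return links_per_page ** depth


def pages_within(links_per_page, max_depth):
    """All pages reached in `max_depth` clicks or fewer, excluding the home page."""
    return sum(pages_at_depth(links_per_page, d) for d in range(1, max_depth + 1))


for depth in range(1, 4):
    print(depth, pages_at_depth(100, depth))

print(pages_within(100, 3))  # 100 + 10,000 + 1,000,000 = 1,010,100
```

Changing the first argument shows how quickly reach grows or shrinks when a site uses more or fewer links per page.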

Let’s look at the scenario of a direct visit to your homepage. To reach a product detail page from the home page, a user will have to perform the following actions:

  • The first click on the Cosmetics category page.
  • The second click on the Eye subcategory page.
  • The third click on the product details page.

If no external links point directly to that product details page (abbreviated as PDP), search engines will find the PDP URL in a similar way to users. The bot will crawl from an entry page and will, eventually, find its way to the product detail page. Keep in mind that search engines will enter your website through a multitude of URLs, not only through the home page.

In our scenario, it took only three clicks to reach the PDP, but if the website is structured using deep information architecture, it might take users and search engines more clicks or hops.
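One way to measure how many clicks (or hops) separate a page from an entry page is a breadth-first search over the internal link graph. The sketch below assumes you already have the graph as a mapping from each URL to the URLs it links to, for example from a crawler export. The product URL and the function name are hypothetical.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: fewest clicks (hops) from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph mirroring the Cosmetics > Eye > product example.
LINKS = {
    "/": ["/cosmetics/"],
    "/cosmetics/": ["/cosmetics/eye/"],
    "/cosmetics/eye/": ["/example-mascara.html"],
}

print(click_depths(LINKS, "/")["/example-mascara.html"])  # 3
```

Because bots enter a site through many URLs, the same function can be run with a different start page to see the depth from other entry points.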

But how and why did we adopt flat architecture?

The concept of flat website architecture seems to have its roots in web design, and it started with the three-click rule becoming a best practice around the year 2000.

However, when usability experts tested this rule, they found that it was not necessarily working for users, as expected. As a matter of fact:

“Users’ ability to find products on an ecommerce website increased by 600 percent after the design was changed so that products were four clicks from the homepage instead of three” (p. 322).

Then smart SEOs jumped in, thinking that if the rule was good for users, then it should be suitable for search engines as well. SEOs found a way to funnel more PageRank to deeper levels and optimize crawling by providing shorter paths for search engines. However, the initial goal was to avoid ending up with pages in the supplemental index because of their very low PageRank; it was not to flatten the site architecture.

Here are a few important pointers about flat architecture:

  • Unless you sell a limited number of products (e.g., just ten dietary supplement pills) or unless you have a very limited number of pages on the site, do not flatten to the extreme. That means do not link from the home page to hundreds of product detail pages, just to build a flat architecture.
  • Flat architecture is about the distance between pages in terms of clicks, not about the number of directories in the URL. For example, you can link from the home page directly to a subcategory URL at the fourth level of the hierarchy (e.g., mysite.com/Home-Garden/Furniture/Living-Room-Furniture/Recliners/) to promote a subcategory that generates high profits. In this example, the Recliners page is only one click away from the home page (which fits the flat architecture concept), but it is four levels down in the directory hierarchy (which matches the deep architecture concept).
  • If you have already organized your hierarchy using directories in URLs, do not remove them just for the sake of flattening.
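The difference between directory depth and click distance can be made explicit with a small helper that counts path segments. This is a sketch only: the function name is invented, and the https:// scheme is added to the example URL because Python's urlparse needs a scheme to separate the host from the path.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the non-empty path segments in a URL."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


url = "https://mysite.com/Home-Garden/Furniture/Living-Room-Furniture/Recliners/"
print(directory_depth(url))  # 4
```

The Recliners page has a directory depth of four, yet its click distance is one as soon as the home page links to it directly. The two numbers measure different things.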

As long as the directories do not generate super-long URLs, they have advantages such as:

  • Facilitating easier website “theming” (we will talk about this in the Siloing section).
  • Presenting users with a clear delineation of the categories on your website.
  • Allowing for easier SEO, information architecture and web analysis (e.g., you can use site:domain.com/directory/ to troubleshoot indexation problems).
  • Google and other search engines may use your directory structure to create rich-snippet breadcrumbs.

Figure 9 – SERP breadcrumbs will show up only if the directory hierarchy is clear to search engines.

In this screenshot, you can see how Google displays breadcrumbs directly in SERPs. However, such rich snippets will show up only if the directory hierarchy is structured, or if you mark up your breadcrumbs with Schema vocabulary.

It is not mandatory for URLs to replicate the exact website taxonomy. If you want, you can keep the URL structure under two directories deep. Here’s an example.

On hotel reservations websites, it is common to have a taxonomy based on hotel geo-locations:

Taxonomy: Home > Europe > France > Ile-de-France > Paris

URL: domain.com/europe/france/ile-de-france/paris/

Even though the URL reflects the hierarchical taxonomy, it is too long and too difficult to type in or to remember.

If the website sells only hotel rooms, the alternative URL might look like:

domain.com/france/paris/

If the website offers other travel services such as air tickets or car rentals, then the alternative URL will include the type of service, and it might look like:

domain.com/car-rentals/france/paris/

Regarding the directory structure for hotel booking websites, it is worth noting that hotels are a special ecommerce case because you cannot re-categorize hotels from one city to another. However, for online retailers, product re-categorization happens frequently.

To avoid issues associated with moving products from one category to another, or issues related to poly-hierarchies (items that are categorized in multiple categories), keep the PDP URLs free of categories whenever possible.

For example, to reach the product page 3-Level Carousel Media Center, a user will navigate through:

homepage – mysite.com/

category page – mysite.com/office-furniture/

subcategory page – mysite.com/office-furniture/storage/

sub-subcategory page – mysite.com/office-furniture/storage/media-storage/

However, once the searcher reaches the product detail page, the URL is free of categories and subcategories:

mysite.com/3-level-carrousel-media-center.html

Tip: setting product names in stone is also a good idea.

Notice a couple of things about the previous URLs:

  • The product page URL is free of category, subcategory or sub-subcategory names.
  • The category and subcategory URLs include the trailing forward slash (/) at the very end. That hints to search engines that the URLs are directories and there is more content to be found on those pages.

Figure 10 – This is how Google treats trailing slashes in URLs.

  • The product page has a .html file extension. The presence of the file extension hints to search engines that the document is an HTML page (an item page) and not a directory. The file extension can be anything, e.g., .php or .aspx, because the specific extension does not matter to search engines.

Removing category names from URLs is a trade-off, as it makes web analysis a little more challenging. However, this difficulty is surmountable. For example, you can group pages in your analytics tool, or you can mark up the HTML code with different strings to group pages based on your own rules.

At the same time make sure your web analytics tool is set up to easily group pages for analysis. Without unique identifiers for URLs, it is more difficult to segment data. You can also use tag managers such as Google Tag Manager to create content groups, using Data Layers.
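As a sketch of how category-free product URLs could still be grouped for reporting, the function below assigns each URL a page type and a category. It assumes a lookup table from product file name to primary category is available, for example exported from the product catalog. The table contents, the product file name and the function name are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical lookup from PDP file name to its primary category, for example
# exported from the product catalog.
PRODUCT_CATEGORY = {
    "example-media-center.html": "office-furniture",
}


def content_group(url, product_category=PRODUCT_CATEGORY):
    """Return a (page type, category) pair for grouping pages in reports."""
    path = urlparse(url).path
    if path in ("", "/"):
        return ("home", None)
    if path.endswith("/"):
        # Directory-style URL: the first segment is the top-level category.
        return ("listing", path.strip("/").split("/")[0])
    filename = path.rsplit("/", 1)[-1]
    return ("product", product_category.get(filename, "unknown"))


print(content_group("https://mysite.com/office-furniture/storage/"))
print(content_group("https://mysite.com/example-media-center.html"))
```

The same pairs could be pushed into an analytics tool as content group values; how that is wired up depends on the tool and is not shown here.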

Figure 11 – The flat architecture concept on an ecommerce site.

Siloing

In the simplest terms, siloing means creating a site architecture that allows users to find information in a structured manner while linking pages using a controlled pattern to guide how search engine bots crawl the website. Usually, this structure is a vertical taxonomy.

Siloing sounds like a fancy term, but it is just good information architecture because siloing is one-part website hierarchy and one-part navigation (using internal linking).

Figure 12 – At a basic level, siloing means that pages in a taxonomy branch/category (i.e., PDPs) should not link to pages in a different branch/category.

In strict hierarchy patterns, child pages are only linked to and from their respective parent pages. This is not possible without a strictly controlled internal linking, and it is challenging to create such a strict internal linking pattern mainly because:

  • Primary navigation is present on all pages, so cross-linking will happen naturally.
  • Poly-hierarchies, meaning products or subcategories that belong to multiple categories. For example, the Office Furniture category can be categorized under both Office Products and Furniture.
  • Subcategory cross-linking and crossover products. For instance, you may have to link from a product that is categorized under Home Theater, to another product made by the same brand, but categorized under Audio.

Because ecommerce websites are complex, they are most likely to have a hierarchy that frequently interlinks silos. In practice, it is complicated, and sometimes, not even advisable to prevent internal linking between silos.

Figure 13 – Cross-linking happens naturally

The internal linking architecture can be very cumbersome and difficult to control, even for ecommerce websites with just a few hundred products, as you can see in this graph:

Figure 14 – This node graph shows how complex internal linking can be.

In this example, a website with just a few thousand pages generated more than 250,000 internal links.

The siloing method

Conceptually, siloing is done by identifying the main themes of the website (for ecommerce, those will be departments, categories, and subcategories) and then interlinking only pages belonging to the same silo (for example, linking only within the same category).

The good part is that ecommerce websites are usually developed using a similar architecture, with separate hierarchies (themes) for each department or category.

The idea is that by siloing the website into themes, you will be able to rank high for semantically unrelated keywords with the same site, even though the themes are entirely different, e.g., “hard drives” and “red wines”.

You can achieve silos with directories or with internal linking.

Directories

Information architects create hierarchies using user research, user testing, keyword research, and by analyzing your web traffic. The labels used to describe these hierarchies will be present in the URL structure. Your silos will be the directories in the URLs.

Whenever possible, use a hierarchy created with directories.

Internal links

With internal linking, you create virtual silos, as pages in the same silo do not need to be placed in the same directory. You achieve virtual silos by controlling internal links in such a way that search engine bots will only find links to pages in the same silo. This is a very similar concept to bot herding or PageRank sculpting, with subtle differences in meaning and application.

Siloing with directories

Siloing with directories is the easiest to implement on new websites during the information architecture process. From a user experience perspective, creating the website hierarchy with directories is the best way to go.

But in the end, siloing with directories is nothing more than creating good vertical hierarchies, which the URLs reflect. Many online retailers create them naturally by branching out all categories, without overthinking SEO and without being obsessed with keywords in the anchor text or with internal linking patterns.

A sample silo with directories would look like:

mysite.com/category1/subcategory1/

mysite.com/category1/subcategory2/

mysite.com/category1/subcategory3/

….

mysite.com/category1/subcategoryN/

Does this type of siloing look familiar to you? It should if you use directories in your URLs. Moreover, this is nothing more than a proper hierarchy. So, if you design your website hierarchy correctly, you do not even need to worry about siloing with other methods.

Keep in mind that it is best practice to keep the directory depth low, ideally fewer than four or five levels.

Siloing with internal linking

Siloing with directories may not always be possible, for example, if you wish to change an existing hierarchy on an established website. In this case, you will create virtual silos using carefully controlled internal linking.

Usually, pages in a silo need to pass authority (PageRank) and relevance (anchor text) only to other pages within the same silo. This prevents the dilution of the silo’s theme and sends the maximum power to the main silo pages.
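To see how far an existing site is from this pattern, you can list the internal links that cross silo boundaries. The sketch below assumes two inputs you would have to supply yourself: a list of (source, target) link pairs and a mapping from each URL to its silo. The URLs, silo names and function name are hypothetical.

```python
def cross_silo_links(links, silo_of):
    """Return (source, target) pairs whose pages belong to different silos."""
    crossing = []
    for source, target in links:
        source_silo = silo_of.get(source)
        target_silo = silo_of.get(target)
        # Pages without a silo (for example the home page) are ignored.
        if source_silo and target_silo and source_silo != target_silo:
            crossing.append((source, target))
    return crossing


# Hypothetical virtual silos: membership is defined by the mapping, not by directories.
SILO_OF = {
    "/ssd-1tb.html": "hard drives",
    "/hdd-4tb.html": "hard drives",
    "/merlot-2019.html": "red wines",
}
LINKS = [
    ("/ssd-1tb.html", "/hdd-4tb.html"),
    ("/ssd-1tb.html", "/merlot-2019.html"),
    ("/", "/merlot-2019.html"),
]

print(cross_silo_links(LINKS, SILO_OF))
```

A report like this is a diagnostic, not a list of links to delete; some cross-silo links exist because they are useful to shoppers.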

Here are some rules for linking within and between silos. A page in a silo:

  • Should link to its parents.
  • Can link to siblings, if appropriate. Siblings are pages at the same level in the hierarchy that share the same parent.
  • Should not link to cousins.
  • Could, if needed, link to uncles or aunties. Uncles and aunties are siblings of the node’s parent.

Figure 15 – An over-simplified siloing diagram.

In this simplified siloing example, sibling number one could eventually link to uncles, who are siblings of that node’s parent. That means that if you have to link two related supporting pages found in separate silos (which are called cousins), you should link only to the silo’s uncles.

If you need to link to pages outside the silo, you can block those links from being accessible to search engines (e.g., using AJAX (Asynchronous JavaScript and XML), iframes, or JavaScript files blocked with robots.txt). Note that there is a fine line between white hat and gray hat SEO, and such linking may cross that line. This is because Google’s definition of manipulative techniques lies in the answer to the question: “Would you do it if search engines did not exist?”

The goal is not to take siloing to the extreme. If a page is relevant and you want to link to it, then do so even if it is in a different silo or theme.

Siloing with internal links is a powerful advanced SEO technique, especially for large websites with multiple departments, themes, or categories that are not semantically related, e.g., groceries and mobile phones. However, it is important to know that siloing is not easily achieved, and it pays to be aware of the existing dangers.

If you want to silo with internal linking, know that:

  • PageRank sculpting with rel="nofollow" is not recommended.
  • Virtual siloing means that you somehow have to “hide” internal links from search engine bots; doing so may fall outside search engines’ guidelines.
  • Hiding internal links from search engines using iframes, AJAX, JavaScript or other similar techniques can qualify as cloaking since you show different content to users than to search engines; this could result in penalties.
  • If you want to obfuscate links with AJAX or JavaScript for SEO reasons, first identify the percentage of users with JavaScript turned off. If that is a significant segment of your total visitors, make sure your website works correctly without JavaScript. Non-JavaScript users should be able to finish all micro and macro conversions on your site. An example of a micro-conversion is an “add to cart” event, while a macro-conversion is a completed order.
  • Trading away too much for SEO at the expense of usability and accessibility is not the right way to go.
  • Siloing may require hiding entire navigation elements, such as facets and filters, from search engines. There are risks associated with such bold tactics.

Figure 16 – The nofollow links are marked with the red-dotted border

In the image above, you can see how only the top-level categories (Women, Men, Baby, etc.) and the immediate next hierarchy level of subcategories (Clothing, Shoes, Accessories, etc.) pass authority through links. Category links are nofollow. This is a very bold (most likely bad) SEO approach to handling primary navigation menus.
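If you want to audit how a navigation menu treats its links, the standard library is enough to separate followed links from nofollow links. The sketch below is an assumption about how such a check could look; the class name and the sample HTML are invented, and it only reads rel attributes on anchor tags.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Collect link targets, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        bucket = self.nofollowed if "nofollow" in rel_tokens else self.followed
        bucket.append(href)


# Hypothetical navigation fragment.
NAV_HTML = """
<nav>
  <a href="/women/">Women</a>
  <a href="/women/clothing/">Clothing</a>
  <a href="/women/clothing/dresses/" rel="nofollow">Dresses</a>
</nav>
"""

audit = NofollowAudit()
audit.feed(NAV_HTML)
print(audit.followed)
print(audit.nofollowed)
```

Running this over the rendered source of a few templates gives a quick picture of which navigation levels pass link signals.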

Proper internal cross-linking is helpful and necessary for good rankings, and we will discuss this in detail in the Internal Linking section. Remember, however, that internal linking must be built for users first, and only then for search engines. You must link consistently, thematically and wisely (using synonyms, stems, plurals and singulars, and so on) to support rankings for categories and subcategories.

You should not remove navigation elements just for SEO purposes. Keep the links that are useful for users in the interface, and if you want to remove links for SEO reasons, do it by blocking those links with AJAX or JavaScript.

Another theming method is to evolve taxonomies into ontologies. Instead of linking based strictly on a vertical taxonomy, interlink items that are conceptually related. For example, you can interlink a particular fragrance with the sunglasses manufactured by the same brand. This type of interlinking requires defining semantic and conceptual relationships between categories and items and then deciding on the internal linking based on predefined business rules.

One such business rule is crowd-sourced recommendations (AKA Customers Who Bought This Item Also Bought…). Do users often buy certain products together? If yes, then cross-link those product detail pages, even if they are in different silos.

If this type of linkage generates too many internal links on some pages, you can always block the less important links (you will have to define how many links is too many for your particular situation). However, for the sake of users, interlink whenever necessary, without being concerned about siloing.

If the business rules are based on data, then you will not be linking from adult toys to children’s books. Also, you will not link to hundreds of related products, but just to a few highly related items.
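A data-based business rule of this kind can be sketched as a co-purchase count. The example below assumes order data is available as lists of product identifiers; the thresholds, product names and function name are hypothetical and would need tuning for a real catalog.

```python
from collections import Counter


def related_products(orders, product, min_orders=2, limit=3):
    """Products most often bought together with `product`."""
    together = Counter()
    for order in orders:
        items = set(order)
        if product in items:
            together.update(items - {product})
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(together.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, count in ranked if count >= min_orders][:limit]


# Hypothetical order history.
ORDERS = [
    ["fragrance", "sunglasses"],
    ["fragrance", "sunglasses", "wallet"],
    ["fragrance", "wallet"],
    ["fragrance", "umbrella"],
    ["wallet", "belt"],
]

print(related_products(ORDERS, "fragrance"))  # ['sunglasses', 'wallet']
```

The min_orders threshold filters out accidental pairings, and limit keeps the number of cross-links per page small.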

Here’s what Google has to say about the subject of theming an internal architecture, in a post on their official blog:

Q: Let’s say my website is about my favorite hobbies: biking and camping. Should I keep my internal linking architecture “themed” and not cross-link between the two?

A: We haven’t found a case where a webmaster would benefit by intentionally “theming” their link architecture for search engines. And, keep-in-mind, if a visitor to one part of your website cannot easily reach other parts of your site, that may be a problem for search engines as well.

This is a reminder not to take siloing to the extreme. However, siloing with directories is natural, and the resulting internal linking is also great for users and search engine bots.

I lean towards a hybrid siloing concept combining the following:

Generate content ideas

It is widely known that keyword research can help with generating content ideas. Keyword research also enables you to expand from a relatively narrow set of head keywords (category and subcategory keywords) to a large number of torso and long tail keywords. These long tail keywords can then be used to generate content ideas, identify product attributes, and improve product descriptions.

Based on the initial taxonomy created by the information architect, you can identify keyword patterns, tag user intent, group keywords according to buying stages, and find search volumes; I will cover these tactics in the Keyword Research section.

This type of research provides excellent insights that are usually overlooked by the other teams in an ecommerce business.

If you want to consistently publish content that your target market will find relevant, you will have to know not only the queries used by searchers but, more importantly, the type of content they are looking for. Are they looking for general information about your products? If so, you would do well to put more emphasis on review-type content and how-to articles. Are they searching for products to buy? If so, you could improve the content on a certain product detail page.
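A first pass at separating informational from buying queries can be done with simple modifier lists. This is only a sketch: the modifier lists and the function name are assumptions, and a real project would derive the patterns from its own keyword research.

```python
# Hypothetical modifier lists; a real project would derive them from its own
# keyword research rather than hard-code them.
INFORMATIONAL = ("how to", "what is", "review", "reviews", "vs", "guide")
TRANSACTIONAL = ("buy", "price", "cheap", "deal", "coupon")


def tag_intent(query):
    """Tag a query as transactional, informational, or unclassified."""
    # Pad with spaces so modifiers only match whole words or phrases.
    padded = " " + " ".join(query.lower().split()) + " "
    if any(f" {modifier} " in padded for modifier in TRANSACTIONAL):
        return "transactional"
    if any(f" {modifier} " in padded for modifier in INFORMATIONAL):
        return "informational"
    return "unclassified"


for query in ("buy canon rebel t5i", "Canon Rebel T5i review", "canvas bags"):
    print(query, "->", tag_intent(query))
```

Queries tagged informational point towards review and how-to content, while transactional ones point towards the product detail pages. The unclassified remainder still needs manual review.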

You can better address the needs of your target market once you understand the user intent behind their search queries. When your landing pages address those needs, conversion rates tend to improve, and rankings may improve as well (as an indirect quality signal).

Here are some interesting facts about search queries:

Figure 17 – The search demand curve, as explained by Moz. Notice how the long tail of keywords and the chunky middle make up more than 80% of the keywords.

Why did I mention these search query facts?

It is because the correct way to start keyword research and build a great website architecture is by recognizing that only a small fraction of your target market is ready to buy at any given moment. Many ecommerce websites mistakenly focus on targeting keywords such as department, category or subcategory names while completely ignoring a large number of informational (and even navigational) search queries. I will detail a keyword research process in the Keyword Research section of the book.

Let’s look again at a typical ecommerce website architecture:

Figure 18 – Under this sample hierarchy, product detail pages are not supported by any other content-heavy level, below the PDP level.

Including the home page, there are four levels in this example: The first level is the home page, which is supported by categories (second level), subcategories (third level) and product detail pages (4th level). The subcategory and product detail pages support the category level; product detail pages then support subcategory pages. However, the product pages are the “leaves” in this example – they are the last level of the hierarchy.

When an ecommerce website does not support important pages (i.e., categories or PDPs) with an additional content-heavy level in the hierarchy, it can miss a considerable amount of organic traffic coming from informational search queries. It will also miss out on the ability to create useful contextual links to product, subcategory and category pages.

In our example, you can overcome these challenges by creating a 5th level in the hierarchy. This level can be a blog, a learning center or a projects section on the website, to name just a few ideas. This content-rich section can also be outside of your existing hierarchy.

On the Victoria’s Secret website, I was not able to find a single reference to their blog. This is bad for them, but it is good news for the small guys competing in their niche.

Figure 19 – Only five pages on this website contained the word “blog”, and none of them were part of a real blog.

Here are two ideas for you:

  • Add a new layer of support for all pages on the website, especially for product and subcategory pages. As I mentioned, this layer can be a blog, a forum, expert Q&As, how-to guides, buying guides, white papers, workshops and so on. This layer will generate additional organic traffic and will provide support for contextual, internal linking. Additionally, it may help build a community around your brand, which is always a great thing to do.
  • Conduct keyword research with this new level in mind, which means you will not dismiss informational keywords. Categorize such keywords into the Informational bucket in your spreadsheets, and plan content based on them — more about this process in the Keyword Research section.
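The bucketing step above can be sketched in a few lines of Python. This is only a minimal illustration: the modifier lists, the bucket names and the sample keywords are assumptions I made up for the example, and real keyword data will need a richer rule set and manual review.

```python
from collections import defaultdict

# Hypothetical modifier lists; tune them to your own niche and keyword data.
INFORMATIONAL = ("how to", "what is", "ideas", "guide", "diy")
TRANSACTIONAL = ("buy", "price", "cheap", "sale", "coupon")


def tag_intent(keyword):
    """Return a coarse intent bucket for one keyword (whole-word matching)."""
    padded = f" {keyword.lower()} "
    if any(f" {m} " in padded for m in TRANSACTIONAL):
        return "Transactional"
    if any(f" {m} " in padded for m in INFORMATIONAL):
        return "Informational"
    return "Unclassified"


def bucket_keywords(keywords):
    """Group keywords by intent bucket, keeping the input order."""
    buckets = defaultdict(list)
    for kw in keywords:
        buckets[tag_intent(kw)].append(kw)
    return dict(buckets)


sample = [
    "how to tile a bathroom floor",
    "buy ceramic floor tile",
    "diy deck ideas",
    "floor tile",
]
buckets = bucket_keywords(sample)
```

Keywords that land in the Unclassified bucket are the ones worth reviewing by hand, since head terms such as "floor tile" often carry mixed intent.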

Let’s say that you sell home improvement items, and you want more people to come to your website and buy them. However, a lot of searchers in this niche are DIYers, and they use keywords specific to the awareness and research stages. Then why not create a series of DIY home improvement projects and publish them in a content-heavy section of the website?

Take a look at the following inspiring piece of content from Home Depot’s blog. Home Depot is not in the business of selling instructional DIY DVDs, but they attract their target market with highly related content. Home Depot has an entire DIY section on their website.

Figure 20 – Notice how this page supports category and product pages, by linking to them.

When you add a new content-rich layer in the hierarchy, you:

  • Expose your brand to your target market, in the early stages of the buying funnel.
  • Add a new way to generate more traffic.
  • Give visitors more reasons to buy from you.
  • Reinforce product and category pages with better internal linking.

Let’s see how SEO could help regarding information architecture.

Evaluate the information architect’s input

Planning an ecommerce architecture starts with information architects identifying the navigation labels such as departments, categories or subcategories.

In many cases, information architects do not associate this process with the keyword research process, which is good, because navigation has to serve the users, not the bots. However, you should evaluate the architect’s input, from a search engine perspective.

Here’s an example of how to do that using Google Trends. If the information architect wants to label one of the categories in the primary navigation as mp3 players, the following search trend comparison data might change his or her mind.

Figure 21 – The trend for “iPod” is downwards, but interest is still several times higher than for “mp3 player”.

Indeed, iPod can be a child of the mp3 player parent, but you should brainstorm with others in the team to decide whether making the iPod category easier to find would be more beneficial for users, which may mean displaying it directly in the primary navigation.

Many times, the search volume for a parent category is higher than the search volume for a child category, but as you can see in this example, this rule is not definitive.

Also, note that Google Trends displays normalized data on a scale of 0 to 100, where 100 marks the peak search interest among the compared terms for the selected region and time range. Google Trends does not present absolute search volumes.
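To see why normalized values cannot be read as absolute volumes, consider the following minimal sketch. It assumes simple peak-relative scaling, which is my simplification for illustration and not Google's actual computation, and all the numbers are made up.

```python
def normalize_to_peak(series_by_term):
    """Scale all values so the largest value across all terms becomes 100."""
    peak = max(v for values in series_by_term.values() for v in values)
    if peak == 0:
        return {term: [0 for _ in values] for term, values in series_by_term.items()}
    return {
        term: [round(100 * v / peak) for v in values]
        for term, values in series_by_term.items()
    }


# Made-up monthly volumes for two markets of very different size.
small_market = {"ipod": [400, 300, 200], "mp3 player": [100, 80, 60]}
large_market = {"ipod": [40000, 30000, 20000], "mp3 player": [10000, 8000, 6000]}

small_scores = normalize_to_peak(small_market)
large_scores = normalize_to_peak(large_market)
```

Both markets produce identical scores even though one is a hundred times larger, which is why trend data is useful for comparing labels but not for estimating traffic.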

All ecommerce websites will have primary navigation (aka global or main navigation), secondary navigation (aka local navigation) and some contextual navigation. Another form of navigation specific to ecommerce websites is faceted navigation.

Primary and secondary navigation

Primary navigation is for the content that most users are interested in, but keep in mind that importance is relative (something important for your business may not be as important for another business). In general, on e-commerce websites, primary navigation displays departments, categories or market segments (e.g., men, women, kids).

Primary navigation is the easiest type for most users to identify. It allows direct access to the website’s hierarchy and is present on almost every page of a website.

Figure 22 – A sample primary navigation on Kohl’s website.

On a side note, it will be difficult for Kohl’s to rank for top-level category keywords (e.g., Home, Bed & Bath, Furniture, Outerwear, etc.) since they will have to compete with niche-specific websites that are laser-focused on a single segment—for example, a company that sells just furniture. It is not impossible for Kohl’s to achieve good rankings, but it will require significant work including onsite SEO and quality backlink development.

Regarding secondary navigation, even information architecture experts like Steve Krug, Jesse James Garrett, and Jakob Nielsen cannot agree on a single definition.

Secondary navigation stands for content that is of secondary interest and importance to users. Again, importance is relative to each business.

Strongly connected with navigational links, there is an SEO best practice that recommends keeping the number of links on a page under 100. However, this is an obsolete rule; you can list more than 100 links on your pages, depending on the authority of your website.

You will see high authority websites like Walmart listing hundreds of internal and external links:

Figure 23 – There are 633 links on this page. This may be too many unless you have an excellent site authority.

Walmart’s large number of links results from the use of the so-called fly-out mega menus in the primary navigation, for usability reasons. This type of menu makes deeper sections of the website easily accessible to users.

Mega menus allow direct linking to subcategories and even to products, but you must be careful to keep the number of links to a reasonable limit. Since the primary navigation is present on most of the pages on a website, it has a pretty significant influence on how authority moves back and forth between pages.

Consolidating a long list of departments into a single place has to do with design considerations (limited screen real estate) and user experience (too many options to skim at once). However, it also affects the PageRank passed to the other pages.

Figure 24 – Design limitations forced Walmart to reduce the number of links in the navigation. Notice the “See All Departments” link at the bottom of the primary navigation.

However, Walmart has a separate page for the complete list of their departments (e.g., health) and categories (e.g., vitamins):

Figure 25 – The “All Departments” consolidation is a clever idea because this page will act as a sitemap for both people and search engine bots.

SEO can help information architects decide which categories are the most important for users and therefore should be listed in the primary navigation. Use web analytics tools to identify metrics such as the most searched terms on the website, the most viewed pages, and the highest search volume from pay-per-click campaigns.

Figure 26 – The keyword with the highest number of internal site searches could eventually be placed in the navigation, if it makes sense, or near the search field.

Contextual navigation

Contextual navigation refers to the navigation present in the main section of web pages. It excludes boilerplate navigation items such as those found in headers, sidebars or footers.

Some examples of contextual navigation on ecommerce websites include sections such as:

Figure 27 – Customers who viewed this item also viewed

Figure 28 – Best Sellers

Figure 29 – Contextual text links in the main content (MC) areas.

Figure 30 – Links in Recommended Products carousels.

You will need to discuss contextual navigation with the information architect to identify relevant relationships between categories, subcategories, and products and to plan the internal linking accordingly.

Prioritization

SEO can help with the prioritization of labels in the navigation.

It is helpful to know how many pages will be linked from structural sections of the website (primary, secondary and footer links) on each page template. This is important to estimate because you need to determine how many links you can display in the contextual navigation (only if you need to limit the number of links on pages).

This is not a definitive rule, but if you start a new website, it is a good idea to keep the number of links on each page to a maximum of 200. This is because, in the beginning, you will have only a little authority to pass along to lower levels.

Here are some prioritization guidelines:

  • Keep the number of top-level categories or departments in the primary navigation low, to avoid the paradox of choice. Research suggests that having too many options can hurt decision making.
  • The short-term memory “rule of seven items” does not apply to primary navigation, as users do not need to remember the labels.
  • You can list more categories on a “view-all departments” or “view-all categories” page.

Figure 31 – In a horizontal design, the primary navigation is constrained by design space.

As you can see in the examples above, horizontal primary navigation is constrained by design space. Notice how short the category names must be. Macy’s displays eleven labels in the primary navigation, the same as BackCountry, while Office Depot lists only nine.

  • Vertical primary navigation placement allows for more categories to be listed:

Figure 32 – Costco displays 18 categories in the menu (the same as Sears) while Walmart displays only 13.

Specialty retailers will probably have fewer than two or three departments (sometimes none), in which case they may list categories in the menu instead of departments. General department stores can have up to 20 departments.

  • You can break each category level into 20 to 40 subcategories, depending on how extensive your inventory is.
  • If a parent category needs more than 40 subcategories, consider adding a new parent category or implementing faceted subcategories.
  • Ideally, the hierarchy depth to reach a product detail page should be no more than four levels:
    • Two levels deep: home, category, product detail page (this is suitable for niche retailers).
    • Three levels deep: home, category, subcategory, product detail page (this is the most common setup for medium-sized ecommerce websites).
    • Four levels deep: home, department, category, subcategory, product detail page OR home, category, subcategory, sub-subcategory, product detail page. This setup is specific to marketplaces, large department stores or websites with extensive inventories.
  • If the hierarchy has more than four or five levels, use faceted navigation to allow filtering by product attributes.
  • To improve the authority (PageRank) and the relevance (anchor text) of product detail pages, add a content layer (e.g., blog, community forums, user reviews and so on) in the hierarchy just below the product detail page level and link to relevant items from there.
  • Ordering categories (or items) alphabetically is not always the best option. You should prioritize based on popularity and logic whenever possible, and complement this with alpha navigation only if user testing proves that this type of navigation is indeed useful.

Figure 33 – An older version of primary navigation on OfficeMax, featuring alpha navigation.

Figure 34 – Newer screenshot after OfficeMax tested the alpha navigation and reverted to category name navigation.

  • If a category has too few items, consider moving them to an existing category with more items but do this only if the new categorization makes sense for users.
  • If a category has too many items (e.g., thousands), it may generate information overload. In this case, you can break the category into smaller subcategories. Additionally, create a user experience that allows better scope selection before displaying a list of items.
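The hierarchy depth guideline from the list above can be checked against crawl data. Here is a minimal sketch that measures how many clicks each page is from the home page using a breadth-first search. The link graph, the URLs and the `max_depth` threshold of four are hypothetical; in practice you would build the graph from a crawler export.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/toys/", "/gifts/"],
    "/toys/": ["/toys/stuffed-animals/"],
    "/toys/stuffed-animals/": ["/toys/stuffed-animals/sesame-street/"],
    "/toys/stuffed-animals/sesame-street/": ["/toys/stuffed-animals/sesame-street/elmo/"],
    "/toys/stuffed-animals/sesame-street/elmo/": ["/p/elmo-plush-123"],
    "/gifts/": [],
}

max_depth = 4  # assumed threshold, taken from the guideline above
depths = click_depths(links)
too_deep = sorted(page for page, depth in depths.items() if depth > max_depth)
```

Pages that end up in `too_deep` are candidates for a flatter hierarchy or for faceted navigation, and pages missing from `depths` altogether are not reachable from the home page through the links you supplied.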

Keyword variations

Planning a categorized product hierarchy is not easy. At the top category level, the labels in the primary navigation must be intuitive, must have the appropriate search volumes and must be concise enough to support menu-based navigation. It is worth repeating that determining the hierarchy of an ecommerce website based solely on keyword research is neither ideal nor recommended. However, keyword research should complement and support information architecture.

One common question regarding keywords is how to handle misspellings, synonyms, stemming or keyword variations for a category. Where do you place them in the website’s information architecture?

For your internal site search, this should be easy to handle: you must associate each keyword variation, misspelling, etc. with an existing product or category and redirect users to the respective canonical product or category page. If there is no exact match between the keyword variation, misspelling or synonym and a category on your site, then send users to an internal site search result page.

For example, when someone searches for “tees”, “tee shirts” or “tshirts”, you return results for “t-shirts”. If there is an exact match between the search query and the category name you can also redirect the searcher to the t-shirts category listing page.
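The rules described above could be sketched as follows. The synonym table, the category URL and the `/search?q=` path are all hypothetical and exist only for illustration; your site search platform will have its own way of configuring synonyms and redirects.

```python
from urllib.parse import quote_plus

# Hypothetical synonym table: keyword variation -> canonical term.
SYNONYMS = {"tees": "t-shirts", "tee shirts": "t-shirts", "tshirts": "t-shirts"}

# Hypothetical category listing pages, keyed by canonical term.
CATEGORY_URLS = {"t-shirts": "/women/t-shirts/"}


def resolve_search(query):
    """Return the URL an internal site search should send the user to."""
    normalized = " ".join(query.lower().split())
    canonical = SYNONYMS.get(normalized, normalized)
    if canonical in CATEGORY_URLS:
        # Exact match with a category name: go to the category listing page.
        return CATEGORY_URLS[canonical]
    # No matching category: fall back to a regular search results page.
    return "/search?q=" + quote_plus(canonical)


variation_url = resolve_search("Tee  Shirts")
exact_url = resolve_search("t-shirts")
fallback_url = resolve_search("red scarf")
```

Keeping the variations in one table means the same list can later feed category descriptions or a related searches section.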

Figure 35 – Make sure that your internal site search works appropriately and does not return wrong products, as in this example (a search for “t-shirt” returned bras).

In this screenshot, I wanted to highlight the improper handling of internal site search results: returning bras when someone searches for “t-shirts”.

Handling keyword variations for external search engines is a bit more complicated. Commercial search engines like Google and Yahoo! need to understand and connect keyword variations with the right content on your website.

In the past, you would’ve created individual pages to target keyword variations (or a group of keyword variations). However, Google shifted to ranking topics instead of individual keywords. Therefore, it is important that your pages include the searched keyword and semantically-related words (e.g., synonyms, plurals).

Just make sure you are not overdoing it; including all 20 possible variations of a keyword on a single page is spammy.

Here are some ideas for you to consider:

Target the most common variations in the title, the description, or both.

Figure 36 – Gap targets keyword variations in the description, while Sears uses the title tag.

Use product and category descriptions

One option is to use category or product description sections to add keyword variations in the copy. The bottom of the image below highlights how this website uses two keyword variations for “t-shirts”.

Figure 37 – This retailer uses the words “tees” and “t shirts” in the category description copy, to capture traffic for those keyword variations.

Take advantage of related searches

This approach requires displaying a related searches section on your pages. This section may contain several of the most used keyword variations:

Figure 38 – Keep in mind that Related Searches sections should be useful for users, in the first place, and only then for search engine bots.

Identify possible information architecture problems

You can perform the site: query on Google, for example, “category_name site:mysite.com” (without quotes) to see whether search engines list the right page at the top. You can also use products and subcategories in the site: query. For example, you can search for:

site:www.costco.ca/ gourmet products

site:www.costco.ca/ “gourmet products”

If the page you optimized does not show up at the top of the results, various reasons are possible, such as:

  • Improper internal linking. This happens when the internal linking architecture does not support the correct page.
  • Thin content, no content or inaccessible content (e.g., JavaScript reviews) on the right page.
  • External links point to the wrong page(s), diluting and reducing the relevance of the correct pages. If people are linking to the wrong pages, you must ask yourself why. Maybe those other pages are more relevant to them?
  • Page-specific penalties.

Of course, an in-depth analysis is required to identify the cause of these issues. When attempting to determine the cause of such problems, it is important to understand how the targeted page (the page you want to rank at the top of the SERPs) is linked internally from other pages on your website, and from external sites.

One of the tools for analyzing this is Google Search Console:

Figure 39 – The Internal Links report will display the most important internal links, but only for the most important pages on the website.

This report is basic, but it can provide some immediate insights. Look for signals such as:

  • Are there more internal links to the wrong page(s) than to the desired page?
  • Is the desired page linked from the parent pages (pages higher in the hierarchy)?
  • Is the desired page linked from pages with high authority?
  • Is the desired page linked with the proper anchor text?

If there are issues like these, it is time to restructure your internal linking. Keep in mind that Google will not allow you to download the complete list of links, only the top ones.

Another useful method to assess the internal linking is to run a crawl on your website using tools like Xenu Link Sleuth or Screaming Frog and export the results to Excel.
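If you prefer a script to a spreadsheet, the same analysis can be sketched in Python. The column names `Source`, `Destination` and `Anchor` and the sample rows are assumptions; check the header row of your own crawler export and adjust the names accordingly.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_inlinks(csv_text):
    """Count internal links and anchor texts per destination URL."""
    counts = Counter()
    anchors = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        destination = row["Destination"]
        counts[destination] += 1
        anchors[destination][row["Anchor"].strip().lower()] += 1
    return counts, anchors


# Made-up export rows; a real export would be read from a file.
export = """Source,Destination,Anchor
/,/gourmet-products/,Gourmet Products
/grocery/,/gourmet-products/,gourmet products
/,/search?q=gourmet,Gourmet
/blog/gift-ideas/,/gourmet-products/,click here
"""

counts, anchors = summarize_inlinks(export)
desired = "/gourmet-products/"
wrong = "/search?q=gourmet"
desired_has_more_links = counts[desired] > counts[wrong]
```

The two dictionaries answer the first and last questions from the checklist above: which page receives more internal links, and with which anchor text.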

It is also a good idea to run the most important terms through your internal site search to check whether the URL it returns matches the URL returned by search engines.

For instance, let’s say that Google returns the Gourmet Products category URL in the first position when you search for “site:costco.ca gourmet products”. If you were to click on the result, the Gourmet Products page opens:

Figure 40 – Costco’s organic search landing page, pointing visitors to the right category page.

However, Costco’s internal site search returns a different page, which is a search results page. This is not the best approach from a usability point of view or for search engines, because Google does not want to list other search results pages in its SERPs.

Figure 41 – In Costco’s case, this mismatch may happen because of the setup of the internal site search rules.

In many cases, when there is an exact match between a user’s query and a category name, it is preferable to redirect the user to the listing page instead of to a search results page.

Labeling

Labeling, which refers to choosing the names of the links in the navigation, is an area where information architecture and search engine optimization overlap. SEOs and information architects must understand the user’s mental model to label the navigation correctly. Labeling is not easy and presents a real challenge for very large ecommerce websites. Research from eBay shows how complicated it can get.

While most ecommerce taxonomies can be architected based on a predefined vocabulary, SEO can assist in the labeling process.

Let’s say you sell toys. Start by searching for the category name (“toys”) using Google’s Keyword Planner:

Figure 42 – Do not forget to set up the targeting options based on your target market.

Download the list generated by Keyword Planner and open it with Excel. Then, categorize keywords into “buckets” by mapping each keyword to either its category, attribute or filter name:

Figure 43 – Categorize keywords into “buckets”.

Insert a pivot table that counts the occurrences of Category:

Figure 44 – Sort by Count of Category.

If you sort by Count of Category, you can get an idea of what needs to be present in the navigation. You can also identify filter values that can be used in the faceted navigation.

Some navigation labels will be easy to identify after tagging fewer than a hundred keywords. For instance, in our example, it seems clear that brand should be a primary or secondary navigation label and users should be able to navigate and filter items by brands. Other possible candidates in this example are age, theme, and character.
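The pivot table step can also be reproduced outside Excel. The sketch below counts how often each bucket occurs in a tagged keyword list. The keywords and bucket names are made-up examples, not data from Keyword Planner.

```python
from collections import Counter

# Hypothetical tagged rows: (keyword, bucket), as you would tag them in a spreadsheet.
tagged = [
    ("lego toys", "brand"),
    ("fisher price toys", "brand"),
    ("toys for 2 year olds", "age"),
    ("frozen toys", "character"),
    ("melissa and doug toys", "brand"),
    ("toys for 5 year olds", "age"),
]

# Equivalent of a pivot table on "Count of Category", sorted in descending order.
bucket_counts = Counter(bucket for _, bucket in tagged)
ranked = bucket_counts.most_common()
```

The buckets at the top of `ranked` are the first candidates to discuss as navigation labels or facets.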

Take the findings from this type of research and discuss them with the information architect.

Another thing you should do with the keyword list generated by Keyword Planner is to get the individual word frequency, using tools such as wordle.net:

Figure 45 – Words sorted by frequency.

Visually, this is what the word frequency looks like for our previous example:

Figure 46 – The “word cloud” for a list of keywords.

The image above is what we call the “word cloud,” and in our example I excluded the words “toys” and “toy”, to make the other words stand out.
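If you do not want to depend on an online tool, a word frequency count is easy to compute yourself. This is a minimal sketch with a made-up keyword list; the `exclude` parameter mirrors the exclusion of “toys” and “toy” described above.

```python
import re
from collections import Counter


def word_frequency(keywords, exclude=()):
    """Count individual words across a keyword list, skipping excluded words."""
    excluded = {word.lower() for word in exclude}
    words = []
    for keyword in keywords:
        words.extend(re.findall(r"[a-z0-9]+", keyword.lower()))
    return Counter(word for word in words if word not in excluded)


# Made-up keyword list for illustration.
keywords = ["kids toys", "toys for kids", "wooden toys", "toy cars for kids"]
freq = word_frequency(keywords, exclude=("toys", "toy"))
top_words = freq.most_common(2)
```

In a real analysis you would probably also exclude stop words such as “for”, which carry no labeling value.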

The frequency of the word “kids” is particularly interesting. If you sell toys only for kids (and not for any other age group, e.g., adults), then you should probably exclude the word “kids” from your analysis.

If you are in this niche, you may notice that a few essential segments/labels are missing from this keyword list:

  • One is the gender label (girls and boys).

  • Is your target market price sensitive? Then pricing might be another segmentation/label (shop by price).

Insights like the ones above cannot be discovered using keyword tools. So, how do you identify these “hidden” labels? By conducting user research, user testing, creating consumer personas and scenarios, user flows, website maps and wireframes.

Keep in mind that from an information architecture perspective, labeling does not stop with the text used for links and navigation. There are different types of labels as well, such as:

Document labels

  • URLs (whenever possible, URLs should contain keywords that make sense to searchers and to search engines).
  • File names (having relevant keywords in filenames is important for SEO and users).
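One way to keep document labels consistent is to derive URL slugs and file names from the navigation labels with a single function. The sketch below is one possible convention (lowercase, hyphen-separated, ASCII only), and it is an assumption on my part, not a rule; your CMS may already provide its own slug logic.

```python
import re
import unicodedata


def slugify(label):
    """Turn a navigation label into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    # Remove apostrophes so "men's" becomes "mens" rather than "men-s".
    ascii_text = ascii_text.replace("'", "")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


contact_slug = slugify("Contact Us")
category_slug = slugify("Men’s T-Shirts & Tees")
department_slug = slugify("Bed & Bath")
```

Whatever convention you choose, apply it everywhere, because mixed casing or separators tend to create duplicate URLs.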

Content labels

  • Page titles should make sense to searchers and search engines. When there is a partial match between the keywords in the HTML title element and the search query, search engines will emphasize (bold) the matched keyword(s), which may help with SERP click-through rates (CTR).
  • Headings and sub-headings. Headings use large fonts and attract the eyes almost immediately. Putting keywords in headings assures users that they are in the right place and helps with dwell time and bounce rates.

Other types of navigation labels

  • Breadcrumbs. Keep in mind that since search engines became so popular, home pages are not the only entry points to websites. Therefore, use breadcrumbs to easily and quickly communicate the hierarchy of your site to searchers.
  • Contextual text links. Using keyword-rich anchor text placed in a sentence or paragraph is one of the best ways to interlink pages, either vertically or horizontally.
  • Footers are also a type of navigational label. A quick note on this type of navigation: this is probably the place people spam the most, by creating tens of keyword-rich internal links.

Figure 47 – The screenshot depicts a footer that makes this website a good candidate for an over-optimization filter.

This footer is mainly boilerplate text, meaning that search engines will most likely ignore it when assessing this page’s content and the anchor text relevance.

It does not help to repeat “men’s {category name}” across a million pages since search engines can exclude boilerplate text pretty well when computing relevance.

Figure 48 – An excerpt from Google’s webmaster guidelines regarding boilerplate repetition.

It is funny how SEOs refer to the concepts discussed in this section of the book as on-page SEO factors, while information architects refer to the same things as labels. It seems that SEOs and information architects work with similar and related concepts, but they still cannot easily come to agreements when it comes to optimizing websites for both searchers and search engines.

Poly-hierarchies

SEO can help information architects with canonicalizing poly-hierarchies.

Very often, multiple suitable hierarchies could be appropriate for a given item. It is important to help the information architect choose the best fit as the canonical hierarchy and to stick to it. From the primary or secondary navigation, you should link only to the canonical hierarchies.

Ideally, all links on the website should point to only one canonical hierarchy.

You can keep as many logical hierarchies as are helpful to users, but to avoid confusing search engines, link to the canonical hierarchy as well.

For example, the Elmo category can be found under:

Toys > Stuffed Animals > Elmo (URL: mysite.com/toys/stuffed-animals/elmo/)

Gifts > Holidays > Christmas > Elmo (URL: mysite.com/gifts/holidays/christmas/elmo/)

If you decide that the first hierarchy is the canonical one (usually canonical hierarchies are the shortest), then whenever you link internally to the Elmo category, use the URL mysite.com/toys/stuffed-animals/elmo/
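To enforce this rule, you could keep a small lookup of alternate paths and their canonical equivalents, and use it to flag internal links that point to the wrong hierarchy. The mapping and the helper names below are hypothetical and only illustrate the idea.

```python
# Hypothetical map: alternate hierarchy path -> canonical hierarchy path.
CANONICAL = {
    "/gifts/holidays/christmas/elmo/": "/toys/stuffed-animals/elmo/",
}


def canonical_path(path):
    """Return the canonical path for a URL path (unchanged if already canonical)."""
    return CANONICAL.get(path, path)


def non_canonical_links(links_on_page):
    """List (found, should_be) pairs for links that bypass the canonical hierarchy."""
    return [(href, CANONICAL[href]) for href in links_on_page if href in CANONICAL]


page_links = ["/toys/stuffed-animals/elmo/", "/gifts/holidays/christmas/elmo/"]
to_fix = non_canonical_links(page_links)
```

Running a check like this on a crawl export gives you a list of internal links to update.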

You can use your web analytics tool to see how most users reached a given page. For example, look at the Navigation Summary report generated using Google Analytics (under Behavior –> All Pages), and see how most people reached the Elmo page:

Figure 49 – To get this report, follow the steps illustrated in this screenshot. If you want a more detailed analysis, use the Visitors Flow report under the Audience tab.

Additionally, you can look at the Refined Keywords dimension in the Behavior –> Search Terms section to understand what keyword refinements were made after a search for “Elmo”. The Refined Keyword report can also be a source of keyword variations, as you can see in the following screenshot:

Figure 50 – The Refined Keyword report can be a source for keyword variations.

Remember that there is no right or wrong way to classify a product into a taxonomy, as long as you refine the taxonomy over time when needed. However, once you have decided on a canonical hierarchy, it is a good idea to set it in stone.

Here are some other SEO tips for ecommerce information architecture:

  • If you use Google Analytics (or any other web analysis tool), activate the Site Search Tracking option. Analyze what users search for and use that information to decide on the website’s hierarchy. However, do not rely solely on your web analytics data, because you will miss a lot of data that is sourced outside your site.
  • Use keyword research tools to identify keyword variations and suggestions for the terms you have in mind or for those generated with user research and card sorting.
    • Google Keyword Tool
    • Search Term/Query Reports
    • Wordstream
    • Ubersuggest
    • Keywordtool.io
    • Google Suggest
    • Google Correlate
    • SEMRush
    • SpyFu
  • Analyze your competitors’ website architecture and navigation, but do not copy blindly. Use their information for inspiration, but create your own site architecture in the end.
  • Use a crawler on your competitors’ websites and sort their URLs alphabetically. You may need to crawl a large number of URLs (e.g., 250k+ URLs) for this to work.
  • Find your competitors’ sitemaps (both the HTML sitemap and the XML Sitemap) and analyze them in Excel.

Figure 51 – Sorting URLs alphabetically can reveal the website structure.
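Beyond sorting, you can summarize a crawled URL list by its top-level directory to get a quick outline of the site sections. This is a minimal sketch; the `example.com` URLs are placeholders for a real crawl or sitemap export.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_sections(urls):
    """Count URLs per first path segment to outline the site structure."""
    sections = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        sections[segments[0] if segments else "(home)"] += 1
    return sections


# Placeholder URLs standing in for a crawl export.
urls = [
    "https://example.com/toys/stuffed-animals/elmo/",
    "https://example.com/",
    "https://example.com/toys/lego/",
    "https://example.com/gifts/holidays/",
]

sorted_urls = sorted(urls)
sections = top_level_sections(urls)
```

Keep in mind that this only works when the URLs reflect the hierarchy; sites with flat or parameter-based URLs will not reveal their structure this way.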

  • Download the DMOZ taxonomy and look at the shopping categorization.
  • When choosing category names, use Google Trends to check whether there is a steep drop in what people search online, over time.

Figure 52 – Notice how the interest in “digital cameras” trends downwards. Maybe this has to do with mobile phones that yield increasingly better pictures.

  • Do not create the website hierarchy solely on keyword research data; validate with card sorting and user interviews. Nowadays, you can quickly do that online.
  • Perform simple navigational queries, like “contact {your_brand}” and make sure the contact URLs, and all other important URLs, are user-friendly.

Figure 53 – This is a not-so-friendly “contact us” URL.

Remember, labeling applies to URLs too, not only to links. In this example, the URL is not optimized for users (nor for search engines). This may be a limitation of the CMS, in which case it may be time to ditch the old CMS for a new one.

A friendly URL will read www.jcpenney.com/contact-us OR jcpenney.com/contact-us

  • If you need to categorize large volumes of items, you can use the power of folksonomy, a term for classification done collaboratively by many people, which is a form of what we commonly call crowdsourcing. Services such as Amazon Mechanical Turk (MTurk) allow you to categorize products quickly, and even to create relationships between products, using real people. However, you need to be careful about how you select participants and what instructions you give them.
  • When card sorting tests are in progress, it is more important to listen and observe than to put words in your users’ mouths.
  • When you remove/update categories from your website (at all levels), make sure that the URLs belonging to the updated categories redirect to the most appropriate working page.
  • When you develop or update the website, create a checklist of SEO requirements for the information architect (e.g., directory and file name conventions, canonicalization rules, lower casing all URLs, data quality rules for data input teams, seasonality and expired content handling, parameter handling, and so on). I will not provide an extensive checklist here, because people tend to limit themselves to the pointers in the list while missing others. After reading this book, you should be able to come up with your own list.
  • Send email alerts to the search engine optimizer when someone removes or updates categories, subcategories or products so that he or she can check the header responses for the new and old URLs. This task can be easily automated.
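Part of that automation can run before anything goes live. The sketch below validates a redirect map offline: it follows each old URL to its final target and flags chains, loops and targets that are not live pages. The redirect map, the set of live URLs and the status names are hypothetical; checking the actual header responses would additionally require HTTP requests, which are left out here.

```python
def resolve_redirects(redirects, live_urls):
    """Follow each redirect to its final target and flag loops, chains, dead ends."""
    report = {}
    for start in redirects:
        seen = [start]
        current = redirects[start]
        while current in redirects and current not in seen:
            seen.append(current)
            current = redirects[current]
        if current in seen:
            report[start] = ("loop", current)
        elif current not in live_urls:
            report[start] = ("dead end", current)
        elif len(seen) > 1:
            report[start] = ("chain", current)
        else:
            report[start] = ("ok", current)
    return report


# Hypothetical redirect map (old URL -> new URL) and set of live pages.
redirects = {
    "/mp3-players/": "/audio/mp3-players/",
    "/audio/mp3-players/": "/audio/portable/",
    "/old-a/": "/old-b/",
    "/old-b/": "/old-a/",
    "/cameras/": "/photo/",
}
live_urls = {"/audio/portable/"}
report = resolve_redirects(redirects, live_urls)
```

Chains are worth collapsing into a single hop, while loops and dead ends should block the release until someone assigns a working target page.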

Technical architecture

At the beginning of this section, I mentioned that site architecture (SA) is made of information architecture (IA) and technical architecture (TA). We then looked at several information architecture topics. Now it is time to discuss technical architecture.

While duplicate content and crawlability issues are well-known SEO headaches, many search engine optimizers categorize them under the information architecture umbrella. However, they are in fact technical issues. Most of the SEO tips you will learn in the next chapters address technical architecture issues.