Ecommerce SEO

CHAPTER 2

Website Architecture

Length: 10,291 words

Estimated reading time: 1 hour, 10 minutes

This e-commerce SEO guide has almost 400 pages of advanced, actionable insights into on-page SEO for ecommerce. This is the 2nd out of 8 chapters.

Written by an e-commerce SEO consultant with over 25 years of research and practical experience, this comprehensive SEO resource will teach you how to identify and address all SEO issues specific to e-commerce websites in one place.

The strategies and tactics described in this guide have been successfully implemented on top 10 online retailers, small & medium businesses, and mom-and-pop stores.

Please share and link to this guide if you like it.

This chapter will explore the concepts behind building optimized ecommerce website architectures.

A great site architecture means making products and categories findable on your website so users and search engines can reach them as efficiently as possible.

There are two concepts you should be aware of regarding site architecture:

  • Efficient crawling and indexing. This refers to the technical architecture or TA.
  • Classifying, labeling, and organizing content. This refers to information architecture or IA.

Together, information and technical architecture form the site architecture (SA). Understanding these two concepts will help you build websites that are both search-engine-friendly and user-friendly.

It is important to differentiate between information and technical architecture:

  • Information architecture is the process of classifying and organizing content on a website while providing user-friendly access to that content via navigation. This process is done (or should be done) by information architects.
  • Technical architecture is designing a site’s technical and functional aspects. Web developers mostly do this.

Keep in mind that SEO involves both information and technical architecture knowledge.

Information Architecture

The Information Architecture Institute’s definition of IA is:

  • The structural design of shared information environments.
  • The art and science of organizing and labeling websites, intranets, online communities, and software to support usability and findability.
  • An emerging community of practice focused on bringing principles of design and architecture to the digital landscape.

This definition shows that information architecture goes beyond websites and hints at its complexity. It also reveals how flexible and theoretical information architecture is.

From an ecommerce standpoint, let’s oversimplify the definition of information architecture to this single sentence:

The classification and organization of content and online inventory.

You should be familiar with two other important information architecture concepts: taxonomy and ontology. While these names might be intimidating, the concepts are easy to understand.

Taxonomy is the classification of topics into a hierarchical structure. For ecommerce, this translates into assigning items to one or more categories. Ecommerce taxonomies are usually vertical, “tree-like” structures. A website’s taxonomy is often referred to as its hierarchy. To visualize a taxonomy, think of breadcrumbs.

Notice how the breadcrumbs above mimic the website taxonomy. In our examples, one branch of the taxonomy “tree” leads to Duvet & Comforter Covers and the other to Aloe Vera Gels.

The structures depicted in these two screencaps are ordered using a parent-child relationship, from broader to narrower topics, and they are called taxonomies. One way to create ecommerce taxonomies is to use a controlled vocabulary, a restricted list of terms, names, labels, and categories. Usually, it is the information architects who develop these vocabularies.

In terms of SEO, you should use semantic markup to help search engines understand taxonomies. One such markup can be applied to your site breadcrumbs.

Search engines use Microdata or RDFa markup to generate breadcrumb-rich snippets similar to this one:

Search engines can sometimes display the website taxonomy directly in the search engine results pages (aka SERPs).

We will discuss breadcrumbs in detail later in this book, but briefly, this is what the source code for the previous rich snippet example looks like:

Figure 3 – The highlighted text shows the Breadcrumb vocabulary markup.
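
As a rough illustration of this kind of markup, the following Python sketch generates Schema.org BreadcrumbList Microdata for a breadcrumb trail. The function name, the Bedding level, and the mysite.com URLs are assumptions made for the example; they are not taken from the figure.

```python
from html import escape


def breadcrumb_microdata(trail):
    """Render (name, url) pairs as Schema.org BreadcrumbList Microdata."""
    items = []
    for position, (name, url) in enumerate(trail, start=1):
        items.append(
            '  <li itemprop="itemListElement" itemscope '
            'itemtype="https://schema.org/ListItem">\n'
            f'    <a itemprop="item" href="{escape(url, quote=True)}">'
            f'<span itemprop="name">{escape(name)}</span></a>\n'
            f'    <meta itemprop="position" content="{position}" />\n'
            "  </li>"
        )
    return (
        '<ol itemscope itemtype="https://schema.org/BreadcrumbList">\n'
        + "\n".join(items)
        + "\n</ol>"
    )


# Hypothetical trail: the URLs and the "Bedding" level are placeholders.
trail = [
    ("Home", "https://mysite.com/"),
    ("Bedding", "https://mysite.com/bedding/"),
    ("Duvet & Comforter Covers", "https://mysite.com/bedding/duvet-covers/"),
]
markup = breadcrumb_microdata(trail)
print(markup)
```

Each level of the taxonomy becomes one ListItem with an explicit position, which is how the markup communicates the parent-child order to search engines.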

The second information architecture concept you need to be aware of is ontology. It means the relationships between taxonomies.

If an ecommerce hierarchy can be visualized as an inverted tree, with the home page at the top, then an ontology is the forest showing relationships between trees. An ontology might encompass various taxonomies, with each taxonomy organizing topics into a particular hierarchy.

An ontology is a more complex taxonomy containing richer information about a website’s content and items. We are just beginning to build ontology-driven sites, and one standard ontology vocabulary for ecommerce is GoodRelations.

The Semantic Web aims to help artificial intelligence agents such as search engine bots crawl through and categorize information more efficiently. It is also designed to assist in identifying relationships between items and categories (e.g., relationships between manufacturers, dealers, and prices).

Figure 4 – Related Categories or Related Products can be considered a form of ontology.

If you are not an information architect or a business analyst, you probably will not be involved in identifying related categories and products, but it is important to know these terms for your discussions with information architects.

Sometimes, related categories and products are automatically identified by the ecommerce platform or by specialized software.

Why is information architecture important for search engines?

A correctly designed information architecture will result in a tiered website architecture. A good architecture has an internal linking structure that will allow child pages (pages that can link upwards in the hierarchy, such as product detail pages or blog posts) to support the more important parent pages (upper-level pages that link down in the vertical hierarchy, such as category and subcategory pages).

Figure 5 – Pages that link to each other at the same hierarchy level are called siblings. They share the same parent.

With correct internal linking, a blog article, for example, “Top 5 New Features of Canon Rebel T5i DSLR,” will support the product detail page Canon Rebel T5i DSLR. Canon Rebel T5i DSLR will support the Digital Cameras category, further supporting the top-level category: Electronics.

Figure 6 – This pyramid-like structure is a very common architecture for ecommerce.

One of the questions that often comes up when deciding on the hierarchy is, “What is the best number of levels to reach a product detail page?”

The famous three-click rule, which suggests that every page on a website should take no more than three clicks to access, is OK to use as a guide, but do not get stuck on it. It is perfectly fine if you need a fourth level in the hierarchy.

Information architects, business analysts, or merchandising teams can help identify relationships between categories, subcategories, and products. Based on these findings, you will decide on rules for an internal linking strategy. Such rules can include:

  • Only highly related categories will interlink.
  • Categories will link only to their parents.
  • Subcategories will link to related subcategories or categories.
  • Product pages link only to related products in the same category and parent categories.

A proper website architecture will help your website rank for the so-called head terms. For e-commerce websites, the pages that target head terms are usually the category pages at all hierarchy levels. However, internal linking alone is not enough for a subcategory page to reach the top of the search engine results pages for category-related search queries.

Because head terms are usually competitive, a page targeting such terms should also include the following:

  • Relevant and useful content. This means that your listing pages should display more than just a list of items, and your product detail pages must present more than just product pictures and pricing.
  • Backlinks from related trusted external websites.

Additionally, proper information architecture means good usability. Great usability and content create an excellent user experience, leading to an increased dwell time (which is good for SEO).

Dwell time is the time a searcher spends on a page before returning to the SERPs. The longer this time is, the better.

Pogo-sticking means going back and forth between a SERP and the web pages listed in the results. For example, you search for something, click on the first result, are unhappy, and return to the SERP. Then you click on the second result, you are still not happy, and you go back to the SERP again, and so forth, until you find what you are looking for or until you refine your search query.

A SERP bounce happens when a search engine user clicks on your page in the SERPs and then returns to the results without interacting with any page elements.

Note that a high SERP bounce is not inherently bad for SEO, but a low dwell time might be. An increased dwell time sends quality signals because it hints to search engines that your page is relevant for a search query.

Navigation (primary, secondary, breadcrumb, or contextual) is also one of the critical components of website architecture. It is jointly crafted by various members of the business, led by the information architect. Given that the primary navigation will be present on almost every page, it influences how authority and link signals (i.e., PageRank and anchor text) are passed to other pages.

Fortunately, there are ways to give users what they want (findability, discoverability, and usability) and simultaneously guide search engine bots toward what you want them to discover, crawl, and index.

How can SEO add value to IA?

Remember, information architecture is not about technical issues but about organizing digital inventory and content. So, while SEO has a key role in information architecture, it should not dictate how information is labeled and organized. Information architecture is about making content easy to find and helpful for users. However, because most SEOs are biased towards marketing and technology rather than user experience and usability, it is advisable to involve an information architect and an SEO consultant when working out the information architecture.

Try to involve the SEO person from the initial stages of the information architecture process, to provide suggestions and feedback from a search engine standpoint, and to contribute to the overall site architecture discussion. Once the information architect designing the draft information architecture listens to what the SEO says, they can brainstorm with the other teams about implementing the SEO recommendations with minimal changes to the initial information architecture format.

Technology and marketing teams often dismiss a certain information architecture because it does not have traffic potential. Do not make that mistake. When optimizing for search engines and their users, you should listen to what other teams in the business have to say and only then suggest solutions.

As mentioned, SEO’s role is to provide consultancy from the perspective of search engines. Let’s look at a few areas where SEO input is valuable.

The concept of flat architecture

In a flat architecture, deep pages – pages at the lower levels of the website hierarchy (usually the product detail pages) – are accessible to users and search engine bots within a balanced number of clicks for users (or hops, for bots).

Figure 7 – This figure depicts what flat website architecture looks like.

The opposite of flat architecture is the so-called deep architecture, which may look like this:

Figure 8 – In a deep-architecture model, pages are mostly linked in a vertical structure.

We will use math to illustrate the concept of flat architecture:

  • At level 0 (home page), you link to 100 category pages; 100^1=100 pages linked.
  • From each page at level 1 (the category pages), you link to 100 subcategory pages; 100^2=10,000 subcategory URLs.
  • From each page at level 2 (the subcategory pages), you link to 100 product pages; 100^3=one million product page URLs.

In three “clicks,” search engines can reach and crawl (and eventually index) one million pages.

Note: the 100 links-per-page example was used as a guide only. You can have more or fewer links, depending on your site authority.
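
The arithmetic above can be sketched in a few lines of Python. This is a simplified model that assumes every page links to the same number of distinct pages on the next level, with no overlap; the function names are made up for the example.

```python
def pages_at_level(links_per_page: int, level: int) -> int:
    """Pages first reachable at a given click depth (home page = level 0)."""
    return links_per_page ** level


def print_reach(links_per_page: int, max_level: int) -> None:
    """Print how many pages each level adds under the simplified model."""
    for level in range(1, max_level + 1):
        count = pages_at_level(links_per_page, level)
        print(f"level {level}: {count:,} pages")


print_reach(100, 3)
```

Changing the first argument shows how quickly the reachable page count shrinks or grows when each page carries fewer or more links.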

Let’s look at the scenario of a direct visit to your homepage. To reach a product detail page from the home page, a user will have to perform the following actions:

  • First, click on the Cosmetics category page.
  • The second click is on the Eye subcategory page.
  • The third click is on the product detail page.

If no external links point directly to that product detail page (abbreviated as PDP), search engines will find the PDP URL the same way users do: the bot will crawl from an entry page and eventually reach the product detail page. Keep in mind that search engines will enter your website through a multitude of URLs, not only through the home page.

In our scenario, it took only three clicks to reach the PDP, but if the website is structured using deep information architecture, it might take users and search engines more clicks or hops.

But how and why did we adopt flat architecture?

The concept of flat website architecture seems to have its roots in web design, and it started with the three-click rule becoming a best practice around the year 2000.

However, when usability experts tested this rule, they found it did not work for users as expected. As a matter of fact:

“Users’ ability to find products on an ecommerce website increased by 600 percent after the design was changed so that products were four clicks from the homepage instead of three” (p. 322).

Then smart SEOs jumped in, thinking that if the rule was good for users, it should also be suitable for search engines. SEOs found a way to funnel more PageRank to deeper levels and optimize crawling by providing shorter paths for search engines. However, the initial goal was to avoid ending up with pages in the supplemental index because of their very low PageRank; it was not to flatten the site architecture.

Here are a few important pointers about flat architecture:

  • Unless you sell a limited number of products (e.g., just ten dietary supplement pills) or unless you have a very limited number of pages on the site, do not flatten to the extreme. That means not linking from the home page to hundreds of product detail pages to build a flat architecture.
  • Flat architecture is about the distance between pages in terms of clicks, not about the number of directories in the URL. For example, you can link from the home page directly to a subcategory URL at the fourth level of the hierarchy (e.g., mysite.com/Home-Garden/Furniture/Living-Room-Furniture/Recliners/) to promote a subcategory that generates high profits. In this example, the Recliners page is only one click away from the home page (which fits the flat architecture concept). Still, it is four levels down in the directory hierarchy (which matches the deep architecture concept).
  • If you have already organized your hierarchy using URL directories, do not remove them just for flattening.
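
The difference between click distance and directory depth can be illustrated with a small Python sketch. The link graph below is hypothetical (only the Recliners URL comes from the example above), and the breadth-first search is just one straightforward way to measure click distance.

```python
from collections import deque
from urllib.parse import urlparse


def click_depths(links, start):
    """Breadth-first search: fewest clicks from `start` to each reachable URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def path_depth(url):
    """Count the non-empty path segments (directories) in a URL."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


home = "https://mysite.com/"
home_garden = "https://mysite.com/Home-Garden/"
recliners = "https://mysite.com/Home-Garden/Furniture/Living-Room-Furniture/Recliners/"

# Hypothetical internal links: the home page promotes Recliners directly.
links = {home: [home_garden, recliners]}

depths = click_depths(links, home)
print("clicks from home:", depths[recliners])
print("directory depth:", path_depth(recliners))
```

In this sketch the Recliners page is one click from the home page even though its URL sits four directories deep.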

As long as the directories do not generate super-long URLs, they have advantages such as:

  • Facilitating easier website “theming” (we will discuss this in the Siloing section).
  • Presenting users with a clear delineation of the categories on your website.
  • Allowing for easier SEO, information architecture, and web analysis (e.g., you can use site:domain.com/directory/ to troubleshoot indexation problems).
  • Google and other search engines may use your directory structure to create rich snippet breadcrumbs.

Figure 9 – SERP breadcrumbs will show up only if the directory hierarchy is clear to search engines.

In this screenshot, you can see how Google displays breadcrumbs directly in SERPs. However, such rich snippets will show up only if the directory hierarchy is structured or if you mark up your breadcrumbs with the Schema.org vocabulary.

URLs don’t need to replicate the exact website taxonomy. If you want, you can keep the URL structure under two directories deep. Here’s an example.

On hotel reservation websites, it is common to have a taxonomy based on hotel geo-locations:

Taxonomy: Home > Europe > France > Ile-de-France > Paris

URL: domain.com/europe/france/ile-de-france/paris/

Even though the URL reflects the hierarchical taxonomy, it is too long and difficult to type in or remember.

If the website sells only hotel rooms, the alternative URL might look like:

domain.com/france/paris/

If the website offers other travel services, such as air tickets or car rentals, then the alternative URL will include the type of service, and it might look like this:

domain.com/car-rentals/france/paris/

Regarding the directory structure for hotel booking websites, it is worth noting that hotels are a special ecommerce case because you cannot re-categorize hotels from one city to another. However, for online retailers, product re-categorization happens frequently.

Keep the PDP URLs free of categories whenever possible to avoid issues with moving products from one category to another or issues related to poly-hierarchies (items categorized in multiple categories).

For example, to reach the product page 3-Level Carousel Media Center, a user will navigate through:

homepage – mysite.com/

category page – mysite.com/office-furniture/

subcategory page – mysite.com/office-furniture/storage/

sub-sub category page – mysite.com/office-furniture/storage/media-storage/

However, once the searcher reaches the product detail page, the URL is free of categories and subcategories:

mysite.com/3-level-carousel-media-center.html

Tip: setting product names in stone is also a good idea.

Notice a couple of things about the previous URLs:

  • The product page URL is free of category, subcategory, or sub-subcategory names.
  • The category and subcategory URLs include the trailing forward slash (/) at the end. That hints to search engines that the URLs are directories, and more content can be found on those pages.

Figure 10 – This is how Google treats trailing slashes in URLs.

  • The product page has a .html file extension. The file extension hints to the search engines that the document is an HTML page, not a directory. The file extension can be anything (e.g., .php or .aspx) because the file extension does not matter at all to search engines.
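
A category-free product URL of this kind can be derived from the product name alone. The Python sketch below is one possible approach; the slug rules (lowercase, hyphens, ASCII only) and the function names are assumptions, not a description of any particular ecommerce platform.

```python
import re
import unicodedata


def slugify(product_name: str) -> str:
    """Lowercase the name, drop accents, and join alphanumeric runs with hyphens."""
    ascii_name = (
        unicodedata.normalize("NFKD", product_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def pdp_url(domain: str, product_name: str, extension: str = ".html") -> str:
    """Build a product detail page URL with no category directories in it."""
    return f"{domain}/{slugify(product_name)}{extension}"


print(pdp_url("mysite.com", "3-Level Carousel Media Center"))
```

Because the URL depends only on the product name, moving the product to another category does not change it, which is also why stable product names matter.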

Removing category names from URLs is a trade-off, as it makes web analysis a little more challenging. However, this difficulty is surmountable. For example, you can group pages in your analytics tool or mark up the HTML code with different strings to group pages based on your rules.

At the same time, make sure your web analytics tool is set up to group pages for analysis easily. Without unique identifiers for URLs, it is more difficult to segment data. You can also use tag managers such as Google Tag Manager to create content groups using Data Layers.

Figure 11 – The flat architecture concept on an ecommerce site.

Siloing

In the simplest terms, siloing means creating a site architecture that allows users to find information in a structured manner while linking pages using a controlled pattern to guide how search engine bots crawl the website. Usually, this structure is a vertical taxonomy.

Siloing sounds like a fancy term, but it is just good information architecture, because siloing is one part website hierarchy and one part navigation (using internal linking).

Figure 12 – At a basic level, siloing means that pages in a taxonomy branch/category (i.e., PDPs) should not link to pages in a different branch/category.

In strict hierarchy patterns, child pages are only linked to and from their respective parent pages. This is not possible without strictly controlled internal linking, and such a strict internal linking pattern is challenging to create mainly because:

  • Primary navigation is present on all pages, so cross-linking will happen naturally.
  • Poly-hierarchies exist, meaning products or subcategories are placed in multiple categories. For example, the Office Furniture category can be categorized under both Office Products and Furniture.
  • Subcategory cross-linking and crossover products. For instance, you may have to link from a product categorized under Home Theater to another product made by the same brand but categorized under Audio.

Because ecommerce websites are complex, they are most likely to have a hierarchy that frequently interlinks silos. In practice, it is complicated and, sometimes, not even advisable to prevent internal linking between silos.

Figure 13 – Cross-linking happens naturally

The internal linking architecture can be very cumbersome and difficult to control, even for ecommerce websites with just a few hundred products, as you can see in this graph:

Figure 14 – This node graph shows how complex internal linking can be.

In this example, a website with just a few thousand pages generated over 250,000 internal links.

The siloing method

Conceptually, siloing is done by identifying the main themes of the website (for ecommerce, those will be departments, categories, and subcategories) and then interlinking only pages belonging to the same silo (for example, linking only within the same category).

The good part is that ecommerce websites are usually developed using a similar architecture, with separate hierarchies (themes) for each department or category.

By siloing the website into themes, you can rank high for semantically unrelated keywords with the same site, even though the themes are entirely different, e.g., “hard drives” and “red wines”.

You can achieve silos with directories or with internal linking.

Directories

Information architects create hierarchies using user research, user testing, keyword research, and web traffic analysis. The URL structure will present the labels used to describe these hierarchies. Your silos will be the directories in the URLs.

Whenever possible, use a hierarchy created with directories.

Internal links

With internal linking, you create virtual silos, as pages in the same silo do not need to be placed in the same directory. You achieve virtual silos by controlling internal links in such a way that search engine bots will only find links to pages in the same silo. This concept is similar to bot herding or PageRank sculpting, with subtle differences in meaning and application.

Siloing with directories

Siloing with directories is the easiest to implement on new websites during the information architecture process. From a user experience perspective, creating the website hierarchy with directories is the best way to go.

But in the end, siloing with directories is nothing more than creating good vertical hierarchies, which the URLs reflect. Many online retailers create them naturally by branching out all categories, without overthinking SEO and without being obsessed with keywords in the anchor text or with internal linking patterns.

A sample silo with directories would look like this:

mysite.com/category1/subcategory1/

mysite.com/category1/subcategory2/

mysite.com/category1/subcategory3/

….

mysite.com/category1/subcategoryN/

Does this type of siloing look familiar to you? It should, if you use directories in your URLs. Moreover, this is nothing more than a proper hierarchy. So, if you design your website hierarchy correctly, you do not even need to worry about siloing with other methods.

Keeping the directory depth low is best practice, ideally fewer than four or five levels.

Siloing with internal linking

Siloing with directories may not always be possible, for example, if you wish to change an existing hierarchy on an established website. In this case, you will create virtual silos using carefully controlled internal linking.

Usually, pages in a silo need to pass authority (PageRank) and relevance (anchor text) only to other pages within the same silo. This prevents the dilution of the silo’s theme and sends the maximum power to the main silo pages.

Here are some rules for linking within and between silos. A page in a silo:

  • Should link to its parents.
  • Can link to its siblings if appropriate. Siblings are pages at the same level in the hierarchy that share the same parent.
  • Should not link to its cousins.
  • Could, eventually, link to its uncles or aunts. Uncles and aunts are siblings of the node’s parent.

Figure 15 – An over-simplified siloing diagram.

In this simplified siloing example, sibling number one could eventually link to its uncles, the siblings of that node’s parent. That means that if you need to connect two related supporting pages found in separate silos (such pages are called cousins), you should not link them directly; link to the cousin’s parent instead, which is an uncle of the linking page.
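
The family relationships behind these rules can be sketched in Python. The hierarchy below is hypothetical (the camera-b, camera-lenses, and lens-a nodes are invented for the example), and the rule labels simply restate the siloing rules; relationships that the rules do not mention are reported as not covered.

```python
def relationship(parent_of, page, target):
    """Classify `target` relative to `page` in a tree given as child -> parent."""
    parent = parent_of.get(page)
    grandparent = parent_of.get(parent) if parent is not None else None
    target_parent = parent_of.get(target)

    if parent is not None and target == parent:
        return "parent"
    if parent is not None and target_parent == parent and target != page:
        return "sibling"
    if grandparent is not None and target_parent == grandparent:
        return "uncle"  # a sibling of the page's parent
    if (
        grandparent is not None
        and target_parent is not None
        and target_parent != parent
        and parent_of.get(target_parent) == grandparent
    ):
        return "cousin"  # a child of one of the page's uncles
    return "other"


RULES = {
    "parent": "should link",
    "sibling": "can link if appropriate",
    "uncle": "could link",
    "cousin": "should not link",
}


def linking_rule(parent_of, page, target):
    """Look up the siloing rule for a link from `page` to `target`."""
    return RULES.get(relationship(parent_of, page, target), "not covered by these rules")


# Hypothetical hierarchy, stored as child -> parent.
parent_of = {
    "electronics": "home",
    "digital-cameras": "electronics",
    "camera-lenses": "electronics",
    "canon-rebel-t5i-dslr": "digital-cameras",
    "camera-b": "digital-cameras",
    "lens-a": "camera-lenses",
}

for target in ["digital-cameras", "camera-b", "camera-lenses", "lens-a"]:
    print(target, "->", linking_rule(parent_of, "canon-rebel-t5i-dslr", target))
```

A check like this could be run over a crawl export to flag cousin-to-cousin links, although, as noted throughout this section, a flagged link is not necessarily one you should remove.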

If you need to link to pages outside the silo, you can block those links from being accessible to search engines, for example, by using AJAX (Asynchronous JavaScript and XML), iframes, or JavaScript blocked with robots.txt. Note that there is a fine line between white hat and gray hat SEO, and such linking may cross that line. This is because Google’s definition of manipulative techniques comes down to the answer to the question: “Would you do it if search engines did not exist?”

The goal is not to take siloing to the extreme. If a page is relevant and you want to link to it, then do so, even if it is in a different silo or theme.

Siloing with internal links is a powerful advanced SEO technique, especially for large websites with multiple departments, themes, or categories that are not semantically related, e.g., groceries and mobile phones. However, it is important to know that siloing is not easily achieved, and it pays to be aware of the existing dangers.

If you want to silo with internal linking, know that:

  • PageRank sculpting with rel="nofollow" is not recommended.
  • Virtual siloing means you somehow have to “hide” internal links from search engine bots; doing so may fall outside search engine guidelines.
  • Hiding internal links from search engines using iframes, AJAX, JavaScript, or similar techniques can qualify as cloaking, since you show users content that differs from what you show search engines; this could result in penalties.
  • If you want to obfuscate links with AJAX or JavaScript for SEO reasons, identify the percentage of users with JavaScript turned off. If that is a significant segment of your total visitors, ensure your website works correctly without JavaScript. Non-JavaScript users should be able to finish all micro and macro conversions on your site. An example of a micro-conversion is an “add to cart” event, while a macro-conversion is a completed order.
  • Trading away too much for SEO at the expense of usability and accessibility is not the right way.
  • Siloing may require hiding entire navigation elements, such as facets and filters, from search engines. There are risks associated with such bold tactics.

Figure 16 – The nofollow links are marked with the red-dotted border

The image above shows how only the top-level categories (Women, Men, Baby, etc.) and the immediate next hierarchy level (Clothing, Shoes, Accessories, etc.) pass authority through links. The remaining category links are nofollow. This is a bold (likely bad) SEO approach to handling primary navigation menus.

Proper internal cross-linking is helpful and necessary for good rankings, and we will discuss this in detail in the Internal Linking section. However, remember that internal linking must be built for users first and only then for search engines. You should link consistently, thematically, and wisely (using synonyms, stems, plurals, singulars, and so on) to support rankings for categories and subcategories.

You should not remove navigation elements just for SEO purposes. Keep the links that are useful for users in the interface, and if you want to remove links for SEO reasons, do it by blocking those links with AJAX or JavaScript.

Another theming method is to evolve taxonomies into ontologies: instead of linking based strictly on a vertical taxonomy, interlink conceptually related items. For example, you can interlink a particular fragrance with the sunglasses manufactured by the same brand. This type of interlinking requires defining semantic and conceptual relationships between categories and items and then deciding on internal linking based on predefined business rules.

One such business rule is crowdsourced recommendations (AKA Customers Who Bought This Item Also Bought…). Do users often buy certain products together? If yes, then cross-link those product detail pages, even if they are in different silos.

If this type of linkage generates too many internal links on some pages, you can always block the less important links (you must define how many links are too many for your particular situation). However, for users’ sake, interlink whenever necessary without concern about siloing.

If the business rules are based on data, you will not link adult toys to children’s books. Also, you will not link to hundreds of related products but just to a few highly related items.
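
A data-driven rule of this kind can be sketched as follows. The Python example counts how often two products appear in the same order and keeps only the strongest few pairings per product. The order data, the product names, and the thresholds (max_links, min_count) are hypothetical values chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_products(orders, max_links=3, min_count=2):
    """Map each product to the products most often bought with it.

    Pairs seen in fewer than `min_count` orders are ignored, and at most
    `max_links` related products are kept per product.
    """
    pair_counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pair_counts[(a, b)] += 1

    candidates = defaultdict(list)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            candidates[a].append((count, b))
            candidates[b].append((count, a))

    return {
        product: [
            other
            for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:max_links]
        ]
        for product, pairs in candidates.items()
    }


# Hypothetical order history; each inner list is one completed order.
orders = [
    ["fragrance-x", "sunglasses-x"],
    ["fragrance-x", "sunglasses-x", "gift-bag"],
    ["fragrance-x", "gift-bag"],
    ["sunglasses-x", "lens-cloth"],
]
cross_links = related_products(orders)
print(cross_links)
```

The two thresholds are where the business rule lives: min_count filters out coincidental pairings, and max_links caps the number of cross-links each product detail page receives.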

Here’s what Google has to say about the subject of theming an internal architecture in a post on their official blog:

Q: Let’s say my website is about my favorite hobbies: biking and camping. Should I keep my internal linking architecture “themed” and not cross-link between the two?

A: We haven’t found a case where a webmaster would benefit by intentionally “theming” their link architecture for search engines. And, keep in mind, if a visitor to one part of your website cannot easily reach other parts of your site, that may be a problem for search engines as well.

This is a reminder not to take siloing to the extreme. However, siloing with directories is natural, and the resulting internal linking is also great for users and search engine bots.

I lean towards a hybrid siloing concept combining the following:

  • A good website hierarchy reinforced by the directory structure (a patented Google signal for classifying pages).
  • Rule-based internal linking.
  • Depending on the case, fewer links made available to search engines, which can be done with or without AJAX/JavaScript. We will discuss this subject later in the book.

Generate content ideas

It is widely known that keyword research can help with generating content ideas. Keyword research also enables you to expand from a relatively narrow set of head keywords (category and subcategory keywords) to a large number of torso and long-tail keywords. These long-tail keywords can then be used to generate content ideas, identify product attributes, and improve product descriptions.

Based on the initial taxonomy created by the information architect, you can identify keyword patterns, tag user intent, group keywords according to buying stages, and find search volumes; I will cover these tactics in the Keyword Research section.

This type of research provides excellent insights usually overlooked by the other teams in an ecommerce business.

If you want to consistently publish content that your target market will find relevant, you must know the queries searchers use and, more importantly, the type of content they seek. Are they looking for general information about your products? If so, you would do well to emphasize review-type content and how-to articles. Are they searching for products to buy? If so, you could improve the content on a product detail page.

Once you discover the user intent behind a search query, you understand what your target market wants and can address those needs on your landing pages. When your landing pages address people’s needs, conversion rates tend to rise, and rankings may improve (as an indirect quality signal).

Here are some interesting facts about search queries:

Figure 17 – The search demand curve, as explained by MOZ. Notice how the long tail of keywords and chunky middle make for more than 80% of the keywords.

Why did I mention these search query facts?

It is because the correct way to start keyword research and build a great website architecture is by recognizing that only a small fraction of your target market is ready to buy at any given moment. Many e-commerce websites mistakenly focus on targeting keywords such as department, category, or subcategory names while completely ignoring a large number of informational search queries (and even navigational). I will detail a keyword research process in the Keyword Research section of the book.

Let’s look again at a typical ecommerce website architecture:

Figure 18 – Under this sample hierarchy, product detail pages are not supported by any other content-heavy level below the PDP level.

There are four levels in the example above: The first level is the home page, which is supported by categories (second level), subcategories (third level), and product detail pages (4th level). The subcategory and product detail pages support the category level; product detail pages then support subcategory pages. However, the product pages are the “leaves” in this example – the last level of the e-commerce hierarchy.

When an ecommerce website does not support important pages (i.e., categories or PDPs) with an additional content-heavy level in the hierarchy, it can miss a considerable amount of organic traffic coming from informational search queries. It will also miss out on the ability to create useful contextual links to product, subcategory, and category pages.

In our example, you can overcome these challenges by creating a 5th level in the hierarchy. This level can be a blog, a learning center, or a projects section on the website, to name just a few ideas. This content-rich section can also be outside of your existing hierarchy.

I could not find a single reference to their blog on the Victoria’s Secret website. This is bad for them but good news for the small guys competing in their niche.

Figure 19 – Only five pages on this website contained the word “blog”, and none were part of a real blog.

Here are two ideas for you:

  • Add a new layer of support for all pages on the website, especially for product and subcategory pages. As I mentioned, this layer can be a blog, a forum, expert Q&As, how-to guides, buying guides, white papers, workshops, etc. This layer will generate additional organic traffic and support contextual, internal linking. Additionally, it may help build a community around your brand, which is always great.
  • Conduct keyword research with this new level in mind so you will not dismiss informational keywords. Categorize such keywords into the Informational bucket in your spreadsheets and plan content based on them. There is more about this process in the Keyword Research section.
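The bucketing idea above can be sketched in a few lines of Python. This is only a rough illustration: the trigger words, the bucket names other than Informational, and the function name `intent_bucket` are assumptions, and a real keyword list would need manual review.

```python
# Hypothetical trigger words; tune them to your own niche and query data.
INFORMATIONAL = ("how to", "what is", "why", "guide", "ideas", "diy")
TRANSACTIONAL = ("buy", "price", "cheap", "sale", "coupon", "deal")


def intent_bucket(keyword: str) -> str:
    """Return a rough intent label for one search query."""
    # Pad with spaces so triggers only match whole words or phrases.
    padded = f" {keyword.lower().strip()} "
    if any(f" {trigger} " in padded for trigger in INFORMATIONAL):
        return "Informational"
    if any(f" {trigger} " in padded for trigger in TRANSACTIONAL):
        return "Transactional"
    return "Unclassified"


keywords = [
    "how to install laminate flooring",
    "buy laminate flooring",
    "laminate flooring",
]
for kw in keywords:
    print(f"{intent_bucket(kw):15} {kw}")
```

Anything left in the Unclassified bucket is a candidate for manual tagging in the spreadsheet.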

Let’s say you sell home improvement items and want more people to visit your website and buy them. However, many searchers in this niche are DIYers, using keywords specific to the awareness and research stages. Then why not create a series of DIY home improvement projects and publish them on a content-heavy website section?

Look at the following inspiring piece of content from Home Depot’s blog. Home Depot is not into selling instructional DIY DVDs, but they are attracting the target market with highly related content. Home Depot has an entire DIY section on its website.

Figure 20 – This page supports category and product pages by linking to them.

When you add a new content-rich layer in the hierarchy, you:

  • Expose your brand to your target market in the early stages of the buying funnel.
  • Add a new way to generate more traffic.
  • Give visitors more reasons to buy from you.
  • Reinforce product and category pages with better internal linking.

Let’s see how SEO could help regarding information architecture.

Evaluate the information architect’s input.

Planning an e-commerce architecture starts with information architects identifying the navigation labels such as departments, categories, or subcategories.

In many cases, information architects do not associate this process with the keyword research process, which is good because navigation has to serve the users, not the bots. However, you should evaluate the architect’s input from a search engine perspective.

Here’s an example of how to do that using Google Trends. If the information architect wants to label one of the categories in the primary navigation as mp3 players, the following search trend comparison data might change their mind.

Figure 21 – The trend for “iPod” is downwards, but search interest is still several times higher than for “mp3 player”.

Indeed, the iPod can be a child of the mp3 player parent. Still, it would be best to brainstorm with others in the team to decide whether making the iPod category easier to find would be more beneficial for users, which may mean displaying it directly in the primary navigation.

The search volume for a parent category is often higher than the search volume for a child category, but as you can see in this example, this rule is not definitive.

Also, note that Google Trends displays normalized data on a scale of 0 to 100, where 100 marks the point of highest search interest for the selected terms, region, and time range. Google Trends does not present absolute search volumes.
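To make the idea of normalized data concrete, here is a minimal sketch that rescales made-up interest figures so that the highest point across the compared terms becomes 100. The figures and the function name are illustrative assumptions, and Google's actual calculation is not reproduced here.

```python
def normalize_to_peak(series: dict[str, list[float]]) -> dict[str, list[int]]:
    """Rescale all series so the single highest point across them equals 100."""
    peak = max(max(values) for values in series.values())
    return {
        term: [round(100 * value / peak) for value in values]
        for term, values in series.items()
    }


# Made-up interest figures for illustration only (not real search volumes).
raw = {"ipod": [800, 600, 400], "mp3 player": [200, 150, 100]}
scaled = normalize_to_peak(raw)
print(scaled)
```

The scaled values only say how the terms compare with each other; they say nothing about how many searches actually took place.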

All e-commerce websites will have primary navigation (aka global or main navigation), secondary navigation (aka local navigation), and some contextual navigation. Another form of navigation specific to ecommerce websites is faceted navigation.

Primary and secondary navigation

Primary navigation is for the content/links most users are interested in, but remember that importance is relative (something important for your business may not be as important for another business). Generally, on e-commerce websites, primary navigation displays departments, categories, or market segments (e.g., men, women, kids).

Primary navigation is the easiest type for most users to identify. It allows direct access to the website’s hierarchy and is displayed on almost every page.

Figure 22 – A sample primary navigation on Kohl’s website.

On a side note, it will be difficult for Kohl’s to rank for top-level category keywords (e.g., Home, Bed & Bath, Furniture, Outerwear, etc.) since they will have to compete with niche-specific websites that are laser-focused on a single segment (for example, a company that sells just furniture). Kohl’s can achieve good rankings, but doing so will require significant work, including onsite SEO and quality backlink development.

Regarding secondary navigation, even information architecture and usability experts like Steve Krug, Jesse James Garrett, and Jakob Nielsen cannot agree on a single definition.

Secondary navigation stands for content of secondary interest and importance to users. Again, importance is relative to each business.

A long-standing SEO best practice, closely tied to navigational links, recommends keeping the number of links on a page under 100. However, this rule is obsolete; you can list more than 100 links on your pages, depending on your website’s authority.

You will see high authority websites like Walmart listing hundreds of internal and external links:

Figure 23 – There are 633 links on this page. This may be too many unless you have an excellent site authority.

Walmart’s many links result from the so-called fly-out mega menus it uses in the primary navigation for usability reasons. This type of menu makes deeper sections of the website easily accessible to users.

Mega menus allow direct linking to subcategories and products, but you must be careful to keep the number of links to a reasonable limit. Since the primary navigation is present on most pages, it significantly influences how authority moves back and forth between pages.

Consolidating a long list of departments into one place involves design considerations (limited screen real estate) and user experience (too many options to skim at once). However, it also affects the PageRank passed to the other pages.

Figure 24 – Design limitations forced Walmart to reduce the number of links in the navigation. Notice the “See All Departments” link at the bottom of the primary navigation.

However, Walmart has a separate page for the complete list of their departments (e.g., health) and categories (e.g., vitamins):

Figure 25 – The “All Departments” consolidation is a clever idea because this page will act as a sitemap for people and search engine bots.

SEO can help information architects decide which categories are the most important for users and should be listed in the primary navigation. Use web analytics tools to identify metrics such as the most searched terms on the website, the most viewed pages, and the highest search volume from pay-per-click campaigns.

Figure 26 – The keyword with the highest number of internal site searches could eventually be placed in the navigation if it makes sense, or it can be placed near the search field.

Contextual navigation

Contextual navigation refers to the navigation present in the main section of web pages. It excludes boilerplate navigation items like those displayed in headers, sidebars, or footers.

Some examples of contextual navigation on ecommerce websites include sections such as:

Figure 27 – Customers who viewed this item also viewed

Figure 28 – Best Sellers

Figure 29 – Contextual text links in the main content (MC) areas.

Figure 30 – Links in Recommended Products carousels.

You must discuss contextual navigation with the information architect to identify relevant relationships between categories, subcategories, and products and plan the internal linking accordingly.

Prioritization

SEO can help with the prioritization of labels in the navigation.

It is helpful to know how many pages will be linked from structural sections of the website (primary, secondary, and footer links) on each page template. This is important to estimate because you must determine how many links you can display in the contextual navigation (only if you need to limit the number of links on pages).

This is not a definitive rule, but if you start a new website, keeping the number of links on each page to a maximum of 200 is a good idea. This is because you will initially have only a small amount of authority to pass to lower levels.
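If you want to check a page template against such a link budget, a small script can count the anchors in its HTML. The sketch below uses Python's standard `html.parser` module; the sample markup and the `MAX_LINKS` name are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href of every <a> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a destination
                self.hrefs.append(href)


def count_links(html: str) -> int:
    parser = LinkCounter()
    parser.feed(html)
    return len(parser.hrefs)


MAX_LINKS = 200  # the starting-point limit suggested above, not a hard rule

sample = (
    '<nav><a href="/toys/">Toys</a><a href="/gifts/">Gifts</a></nav>'
    '<a name="top">Top</a>'
)
total = count_links(sample)
print(total, "links;", "over budget" if total > MAX_LINKS else "within budget")
```

Running it on the rendered HTML of each page template shows how many links the structural navigation already uses and how many are left for contextual navigation.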

Here are some prioritization guidelines:

  • Keep the number of top-level categories or departments in the primary navigation low to avoid the paradox of choice. Research has established that having too many options is bad for decision-making.
  • The short-term memory “rule of seven items” does not apply to primary navigation, as users do not need to remember the labels.
  • You can list more categories on a “view-all departments” or “view-all categories” page.

Figure 31 – In a horizontal design, the primary navigation is constrained by design space.

As you can see in the examples above, the primary navigation is constrained by a horizontal design space. Notice how short the category names must be. Macy’s displays eleven labels in the primary navigation, the same as Backcountry, while Office Depot lists only nine.

  • Vertical primary navigation placement allows for more categories to be listed:

Figure 32 – Costco displays 18 categories in the menu (the same as Sears), while Walmart displays only 13.

Specialty retailers will probably have fewer than two or three departments (sometimes none). In those cases, they may list categories in the menu instead of departments. General department stores can have up to 20 departments.

  • You can break each category level into 20 to 40 subcategories, depending on how extensive your inventory is.
  • If a parent category needs more than 40 subcategories, consider adding a new parent category or implementing faceted subcategories.
  • Ideally, the hierarchy depth to reach a product detail page should be no more than four levels:
    • Two levels deep: home, category, and product detail page (this is suitable for niche retailers).
    • Three levels deep: home, category, subcategory, and product detail page (this is the most common setup for medium-sized e-commerce websites).
    • Four levels deep: home, department, category, subcategory, and product detail page OR home, category, subcategory, sub-sub category, and product page. This setup is specific to marketplaces, large department stores, or websites with extensive inventories.
  • If the hierarchy has more than four or five levels, use faceted navigation to allow filtering by product attributes.
  • To improve the authority (PageRank) and the relevance (anchor text) of product detail pages, add a content layer (e.g., blog, community forums, user reviews, and so on) in the hierarchy just below the product detail page level and link to relevant items from there.
  • Ordering categories (or items) alphabetically is not always the best option. You should prioritize based on popularity and logic whenever possible and, eventually, complement it with alpha navigation if user testing proves that such a type of navigation is useful.

Figure 33 – An older version of primary navigation on OfficeMax, featuring alpha navigation.

Figure 34 – Newer screenshot after OfficeMax tested the alpha navigation and reverted to category name navigation.

  • If a category has too few items, consider moving them to an existing category with more items, but do this only if the new categorization makes sense for users.
  • A category with too many items (i.e., thousands) may generate information overload. In this case, you can break the category into smaller subcategories. Additionally, create a user experience that allows better scope selection before displaying a list of items.
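The depth guideline in the list above can be checked against a list of URLs. The sketch below assumes that the URL directories mirror the hierarchy, which is not true for every website, and that URLs include the scheme (e.g., https://); the function names are hypothetical.

```python
from urllib.parse import urlparse


def hierarchy_depth(url: str) -> int:
    """Count non-empty path segments; the home page has depth 0."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def needs_review(url: str, max_depth: int = 4) -> bool:
    """Flag URLs that sit deeper than the suggested maximum depth."""
    return hierarchy_depth(url) > max_depth


urls = [
    "https://mysite.com/",
    "https://mysite.com/toys/stuffed-animals/elmo/",
    "https://mysite.com/a/b/c/d/e/",
]
for url in urls:
    print(hierarchy_depth(url), needs_review(url), url)
```

URLs flagged by `needs_review` are candidates for flattening or for faceted navigation.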

Keyword variations

Planning a categorized product hierarchy is not easy. At the top category level, the labels in the primary navigation must be intuitive, have the appropriate search volumes, and be concise enough to support menu-based navigation. It is worth repeating that determining the hierarchy of an ecommerce website based solely on keyword research is neither ideal nor recommended. However, keyword research should complement and support information architecture.

One common question regarding keywords is handling misspellings, synonyms, stemming, or keyword variations for a category. Where do you place them in the website’s information architecture?

This should be easy for your internal site search: you must associate each keyword variation, misspelling, etc., with an existing product or category and redirect users to the respective canonical product or category page. If there is no exact match between the variation, misspelling, or synonym and a category on your site, send users to an internal search result page.

For example, when someone searches for “tees,” “tee shirts,” or “t-shirts,” you return results for “t-shirts.” You can redirect the searcher to the t-shirts category landing page or product listing page if there is an exact match between the search query and the category name.
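A minimal sketch of this site search rule follows. The variation table, the category URL, and the `/search?q=` URL pattern are assumptions for illustration; a real implementation would live in your site search platform's own rules.

```python
from urllib.parse import quote_plus

# Hypothetical variation table; in practice it would come from site search logs.
VARIATIONS = {
    "tees": "t-shirts",
    "tee shirts": "t-shirts",
    "t shirts": "t-shirts",
    "t-shirts": "t-shirts",
}
CATEGORY_URLS = {"t-shirts": "/clothing/t-shirts/"}  # assumed URL structure


def resolve_search(query: str) -> str:
    """Return a category URL on an exact match, else a search results URL."""
    normalized = " ".join(query.lower().split())  # trim and collapse spaces
    canonical = VARIATIONS.get(normalized)
    if canonical in CATEGORY_URLS:
        return CATEGORY_URLS[canonical]
    return "/search?q=" + quote_plus(normalized)


print(resolve_search("  Tee   Shirts "))
print(resolve_search("blue tees"))
```

Only whole-query matches are redirected here; a query such as “blue tees” still goes to a search results page, where the variation table could be used to broaden the results.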

Figure 35 – Make sure your internal site search works appropriately and does not return wrong products, as in this example (a search for “t-shirt” returned bras).

In this screenshot, I wanted to highlight the improper handling of internal site search results, returning bras when someone searches for “t-shirts”.

Handling keyword variations for external search engines is a bit more complicated. Commercial search engines like Google and Yahoo must understand and connect keyword variations with the right content on your website.

Previously, you would’ve created individual pages to target keyword variations (or a group of keyword variations). However, Google shifted to ranking topics instead of individual keywords. Therefore, your pages must include the searched keyword and semantically related words (e.g., synonyms, plurals).

Ensure you are not overdoing it; including all 20 possible keyword variations on a single page is spammy.

Here are some ideas for you to consider:

Target the most common variations in the title, the description, or both.

Figure 36 – Gap targets keyword variations in the description, while Sears uses the title tag.

Use product and category descriptions.

One option is to use category or product description sections to add keyword variations in the copy. The bottom of the image below highlights how this website uses two keyword variations for “t-shirts”.

Figure 37 – This retailer uses the words “tees” and “t shirts” in the category description copy to capture traffic for those keyword variations.

Take advantage of related searches.

This approach requires displaying a “related searches” section on your pages. This section may contain several of the most used keyword variations:

Figure 38 – Remember that Related Searches sections should be useful for users first and only then for search engine bots.

Identify possible information architecture problems.

You can perform the “site:” query on Google, for example, “category_name site:mysite.com” (without quotes), to see whether search engines list the right page at the top. You can also use products and subcategories in the site: query. For example, you can search for:

site:www.costco.ca/ gourmet products

site:www.costco.ca/ “gourmet products”

If the page you optimized for on your website does not show up at the top of the results, various reasons are possible, such as:

  • Improper internal linking. This happens when the internal linking architecture does not support the correct page.
  • Thin content, no content, or inaccessible content (e.g., JavaScript reviews) on the right page.
  • External links point to the wrong page(s), diluting and reducing the relevance of the correct pages. If people link to the wrong pages, you must ask yourself why. Maybe those other pages are more relevant to them?
  • Page-specific penalties.

Of course, an in-depth analysis is required to identify the cause of these issues. When determining the cause of such problems, it is important to understand how the targeted page (the page you want to rank with at the top of the SERPs) is linked internally from other pages on your website and external sites.

One of the tools for analyzing this is Google Search Console:

Figure 39 – The Internal Links report will display the most important internal links, but only for the most important pages on the website.

This report is basic, but it can provide some immediate insights. Look for signals such as:

  • Are there more internal links to the wrong page(s) than to the desired page?
  • Is the targeted page linked from parent pages (pages higher in the hierarchy)?
  • Is the targeted page linked from pages with high authority?
  • Is the targeted page linked with the proper anchor text?

If there are issues like these, it is time to restructure your internal linking. Remember that Google will not let you download the complete list of links, only the top ones.

Another useful method to assess the internal linking is to run a crawl on your website using tools like Xenu Link Sleuth or Screaming Frog and export the results to Excel.
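Once you have a crawl export, the questions listed earlier (how many internal links point to each page, and with which anchor text) can also be answered with a short script instead of Excel. The sketch below assumes a CSV with `source`, `target`, and `anchor` columns; real crawler exports name their columns differently, so adjust accordingly.

```python
import csv
import io
from collections import Counter, defaultdict

# Stand-in for a crawler export; column names here are assumptions.
EXPORT = """source,target,anchor
/,/gourmet-products/,Gourmet Products
/grocery/,/gourmet-products/,gourmet food
/,/search?q=gourmet,Gourmet
"""


def summarize_inlinks(csv_text: str):
    """Count internal links and anchor texts per target URL."""
    inlinks = Counter()
    anchors = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        inlinks[row["target"]] += 1
        anchors[row["target"]][row["anchor"].lower()] += 1
    return inlinks, anchors


inlinks, anchors = summarize_inlinks(EXPORT)
for target, count in inlinks.most_common():
    print(count, target, dict(anchors[target]))
```

Comparing the counts for the desired page against those for competing pages shows quickly whether the internal linking favors the wrong URL.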

It is also a good idea to run the most important terms through your internal site search to check whether there is a match between the URL returned by your internal site search and the URL returned by search engines.

For instance, let’s say that Google returns the Gourmet Products category URL in the first position when you search for “site:costco.ca gourmet products”. If you were to click on the result, the Gourmet Products page opens:

Figure 40 – Costco’s organic search landing page, pointing visitors to the right category page.

However, Costco’s internal site search returns a different page: a search results page. This is not the best approach from a usability point of view or for search engines because Google does not want to list other results pages in its SERPs.

Figure 41 – In Costco’s case, this mismatch may happen because of the setup of the internal site search rules.

When there is an exact match between a user’s query and a category name, it is preferable to redirect the user to the listing page instead of to a search results page.

Labeling

Regarding choosing the links’ names in the navigation, labeling is an area where information architecture and search engine optimization overlap. SEOs and information architects must understand the user’s mental model to label the navigation correctly. Labeling is difficult and presents a real challenge for large ecommerce websites. Research from eBay shows how complicated it can get.

While most ecommerce taxonomies can be architected based on a predefined vocabulary, SEO can assist in labeling.

Let’s say you sell toys. Start by searching for the category name (“toys”) using Google’s Keyword Planner:

Figure 42 – Do not forget to set up the targeting options based on your target market.

Download the list generated by Keyword Planner and open it with Excel. Then, categorize keywords into “buckets” by mapping each keyword to its category, attribute, or filter name:

Figure 43 – Categorize keywords into “buckets”.

Insert a pivot table that counts the occurrences of the category:

Figure 44 – Sort by Count of Category.

If you sort by Count of Category, you can get an idea of what needs to be present in the navigation. You can also identify filter values that can be used in the faceted navigation.
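The same count can be produced outside Excel. Here is a minimal sketch using Python's `collections.Counter`; the tagged keywords are made-up examples, not data from Keyword Planner.

```python
from collections import Counter

# Hypothetical tagged keywords: (keyword, bucket) pairs created by hand.
tagged = [
    ("lego toys", "brand"),
    ("fisher price toys", "brand"),
    ("toys for 2 year olds", "age"),
    ("frozen toys", "character"),
    ("barbie toys", "brand"),
]

bucket_counts = Counter(bucket for _, bucket in tagged)
for bucket, count in bucket_counts.most_common():
    print(f"{bucket:10} {count}")
```

`most_common()` returns the buckets sorted by count, which corresponds to sorting the pivot table by Count of Category.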

Some navigation labels will be easily identified after tagging fewer than a hundred keywords. For instance, in our example, it seems clear that “brand” should be a primary or secondary navigation label, and users should be able to navigate and filter items by brands. Other possible candidates in this example are “age,” “theme,” and “character.”

Take the findings from this type of research and discuss them with the information architect.

Another thing you should do with the keyword list generated by Keyword Planner is to get the individual word frequency using tools such as wordle.net:

Figure 45 – Words sorted by frequency.

Visually, this is what the word frequency looks like for our previous example:

Figure 46 – The “word cloud” for a list of keywords.

The image above is what we call the “word cloud,” and in our example, I excluded the words “toys” and “toy” to make the other words stand out.
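If you prefer not to depend on an online tool, the word frequency behind such a cloud can be computed locally. The sketch below is an assumption-laden illustration: the keyword list is made up, and the exclusion list simply mirrors the idea of leaving out “toys” and “toy”.

```python
import re
from collections import Counter

STOPWORDS = {"toys", "toy", "for", "the", "and"}  # assumed exclusion list


def word_frequency(keywords: list[str]) -> Counter:
    """Count individual words across a keyword list, skipping stopwords."""
    counts = Counter()
    for keyword in keywords:
        words = re.findall(r"[a-z0-9]+", keyword.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts


sample = ["kids toys", "toys for kids", "wooden toys", "outdoor toys for kids"]
freq = word_frequency(sample)
print(freq.most_common(3))
```

Adding a word to `STOPWORDS` is the equivalent of excluding it from the word cloud.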

The frequency of the word “kids” is particularly interesting. If you sell toys only for kids (no other target age, e.g., adults), you probably should exclude the word “kids” from your analysis.

If you are in this niche, you may notice that a few essential segments/labels are missing from this keyword list:

  • One is the gender label (girls and boys).

  • Is your target market price-sensitive? Then “pricing” might be another segmentation/label (shop by price).

Insights like the ones above cannot be discovered using keyword tools. So, how do you identify these “hidden” labels? By conducting user research, user testing, creating consumer personas and scenarios, user flows, website maps, and wireframes.

Remember that from an information architecture perspective, labeling does not stop with the text used for links and navigation. There are different types of labels as well, such as:

Document labels

  • URLs (whenever possible, URLs should contain keywords that make sense to searchers and search engines).
  • File names (having relevant keywords in filenames is important for SEO and users).

Content labels

  • Page titles should make sense to searchers and search engines. When there is a partial match between the keywords in the HTML title element and the search query, search engines will emphasize (bold) the matched keyword(s), which may help with SERP click-through rates (CTR).
  • Headings and sub-headings. Headings use large fonts and attract the eyes almost immediately. Putting keywords in headings assures users they are in the right place and helps with dwell time and bounce rates.

Other types of navigation labels

  • Breadcrumbs. Remember that since search engines became so popular, home pages have not been the only entry points to websites. Therefore, use breadcrumbs to communicate your site’s hierarchy to searchers easily and quickly.
  • Contextual text links. Using keyword-rich anchor text placed in a sentence or paragraph is one of the best ways to interlink pages vertically or horizontally.
  • Footers are also a type of navigational label. A quick note on this type of navigation: this is probably the place people spam the most by creating tens of keyword-rich internal links.

Figure 47 – The screenshot depicts a footer that makes this website a good candidate for an over-optimization filter.

This footer is mainly boilerplate text, meaning that search engines will most likely ignore it when assessing this page’s content and the anchor text’s relevance.

It does not help to repeat “men’s {category name}” across a million pages since search engines can exclude boilerplate text pretty well when computing relevance.

Figure 48 – An excerpt from Google’s webmaster guidelines regarding boilerplate repetition.

It is funny how SEOs refer to the concepts discussed in this course section as on-page SEO factors, while information architects refer to the same as labels. It seems that SEOs and information architects work with similar and related concepts. However, they still cannot easily agree on optimizing websites for both searchers and search engines.

Poly-hierarchies

SEO can help information architects with canonicalizing poly-hierarchies.

Very often, more than one hierarchy is appropriate for a given item. It is important to help the information architect choose the best fit for the canonical hierarchy and to stick to it. You should link only to the canonical hierarchies from the primary or secondary navigation.

Ideally, all links on the website should point to only one canonical hierarchy.

You can keep as many logical hierarchies as are helpful to users, but to avoid confusing search engines, link to the canonical hierarchy as well.

For example, the Elmo category can be found under:

Toys > Stuffed Animals > Elmo (URL: mysite.com/toys/stuffed-animals/elmo/)

Gifts > Holidays > Christmas > Elmo (URL: mysite.com/gifts/holidays/christmas/elmo/)

If you decide that the first hierarchy is the canonical one (usually canonical hierarchies are the shortest), then whenever you link internally to the Elmo category, use the URL mysite.com/toys/stuffed-animals/elmo/
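One way to enforce this rule in templates is to pass every internal link through a lookup that swaps alternate hierarchy paths for the canonical one. The sketch below uses the Elmo URLs from the example; the mapping table and the function name `internal_href` are hypothetical.

```python
# Alternate hierarchy paths mapped to the chosen canonical path.
CANONICAL = {
    "/gifts/holidays/christmas/elmo/": "/toys/stuffed-animals/elmo/",
}


def internal_href(path: str) -> str:
    """Return the path that internal links should use."""
    return CANONICAL.get(path, path)


print(internal_href("/gifts/holidays/christmas/elmo/"))
print(internal_href("/toys/stuffed-animals/elmo/"))
```

Paths that are not in the table are returned unchanged, so the helper is safe to apply to every link.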

You can use your web analytics tool to see how most users reached a page. For example, look at the Navigation Summary report generated using Google Analytics (under Behavior –> All Pages) and see how most people reached the Elmo page:

Figure 49 – To get this report, follow the steps illustrated in this screenshot. Use the Visitors Flow report under the Audience tab for a more detailed analysis.

Additionally, you can look at the Refined Keywords dimension in the Behavior –> Search Terms section to understand what keyword refinements were made after a search for “Elmo”. The Refined Keyword report can also be a source of keyword variations, as you can see in the following screenshot:

Figure 50 – The Refined Keyword report can be a source for keyword variations.

Remember that there is no wrong or right way to classify a product into certain taxonomies if you refine them over time. However, setting that in stone is a good idea once you decide on a canonical hierarchy.

Here are some other SEO tips for ecommerce information architecture:

  • If you use Google Analytics (or any other web analysis tool), activate the Site Search Tracking option. Analyze what users search for and use that information to decide on the website’s hierarchy. However, do not rely solely on your web analytics data because you will miss a lot of data sourced outside your site.
  • Use keyword research tools to identify keyword variations and suggestions for the terms you have in mind or those generated with user research and card sorting.
    • Google Keyword Tool
    • Search Term/Query Reports
    • Wordstream
    • Ubersuggest
    • Keywordtool.io
    • Google Suggest
    • Google Correlate
    • SEMRush
    • SpyFu
  • Analyze your competitors’ website architecture and navigation, but do not copy mindlessly. Use their information for inspiration, but ultimately create your own site architecture.
  • Use a crawler on your competitors’ websites and sort their URLs alphabetically. For this to work, you may need to crawl many URLs (e.g., 250k+).
  • Find your competitors’ sitemaps (the HTML and the XML Sitemaps) and analyze them in Excel.

Figure 51 – Sorting URLs alphabetically can reveal the website structure.
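Beyond sorting, you can count how many crawled URLs fall under each directory to get a rough picture of the structure. The sketch below uses made-up example.com URLs, and the function name `directory_counts` is hypothetical; it assumes that the directories in the URLs reflect the site hierarchy.

```python
from collections import Counter
from urllib.parse import urlparse


def directory_counts(urls: list[str], depth: int = 2) -> Counter:
    """Count URLs under each directory prefix of the given depth."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        prefix = "/" + "/".join(segments[:depth])
        counts[prefix] += 1
    return counts


crawl = sorted([
    "https://example.com/toys/stuffed-animals/elmo/",
    "https://example.com/toys/stuffed-animals/big-bird/",
    "https://example.com/toys/puzzles/",
    "https://example.com/gifts/holidays/christmas/",
])
counts = directory_counts(crawl)
for prefix, count in counts.most_common():
    print(count, prefix)
```

Raising or lowering `depth` lets you look at departments, categories, or subcategories in turn.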

  • Download the DMOZ taxonomy and look at the shopping categorization.
  • When choosing category names, use Google Trends to check whether there is a steep drop in what people search online over time.

Figure 52 – Notice how the interest in “digital cameras” trends downwards. Maybe this has to do with mobile phones that yield increasingly better pictures.

  • Do not create the website hierarchy solely on keyword research data; validate with card sorting and user interviews. Nowadays, you can quickly do that online.
  • Perform simple navigational queries, like “contact {your_brand}”, and make sure the contact URLs, and all other important URLs, are user-friendly.

Figure 53 – This is a not-so-friendly “contact us” URL.

Remember, labeling applies to URLs, too, not only to links. In this example, the URL is not optimized for users (nor for search engines). The CMS may limit this, suggesting it’s time to ditch the old CMS for a new one.

A friendly URL will read www.jcpenney.com/contact-us OR jcpenney.com/contact-us

  • If you need to categorize large volumes of items, you can use the power of folksonomy, which is an academic term for what we commonly call crowdsourcing. Services such as Amazon Mechanical Turk (MTurk) will allow you to categorize products quickly and even create relationships between products using real people. However, be careful about how you select participants and what instructions you give them.
  • When card sorting tests are in progress, listening and observing are more important than putting words in your users’ mouths.
  • When you remove/update categories from your website (at all levels), ensure that the URLs belonging to the updated categories redirect to the most appropriate working page.
  • When you develop or update the website, create a checklist of SEO requirements for the information architect (e.g., directory and file name conventions, canonicalization rules, lower casing all URLs, data quality rules for data input teams, seasonality and expired content handling, parameter handling, and so on). I will not provide an extensive checklist here because people tend to limit themselves to using just the pointers in the list while missing others. After reading this book, you should be able to come up with your own list.
  • Send email alerts to the search engine optimizer when someone removes or updates categories, subcategories, or products so that they can check the header responses for the new and old URLs. This task can be easily automated.
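Part of the redirect checking mentioned in the list above can be automated before anything goes live by validating the redirect map itself. The sketch below is a hypothetical helper that follows a mapping of old to new paths and reports chains and loops; the paths and the `max_hops` limit are assumptions, and it does not replace checking the live header responses.

```python
def final_destination(
    url: str, redirects: dict[str, str], max_hops: int = 5
) -> tuple[str, int]:
    """Follow a redirect map; return (final URL, number of hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
    return url, hops


# Hypothetical redirect map created after renaming categories twice.
redirects = {
    "/mp3-players/": "/audio/mp3-players/",
    "/audio/mp3-players/": "/audio/portable/",
}
destination, hops = final_destination("/mp3-players/", redirects)
print(destination, hops)
```

More than one hop indicates a redirect chain, in which case the old URL could be pointed straight at the final page.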

Technical architecture

At the beginning of this section, I mentioned that site architecture (SA) is made up of information architecture (IA) and technical architecture (TA). We then looked at several information architecture topics. Now, it is time to discuss technical architecture.

While duplicate content and crawlability issues are well-known SEO headaches, many search engine optimizers categorize them under the information architecture umbrella. However, they are, in fact, technical issues. Most SEO tips you will learn during the next chapters address technical architecture issues.