Pagination for SEO
Late last week Google wrote a blog post announcing a new way of telling search engines about pagination. This has prompted me to talk a bit about pagination from an SEO perspective, what I recommend to clients right now and how Google's new method could change this.
Before I continue, for those who don't know, pagination is where you divide a long list of products into multiple numbered pages to make it easier to use (whether users actually prefer multiple numbered pages to a single page with everything on it is open to discussion, but in any case pagination is used extensively today). It's used on most ecommerce sites and usually manifests as a row of numbered links at the top and/or bottom of a category page, allowing users and search engines to navigate between the pages. The pages are typically identified in the URL by way of a query string parameter; for example, page 2 may be denoted by a parameter such as &p=2 or ?page=2.
From an SEO perspective, for large ecommerce sites pagination is a bit of a necessary evil. On the one hand you need it in order to provide accessible links to all of your products (the wisdom of having thousands of links on one page being somewhat questionable). But it also presents a few problems:
- Dilution of link equity. Say you have 10 pages of products in your toaster category; that's 10 pages that people could link to. For obvious reasons, you want people who are going to link to your toaster category to link consistently to page 1, consolidating the value of those links at your target page.
- Duplication. Most implementations of pagination result in two versions of page 1 of each category; the canonical (default) version, and the version with a query string parameter denoting that this is page 1. Continuing with my hypothetical toaster category, the two versions might be /toasters/ and /toasters/?page=1. Duplication causes further dilution of link equity.
- Possibility of the wrong page ranking. Paginated pages tend to be very similar, and without any signal telling them which page is first in the sequence, search engines may end up ranking a page halfway through it (admittedly this is rare unless some important links have been fired at the deeper page, so mostly this is a sub-problem of problem 1).
Solutions to Pagination
Historically I've recommended that clients use a "noindex, follow" meta tag on page 2+, preventing problem three. Problem two is also easy to prevent - you simply need to code links to page 1 in your templates consistently so that they always reference the default version of the page.
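To make those two fixes concrete, here is a minimal sketch of the template logic in Python. The function names category_url and robots_meta are hypothetical, and the ?page= parameter is an assumption borrowed from my toaster example; your platform's URL scheme may well differ.

```python
def category_url(base_path: str, page: int) -> str:
    """Build the link to a page of a category.

    Page 1 always maps to the default URL, so templates never emit
    the duplicate ?page=1 version (problem two).
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Assumption: deeper pages are addressed with a ?page=N query string.
    return base_path if page == 1 else f"{base_path}?page={page}"


def robots_meta(page: int) -> str:
    """Return the robots meta tag for pages 2+, or an empty string for page 1.

    "noindex, follow" keeps deeper pages out of the index while still
    letting crawlers follow the product links on them (problem three).
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return '<meta name="robots" content="noindex, follow" />' if page > 1 else ""
```

Calling category_url("/toasters/", 1) gives /toasters/ rather than /toasters/?page=1, and robots_meta only returns a tag from page 2 onwards.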
Thus far there basically has been no solution to problem one. I've heard of some people trying to use the canonical tag on pages 2+, pointing to page 1, but that doesn't work because the pages aren't similar enough (remember that the canonical tag is designed to canonicalise duplicate pages). Just as well - if it did work then page 2+ wouldn't be indexed, and the product links on those pages would never be crawled!
What Google has done is solve this by creating a type of markup that signposts a sequence of pages in a way that Googlebot will understand, using the <link> HTML element in the same way as the canonical tag does, except with the rel values "next" and "prev". This solves problem three without resorting to noindex meta tags, and potentially solves problem one for the first time. Paraphrasing from Google, and with my emphasis:
Now, if you choose to include rel="next" and rel="prev" markup on the component pages within a series, you're giving Google a strong hint that you'd like us to consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole (i.e., links should not remain dispersed between page-1.html, page-2.html, etc., but be grouped with the sequence).
Let's take a simple example of 3 paginated toaster pages:
- Page 1 - /toasters/
- Page 2 - /toasters/?page=2
- Page 3 - /toasters/?page=3
With the new tags you'd add the following to these pages:
Page 1 -
<link rel="next" href="http://www.domain.com/toasters/?page=2" />
Page 2 -
<link rel="next" href="http://www.domain.com/toasters/?page=3" />
<link rel="prev" href="http://www.domain.com/toasters/" />
Page 3 -
<link rel="prev" href="http://www.domain.com/toasters/?page=2" />
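Since these tags follow a simple rule, they are easy to generate in a template. Below is a rough sketch in Python of how that might look. The function name pagination_link_tags is hypothetical, the ?page= parameter is an assumption carried over from the toaster example, and the sketch uses rel="prev", which is the value named in Google's announcement quoted above.

```python
from html import escape
from typing import List


def pagination_link_tags(base_url: str, page: int, total_pages: int) -> List[str]:
    """Return the <link> tags for one page in a paginated series.

    The first page gets only rel="next", the last page gets only
    rel="prev", and every page in between gets both.
    """
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")

    def url_for(n: int) -> str:
        # Page 1 uses the default URL so the ?page=1 duplicate is never referenced.
        return base_url if n == 1 else f"{base_url}?page={n}"

    tags = []
    if page < total_pages:
        tags.append(f'<link rel="next" href="{escape(url_for(page + 1))}" />')
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(url_for(page - 1))}" />')
    return tags
```

For the three toaster pages, calling pagination_link_tags("http://www.domain.com/toasters/", 2, 3) returns both a next tag pointing at ?page=3 and a prev tag pointing at the default /toasters/ URL.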
Even if these tags don't turn out to fully solve the problems associated with pagination, I'd recommend that any site using pagination consider implementing them, pending adoption by Bing as well. They are easy to implement at a template level, so won't take lots of tech resource or time, and adding them better aligns your site with the push towards the semantic web, where things are coded in a way that makes logical sense. For me, this recommendation replaces the use of noindex, follow meta tags (which aren't even remotely semantic) as a way to help search engines understand the structure of a paginated site (again, assuming Bing adopts them as well).