Google's "Farmer" Algorithm Update
Last month I wrote about an algorithm update released by Google designed to combat duplicate content. Last week Google was hot on its own heels with another algo change that hits purveyors of "low quality content". From Google's official announcement:
"Many of the changes we make are so subtle that very few people notice them. But in the last day or so we launched a pretty big algorithmic improvement to our ranking - a change that noticeably impacts 11.8% of our queries - and we wanted to let people know what's going on. This update is designed to reduce rankings for low-quality sites - sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites - sites with original content and information such as research, in-depth reports, thoughtful analysis and so on."
The algorithm update (which, in a fine SEO tradition of being branded with an easily remembered name, is being called the "Farmer" update) seems to be doing exactly what it was designed for: destroying the rankings of sites that many Google users are sick and tired of seeing in the SERPs.
Note, at this time no information on the exact nature of the Farmer update has been given by Google beyond the above, so this blog post should be considered informed (and meticulously researched) speculation rather than fact.
Also, it's important to note that the update has only been rolled out in the US at present. On previous form, UK sites should expect to wait at least a couple of months before having to deal with the update (though, of course, they can and should take steps now to ensure they aren't affected when it does happen).
Google algorithm updates have historically been named by the Webmaster World community. This time round, Danny Sullivan of searchengineland.com is responsible for the name, christening the update "Farmer" in his initial report last Thursday because of the widely held perception that the update is designed to tackle content farms.
What's a Content Farm?
There's no industry standard definition for a content farm, but I'd like to propose this one:
A website whose business model relies on the production and publication of content on a massive scale. A trickle of ad revenue from each page adds up to a torrent for the site as a whole, but the low revenue generated by each page necessitates the production of tens or hundreds of thousands of pages of content at the lowest possible price. This often leads to poorly researched, poorly written content produced by low-paid writers or, in the worst instances, machine-generated content cranked out by software that cobbles together pages of text automatically based on lists of frequently searched-for keywords.
Most of us who use search engines on a regular basis will know the type of site in question, and while I'm not going to call any site out in particular, a small amount of digging into the Farmer update will reveal multiple examples. Google doesn't single out content farms by name, but the intention to target them is implicit in the update.
In any case, Google is now judging content quality in a way that it wasn't doing before. What has changed?
I think the most likely explanation of Farmer is a combination of more emphasis on user click data and a revised document level classifier.
User Click Data
User click data concerns the behaviour of real users, during and immediately after their engagement with the SERPs (search engine results pages). Google can track click through rates on natural search results easily. It can also track the length of time a user spends on a site, either by picking up users who immediately hit the back button and go back to the SERPs, or by collating data from the Google Toolbar (or any third party toolbar that contains a PageRank meter) and potentially from Google Chrome users and Google Analytics.
Google does claim not to use these last two sources, something I'm inclined to believe considering the legal and trust implications for Google were it ever found to be doing so. In any case, the prevalence of the Google Toolbar and the various PR meters probably provides enough data to draw conclusions about user behaviour on its own (stepping outside of Google-only land for two seconds, Bing also has its own toolbars that allow it to collect similar data).
Using this data, Google might conclude that pages are more likely to contain low value content if a significant proportion of users display any of the following behaviours:
- Rarely clicking on the suspect page, despite the page ranking in a position that would ordinarily generate a significant number of clicks.
- Clicking on the suspect page, then returning to the SERPs and clicking a different result instead.
- Clicking on the suspect page, then returning to the SERPs and revising their query (using a similar but different search term).
- Clicking on the suspect page, then immediately or quickly leaving the site entirely.
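To make the idea concrete, the behaviours above could be combined into a single score along these lines. This is a purely hypothetical sketch of my own: every field name, weight and threshold here is invented for illustration, and Google has published nothing about how (or even whether) it weighs these signals.

```python
# Hypothetical sketch: scoring aggregate SERP click behaviour for one result.
# All signals, weights and thresholds are invented for illustration --
# none of this reflects Google's actual (undisclosed) implementation.

def click_quality_signals(impressions, clicks, sessions):
    """Return a 0..1 'low value content likelihood' from click data.

    impressions: times the result was shown at a click-worthy position
    clicks: times it was actually clicked
    sessions: one dict per click, e.g.
        {"returned_to_serps": True, "clicked_other_result": True,
         "revised_query": False, "dwell_seconds": 4}
    """
    score = 0.0
    # 1. Rarely clicked despite ranking at a position that should earn clicks.
    ctr = clicks / impressions if impressions else 0.0
    if ctr < 0.05:
        score += 0.25
    if not sessions:
        return score
    # 2. "Pogo-sticking": back to the SERPs, then a different result.
    pogo = sum(s["returned_to_serps"] and s["clicked_other_result"]
               for s in sessions) / len(sessions)
    # 3. Query revision after visiting the page.
    revised = sum(s["revised_query"] for s in sessions) / len(sessions)
    # 4. Very short dwell time (in reality the cutoff would have to vary
    #    by content type, as discussed below).
    bounced = sum(s["dwell_seconds"] < 10 for s in sessions) / len(sessions)
    score += 0.25 * pogo + 0.2 * revised + 0.3 * bounced
    return min(score, 1.0)
```

The point of the sketch is simply that each behaviour is weak evidence on its own, but several of them occurring together across many users starts to look like a pattern.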
What might constitute "quickly" in this context? Google probably compares the engagement time against other pages of similar type, length and topic. It is important to avoid judging all content through the same lens: for example, you'd expect the average length of time a user spends looking at a piece of news to be significantly less than the average time spent looking at a recipe of the same word length, so it makes no sense to compare those two types of content.
We know that Google has strongly considered using user click data in this way, because it filed (and was granted) a patent called "Method and Apparatus for Classifying Documents Based on User Inputs" that describes just this. However, user click data as a quality signal is highly susceptible to manipulation, which is why it has historically been such a minor part of search engine algorithms. Because of this it's likely that Google only uses this data heavily in combination with other signals.
For example, Google could give a percentage likelihood of a page containing low value content, and then any page that exceeds a certain threshold might be analysed in terms of its user click data. This keeps such data as confirmation of low quality only, rather than as a signal of quality (high or low) in its own right, so it can't be abused by webmasters eager to unleash smart automatic link-clicking bots on the Google SERPs.
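That two-stage arrangement could be sketched as follows. Again, this is my speculation rather than Google's published design, and the threshold values are invented:

```python
# Hypothetical two-stage check: click data only *confirms* a suspicion
# already raised by on-page analysis, so manipulating clicks alone can't
# push a page into (or out of) the filter. Thresholds are invented.

LOW_VALUE_THRESHOLD = 0.6   # illustrative value only

def should_demote(content_score, click_score):
    """content_score: low-value likelihood from on-page analysis (0..1)
    click_score: low-value likelihood from user click data (0..1)"""
    if content_score < LOW_VALUE_THRESHOLD:
        return False           # click data is never consulted on its own
    return click_score > 0.5   # suspicion confirmed by real user behaviour
```

Under this arrangement a page with good on-page content is never demoted no matter how its click data is gamed, which is exactly the abuse-resistance property described above.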
How might Google arrive at this "low value content" score in the first place? Enter the Document Level Classifier...
Document Level Classifier
A "document level classifier" is the part of the search engine that decides such things as what language a document is written in and what type of document it is (be that blog post, news, research paper, patent, recipe or whatever). This decision is made early on in the indexing process, enabling the search engine to handle the document as efficiently as possible. The document level classifier could also be used to determine whether a document is spam, or contains low value content.
For example, the document level classifier might...
- Look for content containing excessive repetition of a particular keyword or phrase that lacks the sort of semantic variation and sentence structure you'd expect to see in a naturally written document.
- Look for content that contains lots of keywords but few proper sentences (indicating that it could be machine generated).
- Look for content that lacks relevant supporting video and/or images.
- Look for newly created content that is too closely aligned with keywords that are regularly searched for (a hallmark of content farms is the creation of content optimised specifically and heavily for keywords suggested by the search engines' own keyword suggestion tools).
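The first of those heuristics is easy to illustrate. Here is a deliberately toy version of a keyword-repetition check; a real classifier would use far richer statistical and machine-learning features, and the 20% threshold is an invention of mine, not a known Google value:

```python
# Illustrative only: a toy version of one heuristic a document-level
# classifier might apply -- flagging excessive repetition of a single term.
import re
from collections import Counter

def keyword_stuffing_score(text):
    """Fraction of meaningful words accounted for by the single most
    common word (very short stopword-like tokens are ignored)."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3]
    if not words:
        return 0.0
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words)

def looks_stuffed(text, threshold=0.2):
    # A naturally written page rarely devotes 20%+ of its meaningful
    # words to one term; the threshold here is purely illustrative.
    return keyword_stuffing_score(text) > threshold
```

A page stuffed with "cheap widgets" trips this check immediately, while naturally written prose, with its semantic variation, scores low.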
Google announced a redesigned document level classifier in a blog post in late January (emphasis mine):
"As we've increased both our size and freshness in recent months, we've naturally indexed a lot of good content and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages..."
It's possible that the first algorithm update of the year (the one in January) was the roll out of the document level classifier, and Farmer added the additional layer of user click data. Or, the new classifier may only have been "soft launched" on a few data centres or for internal testing, before being rolled out last week alongside the user click data component.
Google's "Personal Blocklist" Chrome Extension
Some people in the industry are nervous of Google making qualitative judgements about content quality. I'd question the logic of that concern, given that Google has been making those judgements in various ways (hint hint, links) for over a decade. Also, looking at the sites affected by Farmer, I'm inclined to believe Google has got it right this time. But regardless of my personal opinion there is a way for Google to validate what its algorithm believes are low quality content sites against real user feedback - the Personal Blocklist extension for its browser, Google Chrome.
This extension was launched in mid February. It lets Chrome users block specific sites from appearing in their search results on Google, and passes information about which sites are being blocked back to Google.
Google claims that Personal Blocklist has no algorithmic impact on rankings (yet), and you should believe them, as less than two weeks is certainly not enough time to properly analyse the data and build it into the algorithm. That said, Google has compared the sites affected by Farmer to the sites people are blocking with Personal Blocklist:
"It's worth noting that this update does not rely on the feedback we've received from the Personal Blocklist Chrome extension, which we launched last week. However, we did compare the Blocklist data we gathered with the sites identified by our algorithm, and we were very pleased that the preferences our users expressed by using the extension are well represented."
I also wouldn't rule out the use of this data in the future in a similar capacity to click data - a second or third line validation of assumptions Google has already made about quality in other ways.
What Does This Mean For You?
To wrap this up I want to take a look at the implications for companies in the UK (if you are in the States, you should already know whether or not you've been affected by Farmer, and are probably already working on a plan of action if you were). It's quite clear that to avoid any negative impacts the content on your site should be well written (which means not harping on one keyword constantly, using semantic variation and natural sentence structure that comes from writing dead proper) and engaging. You should aim to attract as many clicks as possible when you are ranking in Google, by optimising the message you are putting across to users with your page title, meta description and URL. And once users land on your site you should keep them happy by providing a rich experience, with as much supporting multimedia as you can, and clear options for where to go elsewhere on your site if the first landing page doesn't "do it" for the user.
But wait; surely those are basic requirements for almost any online business, regardless of what Google is doing or what I might say?
Well, yes. And that really gets at the heart of what most Google algorithm updates, and indeed SEO, are all about. Should you start panicking about what will happen when Farmer, or any other update, comes to the UK, and what you can do to avoid getting slammed by this particular update? No. Should you be spending hours trying to reverse engineer the exact state of Google's algorithm ("chasing the algorithm")? No.
Should you (always) be aiming to provide well written, high quality content that engages users? Yes.