Black SEO: Referral Spam

by Dmitry Kirsanov 27. February 2012 11:19

When writing articles about SEO, it’s easy to fall into one of two categories: either you write something that everybody knows, or something that perhaps shouldn’t be revealed, as it will lose its value very quickly because of misuse.

So you don’t see many articles about Search Engine Optimization here, mainly because I am trying to be original. However, I can’t count on everyone to study the subject and avoid the traps, so some of my articles are like an “Achtung, Minen!” sign for those who focus on areas of life other than Information Technology.

So this article is about a Black SEO discipline called “Referral Spam”: what it is, how it works and why you should avoid it.

What is Black SEO

SEO, as you know, stands for Search Engine Optimization. It’s a mixture of science and art that makes your web pages speak two languages: the language of your visitors and the language of search engines. When Google or Bing visits your page, it tries to extract the necessary information from it and understand what it’s about. Your visitors have the same goal: they want to understand what your article is about.

To extract the essence of your article and index it in its database, a search engine employs very sophisticated algorithms. I’m not going to talk about these algorithms here, but I have to highlight one aspect they all share: a variation of the PageRank algorithm. More on it later.

So SEO started as a collection of techniques aimed at increasing the amount of relevant information that a search engine could extract from your web page. Its goal is that simple: let Google understand what your page is about and, if your page contains the answer, let the search engine give the link to your page when someone asks the question.

Black SEO, on the other hand, is a collection of techniques aimed at tricking the search engine into giving the link to your page for every question, whether it is relevant or not. So Black SEO is not really optimization but rather exploitation of search engine algorithms, and calling it SEO is technically incorrect. Another difference is that while SEO is about search engines, Black SEO is not only about them: it covers anything that could trick users into visiting your web page, even if the visit won’t solve the visitor’s problem or answer the question asked. In that sense, Black SEO is pretty much like spam.

PageRank was developed by Larry Page (one of the Google founders, hence the name). Basically, it calculates the authority of your page from the links on the Internet that point to it, where each link counts for more when the linking page has a high rank of its own and fewer outgoing links.

According to the original PageRank definition, links pointing to your page increase its importance (did you notice I always say “page”, not “web site”?), while every link pointing outside receives only a share of your page’s rank, so the more outbound links a page has, the less each of them passes on. A high PageRank greatly improved the chances of your page taking a better position on the SERP (Search Engine Results Page).
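For reference, the commonly used normalized form of the original PageRank formula is PR(p) = (1 - d)/N + d · Σ PR(q)/L(q), summed over every page q that links to p, where N is the number of pages, L(q) is the number of outgoing links on q, and d is a damping factor (0.85 is the value usually quoted). Below is a minimal Python sketch of that formula computed by power iteration. The four-page graph and the function name are made up, the sketch assumes every page has at least one outgoing link, and it illustrates only the classic definition, not how Google ranks pages today.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page, targets in links.items():
            share = rank[page] / len(targets)  # a page's rank is split among its outgoing links
            for target in targets:
                new_rank[target] += d * share
        rank = new_rank
    return rank


# Hypothetical web: "c" collects links from three pages, nobody links to "d".
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page c, which collects links from three pages, ends up with the highest rank, while d, which nobody links to, keeps only the baseline (1 - d)/N.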

So the very first method of tricking search engines into ranking a website higher on results pages was to increase its PageRank. To do that, spammers had to put links to the target page on lots of other pages. That’s how we got terms such as “link farms”: tens of thousands(!) of dummy websites containing garbage text with links.

The importance of PageRank is very low now, thanks to Black SEO. Even with a high PageRank, your page is not guaranteed a place on the first page of the SERP. But it is still a legendary term, so most users have no idea that PageRank has become just one of hundreds(!) of parameters that decide the position of your web page on the SERP.

And this means that those who still use the old scam techniques to increase PageRank are idiots.

What is Referral Spam

Referral spam is a set of methods for pretending that page A points to page B when it does not. The assumption is that the owner of page B lists the resources that link to it, so as a result of the referral spam, a link to page A will appear on page B.

One of the methods of referral spam is called referrer spam; more on it later.

Overall, all referral spam methods have one feature in common: they don’t work as intended, and in most cases they work against you. But let’s dissect some of these methods to see exactly why you shouldn’t use them… And wait a second, did you just think you are not using referral spam? Read below and think twice, as some spammers are really innocent victims trying to do better.

How Referrer Spam works

I’ll try to put it in terms as non-technical as I can.

The World Wide Web is a game of request and response. You send a request to a website and get a response from it. The response usually contains the contents of the web page, but the request contains much more than just the name of the page.

The HTTP request you send to a website contains things like the version of your browser, often including the names and versions of installed components (.NET, Java, Adobe Flash, Windows service pack number…), and, among other things, the “Referer” field (yes, spelled like that, with a single “r” in the middle), which contains the address of the page that led to this resource.

For example, if a page contains images, each of them is retrieved by a separate request, and each of these requests carries the full address of that page as the referrer. That way we can, for example, detect whether someone is linking to your images from their pages instead of re-hosting the image files: the requests for those images will have the address of the foreign page as the referrer.
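To see the Referer field in practice, here is a minimal Python sketch that reads an IIS log in the W3C extended format and lists image requests whose referrer belongs to another site, i.e. hotlinks. It assumes the log’s #Fields directive includes cs-uri-stem and cs(Referer); the sample lines, the host name blog.example and the function names are made up for illustration.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def parse_w3c_log(lines):
    """Turn W3C extended log lines into dicts keyed by the names in #Fields."""
    fields, records = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif not line.startswith("#"):  # skip #Software, #Version, #Date directives
            records.append(dict(zip(fields, line.split())))
    return records


def find_hotlinks(records, own_host):
    """Image requests whose Referer points at a page on another host."""
    hits = []
    for r in records:
        referer = r.get("cs(Referer)", "-")  # IIS writes "-" when the field is empty
        path = r.get("cs-uri-stem", "")
        if referer != "-" and path.lower().endswith(IMAGE_EXTENSIONS):
            if urlparse(referer).hostname != own_host:
                hits.append((path, referer))
    return hits


sample_log = """#Software: Microsoft Internet Information Services 7.5
#Fields: date time c-ip cs-method cs-uri-stem cs(User-Agent) cs(Referer) sc-status
2012-02-27 09:15:02 198.51.100.4 GET /post/example.aspx Mozilla/5.0+(Windows+NT+6.1) http://friend.example/links.html 200
2012-02-27 09:15:03 198.51.100.4 GET /images/banner.png Mozilla/5.0+(Windows+NT+6.1) http://blog.example/post/example.aspx 200
2012-02-27 09:20:41 192.0.2.9 GET /images/banner.png Mozilla/5.0+(Windows+NT+6.1) http://other.example/hotlink.html 200
""".splitlines()

records = parse_w3c_log(sample_log)
print(find_hotlinks(records, "blog.example"))
# [('/images/banner.png', 'http://other.example/hotlink.html')]
```

The second line is an ordinary image request made by your own page, so it is ignored; only the image pulled in by other.example is reported.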

The server has no way to check that value, so a spammer only has to put the right value in the right place and let the server believe it.

So how exactly does referrer spam work?

Notice that now we are talking about referrer spam specifically. As you can guess, the rogue application simply crafts an HTTP request with a bogus Referer field pointing at the promoted page. However, just about every implementation of that principle has two flaws:

  1. They only retrieve the HTML source of the page, meaning there are no requests for cascading style sheets, JavaScript files or linked images. Every time you look at your log file, you see a lonely record suggesting that your resource was requested by following a link on the spammer’s website.
  2. They never run embedded JavaScript, meaning you won’t see these requests in Google Analytics or other JavaScript-based analytics applications.
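Those two flaws leave a footprint you can search for. The following Python sketch (hypothetical function name, made-up sample records shaped like parsed W3C log lines) flags client IP addresses that arrived with an external referrer but never requested a single stylesheet, script or image. Treat the result as a list of candidates to review rather than proof: cached assets, text-only browsers and several users behind one NAT address can all produce false positives or hide a spammer.

```python
from collections import defaultdict
from urllib.parse import urlparse

ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def suspicious_referrers(records, own_host):
    """Map client IP -> external referrers, for clients that never fetched an asset."""
    fetched_asset = defaultdict(bool)
    external_refs = defaultdict(set)
    for r in records:
        ip = r["c-ip"]
        path = r["cs-uri-stem"].lower()
        referer = r.get("cs(Referer)", "-")
        if path.endswith(ASSET_EXTENSIONS):
            fetched_asset[ip] = True  # a real browser usually loads CSS, scripts, images
        elif referer != "-" and urlparse(referer).hostname != own_host:
            external_refs[ip].add(referer)
    return {ip: refs for ip, refs in external_refs.items() if not fetched_asset[ip]}


records = [
    # a bot: one HTML request with a bogus referrer and nothing else
    {"c-ip": "203.0.113.7", "cs-uri-stem": "/post/a.aspx", "cs(Referer)": "http://spam.example/"},
    # a real visitor: the page, then its stylesheet
    {"c-ip": "198.51.100.4", "cs-uri-stem": "/post/a.aspx", "cs(Referer)": "http://friend.example/links.html"},
    {"c-ip": "198.51.100.4", "cs-uri-stem": "/themes/site.css", "cs(Referer)": "http://blog.example/post/a.aspx"},
]
print(suspicious_referrers(records, "blog.example"))
# {'203.0.113.7': {'http://spam.example/'}}
```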

So basically, the program retrieves your web page and discards it. It only hopes that your page contains some sort of backlink tracker that publishes referrers without moderation. Nowadays, when every second citizen of the Internet has their own blog (I’m exaggerating here), you never know, so spammers try their luck.

Since listing referrers is traditionally a blog feature, spammers aim at blogs. But how do you think they find your blog? They google for common phrases found on every blog running a particular engine. For example, try googling for “POWERED BY THE GENESIS FRAMEWORK”. Every blog engine has some unique text in its forms (like “click here to reply”) that is indexed by search engines.

Once your website is indexed by search engines, you get more and more of this unwanted attention.

Why Referrer spam is offensive

  • It eats your bandwidth. Each request you get is real, and the resource consumption may be significant, especially if your blog is hosted on a less powerful machine.
  • The processing power of your server costs you money anyway, even if you don’t pay for the bandwidth.
  • The log files are filled with irrelevant records of spam requests. If you intend to use your log files, you have to weed them out. At the very least it will take your time, and if you don’t know how to do it, money as well.
  • The more posts you have, the more spam requests you get. Overall it works like a DDoS attack, where multiple clients request content from your server.

How to defeat Referrer Spam

This blog is powered by a deeply refactored BlogEngine.NET, a brilliant open source platform by Mads Kristensen and other developers. BlogEngine.NET has its own protection against referrer spam. Right after it receives a new external referrer in an HTTP request, the blog engine retrieves the contents of the referring page and inspects it for a link to your blog. If no such link is found, the referrer is marked as spam. Referrers are not published automatically, so the only one who could click the link is the blog moderator.

This method of immediate link verification is a bit flawed, in my opinion. It wouldn’t be hard to storm the blog with requests full of referrers pointing to big, slow pages, which would consume the blog server’s resources and could be a good basis for a DoS attack. But until that happens, the tactic works fine.

Of course, a more or less sophisticated attacker could craft a temporary page with an actual link to your blog, but chances are it won’t happen. The problem with all tools created for spamming (I am not talking about botnets, just consumer applications) is that they are created by amateurs who are not going to spend extra time on sophisticated features. And such a scenario is definitely sophisticated, as it requires some dynamic server-side programming.
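The check BlogEngine.NET performs can be approximated in a few lines. This is a minimal Python sketch of the idea, not BlogEngine.NET’s actual C# code: `links_back` and `fetch_page` are hypothetical names, and the timeout, the size cap and the http/https-only rule are assumptions meant to blunt the big, slow pages scenario and to stop a forged Referer such as a file: URL from being opened.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lowercases tag names
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(html, page_url, own_host):
    """True if the HTML of page_url contains at least one link to own_host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the referring page
        if urlparse(absolute).hostname == own_host:
            return True
    return False


def fetch_page(url, timeout=5, max_bytes=200_000):
    """Download at most max_bytes of an http(s) page, giving up after timeout seconds."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("refusing to fetch non-HTTP referrer: " + url)
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read(max_bytes).decode(charset, errors="replace")


page = '<p>Nice post: <a href="http://blog.example/post/a.aspx">read it</a></p>'
print(links_back(page, "http://friend.example/links.html", "blog.example"))  # True
```

A verifier would call `links_back(fetch_page(referrer), referrer, "blog.example")`. Note that `www.blog.example` and `blog.example` count as different hosts here, so a real check would probably accept both.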

If you are about to create your own backlink verification mechanism, I would suggest this scheme:

  1. Store links in the database, but do not let them through until they are verified.
  2. An external application, running outside the web server, verifies new links on a schedule.
  3. If a link is verified, it is let through.
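The three steps above could be sketched in Python with SQLite. All names here are hypothetical: `record_referrer` is what the web application would call, `verify_pending` is the job you would run on a schedule (for example from cron or Windows Task Scheduler) outside the web server, and `has_backlink` is whatever link check you plug in.

```python
import sqlite3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS referrers ("
        " url TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending')"  # pending -> approved | spam
    )
    return conn


def record_referrer(conn, url):
    """Step 1: called by the web application; stores the link but does not publish it."""
    conn.execute("INSERT OR IGNORE INTO referrers (url) VALUES (?)", (url,))
    conn.commit()


def verify_pending(conn, has_backlink):
    """Step 2: run by a separate scheduled job, outside the web server."""
    pending = [row[0] for row in conn.execute(
        "SELECT url FROM referrers WHERE status = 'pending'")]
    for url in pending:
        try:
            ok = has_backlink(url)
        except Exception:
            continue  # fetching failed: leave it pending and retry on the next run
        conn.execute("UPDATE referrers SET status = ? WHERE url = ?",
                     ("approved" if ok else "spam", url))
    conn.commit()


def published_referrers(conn):
    """Step 3: only verified links are ever shown on the blog."""
    return [row[0] for row in conn.execute(
        "SELECT url FROM referrers WHERE status = 'approved' ORDER BY url")]


# Example run: the web app records two referrers, the scheduled job checks them.
conn = open_db()
record_referrer(conn, "http://friend.example/links.html")
record_referrer(conn, "http://spam.example/promo")
verify_pending(conn, lambda url: url.startswith("http://friend.example/"))  # stand-in checker
print(published_referrers(conn))  # ['http://friend.example/links.html']
```

Because the web server only ever inserts a row, a flood of forged referrers costs it one cheap database write each; the slow part runs elsewhere, at its own pace.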

Comment Spam

Another well-known type of spam is placing links in comments. Placing the link itself is not a problem: if someone leaves an insightful comment (which benefits your website as additional free, high-quality content), it’s only fair to let them provide the URL of their own web resource, provided that the link is accompanied by the “rel=nofollow” attribute, which instructs search engines to disregard it.
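Adding the attribute is a one-liner, but it is easy to forget escaping. Here is a minimal Python sketch (the function name is hypothetical) that renders a commenter’s name as a nofollow link and refuses anything other than http and https URLs, so a `javascript:` link cannot sneak in.

```python
from html import escape
from urllib.parse import urlparse


def commenter_link(name, url):
    """Render a commenter's name as a link search engines are asked not to follow.

    Only http/https URLs are linked; anything else falls back to plain, escaped text.
    """
    if urlparse(url).scheme not in ("http", "https"):
        return escape(name)
    return '<a href="{}" rel="nofollow">{}</a>'.format(escape(url, quote=True), escape(name))


print(commenter_link("Ann", "http://ann.example/"))
# <a href="http://ann.example/" rel="nofollow">Ann</a>
print(commenter_link("Bob", "javascript:alert(1)"))
# Bob
```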

It becomes a problem when you get an automated, generic reply suggesting that you “see the alternative point of view on the article subject” by visiting the provided URL. Which, of course, is irrelevant to the original article, unless you are writing about manhood enlargement too.

You may notice that this particular blog uses disqus.com as its comments provider, which amounts to outsourcing the problem to where it can be handled better. Apart from the other advantages of Disqus, spam control is an integrated part of it, and so it solves the problem.

While spamming one or more blogs with links to your website could only get your IP address banned from those specific websites, spamming a system like Disqus may lead to a system-wide ban and to your website’s URL being added to a stop-list, meaning that when others try to share a link to your website, they won’t be allowed to.

Another way to put an end to comment spam is to enforce a single link handler for all links. For example, you may have your own link shortener like byte.lv or trust an external one like goo.gl, as they will take care of the health of the target link.
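The point of a single link handler is that every outbound link passes through one place you control, so a target that turns out to be spam can be disabled everywhere at once. Below is a minimal in-memory Python sketch of the idea. The `/go/` path, the class name and the 8-character SHA-1 prefix used as an ID are all assumptions; a real handler would persist its table, handle ID collisions and answer with an HTTP redirect.

```python
import hashlib
from urllib.parse import urlparse


class LinkHandler:
    """Hypothetical single link handler: every outbound link becomes /go/<id>."""

    def __init__(self, blocked_hosts=()):
        self.targets = {}  # id -> original URL
        self.blocked_hosts = set(blocked_hosts)

    def shorten(self, url):
        """Register url and return the local address to put in the comment instead."""
        link_id = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
        self.targets[link_id] = url
        return "/go/" + link_id

    def resolve(self, link_id):
        """Target to redirect to, or None if the ID is unknown or its host is blocked."""
        url = self.targets.get(link_id)
        if url is None or urlparse(url).hostname in self.blocked_hosts:
            return None
        return url


handler = LinkHandler()
short = handler.shorten("http://spam.example/cheap-pills")
print(short, handler.resolve(short[len("/go/"):]))
handler.blocked_hosts.add("spam.example")  # later the host turns out to be spam
print(handler.resolve(short[len("/go/"):]))  # None: the link is dead everywhere at once
```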

Google Analytics Spam

This is a somewhat more sophisticated type of spam, one that doesn’t involve changing the contents of your website. Instead of playing with the Referer field or posting comments, the spamming program simply runs the Google Analytics script it takes from your page. So in your Google Analytics reports you see that you were visited by someone following a link on, say, a forex-ninjas website.

Using this method is among the most idiotic things a spammer can do. It always raises hype and rage among Google Analytics users, who can’t find the website they see in their GA reports anywhere in their web server logs. Perhaps spammers think that once you start using Google Analytics you stop looking into the web server logs, but that’s wrong. Personally, I closely inspect IIS logs on a daily basis, as they let me see a problem or an opportunity where no other indicators are available. That is how I found this type of spam.

And just like referrer spam, Google Analytics spam records show exactly one page per visit, no matter how many visits there are.
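Both observations (a source that never shows up in the web server logs, and exactly one page per visit) can be combined into a quick cross-check. The Python sketch below assumes you have exported the referring sources from Google Analytics as (source host, visits, pageviews) rows and collected the Referer values from your logs; the row shape and the function name are assumptions, not a Google Analytics API. Review what it flags before filtering: log rotation or referrers stripped by the browser can also make a real source look like a ghost.

```python
from urllib.parse import urlparse


def ghost_referrers(ga_rows, log_referers):
    """Analytics sources never seen in the server logs whose visits are all single-page.

    ga_rows: (source_host, visits, pageviews) tuples exported from a report (assumed shape).
    log_referers: raw Referer values from the web server logs ("-" means empty).
    """
    seen_in_logs = {urlparse(r).hostname for r in log_referers if r != "-"}
    flagged = []
    for source, visits, pageviews in ga_rows:
        single_page = visits > 0 and pageviews == visits
        if source not in seen_in_logs and single_page:
            flagged.append(source)
    return sorted(flagged)


ga_rows = [
    ("forex-ninjas.example", 40, 40),  # never in the logs, one page per visit
    ("friend.example", 12, 30),
    ("news.example", 3, 3),            # single-page visits, but really in the logs
]
log_referers = ["http://friend.example/links.html", "http://news.example/story", "-"]
print(ghost_referrers(ga_rows, log_referers))  # ['forex-ninjas.example']
```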

The problem with this type of spam is bigger for the spammer: Google can go as far as removing the advertised website from its database, making it a black hole of the Internet forever.
Google Analytics users, on the other hand, have to waste their time creating additional filters for suspicious sources of “traffic”.

Now, who’s the spammer?

People refuse to accept that submitting a URL to Google once is the one and only step required to inform all search engines about your website (I am not talking about StumbleUpon and the like). Sooner or later, your website will be crawled by all of them, including the practically useless academic ones.

How many promising applications and online services do you see on the market that promise you great Google results overnight? “Submit your page to hundreds of search engines” or, even worse, “social media marketing”. The latter is usually what hides a blog-spamming engine.

So most of your referral spam visitors are users of such online tools. You will see requests coming from the same IP address but promoting different resources, such as an innocent-looking personal blog. People don’t understand the methods these tools use to promote their resources, and they entrust them with their advertising campaigns, sometimes only to realize that they have to buy another domain name.

Search Engine Optimization is not rocket science, and everything that can be done to befriend Google can be done manually by an average Joe. It is paramount to resist even the slightest temptation to use any sort of referral spam, even if you have the technical or financial means to do so. For your website it is exactly what heavy drugs would be for you: you get high for a few minutes and stay low for the rest of your life.

Automated referral spam is the easiest thing to detect, even without any tools. The evidence is stored forever and can be used to create stop-lists at any time. Even a year or two after a fruitless referral spam campaign, the site could suddenly start losing new visitors because links to it disappear, filtered out.

Do you feel sorry for them now?

Summary

Overall, referral spam is the least effective but the most annoying of all “black hat” techniques.

If you own a blog or message board, don’t ever think that your resource won’t interest spammers. Spammers don’t need a reason. So make sure your shores are fortified before you see the first wave of the spammer fleet.

