2003 October 14

An open letter to Google concerning blog comment spamming

I just sent the following letter to suggestions@google.com. If anybody knows anybody who works there, please feel free to forward it along and/or let me know, so I can.

Over the past weekend there has been an outbreak of blog comment spamming demonstrating that someone has taken the tactic to a new level of automation. Comment spamming is an attempt to game Google's PageRank algorithm by placing links to the spammer's site across many pages on many independent websites, thus raising Google's estimation of the importance of the target page. We believe Google and weblog owners share a common interest in neutralizing this tactic before it takes hold.
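To make the mechanism concrete, here is a toy version of the PageRank power iteration in its textbook form: a damping factor of 0.85, with rank from pages that have no outlinks spread evenly over all pages. It is only a sketch. Google's production ranking is certainly more elaborate, and the page names are hypothetical. The example compares a spam target's score before and after each of three small blogs gains one comment linking to it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by power iteration over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Three hypothetical blogs that link to one another in a ring.
clean_web = {
    "blog-a": ["blog-b"],
    "blog-b": ["blog-c"],
    "blog-c": ["blog-a"],
    "spam-site": [],
}

# The same blogs after each one receives a spam comment linking to spam-site.
spammed_web = {
    page: (targets + ["spam-site"] if page.startswith("blog") else targets)
    for page, targets in clean_web.items()
}

clean = pagerank(clean_web)
spammed = pagerank(spammed_web)
print(f"spam-site before: {clean['spam-site']:.3f}")
print(f"spam-site after:  {spammed['spam-site']:.3f}")
```

In this tiny graph the target's score rises from under 0.05 to over 0.38, even though nothing about the target page itself changed. Only the number of pages linking to it did.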

My own quite obscure blog received identical comments autoposted to twenty different articles, each containing the innocuous text "Hello Folks,nice site youre running!", signed with the name "Preteen" and the URL of a porn site. The effect was to create twenty links to this porn site scattered through my pages, each with the link text "Preteen".

I was far from the only one affected: many popular sites were hit, including Making Light, Talk Left, Crooked Timber, Samizdata, Winds of Change, and Electrolite. The effect on Google PageRanks is multiplied by the number of Movable Type blogs the spammer is able to find.

As large-scale as this was, it was clearly still a probing attack. And yet, to see how bad the problem already is, look here: as of this writing, this Movable Type entry archive page has no fewer than twenty-three spam comments attached.

Many people are busy at work on defence strategies, including Jay Allen (MT-Blacklist), Yoz Grahame, and Shelley Powers. But it is also clear from the e-mail spam wars that builders of anti-spam walls and locks are at a real disadvantage against the attackers.

This is where Google and other search engines come in. The motivation for comment spammers is to try to game Google's rankings. Weblog owners and Google share a desire to defeat this -- take away the spammers' motivation, and we won't have to fight a war for control of our comment sections.

Our proposal is this: it would be trivial for us blog maintainers to mark our comment sections explicitly, for example with HTML comments: <!-- SearchEngine: Begin Anonymous Comment --> . . . <!-- SearchEngine: End Anonymous Comment -->, or in any other fashion convenient for your parsing engine. These markers would delimit untrustworthy material over which the page author has had no editorial control; Google could then unambiguously assign it no weight in PageRank.
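A minimal sketch of how an indexer could honour the proposed markers, written in Python purely for illustration: it deletes everything between each Begin/End pair before collecting links, so a link posted inside a marked comment never reaches the ranking step. The marker strings are the ones suggested above; the function names and sample URLs are hypothetical, and nothing here describes how Google's engine actually parses pages.

```python
import re
from html.parser import HTMLParser

# Marker strings from the proposal; any agreed-upon syntax would work the same way.
BEGIN = "<!-- SearchEngine: Begin Anonymous Comment -->"
END = "<!-- SearchEngine: End Anonymous Comment -->"

# Non-greedy match across newlines, so each Begin pairs with the nearest End.
_UNTRUSTED = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def strip_anonymous_comments(html: str) -> str:
    """Hypothetical helper: remove every marked region, markers included."""
    return _UNTRUSTED.sub("", html)


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def trusted_links(html: str) -> list:
    """Hypothetical helper: links that would still count after marked regions are dropped."""
    parser = _LinkCollector()
    parser.feed(strip_anonymous_comments(html))
    parser.close()
    return parser.links


page = (
    '<p>My article cites <a href="http://example.org/source">a source</a>.</p>\n'
    + BEGIN
    + '\n<p>Hello Folks,nice site youre running! '
    + '<a href="http://spam.example/">Visitor</a></p>\n'
    + END
    + "\n"
)

print(trusted_links(page))  # ['http://example.org/source']
```

Two design choices are worth noting. An unterminated Begin marker is left alone rather than hiding the rest of the page. The scheme also assumes the blog software escapes or removes any marker text a commenter submits; otherwise a spammer could post an End marker to close the region early and place a link after it.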

We present this proposal in the belief that Google shares our interest in stopping comment spamming. Obviously, none of us understand the internals of how your engine works, and perhaps HTML comments are not the best way to label content for you. We simply hope this proposal will start a discussion about the best approach to the problem -- we are volunteering to do our part in whatever the right solution turns out to be.

Sincerely,
Colin Roald
James D. Macdonald

Update (10/20): Just received the following response from Google:

Comments

It is very difficult to send just a "thank you" to Google. There is so much other things to contend with, but that is what I want to do. Google is the only place I can find the real news. Thank you!

Posted by: Jean Parker on January 12, 2004 07:35 PM

I think some of your include files are messed up... Maybe you are just doing some work on the site.

Posted by: Vertex on January 23, 2004 02:10 PM

Humm after I submitted my comment it is working fine... Odd.

Posted by: Vertex on January 23, 2004 02:13 PM

TrackBack Links
Today I am pleased (proud, relieved, thankful, etc) to announce the release of MT-Blacklist v1.5. There are some major changes...
Linked from JayAllen - The Daily Journey on October 28, 2003 07:45 PM
An idea of how to decrease the benefits of comment spam by introducing a noindex comment tag markup to HTML.
Linked from Richy's Random Ramblings on January 4, 2004 05:33 PM
Richy has started an interesting discussion about marking out blocks of text so that robots do not index them, say comments sections on weblogs, to reduce the effects of comment spam.
Linked from Neil's World on January 5, 2004 04:24 AM
We've been hit by the Lolita comments spam, and there's also a Swedish neo-nazi spammer at work. We've got a full rundown on the problem, a report detailing how dealing with it... and some suggestions on how we can spam-proof all of our blogs.
Linked from Winds of Change.NET on January 5, 2005 03:02 PM