My first Google query of 2006 yielded an interesting result.
For the first time ever, I was unable to retrieve an archived listing from Craigslist's massive database.
What had happened? Why could I no longer find, in Google's index, the real estate listing I had seen just a few days earlier in 2005?
Puzzled at first, I instinctively went to craigslist.org/robots.txt
and noticed this:
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
The three-letter paths are Craigslist's section codes:
ccc = community
hhh = housing
sss = for sale
bbb = services
ggg = gigs
jjj = jobs
So the answer is clear: Craig has blocked the bulk of his content from being crawled. A query in Google or Yahoo for an item in Craig's "jobs" or "for sale" section will confirm that his listings are gone from the index.
To my knowledge, this is the largest deindexing ever. Tens of millions of pages vanished.
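For anyone who wants to check this without waiting on a search engine, here is a rough sketch using Python's standard urllib.robotparser. It feeds the rules quoted above into the parser and asks whether a well-behaved crawler would be allowed to fetch a given path. The helper name is_crawlable, the "Googlebot" agent string, and the sample paths (/jjj/index.html, /about/) are my own assumptions for illustration, not real Craigslist URLs. It only shows how a compliant crawler would read the file, not what Google or Yahoo actually do internally.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
"""

# Section codes as listed in the post.
SECTIONS = {
    "ccc": "community",
    "hhh": "housing",
    "sss": "for sale",
    "bbb": "services",
    "ggg": "gigs",
    "jjj": "jobs",
}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, agent="Googlebot"):
    """Return True if a compliant crawler may fetch the path (hypothetical helper)."""
    return parser.can_fetch(agent, "https://craigslist.org" + path)


if __name__ == "__main__":
    for code, name in SECTIONS.items():
        # Sample listing path under each section; the file name is made up.
        path = f"/{code}/index.html"
        status = "allowed" if is_crawlable(path) else "blocked"
        print(f"{name:10s} {path:20s} {status}")
    # A path not covered by any Disallow rule (assumed for contrast).
    print("other      /about/             ",
          "allowed" if is_crawlable("/about/") else "blocked")
```

Since every section code appears as a Disallow prefix, anything under those directories comes back as blocked, while a path outside them is still fair game.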

