Craigslist Delists Millions of Pages from Search Engine Indexes [Archive] - Search Engine Roundtable Forums



jewboy
01-02-2006, 10:52 AM
My first Google query of 2006 yielded an interesting result.

For the first time ever, I was unable to retrieve an archived listing from Craigslist's massive database.

What had happened? Why could I not retrieve the real estate listing I saw a few days ago in 2005 in Google's index?

At first puzzled, I instinctively went to craigslist.org/robots.txt (http://craigslist.org/robots.txt)

Noticed this:
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj

ccc = community
hhh = housing
sss = for sale
bbb = services
ggg = gigs
jjj = jobs


So the answer is clear. Craig blocked the bulk of his content from being crawled. A query in Google or Yahoo for an item in Craig's "jobs" or "for sale" section will confirm that his content has been removed entirely.

To my knowledge, this is the largest deindexing ever. Tens of millions of pages vanished.
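
For anyone who wants to check what those rules mean for a given URL, here is a minimal sketch using Python's standard `urllib.robotparser`. It only evaluates the wildcard section exactly as quoted above, so it assumes that excerpt is the whole file; the sample paths and the `blocked_paths` helper are made up for illustration and are not real Craigslist URLs.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the post above (wildcard section only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
"""


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Map each path to True if the rules disallow it for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: not parser.can_fetch(user_agent, path) for path in paths}


if __name__ == "__main__":
    # Hypothetical paths, chosen only to illustrate prefix matching.
    samples = ["/sss/example.html", "/jjj/example.html", "/about/example.html"]
    for path, blocked in blocked_paths(ROBOTS_TXT, samples).items():
        print(path, "blocked" if blocked else "allowed")
```

Disallow rules are prefix matches, so anything under /sss or /jjj counts as blocked for a crawler that honors the wildcard section, while paths outside the listed prefixes stay crawlable. A crawler with its own User-agent section in the real file would follow that section instead, which this sketch does not model.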

rustybrick
01-02-2006, 11:00 AM
I wonder if anyone knows when Craigslist did this?

Also, how many pages did they have indexed prior to doing this?

jadibones
01-02-2006, 01:14 PM
A search in the marketleap.com saturation report for craigslist.org: - Google (http://tools.marketleap.com/mlchart/chart.dll/requestchart?ctrlParamFor=SiteIndex&ctrlSPID=4&ctrlParams=www.craigslist.org:*:6&ctrlVAxisScaleMin=71280&ctrlVAxisScaleInc=-1&ctrlVAxisScaleMax=3982000&ctrlTitle=Saturation:%20Google/%20AOL%0Dwww.craigslist.org&ctrlSeriesSolidColor=27&ctrlLefWallColor=28&ctrlHeight=300&ctrlWidth=450&ctrlQuality=100&ctrlLabelsAngle=270&ctrlFontSize=10&ctrlShowMarks=no&ctrlHorizAxis=1&ctrlVertAxis=0&ctrlBarOutline=no&ctrlVAxisScaleAuto=no&ctrlType=line&ctrllinepenvisible=no&ctrlAreaLinesPenVisible=f&ctrlShowBorder=t&ctrlXLabelsFontSize=6&ctrlXLabelsFontname=verdana&ctrlTitleFontName=verdana&ctrlYLabelsFontSize=8&ctrlYLabelsFontname=ariel&ctrlLeftAxisGridCentered=t&ctrlBottomAxisGridCentered=t)
- Yahoo (http://tools.marketleap.com/mlchart/chart.dll/requestchart?ctrlParamFor=SiteIndex&ctrlSPID=4&ctrlParams=www.craigslist.org:*:28&ctrlVAxisScaleMin=15930&ctrlVAxisScaleInc=-1&ctrlVAxisScaleMax=2860000&ctrlTitle=Saturation:%20Yahoo!/%20FAST/%20AltaVista%0Dwww.craigslist.org&ctrlSeriesSolidColor=27&ctrlLefWallColor=28&ctrlHeight=300&ctrlWidth=450&ctrlQuality=100&ctrlLabelsAngle=270&ctrlFontSize=10&ctrlShowMarks=no&ctrlHorizAxis=1&ctrlVertAxis=0&ctrlBarOutline=no&ctrlVAxisScaleAuto=no&ctrlType=line&ctrllinepenvisible=no&ctrlAreaLinesPenVisible=f&ctrlShowBorder=t&ctrlXLabelsFontSize=6&ctrlXLabelsFontname=verdana&ctrlTitleFontName=verdana&ctrlYLabelsFontSize=8&ctrlYLabelsFontname=ariel&ctrlLeftAxisGridCentered=t&ctrlBottomAxisGridCentered=t)
- MSN (http://tools.marketleap.com/mlchart/chart.dll/requestchart?ctrlParamFor=SiteIndex&ctrlSPID=4&ctrlParams=www.craigslist.org:*:14&ctrlVAxisScaleMin=1640&ctrlVAxisScaleInc=-1&ctrlVAxisScaleMax=11280&ctrlTitle=Saturation:%20MSN%0Dwww.craigslist.org&ctrlSeriesSolidColor=27&ctrlLefWallColor=28&ctrlHeight=300&ctrlWidth=450&ctrlQuality=100&ctrlLabelsAngle=270&ctrlFontSize=10&ctrlShowMarks=no&ctrlHorizAxis=1&ctrlVertAxis=0&ctrlBarOutline=no&ctrlVAxisScaleAuto=no&ctrlType=line&ctrllinepenvisible=no&ctrlAreaLinesPenVisible=f&ctrlShowBorder=t&ctrlXLabelsFontSize=6&ctrlXLabelsFontname=verdana&ctrlTitleFontName=verdana&ctrlYLabelsFontSize=8&ctrlYLabelsFontname=ariel&ctrlLeftAxisGridCentered=t&ctrlBottomAxisGridCentered=t)

The report shows that the most pages ever indexed was 3.6 million, and that was by Google. Some datacenters are showing a drop of about 1 million indexed pages, but not tens of millions.

gemini
01-02-2006, 05:43 PM
I'm wondering what the reason is.

My guess: people started using Craigslist like PRWeb, getting free links from an authority site. This won't stop the real customers, but it will surely scare away link hunters.

earlpearl
01-02-2006, 07:01 PM
That is an interesting application by Craigslist.

What a fascinating website. It works dramatically well. It stands on its own. It doesn't need to be in the search engines. Clearly they understand the impact of doing this.

It will be interesting to hear further news from Craigslist as to the reasoning behind this move.

Dave

David
01-03-2006, 12:25 PM
...... anything?

I admire Craig for not selling out to the big guys.

OK, he made a boatload out of eBay and is still very profitable, his way.

I've used CL for years and it has been very kind to me.

David

randfish
01-03-2006, 03:23 PM
Craig also doesn't want ANYONE scraping his results without authorization. I think that's a big reason for this.

rustybrick
01-03-2006, 03:56 PM
I have a feeling we will be getting more information on this in the upcoming days...

rustybrick
01-04-2006, 09:35 AM
Danny says Craigslist Not Blocking Major Crawlers (http://blog.searchenginewatch.com/blog/060104-085501). :)

Jichino
05-29-2008, 12:23 PM
Scraped results are the least of Craigslist's problems right now, especially since they are getting sued by eBay, and 2 years ago they were sued for having discriminatory classified ads.

tom123m123
10-23-2009, 10:56 AM
Craigslist delists tons of pages because of spam. Actually, there are now many restrictions on posting. I got tired of getting banned from Craigslist, so now I use a service called dailycraigslist.com; they post my ads for me, and all of my posts are listed on Google.