Thread: Craigslist Delists Millions of Pages from Search Engine Indexes

  1. #1

    Craigslist Delists Millions of Pages from Search Engine Indexes

    My first Google query of 2006 yielded an interesting result.

    For the first time ever, I was unable to retrieve an archived listing from Craigslist's massive database.

    What had happened? Why could I not retrieve the real estate listing I saw a few days ago in 2005 in Google's index?

    Puzzled at first, I instinctively went to craigslist.org/robots.txt and noticed this:

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /cgi-secure
    Disallow: /forums
    Disallow: /search
    Disallow: /res/
    Disallow: /post
    Disallow: /email.friend
    Disallow: /?flagCode
    Disallow: /ccc
    Disallow: /hhh
    Disallow: /sss
    Disallow: /bbb
    Disallow: /ggg
    Disallow: /jjj

    ccc = community
    hhh = housing
    sss = for sale
    bbb = services
    ggg = gigs
    jjj = jobs


    So the answer is clear. Craig blocked the bulk of his content from being crawled. A query in Google or Yahoo for an item in Craig's "jobs" or "for sale" section will confirm that his content has been removed entirely.
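
    For anyone who wants to check a given path against those rules, here is a minimal sketch using Python's standard `urllib.robotparser`. It works on the rules as quoted above rather than fetching the live file. The base URL, the `Googlebot` user-agent string, the helper names, and the example paths are my own assumptions for illustration, not anything taken from Craigslist.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in this post, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /cgi-secure
Disallow: /forums
Disallow: /search
Disallow: /res/
Disallow: /post
Disallow: /email.friend
Disallow: /?flagCode
Disallow: /ccc
Disallow: /hhh
Disallow: /sss
Disallow: /bbb
Disallow: /ggg
Disallow: /jjj
"""

# Assumed base URL, used only to build full URLs for the check.
SITE = "http://craigslist.org"


def build_parser(robots_txt: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text locally, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the quoted rules disallow `path` for `user_agent`."""
    return not build_parser().can_fetch(user_agent, SITE + path)


if __name__ == "__main__":
    # Hypothetical paths: two inside blocked sections, one outside them.
    for path in ("/sss/example.html", "/jjj/example.html", "/about/example.html"):
        print(path, "blocked" if is_blocked(path) else "crawlable")
```

    Since the rules sit under `User-agent: *`, they apply to any crawler that honors robots.txt, so the user-agent string passed in should not change the result here.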

    To my knowledge, this is the largest deindexing ever. Tens of millions of pages vanished.

  2. #2

    I wonder if anyone knows when Craigslist did this?

    Also, how many pages did they have indexed prior to doing this?
    Barry Schwartz, CEO of RustyBrick, Inc. & Editor of the Search Engine Roundtable.

  3. #3

    A search in the marketleap.com saturation report for craigslist.org:
    - Google
    - Yahoo
    - MSN

    shows that the most pages ever indexed was 3.6 million, and that was by Google. Some datacenters are showing a drop of about 1 million indexed pages, but not tens of millions.

  4. #4

    I'm wondering what the reason is.

    I can guess: people started using Craigslist like PRWeb, getting free links from an authority site. This won't stop the real customers, but it will surely scare away link hunters. That's my guess.

  5. #5

    That is an interesting application by Craigslist.

    What a fascinating website. It works dramatically well. It stands on its own. It doesn't need to be in the search engines. Clearly they understand the impact of doing this.

    It will be interesting to hear further news from Craigslist as to the reasoning behind this move.

    Dave

  6. #6
    Join Date
    Dec 2005
    Location
    Huntersville, North Carolina
    Posts
    40

    Craig the last bastion of

    ...... anything?

    I admire Craig for not selling out to the big guys.

    OK, he made a boatload out of eBay and is still very profitable, his way.

    I've used CL for years and it has been very kind to me.

    David
    David

    iDo SEO
    www.idoseo.com
    Charlotte NC via SF, CA and England

  7. #7

    Craig also doesn't want ANYONE scraping his results without authorization. I think that's a big reason for this.
    Rand Fishkin - CEO & Founder of SEOmoz, a community resource dedicated to providing news, information, tips, tools and more for those in the SEO/M industry.

  8. #8

    I have a feeling we will be getting more information on this in the upcoming days...
    Barry Schwartz, CEO of RustyBrick, Inc. & Editor of the Search Engine Roundtable.

  9. #9

    Barry Schwartz, CEO of RustyBrick, Inc. & Editor of the Search Engine Roundtable.

  10. #10

    Scraped results are the least of Craigslist's problems right now, especially since they are getting sued by eBay, and 2 years ago they were sued for having discriminatory classified ads.
