23 votes

Topic deleted by author

5 comments

  1. [2]
    first-must-burn
    I guess this doesn't hurt your Google page rank because Google doesn't see any of the bonus content?

    While we're talking Markov chains, here's King James Programming: content generated by a Markov chain trained on the King James Bible and the Structure and Interpretation of Computer Programs.

    2:32 And Joseph said unto his servants, the power of stream processing is that it is good.

    18 votes
    1. TheMediumJon
      I just adore KJP since I stumbled upon it through Unsong.

      3 votes
  2. json
    I wouldn't expect a scraper that ignores robots.txt to provide its bot identity in the request headers. But I guess they do enough to make this plausible.

    14 votes
  3. ogre
    I’ve had an idea like this sitting on my back burner for a while, but I never researched whether anyone else had worked on it. The linkmaze server is more my speed; I’d rather scrapers get fed a heaping mouthful of bullshit than anything even slightly valuable.

    I’d like to see detection more robust than relying on scrapers to identify themselves. You can’t trust AI companies to be honest about their user agent. Does anyone happen to know of more successful efforts to block AI company bots?

    7 votes
  4. first-must-burn
    Another random thought: rolling this into a browser plugin that actively rewrites content as you browse could be a hilarious prank.

    1 vote
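
For anyone curious how the Markov chain text mentioned in the first comment comes about, here is a minimal sketch of a word-level chain trained on two mixed corpora. It is not the King James Programming code; the function names `build_chain` and `generate`, the `order` parameter, and the two tiny corpus strings are all made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Random-walk the chain from a randomly chosen starting state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: no word was ever observed after this state
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Hypothetical placeholder snippets, NOT the real source texts.
corpus_a = "and he said unto them the word of the lord is good and the people heard it"
corpus_b = "the power of stream processing is that it lets us ignore the order of events"

chain = build_chain(corpus_a + " " + corpus_b)
print(generate(chain, length=15, seed=1))
```

Because both texts feed one table, any word pair that occurs in both (or that straddles the join) lets the walk cross from one style into the other, which is presumably where the blended sentences come from. A higher `order` gives more coherent but less surprising output.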