Tildes

Showing only topics with the tag "crawlers.web".

Perplexity AI is using stealth, undeclared crawlers to evade website no-crawl directives
~tech
- internet
Article 1214 words
19 comments

cloudflare.com

August 5

35 votes
Content Independence Day: No AI crawl without compensation!

~tech Article 869 words

4 comments

cloudflare.com

July 2

14 votes
Anubis works

~tech Link

25 comments

xeiaso.net

April 13

35 votes
Please stop externalizing your costs directly into my face

~tech Article 752 words, published Mar 17 2025

56 comments

drewdevault.com

March 20

121 votes
Trapping misbehaving bots in an AI Labyrinth

~tech Article 1142 words, published Mar 19 2025

15 comments

cloudflare.com

March 22

40 votes
Block AI scrapers with Anubis
~comp
- open source
Article 1617 words, published Jan 19 2025
29 comments

xeiaso.net

March 17

27 votes
FOSS infrastructure is under attack by AI companies

~tech Article 1864 words

8 comments

thelibre.news

March 20

39 votes
LLM crawlers continue to DDoS SourceHut

~tech Article 427 words

1 comment

sr.ht

March 17

11 votes
Nepenthes: a tarpit intended to catch AI web crawlers
~tech
- internet
Article 1509 words
23 comments

zadzmo.org

January 19

33 votes
Websites are blocking the wrong AI scrapers (because AI companies keep making new ones)
~tech
- internet
Article 1256 words
2 comments

404media.co

July 29, 2024

18 votes
Chrome/Firefox Plugin to locally scrape data from multiple URLs
~tech
- browsers
Ask (recommendations)
As the title suggests, I am looking for a free Chrome or Firefox plugin that can locally scrape data from multiple URLs. To be a bit more precise, here is what I mean:
- A free Chrome or Firefox plugin
- Local scraping: it runs in the browser itself, with no cloud computing or "credits" required
- Scrape data: Collects predefined data from certain data fields within a website such as https://www.dastelefonbuch.de/Suche/Test
- Infinite scroll: to load data that only loads once the browser scrolls down (kind of like in the page I linked above)
I am not looking to program my own scraper in Python or anything similar. I have found plugins that "kind of" do what I describe above, and about two weeks ago I found one that does it almost perfectly ("DataGrab"), but it starts asking me to buy credits after a few runs.

My own list:
- DataGrab: Excellent, apart from asking to buy credits after a while
- SimpleScraper: Excellent, but asks to buy credits pretty much immediately
- Easy Scraper: Works well for single pages, but there is no way to feed in multiple URLs to crawl
- Instant Data Scraper: Works well for single pages and infinite scroll pages, but there is no way to feed in multiple URLs to crawl
- "Data Scraper - Easy Web Scraping" / dataminer.io: Doesn't work well
- Scrapy.org: Too much programming, but looks quite neat and well documented
Any suggestions are highly welcome!

Edit: A locally run executable or command-line program would be fine too, as long as it only needs to be configured (e.g., by creating a list of URLs stored in a .txt or .csv file) rather than coded (e.g., by writing an infinite scroll function from scratch).
7 comments

douchebag

April 17, 2024

8 votes
Robots.txt governed the behavior of web crawlers for over thirty years; AI vendors are ignoring it or proliferating too fast to block
~tech
- internet
Article 3069 words, published Feb 14 2024
6 comments

The Verge

February 18, 2024

41 votes
The creators of TikTok caused my website to shut down
~tech
- internet
- social media
Video 6:30
2 comments

YouTube: MattKC

August 18, 2023

12 votes
Mastodon's dubious crawler exemption

~comp Article 554 words, published Nov 7 2022

1 comment

jefftk.com

November 27, 2022

4 votes
Spiders

~comp Ask (advice)

Is anyone here familiar with crawling the web? I’m interested in broad crawling, rather than focusing on particular sites. I’d appreciate pretty much any information about how this is usually done, and things to watch out for if attempting it.

5 comments

Wulfsta

December 2, 2021

10 votes
XSS attacks on Googlebot allow search index manipulation
~comp
- security
Article 1551 words
0 comments

tomanthony.co.uk

May 2, 2019

7 votes
An analysis of Cloudflare's email address obfuscation
~comp
- security.networking
- security.cyber
Link
0 comments

jse.li

May 23, 2018

5 votes