The Common Pile v0.1: An 8TB dataset of public domain and openly licensed text | Article | 432 words | 26 votes
Someone made a dataset of one million Bluesky posts for 'machine learning research' | social media | Link | 20 votes
Largest dataset powering AI images removed after discovery of Child Sexual Abuse Materials | google | Article | 3947 words | 27 votes