It looks like a nice, reasonably functional standard for automatically communicating the copyright holder’s intent, which is great. But I don’t think it actually does much towards solving the issue they’re claiming to solve.
Training models on publicly available content is a legal question more than a technical one. Either training is not copying from a copyright perspective, in which case the license terms encoded by this metadata are irrelevant, or training is copying and we’ve got a major legal overhaul to do in order to separate that use from other mechanisms that computers use to process data without causing a whole tsunami of unintended consequences.
At most I see it as a nice flag for users with CC or copyleft intentions to say “yes, you can use this content”. Maybe that in turn lets some smaller players make a niche by creating boutique “ethical” AI models? But it’s not going to do anything to change what Anthropic or OpenAI do with your content.
Cloudflare’s pay-per-crawl blocks bots until they pay and handles billing, so it seems like a more complete solution that doesn’t depend on copyright law. I wonder if anyone is actually paying?
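From what I remember of Cloudflare’s announcement, the mechanism rides on the HTTP 402 status code. A rough sketch of the exchange, with header names recalled from their blog post rather than verified against the shipped product:

    # Hypothetical pay-per-crawl exchange; the crawler-* header names are
    # recalled from Cloudflare's announcement, not verified
    GET /article HTTP/1.1
    Host: example.com
    User-Agent: ExampleBot/1.0

    HTTP/1.1 402 Payment Required
    crawler-price: USD 0.01

    # The crawler retries, agreeing to the quoted price:
    GET /article HTTP/1.1
    Host: example.com
    User-Agent: ExampleBot/1.0
    crawler-exact-price: USD 0.01

    HTTP/1.1 200 OK
    crawler-charged: USD 0.01

Since Cloudflare already sits in front of the site as a reverse proxy, it can enforce the 402 and aggregate the charges, which is what makes this workable without any help from copyright law.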
They made it available to all customers on August 28.
I am fairly skeptical given the project is headed by a professional startup founder (see "EIR" in venture capital if you want an idea of a high-paying tech job that rewards failure) and Microsoft's head of natural language interfaces. Given everything to come out of big tech in the last few years, I'm very doubtful of attempts at altruism.
This is damage control, and it really feels like a limp apology from a serial cheater. "We're sorry for making billions by not properly compensating artists for their IP while we wrecked the global economy and social contract in the name of a failed experiment to replace all labour. Here's a licence we pinky promise to respect, and we also promise not to ruin the world again in the name of more money than we could ever use."
After GPT-5 flopped out of the gate, it feels like there's been a fundamental shift in the whole AI discussion. OpenAI found a way to piss off literally everyone, and the company obviously has no more cards to play. Apple took its AI to the farm to play with all the Vision Pros and headphone jacks. Meta AI engineers are jumping ship. Nvidia has all the money, and I am curious if they can continue to prop up almost 50% of the S&P 500. Google has a beta for natural language search, 20 years late but better than never. Every company gets to offshore their tech work to India. And Microsoft is doing... whatever it is the devil likes to do.
We do need a solution for web scraping, though. They all made a pretty penny in the AI rush, but they still crawl through social media and public profiles for advertisers and data brokers. It's not technically illegal, but it's a gross abuse of public services and creates so much digital waste and distrust.
Glad to see anything that tries to wrestle a bit of control back. But then again, big corporations don't even care about laws and pirate stuff to get the data, so how is a possibly unenforceable license going to matter to them anyway? The license operates on the assumption that AI companies will act in good faith, which they already don't and have proven they have no interest in doing, and in some cases (Meta) judges have massively twisted fair use law to accommodate their behavior (in ways they'd never twist it to a citizen's benefit).
One of the options or intents should be refusal to accept AI use of the material, period, rather than building an entire licensing structure that ultimately just supports the industry anyway. But I suppose, again, that's pointless, because no AI company will ever listen to or respect a license that says content can't be used for AI training. They'll just vacuum it up anyway, like they already do, as unethically as necessary to support their desires - whether that's faking the user agent of their bots or downloading copyrighted material via torrents while internally admitting they know the iffy nature of what they're doing (Meta).
I understand the anger, but do you see a practical way to say “no AI” without causing a cascade of other negative consequences?
Honestly, even the fair use argument seems reasonably, well, fair to me. You’re right that they likely wouldn’t have made the same judgment for an actual user, and that’s a major problem with the US legal system, but I don’t think the judgment itself is actually a bad one if you consider the wider implications.
For me, the big thing to remember is that this isn’t an “AI bad” issue, it’s a “huge amoral corporate behemoth bad” one. When the courts were ruling in favour of stronger copyright protections, it was in favour of the giant media publishers at the expense of end users and individual creators. Now they’re weakening that over-strengthened legislation a little, and again they’re doing so at the expense of end users and individual creators.
In the current US courts, and the EU ones as well (but not to quite such an extreme), there isn’t a good option on the table. It seems like technically sane rulings are the best we can hope for if we aren’t going to get the sweeping reforms that would be needed to actually protect individuals.
I am okay with any negative consequences that massively impact the AI industry as a whole. I actively desire for them to be rendered useless. I want the majority of the data out there to be unobtainable by AI, I certainly want any creator who wishes to say no to be able to do so, and I would love a way to enforce that. I am a radical in this regard, so understand that this is the angle from which any statement I make about these systems comes.
I actively wish for the entire industry to not exist.
Therein lies the problem, though. I wasn’t talking about the impact on the AI industry, I was talking about the impact on computers as a whole. Which in practice means the impact on society and communication as a whole.
It’s a bit like these insane anti-encryption laws the EU keeps talking about, or the UK’s Online Safety Act: you could say “fuck Zuckerberg and fuck WhatsApp, I hope they go bankrupt”, but the wider implications of the laws required to make that happen are massive, and have blowback far beyond that outcome.
I’d also challenge you to define “AI industry” in a way that catches the guys you justifiably dislike but leaves the unequivocally valuable scientific uses of machine learning (and the benignly positive user-facing ones you probably don’t think about as you use them) untouched.
I suppose I'd narrow my focus to LLMs and "generative AI" in general (with particular extreme hatred towards any of them that can generate visuals, audio, video, large amounts of text, or code) so as not to throw every ML use under the same umbrella. My fault for using "AI" as a broad term and not specifying; I fell into the problematic colloquial use of the term. And I'll be upfront about this: I do not care if there are good uses for LLMs/generative stuff in particular. This is a "toss the baby out with the bathwater" instance for me (though this assumes a baby in the analogy even exists).
Edit: So, fair: ML models for weather, proteins, etc. are a different angle altogether, and using the fuzzy "AI" term needlessly sweeps a lot of very different things under the same umbrella.
I think computers/communication/etc. could be fine without all the LLM-type shit around, like we were for a long time before it became the hype of the moment. I agree that overly broad regulation or laws can have knock-on effects on tech, but I'm not asking for those, at least not in this particular instance.
And I mean, yes, I am for entirely different paradigms of society that don't allow billionaires or massive corporations to even exist, or extreme regulation that achieves the same thing, or extreme regulation that specifically targets certain types of AI or businesses, etc., and I can support that without supporting regulations or laws that specifically target encryption or ignorant content-gating like the UK law.
Like, I'd be okay with a society that finds a way to not allow Anthropic to even legally exist as an entity, and a society that shames people for using generative AI.
Either way, I'm breaking my own personal rules by even discussing this, at all, tbh. I even have these types of topics normally filtered out so I don't feel the impulse to say anything, this one just made it through because it didn't have any AI tags. I'm not open to seeing things another way, at this time, and I'll be very explicit about that fact.
I think I'm not quite getting my nuance of meaning across here. I'm not suggesting that LLM/genAI tech is necessary to computers or communication; I'm saying that writing a law that successfully targets that tech without causing a huge cascade of anti-consumer issues that ripple right back into even 90s-era, definitely-not-AI-in-any-way computers is very difficult. Doing it without totally overhauling copyright (something I'd love to see done, but don't realistically expect to happen) is near impossible. Doing it successfully within the current legal and political landscape is literally impossible.
What I'm saying is: give me your hypothetical anti-AI law and I'll monkey's-paw it into something that makes everything more advanced than a Walkman illegal, because that's what the lawyers salivating over political censorship and corporate control of the broader flow of information will try to do if we give them the slightest opportunity.
And yeah, as a secondary point, I'm saying that I actually do think things like protein folding and weather prediction and cancer detection are too important to throw out with the bathwater - that's something I feel strongly about too, and it's a debate I'll happily have because it's meaningful to me - but that's secondary to my thinking, because even if there were no worthwhile use of whatever we're fuzzily calling "AI", I'd still be just as wary about the ripple effects of legislating it through copyright, which is ultimately the law that governs the flow of information as a whole.
I mean, sure, I agree that anything specific that could be proposed would be ruined by lawyers and broadened by governments/power, and that truly, there is no good way forward. In my opinion, it's a rather hopeless situation, so I'm just hanging on to imaginings of an ideal world that could never exist. Fair enough. I need that hope to hang onto, because otherwise (don't worry about me, but to make my point:)
TW: existential hyperbole
I'd almost rather not exist in a world where AI "art" does
But that doesn't mean I'd remotely move from my position of actively opposing all forms of "generative AI" - definitely any of them that allow things like fake art / fake music / big swaths of text and code / misinforming "overviews" and summaries to be made, and any of them that train on the web and on artists'/creators' data. And since the governments won't do anything useful, as we've established, I am fully okay with the people attempting to claw a bit of power back into their own hands, even if it's ultimately futile: people who want to create "traps" for bots and crawlers to get stuck in, ways to poison the datasets, whatever wrenches can be thrown into the machine. The luddites swinging the hammer to break them. People on the inside of these companies slowly fucking things up without their superiors knowing, etc.
I refuse to accept the LLM present or future, and if that ends up leaving me behind while the world moves on, so be it. Our world is truly so fucked that, yes, maybe there is no real way out of the mess we've made of it by now. But I'll hang onto that adversarial position, because in no way do I need to lie down and "accept" it, or assimilate into it, or comply in advance with its omnipresence and the millions of ways it will be used against us and actively harms us even now.
EDIT: Lastly, I will agree to an extent with the argument that a lot of the problems come from the political/social/financial landscape these systems exist in - the incentives that creates, the power/money behind them, their ownership, and the types of systems all that has led them to create. Completely isolated, the "tech" of an LLM on its own isn't completely inherently bad (though it's not without inherent problems), and solely in the hands of the people, outside of those structures and that scale and data access, it could be far less harmful. It's that we've built a world where it's unavoidable that the existence of these systems will be a net negative, for a million reasons.
There’s a lot I’d truly love to explore here, but I don’t think that would be fair or helpful on something that’s clearly so emotive - I won’t claim to understand, but I can respect it, and more than anything I hope that things unfold in a way that brings you a bit more comfort. For all that we’re coming from very different places on the technology, I get the impression we have a lot of the same concerns about the implications and the people controlling it.
Well, thank you, and, understandable. And yes, I agree that it's, at least for me, not worth exploring any of it past this point.
I really shouldn't have posted on the topic to begin with, tbh, so I sorta opened this can of worms myself by doing something I set a rule for myself not to do (given that I know what my position is and how I feel, I don't want to dive into something only to appear to be in bad faith later by telling the truth about not wanting to debate the topic... I get it could come off that way). I've reached the "generative AI hater" stage (far past the debate/critique stage), and view it as being as unmovable (at least in one direction) as a number of my political/social views, so it really helps no one for me to chime in every time I see a topic about AI.
I can’t tell if this is a good idea or not, and I’m curious what others think. Semantic web ideas have generally been neither very effective nor widely adopted, but this seems like a way of baking licensing into web content differently than it has been done historically. Will this actually work to get big tech to pay for content, or will it just be a technical way of opting out of being crawled?
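For context, the main way licensing has been baked into web content historically is per-page markup - most visibly Creative Commons’ use of the rel="license" link type - which is machine-readable but purely advisory:

    <!-- Long-standing approach: a visible licence link marked up with
         rel="license", as Creative Commons recommends -->
    <a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">
      CC BY-SA 4.0
    </a>

RSL moves that declaration to the site level (robots.txt or a linked XML file) and adds machine-readable terms, but like rel="license" it still depends on the crawler choosing to honour it.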
Why would it do either thing?
Putting a thing on your website that says "do not crawl" is just an extra document to feed the crawlers. Until some of these companies get dismantled, none of them are following any licenses.
If this becomes a widely adopted standard for securely licensing and encrypting nonpublic content, then such content could not be crawled.
But while a freely readable website is "public" in the sense of access, that doesn't mean the data on it is freely available for the taking - though it does mean crawlers can crawl it even if they don't have the right to. So for completely nonpublic content, maybe it would make sense, but is there a danger of it locking away public access to content that is freely available to view but not "take"? Almost like a weird form of DRM for web content?
It all seems like you'd have to get the entire industry to sign on and play by the rules, which they've shown they're not interested in doing, and I also worry there are plenty of unexplored side effects to something like this.
Like, even the site's copy says "add this to your robots.txt, website, etc" - that will do nothing to stop them. They're already actively using fake user agents to get around robots.txt. If it requires crawlers to present themselves honestly, it's DOA.
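For concreteness, the robots.txt integration amounts to one extra line pointing at a licence file - the directive name here is my reading of the site's copy, not verified against the spec:

    # robots.txt with a hypothetical RSL pointer; the "License" directive
    # name is an assumption, not verified against the spec
    User-agent: *
    Allow: /
    License: https://example.com/license.xml

A dishonest crawler can read that line and ignore it just as easily as it ignores a Disallow.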
Some forms of this RSL, as I mentioned, would allow for implementing encryption so that only authorized license holders could access it. I agree that putting some markup in a robots.txt does nothing to change the status quo. That’s not the part of this that’s interesting to me (other than possibly standardizing machine-readable metadata for signaling to ethical crawlers, which is something that I’ve felt is lacking with the web since, well, forever, but isn’t necessarily the most dire of problems).
I suppose to me it just feels like a band-aid on an open wound, and locking up content sets a potentially scary precedent (depending on the content affected).
It's been a worry for me ever since AI became a thing: that society's response to it could take bad, unnuanced turns in our zeal to fight it (like making copyright law even worse, in ways that penalize everyone and not just AI companies, for instance).
I may have missed it in the docs, but do they have a solution for the micropayments infrastructure required to make that happen? That’s always been a major sticking point in the past.
RSL specifies a <payment> element, but the content owner must link out to a payment service to handle transactions. If <payment> is not present, the pricing model is assumed to be free.
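For illustration, a minimal sketch of what that might look like in an RSL document. The element and attribute names here are my best guess at the shape of the spec from descriptions of it, not verified against the actual schema:

    <!-- Hypothetical RSL licence sketch; element/attribute names are
         assumptions, check the real schema before using -->
    <rsl xmlns="https://rslstandard.org/rsl">
      <content url="https://example.com/articles/">
        <license>
          <permits type="usage">train-ai</permits>
          <payment type="purchase">
            <!-- the content owner links out to an external payment
                 service; this URL is illustrative only -->
            <standard>https://pay.example.com/rsl-checkout</standard>
            <amount currency="USD">10.00</amount>
          </payment>
        </license>
      </content>
    </rsl>

And as noted, dropping the <payment> element entirely would imply the free pricing model.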
Got you, thanks. That means the micropayments question is still open, which I think is another major piece of the puzzle that’d need to be in place for this to work well.
They’d need to partner with one of the major payment processors for a user-friendly and cost-effective way to make that tag work - which could well happen, it’s not technically that hard - but I’ve been hearing organisations talk about this way back into the days when these conversations were happening on Slashdot, and I still haven’t seen it come together…