From the article:
OpenAI is now fighting a court order to preserve all ChatGPT user logs—including deleted chats and sensitive chats logged through its API business offering—after news organizations suing over copyright claims accused the AI company of destroying evidence.
...
In the filing, OpenAI alleged that the court rushed the order based only on a hunch raised by The New York Times and other news plaintiffs. And now, without "any just cause," OpenAI argued, the order "continues to prevent OpenAI from respecting its users’ privacy decisions." That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said.
The court order came after news organizations expressed concern that people using ChatGPT to skirt paywalls "might be more likely to 'delete all [their] searches' to cover their tracks," OpenAI explained. Evidence to support that claim, news plaintiffs argued, was missing from the record because so far, OpenAI had only shared samples of chat logs that users had agreed that the company could retain. Sharing the news plaintiffs' concerns, the judge, Ona Wang, ultimately agreed that OpenAI likely would never stop deleting that alleged evidence absent a court order, granting news plaintiffs' request to preserve all chats.
...
Before the order was in place mid-May, OpenAI only retained "chat history" for users of ChatGPT Free, Plus, and Pro who did not opt out of data retention. But now, OpenAI has been forced to preserve chat history even when users "elect to not retain particular conversations by manually deleting specific conversations or by starting a 'Temporary Chat,' which disappears once closed," OpenAI said. Previously, users could also request to "delete their OpenAI accounts entirely, including all prior conversation history," which was then purged within 30 days.
This is very sweeping. Imagine a litigation hold on every Gmail user's email.
It's extremely broad and quite an unpleasant development. As a citizen of a country covered by the GDPR, it's worrying to think that some judge in New York basically has veto power over the rights which it grants me.
"As a result, OpenAI is forced to jettison its commitment to allow users to control when and how their ChatGPT conversation data is used, and whether it is retained," OpenAI argued.
"Oh no, the courts are making us keep and use all this training data!" Snark aside, the court is making them retain the data, but that's a separate issue from when/how it is used. Seems like OpenAI is setting up a strawman as an excuse to use chat logs as training data going forward.
I don’t really see how you are coming to that conclusion. All they are saying is that they make specific commitments to their business users that their data is not logged, and now they are being forced to log it in violation of that commitment. Nowhere are they saying anything about using that data for training, and if there were even a hint of OpenAI using private customer data for training that would be actually disastrous for their company.
I don't agree it would be disastrous, if only because their entire claim currently is that they must be allowed to use all information online for free for their company to work with no regard for any copyright or ownership. Allegedly without this they cannot survive, and they're still not making money. Someone killed themselves after talking to a character.AI bot.... And they're still here. I don't think that anything can be "disastrous" in this environment, where people will just keep giving them money, apparently, and people can die using the product and there's zero interest in regulation.
Trusting them to not use the data input by users? To not "revert" the opt-out settings at each TOS update? Why would I trust people with zero ethics or boundaries on intellectual property that isn't their own?
Yeah, my first thought was about whether the company I work at will have to drop OpenAI. Imagine data that shouldn't have been present in a query being impossible to scrub at all; that's a lot of risk versus moving to a provider not under these terms.
Someone on HN also pointed out the potential that this puts OpenAI's business in the EU at risk since it wouldn't be GDPR-compliant.
I understand that and feel like I addressed that I don't believe they care about the outcomes of their actions. I just am not convinced that something like that a) turns off the money spigot or b) actually matters to them on an ethical ground.
And if there's not ultimately a large enough financial cost or an ethical standard they follow, and in fact their stated standard is "we must be allowed to use all available data for free" and the likelihood of regulatory control is nil, I'm not genuinely sure how much the billionaires actually care about fucking over their business contracts.
You'd think they should, but since this all seems to be a financial shell game, I don't think financial wellbeing is all that critical either.
"Reports show that ChatGPT is using data from those who chose to opt out" is a class action settlement where everyone gets 42 bucks and credit monitoring IMO. That's what happens during every other privacy/data violation. They lose some contracts, they throw an exec under the bus. But do we really think anything significant happens to them?
In the EU, they will absolutely be destroyed. The laws on data retention, customer data, and segregating business data are treated very seriously, and the fines are huge (percentage of revenue type fines).
This court order is honestly an international legal nightmare.
I can see the concern with the EU, though per this analysis, the court order itself is in a bit of a grey area in one Article but maybe not another if I understand correctly. There's an exception for court orders but whether that applies in the US is somewhat up in the air.
They're certainly not alone in getting court orders prohibiting destruction of data as potential evidence, however. And I acknowledge it would hurt them more in the EU or (maybe?) the UK, but it still seems mostly like you get to pay a bunch of money and keep doing business.
Personally I think the bigger long-term legal risk to their existence in the EU is flagrant copyright violation and a sense of entitlement to everyone else's work, without which they wouldn't be in this dilemma. They're almost certainly violating the exact same laws by collecting data into their training model that individuals could not request to have deleted.
It only cost them 15 mil in Italy - which isn't nearly enough to make them care. But regardless they were already a GDPR compliance mess.
The fines for things like age verification are nothing compared to the DMA compliance fines. They're massive fines that take a percentage of global revenue, which means they're proportionally huge compared to a company's EU revenue. Google has been fined billions of euros, and Apple was recently hit with a €500 million fine under the Digital Markets Act.
There are usually exceptions for legally required retention, but the international regulations often conflict and make compliance a nightmare. Thorny regulations are a huge reason so many companies avoid Brazil and India despite their seemingly huge markets.
All the copyright stuff is tangential to the legal nightmare of maintaining logs for everything users ever input.
Correct me if I'm wrong but the fine in Italy was not just about age verification?
Apple and Google remain in operation in the EU though. I get that it's a lot of money, I just don't think it'll be enough money.
They just raised 20 billion (with another 20 pending), and have been in violation of the GDPR since they have been using people's data without permission with no way to request removal (or likely to actually remove it) from the beginning. I'm not an expert on that privacy law, so I'm happy to learn more but that seems to cross a line.
I do get that it's complicated legally. And ultimately I personally am fine if they and everyone else that operates like this get nuked by the EU and every other regulatory body because I think their business is exploitative and unethical. But I just don't think it'll happen.
They don't have to keep the logs forever, just through the court case. If that's a terrible imposition, they can plead their case to the judge or move quicker on the court case. But since this case is also potentially existential for them, who knows.
I'll be thrilled to hear people's private data isn't being used. It shouldn't be. But no one else seems to actually keep data private anymore, so I'm just not particularly optimistic that they do it for their corporate clients, especially the smaller, less powerful ones. This is just me saying "don't expect your data to stay private if you put it out there" and "why would I trust this company which already acts in this way in particular"
I can't prove it, I am not really trying to. I just don't have sympathy for them for this and have very low expectations for any entity excused from any sort of ethics by default.
You're right that the fine was about more than age verification, but the fine actually proves my point about the EU leveraging massive fines over things like not sufficiently communicating privacy rights. According to OpenAI, that Italian fine was nearly 20 times their revenue from Italy! That kind of fine absolutely makes a company think twice and act more cautiously.
In a statement shared with the Associated Press, OpenAI called the decision disproportionate and said that it intends to appeal, stating the fine is nearly 20 times the revenue it made in Italy during the time period. It further said it's committed to offering beneficial artificial intelligence that abides by users' privacy rights.
To your other point:
I'll be thrilled to hear people's private data isn't being used. It shouldn't be. But no one else seems to actually keep data private anymore, so I'm just not particularly optimistic that they do it for their corporate clients, especially the smaller, less powerful ones. This is just me saying "don't expect your data to stay private if you put it out there" and "why would I trust this company which already acts in this way in particular"
There are plenty of ways to use LLMs in ways that make it impossible for your inputs to be used as training data, but corporate clients are a whole other thing. As other commenters have said, corporate legal agreements are absolutely taken seriously. They are not broken for silly reasons like a bit more training data. The companies that do are rightfully treated with suspicion and get a lot of bad press (Twitter under Elon).
There are plenty of ways to use LLMs in ways that make it impossible for your inputs to be used as training data, but corporate clients are a whole other thing. As other commenters have said, corporate legal agreements are absolutely taken seriously. They are not broken for silly reasons like a bit more training data. The companies that do are rightfully treated with suspicion and get a lot of bad press (Twitter under Elon).
This is one of those things where IMO this company has already done a bunch of shady and shitty things, so I'm basing my opinion that they will probably keep doing shady and shitty things on what they've done so far. If one doesn't think they're a shady and shitty thing-doing company, we'll probably come to different conclusions.
Meanwhile I could point out Altman is not new to concerns about privacy violations or allegations of engaging in deceptive behavior.
I know that corporate contracts are "serious business"™. Taking that into consideration, let me make clear that my expectations are in fact that low.
According to OpenAI, that Italian fine was nearly 20 times their revenue from Italy! That kind of fine absolutely makes a company think twice and act more cautiously.
For a company like OpenAI, revenue is basically irrelevant. They have enormous amounts of money to burn and that was true well before they started collecting any revenue at all, and I expect it will continue to be true regardless of how little revenue they have.
From a cursory search, OpenAI has doubled its total revenue year-on-year since 2022. Even if that trend continues for the foreseeable future (I have my doubts, especially given how much of their revenue last year was from big partnerships and how little was from actual users/subscribers) and the company completely stops raising additional investor funds (I have even stronger doubts), it's likely several years before it's even close to making a profit.
I think in the more likely scenario, where they continue raising larger and larger investor funds and their annual revenue growth starts to plateau or slow down, they might be decades away from their revenue being relevant to their expenditure.
Percentage of revenue fines still hurt and make a company think twice about doing business in a country. If OpenAI would only ever lose money in Italy due to fines, they wouldn't expand there.
Even if a company's global revenue will be higher next year, they'll still carefully weigh the finances of expanding to other markets. The common regulatory framework is a huge part of why American tech companies scale faster than EU companies dealing with a fractured legal landscape.
Possibly, though there have been plenty of data breaches and regulation violations before with corporate products, and I'm not sure how it ultimately plays out here. I don't own an accurate crystal ball.
But IMO there's nothing that protects pretty much any LLM from this same lawsuit from newspapers (and one from authors and publishers, and so on). And I don't know if any of the companies have some sort of stronger ethical standards either. So they could end up in the exact same space, both required to maintain data and with fundamentally the same likelihood of violating users' privacy.
It just doesn't make any sense to trust them IMO given their public stances and their legal stances in court and even how they worded this press release, on top of everything else mentioned.
I'm referring to OpenAI not having ethics, not their customers. And my point with that is not that I think corporations have ethics (though I think they can and should); it's that OpenAI has done very little IMO to prove itself trustworthy at all, given their explicit stances and the fundamental lack of significant consequences if they engage in behavior against their contracts. The only other thing potentially stopping them would be ethics, and they lack those as well.
If the retention requirement alone is sufficient we'll see if tons of companies drop them during this time, but I don't think that will make a difference. And I don't think that most of the companies that could be dropping their contracts are a true financial threat anyway. They're not making money on those contracts either as far as we know. Which vendor would you go to that lacks the same issues? (Maybe there's someone, maybe you switch like car insurance just on principle every now and then.) What actually makes the money spigot turn off besides maybe heavy government regulation?
I am being very cynical about this and I'm very happy to find out I'm wrong and they honor their contracts, but I just feel like if they say they're entitled to all your data both to the press and in court, why not believe them?
I don't have any specific information about how OpenAI does deals, but speaking generally, a consequence of a supplier breaking contracts with its big customers is losing their business and getting sued. This is something most companies care about, even if their ethics is questionable in other ways.
Such contracts aren't like typical end user agreements - they're the result of negotiations between lawyers at each firm and the contracts are often much stricter. Sometimes there are specific penalty clauses.
If we were talking about Twitter after Musk bought them, what you say would make sense. Musk blatantly broke contracts and got sued by suppliers, employees, and even their landlord. There were a lot of news stories about it because it's not typical behavior. I haven't heard about that sort of behavior from OpenAI, have you?
I'm aware of what contracts are and the role of the vendor vs the customer by which I mean the company contracting with the vendor, not the end user.
I feel like I've explicitly said why I think they don't face significant consequences for breaking their contracts, nor do I think they're particularly pressured by market factors or EU regulations at the moment, and I'll just be repeating myself. People like to give them money even if strings are attached (strings like "become a for-profit"). And the consequences of violating actual laws so far have been minimal.
Your question really isn't relevant to my (as I noted) cynical predictive take. I'm making a future claim based on my interpretation of their actions, not based on their historical actions. And it feels like you're doing the Socratic thing, which feels disingenuous as you're able to look that information up yourself.
The comparison I'd make would be that there's no reason to trust Twitter's* promise that Grok would be a truth-seeking AI when everything that's happened since shows it's pretty susceptible to manipulation by Musk and anyone else that wants to alter it. Idk what they promise anyone they sell their product to, but I wouldn't trust them, personally, based on the behavior of the company and executives.
Anyway I know my take is cynical, and I will be happy if I am wrong. But I have very rarely been surprised pleasantly in the face of low expectations for companies who explicitly are excused from even the concept of ethics.
*I know that the AI company now owns Twitter instead and that is also just a shell game.
Perhaps I'm reading too much into it, but my conclusion comes directly from the leading part of that statement:
"As a result, OpenAI is forced to jettison its commitment to allow users to control when and how their ChatGPT conversation data is used"
The wording suggests complete abandonment of user privacy and control rather than a compromise, and like I said before, retention is a separate issue from use (with the exception of legal discovery / what is cited as evidence in court). I'm just following the implication to its logical conclusion, but again, you're right that this isn't stated outright and I'm reading a lot into the word "jettison".
edit:
if there were even a hint of OpenAI using private customer data for training that would be actually disastrous for their company.
Would it? Modern AI is built on rampant copyright infringement and abuses, and people freely give away their privacy. It'll be alarming for some, but so long as they remain a leader in the space, the masses will continue using it. And if using private customer data for training gives them that edge, they will. They already do. This would just expand the scope to training on data from "Services for businesses".
The safest way to protect private data is to never collect it in the first place. The second best way is to never store private data.
If they're now legally required to maintain everything that's ever input, that's an absolute nightmare to protect and maintain. Not even getting into how this affects user data coming from a company. Many lawyers are going to be working late nights to figure out the international ramifications of this policy.
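To make the "never store it in the first place" point concrete: one common mitigation is to scrub obvious identifiers before a prompt ever leaves your infrastructure, so there's nothing sensitive to retain. A minimal sketch; the patterns and placeholder tokens here are illustrative assumptions, not a complete PII solution:

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matches with typed placeholders so originals are never sent or logged."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact jane.doe@example.com or 555-867-5309 about the contract."
print(redact(prompt))
# → "Contact [EMAIL] or [PHONE] about the contract."
```

A preservation order can only sweep up what was actually transmitted, which is why this kind of data minimization sits upstream of any retention policy.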
I find it alarming that a single American judge can effectively bankrupt OpenAI without any additional oversight or a full legal proceeding.
All of the work that we've put in to supporting OpenAI APIs in our products will be completely wasted if this isn't reversed very quickly. Our customers, and potentially us too, are at risk of being shut down by huge GDPR fines if we don't immediately cease using OpenAI services.
Now our products do support other providers such as Google and Anthropic (with fewer features), but presumably they're at the same risk of being made impossible to use while complying with the GDPR.
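For what it's worth, the usual hedge against exactly this scenario is keeping each vendor behind a thin interface so product code never depends on one provider directly. A rough sketch; the class names and method shape are illustrative, not any vendor's real SDK:

```python
from abc import ABC, abstractmethod

class ChatProvider(ABC):
    """Minimal seam between product code and any given LLM vendor."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider(ChatProvider):
    def complete(self, prompt: str) -> str:
        # Real code would call the vendor's SDK here.
        return f"[openai] {prompt}"

class AnthropicProvider(ChatProvider):
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"

def answer(provider: ChatProvider, question: str) -> str:
    # Product code sees only the interface, so swapping vendors
    # (e.g. for compliance reasons) becomes a configuration change.
    return provider.complete(question)

print(answer(OpenAIProvider(), "hello"))    # → "[openai] hello"
print(answer(AnthropicProvider(), "hello")) # → "[anthropic] hello"
```

The trade-off is the one noted above: an abstraction layer tends to flatten features to the lowest common denominator across providers.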
I imagine a lot of businesses are doing exactly that right now: seeing if they can divest. Even if it isn't necessary compliance wise, it might still be expensive proprietary data going into the API.
The other providers might not be as vulnerable to copyright infringement suits, depending on how they trained?
Just as end-to-end encryption is preferred by people who want to be very sure that their communication is secure, this is going to make running AI inference either locally or in your own cloud account more appealing to businesses. They might still get sued, but at least it will be for something they did, rather than something someone else did.
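As a sketch of what running inference on your own infrastructure can look like in practice: several self-hosted runtimes expose an OpenAI-compatible HTTP endpoint, so a plain client can be pointed at a host you control and the prompt never leaves your network. The URL, port, and model name below are assumptions for illustration:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "local-model") -> bytes:
    """Assemble an OpenAI-style chat request body for a self-hosted endpoint."""
    return json.dumps({
        "model": model,  # placeholder; depends on what your server actually serves
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def local_chat(prompt: str, base_url: str = "http://localhost:8080/v1") -> str:
    """POST to a local, OpenAI-compatible server; data stays on hosts you control."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Under this setup a third-party preservation order has nothing of yours to preserve, since the provider never sees the conversation in the first place.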
But Wang disagreed with Hunt that she exceeded her authority in enforcing the order, emphasizing in a footnote that her order cannot be construed as enabling mass surveillance.
"Proposed Intervenor does not explain how a court’s document retention order that directs the preservation, segregation, and retention of certain privately held data by a private company for the limited purposes of litigation is, or could be, a 'nationwide mass surveillance program,'" Wang wrote. "It is not. The judiciary is not a law enforcement agency."
This is so very dumb American litigation-ese. Yes, she's technically right that the purpose of surveilling isn't there, but the actual output of her order is absolutely enforced surveillance. 🙄
In Canadian law, it's explicit case law that you can't go on a fishing expedition with your discovery requests, never mind this kind of insane "emergency order". I can't say for sure, but I would bet that US law has similar rules. Terrible law.
It's a very quick way for me to skip an article, as it's so often the minimum amount of research and clearly not going to cover any of the nuance that might exist. So there's that.
From the article:
...
...
This is very sweeping. Imagine a litigation hold on every Gmail user's email.
It's extremely broad and quite an unpleasant development. As a citizen of a country covered by the GDPR, it's worrying to think that some judge in New York basically has veto power over the rights which it grants me.
"Oh no, the courts are making us keep and use all this training data!" Snark aside, the court is making them retain the data, but that's a separate issue from when/how it is used. Seems like OpenAI is setting up a strawman as an excuse to use chat logs as training data going forward.
I don’t really see how you are coming to that conclusion. All they are saying is that they make specific commitments to their business users that their data is not logged, and now they are being forced to log it in violation of that commitment. Nowhere are they saying anything about using that data for training, and if there were even a hint of OpenAI using private customer data for training that would be actually disastrous for their company.
I don't agree it would be disastrous, if only because their entire claim currently is that they must be allowed to use all information online for free for their company to work with no regard for any copyright or ownership. Allegedly without this they cannot survive, and they're still not making money. Someone killed themselves after talking to a character.AI bot.... And they're still here. I don't think that anything can be "disastrous" in this environment, where people will just keep giving them money, apparently, and people can die using the product and there's zero interest in regulation.
Trusting them to not use the data input by users? To not "revert" the opt-out settings at each TOS update? Why would I trust people with zero ethics or boundaries on intellectual property that isn't their own.
Yeah my first thought was about whether the company I work at will have to drop OpenAI. Imagine if data shouldn't have been present in a query and then being unable to scrub it at all; that's a lot of risk versus moving to a provider not under these terms.
Someone on HN also pointed out the potential that this puts OpenAI's business in the EU at risk since it wouldn't be GDPR-compliant.
I understand that and feel like I addressed that I don't believe they care about the outcomes of their actions. I just am not convinced that something like that a) turns off the money spigot or b) actually matters to them on an ethical ground.
And if there's not ultimately a large enough financial cost or an ethical standard they follow, and in fact their stated standard is "we must be allowed to use all available data for free" and the likelihood of regulatory control is nil, I'm not genuinely sure how much the billionaires actually care about fucking over their business contracts.
You'd think they should, but since this all seems to be a financial shell game, I don't think financial wellbeing is all that critical either.
"Reports show that ChatGPT is using data from those who chose to opt out" is a class action settlement where everyone gets 42 bucks and credit monitoring IMO. That's what happens during every other privacy/data violation. They lose some contracts, they throw an exec under the bus. But do we really think anything significant happens to them?
In the EU, they will absolutely be destroyed. The laws on data retention, customer data, and segregating business data are treated very seriously, and the fines are huge (percentage of revenue type fines).
This court order is honestly an international legal nightmare.
I can see the concern with the EU, though per this analysis, the court order itself is in a bit of a grey area in one Article but maybe not another if I understand correctly. There's an exception for court orders but whether that applies in the US is somewhat up in the air.
They're certainly not alone in getting court orders prohibiting destruction of data as potential destruction of evidence however. And I acknowledge it would hurt them more in EU or (maybe?) UK, however it still seems mostly like you get to pay a bunch of money and keep doing business.
Personally I think their long term bigger legal risk to their existence in the EU is flagrant copyright violation and a sense of entitlement to everyone else's work, without which they wouldn't be in this dilemma. They're almost certainly violating the exact same laws by collecting data that individuals could not request to have deleted in their training model.
It only cost them 15 mil in Italy - which isn't nearly enough to make them care. But regardless they were already a GDPR compliance mess.
The fines for things like age verification are nothing compared to the DMA compliance fines. They're massive fines that take a percentage of global revenue which means they're proportionally huge compared to a company's EU revenue. Google has been fined billions of euros, and Apple was recently hit with a €500 million find under the Digital Markets Act.
There are usually exceptions for legally required retention, but the international regulations often conflict and make compliance a nightmare. Thorny regulations are a huge reason so many companies avoid Brazil and India despite their seemingly huge markets.
All the copyright stuff is tangential to the legal nightmare of maintaining logs for everything users ever input.
Correct me if I'm wrong but the fine in Italy was not just about age verification?
Apple and Google remain in operation in the EU though. I get that it's a lot of money, I just don't think it'll be enough money.
They just raised 20 billion (with another 20 pending), and have been in violation of the GDPR since they have been using people's data without permission with no way to request removal (or likely to actually remove it) from the beginning. I'm not an expert on that privacy law, so I'm happy to learn more but that seems to cross a line.
I do get that it's complicated legally. And ultimately I personally am fine if they and everyone else that operates like this get nuked by the EU and every other regulatory body because I think their business is exploitative and unethical. But I just don't think it'll happen.
They don't have to keep the logs forever, just through the court case. If that's a terrible imposition, they can plead their case to the judge or move quicker on the court case. But since this case is also potentially existential for them, who knows.
I'll be thrilled to hear people's private data isn't being used; it shouldn't be. But no one else seems to actually keep data private anymore, so I'm just not particularly optimistic that they do it for their corporate clients, especially the smaller, less powerful ones. This is just me saying "don't expect your data to stay private if you put it out there" and "why would I trust this company, which already acts in this way, in particular?"
I can't prove it, I am not really trying to. I just don't have sympathy for them for this and have very low expectations for any entity excused from any sort of ethics by default.
You're right that the fine was about more than age verification, but the fine actually proves my point about the EU leveraging massive fines over things like not sufficiently communicating privacy rights. According to OpenAI, that Italian fine was nearly 20 times their revenue from Italy! That kind of fine absolutely makes a company think twice and act more cautiously.
To your other point:
There are plenty of ways to use LLMs such that your inputs can't be used as training data, but corporate clients are a whole other thing. As other commenters have said, corporate legal agreements are absolutely taken seriously. They are not broken for silly reasons like a bit more training data. The companies that do break them are rightfully treated with suspicion and get a lot of bad press (Twitter under Elon).
This is one of those things where IMO this company has already done a bunch of shady and shitty things, so I'm basing my opinion that they will probably keep doing shady and shitty things on what they've done so far. If one doesn't think they're a shady and shitty thing-doing company, we'll probably come to different conclusions.
Meanwhile I could point out Altman is not new to concerns about privacy violations or allegations of engaging in deceptive behavior.
I know that corporate contracts are "serious business"™. Taking that into consideration, let me make clear that my expectations are in fact that low.
For a company like OpenAI, revenue is basically irrelevant. They have enormous amounts of money to burn and that was true well before they started collecting any revenue at all, and I expect it will continue to be true regardless of how little revenue they have.
From a cursory search, OpenAI has doubled its total revenue year-on-year since 2022. Even if that trend continues for the foreseeable future (I have my doubts, especially given how much of their revenue last year was from big partnerships and how little was from actual users/subscribers) and the company completely stops raising additional investor funds (I have even stronger doubts), it's likely several years before it's even close to making a profit.
In the more likely scenario, where they keep raising larger and larger investor rounds and their annual revenue growth starts to plateau or slow down, they might be decades away from their revenue being relevant to their expenditure.
Percentage of revenue fines still hurt and make a company think twice about doing business in a country. If OpenAI would only ever lose money in Italy due to fines, they wouldn't expand there.
Even if a company's global revenue will be higher next year, they'll still carefully weigh the finances of expanding to other markets. A common regulatory framework is a huge part of why American tech companies scale faster than EU companies dealing with a fractured legal landscape.
Possibly, though there have been plenty of data breaches and regulation violations before with corporate products, and I'm not sure how it ultimately plays out here. I don't own an accurate crystal ball.
But IMO there's nothing that protects pretty much any LLM from this same lawsuit from newspapers (and one from authors and publishers, and so on). And I don't know if any of the companies have some sort of stronger ethical standards either. So they could end up in the exact same space, both required to maintain data and with fundamentally the same likelihood of violating users' privacy.
It just doesn't make any sense to trust them IMO given their public stances and their legal stances in court and even how they worded this press release, on top of everything else mentioned.
I'm referring to OpenAI not having ethics, not their customers. And my point with that is not that I think corporations have ethics (though I think they can and should); it's that OpenAI has done very little, IMO, to prove itself trustworthy given their explicit stances and the fundamental lack of significant consequences if they act against their contracts. The only other thing potentially stopping them would be ethics, and they lack those as well.
If the retention requirement alone is sufficient we'll see if tons of companies drop them during this time, but I don't think that will make a difference. And I don't think that most of the companies that could be dropping their contracts are a true financial threat anyway. They're not making money on those contracts either as far as we know. Which vendor would you go to that lacks the same issues? (Maybe there's someone, maybe you switch like car insurance just on principle every now and then.) What actually makes the money spigot turn off besides maybe heavy government regulation?
I am being very cynical about this and I'm very happy to find out I'm wrong and they honor their contracts, but I just feel like if they say they're entitled to all your data both to the press and in court, why not believe them?
I don't have any specific information about how OpenAI does deals, but speaking generally, a consequence of a supplier breaking contracts with its big customers is losing their business and getting sued. This is something most companies care about, even if their ethics are questionable in other ways.
Such contracts aren't like typical end user agreements - they're the result of negotiations between lawyers at each firm and the contracts are often much stricter. Sometimes there are specific penalty clauses.
If we were talking about Twitter after Musk bought them, what you say would make sense. Musk blatantly broke contracts and got sued by suppliers, employees, and even their landlord. There were a lot of news stories about it because it's not typical behavior. I haven't heard about that sort of behavior from OpenAI, have you?
I'm aware of what contracts are and the role of the vendor vs the customer by which I mean the company contracting with the vendor, not the end user.
I feel like I've already explicitly said why I think they don't face significant consequences for breaking their contracts, nor do I think they're particularly pressured by market factors or EU regulations at the moment, so I'll just be repeating myself. People like to give them money even if strings are attached (strings like becoming a for-profit). And the consequences of violating actual laws so far have been minimal.
Your question really isn't relevant to my, as I noted, cynical predictive take. I'm making a future claim based on my interpretation of their actions, not based on their historical actions. And it feels like you're doing the Socratic thing which feels disingenuous as you're able to look that information up yourself.
The comparison I'd make is that there's no reason to trust Twitter's* promise that Grok would be a truth-seeking AI when everything that's happened since shows it's pretty susceptible to manipulation by Musk and anyone else who wants to alter it. I don't know what they promise anyone they sell their product to, but I wouldn't trust them, personally, based on the behavior of the company and executives.
Anyway I know my take is cynical, and I will be happy if I am wrong. But I have very rarely been surprised pleasantly in the face of low expectations for companies who explicitly are excused from even the concept of ethics.
*I know that the AI company now owns Twitter instead and that is also just a shell game.
Perhaps I'm reading too much into it, but my conclusion comes directly from the leading part of that statement:
The wording suggests a complete abandonment of user privacy and control rather than a compromise, and like I said before, retention is a separate issue from use (with the exception of legal discovery / what is cited as evidence in court). I'm just following the implication to its logical conclusion, but again, you're right that this isn't stated outright and I'm reading a lot into the word "jettison".
edit:
Would it? Modern AI is built on rampant copyright infringement and abuses, and people freely give away their privacy. It'll be alarming for some, but so long as they remain a leader in the space, the masses will continue using it. And if using private customer data for training gives them that edge, they will. They already do. This would just expand the scope to training on data from "Services for businesses".
The safest way to protect private data is to never collect it in the first place. The second best way is to never store private data.
If they're now legally required to maintain everything that's ever input, that's an absolute nightmare to protect and maintain. Not even getting into how this affects user data coming from a company. Many lawyers are going to be working late nights to figure out the international ramifications of this policy.
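The conflict those late-night lawyers face can be sketched in a few lines: any deletion pipeline now needs a litigation-hold check bolted in front of it, so the same request becomes a legal obligation under one regime (a GDPR-style erasure request) and a prohibited act under another (the US preservation order). A minimal illustration, with entirely hypothetical names and structure that don't reflect OpenAI's real systems:

```python
from datetime import datetime, timezone

class RetentionConflictError(Exception):
    """Raised when a deletion request collides with an active legal hold."""

class ConversationStore:
    """Toy store illustrating a deletion path blocked by a preservation order."""

    def __init__(self):
        self._chats = {}           # user_id -> list of conversation records
        self._legal_holds = set()  # user_ids covered by a preservation order

    def add_chat(self, user_id, text):
        self._chats.setdefault(user_id, []).append(
            {"text": text, "created": datetime.now(timezone.utc)}
        )

    def place_legal_hold(self, user_id):
        self._legal_holds.add(user_id)

    def delete_user_data(self, user_id):
        # Before the order: honor the user's deletion (purge within 30 days).
        # Under the order: the hold wins, and the "deleted" data persists.
        if user_id in self._legal_holds:
            raise RetentionConflictError(
                f"cannot purge {user_id}: data is under litigation hold"
            )
        return self._chats.pop(user_id, [])
```

The pain point is that raising the exception isn't a bug here: one legal regime requires exactly the purge the other forbids, and the code can only surface the conflict, not resolve it.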
I find it alarming that a single American judge can effectively bankrupt OpenAI without any additional oversight or a full legal proceeding.
All of the work we've put into supporting OpenAI APIs in our products will be completely wasted if this isn't reversed very quickly. Our customers, and potentially us too, are at risk of being shut down by huge GDPR fines if we don't immediately cease using OpenAI services.
Our products do support other providers such as Google and Anthropic (with fewer features), but presumably they're at the same risk of being made impossible to use while complying with the GDPR.
I imagine a lot of businesses are doing exactly that right now: seeing if they can divest. Even if it isn't necessary compliance wise, it might still be expensive proprietary data going into the API.
The other providers might not be as vulnerable to copyright infringement suits, depending on how they trained?
Just as end-to-end encryption is preferred by people who want to be very sure that their communication is secure, this is going to make running AI inference either locally or in your own cloud account more appealing to businesses. They might still get sued, but at least it will be for something they did, rather than something someone else did.
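Part of why self-hosting is low-friction: most local inference servers expose an OpenAI-compatible chat-completions endpoint, so "moving in-house" is often just a matter of aiming the same request at your own host. A stdlib-only sketch that builds (but does not send) such a request; the localhost URL and model name are placeholders for whatever you actually run:

```python
import json
from urllib.request import Request

# Hypothetical self-hosted, OpenAI-compatible endpoint. With this base URL,
# prompts never leave your network; swap in api.openai.com and they do.
LOCAL_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(messages, model="local-model"):
    """Build a chat-completions POST aimed at the self-hosted server."""
    payload = {"model": model, "messages": messages}
    return Request(
        f"{LOCAL_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    [{"role": "user", "content": "Summarize this contract."}]
)
# urlopen(req) would hit localhost only; no third party ever holds the log.
```

The point of the sketch is the retention argument above: if the provider never receives the conversation, there is nothing for a preservation order against that provider to sweep up.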
Judge denies creating “mass surveillance program” harming all ChatGPT users
This is so very dumb American litigation-ese. Yes, she's technically right that the purpose of surveilling isn't there, but the actual output of her order is absolutely enforced surveillance. 🙄
In Canadian law, it's explicit case law that you can't go on a fishing expedition with your discovery requests, never mind this kind of insane "emergency order". I can't say for sure, but I would bet that US law has similar rules. Terrible law.
It’s a very quick way for me to skip an article as it’s so often the minimum amount of research and clearly not going to cover any of the nuance that might exist.
So there’s that