24 votes

Fact sheet: US President Joe Biden issues executive order on safe, secure, and trustworthy artificial intelligence

30 comments

  1. [24]
    mattgif
    Link

    What I would really like to see is for there to be a legal requirement for AI output to cite its sources. If AI is offering allegedly factual information, it should be able to provide annotations which direct readers to the factual basis for these assertions.

    This would help mitigate what are, to my mind, the most pressing issues with AI:

    • Hallucinations: If the source doesn't exist, or doesn't actually say what the AI claims it does, it is easy enough to check
    • Theft: If the source is some forbidden material (due to copyright or other protections), we would know
    • Dubious knowledge base: If the only source of a claim, say, about the war in Gaza is a paranoid blog that also features UFO news, we should be aware of that.
    21 votes
    1. [7]
      Goodtoknow
      Link Parent

      A raw model doesn't even know its own sources; it would have to be connected to the internet and prompted to do research. Raw models are still creatively useful without relying on them for factual information.

      19 votes
      1. [6]
        mattgif
        Link Parent

        Right. That's the part that needs to be fixed. The AI companies need to prove their products are safe before unleashing them on the market. Source citing is one part of doing that.

        I'm not moved by the argument that they are useful and therefore we should let other concerns slide with the public as beta testers. I say this as someone who loves Copilot for coding. I would love it more if it told me where it was getting its advice.

        10 votes
        1. [5]
          BitsMcBytes
          Link Parent

          I think this is essentially asking for AGI?

          AFAIK, a model can only return a source if it is provided a source or a set of sources as part of the prompt (either user-given or internally fetched and then sent to the model). There's no way, I think, for a model to reverse-engineer its weights and figure out how some vector maps to some source.

          But AGI would, in theory, be able to say "this is what I know about the world and here is my source on it."

          I might be wrong on this and definitely welcome any corrections!

          11 votes
          1. [4]
            mattgif
            Link Parent

            Definitely not AGI. But yes, a departure from stochastic parrots. You can imagine how this would work in the absence of AGI by exploiting the ability of LLMs to both generate and summarize:

            1. Make a first-pass generative assertion
            2. Find what sources are relevant to that topic (simple tagging model, weighted for authority)
            3. Search those sources for key words
            4. Summarize the results
            5. Does the assertion match any summary?
            6. If yes, return the assertion citing the matching result
            7. If no, go back to step 1 and try again

            Way more tedious than machine-gunning assertions and hoping something is true. But what are computers for, if not doing repetitive tedious tasks? Way more computationally expensive too. But bad information has a high price too.

            The main problem with that approach would be figuring out how to calculate a "match" between a summary and an assertion. (Also, the summary process is fallible.) But you wouldn't need AGI to solve it, just recursive LLMs.
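
            To make that loop concrete, here's a rough Python sketch. Every name in it (generate_assertion, find_relevant_sources, summarize, supports) is a hypothetical stand-in for an LLM or retrieval call, so only the control flow is meant literally:

            # Rough sketch of the loop above, with stub helpers standing in
            # for the LLM and retrieval pieces.
            MAX_ATTEMPTS = 3

            def generate_assertion(question: str) -> str:
                # First-pass generative answer (stand-in for an LLM call).
                return "stub claim about " + question

            def find_relevant_sources(assertion: str) -> list[str]:
                # Retrieval step: tag the topic, rank candidate sources by authority.
                return ["source-1", "source-2"]

            def summarize(source: str) -> str:
                # Stand-in for an LLM summary of the retrieved source text.
                return source + ": stub claim about the executive order"

            def supports(assertion: str, summary: str) -> bool:
                # The hard part: deciding whether a summary actually backs the claim.
                return assertion in summary

            def answer_with_citation(question: str):
                for _ in range(MAX_ATTEMPTS):
                    assertion = generate_assertion(question)
                    for source in find_relevant_sources(assertion):
                        if supports(assertion, summarize(source)):
                            return assertion, source  # the claim plus its citation
                return None  # give up rather than assert something unverified

            print(answer_with_citation("the executive order"))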

            6 votes
            1. [2]
              Minori
              Link Parent

              But that's not citing the sources for the model's output. That's just finding sources that are similar to what the model produced. Stochastic parrots don't know their sources, so trying to create a fake citation by working backwards is extremely misleading.

              A line of best fit with an R² value simply can't tell you all the data points it's based off of. It can give you possibilities, but there are no 1:1 relationships.
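
              To make that concrete, here's a tiny numpy sketch (the points are made up purely for illustration): two different datasets that produce exactly the same least-squares line, so the fitted line alone can't tell you which data it came from.

              import numpy as np

              # Dataset A: three points lying exactly on y = x.
              xs_a, ys_a = np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0])
              # Dataset B: four scattered points whose least-squares fit is also y = x.
              xs_b, ys_b = np.array([0.0, 0.0, 2.0, 2.0]), np.array([-1.0, 1.0, 1.0, 3.0])

              print(np.polyfit(xs_a, ys_a, 1))  # ~[1. 0.]  -> y = x
              print(np.polyfit(xs_b, ys_b, 1))  # ~[1. 0.]  -> same line, different data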

              2 votes
              1. mattgif
                Link Parent

                It'd be functionally the same. It's fact checking itself and providing sources. I'd be happy with that.

                1 vote
            2. Promethean
              Link Parent

              There are plugins for ChatGPT that allow it to comb through research databases to find papers to cite in its response. It's quite effective in identifying papers of interest.

    2. [8]
      teaearlgraycold
      Link Parent

      I'm gonna be honest, I don't think this makes much sense. Hallucinations are exactly what you're looking for with LLMs half of the time.

      The first thing we should do is determine the copyright status of model weights and their outputs.

      5 votes
      1. [7]
        mattgif
        Link Parent

        If the AI is answering a question or giving an example, it should cite its source. How does that fail to make sense for the 50% of the time you're not looking for fabulated non-information?

        1. [6]
          teaearlgraycold
          Link Parent

          This is completely down to the users not having their expectations set appropriately. LLMs are great for synthesizing results from cite-able sources; they are not at all designed to recall which source in their training corpus contributed to a response. So you need to tell the user that if they're interacting with an LLM on its own, it's just a word jumbler. When they interact with an LLM that is connected to the web (like GPT-4 when the Bing plugin is enabled), the expectations can be set differently.

          Basically these are two different products and I don't think an LLM without an internet connection should get nerfed because people don't understand what it's doing.

          6 votes
          1. [5]
            mattgif
            Link Parent

            Put another way, you are saying that LLMs are perfectly safe in the hands of knowledgeable users that understand their limits and purposes.

            I agree. On the flip side, then, they are unsafe in less capable hands.

            Problem: AIs are widely available, and it turns out that most people--even very smart people--can be misled by them.

            I proposed a solution: Get AIs to cite their sources.

            So now I'm curious where our area of disagreement is: Do you disagree that there is a problem? Do you agree there is a problem, but do not care if it is solved? Or do you have an alternative solution?

            2 votes
            1. [3]
              teaearlgraycold
              Link Parent

              The alternative solution is setting expectations. If there should be some mandate in place, that’s what should be done. Require LLMs to be wrapped in an educational onboarding process. Change the design to make it clear the responses aren’t “answers”. They’re just responses.

              2 votes
              1. [2]
                mattgif
                Link Parent

                I don't think it's realistic to train the human population to be smart about AI use. "Setting expectations" with a TOS is definitely not going to be enough.

                Unless maybe you were thinking of something like a license-to-use test? I would accept that this is an alternative approach.

                1 vote
                1. teaearlgraycold
                  (edited )
                  Link Parent

                  Yeah TOS won't do shit. I mean some kind of evaluation to show that the user understands what they're getting into. And for the variants that primarily work from cite-able sources you could skip that step. So perhaps on signup you are immediately able to use an LLM that synthesizes answers for you from cited web results. Then if you want to use the "creative" AI you first prove you know the basics about how it works and that the entire point of this more advanced version is to make stuff up.

                  Edit: The ultimate version of this might be a slider. It would go from "factual" to "creative" to "hallucinatory".

            2. vord
              Link Parent

              Easy solution to the copyright problem: All AI models must have their training data publicly available.

              I agree that LLMs that are answering questions need to cite where those answers come from. If skipping citations isn't acceptable for 5th graders on a research essay, it's not excusable for tech giants that somehow have the resources to maintain a search history of every person on the planet for decades.

    3. [3]
      sharpstick
      Link Parent

      I've been using the Perplexity AI app for several months specifically because it lists its sources as part of its feedback and will oftentimes include an intro sentence about how certain it can be about the answer it gives based on the data it can find. I have seen it become a lot more nuanced in its answers the more capable it becomes.

      4 votes
      1. Goodtoknow
        Link Parent

        Me too, but I find a large portion of the time its sources are nonsense.

        1 vote
      2. shinigami
        Link Parent

        Now this comment has me interested. Interested enough to leave a comment to find later and research.

        This addresses the two things I want from an AI model.

    4. Gekko
      Link Parent

      I agree, personally I've noticed that AI can speak very authoritatively about stuff it's making up on the fly. Without an ability to discern between fact and fiction, it'll be tricky to rely on it as a proper source of information.

      2 votes
    5. [3]
      Moonchild
      Link Parent

      What counts as an ai? And what counts as a source? Suppose I take many measurements of some thing, produce a polynomial best-fit approximation using standard techniques, and then use that polynomial to make predictions about the thing. Should predictions made using the polynomial be required to cite elements of the original dataset? That seems absurd to me. Which elements would you even cite? The closest ones? Not very helpful if you need to do a lot of interpolation or extrapolation (especially true if you work in a very high-dimensional space, as neural nets do). The ones that the prediction was most sensitive to? But it is hard to produce a useful ranking, there—moving any single point of the input, or even several of them, will not generally affect the curve very much at all.
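
      To put a number on that sensitivity point, here's a small numpy sketch (entirely synthetic data; the degree and sample size are arbitrary): fit a polynomial, then refit with each point left out in turn and watch how little a prediction moves. No single point stands out as "the source" of the prediction.

      import numpy as np

      rng = np.random.default_rng(0)
      xs = np.linspace(0.0, 10.0, 50)
      ys = np.sin(xs) + rng.normal(scale=0.1, size=xs.size)

      # Full fit and a prediction at an arbitrary query point.
      full_fit = np.polyfit(xs, ys, deg=7)
      x_query = 4.2
      baseline = np.polyval(full_fit, x_query)

      # Leave-one-out refits: how much does dropping each single point move the prediction?
      shifts = []
      for i in range(xs.size):
          loo_fit = np.polyfit(np.delete(xs, i), np.delete(ys, i), deg=7)
          shifts.append(abs(np.polyval(loo_fit, x_query) - baseline))

      print(f"prediction at x={x_query}: {baseline:.3f}")
      print(f"largest single-point shift: {max(shifts):.4f}")  # no single point dominates the prediction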

      More generally, I notice a tendency to assume that technical problems can simply be solved. Sometimes they can be solved. Sometimes they can't. Often, we don't know if it's possible, or how to do it—and sometimes it's not even obvious what it would mean to solve the problem. Previously (N.B. I don't mean to single out this user, and I certainly don't mean to imply anything about their character; but it's a rather pertinent example), somebody suggested that, because apple's choice to disallow third-party browsers on its phones is politically problematic, it should 'just' expend its massive resources to solve the attendant technical problems. Which is nonsense. By the same token, nobody's holding out on us w.r.t. nuclear fusion.

      Provenance is an interesting area of research in ai (at least, I assume it is interesting to the people who find ai interesting). But it is research. It's not clear to what extent it can be done, and it's not entirely clear what it would even mean to do it.

      1 vote
      1. [2]
        mattgif
        (edited )
        Link Parent

        You're asking a lot of rhetorical questions as "gotchas," but they seem to have fairly clear cut answers.

        What counts as an ai?

        For my purposes, the US Department of State's definition is good enough:

        “The term ‘artificial intelligence’ means a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations or decisions influencing real or virtual environments.”

        You then ask:

        [W]hat counts as a source?

        A source is where the information came from. I'm not sure how that's a puzzle. In your example if you are taking the measurements, then you would be the source. You would probably cite your data set, and include some way to access and scrutinize it. If it's not your data, you would cite wherever you got the data as a source. If you are applying some mathematical methods to them, then presumably you're showing your work so all of the data is there to be reconstructed. If it's non-obvious, you might cite sources showing the validity of applying those methods to the problem at hand. This is how we expect scientific papers to work (including pure mathematics). It's how we expect journalists to behave. Why should AI get off the hook? I say it shouldn't.

        You then lament:

        More generally, I notice a tendency to assume that technical problems can simply be solved.

        We have a problem with AI: It produces unreliable information with no discernible provenance, and companies nevertheless expect us to accept these systems into our lives. To drive our cars. To write the news. To edit our code. To help with term papers.

        I'm saying: It would help mitigate these problems if AI cited its sources. If tech companies can figure that out, great! If not, maybe AI doesn't get to run wild. Maybe its use is heavily restricted.

        5 votes
        1. Moonchild
          (edited )
          Link Parent

          You would probably cite your data set

          In this case, the data set is the entirety of the training data. Which is unreasonably large. All of the training data informs a given response. Some of it perhaps moreso, but it's not clear how to quantify that (what if it takes significant stylistic inspiration from one source?).

          A source is where the information came from

          The output is not denotative; it is narrative. And, many of the more interesting putative uses for the AIs are synthetic, not simply replicative, such that there is not necessarily a source at all.

          We have a problem with AI

          I am not disagreeing with this, but...

          I'm saying: It would help mitigate these problems if AI cited its sources. If tech companies can figure that out, great! If not, maybe AI doesn't get to run wild. Maybe its use is heavily restricted.

          ...another great mitigation would be if the AIs simply stopped being unreliable. My point is that you have proposed an amelioration (provenance), but with no clear definition for it and no indication of its feasibility, it doesn't seem like an interesting point to lead with. Maybe the AIs, as they currently exist, should be regulated and restricted; that seems to me like a much more interesting thing to argue about. And if something else appears that in fact works differently, then we can talk about that at that point.

          3 votes
    6. Grayscail
      Link Parent

      I think researchers themselves would probably like that, but I'm not sure that would be feasible with a language model. Isn't the point to try and aggregate lots of different sources and synthesize them together to get sort of an "average" answer? Once the training data has gone through the algorithm it loses any sort of locality that would let you track a particular response back to the source. You could trace an answer back to the nodes that fed into it, but each of those nodes is influenced by all the training data that has gone through the network, not just the particular sources that ended up contributing to the answer.

      It's like doing a linear regression and then trying to pinpoint which part of your dataset caused the value of the regression line at a certain point to have a certain value. They kind of all did, so you can't exactly give a single answer to that question.

      Which is part of why it's not a good idea to put full faith in the answers such language models give you. They're not really citing exact sources, which is why they sometimes make mistakes even if the correct information was fed into them at some point.

  2. [6]
    updawg
    Link

    Curious if this can really do anything at all.

    They all have long descriptions, but these are the bolded "ledes" of the bullet points:

    New Standards for AI Safety and Security
    • Require that developers of the most powerful AI systems share their safety test results and other critical information with the U.S. government.

    • Develop standards, tools, and tests to help ensure that AI systems are safe, secure, and trustworthy.

    • Protect against the risks of using AI to engineer dangerous biological materials

    • Protect Americans from AI-enabled fraud and deception by establishing standards and best practices for detecting AI-generated content and authenticating official content.

    • Establish an advanced cybersecurity program to develop AI tools to find and fix vulnerabilities in critical software

    • Order the development of a National Security Memorandum that directs further actions on AI and security

    Protecting Americans’ Privacy
    • Protect Americans’ privacy by prioritizing federal support for accelerating the development and use of privacy-preserving techniques

    • Strengthen privacy-preserving research and technologies

    • Evaluate how agencies collect and use commercially available information and strengthen privacy guidance for federal agencies

    • Develop guidelines for federal agencies to evaluate the effectiveness of privacy-preserving techniques

    Advancing Equity and Civil Rights
    • Provide clear guidance to landlords, Federal benefits programs, and federal contractors

    • Address algorithmic discrimination

    • Ensure fairness throughout the criminal justice system

    Standing Up for Consumers, Patients, and Students
    • Advance the responsible use of AI

    • Shape AI’s potential to transform education

    Supporting Workers
    • Develop principles and best practices to mitigate the harms and maximize the benefits of AI for workers

    • Produce a report on AI’s potential labor-market impacts, and study and identify options for strengthening federal support for workers facing labor disruptions

    Promoting Innovation and Competition
    • Catalyze AI research across the United States through a pilot of the National AI Research Resource

    • Promote a fair, open, and competitive AI ecosystem

    • Use existing authorities to expand the ability of highly skilled immigrants and nonimmigrants with expertise in critical areas to study, stay, and work in the United States

    Advancing American Leadership Abroad
    • Expand bilateral, multilateral, and multistakeholder engagements to collaborate on AI

    • Accelerate development and implementation of vital AI standards

    • Promote the safe, responsible, and rights-affirming development and deployment of AI abroad to solve global challenges

    Ensuring Responsible and Effective Government Use of AI
    • Issue guidance for agencies’ use of AI

    • Help agencies acquire specified AI products and services

    • Accelerate the rapid hiring of AI professionals

    Who knows if any of this will make anything better or worse, but it's something.

    2 votes
    1. [4]
      tealblue
      Link Parent

      I strongly disagree with outright promoting the use of AI in education. It'd be one thing to say that we should accept it as a reality and find ways to work around it, but this just sounds like a way to solve the problem of underpaid and underqualified teachers by replacing them with shitty AI tools (that will probably be worse for actual learning than existing online tools like Khan Academy). The purpose of education extends beyond just inserting information into students' brains or creating economically productive citizens. There's immeasurable value to having students be taught by actual (qualified) human teachers.

      13 votes
      1. [2]
        Wolf_359
        Link Parent

        They will do anything except pay teachers.

        I have watched schools spend literal millions on cutting-edge programs that are designed to finally "fix" education.

        My co-teachers and I have already figured out the real solution - smaller class sizes and better home lives. Only one of those can be solved by the school system, and it's rarely being done.

        As a special ed teacher in an integrated classroom, I've found the most effective approach BY FAR is to split our classes in half (gen ed and special ed mixed in evenly in each half of the class) and just teach to a smaller class.

        Same lessons, same grading, same everything with only half the number of kids. The behavior, the grades, the engagement are all through the roof better.

        They'll invest in EVERYTHING except attracting more teachers. And this is during a massive nationwide teacher shortage.

        22 votes
        1. Gummy
          Link Parent

          The high school I graduated from recently built a 6 million dollar football stadium, much to the disgust of underpaid teachers. Gotta keep up that sports image, I guess. The language arts building was crumbling when I was there, and I can guarantee it hasn't seen a dime of their (football) budget.

          6 votes
      2. updawg
        Link Parent

        Agreed. I think with AI we essentially won't need to do research (i.e. literature reviews) and certain kinds of writing...but we will absolutely need to still have those skills and we will need the ability to critically analyze what we are reading even more than we have needed it in the past.

        It goes along with an idea I liked, where writing is conducted in class so that students can't use AI, but more importantly, they have assignments to analyze things that were written by AI to see why it's unreliable, learn to think critically, etc.

        1 vote
    2. sparksbet
      Link Parent

      I'm very glad addressing discrimination and equity with AI is a priority here -- that's by far one of the most important ways the government needs to push back on the largely unregulated industry to prevent AI from being thoughtlessly used to harm those already most vulnerable. How effectively this accomplishes that is going to heavily depend on the details, though.

      2 votes