15 votes

How will AI learn next?

11 comments

  1. [5]
    tnifc
    Link

    Recent history has shown people won't stop generating engagement data. The Reddit exodus never materialized. Facebook is still used in spite of that never-ending circus.

    What concerns me more is that the era of accessible information is over. Things are going to be collated, priced, and put behind paywalls controlled by a few tech giants. The internet, which was once heralded as a great equalizer, is now going to be another tiered system of socioeconomic class.

    15 votes
    1. [3]
      skybrian
      Link Parent

      That's one scenario, but it seems too pessimistic. Take music, for example. It's easy to find lots of music for free on YouTube. Spotify is pretty cheap. Yes, there are ads, and these are big companies, and they may raise prices, but it doesn't seem all that expensive, and hardly a dystopia. The walled gardens are very leaky.

      Also, people can share music easily, though perhaps not openly.

      We have several jukeboxes in the cloud that seem to have everything, though that's an illusion. We've come a long way since I was a kid and we were buying records and cassette tapes and listening to the radio.

      There are also inexpensive ways to make music. Some entry-level gear seems pretty cheap?

      Overall, access to music seems far more widespread and diverse than it's ever been. Do you really see that changing?

      For other kinds of information, I'll just point out that most people don't need to go to the library much anymore, and it doesn't seem like that will change?

      9 votes
      1. [2]
        Quickbeam
        Link Parent

        You've got a solid point there. But I think it's a big grey area when it comes to how open and free everything is. Music is a very tough business to be in, and big labels are really squeezing out maximum profits. Music is indeed more accessible, and more people are able to create something that can be heard by millions. But this also makes it difficult to be found as a small artist, especially when services like Spotify favour artists from labels that pay promotional costs to be featured in popular playlists or recommendations. So the OP of the original comment has a point, but I believe you are right as well. We seem to be in a grey area right now when it comes to how free our information actually is, versus how much of it is given to us because some big label/corporation wants to feed it to us.

        3 votes
        1. skybrian
          Link Parent

          When you talk about "how free our information actually is" I would ask "for whom?"

          Artists and listeners often don't have the same interests. Usually, artists want to make a living. Sometimes they do give away songs for free and rely on Patreon or other forms of revenue. More often, though, the music isn't free to copy because they want to make money off it.

          It's easy to blame this on big labels but there are many small labels and the artists that use them are basically doing the same things. Or you could blame it on big tech, but instead I would say that Spotify and YouTube perform a service and it's not easy to replicate.

          From a listener's point of view, though, some artists being hard to find isn't really a problem. We have more music of more variety than we can possibly listen to, and I don't see that changing.

          1 vote
    2. flowerdance
      Link Parent

      Not only that, but even if so-called "open" LLMs are released (by OpenAI, Meta, or whoever), these are not the models the companies actually use themselves. Furthermore, only corporations with huge resources can run these models quickly and with large token limits. That is, even if Meta releases something like a 150B model, only hardcore GPU farms will be able to churn out responses, in acceptable time, at the extended token counts you get with OpenAI's web-based ChatGPT. OpenAI itself has said in its papers and press releases that it needs millions of dollars to keep ChatGPT running. Not even a Bitcoin farm can do that.

      As you've said, this will only make things even worse. I imagine a future where a "community"-powered compute resource can exist to fight corporations hogging everything, including the very AI that is trained on the whole of human knowledge. We are entering Skynet dystopian levels of stratification. But as it stands, we are fucked.

      1 vote
  2. CosmicDefect
    Link

    I had an interesting interaction with this article. I'm a physicist who was thinking of an interesting way to interact with not only my own research but also to do something like make my own personal Richard Feynman, training it on all his publications, for example. To that end, I was reading the documentation on this repository:

    https://github.com/arc53/DocsGPT

    GPT-powered chat for documentation, chat with your documents

    Five minutes later, I come here and open up this article and come across this paragraph:

    It won’t be long before many of us also start bulk-importing our most private documents into these models. A chatbot hasn’t yet asked me to grant it access to my e-mail archives—or to my texts, calendar, notes, and files. But, in exchange for a capable A.I. personal assistant, I could be tempted to compromise my privacy. A personal-assistant bot might nudge me to install a browser extension that tracks where I go on the Web so that it can learn from my detailed searching and browsing patterns. And ChatGPT and its ilk will soon become “multimodal,” able to fluidly blend and produce text, images, videos, and sound. Most language is actually spoken rather than written, and so bots will offer to help us by transcribing our meetings and phone calls, or even our everyday interactions.

    The bit towards the end where the author discussed making the LLM AI have rudimentary "curiosity" left me both kind of disturbed and excited. There is something called the "rubber ducky" technique, where you explain a problem to, say, a layperson or a literal rubber duck, and the act of explaining the problem leads to a solution. A "research assistant" trained in exactly the research I want to discuss would be incredibly exciting, especially if it could ask me questions back. I normally do this with my collaborators, but our time together is often limited. An AI version would be ever-present and available.

    8 votes
  3. [5]
    Grasso
    Link

    Companies will monopolize their users' data and build their chat bots from their own data. Microsoft, Meta, and Google will change their user agreements to allow for it and be in a good position, as they own so many platforms with so many users. I also see Microsoft taking a cue from its gaming division and starting to buy companies with the data (Reddit, for example) if they need more places to feed data from.

    So mega companies will once again be able to utilize their positions to monopolize the AI training field as more platforms lock down their content. I don’t see Reddit, or the BBC, which also recently took steps to stop companies from training on its content, stepping up and hiring a massive number of AI researchers to capitalize on their data and jump into a new market.

    What’s the solution? I don’t know, but it’s sad to see that you will need a legal army before being able to try to make a startup that competes going forward.

    2 votes
    1. [4]
      skybrian
      Link Parent

      I'm wondering what you mean when you say companies will "monopolize" their users' data? These companies aren't the same, but I don't think they're preventing people from using their own data to train AI or do whatever they like with it.

      For Microsoft, the files are stored on your own computer (running Windows) and you have many apps available to do what you like with them. Facebook and Google let you download your data, which isn't as convenient but is certainly doable.

      (That's different from already having the data of millions of users to train on, though.)

      1 vote
      1. [3]
        Grasso
        Link Parent

        They write their terms of service to prevent scraping of data. Reddit just went through this and the collateral damage killed third-party apps.

        You can certainly download your own data, but these models are not trained on individual users' data. Even the most prolific users don’t generate enough data, and even if they did, it would still only be one-sided. You need vast quantities of user interactions to feed into the training of these models.

        These giant companies are in a unique position to take the data that users generate to train their next AI platforms. They aren’t going to hand over that data to competitors.

        3 votes
        1. [2]
          skybrian
          Link Parent

          Yes, that sounds about right.

          But with respect to AIs, I'm wondering where the users are in all this. Do they want to contribute their data to train arbitrary AIs? Personally, I don't mind, at least for data I've shared publicly, but most of the online sentiment I've read is against this.

          So, when Reddit or some other social network prevents AIs from scraping its data, it's sort of doing what its users want. (Even though preventing all third-party access is not what users want.)

          What happens next? Well, lots of people are using OpenAI directly, so they will get user data that way. Google will use the data from their own users to improve their own services, as they have from the beginning. (It's optional; everyone sees a checkbox for this.)

          I can imagine independent organizations that collect training data, sort of like how OpenStreetMap became an alternative to Google Maps. If history is any guide, they will be a few years behind commercial offerings, but might eventually become reasonable alternatives.

          1. Grasso
            Link Parent

            It's tough. I think that many aren't fans of providing their content for free to a massive company that will turn around and charge a monthly fee to provide a convenient query interface for it to regurgitate the content without attribution.

            Search engines had the benefit that when users made a search query, your content was presented at the end in whatever format and context you originally wrote it. Over the years, search (Google) has reached into whatever format you submitted the content in, extracted the relevant piece, and pulled it onto the results page in their enhanced search results, removing the need for the user to actually load your webpage and perhaps see other interesting content and ads.

            Chat bots go a step further and fully remove the possibility of seeing the website with the content, just giving the information to users as if the chat bot had answered the question itself. The content that allowed it to get the answer is completely obscured. I guess some of the bots will give you a source, but who will actually click that link when the answer was already given?

            As far as what happens next? Who knows. Will the companies actually figure out how to monetize these services? There have been a couple of articles posted on Tildes showing that they are already struggling to make that happen. It's early days and they are burning money to get people hooked; who knows how long they can turn the screws and keep enough people paying to turn a profit. Will this whole hype wave turn into another smart assistant story, where Siri came out and everyone was amazed until it stagnated and was ultimately mostly forgotten?

            The problem with independent organizations trying to compete is the huge amount of money needed to run these models. The hardware requirements at the moment are huge. They will come down with time, but by then the big players will be even further ahead. It will be quite an uphill battle for them to be relevant.

            1 vote