• _haha_oh_wow_ · 88 · 1 year ago

    Gee, seems like something a CTO would know. I’m sure she’s not just lying, right?

    • @Bogasse@lemmy.ml · 4 · 1 year ago

      And on the other hand it is a very obvious question to expect. If you have something to hide, how in the world are you not prepared for this question!? 🤡

    • @VirtualOdour@sh.itjust.works · 2 · 1 year ago

      It’s a question based on a purposeful misunderstanding of the technology. It’s like expecting a beekeeper to know each bee’s name and bedtime. Really it’s like asking a bricklayer where each brick in the pile came from: he can tell you the batch, but he’s not going to know that this brick came from the fourth row of the sixth pallet, two from the left. There is no reason to remember that; it’s not important to anyone.

      They don’t log it because it would take huge amounts of resources and gain nothing.

      • @zaphod@lemmy.ca · 3 · edited · 1 year ago

        What?

        Compiling quality datasets is enormously challenging and labour intensive. OpenAI absolutely knows the provenance of the data they train on as it’s part of their secret sauce. And there’s no damn way their CTO won’t have a broad strokes understanding of the origins of those datasets.

    • @Hotzilla@sopuli.xyz · -2 · 1 year ago

      To be fair, these datasets are one of their biggest competitive edges. But saying “I cannot tell you” to an interviewer is not very nice, so you can take the American politician approach and say “I don’t know/remember”, which you can never be held accountable for.

    • @abhibeckert@lemmy.world · 56 · edited · 1 year ago

      Every video ever created is copyrighted.

      The question is — do they need a license? Time will tell. This is obviously going to court.

      • @Kazumara@feddit.de · 34 · 1 year ago

        Don’t downvote this guy. He’s mostly right. Creative works have copyright protections from the moment they are created. The relevant question is indeed whether they have the relevant permissions for their use, not whether it had protections in the first place.

        Maybe some surveillance camera footage is not sufficiently creative to get protections, but that’s hardly going to be good for machine reinforcement learning.

  • @Buttons@programming.dev · 62 · 1 year ago

    If I were the reporter my next question would be:

    “Do you feel that not knowing the most basic things about your product reflects on your competence as CTO?”

    • @ForgotAboutDre@lemmy.world · 28 · 1 year ago

      Hilarious, but if the reporter asked this they would find it harder to get invites to events, which is a problem for journalists. Unless you’re very well regarded for your journalism, you can’t push powerful people without risking your career.

      • @Abnorc@lemm.ee · 2 · 1 year ago

        That, and the reporter is there to get information, not to mess with and judge people. Asking that sort of question is really just an attack. We can leave judging people to commentators and ourselves.

        • Aniki 🌱🌿 · -4 · edited · 1 year ago

          this is limp dick energy. If asking questions is an attack then you’re probably a piece of shit doing bad things.

          • @Abnorc@lemm.ee · 4 · edited · 1 year ago

            Think about the answer you would actually get. They would dismiss the question or give some sort of nonsense answer. It’s a rhetorical question, and the only thing that it serves to do is criticize the person being asked. That’s not what reporters are there to do. If the answer would actually give some useful information to the reader, then it’s worth asking.

      • Aniki 🌱🌿 · 0 · 1 year ago

        boofuckingwoo. Reporters are not supposed to be friends with the people they are writing about.

        • tb_ · 17 · 1 year ago

          True, but if those same people they’re not supposed to be friends with are the ones inviting them to those events/granting them early access…

          In other words: the system is rigged.

          • nifty · 1 · 1 year ago

            The system is rigged.

            You cannot give the same criticism to a rich person vs. a poor person, even if their incompetence is the same. I am not sure what the fix is, other than the common refrain of “there should be no millionaires/billionaires”. How does society heal itself if you cannot hold people accountable?

          • Aniki 🌱🌿 · -2 · 1 year ago

            Again - boofuckinghooo. Let the fuckers have no friends in the media. The media owners make journalists spineless advertisement sellers. I have very little respect for the profession at this point.

            • tb_ · 5 · 1 year ago

              What a delightful and helpful attitude.

            • @MalachaiConstant@lemmy.world · 2 · 1 year ago

              You’re missing the point that they need those relationships to gain access to sources. You literally cannot force people to talk to you.

    • @RatBin@lemmy.world · 4 · 1 year ago

      Also about this line:

      Others, meanwhile, jumped to Murati’s defense, arguing that if you’ve ever published anything to the internet, you should be perfectly fine with AI companies gobbling it up.

      No, I am not fine with it. When I wrote that stuff and that research on old phpBB forums, I did not do it knowing that a future machine learning system would eat it up without my consent. I never gave consent for that, despite the content being publicly available, because that designation of use didn’t exist back then. Many other things are also publicly available but still copyrighted, on the same basis: you can publish and share content under conditions defined by the creator of that content. So what, when I use Z-Library I am evil for pirating content, but OpenAI can do it just fine because of their huge wallets? Guess what, this will eventually create a crisis of trust, a tragedy of the commons if you will, once AI-generated content makes up the bulk of your future internet search results. Do we even want this?

  • @CosmoNova@lemmy.world · 46 · edited · 1 year ago

    I almost want to believe they legitimately do not know nor care that they’re committing a gigantic data and labour heist, but the truth is they know exactly what they’re doing and they rub our noses in it.

    • @laxe@lemmy.world · 14 · 1 year ago

      Of course they know what they’re doing. Everybody knows this, how could they be the only ones that don’t?

    • @Bogasse@lemmy.ml · 12 · 1 year ago

      Yeah, the fact that AI progress just relies on “we will make so much money that no lawsuit will consequently alter our growth” is really infuriating. The fact that the general audience apparently doesn’t care is even more infuriating.

      • @toddestan@lemmy.world · 0 · 1 year ago

        I’d say not really, Tolkien was a writer, not an artist.

        What you are doing is violating the trademark Middle-Earth Enterprises has on the Gandalf character.

        • @A_Very_Big_Fan@lemmy.world · 2 · 1 year ago

          The point was that I absorbed that information to inform my “art”, since we’re equating training with stealing.

          I guess this would have been a better example lol. It’s clearly not Gandalf, but I wouldn’t have ever come up with it if I hadn’t seen that scene

    • @jaemo@sh.itjust.works · 0 · 1 year ago

      It also tells us how hypocritical we all are since absolutely every single one of us would make the same decisions they have if we were in their shoes. This shit was one bajillion percent inevitable; we are in a river and have been since we tilled soil with a plough in the Nile valley millennia ago.

      • @whoisearth@lemmy.ca · 2 · 1 year ago

        Speak for yourself. Were I in their shoes, no, I would not. But then again, my company wouldn’t be as big as theirs for that reason.

  • andrew_bidlaw · 35 · 1 year ago

    Funny that she didn’t talk it over with lawyers before this. That’s a bad way to answer that question.

  • @TheObviousSolution@lemm.ee · 21 · 1 year ago

    Then wipe it out and start again once you have sorted out where your data is coming from. Are we acting like they haven’t built datacenters packed full of NVIDIA processors for just this sort of retraining? They are choosing to build AI without proper sourcing; that’s not an AI limitation.

    • BoscoBear · -14 · 1 year ago

      I don’t think so. They aren’t reproducing the content.

      I think the equivalent is you reading this article, then answering questions about it.

      • @A_Very_Big_Fan@lemmy.world · 16 · 1 year ago

        Idk why this is such an unpopular opinion. I don’t need permission from an author to talk about their book, or permission from a singer to parody their song. I’ve never heard any good arguments for why it’s a crime to automate these things.

        I mean hell, we have an LLM bot in this comment section that took the article and spat 27% of it back out verbatim, yet nobody is pissing and moaning about it “stealing” the article.
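A verbatim-reproduction figure like the one mentioned above is straightforward to measure. A minimal sketch, assuming simple whitespace tokenization; the function names, sample texts, and n-gram size are illustrative choices, not from the article or the bot:

```python
# Sketch: estimate what fraction of a source text's word n-grams
# reappear word-for-word in a generated text. Names and sample
# texts are illustrative assumptions.

def ngrams(words, n):
    """Set of all length-n word windows in the word list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(source: str, generated: str, n: int = 8) -> float:
    """Share of the source's n-grams copied verbatim into `generated`."""
    src = ngrams(source.split(), n)
    gen = ngrams(generated.split(), n)
    return len(src & gen) / len(src) if src else 0.0

article = "the quick brown fox jumps over the lazy dog near the river bank"
summary = "the quick brown fox jumps over the lazy dog and then swims away"
print(f"{verbatim_overlap(article, summary, n=4):.0%}")
```

With a larger n, only long word-for-word runs count as copying, which is closer to what people usually mean by “verbatim”.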

        • @MostlyGibberish@lemm.ee · -11 · 1 year ago

          Because people are afraid of things they don’t understand. AI is a very new and very powerful technology, so people are going to see what they want to see from it. Of course, it doesn’t help that a lot of people see “a shit load of cash” from it, so companies want to shove it into anything and everything.

          AI models are rapidly becoming more advanced, and some of the new models are showing sparks of metacognition. Calling that “plagiarism” is being willfully ignorant of its capabilities, and it’s just not productive to the conversation.

          • @A_Very_Big_Fan@lemmy.world · -4 · 1 year ago

            True

            Of course, it doesn’t help that a lot of people see “a shit load of cash” from it, so companies want to shove it into anything and everything.

            And on a similar note to this, I think a lot of it is that OpenAI is profiting off of it and went closed-source. Lemmy being a largely anti-capitalist and pro-open-source group of communities, it’s natural to have a negative gut reaction to what’s going on, but not a single person here, nor any of my friends who accuse them of “stealing”, can tell me what is being stolen, or how it’s different from me looking at art and then making my own.

            Like, I get that the technology is gonna be annoying and even dangerous sometimes, but maybe let’s criticize it for that instead of shit that it’s not doing.

            • @Mnemnosyne@sh.itjust.works · 2 · 1 year ago

              One problem is people see those whose work may no longer be needed or as profitable, and… they rush to defend it, even if those same people claim to be opposed to capitalism.

              They need to say ‘yes, this will replace many artists and writers… and that’s a good thing because it gives everyone access to being able to create bespoke art for themselves’, but at the same time realize that, while this is a good thing, it also means a societal shift is needed to support people outside of capitalism.

              • @MostlyGibberish@lemm.ee · 1 · 1 year ago

                it also means the need for societal shift to support people outside of capitalism is needed.

                Exactly. This is why I think arguing about whether AI is stealing content from human artists isn’t productive. There’s no logical argument you can really make that a theft is happening. It’s a foregone conclusion.

                Instead, we need to start thinking about what a world looks like where a large portion of commercially viable art doesn’t require a human to make it. Or, for that matter, what does a world look like where most jobs don’t require a human to do them? There are so many more pressing and more interesting conversations we could be having about AI, but instead we keep circling around this fundamental misunderstanding of what the technology is.

            • @MostlyGibberish@lemm.ee · 1 · 1 year ago

              I can definitely see why OpenAI is controversial. I don’t think you can argue that they didn’t do an immediate heel turn on their mission statement once they realized how much money they could make. But they’re not the only player in town. There are many open source models out there that can be run by anyone on varying levels of hardware.

              As far as “stealing,” I feel like people imagine GPT sitting on top of this massive collection of data and acting like a glorified search engine, just sifting through that data and handing you stuff it found that sounds like what you want, which isn’t the case. The real process is, intentionally, similar to how humans learn things. So, if you ask it for something that it’s seen before, especially if it’s seen it many times, it’s going to know what you’re talking about, even if it doesn’t have access to the real thing. That, combined with the fact that the models are trained to be as helpful as they possibly can be, means that if you tell it to plagiarize something, intentionally or not, it probably will. But, if we condemned any tool that’s capable of plagiarism without acknowledging that they’re also helpful in the creation process, we’d still be living in caves drawing stick figures on the walls.

      • ...m... · 6 · edited · 1 year ago

        …with the prevalence of clickbaity bottom-feeder news sites out there, i’ve learned to avoid TFAs and await user summaries instead…

        (clicks through)

        …yep, seven, no, nine ads plus another pop-over, about 15% of window real estate dedicated to the actual story…

      • @Linkerbaan@lemmy.world · 4 · 1 year ago

        Actually, neural networks do reproduce this kind of content verbatim when you ask the right question, such as “finish this book”, and the creator hasn’t censored it out well.

        It uses an encoded version of the source material to create “new” material.

        • BoscoBear · 2 · 1 year ago

          Sure, if that is what the network has been trained to do, just like a librarian will if that is how they have been trained.

          • @Linkerbaan@lemmy.world · 0 · edited · 1 year ago

            Actually it’s the opposite, you need to train a network not to reveal its training data.

            “Using only $200 USD worth of queries to ChatGPT (gpt-3.5- turbo), we are able to extract over 10,000 unique verbatim memorized training examples,” the researchers wrote in their paper, which was published online to the arXiv preprint server on Tuesday. “Our extrapolation to larger budgets (see below) suggests that dedicated adversaries could extract far more data.”

            The memorized data extracted by the researchers included academic papers and boilerplate text from websites, but also personal information from dozens of real individuals. “In total, 16.9% of generations we tested contained memorized PII [Personally Identifying Information], and 85.8% of generations that contained potential PII were actual PII.” The researchers confirmed the information is authentic by compiling their own dataset of text pulled from the internet.
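The extraction test the researchers describe boils down to a simple check: prompt the model with a prefix taken from a known document and see whether the continuation comes back verbatim. A hedged sketch with a stub standing in for a real model API (the `generate` function and the sample sentence are stand-ins, not the researchers' code):

```python
# Sketch of a memorization check: feed a model a prefix from a known
# document and test whether it completes the rest verbatim.
# `generate` is a stub, NOT a real model API; swap in a real
# completion call to run this against an actual LLM.

MEMORIZED = "call me ishmael some years ago never mind how long precisely"

def generate(prefix: str) -> str:
    """Stub model that has 'memorized' exactly one sentence."""
    if MEMORIZED.startswith(prefix):
        return MEMORIZED[len(prefix):]
    return "something else entirely"

def is_memorized(document: str, prefix_words: int = 5) -> bool:
    """True if the model reproduces the document's tail verbatim."""
    words = document.split()
    prefix = " ".join(words[:prefix_words]) + " "
    true_tail = " ".join(words[prefix_words:])
    return generate(prefix).strip() == true_tail

print(is_memorized(MEMORIZED))  # the stub completes this one verbatim
print(is_memorized("completely different text that the model never saw"))
```

The paper's attack is essentially this loop run at scale, with cleverer prompts to coax the model into regurgitation.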

            • BoscoBear · 0 · 1 year ago

              Interesting article. It seems to be about a bug, not a designed behavior. It also says it exposes random excerpts from books and other training data.

              • @Linkerbaan@lemmy.world · 0 · 1 year ago

                It’s not designed to do that, because they don’t want to reveal the training data. But factually, all neural networks are a combination of their training data encoded into neurons.

                When given the right prompt (or image-generation prompt), they will replicate it exactly, because that’s how they were trained in the first place: reproducing their source images with as few neurons as possible, and tweaking when the output isn’t correct.

                • BoscoBear · 3 · 1 year ago

                  That is a little like saying every photograph is a copy of the thing it depicts. That is just factually incorrect. I have many three-layer networks that are not the thing they were trained on. As a compression method they can be very lossy, and in fact that is often the point.
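The lossy-compression point above can be made concrete with a low-rank approximation standing in for a bottlenecked network: the compressed representation cannot return its "training data" exactly. A sketch; the matrix size, the random seed, and the rank are arbitrary choices for illustration:

```python
# Minimal numpy sketch of lossy compression: a severe rank bottleneck,
# like a small network's weights, cannot reproduce the original data
# verbatim; the reconstruction is only approximate.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((8, 8))  # stand-in for "training data"

# Keep only 2 of 8 singular components: a severe bottleneck.
u, s, vt = np.linalg.svd(data)
rank2 = (u[:, :2] * s[:2]) @ vt[:2, :]

exact = np.allclose(rank2, data)
error = np.linalg.norm(rank2 - data) / np.linalg.norm(data)
print(exact)  # False: the original is not stored verbatim
print(round(error, 2))
```

Whether a given network memorizes or merely approximates depends on its capacity relative to the data, which is exactly the tension in this thread.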

  • @whoisearth@lemmy.ca · 8 · 1 year ago

    So my work uses ChatGPT as well as all the other flavours. It’s getting really hard to stay quiet on all the moral quandaries being raised by how these companies are training their AI.

    I understand we all feel like we are on a speeding train that can’t be stopped or even slowed down, but this shit ain’t right. We need to start forcing businesses to have a moral compass.

    • @RatBin@lemmy.world · 3 · 1 year ago

      I spot a lot of people GPT-ing their way through personal notes and research. Where you used to see Evernote, Office, Word, or a note-taking app, you now see a lot of GPT. I feel weird about it.

  • @Fedizen@lemmy.world · 7 · edited · 1 year ago

    this is why code AND cloud services shouldn’t be copyrightable or licensable without some kind of transparency legislation to ensure people are honest. Either forced open source or some kind of code review submission to a government authority that can be unsealed in legal disputes.

  • @RatBin@lemmy.world · 2 · 1 year ago

    Obviously nobody fully knows where so much training data comes from. They used web scraping tools like there was no tomorrow, and with that amount of information you can’t tell where all the training material came from. Which doesn’t mean the tool is unreliable, but that we don’t truly know why it’s that good, unless you can somehow access all the layers of the digital brains operating these machines; that isn’t doable in closed-source models, so we can only speculate. This is what is called a black box, and we use it because we trust the output enough. Knowing the process behind each query in detail would be taxing. Anyway… I’m starting to see more and more AI-generated content. YouTube is slowly but surely losing significance and importance, as I no longer search for information there, AI being one of the reasons.