The New York Times blocks OpenAI’s web crawler

The New York Times has officially blocked GPTBot, OpenAI’s web crawler. The outlet’s robots.txt page specifically disallows GPTBot, preventing OpenAI from scraping content from its website to train AI models.
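For readers unfamiliar with how a robots.txt rule targets a single crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example URL below are hypothetical illustrations, not a copy of the NYT's actual file, and `is_allowed` is a helper name made up for this example. Note that robots.txt is advisory: it only has an effect if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration,
# not the NYT's real file): block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/some-article"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the file over the network before `can_fetch()` is called.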

    • poke@sh.itjust.works
      English · 3 upvotes · 1 year ago

      They made a flag specifically for their crawler, so they can say that they do but in the most annoying way possible.

  • kucuva@lemmy.ml
    English · 5 upvotes · 1 year ago

    What is the AI being trained for anyway, how to be an NYT journalist?

  • AutoTL;DR@lemmings.world
    English · 5 upvotes · 1 year ago

    This is the best summary I could come up with:


    Based on the Internet Archive’s Wayback Machine, it appears NYT blocked the crawler as early as August 17th.

    The change comes after the NYT updated its terms of service at the beginning of this month to prohibit the use of its content to train AI models.

    OpenAI didn’t immediately reply to a request for comment.

    The NYT is also considering legal action against OpenAI for intellectual property rights violations, NPR reported last week.

    If it did sue, the Times would be joining others like Sarah Silverman and two other authors who sued the company in July over its use of Books3, a dataset used to train ChatGPT that may have thousands of copyrighted works, as well as Matthew Butterick, a programmer and lawyer who alleges the company’s data scraping practices amount to software piracy.

    Update August 21st, 7:55PM ET: The New York Times declined to comment.


    The original article contains 202 words, the summary contains 146 words. Saved 28%. I’m a bot and I’m open source!

  • porkins@lemmy.basedcount.com
    English · 5 upvotes, 30 downvotes · edited · 1 year ago

    This goes against everything that the NYT preaches in terms of saying that the press is under attack and needs to be protected. AI consumption of news content makes the news more accessible. Their paid articles don’t overlap with what ChatGPT is doing. This is really a bunch of old people getting butt hurt about tech they don’t fully understand.

    • Kachilde@lemmy.world
      English · 28 upvotes, 1 downvote · 1 year ago

      While I am no fan of the pricing models of the NYT and other news sites, I don’t think that this goes against “protecting the press”. Journalists do a job. They research, compile, draft, and write articles in their own voice (or the voice of the news outlet). They are paid for this work. OpenAI wants to scrape the words off news sites so that their language model can regurgitate them for free.

      This is the AI Art thing all over again. Creators should be paid for their work.

      • porkins@lemmy.basedcount.com
        English · 0 upvotes, 3 downvotes · 1 year ago

        Maybe you are not fully thinking about the capabilities of AI. There are ones that are enriched with recent data, so you can ask them about recent events. Also, I do ask it about historical information, so it is nice to have that available.

    • JdW@lemmy.world
      English · 13 upvotes · 1 year ago

      “AI consumption of news content makes the news more accessible.”

      If journalists and their platforms do not get paid, their articles won’t get written. So no, the free absorption of professional articles into an LLM that uses the article to answer a Pokémon question online in 6 months is not making “news” more “accessible”.

      • porkins@lemmy.basedcount.com
        English · 1 upvote, 2 downvotes · 1 year ago

        It’s more of an archive of historical knowledge. Thinking it just answers Pokémon questions is shortsighted.