Quite frequently I come across scanned books that are viewable for free online. For example, a publisher puts up preview chapters, or a library puts up old books from its collection that are in the public domain. Since I like hoarding data, and the online viewers used to present the books are often not very practical, I frequently try to download the books one way or another. This requires toying with the “inspect element” tool and various other methods of getting at the images or the PDF.

Now, all I access is what is, well, accessible; I don’t hack into servers or anything. But the material is meant to be hidden from the normal user. Does that act of hiding it, no matter how primitive and easily circumvented, mean that I’m not allowed to access it at all?

I suppose ripping a public domain book is no big deal, but would books under copyright fare differently?

Mainly I’m asking out of curiosity; I don’t expect the police to come visit me for ripping a 16th-century dictionary.

Note: I live in the EU, but I’d be curious to hear how this is treated elsewhere too.

Edit: I also remembered a funny trick I noticed on one site: it lets you view PDFs on the website but not download them unless you pay. But when you load the page, even without paying, the PDF is already downloaded onto your computer and can be found in the browser cache. Is it legal to simply save a file that is already on your computer?

  • simple@lemm.ee
    2 months ago

    AFAIK web scraping (grabbing and downloading any data you can see on the internet) isn’t illegal, and I would assume downloading PDFs that a site serves to you falls under that. If the material is copyrighted, though, sharing it would probably be illegal.

    • nvermind@lemm.ee
      2 months ago

      This. In a case involving LinkedIn (hiQ Labs v. LinkedIn), courts ruled that it’s legal to scrape publicly available data. The company doing the scraping was selling that data to corporate customers, but what you may ultimately do with scraped data might depend on the information you’re accessing and under what permissions. (Not a lawyer)

      • papalonian@lemmy.world
        2 months ago

        If you scraped a pirate site and stored a bunch of links to copyrighted content, you’d probably be fine; actually using those links to download or share copyrighted content is what’s illegal. It’d be like buying the stuff to make a bomb or drugs, but then not making any bombs or drugs.

        That being said, while not necessarily illegal, I wouldn’t want authorities to find my bomb and drug ingredients, or my scraped piracy links, as I’d probably have some 'splainin to do.

        (Not a lawyer)

          • papalonian@lemmy.world
            2 months ago

            Who said that you can’t scrape content from the other place?

            If you scraped a pirate site and stored a bunch of links to copyrighted content you’d probably be fine,

            If you’re referring to the last line: I say I wouldn’t want authorities to find it because I don’t want to have to explain it. I’m 99% sure nobody stores links to a bunch of pirated content just for fun; they have probably accessed said content. So now you have to explain to the authorities why you have links to pirated content without implicating yourself in copyright infringement.

            Like I said, probably fine, I just wouldn’t want the hassle if I somehow got caught.

              • papalonian@lemmy.world
                2 months ago

                Sorry man, I’m not exactly sure what you’re asking.

                If you are able to load the content on your computer without infringing copyright law, you’re allowed to circumvent whatever the website has in place and store whatever data you like from whatever website you like, regardless of the nature of the site, so long as the content is legal (is not CP) and, again, is not being presented in a way that infringes the aforementioned copyright law.

                If you’re asking why the copyright laws exist, I can’t really help you with that one.