Hey
I have a very simple question: does anything I type on Lemmy actually get indexed by Googlebot and other big-tech proprietary web scrapers?
Thanks!
As a general rule, you should assume everything you put on the internet is accessible. Unless it’s encrypted with a key you provide, and I mean a full RSA key, it’s going to be accessible.
The notion of privacy on the internet is not a thing. Everything you write is going to be collected and aggregated, especially in the world of AI.
What you can do is use an alt account with no info, and if it’s really sensitive use a VPN to hide where it came from. For private messaging do not use anything but an encrypted chat service.
Anything on the web that is accessible by you is accessible by web scrapers. So to answer your question, yes.
Unless Lemmy goes down the Twitter route (I refuse to call it by its new name, it’s dumb) and blocks access unless you log in. That’s just how things are.
Can they just block them with robots.txt?
On every federated instance? Unlikely, but possible.
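For anyone curious what blocking via robots.txt would look like, here's a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `lemmy.example` host, and the `is_allowed` helper are all made up for illustration; no real instance is known to serve these rules. Keep in mind robots.txt is only a request that well-behaved crawlers choose to honor, so it doesn't stop a scraper that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might serve (illustrative only).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://lemmy.example/post/1"  # placeholder URL, not a real instance
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Every instance admin would have to publish rules like this themselves, which is why covering the whole federation is unlikely.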
If ActivityPub becomes mainstream, then the indexers will get a direct feed instead of scraping it from instances.
Answer
You couldn’t have put that into the title? Jeez.
Ahhh, that’s so dumb
Yes, but unfortunately since Lemmy is decentralized you can’t do the site:example.org thing.
You can get fairly close:
https://www.google.com/search?q=lemmy+inurl%3A"%2Fc%2F"+
lemmy inurl:"/c/" QUERY
kinda hacky, but it only returns lemmy results for me… heh, lots of duplicates.
Edit: made the example query less specific.
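If you want to generate that search link for any query, here's a minimal sketch with Python's standard `urllib.parse`. The `lemmy_search_url` function name is my own invention, and it simply reuses the `lemmy inurl:"/c/"` trick from above, so it inherits the same hackiness and duplicates.

```python
from urllib.parse import urlencode


def lemmy_search_url(query: str) -> str:
    """Build a Google search URL restricted to pages with "/c/" in the URL."""
    # Same operator as in the comment above, followed by the user's terms.
    q = f'lemmy inurl:"/c/" {query}'.strip()
    # urlencode percent-encodes the quotes, colon and slashes, and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": q})


if __name__ == "__main__":
    print(lemmy_search_url("selfhosted"))
```

Swap "selfhosted" for whatever you're actually looking for.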