Is anyone actually surprised by this?
Time to make up a conspiracy about key leading figures, and laugh as they disappear each other.
Everyone must ask to see Xi jing jing ping pong nudes! But without mentioning Xi or nudes.
That would be a great way of poisoning their plans.
Isn’t it open source? If so it should be near trivial to get rid of all of that.
If it’s closed source I wouldn’t touch it with a ten-foot pole. It’s the same reason I rarely use ChatGPT: it’s just freely giving away your personal data to OpenAI.
Let the FUD begin!
I trust DeepSeek’s open source release if it allows me to copy and review it. I don’t trust OpenAI products like ChatGPT.
Is DeepSeek open source?
Hugging Face researchers are trying to build a more open version of DeepSeek’s AI ‘reasoning’ model
Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.
The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loath to reveal its secret sauce.
The runner is open source, and that’s what matters in this discussion. If you host the model on your own servers, you can ensure that no corporation (American or Chinese) has access to your data. Access to the training code and data is irrelevant here.
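For what it’s worth, this is all it takes once the model is hosted on your own box. A rough sketch, assuming something like Ollama is serving it on localhost:11434 (the port, endpoint, and model tag all depend on your runner):

```python
# Rough sketch: talk to a model you host yourself, so nothing leaves the box.
# Assumes an Ollama-style server on localhost:11434 and a model tag you've
# already pulled -- both are assumptions, adjust for your own runner.
import json
import urllib.request

payload = json.dumps({
    "model": "deepseek-r1",   # placeholder tag, use whatever your runner lists
    "prompt": "Explain in one sentence why local inference keeps data private.",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",   # loopback only, no third party
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```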
The guys at HF (and many others) appear to have a different understanding of Open Source.
As the Open Source AI Definition says, among other things:
Data Information: Sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system. Data Information shall be made available under OSI-approved terms.
- In particular, this must include: (1) the complete description of all data used for training, including (if used) of unshareable data, disclosing the provenance of the data, its scope and characteristics, how the data was obtained and selected, the labeling procedures, and data processing and filtering methodologies; (2) a listing of all publicly available training data and where to obtain it; and (3) a listing of all training data obtainable from third parties and where to obtain it, including for fee.
Code: The complete source code used to train and run the system. The Code shall represent the full specification of how the data was processed and filtered, and how the training was done. Code shall be made available under OSI-approved licenses.
- For example, if used, this must include code used for processing and filtering data, code used for training including arguments and settings used, validation and testing, supporting libraries like tokenizers and hyperparameters search code, inference code, and model architecture.
Parameters: The model parameters, such as weights or other configuration settings. Parameters shall be made available under OSI-approved terms.
- The licensing or other terms applied to these elements and to any combination thereof may contain conditions that require any modified version to be released under the same terms as the original.
These three components (data, code, parameters) shall be released under the same conditions.
Chinese company does what American companies have done for 25+ years now!
Is it time for REAL data privacy laws or are we just gonna keep playing whack-a-mole with Chinese tech companies that get us nowhere?
Our data’s just too valuable for these parasites. Data privacy laws may eventually pass to compel software companies to store everything in US servers only.
Excellent point. If that’s the case though, then wouldn’t other countries follow suit, which still limits big tech’s reach and makes them less profitable and less powerful? Idk. Guess we’ll see how it plays out. Either way, I’m staying as far from those ecosystems as possible to at least try to mitigate some of what they do. I’ll never be totally successful, the genie is out of the bottle, but we can at least attempt.
If you think the American companies do anything different you’re not paying attention and simply believing the propaganda.
Run it locally then
Thanks, managed to get it installed locally via PocketPal (Termux was constantly giving me compile errors). Out of curiosity, I made a very “interesting” prompt, and frankly I am not even surprised.
EDIT: decided to be a little spicier, didn’t fail to amuse me
That’s so edgy, man. I bet all the girls in your high school think you’re the raddest.
This is mildly pedantic but you’re not actually running Deepseek R1, you’re running a 7B version of Qwen that’s been fine-tuned on Deepseek R1 outputs. All of the “distilled” models are existing models trained on R1.
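If you want to check for yourself, the model config gives it away. A quick sketch (the Hugging Face repo ID is from memory, so double-check it before relying on this):

```python
# Pull just the config (a few KB, no weights) and look at the architecture.
# The repo ID below is from memory -- verify the exact name on Hugging Face.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
print(cfg.model_type)      # expect "qwen2": a Qwen model...
print(cfg.architectures)   # ...fine-tuned on R1 outputs, not R1 itself
```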
Nice catch. I’ll be sure to run the real thing afterwards.
I swear people do not understand how the internet works.
Anything you use on a remote server is going to be seen to some degree. They may or may not keep track of you, but you can’t be surprised if they are. If you run the model locally, there is no indication it is sending anything anywhere. It runs using the same open source LLM tools that run all the other models you can run locally.
This is very much like someone doing a surprised pikachu when they find out that Facebook saves all the photos they upload to Facebook, or that Gmail can read your email.
The telephone company knows your phone number!
Same as Chrome’s omnibox or the Android keyboard, no? So in the end, is it fine when the USA does it because “democracy” (never mind the napalm), but bad when China does it because of human rights violations (as if the USA never did anything like that)?
Seriously this. Nothing that China is accused of doing is any worse than what i know America has done. If it’s the Chinese Communist Party stealing your data at least you know it won’t be used to inject ads everywhere you go on the internet
At least they’re transparent about it, unlike American companies that hide behind convoluted terms of service and then sell the data behind your back in a way that’s technically legal.
China’s like “yeah we collect everything”. I can appreciate the honesty.
Is Deepseek Open Source?
No.
As opposed to Microsoft, Google, … NSA, or GCHQ servers. Or all of the above.
Yeah, but these are wholesome domestic spy agencies that are just looking after you and protecting you from yourself.
Fuck. They told me that they were storing my backups.
I run it locally on a 4080, how do they capture my keystrokes?
No idea, have you monitored the package / container for network activity?
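Something like this would do for a first pass, assuming the runner shows up as an ollama process (swap in whatever you’re actually using) and psutil is installed; on Linux you may need root to see which process owns each connection:

```python
# Rough sketch: list every open inet connection belonging to the local LLM
# runner, to see whether it ever talks to anything beyond loopback.
import psutil

# PIDs of the runner -- "ollama" is an assumption, adjust the name as needed.
llm_pids = {p.pid for p in psutil.process_iter(["name"])
            if p.info["name"] and "ollama" in p.info["name"].lower()}

for conn in psutil.net_connections(kind="inet"):
    if conn.pid in llm_pids:
        remote = f"{conn.raddr.ip}:{conn.raddr.port}" if conn.raddr else "(listening)"
        print(conn.pid, conn.status, "->", remote)
```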
Perhaps this refers to other clients not running the model locally.
Yea, I am looking for a deep analysis of this as well
It doesn’t. They run using stuff like Ollama or other LLM tools, and all of the hobbyist ones are open source. All the model is, is the inputs, the node weights and connections, and the outputs.
LLMs, or neural nets at large, are kind of a “black box,” but there’s no actual code that gets executed from the model when you run them; it’s just processed by the host software based on the rules for how these work. The “black box” part is mostly because they’re so complex we don’t actually know exactly what they’re doing or how they arrive at answers, only that it works to a degree. It’s a digital representation of an analog brain.
People have also been doing a ton of hacking at it, retraining, and other modifications that would have surfaced anything like that if it existed.
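If you want to convince yourself the weights file really is just tensors, you can crack one open. A sketch, assuming you’ve got a .safetensors file downloaded (the filename is a placeholder); that format was designed precisely so a weights file can’t smuggle code the way older pickle-based .bin files could:

```python
# Sketch: open a downloaded weights file and list what's inside -- it's only
# named tensors and their shapes, no executable code. Filename is a placeholder.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    for name in list(f.keys())[:10]:                 # first few tensor names
        print(name, f.get_slice(name).get_shape())
```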
Are you running a quantized model or one of the distilled ones?
Question: if we bridge 2 AIs and let them talk to one another, will they eventually poison each other with gibberish bullshit?