Boffins convert typing sounds into text with 95% accuracy

kpw@kbin.social · 7 months ago


Cronch@lemmy.world · 7 months ago

Quite scary considering the accuracy and how many open mics everyone is surrounded by without even realizing it. Not to mention if any content creator types their password while live streaming or recording they could get their accounts stolen.

vareriu@lemmy.world · 7 months ago

One more reason to switch to a password manager, even though they could still find out the master password…

qwertyqwertyqwerty@lemmy.one · 7 months ago

Probably still have some safety if you’re using two-factor, or have a master key in addition to a password (e.g. 1Password).

mosiacmango@lemm.ee · edit-2 7 months ago

Or use a local password safe like keepass.

kent_eh@lemmy.ca · 7 months ago

I guess my typos are now a security feature!

mski@lemmy.ca · 7 months ago

I’d be curious how well this approach translates to multi-lingual keyboard layouts. For English users, perhaps there’s another benefit to non-QWERTY layouts (e.g. Colemak or Dvorak) after all? … and two factor authentication should remain helpful I presume. Especially physical key methods with no audible characters typed (e.g. Yubikey, Titan, etc.)

jrbaconcheese@yall.theatl.social · 7 months ago

I was thinking the same, but it would be trivial for software to realize that “fnj xlg” maps to “the dog” with Colemak or Dvorak.
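A rough sketch of why the remapping is cheap, assuming the attacker has already recovered which physical keys were pressed (named here by their QWERTY legends). The function name is made up for illustration, and the key sequence below is a worked Colemak example rather than the "fnj xlg" placeholder above.

```python
# Physical keys (named by their QWERTY legends) and the characters
# the same physical keys produce under the Colemak layout.
QWERTY = "qwertyuiopasdfghjkl;zxcvbnm"
COLEMAK = "qwfpgjluy;arstdhneiozxcvbkm"

# One-to-one translation table: physical key -> Colemak character.
QWERTY_TO_COLEMAK = str.maketrans(QWERTY, COLEMAK)


def remap_to_colemak(recovered_keys: str) -> str:
    """Reinterpret keys recovered as QWERTY positions as Colemak output.

    Characters outside the table (such as spaces) pass through unchanged.
    """
    return recovered_keys.translate(QWERTY_TO_COLEMAK)


# A Colemak user typing "the dog" presses the keys labelled f, h, k, g, ;, t.
print(remap_to_colemak("fhk g;t"))  # the dog
```

Since the layout is just a fixed substitution, an attacker who does not know the layout could try each common one and keep whichever output looks like real words.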

Waldowal@lemmy.world · 7 months ago

New policy from the corporate office: If you are working in a public place, like a coffee shop, please scream while typing your login password.

sanimalp@lemmy.world · 7 months ago

I screamed my password and now I got hacked. Thanks for nothing!

Free Palestine 🇵🇸@sh.itjust.works · 7 months ago

Some laptops like the Framework laptop have fingerprint sensors

Physical Security keys like NitroKeys or YubiKeys are another option

Bipta@kbin.social · 7 months ago

I don’t see the relevance.

Free Palestine 🇵🇸@sh.itjust.works · 7 months ago

You can use fingerprint or U2F to unlock your password manager and copy the password. That way you don’t have to type it in.

RGB3x3@lemmy.world · 7 months ago

That does nothing for keylogging through this method.

fruitycoder@sh.itjust.works · 7 months ago

It would have to be combined with a secure (no microphones) area during setup, but it seems like swapped biometric plus token would defeat this attack (password gathering). It would however not defeat generic data collection.

Earthwormjim91@lemmy.world · 7 months ago

It would eliminate someone being able to get your username or password via this method though. Because you never have to type them in.

With my MacBook I can use either touchid or my watch to automatically unlock it, so I don’t even have to type my password in to get into my laptop. And then I use touchid and Keychain for all my passwords so I never have to type those in either.

Render@sh.itjust.works · 7 months ago

I wonder if different switches, keycap profiles, keyboard material, etc. affect the accuracy?

BearOfaTime@lemm.ee · edit-2 7 months ago

I never learned to touch-type, so my typing style is very different from most people though I can type fast enough for work.

My typing style only uses 3 fingers, and both hands type keys in the middle of the keyboard.

I wonder if this has any effect on accuracy?

Edit: Article states touch-typing can reduce accuracy. Wonder if that’s because they type more softly than us tech gorillas who tend to bash on the keys?

overlordror@lemmy.world · 7 months ago

I’m a touch typist who can reach 160wpm when I’m really flowing. I would guess the speed makes it harder to distinguish individual keys than when you press keys with three fingers.

Dave@lemmy.nz · 7 months ago

I type an awful lot slower than you, and still it’s faster than I can think. How do you think of what to type fast enough to type at 160wpm?

flipht@kbin.social · 7 months ago

Not the original person you responded to, but I type 120ish wpm. The trick is to try to tap into the same part of your brain that verbalizes words when you talk, rather than the part that composes stuff when you write.

overlordror@lemmy.world · 7 months ago

That speed is usually transcription for me: I’m listening to someone and typing what I hear. When I’m actually writing and composing a thought, my typing speed is closer to 120wpm or so. I learned to type on a typewriter, which is much slower; my current low-profile mech keyboard contributes to faster typing speed too.

d3Xt3r@lemmy.nz · 7 months ago

This is old news. This article was published on 7 Aug 2023.

code@lemmy.world · 7 months ago

This method is far older than that, and it keeps popping up every so often as a “new” attack. First time I read about this method was in the early 2000s, and I’m pretty sure it’s been done before that as well.

Last@lemmy.world · 7 months ago

I first heard about it in the Snowden leaks, but I have no doubt it was discovered earlier.

paraphrand@lemmy.world · 7 months ago

Neat, so when my friends are talking about satisfyingly clackety keyboards I can inform them it’s a security hazard.

AmberPrince@kbin.social · 7 months ago

I’ll accept the risk. I need the clicky

fruitycoder@sh.itjust.works · 7 months ago

Might have to spend some time getting Easy Effects/Noise Torch set up on my systems again just to reduce the vectors again.

There is a good comment on this post on physical mitigation that seems helpful as well: https://www.reddit.com/r/Fedora/comments/uerp9z/comment/i6p0jqa/?utm_source=share&utm_medium=mweb3x&utm_name=mweb3xcss&utm_term=1&utm_content=share_button

bionicjoey@lemmy.ca · 7 months ago

Isn’t boffin a derogatory term like “nerd”?

What a dogshit headline.

BearOfaTime@lemm.ee · 7 months ago

It can be. Being a boffin, I’m not offended. Up to the individual if they choose to be offended.

bionicjoey@lemmy.ca · 7 months ago

Still shitty journalism to refer to researchers publishing their research in that way.

Silic0n_Alph4@lemmy.world · 7 months ago

It’s The Register - think the Financial Times for IT but in the style of The Sun/any other British tabloid. They do it for the lulz, if you will - don’t get too hung up on the headlines as the content is top quality.

IllNess@infosec.pub · 7 months ago

Article also uses the term “eggheads”.

To go from keystroke sounds to actual letters, the eggheads recorded a person typing on a 16-inch 2021 MacBook Pro using a phone placed 17cm away and processed the sounds to get signatures of the keystrokes.

AutoTL;DR@lemmings.world · 7 months ago

This is the best summary I could come up with:

In other words, this is a side channel attack with considerable accuracy, minimal technical requirements, and a ubiquitous data exfiltration point: Microphones, which are everywhere from our laptops, to our wrists, to the very rooms we work in.

To make matters worse, the trio said in their paper that they’ve achieved what they claim is an accuracy record for acoustic side-channel attacks (ASCA) without relying on a language model.

Luckily in this case it’s not power usage, CPU frequencies, blinking lights or RAM buses leaking data unavoidably, but a good old-fashioned problem occurring between the computer and chair that can actually be mitigated somewhat easily.

The researchers note that skilled users able to rely on touch typing are harder to detect accurately, with single-key recognition dropping from 64 to 40 percent at the higher speeds enabled by the technique.

Working among the clacking of phantom keyboards would surely annoy everyone, which is why the researchers suggest only adding the sounds to Skype and Zoom transmissions after they’ve been recorded instead of subjecting employees to real-time noisemakers.

Followup research is now going on into using new sources for recordings, like smart speakers, better keystroke isolation techniques and the addition of a language model to make their acoustic snooping even more effective.

The original article contains 656 words, the summary contains 210 words. Saved 68%. I’m a bot and I’m open source!

Sanctus@lemmy.world · 7 months ago

Idk how it works with non-NVIDIA GPUs but get NVIDIA Broadcast or an equivalent. It’s a lifesaver.

ABCDE@lemmy.world · 7 months ago

macOS Sonoma has just updated with camera effects/reactions and “voice isolation” which works just like NVIDIA Broadcast/RTX Voice, luckily.

atocci@kbin.social · 7 months ago

It doesn’t do a very good job of removing my keyboard noise for some reason, and it makes my voice sound noticeably worse 😔

Sanctus@lemmy.world · 7 months ago

Mine’s perfect, my baby can’t even scream in my mic. It gets caught. I don’t recall messing with settings, and my GPU is a 2080 Ti. Idk, hardware maybe? There’s not much to mess with.

helenslunch@feddit.nl · 7 months ago

Someone explain how this works? Doesn’t make much sense to me how that’s even possible.

Pons_Aelius@kbin.social · 7 months ago

Because of different placement on the keyboard and different finger pressure, each key press has a slightly different sound.

The telling thing in this story is this

with 95 percent accuracy in some cases.

For some people (those with a very consistent typing style on a known keyboard) they were right 95% of the time.

In the real world this type of thing is basically useless as you would need a decent sample of the person typing on a known keyboard for it to work.

To go from keystroke sounds to actual letters, the eggheads recorded a person typing on a 16-inch 2021 MacBook Pro using a phone placed 17cm away and processed the sounds to get signatures of the keystrokes.

So to do this you need to have physical access to the person (to place a microphone nearby) and know what type of device they are typing on and for it to be a device that you have already analysed the sound profile of.

ILikeBoobies@lemmy.ca · 7 months ago

You don’t need physical access, just some malware that has access to the microphone

We would hope researchers “discovering” this wouldn’t have a production-ready product as their proof of concept. So there is room for improvement, but military contractors would love to invest in this

Pons_Aelius@kbin.social · 7 months ago

You don’t need physical access, just some malware

Which you still need to have previously installed…

If the person has allowed malware to be installed just install a keylogger (which gives you 100% accuracy every time) rather than jump through more hoops with this.

agent_flounder@lemmy.world · 7 months ago

The article says

The researchers note that skilled users able to rely on touch typing are harder to detect accurately, with single-key recognition dropping from 64 to 40 percent at the higher speeds enabled by the technique.

Hm. Sounds like “some cases” are hunt and peck typists or very slow touch typists.

I don’t know if training for each victim’s typing is really needed. I get the impression they were identifying unique sounds and converting that to the correct letters. I only skimmed and I didn’t quite understand the description of the mechanisms. Something about deep learning and convolution or…? I think they also said they didn’t use a language model so I could be wrong.

Pons_Aelius@kbin.social · edit-2 7 months ago

The problem is that even with up to 95% accuracy, a password of length 10 still has roughly a 40% chance (1 - 0.95^10) of containing at least one wrong character.

A password with one character wrong is just as useless as randomly typing.

Which character is wrong and what should it be? You only have 2 or 3 more guess till most systems will lock the account.

This is an interesting academic exercise but there are much better and easier ways to gain access to passwords and systems.

The world is not a Bond movie.

Deploying social engineering is much easier than this sort of attack.
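A quick sketch of the arithmetic behind the password-length point, assuming every character is recognized independently with the same per-key accuracy (a simplification; the function name is just for illustration). Under that assumption the chance of at least one wrong character is 1 - accuracy^length, which comes to roughly 40% for 10 characters at 95%.

```python
def prob_any_error(per_key_accuracy: float, length: int) -> float:
    """Probability that at least one of `length` characters is misread,
    assuming independent errors with the same per-key accuracy."""
    return 1.0 - per_key_accuracy ** length


# Compare a few password lengths at the headline 95% per-key accuracy.
for n in (8, 10, 16):
    print(f"{n} chars: {prob_any_error(0.95, n):.1%} chance of at least one error")
```

The same formula also shows why longer passwords fare better against this kind of attack: each extra character multiplies the attacker's chance of a clean capture by 0.95 again.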

catch22@startrek.website · 7 months ago

They’ll have modelled the acoustic signals to differentiate between different keys. Individual acoustic waves emanating from a key press will have features extracted from them to identify the key. Optimal features are then chosen to maximise accuracy, such as features that still work when the signal is captured at different distances or angles. With all these types of signal-processing inference models, you never get 100 percent. The claim of 95 percent is actually very high.
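Before any features can be extracted, each key press has to be isolated from the recording. Here is a minimal sketch of one common way to do that, short-time energy thresholding, in plain Python. This is an assumed illustration, not the researchers' pipeline; the function name, frame length and threshold are all made up for the example.

```python
def find_keystrokes(samples, frame_len=4, threshold=0.5):
    """Return the start index of each frame where the mean energy
    rises above `threshold` after a quiet frame (a likely key press)."""
    onsets = []
    was_loud = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        is_loud = energy > threshold
        if is_loud and not was_loud:
            onsets.append(start)  # quiet -> loud transition marks an onset
        was_loud = is_loud
    return onsets


# Toy signal: silence, a burst, silence, another burst, silence.
signal = [0.0] * 8 + [1.0, -1.0, 0.9, -0.8] + [0.0] * 8 + [0.9, -1.0, 1.0, -0.7] + [0.0] * 4
print(find_keystrokes(signal))  # [8, 20]
```

Each detected onset would then be cut out as a short clip and turned into features for whatever classifier comes next.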

9point6@lemmy.world · 7 months ago

Every key is unique and at a different distance to the microphone and therefore makes tiny differences in noise.

Knowing this, and knowing the frequency distribution of letters in language (e.g. we know “e” is the most common letter) and some clever analysis over a large enough sample of typing, we can figure out what each key sounds like with a statistically high level of probability. Once that’s happened it’s just like any other speech recognition software, except it’s the language of your keyboard.
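The frequency idea can be sketched in a few lines, assuming the key sounds have already been grouped into clusters of "same-sounding" presses (the integer cluster ids are hypothetical). The most frequent cluster is guessed to be "e", the next "t", and so on, following the commonly cited rough ordering of English letters. A real attack would need far more text and some correction on top of this first guess.

```python
from collections import Counter

# Commonly cited approximate ordering of English letters by frequency.
ENGLISH_BY_FREQUENCY = "etaoinshrdlu"


def guess_letters(cluster_ids):
    """Map each sound cluster to a letter by matching frequency ranks."""
    counts = Counter(cluster_ids)
    ranked = [cluster for cluster, _ in counts.most_common()]
    # zip() stops at the shorter sequence, so rare clusters beyond the
    # known ordering are simply left unassigned.
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


# Hypothetical cluster ids for ten recorded key presses.
presses = [3, 3, 3, 3, 7, 7, 7, 1, 1, 5]
print(guess_letters(presses))  # {3: 'e', 7: 't', 1: 'a', 5: 'o'}
```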

TootSweet@lemmy.world · 7 months ago

This is just me kind of guessing off the top of my head, but:

Depending where the mic is in relation to the keyboard, it can tell to some extent the relative distance from the key to the mic by volume of the keypress.
The casing of the keyboard has a particular shape with particular acoustic properties which would make certain keys sound different than others. (Maybe the ones toward the middle have a more bass sound to them as opposed to more treble in the keys closer to the edges of the keyboard.)
The surface on which the keyboard sits may also resonate differently with different keys.
There may be some extent to which the objects in the room (including the typist and monitor, etc) could have reflected or absorbed soundwaves in ways that would differ depending on the angle at which the soundwaves hit them, which would be affected by the location of the key.
Some keys like the spacebar and left shift almost always have a stabilizer bar which significantly affects the sound of the key for most keyboards.
For human typists, there are patterns in the timing of key presses. It’s quicker to type two keys in succession if those two keys are pressed by different fingers or different hands, for instance. Imagine typing the word “jungle”, for instance. “J”, “u”, and “n” are all pressed with the right index finger (for touch typists). So the first three letters would be slower to type than the rest of the letters.
I’d imagine this method also allowed the program to take into account various aspects of human language. (Probably English in this case, but it could just as well have been another language.) Certain strings of consonants just never appear consecutively. Certain letters are less frequently used. Things like that. Probably the accuracy would have been lower if the subjects were asked to type specific strings of random letters.
It may also be that this particular experiment involved fairly controlled circumstances. They always placed the mic 17cm from the keyboard, for instance. Maybe they also used the exact same keyboard on the exact same desk with the exact same typist for all tests and training. And it sounds like they trained it on known text for a good while before testing the AI by asking the AI to actually discern what was typed. That’s pretty perfect conditions that probably wouldn’t be realistic for an actual attack. Not to minimize the potential privacy impacts of this, though. I’d fully expect methods like this to become more accurate for a more generalized set of cases.

Now, the researchers didn’t sit down and list out all of these (or any other) ways in which software could determine what was typed from audio and compose an algorithm that accounted for all/most/some of these. They just kind of threw a bunch of audio with accompanying “right answers” at a machine learning algorithm and let the algorithm figure out whatever clues it could discern and combine those in whatever way it found most beneficial to come up with an (increasingly-more-accurate-with-every-training-set) answer. It’s likely the algorithm came up with different things than I did that helped it determine which key(s) were being pressed.
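To make the "audio plus right answers" idea concrete, here is a toy sketch of supervised learning on labelled key presses. It swaps in a nearest-centroid classifier, which is far simpler than the deep learning model the article describes, and the two-number feature vectors are invented purely for illustration.

```python
import math
from collections import defaultdict


def train(examples):
    """examples: list of (feature_vector, key_label) pairs.
    Returns the mean feature vector (centroid) for each key."""
    grouped = defaultdict(list)
    for features, key in examples:
        grouped[key].append(features)
    return {
        key: [sum(column) / len(vectors) for column in zip(*vectors)]
        for key, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the key whose centroid is closest to `features`."""
    return min(centroids, key=lambda k: math.dist(centroids[k], features))


# Hypothetical features per key press, e.g. (loudness, brightness).
training = [
    ([0.9, 0.1], "a"), ([1.1, 0.2], "a"),
    ([0.1, 0.8], "k"), ([0.2, 1.0], "k"),
]
model = train(training)
print(predict(model, [0.95, 0.05]))  # a
print(predict(model, [0.0, 1.0]))    # k
```

The point is the workflow rather than the model: nobody tells the program which clues matter, it just gets labelled examples and finds whatever separates them.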
