They probably don’t perform the translation until the user requests it. Automatically translating every comment into every language just to check whether it changes would be a lot of additional computation.
It might not be too bad. Once you get into code breaking, some of the simple techniques quickly yield metrics that can guess the language without much processing (how much depends on the total message length, but you could get a similarly low-effort guess by analysing just a sample).
It’s as simple as measuring the average distance between letters in a sample, and you could probably do more by looking at which Unicode character ranges the text falls into. Each language has different figures, so you can build a map from some sample text, then just take n letters to guess the language with reasonable accuracy.
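A minimal Python sketch of that idea. Assumptions: the letter metric is read as plain letter frequencies compared by Euclidean distance (one simple version of the statistics described above, not necessarily what any platform uses), the "ranges" are Unicode scripts taken from `unicodedata` character names, and all function names and the toy sample texts are hypothetical.

```python
import math
import unicodedata
from collections import Counter


def dominant_script(text):
    """Most common script among the letters, taken from the first word of
    each character's Unicode name (e.g. 'LATIN', 'CYRILLIC', 'GREEK')."""
    scripts = Counter(
        unicodedata.name(c, "UNKNOWN").split()[0] for c in text if c.isalpha()
    )
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"


def letter_profile(text):
    """Relative frequency of each (lowercased) letter in the text."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters) or 1
    return {c: k / total for c, k in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two letter-frequency profiles."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def build_language_map(samples):
    """Map each language code to (script, letter profile) of its sample text."""
    return {lang: (dominant_script(t), letter_profile(t)) for lang, t in samples.items()}


def guess_language(text, language_map, n=200):
    """Guess the language of `text` from its first n letters only."""
    sample = "".join(c for c in text if c.isalpha())[:n]
    script = dominant_script(sample)
    profile = letter_profile(sample)
    # Narrow the candidates by script first; fall back to all languages.
    candidates = {l: p for l, (s, p) in language_map.items() if s == script}
    if not candidates:
        candidates = {l: p for l, (_, p) in language_map.items()}
    # Pick the language whose profile is closest to the sample's profile.
    return min(candidates, key=lambda l: profile_distance(profile, candidates[l]))


samples = {  # toy samples; a real map would use far more text per language
    "en": "The quick brown fox jumps over the lazy dog",
    "es": "El veloz murciélago hindú comía feliz cardillo y kiwi",
    "ru": "Съешь же ещё этих мягких французских булок, да выпей чаю",
}
language_map = build_language_map(samples)
print(guess_language("Съешь же ещё этих мягких булок", language_map))  # ru
```

With one pangram per language this is only a toy: real detectors typically use character n-grams and much larger corpora, and very short comments stay hard to classify.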
On top of that, you could use user feedback or other factors to narrow it down further. It’s not perfect (and it would look odd like this when it does fail), but you could then flag the detected language and give users a one-click translate button.
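A rough sketch of the "flag it and offer a one-click button" fallback. Everything here is hypothetical: the `Detection` type, the confidence threshold, and counting "no translation needed" reports as the user-feedback signal are illustrative assumptions, not how YouTube is known to work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    language: str      # detected language code, e.g. "es" (hypothetical)
    confidence: float  # 0.0 to 1.0 score from whatever detector is used


def translate_button_label(
    detection: Detection,
    viewer_language: str,
    not_needed_reports: int = 0,
    min_confidence: float = 0.6,
    report_threshold: int = 3,
) -> Optional[str]:
    """Return the button label to show, or None to hide the button."""
    # User feedback: enough viewers said the comment needed no translation.
    if not_needed_reports >= report_threshold:
        return None
    # Unsure guess: still offer a generic button; an unneeded click is cheap.
    if detection.confidence < min_confidence:
        return "Translate"
    # Confident it is already in the viewer's language: nothing to offer.
    if detection.language == viewer_language:
        return None
    # Confident and different: flag the detected language on the button.
    return f"Translate from {detection.language}"
```

The low-confidence branch leans toward showing the button, since nothing is actually translated until someone clicks it.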
They probably don’t do the translations until requested, like you said (there are a lot of languages out there to translate into, after all), but a platform as big as YouTube might be using big data to decide what to translate preemptively and into which languages (maybe running it during low-demand periods, optimizing for engagement, or a combination of both).
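One hypothetical way to pick which languages to pre-translate a comment into: count the languages of a video's viewers and translate ahead of time only into those with a meaningful share. The function name, the `min_share` and `max_languages` parameters, and using viewer language as the demand signal are all assumptions for illustration.

```python
from collections import Counter


def languages_to_pretranslate(comment_language, viewer_languages,
                              min_share=0.1, max_languages=3):
    """Pick target languages whose share of viewers is at least min_share,
    most common first, skipping the comment's own language."""
    total = len(viewer_languages)
    if total == 0:
        return []
    counts = Counter(l for l in viewer_languages if l != comment_language)
    picked = [l for l, c in counts.most_common() if c / total >= min_share]
    return picked[:max_languages]


viewers = ["en"] * 6 + ["es"] * 3 + ["de"]
print(languages_to_pretranslate("en", viewers))  # ['es', 'de']
```

Anything below the cutoff would still be translated on demand when someone clicks the button.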
I mean, they could. But if a translate button sometimes "translates" a comment into the same text, do you think that’s costing them enough money to be worth all that effort?