I'll use Copilot for most of the things I used to search Stack Overflow for, or for mundane tasks like generating repetitive code, but relying solely on it is the same as relying solely on Stack Overflow.
We need a comparison against an average coder. Some fucking baseline ffs.
I worked for a year developing in Magento 2 (an open source e-commerce suite which was later bought up by Adobe; it is not well maintained and is just all-around not nice to work with). I tried asking ChatGPT some Magento 2 questions to figure out solutions to my problems, but clearly the only data it was trained on was a lot of really bad solutions from forum posts.
The solutions did kinda work some of the time, but the way it suggested implementing them was absolutely horrifying. We're talking opening up vulnerabilities, breaking many parts of the suite as a whole, or just editing database tables directly. If you do not know enough about the tools you are working with, implementing solutions from ChatGPT can be disastrous, even if they end up working.
Ask “are you sure?” and it will apologize right away.
And then agree with whatever you said, even if it was wrong.
Probably more than 52% of what programmers type is wrong too
We mostly suck at emails.
The one time it was helpful at work was when I used it to thank and wish well a person who left a company we work with. I couldn't come up with a good response, and ChatGPT just spat out real good stuff in seconds. This is what it's really good for.
Yeah, things that follow a kind of lexical "script" that you don't want to get creative with would be pretty easy to generate. Farewells, greetings, dear Johns, may-he-rest-in-peaces, etc., etc.
ChatGPT: "I'm happy for you, though. Or sorry that happened."
In the short term it really helps productivity, but in the end the reward for working faster is more work. Just doing the hard parts all day is going to burn developers out.
I program for a living and I think of it more as doing the interesting tasks all day, rather than the mundane and repetitive. ChatGPT and GitHub Copilot are great for getting something roughly right that you can tweak to work the way you want.
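To give a sense of the mundane scaffolding I mean, here's a minimal sketch in Python (the tool name and flags are made up for illustration): the kind of boilerplate you'd let Copilot rough out and then adjust by hand.

```python
# The sort of repetitive boilerplate Copilot is good at roughing out: a CLI
# argument parser you then tweak by hand. Tool name and flags are hypothetical.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Example data-export tool")
    parser.add_argument("--input", required=True, help="Path to the source file")
    parser.add_argument("--output", default="out.csv", help="Where to write results")
    parser.add_argument("--verbose", action="store_true", help="Print progress info")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

None of this is hard to write; it's just tedious, which is exactly where a rough first draft saves time.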
I’m surprised it scores that well.
Well, OK… that seems about right for languages like JavaScript or Python, but try it on languages with a reputation for being widely used to write terrible code, like Java or PHP (hence it having been trained on terrible code), and it's actively detrimental even to experienced developers.
Sure, but by randomly guessing code you’d get 0%. Getting 48% right is actually very impressive for an LLM compared to just a few years ago.
Just useful enough to become incredibly dangerous to anyone who doesn’t know what they’re doing. Isn’t it great?
Now non-coders can finally wield the foot-gun once reserved only for coders! /s
Truth be told, computer engineering should really be something that one needs a licence to do commercially, just like regular engineering. In this modern era, where software can be as ruinous to someone's life as shoddy engineering, why is it not like this already?
Look, nothing will blow up if I mess up my proxy setup on my machine. I just won’t have internet until I revert my change. Why would that be different if I were getting paid for it?
Nothing happens if you fuck up your proxy, but if you develop an app that gets very popular and don't care about security, hackers can take control of your whole server and do a lot of damage. If you develop software for critical infrastructure, fucking up your security systems can actually cost human lives.
Yes, but people with master's degrees also fuck this up, so it's not like some accreditation system will solve the issue of people making mistakes.
Yeah, but it's probably more likely that the untaught will fuck stuff up.
Setting up a proxy is not engineering.
I have to actually modify the code to properly package it for my distro, so it is engineering, because I have to make decisions about how things work.
I don't see how this supports your point, then. If "setting up a proxy" means "packaging it to run on thousands of user machines", then isn't there an obvious and huge potential for a disastrous fuckup?
Exactly. I also find that it tends to do a pretty good job of pointing you in the right direction. It's way faster than googling or going through sites like Stack Overflow because the answers are contextual. You can ask about a specific thing you want to do and get an answer that gives you a general idea of what to do. For example, I've found it to be great for crafting complex SQL queries. I don't really care if the answer is perfect, as long as it gives me an idea of what I need to do.
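For instance, here's a rough sketch of what I mean (the table, columns, and data are all made up for illustration): a window-function query is exactly the kind of thing I'd ask it to draft rather than look up the syntax myself.

```python
# Sketch of the kind of "complex" SQL I'd ask ChatGPT to draft: ranking each
# customer's orders by value with a window function. Schema is hypothetical.
# Note: window functions need SQLite 3.25+ (bundled with modern Python).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total) VALUES
        ('alice', 120.0), ('alice', 75.5), ('bob', 300.0), ('bob', 42.0);
""")

# The query is the part you iterate on with the model: describe the result
# you want, take the rough draft it gives you, and tweak until it's right.
query = """
    SELECT customer, total,
           RANK() OVER (PARTITION BY customer ORDER BY total DESC) AS rnk
    FROM orders
"""
for row in conn.execute(query):
    print(row)
```

Even if its first draft gets the PARTITION BY wrong, fixing it is still faster than reconstructing the syntax from scratch.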
You can also play with it to try and get closer to correct. I had problems getting an Excel macro working and getting unattended-upgrades working on my Pi-hole. GPT was wrong at first, but it got me partway there, and I could massage the question, Google a bit, and get closer to the right answer. Without it, I wouldn't have been able to get any of it working, especially the macro.
Worth noting this study was done on GPT-3.5; GPT-4 is leagues better than 3.5. I'd be interested to see how this number has changed.
There is a huge gap between 3.5 and 4, especially on coding-related questions. GPT-3.5 does not have a large enough context window to handle harder code-related questions.
GPT-4 made up functions that didn't exist the last time I asked it a programming question.
Developing with ChatGPT feels bizarrely like when Tony Stark invented a new element with Jarvis' assistance.
It's a prolonged back and forth, and you need to point out the AI's mistakes and work through a ton of iterations to get something close enough that you can tweak it and use it, but it's SO much faster than trawling through Stack Overflow or hoping someone who knows more than you will answer a post for you.
Yeah, if you treat it as a junior engineer with the ability to instantly research a topic, and are prepared to engage in a conversation to work toward a working answer, then it can work extremely well.
Some of the best outcomes I’ve had have needed 20+ prompts, but I still arrived at a solution faster than any other method.
In the end, there is this great fear that "AI is going to fully replace us developers", and the reality is that while that may be a possibility one day, it won't be any day soon.
You still need people with deep technical knowledge to pilot the AI and drive it to an implemented solution.
AI isn't the end of the industry; it has just greatly sped up the industry.
The interesting bit for me is that if you ask a rando some programming questions, they will be 99% wrong on average, I think.
Stack overflow still makes more sense though.
I find that thumbnail with a "fail" funny. I'm actually surprised that it got 48% right.
You forgot the “at least” before the 52%.
It’s programming spell check
For the umpteenth time: an LLM just puts words together, it isn't a magic answer machine.
Yeah but it’s just going to get better at magicking. Soon all us wizards will be out of a job…
Just as soon as we no longer need to drive.
Self-driving cars need to convince regulators that they're safe enough, even assuming they master the tech.
LLMs have already convinced our bosses that we are expendable and can drastically reduce cost centres for their next earnings call.
A parrot blabbing the theory of relativity doesn’t make it Einstein.