• 0 Posts
  • 40 Comments
Joined 1 year ago
Cake day: July 5th, 2023



  • Insane compute wasn’t everything. Hinton helped develop the technique that allowed more data to be processed in more layers of a network without totally losing coherence. It was more of a toy before then because it capped out on how much data could be used, how many layers of a network could be trained, and, I believe, even how efficiently GPUs could be used for ANNs, but I could be wrong on that one.

    Either way, after Hinton’s research in ~2010-2012, problems that seemed extremely difficult to solve (e.g., classifying images and identifying objects in images) became borderline trivial, and in under a decade ANNs went from being an almost fringe technology that many researchers saw as a toy, useful for a few problems, to basically dominating all AI research and CS funding. In almost no time, every university suddenly needed machine learning specialists on payroll, and now, about 10 years later, every year we are pumping out papers and tech that seemed many decades away… Every year… In a very broad range of problems.

    The 580 and CUDA made a big impact, but Hinton’s work was absolutely pivotal in being able to utilize that and to even make ANNs seem feasible at all, and it was an overnight thing. Research very rarely explodes this fast.

    Edit: I guess also worth clarifying, Hinton was also one of the few researching these techniques in the 80s and has continued being a force in the field, so these big leaps are the culmination of a lot of old, but also very recent work.


  • Lots of good comments here. I think there are many reasons, but AI in general is being quite hated on. It’s sad to me - pre-GPT I literally researched how AI can be used to help people be more creative and support human workflows, but our pipelines around the AI are lacking right now. As for the hate, here are a few perspectives:

    • Training data is of questionable/debatable ethics,
    • Amateur programmers don’t build up the same “code muscle memory”,
    • It’s being treated as a sole author (generate all of this code for me) instead of like a ping-pong pair programmer,
    • The time saved writing code isn’t being used to review and test the code more carefully than it was before,
    • The AI is being used for problem solving, where it’s not ideal, as opposed to code-from-spec where it’s much better,
    • Non-Local AI is scraping your (often confidential) data,
    • Environmental impact of the use of massive remote LLMs,
    • Can be used (according to execs, anyways) to replace entry level developers,
    • Devs can have too much faith in the output because they have weak code review skills compared to their code writing skills,
    • New programmers can bypass their learning and get an unrealistic perspective of their understanding; this one is most egregious to me as a CS professor, where students and new programmers often think the final answer is what’s important and don’t see the skills they strengthen along the way to the answer.

    I like coding with local LLMs and asking occasional questions to larger ones, but the code on larger code bases (with these small, local models) is often pretty nonsensical, though it improves with the right approach. Provide it documented functions and examples of a strong and consistent code style, write your test cases in advance so you can verify the outputs, and use it as an extension of IDE capabilities (like generating repetitive lines) rather than a replacement for your problem solving.
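
    A minimal sketch of the "write your test cases in advance" idea, in Python. Everything in it is hypothetical: `dedupe_preserving_order`, `TEST_CASES`, and `verify` are made-up names, and the function body just stands in for whatever a model might hand back for the documented stub.

```python
from typing import Callable, List


def dedupe_preserving_order(items: List[str]) -> List[str]:
    """Return items with duplicates removed, keeping first occurrences in order.

    The signature and docstring are what you would give the model; the body
    below is a stand-in for the code it returns (an assumption for this sketch).
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Test cases written BEFORE asking for an implementation: (input, expected).
TEST_CASES = [
    ([], []),
    (["a"], ["a"]),
    (["a", "b", "a"], ["a", "b"]),
    (["b", "a", "b", "a"], ["b", "a"]),
]


def verify(candidate: Callable[[List[str]], List[str]]) -> List[str]:
    """Run the pre-written cases and return a description of each failure."""
    failures = []
    for given, expected in TEST_CASES:
        actual = candidate(list(given))  # copy so a candidate cannot mutate the case
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures
```

    Calling `verify(dedupe_preserving_order)` returns an empty list when every case passes. A plausible-looking but wrong candidate such as `lambda xs: sorted(set(xs))` gets caught by the last case, because it loses the original order.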

    I think there are a lot of reasons to hate on it, but I think it’s because the ways to use it effectively are still being figured out.

    Some of my academic colleagues still hate IDEs because tab completion, fast compilers, in-line documentation, and automated code linting (to them) mean you don’t really need to know anything or follow any good practices, your editor will do it all for you, so you should just use vim or notepad. It’ll take time to adopt and adapt.


  • As someone who researched AI pre-GPT to enhance human creativity and aid in creative workflows, I find it sad to see the direction in which it’s been marketed, but I’m not surprised. I’m personally excited by the tech because I see a really positive place for it where the data usage is arguably justified, but we need to break through the current applications of it, which seem more aimed at stock prices and wow-factoring the public than at using these models for what they’re best at.

    The whole exciting part of these models was that they could convert unstructured inputs into natural language and structured outputs. Translation tasks (broad definition of translation), extracting key data points in unstructured data, language tasks. They’re outstanding for the NLP tasks we struggled with previously, and these tasks are highly transformative of any inputs; they rely purely on structural patterns. I think few people would argue NLP tasks are infringing on the copyright owner.

    But I can at least see how moving in the direction (particularly with MoE approaches) of using Q&A data to support generating Q&A outputs, media data to support generating media outputs, and code data to support generating code moves toward the territory of affecting sales and using someone’s IP to compete against them. From a technical perspective, I understand how LLMs are not really copying, but the way they are marketed and tuned seems to be more and more intended to use people’s data to compete against them, which is dubious at best.


  • Not to fully argue against your point, but I do want to push back on the citations bit. Given the way an LLM is trained, it’s not really close to equivalent to me citing papers researched for a paper. That would be more akin to asking me to cite every piece of written or verbal media I’ve ever encountered, as they all contributed in some small way to the way the words were formulated here.

    Now, if specific data were injected into the prompt, or maybe if it was fine-tuned on a small subset of highly specific data, I would agree those should be cited as they are being accessed more verbatim. The whole “magic” of LLMs was that it needed to cross a threshold of data, combined with the attention mechanism, and then the network was pretty suddenly able to maintain coherent sentence structure. It was only with loads of varied data from many different sources that this really emerged.


  • Yeah, I may be wrong but I think it usually comes down to a very specific kind of precision needed. It’s not meant to be hostile, I think, but meant to provide a domain-specific explanation clearly to those who need to interpret it in a specific way. In law, specific jargon implies very specific behaviour, so it’s meant to be precise in its own way (not a law major, can’t say for sure), but it can seem completely meaningless if you aren’t prepped for it.

    Same thing in other fields. I had a professor who was very pedantic about {braces} vs [brackets] vs (parentheses), and it seemed totally unnecessary to be so corrective in discussions, but when explaining where things went wrong with a student’s work it was vital to be able to quickly differentiate them in their work so they could review the right areas or understand things faster during a lecture later down the line.

    But that noise takes longer to teach through, so if it is important, it needs its own time to learn, and it will make it inaccessible to anyone who didn’t get that time to learn and digest it.


  • There was a lovely computer science book for kids I can’t remember the name of, and it was all about the evil jargon trying to prevent people from mastering the magical skills of programming and algorithms. I love these approaches. I grew up in an extremely non/anti-academic environment, and I learned to explain things in non-academic ways, and it’s really helped me as an intro lecturer.

    Jargon is the mind killer. Shorthands are for the people who have enough expertise to really feel the depths of that shorthand and use it to tickle the old familiar neurons they represent without needing to do the whole dance. It’s easy to forget that to a newcomer, the symbol is just a symbol.





  • My two cents: after years of Markdown (and md to PDF solutions) and LaTeX and a full two years of trying to commit to bashing my head against Word for work purposes, I’m really enjoying Typst. It didn’t take long to convert my themes, and having docs I can import which are basically just variables to share across documents in a folder has been really helpful. Haven’t gone too deep into it but I’m excited to give it a deeper test run over the next little bit.


  • Lots of immediate hate for AI, but I’m all for local AI if they keep that direction. Small models are getting really impressive, and if they have smaller, fine-tuned, specific-purpose AI over the “general purpose” LLMs, they’d be much more efficient at their jobs. I’ve been rocking local LLMs for a while and they’ve been great as a small complement to language processing tasks in my coding.

    Good text-to-speech, page summarization, contextual content blocking, translation, bias/sentiment detection, click bait detection, article re-titling: I’m sure there are many great use cases. And this is pure speculation, but many traditional non-LLM techniques that were overlooked because nobody cared about AI features might be able to be included here, and they could be super lightweight and still helpful.

    If it goes fully remote AI, it loses a lot of privacy cred, and positions itself really similarly to where everyone else is. From a financial perspective, bandwagoning on AI in the browser but “we won’t send your data anywhere” seems like a trendy, but potentially helpful and effective way to bring in a demographic interested in it without sacrificing principles.

    But there’s a lot of speculation in this comment. Mozilla’s done a lot for FOSS, and I get they need monetization outside of Google, but hopefully it doesn’t lead things astray too hard.


  • Yeah, this is the approach people are trying to take more now. The problem is generally the amount of data needed and verifying it’s high quality in the first place, but these systems are positive feedback loops both in training and in use. If you train on higher quality code, it will write higher quality code, but be less able to handle edge cases or potentially complete code in a salient way that wasn’t at the same quality bar or style as the training code.

    On the use side, if you provide higher quality code as input when prompting, it is more likely to predict higher quality code because it’s continuing what was written. Using standard approaches, documenting, just generally following good practice with code before sending it to the LLM will majorly improve results.


  • PixelProf@lemmy.ca to ADHD memes@lemmy.dbzer0.com, “Adrenaline Wave” (1 year ago)

    Every time. Try to get ahead of your work? Well, good for you, that first 20% went really well, now let’s spend the next two weeks on “work” that interferes with your other needs and needs to get thrown out because there’s no way it’s integrating with the other 80% that needs to happen within the next hour and also everything that you did for the other 20% is useless and needs to be redone now that you broke it with that tangent.

    It’s been a painful summer “preparing” to teach my fall courses.


  • I sit somewhere tangential on this - I think Bret Victor’s thoughts are valid here, or my interpretation of them - that we need to start revisiting our tooling. Our IDEs should be doing a lot more heavy lifting to suit our needs and reduce the amount of cognitive load that’s better suited for the computer anyways. I get it’s not as valid here as other use cases, but there’s some room for improvements.

    Having it in separate functions is more testable and maintainable and more readable when we’re thinking about control flow. Sometimes we want to look at a function and understand the nuts and bolts and sometimes we just want to know the overall flow. Why can’t we swap between views and inline the functions in our IDE when we want to see the full flow? In fact, why can’t we see the function inline but with the parameter variables replaced by passed values to get a feel for how the function will flow and compute what can be easily computed (assuming no global state)?
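
    To make the "parameters replaced by passed values" view concrete, here is a rough sketch using Python’s standard `ast` module (it needs Python 3.9+ for `ast.unparse`). It is not how any real IDE does this; `inline_view`, the two transformer classes, and the `padded_area` example are all invented for illustration. It assumes no global state, assumes parameters are never reassigned inside the function, and only folds `+`, `-`, and `*`.

```python
import ast
import operator

# Arithmetic operators this sketch is willing to pre-compute.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class _SubstituteArgs(ast.NodeTransformer):
    """Replace reads of parameter names with the literal values passed in."""

    def __init__(self, bindings):
        self.bindings = bindings

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.bindings:
            literal = ast.Constant(value=self.bindings[node.id])
            return ast.copy_location(literal, node)
        return node


class _FoldConstants(ast.NodeTransformer):
    """Evaluate binary operations whose two operands are both literals."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first
        both_literal = isinstance(node.left, ast.Constant) and isinstance(
            node.right, ast.Constant
        )
        if both_literal and type(node.op) in _OPS:
            try:
                value = _OPS[type(node.op)](node.left.value, node.right.value)
            except TypeError:
                return node  # e.g. int + str: leave it for the reader to see
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def inline_view(source: str, **bindings) -> str:
    """Return the function body as text, with arguments substituted and folded."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    body = ast.Module(body=func.body, type_ignores=[])
    body = _SubstituteArgs(bindings).visit(body)
    body = _FoldConstants().visit(body)
    return ast.unparse(ast.fix_missing_locations(body))


SOURCE = '''
def padded_area(width, height, pad):
    full_width = width + 2 * pad
    full_height = height + 2 * pad
    return full_width * full_height
'''

print(inline_view(SOURCE, width=10, height=5, pad=1))
```

    With all three arguments bound, the first two lines of the body collapse to `full_width = 12` and `full_height = 7`. Binding only `pad=1` leaves `width + 2` and `height + 2`, which is roughly the "compute what can be easily computed" half of the idea. Following local variables like `full_width` through to the `return` would need real data-flow tracking, which this sketch does not attempt.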

    I could be completely off base, but more and more recently - especially after years of teaching introductory programming - I’m leaning more toward the idea that our IDEs should be doubling down on taking advantage of language features, live computation, and co-operating with our coding style… and not just OOP. I’d love to hear some places that I might be overlooking. Maybe this is all a moot point, but I think code design and tooling should go hand in hand.



  • I appreciate the comment, and it’s a point I’ll be making this year in my courses. More than ever, students have been struggling to motivate themselves to do the work. The world’s on fire and it’s hard to intrinsically motivate to do hard things for the sake of learning, I get it. Get a degree to get a job to survive, learning is secondary. But this survival mindset means that the easiest way is the best way, and it’s going to crumble long-term.

    It’s like jumping into an MMORPG and using a bot to play the whole game. Sure you have a cap level character, but you have no idea how to play, how to build a character, and you don’t get any of the references anyone else is making.


  • This is a very output-driven perspective. Another comment put it well, but essentially when we set up our curriculum we aren’t just trying to get you to produce the one or two assignments that the AI could generate - we want you to go through the motions and internalize secondary skills. We’ve set up a four year curriculum for you, and the kinds of skills you need to practice evolve over that curriculum.

    This is exactly the perspective I’m trying to get at with my comment - if you go to school to get a certification to get a job and don’t care at all about the learning, of course it’s nonsense to “waste your time” on an assignment that ChatGPT can generate for you. But if you’re there to learn and develop a mastery, the additional skills you would have picked up by doing the hard thing - and maybe having a Chat AI support you in a productive way - are really where the learning is.

    If 5 year olds can generate a university level essay on the implications of thermodynamics on quantum processing using AI, that’s fun, but does the 5 year old even know if that’s a coherent thesis? Does it imply anything about their understanding of these fields? Are they able to connect this information to other places?

    Learning is an intrinsic task that’s been turned into a commodity. Get a degree to show you can generate that thing your future boss wants you to generate. Knowing and understanding is secondary. This is the fear of generative AI - further losing sight that we learn through friction and the final output isn’t everything. Note that this is coming from a professor who wants to mostly do away with grades, but recognizes larger systemic changes need to happen.


  • 100%, and this is really my main point. Because it should be hard and tedious, a student who doesn’t really want to learn - or doesn’t have trust in their education - will bypass those tedious bits with the AI rather than going through those tedious, auxiliary skills that you’re expected to pick up and using the AI as a personal tutor - not a replacement for those skills.

    So often students are concerned about getting a final grade, a final result, and think that was the point, thus, “If ChatGPT can just give me the answer what was the point”, but no, there were a bunch of skills along the way that are part of the scaffolding and you’ve bypassed them through improper use of available tools. For example, in some of our programming classes we intentionally make you use worse tools early to provide a fundamental understanding of the evolution of the language ergonomics or to understand the underlying processes that power the more advanced, but easier to use, concepts. It helps you generalize later, so that you don’t just learn how to solve this problem in this programming language, but you learn how to solve the problem in a messy way that translates to many languages before you learn the powerful tools of this language. As a student, you may get upset you’re using something tedious or out of date, but as a mentor I know it’s a beneficial step in your learning career.

    Maybe it would help to teach students about learning early, and how learning works.