LLM Code Generation: How does it work?
Do LLM code generation tools like CodeParrot or GitHub Copilot really “understand” your code? Short answer: no. How do they work, then?
Do LLM Code Generation tools like CodeParrot or GitHub Copilot really “understand” your code?
Not really. To understand how LLM code generation works, imagine asking your friend, “Hey! Get me noodle.”
You could mean a packet of noodles to prepare ramen, a bowl of freshly made Hakka noodles because you are hungry, or even your pet poodle named Noodle. Your friend figures out your “intent” because they know you. Technically, the statement “get me noodle” is called the form, and what you really mean is called the intent.
Humans derive intent from form because they have outside context. LLMs, however, are trained only on form and have no surrounding context. If an LLM were to act on the statement, it would bring you a packet of noodles, because that is probably the most frequent interpretation in its training data. The interesting question, then, is how they seem to know what you mean.
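To make that concrete, here is a deliberately oversimplified sketch, not how a real LLM works, with made-up training sentences: it completes a prompt by picking whichever word most often followed it in the data it has seen. Pure form, no intent.

```ts
// Toy sketch (NOT a real LLM): complete a prompt with the word that most
// frequently followed it in a tiny, made-up "training set".
const trainingSet: string[] = [
  "get me noodle packet from the store",
  "get me noodle packet for ramen",
  "get me noodle bowl, the hakka one",
];

function completePrompt(prompt: string): string | undefined {
  const counts = new Map<string, number>();
  for (const sentence of trainingSet) {
    if (sentence.startsWith(prompt)) {
      // The word right after the prompt is one possible continuation.
      const nextWord = sentence.slice(prompt.length).trim().split(" ")[0];
      counts.set(nextWord, (counts.get(nextWord) ?? 0) + 1);
    }
  }
  // Return the continuation seen most often.
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0]?.[0];
}

console.log(completePrompt("get me noodle")); // "packet" -- the most frequent match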
How LLM code generation works
Imagine talking to your partner on the phone for hours every day. Your partner needs to get some work done, so they hire an assistant, Bob. Bob listens to your calls and over time learns how your partner replies. Bob gets so good at pattern matching that he starts replying on your partner’s behalf without you even noticing.
This is what LLMs do. They have seen millions of code examples of developers building applications, websites, and more, so they can pattern-match your incomplete code against likely completions. For example, when you start writing an FAQSection component, Copilot has seen many such sections, so it can complete the code for you (see the sketch below).
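For instance, you might type only the type and the component signature below, and a Copilot-style tool would suggest the rest because it has seen thousands of nearly identical components. This is a hypothetical illustration: the FAQSection name comes from the example above, but the props and markup are assumptions.

```tsx
import React, { useState } from "react";

// Hypothetical example: the developer types the FAQ type and the first line
// of the component; a Copilot-style tool fills in the familiar accordion pattern.
type FAQ = { question: string; answer: string };

export function FAQSection({ faqs }: { faqs: FAQ[] }) {
  // --- everything below is the kind of completion the tool suggests ---
  const [openIndex, setOpenIndex] = useState<number | null>(null);

  return (
    <section>
      <h2>Frequently Asked Questions</h2>
      {faqs.map((faq, index) => (
        <div key={faq.question}>
          <button onClick={() => setOpenIndex(openIndex === index ? null : index)}>
            {faq.question}
          </button>
          {openIndex === index && <p>{faq.answer}</p>}
        </div>
      ))}
    </section>
  );
}
```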
But one bad day, you have a fight with your partner. This is new to Bob. He doesn’t know how to behave in a fight and says something your partner never would. That is what we call a hallucination in an LLM: when you write a very custom component the model has rarely seen, the generated code may be wrong (see the sketch below).
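Here is a hypothetical picture of what that looks like in code (the order type, the discount rule, and the function are all made up for illustration). Suppose your codebase has a custom business rule the model has never seen; the suggestion compiles and looks plausible, but it encodes logic your team never wrote.

```ts
// Hypothetical: your team's actual rule is "10% off orders above 50 units,
// but never on clearance items". The model has rarely seen this pattern, so
// it pattern-matches to the generic discount code it has seen everywhere else.
type Order = { units: number; unitPrice: number; isClearance: boolean };

// Plausible-looking suggestion -- but it ignores the clearance rule
// and applies the discount to every order over 50 units.
function applyDiscount(order: Order): number {
  const total = order.units * order.unitPrice;
  return order.units > 50 ? total * 0.9 : total;
}
```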
Conclusion: LLM code generation
LLMs are great at pattern matching and predict most things accurately because they have been trained on a huge portion of the internet, the largest data source ever created. But when they face unseen context, they can be wrong.
So LLMs make great productivity tools for UI building, autocompletion, and searching through errors and documentation, but they still need human oversight after all.