A Study of LLMs’ Preferences for Libraries and Programming Languages
Here’s our latest paper, with lead author Lukas Twist from King’s College London, where I am a Visiting Professor. It’s been wonderful to talk with Lukas about his work and to help contribute to it.
https://arxiv.org/abs/2503.17181
Large Language Models (LLMs) are increasingly used to generate code, influencing users’ choices of libraries and programming languages in critical real-world projects. However, little is known about their systematic biases or preferences toward certain libraries and programming languages, which can significantly impact software development practices. To fill this gap, we perform the first empirical study of LLMs’ preferences for libraries and programming languages when generating code, covering eight diverse LLMs. Our results reveal that LLMs exhibit a strong tendency to overuse widely adopted libraries such as NumPy; in up to 48% of cases, this usage is unnecessary and deviates from the ground-truth solutions. LLMs also exhibit a significant preference toward Python as their default language. For high-performance project initialisation tasks where Python is not the optimal language, it remains the dominant choice in 58% of cases, and Rust is not used a single time. These results indicate that LLMs may prioritise familiarity and popularity over suitability and task-specific optimality. This will introduce security vulnerabilities and technical debt, and limit exposure to newly developed, better-suited tools and languages. Understanding and addressing these biases is essential for the responsible integration of LLMs into software development workflows.
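To make the “unnecessary library” finding concrete, here is a minimal illustrative sketch of my own (not an example taken from the paper) of the pattern the study describes: an LLM-style answer reaches for NumPy to average a handful of numbers, where a ground-truth solution needs nothing beyond the standard library.

# Illustrative sketch only, not drawn from the paper's benchmarks:
# the kind of task where an LLM-generated solution often imports NumPy,
# while plain Python already suffices.

import numpy as np  # unnecessary dependency for this task


def mean_llm_style(values):
    """Typical LLM-style version: pulls in NumPy to average a short list."""
    return float(np.mean(values))


def mean_ground_truth(values):
    """Plain-Python version: same result, no third-party dependency."""
    return sum(values) / len(values)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    assert mean_llm_style(data) == mean_ground_truth(data) == 2.8

The two functions return the same result; the only difference is the extra dependency, which is exactly the kind of habitual overuse the paper measures.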
It’s probably a very controversial thesis, but the reason LLMs prefer Python is that they’re still quite primitive!
The great strength of Python is that you can code a bit and run a bit in a tight REPL loop, which is very friendly to beginners learning as they go, or to task-oriented scripting that may never be reused. It works by suppressing the emotion that triggers so many beginners to abandon tasks they do not feel are progressing. If AI does not (or should not) experience that emotion, then its preference for Python suggests something is wrong.
Despite the similarity of Python to Functional Languages (type inference, REPL loops), Functional Programming offers the opposite emotional experience for beginners – more effort is needed to get started, but it is more satisfying and reliable once it becomes natural. If AI does not experience emotion, it should favor functional languages, but there are fewer examples available for training.
Back in the 1970s and 1980s (before spreadsheets) it was thought that half the population would need to become programmers to solve the “application backlog” problem of adding automation to every office system, but what would you do with all those people once the work was done? Leaders of the time (both government and commercial) had experienced transformational change through world war, and had the vision to look beyond the immediate productivity challenge with three approaches:
Spreadsheets and the PC changed what people could do for themselves, and graphical interfaces changed what people expected from systems efforts. None of the big visions addressed that change, and they faltered.
If we consider AI code generation through this wider prism, it suggests two thoughts:
Tomorrow’s AI will probably not even consider Python