Turns Out, Not Every AI Bot Is a Coding Whiz

Key Takeaways

  • Extensive testing reveals significant differences in AI chatbots’ coding abilities.
  • Four chatbots are recommended for coding tasks: ChatGPT Plus, Perplexity Pro, Google Gemini 2.5 Pro, and Microsoft Copilot (free version).
  • Many other AI models, even some specialized for coding, performed poorly in real-world tests.
  • Free versions of top performers like ChatGPT and Perplexity can be useful but may have limitations.
  • AI is best for assisting with code snippets or debugging, not writing entire complex applications yet.

Technology rarely surprises veteran tech commentators anymore, but the ability of OpenAI’s ChatGPT to write a functional WordPress plugin shortly after its release was genuinely astonishing.

That initial success sparked a deep dive into the coding capabilities of various AI chatbots. In ZDNET's testing, 14 different large language models (LLMs) were put through four real-world programming challenges.

The results showed a wide gap in performance. Even now, some AI models struggle to produce working code: four of the tested models failed to generate a single functional plugin.

However, the testing identified four highly recommended options for those looking to use AI for programming assistance.

ChatGPT Plus (using GPT-4o) and Perplexity Pro stood out, both passing all tests. These services typically cost around $20 per month. Their free counterparts also performed reasonably well, though often with usage limits or access to less powerful underlying models.

Google’s Gemini 2.5 Pro also successfully completed all coding challenges. While technically free, its usage limits proved quite restrictive, potentially requiring payment for practical use.

Surprisingly, Microsoft’s Copilot, which had performed poorly in past rounds of testing, showed dramatic improvement. Its free version passed all four tests, making it a strong contender.

Other chatbots, including Grok (associated with X) and DeepSeek V3, showed promise, passing most but not all tests. They might be worth watching as they develop.

Unfortunately, several other tested AIs, including Meta AI, Meta Code Llama, GitHub Copilot, and Anthropic’s Claude 3.5 Sonnet, did not fare well, failing most of the coding tasks despite some being marketed specifically for developers.

It’s important to remember that while AI can be a powerful coding assistant for specific tasks like writing code snippets or debugging, it generally can’t build entire, complex applications from scratch just yet.

The field is evolving rapidly, with significant performance shifts noted even between testing rounds. As AI technology advances, these capabilities are likely to continue improving.
