With the recent launch of Claude 3.7 Sonnet, Anthropic's most intelligent model to date, the AI field has gained another powerful player, and analysts have tested its coding competence side by side against xAI's latest AI model, Grok 3.
As it happens, the two newcomers to the AI playground arrived in close succession, and Analytics Vidhya's Anu Madan has tested their coding capabilities head to head to check which makes the better coder's sidekick, sharing her results in a report published on February 25.
The tests covered five key areas: code debugging, game creation, data analysis, code refactoring, and image augmentation.
Claude 3.7 Sonnet vs. Grok 3: Code debugging
Prompted to find errors in the provided code, both models correctly identified all of them, explained them in simple language, and provided corrected code along with an explanation and tips for running it. However, while Claude 3.7 Sonnet's new code ran seamlessly, Grok 3's did not run, as it still contained errors.
Furthermore, the output generated by Claude 3.7 Sonnet was a strong indicator of the model's improvement on the IFEval benchmark (a very important coding benchmark), a parameter on which it scores higher than any other LLM.
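The report doesn't reproduce the test code itself, but a toy Python snippet gives a sense of what such a debugging task looks like: a function with a couple of common bugs and the corrected version a model would be expected to return. The example below is purely illustrative, not taken from the actual test.

```python
# Hypothetical debugging task (illustrative only, not the code used in the test).
# Buggy version: off-by-one range and a mutable default argument.
def average_of_squares(numbers=[]):
    total = 0
    for i in range(len(numbers) + 1):   # bug: reads one element past the end
        total += numbers[i] ** 2
    return total / len(numbers)

# Corrected version, as a model like Claude 3.7 Sonnet or Grok 3
# would be expected to return it:
def average_of_squares_fixed(numbers):
    if not numbers:
        raise ValueError("numbers must be non-empty")
    return sum(n ** 2 for n in numbers) / len(numbers)

print(average_of_squares_fixed([1, 2, 3]))  # 4.666...
```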
Game creation
When asked to create a “ragdoll physics simulation using Matter.js and HTML5 Canvas in JavaScript,” both produced detailed code for the visualization, listed all the features, and provided explanations. That said, Claude 3.7 Sonnet was easier to use, rendering the output right within its interface, while its opponent required copying the entire output and running it in a browser to see the visualization.
The ragdoll created by Claude 3.7 Sonnet had the entire expected range of motion, plus extra controls for playing with the simulation speed. Meanwhile, Grok 3's ragdoll would sometimes vibrate even when no force was acting on it; otherwise, it was just as impressive.
Data analysis
In terms of data analysis, Claude 3.7 Sonnet took the point as well. While both models made good sense of the input data (a study on diabetes) and drew key insights from it, Claude 3.7 Sonnet also created an analysis dashboard and scatter plots right in the chat, making it very simple to visualize the trends.
Grok 3, on the other hand, did offer Python code for all the plots it considered relevant to the given dataset, along with key insights, but the code for several of the plots had errors, making it impossible to see the visualizations it had created.
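For readers who want to recreate the flavor of this test, here is a minimal sketch of such an analysis in Python. Since the report doesn't publish the dataset it used, the sketch stands in scikit-learn's built-in diabetes dataset and plots two features against disease progression.

```python
# Minimal sketch of a diabetes-data analysis; uses scikit-learn's built-in
# diabetes dataset as a stand-in for the (unpublished) dataset from the test.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

data = load_diabetes(as_frame=True)
df = data.frame  # features such as bmi and bp, plus the 'target' progression score

# Key insights: correlation of each feature with disease progression
print(df.corr()["target"].sort_values(ascending=False))

# Scatter plots of two strongly correlated features against the target
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["bmi", "s5"]):
    ax.scatter(df[col], df["target"], alpha=0.5)
    ax.set_xlabel(col)
    ax.set_ylabel("disease progression")
plt.tight_layout()
plt.show()
```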
Code refactoring
As for code refactoring (restructuring existing source code without changing its external behavior), Grok 3 did better. Claude 3.7 Sonnet did a good job with optimization and iteration efficiency, but its competitor aligned better with the overall goal of refactoring.
In other words, Grok 3 made the code cleaner, clearer, more maintainable, and production-ready, which is the purpose of refactoring to begin with. Claude 3.7 Sonnet's approach was slightly informal, lacked type hints, and relied on debugging prints.
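A small hypothetical example shows the gap the reviewer is describing: the first function leans on debugging prints and loose style, while the refactored version keeps the same external behavior but adds type hints, proper logging, and a docstring.

```python
import logging
from typing import Iterable

# Before: informal style, a debugging print, no type hints (hypothetical example)
def process(items):
    out = []
    for i in items:
        print("processing", i)  # debugging print left in the code
        if i > 0:
            out.append(i * 2)
    return out

# After: same external behavior, but typed, logged, and production-ready
logger = logging.getLogger(__name__)

def double_positives(items: Iterable[int]) -> list[int]:
    """Return each positive item doubled, preserving order."""
    result = [item * 2 for item in items if item > 0]
    logger.debug("doubled %d positive items", len(result))
    return result

assert process([1, -2, 3]) == double_positives([1, -2, 3]) == [2, 6]
```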
Image augmentation
Despite beating Claude 3.7 Sonnet at code refactoring, Grok 3 didn't do as good a job as Anthropic's model at image augmentation.
Specifically, when prompted to provide Python code for image masking (hiding or revealing specific parts of an image by applying a mask that defines the visible and hidden areas), Claude 3.7 Sonnet did exactly what it was supposed to do.
Grok 3's method, meanwhile, involved thresholding, which segments the image based on brightness rather than performing true masking, resulting in a high-contrast, binary mask.
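The difference is easy to see in code. Below is a minimal NumPy/OpenCV sketch (the input file name is a placeholder, not from the report) contrasting true masking, which keeps the original pixels wherever the mask is set, with brightness thresholding, which merely binarizes the image.

```python
import cv2
import numpy as np

# Assumes a local image file; the path is a placeholder, not from the article.
img = cv2.imread("input.jpg")
h, w = img.shape[:2]

# True masking: a mask defines visible (255) and hidden (0) areas.
mask = np.zeros((h, w), dtype=np.uint8)
cv2.circle(mask, (w // 2, h // 2), min(h, w) // 3, 255, thickness=-1)
masked = cv2.bitwise_and(img, img, mask=mask)  # original pixels kept inside the circle

# Thresholding (what Grok 3 reportedly did): segments by brightness,
# producing a high-contrast binary image rather than a masked original.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

cv2.imwrite("masked.jpg", masked)
cv2.imwrite("thresholded.jpg", binary)
```

In the masked output, the original image remains visible inside the circle; the thresholded output discards the original pixel values entirely, which is why it reads as segmentation rather than masking.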
Claude 3.7 Sonnet vs. Grok 3: Who wins?
All things considered, in the Claude 3.7 Sonnet vs. Grok 3 showdown, Anthropic's model emerges as the overall winner in tasks that require coding, although xAI's model does better at refactoring. And here's the best part: Anthropic is also rolling out Claude Code, an agent that does the coding for the user, so the best might be yet to come.