Quick insights
OpenAI’s GPT-4 Turbo and GPT-4o models stand out as top performers for enterprise use, excelling in both enterprise readiness and business performance metrics with high scores in functionality, usability, and integration. Google’s Gemini Pro also shines with exceptional accessibility and strong overall performance, making it a solid choice for organizations prioritizing ease of deployment. This quarter’s new entrant, Alibaba’s Qwen Max, boasts a high business performance score despite moderate enterprise readiness. For cost-effective solutions, Mistral’s Large v2 model offers high cost-efficiency while maintaining reasonable performance, appealing to budget-conscious enterprises.
It’s clear from this initial Leaderboard that different models excel in different areas, which suggests that the optimal choice of LLM can vary significantly depending on the specific business application.
One of our top priorities for the LLM Leaderboard is to ensure that it continually reflects the rapidly evolving AI needs of Kearney’s enterprise clients. We will achieve this by frequently adjusting the weights of the Leaderboard’s parameters and by tailoring the Leaderboard to each client’s specific needs, drawing on our experts’ business expertise and our understanding of each organization’s unique requirements.
For example, a customer-focused company might place an especially high priority on usability, while an academic team might emphasize training dataset quality. Some enterprises might want the Leaderboard to provide ongoing support in assessing their current capabilities and gaps; others might value metrics that assess compatibility with specific cloud providers or other tech partners.
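To make this re-weighting concrete, here is a minimal sketch of how different weight profiles could change a composite score and, with it, the resulting ranking. The model names, parameter scores, and weight values below are illustrative assumptions for exposition only; they are not Leaderboard data or methodology.

```python
# Hypothetical illustration of client-specific weighting.
# All names, scores, and weights are invented for demonstration.

# Illustrative per-model parameter scores on a 0-100 scale.
model_scores = {
    "Model A": {"usability": 90, "cost_efficiency": 60, "dataset_quality": 75},
    "Model B": {"usability": 70, "cost_efficiency": 85, "dataset_quality": 80},
}

# Two example client profiles; each set of weights sums to 1.0.
weight_profiles = {
    "customer_focused": {"usability": 0.6, "cost_efficiency": 0.3, "dataset_quality": 0.1},
    "academic":         {"usability": 0.2, "cost_efficiency": 0.2, "dataset_quality": 0.6},
}

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Composite score: weighted average of the parameter scores."""
    return sum(scores[param] * weight for param, weight in weights.items())

# Rank the models under each profile; the ordering can differ by profile.
for profile, weights in weight_profiles.items():
    ranking = sorted(
        model_scores,
        key=lambda m: weighted_score(model_scores[m], weights),
        reverse=True,
    )
    print(profile, ranking)
```

In an actual engagement, the weights would be set with the client, drawing on the business expertise described above, rather than on the fixed archetypes used here.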
To explain how the Leaderboard will evaluate LLMs, we provide below a summary of our methodology: our framework for comparing the relevant market options; the criteria the Leaderboard will use to make those comparisons; and our strategy for establishing the most useful and responsive evaluations over time.
Given the inherently dynamic and fluid nature of AI, we are designing the LLM Leaderboard to be as nimble and responsive as possible. Our plan to publish a new Leaderboard each quarter will allow us to balance the need for timeliness with the value of collecting actionable data over a meaningful period.
We fully expect that the criteria and approaches we use will evolve over time, just like the technologies we are measuring. We will continually ensure that the LLM Leaderboard reflects an accurate “state of play” in the market, including all of the models executives are most likely to consider for their organizations’ AI needs.
Toward this end, Kearney will continue to survey businesses to refine the Leaderboard, so that it keeps pace with the priorities, needs, and expectations of a rapidly evolving market.
The Leaderboard will be regularly augmented with newer models to capture the most current snapshot of the technological and competitive landscape.
Our foremost objective for the LLM Leaderboard is to ensure that it remains a highly useful resource for executives and organizations seeking to understand this fascinating—and rapidly evolving—technology.