Quick insights
OpenAI’s GPT-4 Turbo and GPT-4o models stand out as top performers for enterprise use, excelling in both enterprise readiness and business performance metrics with high scores in functionality, usability, and integration. Google’s Gemini Pro also shines with exceptional accessibility and strong overall performance, making it a solid choice for organizations prioritizing ease of deployment. This quarter’s new entrant, Alibaba’s Qwen Max, boasts a high business performance score despite moderate enterprise readiness. For cost-effective solutions, Mistral’s Large v2 model offers high cost-efficiency while maintaining reasonable performance, appealing to budget-conscious enterprises.
It’s clear from this initial Leaderboard that different models excel in different areas, which suggests that the optimal choice of LLM can vary significantly depending on the specific business application.
One of our top priorities for the LLM Leaderboard is to ensure that it continually reflects the rapidly evolving AI needs of Kearney’s enterprise clients. We will achieve this by frequently adjusting the weights of the Leaderboard’s parameters and by tailoring the Leaderboard to each client’s specific needs, drawing on our experts’ business expertise and our understanding of each organization’s unique requirements.
For example, a customer-focused company might place an especially high priority on usability, while an academic team might emphasize training dataset quality. Some enterprises might want the Leaderboard to provide ongoing support in assessing their current capabilities and gaps; others might value metrics that assess compatibility with specific cloud providers or other tech partners.
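To make this re-weighting concrete, here is a minimal sketch of how different weight profiles could change a composite score and, with it, the resulting ranking. The model names, parameter scores, and weight values below are illustrative assumptions for exposition only; they are not Leaderboard data or methodology.

```python
# Hypothetical illustration of client-specific weighting.
# All names, scores, and weights are invented for demonstration.

# Illustrative per-model parameter scores on a 0-100 scale.
model_scores = {
    "Model A": {"usability": 90, "cost_efficiency": 60, "dataset_quality": 75},
    "Model B": {"usability": 70, "cost_efficiency": 85, "dataset_quality": 80},
}

# Two example client profiles; each set of weights sums to 1.0.
weight_profiles = {
    "customer_focused": {"usability": 0.6, "cost_efficiency": 0.3, "dataset_quality": 0.1},
    "academic":         {"usability": 0.2, "cost_efficiency": 0.2, "dataset_quality": 0.6},
}

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Composite score: weighted average of the parameter scores."""
    return sum(scores[param] * weight for param, weight in weights.items())

# Rank the models under each profile; the ordering can differ by profile.
for profile, weights in weight_profiles.items():
    ranking = sorted(
        model_scores,
        key=lambda m: weighted_score(model_scores[m], weights),
        reverse=True,
    )
    print(profile, ranking)
```

In an actual engagement, the weights would be set with the client, drawing on the business expertise described above, rather than on the fixed archetypes used here.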
To explain how the Leaderboard will evaluate LLMs, we provide below a summary of our methodology: our framework for comparing the relevant market options; the criteria the Leaderboard will use to make those comparisons; and our strategy for establishing the most useful and responsive evaluations over time.
Given the inherently dynamic and fluid nature of AI, we are designing the LLM Leaderboard to be as nimble and responsive as possible. Our plan to publish a new Leaderboard each quarter will allow us to balance the need for timeliness with the value of collecting actionable data over a meaningful period.
We fully expect that the criteria and approaches we use will evolve over time, just like the technologies we are measuring. We will continually ensure that the LLM Leaderboard reflects an accurate “state of play” in the market, including all of the models executives are most likely to consider for their organizations’ AI needs.
Toward this end, Kearney will continue to survey businesses to refine the Leaderboard, so that it keeps pace with the priorities, needs, and expectations of a rapidly evolving market.
The Leaderboard will be regularly augmented with newer models to capture the most current snapshot of the technological and competitive landscape.
Our foremost objective for the LLM Leaderboard is to ensure that it remains a highly useful resource for executives and organizations seeking to understand this fascinating—and rapidly evolving—technology.