GPT-4 overtaken, no longer the top-ranked AI model

According to the LMSYS benchmark results, OpenAI’s GPT-4 has been surpassed by Anthropic’s Claude 3, losing its standing as the “best” language model on the market. The changing of the guard marks a significant moment in the industry, as GPT-4 has long been considered the reference point for LLM performance.

The LMSYS benchmark, a collaboration between researchers at UC Berkeley, UC San Diego and Carnegie Mellon University, evaluates large language models and the chatbots built on them through the Chatbot Arena, a ranking system based on human ratings and the Elo rating system.

In this competitive context, Claude 3 Opus achieved a score of 1253, narrowly surpassing GPT-4, which stopped at 1251. The margin is slim, but it was enough to end GPT-4’s long hold on first place.
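To put the two-point gap in perspective, here is a minimal sketch of the textbook Elo formulas in Python. It is an illustration only: the function names and the K-factor of 32 are assumptions chosen for the example, and the leaderboard’s exact computation may differ from this.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B (standard Elo model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one human vote.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the leaderboard.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Scores reported above: Claude 3 Opus 1253, GPT-4 1251.
p_opus = expected_score(1253, 1251)
print(f"Expected preference rate for the higher-rated model: {p_opus:.3f}")
```

Under these formulas, a two-point difference corresponds to an expected preference rate of about 50.3% for the higher-rated model, which is very close to a coin flip in any single head-to-head vote.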

No less notable was the performance of Claude 3 Haiku, a “local” size model which, despite being far smaller than Opus, managed to place seventh, officially entering the “GPT-4” performance class. This rise demonstrates the efficiency of smaller models in specific tasks and challenges the trend of creating ever-larger models.

However, Anthropic’s lead may not last long. Sources inside OpenAI recently revealed that GPT-5 is almost ready for its public debut, scheduled for mid-year. The new model promises to significantly surpass GPT-4 in capabilities, thanks to the use of “external AI agents” to carry out specific tasks, improving speed and reliability in solving complex problems.

In summary, although Anthropic’s Claude 3 now leads the LMSYS ranking, the upcoming launch of GPT-5 could turn the tables once again, highlighting the constant evolution and innovation in the field of large language models.
