AI unicorn Anthropic releases Claude 3, a model it claims can beat OpenAI’s best

In an interview, founders Dario and Daniela Amodei told Forbes that Anthropic’s new enterprise-focused model, released Monday, outperforms rivals GPT-4 and Google’s Gemini 1.0 Ultra.

Cofounders and siblings Dario Amodei and Daniela Amodei say Claude 3’s release is another sign that “Anthropic is more of an enterprise company than a consumer company.” ANTHROPIC

Anthropic today announced a new series of large language models that the artificial intelligence company claims are the world’s most intelligent to date, outperforming rival offerings from OpenAI and Google.

Called Claude 3, Anthropic’s new model “family” comes in three versions — Opus, Sonnet and Haiku — that vary by performance and price.

Opus, the most powerful and most expensive version to run, outperformed OpenAI’s GPT-4 and Google’s Gemini 1.0 Ultra across a series of benchmarks that measure intelligence, the company said.

Opus and Sonnet, the mid-tier offering, were made available Monday, while Haiku will be released at a later date yet to be announced.

In an interview, cofounder and CEO Dario Amodei said the model family was designed with different business use cases in mind. “Claude 3 Opus is, at least according to the evaluations, in many respects the best-performing model in the world across a range of tasks,” he added.

On a number of popular benchmarks, including undergraduate-level general knowledge (MMLU), grade-school math (GSM8K), computer code (HumanEval) and question-and-answer knowledge (ARC-Challenge), Claude 3 Opus outperformed OpenAI’s GPT-4 and Google’s Gemini 1.0 Ultra, per results the company shared.

On the general-knowledge benchmark, Claude 3 Opus also outperformed Mistral Large, the top-of-the-line model from open-source AI unicorn Mistral, released last week.

However, Claude 3 Sonnet, the version that most users will see, performed roughly on par with GPT-4: ahead on some benchmarks, behind on others.

Amodei conceded that Anthropic’s benchmarks did not factor in recent updates from OpenAI and Google (GPT-4 Turbo and Gemini 1.5 Pro), as those companies have not yet published corresponding test evaluations.

“I would be surprised if we did not perform competitively,” he said.

At $15 per million input tokens (equivalent to the text of 2,500 book pages) and $75 per million output tokens, Claude 3 Opus is more expensive than the preview version of OpenAI’s GPT-4 Turbo, which costs $10 and $30 per million tokens, respectively.
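To make the price gap concrete, the short Python sketch below applies the per-million-token rates quoted above to a single request. The model labels are illustrative dictionary keys rather than official API identifiers, and the workload of 100,000 input tokens and 10,000 output tokens is a hypothetical example, not a figure from either company.

```python
# Prices in USD per million tokens, as quoted in the article.
# The keys are illustrative labels, not official API model names.
PRICES = {
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "gpt-4-turbo-preview": {"input": 10.00, "output": 30.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million-token rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100,000 input tokens and 10,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 10_000):.2f}")
```

Under those assumptions, the request would cost $2.25 on Claude 3 Opus and $1.30 on the GPT-4 Turbo preview. Because output tokens carry the larger premium, the gap widens for workloads that generate long responses.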

Amodei and cofounder and sister Daniela Amodei told Forbes they expect Opus to be used by businesses that need the most cutting-edge performance for functions like complex data analysis and biomedical research.

Claude 3 Sonnet, which is five times cheaper, would by comparison make sense for most tasks, they added, with uses ranging from search and retrieval across large data stores to sales forecasting, targeted marketing and code generation.

The lowest-cost model, Claude 3 Haiku, will cost just a fraction of that, making it handy for live interactions with customers, content moderation and logistics inventory management.

The Haiku version still performed on par with Anthropic’s last flagship version of Claude 2, the predecessor model it released just eight months ago, Dario Amodei said: “It’s very competitive with other models in the same class. This is a big gain.”

Anthropic’s reported benchmark performance placed Claude 3 Opus ahead of rivals like OpenAI’s GPT-4. ANTHROPIC

All three models will allow for prompts of up to 200,000 tokens (approximately the size of a book), more than the 128,000 supported by GPT-4 Turbo.

Opus users will be able to request 1 million token limits for some uses, Anthropic said, matching the ceiling Google has offered to some users of Gemini 1.5 Pro.

Formed by seven researchers who quit OpenAI, Anthropic has historically aimed to separate itself from its progenitor and other companies in the field through a deeper focus on AI safety.

Some industry insiders have wondered if this has slowed the company down and questioned its model performance in recent months, including on social media.

On a popular crowdsourced leaderboard of human evaluators, Claude 1 currently carries a higher rating than its successors Claude 2.0 and the updated Claude 2.1.

Dario Amodei shrugged off those ratings as just one human-based evaluation of a finite number of consumer tasks.

He conceded that while Claude 2 was safer than its predecessor in a way that satisfied Anthropic’s researchers, that came at the cost of a higher rate of “incorrect refusals,” or rejections of prompts that the model believed came too close to its safety guardrails.

The Claude 3 family is much better than its predecessors at avoiding those rejections, Anthropic claimed. Harmless prompts close in content to its safety limits are refused about 10% of the time, compared to 25% for Claude 2.1.

“Now we’re making progress towards more balance between the two, something that gets the best of both worlds,” Amodei said. “It’s really hard to draw a complex boundary in the right way. We’re always trying to do that better.”

While companies like Inflection, Character.AI and even OpenAI have ventured further into consumer use cases, Anthropic is focusing on business customers.

Users of its free consumer chatbot, also called Claude, will now get access to Sonnet, while individuals looking to try Opus will need to subscribe to its $20-per-month paid version. But the Claude 3 releases were made more with business use cases in mind, said Daniela Amodei.

Claude customers include tech companies GitLab, Notion, Quora and Salesforce (an Anthropic investor); financial giant Bridgewater and conglomerate SAP; and business research portal LexisNexis, telco SK Telecom and the Dana-Farber Cancer Institute.

Among early Claude 3 test users, productivity software maker Asana found a 42% improvement in initial response time, AI-focused executive Eric Pelz said in a statement.

Fellow software company Airtable said that it had integrated Claude 3 Sonnet into its own AI tool to help with faster content creation and data summarization.

As for how much Claude 3 cost to train — how much computing, and for how long — Anthropic’s cofounders declined to say.

While Claude 2 was released last July, Amodei said that was no giveaway, as the company sometimes trains multiple models at once, depending on the availability of clusters of graphics processing units, or GPUs.

Anthropic — which was recently raising $750 million at a valuation of $18.4 billion, as Forbes reported — plans to add features including code interpretation, search functions and source citations in the coming months.

“We’re going to continue to scale up our models and make them more intelligent, but also continue to try to make the smaller, cheaper models smarter and more efficient,” Amodei said. “There will be updates large and small throughout the year.”

This article was first published on forbes.com and all figures are in USD.