We have no clue what goes on inside AI’s brain. This $1.25 billion startup is trying to find out.


Founded in 2024, Goodfire is trying to fix AI’s black box problem — increasingly important as AI systems take on high-stakes tasks in cybersecurity and finance.

“What we really care most about is building this future of intentional design,” Goodfire CEO Eric Ho says. “How do we actually shape and debug and design these models that we really, really want.”

Goodfire

Type a question into an AI chatbot. A few seconds later, voila! The response unspools in front of your eyes. But exactly why the bot came up with that specific answer is still something of a mystery, even to those who built it. Not understanding how AI really works has inadvertently led to models behaving in unexpected ways: obsessing over goblins and other magical creatures, showering people with fake praise and, most concerningly, deceiving and blackmailing the very people who built them.

AI’s core enigma has given rise to a smattering of so-called “interpretability” startups that aim to investigate what’s happening under AI’s hood. That work could help improve models’ capabilities, make them safer and train them to avoid spewing out wrong answers or acting in nefarious ways. That’s the premise behind Goodfire, an AI research lab valued at $1.25 billion that studies the inner workings of models. Its tools promise to help developers and researchers inspect models and control how they behave.

“It’s like these alien intelligences have crash landed on earth and they’re incredibly smart, but nobody knows how they work,” says Eric Ho, CEO and cofounder.

Founded in June 2024 in San Francisco, Goodfire has recruited some 50 top interpretability researchers from labs like OpenAI and Google DeepMind. It has raised over $200 million from big-name VC firms like B Capital, Menlo Ventures and Lightspeed Venture Partners and was AI behemoth Anthropic’s first startup investment. It’s part of a wave of investment in research-driven startups as AI models take on critical tasks.

Now, Goodfire has released new research that provides a striking insight into how AI models think: they use shapes to represent concepts. The technical term for this is “neural geometry.” When AI models learn about things like the months of the year, they register the concept in the form of a circular loop (since December loops back to January). Colors, for example, are represented as a wheel, just as they are in design software. Tom McGrath, cofounder and chief scientist at Goodfire, says these shapes matter because if researchers want to tweak how a model behaves, they must nudge it along the shapes it prefers.

“If you don’t respect this kind of geometry, then you’ll just break the model,” McGrath says. “It generally just makes it dumber.”
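To make the idea of a circular loop concrete, here is a minimal toy sketch in Python. It is an illustration only, not Goodfire’s method or a real model’s internals: the two-dimensional coordinates and the helper names (`on_circle`, `rotate`, `nearest_month`) are assumptions invented for this example. It places the 12 months evenly around a circle, shows that December sits as close to January as January does to February, and shows that a nudge along the circle from December lands on January.

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def on_circle(index, n=12):
    """Place item `index` of `n` at evenly spaced angles on the unit circle."""
    angle = 2 * math.pi * index / n
    return (math.cos(angle), math.sin(angle))


def distance(a, b):
    """Straight-line distance between two 2-D points."""
    return math.dist(a, b)


def rotate(point, steps, n=12):
    """Move a point along the circle by `steps` month-sized increments."""
    angle = 2 * math.pi * steps / n
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


def nearest_month(point):
    """Return the month whose circle position is closest to `point`."""
    return min(MONTHS, key=lambda m: distance(point, on_circle(MONTHS.index(m))))


jan, feb, dec = on_circle(0), on_circle(1), on_circle(11)

# In a plain list, Dec and Jan are 11 positions apart; on the circle they
# are neighbours, exactly as close as Jan and Feb.
print(round(distance(dec, jan), 3), round(distance(jan, feb), 3))

# Nudging December one step along the circle arrives at January.
print(nearest_month(rotate(dec, 1)))
```

In this toy picture, a nudge that ignores the circle, such as pushing a point straight outward, would leave it far from every month, which is one loose way to read McGrath’s warning about breaking the model.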

The startup’s insight into how AI uses geometry to think helped shape its flagship tool, called Silico. Launched in April, Silico can open up an AI model and map out different parts of its “brain,” so to speak. That allows researchers and developers to locate the internal knobs and dials responsible for specific errors, and retrain the model accordingly. A common mistake, for instance, is that several models believe that 9.11 is a bigger number than 9.9 because they confuse decimal points with Bible verse numbers, thanks to their training data. Companies like Mayo Clinic, Rakuten and Microsoft are already using the product. Mayo, for instance, is using it to check the accuracy of a new DNA model that studies the effects of rare genetic mutations, predicting which ones cause diseases.
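The 9.11 versus 9.9 mix-up mentioned above is easy to reproduce outside a model. The short Python sketch below is only an illustration of the two readings, not a description of how any model or Silico works; the function names are made up for this example. Read as decimals, 9.11 is smaller than 9.9. Read as chapter and verse, verse 11 comes after verse 9.

```python
from decimal import Decimal


def as_decimal(text):
    """Interpret the string as an ordinary decimal number."""
    return Decimal(text)


def as_chapter_verse(text):
    """Interpret the string as 'chapter.verse', the way a scripture reference reads."""
    chapter, verse = text.split(".")
    return (int(chapter), int(verse))


a, b = "9.11", "9.9"

print(as_decimal(a) > as_decimal(b))              # False: 9.11 < 9.90
print(as_chapter_verse(a) > as_chapter_verse(b))  # True: verse 11 follows verse 9
```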

Prima Mente, a London-based AI company building biological foundation models, also used Goodfire’s tools while training an AI model to use blood samples to predict whether a patient would develop Alzheimer’s. While the model’s predictions were fairly accurate, the company didn’t know why it performed so well, so Goodfire helped it understand which factors contributed to the model’s decisions. That resulted in the discovery of a new type of biomarker for Alzheimer’s.

“Whenever you have a superhuman model, you can reverse engineer it to do really interesting things,” Ho says.

While interpretability is an emerging area of AI research, model makers like Anthropic and OpenAI have invested in it, and a handful of nascent companies building explainable foundation models are emerging, including London-based startup Conjecture ($25 million in total funding) and Guide Labs ($18.8 million in funding). But Ho says he’s surprised more people haven’t piled into solving the problem.

“We’re flying blind, there’s no steering wheel attached and we’re all kind of hurling on a bus,” Ho says. “I think it’s one of the most interesting intellectual problems that we have today.”

By Rashi Shrivastava