The raging demand for computers to run AI models has only accelerated, but there are two major obstacles that anyone in the business needs to overcome: getting the right chips, and getting them into data centers where they can start generating revenue.

General Compute, a new inference neocloud — a company that rents out AI processing power, specializing in the phase when models are running and responding to users rather than being trained — has answers to those questions that illuminate where the AI ecosystem is headed. Those answers helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.

First, what is the right chip? The demand for GPUs has gone through the roof, but it’s becoming conventional wisdom that they aren’t the best-suited chips for running AI models once they have been trained. The phase of AI where a model is actively generating responses has different computational requirements than training, and a new class of chips is being designed specifically for it. Nvidia’s $20 billion Groq transaction in December and Cerebras’ $57 billion IPO last week point the way.

With capacity strained at both those companies, the co-founders of General Compute, CEO Finn Puklowski and CTO Jason Goodison, found another option. They’re turning to specialized chips built by SambaNova, an Intel-backed chipmaker focused on inference that has fallen a bit out of the Silicon Valley conversation.

That may change when SambaNova releases its new chips this year. The architecture is more flexible and uses more memory to store context during inference calculations, and SambaNova claims that it outperforms not just GPUs but also other specialized chips built by the likes of Groq or Cerebras. Puklowski says the new chips will generate 600 to 700 tokens per second, versus about 250 tokens per second for GPUs.
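To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a steady decode rate and ignores queueing, prompt processing, and tool-call latency. The 100,000-token workload is a hypothetical figure chosen for illustration, not a number from General Compute or SambaNova.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate (idealized)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Throughput figures quoted by Puklowski (tokens per second).
GPU_TPS = 250.0
NEW_CHIP_TPS_LOW = 600.0
NEW_CHIP_TPS_HIGH = 700.0

# Hypothetical agent workload size; an assumption for illustration only.
WORKLOAD_TOKENS = 100_000

gpu_time = generation_seconds(WORKLOAD_TOKENS, GPU_TPS)
new_time_slow = generation_seconds(WORKLOAD_TOKENS, NEW_CHIP_TPS_LOW)
new_time_fast = generation_seconds(WORKLOAD_TOKENS, NEW_CHIP_TPS_HIGH)

# Speedup depends only on the ratio of the two rates, not on workload size.
speedup_low = NEW_CHIP_TPS_LOW / GPU_TPS
speedup_high = NEW_CHIP_TPS_HIGH / GPU_TPS

print(f"GPU: {gpu_time:.0f} s")
print(f"New chips: {new_time_fast:.0f} to {new_time_slow:.0f} s")
print(f"Speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

On the quoted numbers alone, the gain works out to roughly 2.4 to 2.8 times the GPU rate, whatever workload size is assumed.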

General Compute has $300 million worth of SambaNova’s SN50 chips on order and says it will be the first neocloud to deploy them.

These chips also help General Compute solve the second big problem, where to put them: They are air-cooled, not water-cooled, and consume less power, so they can be installed in existing data center facilities without new infrastructure investments.

Puklowski is pursuing colocation deals — arrangements where General Compute installs its hardware in someone else’s facility — not just with data center providers, but also with crypto miners looking to repurpose their infrastructure as the cost of producing a bitcoin has often exceeded its price.

General Compute launched its cloud offering last week, claiming it is already the fastest at running MiniMax 2.7, a powerful open-source LLM.

Joe Hassleman is a venture investor who got in on the ground floor of the inference boom when he invested in Groq in 2021. This year, he launched a new fund, Evercrest Partners, focused on the AI space, and made General Compute his first investment. Hassleman sees in SambaNova’s partnership with General Compute parallels to CoreWeave’s relationship with Nvidia, and to the pairing of Groq’s chip-making with its former cloud offering.

“They do need a healthy mix of customers that are going to put their chips in environments that are going to have high growth to them,” Hassleman said. “As much as General Compute is making a bet on SambaNova, SambaNova is making a bet on General Compute.”

The question is what kind of computer architecture will capture the most value in the AI future. Inference clouds are implicit bets on a world of multiple models and agents, one where no single provider dominates and speed and cost of inference become the key competitive variables. Consider the $113 million Series B raised for OpenRouter this week, reflecting the company’s ability to offer customers access to multiple models in order to optimize their token spend.

Speed matters in that calculation, both for price and for capability. Puklowski wants to turn hour-long workloads for coding agents into five- or ten-minute tasks, and to make audio agents for customer service, which need faster inference to converse effectively, more economical.

“If you use ChatGPT and it gives you 50 tokens per second, that’s still a heck of a lot faster than we can read,” Puklowski told TechCrunch. “Now that things have moved to agent-to-agent, where agents are out there reading on our behalf or pinging databases, they need to go faster.”
