As demand for generative AI grows, cloud service providers such as Microsoft, Google and AWS, along with large language model (LLM) providers such as OpenAI, have all reportedly considered developing their own custom chips for AI workloads.

Speculation that some of these companies, notably OpenAI and Microsoft, have been working to develop their own custom chips for generative AI workloads because of chip shortages has dominated headlines for the last few weeks.

While OpenAI is rumored to be looking to acquire a firm to further its chip-design plans, Microsoft is reportedly working with AMD to produce a custom chip, code-named Athena.

Google and AWS have already developed their own chips for AI workloads: Google's Tensor Processing Units (TPUs) and AWS' Trainium and Inferentia chips.

But what factors are driving these companies to make their own chips? The answer, according to analysts and experts, lies in the cost of processing generative AI queries and the efficiency of currently available chips, mainly graphics processing units (GPUs). Nvidia's A100 and H100 GPUs currently dominate the AI chip market.

“GPUs are probably not the most efficient processor for generative AI workloads and custom silicon might help their cause,” said Nina Turner, research manager at IDC.

GPUs are general-purpose devices that happen to be hyper-efficient at matrix inversion, the essential math of AI, noted Dan Hutcheson, vice chairman of TechInsights.

“They are very expensive to run. I would think these companies are going after a silicon processor architecture that’s optimized for their workloads, which would attack the cost issues,” Hutcheson said.

Using custom silicon, according to Turner, may allow companies such as Microsoft and OpenAI to cut back on power consumption and improve compute interconnect or memory access, thereby lowering the cost of queries.

OpenAI spends approximately $694,444 per day, or 0.36 cents per query, to operate ChatGPT, according to a report from research firm SemiAnalysis.

“AI workloads don't exclusively require GPUs,” Turner said, adding that though GPUs are great for parallel processing, there are other architectures and accelerators better suited for such AI-based operations.

Other advantages of custom silicon include control over access to chips and designing elements specifically for LLMs to improve query speed, Turner said.

Some analysts also likened the move to design custom silicon to Apple’s strategy of producing chips for its devices. Just as Apple switched from general-purpose processors to custom silicon to improve the performance of its devices, generative AI service providers are looking to specialize their chip architecture, said Glenn O'Donnell, research director at Forrester.

“Despite Nvidia's GPUs being so wildly popular right now, they too are general-purpose devices. If you really want to make things scream, you need a chip optimized for that particular function such as image processing or specialized generative AI,” O’Donnell explained, adding that custom chips could be the answer for such situations.

However, experts said that developing custom chips might not be an easy affair for any company.

“Several challenges, such as high investment, long design and development lifecycle, complex supply chain issues, talent scarcity, enough volume to justify the expenditure and lack of understanding of the whole process, are impediments to developing custom chips,” said Gaurav Gupta, vice president and analyst at Gartner.  

For any company starting the process from scratch, it might take at least two to two and a half years, O’Donnell said, adding that the scarcity of chip-design talent is a major factor behind delays.

O’Donnell’s perspective is backed by examples of large technology companies acquiring startups to develop their own custom chips or partnering with companies that have expertise in the space. AWS acquired Israeli startup Annapurna Labs in 2015 to develop custom chips for its offerings. Google, on the other hand, partners with Broadcom to make its AI chips.

While OpenAI is reportedly looking to acquire a startup to make a custom chip that supports its AI workloads, experts believe that the plan might have less to do with chip shortages and more to do with supporting inference workloads for LLMs, as Microsoft keeps adding AI features to its apps and signing up customers for its generative AI services.

“The obvious point is that they have some requirement nobody is serving, and I reckon it might be an inference part that’s cheaper to buy and cheaper to run than a big GPU, or even the top Sapphire Rapids CPUs, without making them beholden to either AWS or Google,” said Omdia principal analyst Alexander Harrowell. He added that he was basing his opinion on OpenAI CEO Sam Altman’s comments that GPT-4 is unlikely to scale further and would instead need enhancing. Scaling an LLM requires more compute power than running inference on a trained model. Inference is the process of using a trained LLM to generate predictions or results.

Further, analysts said that acquiring a large chip designer might not be a sound decision for OpenAI, as it would cost around $100 million to design the chips and get them ready for production.

“While OpenAI can try and raise money from the market for the effort, the deal with Microsoft earlier this year essentially led to selling an option over half the company for $10 billion, of which some unspecified proportion is in non-cash Azure credits — not the move of a company that’s rolling in cash,” Harrowell said.

Instead, the ChatGPT maker could look to acquire startups that already have AI accelerators, Turner said, adding that such a move would make more economic sense.

In order to support inferencing workloads, potential targets for acquisition could be Silicon Valley firms such as Groq, Esperanto Technologies, Tenstorrent and Neureality, Harrowell said, adding that SambaNova could also be a possible acquisition target if OpenAI is willing to discard Nvidia GPUs and move on-premises from a cloud-only approach.
