But what if AI could run on CPUs—with no loss in speed or quality?
Ziroh Labs, a California-based deeptech startup, has done exactly this with its Kompact AI platform, built in partnership with the Indian Institute of Technology, Madras (IIT Madras) and IITM Pravartak Technologies Foundation. Less than a month after its release, Ziroh Labs has received over 200 requests to use the product. “We’ve only managed around 50 meetings so far—7 to 8 per day—and the variety of use cases is remarkable,” Hrishikesh Dewan, co-founder of Ziroh Labs, told Mint in an interview.
CPUs typically have fewer cores (usually 4 to 16), each designed to handle complex tasks quickly, while GPUs have hundreds or thousands of simpler cores optimized for running many tasks simultaneously, making them ideal for AI and machine learning. However, GPUs are scarce and far more expensive than CPUs. Nvidia owns the lion’s share of the GPU market (65%), followed by AMD (18%) and Intel (16%), according to tech industry observer Jon Peddie Research.
In contrast, there are many more CPUs in the world—five or six per household, minimum, Dewan points out. If these CPUs are enabled to run AI, access can be immediately unlocked, he argues. “Startups can call any data centre and get a CPU-based machine. So accessibility improves, costs go down, and power requirements are lower—making AI more sustainable,” Dewan explained.
How does Kompact AI work?
AI models, Dewan explains, are essentially just sets of equations that can be computed. This means they can run on GPUs, but also on CPUs, or even on devices like washing machines or refrigerators that have small processors in them. The real question, according to him, isn’t whether you can run AI models on CPUs. It’s how long it will take to get an output, and what the quality of that output will be.
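In code terms, the point is simple: a neural-network layer is just a matrix multiply followed by a simple function, arithmetic that any processor can perform. A minimal sketch (the sizes and weights below are made up purely for illustration):

```python
import numpy as np

# A single neural-network layer is just y = activation(W @ x + b):
# plain arithmetic that any CPU (or even a microcontroller) can compute.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # toy weight matrix (4 outputs, 8 inputs)
b = rng.standard_normal(4)        # toy bias vector
x = rng.standard_normal(8)        # toy input

y = np.maximum(W @ x + b, 0.0)    # ReLU activation
print(y)                          # the "model" ran with no GPU involved
```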
For instance, companies use techniques like quantization and distillation to run advanced AI on regular CPUs, but these techniques can compromise output quality, Dewan explained. How so? Quantization reduces the precision of the numbers inside an AI model (instead of 32-bit numbers, it might use 8-bit ones). Distillation takes a large model (a “teacher”) and trains a smaller, simpler model (a “student”) to mimic the teacher’s behaviour. The student model runs faster but often isn’t as smart as the teacher: it simplifies things, and some knowledge is lost in translation.
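A toy illustration of the precision loss quantization introduces; the symmetric 8-bit scheme below is a simplified textbook version, not Ziroh Labs’ or any vendor’s production method:

```python
import numpy as np

# Toy symmetric 8-bit quantization: map float32 weights to integers in
# [-127, 127], then reconstruct them. The reconstruction error is the
# "lost precision" that can degrade output quality.
w = np.random.default_rng(1).standard_normal(5).astype(np.float32)

scale = np.abs(w).max() / 127.0              # one scale for the whole tensor
w_int8 = np.round(w / scale).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale

print("original:", w)
print("restored:", w_restored)
print("max error:", np.abs(w - w_restored).max())
```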
Kompact AI, according to Dewan, avoids both quantization and distillation, keeping the model’s full quality intact so it produces the output its designer intended. “First, we retain the original model without any quantization or distillation, so it performs as designed. Second, we optimize it for specific processors—down to their instruction sets. This pairing of model and processor is what we call Kompact AI,” he explained.
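Optimizing “down to the instruction set” means choosing code paths that use whatever vector extensions a particular CPU exposes. A hedged sketch of the detection step on Linux follows; the flag names are standard /proc/cpuinfo entries, but the dispatch logic is illustrative, not Kompact AI’s actual code:

```python
# Detect SIMD extensions a CPU exposes, as an optimizer would before
# choosing a code path. Reads /proc/cpuinfo, so this sketch is Linux-only.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for ext in ("avx512f", "avx2", "sse4_2"):   # try the widest vectors first
    if ext in flags:
        print(f"dispatching to the {ext} kernel")
        break
else:
    print("falling back to scalar code")
```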
Kompact menu
Kompact AI offers a library of pre-optimized AI models, covering text, speech, images, and multimodal applications, all designed to run smoothly on CPUs. Developers can access these models globally and use Kompact’s common AI runtime (called Common AI-Language Runtime, or ICAN), which supports over 10 programming languages, to integrate them easily into their applications. A runtime is the layer that takes the code you’ve written and executes it while the program is running. Companies “can run these models on their own CPUs—either on-premise or in the cloud,” Dewan said.
Ziroh Labs has optimized 17 models across eight processor types. These include models from the DeepSeek series, Microsoft’s Phi models, Meta’s LLaMA, and Alibaba’s Qwen series. The models have also been optimized for processors like Intel Xeon Silver, Gold, Platinum, Emerald, and others. Take, for example, the LLaMA 2 7-billion-parameter model. “We first optimize it theoretically, and then we fine-tune (teaching an already-trained model to do a specific task better by giving it new, focused examples) its performance for each processor. That’s the product—the optimized model and its matched runtime. The critical part is this: we retain the full quality of the model. We don’t compromise it. In fact, when IIT Madras rigorously validated our results, they found our quality scores were better than existing implementations,” Dewan explained.
To be sure, other companies, too, are working on developing AI models optimized to run on CPUs in a bid to reduce the reliance on GPUs. For instance, in April, Microsoft Research introduced BitNet b1.58, a “1-bit” AI model capable of running on standard CPUs, including Apple’s M2. But to get the desired level of performance, Microsoft uses its custom framework, bitnet.cpp, which only works with certain types of hardware for now.
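BitNet b1.58’s “1-bit” label refers to ternary weights: each weight is constrained to -1, 0 or +1 (about 1.58 bits of information each, hence the name). A conceptual sketch of that idea, not Microsoft’s bitnet.cpp implementation:

```python
import numpy as np

# Ternary ("1.58-bit") weight quantization in the spirit of BitNet b1.58:
# every weight becomes -1, 0, or +1, scaled by the mean absolute value.
w = np.random.default_rng(2).standard_normal(6).astype(np.float32)

scale = np.abs(w).mean()                         # per-tensor scale
w_ternary = np.clip(np.round(w / scale), -1, 1)  # values in {-1, 0, +1}

print("original:", w)
print("ternary :", w_ternary)
print("restored:", w_ternary * scale)
```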
Intel’s models
Intel, too, has optimized over 500 AI models to run on its Core Ultra processors, though those chips include built-in components, such as a neural processing unit (NPU), to help with on-device AI processing. Intel’s Meteor Lake CPUs also feature integrated NPUs. ARM’s CPU architectures are tailored for AI inference tasks (where a trained model makes predictions or gives answers based on new input), supporting a range of workloads from large language models (LLMs) to domain-specific applications, which makes them suitable for edge devices and energy-efficient deployments. Neural Magic (acquired by Red Hat, an IBM unit) focuses on optimizing deep learning algorithms to run efficiently on CPUs by leveraging the chips’ large memory and complex cores.
“While others might pursue the challenge of building larger models, we’re focused on solving real-world problems using domain-specific models,” says Dewan. Kompact AI plans to begin with small models of 50 billion parameters or fewer. Dewan believes Kompact AI will find applications in many areas. He explained that “specific use cases such as aiding farmers, supporting students or assisting frontline workers in rural areas are best served by smaller, task-specific AI models that can run efficiently on smartphones and low-power devices, ensuring accessibility where it matters most”.
It can even be used for “green computing”: running in data centres powered by renewable energy. Since most data centres already have CPUs and power provisioned, no new infrastructure is needed, unlike with power-hungry GPUs, making Kompact AI a greener option, Dewan explained. He added that other use cases include AI for income tax, point-of-sale (POS) systems in kirana stores, and climate and water solutions. “Everyone wants to add intelligence to their workflows, and we make that possible,” he said.
Privacy uncompromised
Kompact AI also helps maintain privacy, according to Dewan. “Remember when everyone uploaded photos to OpenAI’s tools like Sora or DALL-E? Did anyone ask where those photos went? AI today is so powerful that one photo can be used to generate hundreds more. And yes, OpenAI’s terms allow them to use that data. So, the question becomes: where is your data going, and how is it being used?
“Models can now run on your own device or your own server (known as ‘edge’ computing), so your data stays with you, solving your privacy problem since you don’t need to send anything out,” he explained.
As for funding, Ziroh Labs has not announced any specifics yet, but “we have raised funds, mostly from angel investors in the Bay Area, and now from institutional investors as well”, Dewan said. “We’ll disclose more when the time is right. But this has to succeed—there’s no room for failure.”
Key takeaways
- Ziroh Labs’ Kompact AI allows AI models to run efficiently on CPUs instead of GPUs, addressing GPU shortages and high costs.
- Kompact AI retains full model quality while optimizing performance for specific processors.
- Running AI on CPUs makes AI deployment cheaper, more sustainable, and more accessible.
- Kompact AI runs models on personal devices or private servers, enhancing data privacy by eliminating the need to send data to third-party cloud providers.
- The platform is designed for practical use cases, such as supporting farmers, students, and frontline workers, and enabling “green computing” in energy-efficient data centres.