“I have subscriptions to Netflix, Prime and all the major OTT platforms, but I find myself opening Kuku FM and Pocket FM whenever I come across a particular story that piques my interest,” he said. “I recently finished ‘Secret Billionaire’ (a drama series on Kuku FM), and the content quality is impressive.”
Priyanshu, an engineering student from Delhi, recently switched from watching YouTube videos to listening to Kuku FM. Normally an avid consumer of fiction, he has recently started exploring non-fiction content on the platform. “I am addicted to the app as the audio quality and the vast variety of content is great…But I wish they could reduce the pricing of the subscription plans,” he said.
Like Ankur and Priyanshu, hundreds of millions of users log on to audio OTT platforms periodically to entertain themselves. Indeed, audio content, which includes audiobooks, audio series and podcasts, is experiencing a global renaissance, with India emerging as a key market.
According to a 2024 Redseer report, even though it’s still a nascent industry, the audio series segment has already captured an audience base at par with video streaming and OTT platforms, with 10-11% of that base comprising paid users.
“When we look at the big picture, we’re talking about a whopping 1.3 billion potential users worldwide who are tuning into audio series,” the report stated, adding that almost 27% came from India alone in 2023.
Pocket FM and Kuku FM, both founded in 2018; publishing platform Pratilipi, which launched Pratilipi FM around 2020; and Hubhopper, a podcast platform started in 2015, emerged to create the category.
“If you look at the video spectrum, you have a bunch of platforms for each and every use case. But if you analyse the audio ecosystem, you had just Spotify for music and some products doing podcasts—there was no clear software product for audio entertainment in general and that is where we saw the opportunity,” said Prateek Dixit, co-founder and chief technology officer, Pocket FM.
Bengaluru-headquartered Pocket FM alone claims to have a listener community of over 200 million and over 15 million monthly active listeners, while Kuku FM, also based in the Garden City, has over 4.2 million paying subscribers on its platform. Kuku FM offers both fiction and non-fiction series, whereas Pocket FM is focused solely on fiction.
Pocket FM’s revenue from operations spiked to ₹1,052 crore in fiscal year 2024 (FY24) from ₹176 crore in FY23, owing to its US expansion and growth in both subscription and ad revenue. Its losses narrowed to ₹165 crore in FY24 from ₹208 crore in FY23, according to a company statement (Pocket FM has not disclosed its consolidated financials in its annual returns).
Kuku FM’s revenue from operations surged 2.1x year-on-year to ₹88 crore in FY24, up from ₹41 crore in FY23, while its losses came down from ₹117 crore to ₹96 crore over the same period.
The first phase has been good, but they’re still a long way from Spotify, which has 263 million paying subscribers. The audio companies are now harnessing artificial intelligence (AI) to turbocharge growth in the next phase. AI is helping them produce far more content in far less time for far less money. That has, however, come at the cost of humans, as many tasks have now been automated. In the last few months, both Kuku FM and Pocket FM have laid off hundreds of employees, with their content teams being the most impacted.
The AI factor
Imagine a world where your favourite bedtime stories, thrillers and romances are written, narrated and personalized by artificial intelligence (AI). That’s now a reality and it’s reshaping the world of audio entertainment.
AI is playing an oversized role in how audio stories are produced, marketed and personalized today. Audio startups are on a mission to cut content production time to the bone, offer massive libraries and expand their reach to the remotest corners with personalized material and ads. AI is doing the heavy lifting, and has consequently expanded revenue potential and opened the doors to limitless scalability. With the technology slashing production time and costs, companies are able to produce a vast quantity and variety of content.
Earlier, scripts were written manually and then sent to production houses, where professional voice artists converted them to audio. After this, sound engineers would manually edit the recordings and add background scores and effects. Now, while scripting is done by writers with the use of AI, voice production is done entirely through AI, with human involvement limited to adjustments for clarity and emotion, and quality checks. Post-production processes, including background sound, mixing, pacing and modulation, are all done with AI.
On Kuku FM, stories such as ‘Secret Billionaire,’ ‘Women of Prison’ and ‘Bloodstone Fortune’ have all been scripted with the help of AI.
Pocket FM has launched a co-pilot on its app to assist writers as they write stories, and has created a user-generated content platform where writers can go to the app and publish stories on their own.
“If you look at audio series, it usually takes one day to produce 30 minutes of content. After the co-pilot, the amount of content you can produce and give to the user has doubled. At the same time, this entire democratization of launching content through this self-serve platform has decreased the amount of time it takes to publish the content,” said Dixit.
“We now get about 10,000 concepts every month. It’s really hard to get all of these produced through either studio or radio jockeys (RJs), and that’s where I think the AI production shifted, with 80% now being produced by AI. Obviously, there’s always a proofreader,” Suyog Gothi, head of India business, Pocket FM, added.
This has kicked content production into high gear, slashing turnaround times. Prior to AI integration, Kuku FM used to release 50,000-60,000 minutes of stories a month, and that has now risen to more than 150,000 minutes. Pocket FM, which only produced 15-20 audio series each month, now does 150-200 series.
Of over 75,000 audio series on the Pocket FM platform, over 40,000 are AI audio series—scripted stories converted to audio through AI. The company says it is able to produce content at 10% of its earlier cost and in 5% of the time taken earlier. Meanwhile, average user time has increased by 15-20% and AI-driven recommendations have led to 40% more engagement with new content.
Kuku FM is on the same path. “If we needed 10 titles on the platform, it used to take roughly 60 days to get the content ready. Today we are able to do it in two weeks,” said Lal Chand Bisu, founder and chief executive officer (CEO) of Kuku FM. As of today, 100% of Kuku FM’s catalogue is ‘AI assisted but human crafted.’ Humans are involved in giving prompts and scripting stories with the use of AI, in quality checks, and in proofreading.
According to Bisu, while Kuku FM’s overall expenditure on content has increased, the cost per content piece has significantly come down. “Two years back, we used to spend between ₹50 lakh and ₹80 lakh, but these days we are spending more than ₹2 crore on the content side,” he explained. “But the cost has reduced per content piece. Earlier, it used to roughly cost ₹50,000 per hour to create content, which is now ₹20,000,” he added. At the same time, the average listening minutes have increased by 30-40%.
Kuku FM offers a subscription plan priced at ₹399 for three months, while Pocket FM follows a freemium model, allowing users to listen to the first 15 minutes of each audio series for free. The user can then purchase coins, priced between ₹50 and ₹100, to unlock and listen to additional episodes.
Stumbling blocks
One of the biggest hurdles with AI-generated audio is getting the sound just right. Mispronunciation, awkward tones and out-of-context errors are all too common. For platforms betting big on AI, fixing these issues isn’t just important—it’s essential to keeping users engaged and coming back for more.
“Since we have been working with these models for so long, we understand some nuances of how we should prompt. That way, we are able to control the hallucination to some extent. But it’s still there,” said Vendra.
AI models can generate false information and present it confidently as accurate, a phenomenon known as hallucination.
Another challenge these companies face is that the use of AI is largely limited to English content and Hindi to some extent, as there isn’t enough data for regional languages yet. Kuku FM has about 45% of its content in Hindi, 50% in other languages and 5% in English. For Pocket FM, the majority of the content library in India is in Hindi, while globally the majority of its content is in English.
“Data scarcity with fewer high-quality data sets for Indian vernacular languages is the main issue, as most of the Web is in English. Moreover, language complexity with multiple accents and dialects makes training harder. But it will improve as Indian data sets and locally trained models are built in the next 3-5 years,” said Jaspreet Bindra, CEO of AI&Beyond, an AI education company.
With hundreds of writers using AI to generate stories, content repetition is another major issue for audio OTT platforms. If narratives start sounding too similar, it could undermine the volume-driven strategy these companies are relying on to scale up.
The road ahead
Going forward, these companies are looking to capitalize on global demand and replicate the model in different international geographies. Mit Desai, practice member, media and entertainment, Praxis Global Alliance, noted that navigating data privacy laws, content licensing and cultural adaptation will be critical to ensuring a sustainable global expansion. Praxis Global Alliance is a research company.
A bigger question hovering over the sector is what advances in AI mean for human beings. While unskilled labourers get low wages for backbreaking work, creative types and engineers have been able to make a comfortable living thanks to their skill. But with AI steadily taking over even skilled tasks, dark clouds have started to appear on their horizon.
Analysts say the layoffs at audio companies suggest there is likely to be a shift in hiring across the industry. “Demand for traditional voice artists, narrators and translators is likely to decline as AI-generated voice synthesis and cloning become more sophisticated,” said Shivaraj Jayakumar, practice member, consumer and internet, Praxis Global Alliance. “Instead of hiring large creative teams, companies might retain a few premium artists for flagship projects while using AI for bulk content production.”
The founders of the audio companies vehemently claim otherwise. “The output depends on human beings. It is not something that is automatically generated. The focus, however, has shifted to more senior, more creative guys,” said Bisu. The illustrators, writers and artists are all still there, but the throughput or the amount of content they produce has increased, said Dixit.
Hayao Miyazaki, the legendary co-founder of Studio Ghibli, a Japanese animation company that has been in the news of late after people around the world started using the latest version of ChatGPT to render images in the Ghibli style, once dismissed AI-generated art as “an insult to life itself,” calling it “disgusting” in a now-famous critique.
Miyazaki may not approve, but AI made Studio Ghibli go viral—people who had never heard of it suddenly became aware of its existence. While the quality of the art is for the experts to judge, AI proved that with the right prompts, it can do anything.
Having harnessed its power and seeing how it can transform their fortunes, audio companies have embraced AI in a bear hug. And they will not let go, even if the path they are on leads to an ethical and creative crossroads.