ElevenLabs CEO Says AI Audio Models Will Be ‘Commoditized’ Over Time

At TechCrunch Disrupt 2025, ElevenLabs CEO Mati Staniszewski predicts that AI voice and audio models will soon become widely accessible commodities.

The voice of artificial intelligence is changing. At the recent TechCrunch Disrupt 2025 conference, ElevenLabs co-founder and CEO Mati Staniszewski made a bold statement about AI in the audio space: he believes that AI audio models will soon become commodities.

In other words, what is cutting-edge today will one day be common and widespread.

What does “commoditized” mean in this case?

When something is commoditized, it loses its uniqueness and becomes widely available at lower cost. Staniszewski explained that while ElevenLabs is still pushing hard to build top-tier audio models now, he expects that within a few years the gap between different companies’ models will shrink. “Over the long term, it will commoditize, over the next couple of years,” he said.

Why this matters for AI-audio

At the start, only a few companies had the know-how to build high-quality voice models. These models can generate lifelike voices, audiobook narration, dubbing, or interactive agents. But if everyone eventually has access to good voices, then the real advantage shifts from “who built the model” to “how you use it.”

Staniszewski noted the gap between models is already closing. He said differences in voices or languages will still exist, but “on its own, the differences will be smaller.” For creators, developers, and businesses this means the focus may move toward applications, user experience, and integration rather than raw model performance.

What ElevenLabs is doing now

Despite predicting commoditization down the road, ElevenLabs is still working hard on the model front. Staniszewski pointed out that if the audio sounds bad or robotic, users will notice and go elsewhere. “The only way to solve it is… building the models yourself,” he said, “and then, over the long term, there will be other players that will solve that, too.”

ElevenLabs plans to tackle both model development and real-world applications. The company said it will launch partnerships and work with open-source tech to combine its audio strength with other AI systems. Staniszewski also expects a shift toward multi-modal models: ones that handle audio and video, or audio and text, at the same time.

What this means for you

If you are a creator, you might wonder what this shift means for your work. Here are some key takeaways:

  • Better access: When AI audio models become more common, you will have more options for quality voice generation at lower cost.

  • More focus on use cases: Instead of just picking a voice model, you will pay attention to how you use the voice in your app, game, or media.

  • Need for integration: With models becoming standard, being able to integrate voice tech smoothly, add customization, and make it sound human will matter more.

  • Competition changes: Companies that once had an edge purely due to model quality may lose that edge. The battleground will shift to interface, experience, and usefulness.

Why this prediction runs deep

Staniszewski’s forecast might sound surprising coming from a company that builds these models. But it makes sense when you look at the landscape. Just as cameras, microphones, and screens evolved from niche gear into widespread tools, audio models may follow the same path. Once the architecture is solved, many companies can copy, adapt, and launch similar models.

And when that happens, what separates products will not be the model itself, but the product and the experience you deliver. Staniszewski compared this future to Apple’s approach: “The same way software and hardware was the magic for Apple, we think the product and AI will be the magic for the generation of the best use cases,” he said.

What the next steps look like

In the short term (the next year or two), Staniszewski expects more innovation. He sees models becoming more “fused,” or multi-modal, meaning combinations of audio with video or audio with large language models. He pointed to examples where an AI can generate a video and voice together.

Meanwhile, ElevenLabs will keep building both models and tools, and will form partnerships to scale its reach. At the same time, more players entering the space means more models will become available, which feeds the trend toward commoditization.

The Bottom Line

The statement that AI audio models will be commoditized may sound like it undermines what ElevenLabs does, but actually it speaks to a larger vision for the audio-AI industry. What matters is less which model you use and more how you use it. The advantage will move toward user experience, integration, narrative, and applications that people care about.

If you are working with voice AI or planning to build with it, this is a helpful signal: start thinking not only about model quality, but about how you bring voice into your product in a way that feels human, useful, and distinctive. The era of premium audio models as a rare edge may be ending. The era of smart voice experiences could be just beginning.
