Redpine Raises $8M to Become the Spotify of AI Training Data

About 1% of the world's data is openly available for training AI models. The other 99%? Locked behind paywalls, buried in enterprise databases, or sitting in content archives that nobody has figured out how to monetize for the machine learning era. Stockholm's Redpine thinks that's both a massive problem and an even bigger opportunity.

The startup has closed an $8 million (EUR 6.8 million) seed round led by NordicNinja, with Luminar Ventures and node.vc joining the cap table. But the angel investor list tells you more about what Redpine is building than the institutional money does. Peter Sarlin, co-founder of Silo AI (acquired by AMD). Patrik Tran from Validio. Anna Nordell Westling from Sana. And leaders from OpenAI, Perplexity, and Spotify. When AI lab insiders put personal money into a data infrastructure startup, they're telling you something about where the industry's bottleneck actually is.

The Piracy-to-Streaming Playbook, Applied to Data

Co-founder David Osterdahl was part of Spotify's early team. He watched the music industry go from suing teenagers for downloading MP3s to embracing a streaming model that eventually generated more revenue than CDs ever did. The pattern recognition here is deliberate.

Right now, AI companies mostly train on scraped internet data. Same sources, same quality issues, same legal exposure. There's no moat in that approach, and the lawsuits are already piling up. Content creators, publishers, and data owners are suing AI companies for using their work without permission or payment. Sound familiar? It should. It's the exact same arc the music industry went through, just compressed into fewer years and involving neural networks instead of Napster.

Redpine's pitch: partner with content owners to license their premium, non-public data and make it available through an API that AI companies can plug into directly. The content owner gets paid. The AI company gets legal certainty and access to higher-quality data than anything available through web scraping. Everyone wins, theoretically.

The VC-Turned-Founder Who Saw Every AI Startup Using the Same Data

The other co-founder, Anders Hammarback, spent years as a VC evaluating AI startups. What kept bothering him was how many of them were building differentiated models on top of completely undifferentiated data. "Every startup we looked at was training on the same scraped internet datasets," he's said. "No moat. Insufficient accuracy. And no compensation to the people who created the underlying content."

That observation landed differently once Hammarback started thinking about it from the supply side rather than the investment side. If every AI company needs better data and content owners need a new revenue stream, there's an infrastructure business sitting in the middle waiting to be built.

Redpine was founded in 2024. It's already working with what the company describes as "leading international AI labs" and a US-based biotechnology research firm called AsedaSciences. The product: a platform with proprietary retrieval and reranking technology that lets AI agents access licensed, multimodal, non-public data at scale.

Dr. Vesterbacka's Retrieval Engine Is the Technical Bet

The technical core of Redpine is being built by founding data scientist Dr. Leonora Vesterbacka, who's leading the development of proprietary retrieval and reranking technology. If you're not deep in the AI infrastructure world, here's what that means: when an AI agent needs to find relevant information from a massive dataset, the quality of the retrieval system determines whether it gets useful answers or garbage. Most existing retrieval systems were built for search engines, not for AI agents that need to reason over complex, licensed datasets. Vesterbacka's work at Redpine aims to solve that specific problem.

The team's pedigree helps explain the investor interest. Four Nordic tech unicorns, Spotify, Sana, Zettle, and Lunar, are all represented in the founding team's backgrounds. Add McKinsey, CERN, and H&M to the mix, and you've got a team that spans consumer tech, enterprise software, fundamental research, and retail. It's an unusual combination, but data licensing is an unusual problem that sits at the intersection of technology, legal frameworks, and business model innovation.

The Agent Economy Needs Plumbing, and Redpine Wants to Be It

Hammarback describes Redpine as "building the infrastructure to enable the token-based agent economy to grow sustainably for all stakeholders." That's a mouthful, so let's unpack it.

The AI industry is moving from large language models that generate text to AI agents that take actions. Those agents need access to real-time, accurate, domain-specific data. Not the general-purpose internet data that LLMs were trained on, but the specific, often proprietary datasets that exist behind corporate firewalls and publisher paywalls. If agentic AI becomes the dominant paradigm, demand for this kind of licensed data infrastructure is likely to grow sharply.

Redpine is positioning itself as the default data layer for that world. The seed funding will go toward scaling the network of proprietary data partnerships across key industries, hiring across engineering, data science, and go-to-market, and deepening relationships with AI labs. NordicNinja General Partner Marek Kiisa joins the board, bringing the fund's pan-Nordic network and deep tech expertise.

The Awkward Copyright Question That Won't Go Away

There's a tension at the heart of Redpine's model that's worth naming directly. The company is building a licensed data business in an industry that has, to date, treated data licensing as mostly optional. Major AI labs have scraped the open web with abandon, settled lawsuits when they had to, and generally treated copyright as a speed bump rather than a stop sign.

For Redpine's model to work at scale, AI companies need to actually want licensed data badly enough to pay for it. That could happen because of legal pressure (more lawsuits, stronger enforcement). It could happen because of quality pressure (scraped data hits a ceiling and licensed data produces meaningfully better results). Or it could happen because of regulatory pressure (the EU AI Act's transparency requirements make provenance tracking essential).

Probably it'll be all three at once. The early traction with AI labs suggests at least some of these forces are already at work. But the honest truth is that Redpine is building for a market that's still forming. The bet is that the market will form in their direction. Given the trajectory of AI regulation in Europe and the growing awareness of data quality as a competitive advantage, it's a bet with solid reasoning behind it. Not guaranteed. But well-reasoned.

The comparison to Spotify isn't perfect. Music licensing had decades of legal precedent and existing collecting societies. Data licensing for AI is being invented in real time, with new court rulings and regulations appearing monthly. But the core insight is the same: when you can turn piracy into a premium service that everyone prefers, the economics work for all sides.

Whether Redpine can build fast enough to capture that position before the market consolidates is the $8 million question. The angel list suggests the AI industry's insiders think they can.
