Dirty Secrets of the AI Gold Rush

Shailesh Goel
May 29
6 min read

And the organizations that own that capability will be the quiet winners of the next decade.

Everyone is talking about AI models. The billion-dollar valuations, the ChatGPT moments, the foundation model wars. But beneath every intelligent AI system — every self-driving car, every medical diagnostic tool, every large language model that answers your questions — there is an unglamorous, labour-intensive, and massively underestimated industry keeping it all alive.

Data annotation. The process of human beings labelling, tagging, categorizing, and validating raw data so that AI systems can learn from it.

It is not glamorous. It does not make headlines. But it is the foundational infrastructure of the AI economy — and the market is about to explode in ways that most business leaders are not yet positioned to capture.

The Numbers Are Staggering

The global data annotation and labelling market was valued at approximately $3.6 billion in 2024. By 2035, it is projected to reach $17.9 billion — growing at a CAGR of nearly 16% annually. The broader data annotation tools ecosystem tells an even bigger story: from $14.5 billion in 2024 to over $96 billion by 2035.

These are not speculative numbers. They reflect real capital already flowing into human-generated training data, driven by a structural reality that no amount of automation can fully escape.

More than 80% of the engineering labour in machine learning projects worldwide is devoted to data preparation and labelling. AI annotation demand grew 154% year-over-year on Upwork in 2026 alone, making it the fastest-growing skill in data science and analytics. A 2022 Google Research paper estimated the global annotation workforce to already be in the millions — and projected it could grow to billions.

This is not a niche. This is the next industrial workforce.

Why Human Intelligence Is Irreplaceable Here

There is a common assumption that AI will eventually annotate its own training data — that automation will eliminate the need for human labellers. This is partially true and largely misleading.

Yes, AI-assisted annotation tools are improving. Pre-labelling, active learning, and model-in-the-loop approaches are increasing annotator productivity. But the more capable AI systems become, the more sophisticated the data quality requirements become. You cannot train a model on data generated by that same model without compounding its errors. The humans in the loop are not going away — they are moving up the value chain.

A self-driving vehicle needs millions of annotated video frames, with precise bounding boxes around every pedestrian, every traffic sign, every lane boundary — at 30 frames per second, across weather conditions, lighting variations, and geographies. A medical imaging AI needs expert-verified labels that require clinical knowledge, not just visual recognition. A large language model needs human preference data — people making nuanced judgements about which AI responses are more helpful, more accurate, and more aligned with human values.

None of this can be done at scale without domain knowledge, quality governance, and operational rigour. This is not simple clicking. This is knowledge work at industrial scale.

I Have Seen This From the Inside

I did not arrive at this view through research alone. I built one of these operations.

In 2010, at Bosch Global Software Technologies, I was part of a small team tasked with exploring whether data annotation for autonomous driving programs could be an internal capability rather than fully outsourced. We started with three people and a set of questions we did not yet know how to answer: How do you train annotators to label LIDAR point clouds accurately? How do you build quality frameworks when there is no established playbook? How do you scale an operation that requires both precision and throughput, simultaneously?

By the time I moved on from that role, we had scaled to over 800 people — a combination of internal teams and managed external partners — built into a USD 7 million business vertical serving Bosch's global autonomous driving programs. Along the way we built SOPs, quality governance systems, escalation frameworks, productivity measurement models, and training pipelines from scratch. We made expensive mistakes and learned from every one of them.

The organizations that win in the AI data economy are not necessarily those with the most annotators. They are those with the right operational infrastructure, domain knowledge, and quality governance systems to produce data that actually makes models better. That combination is extraordinarily hard to build. And it is where the competitive moat lies.

The Real Winners Will Not Be Who You Expect

The market conversation around AI tends to focus on model developers — OpenAI, Google DeepMind, Anthropic, Mistral. But the companies that will quietly accumulate the most durable advantage in the AI economy are those that control high-quality, domain-specific training data at scale.

Domain knowledge creates defensibility. General annotation — labelling cats in photos — is a commodity. But annotating surgical procedures for medical AI, or classifying rare fault patterns in industrial sensor data, or generating preference data for legal reasoning models, requires deep domain expertise that cannot be replicated quickly.

Platforms create compounding advantage. The organizations investing in proprietary annotation platforms — not just using off-the-shelf tools, but building integrated data management, quality tracking, and workforce coordination systems — will have structural cost and quality advantages that widen over time. Data management and workflow platforms are the fastest-growing segment of the annotation market, growing at a CAGR of 28.4%.

Geography creates opportunity. India, Southeast Asia, and Africa are emerging as primary annotation workforce hubs. India's national AI strategy explicitly identifies data labelling as a major employment and economic growth driver. Africa alone is projected to see 1.8 million data annotation jobs created by 2025.

Volume creates data network effects. The more high-quality annotated data a company produces in a specific domain, the better its quality benchmarks become, the better it can train new annotators, and the better the AI models it supports perform. This creates a reinforcing cycle that is very difficult for new entrants to break.

What Organizations Need to Do Now

For business leaders making decisions about AI strategy, here is what the data annotation opportunity means in practice:

If you are building AI products: Your model quality ceiling is determined by your data quality floor. Treating annotation as a procurement exercise — find the cheapest vendor, set a price per label, move on — is the fastest route to a model that does not perform. The investment is in governance, domain expertise, and feedback loops, not just headcount.

If you are a GCC or shared services leader: Data annotation is one of the most significant new service lines you are not yet offering. The infrastructure requirements — workforce management, quality systems, platform tools, domain training — align precisely with what GCCs already do well. The organizations that establish this capability in the next two to three years will be well-positioned as AI data demand accelerates.

If you are an operations leader: This is your domain. Data annotation at scale is an operations problem — workforce planning, quality governance, throughput optimization, process standardization, partner management. The technology is a tool. The operational rigour is the differentiator.

If you are thinking about where the jobs are going: AI is creating a new category of knowledge-intensive, scalable, globally distributed work — and the demand for that work is growing faster than the supply of organized, quality-governed capacity to do it.

The Quiet Infrastructure of Intelligence

The industrial revolution created a new class of worker — the factory operative — who powered the machinery of the age. The digital revolution created the knowledge worker. The AI revolution is creating something new again: the intelligence worker, the human who teaches machines to understand the world.

This is not a transitional workforce to be replaced as soon as automation catches up. It is a permanent and growing layer of the AI stack — because the models keep getting more ambitious, the domains keep getting more specialized, and the quality requirements keep getting higher.

The organizations that recognize this early — that invest in the operational infrastructure, the domain expertise, and the platform capabilities to do this work well — will not just be participants in the AI economy. They will be foundational to it.

The gold rush is in the models. The gold is in the data.

Shailesh Goel is a Business Operations and Program Management leader with three decades of experience across manufacturing, digital services, and AI data operations. He built and scaled Bosch's data annotation operations for autonomous driving from a 3-person pilot to an 800+ member ecosystem. He writes at shaileshgoel.in

Shailesh Goel

The Complexity 2 Clarity Playbook

Real frameworks. Hard-won lessons. Clarity distilled from three decades across industries, functions, and continents.