Data & Training.

Data is the missing crypto-AI moat — but most data layers are still solving the wrong problem.

Tracked 14 · Conviction 04 · Watching 08 · Skeptical 02

Thesis

Why this sector, why now.

Everyone agrees AI is data-constrained. Decentralized data layers (Grass, Vana, Masa, others) bet that crypto rails unlock data supply that proprietary platforms cannot: residential bandwidth, user-permissioned datasets, verifiable provenance. The thesis is right. The execution gap is twofold: selling data into AI labs at scale, and proving quality competitive with scraped Common Crawl plus licensed sources. The leaders are ahead on user count, but that count is airdrop-driven. The signal I want is enterprise data-buy contracts, not airdrop snapshots.

Signals I track

What would move my read.

  1. Disclosed data-licensing deals with AI labs

  2. Post-airdrop user retention

  3. Data verifiability primitives shipping to production

Kill shot

What would kill the thesis

Synthetic data quality crosses the bar for foundation model training, and the marginal value of human-supplied web data collapses to near-zero.

Coverage

Projects on the radar.

Grass

Conviction

Bandwidth/data layer for AI training; large airdrop hype; on Solana.

Deep read →

Kaito

Conviction

An AI-powered crypto 'InfoFi' platform and attention market that turned its KAITO token and Yap leaderboard into a category, off a modest ~$11M raise led by Dragonfly.

Deep read →

Story Protocol

Conviction

An L1 blockchain for programmable IP — letting creators register, license and monetize content in the AI era — backed by $80M from a16z at a $2.25B valuation.

Deep read →

Vana

Conviction

User-owned data L1 turning personal data into a tokenized asset class via DataDAOs; ~$25M from Paradigm, Coinbase Ventures and Polychain.

Deep read →

Cookie DAO

Watching

On-chain data and attribution layer that became the de facto mindshare index for crypto AI agents via cookie.fun.

Deep read →

Fraction AI

Watching

Crypto-AI startup decentralizing data labeling / agent-output evaluation; $6M pre-seed co-led by The Spartan Group and Symbolic Capital.

Deep read →

Masa Network

Watching

AI training data marketplace — privacy-preserving — earlier stage

Deep read →

Nimble Network

Watching

Decentralized AI network giving agents real-time web-data access; affiliated entity raised $47M Series B led by Norwest (Web2/Web3 entity overlap unclear).

Deep read →

Pond

Watching

Graph-neural-network 'foundational model layer' for on-chain data and prediction; $7.5M seed led by Archetype with Coinbase Ventures and Delphi.

Deep read →

Rivalz

Watching

Decentralized AI data-coordination and oracle layer (rOracle/ADCS) feeding verifiable real-world data to on-chain AI agents.

Deep read →

Sahara AI

Watching

A decentralized AI blockchain for data ownership, attribution and model monetization, with $43M from Pantera, Polychain, Binance Labs and Sequoia.

Deep read →

Supra

Watching

Vertically-integrated L1 bundling native oracles, VRF and cross-chain data into one 'IntraLayer' stack.

Deep read →

Entangle

Skeptical

Modular interoperability and data-oracle infrastructure repositioning toward verifiable data feeds for AI.

Deep read →

Folks Finance

Skeptical

The dominant lending and liquidity protocol on Algorand, now pushing cross-chain with an oracle/intent layer.

Deep read →


Draft thesis · editorial voice in progress, edits land continuously