Data & Training.
Data is the missing crypto-AI moat — but most data layers are still solving the wrong problem.
Thesis
Why this sector, why now.
Everyone agrees AI is data-constrained. Decentralized data layers (Grass, Vana, Masa and others) bet that crypto rails unlock data supply that proprietary platforms cannot: residential bandwidth, user-permissioned datasets, verifiable provenance. The thesis is right. The execution gap is selling data to AI labs at scale and proving its quality is competitive with scraped Common Crawl plus licensed sources. The leaders are running ahead on user count, but that lead is airdrop-driven. The signal I want is enterprise data-buy contracts, not airdrop snapshots.
Signals I track
What would move my read.
- 01 Disclosed data-licensing deals with AI labs
- 02 Post-airdrop user retention
- 03 Data verifiability primitives shipping to production
Kill shot
What would kill the thesis
Synthetic data quality crosses the bar for foundation model training, and the marginal value of human-supplied web data collapses to near-zero.
Coverage
Projects on the radar.
Grass
Conviction · Bandwidth/data layer for AI training on Solana; large airdrop hype.
Kaito
Conviction · An AI-powered crypto 'InfoFi' platform and attention market that turned its KAITO token and Yap leaderboard into a category of its own, on a modest ~$11M raise led by Dragonfly.
Story Protocol
Conviction · An L1 blockchain for programmable IP that lets creators register, license and monetize content in the AI era; backed by an $80M round led by a16z at a $2.25B valuation.
Vana
Conviction · User-owned data L1 turning personal data into a tokenized asset class via DataDAOs; ~$25M from Paradigm, Coinbase Ventures and Polychain.
Cookie DAO
Watching · On-chain data and attribution layer that became the de facto mindshare index for crypto AI agents via cookie.fun.
Fraction AI
Watching · Crypto-AI startup decentralizing data labeling and agent-output evaluation; $6M pre-seed co-led by The Spartan Group and Symbolic Capital.
Masa Network
Watching · Privacy-preserving marketplace for AI training data; earlier stage.
Nimble Network
Watching · Decentralized AI network giving agents real-time access to web data; an affiliated entity raised a $47M Series B led by Norwest (how the Web2 and Web3 entities overlap is unclear).
Pond
Watching · Graph-neural-network 'foundational model layer' for on-chain data and prediction; $7.5M seed led by Archetype with Coinbase Ventures and Delphi.
Rivalz
Watching · Decentralized AI data-coordination and oracle layer (rOracle/ADCS) feeding verifiable real-world data to on-chain AI agents.
Sahara AI
Watching · A decentralized AI blockchain for data ownership, attribution and model monetization, with $43M from Pantera, Polychain, Binance Labs and Sequoia.
Supra
Watching · Vertically integrated L1 bundling native oracles, VRF and cross-chain data into one 'IntraLayer' stack.
Entangle
Skeptical · Modular interoperability and data-oracle infrastructure repositioning toward verifiable data feeds for AI.
Folks Finance
Skeptical · The dominant lending and liquidity protocol on Algorand, now pushing cross-chain with an oracle/intent layer.
Going deeper
A bespoke Data & Training deep dive for your fund.
Draft thesis · editorial voice in progress, edits land continuously