TLDC 📏

Infrastructure for evaluating and governing AI models.

Spotlight

What if evaluating AI models became as standard and scalable as writing code tests?

Quick Pitch: The LLM Data Company (TLDC) is building the evaluation layer for AI. Its flagship product, doteval, turns LLM evaluations into programmable assets, making it easier to test, govern, and ship AI models with confidence.

The Problem

  • Widespread Adoption: 200,000+ enterprises are deploying GenAI, but few can measure performance reliably.

  • Manual, Brittle Workflows: Evaluations are done through spreadsheets, JSON, or internal tools that don’t scale.

  • Deployment Risk: Without proper evals, teams ship untested prompts, agents, or fine-tuned models into production.

Snapshot

  • Industry: AI Infrastructure and Evaluation Tools

  • Headquarters: San Francisco, California

  • Year Founded: 2025 (YC S25)

  • Traction: Used by Perplexity, Diode, and Cubic in legal, technical, and safety-critical workflows

Founder Profiles

  • Daanish Khazi, Co-Founder: Ex-Traba engineer with experience at Tesla and Meta

  • Gavin Bains, Co-Founder: Former Traba engineer with a background at leading tech companies

  • Joseph Besgen, Co-Founder: Ex-Traba with experience at Honey and Roland Berger consulting

Funding

  • Current Round: Seed (currently raising)

  • Lead Investor: Y Combinator

  • Other Backers: Multiple Tier-1 VCs & Angels

Revenue Engine

  • Target: Mid-market enterprises with eval needs

  • Product: Developer-first, subscription-based platform

  • Go-to-Market: Direct sales to applied AI teams

What Users Love

  • Easy-to-write eval instructions in YAML, with AI assistance to speed up authoring (see the sketch after this list)

  • Versioning and reuse across models and prompts

  • Tight collaboration between legal, product, and engineering

  • Measurable improvement in speed and accuracy of AI rollouts
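
To ground the YAML bullet above, here is a minimal, hypothetical sketch of what a declarative eval spec could look like. The field names (name, dataset, graders, thresholds) and all values are illustrative assumptions, not doteval's actual schema.

```yaml
# Hypothetical eval spec, for illustration only. Field names are
# assumptions, not doteval's actual schema.
name: contract-clause-extraction
version: 1.2.0                 # evals are versioned assets, like code
model: gpt-4o                  # swap models without rewriting the eval

dataset:
  source: data/contracts_holdout.jsonl   # hypothetical path
  fields: [document, expected_clauses]

graders:
  - type: exact_match          # deterministic check on structured fields
    field: clause_type
  - type: llm_judge            # rubric-graded free text
    rubric: |
      Score 1 if the extracted clause preserves the legal meaning
      of the source text; score 0 otherwise.

thresholds:
  pass_rate: 0.95              # block a rollout below this bar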

Playing Field

  • Pi Labs: Offers a copilot for generic rubrics with proprietary graders

  • Haize Labs: Focuses on judge alignment but offers limited authoring infrastructure

  • Internal Lab Solutions: Custom-built but not scalable or reusable

TLDC's Edge: First to make evaluation design structured, collaborative, and repeatable across organizations and domains

Why It Matters

Evaluations are the new QA for AI. As GenAI adoption grows (71% of organizations now use it in at least one function), teams need graded examples and guardrails to assess quality before deployment.
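
As a concrete illustration of a "graded example," the hypothetical snippet below pairs an input with a reference answer and a human-assigned grade that an automated judge can be calibrated against; the format is assumed, not taken from any specific product.

```yaml
# Hypothetical graded example: one labeled case used to calibrate
# an automated judge before deployment. Format is an assumption.
- input: "Summarize the indemnification clause in section 4.2."
  reference: "Vendor indemnifies Customer against third-party IP claims."
  grade: pass                  # human label the judge is checked against
  notes: "Must mention third-party IP claims to receive a pass."
```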

What Sets Them Apart

  • Evaluation-as-Infrastructure: Not just a results viewer; full-stack eval creation and management

  • Cross-Domain Fit: Legal, tech, healthcare, reasoning tasks

  • Collaborative Authoring: Empowers non-engineers to build and own evals

  • Speed: Turns weeks of manual work into hours via reusable, collaborative workflows

Analysis

Bull Case 📈

  • Used in high-stakes production environments

  • Evaluation is becoming standard in GenAI deployment

  • Backed by top-tier investors

  • Large and growing enterprise market

Bear Case 📉

  • Varying customer needs across domains

  • Competition from internal tooling at large labs

  • Need to balance flexibility with simplicity

  • Early-stage go-to-market execution

Verdict

As GenAI moves from pilot to production, TLDC addresses a core gap: evaluation infrastructure. Its structured, developer-first approach and early traction signal strong product-market fit. Just as GitHub standardized how teams collaborate on code, TLDC has the potential to define how teams test and trust AI. The challenge will be scaling while staying usable, and proving value faster than internal tools can.

In Partnership With

Stop Asking AI Questions, and Start Building Personal AI Software.

Transform your AI skills in just 5 days through this free email course. Whatever your starting point, by Day 5 you'll be building working software without writing code.

Each day delivers actionable techniques and real-world examples straight to your inbox. No technical skills required, just knowledge you can apply immediately.

The Startup Pulse

  • Grammarly — Acquired Superhuman for an undisclosed amount to expand its AI productivity suite; Superhuman was last valued at $825M with ~$35M ARR.

  • Figma — Filed to go public under the ticker “FIG,” aiming to raise $1.5B after $749M in 2024 revenue and a return to profitability in Q1.

  • DataBahn AI — The Dallas-based startup raised a $17M Series A, led by Forgepoint Capital, to develop AI-native data pipelines for enterprise security.

Written by Ashher

© 2025 AngelsRound

228 Park Ave S, #29976, New York, New York 10003, United States