TLDC 📏

Infrastructure for evaluating and governing AI models.

Spotlight

What if evaluating AI models became as standard and scalable as writing code tests?

Quick Pitch: The LLM Data Company (TLDC) is building the evaluation layer for AI. Its flagship product, doteval, turns LLM evaluations into programmable assets, making it easier to test, govern, and ship AI models with confidence.

The Problem

  • Widespread Adoption: 200,000+ enterprises are deploying GenAI, but few can measure performance reliably.

  • Manual, Brittle Workflows: Evaluations are done through spreadsheets, JSON, or internal tools that don’t scale.

  • Deployment Risk: Without proper evals, teams ship untested prompts, agents, or fine-tuned models into production.

Snapshot

  • Industry: AI Infrastructure and Evaluation Tools

  • Headquarters: San Francisco, California

  • Year Founded: 2025 (YC S25)

  • Traction: Used by Perplexity, Diode, and Cubic in legal, technical, and safety-critical workflows

Founder Profiles

  • Daanish Khazi, Co-Founder: Ex-Traba engineer with experience at Tesla and Meta

  • Gavin Bains, Co-Founder: Former Traba engineer with a background at leading tech companies

  • Joseph Besgen, Co-Founder: Ex-Traba with experience at Honey and Roland Berger consulting

Funding

  • Current Round: Seed (currently raising)

  • Lead Investor: Y Combinator

  • Other Backers: Multiple Tier-1 VCs & Angels

Revenue Engine

  • Target: Mid-market enterprises with eval needs

  • Product: Developer-first, subscription-based platform

  • Go-to-Market: Direct sales to applied AI teams

What Users Love

  • Easy-to-write eval instructions in YAML, with AI assistance to speed up authoring (see the sketch after this list)

  • Versioning and reuse across models and prompts

  • Tight collaboration between legal, product, and engineering

  • Measurable improvement in speed and accuracy of AI rollouts
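
To ground the YAML bullet above, here is a minimal, hypothetical sketch of what a declarative eval spec could look like. The field names (name, dataset, graders, thresholds) and all values are illustrative assumptions, not doteval's actual schema.

```yaml
# Hypothetical eval spec, for illustration only. Field names are
# assumptions, not doteval's actual schema.
name: contract-clause-extraction
version: 1.2.0                 # evals are versioned assets, like code
model: gpt-4o                  # swap models without rewriting the eval

dataset:
  source: data/contracts_holdout.jsonl   # hypothetical path
  fields: [document, expected_clauses]

graders:
  - type: exact_match          # deterministic check on structured fields
    field: clause_type
  - type: llm_judge            # rubric-graded free text
    rubric: |
      Score 1 if the extracted clause preserves the legal meaning
      of the source text; score 0 otherwise.

thresholds:
  pass_rate: 0.95              # block a rollout below this bar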

Playing Field

  • Pi Labs: Offers a copilot for generic rubrics with proprietary graders

  • Haize Labs: Focuses on judge alignment but offers limited authoring infrastructure

  • Internal Lab Solutions: Custom-built but not scalable or reusable

TLDC's Edge: First to make evaluation design structured, collaborative, and repeatable across organizations and domains

Why It Matters

Evaluations are the new QA for AI. As GenAI adoption grows (71% of organizations now use it in at least one function), teams need graded examples and guardrails to assess quality before deployment.
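
As a concrete illustration of a "graded example," the hypothetical snippet below pairs an input with a reference answer and a human-assigned grade that an automated judge can be calibrated against; the format is assumed, not taken from any specific product.

```yaml
# Hypothetical graded example: one labeled case used to calibrate
# an automated judge before deployment. Format is an assumption.
- input: "Summarize the indemnification clause in section 4.2."
  reference: "Vendor indemnifies Customer against third-party IP claims."
  grade: pass                  # human label the judge is checked against
  notes: "Must mention third-party IP claims to receive a pass."
```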

What Sets Them Apart

  • Evaluation-as-Infrastructure: Not just a results viewer; full-stack eval creation and management

  • Cross-Domain Fit: Legal, tech, healthcare, reasoning tasks

  • Collaborative Authoring: Empowers non-engineers to build and own evals

  • Speed: Turns weeks of manual work into hours via reusable, collaborative workflows

Analysis

Bull Case 📈

  • Used in high-stakes production environments

  • Evaluation is becoming standard in GenAI deployment

  • Backed by top-tier investors

  • Large and growing enterprise market

Bear Case 📉

  • Varying customer needs across domains

  • Competition from internal tooling at large labs

  • Need to balance flexibility with simplicity

  • Early-stage go-to-market execution

Verdict

As GenAI moves from pilot to production, TLDC addresses a core gap: evaluation infrastructure. Its structured, developer-first approach and early traction signal strong product-market fit. Just as GitHub standardized how teams collaborate on code, TLDC has the potential to define how teams test and trust AI. The challenge will be scaling while staying usable, and proving value faster than internal tools can.

In Partnership With

Stop Asking AI Questions, and Start Building Personal AI Software.

Transform your AI skills in just 5 days through this free email course. Whatever your starting point, by Day 5 you'll be building working software without writing code.

Each day delivers actionable techniques and real-world examples straight to your inbox. No technical skills required, just knowledge you can apply immediately.

The Startup Pulse

  • Grammarly — Acquired Superhuman for an undisclosed amount to expand its AI productivity suite; Superhuman was last valued at $825M with ~$35M ARR.

  • Figma — Filed to go public under the ticker “FIG,” aiming to raise $1.5B after $749M in 2024 revenue and a return to profitability in Q1.

  • DataBahn AI — The Dallas-based startup raised a $17M Series A, led by Forgepoint Capital, to develop AI-native data pipelines for enterprise security.

Written by Ashher

© 2025 AngelsRound

228 Park Ave S, #29976, New York, New York 10003, United States