About TunedAI Labs
TunedAI Labs builds fine-tuned AI models for regulated, high-stakes work — healthcare, finance, legal, security — where the output needs to hold up to review, not just sound good.
I'm Mark Gentry, the founder. The company is the convergence point of four threads I've been pulling on for a long time.
Philosophy
Years before I went to grad school, I was sitting in on Hubert Dreyfus's AI courses at UC Berkeley — just for the fun of it. Dreyfus spent four decades making the most serious academic argument against purely computational AI: that reasoning, for minds, is not symbol manipulation. It's embedded in skillful practice, embodied context, and a background of shared meaning that can't be fully formalized. Around that time I also sat in on a Heidegger seminar — the philosophical foundation underneath everything Dreyfus was saying — and a Nietzsche seminar, because I was curious. Years later I went back for a master's in philosophy of consciousness.
Industry
I then spent twenty years recruiting in tech — YouTube, Google, BitTorrent, Tango, and others. Recruiting is a privileged seat. You see what companies actually build, not what they claim to build. You meet the people doing the work before the press releases. I had a front-row view of the systems Dreyfus had been critiquing as they grew into the dominant paradigm.
Hands-on
Among the companies I recruited for was Udacity, including for its Machine Learning and Self-Driving Car Nanodegree programs. I ended up taking the coursework myself, writing the projects by hand — partly out of curiosity, partly to see if I could.
The Work
TunedAI Labs is where those four threads meet. Modern LLMs are still mostly next-token predictors wrapped in instruction-following scaffolding. The gap between fluent prediction and actual reasoning shows up most painfully in regulated, high-stakes domains where the wrong answer has consequences. What I've built is an approach to engineering structure into fine-tuned LLMs so that their output is traceable, auditable, and closer to actual reasoning, not just plausibly worded.
A current result: a custom-tuned Qwen2.5-7B scored 96.96% on CLadder, a public academic benchmark for causal reasoning; the base model scores 63%. Matthew Wong, an applied AI engineer with 25 years at the intersection of security, automation, and engineering (White House Situation Room; JP Morgan cyber intelligence; Phantom, acquired by Splunk; Splunk enterprise SOAR), ran the tests independently and confirmed the result: not benchmaxxed, not overfitted. His words: "Kudos for surviving scrutiny."
None of this claims to have solved Dreyfus's critique. Nobody has. But taking the critique seriously and engineering against it is the work worth doing — and it's what TunedAI Labs is set up to do.
Inheritance
My father spent his career as an engineer at IBM in the mainframe era. His monitor program for the 1401 — the Gentry Monitor — was picked up by IBM and distributed as the FASTER Type II Program. He received IBM's Outstanding Contributor award; the cufflinks are still in our family. His oral history is preserved at the Computer History Museum. A historian once told our family he was the unsung hero of CICS, the transaction system the world's banks still run on. I can't prove that part.
But I know what he spent his career doing: making machines reason reliably enough to be trusted with work that mattered. That's the same problem I'm working on, one layer up the stack.
If that sounds like something your team needs, I'd love to talk.