AI Lab

Experimenting with AI before recommending it

I don't write about AI from a distance. I run experiments, document findings, and publish what I learn, including the parts that didn't work. The goal is to test AI tools rigorously on real design tasks before recommending them to anyone.

AI Product Exploration · Local-First Evaluation · Human-in-the-Loop

LLM Tester

A feasibility lab for deciding whether a small local model can actually do the job.

The question

How do you give a defensible answer to “can we use a small local model for this?” — when existing evaluation tools are built for engineers shipping production systems, not strategists making feasibility calls?

Key insight

The hard part of AI feasibility work is not running the model. It is structuring the noticing — and that is a design problem, not an engineering one.

What was built

A local-first evaluation environment built on PHP, SQLite, and vanilla JS, talking to local models through Ollama. It organizes work into projects, test cases, test inputs, and test rounds, with human-in-the-loop scoring of quality and accuracy.
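The projects → test cases → test inputs → test rounds structure maps naturally onto a small relational schema. A minimal sketch in Python with an in-memory SQLite database (the actual tool runs on PHP; every table, column, and model name here is an assumption for illustration, not the project's real schema):

```python
import sqlite3

# Hypothetical schema mirroring the described hierarchy:
# projects -> test_cases -> test_inputs -> test_rounds,
# with human-assigned quality and accuracy scores per round.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
  id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE test_cases (
  id INTEGER PRIMARY KEY,
  project_id INTEGER REFERENCES projects(id),
  prompt TEXT NOT NULL);
CREATE TABLE test_inputs (
  id INTEGER PRIMARY KEY,
  test_case_id INTEGER REFERENCES test_cases(id),
  content TEXT NOT NULL);
CREATE TABLE test_rounds (
  id INTEGER PRIMARY KEY,
  test_input_id INTEGER REFERENCES test_inputs(id),
  model TEXT NOT NULL,          -- e.g. an Ollama model tag (illustrative)
  output TEXT,
  quality_score INTEGER,        -- human-in-the-loop, e.g. 1-5
  accuracy_score INTEGER        -- human-in-the-loop, e.g. 1-5
);
""")

# Sample data; model tags and scores are made up.
conn.execute("INSERT INTO projects (name) VALUES ('Summarization feasibility')")
conn.execute("INSERT INTO test_cases (project_id, prompt) "
             "VALUES (1, 'Summarize in two sentences.')")
conn.execute("INSERT INTO test_inputs (test_case_id, content) "
             "VALUES (1, 'Long source document...')")
conn.executemany(
    "INSERT INTO test_rounds "
    "(test_input_id, model, output, quality_score, accuracy_score) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "llama3.2:3b", "Summary A", 4, 5),
     (1, "phi3:mini", "Summary B", 3, 3)],
)

# A feasibility call then reduces to comparing average human scores per model.
for model, quality, accuracy in conn.execute(
    "SELECT model, AVG(quality_score), AVG(accuracy_score) "
    "FROM test_rounds GROUP BY model ORDER BY model"
):
    print(model, quality, accuracy)
```

The point of a schema like this is that the human judgment is captured per round, per input, per model, so a recommendation can be traced back to specific scored examples rather than an overall impression.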

Impact

In active use across four real test projects and five small models. It produces defensible feasibility recommendations at the pre-commitment stage, before any production infrastructure is built.

What this project demonstrates
Local-first AI evaluation as strategy
Human-in-the-loop scoring for nuanced tasks
Structured experimentation for defensible recommendations
Pre-commitment stage as its own design problem
Counterweight to vibe-based AI evaluation
Explore the full case study →
AI Product Exploration · Compliance Systems · Human-in-the-Loop

AI GovLab

An experiment in using AI to assist with synthesizing complex policy research.

The question

How can AI integrate directly into compliance workflows without replacing human decision authority?

Key insight

Early-stage AI systems require clear decision boundaries: assistance can accelerate the work, but authority must remain human.

What was built

A working product with structured entity logic, AI document processing, and compliance workflow modeling. AI is integrated through automated summarization, context-aware overviews, and structured tagging.
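One way to read "authority must remain human" in code: AI-generated tags stay a proposal until a named reviewer signs off, and nothing downstream consumes them before that. A hypothetical sketch, assuming a tag-proposal record; all names are illustrative, not taken from the actual product:

```python
from dataclasses import dataclass

@dataclass
class TagProposal:
    """AI-suggested tags for a compliance document (illustrative shape)."""
    document_id: str
    tags: list
    source: str = "ai"       # who produced the tags
    approved: bool = False   # stays False until a human signs off

def approve(proposal: TagProposal, reviewer: str) -> TagProposal:
    """Only a named human reviewer can flip a proposal to approved."""
    proposal.approved = True
    proposal.source = f"ai, approved by {reviewer}"
    return proposal

# AI proposes; the proposal carries no authority on its own.
proposal = TagProposal(document_id="doc-17", tags=["GDPR", "data-retention"])

# The human decision is the gate: downstream workflow logic would check
# `approved` before treating the tags as real.
approved = approve(proposal, reviewer="compliance-lead")
```

The design choice worth noting is that approval is a separate, attributed step rather than a default, which keeps the AI in an assistance role by construction.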

Impact

Shifted compliance work from document storage to structured, AI-supported intelligence. The project also marks my transition from interface-focused UX toward AI workflow and product architecture thinking.

What this project demonstrates
AI-native compliance systems
Human-in-the-loop validation design
Structured maturity modeling
Compliance-as-Code thinking
System-level product logic
Explore the full case study →
What I'm exploring next

Open questions I'm starting to investigate. These move from this list to the featured slot above as findings firm up.

Planned

AI-Augmented Design Critique

Can AI provide useful design feedback? Testing where AI critique adds value versus where it produces confident-sounding nonsense.

Planned

Ethical Guardrails for AI Content

A practical framework for evaluating when AI-generated content is appropriate for production use and when it introduces unacceptable risk.