LLM Tester
A feasibility lab for deciding whether a small local model can actually do the job.
How do you give a defensible answer to “can we use a small local model for this?” when existing evaluation tools are built for engineers shipping production systems, not for strategists making feasibility calls?
The hard part of AI feasibility work is not running the model; it is structuring what you notice about its output. That is a design problem, not an engineering one.
A local-first evaluation environment built on PHP, SQLite, and vanilla JS that talks to local models through Ollama. Work is organized into projects, test cases, test inputs, and test rounds, with human-in-the-loop scoring on quality and accuracy.
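As a rough illustration of how the pieces above could fit together, here is a minimal sketch of a round-based data model plus one call to a local model through Ollama's `/api/generate` endpoint on its default port. The table names, columns, score ranges, and the helper `run_round` are all hypothetical, not the project's actual schema or API; Python stands in for the project's PHP for brevity.

```python
import json
import sqlite3
import urllib.request

# Illustrative schema: projects contain test cases, test cases have
# inputs, and each round stores one model's output together with
# human-assigned quality and accuracy scores (names are assumptions).
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS test_cases (
    id              INTEGER PRIMARY KEY,
    project_id      INTEGER NOT NULL REFERENCES projects(id),
    prompt_template TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS test_inputs (
    id           INTEGER PRIMARY KEY,
    test_case_id INTEGER NOT NULL REFERENCES test_cases(id),
    input_text   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS test_rounds (
    id             INTEGER PRIMARY KEY,
    test_input_id  INTEGER NOT NULL REFERENCES test_inputs(id),
    model          TEXT NOT NULL,
    output         TEXT,
    quality_score  INTEGER,  -- human-assigned after review
    accuracy_score INTEGER   -- human-assigned after review
);
"""

def run_round(db, test_input_id, model, base_url="http://localhost:11434"):
    """Send one stored test input to a local model via Ollama's
    /api/generate endpoint and record the raw output as an unscored
    round; a human scores it later."""
    (prompt,) = db.execute(
        "SELECT input_text FROM test_inputs WHERE id = ?", (test_input_id,)
    ).fetchone()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        output = json.load(resp)["response"]
    db.execute(
        "INSERT INTO test_rounds (test_input_id, model, output) VALUES (?, ?, ?)",
        (test_input_id, model, output),
    )
    db.commit()

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
```

Keeping outputs unscored at insert time and adding the quality and accuracy numbers in a separate human pass is what makes the scoring human-in-the-loop rather than automated.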
In active use across four real test projects and five small models. Produces defensible feasibility recommendations at the pre-commitment stage, before any infrastructure is written.