I write about LLM systems after they ship: where they waste money, where they break, where they surprise.
Twelve years building software. Four leading engineering teams. Lately, my investigations target production RAG pipelines, ingestion cost efficiency, and the quiet reliability gaps that show up once a system meets real traffic. I ship code. I read source. I publish the findings.
Reducing embedding waste and improving RAG reliability
Current focus: tracing cost and correctness failures in production ingestion pipelines. The most recent finding: two interacting bugs in LlamaIndex that together account for a non-trivial fraction of embedding spend overhead for teams running scheduled re-indexing. Writeup and reproducer live below. Available for select engagements on cost audits, observability setup, and evaluation harness design.
Things built, shipped, or debugged in the open
ChatGPT LightSession
A Chrome extension that keeps long ChatGPT threads responsive by trimming the DOM to only the last N messages along the active path. Client-side, zero server, configurable via popup.
Chrome Web Store →

llamaindex-embedding-churn
Five progressive reproducers for a two-bug interaction in LlamaIndex ingestion pipelines that silently wastes embedding spend. Hash comparison, counting embedder, real OpenAI API, reader-format survey, live S3 end-to-end.
GitHub →
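For readers curious what a "counting embedder" check looks like in practice, here is a minimal sketch of the idea: wrap the embedding call, count invocations per content hash, and treat any repeat as wasted spend. The `CountingEmbedder` class and the simulated runs below are illustrative, not the repo's actual code.

```python
import hashlib


class CountingEmbedder:
    """Wraps an embed function and counts calls per unique content hash.

    Illustrative sketch: in a scheduled re-indexing run, embedding the
    same content hash more than once signals redundant spend.
    """

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.calls = {}  # content hash -> number of embed calls

    def embed(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        self.calls[key] = self.calls.get(key, 0) + 1
        return self.embed_fn(text)

    def wasted_calls(self):
        # Every call beyond the first per hash is redundant work.
        return sum(n - 1 for n in self.calls.values())


# Simulate two scheduled re-index runs over unchanged documents.
embedder = CountingEmbedder(lambda t: [0.0])  # stand-in for a real model
docs = ["doc a", "doc b"]
for _ in range(2):
    for d in docs:
        embedder.embed(d)
print(embedder.wasted_calls())  # 2 redundant embeddings
```

With a real embedding client swapped in for the lambda, a nonzero `wasted_calls()` after a no-change re-index is exactly the symptom the reproducers isolate.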