Writing on software design, company building, and the AI / smart city industry.

All of my long-form thoughts on programming, leadership, product design, and more, collected in chronological order.

Aerbits: A New Era of Intelligence

How modern detection models like RT-DETR and YOLOv11, combined with SAM3 segmentation and vision-language models, are revolutionizing waste management from the sky. A complete backend rebuild delivers new capabilities for field crews.

The Pattern You Can't Unsee

On bridging the gap between experience levels, pattern recognition in engineering leadership, and how teams can learn to see what senior engineers see before it's too late.

Sparrow-1: Human-Level Conversational Timing in Real-Time Voice

Sparrow-1 is an audio-native model that enables AI to respond with natural conversational timing. Instead of using silence-based endpoint detection, it continuously models conversational floor ownership at the frame level, achieving 100% precision with a median latency of 55ms and responding at the moment a human listener would.

The Mirror We Built

AI is not an alien visitation. It's a mirror. Drawing on Nietzsche's amor fati, this essay explores what it means to actively affirm the transformation AI represents—not as passive acceptance, but as creative participation in shaping what comes next.

Sparrow-0: Advancing Conversational Responsiveness in Video Agents

Sparrow-0 is a transformer-based turn-detection model that improves how video agents recognize when speakers finish talking. Using semantic analysis instead of silence-based timing, it achieves response times as fast as 600ms while avoiding interruptions, resulting in 50% higher engagement and 80% better retention.

How Creating Sparrow Made Me a Better Conversationalist

Developing Sparrow changed my understanding of human communication. By shifting focus from what is said to when someone will speak next, I learned that silence carries meaning, micro-moments reveal intent, and timing is crucial for trust. The dual-lock principle—detecting speech ending then confirming an opening—reduced interruptions by 50% and doubled engagement.

What Makes Conversational AI Human-Like?

Humans wait about 200ms between conversational turns. Achieving this in video AI requires four pillars: streaming small chunks instead of batches, parallelizing 10 processes, careful resource management, and smart memory handling. Moving pointers instead of data saved 1.5 seconds, and GPU memory storage delivered a 4x speedup.

Understanding Ourselves in the Context of AI

How will we leverage great wisdom to create great AI? As a dedicated AI enthusiast, I often find myself reflecting on the profound nature of this technology. I asked ChatGPT4 to help me think this through, drawing on some of the great thinkers in history, and finally landed on quotes from Camus, Acton, and Einstein.

Aerbits Architecture

A deep dive into the Aerbits platform architecture.

Thread of Software in my Life

How I got into software and how I've used it to build a career and a life.

Generate SF @ APEC

We were invited to exhibit Aerbits.ai at GenerateSF, a conference hosted for the Asia-Pacific Economic Cooperation (APEC) in 2023, in San Francisco...

Accelerate SF Conference

I spent two days hacking to accelerate the recovery of San Francisco. Government officials, including the Mayor, the City Attorney, and three Supervisors, all shared their wish lists. Among the top city issues was street cleanliness.

Aerbits OKRs for Q4 2023

At the encouragement of my wife and friends, I spent two days on a "company retreat" working out Aerbits' Q4 Objectives and Key Results.