---
title: Thesis
date: 2025-01-01
description: Why diffusion matters more than capability.
---

I started training models to do my homework when I was 17. I've been fortunate to work with some brilliant people, yet I keep coming back to the same thesis: we have digital intelligence, but it's so unevenly distributed that much of the world still operates like it's 1990.

I can do almost anything from my phone, but people still drive to offices that haven’t changed in half a century. We have models that code better than most engineers, yet drug discovery still takes decades, total factor productivity has stalled since the 1970s, and our power grid runs on software older than me. Somewhere along the way, we ended up serving the software that was supposed to serve us. Everyone talks about AGI while sitting at the same desks, staring at the same screens, solving rehashed versions of the same problems. Where’s my flying car?

In 2019 I built GPT-2-powered sites that reached millions of people. I spent time hunting statistical anomalies and ranked in the top 100 on two exchanges. I accidentally built a ChatGPT plugin with a million users, and worked on some more things. Each time I found the same pattern: the economic output per token generated increases by an order of magnitude with every new modality: first seq2seq, then GPT-2, then ChatGPT, then deep-research agents, and so on. Yet we keep using AGI to make better software instead of replacing software entirely.

AGI diffusion is the bottleneck to what we actually want: intelligence that works beyond the screen, spread across the real economy where the problems that matter live. Solving it may require significant capital and training, but it will take far fewer people than ever to deliver insanely outsized impact. This is my attempt at solving it. If we get it right, we can finally build the world that should have changed decades ago.