Thirty Hours on a Glass Egg, and I’d Do It Again

A friend who works at an early-stage startup told me recently that the number of people at his company truly using AI to transform how they build software is relatively small. I found this surprising. I had assumed the lean, ambitious places were pulling away, that the picture I was getting from larger companies or browsing my X feed was a story about organizational inertia: real engineers using AI in real ways (autocomplete, generating tests for code that already exists, summarizing a diff or a stack trace) but not yet letting it shape the substance of what they build. But this was a young startup with strong people, real product-market fit, and no dead weight.

I have been wrestling with my hobby renderer for the last two weeks, trying to get a satisfying implementation of Specular Manifold Sampling working. SMS, briefly, is a technique for rendering light paths that bounce through a chain of specular surfaces before reaching a diffuse one (think caustics through a glass egg, the kind of bright pattern your eye picks up on a dining table at lunch). These paths sit on a thin manifold in path space; a vanilla path tracer will essentially never stumble onto them (in my scene, the only light source is embedded inside the displaced surface of the egg, which makes it inaccessible to a path tracer altogether). SMS finds them by setting up a Newton-style root finder over surface positions, which in practice means you are debugging a tower of half-broken pieces: the Jacobian construction, the surface derivatives, the solver itself, the manifold walk’s step-size logic, and the unbiasedness corrections that sit on top.
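To make the root-finding idea concrete, here is a deliberately tiny sketch. It is not SMS and not the renderer's code: it assumes a flat 2D mirror along y = 0 and a single bounce, so the "manifold" is one unknown, the x position of the reflection point. The constraint is that the tangential component of the half vector vanishes, and the step-halving loop stands in for the step-size logic mentioned above. All function and parameter names are hypothetical.

```python
import math

def half_vector_constraint(x, a, b):
    """Tangential half-vector component at mirror point (x, 0), and its derivative in x.

    a and b are 2D points above the mirror (y > 0). The constraint is zero
    exactly when the angle of incidence equals the angle of reflection.
    """
    dax, day = a[0] - x, a[1]
    dbx, dby = b[0] - x, b[1]
    la = math.hypot(dax, day)
    lb = math.hypot(dbx, dby)
    c = dax / la + dbx / lb
    # d/dx of (ax - x) / |A - P| is -ay^2 / |A - P|^3, and likewise for B.
    dc = -(day * day) / la**3 - (dby * dby) / lb**3
    return c, dc

def solve_mirror_point(a, b, x0, tol=1e-12, max_iters=50):
    """Newton iteration with step halving. Returns (x, converged)."""
    x = x0
    c, dc = half_vector_constraint(x, a, b)
    for _ in range(max_iters):
        if abs(c) < tol:
            return x, True
        step = -c / dc
        scale = 1.0
        for _ in range(40):
            x_try = x + scale * step
            c_try, dc_try = half_vector_constraint(x_try, a, b)
            if abs(c_try) < abs(c):
                break  # accept the first step that reduces the residual
            scale *= 0.5
        else:
            return x, False  # no step length reduced the residual
        x, c, dc = x_try, c_try, dc_try
    return x, abs(c) < tol

def mirror_point_exact(a, b):
    """Closed-form answer for the flat mirror, used only as a check."""
    return (a[0] * b[1] + b[0] * a[1]) / (a[1] + b[1])
```

Starting far from the root (for example `x0 = 5.0` with `a = (0, 1)` and `b = (2, 1)`), the full Newton step overshoots badly and the halving loop is what rescues it. The real problem replaces the scalar with a chain of surface positions and the derivative with a Jacobian, which is where the tower of half-broken pieces comes from.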

It has been slower than it should be. I am somewhere past thirty hours of evenings and weekends and the renderer still occasionally produces images that look less like caustics and more like the static at the end of a VHS tape. None of this matters, because there are no stakes. The renderer is a personal project, the kind of thing that exists because somewhere around 2010 I had a graduate-level grasp of light transport and I like to pretend I can get it back.

What is interesting (and what I want to talk about) is what the project has done for my fluency with AI coding tools. I have had to invent and discard several workflows for getting useful work out of the agent on this problem. Early on I would describe the whole feature and ask for an implementation; the result was confidently wrong code that compiled but produced garbage. I tried more aggressive verification next, asking the agent to derive the math before writing, which caught some errors and missed the structural ones. Eventually I had to do the decomposition myself: isolate the Jacobian, write finite-difference tests for the surface derivatives, hand the agent the pieces with explicit acceptance criteria. There wasn’t a single bug but a stack of them: typos and arithmetic errors in the derivatives, chain-rule connection issues, geometric precision problems, and some fundamental limitations of the algorithm itself. By the time I was done, I knew which questions to ask and how to ask them, and I now apply that template to other unfamiliar work.
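A finite-difference test of the kind described above can be very small. The sketch below is an illustration rather than the renderer's actual test: it assumes a sphere parameterized by two angles as the surface, and compares an analytic dP/du against a central difference. The function names, step size, and tolerance are all hypothetical choices.

```python
import math

def sphere_point(u, v, r=1.0):
    """Position on a sphere of radius r for angles (u, v) in radians."""
    return (r * math.sin(u) * math.cos(v),
            r * math.sin(u) * math.sin(v),
            r * math.cos(u))

def sphere_dpdu(u, v, r=1.0):
    """Analytic dP/du for sphere_point. This is the code under test."""
    return (r * math.cos(u) * math.cos(v),
            r * math.cos(u) * math.sin(v),
            -r * math.sin(u))

def central_difference_du(f, u, v, h=1e-5):
    """Numerical dP/du from two evaluations of f, one on each side of u."""
    hi = f(u + h, v)
    lo = f(u - h, v)
    return tuple((a - b) / (2.0 * h) for a, b in zip(hi, lo))

def check_derivative(f, dfdu, samples, tol=1e-6):
    """Return (passed, worst_error) over a list of (u, v) sample points."""
    worst = 0.0
    for u, v in samples:
        analytic = dfdu(u, v)
        numeric = central_difference_du(f, u, v)
        err = max(abs(a - n) for a, n in zip(analytic, numeric))
        worst = max(worst, err)
    return worst <= tol, worst
```

The value of a check like this is that it turns "derive the math carefully" into an acceptance criterion the agent can be held to: a flipped sign or a dropped chain-rule factor shows up as an error many orders of magnitude above the tolerance.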

If I were still a working veteran of physically based rendering, none of this would have been necessary. The thirty hours I have spent would have been a few focused days of typing code I already understood. The agent would have been an actively negative contribution; reading and correcting its output would have cost more than just writing it. By any reasonable metric of “did this project ship faster,” the AI tools made it worse.

That is the wrong metric. What I got from those thirty hours wasn’t an SMS implementation (I could have downloaded one). What I got was a workflow for using these tools on hard, unfamiliar code, a sense for which tasks to delegate and which to break down myself, and a calibrated intuition for when the agent is just being lazy. These are foundational skills, and like most foundational skills, the only way to build them is to spend time on a problem where the foundation is the point.

This, I suspect, is what is happening at my friend’s startup and at most large companies. The senior engineers (the ones whose buy-in matters most for a real shift) are exactly the people for whom the local economics of AI tools are worst. They have decades of muscle memory for their workflows. They are under deadline pressure. When the agent produces something unsatisfying, it is rational for them to label it AI slop and return to the way they have always worked. Each individual instance of this decision is correct. The cumulative effect is that the people with the deepest context, the ones who would benefit most from an extension of their abilities, are the last to develop the new skill.

The optimistic reading is that the junior engineers, who have no tried-and-true workflow to fall back on, are pulling ahead, and the team will rebalance over time. I am not sure this reading survives contact with what the juniors are actually building. They are becoming fluent with the tools faster, yes, but the expertise the senior engineers have (the kind that lets you smell that a code path is wrong before you can articulate why, the kind I was relying on every time I had to decide whether to trust the agent on SMS) is not something the tools deliver as a side effect of being used. It comes from years of being wrong in instructive ways, and the tools are quite good at preventing exactly that. A junior engineer who never sees the failure modes the senior engineer learned from, because the agent papered over them, is an engineer with a faster cycle time and a thinner foundation. A team needs both populations excelling, and the current dynamic delivers neither.

Without a forcing function the gap widens, and the forcing function cannot be “use more AI.” Every senior engineer I know who has actually become fluent with these tools has a project they tinker with at home: a smart home control system they’re writing, a synth they’re building, a renderer they swore they’d retire in 2010 and apparently haven’t. The pattern is suspicious enough that I have started to think the side project isn’t a perk of being AI-fluent; it is a prerequisite. You need a place where the cost of inefficiency is zero, where you can afford to learn slowly because nobody is waiting on the output.

Most engineers don’t have this. They have families, exhaustion, hobbies that aren’t code: in short, lives. Telling them to develop a side project isn’t a strategy; it is just a way of ensuring that the people who already had personal projects keep their lead. If a company actually wants its senior people to internalize these tools, it has to provide what the side project provides, and that is harder than it sounds.

The first instinct will be to schedule the time. A recurring afternoon, a quarterly week, a percentage carved off the calendar. We have run this experiment before and we know what happens. The time gets absorbed back into the day job when a release slips. The projects that survive review-cycle scrutiny are the ones quietly producing impact, which means the deadline pressure has been displaced rather than removed. The whole point of the side project, the reason mine has worked as a learning vehicle, is that nobody is keeping score. The output is allowed to be embarrassing. If a company schedules an “AI fluency block” that culminates in a demo to leadership, it has built a small hackathon, not the conditions for learning.

What is actually needed, and what nobody wants to defend in a quarterly business review, is something closer to unproductive play. The output shouldn’t matter, including to the engineer. The work should be allowed to fail in ways that don’t interest anyone else. The form it takes has to be the engineer’s to choose, because the choosing is part of the mechanism. The moment any downstream metric attaches itself (a promo case, a visibility win, a tool that gets adopted, a brag in a staff meeting) the local economics that drove the engineer away from these tools in the first place reassert themselves. Play that ladders to outcomes is not play; it is unconfessed work, which is what an engineering organization already excels at producing.

This runs against everything we know about how to run an engineering organization. We have spent a decade getting good at concentrating effort on outcomes, and “give your most expensive engineers time to do things that won’t matter” is a sentence I would have laughed at in 2018. But the alternative is watching your most experienced engineers, the ones whose judgment is doing the most load-bearing work in your company, fall further behind a tool that is now writing a non-trivial fraction of all new code. I would rather lose four hours a week to genuine waste than reframe them as productive in disguise; the disguise is exactly what stops the mechanism from working.

The Veach egg, incidentally, is now kind of rendering correctly (within the limits of what is possible for a heavily displaced surface with my current Newton solver). I am told by my agent that the implementation is “production-ready,” which I have learned means roughly the opposite.

Fifteen Years of Rendering, Catching up in Weeks

I stopped doing serious rendering work around 2010. Path tracing, BSDFs, and Monte Carlo integration had all been second nature. From the start of my undergrad in 1997 right up to building the Adobe Ray Tracer in Photoshop in 2010 I had been steeped in this world. Then life moved on: building consumer products at scale, building teams, building platforms. My renderer sat dormant.

Recently I picked it back up, both to have something concrete to work on with agents and to scratch that graphics itch that never went away. Normally, to catch up, I’d plow through the fifteen years of SIGGRAPH papers that had stacked up, but I did something different.

I started implementing instead.

Read less, build more

There’s a difference between understanding a technique and understanding why it works the way it does. Papers give you the former. Code gives you the latter. I am also a ‘doing’ learner, so for me working on something is how I learn. I’d read many papers on multiple importance sampling (MIS), but it wasn’t until I actually implemented it, worked through all the bugs, and watched the variance drop that it really locked in.
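For readers who have not implemented MIS, the core of it fits in a few lines. This is a toy sketch, not code from the renderer: it assumes a 1D integral of x² over [0, 1] (exact value 1/3), two hypothetical sampling strategies (uniform, and a density proportional to x), and one sample from each per iteration. With `beta=1.0` the weight is the balance heuristic; `beta=2.0` gives the power heuristic.

```python
import math
import random

def f(x):
    """Integrand; its integral over [0, 1] is exactly 1/3."""
    return x * x

def pdf_uniform(x):
    return 1.0

def pdf_linear(x):
    return 2.0 * x

def sample_uniform(rng):
    return rng.random()

def sample_linear(rng):
    # Inverse-CDF sampling for the density 2x on [0, 1].
    return math.sqrt(rng.random())

def mis_weight(pdf_this, pdf_other, beta=1.0):
    """Balance (beta=1) or power (beta=2) heuristic for two strategies."""
    a = pdf_this ** beta
    b = pdf_other ** beta
    return a / (a + b)

def mis_estimate(n, seed=0, beta=1.0):
    """Average of n MIS estimates, each using one sample per strategy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x1 = sample_uniform(rng)
        p1 = pdf_uniform(x1)
        total += mis_weight(p1, pdf_linear(x1), beta) * f(x1) / p1
        x2 = sample_linear(rng)
        p2 = pdf_linear(x2)
        if p2 > 0.0:  # a sample with zero density contributes nothing
            total += mis_weight(p2, pdf_uniform(x2), beta) * f(x2) / p2
    return total / n
```

Because the two weights sum to one at every x, the combined estimator stays unbiased whichever heuristic is used. The bugs that teach you something are the ones where a weight is evaluated with the wrong strategy's density, which this structure makes easy to get wrong.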

The problem used to be velocity. Getting from “I understand this algorithm” to “a working implementation” took days to weeks. Boilerplate, scaffolding, debugging the trivial stuff. The interesting parts were buried under setup cost.

Coding agents collapsed that ratio.

The agent workflow, honestly

My workflow isn’t careful line-by-line review. That’s not the point; the speed is the feature. When you can go from a paper to a running implementation of GGX microfacet with VNDF sampling in a fraction of the time it used to take, you get to spend your cognitive budget on the parts that actually require thinking.

What I’ve found is that agents aren’t uniformly fast. Well-specified algorithms with solid reference implementations (the Dupuy-Benyoub spherical cap VNDF sampler, for example) they handle cleanly. Others require real steering. Getting light subpath guiding right in BDPT came down to a subtle decision about separate vs. shared guiding fields that no prompt was going to resolve on its own. When separate eye and light fields produce destructive interference at the same surface position, you need to understand why, not just what to type.
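As an example of what "well-specified" means here, the spherical cap VNDF sampler is short enough to transcribe. The sketch below is my Python rendering of the published routine as I understand it, not the renderer's code. It assumes a local shading frame with z along the surface normal, a unit-length view direction `wi` with positive z, and two uniform random numbers in [0, 1); the function names are mine.

```python
import math

def normalize(v):
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)

def sample_vndf_hemisphere(u1, u2, wi):
    """Sample a visible normal on the unit hemisphere (roughness 1 case).

    Picks a point on the spherical cap with z in (-wi.z, 1] and returns the
    unnormalized half vector between that point and wi.
    """
    phi = 2.0 * math.pi * u1
    z = (1.0 - u2) * (1.0 + wi[2]) - wi[2]
    sin_theta = math.sqrt(min(max(1.0 - z * z, 0.0), 1.0))
    c = (sin_theta * math.cos(phi), sin_theta * math.sin(phi), z)
    return (c[0] + wi[0], c[1] + wi[1], c[2] + wi[2])

def sample_vndf_ggx(u1, u2, wi, alpha_x, alpha_y):
    """Anisotropic GGX: stretch wi, sample the hemisphere, unstretch."""
    wi_std = normalize((wi[0] * alpha_x, wi[1] * alpha_y, wi[2]))
    wm_std = sample_vndf_hemisphere(u1, u2, wi_std)
    return normalize((wm_std[0] * alpha_x, wm_std[1] * alpha_y, wm_std[2]))
```

Every sampled normal lies in the upper hemisphere and faces the viewer, which gives a cheap property test to hand to an agent alongside the reference listing. Nothing comparable exists for the guiding-field decision, which is why that one needed steering.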

That pattern (full speed on clear specs, hard stops where physical insight is required) has been one of the more interesting meta-lessons. More on where the boundary actually falls in a later post, once I have let that stew.

The biggest surprise: the field went physical

When I left, biased techniques were the pragmatic answer to hard light transport problems. Dipole approximation for subsurface scattering. Photon mapping as a caustics crutch. Spectral rendering was a research luxury; RGB was good enough.

Coming back, I expected things to have fully moved to the GPU, but I also expected that trade-off between pragmatic bias and physical correctness to still be alive.

It isn’t. The field has largely moved to unbiased physical simulation across the board. Random walk SSS has replaced diffusion approximations as the standard. Hero wavelength spectral sampling means that spectral rendering is the default. Null-scattering volume formulations handle participating media properly while staying physically based. The question isn’t “can we afford to be physically correct?” anymore.
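Part of why hero wavelength sampling made spectral rendering affordable is how little machinery it needs. A minimal sketch of the wavelength selection step follows; the 380 to 780 nm range, the count of four wavelengths, and the function name are assumptions for illustration, not values taken from the renderer.

```python
def hero_wavelengths(hero_nm, count=4, lambda_min=380.0, lambda_max=780.0):
    """Return the hero wavelength plus count - 1 companions, in nanometres.

    The companions are the hero rotated by equal fractions of the spectral
    range, wrapping around at lambda_max.
    """
    span = lambda_max - lambda_min
    return [lambda_min + (hero_nm - lambda_min + j * span / count) % span
            for j in range(count)]

print(hero_wavelengths(700.0))  # [700.0, 400.0, 500.0, 600.0]
```

The path is traced using the hero wavelength for its sampling decisions while the companions ride along on the same path, so one path yields several evenly spaced spectral samples. That spacing is where the reduction in color noise comes from.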

This landed differently for me than it might for others. My original skin rendering work was dual-purpose: graphics and biomedical light transport simulation. The biomedical side required physical random walks and spectral interaction simulation; you can’t use a dipole approximation when you need to know where photons actually go in tissue. At the time, that work lived in a completely separate world from production rendering. The techniques were too expensive, too specialized.

Seeing random-walk SSS become the graphics standard now feels like watching a conversation finally arrive somewhere you’d been standing for a while.

What’s been implemented so far

In a few weeks, working alongside agents, RISE (the renderer I am modernizing) has gone from a reasonable 2010-era foundation to something a lot closer to where the field is now with things like:

  • GGX microfacet with anisotropic VNDF sampling (Dupuy-Benyoub 2023) and Kulla-Conty multiscattering energy compensation
  • Random-walk subsurface scattering replacing dipole/diffusion approximations
  • Hero wavelength spectral sampling to get spectral rendering with lower color noise
  • Null-scattering volume framework for unbiased heterogeneous participating media
  • Light BVH for many-light sampling (4.78x variance reduction on a 100-light scene)
  • Light subpath guiding in BDPT using separate OpenPGL fields for eye and light paths
  • Blue-noise error distribution via ZSobol sampling
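To give a flavor of the null-scattering item in the list above, here is a sketch of delta tracking, the classic null-collision technique that those formulations build on. It is a toy under stated assumptions, not the renderer's framework: a 1D ray, a hypothetical extinction function `sigma_t(t) = t` on [0, 1], and a constant majorant of 1. The fraction of walks that escape estimates the transmittance, which for this density is exp(-0.5).

```python
import math
import random

SIGMA_MAJORANT = 1.0  # must bound sigma_t everywhere along the ray

def sigma_t(t):
    """Hypothetical heterogeneous extinction coefficient along the ray."""
    return t

def delta_tracking_escapes(t_max, rng):
    """Walk along the ray; return True if it leaves the medium uncollided."""
    t = 0.0
    while True:
        # Tentative free-flight distance, sampled against the majorant.
        t -= math.log(1.0 - rng.random()) / SIGMA_MAJORANT
        if t >= t_max:
            return True
        # Accept as a real collision with probability sigma_t / majorant;
        # otherwise it was a null collision and the walk continues.
        if rng.random() < sigma_t(t) / SIGMA_MAJORANT:
            return False

def estimate_transmittance(t_max, n, seed=0):
    rng = random.Random(seed)
    escapes = sum(1 for _ in range(n) if delta_tracking_escapes(t_max, rng))
    return escapes / n
```

No integral of the density is ever evaluated, only point lookups, which is what makes the approach unbiased for arbitrary heterogeneous media as long as the majorant really is an upper bound.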

Next up: VCM and hyperspectral skin rendering.