The benefits of writing code two days every week without AI, and why the percentage of code written by AI is a vanity metric.
Second post on my interview with Borislav Nikolov, CTO of Rekki.
[I'm embarrassed by how long it took me to sit down and write this post. I got distracted by a couple of other drafts, vibing, and a lot of reading this summer. Hopefully better late than never!]
As I mentioned in my last post, An AI Metamorphosis, I love Borislav’s perspective on AI because I feel like he has fully immersed himself in it, he thinks deeply about the craft of coding, and his philosophical perspective is similar to my own. This is the second half of our discussion, which covers three sections:
Why Rekki’s CTO, Borislav, still codes two days a week without any help from an LLM.
The theory of mind, and reviewing code generated by an LLM.
What the future holds as more and more code is written by LLMs.
Coding two days a week without the help of an LLM.
One of the things I hadn't heard before is that Borislav insists on coding two days per week in a blank terminal, with no AI help.
"I have days where I don't code with any AI.... I open a blank terminal and then I just code. I don't do any syntax highlighting or autocomplete or anything, just code... Because you start to forget... If you just use language models all the time, so much code gets written. You start losing touch with what is a lot of code, because [ideally] you want zero code. The more you use an AI, the more you forget. It gives you a bunch of code — you would never have written code like that, but.. the code is in front of you, in just a few seconds, and it works. You accept it and feel kind of ok because you don’t consider it ‘my’ code, ‘I’ didn't write this code. You have no empathy towards it. You just read it and accept it like some alien code."
There are two things embedded here. First is the "slow boiling frog" element that Borislav elaborates on below, where you lose touch with what great code looks like. Second is how an engineer gets abstracted away from the pride of authorship they once felt over the code they created. They didn't write this code and don't feel the same level of ownership over it, so it becomes less about the code itself and more about "does it get the job done?" But as we all know, "does it get the job done" can mean short-term gain, long-term pain.
Back to Borislav:
"On one of the AI-free days, as you are writing a bunch of boilerplate yourself, you start hating it. Why, why am I writing this? This code is just nonsense. So your nonsense filter stays sharp. The next day, when the LLM spits out 500 lines of code, I tell it 'that's too complex, I don't want it’, I can still feel the spidey sense. Otherwise, it's slow boiling the frog. When you're pushed little by little, your tolerance changes so much. And you forget what is good or not."
This comment made me think of how so many engineering teams now report what percentage of their code is written by AI, but this can be a bit of a vanity metric (and long-time readers know how much I hate vanity metrics!). I get the desire to show how much a team is embracing AI, but if you optimize for this metric, you won’t be motivated to edit the extreme amounts of code the AI produces, and that stores up challenges for the future. Something to watch out for. Back to Borislav:
"The key is you have to grow your capacity to make the system do what you want it to do, for it to be a way to express yourself. How easy is it for you to tell the computers what to do? For example in 1980 you just poke an address and a pixel appears, the distance between the code, the wires and the display is in the order of a few hundred lines of machine and micro code, now, there are easily 10 million lines of code. It is easier to add new abstractions and new indirections than to fix anything, and in the last 15-20 years we have been doing that, and this is what language models learned from. Why do we even add layers of indirection and abstractions? Because we are humans, and this is our way to manage complexity. The language model is a stochastic ghost of a human, so it also abstracts. When there are 10 billion lines, I won’t even try to comprehend what is going on, I will have to settle for a summary of the code. It's just going to be too much if you don't try to take control of it... You want to conserve complexity, not let accidental complexity bubble up. But it's going to be really hard to distinguish accidental complexity from actual complexity when you are 100k lines of alien code deep."
The theory of mind, and reviewing code generated by an LLM.
One of the challenges Borislav talks about with editing LLM-generated code is the "theory of mind" -- basically, humans have an intuition for how other humans think through a problem, but that intuition doesn't map to how an LLM thinks.
Borislav gives the classic example of being in the room with Sally. Sally leaves her keys on the table and then leaves the room. You take those keys and put them in a box on the table. When Sally comes back to the room, she knows to look in the box because she can pretend to be you in her mind, and know that that is probably where you would hide the keys.
Back to code, "when I read somebody else's code, there are all kinds of reasons for this code's existence, for example historical. At Booking.com, there was a column in a table that was called country code, and inside of it, it had currency. You read code that seems like nonsense, but you know there is a human reason for it. You ask yourself ‘why did they do it like that?’ But when you have this stochastic machine that produces tokens, this is just not true. You don't have a theory of mind for the symbols. Imagine it produces a comment ‘this is a clean version of the function’, you read it and read the function and ask ‘clean for who?’. You can't reason about it in the same way. You can't think about it in the same way. You can’t understand the reason for their existence. It did it because vectors aligned like that, that’s it."
And so you end up "disconnecting" from the code. Instead of refactoring it into a form that makes sense in your brain, many engineers act more like a product owner of the code -- "you don't want to read the code, you just want it to work."
"That's a completely valid option, I think, and many people do it and that's fine. And they just have to recognize what exactly we are doing. But the other option is to treat it as your code. And you have to ask yourself, okay, is this what I would have written? And that's what your critique should be, what your bar should be. You must still have to have a strong sense of responsibility and ownership."
I put that part in bold because it really resonated. I wonder if it will stand the test of time?
FWIW, I asked Borislav what percentage of his code is written by an LLM; he wasn't sure, but guessed 70%.
What the future holds as more and more code is written by LLMs.
"Imagine your life depending on something and it's like 20 million tokens of code. This thing drives your car or whatever it does. The fact that we tested a black box doesn't really help us, we have to really think carefully about what we are doing. What is this doing? How are we building on top of these layers, because the layers are also going to increase.
“If a language model can just write the program for all kinds of architectures, maybe we don't even have a compatibility layer anymore. And now humans cannot even read or write the programs that the language models can. And that's like a very disturbing thought because imagine, well, maybe the next instruction set is actually designed by just AI. A compiler takes our code and compiles it for this thing, which we don't even know how it works. It's just ten times faster than our thing. Now we can't even step through it, it's just a complete black box. So we move the black box deeper and deeper into the silicon, and then we completely lose touch with how this is working."
One of the risks Borislav highlights here is that building programs with true complexity is going to get harder and harder: simply accepting whatever code an LLM produces introduces "fake" complexity and forces us to completely "dissociate" from the code itself, making it harder to manage real complexity over time.
Part of the journey Borislav has gone on is learning when, and how, to use an LLM.
"You must think of the pressure of the tokens. Imagine you ask it to do sentiment analysis, and you expect a single Positive or Negative from it. Now, that's too much pressure, you must allow it to get to the answer. For example, make a plan of how it would do the analysis and maybe instead of asking it to do positive or negative, ask it to give you a score for each, so if it samples ‘negative’ it can still recover. I don’t think people realize how much is lost when a token collapses. Think of how much is lost when a word is written. Imagine you write the word ‘positive’ on a piece of paper, and then 1 year later, you read the piece of paper and you have to add one more word, you think for a second and write ‘attitude’, then 2 years later you see ‘positive attitude’ what do you write next? All the high dimensional nuance is lost after the collapse.
"Another good way to think of it is imagine a dark room with a document and a note. The note says ‘do sentiment analysis on the document, answer only positive/negative’. So you pick a random person from the street, drop them in the room, you see the result, you don’t like it, you change the instructions on the note, and then you pick another random person. Each document will get a random person, so your instructions better be good.
"I really advise anybody just to watch Karpathy's lectures. It's like 10 hours. I watched them like four times. Watch them, pause them, and it will help you to kind of have a mental model. Because it's a machine that we have never experienced. And you try to think of it either as a human or a computer, and it's neither."
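To make Borislav's sentiment-analysis example a bit more concrete, here's a rough sketch of the two prompting styles he contrasts: one that forces the answer to collapse into a single token up front, and one that lets the model lay out signals and scores before anything is final. The complete() call and the sample document are hypothetical placeholders, not any particular API; the point is only the shape of the prompt.

```python
# Two prompting styles for the same task -- a sketch, not a specific API.
# complete(prompt) is a hypothetical stand-in for whatever LLM call you use.

DOCUMENT = "The delivery was late again, but the driver was apologetic."

# High token pressure: the very first sampled token decides everything,
# and there is no way to recover from an unlucky sample.
collapsed_prompt = (
    "Do sentiment analysis on the document below. "
    "Answer with a single word: Positive or Negative.\n\n"
    f"Document: {DOCUMENT}"
)

# Lower pressure: ask for the signals first and a score per class,
# so no single token locks in the final answer.
scored_prompt = (
    "Do sentiment analysis on the document below.\n"
    "1. Briefly list the positive and negative signals you see.\n"
    "2. Then give a 0-10 score for 'positive' and a 0-10 score for 'negative'.\n\n"
    f"Document: {DOCUMENT}"
)

# result = complete(scored_prompt)   # parse the two scores and compare them
```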
Thank you, Borislav, for your insights!
In engineering, the percentage of code written by AI might be a vanity metric. But be careful how it lands from a business perspective: tell someone that 50% of your team's code is generated by AI, and they might conclude that half the team could be safely downsized, which, as Borislav's points above suggest, is not what the number actually tells you.