Code duplication is far cheaper than the wrong abstraction (2016)
79 points by rafaepta 2 hours ago | 53 comments

lg5689 2 minutes ago
I believe that "single source of truth" is a principle that should always be followed. If there's duplicated code where it'd be a bug if they diverge, then you should refactor. It creates a long-distance coupling in your code that may be invisible to future developers until a bug emerges.

But with that in mind, I agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If it starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle it is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`
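
A minimal sketch of that second shape in Python. Everything here is an assumption for illustration: the bisection method, the `StoppingCriteria` callable type, the `max_iters_or_abs_tol` helper and the `1e-12` default are hypothetical, not taken from any real solver library.

```python
from typing import Callable

# Hypothetical type: a stopping criterion sees the iteration count and the
# current bracket [lo, hi], and returns True when the solver should stop.
StoppingCriteria = Callable[[int, float, float], bool]


def max_iters_or_abs_tol(max_iters: int = 99, x_abs_tol: float = 1e-12) -> StoppingCriteria:
    """Build one possible criterion: iteration budget or bracket width."""
    def should_stop(iteration: int, lo: float, hi: float) -> bool:
        return iteration >= max_iters or (hi - lo) <= x_abs_tol
    return should_stop


def solve(f: Callable[[float], float], lo: float, hi: float,
          stopping_criteria: StoppingCriteria) -> float:
    """Bisection root finder; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must bracket a root")
    iteration = 0
    while not stopping_criteria(iteration, lo, hi):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
        iteration += 1
    return (lo + hi) / 2.0


# The solver signature stays fixed; callers swap in whatever rule they need.
root = solve(lambda x: x * x - 2.0, 0.0, 2.0, max_iters_or_abs_tol())
rough = solve(lambda x: x * x - 2.0, 0.0, 2.0, lambda i, lo, hi: i >= 3)
```

A new stopping rule becomes a new callable rather than another keyword argument on `solve`.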

reply
dofm 19 minutes ago
No it's not. This has always been a needlessly iconoclastic rather than sensible suggestion.

At the very least it is not once you're working at the wrong kind of scale.

Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.

And in the LLM era the wrong kind of scale appears in different ways; code generated and duplicated without proper abstraction and then maintained by an LLM that cannot be trusted to do the same modification each time it encounters a pattern or to have enough of an overview to slowly rescue duplicated code through good abstractions.

I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.

reply
coldtea 6 minutes ago
Hardly iconoclastic, it's a very sensible suggestion.

It would be iconoclastic if the common sense basic approach were to start with abstraction. It's not; the common sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.

>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace

Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.

reply
bluefirebrand 2 minutes ago
Yeah, "Write Everything Twice" is a pretty common and sensible direction for any codebase
reply
fny 12 minutes ago
Code duplication is cheaper than the wrong abstraction. If you have a good abstraction, you should run with it.

If you haven't figured out a good abstraction at 5-100 customers, God help you.

reply
mawadev 14 minutes ago
I think you applied this idea to the era of LLMs, but consider an abstraction that takes in multiple god structs for branches it may or may not call in the case you are looking at, and has a lot of if conditions that explode in combinatorial complexity across a deep call chain. Now the bottleneck is that you need to call this function 144 times a second. That is where you start to have clusters of hot code paths where the latency stacks depending on the angle the god structs come in. Not sure what LLMs do here, I don't vibe code
reply
dofm 3 minutes ago
I am applying it to LLMs on the basis of twenty years of seeing smaller programming shops tie themselves in knots by using duplication to avoid developing an abstraction that would help them because they were unsure of it.

Everyone always thinks duplication is fine when you can bill the modifications by the hour. But they never think to understand that the reason they've had so many employees is that they've turned their change process into firefighting all the different versions of the same code and all these young developers burn out from the sheer anxiety of not knowing where all the little fires are.

I once had to rescue a site that had become a victim of its own popularity, that was written by subcontractors who clearly believed that duplication is better than the wrong abstraction.

Until one day, along came a change — MySQL 4 to MySQL 5 — and a significant duplicated query no longer worked due to its new, proper strictness.

The problem was compounded; not only was the broken pattern in hundreds of places where it had sat, stable and predictable, but the pattern was broken because it, itself, was avoidance of another abstraction that would solve it.

They quit: they said they couldn't and wouldn't fix it. It had always worked how they had done it, and it would have to stay on MySQL 4 (which the hosting provider refused to accommodate).

I don't think it helped that they were severely misguided in their understanding of SQL, but the code had become beholden to duplication and then crippled by a new problem in the duplicated pattern.

I had to first find all the contexts in which that pattern appeared (which required me to spend half a day on a bespoke script) and then work out a new pattern and as few variations of it as possible to fix the duplicated code in each place, because there was no proper budget to rewrite the whole thing. And then I sat at my desk, for days, working through each one.

Even a bullshit abstraction would have saved that client both time and money.

reply
Capricorn2481 6 minutes ago
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are passed a de minimis threshold.

Pretty much everyone arguing for duplication has argued what you are saying, which is wait to see a few instances of it before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.

reply
bhouston 26 minutes ago
I used to struggle with abstractions back in my OOP days but since moving pretty much to a purely functional approach I find that code duplication is rare. Just have a function and call it in two parts. The main abstraction issue is then data structures but with TypeScript interfaces being duck typing essentially I run into few problems there as well.

So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.

reply
ikety 45 seconds ago
For hobby, I use functional languages, and I find the techniques are the important bits to remember. Most modern languages let you easily stand on functional programming theory. You don't need to know Haskell. Everyone's brain works differently, but the idea of small, simple and occasionally flexible parts building a whole works for me. As opposed to the large complex do it all shape shifting machine.
reply
platz 11 minutes ago
what exactly is 'calling a function in two parts'
reply
lysium 6 minutes ago
I read it as „calling it from two places“
reply
irishloop 4 minutes ago
Too many abstractions are bad. Too much code duplication is bad.

Part of being a good engineer is finding the right balance.

I know engineers who would gladly duplicate code all over the code base to avoid creating a new abstraction.

I know engineers who create polymorphic abstractions for a single caller with a very obvious set of parameters.

So much of wisdom is in finding balance and not being dogmatic about rules.

reply
fjfaase 5 minutes ago
I once used code duplication to implement a fourth type of dialog that looked somewhat similar to the others, which were sharing a lot of code, because I felt that although it looked much the same as the others, there was some fundamental difference. Took me about a day to implement. When some other engineer saw this, he spent the next three weeks trying to integrate all of them with some shared class. His work was not completely worthless, because he did find some small bug during all his efforts to avoid any possible code duplication. I had already predicted that it would take a lot of effort, but I did not object, because I hoped that he would learn something from it and the next time think twice before always trying to avoid code duplication.
reply
agentifysh 28 minutes ago
i recall very early in my career i did exactly this. i took what worked and duplicated it, my reasoning being that it was far safer to reuse what has been battle tested and leave refactoring for a later stage

it wasn't received well and a senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code and that he will clean up my mess'

long story short his refactoring turned what was otherwise a stable system into a complete mess and it reminded me of Nassim Taleb's book

reply
nicoburns 25 minutes ago
It's definitely an "it depends" thing. It's easy to overabstract. On the other hand, I've also met junior developers who just didn't know how to use function parameters.
reply
znkr 22 minutes ago
+1 The worst code I had to maintain was code that tried to follow DRY (without trying to understand what the original intention of that principle was). The only way out of that mess was widespread code duplication.
reply
strongpigeon 21 minutes ago
Echoing the article, anyone who has experienced both will agree: it’s far easier to work with an under engineered code base than an over engineered one.
reply
ultim8k 8 minutes ago
Nobody wants to listen. Nobody. In 90% of the companies there are some so called senior devs that get ecstatic when they create a new abstraction.

Overengineering, abstractions and premature optimisation are the 3 worst plagues of engineering.

At the same time I’m happy they exist because it means we’ll always have a job.

reply
cryo32 40 minutes ago
You can do both with microservices!
reply
zephen 28 minutes ago
But wait! There's more!

For $19.95, you can replace your single single point of failure with multiple single points of failure!

reply
flawn 21 minutes ago
Or for 100$, get a 5x increase on all failure points - maximum vibes, maximum excitement.
reply
DJBunnies 26 minutes ago
Except 9/10 times microservices end up wildly dependent on each other, yielding a distributed monolith. Better to use service oriented architecture and just ship the monolith, you can test easier and skip the extra layers of serialization / deserialization.
reply
loevborg 23 minutes ago
I think you missed GP's point
reply
Rendello 11 minutes ago
Two talks come to mind here: Mike Acton's Data-Oriented Design and C++ [1] and Brian Cantrill's The Complexity of Simplicity [2].

Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.

Brian's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.

1. https://www.youtube.com/watch?v=rX0ItVEVjHc

2. https://www.youtube.com/watch?v=Cum5uN2634o

reply
MrGando 7 minutes ago
I once had to work with a system that was refactored and abstracted away heavily to use Redux. It didn't work then, the implementation had way too many abstractions, doing any change meant you had to touch dozens of files. It was insanity. Left me with a bitter taste regarding the redux pattern for ever (probably not the pattern's fault).
reply
platz 30 minutes ago
2016 (up to 2018 or so) may have been the peak of such varied activity in the developer ecosystem, including articles like this, whether it was discussion, ideation, OSS variety, language development.

There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.

reply
jbvlkt 13 minutes ago
It depends on whether the duplication is accidental or real. I.e. if two taxes are using the same formula, it is accidental. If you use the same physics formula in multiple places, it is real duplication.
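
To make that distinction concrete, a small sketch in Python. The function names and the 20% rates are made up purely for illustration.

```python
# Accidental duplication: two taxes that merely happen to share a formula
# today. They follow different rules and may diverge later, so each keeps
# its own function even though the bodies look identical.
def sales_tax(amount: float) -> float:
    return amount * 0.20  # hypothetical rate


def import_duty(amount: float) -> float:
    return amount * 0.20  # hypothetical rate; equal only by coincidence


# Real duplication: one physical law used from several places. Diverging
# copies would be a bug, so there is exactly one definition.
def kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    return 0.5 * mass_kg * speed_m_s ** 2


def braking_energy(mass_kg: float, speed_m_s: float) -> float:
    # Energy the brakes must dissipate to stop: reuses the shared formula.
    return kinetic_energy(mass_kg, speed_m_s)


def impact_energy(mass_kg: float, speed_m_s: float) -> float:
    # Energy available at impact: the same law, so the same function.
    return kinetic_energy(mass_kg, speed_m_s)
```

Changing one tax rate should not touch the other, while a fix to `kinetic_energy` should reach every caller.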
reply
aappleby 5 minutes ago
The smallest amount of simple code that solves the problem wins. Everything else is irrelevant.
reply
northisup 15 minutes ago
Duplication is fine, triplication and above is the issue.
reply
mjevans 11 minutes ago
Triplication tends to be where it becomes more clear what the correct thing to abstract or de-duplicate is.

It's of course possible to functional-ize segments of logic, but then the question of state mutation must be brought up. How isolated are these changes from other parts of the code / system state. Can this be run in parallel or is it something that must be serial? What potential race conditions exist?

reply
bob1029 18 minutes ago
If you work backward from the schema these sorts of things tend to evaporate before they can become a problem.

Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.

reply
hedora 5 minutes ago
I’ve seen code bases that evolved like that. The problem is almost always outside the abstraction that has a pile of conditionals.

Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.

For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.

Now, the auth handler needs a “allow forged header” flag and a “already authenticated” flag.

I’ve seen that grow to a half dozen cases until massive production dataloss occurred due to a buggy client that tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted the something for all users instead.

I can’t remember how the dataloss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”. Getting a design approved to require a user id be specified exactly once for account-level operations was fantasy land for that team. (Most people with any sort of engineering talent bounced in under a year.)
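
For what it's worth, a rough sketch in Python of what "one auth check, declared the same way everywhere" could look like. All names here (`Request`, `Handler`, `dispatch`, the routes) are hypothetical and have nothing to do with the actual codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Request:
    path: str
    user_id: Optional[str]  # None means the caller is not authenticated


@dataclass(frozen=True)
class Handler:
    requires_auth: bool            # every handler declares this the same way
    run: Callable[[Request], str]


class Unauthorized(Exception):
    pass


def dispatch(routes: Dict[str, Handler], request: Request) -> str:
    handler = routes[request.path]
    # The single auth check: no per-flow flags, and no handler runs before it.
    if handler.requires_auth and request.user_id is None:
        raise Unauthorized(request.path)
    return handler.run(request)


ROUTES = {
    "/health": Handler(requires_auth=False, run=lambda r: "ok"),
    "/delete-item": Handler(requires_auth=True,
                            run=lambda r: f"deleted item for {r.user_id}"),
}
```

With this shape an account-level handler can only run when a user id is present, because the check happens once, before any handler code.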

Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).

reply
originalcopy 13 minutes ago
While I see the point, I think I more often encounter the opposite. Duplication, but not exactly duplication. Then the "sunk cost fallacy" is not an issue but there is huge maintenance cost and no-one feels like refactoring it. I'd rather refactor bad abstraction than 10x duplication.
reply
TexanFeller 2 minutes ago
> Code duplication is far cheaper than the wrong abstraction

Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.

reply
williadc 14 minutes ago
The "99 Bottles of OOP" book mentioned at the bottom was an excellent introduction to refactoring. I highly recommend it if you struggle with finding the right data models for the problems you work on.
reply
antonymoose 37 minutes ago
Twice a coincidence, thrice a pattern.
reply
christophilus 33 minutes ago
Yes. I’m dealing with a graphql, urql, Next, Prisma stack at the moment. Something that would be a handful of lines of code in a different stack ends up being hundreds in this one.

The Node ecosystem is full of wrong abstractions.

reply
Rohansi 22 minutes ago
The problem is self-inflicted. You do not need to keep jumping to the next trendy framework.
reply
RussianCow 17 minutes ago
I don't know about you, but I generally don't write code in a vacuum. Other people may have touched it before me. Those other people may have made poor decisions.

Not that I'm immune from choosing the wrong abstraction sometimes. More than once the "other people" was me. We all make mistakes.

reply
Rohansi 6 minutes ago
Of course, but we should all be doing our best to push back against unnecessary framework churn.
reply
jstimpfle 20 minutes ago
Code duplication is the wrong abstraction too -- unless it's not really code duplication but code that only happens to be similar for some really "unstable" reason.
reply
dofm 13 minutes ago
I would agree that there are good "de minimis" reasons not to abstract code that isn't ready to be abstracted at all. If the pattern has not settled it shouldn't be forced into an abstraction (beyond those that make sure it is e.g. not vulnerable)

But beyond that, any stable abstraction is better than duplicated code.

reply
bazoom42 22 minutes ago
Depends. If the abstraction is just a level of indirection, then it is usually pretty simple to eliminate - just hit “inline function” in the refactoring tool a few times.

On the other hand it is pretty difficult and error prone to consolidate duplicated code which has drifted apart over time.

If in doubt, choose the approach which is simplest and least risky to revert if you discover in the future you made the wrong choice.

I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.

reply
KHRZ 32 minutes ago
This is the biggest lesson I got from LLMs. I have a 1 million LOC vibe coded project that I can only imagine would fit in a few hundred thousand lines. But it's still holding up; I expected some kind of development collapse long before this point.
reply
cassianoleal 28 minutes ago
I don't think that's a good lesson.

OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.

reply
gavmor 24 minutes ago
Well sooner or later I would expect a developer who intimately understands their code base to feel compelled to start refactoring and extracting fitting, meaningful well-leveraged abstractions.
reply
anon-3988 34 minutes ago
The problem with coming up with a rule that works for everyone is that everyone has a different idea of what makes a good abstraction.

Do you want to iterate using for loop or using .iter().step(2).map()?

I would rather have consistency than a mixed bag of levels of abstractions.

reply
doix 25 minutes ago
> Do you want to iterate using for loop or using .iter().step(2).map()?

This isn't really a good example, assuming both can be used to represent the same thing.
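
For illustration, the two styles side by side in Python, with made-up data and a doubling step standing in for whatever the loop body does. The quoted `.iter().step(2).map()` is read here as "take every second element and map over it", which is an assumption about what the parent meant.

```python
values = [10, 11, 12, 13, 14]

# Explicit loop over every second element.
doubled_loop = []
for i in range(0, len(values), 2):
    doubled_loop.append(values[i] * 2)

# Iterator-style chain over the same elements.
doubled_chain = list(map(lambda v: v * 2, values[::2]))

# Both forms can represent the same computation.
same_result = doubled_loop == doubled_chain
```

Either form expresses the same thing, so picking one is a style question rather than a wrong-abstraction problem.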

The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily into it, you start contorting the problem to fit your abstraction and it becomes a shit show.

reply
metaltyphoon 28 minutes ago
> Do you want to iterate using for loop or using .iter().step(2).map()?

I don’t think it matters, especially for short loop scopes

reply
sebastianconcpt 15 minutes ago
Oh the self-contradiction here...

Generalizing this in the abstract is a wrong abstraction.

reply
atmanactive 23 minutes ago
No it's not.
reply