As I understand it, the US Supreme Court has just this week ruled exactly this. LLM output cannot be copyrighted, so the only part of any piece of software that can be copyrighted is that part that was created by a human.
If you vibe-code the entire thing, it's not copyrightable. And if it can't be copyrighted that means it is in the public domain from the instant it was created and can't be licensed.
Your understanding is incorrect. The case was about whether an LLM can be an author, and did not address whether the person using it can be (which will generally be the case). https://news.ycombinator.com/item?id=47260110
Similarly, the operator of the LLM is the holder of the copyright of the LLM’s output.
This is incorrect. The monkey cannot hold copyright in the photograph, but there was also no court case suggesting the owner of the camera (Slater) holds copyright on the photo, and the Copyright Office's rules actually say the opposite: that it isn't copyrightable at all (the Wikipedia summary of the situation is good, pointing out the Copyright Office specifically added an example of "a photograph taken by a monkey" to their guidance to make the point clear).
The professional photographer claimed he engineered the situation that led to the photo and thus owns the copyright on the images. That specific claim appears not to have been addressed by the court nor by the copyright office. Instead, Slater settled by committing to donate a portion of future revenue from the photos.
The user is responsible for the output of the software. An image created in photoshop isn't the IP of Adobe, nor is text in Word somehow belonging to Microsoft. The idea that because the software tool is AI its output is magically immune from copyright is silly, and any regulation or legislation or agency that comes to that conclusion is silly and shouldn't be taken seriously.
Until they get over the silliness, just lie. You carefully manually crafted each and every character, each pixel, each raw byte by hand, slaving away with a tiny electrode, flipping each bit in memory, to elicit the result you see. Any resemblance to AI creations is purely coincidental, or deliberate as an ironic statement about current affairs.
Using AI as a tool to produce output, no matter how complex the underlying tool, should result in the authorship of the output being assigned to the user of the tool.
If autocorrect in Word doesn't nullify copyright, neither should the use of LLMs; manifesting an idea into code and text and images using prompts might have little human input, but the input is still there. And if it's a serious project, into which many hours of revision, back and forth, testing, and changing have gone, there should be absolutely no bar to copyright.
I can entertain a dismissal based on specific low effort uses of a tool - something like "generate a 13 chapter novel 240 pages long" and seeing what you get, then attempting to publish the book. But almost anything that involves any additional effort, even specifying the type of novel, or doing multiple drafts, or generating one chapter at a time, would be sufficient human involvement to justify copyright, in my eyes.
There's no good reason to gatekeep copyright like that. It doesn't benefit society, or individuals, it can only benefit those with vast IP hoards and giant corporations, and it's probably fair to say we've all had about enough of that.
I don't vibe code; I am firmly in charge of the architecture and code style of my projects, and I frequently give detailed instructions to the AI tools I use. But, to me, this is leading to a weird place. Why would the result of using a tool to create something new not be copyrightable simply due to the specific tool used?
I think this whole hullabaloo is self-inflicted. Code or any other creative work should stand on its merits. There is no issue with copyright and no issue with the ship of Theseus. The current copyright approach is still applicable: code (or any other creative work) that appears to be lifted verbatim from another work could be a copyright violation. Work that is sufficiently original (irrespective of how it was created) is likely not a copyright violation.
I don't think this follows? If I vibe code something and never post it anywhere public, I can still license that code to a company and ask them to pay me for using the code?
So as a corollary, the business model of providing software where you can choose either a free (as in beer) but restrictive license (e.g. GPL), or pay money and get a permissive business-compatible license, will cease to exist.
I think that's a shame actually, because it has been a good way of providing software that does something useful but where large companies that earn money from the use will have to pay the software creator.
I believe you can do that with public domain/copyright-free material in general. There is no requirement to tell someone that the material you license to them is also available under a different license, or that your license is not enforceable.
This is a head-spinning argument. The whole point of GPL is to force more things out into the open. You'd think someone who espouses open source would cheer the GPL. The only practical difference between MIT and GPL is that the former allows more closed-source code.
This feels analogous to the paradox of freedom. Truly unlimited freedom would include the freedom to oppress others, so "freedom maximalism" is an unsound philosophy (unless applied solipsistically).
When I publish, I tend to do so under MIT. I also write plenty of closed-source code. And I do generally believe in open source. But I don't use that as a justification for preferring MIT. If anything, I like MIT despite believing in open source, not because. Mainly because I want people to actually use what I wrote.
These are fascinating, if somewhat scary, times.
https://reorchestrate.com/posts/your-binary-is-no-longer-saf...
Even SaaSS isn't safe from that type of process:
I guess it depends on your intention, but eventually I'm not sure it'll even be possible to keep it "fully proprietary and closed" in the hopes of no one being able to replicate it, which seems to be the main motivation for many to go that road.
If you're shipping something, making something available, others will be able to use it (duh) and therefore replicate it. The barrier to replicating things like this, either together with LLMs or by letting the LLM straight up do it itself with the right harness, seems to be getting lowered real quick; massive difference in just a few years already.
Right now you can point claude at any program and ask it to analyse it, write an architecture document describing all the functionality. Then clear memory and get it to code against that architecture document.
You can't do that as easily with closed source software. Except, if you can read assembly, every program is open source. I suspect we're not far away from LLMs being able to just disassemble any program and do the same thing.
Is there a driver in windows that isn't in linux? No problem. Just ask claude to reverse engineer it, write out a document describing exactly how the driver issues commands to the device and what constraints and invariants it needs to hold. Then make a linux driver that works the same way.
Have an old video game you wanna play on your modern computer? No problem. Just get claude to disassemble the whole thing. Then function by function, rewrite it in C. Then port that C code to modern APIs.
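As a rough sketch of what that two-phase workflow could look like (everything here besides objdump is hypothetical; `ask_llm` is a placeholder for whatever LLM client you'd actually wire up):

```python
# Sketch of the "describe, then reimplement" workflow described above.
# Only objdump is real; ask_llm is a stub you would connect to an LLM client.
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM of your choice, return its reply."""
    raise NotImplementedError("connect this to an actual LLM client")

def disassemble(binary_path: str) -> str:
    # objdump -d emits a plain-text disassembly of the binary's code sections.
    return subprocess.run(
        ["objdump", "-d", binary_path],
        capture_output=True, text=True, check=True,
    ).stdout

def describe_then_reimplement(binary_path: str) -> str:
    asm = disassemble(binary_path)
    # Phase 1: produce a behavioural spec that contains no code.
    spec = ask_llm(
        "Write an architecture document describing the functionality, "
        "data structures, commands, and invariants of this program. "
        "Do not include any code:\n\n" + asm
    )
    # Phase 2: a fresh context sees only the spec, never the disassembly.
    return ask_llm(
        "Implement a C program that satisfies this architecture document, "
        "targeting modern APIs:\n\n" + spec
    )
```

The point of clearing memory between the two phases is that the reimplementation step never sees the original instructions, only the derived description.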
It'll be chaos. But I'm quite excited about the possibilities.
I have successfully created a partial implementation of p4 by pointing it at a captured network stream and some strace output. It's amazing how good these things are.
I think it’s entirely reasonable to release a test suite under a license that bars using it for AI reimplementation purposes. If someone wants to reimplement your work with a more permissive license, they can certainly do so, but maybe they should put the legwork in to write their own test suite.
I don't think real AI is around the corner but plenty of people believe it is & they also think they only need a few more data centers to make the fiction into a reality.
Evolution built man, who has intelligence, out of components that do not have intelligence themselves; intelligence is an emergent property of the system. It is therefore scientific to think we could build machines on similar principles that exhibit intelligence as an emergent property of the system. No woo-woo needed.
Sure, but this ain't it.
Actually, I think LLMs are a step in the wrong direction if we really want to reach true AI. So they actually delay it, instead of bringing us closer to true AI.
But LLMs are a very good scam that is not entirely snake oil. That is the best kind of scam.
Any particular reason, beyond feelings, why this is the case?
We already know expert systems failed us when reaching towards generalized systems. LLMs have allowed us to further explore the AI space and have given us insights into intelligence. Even more, we've had an explosion in hardware capabilities because of LLMs that will allow us to test other mechanisms faster than ever before.
There are lots of other analogies, but the moon ladder is simple enough to be understood even by children when explaining how nothing can emerge from inert building blocks like transistors that is not reducible to their constituent parts.
As I said previously, your time will be much better spent convincing people who are looking for another religion b/c they will be much more susceptible to your beliefs in emergent properties of transistors & data centers of sufficient scale & magnitude.
Congratulations, you're working on a space elevator. A few trillion dollars would certainly get us out of the atmosphere, and the amount of advances in carbon nanotube and foam metal would rocket us ahead decades in material sciences. Couple this with massive banks of capacitors and you could probably generate enough electricity for a country by the charge differential from the top to the bottom.
Oh, I get it, you were trying to be clever by saying something ignorant because it makes you feel special as a human rather than make realistic statements for the progress currently being made in the sciences.
So with "Real AI" you actually mean artificial superintelligence.
This is not always true; for an extreme example, see indistinguishability obfuscation.
And if anything can be reimplemented and there’s no value in the source any more, just the spec or tests, there’s no public-interest reason for any restriction other than completely free, in the GPL sense.
It doesn't if Dan Blanchard spends some tokens on it and then licenses the output as MIT.
LLM companies and increasingly courts view LLM training as fair use, so copyright licensing does not enter the picture.
Even prior to this, relatively simple projects licensed under share alike licenses were in danger of being cloned under either proprietary or more permissive licenses. This project in particular was spared, basically because the LGPL is permissive enough that it was always easier to just comply with the license terms. A full on GPLed project like GCC isn't in danger of an AI being able to clone it anytime soon. Nevermind that it was already cloned under a more permissive license by human coders.
Bikeshedding to eventually come full circle to understand why those decisions were made.
In a world where the large OEMs and bigcorps are increasingly locking down firmware, bootloaders, kernels, and the internet, I would think a reappraisal towards more enforcement that benefits the USER is paramount.
Instead we have devs looking to tear down the few user protections FLOSS provides and usher in a locked-down, hacker-unfriendly future.
The short version is that chardet is a dependency of requests which is very popular, and you cannot distribute PyInstaller/PyOxidizer builds with chardet due to how these systems bundle up dependencies.
[1]: https://velovix.github.io/post/lgpl-gpl-license-compliance-w...
As I recall there were some similar situations with licenses for distro builders regarding graphics drivers and even mp3 decoders, where there was a song and dance the end user had to go through to legally install them during/after setup.
Or better yet, to make a truly API-compatible re-implementation to use with the license that they want to use, since what they have done, I surmise, would fall under a derivative work. So they haven't really accomplished what they wanted, and instead introduced an unacceptable amount of risk to whoever uses the library going forward.
Kinda reminds me of what the Internet Archive did during the pandemic with the digital lending library: pushing the boundaries to test them and establish precedent. In any case, let's see how it plays out.
IP sounds good in theory but enables things like "patent trolling" by large corps, creating all kinds of goofy barriers and arbitrary questions like the one we're asking here, about whether re-implementations of ideas are "really ours"
(maybe they were never anyone's in the first place, outside of legally created mentalities)
ideas seem to fundamentally not operate like physical things, so asserting they can be considered "property" opens the door for all kinds of absurdities like those pondered in the OP
The problem with IP laws and the US is that the big companies already do what IP is supposed to protect against, and the US refuses to legislate effectively against them.
Let's see it!
I don't think Stallman has a real proposal for how innovation can be incentivized and compensated.
Take the example of medical innovations, sure big pharma is bad, but if they don't get to monetize their inventions, how will R&D get funded?
If you destroy IP and allow everyone to clone whatever, you will have a great result in the short term; then no one will continue R&D.
By taking the public money that already goes to medical R&D, increased if need be, and hiring scientists to research medical tech in the interest of public wellbeing and not profit.
IP has always had awkward aspects, like: what if you discover the sole treatment for a disease and can restrict people from making use of it? Kind of weird, especially when people can "independently" draw the same conclusions, so they truly obtain an idea that is "their own" but are then legally restricted from making use of it.
also how would you prove it was in the training set? re: your last sentence, the licensed work wasn't in the input in the chardet example ("no access to the old source tree")
Also, for comparison, both GPL and LGPL, when applied to software libraries (in the C sense of the word), assert that creating an application by linking with the library creates a derived work (derived from the library), and then they both give the terms that govern that "derived work" (which are reciprocal for GPL but not for LGPL). IANAL but I believe those terms are enforceable, even if the thing made by linking with the library does not meet a legal threshold for being a derived work.
it's barely tangential to the topic but worth pointing out: I don't think there's firm legal consensus on your library point; that is just the FSF's position. IANAL tho. https://en.wikipedia.org/wiki/GNU_General_Public_License#Lib...
For the parent comment on discoverability, I honestly don't know. Some models list their data sources, others do not. But if it came down to a dispute it may be that a court order could result in a search of the actual training data and the system that generated it.
For the second case of derived work through context inclusion, it may end up in a similar situation with forensic analysis of the data that generated some output.
You cannot (*) use LLMs to generate code that you then license, whether that license is GPL, MIT or some proprietary mumbo-jumbo.
(*) unless you just lie about this part.
You can't copyright a work that is only generated by a machine: "In February 2022, the Copyright Office’s Review Board issued a final decision affirming the refusal to register a work claimed to be generated with no human involvement"
But human direction of machine processes can be copyrighted:
"A year later, the Office issued a registration for a comic book incorporating AI-generated material."
and
"In most cases, however, humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship. It is axiomatic that ideas or facts themselves are not protectible by copyright law and the Supreme Court has made clear that originality is required, not just time and effort. In Feist Publications, Inc. v. Rural Telephone Service Co., the Court rejected the theory that “sweat of the brow” alone could be sufficient for copyright protection. “To be sure,” the Court further explained, “the requisite level of creativity is extremely low; even a slight amount will suffice."
See https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
But it will be a shitshow either way.
It's not clear to me how much code you would need to modify by hand to qualify for copyright this way, but that's not an impossible avenue.
Kinda surprised nobody commented on this
e.g. Somebody wrote a library, and then you had an LLM implement it in a new language.
You didn't come up with the idea for whatever the library does, and you didn't "perform" the new implementation. You're neither writer nor performer, just the person who requested a new performance. You're basically a club owner who hired a band to cover some tunes. There's a lot involved in running a club, just like there's a fair bit involved in operating a LLM, but none of that gives you rights over the "composition". If you want to make money off of that performance, you need to pay the writer and/or satisfy whatever terms and conditions they've made the library available under.
IANAL, so I don't even know what species of worms are inside this can I've opened up. It seems sensible, to me, that running somebody else's work through a LLM shouldn't give you something that you can then claim complete control over.
---------
Edit: For the sake of this argument, let's pretend we're somewhere with sensible music copyright laws, and not the weird piano-roll-derived lunacy that currently exists in the U.S.
- one for the composition, the musical idea, music, lyrics.
- one for the recording, the music taking shape in a format that someone can listen to
I don't think this is how software licenses work, as they cover the code itself, rather than the ideas (the specific recording rather than the composition, in the music example), but it's an interesting way to frame why using LLM this way is, if not illegal, at least unethical.
In the example given and discussed here the last couple of days there seems to be a process more akin to having an AI create a cast of the pre-existing work and fill it for the new one.
Imagine doing the same with vehicle engines. Less fuel consumption, less pollution, less weight and who knows how many more benefits.
Just letting the A.I. do it by itself is sloppy though. The real benefit is derived only when the resulting port is of equal or better quality than the original. It needs a more systematic approach, with a human in the loop and good tools to index and select data from both codebases, the original and the ported one. The tools are not invented yet but we will get there.
What if you ask the tool “come up with an idea and build it” and it makes you an (obviously) derivative app? Or what if (closer to this post) you say “copy this thing, but differently so we don’t get into legal trouble”? Is any of those an “original thought” worthy of ownership of the output?
Further, what if this tool can reproduce these forbidden things almost or completely verbatim and the user of the tool has no way to verify it?
Think of software development as finding a structural path from point A to point D.
1. The Foundational Gateway (A → B): You are correct that AI tools are an amalgam of existing data. This foundational layer (A-B) represents the "Prior Art" or the existing IP that serves as a necessary gateway for any further development. If the path starts here, the rights of the original creators must be respected through the established legal framework of Intellectual Property Offices.
2. The Innovative Branch (F → D): However, if an orchestrator uses a tool to forge a new path via a distinct architecture (F) to reach the destination (D), that specific "delta" is a unique intellectual asset. Even if the tool "borrows" the bricks, the topological map of the new architecture belongs to the thinker who directed it.
3. The Necessity of Cross-Licensing: This is where the true core of IP exists. If the owner of the foundation (A-B) wishes to utilize the superior, optimized results of the new path (ABFD), they must respect the IP of the FD architecture. Conversely, the FD creator must acknowledge the base.
We aren't just talking about 'verbatim reproduction' of code; we are talking about the Systemic Design that justifies the existence of IP offices worldwide. The future isn't about "cleaning" licenses through AI, but about a more sophisticated world of Cross-Licensing where the foundational layer and the innovative layer recognize each other's functional logic.
Assuming that you are a programmer, when you think back to your contract, you will have noticed something like "The employee agrees that any works created during employment will be solely owned by $company_name"
Copyright _should_ be about allowing workers to make money from the non physical stuff that they produce.
Google spent many, many millions undermining that so they could run YouTube, the news service, and Google Books (amongst other things).
Disney bought most of congress to do the opposite.
At its heart, copyright is a tool that allows you and me to make a living. However, it has evolved into a system that allows large corporations to make and hold monopolies.
Now that large corporations can see an opportunity to cut employees out of the system entirely, they are quite happy with AI companies undermining copyright, just so long as they can keep charging for auto generated content.
TLDR: copyright is automatically assigned to the creator of the specific work, not the thinker.
ie thinker: "build me a box with two yellow rabbit ears"
The text is copyright of the "thinker"
maker: builds a box with yellow rabbit ears. Unless the yellow rabbit ears are a specific and recognisable element of the thinker's work, it's not infringement.
> © Copyright 2026 by Armin Ronacher.
Oooohkaaaay?
Good term.
For myself, I tend to have a similar view as the author (I publish MIT on most of my work), but it’s not really something I’m zealous about, and I’m not really into “slopforking” the work of others. I tend to prefer reinventing the wheel.
Good heavens, that's incredibly unethical. I suppose I should expect nothing more from a profession that has shied away from ethics essentially since its conception.
> I think society is better off when we share
Me too.
> and I consider the GPL to run against that spirit by restricting what can be done with it.
The GPL explicitly allows anyone to do anything with it, apart from not sharing it.
You want me to share with you, but you don't want to share with me.
That's not how copyright works. It doesn't require exact copies. You also can't just rephrase an existing book from scratch when the ideas expressed are essentially the same. Same with music.
Not ship of Theseus, but a "new implementation from the ground up".
Evidently, the author prefers MIT (https://github.com/chardet/chardet/issues/327#issuecomment-4...), and seems OK with slop-coding.
Interestingly, that's also the exact same spot where I stopped reading.
The dilution of morals weakens societies. We ignore them at our own peril, the planet and most certainly any god figure doesn’t care.
Not saying there's a legal precedent for that right now, but it's the only thing that makes any sense to me. Either that, or retrain the models on only MIT/similarly-licensed code or code you have explicit permission to train on.
`if __name__ == "__main__":`
I have no idea where that line first appeared, so figuring out what license it was originally written under would be difficult to track down, and most software only has license info at the file rather than line level.
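For anyone unfamiliar, that line is the standard Python entry-point guard; a minimal illustration (the function name here is mine):

```python
# Code under this guard runs only when the file is executed as a script,
# not when the file is imported as a module.
def main():
    print("running as a script")

if __name__ == "__main__":
    main()
```

It appears in virtually every Python codebase, which is exactly why hunting down a "first author" for it is hopeless.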
Let's be honest about what's happening here.
I said this else where, but I work with people who won't even look at GPL code because of the potential legal entanglements.
Yes, let's. Corporations with billions of dollars behind them wholesale stole copyrighted work and licensed code to train models, and then turned around and sold the result with no attribution or monetary benefit given to the people they stole from. They knew what they were doing and relied on the legal system being slow enough that they could plant a flag in the market before legal challenges killed them.
It's an industry built on theft. By all rights they should have been sued/fined out of existence before it ever got this far. But if you have enough money you can make almost anything legal.
In practice, well... you saw what's been going on with the Epstein files, etc. We are far from finding ourselves in a world that's fair and honorable.
(I'm not condoning it, I think it's massively trashy to steal code like this then pretend you're the good guy because of some super weird mental gymnastics you're doing)
And thus we arrive at the absolute shit state the world is in. We keep putting morality aside for something “more interesting”, then forget to factor it back in when making the final point.
“Have you tried: “kill all the poor?””
How would I defend myself against hostile entities and societal norms that make it OK to steal from me and my effort without compensation? I will close my doors, put up walls, and distrust more often.
That's clearly the trend the world is going towards, and I don't see that changing until we find a way to make it cheaper to detect deception and parasitic behavior, along with holding said entities accountable. Since our world leaders have a history of unaccountable leadership, and they are the ones who model this behavior, I have difficulty seeing the norms change without drastic worldwide leadership change.
Just because things are not as one wants does not stop that desire from being there.
> When the author of a project choose a specific license s/he is making a deliberate decision.
Potentially, potentially not. I used to release software under GPL and LGPL but changed my mind a few years after that. I did so in part because of conversations I had with others that convinced me that my values are closer aligned with permissive licenses.
So engaging in a friendly discourse with a maintainer to ask them to relicense is a perfectly fine thing to do, and an issue about the license has been open on chardet for many, many years.
I looked at the project earlier today; there is effectively no resemblance other than the public API.
Level 0: the code is just copied
Level 1: the code only has white space altered so the AST is the same
Level 2: the code has minor refactoring such as changed variable names and function names (in a compiled language the object code would be highly similar; this can easily be detected by tools like https://github.com/jplag/JPlag)
Level 3: the code has had significant refactoring such as moving functionality around, manually extracting code to new functions and manually inlining functions
Level 4: the code does the same conceptual steps as the old code but with different internal architecture
At least in the United States you have to reach Level 4, because only the concepts are not copyrightable; the expression at the lower levels still is. And I believe chardet has indeed reached Level 4 in this rewrite.
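To make the middle of that scale concrete, here is a hypothetical Level 2 pair (both snippets invented for illustration): only the identifiers differ, so the structure is unchanged and a similarity tool like JPlag would flag them as near-identical, whereas a Level 4 rewrite would share nothing but the concept.

```python
# Original (hypothetical): the Level 0 baseline.
def detect_encoding(raw_bytes):
    # A UTF-8 byte order mark at the start implies utf-8-sig.
    if raw_bytes.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    return "ascii"

# Level 2 copy: only the names changed; the AST shape and control flow
# are identical, which structural plagiarism detectors catch easily.
def guess_charset(data):
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    return "ascii"
```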
Yes, and the choice of license for a project is made for a reason that not everybody necessarily agrees with.
And the people who don't agree, have every right to implement a similar, even file-format or API compatible, project and give it another license. Gnumeric vs Excel, for example, or forks like MariaDB and Valkey.
But whether they do that alternatively-licensed project or not, it's perfectly rational to not like the choice of license the original is under. They legally have to respect it, but that doesn't mean there's anything irrational about disliking it or wishing it were changed.
And it's not merely idle wishing: sometimes it can make the original author/vendor reconsider and switch licenses. Qt is a big example. Blender. Or even proprietary to open (Mozilla to MPL).
"It's so disgusting to see people who are either malicious or non mentally capable enough to understand this"
It's not some sort of democracy, lol, it's a set of exclusive rights that are created the moment the work being copyrighted is produced.
(For a quick intro I recommend: https://www.youtube.com/watch?v=bxVs7FCgOig)
In the case of the license in question (L/GPL), it's one of the strictest out there; it explicitly forbids relicensing code under a different, non-compatible license like MIT. Let me say that again: L/GPL EXPLICITLY FORBIDS the thing that happened here from happening.
I sympathize with the guy that spent 12 years of his life maintaining the code, thank you for your service or something, but that does not make a difference. The wording of the (L/GPL) license is clear and the original author and most of the other 50 or so contributors did not approve of this.
Take a look at the guidelines that keep this place together: https://news.ycombinator.com/newsguidelines.html
Not to mention the fact that, as the other commenters mention, that appears to just... not have happened at all in this case, so it's a moot point.
Unlike with music, in software a (human) programmer could traditionally be chosen who hasn't "heard" (i.e. read) the original code. That has traditionally been called a "clean room" implementation (not to be confused with the software development process called "cleanroom").
I like sharing too but could permissive only licenses not backfire? GPL emerged in an era where proprietary software ruled and companies weren't incentivized to open source. GPL helped ensure software stayed open which helped it become competitive against the monopoly proprietary giants resting on their laurels. The restriction helped innovation, not the supposedly free market.
He is totally in on AI and that quote of his is self-serving. Can't we go back to flaming Unicode in Python?
No doubt, GPL had some influence. But I would hardly single it out as the force that ensured software stayed open. Software stayed open because "information wants to be free" [2], not because some authors wield copyright law like a weapon to be used against corporations.
[1]: https://opensource.com/article/19/4/history-mit-license
[2]: A popular phrase based on a fundamental idea that predates software.
The GPL’s significance was that it changed the default outcome. At a time when software was overwhelmingly proprietary, it created a mechanism that required improvements to remain available to users and developers downstream.
GCC was a massive deal, and is a big part of why compilers are free today, for example.
I also wouldn't agree that proprietary software is in decline. There are niches where the OS, mobile apps, and games are almost entirely proprietary (and that is not changing any time soon). But the most damning problem is that all computer hardware now has multiple layers of subsystems with proprietary software components, even if the boot loader and beyond are ostensibly FOSS.
My take on the cause of proprietary software is "the bottom line". Companies want to sell products and they believe that it's easier to sell things that are not open source. Meanwhile, there are several counterexamples of commercial products that are also open source (not necessarily copyleft), including computer games. The cause of whatever decline you're seeing in proprietary software dominance is unlikely to be the GPL.
The two places it has won out thus far are in retail and SaaS. The environment of 1980, when most important software was locked behind proprietary licenses, is quite far behind us.
Bringing a fork in-house and falling behind on maintenance is a very bad idea. The closest I've ever come to that in industry was deploying a patch before the PR was merged.