As I understand it, the US Supreme Court has just this week ruled exactly this. LLM output cannot be copyrighted, so the only part of any piece of software that can be copyrighted is that part that was created by a human.
If you vibe-code the entire thing, it's not copyrightable. And if it can't be copyrighted that means it is in the public domain from the instant it was created and can't be licensed.
Your understanding is incorrect. The case was about whether an LLM can be an author, and did not address whether the person using it can be (which will generally be the case). https://news.ycombinator.com/item?id=47260110
Similarly, the operator of the LLM is the holder of the copyright of the LLM’s output.
This is incorrect. The monkey cannot hold copyright in the photograph, but there was also no court case suggesting the owner of the camera (Slater) holds copyright on the photo, and the Copyright Office's rules actually say the opposite: that it isn't copyrightable at all (the Wikipedia summary of the situation is good, pointing out the Copyright Office specifically added an example of "a photograph taken by a monkey" to their guidance to make the point clear).
The professional photographer claimed he engineered the situation that led to the photo and thus owns the copyright on the images. That specific claim appears not to have been addressed by the court nor by the copyright office. Instead, Slater settled by committing to donate a portion of future revenue from the photos.
The user is responsible for the output of the software. An image created in photoshop isn't the IP of Adobe, nor is text in Word somehow belonging to Microsoft. The idea that because the software tool is AI its output is magically immune from copyright is silly, and any regulation or legislation or agency that comes to that conclusion is silly and shouldn't be taken seriously.
Until they get over the silliness, just lie. You carefully manually crafted each and every character, each pixel, each raw byte by hand, slaving away with a tiny electrode, flipping each bit in memory, to elicit the result you see. Any resemblance to AI creations is purely coincidental, or deliberate as an ironic statement about current affairs.
Using AI as a tool to produce output, no matter how complex the underlying tool, should result in the authorship of the output being assigned to the user of the tool.
If autocorrect in Word doesn't nullify copyright, neither should the use of LLMs; manifesting an idea into code and text and images using prompts might have little human input, but the input is still there. And if it's a serious project, into which many hours of revision, back and forth, testing, and changing have gone, there should be absolutely no bar to copyright.
I can entertain a dismissal based on specific low effort uses of a tool - something like "generate a 13 chapter novel 240 pages long" and seeing what you get, then attempting to publish the book. But almost anything that involves any additional effort, even specifying the type of novel, or doing multiple drafts, or generating one chapter at a time, would be sufficient human involvement to justify copyright, in my eyes.
There's no good reason to gatekeep copyright like that. It doesn't benefit society, or individuals, it can only benefit those with vast IP hoards and giant corporations, and it's probably fair to say we've all had about enough of that.
I don't vibe code; I am firmly in charge of the architecture and code style of my projects, and I frequently give detailed instructions to the AI tools I use. But, to me, this is leading to a weird place. Why would the result of using a tool to create something new not be copyrightable simply due to the specific tool used?
I think this whole hullabaloo is self-inflicted. Code or any other creative work should stand on its merits. There is no issue with copyright and no issue with the ship of Theseus. The current copyright approach is still applicable: code (or any other creative work) that appears to be lifted verbatim from another work could be a copyright violation. Work that is sufficiently original (irrespective of how it was created) is likely not a copyright violation.
I don't think this follows? If I vibe code something and never post it anywhere public, I can still license that code to a company and ask them to pay me for using the code?
So as a corollary, the business model of providing software where you can choose either a free (as in beer) but restrictive license (e.g. GPL), or pay money and get a permissive business-compatible license, will cease to exist.
I think that's a shame actually, because it has been a good way of providing software that does something useful but where large companies that earn money from the use will have to pay the software creator.
I believe you can do that with public domain/copyright-free material in general. There is no requirement to tell someone that the material you license to them is also available under a different license, or that your license is not enforceable.
This is a head-spinning argument. The whole point of GPL is to force more things out into the open. You'd think someone who espouses open source would cheer the GPL. The only practical difference between MIT and GPL is that the former allows more closed-source code.
This feels analogous to the paradox of freedom. Truly unlimited freedom would include the freedom to oppress others, so "freedom maximalism" is an unsound philosophy (unless applied solipsistically).
When I publish, I tend to do so under MIT. I also write plenty of closed-source code. And I do generally believe in open source. But I don't use that as a justification for preferring MIT. If anything, I like MIT despite believing in open source, not because. Mainly because I want people to actually use what I wrote.
These are fascinating, if somewhat scary, times.
https://reorchestrate.com/posts/your-binary-is-no-longer-saf...
Even SaaSS isn't safe from that type of process:
I guess it depends on your intention, but eventually I'm not sure it'll even be possible to keep it "fully proprietary and closed" in the hopes of no one being able to replicate it, which seems to be the main motivation for many to go that road.
If you're shipping something, making something available, others will be able to use it (duh) and therefore replicate it. The barrier to replicating things like this, either together with LLMs or by letting the LLM straight up do it itself with the right harness, seems to be getting lowered real quick; massive difference in just a few years already.
Right now you can point claude at any program and ask it to analyse it, write an architecture document describing all the functionality. Then clear memory and get it to code against that architecture document.
You can't do that as easily with closed source software. Except, if you can read assembly, every program is open source. I suspect we're not far away from LLMs being able to just disassemble any program and do the same thing.
Is there a driver in windows that isn't in linux? No problem. Just ask claude to reverse engineer it, write out a document describing exactly how the driver issues commands to the device and what constraints and invariants it needs to hold. Then make a linux driver that works the same way.
Have an old video game you wanna play on your modern computer? No problem. Just get claude to disassemble the whole thing. Then function by function, rewrite it in C. Then port that C code to modern APIs.
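As a rough sketch of what that two-phase workflow could look like (everything here besides objdump is hypothetical; `ask_llm` is a placeholder for whatever LLM client you'd actually wire up):

```python
# Sketch of the "describe, then reimplement" workflow described above.
# Only objdump is real; ask_llm is a stub you would connect to an LLM client.
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM of your choice, return its reply."""
    raise NotImplementedError("connect this to an actual LLM client")

def disassemble(binary_path: str) -> str:
    # objdump -d emits a plain-text disassembly of the binary's code sections.
    return subprocess.run(
        ["objdump", "-d", binary_path],
        capture_output=True, text=True, check=True,
    ).stdout

def describe_then_reimplement(binary_path: str) -> str:
    asm = disassemble(binary_path)
    # Phase 1: produce a behavioural spec that contains no code.
    spec = ask_llm(
        "Write an architecture document describing the functionality, "
        "data structures, commands, and invariants of this program. "
        "Do not include any code:\n\n" + asm
    )
    # Phase 2: a fresh context sees only the spec, never the disassembly.
    return ask_llm(
        "Implement a C program that satisfies this architecture document, "
        "targeting modern APIs:\n\n" + spec
    )
```

The point of clearing memory between the two phases is that the reimplementation step never sees the original instructions, only the derived description.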
It'll be chaos. But I'm quite excited about the possibilities.
I have successfully created a partial implementation of p4 by pointing it at a captured network stream and some strace output. It's amazing how good these things are.
I think it’s entirely reasonable to release a test suite under a license that bars using it for AI reimplementation purposes. If someone wants to reimplement your work with a more permissive license, they can certainly do so, but maybe they should put the legwork in to write their own test suite.
I don't think real AI is around the corner but plenty of people believe it is & they also think they only need a few more data centers to make the fiction into a reality.
Evolution built man, who has intelligence, out of components that do not have intelligence themselves; intelligence is an emergent property of the system. It is therefore scientific to think we could build machines on similar principles that exhibit intelligence as an emergent property of the system. No woo-woo needed.
Sure, but this ain't it.
Actually, I think LLMs are a step in the wrong direction if we really want to reach true AI. So they actually delay it, instead of bringing us closer to true AI.
But LLMs are a very good scam that is not entirely snake oil. That is the best kind of scam.
Any particular reason, beyond feelings, why this is the case?
We already know expert systems failed us when reaching towards generalized systems. LLMs have allowed us to further explore the AI space and have given us insights into intelligence. Even more, we've had an explosion in hardware capabilities because of LLMs that will allow us to test other mechanisms faster than ever before.
There are lots of other analogies, but the moon ladder is simple enough to be understood even by children when explaining how nothing can emerge from inert building blocks like transistors that is not reducible to their constituent parts.
As I said previously, your time will be much better spent convincing people who are looking for another religion b/c they will be much more susceptible to your beliefs in emergent properties of transistors & data centers of sufficient scale & magnitude.
Congratulations, you're working on a space elevator. A few trillion dollars would certainly get us out of the atmosphere, and the amount of advances in carbon nanotube and foam metal would rocket us ahead decades in material sciences. Couple this with massive banks of capacitors and you could probably generate enough electricity for a country by the charge differential from the top to the bottom.
Oh, I get it, you were trying to be clever by saying something ignorant because it makes you feel special as a human rather than make realistic statements for the progress currently being made in the sciences.
So with "Real AI" you actually mean artificial superintelligence.
This is not always true; for an extreme example, see indistinguishability obfuscation.
And if anything can be reimplemented and there’s no value in the source any more, just the spec or tests, there’s no public-interest reason for any restriction other than completely free, in the GPL sense.
It doesn't if Dan Blanchard spends some tokens on it and then licenses the output as MIT.
LLM companies and increasingly courts view LLM training as fair use, so copyright licensing does not enter the picture.
Even prior to this, relatively simple projects licensed under share alike licenses were in danger of being cloned under either proprietary or more permissive licenses. This project in particular was spared, basically because the LGPL is permissive enough that it was always easier to just comply with the license terms. A full on GPLed project like GCC isn't in danger of an AI being able to clone it anytime soon. Nevermind that it was already cloned under a more permissive license by human coders.
Bikeshedding to eventually come full circle to understand why those decisions were made.
In a world where the large OEMs and bigcorps are increasingly locking down firmware, bootloaders, kernels, and the internet, I would think a reappraisal towards more enforcement that benefits the USER is paramount.
Instead we have devs looking to tear down the few user protections FLOSS provides and usher in a locked-down, hacker-unfriendly future.
The short version is that chardet is a dependency of requests which is very popular, and you cannot distribute PyInstaller/PyOxidizer builds with chardet due to how these systems bundle up dependencies.
[1]: https://velovix.github.io/post/lgpl-gpl-license-compliance-w...
As I recall there were some similar situations with licenses for distro builders regarding graphics drivers and even mp3 decoders, where there was a song and dance the end user had to go through to legally install them during/after setup.
Or better yet, to make a truly API-compatible re-implementation to use with the license that they want to use, since what they have done, I surmise, would fall under a derivative work. So they haven't really accomplished what they wanted, and instead introduced an unacceptable amount of risk to whoever uses the library going forward.
Kinda reminds me of what the Internet Archive did during the pandemic with the digital lending library: pushing the boundaries to test them and establish precedent. In any case, let's see how it plays out.
IP sounds good in theory but enables things like "patent trolling" by large corps, creating all kinds of goofy barriers and arbitrary questions like the one we're asking here, about whether re-implementations of ideas are "really ours"
(maybe they were never anyone's in the first place, outside of legally created mentalities)
ideas seem to fundamentally not operate like physical things, so asserting they can be considered "property" opens the door for all kinds of absurdities like those pondered in the OP
The problem with IP laws and the US is that the big companies already do what IP is supposed to protect against, and the US refuses to legislate effectively against them.
Let's see it!
I don't think Stallman has a real proposal for how innovation can be incentivized and compensated.
Take the example of medical innovations, sure big pharma is bad, but if they don't get to monetize their inventions, how will R&D get funded?
If you destroy IP and allow everyone to clone whatever, you will have a great result in the short term; then no one will continue R&D.
By taking the public money that already goes to medical R&D, increased if need be, and hiring scientists to research medical tech in the interest of public wellbeing and not profit.
IP has always had awkward aspects, like: what if you discover the sole treatment for a disease and can restrict people from making use of it? Kind of weird, especially when people can "independently" draw the same conclusions, so they truly obtain an idea that is "their own" but are then legally restricted from making use of it.
also how would you prove it was in the training set? re: your last sentence, the licensed work wasn't in the input in the chardet example ("no access to the old source tree")
Also, for comparison, both GPL and LGPL, when applied to software libraries (in the C sense of the word), assert that creating an application by linking with the library creates a derived work (derived from the library), and then they both give the terms that govern that "derived work" (which are reciprocal for GPL but not for LGPL). IANAL but I believe those terms are enforceable, even if the thing made by linking with the library does not meet a legal threshold for being a derived work.
it's barely tangential to the topic but worth pointing out: I don't think there's firm legal consensus on your library point; that is just the FSF's position. IANAL tho. https://en.wikipedia.org/wiki/GNU_General_Public_License#Lib...
For the parent comment on discoverability, I honestly don't know. Some models list their data sources, others do not. But if it came down to a dispute it may be that a court order could result in a search of the actual training data and the system that generated it.
For the second case of derived work through context inclusion, it may end up in a similar situation with forensic analysis of the data that generated some output.
You cannot (*) use LLMs to generate code that you then license, whether that license is GPL, MIT or some proprietary mumbo-jumbo.
(*) unless you just lie about this part.
You can't copyright a work that is only generated by a machine: "In February 2022, the Copyright Office’s Review Board issued a final decision affirming the refusal to register a work claimed to be generated with no human involvement"
But human direction of machine processes can be copyrighted:
"A year later, the Office issued a registration for a comic book incorporating AI-generated material."
and
"In most cases, however, humans will be involved in the creation process, and the work will be copyrightable to the extent that their contributions qualify as authorship. It is axiomatic that ideas or facts themselves are not protectible by copyright law and the Supreme Court has made clear that originality is required, not just time and effort. In Feist Publications, Inc. v. Rural Telephone Service Co., the Court rejected the theory that “sweat of the brow” alone could be sufficient for copyright protection. “To be sure,” the Court further explained, “the requisite level of creativity is extremely low; even a slight amount will suffice."
See https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
But it will be a shitshow either way.
It's not clear to me how much code you would need to modify by hand to qualify for copyright this way, but that's not an impossible avenue.
Kinda surprised nobody commented on this
e.g. Somebody wrote a library, and then you had an LLM implement it in a new language.
You didn't come up with the idea for whatever the library does, and you didn't "perform" the new implementation. You're neither writer nor performer, just the person who requested a new performance. You're basically a club owner who hired a band to cover some tunes. There's a lot involved in running a club, just like there's a fair bit involved in operating a LLM, but none of that gives you rights over the "composition". If you want to make money off of that performance, you need to pay the writer and/or satisfy whatever terms and conditions they've made the library available under.
IANAL, so I don't even know what species of worms are inside this can I've opened up. It seems sensible, to me, that running somebody else's work through a LLM shouldn't give you something that you can then claim complete control over.
---------
Edit: For the sake of this argument, let's pretend we're somewhere with sensible music copyright laws, and not the weird piano-roll-derived lunacy that currently exists in the U.S.
- one for the composition, the musical idea, music, lyrics.
- one for the recording, the music taking shape in a format that someone can listen to
I don't think this is how software licenses work, as they cover the code itself, rather than the ideas (the specific recording rather than the composition, in the music example), but it's an interesting way to frame why using LLM this way is, if not illegal, at least unethical.
In the example given and discussed here the last couple of days there seems to be a process more akin to having an AI create a cast of the pre-existing work and fill it for the new one.
Imagine doing the same with vehicle engines. Less fuel consumption, less pollution, less weight and who knows how many more benefits.
Just letting the A.I. do it by itself is sloppy though. The real benefit is derived only when the resulting port is of equal or better quality than the original. It needs a more systematic approach, with a human in the loop and good tools to index and select data from both codebases, the original and the ported one. The tools are not invented yet but we will get there.
What if you ask the tool “come up with an idea and build it” and it makes you an (obviously) derivative app? Or what if (closer to this post) you say “copy this thing, but differently so we don’t get into legal trouble”? Is any of those an “original thought” worthy of ownership of the output?
Further, what if this tool can reproduce these forbidden things almost or completely verbatim and the user of the tool has no way to verify it?
Think of software development as finding a structural path from point A to point D.
1. The Foundational Gateway (A → B): You are correct that AI tools are an amalgam of existing data. This foundational layer (A-B) represents the "Prior Art" or the existing IP that serves as a necessary gateway for any further development. If the path starts here, the rights of the original creators must be respected through the established legal framework of Intellectual Property Offices.
2. The Innovative Branch (F → D): However, if an orchestrator uses a tool to forge a new path via a distinct architecture (F) to reach the destination (D), that specific "delta" is a unique intellectual asset. Even if the tool "borrows" the bricks, the topological map of the new architecture belongs to the thinker who directed it.
3. The Necessity of Cross-Licensing: This is where the true core of IP exists. If the owner of the foundation (A-B) wishes to utilize the superior, optimized results of the new path (ABFD), they must respect the IP of the FD architecture. Conversely, the FD creator must acknowledge the base.
We aren't just talking about 'verbatim reproduction' of code; we are talking about the Systemic Design that justifies the existence of IP offices worldwide. The future isn't about "cleaning" licenses through AI, but about a more sophisticated world of Cross-Licensing where the foundational layer and the innovative layer recognize each other's functional logic.
Assuming that you are a programmer, when you think back to your contract, you will have noticed something like "The employee agrees that any works created during employment will be solely owned by $company_name"
Copyright _should_ be about allowing workers to make money from the non physical stuff that they produce.
Google spent many, many millions undermining that so they could run YouTube, the news service, and Google Books (amongst other things).
Disney bought most of congress to do the opposite.
At its heart, copyright is a tool that allows you and me to make a living. However, it has evolved into a system that allows large corporations to make and hold monopolies.
Now that large corporations can see an opportunity to cut employees out of the system entirely, they are quite happy with AI companies undermining copyright, just so long as they can keep charging for auto generated content.
TLDR: copyright is automatically assigned to the creator of the specific work, not the thinker.
ie thinker: "build me a box with two yellow rabbit ears"
The text is copyright of the "thinker"
maker: builds a box with yellow rabbit ears. Unless the yellow rabbit ears are a specific and recognisable element of the thinker's work, it's not infringement.
> © Copyright 2026 by Armin Ronacher.
Oooohkaaaay?
Good term.
For myself, I tend to have a similar view as the author (I publish MIT on most of my work), but it’s not really something I’m zealous about, and I’m not really into “slopforking” the work of others. I tend to prefer reinventing the wheel.
Good heavens, that's incredibly unethical. I suppose I should expect nothing more from a profession that has shied away from ethics essentially since its conception.
> I think society is better off when we share
Me too.
> and I consider the GPL to run against that spirit by restricting what can be done with it.
The GPL explicitly allows anyone to do anything with it, apart from not sharing it.
You want me to share with you, but you don't want to share with me.
That's not how copyright works. It doesn't require exact copies. You also can't just rephrase an existing book from scratch when the ideas expressed are essentially the same. Same with music.
Not ship of Theseus, but a "new implementation from the ground up".
Evidently, the author prefers MIT (https://github.com/chardet/chardet/issues/327#issuecomment-4...), and seems OK with slop-coding.
Interestingly, that's also the exact same spot where I stopped reading.
The dilution of morals weakens societies. We ignore them at our own peril, the planet and most certainly any god figure doesn’t care.
Not saying there's a legal precedent for that right now, but it's the only thing that makes any sense to me. Either that, or retrain the models on only MIT/similarly-licensed code or code you have explicit permission to train on.
`if __name__ == "__main__":`
I have no idea where that line first appeared, so figuring out what license it was originally written under would be difficult to track down, and most software only has license info at the file rather than line level.
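For anyone unfamiliar, that line is the standard Python entry-point guard; a minimal illustration (the function name here is mine):

```python
# Code under this guard runs only when the file is executed as a script,
# not when the file is imported as a module.
def main():
    print("running as a script")

if __name__ == "__main__":
    main()
```

It appears in virtually every Python codebase, which is exactly why hunting down a "first author" for it is hopeless.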
Let's be honest about what's happening here.
I said this else where, but I work with people who won't even look at GPL code because of the potential legal entanglements.
Yes, let's. Corporations with billions of dollars behind them wholesale stole copyrighted work and licensed code to train models, and then turned around and sold the result with no attribution or monetary benefit given to the people they stole from. They knew what they were doing and relied on the legal system being slow enough that they could plant a flag in the market before legal challenges killed them.
It's an industry built on theft. By all rights they should have been sued/fined out of existence before it ever got this far. But if you have enough money you can make almost anything legal.
In practice, well... you saw what's been going on with the Epstein files, etc. We are far from finding ourselves in a world that's fair and honorable.
(I'm not condoning it, I think it's massively trashy to steal code like this then pretend you're the good guy because of some super weird mental gymnastics you're doing)
And thus we arrive at the absolute shit state the world is in. We keep putting morality aside for something “more interesting”, then forget to factor it back in when making the final point.
“Have you tried: “kill all the poor?””
How would I defend myself against hostile entities and societal norms that make it OK to steal from me and my effort without compensation? I will close my doors, put up walls, and distrust more often.
That's clearly the trend the world is going towards, and I don't see that changing until we find a way to make it cheaper to detect deception and parasitic behavior, along with holding said entities accountable. Since our world leaders have a history of unaccountable leadership, and they are the ones who model this behavior, I have difficulty seeing the norms change without drastic worldwide leadership change.
Just because things are not as one wants does not stop that desire from being there.
> When the author of a project choose a specific license s/he is making a deliberate decision.
Potentially, potentially not. I used to release software under GPL and LGPL but changed my mind a few years after that. I did so in part because of conversations I had with others that convinced me that my values are closer aligned with permissive licenses.
So engaging in a friendly discourse with a maintainer to ask them to relicense is a perfectly fine thing to do, and an issue about the license has been open on chardet for many, many years.
I looked at the project earlier today; there is effectively no resemblance other than the public API.
Level 0: the code is just copied
Level 1: the code only has white space altered so the AST is the same
Level 2: the code has minor refactoring such as changed variable names and function names (in a compiled language the object code would be highly similar; this can easily be detected by tools like https://github.com/jplag/JPlag)
Level 3: the code has had significant refactoring such as moving functionality around, manually extracting code to new functions and manually inlining functions
Level 4: the code does the same conceptual steps as the old code but with different internal architecture
At least in the United States you have to reach Level 4, because only the concepts are not copyrightable; the expression at the lower levels still is. And I believe chardet has indeed reached Level 4 in this rewrite.
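To make the middle of that scale concrete, here is a hypothetical Level 2 pair (both snippets invented for illustration): only the identifiers differ, so the structure is unchanged and a similarity tool like JPlag would flag them as near-identical, whereas a Level 4 rewrite would share nothing but the concept.

```python
# Original (hypothetical): the Level 0 baseline.
def detect_encoding(raw_bytes):
    # A UTF-8 byte order mark at the start implies utf-8-sig.
    if raw_bytes.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    return "ascii"

# Level 2 copy: only the names changed; the AST shape and control flow
# are identical, which structural plagiarism detectors catch easily.
def guess_charset(data):
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    return "ascii"
```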
Yes, and the choice of license for a project is made for a reason that not everybody necessarily agrees with.
And the people who don't agree, have every right to implement a similar, even file-format or API compatible, project and give it another license. Gnumeric vs Excel, for example, or forks like MariaDB and Valkey.
But whether they do that alternatively-licensed project or not, it's perfectly rational to not like the choice of license the original is under. They legally have to respect it, but that doesn't mean there's anything irrational about disliking it or wishing it were changed.
And it's not merely idle wishing: sometimes it can make the original author/vendor reconsider and switch licenses. Qt is a big example. Blender. Or even proprietary to open (Mozilla to MPL).
"It's so disgusting to see people who are either malicious or non mentally capable enough to understand this"
It's not some sort of democracy, lol, it's a set of exclusive rights that are created the moment the work being copyrighted is produced.
(For a quick intro I recommend: https://www.youtube.com/watch?v=bxVs7FCgOig)
In the case of the license in question (L/GPL), it's one of the strictest out there; it explicitly forbids relicensing code under a different, non-compatible license like MIT. Let me say that again: L/GPL EXPLICITLY FORBIDS the thing that happened here from happening.
I sympathize with the guy that spent 12 years of his life maintaining the code, thank you for your service or something, but that does not make a difference. The wording of the (L/GPL) license is clear and the original author and most of the other 50 or so contributors did not approve of this.
Take a look at the guidelines that keep this place together: https://news.ycombinator.com/newsguidelines.html
Not to mention the fact that, as the other commenters mention, that appears to just... not have happened at all in this case, so it's a moot point.
Unlike with music, in software a (human) programmer could traditionally be chosen who hasn't "heard" (i.e. read) the original code. That has traditionally been called a "clean room" implementation (not to be confused with the software development process called "cleanroom").
I like sharing too but could permissive only licenses not backfire? GPL emerged in an era where proprietary software ruled and companies weren't incentivized to open source. GPL helped ensure software stayed open which helped it become competitive against the monopoly proprietary giants resting on their laurels. The restriction helped innovation, not the supposedly free market.
He is totally in on AI and that quote of his is self-serving. Can't we go back to flaming Unicode in Python?
No doubt, GPL had some influence. But I would hardly single it out as the force that ensured software stayed open. Software stayed open because "information wants to be free" [2], not because some authors wield copyright law like a weapon to be used against corporations.
[1]: https://opensource.com/article/19/4/history-mit-license
[2]: A popular phrase based on a fundamental idea that predates software.
The GPL’s significance was that it changed the default outcome. At a time when software was overwhelmingly proprietary, it created a mechanism that required improvements to remain available to users and developers downstream.
GCC was a massive deal, and is a big part of why compilers are free today, for example.
I also wouldn't agree that proprietary software is in decline. There are niches where the OS, mobile apps, and games are almost entirely proprietary (and that is not changing any time soon). But the most damning problem is that all computer hardware now has multiple layers of subsystems with proprietary software components, even if the boot loader and beyond are ostensibly FOSS.
My take on the cause of proprietary software is "the bottom line". Companies want to sell products and they believe that it's easier to sell things that are not open source. Meanwhile, there are several counterexamples of commercial products that are also open source (not necessarily copyleft), including computer games. The cause of whatever decline you're seeing in proprietary software dominance is unlikely to be the GPL.
The two places it has won out thus far are in retail and SaaS. The environment of 1980, when most important software was locked behind proprietary licenses, is quite far behind us.
Bringing a fork in-house and falling behind on maintenance is a very bad idea. The closest I've ever come to that in industry was deploying a patch before the PR was merged.