Hacker News

154

Is legal the same as legitimate: AI reimplementation and the erosion of copyleft

by dahlia 1773069173 144 comments
I believe it is a narrow view of the situation. If we take a look into the history, into the reasons for inventing GPL, we'll see that it was an attempt to fight copyrights with copyrights. The very name 'copyleft' is trying to convey the idea.

What AI is eroding is copyright. You can re-implement not just a GPL program; you can reverse engineer and re-implement a closed source program too. People have demonstrated it already; there were stories here on HN about it.

AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLMs as the main weapon.

by ordu 1773075254
> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch

This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Mickey Mouse."

Blanchard is, of course, familiar with the source code, he's been its maintainer for years. The premise is that he prompted Claude to reimplement it, without using his own knowledge of it to direct or steer.

by sharkjacobs 1773075275
In the corporate world, we've started using reimplementation as a way to access tooling that security won't authorize.

Sec has a deny by default policy. Eng has a use-more-AI policy. Any code written in-house is accepted by default. You can see where this is going.

We've been using AI to reimplement tooling that security won't approve. The incentives conspired to produce the worst outcome, yet here we are. If you want a different outcome, you need to create different incentives.

by kelseyfrog 1773078681
If Blanchard is claiming not to have been substantively involved in the creation of the new implementation of chardet (i.e. "Claude did it"), then the new implementation is machine generated, and in the USA cannot be copyrighted and thus cannot be licensed.

If he is claiming to have been somehow substantively "enough" involved to make the code copyrightable, then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.

by PaulDavisThe1st 1773079363
This is only worth arguing about because software has value. Putting this in context of a world where the cost of writing code is trending to 0, there are two obvious futures:

1. The cost continues to trend to 0, and _all_ software loses value and becomes immediately replaceable. In this world, proprietary, copyleft and permissive licenses do not matter, as I can simply have my AI reimplement whatever I want and not distribute it at all.

2. The coding cost reduction is all some temporary mirage, to be ended soon by drying VC money/rising inference costs, regulatory barriers, etc. In that world we should be reimplementing everything we can as copyleft while the inferencing is good.

by largbae 1773076858
It should be noted that the Rust community is also guilty of something similar. That is, porting old GPL programs, typically written in C, to Rust and relicensing them as MIT.
by drnick1 1773076283
i've been following this for a while.. and the trend for copyright (of any form - books code pictures music whatever) being laundered by reinventing the "same" thing in-some-way.. is kind-of clear.

But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?

Or all goes back to proprietary everything.. Babylon-tower style, noone talks to noone?

by svilen_dobrev 1773085680
Why isn't it public domain?

How is the MIT license valid for the reimplementation? Since it was created with Claude, shouldn't it be public domain?

Public domain is similar in effect to the MIT license, though without the attribution requirement.

Copyright wasn't really intended to deal with code, but isn't it late to deal with that since almost all code going forward is going to lean heavily on LLM output which isn't copyrightable anyway?

by harshreality 1773084186
This article is setting up a bit of a moving target. Legal vs legitimate is at least only a single vague question to be defined, but then the target changes to “socially legitimate”, defined only indirectly by way of example, like aggressive tax avoidance as “antisocial”. While I tend to agree with that characterization, my agreement is predicated on a layering of other principles.

The fundamental problem is that once you take something outside the realm of law and rule of law in its many facets as the legitimizing principle, you have to go a whole lot further to be coherent and consistent.

You can’t just leave things floating on a few ambiguous things you don’t like and that feel “off” to you in some way, not if you’re trying to bring some clarity to your own thoughts, much less others’. You don’t have to land on a conclusion either. By all means chew over things, but once you try to settle, things fall apart if you haven’t done the harder work of replacing the framework of law with that of another conceptual structure.

You need to at least be asking “to what ends? What purpose is served by the rule?” Otherwise you’re stuck in things where half the time you end up arguing backwards, in ways that put purpose in service of the rule: maintaining the rule, with justifications pulled in from ever further afield when the rule is questioned and edge cases are reached. If you’re asking, essentially, “is the spirit of the rule still there?”, you’ve got to stop and fill in what that spirit is, or you, or people that want to control you or have an agenda, will sweep in with their own language and fill the void to their own ends.

by ineedasername 1773078538
Surprised they don't mention Google LLC v. Oracle America, Inc. Seems a bit myopic to condone the general legality while arguing "you can only use it how I like it".

It also doesn't talk about the far more interesting philosophical question. Does what Blanchard did cover ALL implementations from Claude? What if anyone did exactly what he did: feed it the test cases and say "re-implement from scratch"? Ostensibly one would expect the results to be largely similar (technically, under the right conditions, deterministically similar).

Could you then fork the project under your own name and a commercial license? When you use an LLM like this, to basically do what anyone else could ask it to do, how do you attach any license to it? Is it first come, first served?

If an agent is acting mostly on its own it feels like if you found a copy of Harry Potter in the fictional library of Babel, you didn't write it, just found it amongst the infinite library, but if you found it first could you block everyone else that stumbles on a near-identical copy elsewhere in the library? or does each found copy represent a "Re-implementation" that could be individually copyrighted?

by ticulatedspline 1773078813
There's a Japanese version of that page, written in classical text writing direction, in columns. Which is cool. Makes me wonder, though - how readable is it with so many English loanwords which should be rotated sideways to fit into columns?
by AndriyKunitsyn 1773081285
Broadly speaking, the “freedom of users” is often protected by competition from competing alternatives. The GNU command line tools were replacements for system utilities. Linux was a replacement for other Unix kernels. People chose to install them instead of proprietary alternatives. Was it due to ideology or lower cost or more features? All of the above. Different users have different motivations.

Copyleft could be seen as an attempt to give Free Software an edge in this competition for users, to counter the increased resources that proprietary systems can often draw on. I think success has been mixed. Sure, Linux won on the server. Open source won for libraries downloaded by language-specific package managers. But there’s a long tail of GPL apps that are not really all that appealing, compared to all the proprietary apps available from app stores.

But if reimplementing software is easy, there’s just going to be a lot more competition from both proprietary and open source software. Software that you can download for free that has better features and is more user-friendly is going to have an advantage.

With coding agents, it’s likely that you’ll be able to modify apps to your own needs more easily, too. Perhaps plugin systems and an AI that can write plugins for you will become the norm?

by skybrian 1773077230
"Antirez closes his careful legal analysis as though it settles the matter. Ronacher acknowledges that “there is an obvious moral question here, but that isn't necessarily what I'm interested in.” Both pieces treat legal permissibility as a proxy for social legitimacy."

This whole article is just complaining that other people didn't have the discussion he wanted.

Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.

If you want to have it, have it. Don't blast others for not having it for you.

by wccrawford 1773075168
You can't put a copyright and MIT license on something you generated with AI. It is derived from the work of many unknown, uncredited authors.

Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?

Countless instances of such licenses were ignored in the training data.

by kazinator 1773077217
IMHO, the API and Test Suite, particularly the latter, define the contract of the functional definition of the software. It almost doesn't matter what that definition looks like so long as it conforms to the contract.

There was an issue where Google did something similar with the JVM, and ultimately it came down to whether or not Oracle owned the copyright to the header files containing the API. It went all the way to the US Supreme Court, and they ruled in Google's favour, finding that the API wasn't the implementation, and that the amount of shared code was so minimal as to be irrelevant.

They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.

by dleslie 1773077931
Buried in here: Mark Pilgrim suddenly reappearing after his sudden disappearance years ago! Has he been up to anything since then?
by mh2266 1773084342
Why are people even having problems with sharing their changes to begin with? Just publishing it somewhere does not seem too expensive. The risk of accidentally including stuff that is not supposed to become public? Or are people regularly completely changing codebases and do not want to make the effort freely available, maybe especially to competitors? I would have assumed that the common case is adding a missing feature here, tweaking something there, if you turn the entire thing on its head, why not have your own alternative solution from scratch?
by danbruc 1773079276
Not a lawyer, but my understanding is: In theory, copyright only protects the creative expression of source code; this is the point of the "clean room" dance, that you're keeping only the functional behavior (not protected by copyright). Patents are, of course, an entirely different can of worms. So using an LLM to strip all of the "creative expression" out of source code but create the same functionality feels like it could be equivalent enough.

I like the article's point of legal vs. legitimate here, though; copyright is actually something of a strange animal to use to protect source code, it was just the most convenient pre-existing framework to shove it in.

by nicole_express 1773076349
> If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides. Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.

This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.

by bjt 1773077798
by 1773078392
> When GNU reimplemented the UNIX userspace, the vector ran from proprietary to free. Stallman was using the limits of copyright law to turn proprietary software into free software. […] The vector in the chardet case runs the other way.

That’s just your subjective opinion, with which many other people would disagree. I bet Armin Ronacher would agree that an MIT licensed library is even freer than an LGPL licensed library. To them, the vector is running from free to freer.

by kccqzy 1773077780
Why does anyone need his new library? They can do what he did and make their own.

I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on others' work.

by t43562 1773076887
I feel like the licenses that suffer the most aren't the GPL, but the ones like SSPL. If your code can be re-implemented easily and legally by AWS using an LLM, why risk publishing it?

It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.

by strongpigeon 1773077700
It's clear that we're entering a new era of copyright _expectations_ (whether we get new _legislation_ is different), but for now realise this: the people like me who like copyleft can do this too. We can take software we like, point an agent at it, and tell it to make a new version with the AGPL3.0-or-later badge on the front.
by grahamlee 1773076812
I don't think this part is correct: "If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms."

That's what something like AGPL does.

by sayrer 1773077201
One of the things that irks me about this whole thing is, if it’s so clean room and distinct, why make the changes to the existing project? Why not make an entirely new library?

The answer to that, I think, is that the authors wanted to squat an existing successful project and gain a platform from it. Hence we have a news cycle discussing it.

Nobody cares about a new library using AI, but squash an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever.

by dwroberts 1773075703
Someone be brave, and do this to ZFS. Poke the Oracle bear!
by Khaine 1773082058
I'm less concerned about AI eroding copyleft and more excited about AI eroding copyright.
by hexyl_C_gut 1773080124
That's a non-sequitur. chardet v7 is GPL-derived work (currently in clear violation of the GPL). If xe wanted it to be a different thing xe should've published as such. Simple as.
by moralestapia 1773085426
Perhaps software patents may play an even bigger role in the future.
by throwaway2027 1773076128
Imagine if the author has his way, and when we have AI write software, it becomes legally under the license of some other sufficiently similar piece of software. Which may or may not be proprietary. "I see you have generated a todo app very similar to Todoist. So they now own it." That does not seem like a good path either for open source software or for opening up the benefits of AI generated software.
by delichon 1773075804
A lot of untagged IANAL takes here today.
by mwkaufma 1773078477
What if someone doesn't declare that it has been reimplemented using an LLM? Isn't it enough to simply declare that you have reimplemented the software without using an LLM? Good luck proving that in court...

One thing is certain, however: copyleft licenses will disappear: If I can't control the redistribution of my code (through a GPL or similar license), I choose to develop it in closed source.

by mfabbri77 1773075644
Maybe, if the model weren't trained on copyleft code, if he didn't use a copyleft test suite, and if he hadn't been the maintainer for years. As it is, the intent here is clearly copyright infringement.

If you have software, your test suite should be your own test suite: you do dev with a test suite and then release under MIT without releasing one. Depending on the test suite, it may break clean room rules, especially for TDD codebases.

by casey2 1773077430
I think what is happening is the collapse of the “greater good”. Open source is dependent upon providing information for the greater good and general benefit of its readers. However, now that no one is reading anything, its purpose is the greater good of the most clever or most convincing or richest harvester.
by righthand 1773077019
shall we now have to think about the tradeoffs in adopting

- proprietary

- free

- slop-licensed

software?

by throawayonthe 1773075634
> Ronacher notes this as an irony and moves on. But the irony cuts deeper than he lets on. Next.js is MIT licensed. Cloudflare's vinext did not violate any license—it did exactly what Ronacher calls a contribution to the culture of openness, applied to a permissively licensed codebase. Vercel's reaction had nothing to do with license infringement; it was purely competitive and territorial. The implicit position is: reimplementing GPL software as MIT is a victory for sharing, but having our own MIT software reimplemented by a competitor is cause for outrage. This is what the claim that permissive licensing is “more share-friendly” than copyleft looks like in practice. The spirit of sharing, it turns out, runs in one direction only: outward from oneself.

This argument makes no sense. Are they arguing that because Vercel, specifically, had this attitude, this is an attitude necessitated by AI, reimplementation, and those who are in favor of it towards more permissive licenses? That certainly doesn't seem to be an accurate way to summarize what antirez or Ronacher believe. In fact, under the legal and ethical frameworks (respectively) that those two put forward, Vercel has no right to claim that position and no way to enforce it, so it seems very strange to me to even assert that this sort of thing would be the practical result of AI reimplementations. This seems to just be pointing towards the hypocrisy of one particular company, and assuming that this would be the inevitable, universal attitude and result when there's no evidence to think so.

It's ironic, because antirez literally addresses this specific argument. They completely miss the fact that a lot of his blog post is not just about legal matters but also about ethical ones. Specifically, the idea he puts forward is that yes, corporations can do these kinds of rewrites now, but they always had the resources and manpower to do so anyway. What's different now is that individuals can do these kinds of rewrites when they never had the ability to do so before, and the vector of such a rewrite can run from permissive to copyleft, or even from decompiled proprietary code to permissive or copyleft. The fact that it hasn't so far is more a result of the fact that most people really hate copyleft and find it annoying, and that it's been losing traction and developer mind share for decades, not that this tactic can't be used that way. I think that's actually one of the big points he's trying to make with his GNU comparison. It's not just that if it was legal for GNU to do it, then it's legal for you to do it with AI. It's not even just the fundamental libertarian ethical axiom (that I agree with for the most part) that it should remain legal to do such a rewrite in either direction, because in terms of the fundamental axioms that we enforce with violence in our society, there should be a level playing field where we look at the action itself and not just whether we like or dislike the consequences. It's specifically the fact that if GNU did it once with the ability to rewrite things, it can be done again, even in the same direction, now even more easily using AI.

by logicprog 1773076043
[dead]
by szundi 1773077048
Perhaps we should finally admit that copyright has always been nonsense, and abolish this ridiculous measure once and for all.
by moi2388 1773075940
I think we're even going one step too far. AI itself is a gray area: how can they guarantee it was trained legally, or that what they're doing is even legal, and how can they assert that the input training data didn't contain any copyrighted data?
by throwaway2027 1773075500