I have no idea how an LLM company can make any argument that their use of content to train models is allowed that doesn't apply equally to distillers using an LLM's output.
"The distilled LLM isn't stealing the content from the 'parent' LLM, it is learning from the content just as a human would, surely that can't be illegal!"...
The argument is that converting static text into an LLM is sufficiently transformative to qualify for fair use, while distilling one LLM's output to create another LLM is not. Whether you buy that or not is up to you, but I think that's the fundamental difference.
The whole notion of 'distillation' at a distance is extremely iffy anyway. You're just training on LLM chat logs, but that's nowhere near enough to even loosely copy or replicate the actual model. You need the weights for that.
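To be concrete about the difference: training on chat logs is plain supervised fine-tuning on sampled text, while distillation proper matches the teacher's full next-token distribution, which you can only compute by running the teacher yourself. A rough PyTorch sketch of the two (the student/teacher callables and step functions here are hypothetical stand-ins, not anyone's real training code):

    import torch
    import torch.nn.functional as F

    def sft_step(student, token_ids, opt):
        # "Distillation at a distance": all you have are tokens copied from
        # another model's chat logs, so this is ordinary next-token training.
        logits = student(token_ids[:, :-1])          # (batch, seq-1, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               token_ids[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    def distill_step(student, teacher, token_ids, opt, temp=2.0):
        # Distillation proper: match the teacher's full output distribution,
        # which requires running the teacher forward, i.e. having its weights.
        with torch.no_grad():
            t_logits = teacher(token_ids[:, :-1])
        s_logits = student(token_ids[:, :-1])
        loss = F.kl_div(F.log_softmax(s_logits / temp, dim=-1),
                        F.softmax(t_logits / temp, dim=-1),
                        reduction="batchmean") * temp * temp
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

The second path is strictly richer (a full probability vector per position instead of one sampled token), which is why logs alone can imitate style but can't replicate the model.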
> The U.S. Court of Appeals for the D.C. Circuit has affirmed a district court ruling that human authorship is a bedrock requirement to register a copyright, and that an artificial intelligence system cannot be deemed the author of a work for copyright purposes
> The court’s decision in Thaler v. Perlmutter, on March 18, 2025, supports the position adopted by the United States Copyright Office and is the latest chapter in the long-running saga of an attempt by a computer scientist to challenge that fundamental principle.
I, like many others, believe the only way AI won't immediately get enshittified is by fighting tooth and nail for LLM output to never be copyrightable.
Thaler v. Perlmutter is a weird case because Thaler explicitly disclaimed human authorship and tried to register a machine as the author.
Whereas someone trying to copyright LLM output would likely insist that there is human authorship via the choice of prompts and careful selection of the best LLM output. I am not sure if claims like that have been tested.
The US Copyright Office has published a statement saying they see AI output as analogous to a human contracting the work out to a machine: the machine would hold the copyright, but it can't, so consequently there is none. Which is imho slightly surprising, since your argument about choice of prompt and output seems analogous to the argument that led to photographs being subject to copyright despite being made by a machine.
On the other hand, in a way the opinion of the US Copyright Office doesn't matter; what matters is what the courts decide.
It's a fine line that's been drawn, but this ruling says that AI can't own a copyright itself, not that AI output is inherently ineligible for copyright protection or automatically public domain. A human can still own the output from an LLM.
> I, like many others, believe the only way AI won't immediately get enshittified is by fighting tooth and nail for LLM output to never be copyrightable
If the person who prompted the AI tool to generate something isn't considered the author (and therefore doesn't deserve copyright), then does that mean they aren't liable for the output of the AI either?
I.e. if the AI does something illegal, does the prompter get off scot-free?
When you buy, or pirate, a book, you didn't enter into a business relationship with the author specifically forbidding you from using the text to train models. When you get tokens from one of these providers, you sort of did.
I think it's a pretty weak distinction: by separating the concerns (one company collects a corpus and then "illegally" sells it for training), you can pretty much exactly reproduce the acquire-books-and-train-on-them scenario. But in the simplest case, the EULA does actually make it slightly different.
Like, if a publisher pays an author to write a book, with the contract specifically saying they're not allowed to train on that text, and then they train on it anyway, that's clearly worse than someone just buying a book and training on it, right?
> When you buy, or pirate, a book, you didn't enter into a business relationship with the author specifically forbidding you from using the text to train models.
Nice phrasing, using "pirate".
Violating the TOS of an LLM is the equivalent of pirating a book.
Contracts can't exclude things that weren't invented when the contracts were written.
Ultimately it's up to legislation to formalize rules, ideally based on principles of fairness. Is it fair, in a non-legalistic sense, for all old books to be trainable-on but not LLM outputs?
Not really your point, but I think the skills to create these things take much longer to develop than it takes to produce chips and data centres.
So they couldn't really build any of these projects weekly since the cost of construction materials / design engineers / construction workers would inflate rapidly.
Worth keeping in mind when people say "we could have built 52 hospitals instead!" or similar. Yes, but not really... since the other constraints would quickly reveal themselves.
But by some definition my "Ctrl", "C", and "V" keys can build a C compiler...
Obviously I'm being facetious, but my point is: I find it impossible to judge how impressed I should be by these model achievements, since they don't show how the models perform on a range of out-of-distribution tasks.
Even that is underselling it; jobs are a necessary evil that should be minimised. If we can have more stuff with fewer people needing to spend their lives providing it, why would we NOT want that?
This is already hyperbolic; in most countries where software engineers or similar knowledge workers are widely employed, there are welfare programmes.
To add to that, if there is such mass unemployment in this scenario, it will be because fewer people are needed to produce things, and therefore everything will become cheaper... This is the best kind of unemployment.
So at best: none of us have to work again and will get everything we need for free. At worst, certain professions will need a career switch which I appreciate is not ideal for those people but is a significantly weaker argument for why we should hold back new technology.
If you were to rank all of the C compilers in the world and then rank all of the welfare systems in the world, this vibe-coded mess would be at approximately the same rank as the American welfare system. Especially if you extrapolate this narcissistic, hateful kleptocracy out a few more years.
Yeah, but who can be hurt by this? These are both private companies. So whose interests is he "conflicting" with? I'm sure the shareholders will raise it with him and/or bring a lawsuit if they aren't happy (they probably are happy).
What $$$?! The top tier of Apple One is £36.95 per month. If I spend even an extra 15 minutes a month self-hosting, then it’s immediately not worth it. (Not to mention self-hosting won’t be free.)
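Back-of-envelope on that claim: £36.95 ÷ 0.25 hours ≈ £148/hour is the break-even rate, so if you value your time above that, 15 minutes a month of upkeep alone already exceeds the subscription, before counting hardware and electricity.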
Also, for that price I get: 2TB cloud storage, Apple TV, Apple Music, news, workouts, and arcade, most of which cannot be self-hosted.
Economies of scale are real, it’s possible Apple makes a ton of money and the user is getting a good deal!
> The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
If I cloned Pixar’s rendering library, called into it, and then added ‘built a renderer from scratch’ to my CV, this would be entirely dishonest…
I use LLMs often and don’t hate Cursor or think they’re a bad company. But it’s obvious they are being squeezed and have little USP (even less so than other AI players). Frankly, they are under extreme pressure to make up lies.
I don’t think I’d resist the pressure either, so not on a high horse here, but it doesn’t make it any less dishonest.
Interestingly, the UK PM (and allies) just blocked a would-be political rival, Andy Burnham, from standing as an MP.
One of the given reasons is that Burnham is currently mayor of Greater Manchester, and running a new election there would cost approx £4m(!!), which would be a huge waste of taxpayer money.
I was surprised that they even gave this as a faux reason, since it seems like the sort of money they would spend on replenishing the water coolers, or buying Bic pens, or... building a static website!
Tangentially, Burnham has a long history with these sorts of public-sector private vampires, having been up to his neck in PFI (of "£200 to change a lightbulb" fame) in his stint leading the NHS.
The fact that a huge amount of money is extracted from the UK government for no (or very little) value is a crying shame.
I know multiple people who work as consultants (hired via private agencies, paid for by Government) who have literally done nothing for six months plus.
They have no incentive to whistleblow, the agency employing them has no incentive to get rid of them as it takes a cut, and the government department hiring them is none the wiser because it has no technical knowledge or understanding of what's being carried out.
Being cynical, I would say it's because Burnham could potentially challenge Starmer. Less cynically, Labour has a big enough majority that they can afford to lose this by-election. The headache of replacing the mayor of Manchester is not worth it.
Why can't he just do both jobs? Boris did it, IIRC.
If memory serves, Dan Jarvis also did it, being both MP and mayor of the South Yorkshire city region or whatever it was called at the time.
It is fairly innately political. No Prime Minister has ever polled as low as Starmer and come back from it, or so is being said in the press. Burnham might be a smart electoral move, but he's not a plaything of the Labour right, so they kept him out.
That's not inconsistency in the rules, that's inconsistency in what being the mayor means. In Sheffield it means you show up wearing funny clothes every so often, in Greater Manchester it means you have a full-time job, a large budget, and actual responsibilities.
For our American brethren, it's like the difference between being the Mayor of NYC vs the Macy’s Thanksgiving Day Parade King.
It's actually the role of Police and Crime Commissioner that prevents them from being an MP simultaneously. In Greater Manchester (and London) the PCC role is combined with that of Mayor, but it isn't in most other city regions.
There's not much actual difference in the mayoral aspect of the roles - Jarvis was the Mayor of the South Yorkshire Combined Authority, not simply the mayor of Sheffield City Council.
Funny, I’m the same. I also like taking walks to think, but I’ve found that I must have my head pointing almost directly down (i.e. looking at my feet). It’s also how I stand thinking in the shower, with the warm water hitting my angled neck. Maybe there’s something beneficial about that neck position, or maybe it’s just habit!
I will also have conversations in my head during my walk; I’ve done this my whole life and I’m not sure to this day whether my lips move during them or not. In any case, I must get some funny looks with my head bolted to the ground, mumbling to myself…
As for the software: I would not want a camera on 24/7 (on any device; a compromise being my doorbell, which isn't cloud-connected). It would defeat the point of the small LED which informs you the camera is on (since it would always be on), and if the machine is compromised, it's a way for personal data to be exfiltrated.
Actually, I'd prefer a hardware kill switch on things like the camera and microphone.
Alas, I'm not alone in meditating and thinking while taking a shower.
It's one of the moments of my day when I recollect what happened, what I need to do, and what not to do.
The problem is that I can get quite lost during this phase, and hot water isn't cheap, so my SO is always threatening to put a big timer in the bathroom.
My pet hypothesis about why the shower is so often praised as a mindful place is that it has not so much to do with water and more to do with the fact that, for many people, life alternates between 1) constant social interaction and interruptions from other people and 2) bathroom time.
How many people these days have a dedicated home office, off limits to anyone else? How many partners sleep in different rooms?
Sure, perhaps the sensory experience plays some role, but if your bathroom is reliably the most interruption-free place for you, naturally you’d form a habit of catching up during the shower on all the “slow thinking” that is most negatively impacted by interruptions.
I’ve seen people with interruption-free solo hobbies (be that hiking in the woods, motorcycling, rock climbing, etc.) describe similarly mindful experiences, but unlike those, the shower is the lowest common denominator, and perhaps the one that happens most routinely.
In my case, though walks help declutter my mind somewhat, for deeper thoughts I have to write things down, sitting or lying in bed in the worst of positions. Thinking too deeply while walking only leaves me anxious in the end, as I tend to get sidetracked a lot in the conversation and always have to restart it over and over again.
I tried doing the same. Sometimes it made my understanding of things much clearer. However, most times I found it worked best when I had a clear idea on paper, either to validate the idea or when I needed an opinion. Otherwise, ChatGPT (in my case) built upon an idea that I hadn't thought through well and confused the shit out of me.
But that’s not the Turing test. In Turing’s setup, the human who can be fooled is explicitly called the “interrogator”.
To pass the Turing test the AI would have to be indistinguishable from a human to the person interrogating it in a back and forth conversation. Simply being fooled by some generated content does not count (if it did, this was passed decades ago).
I think state-of-the-art LLMs would pass the Turing test for 95% of people, if those people could (text) chat with them in a time before LLM chatbots became widespread.
That is, the main thing that makes it possible to tell LLM bots apart from humans is that lots of us have over the past 3 years become highly attuned to specific foibles and text patterns which signal LLM generated text - much like how I can tell my close friends' writing apart by their use of vocabulary, punctuation, typical conversation topics, and evidence (or lack) of knowledge in certain domains.
"The distilled LLM isn't stealing the content from the 'parent' LLM, it is learning from the content just as a human would, surely that can't be illegal!"...
reply