And they are not selling this or distributing this. The model is very different.

cmiles74 · 2025-07-07T19:52:42 1751917962

I have to disagree: without all the copyrighted input data, there would be no output data for these companies to sell. This output data is the product, and they are distributing it for dollars.

KoolKat23 · 2025-07-07T19:59:08 1751918348

Copyright is concerned with the actual physical copy. The model isn't this. The end user would have to carefully prompt the model's algorithm to output a copyright-infringing piece.

This argument is more along the lines of blaming Microsoft Word for someone typing characters into the word processor's algorithm and outputting a copy of an existing book. (Yes, it is a lot easier, but the rationale is the same.) In my mind, the end user prompting the model would be the one potentially infringing.

cmiles74 · 2025-07-07T20:15:04 1751919304

FWIW, I don’t think there is a prompt that would reliably produce, verbatim, a copyrighted work.

I do think that a big part of the reason Anthropic downloaded millions of books from pirate torrents was that they needed that input data in order to generate the output, their product.

I don’t know what that is, but, IMHO, not sharing those dollars with the creators of the content is clearly wrong.