CCBot was already in so many robots.txt files prior to this.
How is CC supposed to know or control how people use the archive contents?
What if CC is relying on fair use?
# To request permission to license our intellectual
# property and/or other materials, please contact this
# site's operator directly
If the operator has no intellectual property rights in the material, then do they need permission from the rights holders to license such materials for use in creating LLMs and to collect licensing fees?
Is it common for website terms and conditions to permit site operators to sublicense other people's ("users") work for use in creating LLMs for a fee?
Read a ToS and you'll notice that you give the site operators an unlimited license to reproduce or distribute your works, on almost any site. It's essentially required in order to host and show the content.
Is Common Crawl exclusively for "AI"?
If site operators collect licensing fees, are those fees shared with the rights holders?