Hacker News

Did they have a license to use public source code as a data source for the data set, though?


God, I wish contracts were encoded semantically rather than as plain text. I just tried to look through GitHub's terms of service[1]. I'd search for "GitHub can <verb> with <adjective> code" if I could. Instead I'm giving up.

[1] https://docs.github.com/en/github/site-policy/github-terms-o...


A world in which all laws and contracts were required to be written in Lojban would be interesting.


That looks hard. More politically feasible might be a language I've unfortunately forgotten the name of, ordinary English but with some extra rules designed to eliminate ambiguity -- every noun has to carry a specifier like "some" or "all" or "the" or "a", etc.


Legalese might be similar to code, and there is lots of interest in making law machine readable. So don't give up; check back later.


Yes, it's public source code.


Public doesn’t mean it’s not encumbered by copyright.


Pretty much everything is trained on copyrighted content: machine translation software, TWDNE, DALL-E, and all the GPTs. Software people are bringing this up now because it's their ox being gored. It's the same as when furries got upset about This Fursona Does Not Exist.[1][2]

1. https://news.ycombinator.com/item?id=23093911

2. https://www.reddit.com/r/HobbyDrama/comments/gfam2y/furries_...


To expand on your argument, pretty much every person is trained on copyrighted content too. Doesn't make their generated content automatically subject to copyright either.


Yeah, except that Oracle and Google have way more lawyer power than furry artists.


You have no idea how much money they make. Some of them have payment plans for commissions.


This is an argument for why this is a bigger problem, not a smaller one.


If it's BSD-licensed, the encumbrance doesn't matter much.



