
What if it copies only a few lines, but not an entire function? Or the function name is different, but the code inside is the same?



If we could answer those questions definitively, we could also put lawyers out of a job. There’s always going to be a legal gray area around situations like this.


Matching on the abstract syntax tree might be sufficient, but it could be complex to implement.
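
For what it's worth, a rough sketch of the AST idea in Python using the standard `ast` module. It assumes the code being compared is Python and that "same code" means "same tree after identifiers are renamed to placeholders". The helper names (`ast_fingerprint`, `same_structure`) are made up for illustration.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Rename identifiers to placeholders so naming differences vanish."""

    def __init__(self):
        self.names = {}

    def _alias(self, name):
        # Placeholders are assigned in order of first appearance: v0, v1, ...
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "f"  # the function's own name is ignored
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node


def ast_fingerprint(source):
    """Return a string form of the normalized tree (no line/column info)."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)


def same_structure(a, b):
    return ast_fingerprint(a) == ast_fingerprint(b)


original = "def add(x, y):\n    return x + y\n"
renamed = "def plus(left, right):\n\n    return left + right   # sum\n"
different = "def sub(x, y):\n    return x - y\n"

print(same_structure(original, renamed))
print(same_structure(original, different))
```

This only compares whole snippets, so it covers the "function name is different, but the code inside is the same" case and not the "only a few lines" case. For that you would presumably have to compare subtrees, which is where the complexity comes in.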


You can probably tokenize the names so they become irrelevant, and ignore non-functional whitespace, so that only the normalized code C remains. Then maybe hash all the training data D so you can check whether hash(C) is in hash(D). Some sort of Bloom filter...
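
A minimal sketch of that pipeline, assuming Python source and the standard `tokenize` and `hashlib` modules. Hashing overlapping runs of 8 tokens (rather than one hash per file) is my own assumption, added so that partial copies still register. The Bloom filter is a toy, and all names and parameters here are hypothetical.

```python
import hashlib
import io
import keyword
import tokenize

SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
MARKERS = {
    tokenize.NEWLINE: "<NEWLINE>",
    tokenize.INDENT: "<INDENT>",   # indentation is functional in Python,
    tokenize.DEDENT: "<DEDENT>",   # so it is kept, but without its width
}


def normalized_tokens(source):
    """Drop comments and blank lines; replace every identifier with ID."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type in MARKERS:
            out.append(MARKERS[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        else:
            out.append(tok.string)
    return out


def shingles(tokens, k=8):
    """Overlapping runs of k tokens, so a few copied lines still match."""
    if len(tokens) < k:
        return [" ".join(tokens)]
    return [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def index_training_data(sources, k=8):
    bloom = BloomFilter()
    for src in sources:
        for gram in shingles(normalized_tokens(src), k):
            bloom.add(gram)
    return bloom


def overlap(candidate, bloom, k=8):
    """Fraction of the candidate's shingles that are (probably) in D."""
    grams = shingles(normalized_tokens(candidate), k)
    return sum(g in bloom for g in grams) / len(grams)


D = ["def add(x, y):\n    return x + y\n"]
bloom = index_training_data(D)
renamed = "def plus(left, right):\n\n    return left + right   # sum\n"
changed = "def sub(x, y):\n    return x - y\n"
print(overlap(renamed, bloom), overlap(changed, bloom))
```

A Bloom filter can give false positives but never false negatives, so a hit means "probably in D" and would still need a real lookup to confirm. Where to set the overlap threshold is exactly the gray area mentioned upthread.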



