If we could answer those questions definitively, we could also put lawyers out of a job. There’s always going to be a legal gray area around situations like this.
You can probably tokenize the names so they become irrelevant. You can ignore non-functional whitespace, so that code C remains. Maybe one can hash all the training data D such that hash(C) is in hash(D). Some sort of Bloom filter...