You don't actually want to filter out "bad" training data. That this stuff exists is an important fact about the world. It's mostly fine-tuning that makes sure the model produces output that aligns with whatever values you want it to have. The models do assign a moral dimension to all of their concepts, so if you fine-tune a model so that its completions match your desired value system, it'll generally do what you expect, even if somewhere deep in the data set there is training data diametrically opposed to that value system.