> It is not that much about censorship, even that would be somewhat fine if OpenAI would do it dataset level so chatgpt would not have any knowledge about bomb-making
While I would agree that "don't tell it how to make bombs" seems like a nice idea at first glance, and indeed I think I've had that attitude myself in previous HN comments, I currently suspect that it may be insufficient and that a censorship layer may be necessary (partly in addition, partly as an alternative).
I was taught, in secondary school, two ways to make a toxic chemical using only things found in a normal kitchen. In both cases, I learned this in the form of a warning about what not to do, because of the danger it posed.
There are a lot of ways to be dangerous, and I'm not sure how to get an AI to avoid dangers without it knowing about them. That said, we've got a sense of disgust that tells us to keep away from rotting flesh without explicit knowledge of germ theory, so it may be possible, although research would be necessary, and as a proxy rather than the real thing it would suffer from increased rates of both false positives and false negatives. Nevertheless, I certainly hope it is possible, because anyone with the model weights can extract any directly modelled dangers, which may be a risk all by itself if you want to avoid terrorists using one to make an NBC weapon.
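To make the distinction I'm drawing concrete, here's a toy Python sketch (purely illustrative; the `BLOCKED_TOPICS` set and both function names are made up, not any real OpenAI/vendor API) of filtering at the dataset level versus bolting a censorship layer onto the output:

    # Toy illustration only -- hypothetical names, not a real API.
    BLOCKED_TOPICS = {"bomb-making", "nerve agent"}

    def filter_training_corpus(documents: list[str]) -> list[str]:
        """Dataset-level approach: drop documents touching blocked topics
        before training, so the model never learns the danger at all."""
        return [doc for doc in documents
                if not any(topic in doc.lower() for topic in BLOCKED_TOPICS)]

    def censorship_layer(generated_text: str) -> str:
        """Output-level approach: the model may still 'know' the danger,
        but a wrapper refuses to pass it on. Being a proxy check, this is
        where the false positives and false negatives live."""
        if any(topic in generated_text.lower() for topic in BLOCKED_TOPICS):
            return "I can't help with that."
        return generated_text

The first approach removes the knowledge; the second merely gates access to it, and the gate is exactly what disappears for anyone who already has the weights.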
> I don't care about racial bias or what to call pope when I want chatgpt to write Python code.
I recognise my mirror image. It may be a bit of a cliché for a white dude to say they're "race blind", but I have literally been surprised to learn coworkers have faced racial discrimination for being "black" when their skin looks like mine in the summer.
I don't know any examples of racial biases in programming[1], but I can see why it matters. None of the code I've asked an LLM to generate has involved `Person` objects in any sense, so while I've never had an LLM inform me about racial issues in my code, this is neither positive nor negative anecdata.
The etymological origin of the word "woke" is from the USA about 90-164 years ago (the earliest examples preceding and being intertwined with the Civil War), meaning "to be alert to racial prejudice and discrimination". In the later years of that era, that discrimination included (amongst other things) redlining[0], the original form of which was withholding services from neighbourhoods with significant numbers of ethnic minorities: constructing a status quo where the people in charge can say "oh, we're not engaging in illegal discrimination on the basis of race, we're discriminating against the entirely unprotected class of 'being poor' or 'living in a high crime area' or 'being uneducated'".
The reason I bring that up is that all kinds of things like this can seep into our mental models of how the world works, from one generation to the next, and lead people who would never knowingly discriminate to perpetuate the same things.
Again, I don't actually know any examples of racial biases in programming, but I do know it's a thing with gender: it's easy (even "common sense") to mark gender as a boolean, but even ignoring trans issues, if that's a non-optional field, what's the default gender? And what's it being used for? If it is only used for title (Mr./Mrs.), what about other titles? "Doctor" is un-gendered in English, but in Spanish it's "doctor"/"doctora". What matters is what you're using the information for, rather than just what you're storing in an absolute sense; in a medical context, for example, you wouldn't need to offer cervical cancer screening to trans women (unless the medical tech is more advanced than I realised).
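As a minimal sketch of what I mean (the class and field names here are mine, purely for illustration, not from any particular codebase):

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class Title(Enum):
        MR = "Mr."
        MRS = "Mrs."
        MS = "Ms."
        DR = "Dr."  # un-gendered in English; "doctor"/"doctora" in Spanish

    @dataclass
    class Person:
        name: str
        # Store what each purpose actually needs, and allow "unknown"
        # instead of silently defaulting a boolean:
        title: Optional[Title] = None        # for salutations
        gender_marker: Optional[str] = None  # e.g. as recorded on a document
        cervical_screening_due: Optional[bool] = None  # clinical fact, not derived from gender

Whether any of those fields belong in the model at all depends on what the application is for, which is the point.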
[0] https://en.wikipedia.org/wiki/Redlining
[1] unless you count AI needing a diverse range of examples, which you may or may not count as "programming"; other than that, the closest would be things like "master branch" or "black-box testing" which don't really mean the things being objected to, but were easy to rename anyway