We don’t know how to train them to be “truthful” or make that part of their goal(s). Almost every AI we train is trained by example, so we often don’t even know what the goal is, because it’s implied in the training. In a way, AI “goals” are pretty fuzzy because of the complexity. It’s a tiny bit like real nervous systems, where you can’t just state in language what the “goals” of a person or animal are.
The article literally shows how the goals are being set in this case. They’re prompts. The prompts are telling the AI what to do. I quoted one of them.
I assume they’re talking about the design and training, not the prompt.
If you read the article (or my comment that quoted the article) you’ll see your assumption is wrong.
Not the article, the commenter before you points at a deeper issue.
It doesn’t matter if your prompt tells it not to lie if it isn’t actually capable of following that instruction.
It is following the instructions it was given. That’s the point. It’s being told “promote this drug”, and so it’s promoting it, exactly as it was instructed to. It followed the instructions that it was given.
Why do you think that the correct behaviour for the AI must be for it to be “truthful”? If it were being truthful, then that would be an example of it failing to follow its instructions in this case.