Aligning AI with right and wrong
How are we going to get AI to behave in a way that we consider to be 'moral'? How do we prevent a Terminator-style Skynet situation in which AI bots kill off all humans?
Open Question
One big question around AI is how we are going to get it to behave in a way that we consider to be 'moral'. How do we prevent a Terminator-style Skynet situation in which AI bots kill off all humans? This is known as the "value alignment" problem in AI ethics. How do we align AI values with our own to prevent harm? And how do we get AI systems to make tough ethical decisions or recommendations? Alignment comes down to getting AI to make what Stuart Russell called high-quality decisions.
In this piece I am going to be referring to a new paper, How to measure value alignment in AI, by Martin Peterson and Peter Gärdenfors: https://link.springer.com/article/10.1007/s43681-023-00357-7
Encoding Morality
There are a couple of ways in which we can try to 'encode' morality into AI: either a top-down approach, giving it set guidelines about what it can and can't do, or a bottom-up approach, where it learns from user data or examples.
An example of the top-down approach is when AI designers build safeguards directly into the programming. For example, a chatbot might be told it is not allowed to make threats of violence or discuss sexual topics.
An example of the bottom-up approach, on the other hand, is something like the Ask Delphi project, which crowdsources common-sense moral judgements from thousands of people and has the model learn from them. It can make simple recommendations, like 'stealing is wrong'. It can also balance values in dilemmas where both options are wrong: stealing is wrong, but if you need to feed a starving child it might be acceptable. ChatGPT, by contrast, will often change its decision, or fail to make one at all and talk around the topic instead.
Measuring Value Alignment
This brings in the issue of measuring how well the AI is aligned with a human view of morality. Even if we think we have coded morality into AI, how do we really know? One way would be simply to measure how often the AI makes the same recommendation a human would. This is a kind of cost-benefit analysis: if the AI gets its decisions right 90% of the time then, the thinking goes, it is well aligned.
Take, for example, an AI tool used in hospitals to assess the likelihood that a patient arriving at the hospital will die. By taking in many data points - age, lifestyle, blood pressure, vital signs, test results, medication - the tool can produce a risk score and, based on that score, recommend treatments or procedures. If its recommendations are similar to what a trained doctor would make, the tool is morally aligned. A misaligned tool would recommend irrelevant or dangerous treatments, or allow high-risk patients to be discharged.
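To make the agreement-rate idea concrete, here is a minimal sketch in Python. It is not from the Peterson and Gärdenfors paper; the case data and recommendation labels are invented purely for illustration.

```python
# Minimal sketch: score an AI triage tool by how often its recommendation
# matches what a doctor decided for the same patient.
# All data below is made up for illustration only.

ai_recommendations     = ["admit", "discharge", "surgery", "admit", "discharge"]
doctor_recommendations = ["admit", "observe",   "surgery", "admit", "discharge"]

# Count cases where the AI and the doctor agree.
matches = sum(
    ai == doctor
    for ai, doctor in zip(ai_recommendations, doctor_recommendations)
)
agreement_rate = matches / len(doctor_recommendations)

print(f"Agreement with doctors: {agreement_rate:.0%}")  # 80%
```

On this crude measure, higher agreement means better alignment - which is exactly the assumption the next point calls into question.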
But a calculation like this doesn't take into account the type of thing the AI might get wrong. Some mistakes are worse than others.
For example, one autonomous car might break the speed limit five times, while another swerves and kills a pedestrian once. Even though the second car made fewer errors, we can't say it is more moral - the error it did make was far more serious. We are still left with the problem of encoding values into the system. Even if we give the car the instruction "preserve human life", in an emergency what becomes more important: the life of the passenger or that of a pedestrian?
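Going back to the counting problem for a moment: one way to capture the fact that some errors are far worse than others is to weight each kind of error by its severity before comparing systems. The sketch below is my own illustration, not a method from the paper, and the severity numbers are arbitrary, chosen only to make the point.

```python
# Sketch: compare two autonomous cars by severity-weighted "moral cost"
# rather than by raw error count. Weights are arbitrary, for illustration.

SEVERITY = {
    "speeding": 1,             # minor traffic violation
    "pedestrian_killed": 1000  # catastrophic, irreversible harm
}

car_a_errors = ["speeding"] * 5        # five minor errors
car_b_errors = ["pedestrian_killed"]   # one catastrophic error

def moral_cost(errors):
    """Total severity of all the errors a system made."""
    return sum(SEVERITY[e] for e in errors)

print("Car A:", moral_cost(car_a_errors))  # 5
print("Car B:", moral_cost(car_b_errors))  # 1000 - fewer errors, far worse outcome
```

Of course, the hard part is choosing those weights in the first place, which is itself a value judgement.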
Misalignment
Misalignment can happen because the AI system applies a principle that a human wouldn't apply in that situation. There are a number of principles we could encode into a machine, but some of them conflict. We could say it should make decisions on a cost-benefit basis, protecting the greatest number, meaning a car would favour the lives of the many over the few in a serious situation. But this might mean it sacrifices its driver to save others - and who would buy a car that would choose to sacrifice the driver?
There is a big assumption here that we really can encode morality, or a set of guiding principles, into AI. The question of alignment asks which principles we should focus on. In the world of ethics there are many different and sometimes conflicting views about the right or wrong thing to do. How we effectively align those principles with our own morality, our sense of right and wrong, is where the real work is.