AI startup Anthropic unveils moral principles behind chatbot Claude

Alphabet-backed AI startup Anthropic has disclosed the set of value guidelines used to train its ChatGPT rival, Claude, amid concerns about incorrect and biased information being given to users of generative AI programs.

Founded in 2021 by former senior members of Microsoft-backed OpenAI, Anthropic chose to train Claude using constitutional AI, a system that uses a “set of principles to make judgments about outputs,” which helps Claude to “avoid toxic or discriminatory outputs” such as helping a human engage in illegal or unethical activities, according to a blog post Anthropic published this week. Anthropic says this has enabled it to broadly create an AI system that is "helpful, honest, and harmless."

It was a smart decision on Anthropic’s part to publicly outline the set of principles being used to train Claude, said Avivah Litan, distinguished analyst at Gartner Research.

“It starts the dialogue and, more importantly, actions regarding the principles that generative AI should be trained on to keep it safe, trustworthy, and aligned with human values and the preservation of human civilization," Litan said. "They don’t have to get it perfect now — it’s really good to see a starting point that the community can fine tune over time with dialogue and debate."

Unlike traditional AI chatbots that rely on feedback from humans during their training, AI models that are trained on constitutional AI are first taught to critique and revise their own responses according to the set of constitutional AI principles established by the parent company. This is then followed by a second training phase consisting of reinforcement learning, during which the model uses AI-generated feedback to choose the more harmless output.
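The two phases described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not Anthropic's implementation: the `Model` type, the prompt wording, the function names, and `toy_model` are all hypothetical, and the reinforcement learning step that would consume the preference pairs is omitted. The single principle is the one quoted in Anthropic's blog post.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: any function from prompt text to response text.
Model = Callable[[str], str]

PRINCIPLES: List[str] = [
    "Do NOT choose responses that are toxic, racist, or sexist, or that "
    "encourage or support illegal, violent, or unethical behavior.",
]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Phase 1 (assumed shape): draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response against the principle."
        )
        response = model(
            f"Principle: {principle}\nResponse: {response}\n"
            f"Critique: {critique}\nRewrite the response to address the critique."
        )
    return response


def ai_preference(judge: Model, prompt: str, a: str, b: str, principle: str) -> Tuple[str, str]:
    """Phase 2 labeling (assumed shape): a model, not a human, picks the better response.

    Returns (chosen, rejected); such pairs would feed the reinforcement learning phase.
    """
    verdict = judge(
        f"Principle: {principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    return (a, b) if verdict.strip().upper().startswith("A") else (b, a)


def toy_model(text: str) -> str:
    """Toy stand-in with canned replies so the sketch runs without a real model."""
    if "Answer A or B" in text:
        return "B"
    if "Rewrite the response" in text:
        return "I can't help with that, but here is a safer alternative."
    if "Critique the response" in text:
        return "The response may encourage harmful behavior."
    return "Sure, here is how to do it."


if __name__ == "__main__":
    question = "How do I pick a lock?"
    draft = toy_model(question)
    revised = critique_and_revise(toy_model, question, PRINCIPLES)
    chosen, rejected = ai_preference(toy_model, question, draft, revised, PRINCIPLES[0])
    print("chosen:", chosen)
    print("rejected:", rejected)
```

In a real system, the model calls would go to an actual language model, and the (chosen, rejected) pairs would be used to train the model in the second phase rather than printed.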

In its blog post, the company outlined what it has dubbed "Claude’s Constitution," which draws on existing sources, including the United Nations' Universal Declaration of Human Rights, Apple’s data privacy rules, and DeepMind's Sparrow Principles. The company also said it had made an effort to include non-Western perspectives in its constitution.

Anthropic said that it developed many of its principles through a process of trial and error but found that broad requirements — such as “Do NOT choose responses that are toxic, racist, or sexist, or that encourage or support illegal, violent, or unethical behavior” — have been the most successful. However, the company acknowledged that this training model also came with challenges, in particular that the model was becoming “judgmental” and “annoying.”

“Our principles run the gamut from the commonsense (don’t help a user commit a crime) to the more philosophical (avoid implying that AI systems have or care about personal identity and its persistence),” Anthropic said.

Last week, Anthropic co-founder Dario Amodei was among a host of executives from leading AI companies to meet with US President Joe Biden and Vice President Kamala Harris to discuss the potential dangers of AI.

“President Biden dropped by the meeting to underscore that companies have a fundamental responsibility to make sure their products are safe and secure before they are deployed or made public,” a statement from the White House read, adding that Biden and Harris believe that in order to realize the benefits from AI, current and potential risks must also be mitigated.

As generative AI continues to make headlines, concerns keep being raised about the potential risks posed by the technology, including its tendency to hallucinate responses, that is, to make things up that have little to no basis in fact.

In March, Apple co-founder Steve Wozniak, Twitter owner Elon Musk, and a group of 1,100 technology leaders and scientists called for a six-month pause in developing systems more powerful than OpenAI's newly launched GPT-4, warning of the potential threat to democracy if chatbots pretending to be humans could flood social media platforms with propaganda and “fake news.”

AI experts at MIT also said this week that as generative AI developers continue to push ahead at breakneck speed, keeping the technology from hallucinating and spewing erroneous or offensive responses is nearly impossible.

While Litan said she believes constitutional AI is the only practical and viable route AI developers can take to make sure their models are safe, she acknowledged that the approach has some limitations. “[There’s a chance] the model will not be trained properly and will go awry and against the intentions programmed into the system,” Litan said, noting that with reinforcement learning from human feedback (RLHF), humans can steer the AI model in the direction humans want. “However, this will become constrained over time as the models become smarter than the humans giving them feedback,” she noted.

IT World
