OpenAI to use GPT-4 LLM for content moderation, warns against bias

ChatGPT creator OpenAI is developing a way to use its GPT-4 large language model (LLM) to automate content moderation across digital platforms, especially social media.

OpenAI is exploring the use of GPT-4’s ability to interpret rules and nuances in long content policy documentation, along with its capability to adapt instantly to policy updates, the company said in a blog post.

"We believe this offers a more positive vision of the future of digital platforms, where AI can help moderate online traffic according to platform-specific policy and relieve the mental burden of a large number of human moderators," the company said, adding that anyone with access to OpenAI’s API can implement their own moderation system.

In contrast to the present practice of content moderation, which is completely manual and time-consuming, OpenAI’s GPT-4 large language model can be used to create custom content policies in hours, the company said.

To do so, data scientists and engineers can use a policy guideline crafted by policy experts, along with data sets containing real-life examples of policy violations, to label the data.

"Then, GPT-4 reads the policy and assigns labels to the same dataset, without seeing the answers. By examining the discrepancies between GPT-4's judgments and those of a human, the policy experts can ask GPT-4 to come up with reasoning behind its labels, analyze the ambiguity in policy definitions, resolve confusion and provide further clarification in the policy accordingly," the company said.

These steps may be repeated by data scientists and engineers until the large language model generates satisfactory results, it added, explaining that the iterative process yields refined content policies that are translated into classifiers, enabling the deployment of the policy and content moderation at scale.
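
The label-compare-refine loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not OpenAI's implementation: the names `Example`, `find_discrepancies`, and `agreement_rate` are hypothetical, and `keyword_labeler` is a toy stand-in for a real GPT-4 call, which would receive the policy text and the content and return a label.

```python
from typing import Callable, List, NamedTuple


class Example(NamedTuple):
    text: str
    human_label: str  # label assigned by a policy expert


class Discrepancy(NamedTuple):
    text: str
    human_label: str
    model_label: str


def find_discrepancies(
    policy: str,
    examples: List[Example],
    label_fn: Callable[[str, str], str],
) -> List[Discrepancy]:
    """Label every example with the model and collect disagreements."""
    found = []
    for ex in examples:
        # The model sees only the policy and the text, never the human label.
        predicted = label_fn(policy, ex.text)
        if predicted != ex.human_label:
            found.append(Discrepancy(ex.text, ex.human_label, predicted))
    return found


def agreement_rate(examples: List[Example], found: List[Discrepancy]) -> float:
    """Fraction of examples where the model and the human agree."""
    if not examples:
        return 1.0
    return 1 - len(found) / len(examples)


def keyword_labeler(policy: str, text: str) -> str:
    """Toy stand-in for an LLM call (assumption): the 'policy' is a
    comma-separated list of banned terms, and any match is a violation."""
    banned = [term.strip().lower() for term in policy.split(",") if term.strip()]
    lowered = text.lower()
    return "violation" if any(term in lowered for term in banned) else "allowed"


examples = [
    Example("Win a free prize, click now", "violation"),
    Example("This is a scam, send money", "violation"),
    Example("Lunch at noon?", "allowed"),
]

# Round 1: the first policy draft misses one kind of violation.
policy_v1 = "scam"
round_1 = find_discrepancies(policy_v1, examples, keyword_labeler)

# Round 2: policy experts review the disagreement and clarify the policy.
policy_v2 = "scam, free prize"
round_2 = find_discrepancies(policy_v2, examples, keyword_labeler)

print(len(round_1), agreement_rate(examples, round_1))
print(len(round_2), agreement_rate(examples, round_2))
```

In this toy run, the first policy draft leaves one example mislabeled, and the clarified draft removes the disagreement. In a real deployment, the experts would also ask the model to explain each disputed label before revising the policy, as OpenAI describes.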

Other advantages of using GPT-4 over the present manual approach to content moderation include a decrease in inconsistent labeling and a faster feedback loop, according to OpenAI.

"People may interpret policies differently or some moderators may take longer to digest new policy changes, leading to inconsistent labels. In comparison, LLMs are sensitive to granular differences in wording and can instantly adapt to policy updates to offer a consistent content experience for users," the company said.

The new approach, according to the company, also takes less effort in terms of training the model.

Further, OpenAI claims that this approach is different from so-called constitutional AI, under which content moderation is dependent on the model's own internalized judgment of what is safe. Various companies, including Anthropic, have taken a constitutional AI approach in training their models to be free of bias and error.

Nevertheless, OpenAI warned that undesired biases may creep into content moderation models during training.

"As with any AI application, results and output will need to be carefully monitored, validated, and refined by maintaining humans in the loop," it said.

Some industry experts think OpenAI's approach to content moderation has potential. "GPT-4 is a super capable model and OpenAI has a never-ending stream of users trying to make it do harmful things. Which is great training data," said Tobias Zwingmann, managing partner at AI services company Rapyd.AI.

Whether AI can handle all content moderation tasks, though, is an open question.

"The real question is how much automation of content moderation makes sense. There would seem to be approaches where some automation would, perhaps in sifting content, identifying targets and making recommendations," said Mark Beccue, AI research director at The Futurum Group.

Meanwhile, the lack of any announcement or guidance from Facebook parent Meta regarding the use of AI for content moderation casts some doubt on the technology's potential efficacy for that particular application, since Meta is also a generative AI leader, Beccue noted.

"Meta is one of the world leaders in AI innovation," he said. "Wouldn't you think Meta would be very focused on developing content moderation automation? Doesn't the lack of what Open AI is proposing from Meta say something about that? IOW, why would Open AI know something Meta doesn't know about content moderation? Bottom line — GPT-4 as content moderation is perhaps an experiment worth sandboxing, but no guarantees."

If OpenAI’s large language model can be used successfully for content moderation, it will open up a multibillion-dollar market for the company.

The global content moderation services market, according to a report from Allied Market Research, was valued at $8.5 billion in 2021, and is projected to reach $26.3 billion by 2031, growing at a compound annual growth rate of 12.2% from 2022 to 2031.

(Editor's note: This story has been updated to include comments from Mark Beccue.)

IT World
