How Will China’s Generative AI Regulations Shape the Future? A DigiChina Forum

Chinese regulators at the Cyberspace Administration of China (CAC) on April 11 issued draft Measures to govern generative AI service provision in China. The draft, which is open for public comment until May 10, would target services that generate text, images, video, code, and other media, and its announcement follows the sensation caused by the US firm OpenAI’s ChatGPT roll-out—as well as a scramble by Chinese companies to offer similar products.

As drafted, the Measures for the Management of Generative Artificial Intelligence Services (see full translation) would make companies providing generative AI services to the public responsible for the outputs of their systems and would require that data used to train their algorithms meet strict requirements. DigiChina asked several specialists to consider what this draft means for the future of China’s AI market, how feasible it might be for companies to provide compelling services while complying, and what this regulation adds to an already active Chinese regulatory space on AI. Existing regulations include rules on “deep synthesis” services, in effect since January, as well as filing requirements and other provisions on recommendation algorithms. –Graham Webster, Editor-in-Chief, DigiChina

HELEN TONER
Director of Strategy and Foundational Research Grants, Center for Security and Emerging Technology, Georgetown University

These draft rules are the latest brick in the regulatory structure that China is constructing around AI and related technologies. To make sense of this new draft, the most important reference point is the set of rules for so-called “deep synthesis” technologies that came into force three months ago. That document laid a solid initial foundation to regulate generative AI, including text-generating systems such as chatbots, so it is interesting to see a new set of provisions already. To my mind, two important elements of these new draft rules, and one point of curiosity, stand out:

One significant difference between the existing deep synthesis regulations and this April draft is how they govern the data used to train generative AI systems. Whereas the existing rules only touch briefly on this topic, Article 7 of the new draft lays out broad and demanding requirements. The requirement to exclude “content infringing intellectual property rights” is somewhat opaque, given that the copyright status of much of the data in question—typically scraped at massive scale from a wide range of online sources—is murky. The clause requiring providers to “ensure the data’s veracity, accuracy, objectivity, and diversity” is even more striking. It’s conceivable that the need to comply with this clause could force Chinese AI teams to develop much more effective filtering tools for the data they use, perhaps ultimately providing a leg-up against more ad hoc international efforts. But the quantity of available training data is already an important bottleneck on the size and sophistication of cutting-edge generative AI models. Most likely, the primary effect of such demanding (and vague) rules will be that Chinese groups will struggle to assemble the enormous datasets that they would need to keep pace with international competitors.

A second important new element is that in a single sentence, Article 5 appears to do away with a conundrum that has been puzzling European Union policy makers for months. It specifies that companies providing access to generative AI via “programmable interfaces”—aka APIs like those released by OpenAI and Google—are responsible for all content produced. (Disclosure: I am a member of the board of directors of OpenAI.) This stance is appealingly simple, but seems likely to run into practical hurdles. While the original AI developer can and should be responsible for some types of problems, this approach would hold them liable for everything, including issues arising from choices the downstream client company makes about app design or how to restrict user behavior. It will be interesting to see if this provision remains intact throughout the comment and review process.

To close, one point to watch will be how (and whether) these rules apply to research and development. Article 2 clearly states that “[t]hese Measures apply to the research, development, and use of products with generative AI functions.” This statement caught my eye, as it would seem ambitious and unusual to oblige researchers to meet the kinds of standards described in the draft. However, beyond that initial scoping sentence, all of the remaining articles specify that they apply to providers of products and services, which would seem to exclude research. I am curious what led to this seeming contradiction, and whether any provisions will be added to future drafts to cover earlier parts of the research-to-product pipeline.

ZAC HALUZA
Author, Root Access; Associate Editor, DigiChina

As generative AI products are developed in line with these new regulatory guardrails, AI-generated content (AIGC) could very well unlock China's software-as-a-service (SaaS) potential. Historically, the SaaS market in China has lagged behind infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS). This is because enterprises have generally focused on migrating their infrastructure to the cloud, while moving business processes to the cloud tends to be trickier. However, the potential applications of generative AI—creating art and video assets for advertising materials, powering executive assistants trained on internal data and policies, and informing administrative decisions by synthesizing and summarizing large quantities of data, to name a few—open up a new level of potential for China's SaaS market.

For a better sense of the types of domestic AIGC products and services that might emerge in the wake of these regulations, it's important to consider the state's current priorities of increased digitalization, self-sufficiency in science and technology, and spurring innovation by private enterprises. A significant amount of generative AI research and development could intentionally target regulatory "safe zones," such as creating tools for industrial, administrative, and IT applications. AIGC innovations that are in line with the government's stated goals of stimulating the digital economy will certainly be encouraged and incentivized. With these guidelines also requiring AIGC product developers to document the entire lifecycle of their products, China can (at least in theory) feel more at ease as it aims to ramp up the development of AI that could drive its digital development.

YAN LUO
Partner, Covington & Burling

XUEZI DAN
Associate, Covington & Burling

The draft Measures provide requirements that can cover a wide range of issues that are frequently debated in relation to the governance of generative AI globally, such as data protection, non-discrimination, bias, and the quality of training data. The draft Measures also highlight issues arising from the use of generative AI that are of particular concern to the Chinese government, such as content moderation, the completion of a security assessment for new technologies, and algorithmic transparency. The draft thus reflects the Chinese government’s objective to craft its own governance model for new technologies such as generative AI.

With respect to the scope of application, the draft Measures would regulate generative AI services that are “provided to the public” in mainland China. It is unclear from the wording whether “the public” refers only to consumers in China, which would exclude generative AI services offered to enterprises from the Measures’ scope. It is also unclear whether providers of generative AI outside of China that are not specifically targeting the Chinese market will be subject to these rules. Providers of generative AI services also seem to include both the companies providing underlying technologies and companies offering services at the application level.

Some noteworthy requirements that could raise practical challenges for providers that wish to offer their generative AI services in China:

Before offering a generative AI service to the public at large, a provider must apply to the CAC for a security assessment, and file certain information regarding its use of algorithms (e.g., the name of the service provider, service form, algorithm type, and algorithm self-assessment report) with the CAC.
Providers of generative AI are responsible for content produced by generative AI products and, where personal information is processed, must assume legal obligations of “personal information processing entities” (essentially equivalent to “data controllers” under the EU GDPR [and translated as "personal information handlers" by DigiChina]) and fulfill personal information protection obligations.
Providers of generative AI are required to adopt measures to filter any inappropriate content created by generative AI, and to optimize algorithms to prevent the generation of such content within three months.
Providers of generative AI are required to enable the use of tagging mechanisms to identify content/video created by generative AI.
Training data must not contain content that infringes intellectual property, and if any personal information is involved, such data must be obtained on the basis of consent from data subjects, or otherwise comply with the requirements provided under applicable Chinese laws and regulations.

MATT SHEEHAN
Fellow, Carnegie Endowment for International Peace

China’s draft Measures on generative AI reveal both vulnerabilities and strengths in its approach to regulating AI. The partly duplicative nature of the draft highlights the whack-a-mole problem with regulations that target specific AI applications. But the way the generative AI rules deploy governance tools created by earlier regulations shows the benefits of an iterative approach to regulation.

In rolling out its earlier regulations on recommendation algorithms and "deep synthesis," the CAC took a “vertical” approach to regulating AI: Each regulation targeted a specific AI application, or set of applications. This contrasts with the more “horizontal” regulatory approach in the European Union’s AI Act, which covers most applications of the technology. The strength of a vertical approach is its precision, creating specific remedies for specific problems. The weakness is its piecemeal nature, with regulators forced to draw up new regulations for new applications or problems.

The generative AI rules are an example of the latter. Last year’s deep synthesis regulations cover almost exactly the same applications: using AI-related technologies to create text, images, audio, and video. The generative AI regulations just tweak how the technologies are described, and they replace “virtual scenes” with “code” in the outputs covered.

So why draft a new regulation three months after the last one took effect? Because the use cases shifted, and so did the regulators’ concerns. The deep synthesis regulations were clearly crafted to target audio and visual media, with the intellectual roots and bureaucratic imperative of the regulation focusing on deepfakes. The regulation included “text” as a covered output, but wasn’t written for a world where AI chatbots are used by tens of millions of people. When ChatGPT took the world by storm and Chinese companies raced to create competitors, the CAC rushed out new rules that better target text. New requirements that outputs be “true and accurate” make sense for language models, but not for AI-generated images.

This scramble to write new rules for each twist of the technology illustrates a challenge of the piecemeal approach, but the speed and robustness of the new draft also illustrate its strength. The CAC could churn out the generative AI regulations so quickly because it’s been building its bureaucratic muscles and stocking its regulatory toolkit for governing AI. The draft deploys existing tools like the algorithm filing system (算法备案系统), which was created by the recommendation algorithm provisions, and has since been utilized by the deep synthesis rules. The filing system requires developers to conduct security assessments and disclose details about how their algorithms were trained, creating a flexible tool that regulators can embed in each new regulation.

As governments around the world grapple with regulating AI, they can draw lessons from China’s experience. A vertical and iterative approach to regulation requires constant tending and updating. But by accumulating experience and creating reusable regulatory tools, that process can be faster and more sophisticated.

SEATON HUANG
Research Associate, Council on Foreign Relations

The emergence of ChatGPT competitors in China has left many wondering how the country’s regulatory environment will affect the potential and usability of generative AI services. New draft Measures released earlier this month by the CAC provide some answers, but raise other questions about how well equipped firms are to comply with the state’s vision for consumer use of the cutting-edge technologies.

Regarding censorship, the CAC’s draft Measures should not be seen as a departure from China’s existing policies on content regulation. AI-generated content (AIGC) would be required to follow “Socialist Core Values”—a nebulous term that has for years been defined as the general decorum expected of Chinese citizens—and not to subvert state authority. Moreover, the Measures would require firms to assume legal responsibility for both the training data and the content created on their generative AI platforms.

Chinese firms would likely either use pre-firewalled information to train their platforms or employ "reinforcement learning from human feedback" to steer them away from certain outputs. This would be a challenging undertaking, especially considering the capacity generative AI has already exhibited worldwide to create problematic content ranging from deepfakes to disinformation. Either strategy to comply with the state’s requirements would surely require the commitment of additional human capital or specialized software to deploy internal censorship mechanisms.

Likewise, the Measures’ requirement that generative AI products undergo a security assessment before their release would necessitate additional due diligence. Some Chinese firms, including the cybersecurity company QiAnXin, have already explored offering generative AI security models and consulting services to help providers meet regulatory prerequisites.

References to intellectual property protection in the draft Measures illustrate regulators’ concerns that generative AI could be used to exploit the work of original content creators or gain unfair business advantages. These concerns are not unique to the Chinese context, but the question remains how firms can ensure that collecting training data and creating AIGC do not violate intellectual property rights. Regarding unfair competition, existing regulations on recommendation algorithms provide precedent for a requirement that generative AI in tools such as search engines should not give preference to a company's own products and services.

At the moment, both Beijing and Chinese enterprises seem committed to innovating and employing tools such as generative AI. But given the legal and financial hurdles firms would need to overcome to release and maintain generative AI products if these Measures are implemented, it remains unclear whether in China these tools will be able to achieve the productive potential, profitability, and consumer excitement experienced in other markets.

KIMBALL CHEN
Student Editor, DigiChina

The Chinese government's draft Measures on generative AI services place significant emphasis on the truth and accuracy of content. Article 4 states that “[c]ontent generated through the use of generative AI shall be true and accurate, and measures are to be adopted to prevent the generation of false information.” If enacted without revisions, this provision may create a considerable regulatory burden for Chinese companies developing generative AI services, potentially affecting product quality, for at least two reasons.

First, AI models are not trained to discern truth but to identify patterns. Models like ChatGPT learn from extensive datasets containing text from various sources, which may encompass both accurate and inaccurate information. These models generate text based on data patterns without inherently understanding truth or falsehood. Emphasizing accuracy in AI models during early development stages may disincentivize companies from pursuing commercialization and require them to divert more resources to moderating training data and outputs than their Western counterparts. Second, the concept of truth and accuracy is ambiguous. For instance, it is unclear whether the completeness of an AI response is a necessary component of truth. In certain cases, providing partial or incomplete information can result in similar or worse consequences than disseminating entirely false information. The draft lacks specificity regarding the definition of truthfulness.

ROGIER CREEMERS
Lecturer in Modern Chinese Studies, University of Leiden; Senior Editor, DigiChina

In a way, these new regulations are unsurprising. Since gaining primary authority over online content in 2014, the Cyberspace Administration of China has delivered very similar sets of regulations for all kinds of online content production and distribution, and generative AI is no exception. The lists of prohibited content, the location of liability with major service providers, and the security requirements all echo earlier forms of regulation.

The major question is, of course, what this will imply for the future of Chinese generative AI services, for instance Baidu’s recently presented Ernie chatbot. It is very clear that regulatory constraints will push them in a rather different direction. Much of the conversation surrounding Western services like ChatGPT has focused on their ability to write essays and poetry, tell jokes, or respond to political questions (in other words, matters of import to the chattering classes). Chinese services will obviously be subject to political censorship. However, they also emerge within a different industrial policy landscape, one which sees the future of these technologies as being closely intertwined with existing products and services. Baidu has announced partnerships for Ernie with household goods and car manufacturers. This means these services will likely evolve in a more delineated, task-specific manner: People usually don't ask their car or toaster for relationship advice or political opinions. Moreover, it is likely these technologies will see application in areas of priority under the 14th Five-Year Plan, whose implementation is in full swing. These include social services for the disabled, and in relatively underprivileged rural regions. As such, they may enhance the delivery of healthcare and education in locations where doctors and teachers are scarce. For companies, that will mean the landscape will evolve in less of a headline-grabbing manner than might happen in the West, but, on the other hand, there is ample government support for immediate application in industrial partnership and public sector reform.

PAUL TRIOLO
Senior Associate, Trustee Chair in Chinese Business and Economics, Center for Strategic and International Studies

With generative AI, chatbots, and large language models all the rage, it was no surprise that China’s regulatory watchdogs for data and AI felt the need to rush out new draft regulations this month. Clearly, alarm bells rang among Chinese regulators and throughout the large and well-developed content censorship system as OpenAI’s ChatGPT and GPT-4 seized the attention of users, corporations, and investors over the past several months. This evolving system has been in a 20+ year cat-and-mouse game with technology and content delivery, since the early days of China’s Great Firewall in the mid-1990s. I was in Beijing in 1996, when China first started blocking select overseas news sites and sites dealing with sensitive topics like Taiwan and Tibet. This system has evolved in major ways since then, and constitutes the world’s most sophisticated content filtering system, requiring huge financial and personnel resources to maintain.

Suddenly, instead of trying to control searches on websites and monitor forbidden terms in emails, the system will have to deal with individual users being able to ask questions to a generative AI application without any ability to monitor and block the output for sensitivity and offending words. The keyword-based approach for censorship will not work with generative AI models. Hence Beijing and the CAC are in the initial stages of coming up with a regulatory regime that pushes companies toward political alignment as they develop their models. This is new territory for regulatory bodies like CAC, and for the entire Internet censorship apparatus that China has developed over the past three decades.

The political alignment problem is a tough one. Per the new regulations, it will be addressed through a combination of controls on data sources and by making companies that develop large language models (LLMs) responsible for ensuring that applications built on their models avoid responses that run afoul of the many sensitive political issues the system has previously solved for. The approach will require deft use of things like reinforcement learning from human feedback (RLHF), and building in guardrails around certain sensitive topics. It will add to the burdens Chinese LLM developers already face.

There are several looming issues for the new regulations around generative AI. First, as with the other AI regulatory moves the CAC has taken over the past year, questions of implementation, enforcement, and regulatory capacity to monitor and evaluate AI algorithms remain unresolved. The task demands a unique combination of content regulation and technological knowhow, and it is not clear that the CAC’s traditional Party-overseen content regulation authorities are up to it. Second, it is likely that the CAC’s regulations will initially serve primarily as a deterrent, to force the more than 20 Chinese companies and organizations now in this space to come up with new ways to ensure political alignment. Nevertheless, the nature of these platforms, with end users asking a wide range of questions and getting a nearly instantaneous response, will challenge the entire censorship system to figure out how to iterate toward some level of political alignment without sacrificing response times, relevance, and user satisfaction. If the guardrails become too heavy, the responses could become too predictable and therefore uninteresting. Threading this needle through training datasets, RLHF, and other programming techniques, all while getting increased scrutiny from regulators in Beijing, will challenge Chinese firms’ quest to achieve an acceptable level of political alignment. We are in a whole new world….

CAROLINE MEINHARDT
Student Editor, DigiChina (Spring 2022)

These draft Measures are a clear signal that the Chinese government is ready to crack down on the vast data collection and processing practices that underpin generative AI models. At a time when European and Canadian data protection authorities are already investigating OpenAI’s data practices and ChatGPT’s potential violation of data protection regulations, Chinese authorities may be trying to preempt similar infringements by up-and-coming Chinese ChatGPT competitors.

Of course, China already has a comprehensive data privacy law. Since its implementation in late 2021, the Personal Information Protection Law (PIPL) has limited Chinese companies’ ability to hoover up personal data by requiring prior notice and consent, as well as minimized data collection for specified purposes. The new draft rules feature reminders that existing laws like the PIPL apply to generative AI systems, too. But they also provide important, additional clarifications regarding requirements that are specific to the generative AI context.

Most notably, the Measures put the onus on providers of such generative AI products and services to ensure that personal information protection obligations are met for all kinds of data. Providers must ensure the “veracity, accuracy, objectivity, and diversity” of their training datasets, as well as obtain prior consent from the data subjects if they include personal information. They also face restrictions regarding personally identifiable information that users enter into generative AI systems through interfaces such as chatbots, which they may not illegally store or share with other users. Additionally, providers have to share specifics regarding the sources and characteristics of their model’s training data with users.

These requirements set a high bar for Chinese AI companies, calling into question the massive web scraping activities that are commonplace when training generative AI and forcing potentially unprecedented levels of data transparency for AI systems. Yet in their current, vague form, it’s uncertain how strictly the Measures can and will be enforced, and how feasible it is for companies to comply while still offering their generative AI products. For instance, it is unclear how companies will create the required complaint mechanism that allows them to respond to individual requests for revisions, masking, and removal of personal information, or how they can even commit to making such changes to their training data pool. But the rules provide Chinese authorities with a powerful lever they can use to shut down generative AI products and services as soon as privacy violations are discovered.

Image Credit: Alexa Steinbrück / Better Images of AI / Explainable AI / CC-BY 4.0