Balancing Innovation & Rights: A Copyright Policy Proposal for AI Training in India
PROBLEM STATEMENT
The burgeoning Indian AI sector faces a critical test in reconciling the unparalleled technological advances promised by large language models with the fundamental rights of copyright holders. AI systems may use copyrighted data to train their algorithms and produce original works. Unfettered access to online data, while fuelling innovation and economic growth, risks infringing protected creative works, potentially disincentivising content creation and stifling the very lifeblood of AI progress. On the other hand, if access to content is unduly restricted, innovation and new business models are severely impeded. A well-balanced framework is needed, one that prioritises flexible copyright models, clearly defined fair dealing exceptions, and robust support mechanisms for both developers and creators, ensuring equitable economic participation and the protection of intellectual property rights. Only through a nuanced and balanced approach can India harness the transformative potential of AI while upholding the rights of all stakeholders, thereby positioning itself as a global leader in this revolutionary field.
The Existing Jurisprudence
To analyse the existing jurisprudence relating to the regulation of AI training, the author reviewed the policies adopted by the UK, the EU, the US, and Japan, as these jurisdictions have been at the forefront of regulating the impact of Artificial Intelligence on Intellectual Property Rights, specifically copyright. Each jurisdiction takes a distinct approach to balancing AI innovation and copyright protection.
Discussions, consultations, legislative and policy changes across the world have focused on:
- Improving the licensing regime by creating codes of practice or easing the licensing process;
- Extending current copyright exceptions to cover commercial purposes such as AI training;
- Introducing a new exception for Text and Data Mining (TDM) for all purposes, but with an option to opt-out, thus giving autonomy to right-holders over their works.
To explore these issues further, the author examined relevant models from various countries.
| Country | Way of Regulation | Conditions | Policy Stance |
| --- | --- | --- | --- |
| Japan | Exception by way of Article 30(4) of the Japanese Copyright Act | (1) TDM applies to both commercial and non-commercial purposes; (2) the exception applies to any exploitation regardless of the right holders' reservations; (3) exploitation by any means is permitted; and (4) no lawful access is required | Favours AI innovation and business space |
| EU | Limited exception by way of Article 4 of the EU DSM Directive | (1) Allows reproduction and extraction for TDM purposes; (2) for all purposes (including commercial); (3) beneficiary organisations must have "lawful access"; (4) allows right holders to opt out of the exception | Encompasses a broader class of users, but narrower in scope; gives greater autonomy to right holders |
| US | Fair Use Doctrine | Depends on four factors: (1) transformativeness: when the use is transformative, it favours fair use; (2) nature of the copyrighted work; (3) amount and substantiality: taking smaller amounts of the original work favours fair use; (4) effect on the market | Highly contextual and reliant on deeper semantic interpretation |
| UK | No specific exception for AI training or TDM for commercial purposes | The existing TDM exception under Section 29A of the Copyright, Designs and Patents Act 1988 (CDPA) (1) covers only non-commercial research on copyrighted works; (2) requires that the person already has lawful access; and (3) does not apply to database rights | Highly restrictive; does not favour AI innovation/business models |
THE MANDATE
The mandate of this policy proposal addresses two major questions:
- First, it identifies the competing interests and tensions at play; and
- Second, it proposes two alternative models to address the issue and presents a balanced approach.
- Competing interests and concerns
Involved Stakeholders: 1. AI innovators and developers, 2. Copyright Owners, 3. Government
Interests of the Stakeholders
- AI innovators: Their main expectation is that AI training should not be treated as copyright infringement; otherwise they face the legal risk of being sued. They argue for exceptions covering such use, ensuring that their economic interests are protected and innovation is enabled.
- Copyright Owners: Copyright holders expect to be compensated for the use of their content in AI development. They want to participate in the revenue generated by AI applications that utilize their work.
- Government: The government seeks to bring in a balanced, innovator-friendly IP regime that fosters innovation, nurtures economic growth and safeguards the interests of various stakeholders, notably small developers and content owners.
What aspects warrant concern?
For the interests outlined above, there is no straightforward approach or answer. Introducing an exception that permits the unauthorised use of copyrighted content would provide legal certainty and protection to AI developers. However, content owners would lose the chance to be paid under such broad exceptions. Interestingly, even though such use might be deemed fair use, several significant participants among the nearly 10,000 responses to the Notice of Inquiry published by the US Copyright Office acknowledged that opt-outs must be respected, at least voluntarily.[1] Although this comes from a different legal system, it gives us a good idea of what innovators anticipate and accept.
The next question, once opt-outs are in place, is whether use may be permitted by contractual means. There are, however, grave concerns in this regard as well. Requiring licences for AI training data could create a barrier to entry for smaller companies; only large corporations with significant resources could afford to pay for such licences. This could consolidate the industry's power in the hands of a few major players. Those unable to afford licences may resort to using freely available data, which could introduce biases into their AI models.[2] "Furthermore, the ambition to charge a substantial fee for a single work however seems difficult to achieve, since large models are commonly trained on billions of words."[3] Such a strict licensing regime may mean favouring private ordering over public policy. Although collective administration appears to be a viable option, no practical model currently exists that could provide an economically viable infrastructure for these kinds of micropayments and applications. Finally, but just as importantly, an adverse legal framework may encourage AI innovators to relocate to jurisdictions with more favourable legal frameworks.
To address this issue, the author discusses various possible approaches, which are covered in detail in the following section.
- New Proposal
To address the issue at hand, two alternative compromise-based models are proposed, each in its own way accommodating the interests of the competing stakeholders. The author recognises that the essence of copyright law must not be done away with, while also recognising the significance of AI innovation, and in this regard proposes two models: a Three-Part Model or a Remunerated Exception.
Alternative No.1: A Three-part Model
This model combines elements from different legal frameworks to address the shortcomings of existing AI training regulations worldwide. In this model, the regulatory regime is divided into three parts:
- i) Broad Exception with Opt-out Mechanism
- ii) Compulsory Licensing with Collective Management
- iii) Self-Regulation of the Industry
Component No.1: Exception
AI developers must have permission to use content, and access to it, without having to negotiate permissions with every possible owner. Moreover, individual licensing of AI training data is impractical. To address this, a broad exception could be explored that would apply to both commercial and non-commercial uses. No limitation (such as a temporary exception, purpose limitation, or lawful access) is proposed at this stage, as more openly drafted exceptions may be more compatible with technological neutrality.
While this gives immense relief to AI developers, an opt-out mechanism could be explored to give content owners autonomy. Modelled on the EU opt-out mechanism, such a reservation already enjoys some convergence among the major AI developers, several of whom have developed model-specific opt-out mechanisms.[4] However, unlike the EU, no lawful-access requirement is proposed, as it is acknowledged that such a requirement could create barriers for smaller or less well-funded AI developers and organisations.[5] Through lawful-access conditions, right holders can effectively prevent, for example, specific portions of existing works from ever becoming subject to TDM. Others have contested the need for lawful access out of concern that right holders may price TDM into their offerings and drive up overall expenses.[6]
- How should the opt-out be structured?
There are currently no widely accepted norms or procedures for this kind of reservation. A variety of new strategies can be observed, such as model-specific opt-outs, artist-led start-up services, and publisher-developed protocols. Right holders have, however, drawn attention to the inefficiencies of these model-specific techniques, pointing out that they must continually register opt-outs with every entity that trains models, which raises the associated costs.[7] India is therefore in a strong position to set the benchmark in this area. The self-regulatory organisation (SRO) could establish such a standard, as sketched below; further details on the SRO are provided in a later section.
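To make the idea concrete, the following is a minimal, purely hypothetical Python sketch of how an AI crawler might consult an SRO-published, machine-readable opt-out registry before ingesting a work. The registry URL, the JSON schema and the domain-level granularity are assumptions made for illustration; they do not reflect any existing standard.

```python
# Hypothetical sketch of how an AI crawler might honour a machine-readable
# opt-out registry published by the SRO. The registry URL, the JSON schema
# and the domain-level granularity are illustrative assumptions only.
import json
import urllib.request
from urllib.parse import urlparse

SRO_REGISTRY_URL = "https://sro.example.in/opt-out-registry.json"  # assumed endpoint

def load_opted_out_domains(registry_url: str = SRO_REGISTRY_URL) -> set:
    """Fetch the (hypothetical) SRO registry listing domains whose owners opted out."""
    with urllib.request.urlopen(registry_url) as resp:
        registry = json.load(resp)
    # Assumed schema: {"opted_out": ["publisher-one.in", "author-site.example", ...]}
    return set(registry.get("opted_out", []))

def may_ingest(work_url: str, opted_out: set) -> bool:
    """Return False if the work's domain carries a recorded TDM reservation."""
    domain = urlparse(work_url).netloc.lower()
    return domain not in opted_out

if __name__ == "__main__":
    opted_out = load_opted_out_domains()
    for url in ("https://publisher-one.in/novel.txt", "https://open-blog.example/post"):
        action = "ingest" if may_ingest(url, opted_out) else "skip (opted out)"
        print(url, "->", action)
```

A single registry of this kind is one possible way of addressing the inefficiency right holders identify above, since a reservation would then need to be lodged only once rather than with every model developer.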
Component No.2: Compulsory Licensing with Collective Management
The opt-out mechanism brings certain other questions to the fore: What happens after right holders opt out? Are they required to compulsorily licence their works? How would this be managed?
It is to be noted that the opt-out is not an end in itself but a means to reach remuneration arrangements.
Firstly, recognising that content owners should have some autonomy over their content and its use, where they opt out, voluntary licences may be arranged and collectively managed by the SRO. The SRO would function as a single-window platform arranging licences and overseeing prices. It is important that there be some cap on price negotiation, set by the SRO. Tariff parameters could be based on equitable remuneration, for example a percentage of revenue, a price per user, or a price per unit of usage, as illustrated in the sketch below. Content owners would therefore have autonomy over the use of their work but not necessarily over prices.
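To illustrate the arithmetic behind these tariff bases, the short Python sketch below computes a fee under each of the three parameters mentioned. Every rate and figure is invented for illustration and is not a proposed tariff.

```python
# Illustrative arithmetic only: how the tariff bases mentioned above (share of
# revenue, price per user, price per unit of usage) could each translate into
# a licence fee. All rates and figures are invented examples.

def fee_revenue_share(relevant_revenue: float, rate: float = 0.02) -> float:
    """Fee as a fixed percentage of the developer's relevant revenue."""
    return relevant_revenue * rate

def fee_per_user(monthly_active_users: int, price_per_user: float = 0.50) -> float:
    """Fee scaled by the number of users of the AI application."""
    return monthly_active_users * price_per_user

def fee_per_usage(works_used: int, price_per_work: float = 0.01) -> float:
    """Fee scaled by the number of protected works used in training."""
    return works_used * price_per_work

# Example developer: Rs. 10 crore relevant revenue, 5 lakh users, 20 lakh works used.
print(fee_revenue_share(100_000_000))  # 2000000.0
print(fee_per_user(500_000))           # 250000.0
print(fee_per_usage(2_000_000))        # 20000.0
```

Which base is chosen, and at what rate, would be a matter for the SRO's tariff-setting process rather than for individual negotiation.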
Secondly, in exceptional circumstances, where access to specific categories of copyrighted material is deemed essential for AI development in areas of public interest, compulsory licensing may be enabled for their use in LLM training data. The Copyright Board will consider the following factors before granting a compulsory licence:
- Public interest implications of the AI development project.
- Potential impact on the rights and livelihoods of creators.
- Availability of alternative sources of data for the specific purpose.
- Efforts made by the LLM developer to obtain licences from rights holders.
- The nature and extent of copyrighted material use.
- The contribution of the copyrighted material to the LLM's functionality and output.
The Board will determine the terms and conditions of the compulsory licence, including the royalty rate payable to the copyright holder, ensuring fair and equitable compensation.
Component No.3: Self-Regulation of the Industry
The third component of the model argues for self-regulation of the industry in the form of a Self-Regulatory Organisation (SRO) that can be registered with, and recognised by, the ministry. Such a body is proposed in acknowledgment of the ever-transforming, fast-moving nature of AI. Technology experts with specialised knowledge of AI are best suited to draft and enforce regulations.
Such an SRO would help deal with newer issues as they arise, set a Code of Ethics and standards, and ensure that the concerns of both AI developers and content owners are efficiently addressed. The SRO would also serve as the single platform administering this space, relieving owners and developers of having to deal with each other individually. The SRO would thus collectively manage licensing arrangements and ensure equitable remuneration. Such an SRO must be truly representative of the AI space. Further, to check its power, the grievance redressal mechanism could be structured so that ultimate authority rests with the Court, apart from the possibility of the SRO's registration being called into question and cancelled for non-adherence or gross violation.
The SRO's tasks would be: i) monitoring which works are being used by AI developers to prevent possible infringement (compliance monitoring and governance); ii) providing a single platform for owners to opt out and setting standards in this regard; iii) deciding tariffs and other conditions with developers and owners; and iv) collecting and distributing remuneration to right holders.
- How would the use of data subsisting in copyrighted works for machine learning be detected and enforced?
Many AI ethics guidelines emphasise transparency. In the same spirit, AI developers could be required to disclose the sources of their training data and the associated licensing agreements, for instance in a machine-readable form such as the sketch below.
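As a purely hypothetical illustration of such a disclosure, the Python sketch below builds a simple machine-readable manifest of training-data sources and licences that a developer might file with the SRO. The schema, field names and values are assumptions, not an existing reporting standard.

```python
# Sketch of a machine-readable training-data disclosure that an AI developer
# might file with the SRO. The field names and values are hypothetical
# illustrations; no such reporting schema currently exists.
import json

disclosure = {
    "model_name": "example-llm-v1",                 # assumed identifier
    "developer": "Example AI Labs Pvt. Ltd.",       # assumed entity
    "training_data_sources": [
        {
            "dataset": "news-archive-2023",
            "origin": "https://publisher.example.in",
            "licence": "SRO voluntary licence no. 1234",  # hypothetical reference
            "opt_out_checked": True,
        },
        {
            "dataset": "open-web-crawl",
            "origin": "public web",
            "licence": "statutory exception (opt-outs honoured)",
            "opt_out_checked": True,
        },
    ],
}

# Serialise for filing with the SRO or publication alongside the model.
print(json.dumps(disclosure, indent=2))
```

Such a filing would give the SRO a starting point for compliance monitoring without requiring direct inspection of the training pipeline itself.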
- Does the existing structure of Collective Management Organisations (CMO) suffice?
Individual negotiation of licences and administration of rights is often impossible or extremely difficult, leaving collective management as the only viable option. Generally, under mandate or otherwise, a typical CMO works for content owners. However, the development of collecting societies is not uniform across creative industries. Furthermore, in contrast to the SRO, which would be funded by fees paid by AI developers for membership and other purposes, a CMO's administrative expenses are financed as a portion of the fees due to right holders, and may even exceed the fees that are distributed. As some have noted, collective management organisations would gain from such a regime, not right holders.[8] Further, a CMO may still negotiate on behalf of right holders, while the SRO has broader responsibilities: deciding tariff issues, providing a single platform for opting out, compliance and governance. Therefore, a body such as an SRO would be best suited to address all these demands, while taking inspiration from traditional CMOs.
- Does the model comply with the three-step test?
Article 13 of TRIPS states that Member States should confine exceptions and limitations to certain special cases that do not conflict with the normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the right holder.[9] The three-part model does not conflict with this test because exploitation for a specific case, i.e. machine learning, which is "non-consumptive or non-expressive", does not prejudice the interests of right holders when they can still restrain such use (reflecting their autonomy) and can receive remuneration for it.
| Pros | Cons |
| --- | --- |
| Provides means for equitable remuneration. Provides autonomy to right holders over the use of their work. Efficiently addresses the issue of individual dealings and collective management. Highly comprehensive and adaptable to newer innovation and standards. Streamlines and eases the process in terms of finance and administration. | May require a longer transition period. Depends on the efficient working of the SRO and demands extensive stakeholder participation. |
- Alternative No.2: Remunerated Exception
Alternatively, a remunerated exception could be adopted, under which AI developers would be able to use content without authorisation from the right holders but would have to pay them a fee for it. To address the issue of individual dealings, the SRO would act as a single platform to collect and distribute remuneration. The SRO would have the same responsibilities as in the first model, except that it would not need to provide an opt-out option or decide licensing issues. It would still, however, have the authority to decide, collect and distribute remuneration to right holders or collecting societies; a minimal distribution sketch is given below. Any grievance in this matter would be decided by the Court. In this manner, the main concern of right holders (remuneration) is addressed while the obligation to clear rights with each and every potential right holder is done away with.
Such an exception would cover both commercial and non-commercial uses of any content for machine learning. No limitation (such as a temporary exception, purpose limitation, or lawful access) is proposed at this stage, as more openly drafted exceptions may be more compatible with newer AI innovation.
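As a purely illustrative sketch of the collect-and-distribute function described above, the Python snippet below splits a collected remuneration pool among right holders pro rata to the recorded usage of their works. The pool size, usage counts and holder names are assumed for illustration.

```python
# Minimal sketch of how the SRO might distribute a collected remuneration pool
# among right holders in proportion to recorded usage of their works under the
# remunerated exception. All figures and names are illustrative assumptions.

def distribute(pool: float, usage_by_holder: dict) -> dict:
    """Split the pool pro rata by each right holder's share of recorded usage."""
    total_usage = sum(usage_by_holder.values())
    if total_usage == 0:
        return {holder: 0.0 for holder in usage_by_holder}
    return {holder: pool * count / total_usage
            for holder, count in usage_by_holder.items()}

# Example: Rs. 1,00,000 collected; three right holders' works recorded in training.
payouts = distribute(100_000.0, {"Holder A": 600, "Holder B": 300, "Holder C": 100})
print(payouts)  # {'Holder A': 60000.0, 'Holder B': 30000.0, 'Holder C': 10000.0}
```

In practice the distribution key (usage counts, revenue shares or another proxy) would itself be a tariff question for the SRO and, ultimately, the Court.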
- Does the model comply with the three-step test?
The three-step test is fully satisfied by this remunerated exception since it would be (i) confined to certain special cases, namely the application of machine learning or related technologies; (ii) because machine learning applications are non-consumptive, such use would not conflict with the normal exploitation of the work; and (iii) as content owners would be compensated for such uses, the use of their works in AI training would not unreasonably prejudice their legitimate interests.
Conclusion
This piece analyses the existing jurisprudence and its shortcomings and, by suggesting a compromise-based approach, aims to strike a balance between AI innovation and the economic interests of right holders. A blanket exception without remuneration or a blanket refusal of such use cannot be accepted. Hence, it is important to incorporate certain flexibilities to address the issues of remuneration, individual licensing, collective management and industry standards. In this regard, the solutions set out in this proposal, i.e. the three-part model consisting of an exception with opt-out, compulsory licensing with collective management, and self-regulation, or alternatively a remunerated exception with self-regulation of the industry, are certainly possible solutions. Although these approaches will present their own implementation difficulties in the form of extensive stakeholder participation and market research, they remain valuable in offering a middle ground for balancing the interests of all stakeholders. These approaches are merely possible options that the author puts forward, and they have to be carefully reviewed and accepted or rejected after much deliberation.
In sum, the focus should be on facilitating market entry and innovation while reviewing and easing the arrangements between right holders and users, and the compromise-based approach provides a plausible solution to this end.
Author: Vaibavi S G. In case of any queries, please contact/write back to us via email at [email protected] or at IIPRD.
[1] Paul Keller, 'Generative AI and copyright: Convergence of opt-outs?', Kluwer Copyright Blog, 23 November 2023 <https://copyrightblog.kluweriplaw.com/2023/11/23/generative-ai-and-copyright-convergence-of-opt-outs/> last accessed 28 January 2024.
[2] Submission to WIPO Consultation on Impact of Artificial Intelligence on IP Policy, Macquarie Law School, Faculty of Arts, <https://www.wipo.int/export/sites/www/aboutip/en/artificial_intelligence/call_for_comments/pdf/ind_matulionyte.pdf> last accessed 28 January 2024.
[3] Alex Hughes, 'ChatGPT: Everything You Need to Know about OpenAI's GPT-4 Tool', BBC Science Focus, <https://www.sciencefocus.com/future-technology/gpt-3> last accessed 28 January 2024.
[4] Martin Kretschmer, Thomas Margoni and Pinar Oruç, 'Copyright Law and the lifecycle of machine learning models', International Review of Intellectual Property and Competition Law (IIC) 55 (2024), forthcoming, at 16.
[5] RM Hilty and H Reichter, 'Position Statement of the Max Planck Institute for Innovation and Competition on the Proposed Modernisation of European Copyright Rules' (Max Planck Institute for Innovation and Competition Research Paper 17-02, 2019) at 9 <https://pure.mpg.de/rest/items/item_2470998_12/component/file_2479390/content> last accessed 28 January 2024.
[6] P Kollár, ‘Mind if I Mine? A Study on the Justification and Sufficiency of Text and Data Mining Exceptions in the European Union,’ (2021). <https://ssrn.com/abstract=3960570>, last accessed on 28 January 2024.
[7] P Keller, supra note 1.
[8] Submission to WIPO Consultation on Impact of Artificial Intelligence on IP Policy, Computer & Communications Industry Association (CCIA), <https://www.wipo.int/export/sites/www/aboutip/en/artificial_intelligence/call_for_comments/pdf/org_ccia.pdf> last accessed 28 January 2024.
[9] Article 13, WTO TRIPS.