
Copyright Infringement in the Training of Artificial Intelligence Vis-À-Vis the Positions in India & the US: A Critical Analysis


Artificial Intelligence (AI) has become a buzzword across the globe, and we are finding more use cases for it in our day-to-day lives than ever before. AI has not only diversified its operations but has also become more reliable and accurate with the passage of time. When AI first emerged, its major limitation was the lack of adequate data and information, which led it to produce factually incorrect responses and fed the public perception that AI was a gimmick.

However, a lot has changed. Trained on colossal amounts of data, AI is now said to be more unbiased, factually correct, reliable and convenient to use, and it continues to strive for greater accuracy as it feeds on more data. While the feats of AI in the social and technological spheres are commendable, several underlying legal issues need to be resolved quickly. One such bone of contention is copyright infringement arising from training AI on data that is copyrighted and cannot be used without the consent of its owner. This study delves into this issue and critically analyzes the current legal position around it, with the aim of clearing up the confusion.

Legal Position in US

In the United States, the Copyright Act, 1976 protects and regulates the use of copyrighted data. The fair use doctrine, which originated in judicial pronouncements, was codified in this statute.[i] The doctrine lists four essential factors for deciding whether a use of copyrighted work is fair, namely, the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and the effect of the use on the value of, and the market for, the copyrighted work.[ii]

[Image: AI and Copyright. Source: Shutterstock]

Under the first factor, the copyrighted work must be used for a purpose altogether different from the author’s original purpose in creating it. The use must therefore be transformative, introducing a novel element without interfering with the author’s purpose.[iii] If the expressive purpose is similar to that of the original work, the use cannot be held to be transformative.[iv]

As regards the nature of the data, a use is more likely to be fair when the work is factual in nature and less likely to fall within the fair use doctrine when the work is creative.[v] However, a creative work can still be used for the factual data it contains when training AI. Because the component of data used to train AI is factual in nature, this factor inclines towards fair use.

With respect to the extent of usage of copyrighted work, the initial position discouraged the use of an entire copyrighted work; only a portion of it could be used while remaining within the realm of fair use. With growing technological innovation, however, this position has changed. The use of an entire work has been held to be fair use where using the whole of the data is necessary to achieve a transformative purpose.[vi] The user’s purpose must not clash with the owner’s, and the two must serve different needs of the end users. Moreover, the AI company must not reveal the data in its entirety to the general public, since doing so would defeat the whole purpose of copyright.

Coming to the last criterion, the value of the copyrighted work must not deteriorate after the data is used. The AI must not offer a substitute for the original work by feeding on the original work itself. While owning a copyright does not entitle the owner to all profit related to the work, it guarantees that the owner does not suffer substantial loss from unwanted use of the copyrighted work. Hence, the market value of the copyrighted work must not deteriorate with the advent of AI. For example, AI cannot produce an animated movie that is the same as the original one, but it can provide users with information about the movie, such as its genre and actors.

Legal Position in India

Section 52 of the Copyright Act, 1957 contains the provisions on fair dealing, which is a valid exception to copyright infringement. However, India’s fair dealing exceptions are much narrower in scope than the US fair use doctrine.[vii] Moreover, the Indian position lacks the extensive judicial pronouncements and jurisprudence available in the US.

Section 52 lists only a limited number of permitted exceptions, such as translation, criticism, review, back-up, storage and recitation. While the statutory provisions appear outdated and are silent on the use of copyrighted data to train AI, there have been increasing debates and conflicts of interest between the owners of copyrighted works and the companies looking to improve their AI models.

The latest development in these debates is the Indian government’s rigid stance that AI developers must obtain permission from the owners of copyrighted works if their AI model is designed for commercial purposes.[viii] The major focus has thus been on whether the AI is operated with a commercial intent. This position favors authors and is aligned towards protecting the interests of the owners of copyrighted material. On the other hand, it may prove fatal for AI developers, who would need to seek permission from a great many owners in order to feed their algorithms with large chunks of data. Moreover, the focus needs to shift from the intent behind the operation of AI to the effect that using such data has on the owners, including its impact on the originality and commercial value of the copyrighted works.

Hence, the position of law in India needs a revamp. The law must keep pace with changes in society and technology so as to prevent misuse of gray areas.


Hence, even though there exists a list of exceptions to liability for infringement of copyrighted material, the case of AI feeding on chunks of data that include copyrighted works remains complex and needs to be decided on the facts and circumstances of each case. A new jurisprudence on this issue is emerging globally, and several countries have taken stances that are similar in some respects yet markedly different in others. The position in the US may seem comforting for AI developers, who can escape liability provided that the AI’s use of the work serves a purpose different from that of the original. The Indian government, however, has put the use of copyrighted works by AI developers on hold by requiring mandatory permission from the owners of such works, thereby closing the loopholes that arise from the dearth of jurisprudence. Nonetheless, such a position may discourage the development of AI in the context of Indian works and should be reconsidered from a long-term perspective.

Author: CHAITANYA VOHRA, in case of any queries please contact/write back to us via email to [email protected] or at IIPRD.


[i] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 575 (1994).


[iii] Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015).

[iv] Associated Press v. Meltwater, 931 F. Supp. 2d 537 (S.D.N.Y. 2013).

[v] Stewart v. Abend, 495 U.S. 207, 237 (1990).

[vi] Kelly v. Arriba Soft Corp., 280 F.3d 934 (9th Cir. 2002).


