By: Connor Charney

Google and Reddit recently finalized a licensing agreement (“the Agreement”) that will allow Google to train its Vertex AI on Reddit’s data.[1] The Agreement is reportedly valued at $60 million annually.[2] While the Agreement’s precise terms and value have not been made public, commentators have been quick to label it an example of a new paradigm of licensing agreements for AI companies.[3] However, Google and Reddit’s unique positions may make this an agreement of convenience instead of a harbinger of agreements to come. Google and Reddit both enter the Agreement in business climates that make it particularly advantageous for each party.[4] Specifically, Google benefits by insulating itself from potential intellectual property lawsuits, and Reddit gains a substantial annual payment to bolster its upcoming IPO.[5] To understand the impact of this deal, we must evaluate its business context, the underlying Data API terms for Reddit, and the recent trend in AI copyright cases.

Google enters the Agreement in both an offensive and a defensive posture. On the offensive side, Google’s Gemini AI[6] needs more training data.[7] In the AI arms race, Gemini has been chasing frontrunner ChatGPT.[8] In Gemini’s demo at the end of 2023, ahead of its early 2024 release, Google showed a model with impressive capabilities, but those capabilities were not unique and were already within the functionality of ChatGPT.[9] Google has the technical expertise to take the lead in generative AI, and now, with the Reddit deal, Google has a mountain of data available to train its AI.[10]

On the defensive end, this deal helps Google avoid further legal scrutiny from the United States Government and Reddit itself. The DOJ recently brought suit in United States v. Google[11], alleging antitrust concerns.[12] This suit was narrowed to carve out Google’s search features but included allegations around Google weakening Specialized Vertical Providers, such as Expedia or OpenTable, by limiting their visibility on Google searches.[13] Reddit may not exactly be a Specialized Vertical Provider[14], but, as Google itself recognizes, Reddit is usually part of a Google search to find direct commentary on topics.[15] By creating a partnership with Reddit through the Agreement, Google may be cutting off a lawsuit before it can begin.[16] Additionally, Google may be avoiding lawsuits similar to those faced by OpenAI.[17]

Reddit enters the Agreement with a looming IPO, but it also enters with a recent change to its Data API[18] Terms (“Terms”) around user content. In April 2023, Reddit changed its Terms to expressly prohibit “[the use of] User Content for . . .  training a machine learning or AI model, without the express permission of rightsholders . . . .”[19] In the Terms, Reddit also empowers its users by recognizing that the content created by users (“User Content”) is owned by the user, and Reddit grants any user of its Data API a non-exclusive license to use that data.[20] On its face, Reddit’s Terms allow users to retain their intellectual property (“IP”) rights to the content posted on the site and require that users of the Data API receive the express permission of the rightsholder to use the content to train an AI.[21] Presumably, then, Reddit gains a substantial annual payment from Google for the use of its data and also passes any liability for AI training on to Google, unless the licensing agreement provides otherwise.[22] However, whether an individual can recover for IP infringement by an AI remains to be seen.

Most notably, the New York Times sued Microsoft and OpenAI for copyright infringement when it was discovered that ChatGPT was trained on the New York Times’ stories.[23] Other newspapers have expressed interest in licensing agreements, like the Agreement, with OpenAI to account for ChatGPT’s use of their articles.[24] While these large institutions have resources and a direct claim to the IP in question, lawsuits by smaller parties have shown that IP violations, like copyright infringement, may be hard to prove.[25] Although the case law is in its infancy, it appears that courts are unlikely to find infringement unless there is proof of the AI directly using the intellectual property in a response.[26] It is unlikely that a court will extend protections to data used to train AI because doing so touches on an issue similar to the one addressed in Feist Publications, Inc. v. Rural Telephone Service Co.[27] There, the Supreme Court found that only the information contained within a database has copyright protection and that the copyright protection does not extend to the database as a whole.[28] To a court, disaggregated data fed to an AI will likely appear more similar to a database than to an individual piece of data, and it will be unlikely to receive the copyright protection of each piece of data within the database.[29]

While some may herald the Agreement between Google and Reddit as a sign of licensing agreements to come, it is not clear that this trend will continue. While AI companies are interested in mitigating risk, the case law around data used to train AI is unsettled at best.[30] Additionally, Google and Reddit are in unique bargaining positions that make the Agreement more attractive to each party. Unless some of the final terms of the Agreement are disclosed, we will not know to what extent Google can use IP data from Reddit, but it is unlikely that the training data will contain protected information since Google is likely interested in optimizing Gemini for a search function instead of creating any type of new material.

[1] Annelise Gilbert, Google-Reddit AI Deal Heralds New Era in Social Media Licensing, Bloomberg L. (Mar. 7, 2024, 5:06 AM), https://www.bloomberglaw.com/product/blaw/bloomberglawnews/ip-law/BNA%200000018e0ac4d20babce9accef790001?bna_news_filter=ip-law#.

[2] Id.

[3] Id.

[4] See Christine Lagorio-Chafkin, Report: Reddit Sets an IPO Date, Price Range, Inc. (Mar. 7, 2024), https://www.inc.com/christine-lagorio/report-reddit-sets-ipo-date-price-range.html; Kyle Wiggers, Reddit says it’s made $203M so far licensing its data, TechCrunch (Feb. 22, 2024, 5:27 AM), https://techcrunch.com/2024/02/22/reddit-says-its-made-203m-so-far-licensing-its-data/.

[5] Christine Lagorio-Chafkin, supra note 4; Kyle Wiggers, supra note 4.

[6] This blog refers to Gemini AI generally, which encompasses Google’s entire AI suite. However, Google only specifically mentions Vertex AI in its announcement of the Agreement. Vertex AI is specifically used for search functions. See Rajan Patel, An expanded partnership with Reddit, Google: The Keyword (Feb. 22, 2024), https://blog.google/inside-google/company-announcements/expanded-reddit-partnership/. Since the blog post only discusses Reddit’s use of Vertex AI, and does not mention which portion of Gemini will use the Reddit Data API, I will assume that Google will use the data to train all of the AI bots in the Gemini suite.

[7] See Imad Khan, Reddit’s $60 Million Deal With Google Will Feed Generative AI, CNET (Feb. 22, 2024, 2:52 PM), https://www.cnet.com/tech/services-and-software/reddits-60-million-deal-with-google-will-feed-generative-ai/.

[8] Parmy Olson, Google’s Gemini Looks Remarkable, But It’s Still Behind OpenAI, Bloomberg (last updated Dec. 7, 2023 11:13 AM), https://www.bloomberg.com/opinion/articles/2023-12-07/google-s-gemini-ai-model-looks-remarkable-but-it-s-still-behind-openai-s-gpt-4.

[9] Id.

[10] See Imad Khan, supra note 7.

[11] No. 20-CV-3010, 2023 WL 4999901 (D.D.C. Aug. 4, 2023).

[12] Id. at *1 (explaining that the lawsuit was brought by the United States and the Attorneys General of 38 states contending that Google violated Section 2 of the Sherman Act).

[13] Id. at *2.

[14] See id. at *2 (defining Specialized Vertical Providers as companies focused on niche markets).

[15] Rajan Patel, An expanded partnership with Reddit, Google: The Keyword (Feb. 22, 2024), https://blog.google/inside-google/company-announcements/expanded-reddit-partnership/; see also Jay Peters, ‘Reddit can survive without search’: company reportedly threatens to block Google, Verge (Oct. 20, 2023, 1:55 PM), https://www.theverge.com/2023/10/20/23925504/reddit-deny-force-log-in-see-posts-ai-companies-deals (discussing Reddit’s tensions with Google around “search crawling”).

[16] See Jay Peters, supra note 15 (discussing Reddit’s tensions with Google around “search crawling”).

[17] See Haleluya Hadero & David Bauder, The New York Times sues OpenAI and Microsoft for using its stories to train chatbots, Associated Press (Dec. 27, 2023, 5:35 PM), https://apnews.com/article/nyt-new-york-times-openai-microsoft-6ea53a8ad3efa06ee4643b697df0ba57.

[18] API stands for Application Programming Interface. Specifically, a Data API is a set of protocols that allows two different software applications to exchange data back and forth. See generally What is a Data API?, Gigaspaces (last visited Mar. 22, 2024), https://www.gigaspaces.com/data-terms/data-api#:~:text=Data%20APIs%20are%20designed%20to,providing%20a%20reliable%20user%20experience..

[19] Data API Terms, Reddit (last revised Apr. 18, 2023), https://www.redditinc.com/policies/data-api-terms.

[20] Id.

[21] Id.

[22] See id.; Annelise Gilbert, supra note 1 (reporting that Reddit will be paid $60 million annually by Google for the Agreement).

[23] Haleluya Hadero & David Bauder, supra note 17.

[24] Nitasha Tiku, Newspapers want payment for articles used to power ChatGPT, Wash. Post (last updated Oct. 20, 2023, 2:03 PM), https://www.washingtonpost.com/technology/2023/10/20/artificial-intelligence-battle-online-data/.

[25] See Recent Rulings in AI Copyright Lawsuits Shed Some Light, but Leave Many Questions, Perkins Coie (Dec. 14, 2023), https://www.perkinscoie.com/en/news-insights/recent-rulings-in-ai-copyright-lawsuits-shed-some-light-but-leave-many-questions.html.

[26] See Kadrey v. Meta Platforms, Inc., No. 23-cv-03417-VC, 2023 WL 8039640, at *1 (N.D. Cal. 2023) (noting that a theory that every output of an AI model trained on copyrighted material is an infringing derivative work is not correct; only outputs of the AI that “incorporate in some form” a portion of the copyrighted work constitute derivative infringement).

[27] 499 U.S. 340 (1991).

[28] See id. at 363 (holding that an arrangement of information lacks “the creative spark” required for copyright protection).

[29] See id.

[30] See supra note 5 and accompanying text.
