Data utilization remains a contentious issue within the realm of artificial intelligence. In an effort to address concerns, major industry players have been forming partnerships with social and content platforms. One notable example is the collaboration between Reddit and Google. In a recent development, Reddit is currently in discussions with Google to renegotiate their content-sharing agreement for AI training, which was initially established over a year ago for a reported $60 million annually.
The ongoing negotiations between the two tech giants are centered around two primary areas, as reported by Bloomberg. Firstly, Reddit is proposing a revised deal structure that shifts away from a fixed payment model to a more dynamic pricing approach. Under this new arrangement, Reddit’s compensation would be based on the frequency with which its content is referenced or utilized as a source for answers generated by AI platforms such as Google’s AI Overviews. Reddit’s leadership believes that the existing terms do not accurately reflect the true value of their data to AI companies.
Secondly, Reddit is also exploring a different type of partnership with Google that aims to drive more of the traffic it receives from the search engine towards active community engagement on its platform. Currently, users who discover Reddit content via Google often do not transition to become regular users of the Reddit platform itself. This poses a challenge for Reddit in terms of expanding its user base and generating fresh content for future AI training purposes. By fostering deeper user engagement, this potential partnership could provide a steady stream of high-quality data for AI models.
The ongoing discussions between Reddit and Google underscore the significance of Reddit’s data as a valuable resource for AI enterprises. Large-scale language models heavily rely on extensive datasets sourced from the internet, with Reddit’s distinctive format featuring in-depth, user-driven discussions across a wide array of topics being a frequently cited data source. Data indicates that Reddit is the most referenced domain for prominent AI tools like Perplexity and Google’s AI Overviews.
It’s worth noting that Reddit is not the only platform addressing the issue of fair compensation for data usage by AI models. Various content providers, including news publishers, have also been grappling with this dilemma. For instance, The New York Times has filed lawsuits against both OpenAI and Google, alleging unauthorized usage of its content. In a similar vein, Reddit has also taken legal action against Anthropic, a competitor of OpenAI, accusing the AI startup of unlawfully scraping its data for model training.
While the outcome of the negotiations between Reddit and Google is still pending, these discussions shed light on how content platforms are striving to establish equitable compensation mechanisms for their valuable content.