{"id":20690,"date":"2024-04-18T05:44:30","date_gmt":"2024-04-18T05:44:30","guid":{"rendered":"https:\/\/interface.media\/?p=20690"},"modified":"2024-04-18T05:44:37","modified_gmt":"2024-04-18T05:44:37","slug":"the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts","status":"publish","type":"post","link":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/","title":{"rendered":"The next generation of generative AI will be trained on Reddit threads and tumblr posts"},"content":{"rendered":"\n<p>Generative artificial intelligence (AI) companies like OpenAI, Google, and Microsoft are on the hunt for new training data. In 2022, a research paper warned that we could run out of high-quality data on which to train diffusion models and large language models (LLMs) <a href=\"https:\/\/techxplore.com\/news\/2023-11-ai.html\">as soon as 2026<\/a>. Since then, AI firms have reportedly found a potential source of new information: social media.\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-social-media-offers-vast-amounts-of-usable-training-data\">Social media offers \u201cvast\u201d amounts of usable training data<\/h3>\n\n\n\n<p>In February, it was revealed that the social media site Reddit had struck a deal with a large AI company. 
The <a href=\"https:\/\/arstechnica.com\/information-technology\/2024\/02\/your-reddit-posts-may-train-ai-models-following-new-60-million-agreement\/\">$60 million per year agreement<\/a> will see the company train its generative AI using content created by Reddit\u2019s users.\u00a0The buyer was later revealed to be Google, which is locked in a bitter AI race with OpenAI and Microsoft.<\/p>\n\n\n\n<p>This will allegedly provide Google with an &#8220;efficient and structured way to access the vast corpus of existing content on Reddit.&#8221;\u00a0<\/p>\n\n\n\n<p>The move caused significant controversy in the run-up to Reddit\u2019s expected public offering. A week later, social media platform Tumblr and blog hosting platform WordPress also announced that they would be <a href=\"https:\/\/bnnbreaking.com\/tech\/tumblr-wordpress-announce-user-data-sales-for-ai-training-privacy-concerns-arise\">selling their users\u2019 data<\/a> to Midjourney and OpenAI.\u00a0<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-race-for-ai-training-data\">The race for AI training data \u00a0<\/h3>\n\n\n\n<p>These developments mark an evolution of an existing trend. Increasingly, the AI industry is shifting from unpaid data scraping towards a model where the owners of data are paid for it. Recently, OpenAI was revealed to be paying between <a href=\"https:\/\/www.theverge.com\/2024\/1\/4\/24025409\/openai-training-data-lowball-nyt-ai-copyright\">$1 million and $5 million<\/a> a year to license copyrighted news articles from outlets like the <em>New York Times <\/em>and the <em>Washington Post <\/em>to train its AI models.\u00a0\u00a0<\/p>\n\n\n\n<p>In December 2023, OpenAI also signed an agreement with Axel Springer. The German publisher is being paid an undisclosed sum for access to articles published by Politico and Business Insider. 
OpenAI has also struck deals with other organisations, including the Associated Press, and is reportedly in licensing talks with CNN, Fox, and Time.\u00a0<\/p>\n\n\n\n<p>However, a content creation (or journalistic) organisation licensing out the content it creates and distributes is one thing. The sale of public <em>and private <\/em>user data generated on social media is an entirely different matter. Of course, such data is already sold and mined heavily for advertising purposes. Income from the sale of personal data makes up the majority of revenue at social media sites like Facebook. <\/p>\n\n\n\n<p>If social media content is mined to train the next generation of AI, it&#8217;s essential that user data is anonymised. This may be less of an issue on sites like Reddit and Tumblr, where user identities are already concealed. However, the race for AI training data continues to gather pace. Soon, AI companies may look towards less anonymised sites like Instagram and X (formerly Twitter).  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Social media sites are seeking new revenue by selling users\u2019 content to train generative AI models. 
<\/p>\n","protected":false},"author":480,"featured_media":20691,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"apple_news_api_created_at":"2024-04-18T05:44:35Z","apple_news_api_id":"8edcaa28-d39e-40f3-8bec-bae361f55fcf","apple_news_api_modified_at":"2024-04-18T05:44:35Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/AjtyqKNOeQPOL7LrjYfVfzw","apple_news_cover_media_provider":"image","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_cover_video_id":0,"apple_news_cover_video_url":"","apple_news_cover_embedwebvideo_url":"","apple_news_is_hidden":"","apple_news_is_paid":"","apple_news_is_preview":"","apple_news_is_sponsored":"","apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":[],"apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[3],"tags":[],"topic":[614],"class_list":["post-20690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-the-interface","topic-data-ai"],"acf":[],"apple_news_notices":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.6 (Yoast SEO v26.6) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>The next generation of generative AI will be trained on Reddit threads and tumblr posts - Interface<\/title>\n<meta name=\"description\" content=\"Social media sites are seeking new revenue by selling users\u2019 content to train generative artificial intelligence models.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The next 
generation of generative AI will be trained on Reddit threads and tumblr posts\" \/>\n<meta property=\"og:description\" content=\"Social media sites are seeking new revenue by selling users\u2019 content to train generative artificial intelligence models.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/\" \/>\n<meta property=\"og:site_name\" content=\"Interface\" \/>\n<meta property=\"article:published_time\" content=\"2024-04-18T05:44:30+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-18T05:44:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1254\" \/>\n\t<meta property=\"og:image:height\" content=\"836\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Dan Brightmore\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Dan Brightmore\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/\",\"url\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/\",\"name\":\"The next generation of generative AI will be trained on Reddit threads and tumblr posts - 
Interface\",\"isPartOf\":{\"@id\":\"https:\/\/interface.media\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg\",\"datePublished\":\"2024-04-18T05:44:30+00:00\",\"dateModified\":\"2024-04-18T05:44:37+00:00\",\"author\":{\"@id\":\"https:\/\/interface.media\/#\/schema\/person\/7c33499ca8e42b097028109cccb22748\"},\"description\":\"Social media sites are seeking new revenue by selling users\u2019 content to train generative artificial intelligence models.\",\"breadcrumb\":{\"@id\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#primaryimage\",\"url\":\"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg\",\"contentUrl\":\"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg\",\"width\":1254,\"height\":836,\"caption\":\"Shanghai,China-July 30th 2023: X (new Twitter), Threads, Facebook, YouTube, Instagram, WeChat, WhatsApp. Douyin(TikTok) and Sina Weibo app icons. 
Assorted online social media software brands\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/interface.media\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"The next generation of generative AI will be trained on Reddit threads and tumblr posts\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/interface.media\/#website\",\"url\":\"https:\/\/interface.media\/\",\"name\":\"Interface\",\"description\":\"Delivering World Class Content \u201cFrom Executive, For Executive\u201c\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/interface.media\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/interface.media\/#\/schema\/person\/7c33499ca8e42b097028109cccb22748\",\"name\":\"Dan Brightmore\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/interface.media\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e9ca282f0ef431735a64685769ad57886e24b074c4c58314392755fb79164164?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e9ca282f0ef431735a64685769ad57886e24b074c4c58314392755fb79164164?s=96&d=mm&r=g\",\"caption\":\"Dan Brightmore\"},\"url\":\"https:\/\/interface.media\/blog\/author\/dbrightmore\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"The next generation of generative AI will be trained on Reddit threads and tumblr posts - Interface","description":"Social media sites are seeking new revenue by selling users\u2019 content to train generative artificial intelligence models.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_GB","og_type":"article","og_title":"The next generation of generative AI will be trained on Reddit threads and tumblr posts","og_description":"Social media sites are seeking new revenue by selling users\u2019 content to train generative artificial intelligence models.","og_url":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/","og_site_name":"Interface","article_published_time":"2024-04-18T05:44:30+00:00","article_modified_time":"2024-04-18T05:44:37+00:00","og_image":[{"width":1254,"height":836,"url":"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg","type":"image\/jpeg"}],"author":"Dan Brightmore","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Dan Brightmore","Estimated reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/","url":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/","name":"The next generation of generative AI will be trained on Reddit threads and tumblr posts - 
Interface","isPartOf":{"@id":"https:\/\/interface.media\/#website"},"primaryImageOfPage":{"@id":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#primaryimage"},"image":{"@id":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#primaryimage"},"thumbnailUrl":"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg","datePublished":"2024-04-18T05:44:30+00:00","dateModified":"2024-04-18T05:44:37+00:00","author":{"@id":"https:\/\/interface.media\/#\/schema\/person\/7c33499ca8e42b097028109cccb22748"},"description":"Social media sites are seeking new revenue by selling users\u2019 content to train generative artificial intelligence models.","breadcrumb":{"@id":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#primaryimage","url":"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg","contentUrl":"https:\/\/interface.media\/wp-content\/uploads\/sites\/3\/2024\/04\/iStock-1591754867.jpg","width":1254,"height":836,"caption":"Shanghai,China-July 30th 2023: X (new Twitter), Threads, Facebook, YouTube, Instagram, WeChat, WhatsApp. Douyin(TikTok) and Sina Weibo app icons. 
Assorted online social media software brands"},{"@type":"BreadcrumbList","@id":"https:\/\/interface.media\/blog\/2024\/04\/18\/the-next-generation-of-generative-ai-will-be-trained-on-reddit-threads-and-tumblr-posts\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/interface.media\/"},{"@type":"ListItem","position":2,"name":"The next generation of generative AI will be trained on Reddit threads and tumblr posts"}]},{"@type":"WebSite","@id":"https:\/\/interface.media\/#website","url":"https:\/\/interface.media\/","name":"Interface","description":"Delivering World Class Content \u201cFrom Executive, For Executive\u201c","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/interface.media\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/interface.media\/#\/schema\/person\/7c33499ca8e42b097028109cccb22748","name":"Dan Brightmore","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/interface.media\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e9ca282f0ef431735a64685769ad57886e24b074c4c58314392755fb79164164?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e9ca282f0ef431735a64685769ad57886e24b074c4c58314392755fb79164164?s=96&d=mm&r=g","caption":"Dan 
Brightmore"},"url":"https:\/\/interface.media\/blog\/author\/dbrightmore\/"}]}},"_links":{"self":[{"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/posts\/20690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/users\/480"}],"replies":[{"embeddable":true,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/comments?post=20690"}],"version-history":[{"count":1,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/posts\/20690\/revisions"}],"predecessor-version":[{"id":20692,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/posts\/20690\/revisions\/20692"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/media\/20691"}],"wp:attachment":[{"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/media?parent=20690"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/categories?post=20690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/tags?post=20690"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/interface.media\/wp-json\/wp\/v2\/topic?post=20690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}