Exclusive-Multiple AI companies bypassing web standard to scrape publisher sites, licensing firm says

📆 15/12/45 08:38 PM
📰 Investing.com

Multiple artificial intelligence companies are circumventing a common web standard used by publishers to block the scraping of their content for use in generative AI systems, content licensing startup TollBit has told publishers.

A Wired investigation published this week found that Perplexity was likely bypassing efforts to block its web crawler via the Robots Exclusion Protocol, or "robots.txt," a widely accepted standard meant to determine which parts of a site are allowed to be crawled.

The News Media Alliance, a trade group representing more than 2,200 U.S.-based publishers, expressed concern about the impact that ignoring "do not crawl" signals could have on its members.
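As a minimal illustration of how the protocol works, the Python sketch below uses the standard library's `urllib.robotparser` to perform the check a compliant crawler makes before fetching a page. The robots.txt rules, the crawler names `ExampleAIBot` and `SomeSearchBot`, and the `publisher.example` URLs are all hypothetical. robots.txt is advisory: nothing in the protocol itself stops a crawler from skipping this check, which is the behavior described here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: it blocks a made-up
# AI crawler ("ExampleAIBot") from the whole site, and blocks every other
# crawler only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """The question a compliant crawler asks before requesting a URL."""
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleAIBot", "https://publisher.example/news/story.html"),
        ("SomeSearchBot", "https://publisher.example/news/story.html"),
        ("SomeSearchBot", "https://publisher.example/private/draft.html"),
    ]:
        print(agent, url, "allowed" if may_crawl(rp, agent, url) else "disallowed")
```

Because the `*` group only applies to agents without a group of their own, `SomeSearchBot` may fetch news pages but not `/private/`, while `ExampleAIBot` is refused everywhere.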

TollBit tracks AI traffic to publishers' websites and uses analytics to help both sides settle on fees for the use of different types of content. "What this means in practical terms is that AI agents from multiple sources are opting to bypass the robots.txt protocol to retrieve content from sites," TollBit wrote. "The more publisher logs we ingest, the more this pattern emerges."
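TollBit's statement does not describe how it analyzes publisher logs. The sketch below shows one simple way such a check could work, not TollBit's method: it assumes log entries are already parsed into hypothetical (user-agent, path) pairs and counts, per declared agent, the requests that robots.txt disallows for that agent. The robots.txt rules, agent names, and log records are invented for illustration.

```python
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallows a made-up AI agent from the whole site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""

# Hypothetical access-log records, already parsed into
# (user-agent string, requested path) pairs.
LOG_RECORDS = [
    ("ExampleAIBot/1.0", "/news/story-1.html"),
    ("ExampleAIBot/1.0", "/news/story-2.html"),
    ("OtherCrawler/2.1", "/news/story-3.html"),
]

def find_violations(robots_txt: str, records) -> Counter:
    """Count, per declared user agent, requests for paths robots.txt disallows for it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = Counter()
    for user_agent, path in records:
        # can_fetch matches the product token before any "/version" suffix,
        # case-insensitively, against the User-agent lines in robots.txt.
        if not parser.can_fetch(user_agent, path):
            violations[user_agent] += 1
    return violations

if __name__ == "__main__":
    for agent, count in find_violations(ROBOTS_TXT, LOG_RECORDS).items():
        print(f"{agent}: {count} disallowed request(s)")
```

This only flags agents that identify themselves honestly in the user-agent string; a crawler sending a generic browser string would pass unnoticed, so a real analysis would need more signals than the declared agent name.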

The AI companies use the content both to train their algorithms and to generate summaries of real-time information.
