Lost Boy

The blog of @ldodds

Tag: webcrawler

Will AI hamper our ability to crawl the web for useful data?

As websites start to block Common Crawl, and as the project leans into its role in training LLMs, will it become harder to use data from the web for other purposes?

Leigh Dodds | Data, The Commons, Web | September 23, 2023 | 2 Minutes

This work is licensed under a Creative Commons Attribution 4.0 International License.