As websites start to block Common Crawl, and as the project leans into its role in training LLMs, will it become harder to use data from the web for other purposes?