
The Wikimedia Foundation, the non-profit parent of Wikipedia, shares similar sentiments. It explains that maintaining more than 65 million articles already requires careful budget allocation, and the current turbulence has only made that harder. A spokesperson told 404 Media that it sees "the primary impact in the purchase of memory and hard drives but also in terms of lead times on server deliveries and our capacity to place future orders."
Beyond the shortage, the AI boom has affected archival efforts in another way that is likely irreversible: scraping. LLMs are trained on huge amounts of data, often gathered from the internet and sometimes even acquired illegally. As you'd expect, many sites don't appreciate being scraped at will to become part of some AI's training material, so they've put up countermeasures to stop companies from doing so.
Archiving the internet starts with the same first step: information has to be extracted before it can be preserved. However, website operators have increasingly been blocking such efforts. Bots that scrape a site only to produce a snapshot for educational purposes are now treated the same way as bots gathering data for artificial intelligence, whether intentionally or not.
People in the community who contribute to preservation efforts are also having to think twice about what to preserve. Because hard drives are now so expensive, even enthusiasts in the r/DataHoarders subreddit are doom-posting that they've stopped archiving entirely while they wait for prices to level out. You can occasionally find deals, but spotting a large-capacity drive at MSRP has become nearly impossible.