
The Wikimedia Foundation, the non-profit parent of Wikipedia, shares similar sentiments, explaining that maintaining over 65 million articles already requires careful budget allocation, a strain the current turbulence has only exacerbated. A spokesperson told 404 Media that it sees "the primary impact in the purchase of memory and hard drives but also in terms of lead times on server deliveries and our capacity to place future orders."
Beyond the shortage, the AI boom has affected archival efforts in another way that's likely not reversible: scraping. LLMs are trained on huge chunks of data, often acquired from the internet and sometimes even illegally. As you'd expect, a lot of sites don't appreciate being scraped indiscriminately to become part of some AI's training material, so they've put up countermeasures that prevent companies from doing so.
Archiving the internet shares the same first step: it needs to extract information in order to preserve it, but website operators have been increasingly blocking such efforts. Bots that would otherwise scrape a site just to produce a snapshot for educational purposes are now being treated the same way as bots looking to gather information for artificial intelligence, whether the operators intend that or not.
People in the community who contribute to preservation efforts are also having to think twice about what to preserve. Since hard drives are so expensive now, even enthusiasts who are part of the r/DataHoarders subreddit are doom-posting about how they've stopped archiving entirely while they wait for prices to level out. You can occasionally find deals, but seeing a large-capacity drive at MSRP has become nearly impossible.