Just a few blocks from the Golden Gate Bridge in San Francisco, USA, a white building with eight massive columns out front is quietly carrying out the greatest mission in the history of the Internet: storing more than one trillion web pages, or more than 100,000 terabytes of data, enough to fill tens of millions of DVDs.
The building, a century-old former Christian Science church, is now home to the Internet Archive, the world's largest nonprofit digital library.

Internet Archive headquarters in San Francisco (USA). (Photo: KALW)
The Bible readings of old have given way to the hum of cooling fans from thousands of servers installed in the middle of the main sanctuary, beneath brilliant stained-glass windows.
It is here that the Wayback Machine, used by millions of people every day, has preserved nearly three decades of Internet history. This October, the vast repository officially passed the milestone of one trillion web pages archived since Brewster Kahle, founder of the Internet Archive, began the project in 1996.
Back then, a year's worth of web data took up only about 2 terabytes, roughly the storage of a single iPhone today. Now the Wayback Machine collects nearly 150 terabytes every day, the equivalent of hundreds of millions of new web pages.
Brewster Kahle, silver-haired and perpetually smiling like an enthusiastic science teacher, chose to buy the old church because its facade resembled his organization's logo: ancient Greek columns, a symbol of longevity.
“We want to remind people that the Internet also needs a modern-day 'Great Library of Alexandria,'” he said, sitting on one of the wooden pews left over from the building's days as a church.
A repository of “digital memories”

Brewster Kahle, founder of the Internet Archive. (Photo: AP)
The Wayback Machine doesn't just take screenshots of websites, past and present; it saves their HTML, CSS, and JavaScript so it can recreate each page as it looked at the time of capture, even if the original server has long since been shut down.
As a result, journalists can recover deleted articles, researchers can compare what the government published from one administration to the next, and ordinary users can revisit beloved websites that have disappeared, such as GeoCities, Gawker, and MTV News.
As artificial intelligence (AI) blurs the line between real and fake, the Internet Archive has taken on another mission: archiving AI-generated content.
Every day, the library's engineers and librarians write hundreds of questions based on breaking news, feed them to ChatGPT, Gemini, and other AI models, and store both the questions and the answers. The AI summaries that appear at the top of Google search results are carefully archived as well.

Founder Brewster Kahle makes no secret of his reasons, saying: “Libraries are always the first target whenever a new administration comes to power.” (Photo: Amber Hughes)
To guard against natural disasters and political risks, copies of the data are stored in many locations around the world. Kahle makes no secret of the reason: “Libraries are always the first target whenever a new government comes to power. We learn from history to design for the future.”
In 2017, during Donald Trump's first term, and again more recently in his second, a series of US government websites were scrubbed of information about climate change, LGBTQ+ rights, and the achievements of Black military personnel. Thanks to the Internet Archive, journalists were able to recover the deleted material accurately.
Home to “cyberpunk spirits”
Entering the Internet Archive headquarters, visitors can easily imagine themselves lost in a living museum of the Internet. More than 100 terracotta statues, each about 1 meter tall and depicting an employee who has worked here for at least three years, stand in rows like the terracotta army in the mausoleum of Qin Shi Huang.

About 200 people work at the Internet Archive. (Photo: CNN)
Elsewhere, the Internet Archive's in-house book scanners work tirelessly, turning the pages of each physical book and scanning them one by one, while the whole process is live-streamed on YouTube over a soothing lo-fi soundtrack.
Nearby, a record player from the 1920s still spins, playing classic tunes amid an array of other vintage media devices: microfilm readers, old CD players, even satellite TV receivers from the early days of digital technology. Together they create a space that is both nostalgic and modern, where every format humanity has used to record information is respected and protected.
The roughly 200 people here, from programmers to librarians, are all “cyberpunks,” as one guest put it at a party celebrating the Wayback Machine's trillionth web page. They work not for high salaries but out of a conviction that if no one did this archiving, humanity's entire digital memory would vanish overnight.
Kahle stresses that the Internet Archive is neither a museum that tells a single story nor an arbiter of truth. It is simply a resource from which anyone can write their own story, drawing on an intact digital past. And with one trillion pages already saved, the work of protecting humanity's collective memory has only just begun.
Source: https://vtcnews.vn/nha-tho-co-luu-giu-hon-1-000-ty-trang-web-toan-cau-ar988112.html