GitHub Aims to Make Open Source Code Apocalypse-Proof in Arctic Vault

GitHub wants to make sure its entire warehouse of open source code survives an apocalypse by burying it deep within an Arctic vault as one of several preservation strategies.

GitHub, which Microsoft purchased last year for US$7.5 billion, last week announced that it is creating the GitHub Arctic Code Vault as a data repository for the existing Arctic World Archive. The AWA is a very-long-term archival facility about 0.16 miles deep in the permafrost of an Arctic mountain.

Located in a decommissioned coal mine in the Svalbard archipelago, the archive is closer to the North Pole than the Arctic Circle. GitHub will capture a snapshot of every active public repository on 02/02/2020 and preserve that data in the Arctic Code Vault.

Svalbard is regulated by the international Svalbard Treaty as a demilitarized zone. It is the location of the world’s northernmost town and is one of the most remote and geopolitically stable human habitations on Earth.

Future historians will be able to learn about us from open source projects and metadata, and might regard the current age of open source ubiquity, volunteer communities, and Moore’s Law as historically significant, according to GitHub.

“The human race has developed a lot of ways of destroying itself, ranging from nuclear holocaust to global warming,” observed Steve Foley, CEO of Bulk Memory Cards.

“So it’s probably a good idea to preserve what we know, somewhere, on the off chance a few people survive Armageddon,” he told LinuxInsider.

Not an Isolated Effort

GitHub has partnered with numerous organizations to ensure that its open source data will be safe, no matter what threatens its continued existence. GitHub considers its vast collection of open source projects a cornerstone of modern civilization.

The organization wants open source technology to survive climate change, political strife, and whatever else may result from the current general state of global affairs. As part of its plans, GitHub will tap into Microsoft’s Project Silica as another Doomsday storage option.

Project Silica will provide further help to archive all active public repositories for more than 10,000 years. The plan calls for writing them into quartz glass platters using a femtosecond laser. Microsoft recently announced that it had completed a proof-of-concept test of the glass storage technology by storing a copy of the 1978 Superman movie on it.

GitHub has partnered with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, the Arctic World Archive, Microsoft Research, the Bodleian Library and Stanford Libraries to ensure the long-term preservation of the world’s open source software. The goal is to store multiple copies across various data formats and locations.

Computer hardware can outlive most of today’s storage media, especially older ones and/or those with mask ROM. A range of possible futures exists where working modern computers exist but their software has largely been lost to bit rot. The Archive Program will preserve that software, according to GitHub.

The Arctic vaults were not constructed for the sole purpose of supporting GitHub’s plans, but the preservation of software code is a major focus.

“Various other items from around the world are also stored there, such as scientific and historical documents and valuable artwork. There is also a nearby seed vault ensuring the future of crops,” noted Foley.

An apocalypse vault is one of those things you do not need until you need it. The hope is that it never will be necessary, but if the option is on the table, it makes sense to utilize it, he suggested.

How It Works

For the Arctic World Archive, GitHub will store the data on 3,500-foot film reels, provided and encoded by Piql, a Norwegian company specializing in very-long-term data storage. The film technology relies on silver halides residing on polyester.

The result is expected to provide a minimum lifespan of 500 years to the archived data. Simulated aging tests indicate Piql’s film will last twice as long, allowing the data to survive a millennium.

The stored data will be QR-encoded, and a human-readable index and guide will itemize the location of each repository and explain how to recover the data.
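As a rough illustration of how an index can map repositories to positions on a reel, here is a minimal Python sketch. It is not Piql's or GitHub's actual format: the frame size, the field names, and the functions `split_into_frames`, `build_archive` and `recover` are all hypothetical, and the step of rendering each frame as a QR code is left out.

```python
import hashlib

FRAME_SIZE = 1024  # hypothetical payload size per frame, in bytes


def split_into_frames(data: bytes, frame_size: int = FRAME_SIZE) -> list[bytes]:
    """Cut a byte string into consecutive frames; the last one may be shorter."""
    return [data[i:i + frame_size] for i in range(0, len(data), frame_size)]


def build_archive(repos: dict[str, bytes]) -> tuple[list[bytes], list[dict]]:
    """Return every frame in reel order plus one index entry per repository."""
    frames: list[bytes] = []
    index: list[dict] = []
    for name in sorted(repos):
        repo_frames = split_into_frames(repos[name])
        index.append({
            "repo": name,
            "first_frame": len(frames),       # position of the first frame on the reel
            "frame_count": len(repo_frames),  # how many frames belong to this repository
            "sha256": hashlib.sha256(repos[name]).hexdigest(),
        })
        frames.extend(repo_frames)
    return frames, index


def recover(name: str, frames: list[bytes], index: list[dict]) -> bytes:
    """Look a repository up in the index, rejoin its frames and verify the checksum."""
    entry = next(e for e in index if e["repo"] == name)
    start = entry["first_frame"]
    data = b"".join(frames[start:start + entry["frame_count"]])
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data
```

In this sketch the index plays the role of the human-readable guide: it tells a reader which frames to fetch for a given repository, and the checksum lets them confirm the data was reassembled correctly.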

Is Long-Term Storage Really Needed?

The answer depends on several factors. Code is like writing. Some of it is great and important, and it should be preserved, said Chris Nicholson, CEO of Skymind.

“Storing all of GitHub’s open source code in a vault in the Arctic sounds both useful and wasteful,” he told LinuxInsider.

“There are some great projects and also a ton of bad, useless ones. I think they should cull it,” Nicholson said. “The survivors of a nuclear holocaust will not have the time or inclination to wade through 10,000 re-implementations of a Javascript Web tool.”

It also depends on the nature of the apocalyptic event. For example, managers of the seed vault already have made adjustments based on how climate change is impacting the Arctic, noted Bulk Memory Cards’ Foley.

“The GitHub plan is designed to preserve the data for 1,000 years; even if the entire planet loses electricity, it can be read by a magnifying glass,” he said.

Skeptical Perspective

A storage program for computer code is necessary if you believe that in a post-apocalyptic hellscape someone will care enough about open source coding to mount an expedition to the Arctic, said Charles King, principal analyst at Pund-IT.

The odds aren’t terribly good that GitHub’s plan will actually work, he suggested.

First, someone would have to look for, find, and gain access to the repository. Then there is the matter of the discoverers decoding instructions, starting up power supplies, getting systems up and running, and learning to code.

“The farther away you get from the day the materials are stored, the less likely that the rosy outcome GitHub envisions is likely to occur,” King told LinuxInsider.

GitHub’s plan is almost certainly a public relations play designed to generate buzz for the company, said Phil Strazzulla, founder of Select Software Reviews.

“Think about all of the servers that are stored around the world that hold repositories of this code. The only way the Arctic vault would be useful is if the entire human civilization was essentially wiped out, and then somehow another form of life eventually figured out how to find and analyze this code,” he told LinuxInsider.

The bottom line, in his view, is that there is no future scenario in which saving open source technology would become useful, even if you believe doomsday scenarios are highly likely.

“This is more a calculus of how much the effort will cost relative to the amount of press that it will generate,” Strazzulla said.

Back to the Future

GitHub’s plan could be vital or superfluous. It suggests one of two outcomes for the long-term value of open source technology.

It depends on how you view the future, observed Rob Enderle, principal analyst at the Enderle Group.

We do seem to be ignoring the risks that could end the human race, both natural and man-made. This code storage effort would offset some of that risk, he pointed out.

“The effort can work, but it will depend on the nature of the catastrophe,” Enderle told LinuxInsider.

For example, if the catastrophe wipes out most life, this effort can work. If it wipes out all life, we are done regardless.

“Open source should make the effort more viable,” Enderle said, “because the needed skills will be more prevalent and thus more likely to survive. This could significantly improve the chances of survival post-catastrophe.”

Opposing Views on Values

It is hard to say what the storage efforts suggest about the value of open source to a recovering world, Pund-IT’s King argued. To be charitable, it is laudable that GitHub cares enough about its code to mount so complex an effort.

“From a more cynical viewpoint, the company may simply be trying to divert attention from employees who continue to resign over GitHub’s contract with Immigration and Customs Enforcement,” he remarked.

Big Question: Will It Work?

One of the big risks with this plan is that code depends on a whole software stack: hardware, assembly language, and a certain form of electricity. The chips that code runs on are really incredibly complex, noted Skymind’s Nicholson.

“You would need all that underlying infrastructure to run the code GitHub stores. I hope GitHub will also include some model hardware in its vault. It would be too much to ask to include a fab,” he said.

For technology’s survival, open source stands out for two reasons:

First, you can increase the positive feedback loops between the people who write code and those who use it. That leads to much better code quality compared to closed-source projects with limited users looking over the source.

“The importance of that cannot be understated,” said Nicholson.

Second, open source code minimizes legal risk. That is also extremely important, he added, noting that some great closed-source code probably should go into the vault.

“But why risk a lawsuit?” Nicholson reasoned. “Open source code really is moving society forward in a lot of ways, based on the work of a few dedicated teams and a relatively small number of core committers.”

Jack M. Germain

Jack M. Germain has been an ECT News Network reporter since 2003. His main areas of focus are enterprise IT, Linux and open source technologies. He has written numerous reviews of Linux distros and other open source software.
