Software Heritage is a non-profit organization that provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria and is supported by UNESCO. The project is structured as a non-profit multi-stakeholder initiative.
Overview
The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole.
Software source code is collected by crawling code hosting platforms, such as GitHub, GitLab.com or Bitbucket, and package archives, such as npm or PyPI. It is then ingested into a special data structure, a Merkle DAG, that forms the core of the archive. Each artifact in the archive is associated with an identifier called a SWHID. In 2023, the expansion of SWHID was changed from Software Heritage identifier to software hash identifier.
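As an illustration of how such a content-derived identifier can be computed, the following minimal sketch builds a SWHID for a single file's contents. It assumes the version 1 identifier scheme, in which a content object is hashed in the same way as a Git blob (SHA-1 over a short header followed by the raw bytes) and the result is prefixed with `swh:1:cnt:`. The function name `content_swhid` is hypothetical and is not part of any Software Heritage library.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a SWHID-style identifier for raw file contents.

    Assumption: content objects are hashed like Git blobs, i.e. SHA-1
    over the header "blob <length>\\0" followed by the bytes themselves.
    """
    # Git-style header: object type, a space, the byte length, a NUL byte.
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    # "swh" is the scheme, "1" the scheme version, "cnt" the object type.
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    print(content_swhid(b"hello world\n"))
```

Because the identifier depends only on the bytes being hashed, two identical files obtained from different hosting platforms map to the same identifier. In a Merkle DAG, the identifiers of directories and revisions are computed in turn from the identifiers of the objects they reference.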
In order to increase the chances of preserving the Software Heritage archive over the long...