SCOOP: From DOIs to ORCIDs — How Persistent Identifiers Work (and How to Make Them Work for You)
June 26, 2020
The average person spends five months of her life looking for misplaced things. One might say this is one of the great common experiences of humanity: look at the recent outpouring of social media sympathy over the cleaners at a Suffolk library who took down bays of books to dust them and then reshelved them in order of size.
Now imagine the Internet as a library with 4.2 billion books on 8.2 million shelves, managed by hundreds of millions of people (mostly volunteers) who, for various reasons (running out of shelf space, rising shelving costs, a desire for glass doors on the shelves…), ceaselessly shuffle books around with no coordinated plan. The result is the “404 Not Found” error you encounter on the Web when the folder a piece of content once sat in has been moved or renamed, or when the content is now nested in a place you (and your browser) didn’t expect.
Normally, when you go to a website, you get there by typing in its URL (Uniform Resource Locator) or clicking a link that contains the URL. The URL points to a real hard drive somewhere that contains the files you want to access. Let’s say you clicked the link in last month’s SCOOP column to the report on COVID-19 and Libraries at https://crsreports.congress.gov/product/pdf/LSB/LSB10453
By clicking that link, you told your browser you wanted it to contact the server (a computer that provides files to other computers) called congress.gov, go into a section of that server called crsreports, then into a folder named “product,” a folder named “pdf,” a folder named “LSB,” and finally to give you the file named “LSB10453.” (This is a slight simplification, but it serves our present purpose.) This works well as long as no one changes anything. If someone decides the “LSB” folder is getting too crowded and creates subfolders, though, or realizes the report number is wrong and changes it to 10454, your browser will get stuck looking for a folder or file name that doesn’t exist anymore or isn’t where the URL says it should be. In technical terms, this is called “link rot,” and it’s where a persistent identifier (sometimes called a “PID”) comes in.
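To make the folder-by-folder reading concrete, here is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. In the real URL, "crsreports" is actually part of the host name (a subdomain of congress.gov), which is the simplification the paragraph mentions. The `files_on_server` set is a hypothetical stand-in for a server whose "LSB" folder has been reorganized.

```python
from urllib.parse import urlsplit


def describe_location(url: str) -> tuple[str, list[str]]:
    """Split a URL into the host name and the folder/file names in its path."""
    parts = urlsplit(url)
    # Drop the empty strings produced by leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return parts.hostname or "", segments


url = "https://crsreports.congress.gov/product/pdf/LSB/LSB10453"
host, segments = describe_location(url)
print(host)      # crsreports.congress.gov
print(segments)  # ['product', 'pdf', 'LSB', 'LSB10453']

# Hypothetical: someone splits the crowded "LSB" folder into yearly subfolders.
files_on_server = {("product", "pdf", "LSB", "2020", "LSB10453")}
# The old path is no longer on the server, so the link has "rotted".
print(tuple(segments) in files_on_server)  # False
```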
PIDs for Persistence
In the example above (and in all URLs), congress.gov is actually a kind of nickname for a server whose real name is an ungainly string of numbers called an IP address. This nickname is called a “domain name,” and it is mapped to the IP address in a registry maintained by “registrars” designated by the Internet Corporation for Assigned Names and Numbers (ICANN). This serves two purposes. A domain name is much easier to remember and to type than a straight IP address is, and it allows websites to move to new servers when they upgrade systems, need more storage space, etc. All they have to do is tell their registrar to update the record to point their domain name to a new IP address. As long as they keep the file structure the same on the new server, all the links to their website will still work.
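The domain-name indirection can be modelled in a few lines. This is a toy, not real DNS (which is a distributed, cached system; Python's `socket.getaddrinfo` would perform an actual lookup over the network). The `records` dictionary and `ip_for` function are hypothetical, and the addresses come from ranges reserved for documentation.

```python
from urllib.parse import urlsplit

# Hypothetical name-to-address records. 192.0.2.0/24 and 198.51.100.0/24
# are reserved for documentation, so these point at no real machine.
records = {"library.example.org": "192.0.2.10"}


def ip_for(url: str) -> str:
    """Return the IP address that the URL's domain name currently points to."""
    return records[urlsplit(url).hostname]


link = "https://library.example.org/reports/annual.pdf"
print(ip_for(link))  # 192.0.2.10

# The site moves to a new server: the owner updates a single record...
records["library.example.org"] = "198.51.100.25"
# ...and the unchanged link now reaches the new machine.
print(ip_for(link))  # 198.51.100.25
```

The link string never changes; only the record behind the name does, which is exactly the property a persistent identifier extends from whole servers to individual files.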
A persistent identifier does the same thing, but for individual files instead of servers. One of the most familiar persistent identifiers is the DOI (Digital Object Identifier). Let’s look at an example from the most recent issue of Theological Librarianship: https://doi.org/10.31046/tl.v13i1.560
This looks a lot like the URL in our last example, but what’s going on behind the scenes is a little different. Your browser is still contacting a server (in this case, doi.org) and requesting information, but the information isn’t the file itself. Instead, it’s a metadata record containing information about the file and, most importantly, the file’s location. When you click this link, a piece of software on the server called a “resolver” looks at the metadata record, pulls that location, and provides it to your browser, which then sends you to the TL website to read “Using the Anti-racism Digital Library and Thesaurus to Understand Information Access, Authority, Value and Privilege.”
That article is very timely, but so is its DOI, because TL is moving this summer. Right now, that PDF resides in a folder on a server called theolib.atla.com. By October, it will live in a folder on serials.atla.com/theolib. If you had only a URL, the link would become useless in a couple of months. When we move the article, though, we will ask our friends at our DOI registration agency, Crossref, to update the metadata in the DOI record to contain the new location, so the DOI I just gave you will still bring you to that same article every time you click it. As long as the publisher of a particular item works with its registration agency to keep the metadata updated, the same link can keep working through server moves, changes to folder structures, renamings, etc.
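The resolver-plus-update arrangement described in the last two paragraphs can be imitated with a dictionary. This is a deliberately simplified sketch: `DoiRecord`, `resolve`, and `update_location` are hypothetical names, the URLs are placeholders rather than the article's real locations, and a real resolver such as doi.org answers your browser with an HTTP redirect to the registered URL instead of returning a string.

```python
from dataclasses import dataclass


@dataclass
class DoiRecord:
    """A drastically simplified, hypothetical DOI metadata record."""
    title: str
    url: str  # where the item currently lives


# Hypothetical registry contents. The URLs are placeholders, not real article paths.
registry = {
    "10.31046/tl.v13i1.560": DoiRecord(
        title=("Using the Anti-racism Digital Library and Thesaurus to Understand "
               "Information Access, Authority, Value and Privilege"),
        url="https://old-server.example.org/theolib/560.pdf",
    ),
}


def resolve(doi: str) -> str:
    """Look up the record and return the location it currently holds.

    DOI names are case-insensitive, so the lookup key is lowercased.
    """
    return registry[doi.lower()].url


def update_location(doi: str, new_url: str) -> None:
    """What a publisher asks its registration agency to do after a move."""
    registry[doi.lower()].url = new_url


doi = "10.31046/tl.v13i1.560"
print(resolve(doi))  # the old placeholder location
update_location(doi, "https://new-server.example.org/theolib/560.pdf")
print(resolve(doi))  # same DOI, new location
```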
DOIs are some of the most common PIDs, but they are not the only kind. In fact, DOIs are themselves just one implementation of a more general system called the Handle System, which is also used on its own in many repositories. There are also many different registration agencies for PIDs, just as there are many registrars for domain names. Finally, PIDs have many uses that go well beyond simply providing a stable link.
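Because every DOI is a handle, it has the handle structure: a prefix and a suffix separated by the first slash. For DOIs the prefix begins with `10.` and identifies the registrant, while the registrant chooses the suffix. DOI names are case-insensitive, and the general Handle proxy at hdl.handle.net can resolve them just as doi.org can. The sketch below, with hypothetical helper names, pulls a bare DOI out of the forms it commonly appears in and builds links through both resolvers.

```python
from urllib.parse import quote

# Common wrappers around a DOI in citations and links (matched case-insensitively).
WRAPPERS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def parse_doi(text: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a DOI, with any resolver URL or 'doi:' label removed."""
    name = text.strip()
    for wrapper in WRAPPERS:
        if name.lower().startswith(wrapper):
            name = name[len(wrapper):]
            break
    # Like any handle, a DOI splits at its first slash.
    prefix, slash, suffix = name.partition("/")
    if not prefix.startswith("10.") or not slash or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return prefix, suffix


def resolver_links(text: str) -> list[str]:
    """Build links via doi.org and via the general Handle proxy.

    quote() percent-encodes everything except letters, digits, '_.-~' and '/'.
    """
    prefix, suffix = parse_doi(text)
    name = quote(f"{prefix}/{suffix}", safe="/")
    return ["https://doi.org/" + name, "https://hdl.handle.net/" + name]


def same_doi(a: str, b: str) -> bool:
    """DOI names are case-insensitive, so compare them lowercased."""
    return parse_doi(a.lower()) == parse_doi(b.lower())


print(parse_doi("doi:10.31046/tl.v13i1.560"))  # ('10.31046', 'tl.v13i1.560')
print(resolver_links("10.31046/tl.v13i1.560"))
print(same_doi("10.31046/TL.V13I1.560", "https://doi.org/10.31046/tl.v13i1.560"))  # True
```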
Other Uses for PIDs
At Atla Open Press, we believe in open access, and so some of our favorite PID uses are for making OA resources easier to find. URLs are location-dependent, so if the same article is hosted on six different servers, it will have six different URLs. Persistent links based on metadata records, however, don’t have that limitation; the same record can hold multiple URLs, all associated with the same identifier. This means that the DOI record can resolve to a different URL if one becomes a dead link, and also that different resolvers can prioritize different URLs. This is, in essence, how Unpaywall works. It can seem like magic when the plugin lights up to tell you there is an OA version of the article you are looking at elsewhere, but at heart it is a DOI resolver that prioritizes resolving the DOI to a URL at an OA source.
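A record that holds several locations lets different resolvers make different choices. The sketch below is a toy under stated assumptions (a made-up DOI, placeholder URLs, and an `is_live` check you would supply yourself); it illustrates the idea the paragraph describes, not how Unpaywall is actually built.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Location:
    url: str
    open_access: bool


# Hypothetical record: one made-up DOI with two copies of the same article.
doi = "10.5555/example.42"
locations = {
    doi: [
        Location("https://publisher.example.com/paywalled/42", open_access=False),
        Location("https://repository.example.edu/items/42", open_access=True),
    ],
}


def resolve(doi: str, prefer_open: bool = False,
            is_live: Callable[[str], bool] = lambda url: True) -> str:
    """Return the first working URL for a DOI.

    A plain resolver keeps the record's order; an OA-minded one moves open
    copies to the front (sorted() is stable, so ties keep their order).
    """
    candidates = locations[doi.lower()]
    if prefer_open:
        candidates = sorted(candidates, key=lambda loc: not loc.open_access)
    for loc in candidates:
        if is_live(loc.url):
            return loc.url
    raise LookupError(f"no working location for {doi}")


print(resolve(doi))                    # publisher copy
print(resolve(doi, prefer_open=True))  # repository copy
# If the publisher link dies, the same DOI still reaches a working copy.
print(resolve(doi, is_live=lambda url: "publisher" not in url))
```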
Routing traffic through a single identifier, rather than linking directly to different URLs, is also useful for metrics, enabling more comprehensive counts of the number of visits, downloads, and citations an item receives. Research Data Netherlands has a short video explaining how persistent identifiers underpin contemporary metrics.
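Why a shared identifier helps with counting can be seen in a small sketch: events logged on different hosts are totalled per DOI rather than per URL. The event log, hosts, and DOIs are made up, and real usage-metrics services apply far more careful rules (such as filtering out automated traffic).

```python
from collections import Counter

# Hypothetical usage events: (host that logged it, DOI, kind of event).
events = [
    ("publisher.example.com", "10.5555/example.42", "view"),
    ("repository.example.edu", "10.5555/EXAMPLE.42", "download"),
    ("publisher.example.com", "10.5555/example.42", "download"),
    ("aggregator.example.net", "10.5555/example.99", "view"),
]


def totals_by_doi(events: list[tuple[str, str, str]]) -> Counter:
    """Count events per (DOI, event kind), ignoring which host recorded them.

    DOIs are case-insensitive, so they are lowercased before counting.
    """
    return Counter((doi.lower(), kind) for _host, doi, kind in events)


counts = totals_by_doi(events)
print(counts[("10.5555/example.42", "download")])  # 2, combined across two hosts
```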
Many innovative practices take advantage of the versatility of PIDs. The OECD assigns DOIs to Excel spreadsheets containing the datasets for each chart and graph in their publications, and then includes those DOIs alongside the published visualizations. The theology journal CURSOR moderates public comments on articles and assigns DOIs to the most thoughtful and well-researched. This makes it easier to cite comments and, in the process, helps to legitimize public comments as a form of scholarship. PIDs thus not only make traditional scholarship easier to find, but can also shift perceptions about what constitutes an enduring contribution to the literature.
PIDs for People
To recognize scholarship, however, we also have to be able to find scholars persistently. Researchers with common names can be difficult to locate, and many scholars’ names change between publications (even something as simple as including or leaving out a middle initial can skew search results). The effect is much like renaming an article’s PDF: someone goes looking for a resource and comes up empty-handed.
PIDs for people are therefore growing in popularity. One of the most common is the ORCID iD, a unique, resolvable sixteen-character identifier (the last character is a check digit, which can be an “X”) that many journals now accept as part of submission metadata. Together with DOIs for articles, this enables automated systems to make semantic connections between authors and their research and to aggregate bibliographies of researchers’ activity despite differing name forms or similarity to other names. Just as DOIs are not the only object identifiers, ORCIDs are not the only personal identifiers. Many metrics services, for instance, have implemented their own, such as the ResearcherID used by Web of Science or the Scopus Author ID.
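The final character of an ORCID iD is computed with the ISO 7064 MOD 11-2 algorithm that ORCID documents, which lets software catch single-digit typos and most transposed digits before looking anything up. A minimal Python sketch follows; the function names are hypothetical, and passing the check only means an iD is well formed, not that it is registered to anyone. The sample iD 0000-0002-1825-0097 is the one used in ORCID's own documentation.

```python
import re

# Four groups of four characters; only the very last may be "X".
ORCID_PATTERN = re.compile(r"(?:[0-9]{4}-){3}[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A value of 10 cannot be written as one digit, so it becomes "X".
    return "X" if result == 10 else str(result)


def is_valid_orcid(text: str) -> bool:
    """True if text is a well-formed ORCID iD (optionally an orcid.org URL)
    whose last character matches the computed check character."""
    orcid = text.strip().removeprefix("https://orcid.org/")
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))                    # True
print(is_valid_orcid("https://orcid.org/0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))                    # False: wrong check digit
```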
PIDs for You
Persistent identifiers aren’t a universal feature of the research landscape yet, but their importance is growing. Here are a few ways to put them to work for you:
- Always see if a DOI or other PID is available for a source you are citing and, if it is, use it. This protects your publications from “link rot,” ensuring that readers can access your references long into the future.
- As a publisher, consider assigning PIDs to your outputs and, as an author, check if a journal or press you’re considering publishing with will assign a PID to your work. In both cases, this encourages the use of your publication as a source (many publishers and projects now prefer references that can be linked with a PID) and can improve your metrics.
- As a researcher, consider registering an ORCID or other PID for yourself. These are free in most cases and easy to sign up for, and they make you and your work much more visible online.
- This interview with Anna Tolwinska from Crossref explains how to get started with DOIs at a journal, providing insight into the process for registering and acquiring them.
- A good breakdown of the differences between DOIs and the Handle system they are built on (and which is often used independently) is available at doi.org.
- R-Bloggers offers advice on getting a DOI for your computer code.
- In Code4Lib, Lukas Koster provides a good summary of persistent identifiers for heritage objects, including ARKs and other systems designed for archives, as well as guidance on making appropriate choices between them.
- Tabish Virani has compiled a useful annotated bibliography of resources on PIDs, with something for readers of every level, from first introductions to advanced treatments.
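If you manage many references at once, the first suggestion in the list above (use a DOI whenever one exists) can be partly automated. This sketch, with a hypothetical function name and made-up citation strings, finds DOIs in citation text and rewrites them as https://doi.org/ links. The pattern is adapted from one Crossref has published for matching most modern DOIs, so some older or unusual DOIs will be missed.

```python
import re

# Adapted from a pattern Crossref has suggested for most modern DOIs,
# left unanchored so it can search inside longer citation text.
DOI_IN_TEXT = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def doi_links(reference: str) -> list[str]:
    """Return every DOI found in a citation string as an https://doi.org/ link."""
    links = []
    for match in DOI_IN_TEXT.finditer(reference):
        # Trim sentence punctuation that the pattern can swallow at the end.
        doi = match.group().rstrip(".,;:")
        # Drop closing parentheses that belong to the surrounding text, not the DOI.
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        links.append("https://doi.org/" + doi)
    return links


references = [
    "Hypothetical citation, see doi:10.31046/tl.v13i1.560.",
    "Another hypothetical work (https://doi.org/10.5555/example.42).",
    "A source with no DOI, cited by URL only.",
]
for ref in references:
    print(doi_links(ref))
```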
The SCOOP, Scholarly COmmunication and Open Publishing, is a monthly column published to inform Atla members of recent developments, new resources, or interesting stories from the realm of scholarly communication and open access publishing.