October 3, 2017
Shlomo Sanders, CTO, Ex Libris
This article is an update of the March 2015 blog “Linked Library Data: Making It Happen.” Indeed, a lot has happened since then.
One of the things that has not changed since then is that libraries still rely heavily on integrated systems for managing and delivering library services. These systems encompass a wide range of services and have multiple components empowered by a central catalog, whose records are still usually in MARC 21 format. Yes, MARC has not gone away in the last two years. I think it is safe to say that the consensus is that any linked data library must continue to support MARC for many years to come and at the same time endeavor to support new and improved workflows that incorporate linked data.
The Ex Libris Alma® library management system now has over 1,000 signed institutions. This is a significant number and has the potential to create a large, uniform, linked data library community. As an example, Alma will soon support publishing MARC in BIBFRAME format. Immediately upon release, close to one billion bibliographic records worldwide can be available as BIBFRAME. All a customer needs to do is turn on the BIBFRAME publishing feature with a click on a single checkbox.
Alma employs a simplified record format for internal operations but must still interact with the library world and the vendors that use MARC formats and, occasionally, Dublin Core metadata. As we will see below, the roadmap includes support for natively cataloging in BIBFRAME format and for loading BIBFRAME format in the same way that it supports MARC and Dublin Core.
To simplify access by patrons, many institutions provide a discovery system that offers a unified view of all their institutional data, whether it resides in the primary catalog or in one of the institutional repositories. Ex Libris Primo®, for example, aggregates multiple sources into a common discoverable repository. From the Primo perspective, it makes no difference what the original metadata format was: for discovery purposes, MARC, Dublin Core, and BIBFRAME are all treated the same way. If Primo VE is used, then bibliographic information in any format is discoverable within moments of being cataloged in Alma. Again, discoverability does not depend on the metadata format that is used.
Another thing that has not changed is that the non-library world still does not care which system the data resides in and cannot process MARC—and probably will never be able to. Linked data standards hold the promise of making a two-way interchange of data possible between library systems and non-library systems on an as-needed and real-time basis.
Making Library Data Accessible
Making library data accessible to the world requires information from all relevant information sources to be available to users of the library. We refer to a discovery system as unified if it combines data from multiple sources of library information, such as a library’s catalog and an institutional repository. A discovery system can make the following forms of information available as linked data: titles; URIs referring to authoritative authors and subjects, publication locations and languages; publishers; descriptions; and availability information to help users access and borrow materials.
The richness of the available metadata depends wholly on the data that the discovery interface displays to its users, which usually includes the most important information that non-library users are likely to require. This information is also what non-library applications need in order to make use of a library’s descriptive data. An additional advantage of the discovery-system approach is that such a system is designed to be accessible by both people and computers in the world at large, and not just by local institutional users.
Linked data is data that is “published on the Web in such a way that it is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and can in turn be linked to from external data sets.” Built on standard web technologies such as HTTP and URIs, linked data can be read not only by humans but also by computers.
The linked data infrastructure lends itself to the development of numerous types of user services. In their research, patrons access a wide variety of data sources; through linked data, patrons are presented with enriched data in the appropriate context regardless of the interface in which they are conducting their search. In addition, linked data can be exploited to enrich the library catalog, which other applications can use to enrich their data.
The Bibliographic Framework (BIBFRAME) Initiative is a Library of Congress project for defining a bibliographic data model. Based on linked data principles, BIBFRAME has been designed to replace the MARC standards and to make bibliographic data more useful both within and outside the library community.
BIBFRAME is expressed in Resource Description Framework (RDF) format, which is based on the concept of making statements about resources (particularly web resources) in the form of subject-predicate-object expressions. These expressions are referred to as triples.
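To make the idea of triples concrete, here is a minimal Python sketch that stores a few subject-predicate-object statements and queries them. The `http://example.org/` URIs, the sample title, and the `objects` helper are placeholders invented for illustration; only the BIBFRAME 2.0 namespace and the `Work`, `title`, and `mainTitle` terms come from the published vocabulary. A real system would use an RDF library and distinguish literals from URIs.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


BF = "http://id.loc.gov/ontologies/bibframe/"      # BIBFRAME 2.0 namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"                          # placeholder namespace (assumption)

# Three statements describing a hypothetical work and its title.
triples = [
    Triple(EX + "work/1", RDF_TYPE, BF + "Work"),
    Triple(EX + "work/1", BF + "title", EX + "title/1"),
    Triple(EX + "title/1", BF + "mainTitle", "An Example Title"),
]


def objects(data: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


# Follow the link from the work to its title resource, then read the literal.
title_nodes = objects(triples, EX + "work/1", BF + "title")
main_titles = objects(triples, title_nodes[0], BF + "mainTitle")
print(main_titles)
```

Because every statement has the same three-part shape, following a link is just a second lookup that uses the object of one triple as the subject of the next.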
BIBFRAME 2.0 is the latest version made available by the Library of Congress. Ex Libris has decided to initiate development of BIBFRAME support based on this version. This does not mean that BIBFRAME 2.0 is perfect, but it is much more mature than the first version and has been deemed mature enough to start the long road to adoption by libraries. I suspect that another major version may be needed before BIBFRAME can indeed be the primary metadata format used by libraries. In any case, until then much work needs to be done, and is being done, by Ex Libris towards that goal.
More information on Ex Libris and BIBFRAME can be found on our Developer Network.
Ex Libris and Linked Data
As a vendor that is deeply engaged with the global library community and benefits from collaborative and forward-thinking customers and user groups, Ex Libris is at the forefront of discussions about linked data and is leading the way in developing linked-data functionality and services and enhancing the use of linked data in discovery and resource management systems.
The combination of the Ex Libris Alma® resource management service and Primo® discovery solution enables Ex Libris to leverage the power of linked data to the benefit of libraries and end users and to support end-to-end services that are based on and can be enriched by linked data. The merging of services supplied by Primo VE with data supplied by Alma empowers discovery system users as well as library staff with new and exciting possibilities, including richer metadata, enhanced workflows for technical services, improved search results, new ways to explore content, and more. In addition, third-party tools supporting linked data will consume linked data supplied by Alma, and Primo will supply services that are not based on Alma.
As a rule, we endeavor to support three linked data formats: BIBFRAME, RDA/RDF, and JSON-LD. More information pertaining to Ex Libris and linked data can be found on our Developer Network.
Key Elements of Linked Data for Ex Libris Roadmaps
The following elements related to linked data have helped shape the roadmap of the Alma resource management solution.
- URI support for cataloging and technical services: identifying “things” based on URIs instead of simple identifiers. Where possible, Alma automatically enriches published MARC records with URIs. The first release is complete, with updates pending after public comment.
- Support for the BIBFRAME model and ontology as they mature. This step is in process and is being executed in collaboration with a leading institution.
- Technical services supporting native cataloging of new resources in BIBFRAME. We expect it to take more than a year to create a mature, fully functional, high-quality, and user-friendly BIBFRAME cataloging interface. This process will begin with our next roadmap.
- Loading of BIBFRAME bibliographic records
- Access to linked data to enrich data displayed to staff in routine workflows
The following principles have helped shape the roadmap of the Primo and Summon discovery and delivery solutions:
- Discovery of the underlying metadata and access to it via URIs
- The use of linked data by non-library applications
- The discovery system as the key interface to make data accessible to people and computers
- The use of RESTful APIs to provide support for applications based on linked data
- Improving discoverability by general search platforms (e.g., Google) by embedding schema.org markup in discovery pages. This process is included in our next roadmap.
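As an illustration of the last principle, the following Python sketch builds a schema.org description of a book and wraps it in the `<script type="application/ld+json">` element that general search platforms read from a page. The function name, the choice of properties, and the sample values are assumptions made for this example; it is not the markup that Primo or Summon actually emits.

```python
import json


def schema_org_script(title: str, author: str, record_url: str) -> str:
    """Return an HTML script element carrying a schema.org Book description."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "url": record_url,
    }
    payload = json.dumps(data, indent=2)
    # Escape "</" so a value can never close the script element early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical record values, for illustration only.
snippet = schema_org_script(
    "An Example Title", "Jane Example", "https://example.org/record/123"
)
print(snippet)
```

The element can be placed anywhere in the page's HTML; it does not change what the patron sees, only what a crawler can read.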
Status of Ex Libris Linked Data Projects
Ex Libris is involved in multiple linked data projects, including the Europeana cultural portal and the European Digital Library project. Experience has revealed the following challenges:
- On-the-fly linking of triples in distributed data stores is rather slow and hinders sophisticated discovery. Harvesting is necessary to enable a search engine to use the triples.
- Keeping RDF triples up-to-date in a central index is problematic. Maintaining triples is a matter of scale, and even medium-size institutions cannot surmount the problems. Multiple institutions have concurred with this conclusion.
- Most of the current metadata sources do not provide RDF triples, and the ontology is not standardized. The metadata has to undergo conversion.
Alma supports a wide variety of RESTful web services, such as services for the retrieval of bibliographic records, holding records, and purchase orders. Retrieved data may be in either XML or JSON format. The RESTful nature of these web services means that the Alma responses include URIs of related entities.
Recognizing the importance of up-to-date URIs that are part of BIB records, as well as the large number of linking-based services that can be provided through such URIs, Ex Libris has released a RESTful API in Alma for retrieving any record in a library’s catalog in JSON-LD linked-data format.
When this API is used, links will be created as embedded URIs or will be based on existing IDs that can be processed to generate full URIs. Alma makes as much use as possible of existing data sources and APIs to generate full URIs. Third-party sources for which URIs are created include Library of Congress subject and name headings, MeSH, GND, VIAF, and Wikidata.
As a rule, Alma endeavors to support automatic URI creation dependent on two conditions:
- A template-based translation of an identifier to a URI must be possible
- URIs can be added only to MARC fields that officially support them. Today this is in most cases subfield $0, but this is changing; in the future some will be in $0 and others in $1. Unfortunately, at this time there are identifiers in MARC that do not have official support for URIs in subfield $0 or $1.
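The two conditions above can be sketched in a few lines of Python. This is only an illustration of the idea, not Alma's implementation: the scheme names, the template table, the set of field tags, and the sample identifiers are all assumptions chosen for the example, and the URI patterns are the publicly documented ones for each source.

```python
from typing import Optional

# Illustrative identifier-to-URI templates (assumption: not Alma's configuration).
URI_TEMPLATES = {
    "lc_names": "http://id.loc.gov/authorities/names/{id}",
    "lc_subjects": "http://id.loc.gov/authorities/subjects/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
    "gnd": "https://d-nb.info/gnd/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
    "wikidata": "http://www.wikidata.org/entity/{id}",
}

# Illustrative subset of MARC fields treated as accepting a URI subfield.
FIELDS_WITH_URI_SUBFIELD = {"100", "600", "650", "700"}


def identifier_to_uri(scheme: str, identifier: str) -> Optional[str]:
    """Condition 1: a template must exist to turn the identifier into a URI."""
    template = URI_TEMPLATES.get(scheme)
    normalized = "".join(identifier.split())  # drop embedded whitespace
    if template is None or not normalized:
        return None
    return template.format(id=normalized)


def uri_for_field(tag: str, scheme: str, identifier: str) -> Optional[str]:
    """Condition 2: only fields that support a URI subfield are enriched."""
    if tag not in FIELDS_WITH_URI_SUBFIELD:
        return None
    return identifier_to_uri(scheme, identifier)


# Placeholder identifiers, for illustration only.
print(uri_for_field("100", "viaf", "12345"))
print(uri_for_field("245", "viaf", "12345"))
```

When either condition fails, the function returns `None` and the field is left unchanged, which mirrors the case of identifiers in MARC that have no official place for a URI.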
More information pertaining to MARC enrichment can be found on our Developer Network.
Primo supports a variety of RESTful web services for generating searches, retrieving full records, and retrieving patrons’ e-shelf contents. The Primo APIs include embedded URIs for Primo records and patrons’ e-shelf contents. The Primo web service responses are in JSON-LD format, containing URIs pointing to records; any application that consumes linked data can embed these URIs to create valuable links to bibliographic records indexed in Primo.
The inclusion of URIs and JSON-LD–formatted data in the returned results supports the streamlined consumption of Primo data already in the form of linked data.
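To show what consuming such a response might look like, here is a small Python sketch that walks a JSON-LD document and collects every `@id` URI it contains. The sample document is invented for illustration and does not reproduce the actual shape of a Primo or Alma response; the `collect_ids` helper is likewise hypothetical.

```python
import json
from typing import Any, List


def collect_ids(node: Any) -> List[str]:
    """Recursively gather the "@id" values found in a JSON-LD structure."""
    found: List[str] = []
    if isinstance(node, dict):
        value = node.get("@id")
        if isinstance(value, str):
            found.append(value)
        for key, child in node.items():
            if key == "@context":
                continue  # term definitions, not links to resources
            found.extend(collect_ids(child))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_ids(child))
    return found


# Invented sample with placeholder identifiers (assumption, not real data).
sample = json.loads("""
{
  "@context": {"dc": "http://purl.org/dc/terms/"},
  "@id": "https://example.org/record/123",
  "dc:title": "An Example Title",
  "dc:creator": {"@id": "http://viaf.org/viaf/0000"},
  "dc:subject": [
    {"@id": "http://id.loc.gov/authorities/subjects/sh00000000"},
    "a plain string subject"
  ]
}
""")

links = collect_ids(sample)
for uri in links:
    print(uri)
```

An application could then dereference each collected URI, or embed it as a link back to the record, without knowing anything about the system that produced it.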
With this search API and URIs that return full metadata, more than two billion metadata records that reside in over a thousand institutions using Primo worldwide and in the Ex Libris Primo Central Index are now available. The Primo Central Index enables discovery of over a billion articles, e-books, and other types of content from a multitude of vendors. However, not all metadata in Primo Central is available via the URI because of vendor-imposed restrictions. The Primo URI provides access to metadata that a library has not defined as search-restricted. Similarly, Primo Central URIs give access to metadata on which vendors have not imposed copyright restrictions. All records that are defined as open access (for example, institutional repositories that universities upload to Primo Central) are available through the Primo URI, as well as vendor metadata (in keeping with institutional licensing policies).
Making Linked Data Richer
One can easily envision end-to-end support for URIs in the Alma and Primo metadata ingest and cataloging processes. The option to incorporate such URIs would then be available in discovery services and in the linked data provided by Primo. Indeed, for authoritative URIs to achieve a high degree of accuracy, the metadata maintenance module (that is, cataloging) must take linked data into account and make persistent keys or URIs available for downstream use. As of today, URIs auto-created in $0 flow down this very path to Primo for indexing.
Ex Libris Alma and Primo SaaS deployments live in highly scalable multitenant environments. These SaaS environments proxy incoming RESTful API calls through an API gateway that serves a dual purpose. First, by providing a “Try It Now” button, the gateway enables any developer to obtain easy access to documentation and an API test harness, thereby dramatically reducing the time to first “hello world.” The second purpose of the gateway is to act as a run-time proxy so that unusual scenarios will not inadvertently lead to a denial of service. The proxy also ensures that an incoming URI will be automatically routed to the correct repository, thus facilitating the work of developers and keeping persistent URIs persistent despite the operational needs of live systems.
The world is just beginning to generate and use linked data. Leveraging library discovery systems to help advance the growth of linked data seems to be the more pragmatic solution. In one fell swoop, Ex Libris is making library data available as linked data from Alma and Primo, with a consistent JSON-LD context; in Alma, all MARC data can soon be published as BIBFRAME and accessed using URIs. Furthermore, because the products are SaaS and have frequent update cycles, all users of Alma and Primo SaaS linked data will benefit immediately as linked data support deepens.