1. What’s this all about and why should we consider doing it?
1.1 Background and framework
Adopting a linked data approach will enhance the sustainability of our organisation by improving the efficiency of our software development and Review production, and by enhancing the visibility and flexibility of our content.
This change is not envisaged as an abrupt replacement of current structures and development processes but as a gradual, planned evolution: one that retains the considerable strengths of what Cochrane’s software teams have already developed while adapting their processes, and Cochrane data, to the new approach.
1.2 Why consider a change?
Our current approach to creating and presenting our content has served us well, but a number of factors in today's environment suggest that it is time for a change.
Cochrane is committed to developing open access for our Reviews. This will require us to explore new models for delivering Cochrane content rapidly, with the possibility that the same content will be used in multiple ways, by and for different users. Our software development processes will need to be nimble so that our content can be nimble too.
The Cochrane Author Experience
Authoring and editing Cochrane Reviews can be difficult and time-intensive. As Cochrane adds new review types, methodological standards and increasingly complex analyses, we need to find ways to make our processes easier and more fluid. The Auckland Colloquium included several presentations of new technologies that might help: software screening of citations to identify candidate studies for inclusion, crowdsourcing of some parts of the review process, and software developed outside the Cochrane IT process to support collaborative review production by author teams. It will be important to structure Cochrane data in a way that allows us to incorporate these new technologies more easily.
Initiatives such as AllTrials and the Systematic Review Data Repository are part of a broader societal push for increasing open access not only to research reports but also to research data. Cochrane review production software needs to be able to take advantage of such data repositories and to provide data to them when appropriate.
1.3 Brief background on Linked Data and Semantic Web
“Cochrane needs to get better at talking to machines.” – Ben Goldacre, UK Symposium, 2013
In brief, linked data can be thought of as supporting the creation of a ‘neuronetwork for computers’: machine-readable links or ‘tags’ are added to data so that the web, and the computers using it, have the ‘intelligence’ to locate data and recognise what it represents. Ultimately this can make the data far more flexible, because they can be combined to create information we have not yet recognised we need. This has the potential to add enormous value and to reduce the need for bespoke programming within large projects.
The goal of the linked data approach is to make finding, sharing and combining information easier. In the decades between the idea’s conception and its realisation there have been many attempts to achieve that goal outside the linked data paradigm, using a variety of tools and techniques, but none has fully succeeded. The problem has always been one of data structures: without knowing how other people’s data are structured, you cannot begin to make your own data compatible with theirs, or to create new structures or products that combine data from multiple providers.
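To make the data-structure problem more concrete, the following minimal Python sketch illustrates the core linked data idea under simplifying assumptions: every fact is a (subject, predicate, object) triple, and things are named by shared URIs. The URIs under `http://example.org/`, the predicate names and the two "providers" are hypothetical placeholders, not a real vocabulary or dataset; a production system would use RDF tooling and published vocabularies rather than plain Python sets.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object); URIs act as shared global names.
Triple = Tuple[str, str, str]

# Hypothetical placeholder namespace and identifiers (not a real vocabulary).
EX = "http://example.org/"
TRIAL = EX + "trial/T1"
ASPIRIN = EX + "drug/aspirin"
WARFARIN = EX + "drug/warfarin"
STUDIES = EX + "vocab/studiesDrug"
INTERACTS = EX + "vocab/interactsWith"

# Provider A (say, a trial registry) publishes what it knows about a trial.
provider_a: Set[Triple] = {
    (TRIAL, STUDIES, ASPIRIN),
    (TRIAL, EX + "vocab/status", "completed"),
}

# Provider B (say, a drug database) publishes drug facts independently;
# the only thing it shares with provider A is the URI for the drug.
provider_b: Set[Triple] = {
    (ASPIRIN, INTERACTS, WARFARIN),
}

# Same triple shape + shared URIs: combining the datasets is a set union,
# with no provider-specific mapping code.
combined: Set[Triple] = provider_a | provider_b


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object linked from `subject` via `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def interacting_drugs(graph: Set[Triple], trial: str) -> Set[str]:
    """Follow links across providers: trial -> studied drug -> interactions."""
    return {
        other
        for drug in objects(graph, trial, STUDIES)
        for other in objects(graph, drug, INTERACTS)
    }


print(interacting_drugs(combined, TRIAL))    # {'http://example.org/drug/warfarin'}
print(interacting_drugs(provider_a, TRIAL))  # set()
```

The second call shows the point made above: with only one provider's data the question cannot be answered, but once both datasets share a structure and identifiers, the answer follows from simply merging them.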
A case in point (with direct parallels to Cochrane data) is the Digital Resource Management (DRM) software that libraries use to catalogue their holdings. Almost since the start of the Internet, libraries have looked for ways to make their catalogues searchable online, and a plethora of DRM software has been written over the years to address that aim. Today a handful of DRM packages are in use by major academic institutions and government libraries, and until recently they have been largely incompatible with each other. As a result, finding an item within a single library was relatively easy, but searching across library catalogues was largely impossible. In the last few years all of the main DRM packages have adopted a linked data approach to storing and retrieving their catalogues, with the result that most library catalogues can now be accessed and interrogated by machines, and links between catalogues can be followed quickly and automatically. The first consequence is that searchers more easily find the items they are looking for in diverse repositories. More importantly, searchers can find things they might not have found before, because they can interrogate catalogues by the knowledge they contain rather than by the simple keyword search of legacy systems. DRM software that does not comply with linked data standards is rapidly becoming obsolete.
1.4 Building on what we have
Much of our history and our current approach to review production put Cochrane in a good position to move to linked data. Our "journal" has been structured as a database from its inception. While much of the linked data world struggles to convert text documents into more machine-friendly formats, our Reviews are already in a structured XML format and housed within a data repository that handles versioning and the other complexities of the current authoring process. In addition, the structure of the Cochrane Register of Studies (CRS) and its linkages with the Cochrane Database of Systematic Reviews (CDSR) have given us a model of the literature that appropriately incorporates trial reports, studies and Reviews, and that gives us a unique ability to navigate between them. The move to linked data would build on these strengths and link them to relevant systems and data repositories developed by others.
The limited steps we have taken to explore linked data so far have drawn enthusiastic expressions of interest from a number of potential collaborators, who recognise the significant strengths of the Collaboration and the major role we could play if we adopt this approach. To date, we have been asked to collaborate in:
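The kind of navigation between trial reports, studies and Reviews described above can be pictured, in a deliberately simplified form, as a set of typed links that can be followed in either direction. The Python sketch below is illustrative only: the identifiers (`review:R1`, `study:S1`, `report:P1`, ...) and link names (`includesStudy`, `hasReport`) are hypothetical and do not reflect the actual CRS or CDSR schemas.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified links; not the real CRS/CDSR schema.
# A Review includes Studies; each Study is described by one or more Reports.
LINKS: List[Tuple[str, str, str]] = [
    ("review:R1", "includesStudy", "study:S1"),
    ("review:R1", "includesStudy", "study:S2"),
    ("review:R2", "includesStudy", "study:S2"),
    ("study:S1", "hasReport", "report:P1"),
    ("study:S2", "hasReport", "report:P2"),
    ("study:S2", "hasReport", "report:P3"),
]

Index = Dict[Tuple[str, str], Set[str]]


def build_indexes(links: List[Tuple[str, str, str]]) -> Tuple[Index, Index]:
    """Index each link in both directions so it can be followed either way."""
    forward: Index = {}
    backward: Index = {}
    for subj, pred, obj in links:
        forward.setdefault((subj, pred), set()).add(obj)
        backward.setdefault((obj, pred), set()).add(subj)
    return forward, backward


FORWARD, BACKWARD = build_indexes(LINKS)


def reports_for_review(review: str) -> Set[str]:
    """Review -> included studies -> every report of those studies."""
    return {
        report
        for study in FORWARD.get((review, "includesStudy"), set())
        for report in FORWARD.get((study, "hasReport"), set())
    }


def reviews_using_report(report: str) -> Set[str]:
    """Report -> the study it describes -> every review including that study."""
    return {
        review
        for study in BACKWARD.get((report, "hasReport"), set())
        for review in BACKWARD.get((study, "includesStudy"), set())
    }


print(sorted(reports_for_review("review:R1")))    # ['report:P1', 'report:P2', 'report:P3']
print(sorted(reviews_using_report("report:P3")))  # ['review:R1', 'review:R2']
```

Keeping the study as a separate entity between reports and Reviews is what allows a single report to be traced to every Review that depends on it, and vice versa.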
- a grant proposal to the US National Library of Medicine on linked data approaches to drugs and drug interactions;
- a consortium with representatives from GIN, clinicaltrials.gov, GRADE/DECIDE and others to discuss ways to structure our data so that linkages could be made between trials, trial reports, Cochrane Reviews, guidelines and electronic health records;
- a half-day discussion, at a major semantic web conference, of systematic reviews and their implications for restructuring the article in a linked data world.