Providing Full Text links to text and data mining tools
The CrossRef Metadata API makes use of CrossRef DOI content negotiation to provide a common mechanism for text and data mining users to locate the full text of articles on the publisher’s site. The publisher is still responsible for delivering the content to the user, and as such, any existing access control (if applicable) and usage statistics data can be applied to content accessed using the DOI content negotiation mechanism.
Normally, when a user “clicks” on a DOI link or types a DOI URI into a browser, the browser tells the web server (through the HTTP Accept header) that it wants the content returned as “HTML” for display in the browser. Hence, when a user accesses a DOI using a browser, they are shown the publisher’s landing page.
With “content negotiation,” a user can write programs that specify that, instead of returning a human-readable “HTML” representation of the landing page, the server should return a machine-readable representation of the data connected with the DOI.
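As a minimal sketch of what such a program might look like, the Python snippet below builds a DOI request that carries an Accept header asking for a machine-readable format. The function name build_negotiated_request is hypothetical; the DOI, resolver address and the text/turtle media type are the ones used in the curl examples on this page. The request is only constructed here, and the lines that would send it are left commented out because they need network access.

```python
import urllib.request


def build_negotiated_request(doi, accept="text/turtle"):
    """Build (but do not send) a GET request for a DOI with an Accept header.

    The function name and defaults are illustrative assumptions, not part of
    any CrossRef client library.
    """
    url = "http://dx.doi.org/" + doi
    return urllib.request.Request(url, headers={"Accept": accept})


req = build_negotiated_request("10.5555/515151")

# Sending the request requires network access, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Link"))
```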
To support text and data mining, publishers need to update their metadata to include full text URI(s) for each piece of content with a DOI. Anybody using the CrossRef Metadata API to query a CrossRef DOI will be able to retrieve these URLs and follow them directly to the full text. Publishers who want to be able to support multiple “representations” of the full text of the article will be able to do so. So, for instance, they could support the delivery of either PDF, XML or HTML, or just one of these formats.
There are two mechanisms that publishers can use to register full text links for their content. The first is for publishers (probably most publishers as of 2013) who do not support content negotiation on their platforms. The second is for publishers who natively support content negotiation on their own platforms.
Method 1: Publisher provides specific URIs for each MIME type they support
The following section of CrossRef deposit XML for the DOI 10.5555/515151 illustrates how to specify separate full text URIs for separate MIME types. This mechanism should be used by publishers who do not support content negotiation on their platforms.
In the above case, the following content negotiation request (using curl) on the DOI:
curl -L -iH "Accept: text/turtle" http://dx.doi.org/10.5555/515151
would return the following in the HTTP Link header:
Link: <http://data.crossref.org/fulltext/10.5555%2F515151>; rel="http://id.crossref.org/schema/fulltext"; type="application/pdf"; anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.pdf", <http://data.crossref.org/fulltext/10.5555%2F515151>; rel="http://id.crossref.org/schema/fulltext"; type="application/xml"; anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.xml"
This, in turn, directs the requester to the appropriate URI for each full text representation that is supported.
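A text and data mining tool would typically need to pull the full text URIs out of this header. The Python sketch below shows one way that might be done. It assumes, as in the example header above, that every parameter is a quoted string and that the publisher's URI is carried in the anchor parameter; the function names parse_link_header and fulltext_links are hypothetical, and the regular expressions are not a complete Link header parser.

```python
import re

FULLTEXT_REL = "http://id.crossref.org/schema/fulltext"

# One <target> followed by any number of ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Split a Link header value into one dict per link entry."""
    entries = []
    for target, params in LINK_RE.findall(value):
        entry = {"target": target}
        entry.update(PARAM_RE.findall(params))
        entries.append(entry)
    return entries


def fulltext_links(value):
    """Map each declared MIME type to the full text URI in its anchor."""
    return {
        entry["type"]: entry["anchor"]
        for entry in parse_link_header(value)
        if entry.get("rel") == FULLTEXT_REL and "type" in entry
    }


# The Link header from the example above, split over lines for readability.
header = (
    '<http://data.crossref.org/fulltext/10.5555%2F515151>; '
    'rel="http://id.crossref.org/schema/fulltext"; type="application/pdf"; '
    'anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.pdf", '
    '<http://data.crossref.org/fulltext/10.5555%2F515151>; '
    'rel="http://id.crossref.org/schema/fulltext"; type="application/xml"; '
    'anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.xml"'
)

links = fulltext_links(header)
print(links["application/pdf"])
print(links["application/xml"])
```

Matching on the angle-bracketed target rather than splitting on commas avoids breaking entries whose URIs happen to contain a comma.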
Method 2: Publisher provides a URI which points to a content negotiation resource
The following section of CrossRef deposit XML illustrates how to specify a single URI end-point where the publisher platform can handle content negotiation. This should only be used by publishers who support content negotiation on their platforms.
In the above case, the following curl HTTP GET request:
curl -L -iH "Accept: text/turtle" http://dx.doi.org/10.5555/525252
will return the following in the HTTP Link header:
Link: <http://data.crossref.org/fulltext/10.5555%2F525252>; rel="http://id.crossref.org/schema/fulltext"; anchor="http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/525252"
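The practical difference between the two methods, from the client's side, is whether the link entry declares a type. The Python sketch below illustrates one way a tool might decide how to fetch the full text in each case. It assumes link entries have already been parsed into plain dictionaries (an illustrative shape, not a CrossRef format), and it assumes that a Method 2 end-point honours an Accept header naming the wanted format; the function name plan_fulltext_request is hypothetical.

```python
def plan_fulltext_request(entry, wanted_type):
    """Return (url, headers) for fetching full text, or None if not offered.

    entry is a dict with an "anchor" key and, for Method 1 links, a "type" key.
    """
    url = entry["anchor"]
    if "type" in entry:
        # Method 1: the URI serves exactly one declared representation.
        if entry["type"] != wanted_type:
            return None
        return url, {}
    # Method 2: no declared type, so ask the publisher end-point to negotiate.
    return url, {"Accept": wanted_type}


method1_entry = {
    "type": "application/pdf",
    "anchor": "http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.pdf",
}
method2_entry = {
    "anchor": "http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/525252",
}

print(plan_fulltext_request(method1_entry, "application/pdf"))
print(plan_fulltext_request(method2_entry, "application/pdf"))
```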