The CrossRef Metadata API works across all publishers regardless of their business model (open access, subscription, or a combination). It makes use of CrossRef DOI content negotiation to provide researchers with links to the full text of content located on the publisher’s site. The publisher remains responsible for actually delivering the full text of the content requested. Thus, open access publishers can simply deliver the requested content, while subscription-based publishers continue to control access using their existing access control systems. In both cases publishers will be able to use their existing site statistics packages (e.g. COUNTER) to measure use of content accessed by TDM tools using the API.
There is only one step that is required of all publishers wishing to participate in CrossRef TDM: the registration of TDM-specific metadata. Publishers who are concerned about the impact of automated TDM harvesters on their site performance may optionally want to implement Standard Rate Limiting Headers.
Step One (and possibly the only step): Depositing Additional Metadata
There are two additional metadata elements that publishers will need to deposit to support TDM via CrossRef. These are:
- Full Text URIs: One or more URIs that point to full text representations of the content identified by your CrossRef DOIs.
- License URIs: One or more URIs pointing at licenses that govern how the full text content can be used.
If you are an open access publisher, or if your existing subscription licenses already allow TDM of subscribed full text, then depositing the above metadata is the ONLY thing you need to do in order to enable TDM of your content via the CrossRef Metadata API.
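To illustrate how a TDM tool might use the two deposited elements together, here is a minimal sketch. The `TdmRecord` class, its field names, and the `minable_uris` helper are hypothetical and do not reflect the actual response format of the CrossRef Metadata API; the URIs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TdmRecord:
    """Hypothetical in-memory view of the TDM metadata deposited for one DOI."""
    doi: str
    full_text_uris: list[str] = field(default_factory=list)
    license_uris: list[str] = field(default_factory=list)


def minable_uris(record: TdmRecord, accepted_licenses: set[str]) -> list[str]:
    """Return the full text URIs only if at least one license is accepted.

    `accepted_licenses` is the set of license URIs the researcher has
    reviewed and agreed to (an assumption of this sketch).
    """
    if any(uri in accepted_licenses for uri in record.license_uris):
        return list(record.full_text_uris)
    return []


record = TdmRecord(
    doi="10.5555/12345678",  # placeholder DOI
    full_text_uris=["https://publisher.example.org/fulltext/12345678.xml"],
    license_uris=["https://creativecommons.org/licenses/by/3.0/"],
)
accepted = {"https://creativecommons.org/licenses/by/3.0/"}
print(minable_uris(record, accepted))
```

A tool built this way only attempts downloads for content whose license it already recognises; the publisher's own access control still decides whether each request is served.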
Step Two (optional): Implement Rate Limiting Headers
Many publisher platforms are designed and scaled to handle typical interactive browsing and downloading behaviour. Bulk-downloading full text for TDM purposes could put a major strain on servers that are not architected to handle automated processes. For publishers who are concerned about the potential performance implications of bulk downloading, we have defined a set of standard HTTP headers that servers can use to convey rate-limiting information to automated TDM tools. Well-behaved TDM tools can simply look for these headers when they query publisher sites in order to understand how best to adjust their behaviour so as not to affect the performance of the site. The headers allow a publisher to define a “rate limit window”, which is basically a time span (e.g. a minute, an hour, a day). The publisher can then specify:
|Header Name|Example Value|Explanation|
|---|---|---|
|CR-Prospect-Rate-Limit|1500|Maximum number of full text downloads allowed in the defined rate limit window|
|CR-Prospect-Rate-Limit-Remaining|76|Number of downloads left in the current rate limit window|
|CR-Prospect-Rate-Limit-Reset|1378072800|Time (in UTC epoch seconds) at which the current rate limit resets and a new rate limit window starts|
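As a rough illustration of how a well-behaved TDM tool might act on these headers, the sketch below computes how long to pause before the next download. It assumes the `CR-Prospect-Rate-Limit-Reset` value is an absolute UTC epoch timestamp, as the example value 1378072800 suggests, and that the headers arrive as a plain mapping with exactly these names. The `seconds_to_wait` function is a hypothetical helper, not part of any CrossRef library.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds a client should pause before its next download.

    `headers` is a mapping of response header names to string values.
    `now` is the current time in epoch seconds (defaults to the system clock).
    """
    now = time.time() if now is None else now
    remaining = headers.get("CR-Prospect-Rate-Limit-Remaining")
    reset = headers.get("CR-Prospect-Rate-Limit-Reset")
    if remaining is None or reset is None:
        # The site sent no rate-limit information, so there is nothing to honour.
        return 0.0
    if int(remaining) > 0:
        # Downloads are still available in the current window.
        return 0.0
    # The window is exhausted: wait until it resets (never a negative wait).
    return max(0.0, float(reset) - now)


exhausted = {
    "CR-Prospect-Rate-Limit": "1500",
    "CR-Prospect-Rate-Limit-Remaining": "0",
    "CR-Prospect-Rate-Limit-Reset": "1378072800",
}
print(seconds_to_wait(exhausted, now=1378072740))
```

A client would typically call such a helper after every full text response and sleep for the returned duration before issuing the next request.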
It will be entirely up to the publisher to implement rate limiting should they require it. It will also be up to the publisher to define a rate limit that is appropriate for their servers. CrossRef will play no role in enforcing or providing this rate limiting. The guidelines above simply define the set of standard headers that servers implementing rate limiting should use, so that TDM tools have a common mechanism for adjusting their behaviour on sites that may otherwise struggle to serve bulk requests for full text downloads.
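On the publisher side, one possible way to produce these headers is a simple fixed-window counter per client. The following is a sketch only: the `FixedWindowLimiter` class, the in-memory state, and the choice of client identifier are all assumptions made for illustration, and how a server responds to a refused request is left to the publisher.

```python
import time


class FixedWindowLimiter:
    """Hypothetical per-client fixed-window download counter."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit                    # downloads allowed per window
        self.window_seconds = window_seconds  # length of the rate limit window
        self.clock = clock                    # injectable clock, eases testing
        self.state = {}                       # client id -> (reset epoch, used)

    def check(self, client_id):
        """Record one download attempt and return (allowed, headers)."""
        now = self.clock()
        reset, used = self.state.get(client_id, (0, 0))
        if now >= reset:
            # The previous window has ended: start a new one.
            reset, used = int(now) + self.window_seconds, 0
        allowed = used < self.limit
        if allowed:
            used += 1
        self.state[client_id] = (reset, used)
        headers = {
            "CR-Prospect-Rate-Limit": str(self.limit),
            "CR-Prospect-Rate-Limit-Remaining": str(self.limit - used),
            "CR-Prospect-Rate-Limit-Reset": str(reset),
        }
        return allowed, headers


limiter = FixedWindowLimiter(limit=1500, window_seconds=3600)
allowed, headers = limiter.check("192.0.2.10")  # client keyed by IP here
print(allowed, headers)
```

A production deployment would more likely keep this state in a shared store so that every server in a cluster sees the same counts; the in-memory dictionary is only for demonstration.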
CrossRef has created an example publisher site, Tinypub, that implements the CrossRef Metadata API, including rate limiting and IP-based subscription access. The code for the example site can be downloaded from GitHub for reference. Please note that this code is only meant to illustrate the workings of the system. It is not in any way intended for production use.
Error Messages from the CrossRef Metadata API
Reporting error messages to TDM tools