Friday, 19 December 2008

ArXiv Plugin

We have developed an Eprints ArXiv import plugin based on the PubMedID and PubMedXML plugins. This plugin imports the following metadata from ArXiv: item type, article title, abstract, authors’ names, journal title, volume, issue, page number(s), date and DOI. Furthermore it also extracts links to the full text PDF and the ArXiv abstract page (entered in the Official links field). The data is pulled from ArXiv using the ArXiv API.

How does it work?
First, the article needs to be identified uniquely using its arXiv ID. The arXiv API is used to retrieve an XML file containing all the metadata associated with that ID, and the retrieved XML file is then parsed. The most difficult aspect of writing this plugin was parsing the “journal_ref” field from the arXiv XML file, because it is a free-text field containing the journal reference information in no consistent format. To overcome this we installed Biblio::Citation::Parser::Jiao, which extracts the journal title, volume number, issue number, date and ISSN from a given journal reference. Biblio::Citation::Parser::Jiao can be installed from CPAN.
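The fetch-and-parse step can be illustrated with a short sketch. This is not the plugin's code (the plugin is an EPrints import plugin, and its modules are not reproduced here); it is a minimal Python illustration that assumes the arXiv API's Atom response format. The function names, the sample feed and all the metadata in it are made up for the example.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv API (Atom) responses.
ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"


def build_query_url(arxiv_id):
    """Build an arXiv API query URL for a single arXiv ID."""
    query = urllib.parse.urlencode({"id_list": arxiv_id})
    return "http://export.arxiv.org/api/query?" + query


def parse_entry(feed_xml):
    """Pull the import fields out of the first <entry> of an Atom feed."""
    entry = ET.fromstring(feed_xml).find(ATOM + "entry")
    links = entry.findall(ATOM + "link")
    return {
        "title": entry.findtext(ATOM + "title", "").strip(),
        "abstract": entry.findtext(ATOM + "summary", "").strip(),
        "authors": [a.findtext(ATOM + "name") for a in entry.findall(ATOM + "author")],
        # journal_ref is free text; it is returned as-is, not split into fields.
        "journal_ref": entry.findtext(ARXIV + "journal_ref"),
        "doi": entry.findtext(ARXIV + "doi"),
        "pdf_url": next((link.get("href") for link in links
                         if link.get("title") == "pdf"), None),
        "abstract_url": next((link.get("href") for link in links
                              if link.get("rel") == "alternate"), None),
    }


# Hypothetical sample response with invented metadata (no network access needed).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.0000v1</id>
    <title>An Example Title</title>
    <summary>An example abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <arxiv:journal_ref>J. Example Phys. 12 (2008) 345-350</arxiv:journal_ref>
    <arxiv:doi>10.0000/example</arxiv:doi>
    <link href="http://arxiv.org/abs/0000.0000v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/0000.0000v1" rel="related" type="application/pdf"/>
  </entry>
</feed>"""

record = parse_entry(SAMPLE_FEED)
print(record["authors"])
print(record["journal_ref"])
```

Note that the sketch stops at the raw "journal_ref" string. Splitting that string into journal title, volume, issue and date is the hard part described above, which is why a dedicated citation parser was used for it.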

PubMed Bug Fix
The PubMedID import plugin was designed for importing multiple IDs; however, this did not appear to be working and only one item could be imported at a time. This was due to a bug in the code, which did not remove the HTML-encoded end-of-line characters. With the end-of-line characters removed, the PubMed import plugin works well and can import multiple IDs either from a file or from the text box.
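The kind of fix involved can be sketched as follows. This is an illustration in Python rather than the plugin's own code, and it assumes the end-of-line characters arrive as numeric character references such as "&#13;&#10;"; the exact encoding seen by the plugin is not stated here. The function name is hypothetical.

```python
import html


def parse_id_list(raw):
    """Split pasted or uploaded text into a list of numeric IDs."""
    # Decode HTML character references first, so that an encoded line
    # ending such as "&#13;&#10;" becomes real whitespace ("\r\n").
    decoded = html.unescape(raw)
    # Allow commas as separators as well as whitespace.
    tokens = decoded.replace(",", " ").split()
    # Keep only tokens that look like IDs (digits only).
    return [token for token in tokens if token.isdigit()]


# Without decoding, the encoded line endings glue the IDs into one token.
print("111&#13;&#10;222".split())
# With decoding, each ID is recovered separately.
print(parse_id_list("111&#13;&#10;222&#13;&#10;333"))
```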

Monday, 8 December 2008

WRRO Upgrade Complete

White Rose Research Online and its sister repository White Rose Etheses Online are now running on EPrints 3.1.1. This upgrade offers some useful new "back end" functions which will improve management of material being deposited and material already in the repositories. We hope to complement the upgrade with the introduction of improved statistical tools. Watch this space.

Friday, 7 November 2008

New IncReASe website launched today

We've just launched the redesigned IncReASe website.

The aim of the redesign was to improve the navigation so that it's easier to find your way through the outputs from the project. Hopefully it looks nicer too! There's some new information available about the different strands of the project. More will be added in the next few weeks, so please do check back if you're interested. There will also be more outputs added, including interview summaries and the questionnaire report.

Wednesday, 15 October 2008

Repositories and local administrative processes

One of the most important challenges we are facing as a repository is embedding our repository within other institutional systems. This is made a little more tricky as we are a consortium of three with a single installation of software at one partner site. Should deposit be invisible to the depositor? Should our capture processes be so subtle and moved so far towards the author that, effectively, they need to take no additional action for the work to be captured and deposited? Moving towards "desktop deposit" or "desktop capture" may indeed be a desirable goal. This may be particularly key if the process of metadata creation becomes detached from the capture of the relevant version of the file. For example, we are now seeing an emphasis on bulk capture and import of metadata. University of Leeds has recently purchased the Symplectic Publication Management System which merrily trawls PMC and Web of Knowledge for Leeds publications, pulls in the metadata and uses this to populate a University publication database. We are told that we'll also be able to take a feed of this metadata into the repository and make this openly available. Great!

The potential drawback here, though, is that the emphasis on post publication, post indexing metadata capture means a long (potentially very long) delay between work being accepted for publication and metadata appearing in the local publication database and institutional repository. Asked when they would like to deposit their work, responses from White Rose authors varied but the most popular deposit point was "at or after publication". There is a danger that collection of metadata and compilation of publication lists is strongly associated in the researchers' minds with administrative and accounting processes. There is a danger that collection of metadata and publications within repositories is also seen as a largely bureaucratic, administrative process. This, in turn, leads to the potential danger of seeing deposit as "a summer job" or some kind of periodic, mopping up process.

Clearly, if we are to capture the research we aim to, we need to have efficient, relatively pain free processes to capture research at the relevant point. This is likely to be at the point it is accepted for publication. This means that the process is less "tidy" because we won't, at that point, have complete publication metadata for the work. We need to consider what key metadata we can capture automatically and what information needs to be supplied by the author.

If the repository is effectively "invisible" to the author and there is no explicit process of deposit we need to think carefully about what this means for advocacy. Authors should be aware of the key aims of capture - not for bean counting but to foster dissemination, visibility, impact and reuse of research.

White Rose Research Online will continue to work towards better embedding within local systems. An initial step will be the proposed linkage between WRRO and Symplectic in Leeds but we will also be keeping a close eye on developments at Sheffield and York - in particular, any systems which are put in place to handle REF submission.

Thursday, 11 September 2008

Understanding Organisational Cultures Workshop

This workshop organised by the repository team at Cranfield University provided some really interesting insights into the realities of working with repositories.

The first half of the morning was taken up with 2 sessions from the researcher perspective. Dr Colin Macduff from The Robert Gordon University presented as the Convert. He described his PhD experience and specifically how he made his thesis available electronically. He suggested it was time to reconceptualise the thesis as an electronic entity. This could open up the possibilities for new types of PhD submission. Colin had linked to his PhD from his web page and also set up an online evaluation. His thesis has been downloaded 1400 times in about a year. He felt this access to his PhD had also helped him gain research grants.

Dr Bruce Jefferson from Cranfield was positioned as the sceptic. He gave a very lively presentation about the need for evidence to support claims that repositories increase citations. He ended his presentation with an outline of his plans to track the citation rates of his papers by depositing some in CERES (Cranfield's repository) and not others. He will then compare the citation rates. He also plans to contact people who have cited papers in CERES and ask if they accessed the paper via CERES or via another route. The results of Bruce's test could be very interesting, despite the small scale.

Neil Jacobs from JISC gave the final presentation of the morning with an outline of the national picture for repositories. He argued Open Access could enable new developments in research but that it was necessary to reduce legal worries and the need for data entry.

A breakout session then followed with 3 groups looking at cultural influences, identifying barriers and impact of RAE/REF. This was followed by a very nice lunch!

The first session of the afternoon was a very upbeat presentation about mandates by Michael White from the University of Stirling. He gave the background to the introduction of Stirling's etheses and eprints mandates. He outlined the support that is being provided to ensure the mandate is successful. High level advocacy from the Vice Principal has been hugely significant. Departments are being made responsible for ensuring compliance and so each has a rep to assist with this. They are continuing to look at ways to help make compliance as easy as possible, including bulk upload.

John Harrington then talked about how Cranfield have been re-assessing their repository, CERES. They looked at the advocacy and felt a more concerted approach was needed. They recently renamed the repository and launched a new publicity campaign. It is hoped this campaign will address most of the misconceptions about depositing in an institutional repository. They have also been talking directly with academics and hope to continue this face to face effort.

The final presentation of the day was by William Nixon from the University of Glasgow. He looked at the ways in which they have had to change and adapt the repositories at Glasgow to address the needs of the university. Enlighten, the repository for academic papers, has been positioned as central to the university's publications policy. It is also seen as key to REF with Glasgow being one of the 22 pilot institutions.

These presentations were followed up with another breakout session, looking at mandates, advocacy and responding to inevitable change. I was part of the group discussing mandates and this provided some really interesting points. A representative from the research councils asked if they should be enforcing their mandates more strictly. This led to a discussion about how mandates could be enforced and whether penalties (such as financial) were the best approach. It was suggested that incentives might be a better approach. One approach would be for institutions to tie their repositories into the appraisal and promotion process to ensure academics deposit their papers. It was felt that without some form of sanction / incentive, mandates were unlikely to be successful.

The day ended with a general question and answer session, though there wasn’t much time left for significant discussion. Overall it was a very useful day, and provided some thought provoking discussions of the current issues institutional repositories are facing.

Monday, 8 September 2008

DRIVER workshop Tilburg, Netherlands

The "Digital Repository Infrastructure Vision for European Research" (DRIVER) project is funded by the European Commission under the "Research Infrastructure" unit. The aim of the project is to enable open access of scientific publications and research data through a network of European institutional repositories. Although DRIVER is a European project, collaboration is sought with international repository providers from USA, Latin America, Asia and Africa.

Norbert Lossau (Scientific coordinator of the project) talked about the DRIVER community’s need to agree to fundamental principles such as:
i) Make research publications open access.
ii) Follow the guidelines (DRIVER) to make services interoperable.
iii) The need to be committed to ensure long term access.
iv) Join the DRIVER community (open to anyone) to become part of the repository network.

He emphasised that the main aim of DRIVER was not just to build a network infrastructure but to promote open access of research data. Their goal is to support open access publishing (journal and book) and new types of citation & impact measuring services, and to develop and provide long-term archiving services for repositories. They support three user groups: repository managers, service providers and researchers as well as the general public. They claim one of the benefits of joining will be that funding and research organisations will be able to build their own interface on the DRIVER data index as well as sharing developments by service providers. Furthermore they aim to develop automated article transfer from the publisher to institutional repositories (European PEER project) and metadata transfer workflows from article databases into institutional repositories (e.g. PubMed > Fact Science research database).

In order to become harvestable by DRIVER, your repository has to be compliant with the DRIVER guidelines. (The networking of repositories is done using the OAI-PMH protocol. This has not been working very well, but it is too widespread to drop completely.) The DRIVER guidelines are an ongoing development, and a new version is currently being prepared. DRIVER has developed a tool (Validator) to check the contents of repositories (full text) for the quality of metadata. There were technical glitches with this software, which have now been sorted. The Validator tool was used on 100 repositories and no repository was found to be 100% compliant. However, only a few were non-harvestable. They now allow for a ~5% error margin.
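For readers unfamiliar with OAI-PMH harvesting, here is a rough Python sketch of what a harvester does: build a ListRecords request and read Dublin Core fields out of the response. It is not DRIVER's Validator and does not implement the DRIVER guidelines. The function names, the example.org base URL, the sample response and the "required fields" check are all assumptions made for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(response_xml):
    """Return identifier, titles and creators for each harvested record."""
    root = ET.fromstring(response_xml)
    return [
        {
            "identifier": rec.findtext(OAI + "header/" + OAI + "identifier"),
            "title": [el.text for el in rec.iter(DC + "title")],
            "creator": [el.text for el in rec.iter(DC + "creator")],
        }
        for rec in root.iter(OAI + "record")
    ]


def missing_fields(record, required=("title", "creator")):
    """Hypothetical check: list the required fields a record lacks."""
    return [field for field in required if not record[field]]


# Made-up response from a fictional repository (no network access needed).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First example record</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

records = parse_records(SAMPLE_RESPONSE)
for record in records:
    print(record["identifier"], missing_fields(record))
```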

The DRIVER team has also developed D-Net Software (DRIVER Network-Evolution-Toolkit) which is an open source toolkit for re-use by repository networks. Use of this software ensures interoperability but it is not considered necessary for each institution to have this software. This software requires a lot of resource and hardware and therefore the cost is high. They would only recommend this to national consortiums or large repositories that support several institutions.

The other talks were more focused on the technical side, providing details of how the network of repositories is achieved; data is aggregated and indexed to be used for searching, browsing and profiling by users. They spoke about the “European Information Space” that is maintained by the DRIVER consortium.

Mentor scheme (mediation)
Soon to be launched, this project aims to mediate networking between repository staff/managers. If an individual is experiencing problems with a particular aspect of repositories, or is starting to collect a different type of material with no prior experience, they can contact DRIVER and should find a list of other people doing similar things.

Slides from the talk are available from the Tilburg website.

Wednesday, 27 August 2008

Repositories Exchange of Experience - Universities of York and Leeds

An interesting meeting took place on Friday between various library staff who work on "repositories" at the Universities of York and Leeds. University of York is developing a digital library - using Fedora with a Muradora front end - to manage a variety of digital content - including, for example, a collection of images from the History of Art Department (some already digitised, others which will need to be scanned). Leeds has a multimedia repository running on the Digitool platform - originally created as part of the MIDESS project and now continuing as LUDOS (Leeds University Digital Objects). LUDOS collections include Medieval Manuscripts and images of Physics Instruments. It will also house longitudinal qualitative data from the ESRC funded Timescapes project.

There is significant overlap between the target content for the York Digital Library and LUDOS.

Both LUDOS and the York Digital Library act as a complement to Leeds' and York's VLE systems and to the shared (Leeds, Sheffield, York) repository for research outputs - White Rose Research Online (EPrints software).

Repository development can be a lonely business sometimes! It's good to get in touch with other people who are working through very similar issues. Some of the common ground identified at the meeting:
  • access control : for legal reasons - but also may wish to restrict access to the metadata of non OA items in some cases.
  • metadata - York is investigating VRA Core for images; Leeds decided against this standard, preferring MODS because it enables nested relationships. Potentially, much fruitful sharing of experience to be had here.
  • data input - repository workflows need to be able to cater for expert and non-expert inputters
  • authentication - we need it!
  • relationships between digital objects
  • to customise or not to customise? - software may be flexible enough to allow lots of tailoring - but future migration should always be considered
  • ingestion of really big files! Should large files be referenced elsewhere? e.g. videos might live on a streaming server.
  • multiple copies of files - if usage data is important, how do we aggregate statistics?
  • trust and credentials between repository systems - for sharing data
  • metasearch tools from OPAC - various tools were discussed. York and Leeds have similar interests - it would be very useful to share data on this.
  • how comprehensive should metasearching be?
  • embedded (human!) behaviour - how to change it?
  • desktop deposit - what will it look like and is mediation always necessary?
We also had a couple of brief presentations on SWORD - Simple Web-service Offering Repository Deposit - by Julie and John. We're looking at SWORD as a potential deposit mechanism to populate both WRRO and ESRC's Social Sciences repository. See the IncReASe project. It would also be interesting to look into desktop deposit using SWORD e.g. as used in the experimental Microsoft eJournal Service.

York: Julie Allinson, Peri Stracchino, Elizabeth Harbord, Yankui Feng, Matthew Herring, Lucy Jaques
Leeds: Jonathan Ainsworth, Michael Emly, John Salter
White Rose: Rachel Proudfoot, Beccy Shipman, John Salter, Lucy Jaques

This was a useful meeting - and raised lots of issues which there was not time to address in great detail on the day. It was agreed to have another exchange meeting in about 6 months' time.

Thursday, 14 August 2008

Access statistics 07-08

The latest access statistics from Google Analytics show continued excellent usage of the repository over the year 01/08/07 to 31/07/08. Most traffic continues to come from Google - predominantly from regular Google rather than Scholar. But there are interesting examples of high volumes of traffic from individual or departmental web sites where these have been linked to papers in WRRO. The traffic breakdown is as follows:

Search Engines 81,332 (81.97%)
Referring Sites 12,736 (12.84%)
Direct Traffic 5,150 (5.19%)
Other 6 (0.01%)

WRRO had 88,464 individual visitors, making a total of 99,224 visits to the site and looking at 229,381 pages.
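As a quick check on these figures, the percentages can be recomputed from the visit counts. This is a small Python sketch using only the numbers quoted above; the variable and function names are our own.

```python
# Visit counts by traffic source, as reported above.
visits_by_source = {
    "Search Engines": 81332,
    "Referring Sites": 12736,
    "Direct Traffic": 5150,
    "Other": 6,
}

total_visits = sum(visits_by_source.values())
page_views = 229381


def percentage_shares(counts):
    """Return each source's share of all visits, as a percentage to 2 d.p."""
    total = sum(counts.values())
    return {source: round(100 * n / total, 2) for source, n in counts.items()}


shares = percentage_shares(visits_by_source)
pages_per_visit = round(page_views / total_visits, 2)

for source, share in shares.items():
    print(f"{source}: {share}%")
print(f"Total visits: {total_visits}")
print(f"Pages per visit: {pages_per_visit}")
```

The four sources add up to the 99,224 visits reported, and the page count works out at a little over two pages per visit.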

Most users were from the UK, followed by the USA, Canada, India and Germany. In total, we attracted visitors from 190 countries/territories. The three most regularly downloaded papers were:

Thompson, C. (2000) Clinical decision making in nursing: theoretical perspectives and their relevance to practice – a response to Jean Harbison. Journal of Advanced Nursing, 35 (1). pp. 134-137.

Dickinson, T.M. (2005) Symbols of protection : the significance of animal-ornamented shields in early Anglo-Saxon England. Medieval Archaeology, 49 (1). pp. 109-163.

Rodrigues, A.S.L., Andelman, S.J., Bakarr, M.I., Boitani, L., Brooks, T.M., Cowling, R.M., Fishpool, L.D.C., da Fonseca, G.A.B., Gaston, K.J., Hoffmann, M., Long, J.S., Marquet, P.A., Pilgrim, J.D., Pressey, R.L., Schipper, J., Sechrest, W., Stuart, S.N., Underhill, L.G., Waller, R.W., Watts, M.E.J. and Yan, X. (2004) Effectiveness of the global protected area network in representing species diversity. Nature, 428 (6983). pp. 640-643.

Sunday, 3 August 2008

WRRO and arXiv

It's well known that researchers who routinely use arXiv may be reluctant to post their work to a local repository. The work is already openly accessible through a well established, well used service so isn't posting to a local repository simply redundant extra work?

We're keen to see all disciplines and research outputs captured by White Rose Research Online but none of us wants to duplicate effort. To offer a meaningful service to the physicists, mathematicians, computer scientists etc. who use arXiv we need to allow a single deposit to feed more than one system. When depositing in WRRO, we'd like to offer an option to Deposit in arXiv, whereby the metadata and any attached files are pushed into arXiv. With work underway on developing and using the SWORD protocol (Simple Web-service Offering Repository Deposit) this service development is moving closer week by week. (See the SWORD / arXiv case study by Simeon Warner). However, perhaps a more likely scenario is that arXiv users continue to deposit in arXiv but that we identify WRRO content and pull it back into WRRO from arXiv. Or perhaps depositors will be able to PUT works into WRRO from arXiv - perhaps using SWORD. There are some interesting developments coming from Microsoft which could allow desktop to repository deposit - so perhaps the longer term solution will be single or multiple repository deposit from within Word. (See Microsoft Research Unveils Free Software Tools to Help Scholars and Researchers Share Knowledge.)

We hope that before too long WRRO will have more attractive options to offer arXiv users than simply asking them to duplicate work by adding works to both arXiv and WRRO. We'd be very happy to discuss possibilities with any arXiv users from the White Rose Universities; in any case, we will be identifying and contacting arXiv users with WRRO content during the course of the IncReASe project with a view to importing content from arXiv and, perhaps, offering local WRRO to arXiv deposit.

Tuesday, 29 July 2008

IncReASe Questionnaire Findings / Report

This week I'm working on the questionnaire report. This will provide a more detailed discussion of the results of the questionnaire. The findings document we've already produced has been well received so far. The results of the mandates questions have sparked some particular interest. This reflects the seemingly high positive response to the question of compliance. Hopefully the report will help illustrate this is not such a clear indication that mandates would work. The figures in the findings document include both those who would comply willingly and reluctantly. There is also some variation between the institutions. At one institution only 60% of respondents would comply willingly with an institutional mandate. This suggests the introduction of a mandate there would be unpopular with a considerable number of academics.

Also this week, we are meeting to discuss the progress of our interviews with academics. We have carried out about 5 so far and need to think about who else we want to target. We will look at whether the current interview schedule is working, and how the interview data might be used to create some case studies.

Anyway, back to the report writing!

Tuesday, 8 July 2008

IncReASe Project Questionnaire Findings

Back in February / March this year we carried out an online questionnaire across the 3 White Rose institutions. We were interested in what people currently did with their publications and whether they put them up online. We also asked about awareness of WRRO, funder and institutional mandates and what services WRRO should be offering.

We offered a prize of a £50 voucher or an iPod Shuffle. Here's a picture of Dr Tom Webb, the lucky winner, receiving his voucher.

We have produced a findings document which has just been sent out to those who responded to the questionnaire. Over the next couple of months we will be producing a report aimed at the repositories community.

Harvard Arts and Sciences Faculty recognized as newest SPARC Innovators

SPARC is the Scholarly Publishing and Academic Resources Coalition. Twice a year, SPARC names an "Innovator" - an individual or group - which has been "..working to challenge the status quo in scholarly communication for the benefit of researchers, libraries, universities, and the public." The latest SPARC innovator is the Harvard Arts and Sciences Faculty. There has been widespread coverage of the Faculty's decision to make research outputs openly available (a similar policy has also been adopted by Harvard's Law Faculty).

The Harvard researchers want to assert more control over the dissemination of their work and ensure wider access to their research than is the case for papers locked behind solely subscription-based access.

Some extracts from the agreed motion:
"..The Faculty of Arts and Sciences of Harvard University is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy: Each Faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles...
"To assist the University in distributing the articles, each Faculty member will provide an electronic copy of the final version of the article at no charge to the appropriate representative of the Provost’s Office in an appropriate format (such as PDF) specified by the Provost’s Office. The Provost’s Office may make the article available to the public in an open-access ..."

The full agreement is available online.

The research is deposited in Harvard's open access repository. The equivalent system for Leeds, Sheffield and York is White Rose Research Online.

Tuesday, 1 July 2008

Repository Support Project Summer School 2008

I’ve just got back from the RSP summer school held at Thornton Manor in the Wirral. Well, nearly 10 days ago now I suppose. It was a very interesting couple of days, not least because of the fabulous venue and great food! In fact there must have been almost as much talk about food and Lord Leverhulme’s old house as there was about repositories.

RSP held a summer school last year that was aimed at new repositories and those just in the process of setting up. This year’s was for those with already existing repositories. A good range of topics were covered in the sessions including interoperability, copyright, advocacy, preservation and statistics.

A number of the break out sessions were extremely useful as they offered the chance to discuss the issues directly with other people doing the same / similar job. I know it’s been said many times before but it is fascinating how different institutions staff their repositories. There is such a variation in the number of staff employed, the amount of their time dedicated to the repository and also the other demands repository staff have on their time.

Perhaps one of the most interesting sessions for me was Niamh Brennan’s paper on advocacy. Niamh works on the repository at Trinity College Dublin and there they have integrated the repository with their Current Research Information System (CRIS). Academics must upload details of their publications into the CRIS, and they are now offered a button to add full text. Take up seems to have been very high, and now the repository has a lot of content to deal with. Much of the work around repositories has been about raising their profile and advocacy work. Perhaps the key is actually to make repositories invisible!

Monday, 23 June 2008

WRRO launch new blog

This is the new blog for White Rose Research Online (WRRO), the open access research outputs repository for the Universities of Leeds, Sheffield and York. We have been running a blog for the past year so if you want to see what we’ve been doing up to now please check it out. It won't be updated any more as we've relocated here.

Just a brief bit of background info - WRRO has been live for about 4 years now. It was developed as part of the Sherpa project and has enjoyed steady progress, now containing about 3000 records. Last year WRRO gained JISC funding for the IncReASe project. The aim of the project is to increase the content in the repository and develop services so it is easier to use and becomes part of depositors’ regular workflows.

As part of the IncReASe project we have undertaken a web survey of publications across the 3 university websites and an online questionnaire of academics’ existing depositing habits, knowledge about open access and funders’ OA policies. We are currently interviewing academics further about how they disseminate their research outputs, and what services they would like WRRO to offer. We are also working closely with the ESRC to develop our aim of “deposit once, use many times”. The aim is that ESRC grant holders will be able to deposit with us and we will send their research outputs to the ESRC to fulfil their grant requirements.

We will use both the project website and here to disseminate our findings for the IncReASe project.