CAIRSS Blog

2010/03/09

Following and communicating with CAIRSS

Filed under: Uncategorized — caulcairss @ 3:46 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

2010/01/20

How to export contents from an Institutional Repository to a Spreadsheet

Filed under: Digital Commons,DigiTool,DSpace,EPrints,Equella,ERA,Fedora,Fez,Java,OAI-PMH,SOLR — tpmccallum @ 5:21 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

The idea

A short time ago CAIRSS was approached by a Repository Manager from within the CAIRSS community for help with exporting the contents of their repository to a spreadsheet. It was made clear that accomplishing this would greatly assist with Institutional Repository management tasks and, most importantly, ERA-related work.

The tools

There are many ways that data can be extracted, moved and converted. The wisest choice is to use tools that are interoperable. An example of this would be choosing OAI-PMH to extract data rather than attempting to communicate directly with an individual repository's storage layer or database.

The solution

Our CAIRSS Technical Officer Tim McCallum has built a solution to this task in the form of a Java web application: FoREveR – the Flexible Repository Export Reporter.

Extracting the data

The data extraction is carried out using an OAI-PMH harvester; in this instance The Fascinator was used. Given recent trends in Institutional Repository development and the use of SOLR, the next step was an easy choice: simply extract the data from The Fascinator using a SOLR query. As an added bonus, SOLR can supply the data in JSON (JavaScript Object Notation) format.
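
To give a sense of what that step looks like, here is a minimal sketch of fetching JSON from a SOLR index over HTTP. The endpoint URL, port and row limit are assumptions for illustration only; FoREveR's actual request may be built differently.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class SolrJsonFetch {
        public static void main(String[] args) throws Exception {
            // Hypothetical SOLR endpoint exposed by The Fascinator; adjust host, port and core to suit.
            String base = "http://localhost:8080/solr/select";
            // "*:*" fetches everything by default; a narrower SOLR query can be supplied instead.
            String query = URLEncoder.encode("*:*", "UTF-8");
            URL url = new URL(base + "?q=" + query + "&wt=json&rows=1000");

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            // Read the JSON response body into a single string.
            StringBuilder json = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    json.append(line);
                }
            }
            System.out.println(json);
        }
    }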

Converting the data

Overview

After testing different methods of converting the data, including XSLT and Python, further research revealed some excellent JSON libraries written in Java. Java was the final choice, given that the JSON libraries could meet the requirements of this application and that OAI-PMH harvesting, The Fascinator and SOLR were all already written in Java.

Technical

The JSON data is returned from an HTTP request (which can be set to fetch everything by default). This data is converted to Java Maps and ArrayLists for further processing. The application loops through every record returned and creates a Java Set (a unique master list of metadata fields). This Set is then displayed in the user's browser, giving a last-minute chance to select or deselect metadata before the final report is written. A metadata field containing a large amount of content is sometimes best left out, as it can make the spreadsheet unmanageable from the end user's perspective.
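
As a rough illustration of that loop, the sketch below parses a SOLR JSON response and collects the unique field names into a Set. It uses the org.json library purely as an example; the post does not name the JSON library FoREveR actually uses.

    import java.util.Iterator;
    import java.util.LinkedHashSet;
    import java.util.Set;

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class FieldSetBuilder {

        // Build the unique "master list" of metadata field names from a SOLR JSON response.
        public static Set<String> collectFieldNames(String solrJson) {
            JSONObject response = new JSONObject(solrJson);
            JSONArray docs = response.getJSONObject("response").getJSONArray("docs");

            Set<String> fieldNames = new LinkedHashSet<String>();
            for (int i = 0; i < docs.length(); i++) {
                JSONObject doc = docs.getJSONObject(i);
                // Every field seen in any record goes into the set exactly once.
                Iterator<String> keys = doc.keys();
                while (keys.hasNext()) {
                    fieldNames.add(keys.next());
                }
            }
            return fieldNames;
        }
    }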

Reporting the data

Once approved, the application creates an HTML file with all of the data saved to a table. The table includes table headings, table rows, table data cells and unordered lists for repeating information. The file can be opened in the Microsoft Excel and OpenOffice.org spreadsheet applications or viewed in a browser.
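
The report writing itself can be as simple as the sketch below: selected field names become table headings, each record becomes a row, and multi-valued fields are rendered as unordered lists. This is only an illustration of the approach, not FoREveR's actual reporting code.

    import java.io.FileWriter;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class HtmlReportWriter {

        // Write one HTML table: selected field names as headings, one row per record,
        // repeating values rendered as an unordered list inside the cell.
        public static void write(String path, Set<String> fields,
                                 List<Map<String, List<String>>> records) throws Exception {
            StringBuilder html = new StringBuilder("<html><body><table border=\"1\">\n<tr>");
            for (String field : fields) {
                html.append("<th>").append(field).append("</th>");
            }
            html.append("</tr>\n");

            for (Map<String, List<String>> record : records) {
                html.append("<tr>");
                for (String field : fields) {
                    List<String> values = record.get(field);
                    html.append("<td>");
                    if (values != null && values.size() == 1) {
                        html.append(values.get(0));
                    } else if (values != null) {
                        html.append("<ul>");
                        for (String value : values) {
                            html.append("<li>").append(value).append("</li>");
                        }
                        html.append("</ul>");
                    }
                    html.append("</td>");
                }
                html.append("</tr>\n");
            }
            html.append("</table></body></html>");

            try (FileWriter out = new FileWriter(path)) {
                out.write(html.toString());
            }
        }
    }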

Screen Shots

Optional SOLR Query

Note: It is not necessary to know SOLR query syntax; the application can be set to fetch everything by default. This may be an area to address with the community, and feedback is welcome.

Feedback

Small sample of spreadsheet output

Using the Flexible Repository Export Reporter (FoREveR)

As this software is in the very early stages of its life cycle, reports can be created by CAIRSS and emailed out to you. Please contact CAIRSS Central if you think your institution could benefit from this tool.

The source code is available at http://cairss.caul.edu.au/trac/browser/code/FoREveR for your interest; however, it has not been extensively tested. All feedback is welcome. CAIRSS will endeavour to improve and enhance the software to meet your needs.

2009/12/07

eResearch Australasia 2009, who CAIRSS?

Filed under: Uncategorized — ptsefton @ 4:15 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

Kate Watson has reminded me to blog about the eResearch Australasia conference held in November from a CAIRSS perspective. What’s going on in eResearch that university repository managers should be aware of?

Here are my top five things to think about, in order of urgency, with 1 being the most immediate and 5 being a longer-term consideration:

  1. Look at what other CAIRSS sites are doing with eResearch and data. There were some great examples of different thinking about how IRs fit into eResearch at the workshop on data management run by QUT and CSIRO, with appearances from some familiar faces from the IR world talking about their institutional planning for data management: Institutional approaches to data management support: exploring different models. We’re interviewing for a new one-year position at USQ for an ANDS/CAIRSS liaison person to help bring these stories to the CAIRSS community, start to put up resources for data management on the CAIRSS site and help the IR community keep in contact with ANDS.

  2. Consider RIF-CS, the new ANDS-developed metadata format for describing data collections.

    The Registry Interchange Format – Collections and Services (RIF-CS) Schema was developed as a data interchange format for supporting the submission of metadata to a collections service registry.

    http://ands.org.au/resource/rif-cs.html

    This format is something that will be important to those IRs which end up hosting data collections and/or metadata about data collections. I am encouraging the ANDS team to hold at least one meeting for the developers and metadata specialists in the repository community to tell us the background to this schema and go through the thinking behind the design. (I know there's a workshop about deploying the new standard, Gumboots for the Data Deluge: defining and describing collections for the Australian Research Data Commons, but I am thinking more about one that might (a) convince us why we need a new standard by explaining the thinking behind its design and (b) take input into future directions for the standard.)

  3. Think about the Australian Access Federation. It's still rolling out, apparently. I have always been quite sceptical about some of the more complicated use-cases involving role-based authorisation to repository resources, but I think the current AAF story is a bit more believable; I wrote about promising developments in the Australian Access Federation on my blog. Repository managers, it would be worthwhile checking with your local IT department if you are not already in the AAF. And if you have any IR requirements to lock down content for AAF users, then let Tim McCallum, the CAIRSS techie, know and we'll see what we can do to help.

  4. Looking beyond the kinds of interfaces we're using now, there was a wonderful presentation from Mitchell Whitelaw on new visualisation techniques for navigating large data sets: Exploring Archival Collections with Interactive Visualisation. This was a revelation to me, seeing a word-cloud linked to a dynamic visualisation. Do yourself a favour and check out the A1 Explorer screencast. In the same session Duncan Dickinson from our team at USQ showed some early work we have done on bringing data capture down to the desktop with The Fascinator, Creating an eResearch Desktop for the Humanities. We'll definitely be looking at how we can let you use Mitchell's tools over your data.

  5. Get ready for web-scale annotation services as part of the scholarly communications process. I missed the presentation on Universal Collaborative Annotations with Thin Clients Supporting User Feedback to the Atlas of Living Australia, but I heard about it from a few people. The team here at ADFI was inspired to plug the open source tools released by UQ into our ICE publishing system and The Fascinator as part of ICE week (if you're technically inclined you can try it out). It's early days yet, but I think that the standards behind these systems will be key to a new world of peer review, thesis examination and public participation in scholarship, not to mention collaboration on document authoring, assignment marking and thesis supervision.

Copyright Peter Sefton, 2009. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project and published to WordPress using The Fascinator.

2009/10/13

Australian repository software in use

Filed under: Digital Commons,DigiTool,DSpace,EPrints,Equella,Fedora,Fez,Software,VITAL — caulcairss @ 3:28 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

CAIRSS now has a current list of Australian university research repositories (see: http://cairss.caul.edu.au/www/repository_software/repository_software.htm).

As outlined on this CAIRSS webpage, all 39 Australian universities have a research repository, with seven different repository software options currently in use.

CAIRSS will be working in the future to list which version of the software each installation uses.

2009/10/12

Harvesting from Flickr

Filed under: flickr,harvesting — caulcairss @ 4:36 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

Flickr is a web service capable of hosting still images and videos. CAIRSS has received questions from the community regarding harvesting from this service in particular.

There is other photography management software available, such as Picasa, F-Spot and DigiKam; however, these are outside the scope of this blog post.

Registering

To get started with Flickr you have to register for a Yahoo account. Once registered you are able to create a Flickr account and customize your profile.

Uploading content

There are quite a few ways to upload content to Flickr, examples of these include:

Flickr Uploadr
A client application available for Windows and Mac. This software is an official Flickr tool and allows the user to add titles, tags, descriptions and sets.
iPhoto and Aperture
iPhoto and Aperture are client applications for the Mac; features include uploading, editing and organizing images.
Email
Flickr is capable of accepting content from a user's computer or mobile device via email.

Editing content

Picnik
Picnik is an online photo editing service and an official Flickr partner.

Organizing content

Organizr
Organizr, an official Flickr tool, is an online photo organizer and editor used to search and browse content as well as organize content into collections and sets.

Harvesting content

There is a growing number of third-party applications (not official Flickr applications) available for harvesting content from Flickr. Examples include:

flishr
An upload, download and search tool for Windows (requires the .NET framework).
Dfo
Desktop Flickr Organizer for GNOME. Preferably installed on Ubuntu Linux, this application allows online and offline editing, including adding, removing and editing photos, tags, sets and comments.
Flickrdown
A Windows application that allows downloading of photos in bulk using sets (requires the .NET framework).
flickrexport
A plugin for exporting images directly to Flickr from iPhoto or Aperture.
Flump
An Adobe AIR application available for Windows, Mac and Linux.

Flickr API

Flickr has provided an API for non-commercial use by outside developers.

Interacting with the Flickr API can be done using most of the common programming languages such as Java, .NET, PHP, Python, Ruby and Perl.

If there is sufficient interest from the CAIRSS community, CAIRSS Central can move to investigate creating a customized application using Java, PHP etc., or even a command-line script using cURL, Python or Perl.
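
As a starting point, here is a minimal sketch of calling the Flickr REST API from Java to list a user's public photos as JSON. The API key and user ID shown are placeholders for illustration; a real key is issued when you register an application with Flickr.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FlickrSearch {
        public static void main(String[] args) throws Exception {
            // "YOUR_API_KEY" and the user_id value are placeholders for illustration only.
            String url = "https://api.flickr.com/services/rest/"
                    + "?method=flickr.photos.search"
                    + "&api_key=YOUR_API_KEY"
                    + "&user_id=12345678%40N00"
                    + "&format=json&nojsoncallback=1";

            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON listing of the matching photos
                }
            }
        }
    }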

If anybody has successfully harvested from Flickr and would like to contribute their information to the CAIRSS community, or if you are a member of the CAIRSS community and would like to request assistance with this topic, please contact cairss-technical@caul.edu.au.

2009/07/21

NicNames and People Australia – some thoughts for CAIRSS

Filed under: Uncategorized — ptsefton @ 1:01 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

This post looks at a couple of name-related services that will be of interest to CAIRSS people. There are a lot of author ID systems; this post is not a survey of all of them (see this list).

I visited the National Library of Australia on Wednesday June 17th to look at their new Single Business Discovery System prototype and the People Australia service. The SBDS is much more interesting than the name implies (I'm assured the name will change); it lets you search a big index of:

I’d like to talk about one part of this, People Australia, and reflect on how this fits with the ARROW mini-project looking at researcher identities, the NicNames Project. I dropped in on NicNames at Swinburne University in Melbourne on July 15th.

This blog post has been reviewed by staff from the NLA and Swinburne; thanks, all. I tried to address the things you raised, but please use the comments if I missed anything.

Why would a repository manager care about these services? Both of these systems are built around identities of people. They promise to allow us to identify researchers uniquely and link those identities to research outputs and other stuff in the repository. No more searching through all the Lees and Smiths for the one you want or trying to bring together publications for people who have changed their name.

It is well understood that people will have lots of IDs from lots of sources: university staff numbers (which are probably private), tax file numbers, email addresses, OpenIDs and Australian Access Federation IDs (eventually, maybe).

I will look at how I think these services might look to a variety of players: repository managers, depositors and end-users. For more background about some of the issues I recommend that you read this post by Andy Powell first. He reports that the UK scene seems to involve attempts to create centralized name services, whereas here we are looking at distributed systems that talk to each other. Yes, People Australia is a large project at the National Library of Australia, but it is designed to be part of a decentralised mesh of naming services.


Repository Managers: Establishing Local IDs

If you're a repository manager and you want usable, stable, persistent IDs, this is what you will do.

  1. Arrange to have NicNames installed. It’s a web application which is installed separately to the repository. There might be a local instance at your site, or potentially you could share with another institution.

  2. Load all your repository data into the NicNames website. This involves extracting data from the repository in, say, MARCXML and then feeding that to NicNames.

    The NicNames application will consider the data and try to identify unique authors by looking at the strings used to identify them and by looking at co-authors and subject codes. So if there are two Alan Smiths working in different disciplines it will try to work out that they are different. NicNames can be configured to store as much information about each identity as you like, such as the multiple different name strings that have been used to refer to it, and potentially the ID of any repository items (this will become important later).

    Once it has done this, it will give the repository manager some kind of interface to confirm NicNames' suspicions and/or correct it when it gets things wrong. Behind the scenes, NicNames assigns unique IDs to individuals. These are not related to People Australia IDs at this stage of the process. So now, instead of just a string like Alan Smith in the name field of some metadata, we have something like:

     <person><name>Alan Smith</name><id>NN:00000001</id></person>

    We might also have other records that have a different form of the name but with the same ID, which is the whole point of this exercise. (You could also store a canonical string like Smith, Alan to save the software having to look it up in the NicNames system, but then what if you have to change it?)

    <person><name>Smith, Prof. A</name><id>NN:00000001</id></person>

  3. When you are happy with the names they can be imported back into the repository. This will have to be coded for each repository platform separately, and at this stage it has not been done. One of the issues is that VITAL (which is the ARROW platform) does not have any APIs to allow this kind of batch update; something would have to be written at the level of Fedora, which does the data storage under VITAL.

    Alternatively it would be possible to use an architecture where you didn't have to change the repository at all: it would continue to store strings for names, and the NicNames system would hold the data about IDs. A third system, a discovery layer, could then present a browse-and-search view of the repository-plus-name-IDs. That might sound a bit problematic, but it might be pragmatic where it is difficult to change the core repository software (even plugins we develop at USQ take months to make it into our local ePrints). It's actually a semantic-web approach where different facts can be distributed on the web. I'll write up this design pattern in another post on my own blog.

At Swinburne the main use case involves repository staff running batches of records from EndNote into VITAL, as that's the way their repository workflow is set up: it's all done in the library with no self-deposit. They will:

  • Transform EndNote data to MARCXML.

  • Put the MARCXML into NicNames as above and sort out the names.

  • Export the MARCXML back out of NicNames, with added IDs.

  • Put the MARCXML into VITAL as per normal practice at Swinburne.

I talked to Swinburne staff about having a look at Squire, the ARROW-sponsored replacement for VITAL, which might be able to be integrated into their workflow and might be adapted to help inject NicNames IDs back into Fedora. So for new records, there will be a unique identifier in the record. How this will be displayed in VITAL remains to be seen.


1 Depositors

Now we turn our attention to the users: whoever is putting in resources using some kind of web ingest system. The NicNames team are starting with VALET, which is the open source repository ingest tool that comes with VITAL, but it should be simple to plug it into other systems like ePrints. Here's what a typical depositor will do:

  1. Start depositing a new item as usual.

  2. Start typing in a name field for, say, an author.

    If what you are typing appears to match an existing identity, a form will pop up where you can pick which author you mean. See the screenshot on the NicNames blog.

    If there are no matches then there will have to be some way to create a new identity in NicNames.

    Behind the scenes the ingest application will be talking to NicNames.

That's it. Your repository now knows a definite identity for each person associated with a resource, so you can have as many Alan Smiths as you like and be able to deal with people who have published under several names. There is still the matter of what the interface will look like. Rebecca Parker tells me:

One of the outcomes of the project will be recommendations from a user-centred design process … we’ll be making suggestions about best practice for displaying name variants in research databases generally, with obvious local impact on how to manage these in institutional repositories.


2 Where does the NLA come into this?

So if a repository manager has used NicNames to establish IDs for people, and depositors have associated new deposits with those IDs, we have a local unique ID for each person in the repository. But that doesn't help when records are harvested by the NLA for their Discovery Service, because to that system a NicNames ID is meaningless unless it can be associated with the People Australia name system. What we want is a way to tie the NicNames ID to the People Australia ID.

People Australia is designed to work with multiple distributed identity management systems; it keeps an EAC record for each entity (person or organisation), which can have multiple identifiers associated with it. I assume what's needed in the case of repository content is either to match a NicNames identity with an existing People Australia identity or, if the match can't be made, to make a new People Australia identifier with an EAC record that contains the NicNames ID.

The process will work like this:

  1. People Australia will harvest name data out of NicNames systems using OAI-PMH and will attempt to match them to People Australia identities, and if that fails, make new ones. (I'm still not really clear on how this might happen; this bit is not in either system yet.)

  2. Now, when the People Australia harvester pulls OAI-PMH records out of the repository, they will have NicNames IDs in them in the Dublin Core, and the NLA system will be able to associate those with People Australia IDs.

Basil Dewhurst at the NLA summed up some of the advantages of public persistent IDs, which is what People Australia will provide (NicNames can't provide that on its own: it's a bit of software, and you need software-plus-governance to provide persistence):

The case for People Australia IDs is that they're _persistent_, _public_, and enable discovery of information about people and the resources they create across collections and domains. Importantly, we plan to pull in the VIAF names later in the year and this means that we can link to researchers internationally as well as in Australia. Research doesn't stop at the coastline!

He notes that machine-to-machine interfaces are another key advantage of these systems, which I take to mean that repositories can talk to each other to build a distributed identity system.

3 Summary: what does this mean for repository managers?

For most repositories I think that it is a case of waiting to see what happens. At the moment, the People Australia service is not creating IDs for material that comes in via the ARROW Discovery Service, and the NicNames project has not yet released any code or got it working in any of the partner institutions. When NicNames is released the CAIRSS team will have a look and see what would be involved in getting it into production across the various platforms in use in Australia, and we'll keep talking to the NLA about how NicNames IDs will flow through to People Australia.


* It’s impressive that when I copied the tabs from the top of the SBDS page they pasted into my document as bullet points. That’s clean design.

2009/06/10

Open Repositories Conference 09 Part 2

Filed under: Open Repositories 09,The Fascinator — tpmccallum @ 3:08 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

Performance

I learned of a multi-disciplinary search engine for academically relevant web resources called BASE during some recreation/mingling time. I spoke to a couple of developers about two important issues relating to repository systems/solutions. The first issue was performance. We discussed how BASE uses a search technology called FAST. I believe that Microsoft acquired FAST Search & Transfer in 2008; the product is now known as FAST ESP from Microsoft. BASE currently holds over 20 million records from 1265 sources around the globe and is contributing to the Digital Repository Infrastructure Vision for European Research (DRIVER).

The average search that I did on generic topics produced about 150,000 hits out of 20,084,184 items, all in fractions of a second. Very impressive performance. I have never tested using item numbers this large and certainly have not seen results like this with even a fraction of the content. It appears that BASE holds metadata, full text and precise bibliographic data and uses OAI-PMH for harvesting. I searched for quite a while to get a data stream such as a PDF served via the BASE URL but was redirected every time. I am therefore assuming that there are no data streams stored locally (metadata only). Guys, please correct me if I am wrong about this.

The Fascinator

I do not wish to make any performance comparisons at all with BASE, as The Fascinator has only been tested with a minute number of records compared to BASE. The interesting point that I would like to raise is that The Fascinator is not only able to harvest and provide metadata, but can harvest and store data stream content locally as well. It is possible to configure The Fascinator in two ways. The first way is to enable it to engage directly with Fedora and harvest metadata as well as data streams using Fedora's APIs. The second way is to configure The Fascinator to harvest using OAI-ORE; if there are references to data streams in the resource maps, they will be downloaded and stored locally along with the metadata it was configured to harvest at the time. The University of Southern Queensland, in conjunction with the CAIRSS project, is getting ready to carry out a nation-wide harvest called the Australian University Repository Census (AURC); this harvest will be carried out using The Fascinator software.

Normalization

As I mentioned above, there was another important issue that was brought up in our casual conversation: normalization. It appears that this is a problem for everyone in the repository space and for harvesting projects. I was throwing a developer challenge idea around in my head before the conference about creating an application, well, more of a web service really, that would harvest a repository's metadata and then display it in a web browser, pointing out obvious mistakes first, followed by suggestions for normalization (all the while linking back to the item, so that the user could organize the editing of that item). I talked to Oliver Lucido briefly (I could not discuss it with Peter as he was a judge for the challenge). We came to the conclusion that this is pretty much what we are doing with AURC using The Fascinator. This being my first conference, I was unsure about how much conference content I would miss out on by trying to code something up for 2 out of the 4 days… so that idea kind of died.

Now that I am back I am revisiting that idea and wondering if it is possible to put together some pieces that exist already and combine them with some software (plagiarism detection style) in the hope of creating a web service that is capable of pointing out problems with normalization on an institution-by-institution basis, giving suggestions regarding conforming with other institutions and/or repairing internal normalization issues. I think ultimately the best solution would be for each individual institution to be able to see and repair normalization issues in-house.

2009/06/03

Open Repositories 2009 – Peter Sefton's thoughts

Filed under: Uncategorized — ptsefton @ 4:20 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

I posted a general summary of my trip to Open Repositories 2009 over on my blog (this is Peter Sefton writing). Katy Watson asked me to make some comments for CAIRSS. As with Tim McCallum’s trip I was funded by USQ.

The big theme mentioned in my post, and this is something that I used to go on about in the RUBRIC project, is that a repository is not a bit of software, it's a way of life. Put more formally, repositories are more about governance of services than they are about software applications. There was a fair bit of this 'set of services' approach evident at OR09; I take this as a positive sign that we are moving beyond the idea that the bit of software that you call The Repository is all there is. For a local example, look at the way some sites are going to deal with evidence for the ERA by putting files up on a simple web server. Provided this is accompanied by procedures and governance to ensure the materials persist for an appropriate length of time, I think it's just part of the repository service offered by the library to the institution.

I didn't see much at the conference about institutional repositories specifically, so there is not much to report to the CAIRSS community about that, and as I'm primarily in a technical role I spent a fair bit of time with the technical crowd.

One thing I think is striking is how well ePrints is doing; it seems that their model of single-institution support is a good way to provide vibrant software: they are producing new releases at least as fast as DSpace and Fedora. I get the sense that the administrative overhead of establishing the DuraSpace organization and managing highly distributed developer teams is making progress hard for those platforms at the moment. When we did the RUBRIC project I think there was a feeling that ePrints was old technology and 'better' Fedora-based solutions were going to be the way forward, but at least one Australian ePrints site has stayed with the software and not gone ahead with a planned move to a Fedora-based system. Note my prediction in my blog post that there will be a Fedora back-end option for ePrints by the time Open Repositories 2010 comes around. At this stage I think ePrints is a really good solution for document-based repositories. Me, I would not be managing other kinds of collection with it, but at Southampton they do and I may be eating those words soon.

I pointed this out in my other post but I will do so here as well. USQ now has ePrints hooked up to our ICE content management system, meaning that we have papers, presentations and posters going in not just as PDF but as web pages. This is going to allow us to do much more with linking documents to data and providing rich interactive experiences. My last few items all have HTML as the primary object, with PDF available if you click through; there are a few glitches to sort out but we're getting there.

VTLS, of VITAL fame, had a small presence, pitching a bit of open source software for OAI-PMH. Nice to see them contributing in this way.

Remember, it’s not a software package, it’s a state of mind.

2009/05/28

Open Repositories Conference 09

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

General Overview

This year's Open Repositories Conference was held in Atlanta, Georgia, USA. This year marked the fourth year of this annual international conference, which was held at the Georgia Institute of Technology Hotel and Conference Center.

Participants and Sponsors

The Conference had representatives from organizations including DSpace, EPrints, Fedora, VTLS, JISC, Microsoft Research, Sun Microsystems, @MIRE and the NSF (National Science Foundation).

Microsoft Research

It was made quite clear to me throughout the conference that Microsoft Research were looking at carrying out research and development and were not concerned with directly profiting from their involvement. There were several open discussions during workshops about how they would best create plug-in functionality for their products that would enable their users to interact with repositories. There was a lot of constructive conversation around how SWORD would be integrated with new Microsoft Research products/plug-ins. There were good discussions about whether the processing and converting of documents and metadata should be done on the client, as a web service, or handled directly by the repository software. The main challenge that I could see with doing this is deciding how much freedom to give to the user. Do they simply click a button upon completion of their work, or does the software allow them to interact at quite a low level with regards to metadata and file types, allowing them to review their work in the different formats before the final submission? I am assuming that if a researcher has spent several years writing and researching they would have a substantial amount of time to put the final touches on the master document to make sure that it rendered correctly in HTML and PDF. It would be amazing if we could write software that would handle everything behind the scenes; perhaps eventually we will arrive at this point.

DSpace

Where is DSpace heading? Version 2.0 can be expected in early 2010.

In the meantime, 1.6 will be released as a stepping stone to 2.0 and will include bug fixes (due October 2009).

I ran into Kim Shepherd from the Library Consortium of New Zealand on my way out of Atlanta. Kim is a DSpace committer, and we had a good conversation about DSpace 2.0, amongst other things. I will be sure to keep in touch and keep an eye on future development.

DuraSpace Organization

DuraSpace is an organization. The first technology to emerge from DuraSpace will be a product called DuraCloud. DuraCloud consists of a complete hosting service using DuraSpace partners (commercial cloud providers). While DuraSpace is offering a cloud computing solution as a service, it is possible to download the code and create a cloud computing solution inside your own institution.

Components used by DuraSpace are Akubra (a pluggable file storage interface), Mulgara (a semantic store) and DuraCloud.

DuraSpace expects that more components will be considered for use as they are discovered.

@MIRE

I took a bit of time to talk to Bram Luyten from @MIRE. From what I understand, @MIRE is a commercial company that works very closely with the developers of DSpace; as I understand it, their staff include DSpace committers. @MIRE provide services including preparing and implementing repository solutions, technical assistance, bug fixes, customizations and a support service for the DSpace product.

As I understand it, DSpace ships with a BSD license and is therefore very open to this sort of interaction and collaboration with a commercial company. To me this seems to be a fairly good approach to a repository solution, as it allows the flexibility of using an open source product with the option to request immediate assistance and support, at a price, should you need it.

Fedora

Fedora 3.2 is moving towards using Akubra to replace the old Fedora storage interface. The Akubra API is not turned on by default in Fedora 3.2; it is hoped that developers will take an interest in it over time. This will allow the new technology to be tested and implemented gradually.

An interesting feature of Fedora 3.2 that was mentioned is that you are now able to run multiple Fedora instances with one Tomcat instance. This has been a topic that I have heard raised a few times over the last couple of years.

Poster Presentations

Squire

The poster sessions included Squire. Squire, as you probably already know, is the Java version of the VTLS product VALET. It was developed with ARROW funding. It appears that VTLS has recently taken an interest in this product and it is possible that they will further develop it. Whether it remains open source or not remains to be seen.

The Fascinator

This poster was presented by Peter Sefton. The Fascinator is an Apache Solr front end to the Fedora Commons repository; I am again guessing that most of you probably already know that. You can find out more about The Fascinator here.

You can find a full list of the Open Repositories poster sessions here.

Photographs

The Open Repositories organisers have provided a Flickr slide show of the entire conference. You will see Peter and myself in the Minute Madness poster presentations, as well as the two of us discussing the finer points of our posters in the ballroom.

Wrap up

I found that I got just as much information out of talking to people casually as I did from the formal presentations. I met so many people that I have a big job ahead of me going through my notes and contacting them all.

In my opinion there was a definite trend towards distributed systems rather than a single repository. There were even discussions about repository performance and how running just the database components on separate servers produced a marked increase in repository performance. I was surprised at how many people are using open source products and building their own applications over the top. Very few used fully proprietary solutions. One of the many examples of this was a Ruby on Rails application that incorporated Fedora using JRuby, and of course one of the most impressive, our very own The Fascinator, complete with multi-portal creation, a harvesting framework, Solr indexing and a security model, as well as installers for Linux, MacOS and Windows. Oliver Lucido has also recently created a screencast of the new desktop feature. Peter's presentation went down really well; he got quite a few laughs with some witty humour. Overall I had a great time and can't wait until next time.

2009/05/25

Are you a tweeter? You can follow CAIRSS updates on Twitter if that is your style…

Filed under: Twitter — caulcairss @ 1:00 pm

Please note – The CAIRSS blog has relocated to http://cairss.caul.edu.au/blog

Want another way to stay updated on all CAIRSS and repository news? The ‘CAIRSS Central’ team are on Twitter – http://twitter.com/caulcairss  

The CAIRSS website News Stand has details on how to subscribe to CAIRSS twitter updates via your email, rss feeds, or via your own Twitter account. http://cairss.caul.edu.au/www/news_stand.htm

#caulcairss is the Twitter hashtag you can use when posting CAIRSS-specific information from your own personal Twitter account. Tim has been tweeting some interesting ideas from the Atlanta Open Repositories Conference.
