Getting started with Fedora Repository

May 24, 2009 at 2:34 pm 10 comments

I recently had to get started with the Fedora Repository. Because the Fedora Commons site is somewhat chaotic and a quickstart is hard to find online, I’ll write down the basic steps for getting a Fedora Repository up and running here.

What is it?

First of all, the Fedora Repository has nothing to do with the famous Linux distro (there actually was some dispute over the trademark). Fedora Repository is an open source digital repository management system that originated at Cornell University in 1998 and has been open source since 2003. The current release is Fedora 3.2.

The repository is meant to store (or link to) digital assets (images, video, audio, … anything really) and their related metadata. For example, I am using Fedora as a repository for video files. The video files are available in different formats (high resolution, low resolution flash video, …) and there is an XML file per video describing the video contents. 

This article describes the setup of the repository, the data model Fedora uses, and how to interact with the repository using Java and Ruby.

Installation

First download the latest version of Fedora from their site (go to Developers > Downloads). As mentioned before, the current release is 3.2, so the installer is fedora-installer-3.2.jar. When installing, you can choose between the quick and the custom installation. Choose the latter, because we want to enable the REST interface. Go to the download location in your terminal and type

java -jar fedora-installer-3.2.jar

to start the installation process. The default options are OK, except that you should enable the REST API. More information on the installation is available in the installation documentation on the Fedora site.

Fedora runs inside a Tomcat server. After installation you can start Fedora by going to fedora_home/tomcat/bin and typing the following on the command line (Linux / Mac)

sh catalina.sh run

or (Windows)

startup.bat

You should see some messages indicating that Tomcat is starting up and that Fedora is being deployed. The final message should say something like “server startup complete”. Now you can fire up your browser and surf to http://localhost:8080/fedora. If everything went well, you should see an ugly page showing some info about Apache Axis.
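If you would rather check from a script than from a browser, a minimal Ruby sketch like the one below should do. It only uses Ruby’s standard library. The method name fedora_up? and the timeout value are my own choices for illustration; the URL is the one mentioned above.

```ruby
require 'net/http'
require 'uri'

# Base URL from the article; adjust host and port if your Tomcat differs.
FEDORA_URL = 'http://localhost:8080/fedora'

# Hypothetical helper: returns true when the server answers with a 2xx or
# 3xx status, false when it answers with an error or cannot be reached.
def fedora_up?(url = FEDORA_URL, timeout = 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  path = uri.path.empty? ? '/' : uri.path
  response = http.request_get(path)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  # connection refused, timeouts, bad URLs, ...: treat all of them as "not up"
  false
end
```

Calling `puts fedora_up?` while Tomcat is still deploying Fedora may print false for a while; once the startup message has appeared it should print true.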

The Fedora data model

A Fedora object contains several datastreams

The image on the right shows a Fedora object. It has a unique ID called the PID (persistent ID) and some other properties. A Fedora object also contains so-called datastreams. These contain the actual data. This data can be the essence (e.g., the video material) or the metadata about this essence (e.g., an XML file with descriptions). 

Fedora has some default datastreams (RELS-EXT, DC and AUDIT), and the user can add as many other datastreams as necessary. So what do these default datastreams contain?

  • The RELS-EXT datastream contains the relationships to other objects in the repository. This can for example contain a relationship “is part of” that states that this object is part of another object (e.g. a collection “news videos”).
  • The DC datastream contains Dublin Core metadata about the object. Dublin Core is a metadata standard with some basic fields (like title, author, etc.), so this stream is a basic description of the object’s contents.
  • The AUDIT datastream contains the history of actions performed on the object.
  • The other datastreams could contain the different versions of a video file (High resolution version, low resolution, audio channels, …). Datastreams can also contain more metadata (e.g., a NewsML XML file).

You can find more information about the datastreams in tutorial 1 on the Fedora site.
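To make the data model a bit more tangible, here is a toy version of it in plain Ruby. This is not how Fedora stores objects internally; the class names, the mime types and the HIGHRES / NEWSML datastream IDs are made up for illustration. It only mirrors the description above: an object has a PID and a set of datastreams, some of which are there by default.

```ruby
# Toy model of a Fedora object; all names here are illustrative only.
Datastream = Struct.new(:id, :mime_type, :content)

class RepositoryObject
  attr_reader :pid, :datastreams

  def initialize(pid)
    @pid = pid
    @datastreams = {}
    # start with the default datastreams named in the text
    %w[RELS-EXT DC AUDIT].each { |id| add(Datastream.new(id, 'text/xml', '')) }
  end

  # adds a datastream, or replaces the one with the same ID
  def add(datastream)
    @datastreams[datastream.id] = datastream
  end
end

video = RepositoryObject.new('video:1')
video.add(Datastream.new('HIGHRES', 'video/mpeg', 'bytes of the high resolution file'))
video.add(Datastream.new('NEWSML', 'text/xml', '<NewsML/>'))
puts video.datastreams.keys.join(', ')
```

The last line lists the three default datastreams followed by the two that were added by hand.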

Fedora offers a number of cool features, like versioning of datastreams (so you can go back to a previous version of a datastream, e.g. when someone messed up the metadata describing a video file). It is also possible to define transformations of your datastreams using web services. For example, if you have a web service that converts an image to grayscale, you could couple it to Fedora, and your object would appear to have an extra datastream containing the grayscale version. In reality, the grayscale version is created on the fly each time this datastream is requested.
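The two ideas (keeping old versions, and computing a datastream on request) can be sketched in a few lines of Ruby. Again, this is a toy illustration and not Fedora code: the class names are invented, and upcasing a string stands in for the grayscale web service.

```ruby
# Toy datastream that keeps every version instead of overwriting the old one.
class VersionedDatastream
  def initialize(content)
    @versions = [content]
  end

  def update(content)
    @versions << content
  end

  def current
    @versions.last
  end

  # version 0 is the oldest content
  def version(number)
    @versions.fetch(number)
  end
end

# Toy "virtual" datastream: nothing is stored, the content is computed
# from another datastream each time it is requested.
class VirtualDatastream
  def initialize(source, &transform)
    @source = source
    @transform = transform
  end

  def current
    @transform.call(@source.current)
  end
end

title = VersionedDatastream.new('Evening news')
title.update('Evning nws')   # someone messes up the metadata
puts title.version(0)        # the earlier version is still available

shouted = VirtualDatastream.new(title) { |text| text.upcase }
puts shouted.current         # computed on the fly from the current version
```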

Adding your first objects

The best way to get started with Fedora is using the client administrator program that is included in the installation (fedora_home/client/bin/fedora-admin). 

This is explained in tutorial 2 on the Fedora site, so I won’t go into detail. You really should follow this tutorial, as it gives a good overview of Fedora’s possibilities. The image below shows a screenshot of the application:

The Fedora Administrator application

With the application you can add new objects to Fedora. 

Connecting to Fedora with Ruby

The Fedora repository exposes two types of interfaces to the outside world: a SOAP API and a REST API. The latter is the simplest, so we’ll start with it to demonstrate access to Fedora from within a Ruby script.
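Because the REST API is plain HTTP, you can also talk to it without any gem. The sketch below fetches the content of one datastream using only Ruby’s standard library. Note the assumptions: the /objects/{pid}/datastreams/{dsID}/content path is how I understand the Fedora 3.x REST API, so check the REST documentation of your release, and the helper names are my own. Whether credentials are needed for reading depends on your security configuration.

```ruby
require 'net/http'
require 'uri'

BASE = 'http://localhost:8080/fedora'

# Assumed URL layout of the Fedora 3.x REST API for datastream content.
def datastream_content_url(pid, ds_id, base = BASE)
  "#{base}/objects/#{pid}/datastreams/#{ds_id}/content"
end

# Hypothetical helper: performs a GET with HTTP basic authentication
# and returns the Net::HTTPResponse.
def fetch(url, user, password)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, password)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

With a running repository, `puts fetch(datastream_content_url('demo:1', 'DC'), 'fedoraAdmin', 'test').body` should print the Dublin Core XML of the object demo:1, provided that object exists and the credentials match your installation.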

The guys at http://www.yourmediashelf.com/ have created two Ruby gems for communicating with Fedora: RubyFedora and ActiveFedora. RubyFedora is a Ruby wrapper around the Fedora REST interface and is the gem we’ll be using in this section. ActiveFedora tries to provide an ActiveRecord-like experience when using Fedora from Ruby; I haven’t tested this gem yet.

Start by installing the RubyFedora gem:

gem install ruby-fedora

Use the following Ruby script (fill in your username, e.g., fedoraAdmin, and password in the repository URL) to create a new object, save it, and find it again:

#!/usr/local/bin/ruby
require 'ruby-fedora'
#connect to the repository (replace user:pass with your credentials)
repository = Fedora::Repository.register('http://user:pass@localhost:8080/fedora')
#create a new object and save it
test_object = Fedora::FedoraObject.new(:label => 'honolulu', :contentModel => 'Image', :state => 'A', :ownerID => 'fedoraAdmin')
repository.save(test_object)
#search for objects whose label starts with "Image"
objects = repository.find_objects('label~Image*')
#fetch the object with pid demo:1
object = repository.fetch_content('demo:1')

(The last line will result in an error if an object with PID demo:1 does not exist.)

The following is a more elaborate example that will print some of the object’s fields and will also print some of the fields of the Dublin Core datastream:

#!/usr/local/bin/ruby
require 'ruby-fedora'
require 'rexml/document'
#connect to the repository
repository = Fedora::Repository.register('http://fedoraAdmin:test@localhost:8080/fedora')
#create a new object
test_object = Fedora::FedoraObject.new(:label => 'blublub', :contentModel => 'Video', :state => 'A', :ownerID => 'fedoraAdmin')
#save the object
repository.save(test_object)
#find objects with pid video* (e.g., "video:1" / "video:2" / ... )
vids = repository.find_objects('pid~video*') 
vids.each { |video|
  #print this item's fields
  puts video.pid.to_s     + "\n******************\n"
  puts "create_date ... " + video.create_date.to_s      + "\n"
  puts "modified_date ... " + video.modified_date.to_s       + "\n"
  puts "state ... " + video.state.to_s     + "\n"
  puts "label ... " + video.label.to_s     + "\n"
  puts "owner_id ... " + video.owner_id.to_s     + "\n"
  #puts "profile ... " + video.profile.to_s     + "\n"
  #extract Dublin Core datastream
  xml_data = video.object_xml
  doc = REXML::Document.new(xml_data)
  root = doc.root
  dc_field = root.elements["foxml:datastream[@ID='DC']/foxml:datastreamVersion/foxml:xmlContent/oai_dc:dc"]
  puts "\n" + dc_field.elements["dc:identifier"].text + " => " + dc_field.elements["dc:title"].text
}
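If you want to try the Dublin Core extraction without a running repository, the REXML part can be exercised on its own. The sketch below runs the same XPath on a small, hand-written FOXML fragment. The sample XML is heavily simplified and invented for this example (a real object_xml contains much more), and dublin_core_fields is a hypothetical helper name.

```ruby
require 'rexml/document'

# Hand-written, heavily simplified FOXML fragment for illustration only.
SAMPLE_FOXML = <<~XML
  <foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#" PID="video:1">
    <foxml:datastream ID="DC">
      <foxml:datastreamVersion ID="DC1.0">
        <foxml:xmlContent>
          <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                     xmlns:dc="http://purl.org/dc/elements/1.1/">
            <dc:title>Evening news</dc:title>
            <dc:identifier>video:1</dc:identifier>
          </oai_dc:dc>
        </foxml:xmlContent>
      </foxml:datastreamVersion>
    </foxml:datastream>
  </foxml:digitalObject>
XML

# Returns the Dublin Core fields as a hash (field name => text),
# or an empty hash when the object has no DC datastream.
def dublin_core_fields(foxml)
  root = REXML::Document.new(foxml).root
  dc = root.elements["foxml:datastream[@ID='DC']/foxml:datastreamVersion/foxml:xmlContent/oai_dc:dc"]
  return {} if dc.nil?

  fields = {}
  dc.elements.each { |element| fields[element.name] = element.text }
  fields
end

fields = dublin_core_fields(SAMPLE_FOXML)
puts "#{fields['identifier']} => #{fields['title']}"
```

Two things to keep in mind: the hash keeps only the last value when a field occurs more than once, and the XPath relies on the document using the foxml, oai_dc and dc prefixes.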

Connecting to Fedora with Java

The Fedora installation comes with a client demo illustrating some of the things you can do with the SOAP interface. It is located at fedora_home/client/demo/soapclient. To create a project in Eclipse with this code, start up Eclipse and add the following libraries to your build path: fedora_home/client/fedora-client-3.1.jar (the version number in the file name depends on your Fedora release) and everything in fedora_home/client/lib/. Add the DemoSOAPClient.java file to your project and use the functions in it in your own code. For example, the following code will add an item to Fedora based on a FOXML file.

// these imports go at the top of the class that contains the method
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static void addToFedora(String filename, long theID) {
	try {
		// connect to the local repository through the SOAP demo client
		DemoSOAPClient caller = new DemoSOAPClient("http", "localhost", 8080, "fedoraAdmin", "test");

		// delete the item if it already exists
		try {
			caller.purgeObject(
				"id:" + theID,   // the object pid
				"purge object",  // an optional log message about the change
				false);          // do not force changes that break ref integrity
		} catch (Exception e) {
			// ignore failures: the object may not exist yet
		}

		// open the FOXML file; stop here if it cannot be read
		FileInputStream inStream;
		try {
			inStream = new FileInputStream(new File(filename));
		} catch (IOException ioe) {
			System.out.println("Error on ingest file inputstream: " + ioe.getMessage());
			ioe.printStackTrace();
			return;
		}

		// add the item to Fedora
		System.out.println(" - ingest FoXML in Fedora");
		try {
			String ingestPID = caller.ingest(inStream, fedora.common.Constants.FOXML1_1.uri, "ingest of item");
			System.out.println("Ingested object: " + ingestPID);
		} catch (IOException ee1) {
			System.out.println("Error during ingest: " + ee1.getMessage());
		} finally {
			// always release the file handle
			inStream.close();
		}
	} catch (Exception e1) {
		e1.printStackTrace();
	}
}

That’s it for this Fedora introduction!

10 Comments

  • 1. anep  |  September 9, 2009 at 4:22 pm

    everytime to start fedora need to install from the first step?

  • 2. Karel  |  September 9, 2009 at 4:46 pm

    After installation you can start Fedora by going to fedora_home/tomcat/bin and typing in a command line (Linux / Mac)
    sh catalina.sh run
    or (Windows)
    startup.bat

  • 3. anep  |  September 10, 2009 at 2:21 am

    i need to re-determine the environment variable first.

    start at :
    set JAVA_HOME=c:\jdk1.5.0_07
    set FEDORA_HOME………….
    set CATALINA_HOME……….

    is it because my installation not properly done?

  • 4. anep  |  September 10, 2009 at 2:26 am

    I’m in Windows environment

    to start fedora i need to type
    [CODE]
    set JAVA_HOME=c:\jdk1.5.0_07
    set FEDORA_HOME=c:\fedora\home
    set CATALINA_HOME=c:\fedora\home\tomcat
    startup.bat
    c:\fedora\home\client\fedora-client-3.2.1.jar
    [/CODE]

  • 5. Karel  |  September 10, 2009 at 9:00 pm

    You could put the
    set … = …
    statements inside the startup.bat batch script.

    Alternatively, you could permanently add them to your environment variables (link: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/environment_variables.mspx?mfr=true)

  • 6. anep  |  September 11, 2009 at 11:28 am

    how to make fedora accessible via web n users can upload files to the repository? does it need mysql? i have installed mysql but dont know to to connect it to fedora repository

  • 7. Karel  |  September 11, 2009 at 5:58 pm

    Fedora can work with the embedded Derby database or with MySQL; using the embedded database is the easiest.

  • 8. anep  |  September 12, 2009 at 2:56 am

    i’ve already installed it but now stuck on how to use it..
    where do i need to input data n upload the files?

  • 9. anep  |  September 12, 2009 at 1:17 pm

    referring on ur post on September 10, 2009 at 9:00 pm
    which one need to chage? user or system?

  • 10. Karel  |  September 16, 2009 at 8:53 pm

    Change the user variables if you just want to do it for yourself, or the system variables if you want to change it for all the users on your system.

