Posts tagged ‘ruby’

Use Google Docs for generating live reports on your Ruby scripts

I have to transfer and convert a lot of files this week. So that I don’t lose track, I created a spreadsheet that summarizes which files have been transferred and converted. Keeping that spreadsheet up to date by hand was a pain, so I wrote a Ruby script that monitors the files being written and writes the results back to a Google Docs spreadsheet.

To monitor the files I use the backtick operator, which runs a shell command and captures its output in a variable:

res = `cd /home/thefolder; ls -l`
res.each_line do |line|
  cols = line.split(' ')
  next if cols.length < 9  # probably a header row (e.g. "total 42")
  if File.file?('/home/thefolder/' + cols[8])
    # the file exists, your code here
    # my code writes to a hash called items
  end
end
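Parsing the columns of `ls -l` works, but it breaks on file names that contain spaces, because cols[8] then holds only the first word. Below is a minimal alternative sketch that asks Ruby’s Dir and File classes directly. The method name scan_folder and the :size field are assumptions for this example. It returns a hash keyed by file name whose values carry a :filename entry, which is the shape dump_to_google reads.

```ruby
# Build a hash of file name => { :filename, :size } for the regular files in
# `folder`, without shelling out to `ls -l`. Hidden files are skipped, like ls.
def scan_folder(folder)
  items = {}
  Dir.entries(folder).sort.each do |name|
    next if name.start_with?('.')          # skips ".", ".." and dotfiles
    path = File.join(folder, name)
    next unless File.file?(path)           # ignore directories and other non-files
    items[name] = { :filename => name, :size => File.size(path) }
  end
  items
end

# Usage: items = scan_folder('/home/thefolder')
```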
To write the results back I used gimite’s google-spreadsheet-ruby gem, which makes it trivially easy to connect to a Google spreadsheet (just follow the “how to use” section on its first page):
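require 'rubygems'
require 'google_spreadsheet'

def dump_to_google(items)
  # fill in your Google account e-mail address and password
  session = GoogleSpreadsheet.login("", "YOURPASSWORD")
  ws = session.spreadsheet_by_key("YOURKEYYOURKEYYOURKEY").worksheets[0]
  row = 1
  ws[row, 1] = 'ID'
  ws[row, 2] = 'Filename'
  items.each do |id, itm|
    row += 1  # one row per item, below the header row
    ws[row, 1] = id
    ws[row, 2] = itm[:filename]
  end
  ws.save  # changes are only sent to Google when you call save
end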
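To get the every-minute refresh, the scan and the upload can run in a simple polling loop. This is a minimal sketch. monitor is a hypothetical helper, and scan_items stands for whatever code builds your items hash.

```ruby
# Call `scan` to collect the items, hand them to `report`, wait, repeat.
# Pass an iteration count to stop after that many rounds; nil runs forever.
# Returns the number of rounds completed.
def monitor(interval_seconds, iterations = nil, scan:, report:)
  rounds = 0
  loop do
    report.call(scan.call)
    rounds += 1
    break if iterations && rounds >= iterations
    sleep interval_seconds
  end
  rounds
end

# Usage (scan_items is hypothetical; dump_to_google is the method above):
# monitor(60, scan: -> { scan_items }, report: ->(items) { dump_to_google(items) })
```

A failed upload raises an exception and ends the loop. Wrap report.call in begin/rescue if you want the loop to keep going when Google is unreachable.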


The end result is pretty cool: a spreadsheet that is updated every minute (or at whatever interval you choose). You can also create graphs that are updated automatically with the new data.

December 10, 2009 at 9:13 pm

Extract a keyframe from a video with FFMpeg

If you want to extract a particular frame from a video, you can use the following ffmpeg command:

ffmpeg -i INPUTPATH -vframes 1 -ss TIMESTAMP -f image2 -vcodec mjpeg OUTPUTPATH.jpg

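where TIMESTAMP is in the format hh:mm:ss.xxx. Note that ffmpeg reads the digits after the dot as a decimal fraction of a second, not as a frame number. At 25 fps, frame 12 within a second is .480, not .012.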
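If you call ffmpeg from a Ruby script, you can pass the arguments as a list to Kernel#system. This bypasses the shell, so paths with spaces need no quoting. The following is a small sketch. The method name ffmpeg_frame_args is made up for this example, and the arguments are exactly those of the command above.

```ruby
# Build the argument list for extracting a single frame as a JPEG with ffmpeg.
def ffmpeg_frame_args(input_path, timestamp, output_path)
  ['ffmpeg', '-i', input_path, '-vframes', '1', '-ss', timestamp,
   '-f', 'image2', '-vcodec', 'mjpeg', output_path]
end

# Usage (requires ffmpeg on the PATH):
# system(*ffmpeg_frame_args('my movie.mpg', '00:02:00.000', 'frame.jpg'))
```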

If you have the frame number instead of the timestamp, you can use the following Ruby script to convert the frame number to a timestamp (for a frame rate of 25 fps). There’s probably a better way to format a number with two digits, though :-)

def self.profile_thumbnail(frame_nr = nil)
  # calculate timestamp; default to the frame at 2 minutes (25 fps)
  if frame_nr.nil?
    framenr = 25*60*2
  else
    framenr = frame_nr.to_i
  end

  uur = framenr / (60*60*25)                            # hours ("uur" is Dutch for hour)
  min = (framenr - uur*60*60*25) / (60*25)
  sec = (framenr - uur*60*60*25 - min*60*25) / 25
  frame = framenr - uur*60*60*25 - min*60*25 - sec*25   # remaining frames (0-24)
  msec = frame * 40                                     # at 25 fps one frame lasts 40 ms

  uur_s = uur.to_s
  uur_s = '0' + uur_s if uur < 10
  min_s = min.to_s
  min_s = '0' + min_s if min < 10
  sec_s = sec.to_s
  sec_s = '0' + sec_s if sec < 10
  msec_s = msec.to_s
  if msec < 10
    msec_s = '00' + msec_s
  elsif msec < 100
    msec_s = '0' + msec_s
  end

  stamp = uur_s + ':' + min_s + ':' + sec_s + '.' + msec_s
  return ' -vframes 1 -ss ' + stamp + ' -f image2 -vcodec mjpeg '
end
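Ruby’s Kernel#format (the same as sprintf) can do all the zero padding in one call. The following is a shorter sketch of the same conversion. frame_to_timestamp is a hypothetical name. The sketch assumes a constant frame rate (25 fps by default) and writes the fraction as milliseconds, which is how ffmpeg reads the part after the dot.

```ruby
# Convert a frame number to an ffmpeg -ss timestamp (HH:MM:SS.mmm).
# Assumes a constant frame rate; milliseconds are truncated to whole numbers.
def frame_to_timestamp(frame_nr, fps = 25)
  total_ms = frame_nr.to_i * 1000 / fps
  secs, ms    = total_ms.divmod(1000)
  mins, secs  = secs.divmod(60)
  hours, mins = mins.divmod(60)
  format('%02d:%02d:%02d.%03d', hours, mins, secs, ms)
end

puts frame_to_timestamp(3000)  # => 00:02:00.000 (the 2-minute default)
puts frame_to_timestamp(12)    # => 00:00:00.480
```

Chaining divmod keeps each step readable: every call splits off one unit and passes the remainder on to the next.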


December 3, 2009 at 9:07 pm

Using Hudson for a Ruby project

This post explains how to let Hudson automatically run your unit tests / rspec tests upon pushing code to your git repository.

Installing and running Hudson is easy:

nohup java -jar hudson.war --httpPort=7080 --prefix=/hudson >> "hudson.log" 2>&1 &

If you browse to localhost:7080/hudson you’ll be greeted by the user interface of Hudson. I’ve explicitly set the port to 7080 because the default port (8080) was already used by tomcat in my case. Now we’ll add some plugins. In Hudson’s user interface click on “manage Hudson” and click on “add plugins”. Install the following plugins: git, ruby, ruby metrics, rake. Click on “restart Hudson when no jobs are running” after the plugins are installed.

We’ll use rake for defining our test tasks. Make a rake file (named Rakefile) in your project root folder.

require 'rubygems'
require 'rake'
require 'rake/testtask'
require 'rake/packagetask'
require 'spec/version'
require 'spec/rake/spectask'
require 'rcov'
gem 'ci_reporter'
require 'ci/reporter/rake/test_unit'
require 'ci/reporter/rake/rspec'

desc 'do unit tests'
Rake::TestTask.new(:test) do |t|
  t.libs << 'lib'
  t.pattern = 'test/test_*.rb'
  t.verbose = true
end

namespace :spec do
  desc "do rspec tests and test coverage"
  Spec::Rake::SpecTask.new('rcov') do |t|
    t.spec_files = FileList['spec/**/*_spec.rb']
    t.spec_opts = ['--format html:results/spec_results.html']
    t.warning = true
    t.rcov = true
    t.rcov_dir = 'coverage'
    # rcov treats these as regular expressions, hence the escaped dots
    t.rcov_opts = ['--exclude',
                   "kernel,load-diff-lcs\\.rb,instance_exec\\.rb,lib/spec.rb,lib/spec/runner.rb,^spec/*,bin/spec,examples,/gems,/Library/Ruby,\\.autotest,#{ENV['GEM_HOME']}",
                   '-I', 'lib/']
  end
end

Now create a new Hudson job (freestyle project) that checks out the code from your repository. Add two build steps of the type “execute shell command”:
  1. GEM_HOME=/opt/myproject/shared/gems rake ci:setup:testunit test CI_REPORTS=results
  2. GEM_HOME=/opt/myproject/shared/gems SPEC_OPTS="--format html:resultsspec/spec_results.html" rake ci:setup:rspec spec:rcov CI_REPORTS=resultsspec

The GEM_HOME=… part is only necessary if you install your gems locally in your project folder instead of system-wide. In the first build step we call the rake task ci:setup:testunit, which is defined in the ci_reporter gem. This gem creates Hudson-compatible (JUnit-style XML) reports from Test::Unit and RSpec runs. Next we call the rake test task, which actually runs the unit tests. We also pass a variable CI_REPORTS, which tells ci_reporter where to write the reports.

The second build step first runs the ci:setup:rspec task, which again prepares ci_reporter. Then we call spec:rcov, which runs the RSpec tests and creates the coverage reports. Note the SPEC_OPTS variable, in which we tell RSpec to create an HTML report as well. The XML reports that Hudson reads are written by ci_reporter.

Now check the post build actions “publish JUnit reports” and “publish coverage reports”, save the Hudson project and build it!

If all went well, Hudson will create some nice reports out of the test and coverage reports:

Because sometimes you want all the detail you can get, you might want to add links to the original (html) test and coverage reports in the description of your project. Click “Change description” and enter something like:

<a href="http://localhost:7080/hudson/job/myproject/ws/resultsspec/spec_results.html" target="_blank">detailed spec results</a><br>

<a href="http://localhost:7080/hudson/job/myproject/ws/coverage/index.html" target="_blank">detailed code coverage report</a>

Et voila! You now have a butler running your tests and creating nice reports as soon as you commit code to your repository!

December 3, 2009 at 8:44 pm

Getting started with Fedora Repository

I recently had to get started with the Fedora Repository. Because the Fedora Commons site is somewhat chaotic and a quickstart is hard to find online, I’ll describe the basic steps for getting a Fedora Repository up and running here.


What is it?

First of all, the Fedora Repository has nothing to do with the famous Linux distro (there actually was some dispute over the trademark). Fedora Repository is an open source digital repository management system. It originated at Cornell University in 1998 and has been open source since 2003. The current release is Fedora 3.2.

The repository is meant to store (or link to) digital assets (images, video, audio, … anything really) and their related metadata. For example, I am using Fedora as a repository for video files. The video files are available in different formats (high resolution, low resolution flash video, …) and there is an XML file per video describing the video contents. 

This article describes the setup of the repository, the data model Fedora uses, and how to interact with the repository using Java and Ruby.


First download the latest version of Fedora from their site (go to Developers > Downloads). As mentioned before, the current release is 3.2, so the installer is fedora-installer-3.2.jar. When installing you can choose between the quick and the custom installation. Choose the custom installation, because we want to enable the REST interface. Go to the download location in your terminal and type

java -jar fedora-installer-3.2.jar

to start the installation process. The default options are fine, except that you should enable the REST API. More information on the installation is available on the Fedora site.

Fedora runs inside a Tomcat server. After installation you can start Fedora by going to fedora_home/tomcat/bin and typing the following on the command line (Linux / Mac)

sh catalina.sh run

or (Windows)

catalina.bat run
You should see some messages indicating that Tomcat is starting up and that Fedora is being deployed. The final message should say something like “server startup complete”. Now you can fire up your browser and surf to http://localhost:8080/fedora. If everything went well, you’ll see an ugly page showing some info about Apache Axis.

The Fedora data model

A Fedora object contains several datastreams


The image on the right shows a Fedora object. It has a unique ID called the PID (persistent ID) and some other properties. A Fedora object also contains so-called datastreams. These contain the actual data. This data can be the essence (e.g., the video material) or the metadata about this essence (e.g., an XML file with descriptions). 

Fedora has some default datastreams (RELS-EXT, DC and AUDIT), and the user can add as many other datastreams as necessary. So what do these default datastreams contain?

  • The RELS-EXT datastream contains the relationships to other objects in the repository. This can for example contain a relationship “is part of” that states that this object is part of another object (e.g. a collection “news videos”).
  • The DC datastream contains Dublin Core metadata about the object. Dublin Core is a metadata standard with some basic fields (like title, author etc.), so this stream is a basic description of the object’s contents.
  • The AUDIT datastream contains the history of actions performed on the object.
  • The other datastreams could contain the different versions of a video file (High resolution version, low resolution, audio channels, …). Datastreams can also contain more metadata (e.g., a NewsML XML file).
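To make the model concrete, here is a deliberately simplified Ruby sketch that represents an object as a PID plus datastreams keyed by their ID. RepoObject and Datastream are hypothetical classes for illustration only. They are not RubyFedora’s API. The three default datastreams named above start out as empty placeholders.

```ruby
# Hypothetical, simplified model: a Fedora object is a PID plus datastreams.
Datastream = Struct.new(:id, :mime_type, :content)

class RepoObject
  DEFAULT_DATASTREAM_IDS = %w[RELS-EXT DC AUDIT].freeze

  attr_reader :pid, :datastreams

  def initialize(pid)
    @pid = pid
    # datastream ID => Datastream; the defaults start out empty
    @datastreams = {}
    DEFAULT_DATASTREAM_IDS.each do |id|
      @datastreams[id] = Datastream.new(id, 'text/xml', '')
    end
  end

  # Add (or replace) a user-defined datastream, e.g. a low-res video version.
  def add_datastream(id, mime_type, content)
    @datastreams[id] = Datastream.new(id, mime_type, content)
  end
end

video = RepoObject.new('video:1')
video.add_datastream('LOWRES', 'video/x-flv', 'lowres.flv')
puts video.datastreams.keys.join(', ')  # => RELS-EXT, DC, AUDIT, LOWRES
```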

You can find more information about the datastreams in tutorial 1 on the Fedora site.

Fedora offers a number of cool features, like versioning of datastreams, so you can go back to a previous version of a datastream (e.g. when someone messed up the metadata describing a video file). It is also possible to define transformations of your datastreams using web services. For example, if you have a web service that converts an image to grayscale, you could couple it to Fedora. Your object would then appear to have an extra datastream containing the grayscale version. In reality, the grayscale version is created on the fly when that datastream is requested.

Adding your first objects

The best way to get started with Fedora is using the client administrator program that is included in the installation (fedora_home/client/bin/fedora-admin). 

This is explained in tutorial 2 on the Fedora site, so I won’t go into detail here. You really should follow this tutorial, as it gives a good overview of Fedora’s possibilities. The image below shows a screenshot of the application:

The Fedora Administrator application


With the application you can add new objects to Fedora. 

Connecting to Fedora with Ruby


The Fedora repository exposes two types of interfaces to the outside world: a SOAP API and a REST API. The latter is the simpler of the two, and we’ll use it to demonstrate access to Fedora from within a Ruby script.
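For comparison with the gem used below, here is a minimal sketch of what a raw REST request looks like. It only builds the URL. It assumes the Fedora 3.x REST findObjects operation, `GET /objects`, with the parameters query, resultFormat and maxResults, plus per-field flags such as pid=true. The method name find_objects_uri and its defaults are made up for this example. Check the REST API page on the Fedora site before relying on the parameter names.

```ruby
require 'uri'
require 'net/http'

# Build a findObjects URL (assumed endpoint: GET {base}/objects).
# `fields` lists the object fields to return, sent as e.g. pid=true.
def find_objects_uri(base, query, fields = %w[pid label], max_results = 25)
  params = [['query', query], ['resultFormat', 'xml'], ['maxResults', max_results.to_s]]
  fields.each { |f| params << [f, 'true'] }
  URI("#{base.chomp('/')}/objects?#{URI.encode_www_form(params)}")
end

# Usage (needs a running Fedora with the REST API enabled; example credentials):
# uri = find_objects_uri('http://localhost:8080/fedora', 'pid~video*')
# req = Net::HTTP::Get.new(uri)
# req.basic_auth('fedoraAdmin', 'test')
# res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
# puts res.body
```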

Two Ruby gems exist for communicating with Fedora: RubyFedora and ActiveFedora. RubyFedora is a Ruby wrapper around the Fedora REST interface, and it is the gem we’ll be using in this paragraph. ActiveFedora tries to provide an ActiveRecord-like experience when using Fedora from Ruby. I haven’t tested this gem yet.

Start by installing the RubyFedora gem:

gem install ruby-fedora

Use the following Ruby script to create a new object, save it and find it again. Fill in your username (e.g., fedoraAdmin) and password in the repository URL:

require 'rubygems'
require 'ruby-fedora'
repository = Fedora::Repository.register('http://user:pass@localhost:8080/fedora')
test_object = Fedora::FedoraObject.new(:label => 'honolulu', :contentModel => 'Image', :state => 'A', :ownerID => 'fedoraAdmin')
repository.save(test_object)
objects = repository.find_objects('label~Image*')
object = repository.fetch_content('demo:1')

(The last line will result in an error if no object with PID demo:1 exists.)

The following is a more elaborate example that will print some of the object’s fields and will also print some of the fields of the Dublin Core datastream:

require 'rubygems'
require 'ruby-fedora'
require 'rexml/document'
# connect to the repository
repository = Fedora::Repository.register('http://fedoraAdmin:test@localhost:8080/fedora')
# create a new object
test_object = Fedora::FedoraObject.new(:label => 'blublub', :contentModel => 'Video', :state => 'A', :ownerID => 'fedoraAdmin')
# save the object
repository.save(test_object)
# find objects with pid video* (e.g., "video:1" / "video:2" / ... )
vids = repository.find_objects('pid~video*')
vids.each do |video|
  # print this item's fields
  puts video.pid.to_s + "\n******************\n"
  puts "create_date ... " + video.create_date.to_s + "\n"
  puts "modified_date ... " + video.modified_date.to_s + "\n"
  puts "state ... " + video.state.to_s + "\n"
  puts "label ... " + video.label.to_s + "\n"
  puts "owner_id ... " + video.owner_id.to_s + "\n"
  #puts "profile ... " + video.profile.to_s + "\n"
  # extract the Dublin Core datastream from the object's FOXML
  xml_data = video.object_xml
  doc = REXML::Document.new(xml_data)
  root = doc.root
  dc_field = root.elements["foxml:datastream[@ID='DC']/foxml:datastreamVersion/foxml:xmlContent/oai_dc:dc"]
  puts "\n" + dc_field.elements["dc:identifier"].text + " => " + dc_field.elements["dc:title"].text
end
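If you need more than the identifier and title, you can collect every Dublin Core field into a hash. The sketch below passes the namespace URIs to REXML explicitly, so the XPath does not depend on which prefixes the FOXML happens to use. The method name dublin_core_fields is hypothetical. If several DC versions are present, the sketch reads the first one in document order. The namespace URIs are the standard FOXML, OAI-DC and Dublin Core ones.

```ruby
require 'rexml/document'

# Namespace URIs used in Fedora 3 FOXML and in oai_dc Dublin Core records.
FOXML_NAMESPACES = {
  'foxml'  => 'info:fedora/fedora-system:def/foxml#',
  'oai_dc' => 'http://www.openarchives.org/OAI/2.0/oai_dc/',
  'dc'     => 'http://purl.org/dc/elements/1.1/'
}.freeze

# Return { "title" => [...], "identifier" => [...], ... } from a FOXML string.
def dublin_core_fields(foxml)
  doc = REXML::Document.new(foxml)
  path = "/foxml:digitalObject/foxml:datastream[@ID='DC']" \
         "/foxml:datastreamVersion/foxml:xmlContent/oai_dc:dc"
  dc = REXML::XPath.first(doc, path, FOXML_NAMESPACES)
  return {} if dc.nil?
  fields = {}
  dc.each_element do |el|
    next unless el.namespace == FOXML_NAMESPACES['dc']  # only dc:* children
    (fields[el.name] ||= []) << el.text.to_s.strip
  end
  fields
end

# Usage inside the loop above: p dublin_core_fields(video.object_xml)
```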

Connecting to Fedora with Java


The Fedora installation comes with a client demo illustrating some of the things you can do with the SOAP interface. It is located at Fedora_Home\client\demo\soapclient. To create a project in Eclipse with this code, start up Eclipse and add the following libraries to your build path: fedora_home/client/fedora-client-3.1.jar and everything in fedora_home/client/lib/. Add the DemoSOAPClient source file to your project and use its methods in your own code. For example, the following code adds an item to Fedora based on a FOXML file.



import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static void addToFedora(String filename, long theID) {
    DemoSOAPClient caller;
    try {
        caller = new DemoSOAPClient("http", "localhost", 8080, "fedoraAdmin", "test");
        //RepositoryInfo repoinfo = caller.describeRepository();
        // delete the item if it already exists
        String purgeDate = null;
        try {
            purgeDate = caller.purgeObject(
                "id:" + theID,   // the object pid
                "purge object",  // an optional log message about the change
                false);          // do not force changes that break ref integrity
        } catch (Exception e) {
            // ignore failures: the object may not exist yet
        }

        // add the item to Fedora
        FileInputStream inStream = null;
        String ingestPID = null;
        File ingestFile = new File(filename);
        try {
            inStream = new FileInputStream(ingestFile);
        } catch (IOException ioe) {
            System.out.println("Error on ingest file inputstream: " + ioe.getMessage());
            return; // nothing to ingest
        }
        System.out.println(" - ingest FoXML in Fedora");
        try {
            ingestPID = caller.ingest(inStream, fedora.common.Constants.FOXML1_1.uri, "ingest of item");
        } catch (IOException ee1) {
            System.out.println("Error during ingest: " + ee1.getMessage());
        } finally {
            inStream.close();
        }
        //System.out.println("Finished test ingest of sdef object: " + ingestPID);
    } catch (Exception e1) {
        e1.printStackTrace();
    }
}

That’s it for this Fedora introduction!

May 24, 2009 at 2:34 pm



Twitter – kr3l

