Jeroen Reijn


on Tech, Open Source and software development

Metadata extraction with Apache Tika

At Hippo I work with and for customers that have quite a lot of content. The projects I work on have anywhere from 5,000 to 500,000 documents gathered in one content repository. This can be just textual content, but most of the time it is a variety of different content types, such as images, PDFs and Microsoft Office document formats. By default Apache Jackrabbit, the layer underneath Hippo Repository, indexes this kind of content by using extractors, so that the information can be found through the Hippo CMS 7 search or from any application connected to the Hippo Repository that performs a search on the content repository. Being able to search on content found within a file is interesting, but there is so much more that you can do with this kind of information.

Continue reading »


JBoss ModeShape: A federating JCR repository

Some interesting stuff is happening in the JCR community. Apache Jackrabbit 2.0.0 is out (with JCR 2.0), and an interesting project called JBoss ModeShape is almost reaching its final 1.0 release. ModeShape recently came to my attention and it seems an interesting project. In this post I will give a short introduction to ModeShape and its features.

Continue reading »


Creating an IntelliJ launcher on Ubuntu 9.10

Over the last couple of months I’ve slowly switched from Eclipse to IntelliJ 9 as my main IDE for Java development. After having used Eclipse for more than 5 years I got pointed to IntelliJ by friends from JTEAM, whom I’m working with on one of my projects. They challenged me to start using IntelliJ, because I would eventually be impressed and would never want to switch back.

Continue reading »


Content management and the semantic web

I came across the term ‘semantic web’ a couple of years ago, when one of the original creators of Apache Cocoon went off to work on the SIMILE Project at MIT. I didn’t pay much attention to the concept of the ‘semantic web’ back then, because I had just started learning Apache Cocoon and still had a lot to learn. But over the last couple of months I’ve been doing some research on the currently available standards for providing semantic data on the web, with a strong focus on RDFa.

Continue reading »


Apache Cocoon and JavaScript minification

A couple of days ago somebody on the Apache Cocoon user list sent a message about on-the-fly minification of, for instance, JavaScript files. This topic has been quite popular over the past years, since web applications have become richer and JavaScript files have become larger.

Continue reading »