Powering Drupal search with Apache Solr

After spending some weeks in the US with my colleagues from Natick, Massachusetts, and Providence, Rhode Island, I am back in Germany now. We went to DrupalCon DC together, which turned out to be an inspiring event for a Drupal newbie like me. One of the sessions I attended was "More than search; how ApacheSolr changes the way you build sites" by Jacob Singh.

What is Solr? Here's what the developers say:

Apache Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat.

Jacob and others have made it available to Drupal through the Apache Solr Search Integration module. Thanks to the well-done presentation, I felt motivated to install it on one of our servers. It's actually very easy if you follow the instructions on the module page. Encouraged by my first Solr search experience, I decided to deploy it to the Tomcat application server that ships with Debian lenny...

The current stable distribution also has Solr packages, but they are based on Solr 1.2, and we need the development version for Drupal integration. If you happen to run Tomcat with the SecurityManager activated on Debian, the first start might be, er, challenging. After some hours of reading Tomcat documentation, searching the internet, and adapting the permissions policy file from the official Debian package, I have a Solr web application running in Tomcat that can serve different websites through virtual hosts. This is how I distributed Solr over the filesystem.

Libraries

stefan@server:/usr/local/share/solr% ls -l
total 24
drwxr-sr-x 2 root staff 4096 Mar 18 11:32 META-INF
drwxr-sr-x 3 root staff 4096 Mar 18 11:32 WEB-INF
drwxr-sr-x 3 root staff 4096 Mar 18 11:32 admin
-rw-r--r-- 1 root staff 1146 Mar 18 11:32 favicon.ico
-rw-r--r-- 1 root staff 1694 Mar 18 11:32 index.jsp
drwxr-sr-x 2 root staff 4096 Mar 18 13:11 scripts

In case you wonder, I renamed the bin directory to scripts like the Debian developers did.

Virtual host configuration

server:/etc/tomcat5.5/Catalina# cat sciencecollaboration.agariclabs.com/solr.xml
<!--
    Context configuration file for the Solr Web App
-->

<Context path="/solr" docBase="/usr/local/share/solr"
   debug="0" privileged="true" allowLinking="true" crossContext="true">
  <!-- make symlinks work in Tomcat 5 -->
  <Resources className="org.apache.naming.resources.FileDirContext" allowLinking="true" />

  <Environment name="solr/home" type="java.lang.String" value="/var/lib/solr/sciencecollaboration" override="true" />
</Context>

And the corresponding server.xml snippet:

<Host name="sciencecollaboration.agariclabs.com" appBase="webapps"
       unpackWARs="true" autoDeploy="true"
       xmlValidation="false" xmlNamespaceAware="false">

</Host>

Most important is setting the solr/home environment entry to a different folder for every virtual host. We don't want our different instances to overwrite each other's data directories.
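A context file like the one above could be generated per host with a small shell function. This is only a minimal sketch: the function name write_solr_context and its arguments are hypothetical, and the default paths are simply the ones used in this post.

```shell
# Sketch: write a Solr context file for one virtual host.
# Usage: write_solr_context HOST SITE [CATALINA_DIR] [SOLR_BASE]
# The function name and arguments are illustrative, not part of Solr or Tomcat.
write_solr_context() {
    ctx_host=$1
    ctx_site=$2
    ctx_catalina=${3:-/etc/tomcat5.5/Catalina}
    ctx_solr_base=${4:-/var/lib/solr}

    mkdir -p "$ctx_catalina/$ctx_host" || return 1

    # Every host gets its own solr/home value, so instances never share a data directory.
    cat > "$ctx_catalina/$ctx_host/solr.xml" <<EOF
<Context path="/solr" docBase="/usr/local/share/solr"
   debug="0" privileged="true" allowLinking="true" crossContext="true">
  <!-- make symlinks work in Tomcat 5 -->
  <Resources className="org.apache.naming.resources.FileDirContext" allowLinking="true" />

  <Environment name="solr/home" type="java.lang.String" value="$ctx_solr_base/$ctx_site" override="true" />
</Context>
EOF
}
```

Calling write_solr_context sciencecollaboration.agariclabs.com sciencecollaboration as root would write a file equivalent to the one shown above; the matching Host element in server.xml still has to be added by hand.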

solr/home

server:/var/lib/solr# ls -l
total 8
drwxr-x--- 3 tomcat55 root 4096 Mar 21 08:51 localhost
drwxr-x--- 3 tomcat55 root 4096 Mar 21 08:51 sciencecollaboration
server:/var/lib/solr# ls -l sciencecollaboration/
total 4
lrwxrwxrwx 1 tomcat55 root   30 Mar 21 09:35 bin -> /usr/local/share/solr/scripts/
lrwxrwxrwx 1 tomcat55 root   15 Mar 21 09:35 conf -> /etc/solr/conf/
drwxr-x--- 5 tomcat55 root 4096 Mar 21 09:21 data

The application-specific configuration files are stored in /etc/solr/conf. That's the place to put the schema.xml and solrconfig.xml files bundled with the Drupal module.

I am not sure whether the layout conforms to the Filesystem Hierarchy Standard or not. The beauty of it is that you just have to create a new virtual host with a different solr/home value to serve another Drupal site.
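The directory layout listed above could be created for a new site roughly as follows. This is a sketch under assumptions: the function name create_solr_home and its arguments are hypothetical, the defaults are the paths used in this post, and ln -sfn relies on the GNU coreutils shipped with Debian.

```shell
# Sketch: create a solr/home directory for one site.
# Usage: create_solr_home SITE [SOLR_BASE] [SCRIPTS_DIR] [CONF_DIR]
# The function name and arguments are illustrative only.
create_solr_home() {
    sh_site=$1
    sh_base=${2:-/var/lib/solr}
    sh_scripts=${3:-/usr/local/share/solr/scripts}
    sh_conf=${4:-/etc/solr/conf}
    sh_home="$sh_base/$sh_site"

    # Each site gets a private data directory for its index.
    mkdir -p "$sh_home/data" || return 1

    # bin and conf are shared between sites, so they are only symlinked.
    ln -sfn "$sh_scripts" "$sh_home/bin" || return 1
    ln -sfn "$sh_conf" "$sh_home/conf" || return 1

    # Same mode as in the listing above (drwxr-x---).
    chmod 750 "$sh_home" "$sh_home/data"
}
```

Ownership is left out on purpose: on the server described here, you would still run something like chown -R tomcat55:root on the new directory as root so that Tomcat can write to data.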

SecurityManager permissions

Finally, Solr wants some permissions, and Debian's SecurityManager configuration is tight. I started with the distro's version, modified some FilePermissions to reflect the changes I made to the directory layout, and, after running into some more Java security exceptions, added the last two entries:

grant codeBase "file:/usr/local/share/solr/-" {
  permission java.lang.RuntimePermission "modifyThread";
  permission java.lang.RuntimePermission "accessClassInPackage.org.apache.tomcat.util.http";
  permission java.util.PropertyPermission "java.io.tmpdir", "read";
  permission java.util.PropertyPermission "user.dir", "read";
  permission java.util.PropertyPermission "solr.*", "read";
  permission java.util.PropertyPermission "org.apache.lucene.lockDir", "read,write";
  permission java.util.PropertyPermission "org.apache.lucene.store.FSDirectoryLockFactoryClass", "read";
  permission java.io.FilePermission "/usr/share/java/-", "read";
  permission java.io.FilePermission "/var/log/tomcat5.5/-", "read,write";
  permission java.io.FilePermission "/var/lib/tomcat5.5/webapps/solr/-", "read";
  permission java.io.FilePermission "/var/lib/tomcat5.5/temp/-", "read,write";
  permission java.io.FilePermission "/etc/solr/-", "read";
  permission java.io.FilePermission "/usr/local/share/solr/-", "read";
  permission java.io.FilePermission "/usr/local/share/solr", "read";
  permission java.io.FilePermission "/var/lib/solr/-", "read,write,delete";
  permission java.util.PropertyPermission "javax.management.MBeanServer", "read,write";
  permission javax.management.MBeanServerPermission "*";
};
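To find out which entries are still missing from a grant block like this one, it can help to collect the denied permissions from the Tomcat log. The following is a minimal sketch, assuming the JVM reports denials as "access denied (...)" in its AccessControlException messages; the function name list_denied_permissions is hypothetical, and the log file to inspect is whatever your Tomcat writes its exceptions to.

```shell
# Sketch: list the distinct permissions the SecurityManager has denied.
# Usage: list_denied_permissions LOGFILE
# The function name is illustrative; LOGFILE is a path you supply.
list_denied_permissions() {
    # Print only the text inside "access denied (...)", once per distinct permission.
    sed -n 's/.*access denied (\([^)]*\)).*/\1/p' "$1" | sort -u
}
```

Each line of output names a permission class, a target, and possibly an action list, which maps onto one permission line in the policy file. Review each one before adding it rather than granting everything that shows up.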

Comments

help with apache solr search integration

How do I configure Apache Solr Search Integration in Drupal 6? Please explain step by step. If possible, provide screencasts. Thank you.
