Using Lucene syntax in Jackrabbit SQL queries Part 1 - Exact matches

A couple of up-front warnings:
  1. SQL query support is optional in the JCR 170 spec; only XPath is required. This means that you're tied to Jackrabbit (or another implementation that happens to support SQL) if you decide to use SQL syntax to query the repository.
  2. The Jackrabbit developers warn that the current SQL query handler implementation, which lets Lucene syntax pass through, may change in the future if they move to a different index implementation.

Okedoke. Now for the story.
Let's assume we're searching for the two words 'Ooga' and 'Booga' in the 'name' property of every node. A few scenarios I can think of are:
  1. Exact matches (name contains 'Ooga Booga'); a variation could be a case-insensitive search.
  2. Proximity matches ('Ooga' occurs within 3 words of 'Booga')
  3. General search ('Ooga' and/or 'Booga' occur anywhere in name, in any order)

Lucene syntax is used in Jackrabbit through the contains keyword. Here is what the SQL would look like...

1. Exact Matches
select * from nt:base where contains(name, 'Ooga Booga')
Matches returned would be:
name = Ooga Booga
name = ooga booga
name = booga ooga
name = the big Ooga booga
name = the big booga ooga

select * from nt:base where contains(name, '"Ooga Booga"')
Matches returned would be:
name = Ooga Booga
name = ooga booga
name = the big Ooga booga

Failures would be:
name = booga ooga
name = the big booga ooga
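
To make the difference between the two contains queries above easier to see, here is a rough Java sketch that models them on the sample names. It is only an approximation of the results listed in this post, not Lucene's actual analyzer: it assumes tokens are split on whitespace and lowercased, and the class and method names (ContainsModel, allTerms, phrase) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the two contains() queries; not part of Jackrabbit or Lucene.
final class ContainsModel {
    private ContainsModel() {}

    // Assumption: tokens are whitespace-separated and compared case-insensitively.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Models contains(name, 'Ooga Booga'): every word must occur, in any order.
    static boolean allTerms(String value, String query) {
        return tokens(value).containsAll(tokens(query));
    }

    // Models contains(name, '"Ooga Booga"'): the words must occur next to each other, in order.
    static boolean phrase(String value, String query) {
        return Collections.indexOfSubList(tokens(value), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        String[] names = {
            "Ooga Booga", "ooga booga", "booga ooga",
            "the big Ooga booga", "the big booga ooga"
        };
        for (String name : names) {
            System.out.printf("%-20s allTerms=%b phrase=%b%n",
                    name, allTerms(name, "Ooga Booga"), phrase(name, "Ooga Booga"));
        }
    }
}
```

In this model all five names pass allTerms, while phrase rejects the two names where 'booga' comes before 'ooga', which lines up with the match and failure lists above.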

select * from nt:base where name='Ooga Booga'
Matches returned would be:
name = Ooga Booga

Failures would be:
name = ooga booga
name = the big Ooga booga
name = booga ooga
name = the big booga ooga
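
If you want to build these three statements from Java rather than typing them by hand, something like the sketch below might do. The class and method names (ContainsQueries, allWords, phrase, exactValue) are hypothetical, and the escaping only handles single quotes inside the search text; a phrase that itself contains double quotes is not handled. The resulting string would then go to the standard JCR call QueryManager.createQuery(statement, Query.SQL), which is left out here so the sketch runs without a repository.

```java
// Hypothetical helper that assembles the SQL statements shown in this post.
final class ContainsQueries {
    private ContainsQueries() {}

    // Doubles single quotes so the text can sit inside an SQL string literal.
    static String escape(String text) {
        return text.replace("'", "''");
    }

    // All words, any order: contains(property, 'Ooga Booga')
    static String allWords(String property, String words) {
        return "select * from nt:base where contains("
                + property + ", '" + escape(words) + "')";
    }

    // Words as a phrase: contains(property, '"Ooga Booga"')
    // Assumption: the phrase itself contains no double quotes.
    static String phrase(String property, String words) {
        return "select * from nt:base where contains("
                + property + ", '\"" + escape(words) + "\"')";
    }

    // Plain equality on the stored value: property='Ooga Booga'
    static String exactValue(String property, String value) {
        return "select * from nt:base where " + property + "='" + escape(value) + "'";
    }

    public static void main(String[] args) {
        System.out.println(allWords("name", "Ooga Booga"));
        System.out.println(phrase("name", "Ooga Booga"));
        System.out.println(exactValue("name", "Ooga Booga"));
    }
}
```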

to be continued...

1 comment:

Sridhar Raman said...

I am trying to change the analyzer that Lucene uses for these searches, but I haven't had any success. I changed the required settings in the repository.xml and workspace.xml files, but I don't seem to be getting the expected output.

Have you tried it, and has it worked successfully?

sridhar.raman@gmail.com