User Lexer Sample Code
What's a User Lexer?
Oracle Text allows you to provide your own "plug-in" modules at various points in the indexing chain. For example, you can provide a user datastore, a user filter, and now (as of 9.2) a user lexer.
So what's a lexer?
The lexer is the component responsible for splitting text into individual words, or tokens. It is also responsible for processing compound words in languages such as German and Dutch, where the word "redsportscar" might need to be indexed as "red", "sports" and "car".
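To make the compound-word idea concrete, here is a minimal sketch of a greedy, longest-match splitter. It is not taken from the demo and it is not how Oracle Text handles compounds internally; the function name split_compound, the tiny DICT word list and the "return 0 on failure" convention are all assumptions made up for illustration. A real decompounder would need a proper dictionary and backtracking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dictionary; a real one would be far larger. */
static const char *const DICT[] = { "red", "sports", "sport", "car" };
#define DICT_SIZE (sizeof DICT / sizeof DICT[0])

/*
 * Split `word` into dictionary words using greedy longest match.
 * Stores pointers to the matching dictionary entries in `parts`.
 * Returns the number of parts, or 0 if the word cannot be split
 * completely (or needs more than `max_parts` parts).
 * Greedy matching does not backtrack, so it can miss valid splits.
 */
size_t split_compound(const char *word, const char *parts[], size_t max_parts)
{
    size_t count = 0;

    assert(word != NULL && parts != NULL);

    while (*word != '\0') {
        const char *best = NULL;
        size_t best_len = 0;
        size_t i;

        /* Find the longest dictionary word that prefixes the remainder. */
        for (i = 0; i < DICT_SIZE; i++) {
            size_t n = strlen(DICT[i]);
            if (n > best_len && strncmp(word, DICT[i], n) == 0) {
                best = DICT[i];
                best_len = n;
            }
        }
        if (best == NULL || count == max_parts)
            return 0;
        parts[count++] = best;
        word += best_len;
    }
    return count;
}
```

With this toy dictionary, split_compound("redsportscar", parts, 4) fills parts with "red", "sports" and "car", while a word such as "redbike" returns 0 because "bike" is not in the list.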
Why C?
The lexer has a lot of work to do, and gets called for every row in the table, as well as for every query. Experience has shown that neither PL/SQL nor Java is really fast enough for this task. So it has to be implemented as an external procedure in C.
What is this demo?
This demo provides a simple user lexer which breaks text into whitespace-delimited tokens and upper-cases them. This is basically the same as the default English lexer in Oracle Text. It is designed as a "shell" into which users can fit their own special language processing requirements.
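The core of that behavior can be sketched in a few lines of standalone C. This is only an illustration of "split on whitespace and upper-case", not the demo's actual source, and it does not show the Oracle Text user lexer callback interface. The function name tokenize_upper, the MAX_TOKEN_LEN limit and the fixed-size output array are assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_TOKEN_LEN 64   /* hypothetical limit; longer tokens are truncated */

/*
 * Break `text` into whitespace-delimited tokens, upper-case each one,
 * and store up to `max_tokens` of them in `tokens`.
 * Returns the number of tokens stored.
 */
size_t tokenize_upper(const char *text,
                      char tokens[][MAX_TOKEN_LEN + 1],
                      size_t max_tokens)
{
    const unsigned char *p = (const unsigned char *)text;
    size_t count = 0;

    assert(text != NULL && tokens != NULL);

    while (count < max_tokens) {
        size_t len = 0;

        /* Skip whitespace between tokens. */
        while (*p != '\0' && isspace(*p))
            p++;
        if (*p == '\0')
            break;

        /* Copy one token, upper-casing as we go. */
        while (*p != '\0' && !isspace(*p)) {
            if (len < MAX_TOKEN_LEN)
                tokens[count][len++] = (char)toupper(*p);
            p++;
        }
        tokens[count][len] = '\0';
        count++;
    }
    return count;
}
```

For example, passing the text "  red sports\tcar\n" with room for eight tokens stores "RED", "SPORTS" and "CAR" and returns 3. Language-specific processing would replace the inner copy loop.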
How do I install it?
See the comments at the top of
Instructions are currently provided for Windows; Unix instructions are to follow.
After compiling and installing the C dynamically linked library, you should run
Note that the C code currently writes debugging information to the file "C:\debug.txt". You may want to change this path, or remove the references to it completely.
What are the limitations?
Is it supported?
No. This is SAMPLE CODE, which is not supported. You can request help from Oracle Support on issues which arise from using this code, but you cannot expect them to debug problems with the code. If you email the author at email@example.com, I will do my best to help you, but I can't promise any specific level of support.
Is it tested?
Yes, but only on a single machine at present, and only with the data included in user_lexer.sql. Please send any feedback to the author.