Coherence Queries, Indexes and Explain Plans

Overview

    Purpose

    Prior to Coherence 3.6 developers were required to perform queries against caches using a filtering mechanism which, while concise and efficient,  required a fair amount of developer effort to code correctly.  Coherence 3.6, introduced the Coherence Query Language or CohQL, a lightweight SQL like language for performing select type operations on caches.  Coherence 3.7 introduced additional functionality to support selections but focused on performance.  The purpose of this tutorial is to examine and exercise:

    • Filters - How can we select a specific set of data from a cache based on a given criteria?
    • Coherence Query Language - How can we simplify Filters to use a select like syntax
    • Explain Plan - How can we estimate the cost of a filter or select?
    • Trace - How can we evaluate the actual cost of a filter or select?

    Time to Complete

    Approximately 45 minutes 

    Introduction

    The Coherence Query Language or CohQL allows developers to create complex SQL like queries against cached data. Queries are performed in one of two ways:

    • Using Filters - Filters, based on the com.tangosol.util.Filter class, allow the development of queries using combinations of value extractors and combinations of and, or, not, like and similar filters to perform queries. Filters are then executed against a cache and return a result set.
    • Using Queries - Queries use and SQL like syntax and the import com.tangosol.util.QueryHelper class to create a Filter instance. Behind the scenes QueryHelper creates a set of Filters which match the provided select statement.

    Filters are then executed against a cache which performs the operation on each member of the cluster, aggregates the results and returns it. Filter performance can then be improved by the thoughtful use of indexes. Indexes provide an identical function to their database cousins allowing Coherence to more quickly return data.  However indices are not without cost, requiring memory and processing power to keep up to date and are directly dependent on their entity objects for data extraction. Entities may or may not efficiently extract data and as a result, keeping an index up to date may not be worth the time saved in a query.These are but a few of the concerns about why you may or may not apply a index to an entity field.

    Prior to Coherence 3.7.1 there was no way to determine the performance of a query other then timing it in code or by hand, neither of which being optimal or even convenient.  In addition it was impossible to determine which portions of the query had the most impact. For example did a specific and filter have more impact on the query performance then a like filter? Would an index on one of the fields changed the performance? 

    Coherence 3.7.1 introduced two new features to support static and dynamic evaluation of query performance.  These two features are:

    • Explain plans - a static analysis of a query producing an estimated cost for the query
    • trace - a dynamic analysis of the query, producing an actual effectiveness rating for a query.

    Using explain plans and trace statements developers can examine how queries are executed, in what order clauses are executed and the effect of a given index.  Both of these features can be executed from both the command line, or from within an applications code. Providing an easy to use robust mechanism for examining the performance of a given query.

    The purpose of this tutorial is to:

    • Use basic Filter and QueryHelper classes to query a cache.
    • Estimate the performance of a query using explain plans
    • Evaluate performance of a query and the impact of filters
    • Explore queries using the Coherence query application

    Hardware and Software Requirements

    The following is a list of hardware and software requirements:

    • Coherence 3.7.1, which can be downloaded here.
    • Coherence CQL sample code, which can be downloaded here.

    Prerequisites

    Before starting this tutorial, you should: 

    • Have a basic working knowledge of Linux commands and command line syntax
    • Have access to and have already installed Coherence 3.7.1
    • Have downloaded and unpacked the OBE sample code. This code is designed to run on Linux and similar operating systems but can be adapted to Windows if required.
    • Have downloaded and installed an Oracle Java JDK version 1.6.0_26 or later. The Oracle JDK kits and associated products can be found here.
      The installation of Java is outside the scope of this OBE. For more information, see the JDK download and installation pages. 
    • Have a working knowledge of Java annotations.

    Before running this tutorial, you should download and install Oracle Coherence 3.7.1 and the associated example source code. The following steps review the installation process of Coherence 3.7.1, as well as the expected installation locations.  Note that if you have previously installed Coherence 3.7.1 you may skip this step.

      Using the link above, accept the license agreement and download Coherence 3.7.1.
      Note that for this tutorial, it is assumed that files are downloaded to/tmp/.  and installed into /opt.

      Open a command prompt and change directory to the location where Oracle Coherence will be installed.

      Unzip the contents of Coherence zip using a command that is similar to:

      unzip /tmp/coherence-java-3.7.1...zip 

      Note that the exact name of the file may vary.

      For this tutorial, all files will be unzipped to /opt/coherence


      Unzip the contents of the OBE example code using a command similar to:
      unzip /tmp/CQL.OBE.src.zip


Query a Coherence Cache using Filters and Selects

    Filters

    Coherence uses a filter mechanism to query cache data. Filters are similar to query statements in that a collection of operations, like, equals, greater, less and others, are combined with a set of logical operations to return a specific set of data in a cache.  Filters are then used with the NamedCache.entrySet method to return results.

    Examples of Filters might include (all of which can be found in the com.tangosol.util.filter.* package):

    • AndFilter - Returns the logical "and" of two other filters.
    • OrFilter - Returns the logical "or" of two other filters.
    • LikeFilter - Compares a value for pattern match. A pattern can include regular characters and wildcard characters '_' and '%'.
    • EqualsFilter - Compares a value with a constant for equality

    These examples represent only a small subset of the filters available, see the Oracle Coherence documentation for a complete list of filters.

    Some example filters might include, assuming a Person object with age, first name and last name attributes:
    • Filter greaterThen = new GreaterFilter( new ReflectionExtractor("getAge"), 16);
    • Filter lessThen = new LessFilter(new ReflectionExtractor("getAge"), 50));

    Combining these two filters using and might include:
    • Filter and = new AndFilter( greaterThen,lessThen);

    Example

    Assuming:
          NamedCache cache = . . .;

    A query would then be performed using calls similar to:
       cache.entrySet(filter,null);


    Note that the 2nd argument, null, represents a comparator which would be used to sort the result.

    A note on field/data access in objects: In order to access data in a object, some form of 'extractor' must be used.
    In the examples used in this tutorial a ReflectionExtractor is used. The reflection extractor takes a method name, and applies it to a object to return a value.
    Assuming an appropriate cache object similar with methods font face="Courier New, Courier, monospace">getAge, getFirstName, and so forth, a reflection extractor can be specified which uses something such as: ReflectionExtractor("getAge") to return the value of a age field from an object. For more information on extractors see the Extractors package page.

    Configure the environment

      A variety of scripts have been provided to simplify the testing of POF.  These scripts depend on certain environment variable settings.
      Appropriate defaults have been provided for these variable, which if correct need not be changed.

      The scripts assume:
      • Coherence has been installed into /opt/coherence
      • The OBE source has been unzipped to /opt/CQL.OBE.

      If either of these values is different follow the instructions below.

      In a command prompt window ensure that you are in the directory where the OBE source was unpacked.

      cd /opt/{obe source directory}

      In your favorite editor open the bin/set-env.sh script.

      gedit bin/set-env.sh

      Near the top are two export statements specifying the locations of both the OBE source and Coherence.
      Un-comment th two variables and set them to the appropriate values for your environment.

      Note that if the default values of /opt/coherence and /opt/CQL.OBE are used no changes are required.

      #!/bin/bash

      export OBE_HOME=/opt/my.cql.obe.location
      export COH_HOME=/opt/my.coherence.location

      Save your changes, and exit the editor

    Modify a class to support a Filter.


      This tutorial provides two sample entity object classes which are used as examples

      • example.entity.Person.java - a representation of a person, defining characteristics of such as name, sex, age, and address
      • example.entity.Address.java - a representation of an address, including street, city and state

      These two classes implement java Serializable by default and can be used with Coherence caches without change. Additionally the Person class implements the Java Comparable interface and sorts results on last name when used in a query.

      Additionally several classes are provided which can be used support generation of test data and as a starting point for developing queries:
      • example.test.RunQuery.java - This class provides some boiler plate code for accessing a cache and will be modified over the course of this tutorial.
      • example.simulation.Driver and Simulation- these two classes are used to simulation cache interaction and add, modify and delete data as they run from the 'dist-person' cache.

      During the following steps students will modify an existing class to perform a query, start a simulation, and then execute the query.


      Specifically students will define a filter which returns all Person objects which have a first name which starts with 'Al' and have age between 17 and 49 inclusive.
      Open a command prompt and chance directory where the OBE source was unpacked.

      Using gedit open the src/java/example/RunQuery.java class

      gedit src/java/example/RunQuery.java class

      Note that a completed version of this section can be found in: {OBE_HOME}/completed/1.RunQuery.java.
      Find the getFilter method which resembles:

      public Filter getFilter() {
          /*
           * Enter filter here
           */
          return null;
      }

      The filter is constructed of  two numeric Filters,  and an all filter which together result in the statement described above.
      Directly below the comment enter greater and less filters resembling:

      Filter lower = new GreaterFilter(new ReflectionExtractor("getAge"), 16);
      Filter upper = new LessFilter(new ReflectionExtractor("getAge"), 50);


      The partially completed code should resemble:

      Note that the required imports have been added for you.  All filters can be found in the com.tangosol.util.filter.* package.
      Beneath the second filter statement add an and statement, combining the age limits.  The new statement should resemble:

      Filter and = new AndFilter(lower,upper);

      Beneath the and filter, add a statement to filter by name.  The statement should resemble:

      Filter name = new LikeFilter("getFirstName", "Al%");

      An AllFilter can be used to combine the entire result into a single statement.  Add an AllFilter and return the final result from the method. The statements should resemble:

      Filter all =  new AllFilter(new Filter[] { and,name });
      return all;

      The completed method should resemble:

      Save your changes and exit the editor.

    Build and test the Filters

      Tests are built and run using several scripts, the primary purpose of these scripts is to provide an environment for development and to ease testing issues such as correct classpath and related concerns.
      • bin/build.sh-         Recompiles and repackages all source code.
      • bin/start-sim.sh-Starts a simulation which generates and adds Person objects to a cache, then randomly changes the objects over time.
      • bin/stop-sim.sh-  Stops the simulation scripts.
      • bin/check-sim.sh-Reports whether the PIDs associated with simulation processes are running..
      • bin/runquery.sh  -Runs the query test application.
      In the command window rebuild the sources using the provided build script. Correct any compile errors.

      bin/build.sh

      The command should produce output similar to:
      Start the simulation using a command similar to:

      bin/start-sim.sh

      The command should produce output similar to:
      Give the simulation at least one minute or more to generate data and then examine the output of the started simulation for errors using a command similar to:

      tail -f sim.0.log

      The output should resemble that shown below. Note that the simulation will randomly generate Person objects so the exact content will vary.

      Run the updated query application.

      bin/runquery.sh > query.log 2>&1


      If the query has been correctly specified the command should produce output similar to those shown below.  Use less, gedit or a similar tool to examine the results:

    Modify query to use CohQL.

      The Coherence query language, or CQL, is a SQL like language which can be used to access cache data using select style statements.
      Depending on how CQL is used you might be supplying an entire select statement or only a where clause.  Additionally CQL supports filling in variables and values at run time to avoid hard coding queries.

      For example, when using the Coherence command line tool, an entire statement including select, from and where is required. However, in an application, given a specific NamedCache instance, only the where portion of the select is required. For this section we focus on application use only..

      Primary components of query are:
      • select - select {field set} from "{cache}" where {where clause}.  
      • QueryHelper - com.tangosol.util.QueryHelper, a class which provides a set of factory methods which can be used to generate instances of Filter objects.
        For more information on QueryHelper see here.

      Open the query application.

      gedit src/java/example/RunQuery.java class

      Note that a completed version of this section can be found in:

      completed/2.RunQuery.java

      Find the previously updated getFilter method and delete the contents.
      Create a where clause statement to search for any Person objects with first name starting with Al and age > 16 and < 50.

      The updated method should resemble:

      public Filter getFilter() {
          String where = "firstName like 'Al%' and (age > 16 and age < 50)";

      }

      Complete the method by adding statement to return a filter based on the where clause using QueryHelper.
      Note that the appropriate import com.tangosol.util.QueryHelper has been added for you.

      Filter filter = QueryHelper.createFilter(where); return filter;


      The completed method should resemble:
      Save your changes and exit the editor
      In the command window rebuild the sources using the provided build script.

      bin/build.sh

      Correct any compiler errors if required.
      [optional] Confirm that the simulation is still running.
      The simulation uses a set of configuration files which are define how the simulation runs, including how long to run.
      By default the simulation runs for 60 minutes.  You can confirm that the simulation is running using the check-sim.sh script which examines the set of know PIDs for the simulation and checks that they are executing.  To execute the script enter:

      bin/check-sim.sh


      Which will produce output similar to:



      If the simulations have completed the command will produce output similar to:



      In which case simply restart the simulation using the bin/start-sim.sh command
      Run the query application.

      bin/runquery.sh > query.log 2>&1


      If the query has been correctly specified the command should produce results virtually identical to the past section.
      Remember that data is generated and constantly changing so the results may not match exactly.

Working with Explain plan and Trace Statements

    Explain Plan and Trace

    Applications can profile queries using the com.tangosol.util.aggregator.QueryRecorder class, documentation for which can be found here

    The process works by creating a QueryRecorder instance and then is submitted to a cache for execution using the NamedCache.aggregate(filter,agent) method. 

    The method examines the provided filter on all members of the cluster and returns a QueryResult record, which can be examined to understand how the query performed.

    Example

    Query Recorders are instantiated for explain or trace by creating an instance based on a QueryRecorder.RecordType, and passing the instance to the cache for execution.

    In general you need:

    Imports:

    import com.tangosol.util.aggregator.QueryRecorder;
    import static com.tangosol.util.aggregator.QueryRecorder.RecordType;


    Note the use of static above to avoid the need to fully qualify RecordType later.

     Create and execute the recorder:

    QueryRecorder agent = new QueryRecorder(RecordType.EXPLAIN);
    Object resultsExplain = cache.aggregate(filter, agent);

    Output the results:

    System.out.println("\n" + resultsExplain + "\n");


    Examine and evaluate an Explain Plan:
    Name                                  Index        Cost      
    ======================================================================================
    com.tangosol.util.filter.AndFilter  | ----       | 0         
      LikeFilter(.getFirstName(), Al%)  | 0          | 2300000         
      com.tangosol.util.filter.AndFilte | ----       | 0         
        GreaterFilter(.getAge(), 16)    | 1          | 2300000         
        LessFilter(.getAge(), 50)       | 1          | 2300000         
    
    
    Index Lookups
    Index  Description                               Extractor             Ordered   
    ======================================================================================
    0      No index found                            .getFirstName()       false
    1      No index found                            .getAge()             false 


    The output can then be evaluated as follows:

    • Name – This column shows the name of each filter in the query. Composite filters show information for each of the filters within the composite filter.

    • Index – This column shows whether or not an index can be used with the given filter. If an index is found, the number shown corresponds to the index number on the Index Lookups table shown beneath the cost table.
    • Cost – This column shows an estimated cost of applying the filter.
      If an index can be used, the cost is given as 1 becayse the operation of applying the index requires just a single access to the index content.

      In the example, there are 3 storage-enabled cluster members and thus the cost reflects accessing the index on each of the three members. If no index exists, the cost is calculated as EVAL_COST * number of keys where EVAL_COST value is a constant value= 1000. Each row is intended to show the relative cost of doing a full scan to reduce the key set using the filter.

      In the example, there are ~1500 cache entries which need to be evaluated. Querying indexed entries is always relatively inexpensive as compared to non-indexed entries but does not necessarily guarantee effectiveness.


    Examine and evaluate a Query Trace: 

    Trace differs from Explain Plans in that the query is run and timed for the provided dataset and each index evaluated for a relative effectiveness.

    When using trace we simply create the recorder using RecordType.TRACE rather then RecordType.EXPLAIN as shown below:

    QueryRecorder agent = new QueryRecorder(RecordType.TRACE)

    However the resulting report is considerably different.
    Trace
    Name                                  Index        Effectiveness          Duration  
    ======================================================================================
    com.tangosol.util.filter.AndFilter  | ----       | 12852|1386(89%)      | 134       
      LikeFilter(.getFirstName(), Al%)  | 0          | 12852|2142(83%)      | 85        
      com.tangosol.util.filter.AndFilte | ----       | 2142|1386(35%)       | 42        
        GreaterFilter(.getAge(), 16)    | 1          | 2142|2058(3%)        | 17        
        LessFilter(.getAge(), 50)       | 1          | 2058|1386(32%)       | 24        
    
    
    Index Lookups
    Index  Description                               Extractor             Ordered   
    ======================================================================================
    0      SimpleMapIndex: Extractor=.getFirstName(  .getFirstName()       true
    1      SimpleMapIndex: Extractor=.getAge(), Ord  .getAge()             true
    
    ++++++++
    
    Trace output can be interpreted as follows:
        
    • Name – This column shows the name of each filter in the query. Composite filters show information for each of the filters within the composite filter.
    • Index – This column shows whether or not an index can be used with the given filter. If an index is found, the number shown corresponds to the index number on the Index Lookups table.
    • Effectiveness – This column shows the amount a key set was actually reduced as a result of each filter. The value is given as prefilter key set size | postfilter key set_size and is also presented as a percentage. The prefilter key set size value represents the key set size prior to evaluating the filter or applying an index. The post filter key set size value represents the size of the key set remaining after evaluating the filter or applying an index. For a composite filter entry, the value is the overall results for its contained filters. Once a key set size can no longer be reduced based on an index, the resulting key set is deserialized and any non index filters are applied.
    • Duration – This column shows the number of milliseconds spent evaluating the filter or applying an index. A value of 0 indicates that the time registered was below the reporting threshold.

    Using Explain Plans


      In this section we will modify the RunQuery.java class to examine and test a query using Explain plans to determine the cost of queries and associated indexes.
      Open the query application.

      gedit src/java/example/RunQuery.java class

      Note that a completed version of this section can be found in:

      completed/3.RunQuery.java

      Find the executeQuery method which resembles:

      /**
        * Execute the provided query against the named cache.
        */
      public void executeQuery() {

          dumpReturnedEntries(getNamedCache(), getFilter());

          /*
           * Enter trace and/or explain plan code here.
           */
      }

      Immediately following the comment add  to obtain an instance of the cache and filter used in the query. 

      Note: The getNamedCache() and getFilter() were provided/previously created.

      The additional code should resemble:

      NamedCache cache = getNamedCache();
      Filter filter = getFilter();


      Add code to obtain an instance of a query recorder and execute it,

      The additional code should resemble:

      QueryRecorder agent = new QueryRecorder(RecordType.EXPLAIN);
      Object results = cache.aggregate(filter, agent);


      Note: The required imports have been added for you and are shown below.

      import com.tangosol.util.aggregator.QueryRecorder;
      import static com.tangosol.util.aggregator.QueryRecorder.RecordType;

      Lastly add code to print the results:  The statement should resemble:

      System.out.println("\n" + results + " \n");

      The completed method should resemble:
      Build and execute the updates.
      Check that the sim is running

      bin/check-sim.sh

      Build the code.

      bin/build.sh

      Correct any compiler errors  and run the query.

      bin/runquery.sh > query.log 2>&1

      Using gedit or less examine the output, search for the Explain Plan

      The result should resemble:

    Add indices and examine results


      In the last section we used an explain plan to examine the cost of a query. Indices can be used to improve performance. In this section we examine a query after adding indexes to various values.
      Open the query application.

      gedit src/java/example/RunQuery.java class

      Note that a completed version of this section can be found in:

      completed/4.RunQuery.java

      Find the executeQuery method and inside it, before the QueryRecorder add code to index the Age and FirstName fields.
      The new code should resemble:

      cache.addIndex(new ReflectionExtractor("getAge"), true, null);
      cache.addIndex(new ReflectionExtractor("getFirstName"), true, null);

      [optional] Add code to remove the indexes.

      Following the System.out line, add code to remove the indexes,  the remove index code should resemble:

      cache.removeindex(new="" reflectionextractor("getfirstname"));
      cache.removeIndex(new ReflectionExtractor("getAge"));

      The updated method should resemble:


      Save the changes and exit the editor.
      Build and execute the updates
      Check that the sim is running

      bin/check-sim.sh

      Build the code.

      bin/build.sh

      Correct any compiler errors and run the query.

      bin/runquery.sh > query.log 2>&1

      Examine the results.

      Using gedit or less examine the output, search for the Explain Plan.   Note that the cost of the query has dropped dramatically.

      The result should resemble:

    Using Trace vs Explain Plan


      In this section we will modify the RunQuery.java class to determine the relative effectiveness of a index.
      Open the query application.

      gedit src/java/example/RunQuery.java class

      Note that a completed version of this section can be found in:

      completed/5.RunQuery.java

      Find the executeQuery method and change the query recorder to type RecordType.TRACE
      When complete the method should resemble:

      QueryRecorder agent = new QueryRecorder(RecordType.TRACE);

      Save your changes and exit the editor.
      Build and execute the updates.
      Check that the sim is running, ensure that the sim has been running for a few minutes.

      bin/check-sim.sh

      Build the code.

      bin/build.sh

      Correct any compiler errors  and run the query.

      bin/runquery.sh > query.log 2>&1

      Examine the output of the TRACE, an example of which is shown below.

      Question: How effective is the use of the age index?
      [Optional] Remove the age index. Re-build and re-run the query. What was the result?

Using the Coherence Query Command Line Tool

    Coherence provides a mechanism for querying data from the command line.  The class com.tangosol.coherence.dslquery.QueryPlus implements much of the functionality normally used in querying a database.
    QueryPlus supports:
    • select - select from a cache
    • insert - insert an element into a cache
    • delete - delete an element from a cache
    • create index - create an index on a cache
    • drop index - Remove an index from a cache
    • And others
    • For a complete list of supported commands and the syntax of each see here.
    In the command prompt start the query plus application using the provided bin/querytool.sh

    bin/querytool.sh

    Note: Use the bin/check-sim.sh script if needed to ensure the simulation is running
    You should be presented with a CohQL> prompt similar to that shown below:
     

    Note:
    Command line editing is enabled, so you can use the arrow keys to retrieve or edit prior commands.
    At the CohQL> prompt enter:

    select * from "dist-person" [CR]

    Note: [CR] represents a carriage return and is not to be entered.

    A set of Person objects will be returned.
    Select only first name, last name and age from the cache

    select FirstName, LastName, Age from "dist-person" 

    Add a where clause for age > 15 and < 50 and name like Al

    select FirstName,LastName,Age from "dist-person" where age > 15 and age < 50 and FirstName like "Al%"

    Using the trace command trace the execution

    trace select FirstName,LastName,Age from "dist-person" where age > 15 and age < 50 and FirstName like "Al%"

    Using the explain plan for command to examine the execution of the command

    explain plan for select FirstName,LastName,Age from "dist-person" where age > 15 and age < 50 and FirstName like "Al%"

    Using the create index add an index on first name

    create index "dist-person" FirstName;

    Rerun the trace and explain plan for commands and evaluate the addition of the index

    trace select FirstName,LastName,Age from "dist-person" where age > 15 and age < 50 and FirstName like "Al%"

    explain plan for select FirstName,LastName,Age from "dist-person" where age > 15 and age < 50 and FirstName like "Al%

    Using the drop index drop theindex on first name

    drop index "dist-person" FirstName

    Exit Query Plus

    quit

Summary

    In this tutorial, you have learned:

    • Use basic Filter and QueryHelper classes to query a cache.
    • Estimate the performance of a query using Explain Plans
    • Evaluate performance of a query and the impact of filters
    • Explore queries using the Coherence query application

    Resources

    • The Oracle Coherence 3.7.1 documentation, which is found here
    • Various Oracle Coherence Development and Administration classes and advance training that can be found here
    • To learn more about Oracle Coherence, additional OBEs in the Oracle Learning Library

    Credits

    • Lead Curriculum Developer: Al Saganich
    • Other Contributors: Jason Howes, Tim Middleton, Noah Arliss, and others

To help navigate this Oracle by Example, note the following:

Hiding Header Buttons:
Click the Title to hide the buttons in the header. To show the buttons again, simply click the Title again.
Topic List Button:
A list of all the topics. Click one of the topics to navigate to that section.
Expand/Collapse All Topics:
To show/hide all the detail for all the sections. By default, all topics are collapsed
Show/Hide All Images:
To show/hide all the screenshots. By default, all images are displayed.
Print:
To print the content. The content currently displayed or hidden will be printed.

To navigate to a particular section in this tutorial, select the topic from the list.