Automatic Cube Generation

Jethro auto cubes brings the great performance value of pre-aggregated cubes, combined with the ability to automatically generate and transparently maintain those cubes.
Cubes can be useful to accelerate performance for certain sets of queries, mainly queries that are not highly filtered and would normally scan large parts of the data set. Jethro optimizer combines cubes with indexing and caching technology to ensure interactive queries response time on wide range of use cases.

Cubes can be generated in two modes:

  • Auto-generation, based on queries sent to the server.
  • Manual cube generation for a given query, by using the GENERATE CUBES command.

Auto cube generation process is transparent to the user. Jethro intercepts incoming queries and automatically generates the appropriate cubes. The initial cube generation phase for BI dashboard is called training phase. Training phase for a new dashboard usually takes anywhere between few minutes to an hour.

Requirements and Initial Setup

Because cube generation can be a resource-consuming process, it is advisable to allocate a designated Jethro node for this task. If a designated node cannot be allocated, it is recommended turn on cube auto-generation only when business users are not accessing the system.

JethroMaint service is responsible for intercepting incoming queries and sending cube generation request to the designated cube generation execution server. By default, the cube generation host is 'auto', which means 'localhost' (and the default port is identified automatically by the system). To change server address or port, set the following parameter: dynamic.aggregation.auto.generate.execution.hosts.

For example, if the cube generation node DNS is 'cubes.gen', and the JethroServer is running on port 9112, run the following set global command from SQL client to connect to the instance:
set global dynamic.aggregation.auto.generate.execution.hosts=cubes.gen:9112;

Cubes are persistently stored at the instance storage location (HDFS, NFS, or local file system). When the total cubes' size exceeds the maximum storage size allocated for cubes, unused cubes are automatically dropped. The default storage size is set to 100GB. To set a different size, run the following set global command from SQL client to connect to the instance:

set global adaptive.cache.quota=<Required size in bytes>

Turn on Automatic Cube Generation

By default, auto cube generation is turned off. The turn cube generation state (on/off) is controlled by the following parameter :
dynamic.aggregation.auto.generate.enable.
Use set global to enable/disable auto cube generation.

To enable auto cube generation, set the value to 1:
set global dynamic.aggregation.auto.generate.enable=1;

To disable auto cube generation, set the value to 0:
set global dynamic.aggregation.auto.generate.enable=0;

Manual Cube Generation

Cubes can be manually created from a query, by running the GENERATE CUBES command, which generates one or more cubes from a given query. The GENERATE CUBES command also allows generating cubes with a WHERE statement. For further information, see Generate Cubes under SQL Reference.

Cubes Management

To display all generated cubes, use:

SHOW CUBES

To remove all cubes from the cube repository, use:

DROP CUBES

When new data is loaded, cubes are automatically updated in the background. The maint service is responsible for updating cubes by sending the following command to the cube generation server. Cubes can also be updated manually by running the command:

UPDATE CUBE BY KEY