Loading Data Guide

Creating Tables

Before loading data, you need to create the target tables in Jethro.
To create the tables:

  1. Start JethroClient by running the command:

    JethroClient <instance name> <host>:<port> -p <password>

    For example:

    JethroClient demo localhost:9111 -p jethro
  2. Copy the CREATE TABLE command from the *.ddl file created earlier in the chapter Analyzing Data and paste it at the command prompt.
    Alternatively, you can run JethroClient with the -i option, specifying the *.ddl file of the table.
    For example:

    JethroClient demo localhost:9111 -p jethro -i sales_demo.ddl

    The tables created will only contain data types supported by Jethro. The supported data types are:
    INT | BIGINT | FLOAT | DOUBLE | STRING | TIMESTAMP
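
When several tables have to be created, the -i form shown above can be wrapped in a small shell loop. The following is a minimal sketch, assuming that all *.ddl files sit in the current directory and reusing the instance name, address, and password from the example. The variable names, the ddl_command helper, and the dry-run fallback are illustrative and not part of Jethro.

```shell
#!/bin/sh
# Sketch: run JethroClient -i once per *.ddl file in the current directory.
# INSTANCE, ADDRESS and PASSWORD mirror the example above; adjust as needed.
INSTANCE=demo
ADDRESS=localhost:9111
PASSWORD=jethro

# Build the command line for one DDL file (hypothetical helper name).
ddl_command() {
  echo "JethroClient $INSTANCE $ADDRESS -p $PASSWORD -i $1"
}

for ddl in *.ddl; do
  [ -e "$ddl" ] || continue          # no *.ddl files: the glob stays literal, skip it
  if command -v JethroClient >/dev/null 2>&1; then
    JethroClient "$INSTANCE" "$ADDRESS" -p "$PASSWORD" -i "$ddl"
  else
    # JethroClient is not on PATH: only print what would be run
    echo "would run: $(ddl_command "$ddl")"
  fi
done
```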

Loading Data to Tables

Once the tables are created, you can start loading data.
The Jethro loader utility JethroLoader allows fast and efficient data loading from text files into a Jethro table. JethroLoader parses its input into rows and loads them into an existing Jethro table.
To load data to a table, run the command:

JethroLoader <instance name> <description file> <input files or directories>

For example:

JethroLoader demo sales_demo.desc sales_demo.10m.csv

The load may take some time, so you may want to run it in the background by adding nohup at the beginning of the command and & at the end.
When the loader starts running, it prints the path where the loader report file is located. You can view this file to monitor the load progress by using the command:

tail -f <loader report file>
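
The background run described above can be sketched as follows. This is a minimal example that reuses the instance, description file, and input file from the earlier load command; the output file name jethro_load.out and the dry-run fallback are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: start the load in the background so it keeps running after logout.
INSTANCE=demo
DESC=sales_demo.desc
INPUT=sales_demo.10m.csv
LOG=jethro_load.out     # hypothetical file name for the loader's console output

LOAD_CMD="JethroLoader $INSTANCE $DESC $INPUT"

if command -v JethroLoader >/dev/null 2>&1; then
  # nohup ... & : ignore hangups and return the prompt immediately
  nohup JethroLoader "$INSTANCE" "$DESC" "$INPUT" > "$LOG" 2>&1 &
  echo "loader started in background (PID $!), console output in $LOG"
else
  # JethroLoader is not on PATH: only print what would be run
  echo "JethroLoader not found; would run: nohup $LOAD_CMD > $LOG 2>&1 &"
fi
```

Because the loader prints the report file path when it starts, that line should end up in the redirected output file, from where it can be copied into the tail -f command.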

If the data is not in text format (for example, when loading data directly from a Hive table), you may first need to make the data accessible to Jethro. For more advanced loading options, consult the chapter Loading Data Guide in the Reference Guide.
You can also learn more about loading data from the following video:
https://www.youtube.com/watch?v=Dol7TyTbgVU&t=27s

Reviewing Data in Jethro

To review the data loaded to Jethro:

  1. Start JethroClient by running the command:

    JethroClient demo localhost:9111 -p jethro
  2. Explore which tables exist in the specific instance or schema by using the command:

    SHOW TABLES;
  3. Explore the data in the selected table:

    SELECT * FROM sales_demo LIMIT 5;

    To display more detailed information about the existing tables, including the number of columns, rows, and partitions and the disk size, use the command:

    SHOW TABLES EXTENDED;
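
The review statements above can also be collected in a file and run in one go. The sketch below assumes that the -i option accepts any file of SQL statements and not only *.ddl files, which this guide does not confirm; the file name review_sales_demo.sql is hypothetical.

```shell
#!/bin/sh
# Sketch: put the review statements in a file and pass it to JethroClient -i.
# Assumption: -i accepts any file of SQL statements, not only *.ddl files.
SQL_FILE=review_sales_demo.sql   # hypothetical file name

cat > "$SQL_FILE" <<'EOF'
SHOW TABLES EXTENDED;
SELECT * FROM sales_demo LIMIT 5;
EOF

if command -v JethroClient >/dev/null 2>&1; then
  JethroClient demo localhost:9111 -p jethro -i "$SQL_FILE"
else
  # JethroClient is not on PATH: leave the file in place for a later run
  echo "JethroClient not found; statements written to $SQL_FILE"
fi
```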