Loading Data Guide
Creating Tables
Before loading data, you need to create tables in Jethro.
To create tables in Jethro:
Start JethroClient by running the command:
JethroClient <instance name> <host>:<port> -p <password>
For example:
JethroClient demo localhost:9111 -p jethro
Copy the create table command from the *.ddl file that you created in the chapter Analyzing Data and paste it at the command prompt.
Alternatively, you can run JethroClient with the -i option, specifying the *.ddl file of the table.
For the example mentioned above:
JethroClient demo localhost:9111 -p jethro -i sales_demo.ddl
The tables created will only contain data types supported by Jethro. The supported data types are:
INT | BIGINT | FLOAT | DOUBLE | STRING | TIMESTAMP
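As an illustration of the supported data types, the following sketch writes a small *.ddl file that uses each type once. The table name, column names, and file name (example_orders) are hypothetical and are not part of the demo data set; the statement assumes standard CREATE TABLE syntax.

```shell
# Hypothetical table, column, and file names, for illustration only.
# Each column uses one of the data types listed above.
cat > example_orders.ddl <<'EOF'
CREATE TABLE example_orders (
  order_id    BIGINT,
  store_id    INT,
  unit_price  DOUBLE,
  discount    FLOAT,
  customer    STRING,
  order_time  TIMESTAMP
);
EOF

# The file could then be passed to JethroClient with the -i option:
# JethroClient demo localhost:9111 -p jethro -i example_orders.ddl
```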
Loading Data to Tables
Once the tables are created, you can start loading data.
The Jethro loader utility JethroLoader allows fast and efficient data loading from text files into a Jethro table. JethroLoader parses its input into rows and loads them into an existing Jethro table.
To load data to a table, run the command:
JethroLoader <instance name> <description file> <input files or directories>
For example:
JethroLoader demo sales_demo.desc sales_demo.10m.csv
The load process may take some time, so you may want to run it in the background by adding nohup at the beginning of the command and & at the end.
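A minimal sketch of a background run, using the example load command from above. The output file name loader.out is an assumption chosen for illustration; it only captures console output and is not the loader report file.

```shell
# Run the example load in the background so it survives a closed terminal.
# loader.out is a hypothetical file name for the captured console output.
nohup JethroLoader demo sales_demo.desc sales_demo.10m.csv > loader.out 2>&1 &

# $! holds the process ID of the most recent background job.
echo "JethroLoader started in the background (PID $!)"
```

Redirecting both standard output and standard error keeps nohup from writing to its default nohup.out file and leaves all console messages in one place.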
When the loader starts running, it prints the path where the loader report file is located. You can view this file to monitor the load progress by using the command:
tail -f <report file path>
If the data is not in text format (for example, when loading data directly from a Hive table), you may first need to make the data accessible to Jethro. For more advanced loading options, consult the chapter Loading Data Guide in the Reference Guide.
You can also learn more about loading data from the following video:
https://www.youtube.com/watch?v=Dol7TyTbgVU&t=27s
Reviewing Data in Jethro
To review the data loaded to Jethro:
Start JethroClient by running the command:
JethroClient demo localhost:9111 -p jethro
Explore which tables exist in the specific instance or schema by using the command:
SHOW TABLES;
Explore the data in the selected table:
SELECT * FROM sales_demo LIMIT 5;
Alternatively, you can display more detailed information about the existing tables, including the number of columns, number of rows, number of partitions, and disk size, by using the command:
SHOW TABLES EXTENDED;
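The review steps can also be collected into a file and run in one pass. This is only a sketch: review.sql is a hypothetical file name, and it assumes that the -i option accepts any file of SQL statements, not only *.ddl files.

```shell
# Hypothetical file name; collects the review statements shown above.
cat > review.sql <<'EOF'
SHOW TABLES;
SELECT * FROM sales_demo LIMIT 5;
SHOW TABLES EXTENDED;
EOF

# Run the statements only if JethroClient is available on this machine.
# Assumption: -i accepts a file of arbitrary SQL statements.
if command -v JethroClient >/dev/null 2>&1; then
  JethroClient demo localhost:9111 -p jethro -i review.sql
else
  echo "JethroClient not found on PATH; review.sql was written but not run."
fi
```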