CREATE SCHEDULED LOAD
Creates a scheduled load for a specified table from a specified drop folder.
Syntax
CREATE SCHEDULED LOAD [schema_name.]target_table_name
  FROM <drop-folder>
  SCHEDULE { CONTINUES | PERIODIC x {MINUTES | HOURS} }
  DESCFILE <desc-file-path>
  [<optional-parameters>]

<drop-folder> ::= <folder>[/file-pattern]
<desc-file-path> ::= <folder>/description-file-name
<optional-parameters> ::=
  [PRELOAD_FILE_ACTION {RENAME | NONE}]
  [POSTLOAD_FILE_ACTION {RENAME | DELETE | MOVE <folder>}]
  [FAILED_FILE_ACTION {RENAME | DELETE | MOVE <folder>}]
<folder> ::= [HDFS://[<ip>]]/<path>
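To make the clause order concrete, the following Python sketch assembles a CREATE SCHEDULED LOAD statement from its parts. It is a hypothetical client-side helper, not part of Jethro: the function names (build_create_scheduled_load, schedule_clause, action_clause) and the way schedules and actions are represented are assumptions made for illustration. It only enforces what the syntax above shows: a mandatory schedule of CONTINUES or PERIODIC x MINUTES/HOURS, RENAME or NONE for the pre-load action, and a target folder for MOVE.

```python
from typing import Optional, Tuple, Union

# Hypothetical representation: a schedule is "CONTINUES" or an
# (interval, unit) pair such as (15, "MINUTES").
Schedule = Union[str, Tuple[int, str]]
# Hypothetical representation: a file action is a keyword such as "RENAME",
# or ("MOVE", folder).
Action = Union[str, Tuple[str, str]]


def schedule_clause(schedule: Schedule) -> str:
    if schedule == "CONTINUES":
        return "SCHEDULE CONTINUES"
    if isinstance(schedule, tuple) and len(schedule) == 2:
        interval, unit = schedule
        if isinstance(interval, int) and interval > 0 and unit in ("MINUTES", "HOURS"):
            return f"SCHEDULE PERIODIC {interval} {unit}"
    raise ValueError(f"invalid schedule: {schedule!r}")


def action_clause(keyword: str, action: Action, allowed: set) -> str:
    # MOVE must carry a target folder; the other actions are bare keywords.
    if isinstance(action, tuple):
        if len(action) == 2 and action[0] == "MOVE" and "MOVE" in allowed:
            return f"{keyword} MOVE {action[1]}"
    elif action in allowed and action != "MOVE":
        return f"{keyword} {action}"
    raise ValueError(f"invalid {keyword}: {action!r}")


def build_create_scheduled_load(table: str, drop_folder: str, desc_file: str,
                                schedule: Schedule,
                                preload: Optional[Action] = None,
                                postload: Optional[Action] = None,
                                failed: Optional[Action] = None) -> str:
    """Assemble the statement with clauses in the order shown in the syntax."""
    parts = [f"CREATE SCHEDULED LOAD {table}", f"FROM {drop_folder}",
             schedule_clause(schedule), f"DESCFILE {desc_file}"]
    if preload is not None:
        parts.append(action_clause("PRELOAD_FILE_ACTION", preload,
                                   {"RENAME", "NONE"}))
    if postload is not None:
        parts.append(action_clause("POSTLOAD_FILE_ACTION", postload,
                                   {"RENAME", "DELETE", "MOVE"}))
    if failed is not None:
        parts.append(action_clause("FAILED_FILE_ACTION", failed,
                                   {"RENAME", "DELETE", "MOVE"}))
    return " ".join(parts) + ";"
```

For instance, calling it with the table sales, the HDFS drop folder and description file from the first example, and the schedule (15, "MINUTES") reproduces that first example statement on a single line.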
Description
The loader scheduler checks the drop folder for new files according to the defined schedule and starts a loader task when it finds one or more new files. After each load, whether it succeeds or fails, the input files are renamed, deleted, or moved as specified in the CREATE SCHEDULED LOAD command; if no action is specified, they are renamed.
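The check, load, and rename cycle described above can be modeled on a local drop folder as follows. This is a minimal sketch, not the Jethro loader scheduler: the load itself is a caller-supplied function standing in for the loader task, and it assumes that the *.loading, *.done and *.failed suffixes are appended to the original file name, that a file counts as new when it matches the pattern and carries none of those suffixes, and that MOVE keeps the original file name.

```python
import fnmatch
import shutil
from pathlib import Path
from typing import Callable, Dict, List, Optional

# ASSUMPTION: status suffixes are appended to the original file name.
SUFFIXES = (".loading", ".done", ".failed")


def find_new_files(drop_folder: Path, pattern: str = "*") -> List[Path]:
    """Files that match the pattern and do not yet carry a status suffix."""
    return sorted(
        p for p in drop_folder.iterdir()
        if p.is_file()
        and fnmatch.fnmatch(p.name, pattern)
        and not p.name.endswith(SUFFIXES)
    )


def apply_action(current: Path, original_name: str, action: str, suffix: str,
                 move_to: Optional[Path] = None) -> Optional[Path]:
    """Apply RENAME, NONE, DELETE or MOVE; return the new path (None if deleted)."""
    if action == "RENAME":
        return current.rename(current.with_name(original_name + suffix))
    if action == "NONE":
        return current
    if action == "DELETE":
        current.unlink()
        return None
    if action == "MOVE" and move_to is not None:
        # ASSUMPTION: the moved file keeps its original name.
        # shutil.move also works across file systems, unlike Path.rename.
        return Path(shutil.move(str(current), str(move_to / original_name)))
    raise ValueError(f"unsupported action: {action!r}")


def run_cycle(drop_folder: Path, load: Callable[[Path], bool], pattern: str = "*",
              preload: str = "RENAME", postload: str = "RENAME",
              failed: str = "RENAME",
              postload_move_to: Optional[Path] = None,
              failed_move_to: Optional[Path] = None) -> Dict[str, bool]:
    """One scheduler check: load every new file, then apply the file actions."""
    if preload not in ("RENAME", "NONE"):
        raise ValueError("PRELOAD_FILE_ACTION must be RENAME or NONE")
    for action, target in ((postload, postload_move_to), (failed, failed_move_to)):
        if action == "MOVE" and target is None:
            raise ValueError("MOVE requires a target folder")
    results: Dict[str, bool] = {}
    for original in find_new_files(drop_folder, pattern):
        name = original.name
        current = apply_action(original, name, preload, ".loading")
        try:
            ok = bool(load(current))
        except Exception:  # treat any loader error as a failed load
            ok = False
        if ok:
            apply_action(current, name, postload, ".done", postload_move_to)
        else:
            apply_action(current, name, failed, ".failed", failed_move_to)
        results[name] = ok
    return results
```

In this model, a file renamed to *.loading no longer qualifies as new, so an overlapping check would skip it; with PRELOAD_FILE_ACTION NONE that protection is absent.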
Parameter Details
Parameter | Mandatory? | Details |
---|---|---|
Target Table Name | MANDATORY | The name of the table into which the data is loaded, optionally qualified with a schema name (schema_name.target_table_name). |
Drop Folder | MANDATORY | The location of the input files to load, with an optional file pattern (for example, *.csv). A drop folder can be on a locally mounted file system or on HDFS (using the HDFS:// prefix). |
Schedule | MANDATORY | The load schedule frequency: CONTINUES, or PERIODIC x {MINUTES \| HOURS} to check the drop folder every x minutes or hours. |
Description file | MANDATORY | DESCFILE <desc-file-path>: the full path to the load description file. If the file is on HDFS, use the HDFS:// prefix. |
Pre-load file action | OPTIONAL | PRELOAD_FILE_ACTION {RENAME \| NONE}. RENAME (default): rename the file to *.loading before the load. NONE: leave the file name unchanged. |
Post-load file action | OPTIONAL | POSTLOAD_FILE_ACTION {RENAME \| DELETE \| MOVE <folder>}. RENAME (default): rename the file to *.done. DELETE: delete the file. MOVE: move the file to <folder>. |
Failed file action | OPTIONAL | FAILED_FILE_ACTION {RENAME \| DELETE \| MOVE <folder>}. RENAME (default): rename the file to *.failed. DELETE: delete the file. MOVE: move the file to <folder>. |
- Paths must not contain spaces.
- All paths must be accessible from the Jethro loader machine. When a scheduled load is created, the query engine verifies that the description file is accessible; however, the query node is not necessarily the same machine as the one assigned to load processes. It is therefore recommended to store the description file, and to place the drop and move folders, on storage that is shared and accessible from all Jethro query nodes, for example a folder on the shared storage such as <storage.root.path>/loads/desc/
- Move folders must be on the same storage type (for example, HDFS or POSIX) as the drop folder.
- The scheduled load service must be running for scheduled loads to run. To start it, run: service jethro start {Instance-name} loadscheduler
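Two of these rules (no spaces, and move folders on the same storage type as the drop folder) can be checked before the statement is submitted. The sketch below is a hypothetical pre-check, not a Jethro tool; check_paths and storage_type are illustrative names, and it assumes that a path with an hdfs:// prefix (in any case) is on HDFS and that every other path is on a locally mounted POSIX file system. It cannot verify that the paths are actually accessible from the loader and query nodes.

```python
from typing import Dict, List, Optional


def storage_type(path: str) -> str:
    # ASSUMPTION: an hdfs:// prefix (any case) means HDFS; anything else is
    # treated as a locally mounted (POSIX) path.
    return "HDFS" if path.lower().startswith("hdfs://") else "POSIX"


def check_paths(drop_folder: str, desc_file: str,
                postload_move_to: Optional[str] = None,
                failed_move_to: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []
    named: Dict[str, Optional[str]] = {
        "drop folder": drop_folder,
        "description file": desc_file,
        "post-load move folder": postload_move_to,
        "failed-file move folder": failed_move_to,
    }
    # Rule: paths must not contain spaces.
    for label, path in named.items():
        if path is not None and " " in path:
            problems.append(f"{label} contains a space: {path!r}")
    # Rule: move folders must use the same storage type as the drop folder.
    drop_type = storage_type(drop_folder)
    for label in ("post-load move folder", "failed-file move folder"):
        path = named[label]
        if path is not None and storage_type(path) != drop_type:
            problems.append(
                f"{label} is on {storage_type(path)} "
                f"but the drop folder is on {drop_type}"
            )
    return problems
```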
Examples
CREATE SCHEDULED LOAD sales
  FROM hdfs://mycluster:8020/data/sales/
  SCHEDULE PERIODIC 15 MINUTES
  DESCFILE hdfs://mycluster:8020/desc/sales.desc;

CREATE SCHEDULED LOAD sales
  FROM /home/jethro/instances/myinstance/loads/drop/sales/*.csv
  SCHEDULE CONTINUES
  DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc;

CREATE SCHEDULED LOAD sales
  FROM /home/jethro/instances/myinstance/loads/drop/sales/
  SCHEDULE PERIODIC 1 HOURS
  DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc
  PRELOAD_FILE_ACTION NONE;

CREATE SCHEDULED LOAD sales
  FROM /home/jethro/instances/myinstance/loads/drop/sales/
  SCHEDULE PERIODIC 1 HOURS
  DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc
  POSTLOAD_FILE_ACTION MOVE /home/jethro/instances/myinstance/loads/done;

CREATE SCHEDULED LOAD sales
  FROM /home/jethro/instances/myinstance/loads/drop/sales/
  SCHEDULE PERIODIC 1 HOURS
  DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc
  FAILED_FILE_ACTION DELETE;
See Also
SHOW SCHEDULED LOADS - Lists all scheduled loads.
DROP SCHEDULED LOAD {table-name} - Removes the scheduled load of the specified table.
DROP SCHEMA SCHEDULED LOADS {schema-name} - Removes the scheduled loads of the specified schema (or of the default schema if no schema name is given).