CREATE SCHEDULED LOAD

Creates a scheduled load for a specified table from a specified drop folder. 

Syntax

CREATE SCHEDULED LOAD target_table_name
FROM <drop-folder>
SCHEDULE
{
  CONTINUES |
  PERIODIC x {MINUTES | HOURS}
}
DESCFILE <desc-file-path>
[<optional-parameters>]


<drop-folder> ::=
{
 <folder>[/file-pattern]
}

<desc-file-path> ::=
{
  <folder>/description-file-name
}

<optional-parameters> ::=
{
  [PRELOAD_FILE_ACTION  {RENAME | NONE}] 
  [POSTLOAD_FILE_ACTION {RENAME | DELETE | MOVE <folder>}]
  [FAILED_FILE_ACTION   {RENAME | DELETE | MOVE <folder>}]
}

<folder> ::=
{
  [HDFS://[<ip>]]/<path>
}

Description

Creates a scheduled load for a specified table from a specified drop folder. The load scheduler checks the drop folder for new files according to the defined schedule, and starts a loader task if one or more new files are found. After a load succeeds or fails, its input files are renamed, deleted or moved according to the file actions specified in the CREATE SCHEDULED LOAD command, or according to the default actions when none are specified.
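The check-and-load cycle described above can be sketched in Python for a local drop folder. This is a simplified, hypothetical model, not the Jethro scheduler: `run_cycle` and `find_new_files` are illustrative names, the loader task is replaced by a caller-supplied `load` function, and the sketch assumes that the default RENAME actions append .loading, .done and .failed to the original file name and that any file without one of those suffixes counts as new.

```python
import os
from typing import Callable, List

# Suffixes added by the default RENAME file actions (assumed to be appended to the name).
PROCESSED_SUFFIXES = (".loading", ".done", ".failed")


def find_new_files(drop_folder: str) -> List[str]:
    """Return files in drop_folder that do not yet carry a processing suffix."""
    new_files = []
    for name in sorted(os.listdir(drop_folder)):
        path = os.path.join(drop_folder, name)
        if os.path.isfile(path) and not name.endswith(PROCESSED_SUFFIXES):
            new_files.append(path)
    return new_files


def run_cycle(drop_folder: str, load: Callable[[List[str]], bool]) -> bool:
    """Run one scheduled check; return True if a load was started.

    `load` is a hypothetical stand-in for the loader task and returns True on success.
    """
    new_files = find_new_files(drop_folder)
    if not new_files:
        return False  # nothing new: wait for the next scheduled check

    # PRELOAD_FILE_ACTION RENAME: mark files as in progress.
    loading = []
    for path in new_files:
        os.rename(path, path + ".loading")
        loading.append(path + ".loading")

    succeeded = load(loading)

    # POSTLOAD_FILE_ACTION / FAILED_FILE_ACTION RENAME: record the outcome.
    suffix = ".done" if succeeded else ".failed"
    for path in loading:
        os.rename(path, path[: -len(".loading")] + suffix)
    return True
```

Calling `run_cycle` repeatedly at the interval set by SCHEDULE approximates the scheduler; the sketch ignores details such as files that are still being copied into the drop folder.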

Parameter Details

Parameter | Mandatory? | Details
Target Table Name

MANDATORY

The name of the table into which the data will be loaded.
Drop Folder

MANDATORY

Location of the input files to load, with an optional file pattern. A drop folder can be defined on a locally mounted file system or on HDFS:
  • Local path - Full path of a folder on a locally mounted file system. For example: /home/jethro/salesdata/ 
  • HDFS full path - Full path of an HDFS folder. For example: hdfs://my.cluster:8020/data/sales/
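A drop folder specification combines a folder with an optional file pattern. The sketch below shows one way to split it and to test file names against the pattern. It assumes, without confirmation from this page, that the pattern is the last path component, that it uses shell-style wildcards such as the *.csv in the examples, and that a trailing / makes every file in the folder eligible; `DropFolder`, `parse_drop_folder` and `matches` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class DropFolder(NamedTuple):
    storage: str            # "hdfs" or "local"
    folder: str             # folder part, including any hdfs://host:port prefix
    pattern: Optional[str]  # file pattern, or None when every file is eligible


def parse_drop_folder(spec: str) -> DropFolder:
    """Split a drop folder specification into storage type, folder and pattern."""
    storage = "hdfs" if spec.lower().startswith("hdfs://") else "local"
    head, _, last = spec.rpartition("/")
    # Assumption: a last component containing a wildcard character is a file pattern.
    if last and any(ch in last for ch in "*?["):
        return DropFolder(storage, head + "/", last)
    return DropFolder(storage, spec, None)


def matches(drop: DropFolder, file_name: str) -> bool:
    """Case-sensitive shell-style match of a file name against the pattern (assumed)."""
    return drop.pattern is None or fnmatchcase(file_name, drop.pattern)
```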
Schedule 

MANDATORY

The load schedule frequency.
  • CONTINUES - Continually checks whether new files are available in the drop folder, and starts loading if at least one new file is found.
  • PERIODIC x {MINUTES | HOURS} - Checks for new files in the drop folder every x minutes or hours, and starts loading if at least one new file is found.
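The two schedule forms differ only in how long the scheduler waits between checks. The sketch converts a SCHEDULE clause into a polling interval in seconds. The helper name `poll_interval_seconds`, the case-insensitive parsing, the rejection of a zero interval and the use of None to represent CONTINUES are assumptions made for illustration.

```python
import re
from typing import Optional

_PERIODIC = re.compile(r"PERIODIC\s+(\d+)\s+(MINUTES|HOURS)", re.IGNORECASE)
_SECONDS_PER_UNIT = {"MINUTES": 60, "HOURS": 3600}


def poll_interval_seconds(schedule: str) -> Optional[int]:
    """Seconds between drop-folder checks; None means CONTINUES (no fixed interval)."""
    text = " ".join(schedule.split())  # normalise whitespace
    if text.upper() == "CONTINUES":
        return None
    m = _PERIODIC.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported schedule: {schedule!r}")
    count = int(m.group(1))
    if count <= 0:
        raise ValueError("the interval must be positive")  # assumed restriction
    return count * _SECONDS_PER_UNIT[m.group(2).upper()]
```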
Description file

MANDATORY

DESCFILE <desc-file-path> - Full path to the load description file. If the file is located on HDFS, use the HDFS:// prefix notation (for example, hdfs://mycluster:8020/desc/sales.desc).

Pre-load file action

OPTIONAL

PRELOAD_FILE_ACTION  {RENAME | NONE}

RENAME (default) - Rename the file to *.loading before loading it
NONE - Do not rename the file before loading

Post-load file action

OPTIONAL

POSTLOAD_FILE_ACTION {RENAME | DELETE | MOVE <folder>}

RENAME (default) - Rename the loaded file to *.done
DELETE - Delete the loaded file
MOVE [HDFS://[<ip>]]/loaded-files-folder - Move the loaded file into the specified folder for loaded files

Failed file action

OPTIONAL

FAILED_FILE_ACTION {RENAME | DELETE | MOVE <folder>}

RENAME (default) - Rename the failed file to *.failed
DELETE - Delete the failed file
MOVE [HDFS://[<ip>]]/failed-files-folder - Move the failed file into the specified folder for failed files
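The pre-load, post-load and failed-file actions together decide what a file is called while it is being loaded and where it ends up afterwards. The following sketch models that outcome. `plan_file_actions` is a hypothetical helper, and it assumes that the RENAME suffixes are appended to the original file name (a.csv becomes a.csv.loading, then a.csv.done or a.csv.failed) and that MOVE keeps the file's base name.

```python
import posixpath
from typing import Optional, Tuple


def plan_file_actions(path: str, succeeded: bool,
                      preload: str = "RENAME",
                      postload: str = "RENAME",
                      failed: str = "RENAME",
                      postload_folder: Optional[str] = None,
                      failed_folder: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (name while loading, final location); final location is None if deleted."""
    if preload not in ("RENAME", "NONE"):
        raise ValueError(f"invalid PRELOAD_FILE_ACTION: {preload}")
    during = path + ".loading" if preload == "RENAME" else path

    # Choose the action and MOVE target that apply to this outcome.
    action = postload if succeeded else failed
    folder = postload_folder if succeeded else failed_folder

    if action == "RENAME":
        return during, path + (".done" if succeeded else ".failed")
    if action == "DELETE":
        return during, None
    if action == "MOVE":
        if not folder:
            raise ValueError("MOVE requires a target folder")
        # posixpath suits both POSIX paths and hdfs:// URIs, which use "/" separators.
        return during, posixpath.join(folder, posixpath.basename(path))
    raise ValueError(f"invalid file action: {action}")
```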

Comments and limitations:
  • Paths must not contain spaces.
  • All paths must be accessible from the Jethro loader machine. Note that when a scheduled load is created, the query engine verifies that the description file is accessible. However, the query node that performs this check is not necessarily the machine assigned to run load processes. It is therefore recommended to store the description file, and to place the drop and move folders, in a location that is shared and accessible from all Jethro query nodes, for example a shared folder on the shared storage such as <storage.root.path>/loads/desc/
  • Move folders must be of the same storage type (HDFS or POSIX) as the drop folder.
  • The scheduled load service must be running for scheduled loads to run. To start it, run: service jethro start {Instance-name} loadscheduler
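The no-spaces and same-storage-type limitations above can be checked mechanically before a statement is submitted. This hypothetical pre-flight check (`check_paths` and `storage_type` are illustrative names) treats any path starting with hdfs:// as HDFS and everything else as POSIX. It does not, and cannot, verify that the paths are reachable from the loader machine or the query nodes.

```python
from typing import Iterable, List, Optional


def storage_type(path: str) -> str:
    """Classify a path as HDFS (hdfs:// prefix) or POSIX (anything else)."""
    return "HDFS" if path.lower().startswith("hdfs://") else "POSIX"


def check_paths(drop_folder: str, desc_file: str,
                move_folders: Iterable[Optional[str]] = ()) -> List[str]:
    """Return violations of the no-spaces and same-storage rules (empty if none)."""
    moves = [m for m in move_folders if m]  # ignore unset MOVE targets
    problems = []
    for p in [drop_folder, desc_file, *moves]:
        if " " in p:
            problems.append(f"path contains a space: {p!r}")
    drop_type = storage_type(drop_folder)
    for m in moves:
        if storage_type(m) != drop_type:
            problems.append(f"move folder {m!r} is {storage_type(m)}, "
                            f"but the drop folder is {drop_type}")
    return problems
```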

Example

CREATE SCHEDULED LOAD sales FROM hdfs://mycluster:8020/data/sales/ SCHEDULE PERIODIC 15 MINUTES DESCFILE hdfs://mycluster:8020/desc/sales.desc;
CREATE SCHEDULED LOAD sales FROM /home/jethro/instances/myinstance/loads/drop/sales/*.csv SCHEDULE CONTINUES DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc;
CREATE SCHEDULED LOAD sales FROM /home/jethro/instances/myinstance/loads/drop/sales/ SCHEDULE PERIODIC 1 HOURS DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc PRELOAD_FILE_ACTION NONE;
CREATE SCHEDULED LOAD sales FROM /home/jethro/instances/myinstance/loads/drop/sales/ SCHEDULE PERIODIC 1 HOURS DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc POSTLOAD_FILE_ACTION MOVE /home/jethro/instances/myinstance/loads/done;
CREATE SCHEDULED LOAD sales FROM /home/jethro/instances/myinstance/loads/drop/sales/ SCHEDULE PERIODIC 1 HOURS DESCFILE /home/jethro/instances/myinstance/loads/desc/sales.desc FAILED_FILE_ACTION DELETE;
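When scheduled loads are created from a script, statements like the ones above can be composed from their parts, following the clause order of the syntax section. `build_create_scheduled_load` is a hypothetical helper, not part of Jethro. It only assembles text, leaves an optional parameter out when it is None so that the documented default applies, and rejects drop and description paths that contain spaces.

```python
from typing import Optional


def build_create_scheduled_load(table: str, drop_folder: str, schedule: str,
                                desc_file: str,
                                preload: Optional[str] = None,
                                postload: Optional[str] = None,
                                failed: Optional[str] = None) -> str:
    """Assemble a CREATE SCHEDULED LOAD statement as a single line of text."""
    for path in (drop_folder, desc_file):
        if " " in path:
            raise ValueError(f"paths must not contain spaces: {path!r}")
    parts = ["CREATE SCHEDULED LOAD", table, "FROM", drop_folder,
             "SCHEDULE", schedule, "DESCFILE", desc_file]
    # Optional clauses, in the order listed in the syntax section.
    for keyword, value in (("PRELOAD_FILE_ACTION", preload),
                           ("POSTLOAD_FILE_ACTION", postload),
                           ("FAILED_FILE_ACTION", failed)):
        if value is not None:
            parts += [keyword, value]
    return " ".join(parts) + ";"
```

For instance, passing postload="MOVE /home/jethro/instances/myinstance/loads/done" together with the drop folder, schedule and description file of the fourth example reproduces that statement.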

See Also

SHOW SCHEDULED LOADS - Lists all scheduled loads
DROP SCHEDULED LOAD {table-name} - Removes the scheduled load of a specific table
DROP SCHEMA SCHEDULED LOADS {schema-name} - Removes the scheduled loads of a specific schema (or of the default schema if no schema name is given).