CREATE EXTERNAL TABLE
Maps a foreign data source table (such as Hive tables) into Jethro.
Syntax
CREATE EXTERNAL TABLE new_external_table_name (<columns_list_spec>) DATA SOURCE=existing_external_data_source_name LOCATION=<location_spec> [<values_format_spec>] [<rejects_handling_spec>] [<rejects_thresholds_spec>] <columns_list_spec> ::= <column_spec> [,<column_spec>...n] <column_spec> ::= { column name column_type } <location_spec> ::= { <table_name_from_the_external_data_source> } <rejects_handling_spec> ::= REJECT_POLICY= { FILL_NULL | SKIP_ROW | FAIL } <rejects_thresholds_spec> ::= { REJECT_LIMIT_COUNT=<a integer value specifying max number of allowed rows with reject> }
Description
Creates an external table in Jethro schema, mapped to a table on an external data source. Typically external tables are used for loading data from external data sources into Jethro table using INSERT INTO command.
When a policy or a formatting rule is set for an external table, it will override the policy/formatting rule of the external data source it is belonging to. Otherwise, it will inherit the settings of its external data source.
<location_spec> - Specifies the location within the external data source from which the data will be fetched.
table_name_from_the_external_data_source - The name of the table in Hive.
<rejects_handling_spec> - Specifies a policy for handling rejects at row/columns level. The allowed polices are:
FILL_NULL (default) - Replace the rejected value with a NULL.
SKIP_ROW - Skip the entire row when any value is rejected.
FAIL - Fail the entire request on the first time it encountered with a rejected value (Choosing this policy will ignore the rejects threshold defined under <rejects_thresholds_spec>).
Setting <rejects_handling_spec> override spec set at data source level via CREATE EXTERNAL DATA SOURCE command.
<rejects_thresholds_spec> - Specifies a rejects threshold policy for handling of multiple rejects at request level. The total number of rows with rejects are counted, and if that number reaches the defined threshold, the request fails (A row with 2 rejects will be counted as 1). The allowed rejects thresholds are:
REJECT_LIMIT_COUNT (default) - An integer value specifying the maximum number of allowed rows with rejects. If the number of rows with rejects reaches the limit, the request will fail. The default value is: 0.
Setting <rejects_thresholds_spec> override spec set at data source level via CREATE EXTERNAL DATA SOURCE command.
Reject policy & thresholds apply only within INSERT INTO SELECT operation. Otherwise fetching data from an external table using SELECT will fail, and issue an error whenever a mismatchbetween external table data type and the actual fetched value from the external data source is encountered.
Examples
Creating an external table for an Hive table called 'customer_address', with REJECT_POLICY=FAIL:
CREATE EXTERNAL TABLE hive_ext_customer_address ( ca_address_sk INTEGER, ca_address_id STRING, ca_street_number STRING, ca_street_name STRING, ca_street_type STRING, ca_suite_number STRING, ca_city STRING, ca_county STRING, ca_state STRING, ca_zip STRING, ca_country STRING, ca_gmt_offset FLOAT, ca_location_type STRING ) DATA SOURCE=exds1 LOCATION='customer_address' REJECT_POLICY=FAIL;