CREATE EXTERNAL TABLE

Maps a foreign data source table (such as Hive tables) into Jethro.


Syntax

CREATE EXTERNAL TABLE new_external_table_name
(<columns_list_spec>)
DATA SOURCE=existing_external_data_source_name
LOCATION=<location_spec>
[<values_format_spec>]
[<rejects_handling_spec>]
[<rejects_thresholds_spec>]


<columns_list_spec> ::=
<column_spec> [,<column_spec>...n]

<column_spec> ::=
{
  column name
  column_type
}

<location_spec> ::=
{
	<table_name_from_the_external_data_source>
}

<rejects_handling_spec> ::=
REJECT_POLICY=
{
    FILL_NULL |
    SKIP_ROW |
    FAIL
}

<rejects_thresholds_spec> ::=
{
    REJECT_LIMIT_COUNT=<a integer value specifying max number of allowed rows with reject>
}

Description

Creates an external table in Jethro schema, mapped to a table on an external data source. Typically external tables are used for loading data from external data sources into Jethro table using INSERT INTO command.

When a policy or a formatting rule is set for an external table, it will override the policy/formatting rule of the external data source it is belonging to. Otherwise, it will inherit the settings of its external data source.

<location_spec> - Specifies the location within the external data source from which the data will be fetched.

table_name_from_the_external_data_source - The name of the table in Hive.

<rejects_handling_spec> - Specifies a policy for handling rejects at row/columns level. The allowed polices are:
FILL_NULL (default) - Replace the rejected value with a NULL.
SKIP_ROW - Skip the entire row when any value is rejected.
FAIL - Fail the entire request on the first time it encountered with a rejected value (Choosing this policy will ignore the rejects threshold defined under <rejects_thresholds_spec>).
Setting <rejects_handling_spec> override spec set at data source level via CREATE EXTERNAL DATA SOURCE command.

<rejects_thresholds_spec> - Specifies a rejects threshold policy for handling of multiple rejects at request level. The total number of rows with rejects are counted, and if that number reaches the defined threshold, the request fails (A row with 2 rejects will be counted as 1). The allowed rejects thresholds are:
REJECT_LIMIT_COUNT (default) - An integer value specifying the maximum number of allowed rows with rejects. If the number of rows with rejects reaches the limit, the request will fail. The default value is: 0.
Setting <rejects_thresholds_spec> override spec set at data source level via CREATE EXTERNAL DATA SOURCE command.

Reject policy & thresholds apply only within INSERT INTO SELECT operation. Otherwise fetching data from an external table using SELECT will fail, and issue an error whenever a mismatchbetween external table data type and the actual fetched value from the external data source is encountered.


Examples

Creating an external table for an Hive table called 'customer_address', with REJECT_POLICY=FAIL:

CREATE EXTERNAL TABLE hive_ext_customer_address
(
    ca_address_sk             INTEGER,
    ca_address_id             STRING,
    ca_street_number          STRING,
    ca_street_name            STRING,
    ca_street_type            STRING,
    ca_suite_number           STRING,
    ca_city                   STRING,
    ca_county                 STRING,
    ca_state                  STRING,
    ca_zip                    STRING,
    ca_country                STRING,
    ca_gmt_offset             FLOAT,
    ca_location_type          STRING
)
DATA SOURCE=exds1
LOCATION='customer_address'
REJECT_POLICY=FAIL;

See Also

INSERT INTO

CREATE TABLE

CREATE EXTERNAL DATA SOURCE