Bulk Load Activity
You can use this activity to load data by providing record input on the Input tab. Data can also be loaded to the Redshift table by staging data files in the S3 bucket with the appropriate S3 bucket URI and file format. The following file formats are supported for staged data in S3: CSV; CSV,MANIFEST; JSON; and JSON,PATHFILE.
For data provided as input, the CSV,MANIFEST format is supported.
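Loading staged S3 data ultimately amounts to running a Redshift COPY statement against the schema-qualified target table. The following is only a rough sketch of that idea; the schema, table, bucket URI, and IAM role are hypothetical, and the activity builds the actual statement internally:

```python
# Sketch only: composing a Redshift COPY statement for data staged in S3.
# The schema, table, bucket URI, and IAM role below are hypothetical examples.
def build_copy_statement(schema, table, s3_uri, iam_role, file_format="CSV"):
    """Return a COPY statement for a schema-qualified target table."""
    format_clause = {
        "CSV": "FORMAT AS CSV",
        "JSON": "FORMAT AS JSON 'auto'",
    }[file_format]
    return (
        f"COPY {schema}.{table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' {format_clause}"
    )

stmt = build_copy_statement(
    "public", "orders", "s3://example-bucket/orders/",
    "arn:aws:iam::123456789012:role/example-load-role",
)
print(stmt)
```

Note that schema-qualifying the table (`public.orders` rather than `orders`) matters here for the same reason it does in the activity: an unqualified name resolves against the session search path and can hit the wrong table.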
General
The General panel contains the following fields:
Property | Module Property? | Description |
---|---|---|
Name | No | Name displayed as the label for the activity in the process. |
Redshift Connection Resource | No | Select the Redshift Connection shared resource to be used by this activity. |
Entity | No | Select the entity to be loaded. The entities downloaded on the Schema tab of the Redshift Connection shared resource are available for selection. |
Staging Table | Yes | The table where data is staged for validation and merge processing before it is loaded into the target table. By default, this table is created during activity execution; alternatively, you can create the staging table in advance and provide its name. The staging table's definition must match that of the target table. The staging table is deleted when the operation completes. |
Table | Yes | If the data is staged in the S3 bucket, the name of the table to which the data is loaded. Note: All table names must be schema-qualified to prevent undesired results and errors. |
Primary Key | Yes | A column in the target table that holds unique data, used to identify existing rows during the load. |
Insert Duplicate Rows | Yes | By default, the bulk load operation performs an UPSERT (insert/update): if a row of data to be loaded is already present in the table, the existing row is updated with the incoming row. Select this check box to retain the existing data instead; when it is deselected, the existing data is replaced. |
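The UPSERT behavior described above (stage the data, then merge it into the target keyed on the primary key) can be pictured as two SQL statements. This is a sketch with hypothetical table and column names, not the activity's actual implementation:

```python
def build_merge_sql(target, staging, primary_key, insert_duplicates=False):
    """Sketch of the staging-table merge. By default this is an UPSERT:
    delete target rows matched on the primary key, then insert everything
    from the staging table. When insert_duplicates is True (mirroring the
    'Insert Duplicate Rows' check box), existing rows are retained and the
    staged rows are simply appended."""
    insert = f"INSERT INTO {target} SELECT * FROM {staging};"
    if insert_duplicates:
        return [insert]
    delete = (
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{primary_key} = {staging}.{primary_key};"
    )
    return [delete, insert]

statements = build_merge_sql("public.orders", "public.orders_stage", "order_id")
```

The delete-then-insert pattern is the classic merge idiom on Redshift, which is why the staging table's definition must match the target table's.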
Description
On the Description tab, you can type a short description for the BulkLoad activity.
Advanced
The Advanced panel contains the following fields:
Property | Module Property? | Description |
---|---|---|
Validation Mode | No | Indicates how data validation is handled. This check box is selected by default. When selected (validation "ON"), the bulk load transaction fails upon encountering the first data error. |
Maximum Error Limit | Yes | When Validation Mode is OFF (deselected), you can specify an error count. The load continues until the number of errors reaches this count. Error records are skipped and valid data is loaded. Error details are returned in the response. |
S3 Bucket URI | Yes | The location of the S3 bucket where the data files are stored. |
Data Staged in S3 | Yes | Indicates whether the data files to load have already been placed in the S3 bucket. |
Interpret Empty string as NULL | Yes | If this check box is selected, an empty string is interpreted as a NULL value. |
Time Out | Yes | Defines the activity timeout in seconds (default: 0 seconds). |
File Format | No | The format of the file present in the Amazon S3 bucket. The following file formats are supported: CSV; CSV,MANIFEST; JSON; JSON,PATHFILE. |
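One way to picture how Validation Mode and Maximum Error Limit interact is as the MAXERROR option of a Redshift COPY: validation ON fails on the first bad record, while validation OFF tolerates up to the configured count. This mapping is a hypothetical sketch; the activity's internal implementation is not documented here:

```python
def maxerror_clause(validation_mode_on, max_error_limit=0):
    """Map the activity's validation settings onto a COPY MAXERROR clause.
    Validation ON  -> fail on the first data error (MAXERROR 0).
    Validation OFF -> skip up to max_error_limit bad records."""
    limit = 0 if validation_mode_on else max_error_limit
    return f"MAXERROR {limit}"
```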
Input
Based on the file type, the input for the activity is as follows.
CSV and CSV,MANIFEST:
Input Item | Data Type | Description |
---|---|---|
compression | string | File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP. |
headerRowsToIgnore | string | The number of header rows in the CSV file to skip before reading the data. |
emptyFieldAsNullValue | string | Whether empty fields should be treated as NULL. |
quoteCharacter | string | The character used as the quotation mark, if the input data is qualified with quotes. |
escape | string | Can be set to true or false. Set to true if characters in the input data should be escaped. Data to be escaped is preceded by a backslash (\), the default escape character. |
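The CSV input items above map closely onto Redshift COPY options (IGNOREHEADER, EMPTYASNULL, QUOTE AS, and a compression keyword). A minimal sketch of that mapping, with hypothetical values (the escape item is omitted here for brevity):

```python
def csv_copy_options(compression=None, header_rows_to_ignore=0,
                     empty_field_as_null=False, quote_character=None):
    """Translate the CSV input items into a COPY options string (sketch)."""
    opts = ["FORMAT AS CSV"]
    if quote_character:
        opts.append(f"QUOTE AS '{quote_character}'")
    if header_rows_to_ignore:
        opts.append(f"IGNOREHEADER {header_rows_to_ignore}")
    if empty_field_as_null:
        opts.append("EMPTYASNULL")
    if compression:
        opts.append(compression)  # BZIP2, GZIP, ZSTD, or LZOP
    return " ".join(opts)

opts = csv_copy_options(compression="GZIP", header_rows_to_ignore=1,
                        empty_field_as_null=True, quote_character='"')
```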
JSON:
Input Item | Data Type | Description |
---|---|---|
compression | string | File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP. |
ignoreColumnCase | string | Whether the case of column names is ignored while reading data from the input file. When set to false, the column names in the file must match the case in which the column names are defined in the database. |
JSON, PATHFILE:
Input Item | Data Type | Description |
---|---|---|
compression | string | File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP. |
jsonPathfile | string | The S3 bucket URI of the path file used for the load. When this option is chosen, the data is loaded following the column order specified in the path file. |
Note: Date and timestamp values must be provided in xsd:date and xsd:dateTime format, respectively; they are automatically mapped to the corresponding Redshift formats. Supported for both basic and Okta authentication.
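For JSON,PATHFILE loads, the path file is itself a small JSON document listing JSONPath expressions in the order of the target table's columns. The example below is hypothetical (the field paths are invented), generated with Python purely for illustration:

```python
import json

# Hypothetical JSONPath file for a JSON,PATHFILE load: each expression
# maps, in order, to one column of the target table.
pathfile = {
    "jsonpaths": [
        "$.order_id",
        "$.customer.name",
        "$.created_at",
    ]
}
pathfile_contents = json.dumps(pathfile, indent=2)
print(pathfile_contents)
```

This file would be uploaded to S3 and referenced through the jsonPathfile input item.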
Output
The following is the output for the activity.
Output Item | Data Type | Description |
---|---|---|
S3_bucket_reference | string | URI of the S3 bucket where the data files are present. |
status | string | The status of the operation, SUCCESS or FAILED. |
rows_processed | string | Total number of records processed. Note: When Validation Mode is OFF and the status is FAILED, rows_processed is equal to the number of errors_encountered. |
rows_uploaded | string | Total number of records loaded. rows_processed and rows_uploaded are equal when all records are processed and loaded without errors. |
errors_encountered | string | The number of errors encountered. When Validation Mode is ON, 0 indicates success and 1 indicates failure. When Validation Mode is OFF, this equals the number of errors encountered, up to the maximum error count set. |
max_error_count | string | The maximum number of errors allowed, as specified by the user. |
error_message | string | Indicates errors that are unrelated to the data. |
error_records | string | The details of records with data errors. |
file_name | string | The S3 URI of the file that contains the error record. |
error_row | string | The row number where error data is located. If the file has headers, then the header row is row 1. |
error_column | string | The column name that has error data. |
error_column_data | string | The incorrect column value. |
error_data | string | The row data for the error record. |
error_message | string | The reason for the error. |
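The counters above satisfy a simple consistency check: on a clean run, rows_processed equals rows_uploaded and errors_encountered is 0. A small sketch of validating an output map (field names taken from the table; the sample values are hypothetical):

```python
def load_succeeded(output):
    """Check an activity output map for a clean load: SUCCESS status,
    all processed rows uploaded, and no errors encountered."""
    return (
        output["status"] == "SUCCESS"
        and int(output["rows_processed"]) == int(output["rows_uploaded"])
        and int(output["errors_encountered"]) == 0
    )

sample = {"status": "SUCCESS", "rows_processed": "100",
          "rows_uploaded": "100", "errors_encountered": "0"}
```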
Fault
The Fault tab lists exceptions that are thrown by this activity.
Error Schema Element | Data Type | Description |
---|---|---|
msg | string | The error message returned by the plug-in. |
msgCode | string | The error code returned by the plug-in. |