Bulk Load Activity

You can use this activity for loading data by providing record input on the Input tab. The data can also be loaded to the Redshift table by staging data files in the S3 bucket with the appropriate S3 bucket URI and file format. The following file formats are supported for staged data in S3: CSV, JSON, CSV,MANIFEST, and JSON,PATHFILE.

For data provided as input, the CSV,MANIFEST format is supported.

Note: A manifest file is recommended for loading data in parallel or for simultaneously loading data from multiple CSV files in different buckets, as shown in the example below.
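For reference, the manifest is a JSON file that lists the S3 objects to load. The following is a minimal sketch of the Amazon Redshift manifest format; the bucket and file names are hypothetical:

    {
      "entries": [
        { "url": "s3://bucket-alpha/custdata.part1.csv", "mandatory": true },
        { "url": "s3://bucket-beta/custdata.part2.csv", "mandatory": false }
      ]
    }

When mandatory is set to true for an entry, the load fails if that file is not found.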

General

The General panel contains the following fields:

Property Module Property? Description
Name No The name displayed as the label for the activity in the process.
Redshift Connection Resource No Click to select a Redshift Connection shared resource. If no matching Redshift Connection shared resource is found, click Create Shared Resource to create one. For more details, see Creating and Configuring the Redshift Connection Resource.
Entity No Click to select an entity. The entities downloaded on the Schema tab of the Redshift Connection shared resource are available for selection.
Staging Table Yes The table where the data is staged for validation and merge processing before it is loaded to the target table. By default, this table is created during the activity execution. However, the user can create the staging table beforehand and provide its name; the definition of the staging table must match that of the target table. The staging table is deleted when the operation is complete.
Table Yes The name of the table to which the data is loaded when the data is staged in the S3 bucket. Note: All table names must be schema qualified (for example, public.orders rather than orders) to prevent undesired results and errors.
Primary Key Yes The column in the table to be loaded that holds unique data.
Insert Duplicate Rows Yes By default, the bulk load operation performs an UPSERT (insert or update): if a row of data to be loaded is already present in the table, the existing row is updated with the incoming row of data. Select this check box to retain the existing data and insert the incoming rows as duplicates instead; when it is deselected, the existing data is updated.

Description

On the Description tab, you can type a short description for the BulkLoad activity.

Advanced

The Advanced panel contains the following fields:

Property Module Property? Description
Validation Mode No Indicates how data validation is handled. This check box is selected by default. When it is selected (validation ON), the bulk load transaction fails upon encountering the first data error.
Maximum Error Limit Yes When Validation Mode is OFF (unchecked), the user can specify an error count. The transaction continues to load until the number of errors reaches the specified count; the error records are skipped and the valid data is loaded. The error details are returned in the response.
S3 Bucket URI Yes The location of the S3 bucket where the users want their data files to be stored.
Data Staged in S3 Yes Indicates whether the data files to load have already been placed in the S3 bucket.
Interpret Empty string as NULL Yes If this check box is selected, an empty string is interpreted as a NULL value.
Time Out Yes Defines the activity timeout in seconds (default: 0 seconds).
File Format No The format of the file present in the Amazon S3 bucket. The following file formats are supported:

  • CSV

  • JSON

  • CSV,MANIFEST

  • JSON,PATHFILE

Note: When Validation Mode is unchecked and a maximum error limit is set, if records in the payload have data errors, the response provides details of the errors in those specific records. These errors must be rectified for the data to upload successfully. Repeatedly executing the activity with the same payload without rectifying the errors can lead to intermittent responses with no further error details. The process uploads the valid records, and the counts reflect only the records in the batch that are successfully processed and uploaded. In the case of a full rollback, the counts reflect that no records were processed or uploaded.

Input

Based on the file type, the respective input items for the activity are as follows.

CSV and CSV,MANIFEST:

Input Item Data Type Description
compression string File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP.
headerRowsToIgnore string The number of header rows in the CSV file to skip before reading the data.
emptyFieldAsNullValue string Whether an empty field should be treated as NULL.
quoteCharacter string If the input data is qualified with quotation marks, the character used as the quote character.
escape string Can be set to true or false. If characters in the input data must be escaped, set the value to true. The data to be escaped must be preceded by a backslash (\), which is the default escape character.
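As an illustration, the CSV input items above could be supplied with values such as the following. The sketch uses a simple JSON mapping and hypothetical values; in practice, the values are provided on the activity's Input tab:

    {
      "compression": "GZIP",
      "headerRowsToIgnore": "1",
      "emptyFieldAsNullValue": "true",
      "quoteCharacter": "\"",
      "escape": "false"
    }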

JSON:

Input Item Data Type Description
compression string File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP.
ignoreColumnCase string Whether the case of column names is ignored while reading data from the input file. When set to false, the column names in the file must match the case of the column names defined in the database.

JSON, PATHFILE:

Input Item Data Type Description
compression string File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP.
jsonPathfile string The S3 bucket URI of the JSONPaths file used for uploading the data. When this option is chosen, the data is loaded following the column order specified in the path file (see the example after the note below).
Note: While providing data on the Input tab, date and timestamp values must be entered in xsd:date and xsd:dateTime format, respectively (for example, 2024-05-01 and 2024-05-01T10:15:30Z). They are automatically mapped to the relevant Redshift format. This is supported for both basic and Okta authentication.
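For reference, a path file is a JSONPaths file: a JSON document whose jsonpaths array lists one JSONPath expression per target column, in the order in which the columns are loaded. The following is a minimal sketch; the field names are hypothetical:

    {
      "jsonpaths": [
        "$.id",
        "$.name",
        "$.created_at"
      ]
    }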

Output

The following is the output for the activity.

Output Item Data Type Description
S3_bucket_reference string The URI of the S3 bucket where the data files are present.
status string The status of the operation, SUCCESS or FAILED.
rows_processed string Total number of records processed. Note: When Validation Mode is OFF and the status is FAILED, rows_processed is equal to the number of errors_encountered.
rows_uploaded string Total number of records loaded. rows_processed and rows_uploaded are the same if all records are processed and loaded without errors.
errors_encountered string The number of errors encountered. When validation mode is ON, 0 indicates success and 1 indicates failure; otherwise, it is equal to the number of errors encountered, up to the maximum error count set.
max_error_count string The maximum number of errors allowed, as specified by the user.
error_message string Indicates errors that are unrelated to the data.
error_records string The details of records with data errors. Each error record contains the following items:
file_name string The S3 URI to the file which contains an error record.
error_row string The row number where error data is located. If the file has headers, then the header row is row 1.
error_column string The column name that has error data.
error_column_data string The incorrect column value.
error_data string The row data for the error record.
error_message string The reason for the error.
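For illustration only, a failed run with Validation Mode OFF might produce output shaped like the following. The values are hypothetical and the sketch is shown as a JSON mapping; the exact element names and nesting are defined by the activity's output schema:

    {
      "S3_bucket_reference": "s3://bucket-alpha/stage/",
      "status": "FAILED",
      "rows_processed": "2",
      "rows_uploaded": "0",
      "errors_encountered": "2",
      "max_error_count": "5",
      "error_records": [
        {
          "file_name": "s3://bucket-alpha/stage/custdata.part1.csv",
          "error_row": "3",
          "error_column": "order_date",
          "error_column_data": "2024-13-01",
          "error_data": "101,2024-13-01,42.50",
          "error_message": "Invalid date format"
        }
      ]
    }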

Fault

The Fault tab lists exceptions that are thrown by this activity.

Error Schema Element Data Type Description
msg string The error message returned by the plug-in.
msgCode string The error code returned by the plug-in.