Bulk Load Activity

You can use this activity for loading data by providing record input on the Input tab. The data can also be loaded to the Redshift table by staging data files in the S3 bucket with the appropriate S3 bucket URI and file format. The following file formats are supported for staged data in S3: CSV, JSON, CSV,MANIFEST, and JSON,PATHFILE.

For data provided as input, the CSV,MANIFEST format is supported.

Note: A manifest file is recommended for loading data in parallel or for simultaneously loading data from multiple CSV files in different buckets, as shown in the example below.
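For reference, the manifest is a JSON file that lists the S3 objects to load. The following is a minimal sketch of the Amazon Redshift manifest format; the bucket and file names are hypothetical:

    {
      "entries": [
        { "url": "s3://bucket-alpha/custdata.part1.csv", "mandatory": true },
        { "url": "s3://bucket-beta/custdata.part2.csv", "mandatory": false }
      ]
    }

When mandatory is set to true for an entry, the load fails if that file is not found.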

General

The General panel contains the following fields:

Property Module Property? Description
Name No The name displayed as the label for the activity in the process.
Redshift Connection Resource No Click to select a Redshift Connection shared resource. If no matching Redshift Connection shared resource is found, click Create Shared Resource to create one. For more details, see Creating and Configuring the Redshift Connection Resource.
Entity No Click to select an entity. The entities downloaded on the Schema tab of the Redshift Connection shared resource are available for selection.
Staging Table Yes The table where the data is staged for validation and merge processing before it is loaded to the target table. By default, this table is created during the activity execution. However, the user can create the staging table beforehand and provide its name; the definition of the staging table must match that of the target table. The staging table is deleted when the operation is complete.
Table Yes The name of the table to which the data is loaded when the data is staged in the S3 bucket. Note: All table names must be schema qualified (for example, public.orders rather than orders) to prevent undesired results and errors.
Primary Key Yes The column in the table to be loaded that holds unique data.
Insert Duplicate Rows Yes By default, the bulk load operation performs an UPSERT (insert or update): if a row of data to be loaded is already present in the table, the existing row is updated with the incoming row of data. Select this check box to retain the existing data and insert the incoming rows as duplicates instead; when it is deselected, the existing data is updated.

Description

On the Description tab, you can type a short description for the BulkLoad activity.

Advanced

The Advanced panel contains the following fields:

Property Module Property? Description
Validation Mode No Indicates how data validation is handled. This check box is selected by default. When it is selected (validation ON), the bulk load transaction fails upon encountering the first data error.
Maximum Error Limit Yes When Validation Mode is OFF (unchecked), the user can specify an error count. The transaction continues to load until the number of errors reaches the specified count; the error records are skipped and the valid data is loaded. The error details are returned in the response.
S3 Bucket URI Yes The location of the S3 bucket where the users want their data files to be stored.
Data Staged in S3 Yes Indicates whether the data files to load have already been placed in the S3 bucket.
Interpret Empty string as NULL Yes If this check box is selected, an empty string is interpreted as a NULL value.
Time Out Yes Defines the activity timeout in seconds (default: 0 seconds).
File Format No The format of the file present in the Amazon S3 bucket. The following file formats are supported:

  • CSV

  • JSON

  • CSV,MANIFEST

  • JSON,PATHFILE

Note: When Validation Mode is unchecked and a maximum error limit is set, if records in the payload have data errors, the response provides details of the errors in those specific records. These errors must be rectified for the data to upload successfully. Repeatedly executing the activity with the same payload without rectifying the errors can lead to intermittent responses with no further error details. The process uploads the valid records, and the counts reflect only the records in the batch that are successfully processed and uploaded. In the case of a full rollback, the counts reflect that no records were processed or uploaded.

Input

Based on the file type, the respective input items for the activity are as follows.

CSV and CSV,MANIFEST:

Input Item Data Type Description
compression string File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP.
headerRowsToIgnore string The number of header rows in the CSV file to skip before reading the data.
emptyFieldAsNullValue string Whether an empty field should be treated as NULL.
quoteCharacter string If the input data is qualified with quotation marks, the character used as the quote character.
escape string Can be set to true or false. If characters in the input data must be escaped, set the value to true. The data to be escaped must be preceded by a backslash (\), which is the default escape character.
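As an illustration, the CSV input items above could be supplied with values such as the following. The sketch uses a simple JSON mapping and hypothetical values; in practice, the values are provided on the activity's Input tab:

    {
      "compression": "GZIP",
      "headerRowsToIgnore": "1",
      "emptyFieldAsNullValue": "true",
      "quoteCharacter": "\"",
      "escape": "false"
    }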

JSON:

Input Item Data Type Description
compression string File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP.
ignoreColumnCase string Whether the case of column names is ignored while reading data from the input file. When set to false, the column names in the file must match the case of the column names defined in the database.

JSON, PATHFILE:

Input Item Data Type Description
compression string File compression type. Supported compression types are BZIP2, GZIP, ZSTD, and LZOP.
jsonPathfile string The S3 bucket URI of the JSONPaths file used for uploading the data. When this option is chosen, the data is loaded following the column order specified in the path file (see the example after the note below).
Note: While providing data on the Input tab, date and timestamp values must be entered in xsd:date and xsd:dateTime format, respectively (for example, 2024-05-01 and 2024-05-01T10:15:30Z). They are automatically mapped to the relevant Redshift format. This is supported for both basic and Okta authentication.
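For reference, a path file is a JSONPaths file: a JSON document whose jsonpaths array lists one JSONPath expression per target column, in the order in which the columns are loaded. The following is a minimal sketch; the field names are hypothetical:

    {
      "jsonpaths": [
        "$.id",
        "$.name",
        "$.created_at"
      ]
    }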

Output

The following is the output for the activity.

Output Item Data Type Description
S3_bucket_reference string The URI of the S3 bucket where the data files are present.
status string The status of the operation, SUCCESS or FAILED.
rows_processed string Total number of records processed. Note: When Validation Mode is OFF and the status is FAILED, rows_processed is equal to the number of errors_encountered.
rows_uploaded string Total number of records loaded. rows_processed and rows_uploaded are the same if all records are processed and loaded without errors.
errors_encountered string The number of errors encountered. When validation mode is ON, 0 indicates success and 1 indicates failure; otherwise, it is equal to the number of errors encountered, up to the maximum error count set.
max_error_count string The maximum number of errors allowed, as specified by the user.
error_message string Indicates errors that are unrelated to the data.
error_records string The details of records with data errors. Each error record contains the following items:
file_name string The S3 URI to the file which contains an error record.
error_row string The row number where error data is located. If the file has headers, then the header row is row 1.
error_column string The column name that has error data.
error_column_data string The incorrect column value.
error_data string The row data for the error record.
error_message string The reason for the error.
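For illustration only, a failed run with Validation Mode OFF might produce output shaped like the following. The values are hypothetical and the sketch is shown as a JSON mapping; the exact element names and nesting are defined by the activity's output schema:

    {
      "S3_bucket_reference": "s3://bucket-alpha/stage/",
      "status": "FAILED",
      "rows_processed": "2",
      "rows_uploaded": "0",
      "errors_encountered": "2",
      "max_error_count": "5",
      "error_records": [
        {
          "file_name": "s3://bucket-alpha/stage/custdata.part1.csv",
          "error_row": "3",
          "error_column": "order_date",
          "error_column_data": "2024-13-01",
          "error_data": "101,2024-13-01,42.50",
          "error_message": "Invalid date format"
        }
      ]
    }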

Fault

The Fault tab lists exceptions that are thrown by this activity.

Error Schema Element Data Type Description
msg string The error message returned by the plug-in.
msgCode string The error code returned by the plug-in.