Select Object Content Activity

The Amazon S3 Select Object Content activity scans an Amazon S3 object, filters its contents with an SQL expression, and returns only the matching records.

Settings

The Settings tab has the following fields:

Field | Type | Required | Default Value | Description
AWS Connection Name | Connector | Yes | None | The AWS connection used to create the session and perform operations.

Input Settings

The Input Settings tab has the following fields:

Field | Type | Required | Default Value | Description
Input Serialization | Drop-down | Yes | CSV | The format of the data to be read from the bucket:

  • CSV

  • JSON

  • Parquet

Compression Type | Drop-down | No | None | The type of compression applied to the data:

  • GZIP

  • BZIP2

  • None

Output Serialization | Drop-down | Yes | CSV | The data format in which the results are serialized:

  • CSV

  • JSON

Input

The Input tab displays the input schema of the activity as a tree structure. The input values vary depending on the options selected on the Input Settings tab. Elements marked with a red asterisk are required.

Field | Type | Required | Default Value | Description
input | Drop-down | Yes | None | The input element contains the following values:

Bucket: The name of the bucket that contains the object.
Key: The key under which the object is stored.
Expression: The Amazon S3 Select SQL expression used to query the object. For more information, see AWS documentation.
ExpressionType: The type of the expression. Amazon S3 Select supports only SQL.
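As a rough illustration of how the four values of the input element fit together, the following Python sketch assembles them into one mapping. The helper name `build_select_request`, the bucket, the key, and the column names are hypothetical and exist only for this example. The same four names appear as keyword arguments of `select_object_content` in the AWS SDK for Python (boto3), but this sketch does not call AWS.

```python
def build_select_request(bucket, key, expression):
    """Assemble the values of the input element into one mapping."""
    if not bucket or not key:
        raise ValueError("Bucket and Key are required")
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": expression,
        "ExpressionType": "SQL",  # Amazon S3 Select expressions are SQL
    }


# Hypothetical bucket, key, and column names, for illustration only.
# The expression assumes a CSV object whose header row names the columns.
request = build_select_request(
    "example-bucket",
    "reports/orders.csv",
    "SELECT s.order_id, s.total FROM S3Object s WHERE s.country = 'DE'",
)
```

In an Amazon S3 Select expression the object is always addressed as `S3Object`; the alias `s` is optional.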
InputSerialization | Drop-down | Yes | None | The InputSerialization element contains the following values:

AllowQuotedRecordDelimiter: Specifies whether quoted record delimiters are allowed to occur within the input.
Comments: The character that marks a row as a comment. A row that starts with this character is ignored. You can specify any character.
FieldDelimiter: The character used to separate individual fields in a record.
FileHeaderInfo: Describes how the first line of the input is treated. Select one of the following values:

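  1. None: The first line is not a header.

  2. Ignore: The first line is a header, but its values cannot be used to refer to columns in the expression. Columns are addressed by position (for example, _1).

  3. Use: The first line is a header, and its values can be used to refer to columns in the expression.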

QuoteCharacter: The character used for escaping when the field delimiter is part of the value.
For example, if the value is a, b, Amazon S3 wraps this field value in quotation marks, as "a, b".
QuoteEscapeCharacter: The character used for escaping the quotation mark character inside an already escaped value.
RecordDelimiter: The character used to separate records in the input.
Type: The type of JSON input. This field is visible only if the input serialization is JSON. For more information, see AWS documentation.
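The effect of the CSV input values can be approximated locally. The sketch below uses Python's standard `csv` module as a stand-in; it is not how the activity or Amazon S3 parses data. The function `read_records` and the sample text are hypothetical. Because the sketch splits the text into lines before parsing, it does not model AllowQuotedRecordDelimiter.

```python
import csv


def read_records(text, file_header_info="None", field_delimiter=",",
                 quote_character='"', comments=None):
    """Approximate how the CSV input values shape the records that are read."""
    lines = text.splitlines()
    if comments:
        # Rows that start with the comment character are ignored.
        lines = [line for line in lines if not line.startswith(comments)]
    rows = list(csv.reader(lines, delimiter=field_delimiter,
                           quotechar=quote_character))
    if file_header_info == "Use":
        # Header values become the column names.
        header, data = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in data]
    if file_header_info == "Ignore":
        # The header row is skipped and its values are not used.
        rows = rows[1:]
    # "None" and "Ignore": columns are addressed by position (_1, _2, ...).
    return [{f"_{i}": value for i, value in enumerate(row, start=1)}
            for row in rows]


sample = 'name,tags\n#skipped line\nwidget,"a, b"\n'
records = read_records(sample, file_header_info="Use", comments="#")
```

With `file_header_info="Use"` the single data row is keyed by `name` and `tags`; with `"Ignore"` the same row is keyed by `_1` and `_2`; with `"None"` the header line is returned as an ordinary record.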
OutputSerialization | Drop-down | No | None | The OutputSerialization element contains the following values:

OutputType: Specifies whether the output is returned as text or written to a file.
DestinationFilePath: When the output type is a file, the local path to which the retrieved data is written.
FieldDelimiter: The character used to separate individual fields in a record.
QuoteCharacter: The character used for escaping when the field delimiter is part of the value.
For example, if the value is a, b, Amazon S3 wraps this field value in quotation marks, as "a, b".
QuoteEscapeCharacter: The character used for escaping the quotation mark character inside an already escaped value.

QuoteFields: Indicates whether to use quotation marks around output fields.

  • ALWAYS: Always use quotation marks for output fields.

  • ASNEEDED: Use quotation marks for output fields when needed.

RecordDelimiter: A character used to separate records in the output.
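The difference between the two QuoteFields values can be reproduced with Python's standard `csv` writer, whose `QUOTE_ALL` and `QUOTE_MINIMAL` modes behave analogously. This is only an analogy under that assumption, not the serializer used by Amazon S3 or by the activity; the `serialize` function and the `QUOTING` mapping are hypothetical names.

```python
import csv
import io

# Assumed mapping from QuoteFields values to the csv module's quoting modes.
QUOTING = {"ALWAYS": csv.QUOTE_ALL, "ASNEEDED": csv.QUOTE_MINIMAL}


def serialize(rows, quote_fields="ASNEEDED", field_delimiter=",",
              quote_character='"', record_delimiter="\n"):
    """Write rows as CSV text using the given output serialization values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=field_delimiter,
                        quotechar=quote_character,
                        quoting=QUOTING[quote_fields],
                        lineterminator=record_delimiter)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["widget", "a, b"]]
as_needed = serialize(rows)                      # only "a, b" is quoted
always = serialize(rows, quote_fields="ALWAYS")  # every field is quoted
```

With ASNEEDED only the field that contains the field delimiter is wrapped in quotation marks; with ALWAYS both fields are.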
RequestProgress | Drop-down | No | None | Provides status information about the scanned records when the enable flag is set to true.

SSECustomerAlgorithm | String | No | None | Specifies the algorithm to use when decrypting the requested object, for example AES256.

For more information about the x-amz-server-side-encryption-customer-algorithm request header, see AWS documentation.

SSECustomerKey | String | No | None | Specifies the customer-provided base64-encoded encryption key used to decrypt the requested object.

For more information about the x-amz-server-side-encryption-customer-key request header, see AWS documentation.

SSECustomerKeyMD5 | String | No | None | Specifies the base64-encoded 128-bit MD5 digest of the customer-provided encryption key according to RFC 1321.

For more information about the x-amz-server-side-encryption-customer-key-MD5 request header, see AWS documentation.
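The three SSE fields are derived from one raw key. The following sketch shows one way to compute the base64-encoded key and the base64-encoded MD5 digest with the Python standard library. The function name `sse_customer_fields` and the placeholder key are assumptions for illustration; in practice the key must be the one the object was encrypted with.

```python
import base64
import hashlib


def sse_customer_fields(raw_key):
    """Derive the three SSE field values from a raw 256-bit key."""
    if len(raw_key) != 32:
        raise ValueError("AES256 expects a 256-bit (32-byte) key")
    # The digest is computed over the raw key bytes, not the base64 text.
    digest = hashlib.md5(raw_key).digest()
    return {
        "SSECustomerAlgorithm": "AES256",
        "SSECustomerKey": base64.b64encode(raw_key).decode("ascii"),
        "SSECustomerKeyMD5": base64.b64encode(digest).decode("ascii"),
    }


# Placeholder key for illustration only; never hard-code a real key.
placeholder_key = bytes(range(32))
fields = sse_customer_fields(placeholder_key)
```

The MD5 value lets the service verify that the key arrived intact; it is not a security measure on its own.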

ScanRange | Drop-down | No | None | An optional inclusive byte range within the selected object. The range consists of the following values:

Start: The start of the inclusive byte range to be selected.

End: The end of the inclusive byte range to be selected.
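One possible use of ScanRange is to divide a large object into consecutive byte ranges that are queried separately. The sketch below computes such ranges, treating both Start and End as inclusive as described above. The function `split_scan_ranges` and the sizes are hypothetical.

```python
def split_scan_ranges(object_size, chunk_size):
    """Split an object of object_size bytes into inclusive Start/End ranges."""
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # End is inclusive, so the last byte of a chunk is start + chunk_size - 1,
        # capped at the last byte of the object.
        end = min(start + chunk_size, object_size) - 1
        ranges.append({"Start": start, "End": end})
    return ranges


# A 10-byte object split into chunks of at most 4 bytes.
ranges = split_scan_ranges(10, 4)
```

For the 10-byte example this produces the ranges 0 to 3, 4 to 7, and 8 to 9, with no byte covered twice.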

Output

The Output tab displays the output schema of the activity as a tree structure. The output is read-only. The information in the schema varies based on the fields selected on the Settings tab. The properties that are displayed in the schema correspond to the output of this activity and can be used as input by subsequent activities in the flow.

The Output tab consists of three elements:

  • Output element: Displays the output according to the selected value from the Output Serialization drop-down.
  • OutputSerialization: Describes how results of the Select job are serialized.
  • Error element: The error element has four attributes:
    • Code - It is a string that uniquely identifies an error condition.
    • Message - It contains a generic description of the error condition.
    • RequestId - It is a string ID of the request associated with the error that occurs.
    • StatusCode - It is the HTTP response code corresponding to the error code.

Loop

If you want this activity to iterate multiple times within the flow, enter an expression that evaluates to the iteration details. Select a type of iteration from the Type menu. The default type is None, which means the activity does not iterate.