Parse Data

The Parse Data activity takes a text string or input from a file and converts it into a schema tree based on the specified Data Format shared resource.

General

You can use any mechanism to obtain or create a text string for processing.

The General tab has the following fields.

Field Module Property? Description
Name No The name to be displayed as the label for the activity in the process.
Data Format No The Data Format shared resource to use when parsing the text input.
Input Type No The type of input for this activity.

String: the input is a string. Provide the string in the text input item.

Skip Blank Spaces No Select this check box to skip any empty records when parsing the text input.

When this check box is not selected, parsing stops at the first blank line encountered in the input.

Manually Specify Start Record No You can specify the record in the input where you want to start parsing.

This is useful if you have a large number of records and you want to read the input in parts (to minimize memory usage).

Selecting this check box displays the startRecord input item. See Parsing a Large Number of Records for more information on how to read the input stream in parts.

Strict Validation No Validates that every input line contains the number of fields specified for the fixed-format text.

For example, if the format states that there are three fields per line and this check box is selected, all lines in the input must contain three fields.

Continue On Error No Continues parsing with the next record in the input after an error is encountered.

If an error occurs, the error information is kept separate from the output of the successfully parsed records and is provided in the output schema of the activity.

When this check box is not selected, the Parse Data activity stops parsing as soon as an error is encountered in the input records.

Regardless of whether this check box is selected, the Parse Data activity stops when a data validation error occurs.
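The following Python sketch illustrates how Skip Blank Spaces, Strict Validation, and Continue On Error might interact at the record level. It is not TIBCO code: the function parse_records, the comma-separated record layout, and the choice to treat a wrong field count as a record-level parse error are assumptions made for illustration. Data validation errors, which always stop the activity, are not modeled.

```python
# Hypothetical sketch, not TIBCO code. Assumes comma-separated records and
# treats a wrong field count (under Strict Validation) as a parse error.
from dataclasses import dataclass, field


class ParseError(Exception):
    """A single record could not be parsed."""


@dataclass
class ParseResult:
    rows: list = field(default_factory=list)        # successfully parsed records
    error_rows: list = field(default_factory=list)  # raw text of failed records


def parse_records(lines, expected_fields, skip_blank=False,
                  strict=False, continue_on_error=False):
    result = ParseResult()
    for line in lines:
        if line.strip() == "":
            if skip_blank:
                continue  # Skip Blank Spaces selected: ignore the empty record
            break         # check box cleared: stop at the first blank line
        fields = line.split(",")
        try:
            if strict and len(fields) != expected_fields:
                raise ParseError(f"expected {expected_fields} fields: {line!r}")
            result.rows.append(fields)
        except ParseError:
            if not continue_on_error:
                raise  # Continue On Error cleared: stop parsing
            result.error_rows.append(line)  # keep the raw input of the bad record
    return result
```

For example, with all three options enabled, the input lines a,b,c, a blank line, d,e, and f,g,h produce two parsed rows and one error row (d,e). With Skip Blank Spaces cleared, parsing of the same input stops after a,b,c.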

Input

The following is the input for the activity.

Input Item Datatype Description
text string The text string to parse.

This input item is available only when String is specified in the Input Type field of the General tab.

startRecord number The line number of the input stream at which to begin parsing. All lines before the specified line are ignored. This input item is available only if the Manually Specify Start Record check box on the General tab is selected.

The input stream begins with the line number 1 (one). This is useful for reading the input stream in parts to minimize memory usage.

See Parsing a Large Number of Records for more information.

noOfRecords number The number of records to read from the input stream. Specify -1 if you want to read all records in the input stream.

This is useful for reading the input stream in parts to minimize memory usage.

See Parsing a Large Number of Records for more information.

SkipHeaderCharacters integer The number of characters at the beginning of the input to skip before parsing. Use this to skip file headers or other unwanted information.
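The sketch below shows, under stated assumptions, how startRecord, noOfRecords, and SkipHeaderCharacters might narrow the input that a single execution parses. The function select_records is hypothetical, and it assumes one record per line, lines numbered from 1, and header characters removed before the input is split into lines; the actual activity may apply these settings differently.

```python
# Hypothetical sketch, not TIBCO code. Assumes one record per line,
# 1-based line numbers, and that header characters are removed first.
def select_records(text, start_record=1, no_of_records=-1, skip_header_chars=0):
    body = text[skip_header_chars:]   # drop file headers or other unwanted prefix
    lines = body.splitlines()
    first = start_record - 1          # startRecord is 1-based
    if no_of_records == -1:
        last = len(lines)             # -1 means read all remaining records
    else:
        last = min(first + no_of_records, len(lines))
    selected = lines[first:last]
    done = last >= len(lines)         # like the done output: no records remain
    return selected, done
```

For example, for an input consisting of a four-character header line (HDR plus a newline) followed by three records, setting SkipHeaderCharacters to 4, startRecord to 3, and noOfRecords to 2 returns only the third record, with done set to true.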

Output

The following is the output of the activity.

Output Item Datatype Description
Rows complex This output item contains the list of parsed lines from the input. This is useful for determining the number of records parsed by this activity.

The schema specified by the Data Format resource is contained in this output item.

schema complex The schema containing the data from the parsed input text. This output item contains zero or more parsed records.
ErrorRows This output item is available when you select Continue On Error and errors occur while parsing the records in the input.

This output item contains the list of error lines for the records in the input that failed parsing.

The raw input data is placed in the error string.

done boolean true if no more records are available for parsing. false if there are more records available.

When you read the input in parts to conserve memory, use this output item to check whether any records remain in the input stream.

Fault

The Fault tab lists the possible exceptions thrown by this activity.

Fault Thrown When
BadDataFormatException The input format is not valid.

Parsing a Large Number of Records

The input for this activity is placed in a process variable and takes up memory as it is being processed. When reading a large number of records from a file, the process may consume significant machine resources. To avoid consuming too much memory, you may want to read the input in parts, parsing and processing a small set of records before moving on to the next set.

The following procedure is a general guideline for creating a loop group that parses a large set of input records in parts. You may want to modify the procedure to include additional processing of the records, or change the XPath expressions to suit your business process. To process a large number of records, do the following:
  1. Drag and drop the Parse Data activity onto the process editor.
  2. On the General tab, specify the fields. Leave the Manually Specify Start Record check box cleared unless you want to set startRecord yourself, as described in step 5.
  3. Select the Parse Data activity and click the group icon on the toolbar to create a group containing the Parse Data activity.
  4. Specify Repeat Until True Loop as the Group action, and specify an index name (for example, "i").

    The loop must exit when the done output item of the Parse Data activity is true. For example, the condition for the loop can be set to the following: string($ParseData/Output/done) = string(true())

  5. Set the noOfRecords input item for the Parse Data activity to the number of records you want to process for each execution of the loop.

    If you do not select the Manually Specify Start Record check box on the General tab of the Parse Data activity, the loop processes the specified noOfRecords with each iteration, until there are no more input records to parse.

    You can optionally select the Manually Specify Start Record check box and specify startRecord on the Input tab. If you do this, you must create an XPath expression that computes the starting record for each iteration of the loop. Because the input stream begins with line number 1, and assuming the loop index $i also starts at 1, the starting record is the loop index minus one, multiplied by noOfRecords, plus one. For example, if noOfRecords is 100, set startRecord to ($i - 1) * 100 + 1.
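A minimal Python sketch of the Repeat Until True loop described above, assuming one record per line, record numbering that starts at 1 (as stated for startRecord), and a loop index that also starts at 1. The parse_chunk function is a hypothetical stand-in for one execution of Parse Data, not a TIBCO API. Under those assumptions, each iteration computes startRecord as (i - 1) * noOfRecords + 1 so that it begins where the previous iteration ended, and the loop exits when done is true.

```python
# Hypothetical sketch, not TIBCO code. parse_chunk stands in for one
# execution of Parse Data with Manually Specify Start Record selected.
def parse_chunk(lines, start_record, no_of_records):
    """Return (records, done), mirroring the Rows and done outputs."""
    first = start_record - 1                      # startRecord is 1-based
    chunk = lines[first:first + no_of_records]
    done = first + no_of_records >= len(lines)    # no records left after this chunk
    return chunk, done


def process_in_parts(lines, no_of_records, handle):
    i = 1                                              # loop index, like $i
    while True:
        start_record = (i - 1) * no_of_records + 1     # first record of this iteration
        chunk, done = parse_chunk(lines, start_record, no_of_records)
        handle(chunk)                                  # process this small set of records
        if done:                                       # loop condition: done is true
            break
        i += 1
    return i                                           # number of iterations run
```

With seven records and noOfRecords set to 3, the loop runs three times, starting at records 1, 4, and 7. If you leave Manually Specify Start Record cleared, the activity tracks the position itself and this calculation is unnecessary.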