Building Custom Profiling Plans

Sometimes data must be cleansed and standardized before profiling to get more accurate results. In that case, you need to build a plan.

A plan file defines the logic and rules to be applied to the input data in order to produce the desired output. Plans are created by placing Steps onto a canvas and connecting them together. Steps are data processing algorithms that can be used to read, transform and analyze data, among other actions.
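As a rough mental model only, a plan can be thought of as a chain of functions through which records flow. The Python sketch below is an analogy, not the plan file format or any DQ Analyzer API; the step names (`read_records`, `trim_and_squeeze`, `title_case`) and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
Step = Callable[[Iterable[Record]], Iterator[Record]]


def read_records() -> Iterator[Record]:
    """Hypothetical reader step: yields raw, unstandardized rows."""
    yield {"name": "  alice SMITH "}
    yield {"name": "Bob  jones"}


def trim_and_squeeze(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical cleansing step: trims and collapses whitespace."""
    for r in records:
        yield {**r, "name": " ".join(r["name"].split())}


def title_case(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical standardizing step: normalizes capitalization."""
    for r in records:
        yield {**r, "name": r["name"].title()}


def run_plan(source: Iterable[Record], steps: list[Step]) -> list[Record]:
    """Connect the steps in order, feeding each step's output to the next."""
    stream: Iterable[Record] = source
    for step in steps:
        stream = step(stream)
    return list(stream)


result = run_plan(read_records(), [trim_and_squeeze, title_case])
print(result)
```

The order of the list passed to `run_plan` plays the role of the connections drawn between steps on the canvas.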

Examples of complex profiling plans are available in the Tutorials project in DQ Analyzer.

The Plan Editor

The image below shows a plan in the Plan Editor, which is launched every time you open or create a plan. The Plan Editor consists of the (1) Canvas, where the plan logic is defined (by connecting Steps together), and the (2) Palette, where the various steps and actions are listed.

 

The Plan Editor

Creating a Plan File

To create a new plan file:

  1. Right-click a project or folder in the explorer panel and select New > Plan. Alternatively, use the toolbar. Both options are shown below.

    Creating a New Plan

  2. Specify the Name of the plan and the place (Container) for storing it.
     

    Creating a New Plan File

Adding Steps to the Canvas

To add steps to the canvas, do one of the following:

  • Drag needed steps from the Palette to the Canvas.
     

    Dragging a Step to the Canvas

  • Press Ctrl+I or Insert and select the step from a filterable list.
     

    Insert Step Dialog

To learn how particular steps work, go through plans in the tutorial project: DQ Projects > Tutorials.

Connecting Steps

To connect two steps, drag from the out endpoint of one step to the in endpoint of another step.

  

Connecting Steps

Editing Step Properties

Most steps require (or benefit from) some configuration to perform their functions, which is done by accessing the step properties.

To edit step properties, double-click the step or right-click it and select Edit Properties:

 

Calling the Properties Dialog

In the image below, a regular expression is defined in the Regex Matching step:

 

Editing Regex Matching Properties
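To illustrate what matching a column against a regular expression involves, here is a minimal Python sketch. It uses Python's `re` module rather than the Regex Matching step itself, and the pattern, the function name `match_phone`, and the sample values are assumptions made for this example; the step's own expression syntax and options may differ.

```python
import re

# Hypothetical pattern: three digits, a dash, four digits (e.g. "555-0199").
PHONE_PATTERN = re.compile(r"(\d{3})-(\d{4})")


def match_phone(value: str):
    """Return (matched, first_group, second_group) for one column value."""
    m = PHONE_PATTERN.fullmatch(value.strip())
    if m is None:
        return (False, None, None)
    return (True, m.group(1), m.group(2))


values = ["555-0199", " 555-0123 ", "5550199", "n/a"]
results = [match_phone(v) for v in values]
for value, outcome in zip(values, results):
    print(repr(value), outcome)
```

Values that do not match the whole pattern are flagged rather than dropped, which is typically what you want when profiling data quality.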

In the image below, the Column Assigner step is edited: a column is created and an expression is defined for it:

 

Editing Column Assigner Properties

Press Ctrl+Space to get a list of available functions and input columns. Press Ctrl+Space+Space to get a list of available input columns.
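Conceptually, assigning a column means computing a new value for every record from an expression over its existing columns. The sketch below shows that idea in Python; it is not the expression language used in the Column Assigner step, and `assign_column`, the column names, and the sample rows are hypothetical.

```python
def assign_column(records, new_column, expression):
    """Return copies of the records with new_column set to expression(record)."""
    return [{**r, new_column: expression(r)} for r in records]


rows = [
    {"first": "Ada", "last": "Lovelace"},
    {"first": "Alan", "last": "Turing"},
]

# The lambda stands in for the expression defined on the new column.
with_full_name = assign_column(
    rows, "full_name", lambda r: f'{r["first"]} {r["last"]}'.upper()
)
print(with_full_name)
```

The input rows are left unchanged; each output record is a copy with the extra column added.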

Dealing with Errors

Errors that arise while constructing the plan are reported in the Properties tab of the Status Panel:

 

Errors in the Constructed Plan

Selecting an individual step shows only the warnings and errors for that step. Double-clicking an error in the Properties tab opens the step properties dialog at the field that contains the error.

Adding Comments

To add a comment to your plan to explain its logic, select Comment from the Palette and click anywhere on the Canvas.

 

Adding a Comment

To edit the comment, double-click it. The image below shows the comment editor, which lets you change the text itself as well as the text, background, and border colors:

 

Editing a Comment

Running the Plan

When the plan is built and contains no errors, it can be run. To do so, click the Run button as shown below:

 

Running a Plan

When the plan is finished running, a message will appear:

 

Plan Run Successful Message

Viewing the Console Output

During and after plan execution, you can see plan execution logs in the Console tab of the Status Panel:

 

Plan Run Progress Monitoring

Viewing the Plan Execution Progress

To view plan execution progress while the plan is running, click the Show Progress icon in the Status Panel.

 

Monitoring Plan Progress

A new tab opens, showing the total number of records passed to each step.

 

Plan Execution Progress in a Separate Tab
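Per-step record counts are useful for spotting where records are filtered out. As a loose analogy only (this is not how DQ Analyzer collects its progress figures), the Python sketch below counts the records emitted by each stage of a small pipeline; the stage names and data are hypothetical.

```python
from collections import Counter

counts = Counter()


def counted(name, records):
    """Pass records through unchanged while counting them under the given name."""
    for r in records:
        counts[name] += 1
        yield r


raw = ["a", "", "b", "", "c"]
read = counted("Reader", raw)
# Hypothetical filter stage: keep only non-empty values.
non_empty = counted("Filter", (v for v in read if v))
output = list(non_empty)

print(dict(counts))
```

A drop in the count between two stages shows how many records the later stage discarded.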

Viewing Historical Run Results

To view all plan executions in the current session, switch to the Run Results tab in the Status Panel and select a particular run. You can then review any errors that occurred.

 

Run Results