Building Custom Profiling Plans
Sometimes data must be cleansed and standardized first to get more accurate profiling results. In that case, you need to build a plan.
A plan file defines the logic and rules to be applied to the input data in order to produce the desired output. Plans are created by placing Steps onto a canvas and connecting them together. Steps are data processing algorithms that can be used to read, transform and analyze data, among other actions.
Examples of complex profiling plans are available in the Tutorials project in DQ Analyzer.
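To make the idea of steps connected on a canvas concrete, the following is a minimal conceptual sketch in Python. It is not DQ Analyzer's plan file format or API; the names `Step`, `run_plan`, `trim_names`, and `drop_empty_names` are hypothetical and only illustrate records flowing from one step to the next in a straight line.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass
class Step:
    """Hypothetical stand-in for a step on the canvas."""
    name: str
    apply: Callable[[list[Record]], list[Record]]
    records_in: int = 0  # number of records that reached this step


def run_plan(steps: list[Step], records: list[Record]) -> list[Record]:
    """Pass the records through each step in order, first to last."""
    for step in steps:
        step.records_in = len(records)
        records = step.apply(records)
    return records


def trim_names(records: list[Record]) -> list[Record]:
    # Standardize: remove surrounding whitespace from the "name" column.
    return [{**r, "name": r["name"].strip()} for r in records]


def drop_empty_names(records: list[Record]) -> list[Record]:
    # Cleanse: discard records whose "name" is empty after trimming.
    return [r for r in records if r["name"]]


plan = [Step("Trim names", trim_names), Step("Drop empty names", drop_empty_names)]
source = [{"name": "  Alice "}, {"name": "   "}, {"name": "Bob"}]
result = run_plan(plan, source)
```

In this sketch the order of the list plays the role of the connections between steps; a real plan can branch and merge, which a simple list does not model.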
The Plan Editor
The image below shows a plan in the Plan Editor, which is launched every time you open or create a plan. The Plan Editor consists of the (1) Canvas, where the plan logic is defined (by connecting Steps together), and the (2) Palette, where the various steps and actions are listed.
The Plan Editor
Creating a Plan File
To create a new plan file:
- Right-click a project or folder in the explorer panel and select New > Plan. Alternatively, use the toolbar. Both options are shown below.
Creating a New Plan
- Specify the Name of the plan and the place (Container) for storing it.
Creating a New Plan File
Adding Steps to the Canvas
To add steps to the canvas, do one of the following:
- Drag needed steps from the Palette to the Canvas.
Dragging a Step to the Canvas
- Press Ctrl+I or Insert and select the step from a filterable list.
Insert Step Dialog
To learn how particular steps work, go through plans in the tutorial project: DQ Projects > Tutorials.
Connecting Steps
To connect two steps, drag from the out endpoint of one step to the in endpoint of another step.
Connecting Steps
Editing Step Properties
Most steps require (or benefit from) some configuration to perform their functions, which is done by accessing the step properties.
To edit step properties, double-click the step or right-click it and select Edit Properties:
Calling the Properties Dialog
In the image below, a regular expression is defined in the Regex Matching step:
Editing Regex Matching Properties
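As a rough illustration of what matching a value against a regular expression involves, here is a small Python sketch. It does not reproduce the Regex Matching step or its configuration; the pattern `PHONE` and the function `standardize_phone` are assumptions made up for this example, and the regex dialect used by the product may differ from Python's.

```python
import re

# Hypothetical pattern: a 10-digit phone number with optional separators.
PHONE = re.compile(r"^\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})$")


def standardize_phone(value: str):
    """Return the number as NNN-NNN-NNNN, or None if the value does not match."""
    match = PHONE.match(value.strip())
    if match is None:
        return None
    # Rejoin the three captured digit groups with a single separator.
    return "-".join(match.groups())
```

Values that match are rewritten into one consistent shape, and values that do not match are flagged (here by returning `None`) so they can be handled separately.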
In the image below, the Column Assigner step is edited: a column is created and an expression is defined for it:
Editing Column Assigner Properties
Press Ctrl+Space to get a list of available functions and input columns. Press Ctrl+Space+Space to get a list of available input columns only.
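The idea of creating a column and defining an expression for it can be sketched in Python as follows. This is only a conceptual analogy, not the Column Assigner's expression language; `assign_column` and the column names are hypothetical.

```python
def assign_column(records, column, expression):
    """Return new records with `column` set to expression(record) for each record."""
    return [{**record, column: expression(record)} for record in records]


people = [{"first": "ada", "last": "lovelace"}]

# The expression reads existing input columns and computes the new value.
with_full_name = assign_column(
    people,
    "full_name",
    lambda r: f"{r['first'].title()} {r['last'].title()}",
)
```

The input records are left unchanged; each output record carries the original columns plus the newly computed one.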
Dealing with Errors
Errors that arise when constructing the plan are reported in the Properties tab of the Status Panel:
Errors in the Constructed Plan
Selecting an individual step will show only the warnings and errors for that Step. Double-clicking on an error in the Properties panel will open the step properties dialog to the field which contains the error.
Adding Comments
To add a comment to your plan to explain its logic, select Comment from the Palette and click anywhere on the Canvas.
Adding a Comment
To edit the comment, double-click it. The image below shows the comment editor, which allows changing the text color, background color, and border color, as well as the text itself:
Editing a Comment
Running the Plan
When the plan is built and contains no errors, it can be run. To do so, click the Run button as shown below:
Running a Plan
When the plan is finished running, a message will appear:
Plan Run Successful Message
Viewing the Console Output
During and after plan execution, you can see plan execution logs in the Console tab of the Status Panel:
Plan Run Progress Monitoring
Viewing the Plan Execution Progress
To view execution progress while the plan is running, click the Show Progress icon in the Status Panel.
Monitoring Plan Progress
A new tab opens, showing the total number of records passed to each step.
Plan Execution Progress in a Separate Tab
Viewing Historical Run Results
To view all plan executions in the current session, switch to the Run Results tab in the Status Panel and select a particular run. You can then review any errors that occurred.
Run Results