A data quality profile is a summary of the state of your data. It lets you detect duplicates and dependencies, evaluate business rules, observe patterns in the data, and more.
After reading this chapter, you will be able to create a profile and configure it for additional analyses, such as masks, dependencies, and business rules.
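To make the kind of statistics a profile contains more concrete, the following is a minimal Python sketch of column profiling. It is not the product's implementation: the function names `mask` and `profile_column`, the mask symbols (`D` for digit, `L` for letter), and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter

def mask(value: str) -> str:
    """Replace each digit with D and each letter with L; keep other characters."""
    return "".join(
        "D" if ch.isdigit() else "L" if ch.isalpha() else ch
        for ch in value
    )

def profile_column(values):
    """Return basic statistics for one column of strings (None means missing)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        # Values that occur more than once, with their frequencies.
        "duplicates": {v: n for v, n in counts.items() if n > 1},
        # Frequency of each character pattern found in the column.
        "masks": Counter(mask(v) for v in present),
    }

# Hypothetical sample column of phone numbers.
phones = ["555-1234", "555-1234", "5559876", None, "555-0000"]
result = profile_column(phones)
```

In this sketch, the `masks` entry shows that most values follow the `DDD-DDDD` pattern while one does not, which is the sort of observation a profile is meant to surface.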
Step 1 Select Data to Profile
- Launch the Integrated Development Environment.
- In the File Explorer view, select one or more files (CSV, TXT, XLS, XLSX) or database tables, right-click the selection, and choose Create Profile.
- To profile a database table, you must have a database connection configured (see Connecting to a Database to learn how to do this).
- For text files, you may need to assign metadata that describes how they are formatted. For more information, see Editing Metadata.
- If you select several inputs, you will obtain a single profile file in which the results are separated by input (one per table or file). See Reading a Data Quality Profile for more information on reading a profile with multiple inputs.
Step 2 Configure the Profile or Create a Profiling Plan
In the profile configuration dialog that opens, specify where to create the profile and which columns to profile. You can also enable drill-through, which lets you see the individual records behind the generated statistics (database connection required). Finally, choose whether to create a profile or a plan file.
If you select the Profile option and click Finish, the profile is generated immediately using the specified settings and opened in the Profile Viewer. See Reading a Data Quality Profile to learn how to read the data contained in the profile.
If you select the Plan file option, a plan for generating a profile is created instead. This option is useful if you want to modify or filter the data before profiling it, or if you want to configure the profiling algorithm in more detail (for example, by adding business rules or performing primary key analysis). See the next section, Configuring the Profiling Step, for more information.
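As a rough illustration of what business rules and primary key analysis check, here is a minimal Python sketch. It does not show how a plan file is configured or how the product performs these analyses: the helpers `evaluate_rules` and `is_candidate_key`, the rule names, and the sample records are hypothetical.

```python
def evaluate_rules(records, rules):
    """Count the records that violate each named rule.

    Each rule is a predicate that returns True when a record is valid.
    """
    return {
        name: sum(1 for r in records if not check(r))
        for name, check in rules.items()
    }

def is_candidate_key(records, columns):
    """Return True if the column combination is non-null and unique."""
    seen = set()
    for r in records:
        key = tuple(r.get(c) for c in columns)
        if None in key or key in seen:
            return False
        seen.add(key)
    return True

# Hypothetical sample records.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "b@example.com", "age": -5},
    {"id": 3, "email": "a@example.com", "age": 51},
]

# Hypothetical business rules expressed as predicates.
rules = {
    "age_non_negative": lambda r: r["age"] >= 0,
    "email_has_at": lambda r: "@" in r["email"],
}

violations = evaluate_rules(records, rules)
id_is_key = is_candidate_key(records, ["id"])
email_is_key = is_candidate_key(records, ["email"])
```

Here one record breaks the `age_non_negative` rule, `id` qualifies as a candidate key, and `email` does not because a value repeats.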