Frequently Asked Questions

Installation and Configuration

How to reserve more memory for executing plans?

DQA plans store data while processing it. The more input data there is, and the more the plan logic multiplies this data to produce the desired outputs, the more memory needs to be reserved.

Some of the most memory-intensive steps are Profiling and JDBC Reader.

To change the default setting, follow these steps:

  1. Open Window > Preferences > Ataccama DQ Analyzer > Launching.

  2. Change the value of Default memory for launch configuration (MB) to 1024, which is recommended for modern machines.

How to reserve more memory for the DQA IDE?

  1. Open dqa.ini located in the root folder of the DQA installation. This file contains the JAVA_OPTS that will be used by the virtual machine on start.
  2. Edit the -Xmx option, replacing the number (256 by default) with a higher one to reserve more memory, e.g. -Xmx1024m.
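
To confirm that an -Xmx value actually reaches a Java Virtual Machine, a small standalone program can print the maximum heap the JVM reports. This is only a sketch for checking the option in general; the class name HeapCheck is hypothetical and the program is not part of DQA.

```java
// Hypothetical helper, not part of DQA: prints the maximum heap size
// that the running JVM reports, so an -Xmx setting can be verified.
class HeapCheck {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() returns the upper limit the JVM will try to use for the heap.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + toMegabytes(maxBytes) + " MB");
    }
}
```

When the program is started with -Xmx1024m among its JVM options, the printed value should be close to 1024; the exact figure can differ slightly depending on the JVM.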

How to configure the temporary folder used by DQA (permissions error)?

You might experience problems when installing DQA on a computer with limited permissions or high restrictions. Even if the installation is done in a folder where you have rights for installation, DQA might fail to execute plans. The reason is that DQA makes use of a temporary folder (usually in the path to the user cache folder) for placing instances of executed plans. This folder is the default Java temporary folder. The execution might throw an exception due to being unable to create a temp plan file.

You can override the path to this folder by passing the chosen value in JAVA_OPTS, the special environment variable that holds the Java Virtual Machine arguments. Set it as follows:

  1. Open the dqa.ini file located in the root folder of the DQA installation for editing, using an editor that handles the file's line endings (EOLs) correctly (in Windows, WordPad is an option).
  2. Add the java.io.tmpdir property. It must be passed using the -D notation for Java parameters.
  3. Choose a path where you have write, execute, and read permissions. An absolute path, such as C:\Temp in Windows, can be set; it would look like the following: -Djava.io.tmpdir=C:\\Temp

    A typical choice is to place the temporary folder within the DQA installation by using a relative path (e.g. -Djava.io.tmpdir=tmp refers to the folder tmp in the root of the DQA installation). This ensures that you have the right permissions to store temporary data (e.g. the cloned plans for execution), especially on restricted machines or when lacking administrator rights.
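
Java exposes the temporary folder through the system property java.io.tmpdir. The following sketch shows one way to check, outside DQA, which folder a JVM resolves and whether files can be created in it. The class name TempDirCheck and the probe file prefix are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of DQA: reports the JVM temporary folder
// and whether a file can be created in it.
class TempDirCheck {

    /** Returns true if a probe file can be created in and deleted from dir. */
    static boolean isWritable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "dqa-probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            // Missing folder, missing permissions, or a security restriction.
            return false;
        }
    }

    public static void main(String[] args) {
        Path tempDir = Paths.get(System.getProperty("java.io.tmpdir")).toAbsolutePath();
        System.out.println("java.io.tmpdir = " + tempDir);
        System.out.println("writable = " + isWritable(tempDir));
    }
}
```

Running it with the same -Djava.io.tmpdir value that is planned for dqa.ini gives a quick indication of whether the chosen folder exists and is writable for the current user.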

How do I import data from MS Access and create Metadata?

You can import data from an MS Access file using ODBC. It might be necessary to have MS Access installed, but the drivers are often pre-installed even if MS Access is not present.

In order to set up the database, follow these steps:

  1. Go to your Windows Control Panel > Administrative Tools > Data Sources (ODBC) and click Add…. Choose Microsoft Access Driver and click Finish. The driver's setup window opens.
  2. Enter the name of the source in the Data Source Name field.
  3. Click Select... and provide the path to the database.
  4. Go to DQA, right-click Databases > New > Database Connection. Choose ODBC as the driver type and type the DSN as the connection name.

    You can also use the By URL option and type the following to the Connection string field:

    jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=D:\test.accdb

    DBQ is the path to the MS Access file (e.g. D:\test.accdb).

  5. Click Test Connection to see if the connection was configured properly.

The ODBC driver exclusively locks the file, so it cannot be opened by any other application.
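
The By URL connection string above follows a fixed pattern: the ODBC driver name in braces, followed by DBQ with the file path. As a minimal sketch, the string can be assembled for any file path as shown below. The class and method names are hypothetical, and the code only builds the text; it does not open a connection. Note that the jdbc:odbc: prefix relies on the JDBC-ODBC bridge, which is not included in Java 8 and later.

```java
// Hypothetical helper, not part of DQA: builds the connection string
// for an MS Access file in the format shown above.
class AccessUrl {

    // Driver name as registered in the Windows ODBC administrator.
    static final String DRIVER = "Microsoft Access Driver (*.mdb, *.accdb)";

    /** Returns the connection string for the given path to an MS Access file. */
    static String forFile(String path) {
        return "jdbc:odbc:Driver={" + DRIVER + "};DBQ=" + path;
    }

    public static void main(String[] args) {
        // In Java source, a backslash in a Windows path is written as \\.
        System.out.println(forFile("D:\\test.accdb"));
    }
}
```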

How to connect to an SQL Server using Windows Authentication?

There are two ways to do this, using two different drivers:

  1. Connect via the JDBC driver:

    1. Select MS SQL in the Database type field.

    2. Add integratedSecurity=true to the end of the connection string. For example:

      jdbc:sqlserver://localhost:1433;databaseName=test1;integratedSecurity=true

  2. Connect via the jTDS driver:

    1. Select jTDS - MS SQL in the Database type field.

    2. Specify a connection string in the following format:

      jdbc:jtds:sqlserver://localhost:9876/test1;domain=company

    3. Fill in Username and Password.
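
The two connection string formats differ in how the database and the authentication details are passed. The sketch below assembles both from their parts, which makes the differences easier to see. The class and method names are hypothetical, and the code only builds the strings; it does not load a driver or connect.

```java
// Hypothetical helper, not part of DQA: builds the two connection
// string formats described above from their individual parts.
class SqlServerUrls {

    /** Microsoft JDBC driver format with Windows Authentication enabled. */
    static String microsoftDriver(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";integratedSecurity=true";
    }

    /** jTDS format: the database follows a slash, the Windows domain is a property. */
    static String jtdsDriver(String host, int port, String database, String domain) {
        return "jdbc:jtds:sqlserver://" + host + ":" + port + "/" + database
                + ";domain=" + domain;
    }

    public static void main(String[] args) {
        System.out.println(microsoftDriver("localhost", 1433, "test1"));
        System.out.println(jtdsDriver("localhost", 9876, "test1", "company"));
    }
}
```

With the values used in main, the program prints the same two example strings that appear in the steps above.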

Usage

Is it possible to automate table profile creation?

Table profile creation can be automated with the more advanced Data Quality Center (DQC) software, which must be purchased. With the free Data Quality Analyzer (DQA), this automation is not possible.

What is the maximum number of rows and columns provided in one input to the profiler with drill-through on?

There is no limit on the number of rows written to the drill-through table. If not specified, all rows will be written.

There is no fixed maximum number of columns; the limit depends on the DB. See this Wikipedia page for detailed information by DB type.

Ataccama can also provide a special Profiling component, which simulates a DB by saving the data into files. However, as it relies on files for storing the profiled data, it is not transactional and does not support any advanced DB functions. The use of this component is only required whenever it is necessary to store the data from the profiling, but there is no DB available. For more information on this component, please contact support@ataccama.com.

How to monitor the current execution?

DQA offers a way to monitor the processing visually, providing counts of processed records, percentages on completion of the different operations (steps) and visual cues to the flow of data to identify bottlenecks. This can be achieved by opening a special view of any plan being executed, which will mirror the original plan, showing the processing of records and other aforementioned visual features.

To open the monitoring, click the Show progress button, which appears in the Console when the plan is being executed.

You can also configure this monitoring window to open automatically whenever a plan is executed. To change the default configuration for the monitoring, do the following:

  1. Open Window > Preferences.
  2. Navigate to Ataccama DQ Analyzer > Launching > Progress Viewer.
  3. Change Automatically open progress viewer for launched plans to Always. You can also change how the Progress Viewer should behave once the execution finishes.

How to profile multiple sources or inputs in the same profile?

A profile is generated by the Profiling step. By default, the Profiling step accepts only one input, but more inputs can be added as endpoints to be connected by flows of data within the plan. This also allows creating referential integrity checks (i.e. foreign key analysis).

In order to do that, do the following:

  1. Double-click the Profiling step.
  2. Click the Add Input icon in the upper left corner and specify the input name in the next dialog window.
  3. After computing the profile, you can access the individual profiles (and, alternatively, their roll-ups if any were set) from a drop-down menu on the left side of the opened file.

You can remove an input by clicking Remove Input.

How to export a profile to open standard formats?

DQA profiles can be opened by Ataccama tools, but they can also be exported, so users with no access to the tool can view results or manage them. There are two formats offered: HTML, for visual purposes, and XML, for processing and information management purposes.

To export a profile, follow these steps:

  1. Open the profile to be exported.
  2. Click the Export button.
  3. Select the data to be exported. As seen in the file image, if the profile has multiple inputs, they can be selected individually for the export. It also offers granularity for the export down to the attribute level.
  4. Click Next to choose which format it should be exported to. As multiple files are generated in the case of HTML, there is the option to automatically zip the results as well.

How to reserve more memory for a particular plan execution in DQA?

Profiling is a memory-consuming activity, and the default settings for allocated resources are conservative. Some intensive executions can throw a memory overflow exception if the default reserved memory for DQA is used.

When a given plan execution requires more memory (e.g. large volumes of data or memory-intensive tasks such as roll-ups), you can extend the reserved memory for that plan only, instead of reserving the memory for DQA on start (as seen in a previous question). In order to configure the memory allocation for a particular plan, follow these steps:

  1. Click Run Configuration... instead of the regular Run button. It is available in the drop-down menu next to the button.
  2. Go to the Runtimes tab, which shows a field for the VM arguments. You can enter the same VM options that are added to the environment variable JAVA_OPTS.
  3. Set the -Xmx option, which configures the maximum heap size reserved by the Java Virtual Machine. Follow the notation for passing this parameter, e.g. -Xmx1024m reserves 1 GB of memory as the maximum allocated for the plan. Note that the left pane shows a complete list of all the plans opened and executed by DQA, with their particular VM argument settings.
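
The -Xmx notation takes a number followed by an optional unit letter: k for kilobytes, m for megabytes, g for gigabytes, and no letter for plain bytes. The sketch below converts such an option to bytes, which can help when comparing values written in different units. The class and method names are hypothetical, and the parsing is a simplified illustration rather than the JVM's own implementation.

```java
// Hypothetical helper, not part of DQA: converts an -Xmx option such as
// -Xmx1024m to a number of bytes (simplified illustration).
class XmxNotation {

    /** Returns the heap size in bytes for an option like -Xmx512m or -Xmx2g. */
    static long toBytes(String option) {
        if (!option.startsWith("-Xmx") || option.length() == 4) {
            throw new IllegalArgumentException("Not an -Xmx option: " + option);
        }
        String size = option.substring(4);
        char unit = Character.toLowerCase(size.charAt(size.length() - 1));
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default:  return Long.parseLong(size); // no unit letter: plain bytes
        }
        String digits = size.substring(0, size.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    public static void main(String[] args) {
        // -Xmx1024m and -Xmx1g describe the same limit.
        System.out.println(toBytes("-Xmx1024m") == toBytes("-Xmx1g"));
    }
}
```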

How to configure Profiling domains and thresholds?

When profiling data using the Profiling step, a domain analysis is performed by default over every attribute being profiled. DQA provides a list of predefined domains, which can be extended by new user-defined ones. In order to inspect those domains or add custom ones, do the following:

  1. Open (double-click) the Profiling step and change the view to Normal mode (as seen in the screenshot).
  2. Navigate to Profiling > Domains > Customs to see the domains and the custom domains.
  3. Each domain analysis is resolved based on a threshold, i.e. allowing for certain exceptions outside the resolved domain. In order to configure the threshold for a given domain, change the number in the Threshold field, which represents a percentage.
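
The idea of a percentage threshold can be illustrated with a small sketch. It assumes that a domain is described by a regular expression and that the threshold is the minimum percentage of values that must match for the domain to be resolved. Both are assumptions made for illustration only; how DQA computes this internally is not described here, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not DQA code: resolves a domain when the share
// of matching values reaches an assumed minimum percentage.
class DomainThreshold {

    /** Percentage (0 to 100) of values that fully match the domain pattern. */
    static double matchingPercent(List<String> values, Pattern domain) {
        if (values.isEmpty()) {
            return 0.0; // nothing to analyze
        }
        long matching = values.stream()
                .filter(v -> domain.matcher(v).matches())
                .count();
        return 100.0 * matching / values.size();
    }

    /** True if enough values match, i.e. the exceptions stay within the threshold. */
    static boolean resolves(List<String> values, Pattern domain, double thresholdPercent) {
        return matchingPercent(values, domain) >= thresholdPercent;
    }

    public static void main(String[] args) {
        Pattern isoDate = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
        List<String> values = Arrays.asList("2021-01-05", "2021-02-11", "n/a", "2021-03-30");
        // 3 of 4 values match, i.e. 75 percent.
        System.out.println(resolves(values, isoDate, 70.0)); // true
        System.out.println(resolves(values, isoDate, 90.0)); // false
    }
}
```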

How to add DB operations in the contextual menu of the Databases node explorer?

DQA can serve as an SQL client, allowing you to inspect the databases via the Databases node in the File Explorer. The connection to the databases will also unfold a list of schemas or users of the database, as well as all tables and views for each schema. DQA also includes an SQL Editor for performing SQL operations and queries over the database.

One of the features that you can take advantage of is the custom DB actions. Right-clicking any table reveals a list of custom operations that can be executed (e.g. Drop table, Count, etc.).

You can add new templates based on common operations performed over the tables in the database. This is done as follows:

  1. Open Window > Preferences, then navigate the menus to Ataccama DQ Analyzer > Database > Table Commands.
  2. A dialog will open listing the currently available operations. Click Add to define a new template.
  3. Create the Template expression, using {entity} and <snippet> to specify the current table or columns in the table and a custom SQL snippet to be input, respectively.
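
Conceptually, a template expression is a piece of SQL with two placeholders that are filled in when the command is run on a table. The sketch below imitates this with plain text replacement. The template text, table name, and class name are hypothetical examples, and the code is not DQA's own substitution logic.

```java
// Hypothetical illustration, not DQA code: fills the {entity} and <snippet>
// placeholders of a table command template by plain text replacement.
class TableCommandTemplate {

    /** Replaces {entity} with the table name and <snippet> with the custom SQL. */
    static String expand(String template, String entity, String snippet) {
        return template
                .replace("{entity}", entity)
                .replace("<snippet>", snippet);
    }

    public static void main(String[] args) {
        // Example template: count the rows of a table that satisfy a condition.
        String template = "SELECT COUNT(*) FROM {entity} WHERE <snippet>";
        System.out.println(expand(template, "customers", "country = 'CZ'"));
        // prints: SELECT COUNT(*) FROM customers WHERE country = 'CZ'
    }
}
```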

How to add predefined templates for expressions?

Every step that allows for the input of an expression shows a button called Template.... This includes the transformation step Column Assigner or business rules in the Profiling step. This button can be used to pre-fill the Expression field with one from an out-of-the-box list. You can create your own expressions for common or complex repeatable operations.

As seen below, the Template button opens a dialog with a list of out-of-the-box and user defined templates.

In order to add new templates to the list, follow these steps:

  1. Open Window > Preferences. Navigate the menus to Ataccama DQ Analyzer > Expression Editor > Templates.

  2. Choose between Expression Templates and RegEx Templates and click Add Template.... In the dialog window that follows, specify the Name, the Expression itself, and the Description.

When you open the Templates section, you will see a complete list of all the available Expression and RegEx Templates. Besides inspecting the templates, you can edit, add, or remove them through the respective buttons, or organize them in a folder structure using the Move button. Moreover, the templates can be exported or imported in bulk through the Import... and Export... buttons.