Existing files can be used as input data for a plan, for example. Files can be added by dragging and dropping from the file system to the desired project in the Navigator panel or by copying them from the destination folder to the desired project folder inside the workspace folder in the file system.
Opening Data Files
To open a data file locate it in File or Model Explorer and double-click on it. When a data file is opened for the first time (the file does not have letter 'M' in the bottom right corner), the Metadata Editor is launched.
Metadata describes the format of the data and needs to be assigned for the file – primarily CSV, TXT, and XLS(X) – to be properly used in plans. To edit metadata, right-click on the file and select Edit Metadata....
In Section 1 of the initial Format tab, the rules for reading and interpreting the raw data can be specified, the main of which are:
- File type. Choose whether the file is delimited or fixed width.
- Encoding. Specify the encoding.
- Field Separator. Choose which symbol separates your data into fields (columns). The setting is applicable if delimited File Type has been chosen.
- Data starts at line. Specify which line the data starts at. The value is usually "2" for data with field names or "1" for data without.
Section 2 of Edit File Metadata dialog allows to see the columns resulting from specifications in Section 1, change their type, and specify their format where applicable. For example, if you choose integer or long data format, you will be able to choose the thousands separator in the Format column.
Section 3 allows to see the final data in the Result data tab and compare it to the original data in the Decoded original data tab.
Viewing Data Files
Double-clicking a text file will open it for viewing in the CSV Viewer. The CSV Viewer displays the data in rows and columns, as defined in the file metadata.
To sort a column, click the name of the column in the header row. Clicking once will sort the data in ascending order (i.e. smallest-to-largest/A-to-Z), indicated by an up arrow. Clicking again will sort the data in descending order, indicated by a down arrow. Clicking a third time will remove all sorting and revert to the original ordering of the data, indicated by no arrow.
To show only a subset of data, click the Switch View button in the toolbar as shown below:
This will open the View Settings dialog, which contains a tab named Filter.
To define a filter, click Add. Use the drop-down controls to select a column to filter and a condition to apply (e.g. =, <, contains). Then specify the matching criteria. It is also possible to specify whether the filter should be case sensitive or not. This will display only rows matching the filter criteria. Multiple filters can be defined to further refine the data that is shown. To remove a filter without deleting it, uncheck the Enabled checkbox.
By default, all data will be shown in black text on a white background using the default font settings. However, rules can be configured, so that certain data values or ranges are colored or formatted differently. This can be useful for visually scanning for key values in a large data file. The conditions are defined similarly to Filters, but there are additional options for coloring and text formatting (available via the Coloring column):
- Background. Define the highlighting to apply to the cells
- Foreground. Defines the text color
- Bold. Tick to make the text bold
- Italic. Tick to italicize the text
Additionally, there is the option to define whether the coloring rule should be applied only to the specific cell which matches the coloring rule or to all (or any subset) of the columns (in the Apply to column).
After the two coloring rules above have been defined and applied, the data in the CSV Viewer will look like this:
Coloring rules will be applied in the order in which they appear. For example, as visible in the screenshot above, the last line contains a cell with "SIN" in the src_sin column, but the text inside is not painted red as has been defined in the Colors tab of View Settings. This is because this rule is followed by the
src_birth_date >= 'NULL' rule. The order can be changed using the buttons on the right, below the Add button. If we change the order of the two rules above, src_sin column in the last row now has red font color instead of black (see below). Since the background color was not defined for this rule, it is inherited from the other one and stays yellow.
For data files with many columns it may be useful to hide certain columns to focus on specific data. This can be done in the Columns tab of the View Settings dialog. Uncheck a column to hide it from view. When columns are hidden, a note appears at the bottom of the CSV Viewer indicating the number of columns that are hidden and providing a quick link to show them all. A column can also be hidden by right-clicking on it and selecting Hide Column.
Color-coding Column Headers
Many data files use standardized naming conventions to group similar columns. The View Settings dialog allows specifying different colors for column headers based on all or part of the column name. In the Heading tab, a column mask can be defined (e.g. "src*"), which will color all headings which name starts with the text specified. A different background color can be set for each mask that is used.
The following figure displays the outcome of the settings above.
Column widths can be resized by dragging left or right when the mouse is placed over the column divider in the column header. Columns can be automatically sized to fit their contents by double-clicking the column divider.
Additionally, right-clicking on a column header will bring up a menu which offers, among other functions, the ability to "autofit" the selected column or all columns.
Another feature available in the column header context menu is the ability to visually mark changes (or identify groups) in data. This can be useful for scanning a specific column visually for changes in the data.
The column whose groups are marked is indicated by an icon showing three parallel horizontal lines next to the column name (as shown above).
To preserve the view settings (including sorting, column widths, and marked groups) for later use, click the View Settings drop-down arrow to open the options menu.
Click Save As… (for creating a new saved view) or Save (for saving changes to an existing view) to store view settings. A list of recently used views will be shown at the top of the menu. A complete list of all saved views is shown in a submenu at the bottom labeled All Saved Views. An asterisk (*) next to the view name indicates that unsaved changes have been made to the current view. To delete a view, select Manage Views…. It is also possible to import and export views for use with other copies of the product using this menu option.
The view called "Default" is a stored view with no settings applied. It cannot be changed or deleted. When changes have been made to the default view, the toolbar button label changes to <custom>, which indicates an unsaved view based on the default view. Click Save As… to name and store the new view. The Edit… option is the same as clicking the toolbar button with the view name, which opens the View Settings dialog.
Editing Data Files
Besides viewing, it is also possible to edit
.csv files directly in the Ataccama IDE. Excel files can be edited only in MS Excel.
To edit a data file: right-click it and select Open With > Text Editor.
The file will open as plain text (instead of a table) without its metadata applied.
You can also select Open With > System Editor to the file in the default editor in your operating system.
Using Data Files in a Plan
The easiest way to add a data file into a plan is by dragging it from the explorer panel directly onto the canvas, where a corresponding step will be generated (e.g., Text File Reader or Excel File Reader).