How to Parse One-Character First Names via Guess Name Surname Step

Reported for version 10

Parsing One-Character First Names in DQC Using Guess Name Surname Step

The Guess Name Surname step is a powerful tool for name cleansing: it parses and validates names. In extensive plans it can be very efficient, because it can be built into purpose-designed components used for cleansing tasks. By default, the step only accepts names that are at least two characters long. This article describes a workaround for parsing one-letter first names in your plans.

Be aware that the Guess Name Surname step uses pre-configured lookups. To change how names are recognized, you need to prepare new lookups: provide the lookup source files and then generate the actual lookup files from them. The new lookup must contain all valid names. You can use the pre-prepared lookup files and their sources as examples. They are located in [DQC]\workspace\Tutorials\data\ext\lkp|src, together with an explanation of how to build them.
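As a rough illustration of why the new lookup has to list every valid name, including the single letters you want to accept, the following Python sketch models a lookup as a plain set built from a source list. The one-name-per-line layout, the function names, and the sample names are assumptions made for this example only. The actual DQC lookup source format and build procedure are the ones described in the tutorials folder mentioned above.

```python
import io

def build_lookup(source):
    """Build an in-memory lookup (a set of upper-cased names) from a text source.

    Assumes one name per line and skips blank lines. This is a hypothetical
    layout, not the DQC lookup source format.
    """
    return {line.strip().upper() for line in source if line.strip()}

def is_known_first_name(name, lookup):
    """Return True when the name is present in the lookup (case-insensitive)."""
    return name.strip().upper() in lookup

# Hypothetical source content: regular names plus the single letters to accept.
SAMPLE_SOURCE = "John\nMary\nJ\nM\n"
lookup = build_lookup(io.StringIO(SAMPLE_SOURCE))
```

In this sketch a single letter is recognized only if it was listed in the source, which mirrors the requirement that the regenerated lookup contain all valid names.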

In the {FIRST_NAME} pattern definition of the Guess Name Surname step, single-character names cannot be parsed by default, because the minimum length of the character sequence is two letters. To accept a one-letter first name, you have to create a user-defined component. The setup of such a component is described in the simple scenario below.

The component that parses one-character names needs three DQC steps: Text File Reader, Guess Name Surname, and Text File Writer. Text File Reader reads data from an input file, creates the required columns specified on the Columns tab, and sends the data to the Guess Name Surname step, which recognizes and extracts the name parts. The output is placed into the specified columns and written out by Text File Writer. In the Guess Name Surname step, you need to configure two tabs: General and Advanced.
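The data flow of the three steps can be pictured with a small Python sketch: read rows, add the parsed columns, write the rows out. This is only an analogy for the reader, parser, and writer arrangement. The column names (src_name, first_name, last_name) are hypothetical, and the naive whitespace split stands in for the real step, which relies on lookups and pattern groups.

```python
import csv
import io

def guess_name_surname(full_name):
    """Simplified stand-in for the parsing step: the first token becomes the
    first name and the last token becomes the last name."""
    parts = full_name.split()
    if len(parts) < 2:
        return {"first_name": "", "last_name": parts[0] if parts else ""}
    return {"first_name": parts[0], "last_name": parts[-1]}

def run_plan(input_text):
    """Read rows, add the parsed columns, and return the result as CSV text."""
    reader = csv.DictReader(io.StringIO(input_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["src_name", "first_name", "last_name"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        parsed = guess_name_surname(row["src_name"])
        writer.writerow({"src_name": row["src_name"], **parsed})
    return out.getvalue()

# Hypothetical input with one single-letter first name.
result = run_plan("src_name\nJ Smith\nMary Jones\n")
```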

General tab properties of Guess Name Surname step

In this section, you create columns for the parsed name parts. Under Column Bindings, in the In property, specify the source column to read the name from. Then fill in the mandatory properties First Name and Last Name, along with any other columns you wish to create. To create a new column, click Create... next to the property field. Doing so also generates Shadow Columns in Text File Reader, which are used for the parsed first and last names and the other created columns. For the Guess Name Surname step to parse single-letter names, enter the path to the corresponding lookup file in the General tab of the step > Basic > First Name Lookup File Name property.

In the Pattern Groups section of the step, create new name patterns. A possible pattern definition could look as follows:

For more pattern group definitions, see the built-in tutorials of your DQC IDE. A sample configuration can be found in the 11 SOA Services directory, where a component for parsing names is set up.

Advanced tab properties of Guess Name Surname Step

When configuring the Guess Name Surname step, you need to define word definitions and search patterns. On the Advanced tab of the step, overwrite the Word Definition property so that it also parses single-letter names, for example with {WORD:minLength=1,maxLength=100,chars='[:letter:]'}.
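To see what lowering the minimum length changes, here is a minimal Python sketch of a word definition with a length range. It is not how DQC evaluates the Word Definition property. The function name and parameters are hypothetical, and str.isalpha() stands in for the '[:letter:]' character class.

```python
def find_words(text, min_length=2, max_length=100):
    """Return runs of letters whose length is within [min_length, max_length].

    Any non-letter character ends the current run. Runs outside the length
    range are discarded.
    """
    words, current = [], []
    for ch in text + " ":  # the trailing space flushes the final run
        if ch.isalpha():
            current.append(ch)
        else:
            if min_length <= len(current) <= max_length:
                words.append("".join(current))
            current = []
    return words

default_words = find_words("J Smith")                # two-letter minimum
relaxed_words = find_words("J Smith", min_length=1)  # single letters allowed
```

With the two-letter minimum, the single letter is never treated as a word, so it cannot be matched as a first name. With a minimum of one, it becomes a candidate that the lookup and the pattern groups can then accept or reject.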

Single-letter initials should be handled differently from other valid names. If you also want to parse initials, you may need to define different pattern groups. For sample pattern definitions, see the 11.01 Names.comp configuration in the 11 SOA Services directory of the provided DQC Tutorials.