WordLink Select Specifications
Select specifications are used to determine which blocks of text are included in a particular analysis.
A select specification requires particular “values” to appear in particular “columns” of a header.
A select specification has one column section and one value section. Each section must be terminated by a period.
Column sections must precede value sections.
Column sections are made up of column sets and value sections are made up of value sets.
A column set is a particular column or range of columns where some acceptable value is to be located in a header.
A value set is an acceptable value or range of values which are to be located in a particular column or range of columns in a header.
A dash between two numbers is used to specify a range of numbers in column and value sets.
Column and value sets must be separated by commas when there is more than one set in a section.
The number of column sets in a column section must match the number of value sets in the corresponding value section.
There is no limit on the number of select specifications.
The characters in the column section are always interpreted as integers.
The characters in the value section are treated as numbers by default. If a value contains a non-digit character (other than the dash that marks a range), the value is treated as a string.
Numbers with decimal points cannot be used in select specifications because the decimal point would be interpreted as a section-terminating period.
Blanks are not allowed anywhere in a select specification or in a header.
The following are example headers and select specifications.
Headers
@@001
@@002
@@003
@@004
@@abc
Select Specification
5.1. accepts headers with a 1 in column 5.
5.1-4. accepts headers with a number between 1 and 4 in column 5.
3-5.1-4. accepts headers with a number between 1 and 4 spanning columns 3 through 5.
3,4,5.0,0,2. accepts headers with 0 in column 3, 0 in column 4 and 2 in column 5.
3-5.002. accepts headers with 0 in columns 3 and 4 and 2 in column 5.
3,4,5.0,0,1-4. accepts headers with 0 in column 3, 0 in column 4 and 1 through 4 in column 5.
3-4,5.0,1-4. accepts headers with the value 0 spanning columns 3 and 4, and 1 through 4 in column 5.
3,4,5.a,b,c. accepts headers with “abc” spanning columns 3 through 5.
3-5.abc. accepts headers with “abc” spanning columns 3 through 5.
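The rules and examples above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the WordLink implementation: the function names are hypothetical, columns are assumed to be counted from 1 starting at the first “@”, and a numeric value set is assumed to match when the digits found in the column set, read as an integer, fall inside the value or range.

```python
import re


def parse_select_spec(spec):
    """Split a spec such as '3-4,5.0,1-4.' into [((3, 4), '0'), ((5, 5), '1-4')]."""
    if " " in spec:
        raise ValueError("blanks are not allowed in a select specification")
    parts = spec.split(".")
    # Two period-terminated sections split into [columns, values, ''].
    if len(parts) != 3 or parts[2] != "" or not parts[0] or not parts[1]:
        raise ValueError(f"malformed select specification: {spec!r}")
    column_sets = parts[0].split(",")
    value_sets = parts[1].split(",")
    if len(column_sets) != len(value_sets):
        raise ValueError("column set and value set counts differ")
    pairs = []
    for column_set, value_set in zip(column_sets, value_sets):
        first, _, last = column_set.partition("-")
        start = int(first)
        end = int(last) if last else start
        if start < 1 or end < start:
            raise ValueError(f"bad column set: {column_set!r}")
        pairs.append(((start, end), value_set))
    return pairs


def value_matches(field, value_set):
    """Compare the header text in one column set against one value set."""
    numeric = re.fullmatch(r"([0-9]+)(?:-([0-9]+))?", value_set)
    if numeric is None:
        # Assumption: a value with a non-digit character is compared as text.
        return field == value_set
    if re.fullmatch(r"[0-9]+", field) is None:
        return False
    low = int(numeric.group(1))
    high = int(numeric.group(2)) if numeric.group(2) else low
    return low <= int(field) <= high


def header_accepted(header, spec):
    """Return True when every column set of the header holds an acceptable value."""
    for (start, end), value_set in parse_select_spec(spec):
        if end > len(header):
            return False
        if not value_matches(header[start - 1:end], value_set):
            return False
    return True


headers = ["@@001", "@@002", "@@003", "@@004", "@@abc"]
accepted = [h for h in headers if header_accepted(h, "3-5.1-4.")]
# accepted is ['@@001', '@@002', '@@003', '@@004']
```

Under these assumptions, "3-5.002." and "3,4,5.0,0,2." both accept only @@002, which agrees with the descriptions above.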
Exact Matching
Exact matching lets a list of non-contiguous integers, or groups of characters, be matched against a contiguous range of header columns. For example, if the text following the headers @@aaaa, @@aabb, and @@bbbb were all to be included in the same analysis, a single exact match select specification listing those three values would accommodate this.
Exact matching permits only one column set and one value set per select specification.
The column set specifies which column or range of columns the exact match must occur in.
The value set is a single asterisk “*”.
The list of acceptable matches follows the select specification, one match per line, and must be terminated by “@@”.
Each match must be exactly as wide as the column set.
The following is an example of an exact match select specification.
3-5.*.
111
323
535
@@
The asterisk in the value section indicates an exact match select specification. The entries listed between the specification line and the @@ are the exact matches that will be accepted in columns 3-5 of a header. The @@ marks the end of the exact match list.
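As a rough illustration of the exact match format, the Python sketch below reads the specification line and its match list, then tests a header against it. The function names are hypothetical, and comparing the header columns to each match as plain text is an assumption; the actual program may compare them differently.

```python
def parse_exact_match(lines):
    """Parse ['3-5.*.', '111', ..., '@@'] into ((3, 5), ['111', ...])."""
    spec, *rest = lines
    parts = spec.split(".")
    # An exact match spec has one column set and '*' as its only value set.
    if len(parts) != 3 or parts[1] != "*" or parts[2] != "":
        raise ValueError(f"not an exact match specification: {spec!r}")
    first, _, last = parts[0].partition("-")
    start = int(first)
    end = int(last) if last else start
    width = end - start + 1
    matches = []
    for line in rest:
        if line == "@@":
            break  # end of the match list
        if len(line) != width:
            raise ValueError(f"match {line!r} must be {width} characters wide")
        matches.append(line)
    else:
        # The loop finished without ever seeing the terminator.
        raise ValueError("match list is not terminated by @@")
    return (start, end), matches


def exact_match_accepts(header, columns, matches):
    """Assumption: the header columns are compared to each match as text."""
    start, end = columns
    return end <= len(header) and header[start - 1:end] in matches


columns, matches = parse_exact_match(["3-5.*.", "111", "323", "535", "@@"])
# columns is (3, 5) and matches is ['111', '323', '535']
```

With this sketch, a header such as @@323 would be accepted and @@001 would not.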
If your data set had the filename “wordset.dat” and your select list contained three select specifications, you would end up with three sets of output files named wordset01, wordset02, and wordset03. Eight files would be generated with each filename. An example of a select list that has three select specifications is:
3-5.*.
004
007
abc
@@
4-6.*.
999
xxx
yes
111
@@
3,5-6,7-9.a,hi,100-999.
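The way such a select list breaks down into select specifications and output names can be sketched in Python. This is a sketch under assumptions rather than a description of the program itself: the function names are hypothetical, each ordinary select specification is assumed to occupy one line, and the two-digit numbering is inferred from the wordset01 to wordset03 example.

```python
from pathlib import PurePath


def split_select_list(lines):
    """Group the lines of a select list into one list of lines per specification."""
    specs = []
    block = None  # lines of the exact match specification being collected
    for line in lines:
        if block is not None:
            block.append(line)
            if line == "@@":  # terminator closes the exact match block
                specs.append(block)
                block = None
        elif line.endswith(".*."):
            block = [line]  # start of an exact match specification
        else:
            specs.append([line])  # ordinary one-line specification
    if block is not None:
        raise ValueError("exact match list is not terminated by @@")
    return specs


def output_basenames(data_filename, spec_count):
    """Assumed naming: data file stem plus a two-digit specification number."""
    stem = PurePath(data_filename).stem
    return [f"{stem}{number:02d}" for number in range(1, spec_count + 1)]


select_list = [
    "3-5.*.", "004", "007", "abc", "@@",
    "4-6.*.", "999", "xxx", "yes", "111", "@@",
    "3,5-6,7-9.a,hi,100-999.",
]
specs = split_select_list(select_list)
names = output_basenames("wordset.dat", len(specs))
# names is ['wordset01', 'wordset02', 'wordset03']
```

The sketch produces only the shared base names; how the eight files in each set are distinguished is not stated here, so it is left out.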