WORDij At-a-Glance
Program
Purpose
Input
Output
Comments
WordLink
(No Advanced Options)
Base program which counts words and word pair strings
A Text file.
A Drop List is optional.
8 files, each with a distinct file extension:
.net, .pr, .ptg, .stp.csv,
.stw.csv, .log, .wrd, .wtg.
See WordLink Output File Types Summary.
See How to convert Word documents into text format.doc
See How do to append multiple text files together.doc
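The core counting step can be pictured with a minimal Python sketch. It assumes simple lowercase tokenization, ordered word pairs formed between words that fall within a hypothetical `window` of positions after drop-list words are removed, and base-2 logarithms for the entropy. WordLink's actual tokenizer, window size, and entropy formula are not described in this guide, so treat this as an illustration of what the .wrd, .pr, and .stw.csv files summarize, not as WordLink's algorithm.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters, digits, underscores, apostrophes
    return re.findall(r"[a-z0-9_']+", text.lower())


def count_words_and_pairs(text, drop_list=frozenset(), window=3):
    """Count words and ordered word pairs (w1 before w2) within `window` positions.

    Drop-list words are removed before pairs are formed.
    `window` is a hypothetical parameter of this sketch.
    """
    words = [w for w in tokenize(text) if w not in drop_list]
    word_counts = Counter(words)
    pair_counts = Counter()
    for i, w1 in enumerate(words):
        for w2 in words[i + 1 : i + 1 + window]:
            if w2 != w1:  # skip self-pairs such as ("budget", "budget")
                pair_counts[(w1, w2)] += 1
    return word_counts, pair_counts


def word_entropy(word_counts):
    # Shannon entropy of the word distribution in bits (log base 2 is an assumption)
    total = sum(word_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in word_counts.values())


# Usage: a tiny text with a two-word drop list
text = "The White House said the budget is late. The budget office agreed."
words, pairs = count_words_and_pairs(text, drop_list={"the", "is"}, window=2)
print(words.most_common(3))
print(word_entropy(words))
```

The word counts correspond to a .wrd listing, the pair counts to a .pr listing, and the entropy to the summary at the top of a .stw.csv file.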
WordLink
Advanced Options
Produces custom semantic networks by using one, two or three special input files.
A select file (.sel) and a Source Text file with the select headers embedded.
An include file (.inc)
A string replace file (.str)
8 files, each with a distinct file extension:
.net, .pr, .ptg, .stp.csv,
.stw.csv, .log, .wrd, .wtg.
WordLink advanced options are used to refine your semantic network analysis in three ways:
1) to replace a string of two or more words with a single unigram, using a string replace file (.str);
2) to restrict the analysis to a finite set of words, using an include file (.inc); and
3) to specify sub-groups for over-time analysis, using a select file (.sel). The three optional files can be used alone or in combination with one another.
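The select mechanism can be illustrated with a short sketch that groups text under @@ headers and pulls out the requested sections. It assumes each header sits on its own line and that a selection is simply a list of header labels; the real .sel file syntax is not described in this guide, so `split_by_headers` and `select` are hypothetical helpers.

```python
import re
from collections import defaultdict

# A header line is @@ followed by alphanumeric content (assumed to be alone on its line)
HEADER = re.compile(r"^@@(\w+)$")


def split_by_headers(text):
    """Group lines under the most recent @@header; text before the first header is ignored."""
    sections = defaultdict(list)
    current = None
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            current = match.group(1)
        elif current is not None:
            sections[current].append(line)
    return {label: "\n".join(lines) for label, lines in sections.items()}


def select(sections, wanted):
    """Concatenate the sections whose labels appear in `wanted` (a stand-in for a .sel entry)."""
    return "\n".join(sections[label] for label in wanted if label in sections)


# Usage: two time periods, one of which appears twice in the source text
text = "@@2008\nStocks fell.\n@@2009\nStocks rose.\n@@2008\nBanks failed."
sections = split_by_headers(text)
print(select(sections, ["2008"]))
```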
VISij
Visualization of a network.
One or more .net files.
Graphic visualization of words (nodes) and links.
If multiple .net files are loaded, an animation plays the sequence of network changes from one file to the next.
VISij enables the user to change the graphic image by zooming in and out, excluding disconnected nodes and varying the number of nodes displayed as well as the link strength between nodes.
VISij reads a Pajek .net file and thus can be used for any type of network data.
The .net files can be read by UCINET’s NetDraw.
Use the WORDij Conversions utilities to convert a .net file into MultiNet Node and Link .csv files for program interoperability.
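Because VISij reads standard Pajek .net files, any network can be visualized once it is written in that format. The sketch below serializes a dictionary of word pairs and frequencies into a .net string with 1-based vertex IDs. Whether WordLink writes an `*Arcs` (directed) or `*Edges` (undirected) section is an assumption here, exposed as the hypothetical `directed` flag.

```python
def to_pajek(pair_counts, directed=True):
    """Serialize {(word1, word2): frequency} as the text of a Pajek .net file."""
    ids = {}
    for w1, w2 in pair_counts:
        for word in (w1, w2):
            if word not in ids:
                ids[word] = len(ids) + 1  # Pajek vertex IDs start at 1
    lines = [f"*Vertices {len(ids)}"]
    lines += [f'{i} "{word}"' for word, i in ids.items()]
    lines.append("*Arcs" if directed else "*Edges")
    # Each link line: source ID, target ID, link strength (pair frequency)
    lines += [f"{ids[a]} {ids[b]} {freq}" for (a, b), freq in pair_counts.items()]
    return "\n".join(lines) + "\n"


# Usage: write a two-link network to disk for VISij, NetDraw, or Pajek
net_text = to_pajek({("white_house", "budget"): 2, ("budget", "office"): 1})
with open("sample.net", "w", encoding="utf-8") as handle:
    handle.write(net_text)
```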
QAPNet
QAP is an overall measure of the similarity of two whole networks using a correlation coefficient.
The two .pr files you wish to compare.
Note: The order of the files does not matter.
Permutations default is 100.
A correlation value that ranges from -1.00 a perfect negative correlation to +1.00 a perfect positive correlation.
You may leave the Permutations value at the default of 100, which generates 100 random permutations of the networks against which the probability (significance) of the observed correlation is assessed.
For most pairs of files the correlation is rarely large.
Nevertheless, a low overall correlation does not rule out significant differences at the word and word pair level that are large and perhaps substantively important to you. These are revealed by the Z Utilities.
OptiComm
Produces messages that can be used to move two words closer together, move them further apart, or reinforce aspects of the semantic network.
A .wtg file and a .ptg file.
A seed word and
a target word
Program defaults to producing 16 messages of alternative shortest paths
The optimal message creator, OptiComm, traces all shortest paths between a seed word and a target word, both of which must be connected, at least indirectly, in the network.
OptiComm defaults to producing 5-word strings; you can set a longer string length. It also defaults to producing 16 messages of alternative shortest paths.
If you do not enter a target word, the target defaults to the most central word in the network.
If you want to move two words closer together and the concept is innovative, it may be best to select the shortest path of low frequency, using the output labeled “Strings with Low Average Pair Frequency,” which is listed first in the output. Our lab experiments have shown this to be most effective. The theory is that while the words are central, they are attractive because their use is less frequent in the particular language community.
If you want to move two words closer together and reinforce an already strong connection, you may want to use the shortest strings of the most frequent words, labeled “Strings with High Average Pair Frequency,” which is listed second in the output. These words are more frequently used in the language community.
To move two words further apart, select a target that is on the periphery of the word network, trying different peripheral targets until you find a desirable string that connects the seed and the remote target. Pick strings from the first section, “Strings with Low Average Pair Frequency,” listed first in the output.
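A simplified model of OptiComm's search can be written as a breadth-first search that records every shortest path between the seed and target words, then ranks the paths by average pair frequency, low first and high first. The sketch treats the network as undirected and uses the path itself as the message string; OptiComm's actual handling of link direction, string length, and ties is not described here, so all function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(pair_counts):
    """Undirected adjacency map {word: {neighbor: frequency}} (direction ignored)."""
    graph = defaultdict(dict)
    for (a, b), freq in pair_counts.items():
        graph[a][b] = graph[a].get(b, 0) + freq
        graph[b][a] = graph[b].get(a, 0) + freq
    return graph


def all_shortest_paths(graph, seed, target):
    """Breadth-first search that keeps every predecessor lying on a shortest path."""
    dist = {seed: 0}
    preds = defaultdict(list)
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, {}):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)
    if target not in dist:
        return []  # seed and target are not connected
    paths = []

    def walk(node, suffix):
        # Follow predecessors back from the target until the seed is reached
        if node == seed:
            paths.append([seed] + suffix)
            return
        for prev in preds[node]:
            walk(prev, [node] + suffix)

    walk(target, [])
    return paths


def average_pair_frequency(graph, path):
    freqs = [graph[a][b] for a, b in zip(path, path[1:])]
    return sum(freqs) / len(freqs) if freqs else 0.0


def ranked_messages(pair_counts, seed, target, limit=16):
    """Return (low-frequency-first, high-frequency-first) lists of shortest paths."""
    graph = build_graph(pair_counts)
    paths = all_shortest_paths(graph, seed, target)
    low_first = sorted(paths, key=lambda p: average_pair_frequency(graph, p))
    return low_first[:limit], low_first[::-1][:limit]
```

In this model, a path through rarely paired words (the first list) corresponds to the “Strings with Low Average Pair Frequency” section, and a path through frequently paired words to the “High” section.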
Z Utilities
Z-Utilities allow you to compare two text files and determine which differences between them are significant, for words, word pairs, or the pairs from NodeTric .net files.
For Word comparison between two files use .wrd files
For Word Pair comparison between two files use .pr files
For NodeTric Net comparison between two files use NodeTric .net files
Fixed format text file with 7 columns or variables:
Column 1 (A) is either a Word or Word Pair.
Column 2 (B) is Group 1 Frequency Count.
Column 3 (C) is Group 2 Frequency Count.
Column 4 (D) is Group 1 Relative Proportion.
Column 5 (E) is Group 2 Relative Proportion.
Column 6 (F) is the Z-Score.
Column 7 (G) is the Chi-Square.
The Z-utilities determine what words and word pairs are new and growing, what is old and declining, and what is remaining the same in relative frequencies (proportions).
Although these are called Z-Utilities, there are actually two significance tests for comparisons of words, pairs, and the pairs output of the NodeTric .net files. One is the Z-test for proportions (relative frequencies) and the other is the Chi-Square test of differences in counts. The Z-test cannot produce a value when a word or pair has a frequency of zero in one of the files, so a very small constant is entered in place of zero.
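The two tests can be sketched for a single word or pair counted in two files. The sketch uses the standard pooled Z-test for two proportions and the Pearson Chi-Square for the 2 x 2 table of occurrences versus non-occurrences, without continuity correction. The exact formulas and the small constant used by the Z-Utilities are not given in this guide, so both choices are assumptions. For this pair of formulas, z squared equals Chi-Square, which is why a z of 1.96 corresponds to a Chi-Square of about 3.84.

```python
import math


def z_two_proportions(x1, n1, x2, n2):
    """Pooled z-test for the difference between proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson Chi-Square (no continuity correction) for occurrences vs. non-occurrences.

    Requires at least one occurrence across both files, or the expected counts are zero.
    """
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    total = n1 + n2
    row = [n1, n2]
    col = [x1 + x2, total - x1 - x2]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Usage: a word seen 30 times in a 10,000-word file and 10 times in an 8,000-word file
z = z_two_proportions(30, 10_000, 10, 8_000)
chi2 = chi_square_2x2(30, 10_000, 10, 8_000)
print(round(z, 2), round(chi2, 2))
```

In this example z is about 2.48 and Chi-Square about 6.14, so the word's higher relative frequency in the first file is significant at p < .05 by both tests.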
The critical z values for a difference between two proportions are:
p < .05: 1.645 (one-tailed) or 1.960 (two-tailed)
p < .01: 2.326 (one-tailed) or 2.576 (two-tailed)
p < .001: 3.090 (one-tailed) or 3.291 (two-tailed)
Some analysts may prefer the Chi-Square test because it is a nonparametric test, whereas the Z-test relies on a normal approximation. Nevertheless, if the number of occurrences in one or both of the files is fewer than 5, Chi-Square statistics should not be used because the estimates are unreliable.
For 1 degree of freedom (a 2 x 2 comparison of counts), the critical value of Chi-Square at p < .05 is 3.841. Higher values are significant at stricter levels: the critical value is 6.635 for p < .01 and 7.879 for p < .005.
See Instructions on How to format Z-Utilities output into Excel.
Conversions
Three types of conversions are possible:
1. Convert WordLink .wtg and .ptg files into MultiNet Node and Link .csv files.
2. Convert a WordLink .pr file into a NodeTric Pajek .net file.
3. Convert a Pajek .net file into MultiNet Node and Link .csv files.
For NODConvert the input file is a .wtg file.
For LINConvert the input file is a .ptg file.
For NodeTric the input file is a .pr file.
For NetToCSV a .net file is used for both the Nodes and the Links.
NODConvert produces a MultiNet Node .csv file.
LINConvert produces a MultiNet Link .csv file.
NodeTric produces a .net file.
NetToCSV Nodes produces a MultiNet Nodes .csv file.
NetToCSV Links produces a MultiNet Links .csv file.
Note: MultiNet requires the pair of network .csv files to have identical names except for the first three letters. The node file must start with “NOD” and the link file must start with “LIN”. For example: NOD_Sample.csv and LIN_Sample.csv.
MultiNet can visualize the network in eigenspace and can run Negopy to determine groups and roles. Pajek and UCINET can read and draw .net files.
Utilities
The Utilities program has two applications:
1. A Proper Nouns extraction program, which creates a list of proper nouns and a String Replace file (.str).
2. A TimeSegs program, which creates a new WordLink text file with embedded time-stamp headers and a Select file (.sel).
The Proper Nouns program requires a text file as input.
TimeSegs requires a text file from Lexis/Nexis or NewsBank as input, plus 4 parameter settings:
1. Start Date (YYYYMMDD)
2. End Date (YYYYMMDD)
3. Period (width or slider window)
4. Timeframe (days, weeks, months, years)
The Proper Nouns program produces two files:
1. An output list file of all the proper nouns.
2. A String Replace file (.str).
TimeSegs outputs two files:
1. The original text with embedded time-stamp headers.
2. A Select list file (.sel) based on the four parameter settings.
See WordLink Advanced Options to use the String Replace file (.str), the Select file (.sel), and the text file with the select time-stamp headers to refine your network analysis.
WordLink 8 Output File Types Summary.
File Types
Description
Comments
.net
A .net file is a network file in Pajek format. This format is read by the VISij program to visualize a network.
A .net file can be imported into UCINET for statistical analysis of the network and also used in NetDraw to generate a graphic of the semantic network. A .net file can be converted to MultiNet Node and Link .csv files using WORDij’s Conversions NetToCSV program.
.pr
A .pr file contains a word pair listing. The file contains three columns: Word 1, Word 2 and the Frequency.
The pair file contains the actual words.
.ptg
A .ptg file is like the .pr file except that it contains IDs rather than words. The file contains three columns: the ID for Word 1, the ID for Word 2 and the Frequency.
.stp.csv
A .stp.csv file contains the number of pairs, number of unique pairs, average pair frequency, and pair entropy. Then there are five columns listing the pair, frequency, proportion, entropy term, and mutual information.
.stw.csv
A .stw.csv file contains the number of words, number of unique words, average word frequency, and word entropy. Then there are four columns listing the word, frequency, proportion, and entropy term.
.log
A .log file is a summary of the run settings.
.wrd
A .wrd file is an alphabetical listing of the words with a frequency count of their occurrence. The file contains two columns: Word and Frequency.
.wtg
A .wtg file is an alphabetical listing of the words, each with a unique ID number and a frequency count of its occurrence. The file contains three columns: Word, ID number and Frequency.
WordLink Input File Types
File Types
Description
Comments
.txt
A text file.
WordLink reads only text files. You must convert Word and other document files into text format before using this program. If you have a large number of files, consider a batch utility such as http://www.ultrashareware.com/Ultra-PDF-To-Text-Converter.htm or http://www.processtext.com/abctxt.html
See How to convert Word documents into text format.doc
See How do to append multiple text files together.doc
For large jobs consider http://www.processtext.com/abcmerge.html
Drop.lst
A droplist file is what many information retrieval people call the “stop word” list. It contains words to exclude from the analysis.
We have created a short droplist.txt file that contains mainly prepositions, pronouns, and other function words, as well as format words used in Lexis/Nexis. A Drop List has no size limit. Edit your droplist.txt file to fit the context of your analysis. For example, in marketing, pronoun use is indicative of how close the respondent feels to the product or service, so one would want to include rather than drop pronouns in this case.
.sel
A select file is like a batch file or macro that systematically selects marked sections of text and places them in separate files for analysis. The sections are marked with headers that begin with @@ followed by any alphanumeric content.
A select file is used in WordLink’s Advanced Options.
WORDij Utilities TimeSegs generates a select file (.sel).
A select file (.sel) is often used with an include file (.inc) and/or a string replace file (.str).
.inc
An include list file defines the only words that are included in the analysis. It is the opposite of the drop list file.
An include list file (.inc) is used in WordLink’s Advanced Options.
An include list file (.inc) is often used with a string replace file (.str) and/or a select file (.sel).
.str
A string replace file indicates which word strings are to be converted to a single word, or unigram. For example, White House would be treated as White_House. Strings of more than two words can also be converted, as in United_States_of_America. Each string after an arrow (->) is treated as a unigram, equivalent to a single word in WordLink.
A string replace file (.str) is used in WordLink’s Advanced Options.
A string replace file (.str) is used to refine semantic network maps.
WORDij Utilities ProperNouns generates a string replace file.
A string replace file (.str) is often used with an include file (.inc) and/or a select file (.sel).
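Applying a string replace file can be sketched as follows. The sketch assumes one rule per line in the form `White House -> White_House`, case-insensitive whole-word matching, and longer strings applied before shorter ones; these parsing rules are assumptions, not WordLink’s documented behavior, and both function names are hypothetical.

```python
import re


def load_string_replacements(lines):
    """Parse rules of the assumed form 'White House -> White_House', one per line."""
    rules = []
    for line in lines:
        if "->" in line:
            source, target = (part.strip() for part in line.split("->", 1))
            if source and target:
                rules.append((source, target))
    # Longer strings first, so 'United States of America' is not split by 'United States'
    rules.sort(key=lambda rule: len(rule[0].split()), reverse=True)
    return rules


def apply_string_replacements(text, rules):
    for source, target in rules:
        # Whole-word, case-insensitive match that tolerates any run of whitespace
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in source.split()) + r"\b"
        # A replacement function inserts the target literally (no backslash escapes)
        text = re.sub(pattern, lambda _match: target, text, flags=re.IGNORECASE)
    return text


# Usage
rules = load_string_replacements([
    "United States -> United_States",
    "United States of America -> United_States_of_America",
    "White House -> White_House",
])
print(apply_string_replacements("The White House and the United States of America", rules))
```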
WORDij Limitations
Program and/or Input
Comments
WordLink input text files
There is no file size limit.
However, be careful to have enough available hard drive space, because each WordLink run generates a set of 8 output files. Very large input files can consume available hard drive space very quickly.
OptiComm
If “good” and “bad” are closely linked and you want to emphasize one or the other, avoid having both words in the optimal message: consider setting strings to 7 to 10 words long and picking the longest shortest-path string that has the most linguistic validity and does not include the wrong sentiment word.
VISij
VISij currently does not export to a graphic file format, so screen shots (for static graphics) or screen recordings (for animations) are required.
QAPNet
Correlations are usually low and not significant, but consider also using this utility to score the similarity of several pairs of networks: the relative differences among the correlations indicate which networks are more similar.
Z Utilities
Even when the files being compared differ in size, the use of proportions (relative frequencies) removes the effect of file size.
Conversions
NodeTric is very useful when you have text that may be irrelevant, such as large volumes of news stories that mention your concept but discuss many unrelated things. NodeTric zeros in on your key concept and the words connected to it within 3 link steps, or within whatever number of steps you set as the parameter. If your key term is not a single word (a unigram) but a bigram, trigram, etc., use the string replace Advanced Option of WordLink, making a string replace file (.str) that changes your multigram to a unigram.
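NodeTric’s focus on a key concept can be approximated as a k-step neighborhood search. The sketch below keeps every pair whose two words both lie within `steps` links of the key word, treating links as undirected. Whether NodeTric keeps all such pairs or only the paths radiating from the key word is an assumption, and `k_step_neighborhood` is a hypothetical name.

```python
from collections import defaultdict, deque


def k_step_neighborhood(pair_counts, key, steps=3):
    """Return the pairs whose words both lie within `steps` links of `key`."""
    neighbors = defaultdict(set)
    for a, b in pair_counts:
        neighbors[a].add(b)
        neighbors[b].add(a)
    depth = {key: 0}
    queue = deque([key])
    while queue:
        node = queue.popleft()
        if depth[node] == steps:
            continue  # do not expand beyond the step limit
        for nxt in neighbors[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return {pair: freq for pair, freq in pair_counts.items()
            if pair[0] in depth and pair[1] in depth}


# Usage: the unrelated "sports" pair and the distant "refund" link are dropped
pairs = {("climate", "change"): 5, ("change", "policy"): 3, ("policy", "tax"): 2,
         ("tax", "refund"): 1, ("sports", "score"): 4}
print(k_step_neighborhood(pairs, "climate", steps=2))
```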
Utilities
Proper Nouns not only lists proper nouns such as places, organizations, people, and programs, but also produces an automatic string replace file (.str) that you can edit to produce unigrams from multigram proper noun terms.
TimeSegs currently processes dates in Lexis/Nexis or NewsBank format only.
Notes:
WORDij 3.0 is written in C++ (only the GUI is in Java) and consequently is very fast. For example, a 1 MB file can be processed in about 15 seconds on a 2 GHz dual-processor machine with 3 GB of RAM, while a 650 MB file processes in 5 minutes.
WORDij does not install any items in the Windows Registry keys.
To uninstall WORDij just delete the folder.
For support contact: Jim Danowski at jimd@uic.edu
[Figure: WORDij workflow diagram linking the programs (WordLink, WordLink Advanced Options, OptiComm, VISij, QAPNet, Z-Utilities, Conversions, Utilities), their input files, their output files, and the external programs UCINET, Pajek, MultiNet, and Negopy.]