Introduction Text analytics NVivo Case study Case study II
CORPFIN 2503 – Business Data Analytics: Text analytics
Week 11: October 18th, 2021
£ius CORPFIN 2503, Week 11 1/50
Introduction
Text analytics
NVivo
Case study
Case study II
Introduction
In the previous lectures, we analyzed the information from tables, i.e., numerical data (e.g., GDP growth).
However, there is a lot of potentially useful information in other formats such as text:
• financial statements
• various reports
• e-mails and other written communications
• social media: Facebook, Twitter
• customer surveys
• etc.
Around 80% of all business information is text.
Various text analytics techniques can be used to analyze texts.
Content analysis (and maybe text analytics) was first used by T. C. Mendenhall in 1887.
Text analytics: Steps
Text analytics can help extract meanings, patterns, relations, and structure from the text.
Key steps of text analytics:
1. Download documents
2. Analyze documents
3. Quantitative analysis using the results from text analysis (optional)
4. Draw conclusions and make recommendations.
Text analytics: Types
• Sentiment analysis
• Content analysis
• Cluster analysis
Sentiment analysis
Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in the text.
The key point is to determine the writer’s attitude with respect to a particular topic:
• positive
• negative, or
• neutral.
Bloomberg and Eikon use text analytics to identify a news story or tweet as being relevant for an individual stock and to assign a sentiment score to each story or tweet.
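The idea of assigning a sentiment label can be sketched with a simple lexicon-based counter. This is only an illustration, not how Bloomberg or Eikon actually score stories; the word lists and the function name `classify_sentiment` are assumptions made up for this sketch.

```python
import re

# Hypothetical mini-lexicons; real tools rely on much larger, curated word lists.
POSITIVE = {"good", "great", "strong", "growth", "profit", "beat"}
NEGATIVE = {"bad", "weak", "loss", "decline", "miss", "lawsuit"}


def classify_sentiment(text: str) -> str:
    """Label a text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("Strong profit growth this quarter"))
print(classify_sentiment("Shares decline after lawsuit"))
print(classify_sentiment("The annual meeting is on Friday"))
```

A word-counting approach like this ignores negation and context ("not good"), which is one reason commercial tools use more elaborate methods.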
Sentiment analysis II
Sentiment analysis can also detect 8 emotions:
1. anger
2. anticipation
3. disgust
4. fear
5. joy
6. sadness
7. surprise
8. trust.
Sentiment analysis: ‘s Social Media Monitor:
Content analysis
Content analysis identifies the presence of certain words or phrases in the text.
One analyzes the presence and relations of such words and then makes inferences about the messages in the text.
One can compute:
• total number of words
• average length of words
• frequency of words
• the number of unique words
• etc.
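The quantities listed above can be computed in a few lines. A minimal sketch, assuming words are simply runs of letters (so numbers and punctuation are ignored); the function name `content_stats` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def content_stats(text: str) -> dict:
    """Compute simple content-analysis statistics for a text."""
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(words)
    average_length = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "total_words": len(words),
        "average_word_length": average_length,
        "unique_words": len(freq),
        "frequencies": freq,
    }


# Illustrative sentence, not taken from any real report.
sample = "Risk management reduces risk. Management reports risk."
stats = content_stats(sample)
print(stats["total_words"], stats["unique_words"])
print(stats["frequencies"].most_common(2))
```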
Content analysis II
According to Berelson (1952), content analysis can be used to:
• reveal systematic differences among several documents
• detect the presence of propaganda
• identify the intentions or focus of a writer
• identify the psychological or emotional state of a writer.
Cluster analysis
Cluster analysis helps get insights from the text.
Cluster analysis groups similar phrases, words, reviews etc. into clusters.
E.g., customer reviews can be grouped into several clusters:
• positive
• negative, and
• neutral.
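Grouping similar reviews can be sketched with word overlap. The example below uses a greedy threshold rule on Jaccard similarity, which is one simple technique chosen for illustration and not necessarily what any particular software does; the reviews, the threshold, and the function names are assumptions.

```python
import re


def tokens(text):
    """Return the set of distinct lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words two sets have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.3):
    """Greedy clustering: a text joins the first cluster whose seed text is similar enough."""
    clusters = []  # each entry is (seed word set, list of member texts)
    for text in texts:
        words = tokens(text)
        for seed, members in clusters:
            if jaccard(seed, words) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((words, [text]))
    return [members for _, members in clusters]


# Made-up reviews for illustration.
reviews = [
    "great room and great staff",
    "great staff and clean room",
    "noisy street and dirty bathroom",
    "dirty bathroom and noisy neighbours",
]
groups = cluster(reviews)
for group in groups:
    print(group)
```

The clusters are found from the wording alone; labelling one group as positive and the other as negative is still left to the analyst.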
Text analytics: Challenges
Text is unstructured data, in contrast to numerical and categorical data provided in various tables.
⇒ Text is much harder to analyze: it might have 50,000 dimensions...
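The slide does not say where the 50,000 figure comes from. One common reading, assumed here, is the bag-of-words representation, in which every distinct word in the collection becomes one dimension. The three documents below are made up for illustration.

```python
import re

documents = [
    "The room was clean and quiet",
    "The staff was friendly",
    "Clean room, friendly staff, great location",
]


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


# Every distinct word in the collection is one dimension.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})


def to_vector(text):
    """Represent a text as word counts, one entry per vocabulary word."""
    words = tokenize(text)
    return [words.count(term) for term in vocabulary]


vectors = [to_vector(doc) for doc in documents]
print(len(vocabulary))  # number of dimensions for this tiny collection
print(vectors[1])
```

Even three short sentences produce a 10-dimensional table, most of it zeros; a realistic collection grows to tens of thousands of columns.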
Text analytics: Challenges II
The texts can have different formats and layouts (especially websites).
The texts’ length can also differ: novels vs. tweets.
The same text can be written in more than one language.
Texts may be expressed in an informal style, contain various errors, and have unusual grammatical constructs.
Text analytics: Challenges III
Many techniques and methods are very complicated:
• artificial neural networks
• machine learning
They are beyond the scope of this course.
Text analytics: Tools
• Manual (not recommended unless the project is very small)
• Software:
  • SAS Text Analytics, SAS Text Miner
  • Matlab (Banchs, `Text Mining with MATLAB’, 2013)
  • etc.
NVivo is qualitative data analysis software.
NVivo helps users organize and analyze non-numerical or unstructured data.
A free licence of NVivo can be obtained from the University’s website.
We will only learn the very basics of NVivo. Overall, it is a fairly complex package.
Analysis with NVivo
Suppose we would like to analyze IBM’s 2017 annual report.
We want to generate:
Word frequency table: shows the absolute and relative word frequency.
Word cloud: visualization of the word frequency table.
Tree map: a diagram created from a word frequency count; the size of the region for a particular term is proportional to how often that word appears.
Word tree: shows the context surrounding a particular word from across your data.
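Outside NVivo, the first and last of these outputs can be approximated in a few lines. This is a rough sketch and not NVivo's own algorithm; the function names and the sample text (which is not from the IBM report) are assumptions.

```python
import re
from collections import Counter


def word_frequency_table(text, top_n=5):
    """Return (word, absolute count, relative frequency in %) for the most common words."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    if total == 0:
        return []
    return [(w, c, 100 * c / total) for w, c in Counter(words).most_common(top_n)]


def contexts(text, keyword, window=2):
    """Return the words surrounding each occurrence of keyword (raw material for a word tree)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [
        " ".join(words[max(0, i - window): i + window + 1])
        for i, w in enumerate(words)
        if w == keyword
    ]


# Made-up text for illustration.
text = "Currency risk and credit risk affect revenue. Management monitors risk."
for word, count, share in word_frequency_table(text, top_n=3):
    print(f"{word:<12}{count:>3}{share:>8.1f}%")
print(contexts(text, "risk", window=1))
```

A word cloud and a tree map are then just two ways of drawing the same frequency table.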
Word frequency table
Word cloud
A tree map
Word tree for `risk’
Case study: Hotel guest experience and satisfaction
The paper can be downloaded from the University’s library website.
The study aims to:
• explore and demonstrate the utility of big data analytics (the next lecture is about big data) to better understand important hospitality issues, namely the relation between hotel guest experience and satisfaction
• apply a text analytical approach to a large quantity of consumer reviews extracted from Expedia.com to deconstruct hotel guest experience and examine its association with satisfaction ratings.
Research questions
1. What is the nature and underlying structure of the hotel guest experience represented in customer reviews?
2. Can hotel guest experience represented in customer reviews be used to explain guest satisfaction?
Source: Xiang et al. (2015), p. 123.
Customer reviews for all hotels listed by Expedia for the 100 largest U.S. cities.
10,537 hotels and 60,648 customer reviews, in total.
Each hotel on average had approximately 6 customer reviews.
There are around 50,000 hotels in the US; thus, the sample represents >20% of the entire hotel population in the US.
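The two derived figures can be checked directly from the counts quoted above. The variable names are illustrative only.

```python
hotels = 10_537
reviews = 60_648
us_hotels = 50_000  # approximate total quoted above

reviews_per_hotel = reviews / hotels
coverage = hotels / us_hotels

print(f"{reviews_per_hotel:.2f} reviews per hotel")  # 5.76, i.e. approximately 6
print(f"{coverage:.1%} of US hotels")                # 21.1%, i.e. more than 20%
```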
Sample III
There are 6,642 unique words from all customer reviews.
Word frequencies are not uniformly distributed:
• `hotel’ (33,549 times)
• `room’ (22,213 times)
• many words with a frequency of one.
About 1.3 million word-review pairs.
One customer review contains about 22 unique words, on average.
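The average follows from the pair count, assuming a word-review pair means one distinct word appearing in one review (the slide does not define the term). With the rounded figure of 1.3 million pairs the result is only approximate.

```python
reviews = 60_648
word_review_pairs = 1_300_000  # "about 1.3 million", so this is a rounded input

unique_words_per_review = word_review_pairs / reviews
print(round(unique_words_per_review, 1))  # 21.4, in line with "about 22"
```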
Source: Xiang et al. (2015), p. 123.
Data analysis
The authors excluded:
• generic nouns such as `size’, `people’, `effort’, and `fault’ due to their lack of specificity
• generic verbs such as `need’, `want’, `like’, and `offer’ because it was assumed that the meanings of these words were already captured in the objects of these verbs
• words with high ambiguity such as `break’, `firm’, `look’, `ground’, and `line’
• words related to hotel brands such as `hilton’, `marriott’, and `ramada’ since hotel identity was already contained in the original downloaded dataset.
As part of the coding process, all possible variations of a specific word (e.g., plurals and misspellings) were manually searched and identified.
Data analysis II
The authors identified 416 `primary’ words that consumers used to describe their experiences at a specific hotel.
All variations of these 416 words were replaced with the corresponding primary word.
The 416 primary words and their variations represented roughly 40% (414,833/1,048,575) of occurrences among all 6,642 unique words.
At this stage, the sample includes 5,990 hotels: some customer reviews, and thus some hotels, did not contain any of these 416 words and were dropped.
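The substitution step can be pictured as a lookup table from each variation to its primary word. The mapping below is hypothetical (the paper's actual 416-word list is not reproduced here), and the authors did this coding manually rather than with a script.

```python
from collections import Counter

# Hypothetical variations (plurals, misspellings) mapped to primary words.
VARIANT_TO_PRIMARY = {
    "rooms": "room",
    "roms": "room",
    "beds": "bed",
    "cleen": "clean",
}
PRIMARY_WORDS = {"room", "bed", "clean"}


def normalize(tokens):
    """Replace every known variation with its primary word; leave other words unchanged."""
    return [VARIANT_TO_PRIMARY.get(t, t) for t in tokens]


# Made-up review, already split into words.
tokens = ["rooms", "were", "cleen", "and", "beds", "comfortable", "room"]
normalized = normalize(tokens)
primary_counts = Counter(t for t in normalized if t in PRIMARY_WORDS)
share = sum(primary_counts.values()) / len(normalized)
print(primary_counts, f"{share:.1%}")

# The share reported on the slide, from the two counts given there.
reported_share = 414_833 / 1_048_575
print(f"{reported_share:.1%}")  # 39.6%, i.e. roughly 40%
```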
Data analysis III
Many observations were further dropped due to low frequency of primary words and the number of reviews.
Final sample includes 529 hotels and 80 guest experience-related words.
An automated Web crawler was used to download the data.
Frequencies of these 416 words were calculated for each of the 10,537 hotel properties using the PivotTable function in Microsoft Excel.
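The authors used Excel's PivotTable for this step. An equivalent hotel-by-word count table can be sketched in Python; the hotel identifiers and review words below are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (hotel id, words in one review) records after word normalization.
reviews = [
    ("hotel_a", ["clean", "room", "friendly"]),
    ("hotel_a", ["room", "airport"]),
    ("hotel_b", ["beach", "room"]),
]

# Accumulate word counts per hotel, as a pivot table with hotels as rows would.
counts_by_hotel = defaultdict(Counter)
for hotel_id, tokens in reviews:
    counts_by_hotel[hotel_id].update(tokens)

# Columns: every word seen, in alphabetical order.
words = sorted({w for counts in counts_by_hotel.values() for w in counts})
table = {hotel: [counts[w] for w in words] for hotel, counts in counts_by_hotel.items()}

print(words)
for hotel, row in table.items():
    print(hotel, row)
```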
SPSS was used for quantitative analysis.
Source: Xiang et al. (2015), p. 125.
Analysis II
Source: Xiang et al. (2015), p. 126.
Analysis III
Factor analysis is used to generate 6 factors; that is, further reduce the number of primary words into meaningful groupings of words that would be easier to interpret.
The 6 factors consist of 34 of the final 80 words and explain 22.84% of the variance.
Analysis IV
Source: Xiang et al. (2015), p. 127.
Example of factor loadings
Suppose we have the following 2 reviews:
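1. Review #1: clean, expensive, airport, friendly. Rating = 4.
2. Review #2: expensive, distance, beach, bed. Rating = 3.
[Table with columns F1 to F6 and one row per review. Surviving entries, first row: 0.436, 0.313, = 0.123, 0.443; second row: 0.313, 0.459, = 0.772, 0. Column positions and arithmetic signs are not recoverable.]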
Analysis V
Regression analysis:
Source: Xiang et al. (2015), p. 128.
Hybrid and Deals are the most important factors associated with guest satisfaction.
Core Product, although significant, was not as important as Hybrid, Deals, and Family Friendliness.
The negative sign for Hybrid suggests that this factor, represented by the 14 guest experience-related words, connotes a negative meaning for guest satisfaction.
Case study: Mapping of hotel brands
The paper can be downloaded from the University’s library website.
The study aims to:
• improve our understanding of how brands are perceived
• create perceptual maps from the most frequent terms used in a data set collected from Expedia.com.
Research questions
1. How do online reviews reflect consumer perceptions of brands in the lodging industry?
2. How do online reviews differentiate hotel brands?
The sample is the same as in the previous case study.
Source: Krawczyk & Xiang (2016), p. 29.
XLSTAT, a statistical software tool for use within Microsoft Excel, was used to perform the statistical analysis and produce the resulting perceptual maps (see http://www.xlstat.com/en/).
Analysis II
Source: Krawczyk & Xiang (2016), p. 34.
Analysis III
Source: Krawczyk & Xiang (2016), p. 37.
Analysis IV
Hilton, Marriott, and Hyatt are fairly well clustered together, suggesting that they are perceived as being similar by consumers.
Carlson is closely associated with `air’, `pillows’, `bed’, `breakfast’, `airport’ and `shuttle’, etc.
The Marriott/Hilton/Hyatt cluster seems to be closely associated with `location’, `amenities’, `experience’, and `staff’.
Starwood is distinctly linked with `checkin’, `fitness’, `renovated’, `valet’, and `decor’.
The Wyndham brand appears to be associated with a few words about negative experiences such as `noise’, `smelled’, and `freeway’.
Required reading
Zheng Xiang, Zvi Schwartz, John H. Gerdes Jr., Muzaffer Uysal (2015). `What can big data and text analytics tell us about hotel guest experience and satisfaction?’ International Journal of Hospitality Management 44: 120-130.