IFN647 Text, Web And Media Analytics

Social Media Li  |  Professor
School of Electrical Engineering and Computer Science
Queensland University of Technology
S Block, Level 10, Room S-1024, Gardens Point Campus
ph 3138 5212 | email 

1. Social media analytics
Microblog Retrieval
Sentiment analysis
2. Social search
Searching Tags
Inferring Missing Tags
Browsing and Tag Clouds
Searching within communities
3. Filtering and recommender systems
Static filtering
Adaptive filtering
Recommender systems
Collaborative Filtering
Rating using User Clusters
Rating using Nearest Neighbors

This week, we mainly discuss problems in social media analytics, social search, and filtering and recommender systems.
We then discuss possible solutions to these problems based on the knowledge you have gained in previous lectures.

1. Social media analytics

It is defined as, “the art and science of extracting valuable hidden insights from vast amounts of semi-structured and unstructured social media data (e.g., Twitter, Facebook) to enable informed and insightful decision making.” 
It is also commonly used by marketers to track online conversations about products and companies.
There are three main steps in analysing social media:
Data identification: identifying the subsets of available data to focus on for analysis;
What content is of interest? In addition to the text of the content, we want to know: who wrote the text?
Where was it found, i.e., on which social media venue did it appear?
Are we interested in information from a specific locale?
When did someone say it on social media?
Data analysis; and
Information interpretation.

Social media

Web-based services that allow individuals, communities, and organisations to produce, share and engage with user-generated content.

Media platforms and technologies

e-commerce gateways;

microblogs (e.g., Tumblr, Instagram, Twitter);

social networking (e.g., LinkedIn, Facebook, MySpace);

multimedia portals (e.g., Vimeo, Twitter, Facebook, Periscope, TikTok, YouTube);

virtual worlds (e.g., Second Life);

review platforms (e.g., Tripadvisor, Foursquare); and

social gaming (e.g., World of Warcraft).

Microblog Retrieval

Different types of microblogging technologies are available within social media to help achieve goals
Twitter is a microblogging service introduced in March 2006. With over 125 million daily active users, Twitter is ranked among the most popular social media platforms.
The platform allows everyone to create and share information and ideas in real-time.
“Tweet” is a term that refers to a short text message that a Twitter user can produce.
This short plain text (tweet) can also include videos, photographs, and website URLs.
Originally, Twitter allowed 140 characters for a plain-text message; in November 2017, the limit was expanded to 280 characters.
The mention feature (“@” followed by a username) can be used to address a specific person for interaction, e.g., to build a user network.
The hashtag feature can also be used to annotate messages: the “#” character is prefixed to a word written without spaces, e.g., enabling hashtag-based search.

Example 1: tweets in JSON format
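As a minimal sketch of working with such data, the snippet below parses one hypothetical tweet and pulls out the who/when/where/what attributes listed under data identification. The field names (`created_at`, `text`, `user.screen_name`, `user.location`, `entities.hashtags`, `entities.user_mentions`) are an assumption, loosely modelled on Twitter API v1.1 tweet objects; the tweet itself is invented.

```python
import json

# Hypothetical tweet in a simplified JSON layout (field names are assumptions,
# loosely following Twitter API v1.1 tweet objects).
RAW_TWEET = """
{
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "text": "Loving the new camera zoom! @alice #photography #nikon",
  "user": {"screen_name": "bob", "location": "Brisbane"},
  "entities": {
    "hashtags": [{"text": "photography"}, {"text": "nikon"}],
    "user_mentions": [{"screen_name": "alice"}]
  }
}
"""

def summarise_tweet(raw):
    """Extract who wrote a tweet, when, where (self-reported profile location), and what."""
    tweet = json.loads(raw)
    entities = tweet.get("entities", {})
    return {
        "who": tweet["user"]["screen_name"],
        "when": tweet["created_at"],
        "where": tweet["user"].get("location"),  # user's profile location, not a geotag
        "text": tweet["text"],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
    }

print(summarise_tweet(RAW_TWEET))
```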

Query likelihood
It uses Bayesian (Dirichlet prior) smoothing and takes both the document length and the query length into account.

c(w, d) is the word count in the given document d,
c(w, Q) is the word count in the given query Q,
|d| and |Q| are the respective lengths (size) of the document and query
P(w|C) is the probability of the word in the collection that is used to normalise the model.
μ is the smoothing parameter
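A common rank-equivalent form of Dirichlet-smoothed query likelihood, written with exactly the variables above, is score(Q, d) = Σ_{w ∈ Q ∩ d} c(w, Q) · log(1 + c(w, d) / (μ · P(w|C))) + |Q| · log(μ / (|d| + μ)). The sketch below assumes this form; the default μ = 2000, the collection probabilities, and the two tweets are hypothetical.

```python
import math
from collections import Counter

def dirichlet_ql_score(query_terms, doc_terms, p_collection, mu=2000.0):
    """Rank-equivalent Dirichlet-smoothed query likelihood of document d for query Q.

    p_collection must give P(w|C) > 0 for every query word.
    """
    c_q = Counter(query_terms)  # c(w, Q)
    c_d = Counter(doc_terms)    # c(w, d)
    doc_len, query_len = len(doc_terms), len(query_terms)  # |d| and |Q|
    score = 0.0
    for w, cwq in c_q.items():
        if c_d[w] > 0:  # words absent from d contribute log(1) = 0
            score += cwq * math.log(1 + c_d[w] / (mu * p_collection[w]))
    score += query_len * math.log(mu / (doc_len + mu))  # length normalisation term
    return score

p_c = {"fish": 0.001, "bowl": 0.0005}  # hypothetical P(w|C)
tweets = {
    "t1": "my new fish bowl".split(),
    "t2": "fish and chips tonight with friends".split(),
}
query = "fish bowl".split()
ranked = sorted(tweets, key=lambda t: dirichlet_ql_score(query, tweets[t], p_c), reverse=True)
print(ranked)
```

Here "t1" ranks first because it matches both query words. μ = 2000 is a common default for longer documents; for very short texts such as tweets it is worth tuning.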

Sentiment Analysis

Sentiment analysis (opinion mining) discovers users’ opinions about products or services in online reviews or feedback, tracks trends in public mood, and is also applied to the analysis of clinical records.
It is widely applied to voice-of-the-customer materials, such as reviews and survey responses, online and social media, and healthcare materials, for applications that range from marketing to customer service to clinical medicine.
An opinion can be represented as a tuple of (Entity, Aspect, Orientation, Opinion Holder, Time).
The entity is the target of the opinion, for example, a product.
An aspect can be a feature, component or function of the entity.
The orientation is the opinion (e.g., positive or negative) expressed about the entity and/or aspect by the opinion holder at a specific time.
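The opinion tuple maps naturally onto a small record type. This is a minimal sketch; the class name, the string encodings of orientation and time, and the sample opinions are hypothetical. Counting (aspect, orientation) pairs gives a simple aspect-level opinion summary.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    """One opinion as an (entity, aspect, orientation, holder, time) tuple."""
    entity: str       # e.g., a product
    aspect: str       # feature, component or function of the entity
    orientation: str  # "positive", "negative" or "neutral"
    holder: str       # who expressed the opinion
    time: str         # when it was expressed (ISO 8601 date here)

def aspect_summary(opinions):
    """Count how often each aspect receives each orientation."""
    return Counter((o.aspect, o.orientation) for o in opinions)

opinions = [
    Opinion("camera", "zoom", "positive", "bob", "2018-10-10"),
    Opinion("camera", "zoom", "negative", "carol", "2018-10-11"),
    Opinion("camera", "touch screen", "positive", "dave", "2018-10-12"),
    Opinion("camera", "zoom", "positive", "erin", "2018-10-12"),
]
print(aspect_summary(opinions))
```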

Tasks in Sentiment Analysis

Polarity classification

Classify the opinion expressed in a document, a sentence, or about an entity feature/aspect as positive, negative or neutral.
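One simple approach to polarity classification is lexicon-based: sum the orientation scores of the opinion words in the text. The sketch below assumes a tiny hypothetical lexicon and ignores negation ("not good"), intensifiers, and sarcasm, which practical systems must handle.

```python
import re

# Hypothetical mini opinion lexicon: word -> orientation (+1 positive, -1 negative).
LEXICON = {"good": 1, "great": 1, "amazing": 1, "bad": -1, "poor": -1, "terrible": -1}

def polarity(text, lexicon=LEXICON):
    """Classify text as positive, negative or neutral by summing lexicon scores."""
    score = sum(lexicon.get(tok, 0) for tok in re.findall(r"[a-z]+", text.lower()))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("The zoom is great"), polarity("software is absolutely terrible"))
```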

Subjectivity classification

Its goal is to separate subjective from objective information, a binary classification task.

It is regarded as a prerequisite to sentiment polarity classification. It may be tackled at different levels of granularity. For instance,

At the document level the aim is to distinguish review-like documents from non-review documents or factual newspaper articles from editorial comments.

On a more fine-grained level, the task is to identify individual text passages (e.g., sentences) as being subjective or objective.

Emotion classification

The goal is to classify a piece of text according to a predefined set of basic emotions.

It tries to identify more fine-grained differences in the expression of sentiment, e.g., six “basic” emotions – anger, disgust, fear, happiness, sadness, and surprise.

Tasks in Sentiment Analysis cont.

Source detection

It aims to identify the person, organisation, or more generally, the entity that is the source of subjective information, including named entity recognition and relationship extraction.

It is an information extraction task.

A typical application for sentiment source detection is a multi-perspective question answering system that tries to answer questions of the form: “What is X’s viewpoint/opinion on topic Y?”

Target detection

The goal of sentiment target detection is to determine the subject of a sentiment expression. For example,

Which blogs report positively and which negatively on the topic of settlement policy?

It is a problem of information retrieval, where sentences or documents are classified or ranked according to their relevance towards a given topic or a question.

E.g., Bidirectional Encoder Representations from Transformers (BERT), released by Google https://arxiv.org/abs/1810.04805

Extraction of Opinion Sentences
Aspects are nouns and/or noun phrases, for example, “face recognition”, “zoom”, and “touch screen” are aspects of the product “camera”.
Opinion words are mostly adjectives; typically, the opinion word for an aspect is the adjective closest to it in the sentence. An opinion lexicon can be used to identify and extract opinion words along with their orientation.
Extraction of opinions:
Build a list of aspects from two sources: product specifications and word synonyms. Product specifications are a list provided by the manufacturer for each product, while synonyms are matching words taken from the WordNet dictionary.
Identify the aspects and opinions in sentences. You may need to group aspects based on frequency and synonyms.
Pattern mining is applied to find frequent tag sets, i.e., sets of POS tags that occur together. A tag set is defined as frequent if it appears in more than 1% (the minimum support) of the review sentences.
For example, the tag sequence [NN][VBZ][RB][JJ], in which the aspect’s tag (NN) appears first, corresponds to the sentence “software is absolutely terrible”.
Weight each sentence by adding up its tags’ weights, then select the sentences with high scores.

Adjective, adverb and verb weights

Tag | Description | Weight
JJ | Adjective | 1
JJR | Comparative adjective | 2
JJS | Superlative adjective | 3
RB | Adverb | 1
RBR | Comparative adverb | 2
RBS | Superlative adverb | 3

Verb category | Orientation | Verbs | Comments
Tell verbs | Positive | tell | Positively reinforce an opinion
Chitchat verbs | Positive | argue, chatter, gab | Positively reinforce opinion is being
Advise verbs | Positive | advise, instruct | Positively reinforce an opinion
Advise verbs | Negative | admonish, caution, warn | Negatively reinforce the degree of certainty about a given opinion
Verbs are weighted by category: if the sentence contains a verb from a positive category, 1 is added to its weight; if it contains a verb from a negative category, 1 is subtracted from the total weight.
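The weighting scheme can be sketched directly from the two tables. Assumptions: sentences arrive as (word, Penn Treebank tag) pairs, verbs are already in base form (so "warns" would need lemmatising to "warn" first), and the sample sentences other than "software is absolutely terrible" are hypothetical.

```python
# Tag weights and verb categories from the tables above.
TAG_WEIGHTS = {"JJ": 1, "JJR": 2, "JJS": 3, "RB": 1, "RBR": 2, "RBS": 3}
POSITIVE_VERBS = {"tell", "argue", "chatter", "gab", "advise", "instruct"}
NEGATIVE_VERBS = {"admonish", "caution", "warn"}

def sentence_weight(tagged):
    """Sum adjective/adverb tag weights, then add 1 per positive and subtract 1 per negative verb."""
    weight = 0
    for word, tag in tagged:
        weight += TAG_WEIGHTS.get(tag, 0)
        if tag.startswith("VB"):  # any verb tag; word assumed to be in base form
            if word.lower() in POSITIVE_VERBS:
                weight += 1
            elif word.lower() in NEGATIVE_VERBS:
                weight -= 1
    return weight

def top_sentences(tagged_sentences, k=1):
    """Select the k highest-weighted sentences."""
    return sorted(tagged_sentences, key=sentence_weight, reverse=True)[:k]

s1 = [("software", "NN"), ("is", "VBZ"), ("absolutely", "RB"), ("terrible", "JJ")]
s2 = [("I", "PRP"), ("warn", "VBP"), ("about", "IN"), ("the", "DT"), ("worst", "JJS"), ("zoom", "NN")]
s3 = [("reviewers", "NNS"), ("advise", "VBP"), ("the", "DT"), ("best", "JJS"), ("settings", "NNS")]
print([sentence_weight(s) for s in (s1, s2, s3)])
```

Here s1 scores 2 (RB + JJ), s2 scores 3 − 1 = 2, and s3 scores 3 + 1 = 4, so s3 would be selected first.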

2. Social Search

Social search

Communities of users actively participating in the search process

Goes beyond classical search tasks

Key differences

Users interact with the system

Users interact with other users either implicitly or explicitly

Social search includes, but is not limited to, the so-called social media sites
Collectively referred to as “Web 2.0” (social Web) as opposed to the classical notion of the Web (“Web 1.0”)
Social media sites
User generated content
Users can tag their own and others’ content
Users can share favorites, tags, etc., with others
Digg, Twitter, Flickr, YouTube, Del.icio.us, CiteULike, MySpace, Facebook, and LinkedIn

Social Search Topics

Searching within communities

Document filtering

Recommender systems

Then: Library card catalogs
Indexing terms chosen with search in mind
Experts generate indexing terms
Terms are very high quality
Terms chosen from controlled vocabulary
Now: Social media tagging
Tags not always chosen with search in mind
Users generate tags
Tags can be noisy or even incorrect
Tags chosen from folksonomies https://en.wikipedia.org/wiki/Folksonomy
A folksonomy is a classification system:
the collective assemblage of tags assigned by many users.
Goal: make effective use of public tags.

Types of User Tags
Content-based
car, woman, sky
Context-based
new york city, empire state building
nikon (type of camera), black and white (type of movie), homepage (type of web page)
Subjective
pretty, amazing, awesome
Organizational
to do, my pictures, readme

Example of  is most known for its reference manager to manage and share research papers and generate bibliographies for scholarly articles.

Searching Tags
Tags can be used to describe textual or non-textual items (e.g., images or videos) to provide a textual dimension to items.
These textual representations of items can be very useful for searching; however, tags are very sparse representations of very complex items.
Searching user tags is challenging
Most items have only a few tags
Tags are very short
Boolean, probabilistic, vector space, and language modeling approaches will fail if used naïvely
Must overcome the vocabulary mismatch problem between the query and tags. Possible ways to overcome this problem
Stemming (e.g., stem classes in week 7)
Pseudo-relevance feedback for tag expansion

One unique property of tags is that they are almost exclusively textual keywords that are used to describe textual or non-textual items. Therefore, tags can provide a textual dimension to items that do not explicitly have a simple textual representation, such as images or videos.

These textual representations of non-textual items can be very useful for searching; however, tags are very sparse representations of very complex items.

The simplest way to search a set of tagged items is to use a Boolean retrieval model. However, it may fail.

For example, the query Q = “fish, bowl” can be read as “fish AND bowl”, which returns only items tagged with both “fish” and “bowl”. This is likely to produce high-quality results but may miss many relevant items.

Thus, the approach would have high precision but low recall.
A disjunctive (OR) query, “fish OR bowl”, will match many more relevant items, but at the cost of precision.

Of course, it is highly desirable to achieve both high precision and high recall. However, doing so is very challenging.
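The precision/recall trade-off between conjunctive and disjunctive tag queries can be seen on a toy example. The tagged items and relevance judgements below are hypothetical.

```python
# Hypothetical tagged items and relevance judgements for the query "fish, bowl".
ITEMS = {
    "i1": {"fish", "bowl"},
    "i2": {"fish", "tank"},
    "i3": {"goldfish", "bowl"},
    "i4": {"bowl", "ceramic"},
    "i5": {"fish", "bowl", "goldfish"},
}
RELEVANT = {"i1", "i2", "i3", "i5"}

def boolean_search(items, query_tags, mode="AND"):
    """AND: item has every query tag; OR: item has at least one query tag."""
    q = set(query_tags)
    if mode == "AND":
        return {i for i, tags in items.items() if q <= tags}
    return {i for i, tags in items.items() if q & tags}

def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

for mode in ("AND", "OR"):
    hits = boolean_search(ITEMS, ["fish", "bowl"], mode)
    print(mode, sorted(hits), precision_recall(hits, RELEVANT))
```

The AND query retrieves i1 and i5 (precision 1.0, recall 0.5), while the OR query retrieves all five items (precision 0.8, recall 1.0).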

Tag Expansion
It uses search results (pseudo-relevance feedback) to enrich a tag representation.
It addresses the vocabulary mismatch problem by expanding the tag representation with external knowledge.
Possible external sources
Web search results
Query logs
After tags have been expanded, we can use standard retrieval models

Age of Aquariums – Tropical Fish
Huge educational aquarium site for tropical fish hobbyists, promoting responsible fish keeping internationally since 1997.

The Krib (Aquaria and Tropical Fish)
This site contains information about tropical fish aquariums, including archived usenet postings and e-mail discussions, along with new …

Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and …
Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds at AquariumFish.net.
P(w | “tropical fish” )

Example 2.
Tag Expansion Procedure
Use tag “tropical fish” as a query Q to find top-k results;
Select terms with the highest probability, e.g., terms “fish”, “tropical”, “aquariums”, “goldfish”, and “bowls”;
Q is expanded as Q’ = “fish, tropical, aquariums, goldfish, bowls”;
Search by using the enriched query Q’.
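The expansion procedure can be sketched with a simple pooled maximum-likelihood estimate of P(w | tag) over the top-k result texts. This is a simplification of relevance-model style pseudo-relevance feedback; the tokeniser, the tiny stopword list, and the alphabetical tie-breaking are assumptions. The input is the three result snippets shown above for “tropical fish”.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use a fuller list).
STOPWORDS = {"a", "about", "along", "an", "and", "at", "e", "for", "in",
             "including", "of", "since", "the", "this", "with"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def expand_tag(top_k_texts, m=5):
    """Estimate P(w | tag) from the top-k results and return the m most probable terms."""
    counts = Counter()
    for text in top_k_texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    probs = {w: c / total for w, c in counts.items()}
    # Sort by probability, breaking ties alphabetically for a deterministic result.
    ranked = sorted(probs, key=lambda w: (-probs[w], w))
    return ranked[:m], probs

results = [
    "Huge educational aquarium site for tropical fish hobbyists, promoting responsible fish keeping internationally since 1997.",
    "This site contains information about tropical fish aquariums, including archived usenet postings and e-mail discussions, along with new",
    "Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds at AquariumFish.net.",
]
expanded, probs = expand_tag(results, m=5)
print(", ".join(expanded))
```

On these three snippets the result is “fish, tropical, aquariums, keeping, site”: with so little text, generic words such as “site” rank as high as topical ones, which is one reason to use a larger k or to down-weight terms that are common across the whole collection.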

Issues in Searching Tags
Even with tag expansion, searching tags is challenging.
Tags are inherently noisy and may be incorrect.
Many items may not even be tagged!
Typically, it is easier to find popular items with many tags than less popular items with few/no tags.

Inferring Missing Tags
As we just described, items that have no tags pose a challenge to a search system.
How can we automatically tag items with few or no tags?
Uses of inferred tags
Improved tag search
Automatic tag suggestions

Methods for Inferring Tags
TF*IDF if items are textual, such as books, or news articles.

where f_{w,D} is the number of times term w (tag) occurs in item D, N is the total number of items, and df_w is the number of items that term w occurs in.
Classification
Train binary classifier for each tag (use all of the existing tag/item pairs as training data to train the classifiers, and represent an item as a feature vector)
Performs well for popular tags, but not as well for rare tags.
Maximal marginal relevance
Finds tags that are relevant to the item and novel with respect to its existing tags (i.e., not very similar to any of the other tags), where t is a tag, i is an item, and T_i is the current set of tags for item i.
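The TF*IDF and maximal marginal relevance methods can be combined: TF*IDF gives each candidate tag a relevance score for the item, and MMR greedily picks tags that score well but are not redundant with tags already present. This sketch assumes the TF*IDF variant log(f_{w,D} + 1) · log(N / df_w), the usual MMR form MMR(t) = λ · Sim_item(t, i) − (1 − λ) · max_{t_j ∈ T_i} Sim_tag(t_j, t) with λ = 0.5, and a hypothetical Sim_tag defined as the Jaccard overlap of the items containing the two terms. The toy corpus is invented.

```python
import math
from collections import Counter

def tfidf_scores(item_terms, corpus):
    """Score each term w of item D as log(f_{w,D} + 1) * log(N / df_w).

    corpus is a list of term lists and must include the item itself, so df_w >= 1.
    """
    n = len(corpus)
    scores = {}
    for w, f_wd in Counter(item_terms).items():
        df_w = sum(1 for doc in corpus if w in doc)
        scores[w] = math.log(f_wd + 1) * math.log(n / df_w)
    return scores

def jaccard_sim(corpus):
    """Hypothetical Sim_tag: Jaccard overlap of the sets of items containing each term."""
    def sim(a, b):
        with_a = {i for i, doc in enumerate(corpus) if a in doc}
        with_b = {i for i, doc in enumerate(corpus) if b in doc}
        union = with_a | with_b
        return len(with_a & with_b) / len(union) if union else 0.0
    return sim

def mmr_tags(sim_item, sim_tag, existing, k, lam=0.5):
    """Greedily pick k new tags maximising lam * Sim_item - (1 - lam) * max Sim_tag."""
    selected = list(existing)
    pool = [t for t in sim_item if t not in selected]
    chosen = []
    while pool and len(chosen) < k:
        def mmr(t):
            redundancy = max((sim_tag(s, t) for s in selected), default=0.0)
            return lam * sim_item[t] - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        chosen.append(best)
        selected.append(best)
        pool.remove(best)
    return chosen

corpus = [
    "tropical fish aquarium fish tank guide".split(),  # the item being tagged
    "goldfish bowl care guide".split(),
    "fish recipes dinner".split(),
    "aquarium lighting guide".split(),
]
scores = tfidf_scores(corpus[0], corpus)
print(mmr_tags(scores, jaccard_sim(corpus), existing=["tropical"], k=2))
```

With these toy inputs, “tank” has the same TF*IDF score as “tropical” but always co-occurs with it, so MMR penalises it and the call returns ["fish", "aquarium"].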

Browsing and Tag Clouds
Search is useful for finding items of interest
Browsing is more useful for exploring collections of tagged items
Various ways to visualize collections of tags
Tag clouds
Alphabetical order
Grouped by category
Formatted/sorted according to popularity

animals   architecture   art    australia   autumn   baby   band   barcelona   beach   berlin 
birthday   black   blackandwhite   blue  california   cameraphone   canada   canon
car cat   chicago   china   christmas   church   city   clouds   color   concert  day   dog 
england   europe  family   festival   film   florida   flower   flowers   food
france   friends   fun   garden   germany   girl   graffiti   green   halloween   hawaii
holiday   home house   india   ireland   italy   japan   july   kids  lake   landscape   light   live
london macro  me   mexico  music   nature   new   newyork   night
nikon nyc ocean   paris   park   party   people  portrait   red   river   rock
sanfrancisco scotland   sea   seattle   show   sky   snow   spain   spring   street
summer sunset taiwan texas thailand  tokyo  toronto  travel
tree   trees   trip   uk usa vacation washington   water wedding

Example Tag Cloud
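One simple way to format a tag cloud by popularity is to list tags alphabetically and map each tag’s count to a font size on a logarithmic scale, so a few very popular tags do not dwarf the rest. The log scaling, the 10 to 32 point range, and the counts are assumptions; linear or binned mappings are equally valid.

```python
import math

def tag_cloud(tag_counts, min_size=10, max_size=32):
    """Return (tag, font size) pairs in alphabetical order, sized by log(count)."""
    lo = math.log(min(tag_counts.values()))
    hi = math.log(max(tag_counts.values()))
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    cloud = []
    for tag in sorted(tag_counts):
        weight = (math.log(tag_counts[tag]) - lo) / span  # 0.0 for rarest, 1.0 for most popular
        cloud.append((tag, round(min_size + weight * (max_size - min_size))))
    return cloud

# Hypothetical tag counts.
print(tag_cloud({"wedding": 50, "beach": 400, "sunset": 200, "nyc": 100}))
```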

Searching within communities

Traditional search assumes single searcher

Collaborative search involves a group of users, with a common goal, searching together in a collaborative setting

Example scenarios

Students doing research for a history report

Family members searching for information on how to care for an aging relative

Team members working to gather information and requirements for an industrial project

An online community – Groups of entities that interact in an online environment and that share common goals, traits, or interests.

Collaborative Search
Two types of collaborative search settings depending on where participants are physically located
Co-located
Participants in same location
CoSearch system
Remote collaborative
Participants in different locations
SearchTogether system

Co-located Collaborative Searching

Remote Collaborative Searching
Collaborative Search Scenarios

Collaborative Search cont.
Challenges
How do users interact with system?
How do users interact with each other?
How is data shared?
What data persists across sessions?
Very few commercial collaborative search systems.
Likely to see more of this type of system in the future.

3. Filtering and Recommender Systems

[Figure: Static Filtering and Adaptive Filtering, each showing a document stream and profiles (Profile 1.1, Profile 2.1)]

A profile represents a user’s long-term information needs
Can be represented in different ways
Boolean or keyword query
Sets of relevant and non-relevant documents
Relational constraints
“published before 1990”
“price in the $10-$25 range”
Actual representation usually depends on underlying filtering model
Can be static (static filtering) or updated over time (adaptive filtering)

Static Filtering
Given a fixed profile, how can we determine if an incoming document should be delivered?
Treat as information retrieval problem
Vector space
Language modeling
Treat as supervised learning problem
Naïve Bayes
Support vector machines

Static Filtering with Language Models
Assume profile consists of K relevant documents (Ti), each with weight αi
The probability of a word given the profile is computed as follows (here the variable P denotes the profile language model, and a smoothing parameter is applied):

KL divergence between profile and document model is used as score:

If –KL(P||D) ≥ θ, then deliver D to P (profile)
Threshold (θ) can be optimized for some metric

Please note that the textbook’s equations do not strictly follow standard mathematical notation, because the symbol “P” is overloaded.
It denotes a probability function, but it is also used as a variable representing a profile.
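A minimal sketch of static filtering with language models, assuming the textbook-style profile model P(w|P) = (1 − λ) · (Σ_i α_i f_{w,T_i} / |T_i|) / (Σ_i α_i) + λ · c_w / |C|, a document model smoothed the same way (Jelinek-Mercer), and the score −KL(P||D) = −Σ_w P(w|P) · log(P(w|P) / P(w|D)). The symbol λ, its value 0.1, the threshold, and the toy collection are all assumptions.

```python
import math
from collections import Counter

def smoothed_model(docs, weights, coll_counts, lam=0.1):
    """(1 - lam) * weighted average of f_{w,T_i}/|T_i| + lam * c_w/|C|, over the collection vocabulary."""
    coll_len = sum(coll_counts.values())
    doc_counts = [Counter(d) for d in docs]
    norm = sum(weights)
    model = {}
    for w, c_w in coll_counts.items():
        fg = sum(a * dc[w] / len(d) for a, dc, d in zip(weights, doc_counts, docs)) / norm
        model[w] = (1 - lam) * fg + lam * c_w / coll_len
    return model

def neg_kl(p_model, d_model):
    """-KL(P || D) = -sum_w P(w|P) * log(P(w|P) / P(w|D)); always <= 0."""
    return -sum(p * math.log(p / d_model[w]) for w, p in p_model.items() if p > 0)

def should_deliver(p_model, d_model, theta):
    """Deliver document D to profile P when -KL(P || D) >= theta."""
    return neg_kl(p_model, d_model) >= theta

# Hypothetical collection, profile documents T_i with weights alpha_i, and incoming documents.
collection = Counter("fish tank tropical fish stock market price shares fish".split())
profile = smoothed_model([["tropical", "fish", "tank"], ["fish", "tank"]], [1.0, 0.5], collection)
doc_a = smoothed_model([["fish", "tank", "tropical"]], [1.0], collection)
doc_b = smoothed_model([["stock", "market", "price"]], [1.0], collection)
print(should_deliver(profile, doc_a, theta=-0.5), should_deliver(profile, doc_b, theta=-0.5))
```

With the arbitrary threshold θ = -0.5, the first document is delivered and the second is not; in practice θ would be tuned for the chosen evaluation metric.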

Adaptive Filtering

In adaptive filtering, profiles are dynamic

How can profiles change?

User can explicitly update the profile

User can provide (relevance) feedback about the documents delivered to the profile

Implicit user behavior can be captured and used to update the profile

Adaptive Filtering Models
Profiles treated as vectors (P’ is the adapted profile)

Relevance-based language models
Profiles treated as language models
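For the vector-space case, a common choice of update is Rocchio-style feedback: P’ = α · P + β · (1/|Rel|) Σ_{D ∈ Rel} D − γ · (1/|Nonrel|) Σ_{D ∈ Nonrel} D. This sketch assumes that form, sparse {term: weight} vectors, the often-cited defaults β = 0.75 and γ = 0.15, and drops terms whose weight becomes non-positive (a common but optional choice). The feedback documents are hypothetical.

```python
from collections import defaultdict

def rocchio_update(profile, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """P' = alpha*P + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    Vectors are sparse {term: weight} dicts; terms whose weight ends up <= 0 are dropped.
    """
    updated = defaultdict(float)
    for term, w in profile.items():
        updated[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:  # an empty list contributes nothing
            for term, w in doc.items():
                updated[term] += coef * w / len(docs)
    return {term: w for term, w in updated.items() if w > 0}

# Hypothetical feedback on two delivered documents.
new_profile = rocchio_update(
    {"fish": 1.0},
    relevant=[{"fish": 1.0, "tank": 2.0}],
    nonrelevant=[{"stock": 1.0, "fish": 0.5}],
    beta=0.5, gamma=0.2,
)
print(new_profile)
```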

Summary of Filtering Models

Fast Filtering with Millions of Profiles
Real filtering systems
May have thousands or even millions of profiles
Many new documents will enter the system daily
How to efficiently filter in such a system?
Most profiles are represented as text or a set of features
Build an inverted index for the profiles
Distill incoming documents as “queries” and run against index
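A minimal sketch of the inverted-index idea: profiles (hypothetical keyword sets) are indexed by term, and each incoming document is treated as a query whose terms look up candidate profiles. Counting overlapping terms and the `min_overlap` cut-off are simplifying assumptions; a real system would score the candidates with its filtering model before applying a threshold.

```python
from collections import defaultdict

def build_profile_index(profiles):
    """Inverted index: term -> set of profile ids containing that term."""
    index = defaultdict(set)
    for pid, terms in profiles.items():
        for term in terms:
            index[term].add(pid)
    return index

def match_profiles(index, doc_terms, min_overlap=1):
    """Treat the document as a query; return profiles sharing >= min_overlap terms."""
    hits = defaultdict(int)
    for term in set(doc_terms):
        for pid in index.get(term, ()):
            hits[pid] += 1
    return {pid: n for pid, n in hits.items() if n >= min_overlap}

profiles = {"p1": {"fish", "aquarium"}, "p2": {"stock", "market"}, "p3": {"fish", "recipe"}}
index = build_profile_index(profiles)
print(match_profiles(index, "tropical fish aquarium guide".split()))
```

Only profiles sharing at least one term with the document are ever touched, which is what makes this scale to millions of profiles.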

Evaluation of Filtering Systems
Definition of “good” depends on the purpose of the underlying filtering system

Generic filtering evaluation measure:

α = 2, β = 0, δ = -1, and γ = 0 is widely used
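Written out, a generic linear utility of this kind is U = α · TP + β · FN + δ · FP + γ · TN, where TP and FP count relevant and non-relevant documents delivered, and FN and TN count relevant and non-relevant documents not delivered. Assuming this assignment of coefficients to cells, the widely used setting rewards each relevant delivery with 2 and charges 1 for each non-relevant one. The counts below are hypothetical.

```python
def filtering_utility(tp, fn, fp, tn, alpha=2, beta=0, delta=-1, gamma=0):
    """Linear utility alpha*TP + beta*FN + delta*FP + gamma*TN (defaults give 2*TP - FP)."""
    return alpha * tp + beta * fn + delta * fp + gamma * tn

# Hypothetical counts for one profile: 10 relevant and 5 non-relevant documents delivered.
print(filtering_utility(tp=10, fn=3, fp=5, tn=100))  # → 15
```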

Recommender Systems

Recommender systems recommend items (e.g., products, books or movies) that a user may be interested in.
Amazon.com, Net systems use collaborative filtering to recommend items to users.

Collaborative Filtering
In static and adaptive filtering, users and their profiles are assumed to be independent of each other.
However, in the real world, similar users are likely to have similar preferences.
Collaborative filtering exploits relationships between users to improve how items (documents) are
