Week 10 Lecture Review Questions
Professor Yuefeng Li
School of Computer Science, Queensland University of Technology (QUT)
Social media analysis
Social media analysis is defined as “the art and science of extracting valuable hidden insights from vast amounts of semi-structured and unstructured social media data (e.g., Twitter, Facebook, etc.) to enable informed and insightful decision making.”
The lecture notes discussed two research fields: Microblog Retrieval and Sentiment Analysis. The models and algorithms we discussed in the previous lectures can be used here, but you should know some special characteristics of the two fields. For example, a “Tweet” is a short text message that may include hashtag features or @username features.
Question 1. Assume C is a collection of tweets. We can use the following query likelihood method to calculate the probability of a tweet d occurring for a given query Q.
$$P(Q,d) = \sum_{w \in Q,\, d} c(w,Q)\,\log\left(1 + \frac{c(w,d)}{\mu \times P(w|C)}\right) + |Q|\,\log\frac{\mu}{\mu + |d|}$$
Here |d| and |Q| are the respective lengths (sizes) of the tweet d and the query Q, and μ is the smoothing parameter. Interpret the meaning of c(w, d), c(w, Q), and P(w|C).
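As a rough illustration of how the formula in Question 1 could be evaluated, here is a minimal Python sketch. The function name `ql_score`, the representation of tweets and queries as lists of tokens, and the default value of `mu` are assumptions made for this example and do not come from the lecture notes. The sketch also assumes the tweet d belongs to the collection C, so that P(w|C) > 0 for every word in d.

```python
import math
from collections import Counter


def ql_score(query, tweet, collection, mu=100.0):
    """Score a tweet d for a query Q with the formula from Question 1.

    query, tweet: lists of tokens; collection: list of token lists (C).
    Assumes the tweet is part of the collection, so P(w|C) > 0 for its words.
    """
    c_q = Counter(query)                                    # c(w, Q)
    c_d = Counter(tweet)                                    # c(w, d)
    c_coll = Counter(w for doc in collection for w in doc)  # counts in C
    coll_len = sum(c_coll.values())                         # total words in C

    score = 0.0
    for w, cwq in c_q.items():
        cwd = c_d.get(w, 0)
        if cwd == 0:
            continue  # word not in d: log(1 + 0) = 0, contributes nothing
        p_wc = c_coll[w] / coll_len                         # P(w|C)
        score += cwq * math.log(1 + cwd / (mu * p_wc))

    # Length normalisation term: |Q| * log(mu / (mu + |d|))
    score += len(query) * math.log(mu / (mu + len(tweet)))
    return score


tweets = [
    ["qut", "social", "media"],
    ["social", "search"],
    ["hello", "world"],
]
query = ["social", "media"]
ranked = sorted(tweets, key=lambda d: ql_score(query, d, tweets), reverse=True)
```

Only words that occur in both Q and d contribute to the sum, which is why the loop skips words with c(w, d) = 0. The second term depends only on the tweet length.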
Question 2. Which of the following statements about sentiment analysis is false? Justify your answer.
(1) Sentiment analysis (opinion mining) has applications ranging from discovering users’ opinions about products or services in online reviews or feedback, to observing trends in public mood, to the analysis of clinical records.
(2) The orientation is the opinion provided about the entity and/or the aspect that was provided by the opinion holder at a specific time.
(3) The goal of emotion classification is to separate subjective from objective information, a binary classification task.
(4) Aspects are features, components or functions of the entity. They can be nouns and/or noun phrases.
(5) Polarity classification is to group the expressed opinion in a document, a sentence or an entity feature/aspect into positive, negative or neutral categories.
Social search
Social search is a term used to describe search applications that involve communities of people (users) to tag content or answer questions. It is fast becoming the key search paradigm on the web. Users can interact online in a number of ways. For example, a user might visit a social media site that has recently gained a lot of popularity.
The online world is a very social environment where users communicate with each other in various forms. These social interactions provide search systems with new and unique sources of data to exploit, as well as myriad privacy concerns.
Unlike the models we mentioned earlier, we also have a wealth of user interaction data that can help improve the overall user experience in new and interesting ways. One example is user tagging: many social media sites allow users to assign tags to items. Another is collaborative search, which involves a group of users with a common goal searching together in a collaborative environment.
Filtering and recommender systems
A filtering system has two key components. First, the long-term information needs of users must be accurately expressed. This is done by constructing a profile for every information need. Second, given a document that has just arrived in the system, a decision-making mechanism must be devised to identify which profiles are relevant to that document.
Not only must this decision-making mechanism be efficient, especially when there may be thousands of profiles, but it must also be highly accurate.
Therefore, the difficulty with a filtering system is that it should not miss relevant documents (high recall), and perhaps more importantly, it should not constantly alert the user to irrelevant documents (high precision).
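To make the two requirements concrete, the following minimal Python sketch computes precision and recall for the set of documents a filtering system delivered to one profile. The function name `precision_recall` and the document IDs are hypothetical and used only for illustration.

```python
def precision_recall(delivered, relevant):
    """Return (precision, recall) for one profile.

    delivered: IDs of documents the filter sent to the user.
    relevant:  IDs of documents that are actually relevant to the profile.
    """
    delivered, relevant = set(delivered), set(relevant)
    hits = len(delivered & relevant)  # relevant documents that were delivered
    precision = hits / len(delivered) if delivered else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents delivered, two of them relevant; one relevant document missed.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

In this example half of the alerts are irrelevant (precision 0.5) and one of the three relevant documents is missed (recall 2/3).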
Question 3. The textbook (chapter 10) shows a concrete example of static filtering using a language modeling framework. The following equation calculates a word probability distribution for the given profile made up of T1, …, Tk (the pieces of text, e.g., query descriptions, documents, or other information):
It then uses negative KL divergence between profile and document model to compute a relevance score as follows:
The above equations, as given in the textbook, do not follow standard mathematical notation because the symbol “P” is overloaded: it denotes both the profile variable and a probability function. Please rewrite the two equations using correct mathematical notation.
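The relevance score in Question 3 relies on the standard definition of KL divergence between two word distributions. The Python sketch below shows that computation only; it is not the textbook's code. The function name `neg_kl`, the dictionary representation of the two models, and the toy probabilities are assumptions for illustration. The document model is assumed to be smoothed, so every word with non-zero profile probability also has non-zero document probability.

```python
import math


def neg_kl(profile_model, doc_model):
    """Negative KL divergence between a profile model and a document model.

    Both arguments map word -> probability. The score equals
    sum_w p(w) * log q(w) - sum_w p(w) * log p(w),
    where p is the profile model and q is the (smoothed) document model.
    """
    score = 0.0
    for w, p in profile_model.items():
        if p == 0.0:
            continue  # terms with zero profile probability contribute nothing
        q = doc_model[w]  # assumed > 0 because the document model is smoothed
        score += p * (math.log(q) - math.log(p))
    return score


profile = {"social": 0.5, "search": 0.5}
close_doc = {"social": 0.5, "search": 0.5}
far_doc = {"social": 0.9, "search": 0.1}
```

The score is at most zero and reaches zero only when the two distributions agree, so documents whose model is closer to the profile receive higher scores.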
Question 4. (Recommender systems)
Collaborative filtering leverages relationships between users to improve how items (documents) are matched to users (profiles). The figure below shows a group of users and their ratings for an item. The user with the question mark above its head has not rated this item yet. The goal of the recommendation algorithm is to fill in these question marks.
Suppose your team wants to design functions to implement a collaborative filtering-based recommender system, and your task is to determine the function name and its input and output data structures. For privacy reasons, you can use only numbers or IDs to represent users, such as 1 or u1, 2 or u2, etc.; for items, you can use numbers or their names; and ratings are expressed as integers (e.g., 0 to 5).
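One possible starting point for Question 4 is sketched below in Python. It is an assumption-laden illustration, not a model answer: the function names `cosine_similarity` and `predict_rating`, the nested-dictionary rating structure, the use of cosine similarity, and the neighbourhood size `k` are all choices made for this example.

```python
import math
from typing import Dict, Optional

# Input structure: user ID -> {item name -> integer rating in 0..5}
Ratings = Dict[str, Dict[str, int]]


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two users' rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: Ratings, user: str, item: str,
                   k: int = 2) -> Optional[float]:
    """Predict the user's rating for an item from the k most similar users.

    Returns a similarity-weighted average of the neighbours' ratings,
    or None when no similar user has rated the item.
    """
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim > 0:
            neighbours.append((sim, other_ratings[item]))
    if not neighbours:
        return None
    top = sorted(neighbours, reverse=True)[:k]
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


ratings = {
    "u1": {"A": 5, "B": 4},            # u1 has not rated item C yet
    "u2": {"A": 5, "B": 4, "C": 5},
    "u3": {"A": 1, "B": 5, "C": 1},
}
prediction = predict_rating(ratings, "u1", "C")
```

Here the input is a dictionary keyed by anonymous user IDs, and the output is a single predicted rating (a float) or `None`. Other designs, such as a user-item matrix or a function that returns a ranked list of items, would also satisfy the task.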