Information Management
Data Mining
(from the book Data Mining: Concepts and Techniques)

Data mining
• What is data mining?
It is a set of techniques and tools aimed at extracting interesting patterns from data
• Data mining is part of KDD (knowledge discovery in databases)
• KDD is “the process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data”
• Data warehouses are crucial for data mining purposes

Types of data mining
• Concept description
• Characterization: concise and succinct representation of a collection of data
• Comparison: description comparing two (or more) collections of data
• Descriptive data mining
• Describes concepts or task-relevant data sets in a concise and informative way
• Predictive data mining
• Builds a model from the data and their analysis; the model is then used to predict trends and properties of unknown data
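
Characterization and comparison can be illustrated with plain aggregates. The following is a minimal sketch, not a method taken from the book: the function names `characterize` and `compare` and the two sample collections are hypothetical and chosen only for illustration.

```python
from statistics import mean

def characterize(values):
    """Characterization: summarize one collection with a few aggregates."""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }

def compare(a, b):
    """Comparison: contrast two collections through their summaries."""
    summary_a, summary_b = characterize(a), characterize(b)
    return {key: summary_a[key] - summary_b[key] for key in summary_a}

# Hypothetical purchase amounts for two customer groups (illustrative values only).
group_a = [10, 20, 30, 40]
group_b = [5, 15, 25]

print(characterize(group_a))
print(compare(group_a, group_b))
```

A predictive approach would go one step further and fit a model on such data in order to estimate values that have not been observed.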

Data mining branches (1)
• Aggregation queries are a very simple kind of mining
• Classification
• Build a model to categorize data in classes
• Regression
• Build a model to predict the result of a real-valued function
• Clustering
• Organize data into groups of similar items
• Outlier detection
• Identify unusual data items
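
To make these branches concrete, the sketch below shows one very simple technique for each of classification, regression, clustering and outlier detection. These are deliberately small stand-ins (1-nearest-neighbor, a least-squares line, one-dimensional k-means and a z-score test), not the algorithms prescribed by the slides. All function names, the threshold and the sample data are assumptions made for illustration.

```python
from statistics import mean, pstdev

def nearest_neighbor_class(point, labeled_points):
    """Classification: return the label of the closest labeled example (1-NN)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    _, label = min(labeled_points, key=lambda item: sq_dist(point, item[0]))
    return label

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    variance = sum((x - mx) ** 2 for x in xs)
    slope = covariance / variance
    return slope, my - slope * mx

def kmeans_1d(values, centers, iterations=10):
    """Clustering: Lloyd-style k-means on 1-D values from given start centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

def zscore_outliers(values, threshold=2.0):
    """Outlier detection: flag values more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical data, for illustration only.
training = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
            ((8.0, 8.0), "high"), ((9.0, 7.5), "high")]
print(nearest_neighbor_class((2.0, 1.5), training))
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))
print(zscore_outliers([10, 11, 9, 10, 12, 10, 50]))
```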

Data mining branches (2)
• Trend analysis and forecasting
• Identify changes in patterns of data over time
• Detect dependencies among data
• Identify whether attributes are correlated with each other
• Identify which attributes likely occur together
• Temporal pattern detection (or time series mining)
• Identify common patterns in time series
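
The branches above can be sketched with three small helpers: a moving average to expose a trend, the Pearson coefficient to measure correlation between two attributes, and pair counting to see which items occur together. This is a hedged illustration only; the function names and the sample transactions are hypothetical, and real systems would typically use dedicated algorithms for each task.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def moving_average(series, window):
    """Trend analysis: smooth a series so changes over time are easier to see."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def pearson(xs, ys):
    """Dependency detection: Pearson correlation coefficient of two attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def pair_support(transactions):
    """Co-occurrence: fraction of transactions that contain each pair of items."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    return {pair: count / len(transactions) for pair, count in counts.items()}

# Hypothetical shopping baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

print(moving_average([1, 2, 3, 4, 5, 6], window=3))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pair_support(baskets)[("bread", "milk")])
```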

Data mining: be careful! (1)
• Overfitting
• Identifying spurious patterns: be careful not to mistake coincidence for causality!
• May be due to analyzing too many attributes or too few data items
• Example: ask 10,000 subjects to predict the color of 10 face-down cards; 10 subjects correctly predict all 10 cards
Conclusion: 1 out of 1,000 subjects has extrasensory perception NO
• Reporting “obvious” results that do not derive from data analysis
• Example: women are more likely to have breast cancer
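
A quick calculation shows why the card experiment proves nothing. The sketch below assumes that each card has one of two equally likely colors (for example red or black) and that subjects guess independently. Neither assumption is stated in the example, and the function names are hypothetical.

```python
def expected_perfect_guessers(subjects, cards, colors=2):
    """Expected number of subjects who guess every card right by pure chance."""
    p_all_correct = (1 / colors) ** cards
    return subjects * p_all_correct

def prob_at_least_one_perfect(subjects, cards, colors=2):
    """Probability that at least one subject guesses every card right by chance."""
    p_all_correct = (1 / colors) ** cards
    return 1 - (1 - p_all_correct) ** subjects

# Assumed setting: 10,000 subjects, 10 cards, 2 equally likely colors.
print(expected_perfect_guessers(10_000, 10))
print(prob_at_least_one_perfect(10_000, 10))
```

Under these assumptions a single subject gets all 10 cards right with probability 1/1024, so roughly 10 perfect scores out of 10,000 subjects is what chance alone would produce.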

Data mining: be careful! (2)
• Confuse correlation and causation
• Data mining identifies correlated attributes, but correlation does not always imply
a causal relationship!
• Example: overweight people are more likely to drink diet soda
Conclusion: diet soda causes obesity NO
• It is necessary to correctly interpret mining results
• Data mining algorithms are not magic
• Results must be carefully analyzed to avoid drawing wrong conclusions
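
One way a strong correlation can appear without any causal link is a hidden common cause. The simulation below is a sketch on synthetic data, not an analysis of the diet soda example: the variables `z` (hidden factor), `x` and `y` are hypothetical, and the noise levels and seed are arbitrary choices.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# z is a hidden factor; x and y both depend on z but never on each other.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

raw = pearson(x, y)

# Once the hidden factor is subtracted out, only independent noise remains.
adjusted = pearson([xi - zi for xi, zi in zip(x, z)],
                   [yi - zi for yi, zi in zip(y, z)])

print(f"correlation of x and y: {raw:.2f}")
print(f"correlation after removing z: {adjusted:.2f}")
```

Here `x` and `y` are strongly correlated even though neither influences the other, and the correlation largely disappears once `z` is accounted for. A mining algorithm that only sees `x` and `y` cannot tell this situation apart from a causal one.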

Examples of application domains
• Market analysis
• Targeted marketing, customer profiling
• Determining patterns of purchases over time for suggestions
• Cross-market analysis
• Corporate analysis and risk management
• Financial planning and asset evaluation
• Resource planning
• Competition
