ECE 657A: Data Demos – Data and Knowledge Modelling and Analysis
Mark Crowley
January 18, 2017
Mark Crowley ECE 657A: Data Demos
Guess the Dataset
Answer
Shark Attacks!
Is there missing data?
Historical data? Accuracy?
Is data biased?
Comment on pie charts: this is the only acceptable use of a pie chart (a binary, high-level view) 😉
Link
http://spotfire.tibco.com/demos/shark-attacks?type=Interactive
Investigate
Why the gender difference?
Overlay economic data.
Overlay total population.
Overlay tourism popularity.
others?
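Overlaying total population turns raw attack counts into a rate, which is what makes regions of very different size comparable. A minimal sketch, assuming counts and population figures have already been aggregated per region into plain dictionaries; the function name `attacks_per_million` and all numbers are hypothetical, not taken from the Spotfire demo. Regions with no usable population figure are reported rather than silently dropped, which keeps the "missing data?" question visible.

```python
def attacks_per_million(attack_counts, population):
    """Normalize raw attack counts to attacks per 1,000,000 residents.

    attack_counts: dict mapping region -> number of recorded attacks
    population:    dict mapping region -> total population
    Returns (rates, missing): `missing` lists regions that have attack counts
    but no usable population figure, so gaps are reported, not hidden.
    """
    rates = {}
    missing = []
    for region, count in attack_counts.items():
        pop = population.get(region)
        if not pop:  # absent or zero population: a rate cannot be computed
            missing.append(region)
            continue
        # Multiply first so integer inputs divide exactly where possible.
        rates[region] = count * 1_000_000 / pop
    return rates, sorted(missing)


# Hypothetical, made-up figures purely for illustration.
counts = {"Region A": 30, "Region B": 5, "Region C": 2}
pops = {"Region A": 10_000_000, "Region B": 500_000}
rates, missing = attacks_per_million(counts, pops)
print(rates)    # {'Region A': 3.0, 'Region B': 10.0}
print(missing)  # ['Region C']
```

Region B has far fewer attacks than Region A but a higher per-capita rate: the kind of reversal a population overlay is meant to expose.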
Guess the Dataset
Answer
World Cup Soccer Goal Difference
Missing data?
Is there enough data?
Is the data likely to be accurate?
Is data likely to contain biases?
Link
http://spotfire.tibco.com/demos/spotfire-soccer-2014?type=Interactive
Investigate
Income level
Socioeconomic stability
Colonial History
others?
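Goal difference is goals scored minus goals conceded, summed over a team's matches. A small sketch, assuming results are available as `(home, away, home_goals, away_goals)` tuples; that format, the function name, and the team names are hypothetical, not the Spotfire demo's schema. Counting matches played alongside it speaks to "Is there enough data?": a goal difference built from only a handful of matches is a much noisier signal than one built from many.

```python
from collections import defaultdict


def goal_difference(matches):
    """Compute goal difference and matches played per team.

    matches: iterable of (home, away, home_goals, away_goals) tuples.
    Goal difference = goals scored - goals conceded, summed over all matches.
    """
    diff = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        diff[home] += home_goals - away_goals
        diff[away] += away_goals - home_goals
        played[home] += 1
        played[away] += 1
    return dict(diff), dict(played)


# Hypothetical results, not real World Cup data.
results = [("X", "Y", 2, 0), ("Y", "Z", 1, 1), ("Z", "X", 3, 1)]
diff, played = goal_difference(results)
print(diff)    # {'X': 0, 'Y': -2, 'Z': 2}
print(played)  # {'X': 2, 'Y': 2, 'Z': 2}
```

Every goal counts once for one team and once against another, so goal differences over a closed set of matches sum to zero, which is a cheap sanity check on the accuracy of the data.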
Guess the Dataset
Guess the Dataset: Hint 2
Pokemon Stats
Kaggle: Pokemon Data Clusters
https://www.kaggle.com/jonathanbouchet/d/alopez247/pokemon/pokemon-data-clusters
Original Dataset:
Pokémon for Data Mining and Machine Learning
Missing data?
Outliers?
Known distribution?
Causal interaction? Underlying simple model?
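One quick way into the "missing data?" and "outliers?" questions for a single numeric stat column is Tukey's 1.5 × IQR rule, swapped in here as a common default rather than the method used in the Kaggle kernel. A minimal standard-library sketch; the column values are made up, and `None` is assumed to mark a missing entry.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag outliers with Tukey's fences and count missing entries.

    values: list of numbers, with None marking a missing entry.
    A value is an outlier if it lies below Q1 - k*IQR or above Q3 + k*IQR.
    Returns (outliers, n_missing).
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    # Quartile cut points; raises StatisticsError if too few values remain.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < low or v > high]
    return outliers, n_missing


# Hypothetical stat column (e.g. one base stat); None = missing value.
stat = [10, 12, 11, 13, 12, None, 11, 100]
print(iqr_outliers(stat))  # ([100], 1)
```

The rule only nominates candidates and implicitly assumes a roughly unimodal distribution; whether a flagged value is an entry error or a deliberately extreme character still needs domain knowledge.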
Guess the Dataset
Guess the Dataset: Hint 2
Uber Pickups in New York
Kaggle: Uber pickups in New York City
https://www.kaggle.com/dotman/d/fivethirtyeight/uber-pickups-in-new-york-city/data-exploration-and-visualization
Missing data?
Noisy?
What would smoothing do?
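To make "What would smoothing do?" concrete, here is a centered moving average, one of the simplest smoothers, applied to a made-up series of hourly pickup counts; the data and the function name are hypothetical. It damps noise, but it also spreads the single spike of 9 into three values of 3, so a genuine rush-hour peak would be flattened in exactly the same way as a noisy reading.

```python
def moving_average(series, window=3):
    """Centered moving average over a list of counts.

    window must be a positive odd integer; near the edges the window is
    truncated to the values that exist, so the output has the same length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        # Slice clips automatically at the end of the list.
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical hourly pickup counts with one isolated spike.
pickups = [0, 0, 9, 0, 0]
print(moving_average(pickups))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

A wider window smooths more aggressively and blurs peaks further; choosing it is a trade-off between noise and detail.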
Guess the Dataset: Hint 1
Guess the Dataset: Hint 1
Genomic Data
Circos Visualization: http://circos.ca/intro/genomic_data/
Each chromosome is shown as a wedge with a length scale.
“Data placed outside of the chromosome ring represents degree of small- and large-scale variation in the genome at a given position found between different populations.”
“Data placed on top of the chromosome ring highlights positions of genes implicated in disease, such as cancer, diabetes, and glaucoma.”
Links:
grey: disease-related genes found in the same biochemical pathway
coloured: degree of similarity for a subset of the genome
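Drawing each chromosome as a wedge with a length scale amounts to giving every chromosome an arc proportional to its length and placing a position by its cumulative offset around the ring. A minimal sketch of that mapping, assuming a hypothetical two-chromosome toy genome and ignoring the spacing between wedges; it illustrates the idea and is not Circos's own implementation or configuration.

```python
def genome_to_angle(chrom, pos, lengths):
    """Map a (chromosome, position) pair to an angle in degrees on a circle.

    lengths: dict of chromosome -> length, in the order the wedges are drawn.
    Each chromosome gets an arc proportional to its length; gaps between
    wedges, which a real circular layout would add, are ignored here.
    """
    total = sum(lengths.values())
    offset = 0
    for name, length in lengths.items():
        if name == chrom:
            if not 0 <= pos <= length:
                raise ValueError(f"position {pos} outside {name}")
            return (offset + pos) / total * 360.0
        offset += length  # skip past earlier chromosomes on the ring
    raise KeyError(chrom)


# Hypothetical toy genome with two chromosomes (lengths in arbitrary units).
toy = {"chrA": 300, "chrB": 100}
print(genome_to_angle("chrA", 150, toy))  # 135.0
print(genome_to_angle("chrB", 50, toy))   # 315.0
```

Data tracks placed outside or on top of the ring, and links between positions, can all be positioned from this same angle.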
Data Analysis Examples