程序代写代做代考 Excel chain data science data mining MET CS 689 B1 Designing and Implementing a Data Warehouse Andrew D Wolfe, Jr.

MET CS 689 B1 Designing and Implementing a Data Warehouse Andrew D Wolfe, Jr.

MET CS 689 B1
Designing and Implementing a Data Warehouse

Mary E Letourneau
Reporting, Analysis and Visualization
April 8 and 11, 2020

1

MET CS 689
Data Warehousing
Mary E Letourneau
Reporting
April 8 and 11, 2020

2

Reports
Who Needs ‘Em?
What reports of what data?
Key Performance Indicators (KPI)
Trouble Tracking
Technical mechanisms for reporting

Who needs reports?
Reports are those explorations of data warehouse data that are prepared on a regular basis
Reports provide a means of monitoring operations
Typically reports are prepared for middle or executive management
Occasionally prepared for consultants or auditors
Primarily sectioned and sorted on Date dimension

What reports of what data?
Report
Production Monitoring
Sales tracking
Expenses
Supply Chain
Promotion response

What reports of what data?
Report Data
Production Monitoring Facts: production records
Dimensions: products, locations
Sales tracking Facts: sales
Dimensions: sales org, products
Expenses Facts: expense reports/payments, payables
Dimensions: expense type, organization
Supply Chain Facts: supply shipments, projections
Dimensions: supplier, product
Promotion response Facts: Sales campaign
Dimensions: types of response, sales from response, customer
all of these use Date dimension

Key Performance Indicators (KPI)
Fact attributes directly indicating strategic success for organization
Sometimes fact quantities, e.g., numbers of sales
Primary topic of regular reports
Examples:
Sales pipeline
Employee productivity
Customer satisfaction
Appreciation of assets and inventory

Lead and Lag in KPI
Lag
How you’ve performed
Revenue Recognition, Profitability, Customer satisfaction surveys
Typically from Operational Data Systems – traditional data warehousing
Lead
Projecting upcoming results
Production pipeline, advertising response, consumer engagement
Big data sources like clickstreams and tweets can often provide valuable early projection of performance

Trouble Tracking Reports
Some regular reports are on a provisional basis for applying corrections
Productivity issues
Gaps in supply pipeline
Compensating for market changes

Technical mechanisms for reporting
Paper who?
Web interfaces
Summaries distributed through email or workflow
Dashboards
All require drill-down and roll-up

MET CS 689
Data Warehousing
Mary E Letourneau
Analytics
April 8 and 11, 2020

11

Is this the opposite of Reporting?
More ad-hoc
Works largely on Big Data semi-structured datasets
Works into data discovery and data science
We encounter analytics every time we go to Amazon

Semi-structured data?
Data Analytics begin with questions
Typically focused on clickstream because of sales focus
Tweets and hashtags for popular sentiment
Political data mining
Is this really unstructured?

What raw data?
We can’t really run analytics on raw raw data
We can use facts, whether in fact tables or not
We will want to use big data after conditioning and structuring it
This is distinct from entity resolution, that is, when it is creating dimensions

So what is “Analytics”?
The primary tool for analytics is statistical analysis
We are looking at volumes of data that are too large for ”eyeballing”
Statistical functions can help identify the nuggets of real information within a large dataset

Where do we start?
Generic exploration of data
Derive hypotheses
the analysis won’t create itself (yet!)
Look for ‘wrinkles’ or anomalies, outliers, local minima and maxima
Frame the measures to check the hypothesis
Determine the dimensions applicable to the hypothesis

Variables for Analytics
Analysis can take place using two, three, or more different variables
Often the time dimension is used as the reference variable
Typically one or more measures (fact attributes) are analyzed using statistics
Characterize the data distribution among the measures
Are logarithmic representations needed?
Create a pseudo-hierarchy of value ranges

MET CS 689
Data Warehousing
Mary E Letourneau
Visualization
April 8 and 11, 2020

18

Different forms of visualization

Bookmark this!
https://datavizcatalogue.com

Choosing your Visualization
What are the measures you want to explore?
What are the references (dimensions) that are pertinent to your exploration?
Are hierarchies available, or can they be created, to support drill-down?
Please don’t include too many variables!
Which graph type best supports the hypothesis you are exploring?

Approach for choosing graph type
How do you want to project variables against each other?
How many variables?
What is the volume of data?
Is time a variable of interest?
Are variables categorized (hierarchical) or not?
A Abela © 2009 http://extremepresentation.com

Chart Chooser

Dimension on horizontal, especially Time
Measure(s) up vertical
Each measure is a 2-d graph
Compare changes in measures with dimension
Variants: bar/column chart

Line Chart

Home Freelancer

How do you read this?

Scatter Chart
Wide assortment of facts
No linear dimension for line chart
Looking for correlations across axes
Note additional info via colors and data point shapes – low-cardinality dimensions

http://datadrivenjournalism.net/resources/DDJSchool_Tutorial_Analysing_Datasets_with_Tableau_Public

How do you read this?

Aha!

No low youth unemployment with low secondary education
(Almost) No high youth unemployment with low secondary education

Bubble Chart
Another variable to scatter chart
Bubbles will be measures
Take correlation of axes and determine if the bubble size relates to either
https://upload.wikimedia.org/wikipedia/commons/a/a0/Bubble_Chart_of_Crime_versus_Poverty_in_50_states.jpg

How do you read this?

How do you read this?

Radar/Circle Chart
http://www.practicaldb.com/data-visualization-consulting/outlier-detection/

Dimensions with repeating cycles
TIME
Helps to discover patterns in data

How do you read this?

Heat Maps
Dimensions on both axes
Render measures with color gradient
Size (a la Bubble Charts) can also present measures

Tableau 201: How to Make a Heat Map

How do you read this?

Map Graphs
Another “natural” rendering
present one or more variables against geography with
labels/values
colors – excellent “heat maps”

Map Graph example

Just say no! – Pie Charts
only useful for finding biggest and smallest

Just say no! – Tree Maps

https://techandmate.com/blog/different-chart-experiences-in-tableau/
What does this actually tell us?

Just say no! – Stacked Charts
Relative proportions not visually distinguishable

https://www.tutorialspoint.com/tableau/tableau_bar_chart.htm

Well, Maybe

http://www.storytellingwithdata.com/blog/2012/12/and-winner-is

Start by selecting your data
What are your sources for information?
You need data in one or more tables
Common variables must join the tables in some way
However not all visualizations present as joins per se

Tableau Visualization
Use each workbook for a particular hypothesis
Create and attach data sources to form and join the information you want
Use a sheet in the workbook to create a visualization testing your hypothesis

Be honest!
Especially with yourself!
Don’t let confirmation bias draw your graph or prejudge your conclusion

What does this mean?

Widgets
Jackelopes

Problems with this graph

Sample size too small to be statistically useful
Scope of data uncertain – what other dimensions or measures apply?
Outliers probably unusable
What would it mean to be above or below the line? What is “widgets per jackelope”?
Is there any kind of line, or dip, or other way you can draw a conclusion from this set of dots?

So what does this mean?

Widgets
Jackelopes

Add some sensationalistic context…

What about 3-D?
Rendering in 2-D is subject to distortion
Must be able to rotate and zoom to explore
Don’t add spurious info to make something splashy

What can we do with this?

https://www.livestories.com/blog/five-ways-to-fail-data-visualization

However…

In Defence of 3D Charts…

Useful References
https://eazybi.com/blog/data_visualization_and_chart_types/

Home

Choose Right Chart Type for Data Visualization. Part 1: Data Comparison


https://www.livestories.com/blog/five-ways-to-fail-data-visualization

MET CS 689
Data Warehousing
Mary E Letourneau
Key Points
April 8 and 11, 2020

52

Module 4 Key Points (ten questions)
Information life-cycle management (Krishnan ch 12)
Components – be able to describe the various parts of each component
ILM – Information management policies
E.g. Data acquisition, quality, retention, security
Governance
E.g. Governance council members and responsibilities
Technology Solution
E.g. Data quality, enrichment, transformation, archival and retention, master data management, metadata
Benefits
Big Data special considerations

Module 4 Key Points – con’t
Big Data Analytics, Visualization, and Data Scientists (Krishnan ch 13)
Big Data Analytics
Data discovery
Data acquisition, tagging, classification, modeling
Visualization
Reporting (Module notes)
Types of reports
KPI Reports
Ad Hoc Reports
BI Reports
Delivery of reports
Analyses
Drill-down and Roll-Up
Drill-across

Module 4 Key Points – con’t
Reporting (Module notes) – con’t
Visualization
Choosing the data
Choosing the visualization type
Understand the types of visualizations (line chart, scatter plot, bubble chart, etc) and when they are appropriate (relationship, comparison, distribution, composition)
Common mistakes
Forwarding Data to Further Stores and Uses
OLAP and Cubes
Visualized Cubes – Slicing, Dicing, Pivoting
Warehouse Data
Data Marts
Data Feeds
Data Lakes
Data Vaults

MET CS 689
Data Warehousing
Mary E Letourneau
Final Project
April 8 and 11, 2020

56

Final Project
Design
Star Schema – Must have a REALLY good reason for a snowflake
One fact table – Min 100,000 rows
Three or four (preferably) dimensions
One time dimension
One dimension with hierarchies
One or two slowly changing dimension attributes, at least one of which is implemented with Type 2.
Submit a Visio or Lucidchart diagram
ETL
Can use any combination of methods
Submit your code and/or package

Final Project (con’t)
Visualization
Using Tableau or Power BI – at least two visualizations of the data
One visualization needs to have drill-down/roll-up (i.e. use the hierarchical dimension)
One visualization needs to demonstrate filtering
One visualization needs to use one or more calculated fields
Between all visualizations, all dimensions are used.
Submit at least four images … one for each of the above criteria. Expectations are that it could require 8 to 10 images to demonstrate everything.
Word document
Assumptions
Design process
ETL process and why that process was chosen
Professional document, with references as needed.
PowerPoint – 5 – 10 minute presentation
Zip all files and submit to the Blackboard.

Have a Good Evening and a Great Week!
End of Presentation

59

/docProps/thumbnail.jpeg