A S S I G N M E N T 0 4
B R E A K F A S T C E R E A L S
1 IMPORT → 2 EXPLORE → 3 MANIPULATE → 4 ANALYZE → 5 VISUALIZE
O V E R V I E W
In this final assignment, you will use what you have learned in GBA 464 Programming for Analytics to import, explore, manipulate, analyze, and visualize data using the R programming language.
Am I ready to analyze data now?
Yes, you are now ready!
D E A D L I N E S
Assigned: Monday, October 11
Deadline: Thursday, October 21 at 11:59 PM ET (+10 days)
Graded Out of: 100 Points
Worth: 40% of numeric grade
Due to timelines for grading and final course grade submission,
no late assignments will be accepted.
S U B M I S S I O N
Submission: An R script (.R file) containing your R code. The file name must be:
gba-464-assignment-04-[lastname]-[given/firstname].R
Be sure to submit your code, not the output that displays in the R Console or RStudio.
You do not need to include the data file.
R U B R I C
Each category is scored at one of four levels: High, Medium, Low, or Zero.
Requirements (60%)
• High: The work contains all required elements, achieves all desired results, and follows all instructions and guidelines.
• Medium: The work contains most required elements, achieves most desired results, and follows most instructions and guidelines.
• Low: The work contains some required elements, achieves some desired results, and follows some instructions and guidelines.
• Zero: The work contains no required elements, achieves no desired results, or follows no instructions or guidelines.
Understanding (20%)
• High: The work demonstrates complete understanding and application of module concepts and learning objectives.
• Medium: The work demonstrates competent understanding and application of module concepts and learning objectives.
• Low: The work demonstrates some understanding and application of module concepts and learning objectives.
• Zero: The work demonstrates no understanding or application of module concepts or learning objectives.
Organization (10%)
• High: The work is organized in a clear and logical way. It is quick and easy to process the structure and meaning of the work. All business logic is separated from presentation code.
• Medium: Most of the work is organized in a clear and logical way. It is mostly quick and easy to process the structure and meaning of the work. Most business logic is separated from presentation code.
• Low: Some of the work is organized in a clear and logical way. It is rarely quick and easy to process the structure and meaning of the work. Some business logic is separated from presentation code.
• Zero: The work is not organized in a clear or logical way. It is difficult or time-consuming to process the structure and meaning of the work. No business logic is separated from presentation code.
Pseudocode (10%)
• High: All blocks or lines of code are preceded by a code comment. All comments are well-written, clearly explain what the code step is trying to accomplish, and help the reader understand the code.
• Medium: Most blocks or lines of code are preceded by a code comment. Most comments are well-written, clearly explain what the code step is trying to accomplish, and help the reader understand the code.
• Low: Few blocks or lines of code are preceded by a code comment. Few comments are well-written, clearly explain what the code step is trying to accomplish, and help the reader understand the code.
• Zero: The program contains no code comments.
B A C K G R O U N D
B A C K G R O U N D 1 o f 5
Breakfast cereal is big business!
• Globally, the breakfast cereal market was expected to reach $40 billion by 2023 (ref: Packaged Facts, pre-pandemic forecast).
• We will use the terms “breakfast cereal” and “cereal” interchangeably.
B A C K G R O U N D 2 o f 5
In emerging economies,
the following have been driving global market growth for breakfast cereals:
• Changing habits (rising incomes, Western influences)
• Dietary behaviors (health-conscious)
• Demographics
B A C K G R O U N D 3 o f 5
However, in the United States,
the following have contributed to a slow decline in the U.S. market for almost
three decades (ref: Forbes):
• Changing habits (cereals no longer perceived as convenient or healthy)
• Dietary behaviors (gluten-free, low-carb, and sugar-free diets)
• Market saturation (too many products, too much variety, shrinking cereal box sizes)
B A C K G R O U N D 4 o f 5
You have been hired as a Business Analyst at a market research firm
specializing in the United States Food, Grocery, and Beverage market.
• Your first task is to use R to import, explore, manipulate, analyze, and visualize
data on popular breakfast cereals, including cereal names and companies, product
categories (i.e., hot vs. cold cereals), serving sizes, nutrition, Consumer Reports
ratings, and supermarket shelf placement. If only you could remember everything
you learned in GBA 464!
• Your insights will be included in a report to leadership and used as baseline input
for future hypotheses, problem analyses, and predictive models.
B A C K G R O U N D 5 o f 5
As with the larger GBA 464 course, we are focused on the programming
mechanics of our learning objectives. Therefore…
• You do not need to identify the problem you are tackling via analysis.
• You do not need to form a hypothesis about the data that can be either proven or
disproven.
• You do not need to derive a predictive model or any other kind of analytical
model.
A B O U T T H E D A T A
A B O U T T H E D A T A 1 o f 2
Below are the data column abbreviations and their meanings (shown alphabetically).
• CAL: The number of calories per serving.
• CAR: The number of grams of complex carbohydrates per serving.
• COM: The company that manufactures the cereal, stored as a one-letter code (e.g., “G” for “General Mills”).
• CUP: The number of cups per serving.
• FAT: The number of grams of fat per serving.
• FIB: The number of grams of dietary fiber per serving.
• NAM: The product name of the cereal (e.g., “Lucky Charms”).
• POT: The number of milligrams of potassium per serving.
• PRO: The number of grams of protein per serving.
• RAT: The Consumer Reports rating.
A B O U T T H E D A T A 2 o f 2
• SHE: The supermarket display shelf on which the cereal box is placed. A value of “1” represents the bottom shelf (also known as “floor level”), “2” represents the middle shelf (also known as “eye level”), and “3” represents the top shelf.
• SOD: The number of milligrams of sodium per serving.
• SUG: The number of grams of sugar per serving.
• TYP: The type of cereal. A value of “C” represents a “cold” cereal, usually eaten by adding milk. A value of “H” represents a “hot” cereal, usually prepared by heating or adding hot water.
• VIT: The typical percentage of FDA-recommended daily vitamins and minerals per serving. Common values are “0”, “25”, and “100”.
• WEI: The weight in ounces per serving.
P A R T 1
I M P O R T T H E D A T A
For this part, write R code in RStudio to do the following.
There is no output required for this part.
• Import the provided data file using the read_csv() function from the readr package.
• Provide a column specification (you must account for all columns in the file).
• Assign the result to a new variable called cereals.
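A minimal sketch of the import step, assuming the readr package is installed. Because the provided data file is not reproduced here, the sketch first writes a hypothetical CSV (invented rows covering only five of the sixteen columns) to a temporary file; your column specification must list every column in the real file.

```r
# Load readr, which provides read_csv() and the col_*() column-type helpers
library(readr)

# Hypothetical sample file: invented rows using five of the data's columns
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "NAM,TYP,SHE,CAL,CUP",
  "Sample Flakes,C,1,110,1.00",
  "Sample Oats,H,3,100,0.67"
), sample_path)

# Import with an explicit column specification, one entry per column
cereals <- read_csv(
  sample_path,
  col_types = cols(
    NAM = col_character(),
    TYP = col_character(),
    SHE = col_integer(),
    CAL = col_double(),
    CUP = col_double()
  )
)
```

Naming each column inside cols() with an explicit col_*() type documents what you expect the file to contain and keeps read_csv() from guessing those types.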
P A R T 2
E X P L O R E T H E D A T A
For this part, write R code in RStudio to do the following.
See example output on the following slide for formatting requirements.
• Print the first four (4) rows of data.
• Print the number of rows in the data.
• Print the number of columns in the data.
• Access the column names.
• Iterate over the column names.
• Print each column name on its own line.
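One possible sketch of these steps, assuming dplyr is installed and using a small hypothetical tibble (invented values) in place of the imported cereals data. head(), nrow(), ncol(), and colnames() are base R functions; cat() is used inside the loop so each name prints on its own line without the [1] prefix that print() adds.

```r
# Load dplyr (used here only for tibble())
library(dplyr)

# Hypothetical stand-in for the imported data (invented values)
cereals <- tibble(
  NAM = c("Sample A", "Sample B", "Sample C", "Sample D", "Sample E"),
  CAL = c(110, 120, 90, 100, 150),
  SUG = c(6, 12, 0, 3, 14)
)

# Print the first four rows
print(head(cereals, 4))

# Print the number of rows, then the number of columns
print(nrow(cereals))
print(ncol(cereals))

# Access the column names and print each one on its own line
column_names <- colnames(cereals)
for (column_name in column_names) {
  cat(column_name, "\n", sep = "")
}
```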
E X A M P L E O U T P U T
You do not need to worry about how many columns print to the screen. This depends on the width of the RStudio console.
P A R T 3
M A N I P U L A T E T H E D A T A
M A N I P U L A T E T H E D A T A 1 o f 4
For this part, write R code in RStudio to do the following.
See example output on the following slide for formatting requirements.
• Change the column names:
From NAM to name
From MFR to company
From TYP to type
From CAL to calories
From PRO to protein
From SOD to sodium
From FIB to fiber
From CAR to carbs
From POT to potassium
From VIT to vitamins
From SHE to shelf
From WEI to weight
From CUP to cups
From FAT to fat
From SUG to sugar
From RAT to rating
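A sketch of the renaming step with dplyr's rename(), assuming dplyr is installed. The hypothetical tibble has only four of the columns; a complete version would list all sixteen pairs in the same new_name = OLD_NAME form.

```r
# Load dplyr for rename() and the %>% pipe
library(dplyr)

# Hypothetical stand-in for the imported data (invented values)
cereals <- tibble(
  NAM = c("Sample A", "Sample B"),
  TYP = c("C", "H"),
  CAL = c(110, 100),
  CUP = c(1, 0.5)
)

# rename() takes new_name = OLD_NAME pairs; unlisted columns keep their names
cereals <- cereals %>%
  rename(
    name = NAM,
    type = TYP,
    calories = CAL,
    cups = CUP
  )
```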
M A N I P U L A T E T H E D A T A 2 o f 4
• Change the order of the columns to:
name
company
type
shelf
rating
weight
cups
calories
protein
sodium
fiber
carbs
potassium
vitamins
fat
sugar
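A sketch of the reordering step with dplyr's select(), which returns columns in the order they are listed. The data are hypothetical and already renamed, and only four columns are shown.

```r
# Load dplyr for select() and the %>% pipe
library(dplyr)

# Hypothetical renamed data with columns in an arbitrary order (invented values)
cereals <- tibble(
  calories = c(110, 100),
  name = c("Sample A", "Sample B"),
  shelf = c(1L, 3L),
  type = c("C", "H")
)

# List the columns in the required order
cereals <- cereals %>%
  select(name, type, shelf, calories)
```

select() drops any column that is not listed, so all sixteen names must appear. dplyr's relocate() is an alternative that moves the named columns to the front while keeping the rest.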
M A N I P U L A T E T H E D A T A 3 o f 4
• Modify the values in the company column:
From “A” to “American Home Food Products”
From “G” to “General Mills”
From “K” to “Kellogg’s”
From “N” to “Nabisco”
From “P” to “Post”
From “Q” to “Quaker Oats”
From “R” to “Ralston Purina”
• Modify the values in the type column:
From “H” to “Hot”
From “C” to “Cold”
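A sketch of the value changes using dplyr's mutate() with case_when(), assuming the columns have already been renamed to company and type. The three sample rows are invented.

```r
# Load dplyr for mutate(), case_when(), and the %>% pipe
library(dplyr)

# Hypothetical renamed data still holding the one-letter codes (invented rows)
cereals <- tibble(
  company = c("G", "K", "Q"),
  type = c("C", "C", "H")
)

# Replace each code with its full label
cereals <- cereals %>%
  mutate(
    company = case_when(
      company == "A" ~ "American Home Food Products",
      company == "G" ~ "General Mills",
      company == "K" ~ "Kellogg's",
      company == "N" ~ "Nabisco",
      company == "P" ~ "Post",
      company == "Q" ~ "Quaker Oats",
      company == "R" ~ "Ralston Purina"
    ),
    type = case_when(
      type == "H" ~ "Hot",
      type == "C" ~ "Cold"
    )
  )
```

case_when() returns NA for any code that matches none of its conditions, which makes an unexpected value in the data easy to spot.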
M A N I P U L A T E T H E D A T A 4 o f 4
• Create a new column called “shelfName” with the following values (use dplyr):
Where a row value in the “shelf” column is 1, the row value in the “shelfName” column should be “Bottom Shelf”
Where a row value in the “shelf” column is 2, the row value in the “shelfName” column should be “Eye Level”
Where a row value in the “shelf” column is 3, the row value in the “shelfName” column should be “Top Shelf”
• Create a new column called “caloriesPerCup” with the following values (use dplyr):
caloriesPerCup = (1 / cups) × calories; that is, the number 1 (for one serving) divided by the number of cups per serving, multiplied by the number of calories per serving
• Sort the data by caloriesPerCup from highest to lowest (use dplyr).
• Print the first ten (10) rows of data as a table (use gt) (see example output for formatting).
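A sketch of these steps, assuming dplyr is installed and that shelf, calories, and cups already hold the renamed values (the three rows are invented). The caloriesPerCup line follows the formula above: for example, a cereal with 120 calories in a 0.75-cup serving has (1 / 0.75) × 120 = 160 calories per cup. The gt call runs only if the gt package is installed.

```r
# Load dplyr for mutate(), case_when(), arrange(), and the %>% pipe
library(dplyr)

# Hypothetical renamed data (invented values)
cereals <- tibble(
  name = c("Sample A", "Sample B", "Sample C"),
  shelf = c(1L, 2L, 3L),
  calories = c(110, 120, 100),
  cups = c(1, 0.75, 0.5)
)

cereals <- cereals %>%
  mutate(
    # Map each shelf number to its display name
    shelfName = case_when(
      shelf == 1 ~ "Bottom Shelf",
      shelf == 2 ~ "Eye Level",
      shelf == 3 ~ "Top Shelf"
    ),
    # One serving divided by cups per serving, times calories per serving
    caloriesPerCup = (1 / cups) * calories
  ) %>%
  # Sort from highest to lowest calories per cup
  arrange(desc(caloriesPerCup))

# Keep the first ten rows (all three here)
top_ten <- head(cereals, 10)

# Render as a table with gt if installed (RStudio shows it in the Viewer panel)
if (requireNamespace("gt", quietly = TRUE)) {
  print(gt::gt(top_ten))
}
```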
E X A M P L E O U T P U T
P A R T 4
A N A L Y Z E T H E D A T A
For this part, write R code in RStudio to do the following.
See example output on the following slide for formatting requirements.
• Calculate the following (use dplyr):
The average calories per cup for cold cereals
The average calories per cup for hot cereals
The average Consumer Reports rating
• Print these values (see example output for formatting).
• Perform exploratory analysis to find the following data (use dplyr):
“Most Sugary Cereals”: The ten (10) cold cereals with the most sugar per serving (from highest to lowest).
“Lowest-Rated Cereals”: The ten (10) cold cereals with the lowest Consumer Reports rating (from lowest to highest).
“Cereals with No Nutritional Value”: The ten (10) cold cereals with 0% of FDA-recommended vitamins and minerals.
• Print the data as tables (use gt) (see example output for formatting).
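A sketch of the Part 4 calculations, assuming dplyr is installed and using invented rows whose type values have already been recoded to “Cold” and “Hot”. group_by() with summarise() produces one average per cereal type; filter(), arrange(), and head() build each top-ten table. Adjust the printed labels to match the example output.

```r
# Load dplyr for grouping, summarising, filtering, and sorting
library(dplyr)

# Hypothetical manipulated data (invented values)
cereals <- tibble(
  name = c("Sample A", "Sample B", "Sample C", "Sample D"),
  type = c("Cold", "Cold", "Hot", "Cold"),
  rating = c(40, 60, 50, 30),
  sugar = c(12, 3, 1, 9),
  vitamins = c(25, 0, 0, 25),
  caloriesPerCup = c(110, 150, 200, 130)
)

# Average calories per cup for each cereal type (cold and hot)
avg_by_type <- cereals %>%
  group_by(type) %>%
  summarise(avgCaloriesPerCup = mean(caloriesPerCup))

# Average Consumer Reports rating across all cereals, as a single number
avg_rating <- cereals %>%
  summarise(avgRating = mean(rating)) %>%
  pull(avgRating)

# Print the calculated values to the console
print(avg_by_type)
cat("Average rating:", round(avg_rating, 2), "\n")

# Ten cold cereals with the most sugar per serving, highest first
most_sugary <- cereals %>%
  filter(type == "Cold") %>%
  arrange(desc(sugar)) %>%
  head(10)

# Ten cold cereals with the lowest rating, lowest first
lowest_rated <- cereals %>%
  filter(type == "Cold") %>%
  arrange(rating) %>%
  head(10)

# Up to ten cold cereals with 0% of FDA-recommended vitamins and minerals
no_vitamins <- cereals %>%
  filter(type == "Cold", vitamins == 0) %>%
  head(10)
```

The third filter can return fewer than ten rows if fewer cold cereals in the data have a vitamins value of 0. Each of the three tables can then be passed to gt::gt() for display.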
E X A M P L E O U T P U T
These tables should print to RStudio’s “Viewer” panel. The calculated averages should print to RStudio’s “Console” panel.
P A R T 5
V I S U A L I Z E T H E D A T A
For this part, write R code in RStudio to do the following.
See example output on the following slide for formatting requirements.
• Create a scatterplot (using ggplot2 with geom_point) of Grams of Sugar Per Serving (from
lowest to highest) vs. Consumer Reports Rating (from lowest to highest).
• Create a histogram (using ggplot2 with geom_histogram) with 15 bins using rating
(Consumer Reports Rating) as the continuous variable.
• Create a bar chart (using ggplot2 with geom_bar) with shelfName on the x-axis and the
number of cereals for that shelfName on the y-axis. Filter the data to only include cereals
with a rating above the average value you determined in Part 4 (you may hard-code this
value in your ggplot2 implementation).
• Create a single boxplot (using ggplot2 with geom_boxplot) for caloriesPerCup.
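A sketch of the four Part 5 charts, assuming dplyr and ggplot2 are installed and using five invented rows. The threshold 47 is only the mean of these sample ratings; substitute the average rating you calculated in Part 4, as the instructions allow.

```r
# Load dplyr for filter() and ggplot2 for the charts
library(dplyr)
library(ggplot2)

# Hypothetical manipulated data (invented values)
cereals <- tibble(
  rating = c(40, 60, 50, 30, 55),
  sugar = c(12, 3, 1, 9, 6),
  shelfName = c("Bottom Shelf", "Eye Level", "Top Shelf", "Eye Level", "Top Shelf"),
  caloriesPerCup = c(110, 150, 200, 130, 120)
)

# Scatterplot of sugar per serving (x) against Consumer Reports rating (y)
scatter_plot <- ggplot(cereals, aes(x = sugar, y = rating)) +
  geom_point() +
  labs(x = "Grams of Sugar Per Serving", y = "Consumer Reports Rating")

# Histogram of ratings split into 15 bins
rating_histogram <- ggplot(cereals, aes(x = rating)) +
  geom_histogram(bins = 15)

# Hard-coded average rating (hypothetical: the mean of the sample ratings above)
average_rating <- 47

# Keep only cereals rated above the average; geom_bar() counts rows per shelf
above_average <- cereals %>%
  filter(rating > average_rating)
shelf_bar_chart <- ggplot(above_average, aes(x = shelfName)) +
  geom_bar()

# A single boxplot of calories per cup (no x aesthetic)
calories_boxplot <- ggplot(cereals, aes(y = caloriesPerCup)) +
  geom_boxplot()

# Printing each plot sends it to RStudio's Plots panel
print(scatter_plot)
print(rating_histogram)
print(shelf_bar_chart)
print(calories_boxplot)
```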
E X A M P L E O U T P U T
You can use any colors you wish. You may use the default colors.