Data Challenge Question 1
Using San Francisco Ford GoBike Share dataset hosted on Googles BigQuery, write queries to get the following:
1. How many trips were made for each month in 2015?
2. What region do millenials birth year between 1981 and 1996 prefer to start their trips? What region do Generation X birth year between 1965 and 1980 prefer to start their trips?
3. What is the average number of trips per station with kiosk?
Instructions
1. San Francisco Ford Go Bike Share is a public dataset hosted on Googles BigQuery. Completing the
challenge is free with BigQuerys 1TBmo of free tier processing. Access BigQuery using a Google account. Upon creating a google cloud account, create a project i.e. My Data Challenge Project.
2. Write queries to return the answers for each of the three questions above.
3. Share the queries in your submission to the data challenge
Data Challenge Question 2
localdatetime
user
storenam e
city
state
zip
dwell
loadid
loaddate
121017 17:29
f5845
soulcycle
houston
tx
78259
5
1096
121117 17:29
121217 6:11
71e61
la fitness
austin
tx
77573
45
1109
121317 6:11
121317 5:57
9059f
soulcycle
dallas
tx
75206
50
1831
121417 5:57
121417 5:32
8c566
la fitness
spring
tx
77379
40
1206
121517 5:32
…
…
…
…
…
…
…
…
…
121517 5:41
9059f
soulcycle
dallas
tx
75206
55
9938
121617 5:41
121617 13:19
871fb
crossfit
dickinso n
tx
77539
15
1110
121717 13:19
121317 5:57
9059f
soulcycle
dallas
tx
75206
50
9921
121617 6:57
In the above table, there is a duplicate row due to an error in processing. An example of duplicate rows is highlighted in red. The goal is to have only one row per unique combination of localdatetime, user and storename. Imagine this is a table with billions of rows and multiple such duplicates.
1. Discuss some ways you might try to identify the duplicate rows using SQL.
2. How would you create a new table without duplicate rows using only SQL?
Data Challenge Question 3
Given the following details about tables in a database.
tablea: persons name and their selected greeting choice
id
greeting
betty
hello
elle
bonjour
…
…
frank
hiya
zoey
howdy
tableb: persons name and the name of their dogs if they have at least one dog
id
dog
betty
rio
elle
bowwow
…
…
zoey
raffy
tablec: persons name and their veterinarian bill
id
vetbill
betty
27.20
betty
150.99
elle
85.20
elle
47.50
…
…
zoey
79.50
zoey
15.75
zoey
35.00
You execute the following queries:
Do you see any potential issue with the following query? If yes, what?
with p as
select a.id, b.dog
from tablea a
left join tableb b
on a.id b.id
select p.id, p.dog, sumc.vetbill as sumofvetbill from p
left join tablec c
on c.id p.id
group by p.id, p.dog;
How is the above query different than the one below? Do you see any potential issue with the following query? If yes, what?
with p as
select a.id, b.dog
from tablea a
left join tableb b
on a.id b.id
select p.id, p.dog, sumc.vetbill as sumofvetbill from tablec c
left join p
on c.id p.id
group by p.id, p.dog;
Data Bonus Challenge
The most trusted news publications such as Wall Street Journal, Forbes, and Business Insider utilize inMarket to understand how consumer behavior is evolving and what it means for global brands an example is on the following page. In this challenge, we provide a small sample of our anonymized data. Using the two data tables and data sources available online i.e. newspaper, stock market news, etc., discover business trends in the second half of 2014.
storevisits.csv file is a table containing information on the behavior of consumers surveyed at select chains.
Field Name
Data Type
Description
timeweek
date
Time of visit
deviceid
bytesstring
Device identifier
category
string
Category of chain
chain
string
Name of chain
visits
integer
Number of times deviceid was at chain
storepurchases.csv file is a table containing information about consumer purchases.
Field Name
Data Type
Description
timeweek
date
Time of visit
deviceid
bytesstring
Device identifier
itemcategory
string
Category of itemtype
itemtype
string
Type of item deviceid purchased
itemcount
integer
Number of itemtype purchased
Instructions
1. Evaluate storevisits.csv and storepurchases.csv using analytical tools of your choice i.e. Tableau,
SQL, Python, R, Excel, etc.
2. Create a 12 page summary with 2 market trends that PR professionals can pitch to news outlets. Consider things like loyalty, market segmentation, and shopping behavior. Feel free to include additional pages explaining your methodologies.
3. Bonus: Share any additional findings you thought were interesting about the data