COMS W4111: Introduction to Databases
Spring 2020, Sections 002, H02, V02
Non-Programming Project Notebook
Introduction¶
• This notebook defines HW1, Part C for both the programming and no-programming tracks.
• Submission instructions:
▪ Both no programming and programming tracks submit a zip file containing all of the files in the HW project directory that you clone (see below).
▪ Much of your work is completing sections of this notebook, but you MUST edit the copy in the project directory.
▪ If you reference or create any files, please create in the project directory.
▪ The name of your zip file MUST be uni_W4111_s20_hw1c, replace uni with you UNI. My submission would be dff9_W4111_s20_hw1c.
▪ The homework is due on 23-February-2020 at 11:59 PM. You MAY use late days.
• We will use the pinned Piazza post to clarify the homework and answer questions.
Common Setup¶
Both tracks should:
• MySQL: Install
▪ MySQL Server Community Edition
▪ MySQL Workbench
▪ In addition to the official links, there are many operating specific tutorials.
• Create a Lucidchart educational account.
• Project template:
▪ If you do not have Git on your computer, please install it.
▪ Homework 1 setup instructions recommended that you create a folder /IntroToDB on your desktop.
▪ Under this directory, create a folder HW1C.
▪ Open terminal shell or Git enabled command prompt.
▪ CD to the directory you created and enter git clone https://github.com/donald-f-ferguson/GoTHW
▪ This will create the project template for you.

HW1C Project Template
• Run the following cells. Your answers WILL BE different than the sample output.
In [1]:
import sys
import os
import platform
import psutil
print(“=”*40, “Path Information”, “=”*40)
print(“Path = “, os.getcwd())
print(“=”*40, “System Information”, “=”*40)
uname = platform.uname()
print(f”System: {uname.system}”)
print(f”Node Name: {uname.node}”)
print(f”Release: {uname.release}”)
print(f”Version: {uname.version}”)
print(f”Machine: {uname.machine}”)
print(f”Processor: {uname.processor}”)
print(“=”*40, “Network Information”, “=”*40)
import socket
hostname = socket.gethostname()
IPAddr = socket.gethostbyname(hostname)
print(“Your Computer Name is:” + hostname)
print(“Your Computer IP Address is:” + IPAddr)
import uuid
print (“Your MAC address is:”, hex(uuid.getnode()))
======================================== Path Information ========================================
Path = /Users/boyuan/Desktop/IntroToDB/HW1C/GoTHW
======================================== System Information ========================================
System: Darwin
Node Name: Bos-MacBook-Pro.lan1
Release: 19.0.0
Version: Darwin Kernel Version 19.0.0: Thu Oct 17 16:17:15 PDT 2019; root:xnu-6153.41.3~29/RELEASE_X86_64
Machine: x86_64
Processor: i386
======================================== Network Information ========================================
Your Computer Name is:Bos-MacBook-Pro.lan1
Your Computer IP Address is:192.168.1.32
Your MAC address is: 0xacde48001122
No Programming¶
Overview¶
The no-programming track has four tasks:
1. Import CSV data files and create initial tables.
2. Create a new schema by modifying and improving the data schema and content.
3. Defining a set of views that support data visualizations.
4. Running Python cells that produce visualization using the provided code.
Task 1: Initial Data Import¶
Task¶
• The folder in the project Data/csv contains several CSV files:
▪ ‘character_relationships.csv
▪ characters.csv
▪ characters_actors.csv
▪ episodes.csv
▪ locations.csv
▪ scenes.csv
▪ scenes_characters.csv
• Use MySQL Workbench:
▪ Create a new schema W4111GoTSolution and set as the default schema.
▪ Use the MySQL Workbench Table Data Import Wizard and import each of the files. Choose the option to create a new table.
Tests¶
In [2]:
%load_ext sql
%sql mysql+pymysql://root:XHEsql2182199.@localhost/W4111GoTSolution
Out[2]:
‘Connected: root@W4111GoTSolution’
1. Find all the characters with last name Stark.
In [3]:
%sql select * from characters where characterName like(“% Stark”)
* mysql+pymysql://root:***@localhost/W4111GoTSolution
14 rows affected.
Out[3]:
id
characterName
characterLink
actorName
actorLink
character_id
royal
characterImageThumb
characterImageFull
nickname
kingsguard
15
Arya Stark
/character/ch0158604/
Maisie Williams
/name/nm3586035/
CH_15
https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1_SY1000_CR0,0,665,1000_AL_.jpg
20
Benjen Stark
/character/ch0153996/
Joseph Mawle
/name/nm1152798/
CH_20
https://images-na.ssl-images-amazon.com/images/M/MV5BMjFiY2M5MjgtZmRiMS00ZTAyLTk2Y2UtMjdjZTRkZmVlYTMyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMjFiY2M5MjgtZmRiMS00ZTAyLTk2Y2UtMjdjZTRkZmVlYTMyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1_.jpg
28
Brandon Stark
CH_28
29
Bran Stark
/character/ch0234897/
Isaac Hempstead Wright
/name/nm3652842/
CH_29
True
https://images-na.ssl-images-amazon.com/images/M/MV5BMTA1NTg0NTI3MTBeQTJeQWpwZ15BbWU3MDEyNjg4OTQ@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTA1NTg0NTI3MTBeQTJeQWpwZ15BbWU3MDEyNjg4OTQ@._V1_SX1500_CR0,0,1500,999_AL_.jpg
36
Catelyn Stark
/character/ch0145135/
Michelle Fairley
/name/nm0265610/
CH_36
https://images-na.ssl-images-amazon.com/images/M/MV5BMTY2ODc4MTkxMV5BMl5BanBnXkFtZTcwNjM3MTg4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTY2ODc4MTkxMV5BMl5BanBnXkFtZTcwNjM3MTg4OQ@@._V1_SY1000_CR0,0,915,1000_AL_.jpg
59
Eddard Stark
/character/ch0154681/
Sean Bean
/name/nm0000293/
CH_59
https://images-na.ssl-images-amazon.com/images/M/MV5BODcwMjg0MDQ5MF5BMl5BanBnXkFtZTcwMTk2NDk4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BODcwMjg0MDQ5MF5BMl5BanBnXkFtZTcwMTk2NDk4OQ@@._V1_SY1000_CR0,0,827,1000_AL_.jpg
Ned
183
Lyanna Stark
/character/ch0543804/
Aisling Franciosi
/name/nm4957233/
CH_183
https://images-na.ssl-images-amazon.com/images/M/MV5BMTE5NTczY2QtNmQxYS00NTAwLWE2ZTAtYzc1MTkyZGU5MzRkL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTE5NTczY2QtNmQxYS00NTAwLWE2ZTAtYzc1MTkyZGU5MzRkL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1_.jpg
284
Rickard Stark
CH_284
285
Rickon Stark
/character/ch0233141/
Art Parkinson
/name/nm3280686/
CH_285
https://images-na.ssl-images-amazon.com/images/M/MV5BMWZiOGNjMDAtOTRlNi00MDJmLWEyMTMtOGEwZTM5ODJlNDAyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMWZiOGNjMDAtOTRlNi00MDJmLWEyMTMtOGEwZTM5ODJlNDAyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1_.jpg
286
Robb Stark
/character/ch0158596/
Richard Madden
/name/nm0534635/
CH_286
True
https://images-na.ssl-images-amazon.com/images/M/MV5BMjI2NDE1NzczNF5BMl5BanBnXkFtZTcwNjcwODg4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMjI2NDE1NzczNF5BMl5BanBnXkFtZTcwNjcwODg4OQ@@._V1_SY1000_CR0,0,845,1000_AL_.jpg
300
Sansa Stark
/character/ch0158137/
Sophie Turner
/name/nm3849842/
CH_300
True
https://images-na.ssl-images-amazon.com/images/M/MV5BNjAwMjE2NDExNF5BMl5BanBnXkFtZTcwODAwODg4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BNjAwMjE2NDExNF5BMl5BanBnXkFtZTcwODAwODg4OQ@@._V1_SY1000_CR0,0,806,1000_AL_.jpg
381
Young Benjen Stark
/character/ch0153996/
Matteo Elezi
/name/nm5502295/
CH_381
383
Young Lyanna Stark
/character/ch0543804/
Cordelia Hill
/name/nm8108764/
CH_383
386
Young Ned Stark
/character/ch0154681/
Sebastian Croft
/name/nm7509185/
CH_386
2. Find all of the information about scenes in season 1, episode 1.
In [4]:
%sql select * from scenes where episodeNum=1 and seasonNum=1
* mysql+pymysql://root:***@localhost/W4111GoTSolution
36 rows affected.
Out[4]:
id
seasonNum
episodeNum
sceneNo
sceneStart
sceneEnd
location
subLocation
greensight
altLocation
warg
flashback
0
1
1
1
0:00:40
0:01:45
The Wall
Castle Black
1
1
1
2
0:01:45
0:03:24
North of the Wall
The Haunted Forest
2
1
1
3
0:03:24
0:03:31
North of the Wall
The Haunted Forest
3
1
1
4
0:03:31
0:03:38
North of the Wall
The Haunted Forest
4
1
1
5
0:03:38
0:03:44
North of the Wall
The Haunted Forest
5
1
1
6
0:03:44
0:05:36
North of the Wall
The Haunted Forest
6
1
1
7
0:05:36
0:05:41
North of the Wall
The Haunted Forest
7
1
1
8
0:05:41
0:05:48
North of the Wall
The Haunted Forest
8
1
1
9
0:05:48
0:05:58
North of the Wall
The Haunted Forest
9
1
1
10
0:05:58
0:06:21
North of the Wall
The Haunted Forest
10
1
1
11
0:06:21
0:06:39
North of the Wall
The Haunted Forest
11
1
1
12
0:06:39
0:06:49
North of the Wall
The Haunted Forest
12
1
1
13
0:06:49
0:07:45
North of the Wall
The Haunted Forest
13
1
1
14
0:09:27
0:12:38
The North
Winterfell
14
1
1
15
0:12:38
0:15:41
The North
Outside Winterfell
15
1
1
16
0:15:41
0:18:44
The North
Outside Winterfell
16
1
1
17
0:18:44
0:20:45
The Crownlands
King’s Landing
17
1
1
18
0:20:45
0:22:43
The North
Winterfell
18
1
1
19
0:22:43
0:23:09
The North
Winterfell
19
1
1
20
0:23:09
0:23:39
The North
Winterfell
20
1
1
21
0:23:39
0:29:16
The North
Winterfell
21
1
1
22
0:29:16
0:30:47
The North
Winterfell
22
1
1
23
0:30:47
0:32:57
The North
Winterfell
23
1
1
24
0:32:57
0:33:45
The North
Winterfell
24
1
1
25
0:33:45
0:36:17
Pentos
25
1
1
26
0:36:17
0:38:23
Pentos
26
1
1
27
0:38:23
0:40:05
Pentos
27
1
1
28
0:40:05
0:40:56
The North
Winterfell
28
1
1
29
0:40:56
0:41:19
The North
Winterfell
29
1
1
30
0:41:19
0:44:14
The North
Winterfell
30
1
1
31
0:44:14
0:47:28
The North
Winterfell
31
1
1
32
0:47:28
0:50:29
The North
Winterfell
32
1
1
33
0:50:29
0:56:05
Pentos
33
1
1
34
0:56:05
0:57:48
Pentos
34
1
1
35
0:57:48
0:59:06
The North
Winterfell
35
1
1
36
0:59:06
1:00:57
The North
Winterfell
3. Find all of the characters and actors who played them.
In [5]:
%%sql
SELECT
character_id,
(select characterName from characters where characters.character_id=
characters_actors.character_id) as characterName,
actorName, season
FROM W4111GoTSolution.characters_actors;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
408 rows affected.
Out[5]:
character_id
characterName
actorName
season
CH_0
Addam Marbrand
B.J. Hogg
CH_2
Aeron Greyjoy
Michael Feast
CH_3
Aerys II Targaryen
David Rintoul
CH_4
Akho
Chuku Modu
CH_5
Alliser Thorne
Owen Teale
CH_6
Alton Lannister
Karl Davies
CH_7
Alys Karstark
Megan Parkinson
CH_8
Amory Lorch
Fintan McKeown
CH_9
Anguy
Philip McGinley
CH_10
Archmaester Marwyn
Jim Broadbent
CH_11
Areo Hotah
Deobia Oparei
CH_12
Armeca
Sahara Knite
CH_13
Arthur
Nathanael Saleh
CH_14
Arthur Dayne
Luke Roberts
CH_15
Arya Stark
Maisie Williams
CH_16
Baby Sam
William Wilson
7.0
CH_16
Baby Sam
James Wilson
7.0
CH_17
Balon Greyjoy
Patrick Malahide
CH_18
Baratheon Guard
Phil Barnhill
CH_19
Barristan Selmy
Ian McElhinney
CH_20
Benjen Stark
Joseph Mawle
CH_21
Beric Dondarrion
Richard Dormer
3.0
CH_21
Beric Dondarrion
Richard Dormer
6.0
CH_21
Beric Dondarrion
Richard Dormer
7.0
CH_21
Beric Dondarrion
Richard Dormer
8.0
CH_21
Beric Dondarrion
David Michael Scott
1.0
CH_22
Bianca
Eline Powell
CH_23
Biter
Gerard Jordan
CH_24
Black Lorren
Forbes KB
CH_25
Black Walder Rivers
Tim Plester
CH_26
Bobono
Leigh Gill
CH_27
Bowen Marsh
Michael Condron
CH_29
Bran Stark
Isaac Hempstead Wright
CH_30
Brea
Gemita Samarra
CH_31
Brienne of Tarth
Gwendoline Christie
CH_32
Bronn
Jerome Flynn
CH_33
Brother Ray
Ian McShane
CH_34
Brynden Tully
Clive Russell
CH_35
Captain of the Archers
Michael Grennell
CH_36
Catelyn Stark
Michelle Fairley
CH_37
Catspaw Assassin
Lalor Roddy
CH_38
Cersei Lannister
Lena Headey
CH_39
Child of the Forest
Alice Hewkin
CH_40
Citadel Maester
Frank Hvam
CH_41
Colen of Greenpools
Darren Killeen
CH_42
Craster
Robert Pugh
CH_43
Daario Naharis
Ed Skrein
3.0
CH_43
Daario Naharis
Michiel Huisman
4.0
CH_43
Daario Naharis
Michiel Huisman
5.0
CH_43
Daario Naharis
Michiel Huisman
6.0
CH_44
Daenerys Targaryen
Emilia Clarke
CH_45
Dagmer Cleftjaw
Ralph Ineson
CH_46
Daisy
Maisie Dee
CH_47
Davos Seaworth
Liam Cunningham
CH_48
Dickon Tarly
Freddie Stroma
6.0
CH_48
Dickon Tarly
Tom Hopper
7.0
CH_49
Dongo the Giant
Ian Whyte
CH_50
Donnel Waynwood
Alisdair Simpson
CH_51
Dontos Hollard
Tony Way
CH_52
Doran Martell
Alexander Siddig
CH_53
Doreah
Roxanne McKee
CH_54
Dothraki Bloodrider #1
Diogo Sales
CH_55
Dothraki Bloodrider #2
Junade Khan
CH_56
Drennan
David Coakley
CH_58
Drowned Priest
Jonathan Ryan
CH_59
Eddard Stark
Sean Bean
CH_60
Eddison Tollett
Ben Crompton
CH_61
Edmure Tully
Tobias Menzies
CH_62
Ellaria Sand
Indira Varma
CH_64
Eon Hunter
Barrington Cullen
CH_65
Euron Greyjoy
Pilou Asbæk
CH_66
Faceless Man
Cedric Henderson
CH_67
Farlen
Peter Ballance
CH_68
Gared
Dermot Keaney
CH_69
Gatins
Ricky Champ
CH_70
Gendry
Joe Dempsie
CH_71
Gerold Hightower
Eddie Eyre
CH_73
Gilly
Hannah Murray
CH_74
Goldcloak
Alan Paris
CH_75
Goldcloak
Sam Callis
CH_76
Goldcloak #1
Nikovich Sammut
CH_77
Goldcloak #2
Seamus Kelly
CH_78
Grand Maester Pycelle
Julian Glover
CH_79
Great Master #1
Emilio Doorgasingh
4.0
CH_79
Great Master #1
Gianpiero Cognoli
5.0
CH_80
Great Master #2
Derek Horsham
4.0
CH_80
Great Master #2
Bobby Asghar
5.0
CH_81
Great Master #3
Peter Silverleaf
CH_82
Great Master #4
Davor Jozinovic
CH_83
Great Master #5
Ivan Peric
CH_84
Great Master #6
Hadi Kermani
CH_85
Great Master #7
Tonci Banov
CH_86
Greatjon Umber
Clive Mantle
CH_87
Gregor Clegane
Ian Whyte
2.0
CH_87
Gregor Clegane
Conan Stevens
1.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
4.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
5.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
6.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
7.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
8.0
CH_88
Greizhen mo Ullhor
Cliff Barry
CH_89
Grenn
Mark Stanley
CH_91
Grey Worm
Jacob Anderson
CH_92
Hallyne
Roy Dotrice
CH_93
Handmaid
Sabrina Bartlett
CH_94
Handmaid
Sara Dylan
CH_95
Harald Karstark
Paul Rattray
CH_96
Harry Strickland
Marc Rissmann
CH_97
High Priestess
Souad Faress
CH_98
High Septon
David Verrey
CH_99
High Septon
Paul Bentley
CH_100
High Sparrow
Jonathan Pryce
CH_101
Hizdahr zo Loraq
Joel Fry
CH_102
Hodor
Kristian Nairn
CH_103
Hoster Tully
Christopher Newman
CH_104
Hot Pie
Ben Hawkey
CH_105
Howland Reed
Leo Woodruff
CH_106
Hugh of the Vale
Jefferson Hall
CH_107
Illyrio Mopatis
Roger Allam
CH_108
Ilyn Payne
Wilko Johnson
CH_109
Irri
Amrita Acharia
CH_110
Izembaro
Richard E. Grant
CH_111
Jacks
Paul Caddell
CH_113
Jaime Lannister
Nikolaj Coster-Waldau
CH_114
Janos Slynt
Dominic Carter
CH_115
Jaqen H’ghar
Tom Wlaschiha
CH_116
Jaqen’s Disguise
Patrick O’Kane
CH_117
Jaremy Rykker
Jeffrey O’Brien
CH_118
Jeor Mormont
James Cosmo
CH_119
Jhiqui
Sarita Piotrowski
CH_120
Joffrey Baratheon
Jack Gleeson
CH_121
Jojen Reed
Thomas Brodie-Sangster
CH_122
Jon Arryn
John Standing
CH_123
Jon Snow
Kit Harington
CH_124
Jonos Bracken
Gerry O’Brien
CH_125
Jorah Mormont
Iain Glen
CH_126
Jory Cassel
Jamie Sives
CH_127
Joyeuse Erenford
Kelly Long
CH_128
Karl Tanner
Burn Gorman
CH_129
Karsi
Birgitte Hjort Sørensen
CH_130
Kayla
Pixie Le Knot
CH_131
Kegs
Tim Landers
CH_132
Kevan Lannister
Ian Gelder
CH_133
Khal Drogo
Jason Momoa
CH_134
Khal Moro
Joseph Naufahu
CH_135
Khal Rhalko
Andrei Claude
CH_136
King Joffrey Baratheon Dwarf
Raymond Griffiths
CH_137
King Renly Baratheon Dwarf
Dean Whatton
CH_138
King Robb Stark Dwarf
George Appleby
CH_139
King Stannis Baratheon Dwarf
Maxwell Laird
CH_140
King’s Landing Rioter #1
Reg Wayment
CH_141
King’s Landing Rioter #2
Marko Juraga
CH_142
King’s Landing Rioter #3
Rea Separovic
CH_143
Kinvara
Ania Bukstein
CH_144
Kitty Frey
Lucy Hayes
CH_145
Knight of House Frey
Patrick Ryan
CH_146
Knight of House Lynderly
Paddy Rocks
CH_147
Kovarro
Steven Cole
CH_148
Kraznys mo Nakloz
Dan Hildebrand
CH_149
Kurleket
Kevin Keenan
CH_151
Lady Crane
Essie Davis
CH_152
Lancel Lannister
Eugene Simon
CH_153
Lannister Captain
Aiden Condron
CH_154
Lannister Captain
Richard Laing
CH_155
Lannister Guardsman
Conor Delaney
CH_156
Lannister Messenger
Simon Stewart
CH_157
Lannister Scout
Chris Gallagher
CH_158
Lead Dornish Guard
Christian Vit
CH_159
Leaf
Kae Alexander
6.0
CH_159
Leaf
Octavia Selena Alexandru
4.0
CH_160
Lem Lemoncloak
Jóhannes Haukur Jóhannesson
CH_161
Leo Lefford
Vinnie McCabe
CH_162
Lhara
Sarine Sofair
CH_163
Little Bird
Samuel Paul Small
CH_164
Little Bird
Tristan Mercieca
CH_165
Little Bird #3
Michael Nevin
CH_166
Little Bird #4
Jesse Magee
CH_167
Little Bird #5
Iona Clarke
CH_168
Little Bird #6
Lucy Gallagher
CH_169
Little Bird #7
Fionn Kernan
CH_170
Loboda
Zachary Baharov
CH_171
Locke
Noah Taylor
CH_172
Lollys Stokeworth
Elizabeth Cadwallader
CH_173
Lommy Greenhands
Eros Vlahos
CH_174
Loras Tyrell
Finn Jones
CH_175
Lord Galbart Glover
Mark Coney
CH_176
Lord Portan
Gordon Fulton
CH_177
Lord Varys
Conleth Hill
CH_178
Lordsport Dockhand
Jer O’Leary
CH_179
Lothar Frey
Daniel Tuite
6.0
CH_179
Lothar Frey
Tom Brooke
3.0
CH_180
Loyal Night’s Watchman #1
Steve Cash
CH_181
Loyal Night’s Watchman #2
Robert Fawsitt
CH_182
Lyanna Mormont
Bella Ramsey
CH_183
Lyanna Stark
Aisling Franciosi
CH_184
Lysa Arryn
Kate Dickie
CH_185
Mace Tyrell
Roger Ashton-Griffiths
CH_186
Maester Aemon
Peter Vaughan
CH_187
Maester Cressen
Oliver Ford Davies
CH_188
Maester Luwin
Donald Sumpter
CH_189
Maester Wolkan
Richard Rycroft
CH_190
Mag the Mighty
Neil Fingleton
CH_191
Maggy
Jodhi May
CH_192
Mago
Ivailo Dimitrov
CH_193
Male Prostitute
Dennis Stokes
CH_194
Malko
Adewale Akinnuoye-Agbaje
CH_195
Mance Rayder
Ciarán Hinds
CH_196
Mandon Moore
James Doran
CH_197
Manservant
Conor Watters
CH_198
Marei
Josephine Gillan
CH_199
Margaery Tyrell
Natalie Dormer
CH_200
Margaery’s Handmaiden
Aisling Jarrett-Gavin
CH_201
Marillion
Emun Elliott
CH_202
Martyn Lannister
Dean-Charles Chapman
CH_203
Masha Heddle
Susie Kelly
CH_204
Matthos Seaworth
Kerr Logan
CH_205
Meera Reed
Ellie Kendrick
CH_206
Meereen Slave Master
Laurence Doherty
CH_207
Melessa Tarly
Samantha Spiro
CH_208
Melisandre
Carice van Houten
CH_209
Merchant Captain
Gary Pillai
CH_210
Mero
Mark Killeen
CH_211
Meryn Trant
Ian Beattie
CH_212
Mhaegen
Antonia Christophers
CH_213
Mikken
Boyd Rankin
CH_214
Mirelle
Elisa Lasowski
CH_215
Mirri Maz Duur
Mia Soteriou
CH_216
Missandei
Nathalie Emmanuel
CH_217
Mord
Ciaran Bermingham
CH_218
Mossador
Reece Noi
CH_219
Musician #1
Nanna Bryndís Hilmarsdóttir
CH_220
Musician #1
Jon Thor Birgisson
CH_221
Musician #2
Ragnar Þórhallsson
CH_222
Musician #2
Orri P. Dyrason
CH_223
Musician #3
Georg Holm
CH_224
Musician #3
Arnar Rosenkranz Hilmarsson
CH_225
Musician #4
Brynjar Leifsson
CH_226
Musician #5
Kristján Kristjánsson
CH_227
Mycah
Rhodri Hosking
CH_228
Myranda
Charlotte Hope
CH_229
Myrcella Baratheon
Aimee Richardson
1.0
CH_229
Myrcella Baratheon
Aimee Richardson
2.0
CH_229
Myrcella Baratheon
Nell Tiger Free
5.0
CH_229
Myrcella Baratheon
Nell Tiger Free
6.0
CH_230
Ned Umber
Harry Grasby
CH_231
Night’s Watch Deserter
Barry John O’Connor
CH_232
Night’s Watch Messenger
Frank O’Sullivan
CH_233
Night’s Watch Officer
Dennis McKeever
CH_234
Night’s Watch Officer
Michael Stuart
CH_235
Night’s Watchman
Marcus Lamb
CH_236
Night’s Watchman
Michael Hooley
CH_237
Night’s Watchman
Patrick J Molloy
CH_238
Night’s Watchman #1
Michael Hayes
CH_239
Night’s Watchman #2
Ruairí Heading
CH_240
Night’s Watchman #2
Tony Flynn
CH_241
Northman Rioter
Jed Murray
CH_243
Nymeria Sand
Jessica Henwick
CH_244
Obara Sand
Keisha Castle-Hughes
CH_245
Oberyn Martell
Pedro Pascal
CH_246
Old Nan
Margaret John
CH_247
Old Woman Prisoner
Gina Moxley
CH_248
Olenna Tyrell
Diana Rigg
CH_249
Olly
Brenock O’Connor
CH_250
Olyvar
Will Tudor
CH_251
Orell
Mackenzie Crook
CH_252
Ornela
Hannah John-Kamen
CH_253
Osha
Natalia Tena
CH_254
Othell Yarwyck
Brian Fortune
CH_256
Pentoshi Servant
Rania Zouari
CH_257
Petyr Baelish
Aidan Gillen
CH_258
Podrick Payne
Daniel Portman
CH_259
Polliver
Andy Kellegher
CH_260
Prendahl na Ghezn
Ramon Tikaram
CH_261
Pyat Pree
Ian Hanmore
CH_262
Pypar
Josef Altin
CH_263
Qhono
Staz Nair
CH_264
Qhorin Halfhand
Simon Armstrong
CH_265
Qotho
Dar Salim
CH_266
Quaithe
Laura Pradelska
CH_267
Qyburn
Anton Lesser
CH_268
Rakharo
Elyes Gabel
CH_269
Ramsay Snow
Iwan Rheon
CH_270
Randyll Tarly
James Faulkner
CH_271
Rast
Luke Barnes
CH_272
Rattleshirt
Edward Dogliani
2.0
CH_272
Rattleshirt
Edward Dogliani
3.0
CH_272
Rattleshirt
Ross O’Hennessy
5.0
CH_273
Razdal mo Eraz
George Georgiou
CH_274
Red Keep Stableboy
Hugo Culverhouse
CH_275
Red Priestess
Melanie Liburd
CH_276
Red Priestess
Rila Fukushima
CH_277
Renly Baratheon
Gethin Anthony
CH_283
Rickard Karstark
John Stahl
2.0
CH_283
Rickard Karstark
John Stahl
3.0
CH_283
Rickard Karstark
Steve Blount
1.0
CH_285
Rickon Stark
Art Parkinson
CH_286
Robb Stark
Richard Madden
CH_287
Robert Baratheon
Mark Addy
CH_288
Robett Glover
Tim McInnerny
CH_289
Robin Arryn
Lino Facioli
CH_290
Rodrik Cassel
Ron Donachie
CH_291
Roose Bolton
Michael McElhatton
CH_292
Rorge
Andy Beckwith
CH_293
Ros
Esmé Bianco
CH_294
Roslin Frey
Alexandra Dowling
CH_295
Royal Steward
Robert Sterne
CH_296
Ryger Rivers
Bryan McCaugherty
CH_297
Salladhor Saan
Lucian Msamati
CH_298
Samwell Tarly
John Bradley
CH_299
Sandor Clegane
Rory McCann
CH_300
Sansa Stark
Sophie Turner
CH_301
Selyse Baratheon
Tara Fitzgerald
3.0
CH_301
Selyse Baratheon
Tara Fitzgerald
4.0
CH_301
Selyse Baratheon
Tara Fitzgerald
5.0
CH_301
Selyse Baratheon
Sarah MacKeever
2.0
CH_302
Septa Moelle
Maggie Hayes
CH_303
Septa Mordane
Susan Brown
CH_304
Septa Scolera
Mary Jordan
CH_305
Septa Unella
Hannah Waddingham
CH_306
Septon
Kenneth Hadley
CH_307
Septon
Terry Byrne
CH_308
Ser Endrew Tarth
Dez McMahon
CH_309
Shae
Sibel Kekilli
CH_310
Shagga
Mark Lewis Jones
CH_312
Shireen Baratheon
Kerry Ingram
CH_313
Silk King
Slavko Juraga
CH_314
Smalljon Umber
Dean S. Jagger
CH_315
Spice King
Nicholas Blane
CH_316
Stannis Baratheon
Stephen Dillane
CH_317
Stark Bannerman
Michael Stuart
CH_318
Stark Guard
Faolan Morgan
CH_319
Stark Guard
Rick Burn
CH_320
Stark Guard
David Sheehan
CH_321
Stark Guard
Paul Monahan
CH_322
Steelshanks Walton
Jamie Michie
CH_323
Stevron Frey
Colin Carnegie
CH_324
Stiv
Stephen Don
CH_325
Styr
Yuri Kolokolnikov
CH_327
Syrio Forel
Miltos Yerolemou
CH_328
Talisa Maegyr
Oona Chaplin
CH_329
Talla Tarly
Rebecca Benson
CH_330
Ternesio Terys
Gary Oliver
CH_331
The Bear
Bart the Bear
CH_332
The Night King
Richard Brake
4.0
CH_332
The Night King
Richard Brake
5.0
CH_332
The Night King
Vladimir ‘Furdo’ Furdik
6.0
CH_332
The Night King
Vladimir ‘Furdo’ Furdik
7.0
CH_332
The Night King
Vladimir ‘Furdo’ Furdik
8.0
CH_333
The Tickler
Anthony Morris
CH_334
The Waif
Faye Marsay
CH_335
Thenn Warg
Joseph Gatt
CH_336
Theon Greyjoy
Alfie Allen
CH_337
Thin Man
Oengus MacNamara
CH_338
Thoros of Myr
Paul Kaye
CH_339
Three-Eyed Raven
Struan Rodger
4.0
CH_339
Three-Eyed Raven
Max von Sydow
6.0
CH_340
Timett
Tobias Winter
CH_341
Tobho Mott
Andrew Wilde
CH_342
Todder
Will O’Connell
CH_343
Tomard
Phil Dixon
CH_344
Tommen Baratheon
Callum Wharry
1.0
CH_344
Tommen Baratheon
Callum Wharry
2.0
CH_344
Tommen Baratheon
Dean-Charles Chapman
4.0
CH_344
Tommen Baratheon
Dean-Charles Chapman
5.0
CH_344
Tommen Baratheon
Dean-Charles Chapman
6.0
CH_345
Tormund Giantsbane
Kristofer Hivju
CH_346
Trystane Martell
Toby Sebastian
CH_347
Tycho Nestoris
Mark Gatiss
CH_348
Tyene Sand
Rosabell Laurenti Sellers
CH_349
Tyrion Lannister
Peter Dinklage
CH_350
Tywin Lannister
Charles Dance
CH_351
Vala
Meena Rayann
CH_352
Vardis Egen
Brendan McCormack
CH_353
Varly
Graham Charles
CH_355
Viserys Targaryen
Harry Lloyd
CH_356
Walda Bolton
Elizabeth Webster
CH_357
Walder Frey
David Bradley
CH_358
Wallen
Barry O’Connor
CH_359
Warlock
Lottie Steer
CH_360
Waymar Royce
Rob Ostlere
CH_361
Wedding Band
Sigur Rós
CH_362
White Walker
Ian Whyte
CH_363
White Walker
Ross Mullan
CH_364
White Walker
Tim Loane
CH_365
White Walker #2
Spencer Wilding
CH_366
Wight Wildling Girl
Claire Wright
CH_367
Wildling Rioter
Michael Patrick
CH_368
Will
Bronson Webb
CH_369
Willem Lannister
Timothy Gibbons
CH_370
Willis Wode
Ryan McKenna
CH_371
Wine Merchant
Simon Lowe
CH_372
Winterfell Shepherd
Donagh Deeney
CH_373
Wun Wun
Ian Whyte
CH_374
Wyllis
Sam Coleman
CH_375
Xaro Xhoan Daxos
Nonso Anozie
CH_376
Yara Greyjoy
Gemma Whelan
CH_377
Yezzan zo Qaggaz
Enzo Cilenti
CH_378
Ygritte
Rose Leslie
CH_379
Yohn Royce
Rupert Vansittart
CH_380
Yoren
Francis Magee
CH_381
Young Benjen Stark
Matteo Elezi
CH_382
Young Cersei Lannister
Nell Williams
CH_383
Young Lyanna Stark
Cordelia Hill
CH_384
Young Nan
Annette Tierney
CH_385
Young Ned
Robert Aramayo
CH_386
Young Ned Stark
Sebastian Croft
CH_387
Young Rodrik Cassel
Fergus Leathem
CH_388
Zanrush
Gerald Lepkowski
• Hmm. The season seems to be empty for some of the actors and some characters seem to be missing. Why?
• Let’s look at the data to see which characters_actors the season is NULL.
In [33]:
%sql select * from characters_actors where season is NULL
* mysql+pymysql://dbuser:***@localhost/W4111GoTSolution
0 rows affected.
Out[33]:
id
character_id
actorName
actorLink
season
• That is not what I expected.
• This is an indication of why some cleanup is necessary.
Data Cleanup¶
Task¶
• CSV files are text files, and all data is character strings.
• Spreadsheet applications, like Excel, and the data import tools in databases can infer column types in some cases. For example, consider the table definition created for importing scenes_characters.
CREATE TABLE `scenes_characters` (
`id` int(11) DEFAULT NULL,
`seasonNum` int(11) DEFAULT NULL,
`episodeNum` int(11) DEFAULT NULL,
`sceneNo` int(11) DEFAULT NULL,
`characterName` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
• The import process inferred that some columns in the CSV file are text representations of integers, and performed conversion.
• The process is unreliable, and does not always work.
• For this task, you must modify column definitions and data valaues to represent the correct the meaning of the data. We will cover keys in the next task.
• Execute your DDL and DML SQL in cells below. You can add Markdown cells to explain what you are doing.
• My approach is to create a new schema with corrected table definitions and cleaned copies of the data. This preserves the original data for data lineage and avoids possible information loss.
• Call your new schema W4111GoTSolutionClean.
• To help you, I provide an example for one table $-$ groups
• NOTE: You can use MySQL Workbench or other tools to help you write the DDL and DML statements, but you must execute the statements in the cells below.
• Create the schema.
In [6]:
%sql CREATE SCHEMA `W4111GoTSolutionClean` ;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
(pymysql.err.ProgrammingError) (1007, “Can’t create database ‘W4111GoTSolutionClean’; database exists”)
[SQL: CREATE SCHEMA `W4111GoTSolutionClean` ;]
(Background on this error at: http://sqlalche.me/e/f405)
• What does groups look like?
In [7]:
%sql describe W4111GotSolution.groups
* mysql+pymysql://root:***@localhost/W4111GoTSolution
3 rows affected.
Out[7]:
Field
Type
Null
Key
Default
Extra
id
int
YES
None
groupName
text
YES
None
character
text
YES
None
In [8]:
%sql select min(length(groupName)) as min_group_name_len, max(length(groupName)) as max_group_name \
from W4111GoTSolution.groups;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[8]:
min_group_name_len
max_group_name
4
13
In [21]:
%sql select min(length(`character`)) as min_character_name_len, max(length(`character`)) as max_character_name_len \
from W4111GoTSolution.groups;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[21]:
min_character_name_len
max_character_name_len
0
21
• Create a table with clean data.
In [10]:
%%sql
create table W4111GotSolutionClean.groups (
id int,
groupName varchar(32),
`characterName` varchar(32))
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[10]:
[]
In [11]:
%%sql
insert into W4111GotSolutionClean.groups
select id, groupName, `character` from W4111GotSolution.groups;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
168 rows affected.
Out[11]:
[]
In [12]:
%sql select count(*) from W4111GotSolution.groups
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[12]:
count(*)
168
• We successfully copied all of the rows and set reasonable data types.
• But,
In [13]:
%sql select count(*) from W4111GoTSolutionClean.groups where characterName=”
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[13]:
count(*)
14
• ” is a common way to represent “unknown” or “not applicable” in text data.Codd’s Rule no. 3 defines the systmatic treatment of unknown or not applicable data.
In [14]:
%sql update W4111GoTSolutionClean.groups set characterName=NULL where characterName=”
* mysql+pymysql://root:***@localhost/W4111GoTSolution
14 rows affected.
Out[14]:
[]
Answers¶
characters
In [45]:
%%sql
create table W4111GotSolutionClean.characters(
id int,
characterName varchar(32),
characterLink varchar(64),
actorName varchar(32),
actorLink varchar(32),
character_id varchar(32),
characterImageThumb varchar(200),
characterImageFull varchar(300),
nickname varchar(32),
kingsguard varchar(32));
insert into W4111GotSolutionClean.characters
select id, characterName, characterLink, actorName, actorLink, character_id, characterImageThumb, characterImageFull, nickname, kingsguard
from W4111GotSolution.characters;
update W4111GoTSolutionClean.characters set characterName=NULL where characterName=”;
update W4111GoTSolutionClean.characters set characterLink=NULL where characterLink=”;
update W4111GoTSolutionClean.characters set actorName=NULL where actorName=”;
update W4111GoTSolutionClean.characters set actorLink=NULL where actorLink=”;
update W4111GoTSolutionClean.characters set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.characters set characterImageThumb=NULL where characterImageThumb=”;
update W4111GoTSolutionClean.characters set characterImageFull=NULL where characterImageFull=”;
update W4111GoTSolutionClean.characters set nickname=NULL where nickname=”;
update W4111GoTSolutionClean.characters set kingsguard=NULL where kingsguard=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
389 rows affected.
1 rows affected.
0 rows affected.
21 rows affected.
35 rows affected.
36 rows affected.
0 rows affected.
134 rows affected.
134 rows affected.
380 rows affected.
384 rows affected.
Out[45]:
[]
character_relationships
• There is some serialized JSON in the value column for actor.
• Also, actor is redundant and we will need to remove later. It is here because we have to do some additional processing code to better align character relationships with scene information. We know character-actor mapping but until we factor in scenes, we are not sure which actor if there was more than one.
In [55]:
%%sql
create table W4111GotSolutionClean.character_relationships(
id int,
character_id varchar(16),
characterName varchar(32),
label varchar(32),
value varchar(128))
insert into W4111GotSolutionClean.character_relationships
select id, character_id, characterName, label, value
from W4111GotSolution.character_relationships;
update W4111GoTSolutionClean.character_relationships set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.character_relationships set characterName=NULL where characterName=”;
update W4111GoTSolutionClean.character_relationships set label=NULL where label=”;
update W4111GoTSolutionClean.character_relationships set value=NULL where value=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
890 rows affected.
1 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
Out[55]:
[]
characters_actors
In [64]:
%%sql
create table W4111GoTSolutionClean.characters_actors (
id int,
character_id varchar(16),
actorName varchar(32),
actorLink varchar(32),
season varchar(16));
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[64]:
[]
In [65]:
%%sql
insert into W4111GoTSolutionClean.characters_actors
select id, character_id, actorName, actorLink, season
from W4111GoTSolution.characters_actors;
update W4111GoTSolutionClean.characters_actors set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.characters_actors set actorName=NULL where actorName=”;
update W4111GoTSolutionClean.characters_actors set actorLink=NULL where actorLink=”;
update W4111GoTSolutionClean.characters_actors set season=NULL where season=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
408 rows affected.
0 rows affected.
0 rows affected.
1 rows affected.
354 rows affected.
Out[65]:
[]
episodes
In [66]:
%sql describe W4111GoTSolution.episodes;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
7 rows affected.
Out[66]:
Field
Type
Null
Key
Default
Extra
id
int
YES
None
seasonNum
int
YES
None
episodeNum
int
YES
None
episodeTitle
text
YES
None
episodeLink
text
YES
None
episodeAirDate
text
YES
None
episodeDescription
text
YES
None
In [70]:
%sql select min(length(episodeDescription)) as min_group_name_len, max(length(episodeDescription)) as max_group_name \
from W4111GoTSolution.episodes;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[70]:
min_group_name_len
max_group_name
33
238
In [71]:
%%sql
create table W4111GoTSolutionClean.episodes(
id int,
seasonNum int,
episodeNum int,
episodeTitle varchar(64),
episodeLink varchar(32),
episodeAirDate varchar(32),
episodeDescription varchar(256)
);
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[71]:
[]
In [72]:
%%sql
insert into W4111GoTSolutionClean.episodes
select id, seasonNum, episodeNum, episodeTitle, episodeLink, episodeAirDate, episodeDescription
from W4111GoTSolution.episodes
* mysql+pymysql://root:***@localhost/W4111GoTSolution
73 rows affected.
Out[72]:
[]
In [73]:
%%sql
update W4111GoTSolutionClean.episodes set episodeTitle=NULL where episodeTitle=”;
update W4111GoTSolutionClean.episodes set episodeLink=NULL where episodeLink=”;
update W4111GoTSolutionClean.episodes set episodeAirDate=NULL where episodeAirDate=”;
update W4111GoTSolutionClean.episodes set episodeDescription=NULL where episodeDescription=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
Out[73]:
[]
locations
In [74]:
%sql describe W4111GoTSolution.locations
* mysql+pymysql://root:***@localhost/W4111GoTSolution
3 rows affected.
Out[74]:
Field
Type
Null
Key
Default
Extra
id
int
YES
None
location
text
YES
None
subLocation
text
YES
None
In [77]:
%sql select min(length(subLocation)) as min_loc, max(length(subLocation)) as max_loc \
from W4111GoTSolution.locations;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[77]:
min_loc
max_loc
0
37
In [78]:
%%sql
create table W4111GoTSolutionClean.locations(
id int,
location varchar(32),
subLocation varchar(64)
);
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[78]:
[]
In [79]:
%%sql
insert into W4111GoTSolutionClean.locations
select id, location, subLocation
from W4111GoTSolution.locations;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
129 rows affected.
Out[79]:
[]
In [80]:
%%sql
update W4111GoTSolutionClean.locations set location=NULL where location=”;
update W4111GoTSolutionClean.locations set subLocation=NULL where subLocation=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
26 rows affected.
Out[80]:
[]
scenes
In [81]:
%sql describe W4111GoTSolution.scenes
* mysql+pymysql://root:***@localhost/W4111GoTSolution
12 rows affected.
Out[81]:
Field
Type
Null
Key
Default
Extra
id
int
YES
None
seasonNum
int
YES
None
episodeNum
int
YES
None
sceneNo
int
YES
None
sceneStart
text
YES
None
sceneEnd
text
YES
None
location
text
YES
None
subLocation
text
YES
None
greensight
text
YES
None
altLocation
text
YES
None
warg
text
YES
None
flashback
text
YES
None
In [87]:
%sql select min(length(warg)) as min_data, max(length(warg)) as max_data \
from W4111GoTSolution.scenes
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[87]:
min_data
max_data
0
4
In [88]:
%%sql
create table W4111GoTSolutionClean.scenes(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
SceneStart varchar(16),
SceneEnd varchar(16),
location varchar(32),
subLocation varchar(64),
greensight varchar(8),
altLocation varchar(32),
warg varchar(8)
);
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[88]:
[]
In [89]:
%%sql
insert into W4111GoTSolutionClean.scenes
select id, seasonNum, episodeNum, sceneNo, SceneStart, SceneEnd, location, subLocation, greensight, altLocation, warg
from W4111GoTSolution.scenes;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
4165 rows affected.
Out[89]:
[]
In [91]:
%%sql
update W4111GoTSolutionClean.scenes set SceneStart=NULL where SceneStart=”;
update W4111GoTSolutionClean.scenes set SceneEnd=NULL where SceneEnd=”;
update W4111GoTSolutionClean.scenes set location=NULL where location=”;
update W4111GoTSolutionClean.scenes set subLocation=NULL where subLocation=”;
update W4111GoTSolutionClean.scenes set greensight=NULL where greensight=”;
update W4111GoTSolutionClean.scenes set altLocation=NULL where altLocation=”;
update W4111GoTSolutionClean.scenes set warg=NULL where warg=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
0 rows affected.
0 rows affected.
495 rows affected.
4149 rows affected.
4130 rows affected.
4158 rows affected.
Out[91]:
[]
scenes_characters
In [92]:
%sql describe W4111GoTSolution.scenes_characters
* mysql+pymysql://root:***@localhost/W4111GoTSolution
5 rows affected.
Out[92]:
Field
Type
Null
Key
Default
Extra
id
int
YES
None
seasonNum
int
YES
None
episodeNum
int
YES
None
sceneNo
int
YES
None
characterName
text
YES
None
In [96]:
%%sql
create table W4111GoTSolutionClean.scenes_characters(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
characterName varchar(64)
);
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[96]:
[]
In [97]:
%%sql
insert into W4111GoTSolutionClean.scenes_characters
select id, seasonNum, episodeNum, sceneNo, characterName
from W4111GoTSolution.scenes_characters;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
12114 rows affected.
Out[97]:
[]
In [98]:
%%sql
update W4111GoTSolutionClean.scenes_characters set characterName=NULL where characterName=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[98]:
[]
In [ ]:
Keys, Constraints and Integrity¶
Task¶
• Correct analysis and management of the data requires defining keys and integrity constraints.
• Use DML statements to modify the schema to create the constraints you think best.
• The following diagram is my definition, but this is an example and may not be correct. It is simply to help you get started.

GoT ER Model
Answer¶
characters, characters_actors
In [ ]:
characters_relationships
In [ ]:
groups
In [ ]:
locations
In [ ]:
scenes
In [ ]:
scenes_characters
In [ ]:
View¶
Create the following views.
Character and Locations¶
• Characters appear in scenes at locations.
• Create a view of the form:
▪ location
▪ character
▪ count of scenes with the character and the location.
▪ Only include characters that appeared in atleast 50 scenes.
Answer
In [ ]:
Actors and Episodes¶
• Create the following view:
▪ seasonNum
▪ episodeNum
▪ no_of_scenes, which is the count of scenes in which the character appeared
▪ character_id
▪ character_name
▪ actor_name
▪ Ordered by seasonNum and characterNum ascending and no_of_scenes descending
Answer
In [ ]:
Character Interactions¶
• Characters interacted if they appeared in the same scene.
• Create the following view:
▪ characterName_1
▪ character_id_1
▪ characterName_2
▪ character_id_2
▪ scene_count, number of scenes containing both characters.
▪ Order by scene_count descending.
• Each pair should appear only once, that is the table should not contain two rows of the form:
▪ character_1, character_2, scene_count
▪ character_2, character_1, scene_count
Answer
In [ ]:
Interesting View 1¶
• Create a view of the information that you think would be interesting to fans.
• Explain what you view is producing.
Answer
• Explanation:
In [ ]:
Interesting View 2¶
• Create a view of the information that you think would be interesting to fans.
• Explain what you view is producing.
Answer
• Explanation:
Programming¶
Overview¶
• Ensuring that programming track students build core skills is essential. Programming track assignments will contain a subset of the broader, more complete tasks assigned in the No programming assignments. So, there is a common core section to the programming track homework.
• Additionally, the programming track will have tasks that reinforce skills necessary for programming and databases.
Common Core¶
Tasks¶
• Schema and DML: For the following tables, perform the data import, data cleanup and schema improvement takes for the following tables:
• characters
• scenes
• scenes_characters
• Views
1. Implement the view in 3.5.3, “Character Interactions.”
2. Define, explain and implement a view of you choosing.
Answers¶
Data Import and Cleanup
• Characters
In [ ]:
# %%sql
# create table W4111GotSolutionClean.characters(
# id int,
# characterName varchar(32),
# characterLink varchar(64),
# actorName varchar(32),
# actorLink varchar(32),
# character_id varchar(32),
# characterImageThumb varchar(200),
# characterImageFull varchar(300),
# nickname varchar(32),
# kingsguard varchar(32));
In [109]:
# %sql load data infile “/Users/boyuan/Desktop/IntroToDB/HW1C/GoTHW/Data/csv/characters.csv ” into table W4111GoTSolution.characters fields terminated by ‘,’;
Using MySQLWorkbench to import data, original data stored in W4111GoTSoultion, cleaned data stored in W4111GoTSolutionClean
In [114]:
%sql CREATE SCHEMA `W4111GoTSolutionClean` ;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[114]:
[]
In [115]:
%%sql
create table W4111GotSolutionClean.characters(
id int,
characterName varchar(32),
characterLink varchar(64),
actorName varchar(32),
actorLink varchar(32),
character_id varchar(32),
characterImageThumb varchar(200),
characterImageFull varchar(300),
nickname varchar(32),
kingsguard varchar(32));
insert into W4111GotSolutionClean.characters
select id, characterName, characterLink, actorName, actorLink, character_id, characterImageThumb, characterImageFull, nickname, kingsguard
from W4111GotSolution.characters;
update W4111GoTSolutionClean.characters set characterName=NULL where characterName=”;
update W4111GoTSolutionClean.characters set characterLink=NULL where characterLink=”;
update W4111GoTSolutionClean.characters set actorName=NULL where actorName=”;
update W4111GoTSolutionClean.characters set actorLink=NULL where actorLink=”;
update W4111GoTSolutionClean.characters set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.characters set characterImageThumb=NULL where characterImageThumb=”;
update W4111GoTSolutionClean.characters set characterImageFull=NULL where characterImageFull=”;
update W4111GoTSolutionClean.characters set nickname=NULL where nickname=”;
update W4111GoTSolutionClean.characters set kingsguard=NULL where kingsguard=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
389 rows affected.
0 rows affected.
21 rows affected.
35 rows affected.
36 rows affected.
0 rows affected.
134 rows affected.
134 rows affected.
380 rows affected.
384 rows affected.
Out[115]:
[]
• Scenes
Using MySQLWorkbench to import data, original data stored in W4111GoTSoultion, cleaned data stored in W4111GoTSolutionClean
In [116]:
%%sql
create table W4111GoTSolutionClean.scenes(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
SceneStart varchar(16),
SceneEnd varchar(16),
location varchar(32),
subLocation varchar(64),
greensight varchar(8),
altLocation varchar(32),
warg varchar(8)
);
insert into W4111GoTSolutionClean.scenes
select id, seasonNum, episodeNum, sceneNo, SceneStart, SceneEnd, location, subLocation, greensight, altLocation, warg
from W4111GoTSolution.scenes;
update W4111GoTSolutionClean.scenes set SceneStart=NULL where SceneStart=”;
update W4111GoTSolutionClean.scenes set SceneEnd=NULL where SceneEnd=”;
update W4111GoTSolutionClean.scenes set location=NULL where location=”;
update W4111GoTSolutionClean.scenes set subLocation=NULL where subLocation=”;
update W4111GoTSolutionClean.scenes set greensight=NULL where greensight=”;
update W4111GoTSolutionClean.scenes set altLocation=NULL where altLocation=”;
update W4111GoTSolutionClean.scenes set warg=NULL where warg=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
4165 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
495 rows affected.
4149 rows affected.
4130 rows affected.
4158 rows affected.
Out[116]:
[]
• Scenes and Characters
Using MySQLWorkbench to import data, original data stored in W4111GoTSoultion, cleaned data stored in W4111GoTSolutionClean
In [117]:
%%sql
create table W4111GoTSolutionClean.scenes_characters(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
characterName varchar(64)
);
insert into W4111GoTSolutionClean.scenes_characters
select id, seasonNum, episodeNum, sceneNo, characterName
from W4111GoTSolution.scenes_characters;
update W4111GoTSolutionClean.scenes_characters set characterName=NULL where characterName=”;
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
12114 rows affected.
0 rows affected.
Out[117]:
[]
Keys and Constraints
• Characters
In [120]:
%%sql
alter table W4111GoTSolutionClean.characters
add primary key (id)
* mysql+pymysql://root:***@localhost/W4111GoTSolution
—————————————————————————
InternalError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1248 self.dialect.do_execute(
-> 1249 cursor, statement, parameters, context
1250 )
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
579 def do_execute(self, cursor, statement, parameters, context=None):
–> 580 cursor.execute(statement, parameters)
581
~/.local/lib/python3.7/site-packages/pymysql/cursors.py in execute(self, query, args)
169
–> 170 result = self._query(query)
171 self._executed = query
~/.local/lib/python3.7/site-packages/pymysql/cursors.py in _query(self, q)
327 self._clear_result()
–> 328 conn.query(q)
329 self._do_get_result()
~/.local/lib/python3.7/site-packages/pymysql/connections.py in query(self, sql, unbuffered)
516 self._execute_command(COMMAND.COM_QUERY, sql)
–> 517 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
518 return self._affected_rows
~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_query_result(self, unbuffered)
731 result = MySQLResult(self)
–> 732 result.read()
733 self._result = result
~/.local/lib/python3.7/site-packages/pymysql/connections.py in read(self)
1074 try:
-> 1075 first_packet = self.connection._read_packet()
1076
~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_packet(self, packet_type)
683 packet = packet_type(buff, self.encoding)
–> 684 packet.check_error()
685 return packet
~/.local/lib/python3.7/site-packages/pymysql/protocol.py in check_error(self)
219 if DEBUG: print(“errno =”, errno)
–> 220 err.raise_mysql_exception(self._data)
221
~/.local/lib/python3.7/site-packages/pymysql/err.py in raise_mysql_exception(data)
108 errorclass = error_map.get(errno, InternalError)
–> 109 raise errorclass(errno, errval)
InternalError: (1068, ‘Multiple primary key defined’)
The above exception was the direct cause of the following exception:
InternalError Traceback (most recent call last)
—-> 1 get_ipython().run_cell_magic(‘sql’, ”, ‘alter table W4111GoTSolutionClean.characters\nadd primary key (id)\n’)
~/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2357 with self.builtin_trap:
2358 args = (magic_arg_s, cell)
-> 2359 result = fn(*args, **kwargs)
2360 return result
2361
in execute(self, line, cell, local_ns)
~/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magic.py in
185 # but it’s overkill for just that one bit of state.
186 def magic_deco(arg):
–> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
in execute(self, line, cell, local_ns)
~/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magic.py in
185 # but it’s overkill for just that one bit of state.
186 def magic_deco(arg):
–> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
~/.local/lib/python3.7/site-packages/sql/magic.py in execute(self, line, cell, local_ns)
93
94 try:
—> 95 result = sql.run.run(conn, parsed[‘sql’], self, user_ns)
96
97 if result is not None and not isinstance(result, str) and self.column_local_vars:
~/.local/lib/python3.7/site-packages/sql/run.py in run(conn, sql, config, user_namespace)
338 else:
339 txt = sqlalchemy.sql.text(statement)
–> 340 result = conn.session.execute(txt, user_namespace)
341 _commit(conn=conn, config=config)
342 if result and config.feedback:
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params)
986 raise exc.ObjectNotExecutableError(object_)
987 else:
–> 988 return meth(self, multiparams, params)
989
990 def _execute_function(self, func, multiparams, params):
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params)
285 def _execute_on_connection(self, connection, multiparams, params):
286 if self.supports_execution:
–> 287 return connection._execute_clauseelement(self, multiparams, params)
288 else:
289 raise exc.ObjectNotExecutableError(self)
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params)
1105 distilled_params,
1106 compiled_sql,
-> 1107 distilled_params,
1108 )
1109 if self._has_events or self.engine._has_events:
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1251 except BaseException as e:
1252 self._handle_dbapi_exception(
-> 1253 e, statement, parameters, cursor, context
1254 )
1255
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
1471 util.raise_from_cause(newraise, exc_info)
1472 elif should_wrap:
-> 1473 util.raise_from_cause(sqlalchemy_exception, exc_info)
1474 else:
1475 util.reraise(*exc_info)
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py in raise_from_cause(exception, exc_info)
396 exc_type, exc_value, exc_tb = exc_info
397 cause = exc_value if exc_value is not exception else None
–> 398 reraise(type(exception), exception, tb=exc_tb, cause=cause)
399
400
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
150 value.__cause__ = cause
151 if value.__traceback__ is not tb:
–> 152 raise value.with_traceback(tb)
153 raise value
154
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1247 if not evt_handled:
1248 self.dialect.do_execute(
-> 1249 cursor, statement, parameters, context
1250 )
1251 except BaseException as e:
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
578
579 def do_execute(self, cursor, statement, parameters, context=None):
–> 580 cursor.execute(statement, parameters)
581
582 def do_execute_no_params(self, cursor, statement, context=None):
~/.local/lib/python3.7/site-packages/pymysql/cursors.py in execute(self, query, args)
168 query = self.mogrify(query, args)
169
–> 170 result = self._query(query)
171 self._executed = query
172 return result
~/.local/lib/python3.7/site-packages/pymysql/cursors.py in _query(self, q)
326 self._last_executed = q
327 self._clear_result()
–> 328 conn.query(q)
329 self._do_get_result()
330 return self.rowcount
~/.local/lib/python3.7/site-packages/pymysql/connections.py in query(self, sql, unbuffered)
515 sql = sql.encode(self.encoding, ‘surrogateescape’)
516 self._execute_command(COMMAND.COM_QUERY, sql)
–> 517 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
518 return self._affected_rows
519
~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_query_result(self, unbuffered)
730 else:
731 result = MySQLResult(self)
–> 732 result.read()
733 self._result = result
734 if result.server_status is not None:
~/.local/lib/python3.7/site-packages/pymysql/connections.py in read(self)
1073 def read(self):
1074 try:
-> 1075 first_packet = self.connection._read_packet()
1076
1077 if first_packet.is_ok_packet():
~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_packet(self, packet_type)
682
683 packet = packet_type(buff, self.encoding)
–> 684 packet.check_error()
685 return packet
686
~/.local/lib/python3.7/site-packages/pymysql/protocol.py in check_error(self)
218 errno = self.read_uint16()
219 if DEBUG: print(“errno =”, errno)
–> 220 err.raise_mysql_exception(self._data)
221
222 def dump(self):
~/.local/lib/python3.7/site-packages/pymysql/err.py in raise_mysql_exception(data)
107 errval = data[3:].decode(‘utf-8’, ‘replace’)
108 errorclass = error_map.get(errno, InternalError)
–> 109 raise errorclass(errno, errval)
InternalError: (pymysql.err.InternalError) (1068, ‘Multiple primary key defined’)
[SQL: alter table W4111GoTSolutionClean.characters
add primary key (id)]
(Background on this error at: http://sqlalche.me/e/2j85)
• Scenes
In [122]:
%%sql
alter table W4111GoTSolutionClean.scenes
add primary key (sceneNo, seasonNum, episodeNum)
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[122]:
[]
• Scenes and Characters
In [129]:
%%sql
alter table W4111GoTSolutionClean.scenes_characters
add primary key (id);
alter table W4111GoTSolutionClean.scenes_characters
add foreign key (sceneNo, seasonNum, episodeNum) references W4111GoTSolutionClean.scenes(sceneNo, seasonNum, episodeNum);
* mysql+pymysql://root:***@localhost/W4111GoTSolution
12114 rows affected.
Out[129]:
[]
Views
• Character Interaction Views
In [ ]:
• Your View and Explanation
Programming¶
Data Processing and Cleansing¶
Task¶
• The scene information in episodes.json complements and extend some of the information about character interactions and relationships. An example is:
{
“sceneStart”:”0:50:24″,
“sceneEnd”:”0:55:54″,
“location”:”The Crownlands”,
“subLocation”:”King’s Landing”,
“characters”:[
{
“name”:”Arya Stark”,
“weapon”:[
{“action”:”has”, “name”:”Needle”}
]
},
{
“name”:”Eddard Stark”,
“title”:”Hand”,
“alive”:false,
“mannerOfDeath”:”Decapitation”,
“killedBy”:[
“Ilyn Payne”
]
},
{“name”:”Yoren”},
{“name”:”Sansa Stark”},
{“name”:”Petyr Baelish”},
{“name”:”Cersei Lannister”},
{“name”:”Joffrey Baratheon”, “title”:”King”},
{“name”:”Grand Maester Pycelle”},
{“name”:”Lord Varys”},
{“name”:”Sandor Clegane”},
{
“name”:”Ilyn Payne”,
“weapon”:[
{“action”:”has”, “name”:”Ice”}
]
}
]
}
]
• A little data processing determines that the top-level elements of a character in a scene are:
Top level keys in characters in scenes are:
sex
name
weapon
born
married
killedBy
mannerOfDeath
title
alive
• Your data processing and cleanse tasks is to create a table of the form:
▪ seasonNum
▪ episodeNum
▪ sceneNo
▪ character_id_1
▪ action
▪ character_id_2
• You can get access to information that maps characterName to character_id using the following code.
In [134]:
import pymysql
cloud_conn = pymysql.connect(
host=”columbia8013.cqgsme1nmjms.us-east-1.rds.amazonaws.com”,
user=”gotuser”,
password=”gotusergotuser”,
db=”W4111GoTSolutionClean”,
cursorclass=pymysql.cursors.DictCursor)
In [135]:
def get_character_info(c_name):
sql = “select * from characters where characterName=%s”
cur = cloud_conn.cursor()
res = cur.execute(sql, (c_name))
res = cur.fetchall()
return res
In [139]:
get_character_info(“Arya Stark”)
Out[139]:
[{‘id’: 15,
‘characterName’: ‘Arya Stark’,
‘characterLink’: ‘/character/ch0158604/’,
‘actorName’: ‘Maisie Williams’,
‘actorLink’: ‘/name/nm3586035/’,
‘character_id’: ‘CH_15’,
‘royal’: 0,
‘characterImageThumb’: ‘https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1._SX100_SY140_.jpg’,
‘characterImageFull’: ‘https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1_SY1000_CR0,0,665,1000_AL_.jpg’,
‘nickname’: None,
‘kingsguard’: 0}]
Answer¶
Table Definition
In [146]:
%%sql
drop table W4111GoTSolutionClean.task;
create table W4111GoTSolutionClean.task(
seasonNum int default null,
episodeNum int default null,
sceneNo int default null,
character_id_1 int default null,
action varchar(64) default null,
character_id_2 int default null
);
* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
0 rows affected.
0 rows affected.
Out[146]:
[]
In [ ]:
Sample Query
In [ ]:
Python Code
$\alpha^2$
Data Access Abstraction¶
• A common API and abstraction for manipulating different database types and schema simplifies the development of web applications and other applications. There are many abstractions.

Data Access Objects
• The second programming tasks is to implement our own abstraction. This is RDBDataTable, which implements an abstraction we will use in subsequent homework assignments, and which we will use to enable our web application.
• You can find the implementation template in the src directory of the project template.
• For HW 1, Part C, you can copy your implementation code into this notebook and test again the cloud databases.
• I strongly suggest that you develop and test in PyCharm, and then copy your code into the notebook. The cells below just show you where to place your code. You should copy your code over from PyCharm.
Implementation¶
• BaseDataTable defines the common abstraction we will use. You do not need to do anything to this class.
In [147]:
# Import package to enable defining abstract classes in Python.
# Do not worry about understanding abstract base classes. This is just a class that defines
# some methods that subclasses must implement.
from abc import ABC, abstractmethod
class DataTableException(Exception):
“””
A simple class that maps underlying implementation exceptions to generic exceptions.
“””
invalid_method = 1001
# General
def __init__(self, code, message):
self.code = code
self.message = message
def __str__(self):
result = (
type(self), {“code”: self.code, “message”: self.message}
)
result = str(result)
return result
class BaseDataTable(ABC):
“””
The implementation classes (XXXDataTable) for CSV database, relational, etc. will extend the
base class and implement the abstract methods.
“””
def __init__(self, entity_type_name, connect_info, key_columns=None, context=None):
“””
:param entity_type_name: Name of the logcal entity type. This maps to various abstractions in
underlying stores, e.g. file names, RDB tables, Neo4j Labels, …
:param connect_info: Dictionary of parameters necessary to connect to the data. See examples in subclasses.
:param key_columns: List, in order, of the columns (fields) that comprise the primary key.
A primary key is a set of columns whose values are unique and uniquely identify a row. For Appearances,
the columns are [‘playerID’, ‘teamID’, ‘yearID’]
:param contex: Holds context and environment information.
“””
self._table_name = entity_type_name
self._connect_info = connect_info
self._key_columns = key_columns
self._context = context
@abstractmethod
def find_by_primary_key(self, key_fields, field_list=None, context=None):
“””
:param key_fields: The values for the key_columns, in order, to use to find a record. For example,
for Appearances this could be [‘willite01’, ‘BOS’, ‘1960’]
:param field_list: A subset of the fields of the record to return. The CSV file or RDB table may have many
additional columns, but the caller only requests this subset.
:return: None, or a dictionary containing the columns/values for the row.
“””
pass
@abstractmethod
def find_by_template(self, template, field_list=None, limit=None, offset=None, order_by=None, context=None):
“””
:param template: A dictionary of the form { “field1” : value1, “field2”: value2, …}. The function will return
a derived table containing the rows that match the template.
:param field_list: A list of requested fields of the form, [‘fielda’, ‘fieldb’, …]
:param limit: Do not worry about this for now.
:param offset: Do not worry about this for now.
:param order_by: Do not worry about this for now.
:return: A derived table containing the computed rows.
“””
pass
@abstractmethod
def insert(self, new_entity, context=None):
“””
:param new_record: A dictionary representing a row to add to the set of records. Raises an exception if this
creates a duplicate primary key.
:return: None
“””
pass
@abstractmethod
def delete_by_template(self, template, context=None):
“””
Deletes all records that match the template.
:param template: A template.
:return: A count of the rows deleted.
“””
pass
@abstractmethod
def delete_by_key(self, key_fields, Context=None):
“””
Delete record with corresponding key.
:param key_fields: List containing the values for the key columns
:return: A count of the rows deleted.
“””
pass
@abstractmethod
def update_by_template(self, template, new_values, context=None):
“””
:param template: A template that defines which matching rows to update.
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass
@abstractmethod
def update_by_key(self, key_fields, new_values, context=None):
“””
:param key_fields: List of values for primary key fields
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass
@abstractmethod
def query(self, query_statement, args, context=None):
“””
Passed through/executes a raw query in the native implementation language of the backend.
:param query_statement: Query statement as a string.
:param args: Args to insert into query if it is a template
:param context:
:return: A JSON object containing the result of the operation.
“””
pass
@abstractmethod
def load(self, rows=None):
“””
Loads data into the data table.
:param rows:
:return: Number of rows loaded.
“””
@abstractmethod
def save(self, context):
“””
Writes any cached data to a backing store.
:param context:
:return:
“””
RDBDataTable
• You only need to implement the following methods:
▪ __init__
▪ __str__
▪ find_by_template You only need to implement the parameters
◦ template
◦ field_list
▪ find_by_primary_key You only need to implement the parameters
◦ key_fields
◦ field_list
• The logical behavior of the find_by_template API is the following:
▪ If you create an RDBDataTable instance for a SQL table GoTSolutionClean.scenes_characters,
▪ The call
characters_tbl.find_by_template(
▪ template={
▪ ‘seasonNum’: 1,
▪ ‘episodeNum’: 2
▪ },
▪ field_list = [‘seasonNum’, ‘episodeNum’, ‘sceneNo’, ‘character_id’]
▪ )
▪ Is equivalent to select seasonNum, episodeNum, sceneNo, character_id
▪ from GoTSolutionClean.scenes_characters
▪ where
▪ seasonNum=1 and episodeNum=1
▪ And returns the same dictionary.
• The logical behavior of the find_by_primary_key API is the following:
▪ If you create an RDBDataTable instance for a SQL table GoTSolutionClean.episodes,
▪ The primary key columns are seasonNum, episodeNum.
▪ The invocation would be find_by_primary_key([1,3],field_list=[‘seasonNum’,’episodeNum’,’title’])
▪ This would get converted into select seasonNum, episodeNum, title
▪ from W4111GoTSolutionClean.episodes
▪ where
▪ seasonNum=1 and episodeNum=1
▪ Your class implementation MUST run a database query to determine the primary key columns and their order to form the where clause.
▪ The result is a single dictionary containing the query result.
• We will provide tests.
In [148]:
from src.data_tables.BaseDataTable import BaseDataTable
import pandas as pd
import pymysql
import logging
logger = logging.getLogger()
class RDBDataTable(BaseDataTable):
“””
RDBDataTable is relation DB implementation of the BaseDataTable.
“””
# Default connection information in case the code does not pass an object
# specific connection on object creation.
_default_connect_info = pymysql.connect(
host=”columbia8013.cqgsme1nmjms.us-east-1.rds.amazonaws.com”,
user=”gotuser”,
password=”gotusergotuser”,
db=”W4111GoTSolutionClean”,
cursorclass=pymysql.cursors.DictCursor)
_default_cnx = None
@classmethod
def _get_default_cnx(cls):
if RDBDataTable._default_cnx is None:
# Create a connection to use inside this object. In general, this is not the right approach.
# There would be a connection pool shared across many classes and applications.
cls._default_cnx = pymysql.connect(
host=”columbia8013.cqgsme1nmjms.us-east-1.rds.amazonaws.com”,
user=”gotuser”,
password=”gotusergotuser”,
db=”W4111GoTSolutionClean”,
cursorclass=pymysql.cursors.DictCursor)
return cls._default_cnx
def __init__(self, table_name, key_columns=None, connect_info=None):
“””
:param table_name: The name of the RDB table.
:param connect_info: Dictionary of parameters necessary to connect to the data.
:param key_columns: List, in order, of the columns (fields) that comprise the primary key.
This is for compatibility with other types of data table. Any value other than None is an error.
“””
# If there is not explicit connect information, use the defaults.
####### Your Code Goes Here #########
####### Your Code Goes Here #########
def __str__(self):
“””
:return: String representation of the table’s metadata.
“””
####### Your Code Goes Here #########
####### Your Code Goes Here #########
@classmethod
def _run_q(cls, q, args=None, fields=None, fetch=True, cnx=None, cursor=None, commit=True):
“””
:param q: An SQL query string that may have %s slots for argument insertion. The string
may also have {} after select for columns to choose.
:param args: A tuple of values to insert in the %s slots.
:param fetch: If true, return the result.
:param cnx: A database connection. May be None
:param cncursor: Do not worry about this for now.
:param commit: Do not worry about this for now. This is more wizard stuff.
:return: A result set or None.
“””
r = None
cursor_created = False
if cnx is None:
cnx = cls._get_default_cnx()
try:
# Use the connection in the object if no connection provided.
# Convert the list of columns into the form “col1, col2, …” for following SELECT.
if fields:
q = q.format(“,”.join(fields))
if cursor is None:
cursor = cnx.cursor() # Just ignore this for now.
cursor_created = True
# If debugging is turned on, will print the query sent to the database.
#self.debug_message(“Query = “, cursor.mogrify(q, args))
cursor.execute(q, args) # Execute the query.
# Technically, INSERT, UPDATE and DELETE do not return results.
# Sometimes the connector libraries return the number of created/deleted rows.
if fetch:
r = cursor.fetchall() # Return all elements of the result.
else:
r = None
if commit: # Do not worry about this for now.
cnx.commit()
if cursor_created:
cursor.close()
except Exception as e:
print(“Exception e = “, e)
if commit:
cnx.rollback()
if cursor_created:
cursor.close()
return r
def _run_insert(self, table_name, column_list, values_list, cnx=None, commit=True):
“””
:param table_name: Name of the table to insert data. Probably should just get from the object data.
:param column_list: List of columns for insert.
:param values_list: List of column values.
:param cnx: Ignore this for now.
:param commit: Ignore this for now.
:return:
“””
try:
q = “insert into ” + table_name + ” ”
# If the column list is not None, form the (col1, col2, …) part of the statement.
if column_list is not None:
q += “(” + “,”.join(column_list) + “) ”
# We will use query parameters. For a term of the form values(%s, %s, …) with one slot for
# each value to insert.
values = [“%s”] * len(values_list)
# Form the values(%s, %s, …) part of the statement.
values = ” ( ” + “,”.join(values) + “) ”
values = “values” + values
# Put all together.
q += values
self._run_q(q, args=values_list, fields=None, fetch=False, cnx=cnx, commit=commit)
except Exception as e:
print(“Got exception = “, e)
raise e
@classmethod
def get_folders(cls):
pass
def find_by_primary_key(self, key_fields, field_list=None, context=None):
“””
:param key_fields: The values for the key_columns, in order, to use to find a record.
:param field_list: A subset of the fields of the record to return.
:return: None, or a dictionary containing the request fields for the record identified
by the key.
“””
####### Your Code Goes Here #########
####### Your Code Goes Here #########
def _template_to_where_clause(self, t):
“””
Convert a query template into a WHERE clause.
:param t: Query template.
:return: (WHERE clause, arg values for %s in clause)
“””
w_clause=None
args = None
####### Your Code Goes Here #########
####### Your Code Goes Here #########
return w_clause, args
def find_by_template(self, template, field_list=None, limit=None, offset=None, order_by=None, commit=True):
“””
:param template: A dictionary of the form { “field1” : value1, “field2”: value2, …}
:param field_list: A list of request fields of the form, [‘fielda’, ‘fieldb’, …]
:param limit: Do not worry about this for now.
:param offset: Do not worry about this for now.
:param order_by: Do not worry about this for now.
:return: A list containing dictionaries. A dictionary is in the list representing each record
that matches the template. The dictionary only contains the requested fields.
“””
####### Your Code Goes Here #########
####### Your Code Goes Here #########
def insert(self, new_record):
“””
:param new_record: A dictionary representing a row to add to the set of records.
:return: None
“””
pass
def delete_by_template(self, template):
“””
Deletes all records that match the template.
:param template: A template.
:return: A count of the rows deleted.
“””
pass
def delete_by_key(self, key_fields):
“””
Delete record with corresponding key.
:param key_fields: List containing the values for the key columns
:return: A count of the rows deleted.
“””
pass
def update_by_template(self, template, new_values):
“””
:param template: A template that defines which matching rows to update.
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass
def update_by_key(self, key_fields, new_values):
“””
:param key_fields: List of values for primary key fields
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass
def load(self):
pass
def save(self):
pass
def query(self, query_statement, args, context=None):
pass
Tests
• We will provide some tests.
In [ ]: