程序代写代做代考 Excel database ER SQL cache python COMS W4111: Introduction to Databases

COMS W4111: Introduction to Databases
Spring 2020, Sections 002, H02, V02

Non-Programming Project Notebook

Introduction¶
• This notebook defines HW1, Part C for both the programming and no-programming tracks.
• Submission instructions:
▪ Both no programming and programming tracks submit a zip file containing all of the files in the HW project directory that you clone (see below).
▪ Much of your work is completing sections of this notebook, but you MUST edit the copy in the project directory.
▪ If you reference or create any files, please create in the project directory.
▪ The name of your zip file MUST be uni_W4111_s20_hw1c, replace uni with you UNI. My submission would be dff9_W4111_s20_hw1c.
▪ The homework is due on 23-February-2020 at 11:59 PM. You MAY use late days.
• We will use the pinned Piazza post to clarify the homework and answer questions.

Common Setup¶
Both tracks should:
• MySQL: Install
▪ MySQL Server Community Edition
▪ MySQL Workbench
▪ In addition to the official links, there are many operating specific tutorials.
• Create a Lucidchart educational account.
• Project template:
▪ If you do not have Git on your computer, please install it.
▪ Homework 1 setup instructions recommended that you create a folder /IntroToDB on your desktop.
▪ Under this directory, create a folder HW1C.
▪ Open terminal shell or Git enabled command prompt.
▪ CD to the directory you created and enter git clone https://github.com/donald-f-ferguson/GoTHW
▪ This will create the project template for you.

HW1C Project Template

• Run the following cells. Your answers WILL BE different than the sample output.
In [1]:
import sys
import os
import platform
import psutil

print(“=”*40, “Path Information”, “=”*40)
print(“Path = “, os.getcwd())

print(“=”*40, “System Information”, “=”*40)
uname = platform.uname()
print(f”System: {uname.system}”)
print(f”Node Name: {uname.node}”)
print(f”Release: {uname.release}”)
print(f”Version: {uname.version}”)
print(f”Machine: {uname.machine}”)
print(f”Processor: {uname.processor}”)

print(“=”*40, “Network Information”, “=”*40)

import socket
hostname = socket.gethostname()
IPAddr = socket.gethostbyname(hostname)
print(“Your Computer Name is:” + hostname)
print(“Your Computer IP Address is:” + IPAddr)

import uuid
print (“Your MAC address is:”, hex(uuid.getnode()))

======================================== Path Information ========================================
Path = /Users/boyuan/Desktop/IntroToDB/HW1C/GoTHW
======================================== System Information ========================================
System: Darwin
Node Name: Bos-MacBook-Pro.lan1
Release: 19.0.0
Version: Darwin Kernel Version 19.0.0: Thu Oct 17 16:17:15 PDT 2019; root:xnu-6153.41.3~29/RELEASE_X86_64
Machine: x86_64
Processor: i386
======================================== Network Information ========================================
Your Computer Name is:Bos-MacBook-Pro.lan1
Your Computer IP Address is:192.168.1.32
Your MAC address is: 0xacde48001122

No Programming¶
Overview¶
The no-programming track has four tasks:
1. Import CSV data files and create initial tables.


2. Create a new schema by modifying and improving the data schema and content.


3. Defining a set of views that support data visualizations.


4. Running Python cells that produce visualization using the provided code.

Task 1: Initial Data Import¶
Task¶
• The folder in the project Data/csv contains several CSV files:
▪ ‘character_relationships.csv
▪ characters.csv
▪ characters_actors.csv
▪ episodes.csv
▪ locations.csv
▪ scenes.csv
▪ scenes_characters.csv
• Use MySQL Workbench:
▪ Create a new schema W4111GoTSolution and set as the default schema.
▪ Use the MySQL Workbench Table Data Import Wizard and import each of the files. Choose the option to create a new table.

Tests¶
In [2]:
%load_ext sql
%sql mysql+pymysql://root:XHEsql2182199.@localhost/W4111GoTSolution
Out[2]:
‘Connected: root@W4111GoTSolution’

1. Find all the characters with last name Stark.
In [3]:
%sql select * from characters where characterName like(“% Stark”)

* mysql+pymysql://root:***@localhost/W4111GoTSolution
14 rows affected.
Out[3]:
id
characterName
characterLink
actorName
actorLink
character_id
royal
characterImageThumb
characterImageFull
nickname
kingsguard
15
Arya Stark
/character/ch0158604/
Maisie Williams
/name/nm3586035/
CH_15

https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1_SY1000_CR0,0,665,1000_AL_.jpg

20
Benjen Stark
/character/ch0153996/
Joseph Mawle
/name/nm1152798/
CH_20

https://images-na.ssl-images-amazon.com/images/M/MV5BMjFiY2M5MjgtZmRiMS00ZTAyLTk2Y2UtMjdjZTRkZmVlYTMyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMjFiY2M5MjgtZmRiMS00ZTAyLTk2Y2UtMjdjZTRkZmVlYTMyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1_.jpg

28
Brandon Stark

CH_28

29
Bran Stark
/character/ch0234897/
Isaac Hempstead Wright
/name/nm3652842/
CH_29
True
https://images-na.ssl-images-amazon.com/images/M/MV5BMTA1NTg0NTI3MTBeQTJeQWpwZ15BbWU3MDEyNjg4OTQ@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTA1NTg0NTI3MTBeQTJeQWpwZ15BbWU3MDEyNjg4OTQ@._V1_SX1500_CR0,0,1500,999_AL_.jpg

36
Catelyn Stark
/character/ch0145135/
Michelle Fairley
/name/nm0265610/
CH_36

https://images-na.ssl-images-amazon.com/images/M/MV5BMTY2ODc4MTkxMV5BMl5BanBnXkFtZTcwNjM3MTg4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTY2ODc4MTkxMV5BMl5BanBnXkFtZTcwNjM3MTg4OQ@@._V1_SY1000_CR0,0,915,1000_AL_.jpg

59
Eddard Stark
/character/ch0154681/
Sean Bean
/name/nm0000293/
CH_59

https://images-na.ssl-images-amazon.com/images/M/MV5BODcwMjg0MDQ5MF5BMl5BanBnXkFtZTcwMTk2NDk4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BODcwMjg0MDQ5MF5BMl5BanBnXkFtZTcwMTk2NDk4OQ@@._V1_SY1000_CR0,0,827,1000_AL_.jpg
Ned

183
Lyanna Stark
/character/ch0543804/
Aisling Franciosi
/name/nm4957233/
CH_183

https://images-na.ssl-images-amazon.com/images/M/MV5BMTE5NTczY2QtNmQxYS00NTAwLWE2ZTAtYzc1MTkyZGU5MzRkL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMTE5NTczY2QtNmQxYS00NTAwLWE2ZTAtYzc1MTkyZGU5MzRkL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1_.jpg

284
Rickard Stark

CH_284

285
Rickon Stark
/character/ch0233141/
Art Parkinson
/name/nm3280686/
CH_285

https://images-na.ssl-images-amazon.com/images/M/MV5BMWZiOGNjMDAtOTRlNi00MDJmLWEyMTMtOGEwZTM5ODJlNDAyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMWZiOGNjMDAtOTRlNi00MDJmLWEyMTMtOGEwZTM5ODJlNDAyXkEyXkFqcGdeQXVyMjk3NTUyOTc@._V1_.jpg

286
Robb Stark
/character/ch0158596/
Richard Madden
/name/nm0534635/
CH_286
True
https://images-na.ssl-images-amazon.com/images/M/MV5BMjI2NDE1NzczNF5BMl5BanBnXkFtZTcwNjcwODg4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BMjI2NDE1NzczNF5BMl5BanBnXkFtZTcwNjcwODg4OQ@@._V1_SY1000_CR0,0,845,1000_AL_.jpg

300
Sansa Stark
/character/ch0158137/
Sophie Turner
/name/nm3849842/
CH_300
True
https://images-na.ssl-images-amazon.com/images/M/MV5BNjAwMjE2NDExNF5BMl5BanBnXkFtZTcwODAwODg4OQ@@._V1._SX100_SY140_.jpg
https://images-na.ssl-images-amazon.com/images/M/MV5BNjAwMjE2NDExNF5BMl5BanBnXkFtZTcwODAwODg4OQ@@._V1_SY1000_CR0,0,806,1000_AL_.jpg

381
Young Benjen Stark
/character/ch0153996/
Matteo Elezi
/name/nm5502295/
CH_381

383
Young Lyanna Stark
/character/ch0543804/
Cordelia Hill
/name/nm8108764/
CH_383

386
Young Ned Stark
/character/ch0154681/
Sebastian Croft
/name/nm7509185/
CH_386

2. Find all of the information about scenes in season 1, episode 1.
In [4]:
%sql select * from scenes where episodeNum=1 and seasonNum=1

* mysql+pymysql://root:***@localhost/W4111GoTSolution
36 rows affected.
Out[4]:
id
seasonNum
episodeNum
sceneNo
sceneStart
sceneEnd
location
subLocation
greensight
altLocation
warg
flashback
0
1
1
1
0:00:40
0:01:45
The Wall
Castle Black

1
1
1
2
0:01:45
0:03:24
North of the Wall
The Haunted Forest

2
1
1
3
0:03:24
0:03:31
North of the Wall
The Haunted Forest

3
1
1
4
0:03:31
0:03:38
North of the Wall
The Haunted Forest

4
1
1
5
0:03:38
0:03:44
North of the Wall
The Haunted Forest

5
1
1
6
0:03:44
0:05:36
North of the Wall
The Haunted Forest

6
1
1
7
0:05:36
0:05:41
North of the Wall
The Haunted Forest

7
1
1
8
0:05:41
0:05:48
North of the Wall
The Haunted Forest

8
1
1
9
0:05:48
0:05:58
North of the Wall
The Haunted Forest

9
1
1
10
0:05:58
0:06:21
North of the Wall
The Haunted Forest

10
1
1
11
0:06:21
0:06:39
North of the Wall
The Haunted Forest

11
1
1
12
0:06:39
0:06:49
North of the Wall
The Haunted Forest

12
1
1
13
0:06:49
0:07:45
North of the Wall
The Haunted Forest

13
1
1
14
0:09:27
0:12:38
The North
Winterfell

14
1
1
15
0:12:38
0:15:41
The North
Outside Winterfell

15
1
1
16
0:15:41
0:18:44
The North
Outside Winterfell

16
1
1
17
0:18:44
0:20:45
The Crownlands
King’s Landing

17
1
1
18
0:20:45
0:22:43
The North
Winterfell

18
1
1
19
0:22:43
0:23:09
The North
Winterfell

19
1
1
20
0:23:09
0:23:39
The North
Winterfell

20
1
1
21
0:23:39
0:29:16
The North
Winterfell

21
1
1
22
0:29:16
0:30:47
The North
Winterfell

22
1
1
23
0:30:47
0:32:57
The North
Winterfell

23
1
1
24
0:32:57
0:33:45
The North
Winterfell

24
1
1
25
0:33:45
0:36:17
Pentos

25
1
1
26
0:36:17
0:38:23
Pentos

26
1
1
27
0:38:23
0:40:05
Pentos

27
1
1
28
0:40:05
0:40:56
The North
Winterfell

28
1
1
29
0:40:56
0:41:19
The North
Winterfell

29
1
1
30
0:41:19
0:44:14
The North
Winterfell

30
1
1
31
0:44:14
0:47:28
The North
Winterfell

31
1
1
32
0:47:28
0:50:29
The North
Winterfell

32
1
1
33
0:50:29
0:56:05
Pentos

33
1
1
34
0:56:05
0:57:48
Pentos

34
1
1
35
0:57:48
0:59:06
The North
Winterfell

35
1
1
36
0:59:06
1:00:57
The North
Winterfell

3. Find all of the characters and actors who played them.
In [5]:
%%sql
SELECT
character_id,
(select characterName from characters where characters.character_id=
characters_actors.character_id) as characterName,
actorName, season
FROM W4111GoTSolution.characters_actors;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
408 rows affected.
Out[5]:
character_id
characterName
actorName
season
CH_0
Addam Marbrand
B.J. Hogg

CH_2
Aeron Greyjoy
Michael Feast

CH_3
Aerys II Targaryen
David Rintoul

CH_4
Akho
Chuku Modu

CH_5
Alliser Thorne
Owen Teale

CH_6
Alton Lannister
Karl Davies

CH_7
Alys Karstark
Megan Parkinson

CH_8
Amory Lorch
Fintan McKeown

CH_9
Anguy
Philip McGinley

CH_10
Archmaester Marwyn
Jim Broadbent

CH_11
Areo Hotah
Deobia Oparei

CH_12
Armeca
Sahara Knite

CH_13
Arthur
Nathanael Saleh

CH_14
Arthur Dayne
Luke Roberts

CH_15
Arya Stark
Maisie Williams

CH_16
Baby Sam
William Wilson
7.0
CH_16
Baby Sam
James Wilson
7.0
CH_17
Balon Greyjoy
Patrick Malahide

CH_18
Baratheon Guard
Phil Barnhill

CH_19
Barristan Selmy
Ian McElhinney

CH_20
Benjen Stark
Joseph Mawle

CH_21
Beric Dondarrion
Richard Dormer
3.0
CH_21
Beric Dondarrion
Richard Dormer
6.0
CH_21
Beric Dondarrion
Richard Dormer
7.0
CH_21
Beric Dondarrion
Richard Dormer
8.0
CH_21
Beric Dondarrion
David Michael Scott
1.0
CH_22
Bianca
Eline Powell

CH_23
Biter
Gerard Jordan

CH_24
Black Lorren
Forbes KB

CH_25
Black Walder Rivers
Tim Plester

CH_26
Bobono
Leigh Gill

CH_27
Bowen Marsh
Michael Condron

CH_29
Bran Stark
Isaac Hempstead Wright

CH_30
Brea
Gemita Samarra

CH_31
Brienne of Tarth
Gwendoline Christie

CH_32
Bronn
Jerome Flynn

CH_33
Brother Ray
Ian McShane

CH_34
Brynden Tully
Clive Russell

CH_35
Captain of the Archers
Michael Grennell

CH_36
Catelyn Stark
Michelle Fairley

CH_37
Catspaw Assassin
Lalor Roddy

CH_38
Cersei Lannister
Lena Headey

CH_39
Child of the Forest
Alice Hewkin

CH_40
Citadel Maester
Frank Hvam

CH_41
Colen of Greenpools
Darren Killeen

CH_42
Craster
Robert Pugh

CH_43
Daario Naharis
Ed Skrein
3.0
CH_43
Daario Naharis
Michiel Huisman
4.0
CH_43
Daario Naharis
Michiel Huisman
5.0
CH_43
Daario Naharis
Michiel Huisman
6.0
CH_44
Daenerys Targaryen
Emilia Clarke

CH_45
Dagmer Cleftjaw
Ralph Ineson

CH_46
Daisy
Maisie Dee

CH_47
Davos Seaworth
Liam Cunningham

CH_48
Dickon Tarly
Freddie Stroma
6.0
CH_48
Dickon Tarly
Tom Hopper
7.0
CH_49
Dongo the Giant
Ian Whyte

CH_50
Donnel Waynwood
Alisdair Simpson

CH_51
Dontos Hollard
Tony Way

CH_52
Doran Martell
Alexander Siddig

CH_53
Doreah
Roxanne McKee

CH_54
Dothraki Bloodrider #1
Diogo Sales

CH_55
Dothraki Bloodrider #2
Junade Khan

CH_56
Drennan
David Coakley

CH_58
Drowned Priest
Jonathan Ryan

CH_59
Eddard Stark
Sean Bean

CH_60
Eddison Tollett
Ben Crompton

CH_61
Edmure Tully
Tobias Menzies

CH_62
Ellaria Sand
Indira Varma

CH_64
Eon Hunter
Barrington Cullen

CH_65
Euron Greyjoy
Pilou Asbæk

CH_66
Faceless Man
Cedric Henderson

CH_67
Farlen
Peter Ballance

CH_68
Gared
Dermot Keaney

CH_69
Gatins
Ricky Champ

CH_70
Gendry
Joe Dempsie

CH_71
Gerold Hightower
Eddie Eyre

CH_73
Gilly
Hannah Murray

CH_74
Goldcloak
Alan Paris

CH_75
Goldcloak
Sam Callis

CH_76
Goldcloak #1
Nikovich Sammut

CH_77
Goldcloak #2
Seamus Kelly

CH_78
Grand Maester Pycelle
Julian Glover

CH_79
Great Master #1
Emilio Doorgasingh
4.0
CH_79
Great Master #1
Gianpiero Cognoli
5.0
CH_80
Great Master #2
Derek Horsham
4.0
CH_80
Great Master #2
Bobby Asghar
5.0
CH_81
Great Master #3
Peter Silverleaf

CH_82
Great Master #4
Davor Jozinovic

CH_83
Great Master #5
Ivan Peric

CH_84
Great Master #6
Hadi Kermani

CH_85
Great Master #7
Tonci Banov

CH_86
Greatjon Umber
Clive Mantle

CH_87
Gregor Clegane
Ian Whyte
2.0
CH_87
Gregor Clegane
Conan Stevens
1.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
4.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
5.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
6.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
7.0
CH_87
Gregor Clegane
Hafþór Júlíus Björnsson
8.0
CH_88
Greizhen mo Ullhor
Cliff Barry

CH_89
Grenn
Mark Stanley

CH_91
Grey Worm
Jacob Anderson

CH_92
Hallyne
Roy Dotrice

CH_93
Handmaid
Sabrina Bartlett

CH_94
Handmaid
Sara Dylan

CH_95
Harald Karstark
Paul Rattray

CH_96
Harry Strickland
Marc Rissmann

CH_97
High Priestess
Souad Faress

CH_98
High Septon
David Verrey

CH_99
High Septon
Paul Bentley

CH_100
High Sparrow
Jonathan Pryce

CH_101
Hizdahr zo Loraq
Joel Fry

CH_102
Hodor
Kristian Nairn

CH_103
Hoster Tully
Christopher Newman

CH_104
Hot Pie
Ben Hawkey

CH_105
Howland Reed
Leo Woodruff

CH_106
Hugh of the Vale
Jefferson Hall

CH_107
Illyrio Mopatis
Roger Allam

CH_108
Ilyn Payne
Wilko Johnson

CH_109
Irri
Amrita Acharia

CH_110
Izembaro
Richard E. Grant

CH_111
Jacks
Paul Caddell

CH_113
Jaime Lannister
Nikolaj Coster-Waldau

CH_114
Janos Slynt
Dominic Carter

CH_115
Jaqen H’ghar
Tom Wlaschiha

CH_116
Jaqen’s Disguise
Patrick O’Kane

CH_117
Jaremy Rykker
Jeffrey O’Brien

CH_118
Jeor Mormont
James Cosmo

CH_119
Jhiqui
Sarita Piotrowski

CH_120
Joffrey Baratheon
Jack Gleeson

CH_121
Jojen Reed
Thomas Brodie-Sangster

CH_122
Jon Arryn
John Standing

CH_123
Jon Snow
Kit Harington

CH_124
Jonos Bracken
Gerry O’Brien

CH_125
Jorah Mormont
Iain Glen

CH_126
Jory Cassel
Jamie Sives

CH_127
Joyeuse Erenford
Kelly Long

CH_128
Karl Tanner
Burn Gorman

CH_129
Karsi
Birgitte Hjort Sørensen

CH_130
Kayla
Pixie Le Knot

CH_131
Kegs
Tim Landers

CH_132
Kevan Lannister
Ian Gelder

CH_133
Khal Drogo
Jason Momoa

CH_134
Khal Moro
Joseph Naufahu

CH_135
Khal Rhalko
Andrei Claude

CH_136
King Joffrey Baratheon Dwarf
Raymond Griffiths

CH_137
King Renly Baratheon Dwarf
Dean Whatton

CH_138
King Robb Stark Dwarf
George Appleby

CH_139
King Stannis Baratheon Dwarf
Maxwell Laird

CH_140
King’s Landing Rioter #1
Reg Wayment

CH_141
King’s Landing Rioter #2
Marko Juraga

CH_142
King’s Landing Rioter #3
Rea Separovic

CH_143
Kinvara
Ania Bukstein

CH_144
Kitty Frey
Lucy Hayes

CH_145
Knight of House Frey
Patrick Ryan

CH_146
Knight of House Lynderly
Paddy Rocks

CH_147
Kovarro
Steven Cole

CH_148
Kraznys mo Nakloz
Dan Hildebrand

CH_149
Kurleket
Kevin Keenan

CH_151
Lady Crane
Essie Davis

CH_152
Lancel Lannister
Eugene Simon

CH_153
Lannister Captain
Aiden Condron

CH_154
Lannister Captain
Richard Laing

CH_155
Lannister Guardsman
Conor Delaney

CH_156
Lannister Messenger
Simon Stewart

CH_157
Lannister Scout
Chris Gallagher

CH_158
Lead Dornish Guard
Christian Vit

CH_159
Leaf
Kae Alexander
6.0
CH_159
Leaf
Octavia Selena Alexandru
4.0
CH_160
Lem Lemoncloak
Jóhannes Haukur Jóhannesson

CH_161
Leo Lefford
Vinnie McCabe

CH_162
Lhara
Sarine Sofair

CH_163
Little Bird
Samuel Paul Small

CH_164
Little Bird
Tristan Mercieca

CH_165
Little Bird #3
Michael Nevin

CH_166
Little Bird #4
Jesse Magee

CH_167
Little Bird #5
Iona Clarke

CH_168
Little Bird #6
Lucy Gallagher

CH_169
Little Bird #7
Fionn Kernan

CH_170
Loboda
Zachary Baharov

CH_171
Locke
Noah Taylor

CH_172
Lollys Stokeworth
Elizabeth Cadwallader

CH_173
Lommy Greenhands
Eros Vlahos

CH_174
Loras Tyrell
Finn Jones

CH_175
Lord Galbart Glover
Mark Coney

CH_176
Lord Portan
Gordon Fulton

CH_177
Lord Varys
Conleth Hill

CH_178
Lordsport Dockhand
Jer O’Leary

CH_179
Lothar Frey
Daniel Tuite
6.0
CH_179
Lothar Frey
Tom Brooke
3.0
CH_180
Loyal Night’s Watchman #1
Steve Cash

CH_181
Loyal Night’s Watchman #2
Robert Fawsitt

CH_182
Lyanna Mormont
Bella Ramsey

CH_183
Lyanna Stark
Aisling Franciosi

CH_184
Lysa Arryn
Kate Dickie

CH_185
Mace Tyrell
Roger Ashton-Griffiths

CH_186
Maester Aemon
Peter Vaughan

CH_187
Maester Cressen
Oliver Ford Davies

CH_188
Maester Luwin
Donald Sumpter

CH_189
Maester Wolkan
Richard Rycroft

CH_190
Mag the Mighty
Neil Fingleton

CH_191
Maggy
Jodhi May

CH_192
Mago
Ivailo Dimitrov

CH_193
Male Prostitute
Dennis Stokes

CH_194
Malko
Adewale Akinnuoye-Agbaje

CH_195
Mance Rayder
Ciarán Hinds

CH_196
Mandon Moore
James Doran

CH_197
Manservant
Conor Watters

CH_198
Marei
Josephine Gillan

CH_199
Margaery Tyrell
Natalie Dormer

CH_200
Margaery’s Handmaiden
Aisling Jarrett-Gavin

CH_201
Marillion
Emun Elliott

CH_202
Martyn Lannister
Dean-Charles Chapman

CH_203
Masha Heddle
Susie Kelly

CH_204
Matthos Seaworth
Kerr Logan

CH_205
Meera Reed
Ellie Kendrick

CH_206
Meereen Slave Master
Laurence Doherty

CH_207
Melessa Tarly
Samantha Spiro

CH_208
Melisandre
Carice van Houten

CH_209
Merchant Captain
Gary Pillai

CH_210
Mero
Mark Killeen

CH_211
Meryn Trant
Ian Beattie

CH_212
Mhaegen
Antonia Christophers

CH_213
Mikken
Boyd Rankin

CH_214
Mirelle
Elisa Lasowski

CH_215
Mirri Maz Duur
Mia Soteriou

CH_216
Missandei
Nathalie Emmanuel

CH_217
Mord
Ciaran Bermingham

CH_218
Mossador
Reece Noi

CH_219
Musician #1
Nanna Bryndís Hilmarsdóttir

CH_220
Musician #1
Jon Thor Birgisson

CH_221
Musician #2
Ragnar Þórhallsson

CH_222
Musician #2
Orri P. Dyrason

CH_223
Musician #3
Georg Holm

CH_224
Musician #3
Arnar Rosenkranz Hilmarsson

CH_225
Musician #4
Brynjar Leifsson

CH_226
Musician #5
Kristján Kristjánsson

CH_227
Mycah
Rhodri Hosking

CH_228
Myranda
Charlotte Hope

CH_229
Myrcella Baratheon
Aimee Richardson
1.0
CH_229
Myrcella Baratheon
Aimee Richardson
2.0
CH_229
Myrcella Baratheon
Nell Tiger Free
5.0
CH_229
Myrcella Baratheon
Nell Tiger Free
6.0
CH_230
Ned Umber
Harry Grasby

CH_231
Night’s Watch Deserter
Barry John O’Connor

CH_232
Night’s Watch Messenger
Frank O’Sullivan

CH_233
Night’s Watch Officer
Dennis McKeever

CH_234
Night’s Watch Officer
Michael Stuart

CH_235
Night’s Watchman
Marcus Lamb

CH_236
Night’s Watchman
Michael Hooley

CH_237
Night’s Watchman
Patrick J Molloy

CH_238
Night’s Watchman #1
Michael Hayes

CH_239
Night’s Watchman #2
Ruairí Heading

CH_240
Night’s Watchman #2
Tony Flynn

CH_241
Northman Rioter
Jed Murray

CH_243
Nymeria Sand
Jessica Henwick

CH_244
Obara Sand
Keisha Castle-Hughes

CH_245
Oberyn Martell
Pedro Pascal

CH_246
Old Nan
Margaret John

CH_247
Old Woman Prisoner
Gina Moxley

CH_248
Olenna Tyrell
Diana Rigg

CH_249
Olly
Brenock O’Connor

CH_250
Olyvar
Will Tudor

CH_251
Orell
Mackenzie Crook

CH_252
Ornela
Hannah John-Kamen

CH_253
Osha
Natalia Tena

CH_254
Othell Yarwyck
Brian Fortune

CH_256
Pentoshi Servant
Rania Zouari

CH_257
Petyr Baelish
Aidan Gillen

CH_258
Podrick Payne
Daniel Portman

CH_259
Polliver
Andy Kellegher

CH_260
Prendahl na Ghezn
Ramon Tikaram

CH_261
Pyat Pree
Ian Hanmore

CH_262
Pypar
Josef Altin

CH_263
Qhono
Staz Nair

CH_264
Qhorin Halfhand
Simon Armstrong

CH_265
Qotho
Dar Salim

CH_266
Quaithe
Laura Pradelska

CH_267
Qyburn
Anton Lesser

CH_268
Rakharo
Elyes Gabel

CH_269
Ramsay Snow
Iwan Rheon

CH_270
Randyll Tarly
James Faulkner

CH_271
Rast
Luke Barnes

CH_272
Rattleshirt
Edward Dogliani
2.0
CH_272
Rattleshirt
Edward Dogliani
3.0
CH_272
Rattleshirt
Ross O’Hennessy
5.0
CH_273
Razdal mo Eraz
George Georgiou

CH_274
Red Keep Stableboy
Hugo Culverhouse

CH_275
Red Priestess
Melanie Liburd

CH_276
Red Priestess
Rila Fukushima

CH_277
Renly Baratheon
Gethin Anthony

CH_283
Rickard Karstark
John Stahl
2.0
CH_283
Rickard Karstark
John Stahl
3.0
CH_283
Rickard Karstark
Steve Blount
1.0
CH_285
Rickon Stark
Art Parkinson

CH_286
Robb Stark
Richard Madden

CH_287
Robert Baratheon
Mark Addy

CH_288
Robett Glover
Tim McInnerny

CH_289
Robin Arryn
Lino Facioli

CH_290
Rodrik Cassel
Ron Donachie

CH_291
Roose Bolton
Michael McElhatton

CH_292
Rorge
Andy Beckwith

CH_293
Ros
Esmé Bianco

CH_294
Roslin Frey
Alexandra Dowling

CH_295
Royal Steward
Robert Sterne

CH_296
Ryger Rivers
Bryan McCaugherty

CH_297
Salladhor Saan
Lucian Msamati

CH_298
Samwell Tarly
John Bradley

CH_299
Sandor Clegane
Rory McCann

CH_300
Sansa Stark
Sophie Turner

CH_301
Selyse Baratheon
Tara Fitzgerald
3.0
CH_301
Selyse Baratheon
Tara Fitzgerald
4.0
CH_301
Selyse Baratheon
Tara Fitzgerald
5.0
CH_301
Selyse Baratheon
Sarah MacKeever
2.0
CH_302
Septa Moelle
Maggie Hayes

CH_303
Septa Mordane
Susan Brown

CH_304
Septa Scolera
Mary Jordan

CH_305
Septa Unella
Hannah Waddingham

CH_306
Septon
Kenneth Hadley

CH_307
Septon
Terry Byrne

CH_308
Ser Endrew Tarth
Dez McMahon

CH_309
Shae
Sibel Kekilli

CH_310
Shagga
Mark Lewis Jones

CH_312
Shireen Baratheon
Kerry Ingram

CH_313
Silk King
Slavko Juraga

CH_314
Smalljon Umber
Dean S. Jagger

CH_315
Spice King
Nicholas Blane

CH_316
Stannis Baratheon
Stephen Dillane

CH_317
Stark Bannerman
Michael Stuart

CH_318
Stark Guard
Faolan Morgan

CH_319
Stark Guard
Rick Burn

CH_320
Stark Guard
David Sheehan

CH_321
Stark Guard
Paul Monahan

CH_322
Steelshanks Walton
Jamie Michie

CH_323
Stevron Frey
Colin Carnegie

CH_324
Stiv
Stephen Don

CH_325
Styr
Yuri Kolokolnikov

CH_327
Syrio Forel
Miltos Yerolemou

CH_328
Talisa Maegyr
Oona Chaplin

CH_329
Talla Tarly
Rebecca Benson

CH_330
Ternesio Terys
Gary Oliver

CH_331
The Bear
Bart the Bear

CH_332
The Night King
Richard Brake
4.0
CH_332
The Night King
Richard Brake
5.0
CH_332
The Night King
Vladimir ‘Furdo’ Furdik
6.0
CH_332
The Night King
Vladimir ‘Furdo’ Furdik
7.0
CH_332
The Night King
Vladimir ‘Furdo’ Furdik
8.0
CH_333
The Tickler
Anthony Morris

CH_334
The Waif
Faye Marsay

CH_335
Thenn Warg
Joseph Gatt

CH_336
Theon Greyjoy
Alfie Allen

CH_337
Thin Man
Oengus MacNamara

CH_338
Thoros of Myr
Paul Kaye

CH_339
Three-Eyed Raven
Struan Rodger
4.0
CH_339
Three-Eyed Raven
Max von Sydow
6.0
CH_340
Timett
Tobias Winter

CH_341
Tobho Mott
Andrew Wilde

CH_342
Todder
Will O’Connell

CH_343
Tomard
Phil Dixon

CH_344
Tommen Baratheon
Callum Wharry
1.0
CH_344
Tommen Baratheon
Callum Wharry
2.0
CH_344
Tommen Baratheon
Dean-Charles Chapman
4.0
CH_344
Tommen Baratheon
Dean-Charles Chapman
5.0
CH_344
Tommen Baratheon
Dean-Charles Chapman
6.0
CH_345
Tormund Giantsbane
Kristofer Hivju

CH_346
Trystane Martell
Toby Sebastian

CH_347
Tycho Nestoris
Mark Gatiss

CH_348
Tyene Sand
Rosabell Laurenti Sellers

CH_349
Tyrion Lannister
Peter Dinklage

CH_350
Tywin Lannister
Charles Dance

CH_351
Vala
Meena Rayann

CH_352
Vardis Egen
Brendan McCormack

CH_353
Varly
Graham Charles

CH_355
Viserys Targaryen
Harry Lloyd

CH_356
Walda Bolton
Elizabeth Webster

CH_357
Walder Frey
David Bradley

CH_358
Wallen
Barry O’Connor

CH_359
Warlock
Lottie Steer

CH_360
Waymar Royce
Rob Ostlere

CH_361
Wedding Band
Sigur Rós

CH_362
White Walker
Ian Whyte

CH_363
White Walker
Ross Mullan

CH_364
White Walker
Tim Loane

CH_365
White Walker #2
Spencer Wilding

CH_366
Wight Wildling Girl
Claire Wright

CH_367
Wildling Rioter
Michael Patrick

CH_368
Will
Bronson Webb

CH_369
Willem Lannister
Timothy Gibbons

CH_370
Willis Wode
Ryan McKenna

CH_371
Wine Merchant
Simon Lowe

CH_372
Winterfell Shepherd
Donagh Deeney

CH_373
Wun Wun
Ian Whyte

CH_374
Wyllis
Sam Coleman

CH_375
Xaro Xhoan Daxos
Nonso Anozie

CH_376
Yara Greyjoy
Gemma Whelan

CH_377
Yezzan zo Qaggaz
Enzo Cilenti

CH_378
Ygritte
Rose Leslie

CH_379
Yohn Royce
Rupert Vansittart

CH_380
Yoren
Francis Magee

CH_381
Young Benjen Stark
Matteo Elezi

CH_382
Young Cersei Lannister
Nell Williams

CH_383
Young Lyanna Stark
Cordelia Hill

CH_384
Young Nan
Annette Tierney

CH_385
Young Ned
Robert Aramayo

CH_386
Young Ned Stark
Sebastian Croft

CH_387
Young Rodrik Cassel
Fergus Leathem

CH_388
Zanrush
Gerald Lepkowski

• Hmm. The season seems to be empty for some of the actors and some characters seem to be missing. Why?
• Let’s look at the data to see which characters_actors the season is NULL.
In [33]:
%sql select * from characters_actors where season is NULL

* mysql+pymysql://dbuser:***@localhost/W4111GoTSolution
0 rows affected.
Out[33]:
id
character_id
actorName
actorLink
season

• That is not what I expected.
• This is an indication of why some cleanup is necessary.

Data Cleanup¶
Task¶
• CSV files are text files, and all data is character strings.
• Spreadsheet applications, like Excel, and the data import tools in databases can infer column types in some cases. For example, consider the table definition created for importing scenes_characters.
CREATE TABLE `scenes_characters` (
`id` int(11) DEFAULT NULL,
`seasonNum` int(11) DEFAULT NULL,
`episodeNum` int(11) DEFAULT NULL,
`sceneNo` int(11) DEFAULT NULL,
`characterName` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
• The import process inferred that some columns in the CSV file are text representations of integers, and performed conversion.
• The process is unreliable, and does not always work.
• For this task, you must modify column definitions and data valaues to represent the correct the meaning of the data. We will cover keys in the next task.
• Execute your DDL and DML SQL in cells below. You can add Markdown cells to explain what you are doing.
• My approach is to create a new schema with corrected table definitions and cleaned copies of the data. This preserves the original data for data lineage and avoids possible information loss.
• Call your new schema W4111GoTSolutionClean.
• To help you, I provide an example for one table $-$ groups
• NOTE: You can use MySQL Workbench or other tools to help you write the DDL and DML statements, but you must execute the statements in the cells below.

• Create the schema.
In [6]:
%sql CREATE SCHEMA `W4111GoTSolutionClean` ;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
(pymysql.err.ProgrammingError) (1007, “Can’t create database ‘W4111GoTSolutionClean’; database exists”)
[SQL: CREATE SCHEMA `W4111GoTSolutionClean` ;]
(Background on this error at: http://sqlalche.me/e/f405)

• What does groups look like?
In [7]:
%sql describe W4111GotSolution.groups

* mysql+pymysql://root:***@localhost/W4111GoTSolution
3 rows affected.
Out[7]:
Field
Type
Null
Key
Default
Extra
id
int
YES

None

groupName
text
YES

None

character
text
YES

None

In [8]:
%sql select min(length(groupName)) as min_group_name_len, max(length(groupName)) as max_group_name \
from W4111GoTSolution.groups;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[8]:
min_group_name_len
max_group_name
4
13
In [21]:
%sql select min(length(`character`)) as min_character_name_len, max(length(`character`)) as max_character_name_len \
from W4111GoTSolution.groups;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[21]:
min_character_name_len
max_character_name_len
0
21

• Create a table with clean data.
In [10]:
%%sql
create table W4111GotSolutionClean.groups (
id int,
groupName varchar(32),
`characterName` varchar(32))

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[10]:
[]
In [11]:
%%sql
insert into W4111GotSolutionClean.groups
select id, groupName, `character` from W4111GotSolution.groups;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
168 rows affected.
Out[11]:
[]
In [12]:
%sql select count(*) from W4111GotSolution.groups

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[12]:
count(*)
168

• We successfully copied all of the rows and set reasonable data types.
• But,
In [13]:
%sql select count(*) from W4111GoTSolutionClean.groups where characterName=”

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[13]:
count(*)
14

• ” is a common way to represent “unknown” or “not applicable” in text data.Codd’s Rule no. 3 defines the systmatic treatment of unknown or not applicable data.
In [14]:
%sql update W4111GoTSolutionClean.groups set characterName=NULL where characterName=”

* mysql+pymysql://root:***@localhost/W4111GoTSolution
14 rows affected.
Out[14]:
[]

Answers¶

characters
In [45]:
%%sql
create table W4111GotSolutionClean.characters(
id int,
characterName varchar(32),
characterLink varchar(64),
actorName varchar(32),
actorLink varchar(32),
character_id varchar(32),
characterImageThumb varchar(200),
characterImageFull varchar(300),
nickname varchar(32),
kingsguard varchar(32));

insert into W4111GotSolutionClean.characters
select id, characterName, characterLink, actorName, actorLink, character_id, characterImageThumb, characterImageFull, nickname, kingsguard
from W4111GotSolution.characters;

update W4111GoTSolutionClean.characters set characterName=NULL where characterName=”;
update W4111GoTSolutionClean.characters set characterLink=NULL where characterLink=”;
update W4111GoTSolutionClean.characters set actorName=NULL where actorName=”;
update W4111GoTSolutionClean.characters set actorLink=NULL where actorLink=”;
update W4111GoTSolutionClean.characters set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.characters set characterImageThumb=NULL where characterImageThumb=”;
update W4111GoTSolutionClean.characters set characterImageFull=NULL where characterImageFull=”;
update W4111GoTSolutionClean.characters set nickname=NULL where nickname=”;
update W4111GoTSolutionClean.characters set kingsguard=NULL where kingsguard=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
389 rows affected.
1 rows affected.
0 rows affected.
21 rows affected.
35 rows affected.
36 rows affected.
0 rows affected.
134 rows affected.
134 rows affected.
380 rows affected.
384 rows affected.
Out[45]:
[]

character_relationships
• There is some serialized JSON in the value column for actor.
• Also, actor is redundant and we will need to remove later. It is here because we have to do some additional processing code to better align character relationships with scene information. We know character-actor mapping but until we factor in scenes, we are not sure which actor if there was more than one.
In [55]:
%%sql
create table W4111GotSolutionClean.character_relationships(
id int,
character_id varchar(16),
characterName varchar(32),
label varchar(32),
value varchar(128))

insert into W4111GotSolutionClean.character_relationships
select id, character_id, characterName, label, value
from W4111GotSolution.character_relationships;

update W4111GoTSolutionClean.character_relationships set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.character_relationships set characterName=NULL where characterName=”;
update W4111GoTSolutionClean.character_relationships set label=NULL where label=”;
update W4111GoTSolutionClean.character_relationships set value=NULL where value=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
890 rows affected.
1 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
Out[55]:
[]

characters_actors
In [64]:
%%sql
create table W4111GoTSolutionClean.characters_actors (
id int,
character_id varchar(16),
actorName varchar(32),
actorLink varchar(32),
season varchar(16));

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[64]:
[]
In [65]:
%%sql
insert into W4111GoTSolutionClean.characters_actors
select id, character_id, actorName, actorLink, season
from W4111GoTSolution.characters_actors;

update W4111GoTSolutionClean.characters_actors set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.characters_actors set actorName=NULL where actorName=”;
update W4111GoTSolutionClean.characters_actors set actorLink=NULL where actorLink=”;
update W4111GoTSolutionClean.characters_actors set season=NULL where season=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
408 rows affected.
0 rows affected.
0 rows affected.
1 rows affected.
354 rows affected.
Out[65]:
[]

episodes
In [66]:
%sql describe W4111GoTSolution.episodes;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
7 rows affected.
Out[66]:
Field
Type
Null
Key
Default
Extra
id
int
YES

None

seasonNum
int
YES

None

episodeNum
int
YES

None

episodeTitle
text
YES

None

episodeLink
text
YES

None

episodeAirDate
text
YES

None

episodeDescription
text
YES

None

In [70]:
%sql select min(length(episodeDescription)) as min_group_name_len, max(length(episodeDescription)) as max_group_name \
from W4111GoTSolution.episodes;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[70]:
min_group_name_len
max_group_name
33
238
In [71]:
%%sql
create table W4111GoTSolutionClean.episodes(
id int,
seasonNum int,
episodeNum int,
episodeTitle varchar(64),
episodeLink varchar(32),
episodeAirDate varchar(32),
episodeDescription varchar(256)
);

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[71]:
[]
In [72]:
%%sql
insert into W4111GoTSolutionClean.episodes
select id, seasonNum, episodeNum, episodeTitle, episodeLink, episodeAirDate, episodeDescription
from W4111GoTSolution.episodes

* mysql+pymysql://root:***@localhost/W4111GoTSolution
73 rows affected.
Out[72]:
[]
In [73]:
%%sql
update W4111GoTSolutionClean.episodes set episodeTitle=NULL where episodeTitle=”;
update W4111GoTSolutionClean.episodes set episodeLink=NULL where episodeLink=”;
update W4111GoTSolutionClean.episodes set episodeAirDate=NULL where episodeAirDate=”;
update W4111GoTSolutionClean.episodes set episodeDescription=NULL where episodeDescription=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
Out[73]:
[]

locations
In [74]:
%sql describe W4111GoTSolution.locations

* mysql+pymysql://root:***@localhost/W4111GoTSolution
3 rows affected.
Out[74]:
Field
Type
Null
Key
Default
Extra
id
int
YES

None

location
text
YES

None

subLocation
text
YES

None

In [77]:
%sql select min(length(subLocation)) as min_loc, max(length(subLocation)) as max_loc \
from W4111GoTSolution.locations;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[77]:
min_loc
max_loc
0
37
In [78]:
%%sql
create table W4111GoTSolutionClean.locations(
id int,
location varchar(32),
subLocation varchar(64)
);

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[78]:
[]
In [79]:
%%sql
insert into W4111GoTSolutionClean.locations
select id, location, subLocation
from W4111GoTSolution.locations;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
129 rows affected.
Out[79]:
[]
In [80]:
%%sql
update W4111GoTSolutionClean.locations set location=NULL where location=”;
update W4111GoTSolutionClean.locations set subLocation=NULL where subLocation=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
26 rows affected.
Out[80]:
[]

scenes
In [81]:
%sql describe W4111GoTSolution.scenes

* mysql+pymysql://root:***@localhost/W4111GoTSolution
12 rows affected.
Out[81]:
Field
Type
Null
Key
Default
Extra
id
int
YES

None

seasonNum
int
YES

None

episodeNum
int
YES

None

sceneNo
int
YES

None

sceneStart
text
YES

None

sceneEnd
text
YES

None

location
text
YES

None

subLocation
text
YES

None

greensight
text
YES

None

altLocation
text
YES

None

warg
text
YES

None

flashback
text
YES

None

In [87]:
%sql select min(length(warg)) as min_data, max(length(warg)) as max_data \
from W4111GoTSolution.scenes

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[87]:
min_data
max_data
0
4
In [88]:
%%sql
create table W4111GoTSolutionClean.scenes(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
SceneStart varchar(16),
SceneEnd varchar(16),
location varchar(32),
subLocation varchar(64),
greensight varchar(8),
altLocation varchar(32),
warg varchar(8)
);

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[88]:
[]
In [89]:
%%sql
insert into W4111GoTSolutionClean.scenes
select id, seasonNum, episodeNum, sceneNo, SceneStart, SceneEnd, location, subLocation, greensight, altLocation, warg
from W4111GoTSolution.scenes;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
4165 rows affected.
Out[89]:
[]
In [91]:
%%sql
update W4111GoTSolutionClean.scenes set SceneStart=NULL where SceneStart=”;
update W4111GoTSolutionClean.scenes set SceneEnd=NULL where SceneEnd=”;
update W4111GoTSolutionClean.scenes set location=NULL where location=”;
update W4111GoTSolutionClean.scenes set subLocation=NULL where subLocation=”;
update W4111GoTSolutionClean.scenes set greensight=NULL where greensight=”;
update W4111GoTSolutionClean.scenes set altLocation=NULL where altLocation=”;
update W4111GoTSolutionClean.scenes set warg=NULL where warg=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
0 rows affected.
0 rows affected.
495 rows affected.
4149 rows affected.
4130 rows affected.
4158 rows affected.
Out[91]:
[]

scenes_characters
In [92]:
%sql describe W4111GoTSolution.scenes_characters

* mysql+pymysql://root:***@localhost/W4111GoTSolution
5 rows affected.
Out[92]:
Field
Type
Null
Key
Default
Extra
id
int
YES

None

seasonNum
int
YES

None

episodeNum
int
YES

None

sceneNo
int
YES

None

characterName
text
YES

None

In [96]:
%%sql
create table W4111GoTSolutionClean.scenes_characters(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
characterName varchar(64)
);

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[96]:
[]
In [97]:
%%sql
insert into W4111GoTSolutionClean.scenes_characters
select id, seasonNum, episodeNum, sceneNo, characterName
from W4111GoTSolution.scenes_characters;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
12114 rows affected.
Out[97]:
[]
In [98]:
%%sql
update W4111GoTSolutionClean.scenes_characters set characterName=NULL where characterName=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[98]:
[]
In [ ]:

Keys, Constraints and Integrity¶

Task¶
• Correct analysis and management of the data requires defining keys and integrity constraints.
• Use DML statements to modify the schema to create the constraints you think best.
• The following diagram is my definition, but this is an example and may not be correct. It is simply to help you get started.

GoT ER Model

Answer¶
characters, characters_actors
In [ ]:

characters_relationships
In [ ]:

groups
In [ ]:

locations
In [ ]:

scenes
In [ ]:

scenes_characters
In [ ]:

View¶
Create the following views.

Character and Locations¶
• Characters appear in scenes at locations.
• Create a view of the form:
▪ location
▪ character
▪ count of scenes with the character and the location.
▪ Only include characters that appeared in atleast 50 scenes.
Answer
In [ ]:

Actors and Episodes¶
• Create the following view:
▪ seasonNum
▪ episodeNum
▪ no_of_scenes, which is the count of scenes in which the character appeared
▪ character_id
▪ character_name
▪ actor_name
▪ Ordered by seasonNum and characterNum ascending and no_of_scenes descending
Answer
In [ ]:

Character Interactions¶
• Characters interacted if they appeared in the same scene.
• Create the following view:
▪ characterName_1
▪ character_id_1
▪ characterName_2
▪ character_id_2
▪ scene_count, number of scenes containing both characters.
▪ Order by scene_count descending.
• Each pair should appear only once, that is the table should not contain two rows of the form:
▪ character_1, character_2, scene_count
▪ character_2, character_1, scene_count
Answer
In [ ]:

Interesting View 1¶
• Create a view of the information that you think would be interesting to fans.
• Explain what you view is producing.
Answer
• Explanation:
In [ ]:

Interesting View 2¶
• Create a view of the information that you think would be interesting to fans.
• Explain what you view is producing.
Answer
• Explanation:

Programming¶
Overview¶
• Ensuring that programming track students build core skills is essential. Programming track assignments will contain a subset of the broader, more complete tasks assigned in the No programming assignments. So, there is a common core section to the programming track homework.
• Additionally, the programming track will have tasks that reinforce skills necessary for programming and databases.
Common Core¶
Tasks¶
• Schema and DML: For the following tables, perform the data import, data cleanup and schema improvement takes for the following tables:
• characters
• scenes
• scenes_characters
• Views
1. Implement the view in 3.5.3, “Character Interactions.”
2. Define, explain and implement a view of you choosing.
Answers¶
Data Import and Cleanup

• Characters
In [ ]:
# %%sql
# create table W4111GotSolutionClean.characters(
# id int,
# characterName varchar(32),
# characterLink varchar(64),
# actorName varchar(32),
# actorLink varchar(32),
# character_id varchar(32),
# characterImageThumb varchar(200),
# characterImageFull varchar(300),
# nickname varchar(32),
# kingsguard varchar(32));
In [109]:
# %sql load data infile “/Users/boyuan/Desktop/IntroToDB/HW1C/GoTHW/Data/csv/characters.csv ” into table W4111GoTSolution.characters fields terminated by ‘,’;

Using MySQLWorkbench to import data, original data stored in W4111GoTSoultion, cleaned data stored in W4111GoTSolutionClean
In [114]:
%sql CREATE SCHEMA `W4111GoTSolutionClean` ;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
1 rows affected.
Out[114]:
[]
In [115]:
%%sql
create table W4111GotSolutionClean.characters(
id int,
characterName varchar(32),
characterLink varchar(64),
actorName varchar(32),
actorLink varchar(32),
character_id varchar(32),
characterImageThumb varchar(200),
characterImageFull varchar(300),
nickname varchar(32),
kingsguard varchar(32));

insert into W4111GotSolutionClean.characters
select id, characterName, characterLink, actorName, actorLink, character_id, characterImageThumb, characterImageFull, nickname, kingsguard
from W4111GotSolution.characters;

update W4111GoTSolutionClean.characters set characterName=NULL where characterName=”;
update W4111GoTSolutionClean.characters set characterLink=NULL where characterLink=”;
update W4111GoTSolutionClean.characters set actorName=NULL where actorName=”;
update W4111GoTSolutionClean.characters set actorLink=NULL where actorLink=”;
update W4111GoTSolutionClean.characters set character_id=NULL where character_id=”;
update W4111GoTSolutionClean.characters set characterImageThumb=NULL where characterImageThumb=”;
update W4111GoTSolutionClean.characters set characterImageFull=NULL where characterImageFull=”;
update W4111GoTSolutionClean.characters set nickname=NULL where nickname=”;
update W4111GoTSolutionClean.characters set kingsguard=NULL where kingsguard=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
389 rows affected.
0 rows affected.
21 rows affected.
35 rows affected.
36 rows affected.
0 rows affected.
134 rows affected.
134 rows affected.
380 rows affected.
384 rows affected.
Out[115]:
[]

• Scenes

Using MySQLWorkbench to import data, original data stored in W4111GoTSoultion, cleaned data stored in W4111GoTSolutionClean
In [116]:
%%sql
create table W4111GoTSolutionClean.scenes(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
SceneStart varchar(16),
SceneEnd varchar(16),
location varchar(32),
subLocation varchar(64),
greensight varchar(8),
altLocation varchar(32),
warg varchar(8)
);

insert into W4111GoTSolutionClean.scenes
select id, seasonNum, episodeNum, sceneNo, SceneStart, SceneEnd, location, subLocation, greensight, altLocation, warg
from W4111GoTSolution.scenes;

update W4111GoTSolutionClean.scenes set SceneStart=NULL where SceneStart=”;
update W4111GoTSolutionClean.scenes set SceneEnd=NULL where SceneEnd=”;
update W4111GoTSolutionClean.scenes set location=NULL where location=”;
update W4111GoTSolutionClean.scenes set subLocation=NULL where subLocation=”;
update W4111GoTSolutionClean.scenes set greensight=NULL where greensight=”;
update W4111GoTSolutionClean.scenes set altLocation=NULL where altLocation=”;
update W4111GoTSolutionClean.scenes set warg=NULL where warg=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
4165 rows affected.
0 rows affected.
0 rows affected.
0 rows affected.
495 rows affected.
4149 rows affected.
4130 rows affected.
4158 rows affected.
Out[116]:
[]

• Scenes and Characters

Using MySQLWorkbench to import data, original data stored in W4111GoTSoultion, cleaned data stored in W4111GoTSolutionClean
In [117]:
%%sql
create table W4111GoTSolutionClean.scenes_characters(
id int,
seasonNum int,
episodeNum int,
sceneNo int,
characterName varchar(64)
);

insert into W4111GoTSolutionClean.scenes_characters
select id, seasonNum, episodeNum, sceneNo, characterName
from W4111GoTSolution.scenes_characters;

update W4111GoTSolutionClean.scenes_characters set characterName=NULL where characterName=”;

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
12114 rows affected.
0 rows affected.
Out[117]:
[]

Keys and Constraints

• Characters
In [120]:
%%sql
alter table W4111GoTSolutionClean.characters
add primary key (id)

* mysql+pymysql://root:***@localhost/W4111GoTSolution

—————————————————————————
InternalError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1248 self.dialect.do_execute(
-> 1249 cursor, statement, parameters, context
1250 )

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
579 def do_execute(self, cursor, statement, parameters, context=None):
–> 580 cursor.execute(statement, parameters)
581

~/.local/lib/python3.7/site-packages/pymysql/cursors.py in execute(self, query, args)
169
–> 170 result = self._query(query)
171 self._executed = query

~/.local/lib/python3.7/site-packages/pymysql/cursors.py in _query(self, q)
327 self._clear_result()
–> 328 conn.query(q)
329 self._do_get_result()

~/.local/lib/python3.7/site-packages/pymysql/connections.py in query(self, sql, unbuffered)
516 self._execute_command(COMMAND.COM_QUERY, sql)
–> 517 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
518 return self._affected_rows

~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_query_result(self, unbuffered)
731 result = MySQLResult(self)
–> 732 result.read()
733 self._result = result

~/.local/lib/python3.7/site-packages/pymysql/connections.py in read(self)
1074 try:
-> 1075 first_packet = self.connection._read_packet()
1076

~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_packet(self, packet_type)
683 packet = packet_type(buff, self.encoding)
–> 684 packet.check_error()
685 return packet

~/.local/lib/python3.7/site-packages/pymysql/protocol.py in check_error(self)
219 if DEBUG: print(“errno =”, errno)
–> 220 err.raise_mysql_exception(self._data)
221

~/.local/lib/python3.7/site-packages/pymysql/err.py in raise_mysql_exception(data)
108 errorclass = error_map.get(errno, InternalError)
–> 109 raise errorclass(errno, errval)

InternalError: (1068, ‘Multiple primary key defined’)

The above exception was the direct cause of the following exception:

InternalError Traceback (most recent call last)
in
—-> 1 get_ipython().run_cell_magic(‘sql’, ”, ‘alter table W4111GoTSolutionClean.characters\nadd primary key (id)\n’)

~/opt/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2357 with self.builtin_trap:
2358 args = (magic_arg_s, cell)
-> 2359 result = fn(*args, **kwargs)
2360 return result
2361

in execute(self, line, cell, local_ns)

~/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magic.py in (f, *a, **k)
185 # but it’s overkill for just that one bit of state.
186 def magic_deco(arg):
–> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):

in execute(self, line, cell, local_ns)

~/opt/anaconda3/lib/python3.7/site-packages/IPython/core/magic.py in (f, *a, **k)
185 # but it’s overkill for just that one bit of state.
186 def magic_deco(arg):
–> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):

~/.local/lib/python3.7/site-packages/sql/magic.py in execute(self, line, cell, local_ns)
93
94 try:
—> 95 result = sql.run.run(conn, parsed[‘sql’], self, user_ns)
96
97 if result is not None and not isinstance(result, str) and self.column_local_vars:

~/.local/lib/python3.7/site-packages/sql/run.py in run(conn, sql, config, user_namespace)
338 else:
339 txt = sqlalchemy.sql.text(statement)
–> 340 result = conn.session.execute(txt, user_namespace)
341 _commit(conn=conn, config=config)
342 if result and config.feedback:

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params)
986 raise exc.ObjectNotExecutableError(object_)
987 else:
–> 988 return meth(self, multiparams, params)
989
990 def _execute_function(self, func, multiparams, params):

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params)
285 def _execute_on_connection(self, connection, multiparams, params):
286 if self.supports_execution:
–> 287 return connection._execute_clauseelement(self, multiparams, params)
288 else:
289 raise exc.ObjectNotExecutableError(self)

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params)
1105 distilled_params,
1106 compiled_sql,
-> 1107 distilled_params,
1108 )
1109 if self._has_events or self.engine._has_events:

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1251 except BaseException as e:
1252 self._handle_dbapi_exception(
-> 1253 e, statement, parameters, cursor, context
1254 )
1255

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
1471 util.raise_from_cause(newraise, exc_info)
1472 elif should_wrap:
-> 1473 util.raise_from_cause(sqlalchemy_exception, exc_info)
1474 else:
1475 util.reraise(*exc_info)

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py in raise_from_cause(exception, exc_info)
396 exc_type, exc_value, exc_tb = exc_info
397 cause = exc_value if exc_value is not exception else None
–> 398 reraise(type(exception), exception, tb=exc_tb, cause=cause)
399
400

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
150 value.__cause__ = cause
151 if value.__traceback__ is not tb:
–> 152 raise value.with_traceback(tb)
153 raise value
154

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
1247 if not evt_handled:
1248 self.dialect.do_execute(
-> 1249 cursor, statement, parameters, context
1250 )
1251 except BaseException as e:

~/opt/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
578
579 def do_execute(self, cursor, statement, parameters, context=None):
–> 580 cursor.execute(statement, parameters)
581
582 def do_execute_no_params(self, cursor, statement, context=None):

~/.local/lib/python3.7/site-packages/pymysql/cursors.py in execute(self, query, args)
168 query = self.mogrify(query, args)
169
–> 170 result = self._query(query)
171 self._executed = query
172 return result

~/.local/lib/python3.7/site-packages/pymysql/cursors.py in _query(self, q)
326 self._last_executed = q
327 self._clear_result()
–> 328 conn.query(q)
329 self._do_get_result()
330 return self.rowcount

~/.local/lib/python3.7/site-packages/pymysql/connections.py in query(self, sql, unbuffered)
515 sql = sql.encode(self.encoding, ‘surrogateescape’)
516 self._execute_command(COMMAND.COM_QUERY, sql)
–> 517 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
518 return self._affected_rows
519

~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_query_result(self, unbuffered)
730 else:
731 result = MySQLResult(self)
–> 732 result.read()
733 self._result = result
734 if result.server_status is not None:

~/.local/lib/python3.7/site-packages/pymysql/connections.py in read(self)
1073 def read(self):
1074 try:
-> 1075 first_packet = self.connection._read_packet()
1076
1077 if first_packet.is_ok_packet():

~/.local/lib/python3.7/site-packages/pymysql/connections.py in _read_packet(self, packet_type)
682
683 packet = packet_type(buff, self.encoding)
–> 684 packet.check_error()
685 return packet
686

~/.local/lib/python3.7/site-packages/pymysql/protocol.py in check_error(self)
218 errno = self.read_uint16()
219 if DEBUG: print(“errno =”, errno)
–> 220 err.raise_mysql_exception(self._data)
221
222 def dump(self):

~/.local/lib/python3.7/site-packages/pymysql/err.py in raise_mysql_exception(data)
107 errval = data[3:].decode(‘utf-8’, ‘replace’)
108 errorclass = error_map.get(errno, InternalError)
–> 109 raise errorclass(errno, errval)

InternalError: (pymysql.err.InternalError) (1068, ‘Multiple primary key defined’)
[SQL: alter table W4111GoTSolutionClean.characters
add primary key (id)]
(Background on this error at: http://sqlalche.me/e/2j85)

• Scenes
In [122]:
%%sql
alter table W4111GoTSolutionClean.scenes
add primary key (sceneNo, seasonNum, episodeNum)

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
Out[122]:
[]

• Scenes and Characters
In [129]:
%%sql
alter table W4111GoTSolutionClean.scenes_characters
add primary key (id);
alter table W4111GoTSolutionClean.scenes_characters
add foreign key (sceneNo, seasonNum, episodeNum) references W4111GoTSolutionClean.scenes(sceneNo, seasonNum, episodeNum);

* mysql+pymysql://root:***@localhost/W4111GoTSolution
12114 rows affected.
Out[129]:
[]

Views

• Character Interaction Views
In [ ]:

• Your View and Explanation

Programming¶

Data Processing and Cleansing¶
Task¶
• The scene information in episodes.json complements and extend some of the information about character interactions and relationships. An example is:
{
“sceneStart”:”0:50:24″,
“sceneEnd”:”0:55:54″,
“location”:”The Crownlands”,
“subLocation”:”King’s Landing”,
“characters”:[
{
“name”:”Arya Stark”,
“weapon”:[
{“action”:”has”, “name”:”Needle”}
]
},
{
“name”:”Eddard Stark”,
“title”:”Hand”,
“alive”:false,
“mannerOfDeath”:”Decapitation”,
“killedBy”:[
“Ilyn Payne”
]
},
{“name”:”Yoren”},
{“name”:”Sansa Stark”},
{“name”:”Petyr Baelish”},
{“name”:”Cersei Lannister”},
{“name”:”Joffrey Baratheon”, “title”:”King”},
{“name”:”Grand Maester Pycelle”},
{“name”:”Lord Varys”},
{“name”:”Sandor Clegane”},
{
“name”:”Ilyn Payne”,
“weapon”:[
{“action”:”has”, “name”:”Ice”}
]
}
]
}
]
• A little data processing determines that the top-level elements of a character in a scene are:
Top level keys in characters in scenes are:
sex
name
weapon
born
married
killedBy
mannerOfDeath
title
alive

• Your data processing and cleanse tasks is to create a table of the form:
▪ seasonNum
▪ episodeNum
▪ sceneNo
▪ character_id_1
▪ action
▪ character_id_2
• You can get access to information that maps characterName to character_id using the following code.
In [134]:
import pymysql
cloud_conn = pymysql.connect(
host=”columbia8013.cqgsme1nmjms.us-east-1.rds.amazonaws.com”,
user=”gotuser”,
password=”gotusergotuser”,
db=”W4111GoTSolutionClean”,
cursorclass=pymysql.cursors.DictCursor)
In [135]:
def get_character_info(c_name):

sql = “select * from characters where characterName=%s”
cur = cloud_conn.cursor()
res = cur.execute(sql, (c_name))
res = cur.fetchall()
return res
In [139]:
get_character_info(“Arya Stark”)
Out[139]:
[{‘id’: 15,
‘characterName’: ‘Arya Stark’,
‘characterLink’: ‘/character/ch0158604/’,
‘actorName’: ‘Maisie Williams’,
‘actorLink’: ‘/name/nm3586035/’,
‘character_id’: ‘CH_15’,
‘royal’: 0,
‘characterImageThumb’: ‘https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1._SX100_SY140_.jpg’,
‘characterImageFull’: ‘https://images-na.ssl-images-amazon.com/images/M/MV5BMTk5MTYwNDc0OF5BMl5BanBnXkFtZTcwOTg2NDg1Nw@@._V1_SY1000_CR0,0,665,1000_AL_.jpg’,
‘nickname’: None,
‘kingsguard’: 0}]

Answer¶

Table Definition
In [146]:
%%sql
drop table W4111GoTSolutionClean.task;
create table W4111GoTSolutionClean.task(
seasonNum int default null,
episodeNum int default null,
sceneNo int default null,
character_id_1 int default null,
action varchar(64) default null,
character_id_2 int default null
);

* mysql+pymysql://root:***@localhost/W4111GoTSolution
0 rows affected.
0 rows affected.
0 rows affected.
Out[146]:
[]
In [ ]:

Sample Query
In [ ]:

Python Code

$\alpha^2$

Data Access Abstraction¶
• A common API and abstraction for manipulating different database types and schema simplifies the development of web applications and other applications. There are many abstractions.

Data Access Objects
• The second programming tasks is to implement our own abstraction. This is RDBDataTable, which implements an abstraction we will use in subsequent homework assignments, and which we will use to enable our web application.
• You can find the implementation template in the src directory of the project template.
• For HW 1, Part C, you can copy your implementation code into this notebook and test again the cloud databases.
• I strongly suggest that you develop and test in PyCharm, and then copy your code into the notebook. The cells below just show you where to place your code. You should copy your code over from PyCharm.

Implementation¶
• BaseDataTable defines the common abstraction we will use. You do not need to do anything to this class.
In [147]:
# Import package to enable defining abstract classes in Python.
# Do not worry about understanding abstract base classes. This is just a class that defines
# some methods that subclasses must implement.
from abc import ABC, abstractmethod

class DataTableException(Exception):
“””
A simple class that maps underlying implementation exceptions to generic exceptions.
“””

invalid_method = 1001

# General
def __init__(self, code, message):
self.code = code
self.message = message

def __str__(self):
result = (
type(self), {“code”: self.code, “message”: self.message}
)
result = str(result)
return result

class BaseDataTable(ABC):
“””
The implementation classes (XXXDataTable) for CSV database, relational, etc. will extend the
base class and implement the abstract methods.
“””

def __init__(self, entity_type_name, connect_info, key_columns=None, context=None):
“””

:param entity_type_name: Name of the logcal entity type. This maps to various abstractions in
underlying stores, e.g. file names, RDB tables, Neo4j Labels, …
:param connect_info: Dictionary of parameters necessary to connect to the data. See examples in subclasses.
:param key_columns: List, in order, of the columns (fields) that comprise the primary key.
A primary key is a set of columns whose values are unique and uniquely identify a row. For Appearances,
the columns are [‘playerID’, ‘teamID’, ‘yearID’]
:param contex: Holds context and environment information.
“””
self._table_name = entity_type_name
self._connect_info = connect_info
self._key_columns = key_columns
self._context = context

@abstractmethod
def find_by_primary_key(self, key_fields, field_list=None, context=None):
“””

:param key_fields: The values for the key_columns, in order, to use to find a record. For example,
for Appearances this could be [‘willite01’, ‘BOS’, ‘1960’]
:param field_list: A subset of the fields of the record to return. The CSV file or RDB table may have many
additional columns, but the caller only requests this subset.
:return: None, or a dictionary containing the columns/values for the row.
“””
pass

@abstractmethod
def find_by_template(self, template, field_list=None, limit=None, offset=None, order_by=None, context=None):
“””

:param template: A dictionary of the form { “field1” : value1, “field2”: value2, …}. The function will return
a derived table containing the rows that match the template.
:param field_list: A list of requested fields of the form, [‘fielda’, ‘fieldb’, …]
:param limit: Do not worry about this for now.
:param offset: Do not worry about this for now.
:param order_by: Do not worry about this for now.
:return: A derived table containing the computed rows.
“””
pass

@abstractmethod
def insert(self, new_entity, context=None):
“””

:param new_record: A dictionary representing a row to add to the set of records. Raises an exception if this
creates a duplicate primary key.
:return: None
“””
pass

@abstractmethod
def delete_by_template(self, template, context=None):
“””

Deletes all records that match the template.

:param template: A template.
:return: A count of the rows deleted.
“””
pass

@abstractmethod
def delete_by_key(self, key_fields, Context=None):
“””

Delete record with corresponding key.

:param key_fields: List containing the values for the key columns
:return: A count of the rows deleted.
“””
pass

@abstractmethod
def update_by_template(self, template, new_values, context=None):
“””

:param template: A template that defines which matching rows to update.
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass

@abstractmethod
def update_by_key(self, key_fields, new_values, context=None):
“””

:param key_fields: List of values for primary key fields
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass

@abstractmethod
def query(self, query_statement, args, context=None):
“””
Passed through/executes a raw query in the native implementation language of the backend.
:param query_statement: Query statement as a string.
:param args: Args to insert into query if it is a template
:param context:
:return: A JSON object containing the result of the operation.
“””
pass

@abstractmethod
def load(self, rows=None):
“””
Loads data into the data table.
:param rows:
:return: Number of rows loaded.
“””

@abstractmethod
def save(self, context):
“””
Writes any cached data to a backing store.
:param context:
:return:
“””

RDBDataTable
• You only need to implement the following methods:
▪ __init__
▪ __str__
▪ find_by_template You only need to implement the parameters
◦ template
◦ field_list
▪ find_by_primary_key You only need to implement the parameters
◦ key_fields
◦ field_list
• The logical behavior of the find_by_template API is the following:
▪ If you create an RDBDataTable instance for a SQL table GoTSolutionClean.scenes_characters,
▪ The call
characters_tbl.find_by_template(
▪ template={
▪ ‘seasonNum’: 1,
▪ ‘episodeNum’: 2
▪ },
▪ field_list = [‘seasonNum’, ‘episodeNum’, ‘sceneNo’, ‘character_id’]
▪ )

▪ Is equivalent to select seasonNum, episodeNum, sceneNo, character_id
▪ from GoTSolutionClean.scenes_characters
▪ where
▪ seasonNum=1 and episodeNum=1

▪ And returns the same dictionary.

• The logical behavior of the find_by_primary_key API is the following:
▪ If you create an RDBDataTable instance for a SQL table GoTSolutionClean.episodes,
▪ The primary key columns are seasonNum, episodeNum.
▪ The invocation would be find_by_primary_key([1,3],field_list=[‘seasonNum’,’episodeNum’,’title’])
▪ This would get converted into select seasonNum, episodeNum, title
▪ from W4111GoTSolutionClean.episodes
▪ where
▪ seasonNum=1 and episodeNum=1

▪ Your class implementation MUST run a database query to determine the primary key columns and their order to form the where clause.
▪ The result is a single dictionary containing the query result.
• We will provide tests.
In [148]:
from src.data_tables.BaseDataTable import BaseDataTable

import pandas as pd
import pymysql

import logging

logger = logging.getLogger()

class RDBDataTable(BaseDataTable):
“””
RDBDataTable is relation DB implementation of the BaseDataTable.
“””

# Default connection information in case the code does not pass an object
# specific connection on object creation.
_default_connect_info = pymysql.connect(
host=”columbia8013.cqgsme1nmjms.us-east-1.rds.amazonaws.com”,
user=”gotuser”,
password=”gotusergotuser”,
db=”W4111GoTSolutionClean”,
cursorclass=pymysql.cursors.DictCursor)

_default_cnx = None

@classmethod
def _get_default_cnx(cls):

if RDBDataTable._default_cnx is None:
# Create a connection to use inside this object. In general, this is not the right approach.
# There would be a connection pool shared across many classes and applications.
cls._default_cnx = pymysql.connect(
host=”columbia8013.cqgsme1nmjms.us-east-1.rds.amazonaws.com”,
user=”gotuser”,
password=”gotusergotuser”,
db=”W4111GoTSolutionClean”,
cursorclass=pymysql.cursors.DictCursor)

return cls._default_cnx

def __init__(self, table_name, key_columns=None, connect_info=None):
“””

:param table_name: The name of the RDB table.
:param connect_info: Dictionary of parameters necessary to connect to the data.
:param key_columns: List, in order, of the columns (fields) that comprise the primary key.
This is for compatibility with other types of data table. Any value other than None is an error.
“””

# If there is not explicit connect information, use the defaults.

####### Your Code Goes Here #########

####### Your Code Goes Here #########

def __str__(self):
“””

:return: String representation of the table’s metadata.
“””
####### Your Code Goes Here #########

####### Your Code Goes Here #########

@classmethod
def _run_q(cls, q, args=None, fields=None, fetch=True, cnx=None, cursor=None, commit=True):
“””

:param q: An SQL query string that may have %s slots for argument insertion. The string
may also have {} after select for columns to choose.
:param args: A tuple of values to insert in the %s slots.
:param fetch: If true, return the result.
:param cnx: A database connection. May be None
:param cncursor: Do not worry about this for now.
:param commit: Do not worry about this for now. This is more wizard stuff.
:return: A result set or None.
“””

r = None

cursor_created = False

if cnx is None:
cnx = cls._get_default_cnx()

try:
# Use the connection in the object if no connection provided.

# Convert the list of columns into the form “col1, col2, …” for following SELECT.
if fields:
q = q.format(“,”.join(fields))

if cursor is None:
cursor = cnx.cursor() # Just ignore this for now.
cursor_created = True

# If debugging is turned on, will print the query sent to the database.
#self.debug_message(“Query = “, cursor.mogrify(q, args))

cursor.execute(q, args) # Execute the query.

# Technically, INSERT, UPDATE and DELETE do not return results.
# Sometimes the connector libraries return the number of created/deleted rows.
if fetch:
r = cursor.fetchall() # Return all elements of the result.
else:
r = None

if commit: # Do not worry about this for now.
cnx.commit()

if cursor_created:
cursor.close()

except Exception as e:
print(“Exception e = “, e)
if commit:
cnx.rollback()
if cursor_created:
cursor.close()

return r

def _run_insert(self, table_name, column_list, values_list, cnx=None, commit=True):
“””

:param table_name: Name of the table to insert data. Probably should just get from the object data.
:param column_list: List of columns for insert.
:param values_list: List of column values.
:param cnx: Ignore this for now.
:param commit: Ignore this for now.
:return:
“””
try:
q = “insert into ” + table_name + ” ”

# If the column list is not None, form the (col1, col2, …) part of the statement.
if column_list is not None:
q += “(” + “,”.join(column_list) + “) ”

# We will use query parameters. For a term of the form values(%s, %s, …) with one slot for
# each value to insert.
values = [“%s”] * len(values_list)

# Form the values(%s, %s, …) part of the statement.
values = ” ( ” + “,”.join(values) + “) ”
values = “values” + values

# Put all together.
q += values

self._run_q(q, args=values_list, fields=None, fetch=False, cnx=cnx, commit=commit)

except Exception as e:
print(“Got exception = “, e)
raise e

@classmethod
def get_folders(cls):
pass

def find_by_primary_key(self, key_fields, field_list=None, context=None):
“””

:param key_fields: The values for the key_columns, in order, to use to find a record.
:param field_list: A subset of the fields of the record to return.
:return: None, or a dictionary containing the request fields for the record identified
by the key.
“””

####### Your Code Goes Here #########

####### Your Code Goes Here #########

def _template_to_where_clause(self, t):
“””
Convert a query template into a WHERE clause.
:param t: Query template.
:return: (WHERE clause, arg values for %s in clause)
“””
w_clause=None
args = None

####### Your Code Goes Here #########

####### Your Code Goes Here #########

return w_clause, args

def find_by_template(self, template, field_list=None, limit=None, offset=None, order_by=None, commit=True):
“””

:param template: A dictionary of the form { “field1” : value1, “field2”: value2, …}
:param field_list: A list of request fields of the form, [‘fielda’, ‘fieldb’, …]
:param limit: Do not worry about this for now.
:param offset: Do not worry about this for now.
:param order_by: Do not worry about this for now.
:return: A list containing dictionaries. A dictionary is in the list representing each record
that matches the template. The dictionary only contains the requested fields.
“””
####### Your Code Goes Here #########

####### Your Code Goes Here #########

def insert(self, new_record):
“””

:param new_record: A dictionary representing a row to add to the set of records.
:return: None
“””
pass

def delete_by_template(self, template):
“””

Deletes all records that match the template.

:param template: A template.
:return: A count of the rows deleted.
“””
pass

def delete_by_key(self, key_fields):
“””

Delete record with corresponding key.

:param key_fields: List containing the values for the key columns
:return: A count of the rows deleted.
“””
pass

def update_by_template(self, template, new_values):
“””

:param template: A template that defines which matching rows to update.
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass

def update_by_key(self, key_fields, new_values):
“””

:param key_fields: List of values for primary key fields
:param new_values: A dictionary containing fields and the values to set for the corresponding fields
in the records. This returns an error if the update would create a duplicate primary key. NO ROWS are
update on this error.
:return: The number of rows updates.
“””
pass

def load(self):
pass

def save(self):
pass

def query(self, query_statement, args, context=None):
pass

Tests
• We will provide some tests.
In [ ]: