Database Administration
INFO20003
Database Systems
Lecture 17
Database Administration
David Eccles
Today’s Lecture…
• Functions that are part of the DBA role
– Capacity planning
• Estimating disk space and transaction load
– Backup and recovery
• Types of failures, responses to these, types of backups
-2-INFO20003 Database Systems
Capacity Planning
Capacity Planning
• “Capacity Planning is the process of predicting when
future load levels will saturate the system and
determining the most cost-effective way of delaying
system saturation as much as possible.”
– Menasce and Virgilio (2002) ‘Capacity Planning for Web
Services’. Prentice Hall.
• When implementing a database, need to consider:
– disk space requirements
– transaction throughput
– (at go-live and throughout the life of the system)
• Plan for 7y Can be 20y
4INFO20003 Database Systems
Estimating Database Usage
5INFO20003 Database Systems
Estimating Database Usage
6INFO20003 Database Systems
Estimating Database Usage
7INFO20003 Database Systems
Capacity Planning in the dev life
cycle
8INFO20003 Database Systems
Estimating disk space requirements
• Which estimation methodology to use?
– many vendors sell capacity planning solutions
– most have the same ideas at their core
– here we present the core concepts
• treat Database size as the sum of all Table sizes
– where table size = number of rows * average row width
9INFO20003 Database Systems
Calculating row widths
• need to know storage size of different data types
• https://dev.mysql.com/doc/refman/5.1/en/storage-
requirements.html
10INFO20003 Database Systems
https://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
Calculating row widths
• https://dev.mysql.com/doc/refman/5.1/en/storage-
requirements.html
• (these sizes are for MySQL and are slightly different
for other vendors)
11INFO20003 Database Systems
https://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
Calculating row widths
• https://dev.mysql.com/doc/refman/5.1/en/storage-
requirements.html
12INFO20003 Database Systems
https://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
Estimate growth of tables
• How will tables grow over time?
• Gather estimates during system analysis, e.g.
– “The company sells 1000 products. There are 2,000,000
customers who place, on average, 5 orders each per
month. An average order is for 8 different products.”
13INFO20003 Database Systems
Estimate growth of tables
• Using this simplified database as an example,
assume there are:
– 100 forums
– 1 million users
and assume that:
– users post average
– 30 times per month
we calculate:
– Post table grows by 1M rows / day
– which is 12 inserts per second
14INFO20003 Database Systems
Calculate disk space per table
• use a spreadsheet to simplify calculations and
enable what-ifs
15INFO20003 Database Systems
Projected total storage
requirements
16INFO20003 Database Systems
Estimating transaction load
• consider each business transaction
• how often will transaction each be run?
• for each transaction, what SQL statements are being
run?
• for example, consider this fictitious banking
application:
17INFO20003 Database Systems
Now you try….
-18-INFO20003 Database Systems
Go Live Year 1 Year 2
Users 0 300000 900000
Tweets p/user 0 3600 4800
Total Tweets 0 108,000,000 432,000,000
Make an assumption
About email varchar(200)
Backup and Recovery
What is a Backup
-22-
• A backup is a copy of your data
– however there are several types of backup
• If data becomes corrupted or deleted or
held to ransom it can be restored from
the backup copy
• A backup and recovery strategy is
needed
– To plan how data is backed up
– To plan how it will be recovered
INFO20003 Database Systems
Protect data from …
• human error
– e.g. accidental drop or delete
– example:
http://www.theaustralian.com.au/aus
tralian-it/human-error-triggered-nab-
software-corruption/story-e6frgakx-
1225962953523
-23-
• hardware or software malfunction
– bug in application
– hard drive (failure or corruption)
– CPU
– memory
INFO20003 Database Systems
http://www.theaustralian.com.au/australian-it/human-error-triggered-nab-software-corruption/story-e6frgakx-1225962953523
Must also protect against
• malicious activity
– security compromise
• server, database, application
-24-
• natural or man made disasters
– consider the scale of the damage
• government regulation
– historical archiving rules
– Metadata collection (AUS)
– HIPPA, EU data retention regulations
– Privacy Rules
INFO20003 Database Systems
Categories of Failure
-25-
Failures can be divided into the following categories:
• Statement failure
– Syntactically incorrect
• User Process failure
– The process doing the work fails (errors, dies)
• Network failure
– Network failure between the user and the database
• User error
– User accidentally drops the rows, table, database
• Memory failure
– Memory fails, becomes corrupt
• Media Failure
– Disk failure, corruption, deletion
INFO20003 Database Systems
Types of Backups
-26-
• Physical vs Logical
• Online vs Offline
• Full vs Incremental
• Onsite v Offsite
INFO20003 Database Systems
Physical vs Logical backup
-27-
• Physical backup
– raw copies of files and directories
– suitable for large databases that need fast recovery
– database is preferably offline (“cold” backup) when backup occurs
• MySQL Enterprise automatically handles file locking,
so database is not wholly off line
– backup = exact copies of the database directories and files
– backup should include logs
– backup is only portable to machines with a similar configuration
– to restore
• shut down DBMS
• copy backup over current structure on disk
• restart DBMS
INFO20003 Database Systems
Physical vs Logical backup
-28-
• Logical backup
– backup completed through SQL queries
– slower than physical
• SQL Selects rather than OS copy
– output is larger than physical
– doesn’t include log or config files
– machine independent
– server is available during the backup
– in MySQL can use the backup using
• Mysqldump
• SELECT … INTO OUTFILE
– to restore
• Use mysqlimport, or LOAD DATA INFILE within the mysql client
INFO20003 Database Systems
Online vs Offline backup
-29-
• Online (or HOT) backup
– backups occur when the database is “live”
– clients don’t realise a backup is in progress
– need to have appropriate locking to ensure integrity of data
• Offline (or COLD) backup
– backups occur when the database is stopped
– to maximize availability to users,
take backup from replication server not live server
– simpler to perform
– cold backup is preferable, but not available in all situations
e.g. applications without downtime
INFO20003 Database Systems
Full vs Incremental backup
-30-
• Full
– a full backup is where the complete database is backed up
• may be Physical or Logical, Online or Offline
– it includes everything you need to get the database
operational in the event of a failure
• Incremental
– only the changes since last backup are backed up
– for most databases this means only backup log files
– to restore:
• stop the database, copy backed up log files to disk
• start the database and tell it to redo the log files
INFO20003 Database Systems
Create a Backup Policy
-31-
• Backup strategy is usually a combination of full and
incremental backups
– for example:
• weekly full backup
• weekday incremental backup
• Conduct backups when database load is low
• If using replication, use the mirror database for
backups to negate any performance concerns with
the primary database
• TEST your backup before you NEED your backup!
Fr
= Delta (only changes
since last backup)
Su Mo Tu We Th Su
INFO20003 Database Systems
Offsite Backup
• enables disaster recovery
(because backup is not physically near the disaster site)
• example solutions:
• backup tapes transported to underground vault
• remote mirror database maintained via replication
• backup to Cloud (see figure below)
-32-INFO20003 Database Systems
What is examinable?
-33-
• The roles of a DBA
– Capacity planning
• Calculating Capacity & Transaction workload
– Back up and Recovery
• Backup Types
• Types of Failures
INFO20003 Database Systems
Next lectures
-34-
• Week 10:
• Lecture19: Transactions
• Lecture 20: Data Warehousing
INFO20003 Database Systems