CS代写 Information Management

Information Management

Università degli Studi di Databases (slides partially provided by Prof. Samarati)

Copyright By PowCoder代写 加微信 powcoder

Paradigms for data distribution
• Client-server architectures
• Separation between the database server and the client
• Distributed databases
• Several database servers used by the same application
• Parallel databases
• Several storage devices and processors that operate in parallel to improve
performance
• Replicated databases
• Same data physically stored at different servers
• Data warehouses
• servers specialized for the management of data dedicated to decision support

Kinds of architectures
Two kinds of systems
• OLTP (On-Line Transaction Processing)
• for the optimized management and reliable transactions on database server
• specialized on the management of hundreds or even thousands of transactions per second
• OLAP (On-Line Analytical Processing)
• for data analysis
• operate on data warehouse servers, specialized on the management of data for decision support systems
Server typically include both OLAP and OLTP functionalities

Properties of distributed systems
Portability
• possibility of transporting programs from an environment to another • established at compile time
• facilitated by language standards (e.g., SQL-2, SQL-3)
Interoperability
• ability of interacting among heterogeneous systems
• established at compile time
• facilitated by standard protocols for data access (e.g., Database Connectivity (ODBC) e X-Open Distributed Transaction Processing (DTP))

Client-server architectures (1)
Interacting software processes are divided between client (request services) and server (provide services)
• Requires a precise definition of service interface, listing precisely the services offered by the server
• The client has an active role (makes the request)
• The server has a reactive role (replies)
• Usually, a client process requests a few services in a sequence to one or more server processes
• Usually, a server process replies to multiple requests by different client processes

Client-server architectures (2)
• The machine acting as client should
• be adequate for interacting with the final user
• support productivity tools (email, word processing, Internet access, and workflow management)
• The machine acting as server should
• have considerable memory capacity (to support buffer management) • have considerable disk capacity (to store the whole database)

Client-server architectures (3)
Widely used for databases:
• Client and server functions are well defined
• Provides a good separation of the management activities • client suitable for interaction with the user
• server suitable for data management
• SQL offers an ideal programming paradigm for the identification of service interfaces

Client-server architectures (4)
SQL offers an ideal programming paradigm for the identification of services interfaces
• SQL queries formulated by the client and sent to the server
• query results are computed by the server and sent to the client
• SQL standardization, portability, and interoperability enable the development of client applications that involve different servers

Client-server architectures (5)
Often the server is multi-threaded:
• Behaves as a single process that dynamically operates for different
transactions
• Each execution unit of the server process for a given transaction is a thread

Client-server architectures (6)
• Servers are processes that are permanently active and check: • an input queue for client’s requests
• an output queue for query results
• Often, a dispatcher process distributes requests among servers and returns replies to clients
• When dispatchers can dynamically determine the number of active server processes depending on the number of requests, we say that we have a class of servers

Client-server architectures (7)
Output queue
Input queue
Database Server
Server Process

Two-tiered architectures (1)
• There are two machines: a client and a server
• There are different solutions for deciding to which machine each layer
is assigned
• Simple approach:
• the client has the user-interface level
• the server has the processing and data levels
• This is not the only solution

Two-tiered architectures (2)

Fat vs thin client
• Fat Clients
+ More efficient for users
+ Provide higher system scalability
– More complex and hard to manage
– Client software is more error prone and depends on the client machine and its
operating system • Thin Clients
+ Easier to manage
Trend to move computation to the client

Multi-tiered architectures
• A single server is replaced by multiple servers running on different machines
• A server may act as a client
• Different programs in the processing level reside on different servers

Distributed databases
System of databases where at least one client interacts with more servers for the execution of an application

Distributed databases: Pros
• Respond to applications requirements
• organizations have a distributed structure
• data distribution permits the management of data where they are generated and used
• Flexibility and modularity
• can be configured with continuous additions and modifications of
components
• Reliability
• can react to failures reducing performance instead than stopping operability

Distributed databases: classification (1)
• Kind of involved DBMSs
• Homogeneous DDB: all the servers use the same DBMS • Heterogeneous DDB: the servers use different DBMSs
• Kind of network
• Local Area Network (LAN) • Wide Area Network (WAN)

Distributed databases: classification (2)
Kind of DBMS
Kind of network
Homogeneous
Management and financial applications
Booking systems and financial applications
Heterogeneous
Management and inter-functional applications
Integrated booking systems and inter- banking systems

Local independency and cooperation
The distributed database can be considered, from an abstract point of view, as a single database
• It should be designed in such a way to have applications that are executed on a single server, minimizing
• the need of interaction
• the need of data exchange

Data fragmentation (1)
Adopts algebraic operations on a relation R to divide it into fragments R1, …, Rn
• Horizontal fragmentation
• each Ri has a subset of the tuples in R
• each Ri can be interpreted as the result of a selection over R
• Vertical fragmentation
• each Ri has, in its schema, a subset of the attributes in R
• each Ri can be interpreted as the result of a projection over R

Data fragmentation (2)
Correctness property
• completeness: each data in R must be represented in one of its fragments Ri
• reconstructability: R should be fully reconstructable starting from its fragments
• Horizontal fragments are disjoint
• do not have common tuples
• Vertical fragments include the primary key of R • guarantees reconstructablility

Horizontal fragmentation: example (1)
EMPLOYEE(Empnum,Name,Deptnum,Salary,Taxes)
Fragments:
• EMPLOYEE1 = σEmpnum≤3 EMPLOYEE • EMPLOYEE2 = σEmpnum>3 EMPLOYEE
To reconstruct the relation:
• EMPLOYEE = EMPLOYEE1 È EMPLOYEE2

Horizontal fragmentation: example (2)

Production Administration Production Marketing
3.7M 3.5M 5.3M 3.5M
1.2M 1.1M 2.1M 1.1M
EMPLOYEE1 (σEmpnum≤3 EMPLOYEE)
Administration Production
3.7M 3.5M 5.3M
1.2M 1.1M 2.1M
EMPLOYEE2 (σEmpnum>3 EMPLOYEE)

Vertical fragmentation: example (1)
EMPLOYEE(Empnum,Name,Deptnum,Salary,Taxes) Fragments:
• EMPLOYEE1 = ÕEmpnum,Name(EMPLOYEE)
• EMPLOYEE2 = ÕEmpnum,Deptnum,Salary,Taxes(EMPLOYEE)
To reconstruct the relation:
• EMPLOYEE = EMPLOYEE1 EMPLOYEE2

Horizontal fragmentation: example (2)

Production Administration Production Marketing
3.7M 3.5M 5.3M 3.5M
1.2M 1.1M 2.1M 1.1M
EMPLOYEE1 = ÕEmpnum,Name(EMPLOYEE) EMPLOYEE2 = ÕEmpnum,Deptnum,Salary,Taxes(EMPLOYEE)
EMPLOYEE1 EMPLOYEE2

Production Administration Production Marketing
3.7M 3.5M 5.3M 3.5M
1.2M 1.1M 2.1M 1.1M

Allocation schema
Describe the mapping of relations or fragments to servers where they are stored
• Each fragment corresponds, at the physical level, to a file and is allocated to a specific server
• fragments are stored
• the original relation is a view over fragments (virtual)
• Allocation can be
• non-redundant: each fragment or relation is allocated to one server only
• redundant: at least one fragment or relation is allocated to multiple servers

Transparency levels (1)
Distinguishing between fragmentation and allocation permits to write applications operating at different levels
• From the most abstract and independent from data fragmentation • To the most concrete and dependent on their physical allocation

Transparency levels (2)
• fragmentation: the programmer
• does not need to know the fragmentation • does not need to know the allocation
• allocation: the programmer
• needs to know the structure of fragments • does not need to know the allocation
• language: the programmer
• needs to know the structure of fragments • needs to know the allocation
• no transparency
• each DBMS accepts its own SQL dialect: the system is heterogeneous and the DBMSs do not support a common interoperability standard

Transparency levels: example
SUPPLIER(Snum,Name,City)
• horizontal fragments:
• SUPPLIER1 = σcity=’Milan’ (SUPPLIER) • SUPPLIER2 = σcity=’Rome’ (SUPPLIER)
• allocation of horizontal fragments (with replication): •
write a procedure that, given a supplier number, returns its name…

Fragmentation transparency: example
• The programmer
• does not need to know the fragmentation • does not need to know the allocation
procedure Query1(:snum,:name); select Name into :name from Supplier
where Snum = :snum;
end procedure;

Allocation transparency: example
• The programmer
• needs to know the structure of fragments
• does not need to know the allocation
• in case of redundancy does not need to indicate which copy to use for access (replication transparency)
procedure Query2(:snum,:name); select Name into :name from Supplier1
where Snum = :snum;
if :empty then
select Name into :name
from Supplier2
where Snum = :snum;
end procedure;

Language transparency: example
• the programmer
• needs to know the structure of fragments
• needs to know the allocation
• in case of replication, needs to indicate which copy to use for access
procedure Query3(:snum,:name); select Name into :name
from where Snum = :snum;
if :empty then
select Name into :name
from where Snum = :snum;
end procedure;

Query optimization
The application can be optimized through:
• parallelism
• submit requests in parallel in contrast to in sequence • reduce the overall response time
• knowledge of the logical properties of fragments • query the fragment where data reside
• increase efficiency but reduce flexibility

Query optimization: example
procedure Query4(:snum,:name,:city); case :city of
select Name into :name from Supplier1
where Snum = :snum;
select Name into :name from Supplier2
where Snum = :snum;
end procedure;

Fragmentation: exercise
SHOP (ID, Name, Kind, Tel, City, IDWarehouse) WAREHOUSE (ID, Street, City, Tel)
• WAREHOUSE is split in 4 fragments based on the city where warehouses are located (Milan, Florence, Rome, Venice), allocated at the three corresponding shipment centres in the same cities
• SHOP is split in 3 fragments based on the kind of shop (clothes, shoes, bath) allocated at three coordination centres, one for each kind of shop, in Cagliari, Siena, and Bari, respectively

Fragmentation: exercise (continuation)
• Write a query that retrieves the name, kind, and telephone number of the shop with ID=123 according to fragmentation, allocation, and language transparency.

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com