Scale-out Architectures Fundamentals:
Data partitioning
Auto redistribution, balancing
Parallel access to partitions in different servers
Simple data model
Not many restrictions; key-value like
Values may be structured, e.g., columns
File system vs Table-like systems (HBase, Cassandra)
Scale-out Architectures
Failures are norm, not exception
Hadoop / MR framework and related systems
Does most of the dirty work for us
Failures, monitoring, data transfers/organizations, etc.
With all of that done for us, we then just describe what/how we want done:
Splits (= blocks; distributed; loaded into mappers, etc..)
Map and reduce functions
Output to HDFS
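The split/map/shuffle/reduce workflow above can be sketched in-process. This is an illustrative simulation of the programming model, not Hadoop's API: we supply only `map_fn` and `reduce_fn`, and the (here, trivial) framework does the splitting, shuffling, and grouping.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(split):
    # Emit (word, 1) for every word in an input split.
    for line in split:
        for word in line.split():
            yield (word, 1)

def reduce_fn(key, values):
    # Sum the counts for one word.
    return (key, sum(values))

def run_mapreduce(splits, map_fn, reduce_fn):
    # "Shuffle": gather all intermediate pairs and group by key.
    intermediate = [kv for split in splits for kv in map_fn(split)]
    intermediate.sort(key=itemgetter(0))
    return [reduce_fn(k, (v for _, v in grp))
            for k, grp in groupby(intermediate, key=itemgetter(0))]

splits = [["big data big"], ["data systems"]]
print(run_mapreduce(splits, map_fn, reduce_fn))
# [('big', 2), ('data', 2), ('systems', 1)]
```

In the real framework each split is a distributed HDFS block, mappers run where the data lives, and the sort/group step is the disk- and network-heavy shuffle criticised below.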
Disadvantages of Hadoop / MR ?
Input-output: distributed reads/writes to HDFS (map output, shuffle, …)
What if we have interactive or iterative algorithms?
Need to maintain data and results in main memory for better processing
Intermediate results in iterative tasks
Repeated different queries over same datasets (perhaps intermediate results).
SPOF Architectures ?
Single Points of Failure…
Master-workers
Global knowledge
What kind of knowledge is stored and how
What does master do (orchestrate all)
Monitor ups and downs, resource utilization, etc.; load balancing/assignment
Even with master-slave designs: several levels of centralisation
E.g., MR2/YARN removes much of the central functioning and distributes it out, BigTable/HBase uses Chubby/Zookeeper for much of it, etc.
Don't confuse the existence of a master with global knowledge
Can be maintained in a decentralised way
SPOF or NOT ? Decentralized
E.g., Cassandra
Data placement: consistent hashing; load balancing
» vs range queries
» storage vs query load balancing; locating data items
Routing: decentralized finger tables
Gossiping
Failure handling: decentralised chores
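The consistent-hashing placement above can be sketched as a minimal ring (node names, the vnode count, and the md5-based hash are all illustrative choices, not Cassandra's actual implementation):

```python
import hashlib
from bisect import bisect_right

def h(s):
    # Hash a string to a point on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8):
        # vnodes: virtual nodes per server, for smoother load balancing.
        self.points = sorted((h(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))

    def lookup(self, key):
        # A key lives on the first node clockwise from its hash.
        hashes = [p for p, _ in self.points]
        i = bisect_right(hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["nodeA", "nodeB", "nodeC"])
owner = ring.lookup("user:42")   # any node can compute this locally
```

The decentralisation payoff: every node can evaluate `lookup` with only the ring membership, and adding or removing a node remaps only the keys between it and its ring predecessor.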
A system for which access types ? Batch and/or Random?
MR/Spark + HDFS
Batch+Random
MR/Spark on NoSQL
Ad-hoc queries, random data access
Why can NoSQLs (like Cassandra/Hbase/BigTable) do random accesses well
Why GFS/HDFS cannot?
Fault Tolerance
Recoverability (WAL) logs
Checkpoints
Hot standby
Recoverability vs Efficiency (why?)
Replication: High Availability
Replication factors
Replication control protocol
Efficiency vs consistency (why?)
High Write Throughput
Avoid disk writes as much as possible: membufs
But what about persistence? WALs
Add: Group commits / append-only access
Or, forget about persistence: take a chance
What about reads? SSTables, …
Row vs column storage organisations on disk
Compactions
Can these help with locating relevant servers, or only once a server is identified?
Think of Cassandra vs HBase !
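The write/read path above (membuf + WAL, flush to immutable sorted runs) can be sketched as a toy LSM store. `MiniLSM` is illustrative only, not any real system's API; the WAL is a Python list standing in for an append-only file.

```python
class MiniLSM:
    def __init__(self, memtable_limit=2):
        self.wal = []            # append-only log (stand-in for disk)
        self.memtable = {}       # in-memory write buffer
        self.sstables = []       # immutable sorted runs, newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))    # group commit would batch these
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: one sequential write of a sorted, immutable SSTable.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:         # freshest data first
            return self.memtable[key]
        for run in reversed(self.sstables):   # newest SSTable wins
            for k, v in run:
                if k == key:
                    return v
        return None

db = MiniLSM()
db.put("a", 1); db.put("b", 2); db.put("a", 3)
```

A compaction would merge the runs in `self.sstables`, discarding the shadowed `("a", 1)`; recovery would replay `self.wal` to rebuild the memtable.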
Indexing
Which node(s) store the data I want ?
Which disk blocks at these nodes, in particular ?
For storage models like Hbase, BigTable, Cassandra, etc,
Which SSTables have blocks which have my data ?
Which blocks within SSTables ?
a 3-tier indexing structure in Hbase/BigTable
How does Cassandra do it?
Advanced (query-specific) indices:
Can I expedite specific queries by getting to the data that is part of the query answer fast ?
Example:
Given the web archive of AEs
Queries: find all pages/articles whose creation was within a given time interval
What index can you build for this using Hbase?
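One possible answer (an assumption for illustration, not the course's prescribed design): build a secondary index table whose row key starts with a fixed-width creation timestamp, so that an interval query becomes a single contiguous row-key scan. The sorted list below stands in for HBase's lexicographically ordered row keys.

```python
import bisect

def index_key(ts, page_id):
    # Zero-padding makes lexicographic order match numeric order.
    return f"{ts:013d}#{page_id}"

# Stand-in for the index table's sorted row keys (illustrative data).
index = sorted(index_key(ts, pid) for ts, pid in
               [(1000, "p1"), (2500, "p2"), (3000, "p3"), (9000, "p4")])

def scan_interval(t1, t2):
    # Equivalent of an HBase Scan(startRow, stopRow) over the index.
    lo = bisect.bisect_left(index, f"{t1:013d}")
    hi = bisect.bisect_right(index, f"{t2:013d}~")  # '~' sorts after '#'
    return [k.split("#", 1)[1] for k in index[lo:hi]]

print(scan_interval(2000, 3000))   # ['p2', 'p3']
```

The design choice: with time as the key prefix, relevant rows are physically adjacent, so a highly selective time query touches only the matching key range instead of scanning everything.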
Consistency
Without replication
Atomic row ops
Even for multi-column accesses
NO atomicity for multi-row ops
Atomicity refers to both concurrency control and fault tolerance
With replication
Replication protocol (quorums, primary-secondary schemes, …)
Think of:
Performance implications: good vs bad
Fault tolerance / high availability
Consistency levels (strong/weak) offered
read-one-write-all
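The quorum condition behind these protocols can be stated as a one-line check (a sketch of the standard rule, with read-one-write-all as the special case R=1, W=N):

```python
def is_strongly_consistent(n, r, w):
    # With N replicas, read quorum R and write quorum W:
    # R + W > N  => every read overlaps every write (sees latest value);
    # W + W > N  => any two writes overlap (writes are serialised).
    return r + w > n and w + w > n

assert is_strongly_consistent(3, 2, 2)      # majority quorums
assert is_strongly_consistent(3, 1, 3)      # read-one-write-all
assert not is_strongly_consistent(3, 1, 1)  # weak/eventual consistency only
```

This is the efficiency-vs-consistency dial from above: small R makes reads cheap but forces W toward N (slow, less available writes), and vice versa.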
THINK OF HOW EACH OF THESE HORIZONTAL ISSUES IS ADDRESSED BY EACH OF THE SYSTEMS WE DISCUSSED IN CLASS !
Moving Up The Data Management Food Chain
One (Datacentre) is None Geo-Distributed/Multi-Datacentre Operation
These are what the datacentres are looking for: Computing/analytics at the Edge
But is this the way to go?
But is this the way to go?
How large is your dataset?
What is the largest amount of RAM you can have in a single rack-mount commodity server?
>= 64TB! (IBM Power9 server)
How many CPUs can you have in a single server?
>= 1536 vcores! (IBM Power9 server)
Think CPUs, caches, memory buses, etc.; is this still a centralised system?
Islands of CPUs, sharing caches/buses, etc.
More like a distributed system within a centralised system
But is this the way to go?
Storage B/W
50+ MB/s (HDD)
500+ MB/s (SSD)
Network B/W
CPU (freq)
Source: https://drive.google.com/open?id=0BwMOJChROzavdVYyRXpLY1J4a0U
Source: https://blog.westerndigital.com/cpu-bandwidth-the-worrisome-2020-trend/
Source: http://wikibon.org/w/images/b/be/ProjectionCapacityDiskNANDManagementSummary.png
Big Data Systems Summary
A slew of “easy to use” open-source systems
Hadoop/MapReduce
YARN/MapReduce2/Tez
Spark/SparkSQL
Hive/Pig/Impala …
Over data stored in
HDFS (plain files, Parquet, ORC, …)
NoSQL data stores (HBase, HyperTable, Cassandra, Kudu, MapR, Dynamo, RedShift, …)
Which of these is the best for you?
Hard to tell right away
Depends on characteristics of
Data: volumes, schema, cardinalities, constraints, …
Queries: selectivity, data access patterns, execution plan, …
Software stack
OS buffer cache, network stack, scheduler, filesystem, …
Hadoop/Spark/… configuration
Hardware
CPU, RAM, disk, NIC, switches/routers, …
Infrastructure/ops
Multitenancy, isolation, virtualisation, …
Food for thought: Sorting
Suppose you want to totally sort a large input
Hadoop or Spark the best option?
Spark usually several times faster than Hadoop… but wait!
Setup in (2): 128 physical cores (Hadoop) / 128 physical cores (Spark); virtualised, 1 Gbps network; Elapsed Time: …
Records in (1): Hadoop MR record: 50400 physical cores; Spark record: 6592 virtualised cores; dedicated, 10 Gbps network; Elapsed Time: …
(1) https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
(2) J. Shi et al. “Clash of the Titans: MapReduce vs Spark for Large Scale Data Analytics”. PVLDB, 8(13), 2015.
There and back again
Big data tools are useful
Fault tolerant + scalable processing of tasks requiring (semi-)complete scans
Fault tolerant, scalable storage for free
But come with design decisions that can hurt performance
Fairly rigid map-reduce style computation
Large units of storage to improve throughput
Optimised for scan-everything tasks
Highly selective tasks suffer
Solution:
Distributed index building + storage
Coordinator-based query processing
Part I: Needle in a (big data) haystack
Needle in a haystack
t1 t2 t3 t4
Relevant Data
Case study 1: Top-k Join Queries
Top-k Join Queries
SELECT select-list FROM R1, R2, …, Rn WHERE join-expression(R1, R2, …, Rn) ORDER BY f(R1, R2, …, Rn)
Scoring of individual tuples of Ri based on either an explicit score attribute (e.g., movie viewer rating,
PageRank score, …) or
a score function on a subset of the tuple attribute values
Score of join result tuples computed using a monotonic aggregate function f() on individual tuple scores
N. Ntarmos, I. Patlakas, and P. Triantafillou. “Rank join queries in NoSQL databases”. PVLDB, 7(7):493–504, 2014.
Inverted Score List Top-k Join
SELECT * FROM T1, T2 WHERE T1.A=T2.A ORDER BY T1.Score+T2.Score STOP AFTER 1
Match1: A = 17, Score = 183 Match2: A = 12, Score = 180
Max future join score = max(95+85, 90+93) = 183
No unprocessed tuple can be in the result set and have a higher score than Match1
Not there yet
With textbook Fagin/Ilyas-style approach, tuples may still be read from disk and transferred over the network, even though they may never contribute to the join result
Goal: only the tuples needed for the final result set should be read and transferred
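The early-termination logic of threshold-style rank joins (the Fagin/Ilyas family referenced above) can be sketched as follows. The function name, top-1 limit, and scores are illustrative; both inputs are read in descending score order, and we stop once no unread combination can beat the best join found.

```python
def rank_join_top1(t1, t2):
    # t1, t2: lists of (join_key, score), sorted by descending score.
    best = None
    seen1, seen2 = {}, {}
    for i in range(max(len(t1), len(t2))):
        for seen, other, row in ((seen1, seen2, t1), (seen2, seen1, t2)):
            if i < len(row):
                key, score = row[i]
                seen[key] = max(score, seen.get(key, score))
                if key in other:                     # join match found
                    cand = (seen[key] + other[key], key)
                    if best is None or cand > best:
                        best = cand
        # Max future score: next unread score on one side joined with
        # the top score of the other side.
        nxt1 = t1[i + 1][1] if i + 1 < len(t1) else float("-inf")
        nxt2 = t2[i + 1][1] if i + 1 < len(t2) else float("-inf")
        if best and best[0] >= max(nxt1 + t2[0][1], nxt2 + t1[0][1]):
            return best   # no unread tuple can yield a better match
    return best

t1 = [(17, 95), (12, 90), (5, 80)]   # (join key, score), illustrative
t2 = [(12, 93), (17, 88), (9, 85)]
print(rank_join_top1(t1, t2))        # (183, 17)
```

Note the problem the slide raises: even with this stopping rule, tuples below the threshold are still read and shipped before we know they cannot contribute; BFHM attacks exactly that.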
The BFHM structure
Histogram on score values
Equi-width histogram on score values
Each bucket stores:
The min and max score for its existing tuples
A Bloom filter (*) of tuple join attribute values
(*) Actual data structure is a Golomb-compressed hybrid of a 1-hash plain BF, plus a hashtable of non-zero Counting BF counters
Bloom filters on join values
BFHM example
BFHM for T1
0.8-0.9 0.7-0.8
BFHM for T2
0.6-0.7 0.5-0.6
Result estimation using BFHMs Idea:
Compute a bit-wise product of BFHM buckets in pairs
If the product is >0 then there are possible result(s)
Produce a list of estimated results sorted on their attainable scores
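The bucket-pair estimation step can be sketched as below. This is a simplification of the BFHM (a plain integer-bitmask Bloom filter with two hash functions, not the Golomb-compressed hybrid from the paper); a non-zero AND of two buckets' filters means the pair *may* produce join results, with attainable score bounded by the buckets' max scores.

```python
M = 64  # Bloom filter size in bits (illustrative)

def bloom(values, m=M):
    # Plain Bloom filter as an integer bitmask, k = 2 hash functions.
    bits = 0
    for v in values:
        for seed in (0, 1):
            bits |= 1 << (hash((seed, v)) % m)
    return bits

def estimate(buckets1, buckets2):
    # buckets: list of (min_score, max_score, bloom_bits) per bucket.
    out = []
    for lo1, hi1, b1 in buckets1:
        for lo2, hi2, b2 in buckets2:
            if b1 & b2:                        # bit-wise product > 0:
                out.append((hi1 + hi2, lo1 + lo2))  # possible results
    return sorted(out, reverse=True)   # sorted on attainable max score
```

Because Bloom filters never yield false negatives, no real result is missed; false positives only cost extra candidate pairs, which the verification phase discards.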
Result estimation using BFHMs (cont.)
Top-5 query
Top-5 estimated result tuples
Tuples that could be in top-k if our estimations were off (i.e. those with max score larger than the min score of the k-th result in the top-k list)
Performance Results
LineItem-Part
LineItem-Orders
Case study 2: k-Nearest-Neighbours
k-Nearest-Neighbours
SELECT * FROM R ORDER BY distance(q) LIMIT k
Database of multi-dimensional data points (plus properties)
User provides (multi-dimensional) query point q
Scoring of individual tuples of R based on distance
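The query above, as a brute-force baseline over a small in-memory dataset; this is exactly the scan-everything behaviour the indexes in this section avoid at scale.

```python
import heapq
import math

def knn(points, q, k):
    # k points with smallest Euclidean distance to the query point q.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

pts = [(0, 0), (1, 1), (5, 5), (2, 2)]   # illustrative 2-D dataset
print(knn(pts, (0, 0), 2))               # [(0, 0), (1, 1)]
```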
A. Cahsai, N. Ntarmos, C. Anagnostopoulos, and P. Triantafillou, “Scaling k-Nearest Neighbors Queries (The right way)”, in Proceedings of the 37th IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.
A. Cahsai, C. S. Anagnostopoulos, N. Ntarmos, and P. Triantafillou, “Revisiting Exact kNN Query Processing with Probabilistic Data Space Transformations”, in Proceedings of the IEEE International Conference on Big Data (BigData)
( Paper Award), 2018.
Typical Workflow
Two-pronged Solution
COWI: Coordinator With In-memory Index
Quad-tree based index
Sample + compute size of leaves/internal nodes to fit in main memory
Distributed index building + storage on DFS
Data partitioning
Quad-tree building
Load and serve in coordinator memory + answer queries
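The quad-tree at the heart of COWI can be sketched as a minimal point quad-tree (class name, `cap`, and the 2-D restriction are illustrative): leaves hold up to `cap` points, and an overflowing leaf splits its box into four children. A small tree like this fits in the coordinator's memory; in the full system, leaf contents live in the distributed store.

```python
class QuadTree:
    def __init__(self, x0, y0, x1, y1, cap=2):
        self.box = (x0, y0, x1, y1)
        self.cap, self.points, self.kids = cap, [], None

    def insert(self, p):
        if self.kids is None:              # still a leaf
            self.points.append(p)
            if len(self.points) > self.cap:
                self._split()
        else:
            self._child(p).insert(p)

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.kids = [QuadTree(*b, self.cap) for b in
                     ((x0, y0, mx, my), (mx, y0, x1, my),
                      (x0, my, mx, y1), (mx, my, x1, y1))]
        pts, self.points = self.points, []
        for p in pts:                      # push points down to children
            self._child(p).insert(p)

    def _child(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.kids[(p[0] >= mx) + 2 * (p[1] >= my)]

    def leaf_for(self, p):
        # Descend to the leaf whose box contains point p.
        return self if self.kids is None else self._child(p).leaf_for(p)

qt = QuadTree(0, 0, 8, 8)
for p in [(1, 1), (2, 2), (6, 6)]:
    qt.insert(p)       # the third insert overflows the root and splits it
```

A kNN search would descend to `leaf_for(q)` first and then expand to neighbouring cells; only the visited leaves need to be fetched from storage.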
CONI: Coordinator with No In-memory Index
Build COWI + store in DKVS
Build balanced Quad Tree over COWI leaves
Load all layers and serve in coordinator memory + answer queries
Performance Results
If data points are uniformly distributed, a Quad Tree is an overkill
Optimal: uniform grid
Decide cell size based on items per cell
Partition + store cell items together
Only need to know cell size (along each dimension) + mapping function to cell ID
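The uniform-grid locating step above can be sketched as one mapping function (dimensions and cell widths are illustrative): given only the cell width and grid resolution, any node can compute which cell, and hence which partition, holds a point.

```python
def cell_id(point, cell_w, cells_per_dim):
    # Row-major ID of the grid cell containing `point`
    # (points outside the grid are clamped to the border cells).
    idx = 0
    for coord in reversed(point):
        idx = idx * cells_per_dim + min(int(coord // cell_w),
                                        cells_per_dim - 1)
    return idx

# 2-D space [0,100)^2, 10x10 grid of 10-unit cells:
assert cell_id((25, 71), 10, 10) == 72   # column 2, row 7
```

This is why the uniform case is so cheap: no tree, no index lookup, just arithmetic shared by every node.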
Alas, they are not
So let's make them be!
STOS: Probabilistic Space Transformation
1. Finite Gaussian Mixture Models to break data distribution into set of Gaussians
2. De-correlate per-cluster RVs: remove statistical dependencies among RVs
3. Transform independent RVs to a joint uniform distribution
Sklar's theorem: any multivariate distribution can be written in terms of a univariate uniform distribution of each RV and a copula that captures the dependence between the RVs
In independence copula, marginal and joint distributions of RVs are standard uniform
4. Use uniform grid to index the latter; store grid dimensions + reverse transformation function
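The 1-D core of steps 3-4 is the probability integral transform: pushing each coordinate through its own CDF yields values that are uniform on [0,1], so a uniform grid then holds roughly equal mass per cell. A small demonstration with Gaussian data (sample size and seed are arbitrary):

```python
import math
import random

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # CDF of N(mu, sigma^2) via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10000)]   # skewed toward 0
us = [gauss_cdf(x) for x in xs]                   # ~ Uniform(0, 1)

# Roughly equal mass now falls into each of 10 equal-width grid cells:
counts = [sum(1 for u in us if i / 10 <= u < (i + 1) / 10)
          for i in range(10)]
```

In STOS the CDF itself is estimated (via the Gaussian mixture), and the stored "index" reduces to the grid parameters plus the reverse transformation.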
Performance Results
Index Building Time Index Loading Time
In-memory Index Size
Query Response Time
#rows accessed per query
Graphs are everywhere
Computer networks
Source: https://commons.wikimedia.org/wiki/ File:Internet_map_4096.png
Source: https://pixabay.com/get/ 57e7d2404350ae14f6da8c7dda3536781538d6e25b55794f_1280.jpg
Source: https://arxiv.org/pdf/2003.00911v1.pdf
Social networks
Source: https://commons.wikimedia.org/wiki/ File:Social_Network_Analysis_Visualization.png
Source:https://commons.wikimedia.org/wiki/ File:Relationship_graph_of_Panama_Papers.png
Neural networks/AI
Commerce/Recommendations
Graphs are hard
Expressiveness at a cost
Complex queries
Traversals (e.g., BFS, DFS, …)
Routing (e.g., A* search, SSSP, APSP, …)
Network analysis (e.g., centrality, PageRank, …)
Graph pattern matching (e.g., subgraph isomorphism, approximate pattern matching, …)
Unique access patterns
Large number of data accesses; low spatial locality
Often restrictive application domains/use cases
Case study 3: Subgraph Matching
Example: Subgraph/supergraph queries
subgraph queries
supergraph queries
graph dataset
graph dataset
Brute-force Approach
g1 g2 g3 g4 g5 g6 g7
Subgraph isomorphism test
NP complete
OK, but highly inefficient
q g1 q g2 q g3 q g4
q g5 q g6 q g7
Index-based processing (Method M)
g1 g2 g3 g4 g5
q g1 isomorphism q g2
Candidate Set
Subgraph test
Dataset Index
q g3 q g4 q g5
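The filter-then-verify pipeline of Method M can be sketched as below (all names illustrative): a cheap index, here just vertex-label multisets, prunes dataset graphs that cannot possibly contain the query, and only the surviving candidate set pays for the NP-complete subgraph isomorphism test, here a tiny brute-force check on labeled digraphs.

```python
from itertools import permutations

# A graph is {"L": {node: label}, "E": {(u, v), ...}} (directed edges).

def labels(g):
    return sorted(g["L"].values())

def contains_all(d, q):
    # Filter: every query label must occur at least as often in d.
    dl, ql = labels(d), labels(q)
    return all(dl.count(x) >= ql.count(x) for x in set(ql))

def sub_iso(d, q):
    # Verify: brute-force over all injective node mappings q -> d.
    qn, dn = list(q["L"]), list(d["L"])
    for perm in permutations(dn, len(qn)):
        m = dict(zip(qn, perm))
        if all(q["L"][v] == d["L"][m[v]] for v in qn) and \
           all((m[u], m[v]) in d["E"] for (u, v) in q["E"]):
            return True
    return False

def query(dataset, q):
    candidates = [g for g in dataset if contains_all(g, q)]  # filter
    return [g for g in candidates if sub_iso(g, q)]          # verify

d1 = {"L": {1: "A", 2: "B"}, "E": {(1, 2)}}
d2 = {"L": {1: "A", 2: "C"}, "E": {(1, 2)}}
q = {"L": {0: "A", 1: "B"}, "E": {(0, 1)}}
```

Real indexes filter on richer features (paths, trees, subgraphs) for stronger pruning, but the shape of the pipeline is the same: the index shrinks the candidate set, the expensive test runs only on what survives.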
F. Katsarou, N. Ntarmos, and P. Triantafillou, “Performance and Scalability of Indexed Subgraph Query Processing Methods”, Proceedings of the VLDB Endowment (PVLDB), vol. 8, no. 12, pp. 1566–1577, 2015.
F. Katsarou, N. Ntarmos, and P. Triantafillou, “Hybrid Algorithms for Subgraph Pattern Queries in Graph Databases”, in Proceedings of the IEEE International Conference on Big Data (BigData), 2017.
F. Katsarou, N. Ntarmos, and P. Triantafillou, “Subgraph Querying with Parallel Use of Query Rewritings and Alternative Algorithms”, in Proceedings of the 20th International Conference on Extending Database Technology (EDBT), 2017.
… meanwhile, in the real world …
worldwide friendship network friendship network in Europe
friendship network in UK
where subgraph/supergraph relationships exist among queries!
proteins of multi-cell organisms, e.g., plants, animals
friendship network in Ukraine
Let's harness this!
friendship network in Glasgow
protein mixtures
amino acids
proteins of uni-cell bacteria
Enter the GraphCache
Enter the GraphCache
Candidate Set (CS) {g1, g2, g3, g4, g5}
Bruteforce approach
YES TBD NO
g1 g2 g3 g4 g5
g1 g2 g3 g4 g5
g1 g2 g3 g4 g5 g6 g7
J. Wang, N. Ntarmos, and P. Triantafillou, “Indexing Query Graphs to Speed Up Graph Query Processing”, in Proc. EDBT, 2016.
J. Wang, N. Ntarmos, and P. Triantafillou, “GraphCache: A Caching System for Graph Queries”, in Proc. EDBT, 2017.
J. Wang, N. Ntarmos, and P. Triantafillou, “Ensuring Consistency in Graph Cache for Graph-Pattern Queries”, in Proc. GraphQ, 2017.
J. Wang, Z. Liu, S. Ma, N. Ntarmos, and P. Triantafillou, “GC: A Graph Caching System for Subgraph/Supergraph Queries”, PVLDB, 11(12):2022–2025, 2018.
GraphCache
Query Index
q1 q2 q3 q4 q5 q6 q7
Answer sets
{g1, g2, g3, g4, g7}
Sub case: q is a subgraph of a previous query
Dataset Index
g1 g2 g3 g4 g5
g1 g2 g3 g4 g5
Why remove g1, g2 from CS ?
A(q4) = {g1, g2}
(answer set)
Query Index
Sub case (cont.): q is a subgraph of multiple previous queries
A(q2) ∪ A(q4) ∪ A(q6) → YES
A(q2) A(q4) A(q6) Answer sets q1 q2 q3 q4 q5 q6 q7
Query Index
Sub case (cont.): … a special case
Answer set of q (against dataset) = A(q2)
A(q2) q q2
q isomorphic to q2 (same query found)
q1 q2 q3 q4 q5 q6 q7
Query Index
Super case: q is a supergraph of a previous query
Dataset Index
Query Index
1 q2 q3 {g1, g2, g3, g4, g7}
(answer set)
Why remove g5 from CS ?
Super case (cont.): q is a supergraph of multiple previous queries
CS ← CS ∩ {A(q5) ∩ A(q7)}; graphs not in A(q5) ∩ A(q7) are removed
A(q5) A(q7) A(q5) A(q7)
Answer sets A(q5) A(q7) q1 q2 q3 q4 q5 q6 q7
Query Index
Super case (cont.): a special case
No dataset graph could contain q
No dataset graph contains q7
A(q7) = { } q7
q1 q2 q3 q4 q5 q6 q7
Query Index
Putting it all together
Dataset Index
g1 g2 g3 g4 g5
Super case
g1 g2 g3 g4
Query Index (Sub case) + Query Index (Super case)
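The sub/super pruning rules put together reduce to set algebra over cached answer sets (the function name and graph IDs below are illustrative): if q is a subgraph of a cached query q', every graph in A(q') is a guaranteed YES; if q is a supergraph of q', the answer must lie within A(q'), so the candidate set shrinks to CS ∩ A(q'), and an empty A(q') answers q immediately.

```python
def graphcache_prune(cs, sub_hits, super_hits):
    # cs: candidate set from the dataset index
    # sub_hits:   answer sets A(q') of cached queries with q ⊆ q'
    # super_hits: answer sets A(q') of cached queries with q ⊇ q'
    yes = set().union(*sub_hits) if sub_hits else set()
    for a in super_hits:
        cs = cs & a        # only graphs containing q' can contain q
    return yes, cs - yes   # (certain answers, graphs still to verify)

yes, to_test = graphcache_prune(
    cs={"g1", "g2", "g3", "g4", "g5"},
    sub_hits=[{"g1", "g2"}],                       # q ⊆ q4, A(q4) = {g1, g2}
    super_hits=[{"g1", "g2", "g3", "g4", "g7"}])   # q ⊇ q3
# yes == {"g1", "g2"}; to_test == {"g3", "g4"}
```

Only `to_test` pays for subgraph isomorphism tests, which is where the speedups reported below come from.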
Performance Results
Query processing time speedup
F. Katsarou, N. Ntarmos, and P. Triantafillou, “Performance and Scalability of Indexed Subgraph Query Processing Methods”. PVLDB, 8(12):1566-1577, 2015.
J. Wang, N. Ntarmos, and P. Triantafillou, “Towards a Subgraph/Supergraph Cached Query-Graph Index”. IEEE BigData 2015.
J. Wang, N. Ntarmos, and P. Triantafillou. “Indexing Query Graphs to Speed Up Graph Query Processing”. EDBT 2016.
Performance Results
iGQ speedup = ~2X – ~40X, at ~1% space increase
vs speedup < 1.1 at ~100% space increase
Index Size