
Scale-out Architectures
• Fundamentals:
– Data partitioning
• Auto redistribution, balancing
– Parallel access to partitions in different servers


• Simple data model
– Not many restrictions … key-value like
– Value may be structured (e.g., columns, …)
• File system vs table-like systems (HBase/Cassandra, …)

Scale-out Architectures
• Failures are the norm, not the exception
• Hadoop / MR framework and related systems
– Does most of the dirty work for us
• Failures, monitoring, data transfers/organisations, etc.
– With all of that done for us, we then just describe what/how we want done:
• Splits (= blocks; distributed; loaded into mappers, etc.)
• Map and reduce functions
• Output to HDFS
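To make the map/reduce pair above concrete, here is a minimal word-count sketch in Python; the function names and the tiny local driver are illustrative assumptions standing in for the framework's split/shuffle/output machinery, not Hadoop's actual (Java) API.

from collections import defaultdict

def map_fn(_, line):
    # input: one line of a split; output: (word, 1) pairs
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # input: all values for one key after the shuffle; output: (word, total)
    yield (word, sum(counts))

def run(lines):
    # tiny local "driver" standing in for the framework's split/shuffle machinery
    groups = defaultdict(list)
    for i, line in enumerate(lines):
        for k, v in map_fn(i, line):
            groups[k].append(v)          # shuffle: group values by key
    return dict(kv for k in groups for kv in reduce_fn(k, groups[k]))

print(run(["a b a", "b c"]))             # {'a': 2, 'b': 2, 'c': 1}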
• Disadvantages of Hadoop / MR ?
– Input/output: distributed reads/writes to/from HDFS (map output, shuffle, …)
– What if we have interactive or iterative algorithms?
(need to maintain data and results in main memory for subsequent processing …)
– Intermediate results in iterative tasks
– Repeated different queries over the same datasets (perhaps intermediate results).

SPOF Architectures ?
• Single points of failure…
• Master-workers
– Global knowledge
• What kind of knowledge is stored, and how
– What does the master do (orchestrate all)
• Monitor ups and downs, resource utilization, etc.
• Load balancing / assignment
• Even with master-slave designs:
• Several levels of "centralization"
– E.g., MR2/YARN removes much of the central functioning and distributes it out, BigTable/HBase uses Chubby/Zookeeper for much of it, etc.
• Don't confuse the existence of a master with global knowledge
– It can be maintained in a decentralised way

SPOF or NOT ?
• Decentralized
– E.g., Cassandra
• Data placement: consistent hashing (a minimal sketch follows this slide)
– Load balancing
» vs range queries
» Storage vs query load balancing
– Locating data items
• Routing: decentralized
– Finger tables
– Gossiping …
• Failure handling → decentralised chores
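A minimal consistent-hashing sketch (Cassandra-style token ring with virtual nodes); the class name, hash function, and replica walk are illustrative assumptions rather than Cassandra's actual implementation.

import hashlib
from bisect import bisect_right

class Ring:
    # Consistent-hashing ring: each (virtual) node owns the arc up to its token.
    def __init__(self, nodes, vnodes=8):
        self.ring = sorted((self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def _h(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key, replicas=3):
        # walk clockwise from the key's token, collecting distinct nodes
        i = bisect_right(self.tokens, self._h(key))
        owners = []
        while len(owners) < replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

ring = Ring(["n1", "n2", "n3", "n4"])
print(ring.lookup("user:42"))   # e.g. ['n3', 'n1', 'n4']; adding/removing a node only moves nearby keys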

A system for which access types ?
• Batch and/or random?
– MR/Spark + HDFS
• Batch + random
• MR/Spark on NoSQL
• Ad-hoc query / random data access
• Why can NoSQL stores (like Cassandra/HBase/BigTable) do random accesses well,
– while GFS/HDFS cannot?

Fault Tolerance
• Recoverability
– (WAL) logs
– Checkpoints
– "Hot standbys"
• Recoverability vs efficiency (why?)
• Replication → high availability
– Replication factors
– Replication control protocol
• Efficiency vs consistency (why?)

High Write Throughput
• Avoid disk writes as much as possible
– Membufs (in-memory buffers)
– But what about persistence?
• WALs (a minimal write-path sketch follows this slide)
– Add: group commits / append-only access
– Or, forget about persistence → take a chance …
• What about reads ?
– SSTables
• Row vs column storage organisations on disk
– Compactions
– Can these help with locating relevant servers, or only once a server is identified …?
• Think of Cassandra vs HBase !
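A minimal sketch of the LSM-style write path discussed above (WAL append for durability, in-memory buffer, flush to immutable sorted runs); the names and the JSON log format are assumptions, not HBase/Cassandra code.

import json

class MemStore:
    def __init__(self, wal_path, flush_threshold=4):
        self.wal = open(wal_path, "a")
        self.buf = {}
        self.flush_threshold = flush_threshold
        self.sstables = []                 # immutable sorted runs ("SSTable"-like)

    def put(self, key, value):
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()                   # group commit would batch this fsync
        self.buf[key] = value
        if len(self.buf) >= self.flush_threshold:
            self.sstables.append(sorted(self.buf.items()))   # "flush" buffer to an SSTable
            self.buf = {}

    def get(self, key):
        if key in self.buf:                        # newest data first
            return self.buf[key]
        for run in reversed(self.sstables):        # then newest SSTable to oldest
            for k, v in run:
                if k == key:
                    return v
        return None

store = MemStore("wal.log")
store.put("row1", "a"); store.put("row2", "b")
print(store.get("row1"))                   # 'a'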

"Indexing"
• Which node(s) store the data I want ?
• Which disk blocks at these nodes, in particular ?
• For storage models like HBase, BigTable, Cassandra, etc.:
– Which SSTables have blocks which have my data ?
– Which blocks within SSTables ?
• A 3-tier indexing structure in HBase/BigTable …
• How does Cassandra do it?
• Advanced (query-specific) indices:
– Can I expedite specific queries by getting to the data that is part of the query answer fast ?
– Example:
• Given the web archive of AEs
• Queries: find all pages/articles whose creation was within a given time interval
• What index can you build for this using HBase? (one possible row-key design is sketched below)
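One possible (hypothetical) answer, sketched with the happybase Python client: maintain a secondary index table whose row key begins with the zero-padded creation timestamp, so a time-interval query becomes a contiguous row-key range scan. The table, column-family, and column names are assumptions.

import happybase

conn = happybase.Connection("hbase-host")          # Thrift gateway assumed running
idx = conn.table("articles_by_ctime")              # pre-created table, column family 'd'

def index_article(article_id, created_ts):
    # fixed-width (ms) timestamp keeps byte order == time order
    row = f"{created_ts:013d}|{article_id}".encode()
    idx.put(row, {b"d:id": article_id.encode()})

def articles_created_between(ts_from, ts_to):
    start = f"{ts_from:013d}".encode()
    stop = f"{ts_to + 1:013d}".encode()             # row_stop is exclusive
    return [data[b"d:id"].decode() for _, data in idx.scan(row_start=start, row_stop=stop)]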

Consistency
• Without replication
– Atomic row ops
• Even for multi-column accesses
– NO atomicity for multi-row ops
– Atomicity refers to both concurrency control and fault tolerance
• With replication
– Replication control protocol (quorums, primary-secondary schemes, read-one-write-all, …) — a quorum-sizing check is sketched below
• Think of:
– Performance implications: good vs bad
– Fault tolerance / high availability
– Consistency levels (strong/weak) offered
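For the quorum-based protocols mentioned above, a minimal sanity check of quorum sizes (R read replicas, W write replicas out of N); the function name is illustrative.

def quorum_ok(n, r, w):
    # strong consistency for read/write quorums over n replicas:
    # every read quorum intersects every write quorum, and writes intersect writes
    return r + w > n and 2 * w > n

print(quorum_ok(3, 2, 2))   # True  -- e.g. QUORUM reads + QUORUM writes
print(quorum_ok(3, 1, 1))   # False -- read-one-write-one: fast but weakly consistent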

THINK OF HOW EACH OF THESE "HORIZONTAL" ISSUES IS ADDRESSED BY EACH OF THE SYSTEMS WE DISCUSSED IN CLASS !

Moving Up The Data Management Food Chain

One (Datacentre) is None… Geo-Distributed / Multi-Datacentre Operation

These aren't the datacentres you are looking for… Computing/analytics at the Edge

But is this the way to go?

But is this the way to go?
• How large is your dataset?
• What is the largest amount of RAM you can have in a single rack-mount "commodity" server?
– >= 64 TB! (IBM Power9 server)
• How many CPUs can you have in a single server?
– >= 1536 vcores! (IBM Power9 server)
• Think CPUs, caches, memory buses, etc.; is this still a centralised system?
– Islands of CPUs, sharing caches/buses, etc.
– More like a distributed system within a centralised system

But is this the way to go?
[Figure: diverging hardware trends over time — storage bandwidth (50+ MB/s HDD, 500+ MB/s SSD), network bandwidth, and CPU frequency]
Source: https://drive.google.com/open?id=0BwMOJChROzavdVYyRXpLY1J4a0U
Source: https://blog.westerndigital.com/cpu-bandwidth-the-worrisome-2020-trend/
Source: http://wikibon.org/w/images/b/be/ProjectionCapacityDiskNANDManagementSummary.png

Big Data Systems Summary
• A slew of "easy to use" open-source systems
– Hadoop/MapReduce
– YARN/MapReduce2/Tez
– Spark/SparkSQL
– Hive/Pig/Impala
– …
• Over data stored in
– HDFS (plain files, Parquet, ORC, …)
– NoSQL data stores (HBase, HyperTable, Cassandra, Kudu, MapR, Dynamo, RedShift, …)

Which of these is the best for you?
• Hard to tell right away
• Depends on characteristics of
– Data and queries/workload
• Schema, cardinalities, constraints, …
• Selectivity, data access patterns, execution plan, …
– Software stack
• OS buffer cache, network stack, scheduler, file system, Hadoop/Spark/… configuration
– Hardware
• CPU, RAM, disk, NIC, switches/routers, …
– Infrastructure/ops
• Multitenancy, isolation, virtualisation, …
– …

Food for thought: Sorting
• Suppose you want to totally sort a large input
– Hadoop or Spark the best option?
• Spark usually several times faster than Hadoop…
• …but wait!
[Table: sort benchmark set-ups and elapsed times — cores (128 physical; Hadoop MR record: 50,400 physical; Spark record: 6,592 virtualised) and network (virtualised 1 Gbps vs dedicated 10 Gbps); elapsed-time values appear in the figures of sources (1) and (2) below]
(1) https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
(2) J. Shi et al. “Clash of the Titans: MapReduce vs Spark for Large Scale Data Analytics”. PVLDB, 8(13), 2015.

There and back again…
• "Big data tools" are very useful
– Fault-tolerant + scalable processing of tasks requiring (semi-)complete scans
– Fault-tolerant + scalable storage "for free"
• But they come with design decisions that can hurt performance
– Fairly rigid map-reduce style computation
– Large units of storage to improve throughput
– Optimised for scan-everything tasks
• Highly selective tasks suffer
• Solution:
– Distributed index building + storage
– Coordinator-based query processing

Part I: Needle in a (big data) haystack

Needle in a haystack
[Figure: datasets t1–t4, with only a small portion of Relevant Data highlighted]

Case study 1: Top-k Join Queries

Top-k Join Queries
SELECT select-list FROM R1, R2, …, Rn WHERE join-expression(R1, R2, …, Rn) ORDER BY f(R1, R2, …, Rn)
• Scoring of individual tuples of Ri based on either
– an explicit score attribute (e.g., movie viewer rating, PageRank score, …) or
– a score function on a subset of the tuple attribute values
• Score of join result tuples computed using a monotonic aggregate function f() on individual tuple scores
N. Ntarmos, I. Patlakas, and P. Triantafillou. "Rank join queries in NoSQL databases." PVLDB, 7(7):493–504, 2014.

Inverted Score List Top-k Join
SELECT * FROM T1, T2 WHERE T1.A=T2.A ORDER BY T1.Score+T2.Score STOP AFTER 1
Match1: A = 17, Score = 183 Match2: A = 12, Score = 180
Max future join score = max(95+85, 90+93) = 183
No unprocessed tuple can be in the result set and have a higher score than Match1
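The stopping test above is the rank-join threshold. Below is a minimal Python sketch of the idea under simplifying assumptions (equi-join on a single key, each key appears at most once per list, join score = sum of the two tuple scores); it illustrates the threshold logic only, not the algorithm of the paper cited earlier.

import heapq

def rank_join_topk(list1, list2, k):
    # Both inputs are sorted by descending score: [(join_key, score), ...]
    seen1, seen2, results = {}, {}, []     # results: max-heap via negated scores
    top1, top2 = list1[0][1], list2[0][1]
    last1, last2 = top1, top2
    i = j = 0
    while i < len(list1) or j < len(list2):
        take1 = j >= len(list2) or (i < len(list1) and list1[i][1] >= list2[j][1])
        if take1:
            key, s = list1[i]; i += 1; last1 = s
            seen1[key] = s
            if key in seen2:
                heapq.heappush(results, (-(s + seen2[key]), key))
        else:
            key, s = list2[j]; j += 1; last2 = s
            seen2[key] = s
            if key in seen1:
                heapq.heappush(results, (-(seen1[key] + s), key))
        # no unseen combination can beat this bound (cf. the 183 in the example above)
        threshold = max(last1 + top2, top1 + last2)
        if len(results) >= k and -heapq.nsmallest(k, results)[k - 1][0] >= threshold:
            break
    return [(key, -neg) for neg, key in heapq.nsmallest(k, results)]

print(rank_join_topk([("A", 95), ("B", 90)], [("B", 93), ("A", 85)], k=1))  # [('B', 183)]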

Not there yet …
􏰺 With textbook Fagin/Ilyas-style approach, tuples may still be read from disk and transferred over the network, even though they may never contribute to the join result
􏰺 Goal: only the tuples needed for the final result set should be read and transferred

The BFHM structure
Histogram on score values
• Equi-width histogram on score values
• Each bucket stores:
– The min and max score of its existing tuples
– A Bloom filter (*) of tuple join attribute values
(*) The actual data structure is a Golomb-compressed hybrid of a 1-hash plain BF, plus a hashtable of non-zero Counting BF counters …
Bloom filters on join values
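A minimal Python sketch of the BFHM idea under simplifying assumptions: scores normalised to [0, 1) and plain multi-hash Bloom filters per bucket instead of the Golomb-compressed hybrid described in the footnote. Class and parameter names are illustrative.

import hashlib

class Bucket:
    # One equi-width score bucket: score bounds + a plain Bloom filter over join values.
    def __init__(self, lo, hi, m=256, k=3):
        self.lo, self.hi, self.m, self.k = lo, hi, m, k
        self.bits = 0

    def _positions(self, value):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{value}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        return all(self.bits >> p & 1 for p in self._positions(value))

class BFHM:
    # Equi-width histogram of Buckets over the score domain [0, 1).
    def __init__(self, nbuckets=10):
        self.buckets = [Bucket(i / nbuckets, (i + 1) / nbuckets) for i in range(nbuckets)]

    def add(self, join_value, score):
        b = min(int(score * len(self.buckets)), len(self.buckets) - 1)
        self.buckets[b].add(join_value)

bfhm = BFHM()
bfhm.add(join_value=17, score=0.93)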

BFHM example
[Figure: BFHM for T1 (score buckets 0.8–0.9, 0.7–0.8, …) and BFHM for T2 (score buckets 0.6–0.7, 0.5–0.6, …)]

Result estimation using BFHMs
• Idea:
– Compute a "bit-wise product" of BFHM buckets in pairs (sketched after the figure below)
• If the product is > 0 then there are possible result(s)
– Produce a list of estimated results sorted on their attainable scores
[Figure: pairwise combination of T1 score buckets (0.9–1, 0.8–0.9, 0.7–0.8, 0.6–0.7, 0.5–0.6) with T2 score buckets (0.9–1, 0.6–0.7, 0.5–0.6, 0.4–0.5, 0.3–0.4)]
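A sketch of the bucket-pair estimation step, assuming each bucket is represented as a (min_score, max_score, bloom_bits) triple as in the previous sketch; names are illustrative.

def estimate_pairs(buckets1, buckets2):
    # Pairwise "bit-wise product": a non-zero AND of the Bloom bit vectors means the
    # pair may produce join results; the buckets' max scores bound the best join score
    # such a result could attain, so candidates can be sorted on attainable score.
    candidates = []
    for lo1, hi1, bits1 in buckets1:
        for lo2, hi2, bits2 in buckets2:
            if bits1 & bits2:
                candidates.append({"max_score": hi1 + hi2,
                                   "min_score": lo1 + lo2,
                                   "bits": bits1 & bits2})
    return sorted(candidates, key=lambda c: -c["max_score"])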


Result estimation using BFHMs (cont.)
Top-5 query
Top-5 estimated result tuples
Tuples that could be in the top-k if our estimations were off (i.e., those with max score larger than the min score of the k'th result in the top-k list)


Performance Results
LineItem-Part
LineItem-Orders

Case study 2: k-Nearest-Neighbours

k-Nearest-Neighbours
SELECT * FROM R ORDER BY distance(q) LIMIT k
• Database of multi-dimensional data points (plus properties)
• User provides (multi-dimensional) query point q
• Scoring of individual tuples of R based on their distance to q
A. Cahsai, N. Ntarmos, C. Anagnostopoulos, and P. Triantafillou, "Scaling k-Nearest Neighbors Queries (The right way)," in Proceedings of the 37th IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.
A. Cahsai, C. S. Anagnostopoulos, N. Ntarmos, and P. Triantafillou, "Revisiting Exact kNN Query Processing with Probabilistic Data Space Transformations," in Proceedings of the IEEE International Conference on Big Data (BigData) (Paper Award), 2018.

Typical Workflow

Two-pronged Solution
• COWI: Coordinator With In-memory Index
– Quad-tree based index (a minimal quad-tree sketch follows this slide)
– Sample + compute size of leaves/internal nodes to fit in main memory
– Distributed index building + storage on DFS
• Data partitioning
• Quad-tree building
– Load the quad tree in the coordinator's memory + answer queries
• CONI: Coordinator with No In-memory Index
– Build COWI + store it in a DKVS
– Build a balanced quad tree over the COWI leaves
– Load only the latter quad tree in the coordinator's memory + answer queries
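A minimal point-region quad-tree sketch in Python, as an assumed simplification of the COWI index (capacity-based leaf splitting over 2-D points); it is not the paper's implementation.

class QuadTree:
    # Leaves hold at most `cap` points; internal nodes split their square into four
    # children. (Assumes points are not all identical, otherwise splitting never ends.)
    def __init__(self, x0, y0, x1, y1, cap=4):
        self.box, self.cap = (x0, y0, x1, y1), cap
        self.points, self.children = [], None

    def insert(self, p):
        if self.children:
            self._child(p).insert(p)
        else:
            self.points.append(p)
            if len(self.points) > self.cap:
                self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my, self.cap), QuadTree(mx, y0, x1, my, self.cap),
                         QuadTree(x0, my, mx, y1, self.cap), QuadTree(mx, my, x1, y1, self.cap)]
        pts, self.points = self.points, []
        for p in pts:
            self._child(p).insert(p)

    def _child(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(p[0] >= mx) + 2 * (p[1] >= my)]

    def leaf_for(self, q):
        # returns the leaf whose region contains q; a kNN search would start here
        node = self
        while node.children:
            node = node._child(q)
        return node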

Performance Results

• If data points are uniformly distributed, a quad tree is overkill
Optimal: uniform grid
• Decide cell size based on items per cell
• Partition + store cell items together
• Only need to know the cell size (along each dimension) + the mapping function to cell ID (sketched after this slide)
• Alas, data points are not uniformly distributed
– So let's make them be!
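A sketch of the cell-ID mapping a coordinator would keep for a uniform grid; the function and parameter names are assumptions.

def cell_id(point, mins, cell_sizes, cells_per_dim):
    # Map a d-dimensional point to its flattened grid-cell id; the coordinator only
    # needs the grid geometry (mins, cell sizes, cells per dimension), not the data.
    idx = 0
    for x, lo, w, n in zip(point, mins, cell_sizes, cells_per_dim):
        c = min(int((x - lo) / w), n - 1)      # clamp points on the upper boundary
        idx = idx * n + c
    return idx

# e.g. a 100x100 grid over [0,1) x [0,1):
print(cell_id((0.37, 0.82), mins=(0, 0), cell_sizes=(0.01, 0.01), cells_per_dim=(100, 100)))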

STOS: Probabilistic Space Transformation
1. Finite Gaussian Mixture Models to break the data distribution into a set of Gaussians
2. De-correlate per-cluster RVs: remove statistical dependencies among the RVs
3. Transform the independent RVs to a joint uniform distribution
– Sklar's theorem: any multivariate distribution can be written in terms of a univariate uniform distribution of each RV and a copula that captures the dependence between the RVs
– Under the independence copula, the marginal and joint distributions of the RVs are standard uniform
4. Use a uniform grid to index the latter; store the grid dimensions + the reverse transformation function
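A loose Python sketch of the flavour of such a transformation (fit a GMM, whiten points within their cluster, then apply the probability integral transform so the marginals become roughly uniform); this is an assumed simplification, not the exact STOS construction.

import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import norm

def stos_like_transform(X, n_components=3):
    # X: (n, d) numpy array of data points.
    gmm = GaussianMixture(n_components=n_components).fit(X)
    labels = gmm.predict(X)
    U = np.empty_like(X, dtype=float)
    for c in range(n_components):
        pts = X[labels == c]
        if len(pts) == 0:
            continue
        mean = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False) + 1e-9 * np.eye(X.shape[1])
        L = np.linalg.cholesky(cov)
        whitened = np.linalg.solve(L, (pts - mean).T).T     # decorrelated, roughly N(0, I)
        U[labels == c] = norm.cdf(whitened)                 # roughly Uniform(0,1) marginals
    return U    # now amenable to a plain uniform-grid index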

Performance Results
Index Building Time Index Loading Time
In-memory Index Size
Query Response Time
#rows accessed per query

Graphs are everywhere
Computer networks
Source: https://commons.wikimedia.org/wiki/File:Internet_map_4096.png
Source: https://pixabay.com/get/57e7d2404350ae14f6da8c7dda3536781538d6e25b55794f_1280.jpg
Source: https://arxiv.org/pdf/2003.00911v1.pdf
Social networks
Source: https://commons.wikimedia.org/wiki/File:Social_Network_Analysis_Visualization.png
Source: https://commons.wikimedia.org/wiki/File:Relationship_graph_of_Panama_Papers.png
Neural networks/AI
Commerce/Recommendations

Graphs are hard
• Expressiveness at a cost
– Complex queries
• Traversals (e.g., BFS, DFS, …)
• Routing (e.g., A* search, SSSP, APSP, …)
• Network analysis (e.g., centrality, PageRank, …)
• Graph pattern matching (e.g., subgraph isomorphism, approximate pattern matching, …)
• Unique access patterns
– Large number of data accesses
– Low spatial locality
• Often restrictive application domains/use cases

Case study 3: Subgraph Matching

Example: Subgraph/supergraph queries
subgraph queries
supergraph queries
graph dataset
graph dataset

Bruteforce Approach
g1 g2 g3 g4 g5 g6 g7
Subgraph isomorphism test
NP-complete
… works, but highly inefficient …
[Figure: q is tested against each of g1 … g7]

Index-based processing (Method M)
g1 g2 g3 g4 g5
[Figure: the Dataset Index prunes the dataset down to a Candidate Set; each candidate graph is then verified against q with a subgraph isomorphism test]
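A minimal filter-then-verify sketch in Python using networkx; the label-count "index" here is a simple stand-in assumption for the real indexing methods surveyed in the papers below, and verification uses networkx's (node-induced) subgraph isomorphism test.

from networkx.algorithms import isomorphism
from collections import Counter

def features(g):
    # Cheap, index-like features: counts of node labels and of edge label pairs.
    nodes = Counter(d.get("label", "") for _, d in g.nodes(data=True))
    edges = Counter(tuple(sorted((g.nodes[u].get("label", ""), g.nodes[v].get("label", ""))))
                    for u, v in g.edges())
    return nodes, edges

def answer(query, dataset):
    qn, qe = features(query)
    results = []
    for g in dataset:
        gn, ge = features(g)
        # Filtering: a graph can contain the query only if it has at least as many
        # occurrences of every feature; prune it otherwise (no expensive test needed).
        if any(gn[l] < c for l, c in qn.items()) or any(ge[p] < c for p, c in qe.items()):
            continue
        # Verification: the costly (NP-complete) subgraph isomorphism test.
        gm = isomorphism.GraphMatcher(
            g, query, node_match=lambda a, b: a.get("label", "") == b.get("label", ""))
        if gm.subgraph_is_isomorphic():
            results.append(g)
    return results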
F. Katsarou, N. Ntarmos, and P. Triantafillou, "Performance and Scalability of Indexed Subgraph Query Processing Methods," Proceedings of the VLDB Endowment (PVLDB), vol. 8, no. 12, pp. 1566–1577, 2015.
F. Katsarou, N. Ntarmos, and P. Triantafillou, "Hybrid Algorithms for Subgraph Pattern Queries in Graph Databases," in Proceedings of the IEEE International Conference on Big Data (BigData), 2017.
F. Katsarou, N. Ntarmos, and P. Triantafillou, "Subgraph Querying with Parallel Use of Query Rewritings and Alternative Algorithms," in Proceedings of the 20th International Conference on Extending Database Technology (EDBT), 2017.

… meanwhile, in the real world …
[Figure: real-world subgraph/supergraph relationships — friendship networks worldwide, in Europe, in the UK, and in Glasgow; amino acids, proteins of uni-cell bacteria, proteins of multi-cell organisms (e.g., plants, animals), protein mixtures]
These subgraph/supergraph relationships exist among queries too — let's harness them!

Enter the GraphCache

Enter the GraphCache
Candidate Set (CS) {g1, g2, g3, g4, g5}
Bruteforce approach
[Figure: dataset {g1 … g7}; each graph in the CS {g1 … g5} is tested against q, with outcomes YES / NO / TBD]
J. Wang, N. Ntarmos, and P. Triantafillou, "Indexing Query Graphs to Speed Up Graph Query Processing," in Proc. EDBT, 2016.
J. Wang, N. Ntarmos, and P. Triantafillou, "GraphCache: A Caching System for Graph Queries," in Proc. EDBT, 2017.
J. Wang, N. Ntarmos, and P. Triantafillou, "Ensuring Consistency in Graph Cache for Graph-Pattern Queries," in Proc. GraphQ, 2017.
J. Wang, Z. Liu, S. Ma, N. Ntarmos, and P. Triantafillou, "GC: A Graph Caching System for Subgraph/Supergraph Queries," PVLDB, 11(12):2022–2025, 2018.

GraphCache
Query Index
q1 q2 q3 q4 q5 q6 q7
Answer sets
{g1, g2, g3, g4, g7}

Sub case: q is a subgraph of a previous query
Dataset Index
g1 g2 g3 g4 g5
g1 g2 g3 g4 g5
Why remove g1, g2 from CS ?
A(q4) = {g1, g2} (answer set of previous query q4)
Query Index

Sub case (cont.): q is a subgraph of multiple previous queries
A(q2) ∪ A(q4) ∪ A(q6) → sure answers (YES) for q
Answer sets A(q2), A(q4), A(q6) — Query Index: q1 q2 q3 q4 q5 q6 q7
Query Index

Sub case (cont.): … a special case
Answer set of q (against dataset) = A(q2)
A(q2) q q2
q isomorphic to q2 ("same" query found)
q1 q2 q3 q4 q5 q6 q7
Query Index

Super case: q is a supergraph of a previous query
Dataset Index
Query Index
{g1, g2, g3, g4, g7} (answer set of a previous cached query)
Why remove g5 from CS ?

Super case (cont.): q is a supergraph of multiple previous queries
CS ← CS ∩ A(q5) ∩ A(q7)   (candidates not in both answer sets cannot contain q)
Answer sets A(q5), A(q7) — Query Index: q1 q2 q3 q4 q5 q6 q7
Query Index

Super case (cont.): … a special case
No dataset graph could contain q
No dataset graph contains q7
A(q7) = { } q7
q1 q2 q3 q4 q5 q6 q7
Query Index

Putting it all together …
Dataset Index
g1 g2 g3 g4 g5
Super case
g1 g2 g3 g4
Query Index (Sub case)
Query Index (Super case)
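A compact sketch of the candidate-set pruning that the sub/super/isomorphic cases above amount to; set names and helper functions are assumptions, not GraphCache's actual code.

def prune_with_cache(q, cs, cache, is_subgraph):
    # cs: set of candidate dataset-graph ids returned by the dataset index for query q
    # cache: dict mapping a previously answered query q_prev -> its answer set A(q_prev)
    # is_subgraph(a, b): True iff graph a is subgraph-isomorphic to graph b
    sure_answers = set()
    for q_prev, a_prev in cache.items():
        sub = is_subgraph(q, q_prev)        # q is contained in q_prev
        sup = is_subgraph(q_prev, q)        # q_prev is contained in q
        if sub and sup:
            return set(), set(a_prev)       # isomorphic: the cached answer is the answer
        if sup:
            cs = cs & a_prev                # super case: only graphs containing q_prev qualify
        elif sub:
            sure_answers |= a_prev & cs     # sub case: graphs containing q_prev surely contain q
            cs = cs - a_prev                # ... so they need no verification
    return cs, sure_answers                 # verify only cs; union with sure_answers at the end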

Performance Results
Query processing time speedup
F. Katsarou, N. Ntarmos, and P. Triantafillou, "Performance and Scalability of Indexed Subgraph Query Processing Methods." PVLDB, 8(12):1566-1577, 2015.
J. Wang, N. Ntarmos, and P. Triantafillou, "Towards a Subgraph/Supergraph Cached Query-Graph Index." IEEE BigData 2015.
J. Wang, N. Ntarmos, and P. Triantafillou. "Indexing Query Graphs to Speed Up Graph Query Processing." EDBT 2016.

Performance Results
iGQ speedup = ~2X – ~40X
~ 1% space increase
speedup < 1.1, ~100% space increase (Index Size)