Declarative Languages
• Map-Reduce framework hides scheduling and parallelization details
• Limited query expressiveness
– Complex queries difficult to write

• Declarative languages on top of map-reduce
– Pig Latin (Yahoo!)
• Like relational algebra
• Open source
– HiveQL (Facebook)
• SQL-like language
• Open source
– SQL / Tenzing
• Proprietary

Pig (Latin)
• Pig Latin
– A higher-level, SQL-like language for running complex queries that require several map-reduce jobs
• Pig
– An execution engine
• Translates Pig Latin programs into graphs of map-reduce jobs
• Executes them on top of Hadoop
• An Apache open source project

Example ( , Yahoo)
❑ users(name, age), pagelog(url, uname)
❑ Find the top 5 most popular pages for users aged 18-25
SELECT url, count(*) AS clicks
FROM users U, pagelog P
WHERE U.name = P.uname
  AND U.age >= 18
  AND U.age <= 25
GROUP BY url
ORDER BY clicks DESC
LIMIT 5 -- FETCH FIRST 5 ROWS ONLY

Map-reduce program: 170 lines of code

In Pig Latin (input from files or DFS):
Users = load 'users' as (name, age);
Fltrd = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (uname, url);
Jnd = join Fltrd by name, Pages by uname;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate ($0), COUNT($1) as clicks;
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;
store Top5 into 'top5sites';

Very similar to Relational Algebra

Pig Latin vs. RA
• Fltrd = filter Users by age >= 18 and age <= 25; → Selection σ(18 <= age <= 25)
• Jnd = join Fltrd by name, Pages by uname; → Join on name = uname
• Grpd = group Jnd by url; → Grouping on url
• Smmd = foreach Grpd generate ($0), COUNT($1) as clicks; → Count as clicks
• Srtd = order Smmd by clicks desc; → Order by clicks desc

Load and Store
– Users = load 'users' as (name, age);
– Load: read information from file into a (temp) relation
• Mostly user defined to translate file format into a relational format
– Store: write relation into file
• Again, usually provided by user

Pig Latin Operators
• Fltrd = filter Users by age >= 18 and age <= 25;
– Left side: new intermediate relation
– Right side: operation on existing relations
• Operators
– Selection
• Res = filter R1 by ...;
• SELECT * FROM R1 WHERE ...
• by age >= 18; by url matches '.*oracle.*'
– Join
• Res = join R1 by a1, R2 by a2;
• SELECT * FROM R1, R2 WHERE R1.a1 = R2.a2
– Order by
• Res = order R1 by a1 desc
• SELECT * FROM R1 order by a1 desc
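
The operators above (filter, join, group, count, order, limit) can be mimicked in plain Python to see what each step of the top-5 example computes. This is only a sketch of the semantics, not how Pig executes the query; the sample data, the function name `top_pages`, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample data (made up for illustration).
users = [("amy", 19), ("bob", 30), ("cat", 25), ("dan", 17)]
pagelog = [("amy", "a.com"), ("amy", "b.com"), ("cat", "a.com"),
           ("bob", "a.com"), ("dan", "c.com"), ("cat", "a.com")]

def top_pages(users, pagelog, lo=18, hi=25, k=5):
    # filter: keep users whose age lies in [lo, hi]
    fltrd = [(name, age) for name, age in users if lo <= age <= hi]
    # join: pair each filtered user with that user's page visits
    jnd = [(name, age, uname, url)
           for name, age in fltrd
           for uname, url in pagelog if name == uname]
    # group + foreach/COUNT: count the joined tuples per url
    smmd = Counter(url for _, _, _, url in jnd)
    # order ... desc + limit: sort by clicks, keep the first k
    # (breaking ties by url is an assumption, not Pig behaviour)
    srtd = sorted(smmd.items(), key=lambda kv: (-kv[1], kv[0]))
    return srtd[:k]

top5 = top_pages(users, pagelog)
print(top5)
```

Each intermediate list plays the role of one intermediate relation (Fltrd, Jnd, Smmd, Srtd) in the Pig Latin program.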

• Given relation Rel(A, B, C) with three tuples
(a1, b1, c1) (a1, b2, c2) (a3, b3, c3)
• Grpd = group Rel by A;
• Result relation is Grpd(group, Rel)
– Attribute ‘group’ has the same type as attribute A of Rel
– Attribute ‘Rel’ is a multiset (bag)
– In the given example, Grpd has two tuples, one for each value of A; the first attribute of the tuple is the value of A, the second is the bag of all tuples of Rel that have this particular value of A
– dump Grpd:
• (a1, {(a1, b1, c1), (a1, b2, c2)})
• (a3, {(a3, b3, c3)})
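
The nested result of group can be reproduced with a short Python sketch. Bags are modelled as lists here, and the helper name `group_by` is hypothetical; this illustrates the result shape only, not Pig's implementation.

```python
def group_by(rel, key_index):
    """Mimic `group rel by <attr>`: one (key, bag) tuple per distinct key."""
    groups = {}
    for tup in rel:
        # the bag keeps whole tuples, including the grouping attribute
        groups.setdefault(tup[key_index], []).append(tup)
    return list(groups.items())

rel = [("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a3", "b3", "c3")]
grpd = group_by(rel, 0)
for row in grpd:
    print(row)
```

Note that the grouping value appears twice: once as the first attribute and again inside every tuple of the bag.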

• Assume same as before
– Grpd = group Rel by A;
– Result relation is Grpd(group, Rel):
• (a1, {(a1, b1, c1), (a1, b2, c2)})
• (a3, {(a3, b3, c3)})
• For each (two options)
– Smmd = foreach Grpd generate ($0), COUNT($1) as c;
– Smmd = foreach Grpd generate group, COUNT(Rel) as c;
• Result relation is Smmd(group, c)
– Attribute ‘group’ has the same type as attribute A of Rel
– Attribute c is a long
– Dump Smmd:
• (a1, 2L)
• (a3, 1L)
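
As a sketch of what the foreach ... generate step does with the grouped relation, the following Python applies a count to each bag. The helper name `foreach_count` is hypothetical, and bags are again modelled as lists.

```python
# Grouped relation from the slide, with bags written as Python lists.
grpd = [("a1", [("a1", "b1", "c1"), ("a1", "b2", "c2")]),
        ("a3", [("a3", "b3", "c3")])]

def foreach_count(grouped):
    # one output tuple per input tuple: (group, COUNT(bag))
    return [(group, len(bag)) for group, bag in grouped]

smmd = foreach_count(grpd)
print(smmd)
```

Both Pig Latin variants on the slide describe this same computation; `$0` and `$1` address the attributes by position, while `group` and `Rel` address them by name.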

Projection and others
• Projection
– Assume R1(A, B, C)
– Rel = foreach R1 generate A, B;
• Co-Group
– Group-by over more than one relation
– Related to joins
• Join results in a flat result
• Co-group results in nested result
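
The difference between the flat join result and the nested co-group result can be seen in a small Python sketch. The relations `r1` and `r2` and both helper functions are made up for illustration. Keeping keys that occur in only one input (with an empty bag on the other side) reflects my understanding of Pig's default co-group behaviour, so treat it as an assumption.

```python
def join(r1, k1, r2, k2):
    # flat result: one concatenated tuple per matching pair
    return [t1 + t2 for t1 in r1 for t2 in r2 if t1[k1] == t2[k2]]

def cogroup(r1, k1, r2, k2):
    # nested result: (key, bag of r1 tuples, bag of r2 tuples)
    keys = []
    for key in [t[k1] for t in r1] + [t[k2] for t in r2]:
        if key not in keys:
            keys.append(key)
    return [(key,
             [t for t in r1 if t[k1] == key],
             [t for t in r2 if t[k2] == key]) for key in keys]

# Hypothetical sample relations.
r1 = [("x", 1), ("y", 2)]
r2 = [("x", "p"), ("x", "q"), ("z", "r")]

joined = join(r1, 0, r2, 0)
cogrouped = cogroup(r1, 0, r2, 0)
print(joined)
print(cogrouped)
```

In this model, a join corresponds to a co-group followed by pairing up the two bags of each key, which is why keys with an empty bag on either side disappear from the join result.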

Data Model and Flattening
• Supported types:
– Atomic (string, number…)
– Tuple (58,‘lilly’, 10, 10)
– Multiset {(58,‘lilly’, 10, 10), (33,‘debby’, 5, 7)}
– Further nesting possible: (1, (2,3))
– Maps (advanced)
• Flattening example
– Assume R = {(1, (2,3))}
– Res = foreach R generate $0, flatten($1);
– Res = {(1, 2, 3)}
– Semantics somewhat obscure…
• Sometimes output type not quite clear: try to flatten…
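
A rough Python model of flatten may help with the semantics. The tuple case follows the slide's example; the bag case (one output tuple per bag element) is based on my understanding of Pig and is not stated on the slide. The helper name `foreach_flatten` is hypothetical, with tuples modelled as Python tuples and bags as lists.

```python
def foreach_flatten(rel):
    """Mimic `foreach R generate $0, flatten($1)`."""
    out = []
    for first, second in rel:
        if isinstance(second, tuple):
            # flattening a tuple splices its fields into the output tuple
            out.append((first, *second))
        else:
            # flattening a bag (a list here) yields one output tuple per element
            for inner in second:
                out.append((first, *inner))
    return out

res = foreach_flatten([(1, (2, 3))])
res_bag = foreach_flatten([("a1", [("a1", "b1"), ("a1", "b2")])])
print(res)
print(res_bag)
```

The two branches show why the output type can be unclear: the same operator either widens a tuple or multiplies the number of tuples, depending on what it is applied to.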

Implementation
• Parser and Query Generator:
– Everything between load and store translates into one logical plan
• Logical plan:
– Graph of Hadoop map-reduce jobs
• All statements between two (co)groups → one map-reduce job

Map Reduce Assignment
• Use Pig Latin on top of Hadoop to process a (not so) large data set
• We have set up a Hadoop cluster with 4 nodes (virtual)
• You have access to that cluster
• You have to write Pig Latin queries
• You have to observe the execution
• Instructions will be on myCourses
