NWEN303 Concurrent Programming
2 A conceptual model for concurrency
Marco Servetto VUW
●
A conceptual model for Concurrency
In a non concurrent model of computation operations are executed one after the other in order. An operation can start only after the former operation ended.
You can visualize this model imagining yourself doing a task alone. We often talk in this way:
Here I call this function, I take the result and I transform it in this way, then I will check if ….
Divide and conquer (as for merge-sort)
●
–
●
–
If the data is a single element then it is sorted, otherwise I take the data and I split it in two roughly equal halves, I recursively sort the first half, then I recursively sort the second half, and finally I merge the two sorted halves into a single sorted result.
●
●
A conceptual model for Concurrency
In a concurrent model of computation some operations may be executed simultaneously.
This model is harder to visualize. A common metaphor is the ‘temporary work agency’: if more then one task can be performed at the same time, we call someone to help.
This model allows to generalize Divide and conquer into Fork-Join:
Fork-Join (applied to merge-sort)
●
●
–
If the data is a single element then it is sorted, otherwise I take the data and I split it in two roughly equal halves, then I delegate to another person to recursively sort the first half; while they work, I recursively sort the second half, then I collect their result, and finally I merge the two sorted halves into a single sorted result.
●
A conceptual model for Concurrency
If the various people involved in the computation have to discuss about the computation and share partial results, everything can get very complicated very fast.
Minimizing those communications and deeply understand the points where communication need to happens is the key to understand concurrent programs.
When possible, no communication is the best solution. Moreover, all mainstream languages offers libraries to make this simple case syntactically convenient.
●
●
Java parallelStream: Easy mode
public static boolean isPrime(int number){/*..*/}
public static void main(String[] arg) { List
List
System.out.println(nums.size()+” “+primes.size()); }
Just write .parallelStream() instead of .stream()!
Works well if the operations are fully functional! (more info later) No checked exceptions
Unchecked exceptions “sort-of” propagated
●
●
●
●
. .
Computation do not progress until all workers have concluded their task. (Synchronization point)
.
Join
Visualizing Fork-Join
. . . .
. . . .
. . . . . . . . .
Fork
….
. . . .
Nested Fork-Join
Individual fork-joins are unaware of synchronization of other forks.
Computation in separate branches can progress even when some branch is waiting for the Join point.
Exercise
● Consider the following sequential code
a=doA()//takes 1 min b=doB()//takes 10 min c=doC()//takes 2 min d=doD()//takes 5 min ab=doAB(a,b)//takes 1 min cd=doCD(c,d)//takes 10 min res=doAll(ab,cd)//takes 1 min
● How much it takes to execute this code sequentially?//30
● How much it takes to fork on a,b,c,d, then to fork on ab,cd and then to
do res?//21
● Is there a better way?//16
Exercise: The Java encoding
public static String sequential() { String a=doA();
String b=doB();
String c=doC();
String d=doD(); String ab=doAB(a,b); String cd=doCD(c,d); return doAll(ab,cd);
}
A:1
B:10
C:2
D:5
AB(a,b):1 CD(c,d):10
res(ab,cd):1
Total: 30 minutes
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
Simple helper function.
Yes, you can/should use functions to encode higher order concepts instead of just writing the full stream expression all the times.
Limited: all the suppliers return the same type (and do not throw checked exceptions)
Soon we see (Completable)Futures, that will give us more flexibility
● ●
●
●
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel1() { List
()->doA(),()->doB(),()->doC(),()->doD()); String a=abcd.get(0);
String b=abcd.get(1);
String c=abcd.get(2);
String d=abcd.get(3); List
}
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel1() { List
()->doA(),()->doB(),()->doC(),()->doD()); String a=abcd.get(0);
String b=abcd.get(1);
String c=abcd.get(2);
String d=abcd.get(3); List
}
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel1() { List
()->doA(),()->doB(),()->doC(),()->doD()); String a=abcd.get(0);
String b=abcd.get(1);
String c=abcd.get(2);
String d=abcd.get(3); List
}
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel1() { List
()->doA(),()->doB(),()->doC(),()->doD()); String a=abcd.get(0);
String b=abcd.get(1);
String c=abcd.get(2);
String d=abcd.get(3);
List
}
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel1() { List
()->doA(),()->doB(),()->doC(),()->doD()); String a=abcd.get(0);
String b=abcd.get(1);
String c=abcd.get(2);
String d=abcd.get(3); List
}
Fork
Join Fork
Join
A:1
B:10
AB(a,b):1
C:2
D:5
Visual representation Start
res(ab,cd):1
CD(c,d):10
Total: 21 minutes
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel2() { List
()->{ List
},
()->{ List
});
return doAll(res.get(0),res.get(1)); }
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel2() { List
()->{ List
},
()->{ List
});
return doAll(res.get(0),res.get(1)); }
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel2() { List
()->{ List
},
()->{ List
});
return doAll(res.get(0),res.get(1)); }
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel2() { List
()->{ List
},
()->{ List
});
return doAll(res.get(0),res.get(1)); }
Fork Fork
Join
Join
A:1
B:10
AB(a,b):1
Start Fork
Join
C:2
D:5
Visual representation
res(ab,cd):1
CD(c,d):10
Total: 16 minutes Can we do better? same speed less splits?
Exercise: The Java encoding
@SafeVarargs
public static
.map(f->f.get()).collect(Collectors.toList()); }
public static String parallel3() { List
()->{
String a=doA(); String b=doB(); return doAB(a,b); },
()->{ List
});
return doAll(res.get(0),res.get(1)); }
Join
CD(c,d):10
Total: still 16 minutes
However, this version is possible only because we KNOW in advance the time of all the tasks. Assuming such knowledge is often unrealistic, thus the former solution is often preferred.
Fork
Visual representation
Start Fork
C:2
Join
A:1 B:10
AB(a,b):1
D:5
res(ab,cd):1
● ●
Libraries
Libraries make programming concepts and patterns easy to use.
There is no such thing as a “non library”. In all languages even using Threads directly, is just about using a library that may (or may not) directly map on operative system threads. The only important point is the Library public interface.
Learning libraries do not commit you to a specific language: Common concepts are likely to be represented as High-Level libraries in all languages you will program into. Knowing those concepts will help you to find the right library in the specific language, and already knowing one example of such library will help you to adapt to the new environment.
●
●
Let’s debunk this statement!
Libraries
Some think that using many libraries is not the best, since:
they need to invest too much time into learning all those library specific API,
the library may have bugs,
most likely the result is going to be slower than their hand-optimized version relying only on primitive constructs.
–
– –
●
Library specific API: (Most of) The library API captures the problem domain and gives developers a common language to discuss problems and solutions together.
Moreover, different libraries in different languages tend to offer similar API if they are modeling the same High-Level concepts.
Thus, time spent learning a Library API is time well spent if you need to solve the same kind of problems of that library.
–
Libraries
●
– –
●
Library code is tested on a range of different application and thus enjoy a dept and a variety of testing that user code simply can not obtain.
Libraries
The library may have bugs and so is your code, but:
your code is debugged by just you,
and only in the context of your application.
●
The result is going to be slower:
the library need to be more general, to support good performance for a range of applications and machines.
The performance of a library using application will be more consistent over different machines and over the maintenance history of the application.
–
Libraries
specific code version
The day I fully hand optimized my code!
Today 🙁
specific code version
The day I converted my code to use libraries
Today
Performance curve
●
Even if you your code will only be executed on the same machine, your code evolves over time. Your hand optimizations probably leveraged on some details of the current version of the codebase.
The library code have to balance efficiency for a whole range of programs inside the problem domain. Thus is much more likely that the performance of your codebase will stay stable instead of regularly degrade.
●
efficiency efficiency
Recap
● Concurrency: Multiple things happening at once.
● Fork-Join is a simple concurrent model that generalize Divide and Conquer.
● .parallelStream() is a simple solution to use Fork-Join in a Java program. Try to use more primitive approaches only if really needed.
● Fork-Join can be nested/recursive, as for Divide and Conquer; but here the specific division of tasks can influence performance.
● We have seen parallel streams as an example of library. We will see more libraries in the rest of the course.
● Using many libraries helps with writing correct and efficient code that can be easier to maintain since the API of the library most likely encodes common concepts in the problem domain.