Research in Distributed Systems Dr Tawfiq Islam
Associate Lecturer
School of Computing and Information Systems (CIS) The University of Melbourne, Australia

Research Experience
● Net Neutrality (MS): network protocols, protocol blocking, content shaping
● Cloud and Big Data (PhD): optimization, performance modelling, resource allocation, job scheduling, reinforcement learning
● Software Defined Networks (RA): intent-driven resilient tactical battlefield networks
● Stream Computing (Post Doc): real-time social media data analytics, in-memory caching databases
Islam – Google Scholar

Big Data Job Scheduling on Cloud
• Objectives
– Schedule Big Data applications in a cloud-deployed cluster while reducing the VM usage cost of the whole cluster and prioritizing critical/deadline-constrained applications
– Schedule Big Data applications in a hybrid cluster composed of local and Cloud VMs, leveraging pricing models to reduce cost while providing deadline guarantees
• Limitations of existing approaches:
– Assume homogeneous VMs, which leads to resource wastage
– Performance-aware, but not cost-efficient
– No separation between normal and time-critical jobs
– Multiple executors cannot be placed in the same VM
– Do not consider the pricing models of different VM instance types or cost efficiency in a hybrid setup
• Research Contributions:
– Four job scheduling algorithms that prioritize critical jobs and pack jobs tightly into fewer VMs to reduce cost
– Real implementation of a job scheduling framework on top of the Apache Mesos cluster manager, which can be extended with new policies
– RM_Simulator: an event-based simulator for evaluating scheduling policies for big data applications
– Experiments on Apache Spark jobs
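
To illustrate what "event-based" means for a scheduling simulator, here is a minimal sketch in Python. It is not the RM_Simulator code: the `Event` class, the `simulate` function, the single pool of cores, and the strict FIFO policy are all assumptions made for this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(order=True)
class Event:
    time: float                          # simulated time of the event
    seq: int                             # tie-breaker for events at the same time
    kind: str = field(compare=False)     # "arrive" or "finish"
    job_id: str = field(compare=False)


def simulate(jobs: List[Tuple[str, float, int, float]], capacity: int) -> Dict[str, float]:
    """Simulate jobs (job_id, arrival, cores, duration) on one pool of cores.

    Hypothetical policy: strict FIFO, so the head of the queue blocks later jobs.
    Returns job_id -> finish time; jobs larger than the pool never start.
    """
    events: List[Event] = []
    info = {}
    seq = 0
    for job_id, arrival, cores, duration in jobs:
        heapq.heappush(events, Event(arrival, seq, "arrive", job_id))
        info[job_id] = (cores, duration)
        seq += 1

    free = capacity
    waiting: List[str] = []
    finish: Dict[str, float] = {}

    # The clock jumps from one event to the next instead of ticking.
    while events:
        ev = heapq.heappop(events)
        if ev.kind == "arrive":
            waiting.append(ev.job_id)
        else:
            free += info[ev.job_id][0]
            finish[ev.job_id] = ev.time

        # Scheduling policy hook: start queued jobs while the head fits.
        while waiting and info[waiting[0]][0] <= free:
            job_id = waiting.pop(0)
            cores, duration = info[job_id]
            free -= cores
            heapq.heappush(events, Event(ev.time + duration, seq, "finish", job_id))
            seq += 1
    return finish
```

A different policy can be tried by replacing only the inner `while` loop, which is the sense in which such a simulator is used to compare scheduling policies.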

Problem Formulation (Cloud-based Cluster)
Example scheduling scenarios

Proposed Algorithms
• Solution Approach (cloud-based cluster):
– Best-Fit-Heuristic (BFD): Unifies the resource dimensions (CPU, memory) and finds a cost-effective placement for each job that reduces unused resources
– Integer Linear Programming (ILP): Tight packing of jobs with a cost-minimization objective
• Solution Approach (hybrid cluster):
– First Fit Heuristic (FF): Use local VMs first, then Cloud VMs
– Greedy Iterative Optimization (GIO): Relaxes the problem from a per-job to a per-executor basis, and uses the pricing model of the VMs and job profile information to find the cheapest placement for each executor
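
The best-fit idea from the cloud-based approach can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm from the slides: the `VM` class, the `unified` score (CPU and memory each normalized by the VM's total capacity, then summed), and the function names are hypothetical, and VM prices are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VM:
    name: str
    cpu_total: int
    mem_total: int   # GB
    cpu_free: int
    mem_free: int    # GB


def unified(cpu: int, mem: int, vm: VM) -> float:
    # Assumed way to unify the two dimensions: normalize each by the
    # VM's total capacity so CPU and memory become comparable, then add.
    return cpu / vm.cpu_total + mem / vm.mem_total


def best_fit(vms: List[VM], cpu: int, mem: int) -> Optional[VM]:
    """Return the feasible VM that would have the least leftover capacity."""
    best, best_left = None, None
    for vm in vms:
        if vm.cpu_free < cpu or vm.mem_free < mem:
            continue  # executor does not fit on this VM
        left = unified(vm.cpu_free - cpu, vm.mem_free - mem, vm)
        if best is None or left < best_left:
            best, best_left = vm, left
    return best


def place_executors(vms: List[VM],
                    executors: List[Tuple[str, int, int]]) -> Dict[str, Optional[str]]:
    """Place (exec_id, cpu, mem) executors one by one; None means no VM fits."""
    placement: Dict[str, Optional[str]] = {}
    for exec_id, cpu, mem in executors:
        vm = best_fit(vms, cpu, mem)
        if vm is None:
            placement[exec_id] = None
            continue
        vm.cpu_free -= cpu
        vm.mem_free -= mem
        placement[exec_id] = vm.name
    return placement
```

With one 16-core/64 GB VM and one 4-core/16 GB VM, three executors of 2 cores and 8 GB each fill the small VM first and only then spill onto the large one, so several executors share a VM and less capacity is left stranded.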

System Implementation

RL-based Job Scheduling
• Limitations of existing approaches:
– Cannot learn cluster or application characteristics to optimize the objectives efficiently
– Need to be tuned for different scenarios
• Research Contributions:
– RL Model for the job scheduling problem
– Reward formulation (encoding of multiple objectives)
– RL environment implementation for a Cloud-deployed cluster
– DRL agents (DQN and REINFORCE) to learn inherent characteristics
• Solution Approach:
– Set the desired balance between the cost-optimized and time-optimized objectives
– DRL agents learn to schedule and optimize the objectives entirely through continuous interaction with the cluster simulation environment

‒ The agent's observation is built from job requirements and cluster resource details
‒ The agent takes an action
‒ It receives a reward and observes the next state
‒ It learns through interaction with the environment
‒ The agent has no prior knowledge of job arrivals, job types, resource constraints, or objectives
‒ Maximizing expected reward = optimizing target objectives
‒ Built and trained with the TensorFlow Agents framework
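
The observe/act/reward loop and a reward that balances cost against time can be sketched in plain Python. Everything here is an assumption for illustration: the `weighted_reward` formula, the weight `beta`, the `ToyClusterEnv` class, and the per-VM cost and runtime figures are hypothetical and do not reproduce the reward or the TensorFlow Agents environment used in this work.

```python
from typing import Callable, List, Tuple

Obs = Tuple[int, Tuple[int, ...]]  # (CPU demand of next job, free CPUs per VM)


def weighted_reward(cost: float, time: float,
                    max_cost: float, max_time: float, beta: float) -> float:
    """beta = 1.0 rewards only low cost; beta = 0.0 rewards only low time."""
    return beta * (1 - cost / max_cost) + (1 - beta) * (1 - time / max_time)


class ToyClusterEnv:
    """Hypothetical environment: each VM is (cpus, cost_per_job, runtime_per_job)."""

    def __init__(self, vms: List[Tuple[int, float, float]],
                 jobs: List[int], beta: float):
        self.vms, self.jobs, self.beta = vms, jobs, beta
        self.max_cost = max(v[1] for v in vms)
        self.max_time = max(v[2] for v in vms)

    def reset(self) -> Obs:
        self.free = [v[0] for v in self.vms]
        self.i = 0
        return self._obs()

    def _obs(self) -> Obs:
        demand = self.jobs[self.i] if self.i < len(self.jobs) else 0
        return demand, tuple(self.free)

    def step(self, action: int) -> Tuple[Obs, float, bool]:
        demand = self.jobs[self.i]
        _, cost, runtime = self.vms[action]
        if self.free[action] >= demand:
            self.free[action] -= demand
            reward = weighted_reward(cost, runtime,
                                     self.max_cost, self.max_time, self.beta)
        else:
            reward = -1.0  # invalid placement is penalized
        self.i += 1
        return self._obs(), reward, self.i >= len(self.jobs)


def run_episode(env: ToyClusterEnv, policy: Callable[[Obs], int]) -> float:
    """One episode: observe, act, receive reward, repeat until all jobs are placed."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

In this toy setup a learning agent would replace the fixed `policy` function. With `beta = 1.0` the cheap, slow VM earns the higher return, and with `beta = 0.0` the expensive, fast VM does, which is the sense in which maximizing expected reward optimizes the chosen objective.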

Performance Evaluation
− Trade-offs between multiple objectives

Multi-level Caching Architecture for Stateful Stream Computation

Intent-based Framework for Vehicular Edge Computing

Questions?
Islam – Google Scholar
For any queries:
