CSIT314 Software Development Methodologies
Data-driven Software Development

Data-driven software development

 Two perspectives:
 Developing data-driven software products
• E.g. Many Artificial Intelligence (AI) applications are data-driven.
• Also referred to as AI Engineering
 Leveraging software development data to generate insights and build tool support for business analysts, software developers, project managers, etc.:
• E.g. Help business analysts identify requirements from app reviews
• E.g. Help project managers predict delays and risks
• E.g. Help agile teams estimate effort (story points)
• E.g. Help software developers locate security vulnerabilities and bugs
• E.g. Automatically generate code comments, commit messages, test cases, etc.
• Also referred to as Software Analytics
• Many large organizations (e.g. Microsoft, Google and Facebook) have deployed Software Analytics in their software development process.

The traditional software development approach
Source: Hands-On Machine Learning with Scikit-Learn and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems (1st ed.)

The traditional approach – an example
 Example: the weather problem, i.e. the conditions for playing a cricket game. How can we build a software app that predicts whether a cricket game will be played or not?
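In the traditional approach, a developer studies the problem and writes the decision rules by hand. A minimal sketch of what that might look like; the function name `will_play` and every rule and threshold below are illustrative assumptions, since the slides do not specify any:

```python
def will_play(outlook: str, temperature: float, humidity: float, windy: bool) -> bool:
    """Hand-written rules. Every rule and threshold is an illustrative assumption."""
    if outlook == "rainy" and windy:
        return False  # wind and rain together stop play
    if outlook == "sunny" and humidity > 75:
        return False  # assumed humidity limit on sunny days
    if temperature > 40:
        return False  # assumed heat limit
    return True


print(will_play("sunny", 30, 85, False))    # False
print(will_play("overcast", 28, 80, True))  # True
```

Each new exception means another hand-written rule, which is what makes this approach hard to maintain as the problem grows.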

The data-driven approach
Source: Hands-On Machine Learning with Scikit-Learn and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems (1st ed.)

Data-driven software systems
Supervised learning (Machine Learning)
Source: Hands-On Machine Learning with Scikit-Learn and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems (1st ed.)

Data-driven software development: developing data-driven learning models
How do we (automatically) build a model (or an app) for predicting if a game is played?
 This model is a function which can be inferred (learned) from labelled training data (aka data driven)
• f(outlook, temperature, humidity, windy) returns true or false.
• Features/attributes: input variables, e.g. outlook, temperature, etc.
• Target/dependent variable, e.g. play = yes or no
• Training set consists of training examples
 Classification is prediction of a discrete target (here, play = yes or no).
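To make "a function inferred from labelled training data" concrete, here is a minimal sketch using a very simple one-rule learner (often called 1R), which is a stand-in chosen for brevity and not a method named in the slides. The training rows are made up, and only two of the four features are used to keep the example short:

```python
from collections import Counter, defaultdict

# Made-up labelled training examples: (features, target). Not data from the slides.
TRAINING_SET = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "overcast", "windy": False}, "yes"),
    ({"outlook": "overcast", "windy": True}, "yes"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "rainy", "windy": True}, "no"),
]


def learn_one_rule(examples):
    """Infer f from labelled data: pick the single feature whose
    value -> majority-label table makes the fewest training errors."""
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    best = None
    for feature in examples[0][0]:
        counts = defaultdict(Counter)
        for x, label in examples:
            counts[x[feature]][label] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(table[x[feature]] != label for x, label in examples)
        if best is None or errors < best[0]:
            best = (errors, feature, table)
    _, feature, table = best
    # Unseen feature values fall back to the overall majority label.
    return lambda x: table.get(x[feature], default)


f = learn_one_rule(TRAINING_SET)
print(f({"outlook": "overcast", "windy": True}))  # yes
print(f({"outlook": "sunny", "windy": False}))    # no
```

Nobody wrote the rule "overcast means play"; it was derived from the examples, which is the sense in which the model is data-driven.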

The AI/ML approach – an example (cont.)
 Some basic learning models (learners)
 Decision Trees
Source: Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, by Ian H. Witten et al.
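A decision tree can be learned by repeatedly choosing the feature that best separates the labels. The sketch below uses information gain as the split criterion (the criterion used by ID3-style learners); the training rows are made up and the helper names are hypothetical:

```python
import math
from collections import Counter

# Made-up training rows and labels; not data from the slides.
ROWS = [
    {"outlook": "sunny", "windy": False}, {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False}, {"outlook": "overcast", "windy": True},
    {"outlook": "rainy", "windy": False}, {"outlook": "rainy", "windy": True},
]
LABELS = ["no", "no", "yes", "yes", "yes", "no"]


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    if len(set(labels)) == 1:          # pure node: stop and return the label
        return labels[0]
    if not features:                   # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    tree = {"feature": best, "children": {}}
    for value in sorted(set(row[best] for row in rows)):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*pairs)
        tree["children"][value] = build_tree(list(sub_rows), list(sub_labels), rest)
    return tree


def predict(tree, row):
    # Raises KeyError for a feature value never seen in training.
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree


tree = build_tree(ROWS, LABELS, ["outlook", "windy"])
print(tree["feature"])                                      # outlook
print(predict(tree, {"outlook": "rainy", "windy": True}))   # no
```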

The AI/ML approach – an example (cont.)
 Some advanced learning models (learners)
 Random Forests (RF)
• An ensemble learning method
• A significant improvement on the decision tree approach
• Generates many classification trees, each built using a random subset of variables at each node split, and aggregates the individual trees' results by voting
 Neural Networks
and many other ML models.
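The random forest idea above (many randomized trees, combined by voting) can be sketched as follows. This is a deliberately simplified illustration: each "tree" is a single split (a stump) trained on a bootstrap sample with a randomly chosen candidate feature, whereas a real random forest grows full trees and re-samples candidate features at every node. The data and all names are made up:

```python
import random
from collections import Counter, defaultdict

# Made-up labelled examples: (features, target). Not data from the slides.
EXAMPLES = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "overcast", "windy": False}, "yes"),
    ({"outlook": "overcast", "windy": True}, "yes"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "rainy", "windy": True}, "no"),
]


def train_stump(sample, candidate_features):
    """Depth-1 tree: choose the candidate feature with the fewest training errors."""
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    best = None
    for feature in candidate_features:
        counts = defaultdict(Counter)
        for row, label in sample:
            counts[row[feature]][label] += 1
        table = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(table[row[feature]] != label for row, label in sample)
        if best is None or errors < best[0]:
            best = (errors, feature, table)
    _, feature, table = best
    return lambda row: table.get(row[feature], default)


def train_forest(examples, n_trees=25, n_candidates=1, seed=0):
    rng = random.Random(seed)
    features = sorted(examples[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(examples) for _ in examples]   # bootstrap sample
        candidates = rng.sample(features, n_candidates)     # random feature subset
        forest.append(train_stump(sample, candidates))
    return forest


def predict(forest, row):
    votes = Counter(tree(row) for tree in forest)           # majority voting
    return votes.most_common(1)[0][0]


forest = train_forest(EXAMPLES)
print(predict(forest, {"outlook": "overcast", "windy": False}))
```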

Data-driven software development lifecycle
Model requirements:
 Identify which components of the existing (or new) product are feasible to implement with machine learning/ML, data-driven technology.
 Elicit requirements for these data-driven/ML components
 Determine which types of models are appropriate (e.g. supervised vs. unsupervised).
Amershi et al., Software engineering for machine learning: a case study. Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, 2019.

Data-driven software development lifecycle
Data collection:
 Look for available datasets (e.g. internal data, public data, etc.) or build their own datasets.
 May use a mix of datasets (e.g. pre-training on public datasets, and then (post-)training on their own dataset).

Data-driven software development lifecycle
Data cleaning:
 Remove inaccurate or noisy data records.
 Fill in missing data.
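Both cleaning steps can be sketched in a few lines. The field names, the valid humidity range and the choice of mean imputation are assumptions made for illustration only:

```python
from statistics import mean


def clean(records, valid_humidity=(0.0, 100.0)):
    """Drop records with missing or out-of-range humidity (treated as noise),
    then fill missing temperatures with the mean of the known ones."""
    low, high = valid_humidity
    kept = [dict(r) for r in records
            if r["humidity"] is not None and low <= r["humidity"] <= high]
    known = [r["temperature"] for r in kept if r["temperature"] is not None]
    fill = mean(known) if known else None
    for r in kept:
        if r["temperature"] is None:
            r["temperature"] = fill
    return kept


# Made-up records: one missing temperature, one impossible humidity reading.
records = [
    {"temperature": 20.0, "humidity": 60.0},
    {"temperature": None, "humidity": 70.0},
    {"temperature": 30.0, "humidity": 250.0},
    {"temperature": 24.0, "humidity": 80.0},
]
cleaned = clean(records)
print(len(cleaned))               # 3
print(cleaned[1]["temperature"])  # 22.0
```

The function copies each record, so the raw data is left untouched and can be re-cleaned with different rules later.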

Data-driven software development lifecycle
Data labelling:
 Assign ground-truth labels for each data record
 Can be done by software engineers, domain experts or crowd workers (e.g. Mechanical Turk).

Data-driven software development lifecycle
Feature engineering:
 Extract features from data records – feature extraction
 Select informative features (e.g. remove correlated features) – feature selection
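As an illustration of feature selection by removing correlated features, the sketch below keeps features in order and skips any feature whose Pearson correlation with an already kept feature exceeds a threshold. The 0.95 threshold, the column names and the values are assumptions:

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Return the names of the features to keep, in their original order."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up feature columns: temp_f is just temp_c in different units.
columns = {
    "temp_c": [10, 20, 30, 40],
    "temp_f": [50, 68, 86, 104],
    "humidity": [80, 65, 90, 70],
}
print(drop_correlated(columns))  # ['temp_c', 'humidity']
```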

Data-driven software development lifecycle
Model training:
 Split the data into training and test sets
 Choose a model or a set of models (learning algorithms)
 Train the chosen models
 Tune hyper-parameters
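The training steps above can be sketched end to end with a tiny k-nearest-neighbours classifier, used here only as a convenient stand-in model. The data is made up, and the hyper-parameter k is tuned on a validation split carved out of the training set so that the test set stays untouched until the final check:

```python
import random
from collections import Counter


def split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and return (training part, held-out part)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, held_out, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


def tune_k(train, candidates=(1, 3, 5), seed=0):
    """Choose k on a validation split of the training data, never on the test set."""
    fit, validation = split(train, test_fraction=0.25, seed=seed)
    return max(candidates, key=lambda k: accuracy(fit, validation, k))


# Made-up one-feature data: values below 50 are "no", the rest are "yes".
data = [(float(v), "yes" if v >= 50 else "no") for v in range(0, 100, 5)]
train, test = split(data)
best_k = tune_k(train)
print(len(train), len(test), best_k, accuracy(train, test, best_k))
```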

Data-driven software development lifecycle
Model evaluation:
 Assess the model’s performance on test data (precision, recall, F-measure, AUC, MAE, etc.)
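Several of these measures are simple to compute from predicted and actual values. A minimal sketch with made-up predictions, covering precision, recall and F-measure for a classifier and MAE (mean absolute error) for a numeric predictor:

```python
def precision_recall_f1(actual, predicted, positive="yes"):
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up test-set results.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
precision, recall, f1 = precision_recall_f1(actual, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
print(mean_absolute_error([3, 5, 8], [2, 5, 10]))           # 1.0
```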

Data-driven software development lifecycle
Model deployment:
 Deploy on the targeted devices

Data-driven software development lifecycle
Model monitoring:
 Continuously monitor for performance and errors
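One simple way to monitor a deployed model is to track its accuracy over a sliding window of recent predictions once the true outcomes become known. The class name, window size and alert threshold below are hypothetical choices for illustration:

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` predictions."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # old outcomes drop off automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [("yes", "yes"), ("no", "no"), ("yes", "no"), ("yes", "no")]:
    monitor.record(predicted, actual)
print(monitor.accuracy())         # 0.5
print(monitor.needs_attention())  # True
```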

Data-driven software development lifecycle: an example
Model requirements:
 Data-driven support for software engineers in developing and managing software projects (Software Analytics)
• E.g. Help business analysts identify requirements from app reviews
• E.g. Help project managers predict delays and risks
• E.g. Help agile teams estimate effort (story points)
• E.g. Help software developers locate security vulnerabilities and bugs
• E.g. Automatically generate code comments, commit messages, test cases, etc.
• Also referred to as Software Analytics
• Many large organizations (e.g. Microsoft, Google and Facebook) have deployed Software Analytics in their software development process.

Software Engineering
 Requirements
 Implementation
 Verification and Validation
 Maintenance and evolution
 Software Project Management

AI for Software Engineering (AI4SE)
Commit messages
Test cases
Issue, bug reports, Product backlog user stories
Source code
System events
Usage logs
App reviews

AI4SE: delay prediction in software projects
Model requirements:
Project Manager
Which of these ongoing tasks will be at risk of being delayed?

AI Engineering lifecycle – example
Model requirements:
[Figure: delay prediction system. Features and labels are extracted from the project’s issue tracking system (e.g. JIRA). Each training task (t1, t2, t3, …) is described by task features f1, f2, f3, …, fn and a known delay outcome (e.g. major delay, minor delay, non-delay). Supervised learning on these training tasks produces a classifier, which takes the features f1, f2, f3, …, fn of a new ongoing task and outputs a predicted delay outcome.]

AI Engineering lifecycle
Data collection:
 Analyze 40,830 past tasks (i.e. issues) in 5 large software projects: Moodle, JBoss, Apache, Duraspace, and Spring.
 All these tasks are recorded in the JIRA issue tracking system.

AI Engineering lifecycle
Data cleaning:
 Remove outliers, e.g. incomplete tasks, issues/tasks long overdue, issues with no due date, etc.
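A cleaning filter along these lines might look as follows. The record fields (`key`, `due`, `resolved`) are a hypothetical schema, not JIRA's actual data model, and the 365-day cut-off for "long overdue" is an assumption, since the slides do not give one:

```python
from datetime import date


def is_usable(issue, max_days_late=365):
    """Keep only resolved issues that have a due date and are not extreme outliers."""
    if issue["due"] is None:
        return False  # no due date, so a delay cannot be measured
    if issue["resolved"] is None:
        return False  # incomplete task, outcome unknown
    return (issue["resolved"] - issue["due"]).days <= max_days_late


# Made-up issues in a hypothetical schema.
issues = [
    {"key": "A-1", "due": date(2014, 3, 1), "resolved": date(2014, 3, 4)},
    {"key": "A-2", "due": None, "resolved": date(2014, 3, 9)},
    {"key": "A-3", "due": date(2014, 5, 1), "resolved": None},
    {"key": "A-4", "due": date(2012, 1, 1), "resolved": date(2014, 6, 1)},
]
cleaned = [issue for issue in issues if is_usable(issue)]
print([issue["key"] for issue in cleaned])  # ['A-1']
```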

AI Engineering lifecycle
Data labelling:
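The delay outcome classes used in this example (major delay, minor delay, non-delay) can be derived from each task's due date and resolution date. A sketch, assuming a hypothetical 30-day boundary between minor and major delays; the slides do not state the actual thresholds:

```python
from datetime import date


def label_delay(due, resolved, major_after_days=30):
    """Assign a ground-truth label from how late the task was resolved.
    The 30-day boundary is an illustrative assumption."""
    days_late = (resolved - due).days
    if days_late <= 0:
        return "non-delay"
    if days_late <= major_after_days:
        return "minor delay"
    return "major delay"


print(label_delay(date(2014, 3, 1), date(2014, 2, 27)))  # non-delay
print(label_delay(date(2014, 3, 1), date(2014, 3, 11)))  # minor delay
print(label_delay(date(2014, 3, 1), date(2014, 6, 1)))   # major delay
```

Because the label is computed from recorded dates, no manual labelling effort is needed for this kind of task.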

AI Engineering lifecycle
Feature engineering:
1. Discussion time
2. Waiting time
4. Number of times that an issue is reopened
5. Priority
6. Changing of priority
7. Number of comments
8. Number of fix versions
9. Changing of fix versions
10. Number of affect versions
11. Number of issue links
12. Number of issues that are blocked by this issue
13. Number of issues that block this issue
14. Topics of an issue’s description (NLP/LDA)
15. Changing of description
16. Number of votes
17. Number of watches
18. Reporter reputation
19. Developers’ workload
20. Percentage of delayed issues that a developer was involved with
21. Task dependencies (e.g. blocking, assigned to the same person, or affecting the same components)
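A few of the count-based features in this list can be extracted from an issue record as sketched below. The record layout (`comments`, `fix_versions`, `links`, `history`) is a hypothetical schema invented for the example, not the JIRA API:

```python
def extract_features(issue):
    """Turn one issue record (hypothetical schema) into a feature dictionary."""
    events = issue["history"]
    return {
        "n_comments": len(issue["comments"]),
        "n_reopened": sum(
            e["field"] == "status" and e["to"] == "Reopened" for e in events
        ),
        "priority_changes": sum(e["field"] == "priority" for e in events),
        "n_fix_versions": len(issue["fix_versions"]),
        "n_issue_links": len(issue["links"]),
    }


# Made-up issue record.
issue = {
    "comments": ["c1", "c2", "c3"],
    "fix_versions": ["2.7"],
    "links": ["A-2", "A-9"],
    "history": [
        {"field": "status", "to": "Reopened"},
        {"field": "priority", "to": "Major"},
        {"field": "status", "to": "Resolved"},
        {"field": "status", "to": "Reopened"},
    ],
}
features = extract_features(issue)
print(features["n_comments"], features["n_reopened"], features["priority_changes"])  # 3 2 1
```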

AI Engineering lifecycle
Feature engineering:
Descriptive penalized logistic regression model for risk probability, trained on all tasks collected from the five projects.

AI Engineering lifecycle
Model training:
 The data (e.g. 40,830 past tasks) is split into a training set and a test set.
 A number of classifiers are used: Random Forests, Neural Networks, Decision Tree (C4.5), Naïve Bayes and NBTree.
 The training set is used to train these classifiers.

AI Engineering lifecycle
Model evaluation:
[Figure: bar chart comparing precision, recall and F-measure (scale 0 to 1.0) for Random Forests, aNN, C4.5 and Naïve Bayes.]
• , Dam, Truyen Tran and , Characterization and prediction of issue-related risks in software projects, Proceedings of 12th International Conference on Mining Software Repositories (MSR 2015), co-located with ICSE 2015, pages 280-291, IEEE (ACM SIGSOFT Distinguished Paper Award)
• , Dam, Truyen Tran and , Predicting the delay of issues with due dates in software projects, Empirical Software Engineering journal, Volume 22, Issue 3, pages 1223-1263, Springer.

AI Engineering lifecycle
Model deployment:
 AI-powered plugin for the JIRA issue tracking system
• Recommending story points, labels, priority, type and components for each issue
• Visualization of issue dependency
2019 CSIT321 Project (viTech Team). See the demo: https://youtu.be/iI-3Rj-AWRs
See also the article featuring this project on the Atlassian developer website: https://blog.developer.atlassian.com/artificial-intelligence-for-issue-analytics-a-machine-learning-powered-jira-cloud-app/

Conceptualization of AI Engineering
Source: Developing AI Systems – New challenges for Software Engineering, ICSOC 2019

Top important jobs in AI
Source: Software Engineering for AI
