CHEN40770: Project Description
CHEN40770: Data Science For Biopharmaceutical Manufacturing 04/11/2020
Contents
1 Introduction
1.1 Overview
1.2 The test dataset
1.3 Problem Statement
1.4 Project Requirements
1.5 Submission
2 Marking scheme
2.1 Section 1: Best Practice coding
2.2 Section 2: Dashboard design
2.3 Section 3: Dashboard Functionality
2.4 Section 4: Data visualisation
2.5 Section 5: Use of statistics
2.6 Section 6: Innovation
1 Introduction
1.1 Overview
In recent years biopharmaceutical manufacturing organisations have strived to maximise the knowledge gained from their data in order to increase the efficiency and reliability of their processes. CHEN Biopharmaceuticals is a company that manufactures penicillin at industrial scale. The company is seeking to use the information collected from the bioreactor and associated PAT systems (e.g. Raman probe) more effectively, and to make those data and statistical analyses, including multivariate data analysis (MVDA), available to employees in various roles across the company.
You have recently accepted a new role as the company’s first data scientist and have responsibility for the design and implementation of a prototype data analytics dashboard for presentation to upper management. The Manufacturing Sciences and Technology (MSAT) and IT departments in the company have identified and collated a suitable test dataset with which to build the dashboard. The data have already been aggregated from different source systems, preprocessed and contextualised, and are now ready for further analysis.
1.2 The test dataset
These data can be loaded in R Studio Cloud using the following command:
library(chen40770data1)
1.3 Problem Statement
Develop a data dashboard using R and R Shiny to enable the MSAT team (none of whom have knowledge of R) to interactively explore process trends as well as conduct MVDA of the 100 fermentation batches.
1.4 Project Requirements
• All projects must adhere to the following rules:
– Dashboards must be built using R Shiny
– The tidyverse package must be used where appropriate, i.e. dplyr and ggplot2. Using other approaches may affect coding practice and efficiency (and therefore your marks for the project).
1.5 Submission
Projects must be submitted through R Studio Cloud, although if you wish you may also host your dashboard on shinyapps.io.
Deadline: 14th December 2020 at 5pm.
2 Marking scheme
This continuous assessment element contributes 60% of your total mark for the CHEN40770 Data Science For Biopharmaceutical Manufacturing module.
The following 6 sections detail the marking scheme. Each section is worth 10 Marks.
2.1 Section 1: Best Practice coding
Total Marks (10 Marks)
When writing programs in R and other programming languages it is essential to write efficient, readable and well-documented code so that the code can be maintained, updated and debugged.
• For this project you must consider the following aspects when designing your R Shiny App.
– Your code follows the tidyverse style guide. You can find the tidyverse style guide by following this link. (3 Marks)
– Your code is efficient, in that you try to achieve the desired result in the fewest lines of code possible. (3 Marks)
– Your code is appropriately commented to enable someone unfamiliar with the code to understand how it works. (4 Marks)
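As an illustration of the three points above, the following is a minimal sketch of a short, commented, tidyverse-style function. The data frame `batch_data` and its columns (`batch_id`, `time_h`, `penicillin_g_l`) are hypothetical stand-ins and are not the structure of the course dataset.

```r
library(dplyr)

# Hypothetical example data: two batches sampled at three time points.
batch_data <- tibble(
  batch_id = rep(c("B001", "B002"), each = 3),
  time_h = rep(c(0, 24, 48), times = 2),
  penicillin_g_l = c(0.0, 1.2, 2.9, 0.0, 1.0, 2.5)
)

# Summarise the final and mean penicillin concentration for each batch.
# A single pipeline replaces an explicit loop over batches.
summarise_batches <- function(data) {
  data %>%
    group_by(batch_id) %>%
    summarise(
      final_conc_g_l = penicillin_g_l[which.max(time_h)],
      mean_conc_g_l = mean(penicillin_g_l),
      .groups = "drop"
    )
}

batch_summary <- summarise_batches(batch_data)
```

Note the snake_case names, two-space indentation, one argument per line in long calls, and comments that explain intent rather than restating the code.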
2.2 Section 2: Dashboard design
Total Marks (10 Marks)
During the development of a Shiny App it is important to consider the layout and organisation of the user interface (UI) to maximise the usability of the platform.
• For this project you must consider the following aspects when designing your R Shiny App:
– Optimising the layout of the application including the overall layout, panels, tabs and menus. (3 Marks)
– The position and organisation of components, plots and tables. (3 Marks)
– Clearly indicating the role (i.e. what each component shows or does) of the App components and their functionality on the dashboard (e.g. sliderInputs, radioButtons, actionButton). (2 Marks)
– Color schemes used in the dashboard. (2 Marks)
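One possible way to organise such a layout is sketched below: inputs are grouped in a sidebar and outputs are separated into tabs. This is only an illustrative skeleton; the title, input IDs, output IDs and variable names are all hypothetical, and other layouts (e.g. a navigation bar or a dashboard package) may suit your app better.

```r
library(shiny)

# Hypothetical UI skeleton: inputs in a sidebar, outputs grouped in tabs.
ui <- fluidPage(
  titlePanel("Penicillin fermentation dashboard"),
  sidebarLayout(
    sidebarPanel(
      # The label and help text state what the control does.
      selectInput(
        inputId = "variable",
        label = "Process variable to plot",
        choices = c(
          "Penicillin concentration" = "penicillin_g_l",
          "Temperature" = "temp_k"
        )
      ),
      helpText("Choose the variable shown in the 'Process trends' tab.")
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Process trends", plotOutput("trend_plot")),
        tabPanel("Data table", tableOutput("data_table")),
        tabPanel("PCA", plotOutput("pca_plot"))
      )
    )
  )
)

# Placeholder server: the outputs named above would be rendered here.
server <- function(input, output, session) {}

# shinyApp(ui, server)  # uncomment to launch the app interactively
```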
2.3 Section 3: Dashboard Functionality
Total Marks (10 Marks)
As you build your R Shiny app it is essential to consider the end-user of your system. For this project you are designing a data analysis dashboard for the MSAT team. To be successful you must put yourself in the position of an MSAT team member (i.e. a scientist or engineer). You should ask yourself, for example: what information would MSAT want to see? What analyses would members of the team want to carry out (e.g. comparison of control strategies, correlation of process variables to productivity, root cause analysis)?
• For this project you must consider the following aspects when designing your Shiny App:
– Selecting data for analysis or visualisation via a categorical variable. (3 Marks)
– Selecting data for analysis or visualisation via a numerical value. (3 Marks)
– Displaying selected data in the R Shiny app in one or more tables. (2 Marks)
– Displaying selected data in the R Shiny app in one or more plots on the dashboard. (2 Marks)
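The two selection requirements above can be sketched as a single dplyr filter. The example below is a minimal illustration on made-up data; the column names (`control_strategy`, `final_conc_g_l`) and the function `filter_batches()` are hypothetical and do not describe the course dataset.

```r
library(dplyr)

# Hypothetical data: column names are illustrative only.
batch_data <- tibble(
  batch_id = c("B001", "B002", "B003", "B004"),
  control_strategy = c("recipe", "recipe", "operator", "operator"),
  final_conc_g_l = c(2.9, 2.5, 3.4, 1.8)
)

# Keep batches with the chosen control strategy (categorical selection)
# and a final concentration inside the chosen range (numerical selection).
filter_batches <- function(data, strategy, conc_range) {
  data %>%
    filter(
      control_strategy %in% strategy,
      between(final_conc_g_l, conc_range[1], conc_range[2])
    )
}

selected <- filter_batches(batch_data, "operator", c(2, 4))
```

In a Shiny server, `strategy` could be supplied by a `selectInput()` and `conc_range` by a two-valued `sliderInput()`, with the function called inside `reactive()` and its result passed to `renderTable()` or `renderPlot()`.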
2.4 Section 4: Data visualisation
Total Marks (10 Marks)
The ability to display data interactively using R Shiny is a key objective of this project.
• Using the ggplot2 package you must:
– Enable the user to display one or more process variables against culture time for selected data or groups using appropriate plot types. (2 Marks)
– Display the Raman data for selected data or groups within the dashboard using appropriate plot types. (2 Marks)
– Visualise the output of MVDA such as principal component analysis using appropriate plot types. (2 Marks)
– Ensure that all plots have an appropriate title as well as axis labels with units. (2 Marks)
– Where suitable, display ranges, standard deviations or statistics such as p-values on the plot. (2 Marks)
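As a sketch of the labelling and variability points above, the following example plots a mean trend against culture time with a ribbon of plus or minus one standard deviation. The data are simulated and the column names (`time_h`, `penicillin_g_l`) and units are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Simulated data: three hypothetical batches sampled every 24 h.
set.seed(1)
trend_data <- tibble(
  batch_id = rep(c("B001", "B002", "B003"), each = 5),
  time_h = rep(seq(0, 96, by = 24), times = 3),
  penicillin_g_l = time_h * 0.03 + rnorm(15, sd = 0.1)
)

# Mean and standard deviation across batches at each time point.
trend_summary <- trend_data %>%
  group_by(time_h) %>%
  summarise(
    mean_conc = mean(penicillin_g_l),
    sd_conc = sd(penicillin_g_l),
    .groups = "drop"
  )

# Line for the mean, ribbon for +/- one standard deviation.
trend_plot <- ggplot(trend_summary, aes(x = time_h, y = mean_conc)) +
  geom_ribbon(
    aes(ymin = mean_conc - sd_conc, ymax = mean_conc + sd_conc),
    alpha = 0.3
  ) +
  geom_line() +
  labs(
    title = "Penicillin concentration over culture time",
    x = "Culture time (h)",
    y = "Penicillin concentration (g/L)"
  ) +
  theme_minimal()
```

Within the app, `trend_plot` would typically be built inside `renderPlot()` from the data the user has selected.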
2.5 Section 5: Use of statistics
Total Marks (10 Marks)
The appropriate use of statistics to analyse data is an essential component of any data analytics system. Your R Shiny app must make use of univariate and multivariate analysis methods. Importantly, the user should be able to interact with the app so that methods like PCA can be conducted by a non-expert user.
• You must enable the user of your R Shiny App to interactively:
– Select data and apply univariate statistics (e.g. mean, median) and/or compare groups using statistical testing (e.g. t-test). (5 Marks)
– Incorporate interactive control of an MVDA technique (e.g. PCA). (5 Marks)
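The calculations behind these two requirements could look something like the sketch below, which uses simulated data. The column names and group labels are hypothetical; `t.test()` and `prcomp()` are standard functions from R's stats package, chosen here only as examples of a group comparison and an MVDA technique.

```r
library(dplyr)

# Simulated data: 20 hypothetical batches under two control strategies.
set.seed(42)
batch_stats <- tibble(
  control_strategy = rep(c("recipe", "operator"), each = 10),
  final_conc_g_l = c(rnorm(10, mean = 2.5, sd = 0.3), rnorm(10, mean = 3.0, sd = 0.3)),
  mean_temp_k = rnorm(20, mean = 298, sd = 0.5),
  mean_ph = rnorm(20, mean = 6.5, sd = 0.1)
)

# Univariate statistics for each group.
group_summary <- batch_stats %>%
  group_by(control_strategy) %>%
  summarise(
    mean_conc = mean(final_conc_g_l),
    median_conc = median(final_conc_g_l),
    .groups = "drop"
  )

# Group comparison: Welch two-sample t-test between the two strategies.
t_result <- t.test(final_conc_g_l ~ control_strategy, data = batch_stats)

# PCA on the chosen numeric columns, centred and scaled so that
# variables with different units contribute equally.
numeric_cols <- c("final_conc_g_l", "mean_temp_k", "mean_ph")
pca_result <- prcomp(batch_stats[, numeric_cols], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
variance_explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)
```

To make the PCA interactive, `numeric_cols` could be driven by a `checkboxGroupInput()` and the `scale.` argument by a `checkboxInput()`, so that a non-expert user can change the analysis without editing code.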
2.6 Section 6: Innovation
Total Marks (10 Marks)
As a data scientist it is important to be creative in order to develop informative workflows to communicate data and the knowledge gained in meaningful ways. In this project marks will be awarded for innovative aspects of the application. Students are encouraged to explore and use R packages outside of those that have been taught on the course to enhance their R Shiny Dashboard.
There are no set requirements or marking scheme for this section.
• Examples of innovation can include but are not limited to:
– Visualisations
– Statistical methods and approaches used to analyse the data
– UI design and themes