FIT5145 – Sample Exam Paper
Instructions
Information
This exam is in three parts:
• Part A: Multiple Choice Questions – 30 questions (30 marks)
• Part B: Short Answer Questions – 25 questions (50 marks)
• Part C: Longer Answer Questions – 4 questions (20 marks)
This is a sample exam. It is not complete, in that it does not have the full complement of questions.
Part A Multiple-choice Questions
Information
This section is worth 30 marks. Each question is worth 1 mark.
Identify the choice that best completes the statement or answers the question. There is only one best answer for each question. Sometimes two answers may appear feasible, but you are to pick the one you believe is the best.
Marking Scheme for Multiple Choice Questions:
• 1 mark for a correct answer
• 0 marks for the wrong answer, or no answer
QUESTION 1.1: Database issues
Distributed databases, in-memory databases and RDBMSs are specially designed to address the following issue:
A. the need for scalable systems
B. the need for cheaper systems
C. the need to handle semi-structured data
D. the need for security
QUESTION 1.2: Privacy
What is the technological reason for the continuing loss of privacy?
A. the flow of technology makes surveillance easier unless particular measures are set in place.
B. the increase in cybercrime and terrorism makes it a necessity.
C. the open internet and the cloud remove privacy.
D. it follows from Koomey’s Law.
QUESTION 1.3: Text data
Text data can also be:
A. structural metadata
B. a digital container
C. image data
D. markup language
QUESTION 1.4: Useful visualisation
Which of the following is visualisation NOT useful for?
A. to perform discovery visually
B. to perform significance tests
C. to explore the data during the initial cleaning process
D. to explain results
QUESTION 1.5: Tasks
Which of these tasks might a data scientist typically perform?
A. Pitching project ideas.
B. Collecting and cleaning data.
C. ALL of the three other options.
D. Integrating and Interpreting data.
QUESTION 1.6: Python
Which of the following statements about Python is TRUE?
A. The first element of an array in Python has the index 1.
B. Python is a scripting language.
C. Python was designed by statisticians.
D. Python is a proprietary programming language.
QUESTION 1.7: Shell commands
Unix shell commands like “less” and “grep”:
A. can be used to manipulate large data files easily
B. are poorly documented
C. are examples of technology that is too old to be useful to a modern data scientist
D. are used to fit regression tree models
QUESTION 1.8: Data Science Software and Tools
Which of the following sets of software names consists of an operating system, a programming language, a database and a visualisation tool, respectively?
A. Windows, R, SQL, Spark
B. Unix, Java, MySQL, matplotlib
C. Mac OS, Hadoop, Oracle, Visual Basic
D. All of the options
……
Part B Short Answer Questions
Information
This section is worth 50 marks. Each question is worth 2 marks.
Your answers should be written in clear, simple English and should be complete enough in addressing the question. Extensive prose is not required. Structured bullet points are preferred.
QUESTION 2.1 (2 marks)
List four types of metadata that might be associated with an image.
……
QUESTION 2.2 (2 marks)
Would you consider users’ emails to be sensitive information? Why or why not?
Part C Longer Answer Questions
Information
This section is worth 20 marks. Each question is worth 5 marks.
Your answers should be written in clear, simple English and should be complete enough in addressing the question. Extensive prose is not required but multiple paragraphs may be required. Structured bullet points are preferred.
QUESTION 3.1 (5 marks)
Australia has many tolled highways. For instance, Victoria has the CityLink system, Queensland has various Motorways, and NSW has the WestConnex system. Many of these systems are run by the Transurban company. They are complex systems that collect data and use it to detect cars, read numberplates, check speeds, and bill drivers. They also use websites, mobile phone apps, in-car devices, and devices on the roads to allow cars and drivers to interact with the system, including paying bills and registering new vehicles.
Each system has to be set up according to different state laws, and shares data with different state authorities, such as the state police and the road management authorities. Transurban has to modify its system to match the needs and laws of each state and road. However, Transurban does allow drivers to use a single account system (called Linkt) to pay for tolls in any of the three states.
Presume that Transurban has a different system per state to collect data from the road and vehicles, another system which analyses the data and works out the billing, and a single centralised national system which holds the account records for each vehicle. Drivers interact with the national system via apps on their phones or via webpages.
For four of the stages in a data lifecycle, explain how standardising some aspect of Transurban’s data systems will help it to function efficiently.
……