COMP20008
Elements of Data Processing
Semester 1 2020
Lecture 21: Social and Ethical Implications of Big Data Analytics
Plan for today
• Big data analytics
– Issues
– Stakeholders
– Processes & implications
– 10 simple rules for responsible and ethical big data research
• Acknowledgements
• Many of these slides have been adapted from materials developed by Ida Asadi Someh
Question: what is the story?
                         Top 3 Silicon Valley (2014)   Top 3 auto (car) makers (1990)
Revenue                  $247 billion                  $250 billion
Number of employees      137,000                       1.2 million
Market capitalisation    $1.09 trillion                $36 billion
(Zuboff 2015)
What is Big Data Analytics? What’s different now?
• The ability to collect, store, and process increasingly large and complex data sets from a variety of sources, and to turn them into competitive advantage (LaValle and Lesser 2013)
• Big data management capabilities
– Volume, Variety and Velocity (3Vs) + Veracity (4Vs)
• Algorithms to process big data
– Advanced statistical and computational techniques to process large, unstructured and fast data
Is this a sufficient definition?
Creepy uses of data in the media
• Target exposing a teen girl’s pregnancy
– Father: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”
– https://www.businessinsider.com.au/the-incredible-story-of-how-target-exposed-a-teen-girls-pregnancy-2012-2
– https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html?pagewanted=1&_r=1&hp
• Facebook’s 2012 secret mood experiment
– how people react to an emotional contagion process
– filtered users’ news feeds – the flow of comments, videos, pictures and web links posted by other people
– 689,000 users affected
– https://www.theguardian.com/technology/2014/jun/29/facebook-users-emotions-news-feeds
• Cambridge Analytica (we have discussed previously ..)
Consequences of Big Data Analytics
• Positive consequences
– Tracking criminals, higher product margins, new business models, improved healthcare and …
• Negative consequences:
– Misuse of personal information, breaching privacy, profiling of individuals, discrimination and …
• Where is the boundary? What is the balance?
– There is no agreement on what is ethical and what is not!
(Markus and Topi 2015; Newell and Marabelli 2015)
BDA: Not only about Technology
• Technological view (3Vs) does not help to understand unethical use
– Technology is neutral in nature
– It does not consider the underlying processes that are enabled
– It does not consider the stakeholders that are involved and influenced
– 3Vs do not consider either people or process
– A new social phenomenon, a new market economy
• We need further (non technical) perspectives on big data analytics
– It is not OK to exclude social from the definition
Stakeholder View on BDA
The exchange of data is characterised by its 4Vs.
• Individuals: create big data
• Organizations: own and benefit from data
• Society: guides and regulates
Unequal exchanges between stakeholders
BDA from Social Perspective
• Interactions among stakeholders
• Data is contributed, collected, extracted, exchanged, sold, shared, and processed for the purpose of predicting and modifying human behaviour in the production of economic or social value.
• BDA involves several processes, discussed next
Process 1: Data Extraction
• Data extraction, not data collection
• Google Street View (“single greatest breach in the history of privacy”)
– https://www.theguardian.com/technology/2010/may/15/google-admits-storing-private-data
• Our everydayness quantified
• Incursions into legally and socially undefended territory
• Google has the largest number of unpaid employees
• “You’re not the customer, you’re the product ..”
Process 2: Data commodification: secondary markets and hidden value chains
• Sell personal data until it turns into waste
• Big data as a new industry (secondary markets)
(Martin 2015)
Examples
• http://intelligence.towerdata.com/
• http://www.iriworldwide.com/en-us/
• https://www.intelius.com/
Process 3: Decision Making
• Big Data Quality (Veracity)
– Data accuracy for aggregated data
– What are the quality criteria for a social media post?
– Completeness of our digital identity
– Mosaic effect
– “When is a boat not a boat?”
– Game: http://celebrityguesswho.com/#2
– Meaning dependent on the context
– Is how I act on social media a true representation of who I am?
Process 3: Decision Making
• Data Analysis
– Predictions based on the past
– How can I redefine myself?
– In what context is it legitimate to make a prediction about someone?
– Predictions often based on correlation (not causation)
– What about outliers? (What if I don’t fit into a predefined category?)
• Data Visualization
– Decision making and presentation biases
Example 1
• https://www.theguardian.com/technology/2016/nov/02/admiral-to-price-car-insurance-based-on-facebook-posts
Example 2
Example 3
• Do you want to stop global warming? BECOME A PIRATE!
See also: https://www.tylervigen.com/spurious-correlations
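The pirate example can be reproduced with a minimal Python sketch. The two series below are synthetic and invented purely for illustration (they are not real pirate counts or temperature measurements): each depends only on time plus its own independent noise. Because both trend over time, their levels are strongly correlated, while their year-on-year changes are not. The helper names `pearson` and `differences` are assumptions of this sketch.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def differences(xs):
    """Year-on-year changes; differencing removes a linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(100)

# Made-up data: pirates decline over time, temperature rises over time.
# Neither series is computed from the other.
pirates = [1000 - 10 * t + rng.gauss(0, 20) for t in years]
temperature = [14 + 0.02 * t + rng.gauss(0, 0.05) for t in years]

r_levels = pearson(pirates, temperature)
r_changes = pearson(differences(pirates), differences(temperature))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The strong negative correlation of the levels comes entirely from the shared dependence on time, which is one common way a spurious correlation arises.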
Process 4: Control and monitoring
• Pervasive monitoring now possible using sensors, Internet of Things technology
• Everyone is observed; organisations make money from observing others: collect data, sell data, make offers, induce dependence
• What happens to social trust?
• Surveillance is the precise opposite of trust-based relationships
• Free market economy versus Surveillance Economy
• Who I am, what I am, what I like to do, where I like to go, who I know, …
• http://theconversation.com/someones-looking-at-you-welcome-to-the-surveillance-economy-16357
(Zuboff 2015 and Martin 2015)
Process 5: Experiments
• Where did experiments traditionally take place?
– Big companies now run large numbers of experiments on their users, investigating different aspects
• Rewards and punishments
(Zuboff 2015)
Societal actors need to provide oversight and regulate
• EU General Data Protection Regulation (enforced since May 2018)
• Aims to regulate and protect data privacy for all EU citizens.
– Penalty: up to 4% of the organization’s annual global turnover.
• Consent
– Requests for consent should be clear, concise and intelligibly written, and should state the reasons for data collection and analysis.
– Individuals have the right to withdraw consent as easily as they gave it.
• Accessing an individual’s data
– Individuals have the right to ask for a copy of their personal data together with information regarding the processing and purpose of data collection and analyses from a controller
– Individuals have the right of data portability, which means that they can transfer their data from one controller to another.
http://www.eugdpr.org/eugdpr.org.html
10 simple rules for responsible big data research
• From Zook et al. (PLoS Computational Biology, 2017)
– http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005399
Rule 1
• Acknowledge that data are people and can do harm
– All data are people until proven otherwise
• Social media
• Heart rates from YouTube videos
• Ocean measurements that change property risk profiles
Rule 2
• Recognize that privacy is more than a binary value
– Privacy is contextual and situational
– Single Instagram photo versus entire history of social media posts
– Privacy preferences differ across individuals and societies
Rule 3
• Guard against the reidentification of your data
– Metadata associated with photos
– Reverse image search: connect dating and professional profiles
– Difficult to recognize the vulnerable points a priori!
• Battery usage on a phone – can reveal a person’s location
– Unintended consequence of 3rd party access to phone sensors
– When datasets thought to be anonymized are combined with other variables, it may result in unexpected reidentification
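Reidentification by combining datasets can be sketched in a few lines of Python. Everything below is hypothetical: the records, the names, and the choice of postcode, birth year and sex as linking attributes are assumptions made for illustration, not taken from Zook et al. The sketch links a "de-identified" table to a public one and reports anyone whose combination of attributes is unique. The smallest group size is the k of k-anonymity, a standard measure that this slide does not itself mention.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, other attributes kept.
health_records = [
    {"postcode": "3052", "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3052", "birth_year": 1990, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "3010", "birth_year": 1985, "sex": "M", "diagnosis": "depression"},
    {"postcode": "3053", "birth_year": 1972, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical public data, e.g. scraped social media profiles.
public_profiles = [
    {"name": "Alex Example", "postcode": "3010", "birth_year": 1985, "sex": "M"},
    {"name": "Sam Sample", "postcode": "3052", "birth_year": 1990, "sex": "F"},
]

# Attributes assumed to be shared by both datasets.
LINK_ATTRIBUTES = ("postcode", "birth_year", "sex")


def link_key(record):
    """Tuple of the linking attributes for one record."""
    return tuple(record[a] for a in LINK_ATTRIBUTES)


def reidentify(records, profiles):
    """Map each profile name to a diagnosis when its key matches exactly one record."""
    sizes = Counter(link_key(r) for r in records)
    unique = {link_key(r): r for r in records if sizes[link_key(r)] == 1}
    matches = {}
    for profile in profiles:
        record = unique.get(link_key(profile))
        if record is not None:
            matches[profile["name"]] = record["diagnosis"]
    return matches


group_sizes = Counter(link_key(r) for r in health_records)
k = min(group_sizes.values())  # smallest group of indistinguishable records

reidentified = reidentify(health_records, public_profiles)
print("k =", k)
print(reidentified)
```

In this toy data one profile matches a unique record and is reidentified, while the other matches two records and stays ambiguous, even though neither table looks sensitive on its own.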
Rule 4
• Practice ethical data sharing
– Seeking consent from participants to share data
Rule 5
• Consider the strengths and limitations of your data; big does not automatically mean better
– Document the provenance and evolution of your data. Do not overstate clarity; acknowledge messiness and multiple meanings.
• Is a Facebook post or an Instagram photo best interpreted as an approval/disapproval of a phenomenon, a simple observation, or an effort to improve status within a friend network?
Rule 6
• Debate the tough, ethical choices/issues
– Importance of debating the issues within groups of peers
• Examples mentioned earlier
– Facebook emotional contagion
– Exposing teen girl’s pregnancy
• More recently, Google Duplex
– https://www.youtube.com/watch?v=D5VN56jQMWM
Rule 7
• Develop a code of conduct for your organization, research community, or industry
– Are we abiding by the terms of service or users’ expectations?
– Does the general public consider our research “creepy”?
Rule 8
• Design your data and systems for auditability
– Plan for and welcome audits of your big data practices.
– Systems of auditability clarify how different datasets (and the subsequent analysis) differ from each other, aiding understanding and creating better research.
• “For example, many types of social media and other trace data are unstructured, and answers to even basic questions such as network links depend on the steps taken to collect and collate data.”
Rule 9
• Engage with the broader consequences of data and analysis practices
– Recognize that doing big data research has society-wide effects
Rule 10
• Know when to break these rules
– Natural disaster
– Public health emergency
– Hostile enemy
–…
– It may be important to temporarily put aside questions of individual privacy in order to serve a larger public good.
Summary: What can we do?
• We need to empower individuals
– Educate individuals, raise social awareness
– Provide data access and control (e.g. Google activity)
• http://www.abc.net.au/4corners/stories/2017/04/10/4649443.htm
• Define and develop a culture of acceptable data use
– Organizations should internalize the costs
– Genuine consent from individuals
– Be transparent and clearly communicate intent of data collection and analytics/AI
– Adopt rules for responsible big data research
– Provide data control to individuals