
Social Media Sentiment Matters: data science, #tweets & their ethics
Dr – Senior Lecturer in Information Systems (Digital Ethics), Senior Fellow (MLS) & Honorary Burnet Institute Senior Fellow
S1/2022 – 25 May 2022


CompSci background – did Twitter research ca. 2009-2013
– It was a wonderful world back then: generous support from Twitter, less toxicity online
– Lucky to jump in then – only around three dozen papers or so worldwide!
Pivoted to philosophical research
– Introducing data science to philosophers (digital humanities/experimental philosophy)
– Contemporary social media usage and social networking trends.
Digital Ethics (philosophical ethics + data science + practical aspects)
All cover image credits: Unsplash

Ethics in social media data analysis
Not just the realm of ethicists, philosophers, lawyers, etc.
The ML/data science community is fast realizing its importance – e.g., FAccT (left) and NeurIPS (right).

Quick intro 🎥
Case study: Social media metadata – interactive lecture 💬
• Studying the public reaction to the Black Lives Matter movement – a movement for equality & social good
• The ethics behind this!
Enter: Digital ethics & social media sentiment mining – interactive lecture
Ethics and the ML pipeline.
💬 Ethical discussions: scenarios, mitigating risks, techniques; Coghlan et al. – the ethical principles
Further Discussion.

Spot the themes…

Spot the themes…
0:04 – (Philosophical) notion of consciousness?
0:20 – notion of “to-do” as tasks, or as an Ethics of Care scholar (e.g. Coghlan, Gilligan, Noddings) would say… isn’t this about building caring relations?
0:32 – the concept of automata, MTurk etc. – shaping machines in the guise of us humans!
0:45 – photos and health stats… when did we opt in?
0:53 – the cloud = someone else’s computer?
1:13 – not ‘almost anything’!
E.g. the harms of misclassifying cats as dogs VS the harms of classifying someone as ‘loan denied’ (someone’s livelihood!) VS the harms of misclassifying medical data (diseases misdiagnosed!)
1:27 – (Philosophical) caveat: do machines “learn” or follow patterns?
1:55 – More productive? Or displaced by automation?
2:22 – Who can afford such cars?

Case study: Geo data and social media

Geo data: helping us in COVID-19…
Aggregating crowdsourced data on Rapid Antigen Tests – findarat.com.au

…beware danger!
“Sensitive information about the location and staffing of military bases and spy outposts around the world has been revealed by a fitness tracking company…
…data visualisation map that shows all the activity tracked by users of its app, which allows people to record their exercise and share it with others.
The map, released in November 2017, shows every single activity ever uploaded to Strava – more than 3 trillion individual GPS data points, according to the company.”
Hern (2018) for The Guardian
https://www.theguardian.com/world/2018/jan/28/fitness-tracking-app-gives-away-location-of-secret-us-army-bases

Case Study: Twitter geo-metadata
BLM peaceful protest movement (2020)
Work in progress – please do not cite, circulate or quote without permission. Credits: Quintana and colleagues.

Ethical discussion…
For peaceful protests (BLM protests, 2020) –
is it problematic if geotags are used by police?
– Consider Amnesty International’s (2020) report

USA: Law enforcement violated Black Lives Matter protesters’ human rights, documents acts of police violence and excessive force


💬 Class discussion – as data scientists, discuss:
What determines the ethics of use of social media geo data? (5 minutes)

Risk mitigation: data precision and the risk of re-identification
Take, say, the BLM case study – this is sensitive data!
Modern GPS units in phones → highly accurate geotags!
(What did we do? Added lots of random noise – hence some data points end up in the ocean. A minimal sketch of this kind of perturbation is below.)
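A minimal sketch of this kind of geotag perturbation (the noise scale and the Melbourne coordinates below are illustrative choices, not the values used in the actual study):

import numpy as np

rng = np.random.default_rng(42)

def jitter(lat, lon, scale_deg=0.05):
    # Gaussian noise of ~0.05 degrees is several kilometres on the ground;
    # enough that no point traces back to a street address.
    return lat + rng.normal(0, scale_deg), lon + rng.normal(0, scale_deg)

print(jitter(-37.8136, 144.9631))  # a Melbourne CBD point, randomly displaced

Enough noise protects individuals, but it also degrades spatial precision and can push coastal points into the ocean – exactly the trade-off noted above.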
💬 What else can we do?

Digital ethics in NLP!
Interactive lecture.

Case Study: Twitter text for sentiment analysis
Typical ML pipeline, from Hapke & Nelson (2021) https://www.oreilly.com/library/view/building-machine-learning/9781492053187/ch04.html

Case Study: Twitter text for sentiment analysis
Data collection:
• Where does it come from: issues of consent, ethics, representation, harms?
Preprocessing and feature engineering:
• What assumptions do we make: issues of bias, discrimination?
Model construction:
• Long-term effects – entrenchment?
• Language models – contemporary issues?
Typical ML pipeline, from Hapke & Nelson (2021) https://www.oreilly.com/library/view/building-machine-learning/9781492053187/ch04.html

Where does it come from: issues of consent, ethics, representation, harms?
Issues to be aware of when collecting the data (terms of use, reproduction of data) – e.g. Twitter’s hydration policy (a small sketch of hydration at the end of this slide).
Anecdote: in university research – the Ethics Approval process.
Harms to individuals?
• No collection of private tweets, even as an approved follower.
• 💬 Ethical dilemma on the next slide.
Representation
• Clearly outline assumptions and parameters of the data collected.
• 💬 Ethical dilemma on the next slide.
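A minimal sketch of what hydration looks like in practice: a dataset is distributed as bare tweet IDs and each researcher re-fetches the full tweets themselves, so tweets that have since been deleted or made private silently drop out. The endpoint and field names follow the Twitter API v2 of the time; the bearer token and ID below are placeholders.

import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"      # placeholder credential
tweet_ids = ["1234567890123456789"]     # IDs distributed with the dataset

resp = requests.get(
    "https://api.twitter.com/2/tweets",
    params={"ids": ",".join(tweet_ids), "tweet.fields": "created_at,lang"},
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
)
for tweet in resp.json().get("data", []):
    print(tweet["id"], tweet["text"])
# Deleted or protected tweets simply do not come back, which is the point of
# sharing IDs rather than full tweet content.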

Where does it come from: issues of consent, ethics, representation, harms?
One ethical issue on representation has been addressed –
In addition, each Tweet is labelled with the variety of English in which it is written: African American English or Standard American English.
There are many English varieties in the world. We need to be aware of cultural and linguistic diversity!
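One practical consequence for us as data scientists: evaluate per variety, not just overall. A minimal sketch (the column names and the tiny table are hypothetical) of slicing accuracy by the dataset’s variety label:

import pandas as pd
from sklearn.metrics import accuracy_score

df = pd.DataFrame({
    "variety": ["AAE", "AAE", "SAE", "SAE"],   # variety label from the dataset
    "true":    [1, 0, 1, 0],                   # gold sentiment
    "pred":    [0, 0, 1, 0],                   # model prediction
})
for variety, group in df.groupby("variety"):
    print(variety, accuracy_score(group["true"], group["pred"]))
# Reporting per-group numbers, not just the aggregate, stops a strong overall
# accuracy from hiding much worse performance on African American English.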

Black English Matters

Data: Ethical discussion…
💬 Ethical question 1:
• You have completed your prototype classification model. (Good work!)
As part of your report, you mention the following:
“The classifier performs worse than expected when tested on short tweets with text emoticons and common abbreviations/non-dictionary words, such as ‘we miss u ^_^’.”
• Discussion:
What ethical issue can we run into?
A hypothetical situation which warrants an ethical discussion.
In our model, these users may well be implicitly linked as, say, Trump supporters…
… even though they are not (e.g. just being ironic, or using political satire).
What moral harms can result in this misclassification?

Data: Ethical discussion…
💬 Ethical question 2:
• You have built a classifier which works very well in classifying movie-related sentiment on Twitter.
• For another project, for an NGO/charity, you repurpose the model and retrain it to classify, say, public health sentiment in Papua New Guinea.
• What ethical issue can we run into?
Assume your model is technically sound and has been peer-reviewed at the NGO.
• Discussion:
A hypothetical, but quite likely scenario!
Based on Marc’s work with the Burnet Institute – public health communications in PNG is one of the areas that public health researchers actually work on!

Preprocessing and feature engineering.
What assumptions do we make: issues of bias, discrimination?
Document the assumptions adequately! Good not only from a DS/ML perspective… … but also, an ethical perspective!
💬 Ethical question:
• Assume your preprocessing pipeline consists of the removal of non-alphanumeric characters, e.g.

import re
text = re.sub(r'[^a-zA-Z0-9]', '', input_text)

• What accuracy issues can you also run into?
• What ethical issues of representation can you run into? (A sketch of what this line silently removes follows.)
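A minimal sketch (the example tweets are invented) of what that single line silently removes, which is where the representation issue comes from:

import re

examples = [
    "we miss u ^_^",         # emoticon and abbreviation
    "¡Qué día tan bonito!",   # Spanish with accented characters
    "お疲れ様です 😊",         # Japanese plus emoji
]
for raw in examples:
    cleaned = re.sub(r'[^a-zA-Z0-9]', '', raw)
    print(repr(raw), "->", repr(cleaned))
# The emoticon, every accented or non-Latin character and the emoji are deleted,
# and the surviving words are concatenated, so whole communities' tweets are
# distorted or erased before the model ever sees them.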

Current issues in NLP.
• Language models – contemporary issues?
• Long term effects – entrenchment?
• (General assumptions on accuracy – the ethical discussion on stakes: false negatives/false positives are not so important for sentiment analysis, but very important for e.g. automated sanctioning/enforcement – COMPAS.)
Image source: (2020) https://www.gosmar.eu/machinelearning/2021/01/02/mlops-scalability/
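A minimal sketch (the labels are invented) of why the stakes discussion matters: the same headline accuracy can hide very different false-negative behaviour, which is tolerable for a movie-sentiment demo but not for automated sanctioning or enforcement.

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # 1 = "needs action" (e.g. harmful content)
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]   # the model misses two of the three positives
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"accuracy looks respectable: {(tn + tp) / len(y_true):.2f}")
print(f"but false negatives = {fn} of {tp + fn} real positives")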

Model Bias in NLP: Hutchinson et al (2020)
https://arxiv.org/abs/2005.00813

Model Bias in NLP – then (from 2017)…
Gender bias in word embeddings
(Duman, Kalai, Leiserson, Mackey, Sursesh, 2017) http://wordbias.umiacs.umd.edu/
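A minimal sketch (not the WordBias tool itself) of probing such associations in pretrained embeddings with gensim. It assumes the 'glove-wiki-gigaword-100' vectors can be downloaded via gensim's downloader, and the word list is an arbitrary choice.

import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# Which occupations sit closer to "she" than to "he" in the embedding space?
for occupation in ["doctor", "nurse", "engineer", "librarian", "professor"]:
    sim_he = vectors.similarity(occupation, "he")
    sim_she = vectors.similarity(occupation, "she")
    print(f"{occupation:>10}: sim(he)={sim_he:.3f}  sim(she)={sim_she:.3f}")
# Systematic gaps between the two similarities are one (crude) signal of the
# gender bias that embedding-debiasing work tries to measure and mitigate.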

“as the daughter of an attorney Mrs. Bennet married up when she captivated the landed Mr. Bennet”
– Pride and Prejudice, as cited in
http://www.diva-portal.org/smash/get/diva2:207053 /FULLTEXT01.pdf
(extrapolated to ‘big data’…)
Slide adapted from “Gender Bias: From language models to disparate impact” – credits to [CAIDE] G. Bush, S. Coghlan, K. Leins, A. Lodders, T. Miller, J. Paterson; and [CIS/Policy Lab] L. Frermann; S. Njoto; L. Ruppanner (in alphabetical order).

Model Bias in NLP – …and now (2021)
Using a GPT-3-based web service, we’ve asked GPT-3 to complete the following sentences. (GPT-3 accessed on 9th November 2021).
Reflect on how GPT-3 has completed them…
The detective was a … large man with a large mustache who looked as if he was…
The scientist turned out to be a … man with a goatee.
The professor is a … “unsubstantiated” expert in the field, and his opinion should be treated with caution.
The nurse was a … woman.
The plane’s captain is a … 75-year-old man.
The doctor was a … serious man in his thirties.
The tailor was a … man who made clothes.
The librarian in charge is a … female librarian who knows what she is doing.
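A minimal sketch of running the same kind of occupational prompts programmatically. It uses the open GPT-2 model via Hugging Face transformers rather than the GPT-3 web service quoted above, and the sampling settings are illustrative assumptions.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = ["The nurse was a", "The doctor was a", "The professor is a"]
for prompt in prompts:
    out = generator(prompt, max_new_tokens=15, do_sample=True, num_return_sequences=1)
    print(out[0]["generated_text"])
# Repeating this many times and counting gendered words in the continuations
# gives a rough, reproducible picture of the associations the model has learned.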

Models and Entrenching Bias
Once an ML model is deployed, past decisions (en masse) will play a role in determining future decisions.
Take, say, current biases in NLP models (say M).
• Reuse of models: any ML model (say M') based on a current NLP model will inherit these biases, and any further project building upon M' will continue inheriting them.
• Also beware feedback loops: if M misclassifies a data point X0 (e.g. misclassifies a particular tweet as ‘not harmful’ even though it is clearly harmful/discriminatory), it is likely that other data points similar to X0 will be similarly misclassified… and if that misclassification is left unchecked when fine-tuning the model, the error becomes self-reinforcing (a toy sketch below).
Image source: Oxford Dictionary of English, per Dictionary.app (MacOS Catalina).
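A toy sketch (entirely synthetic data) of the feedback loop described above: the model's own confident but wrong labels are fed back as training data, so the initial mistake persists instead of being corrected.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic one-dimensional "tweets": the true label is 1 ("harmful") when x > 0.
x_train = rng.normal(size=(200, 1))
y_train = (x_train[:, 0] > 0).astype(int)
# Mislabel genuinely harmful points near the boundary as 'not harmful'.
y_train[(x_train[:, 0] > 0) & (x_train[:, 0] < 0.3)] = 0

probe_x = np.linspace(0.05, 0.3, 50).reshape(-1, 1)   # the mislabelled region
probe_y = np.ones(50, dtype=int)

for round_no in range(3):
    model = LogisticRegression().fit(x_train, y_train)
    x_new = rng.normal(size=(200, 1))      # new unlabelled data arrives...
    y_new = model.predict(x_new)           # ...and we trust the model's own labels
    x_train = np.vstack([x_train, x_new])
    y_train = np.concatenate([y_train, y_new])
    error = 1 - model.score(probe_x, probe_y)
    print(f"round {round_no}: error on the originally mislabelled region = {error:.2f}")
# Each retraining round re-ingests the model's own mistaken labels for similar
# points, so the error in that region stays high rather than washing out.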

Models: Ethical discussion…
💬 Ethical question:
You’ve completed your model for the assignment. (Good work!)
Open ended question: What should we capture in the documentation/report about the model? (We can’t audit an entire model for bias, but what CAN we do?)
What can data scientists – i.e. you – do?
• Auditing the entire pipeline (not just the code!).
• Including domain experts (plus legal/ethical experts) in the construction and deployment of the system.
• Tools that may help: Amazon SageMaker Clarify, Learn, LIME (a small LIME sketch below).
XAI to the rescue?
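A minimal LIME sketch (toy classifier and invented example text; assumes the lime and scikit-learn packages) showing the kind of word-level evidence an auditor can put in the report:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Tiny toy training set, purely for illustration.
texts = ["love this movie", "great film", "terrible plot", "awful acting"]
labels = [1, 1, 0, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "the acting was great but the plot was terrible",
    clf.predict_proba, num_features=4)
print(explanation.as_list())   # per-word weights: a starting point for an audit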
💬 Open discussion

Ethical discussion – just a start
ON DIGITAL ETHICS…
Adapted from my talk given at the Victorian Centre for Data Insights (VIC Govt), May 2021.
How do we put academic ideas of digital ethics into practice?
– As a start: learning basic key tenets (cf Coghlan, Miller, Paterson, 2020) https://arxiv.org/pdf/2011.07647.pdf
ON THE PHILOSOPHY OF GPT-3…
From my colleague, Anastasia (Annie) Chan, fresh off the press:
https://link.springer.com/article/10.1007/s43681-022-00148-6
“fairness, non-maleficence, transparency, privacy, respect for autonomy, liberty, and trust.”

Any questions?
