Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
Alon Jacovi
Bar Ilan University

Ana Marasović
Allen Institute for Artificial Intelligence
University of Washington

Tim Miller
School of Computing and Information Systems
The University of Melbourne

Yoav Goldberg
Bar Ilan University
Allen Institute for Artificial Intelligence
ABSTRACT

Trust is a central component of the interaction between people and AI, in that ‘incorrect’ levels of trust may cause misuse, abuse or disuse of the technology. But what, precisely, is the nature of trust in AI? What are the prerequisites and goals of the cognitive mechanism of trust, and how can we promote them, or assess whether they are being satisfied in a given interaction? This work aims to answer these questions. We discuss a model of trust inspired by, but not identical to, interpersonal trust (i.e., trust between people) as defined by sociologists. This model rests on two key properties: the vulnerability of the user; and the ability to anticipate the impact of the AI model’s decisions. We incorporate a formalization of ‘contractual trust’, such that trust between a user and an AI model is trust that some implicit or explicit contract will hold, and a formalization of ‘trustworthiness’ (that detaches from the notion of trustworthiness in sociology), and with it concepts of ‘warranted’ and ‘unwarranted’ trust. We present the possible causes of warranted trust as intrinsic reasoning and extrinsic behavior, and discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted. Finally, we elucidate the connection between trust and XAI using our formalization.
CCS CONCEPTS
• Human-centered computing → HCI theory, concepts and models; • Applied computing → Sociology; Psychology; • Social and professional topics → Computing / technology policy; • Computing methodologies → Artificial intelligence; Machine learning.
KEYWORDS

trust, distrust, trustworthy, warranted trust, contractual trust, artificial intelligence, sociology, formalization
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
FAccT ’21, March 3–10, 2021, Virtual Event, Canada
© 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8309-7/21/03. . . $15.00 https://doi.org/10.1145/3442188.3445923
ACM Reference Format:
Alon Jacovi, Ana Marasović, Tim Miller, and Yoav Goldberg. 2021. Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI. In Conference on Fairness, Accountability, and Transparency (FAccT ’21), March 3–10, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3442188.3445923
1 INTRODUCTION
With the rise of opaque and poorly-understood machine learning models in the field of AI, trust is often cited as a key desirable property of the interaction between any user and AI [8, 11, 76, 81]. The recent rapid growth in explainable AI (XAI) is also, in part, motivated by the need to maintain trust between the human user and AI [32, 36, 47, 55, 62, 77]. If we design AI that users can, and will, trust to interact with, AI can be safely integrated into society.
However, the literature seldom discusses specific models of trust between humans and AI. What, precisely, are the prerequisites for human trust in AI? For what goals does the cognitive mechanism of trust exist? How can we design AI that facilitates these prerequisites and goals? And how can we assess whether the prerequisites exist, and whether the purpose behind the trust has been achieved?
In this work, we are interested in formalizing the ‘trust’ transaction between the user and AI, and using this formalization to further our understanding of the requirements behind AI that can be integrated in society. We consider ‘artificial intelligence’ to be any automation that is attributed with intent by the user [social attribution, 55], i.e., anthropomorphized with a human-like reasoning process. For our purpose, we consider the user to be an individual person, rather than an organization, though aspects of the work are applicable to the latter as well.
There are many vague aspects of trust that are difficult to formalize with the tools available to us in the literature on AI and Human-Computer Interaction (HCI). For this reason, we first discuss how interpersonal trust is defined in sociology, and derive a basic, yet functional, definition of trust between a human and an AI model, based on the prerequisites and goals of the trustor gaining trust in the AI (Section 2). Specifically, the trustor must be vulnerable to the agent’s actions, and the trustor’s goal in developing trust is to anticipate the impact of the AI model’s decisions.
However, the above definition is incomplete: though the goal is anticipating ‘intended’ behavior, what can we say about when and whether this goal is achieved? We develop the definition further
by answering two questions: (1) what is the AI model being trusted with (i.e., what is ‘intended’)?; and (2) what differentiates trust that achieves this goal from trust that does not? Section 3 answers (1) via a notion of contractual trust, and Section 4 answers (2) via notions of warranted and unwarranted trust. In Section 5 we complete the definition of Human-AI trust with a formal summary of the above.
With these definitions, we are now equipped to discuss the causes of trust in the AI (specifically, warranted trust in a particular contract), and how we should pursue the development of AI that will be trusted. In Section 6, we answer the question: what are the mechanisms by which an AI model gains the trust of a person? Namely, we define and formalize notions of intrinsic trust, which is based on the AI’s observable reasoning process, and extrinsic trust, which is based on the AI’s external behavior.
Both intrinsic and extrinsic trust are deeply related to XAI. As mentioned, the XAI literature frequently notes trust as a principal motivation in the development of explanations and interpretations in AI, but seldom elucidates the precise connection between the methods and the goal. In Section 7, we unravel this ‘goal’ of XAI—to facilitate trust—by using our formulation thus far.
In Section 8 we pivot to the question of evaluating trust, by discussing the evaluation of the vulnerability in the interaction, and of the ability to anticipate. Finally, in Section 9 we expand on other aspects of interpersonal trust and human-machine trust (automation not attributed with intent), their relation to our notion of Human-AI trust, and possible future extensions of our formalization.
Contributions. We provide a formal perspective of Human-AI trust that is rooted in, but nevertheless not the same as, interpersonal trust as defined by sociologists. We use this formalization to inform notions of the causes behind Human-AI trust, the connection between trust and XAI, and the evaluation of trust. We hope that this work enables a principled approach to developing AI that should, and will, be trusted in practice.
Note on the organization of the work. The following sections provide an informal description of trust in AI via a narrative, in the interest of accessibility (§2, 3, 4). We provide formal, concise definitions of our taxonomy after completing the relevant explanations (§5). Additionally, for coherency we bypass some nuance behind our choice of formalization, which we make available in §9.
2 A BASIC DEFINITION OF TRUST
To understand human trust in AI (Human-AI trust), a useful place to start is to examine research in philosophy, psychology, and sociology on how people trust each other (interpersonal trust). In this section, we present a primitive (and incomplete, as we will show) definition of trust that will serve as a basis for the rest of the work.
Definition (Interpersonal trust). A common basic definition of trust regards it as a directional transaction between two parties: if A believes that B will act in A’s best interest, and accepts vulnerability to B’s actions, then A trusts B [52]. The goal of trust is to “make social life predictable [by anticipating the impact of behavior], and make it easier to collaborate between people” [56].
This definition of trust is considered overly simplistic by many in sociology. In Section 9 we discuss aspects of more elaborate formalizations of interpersonal trust, and whether they are relevant to Human-AI trust.
Noteworthy in this definition, and key to defining Human-AI trust, are the notions of anticipation and vulnerability. In particular, interpersonal trust exists to mitigate uncertainty and risk of collaboration by enabling the trustor’s ability to anticipate the trustee—where ‘anticipating’ refers to a belief that the trustee will act in the trustor’s best interests. We maintain that Human-AI trust exists for the same purpose, as a sub-case of trust in automation, following Hoffman [29]: trust is an attempt to anticipate the impact of behavior under risk. Based on this, we conclude:
Risk is a prerequisite to the existence of Human-AI trust. We refer to risk as a disadvantageous or otherwise undesirable event to the trustor (that is a result of interacting with the trustee), which can possibly—but not certainly—occur [25]. Therefore, “to act in A’s best interest” is to avoid any unfavorable events. Admitting vulnerability means that the trustor perceives both of the following: (1) that the event is undesirable; and (2) that it is possible. Ideally, the existence of trust can only be verified after verifying the existence of risk, i.e., by proving that both conditions hold.
For example, AI-produced credit scoring [9] represents a risk to the loan officer: a wrong decision carries a risk (among others) that the applicant defaults in the future. The loss event must be undesirable to the user (the loan officer), who must understand that the decision (credit score) could theoretically (and not certainly) be incorrect for trust to manifest. Similarly, from the side of the applicants (if they have a choice as to whether to use the AI model), the associated risk is to be denied or to be charged a higher interest rate on a loan that they deserve, and trust manifests if they believe that the AI model will work in their interest (the risk will not occur).
Distrust manifests in an attempt to mitigate the risk. The notion of distrust is important, as it is the mechanism by which the user attempts to avoid the unfavorable outcome. We adapt Tallant’s definition of distrust: A distrusts B if A does not accept vulnerability to B’s actions, because A believes that B may not act in A’s best interest [71]. Importantly, distrust is not equivalent to the absence of trust [53], as the former includes some belief, whereas the latter is a lack of belief—in other words, distrust is trust in the negative scenario. For the remainder of this paper, we focus our analysis on trust, as the link to distrust is straightforward.
The ability to anticipate is a goal, but not necessarily a symptom, of Human-AI trust. The ability or inability of the user to anticipate the behavior of an AI model in the presence of uncertainty or risk is not indicative of the existence or absence of trust. We illustrate this in §4. We stress that anticipating intended behavior is the user’s goal in developing trust, but not necessarily the AI developer’s goal.
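Collecting the pieces of this basic definition in one place, the following is a compact, informal sketch; the symbols A, B, and e are introduced here only for illustration, and the formal summary appears in §5.

% Informal sketch of the basic definition in this section (illustrative notation only).
% A = trustor (human user), B = trustee (AI model), e = an unfavorable event that may result from interacting with B.
\begin{align*}
\text{Perceived risk:}\quad & A \text{ believes } e \text{ is undesirable} \,\wedge\, A \text{ believes } e \text{ is possible}\\
\text{Trust:}\quad & A \text{ believes } B \text{ will act in } A\text{'s best interest (avoid } e\text{)} \,\wedge\, A \text{ accepts vulnerability to } B\\
\text{Distrust:}\quad & A \text{ believes } B \text{ may not act in } A\text{'s best interest} \,\wedge\, A \text{ does not accept vulnerability to } B\\
\text{Goal of trust:}\quad & \text{to anticipate the impact of } B\text{'s behavior under the risk that } e \text{ occurs}
\end{align*}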
3 CONTRACTUAL TRUST
The above notion of anticipating ability is incomplete. If the goal of trust is to enable the trustor’s ability to anticipate, what does the human trustor anticipate in the AI’s behavior? And what is the role of the ‘anticipated behavior’ in the definition of Human-AI trust?
3.1 Trust in Model Correctness
XAI research commonly refers to the trust that the model is correct [e.g., 23, 47, 64]. What does this mean, exactly?
To illustrate this question, consider some binary classification task, and suppose we have a baseline that is completely random by design, and a trained model that achieves the performance of the random baseline (i.e., 50% accuracy in this case). Since the trained model performs poorly, a simple conclusion to draw is that we cannot trust this model to be correct. But is this true?

Suppose now that the trained model with random baseline performance does not behave randomly. Instead, it is biased in a specific manner, and this bias can be revealed with an interpretation or explanation of the model behavior. This explanation reveals to the user that on some types of samples, the model—which maintains random baseline performance—is more likely to be correct than on others. As an illustrative example, consider a credit-scoring AI model that is more likely to be correct for certain sub-populations.

The performance of the second model did not change, yet we can say that now, with the added explanation, a trustor may have more trust that the model is correct (on specific instances). What has changed? The addition of the explanation enabled the model to be more predictable, such that the user can now better anticipate whether the model’s decision is correct or not for given inputs (e.g., by looking at whether the individual is part of a certain sub-population), compared to the model without any explanation. Note that this merely refers to one ‘instance’ of anticipation: it refers to anticipating a particular attribute of the AI’s decision (correctness), whereas in the previous definition (§2), it refers to general behavior.

We arrive at a more nuanced and accurate view of what “trust in model correctness” refers to: it is in fact not trust in the general performance ability of the model, but trust that the patterns that distinguish the model’s correct and incorrect cases are available to the user.
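A small simulation makes the example above concrete. The script below is an illustrative sketch only (the numbers and group names are assumptions, not taken from any particular system): a model that is correct on 90% of inputs from sub-population A and 10% of inputs from sub-population B has roughly random (50%) overall accuracy, yet a user who knows which sub-population an input belongs to, which is the information the explanation provides, can anticipate per instance whether the prediction is likely to be correct.

# Illustrative sketch (not code from this paper): a model with random-baseline
# overall accuracy whose errors are concentrated in one sub-population.
import random

random.seed(0)

def simulate(n=100_000):
    overall_correct = 0
    per_group_correct = {"A": 0, "B": 0}
    per_group_total = {"A": 0, "B": 0}
    for _ in range(n):
        group = random.choice(["A", "B"])         # two equally likely sub-populations
        p_correct = 0.9 if group == "A" else 0.1  # the bias the explanation reveals
        correct = random.random() < p_correct
        overall_correct += correct
        per_group_correct[group] += correct
        per_group_total[group] += 1
    print(f"overall accuracy:   {overall_correct / n:.3f}")  # ~0.5, same as a random baseline
    for g in ("A", "B"):
        acc = per_group_correct[g] / per_group_total[g]
        print(f"accuracy on group {g}: {acc:.3f}")            # ~0.9 and ~0.1

simulate()

The overall accuracy is unchanged by the explanation; what changes is the user’s ability to anticipate correctness for given inputs, which is exactly the shift in trust described above.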
3.2 The General Case: Trust in a Contract
The above example of model correctness is merely an instance of what Hawley [27] and Tallant [71] refer to as trust with commitment, or contractual trust. Contractual trust is when a trustor has a belief that the trustee will stick to a specific contract.

In this work, we contend that all Human-AI trust is contractual, and that regardless of what the contract is in a particular interaction, to discuss Human-AI trust, the contract must be explicit.

Generally, the contract may refer to any functionality that is deemed useful, even if it is not concrete performance at the end-task that the model was trained for. Therefore, model correctness is only one instance of contractual trust. For example, a model trained to classify medical samples into classes can reveal strong correlations between attributes for one of those classes, giving leads to research on causation between them, even if the model was not useful for the original classification task [43, 47].

Contracts and contexts. The idea of context is important in trust: people can trust something in one context but not another [29]. For example, a model trained to classify medical samples into classes can perform strongly for samples that are similar to those in its training set, but poorly on those where some features were infrequent, even though the ‘contract’ appears the same. Therefore, contractual trust can be stated as being conditioned on context. For readability in the rest of this paper, we omit context from the discussion, but implicitly, we consider the contract to be conditioned on, and thus include, the context of the interaction.

Assume that the performance evaluation is representative of real usage for now, although this is an important factor that we will discuss in Section 6.2.

E.g., calibrated probabilities [46], where the classification probabilities of a model are calibrated with some measure of its uncertainty, can produce this effect.

To our knowledge, Hawley [27] is the first to formalize trust as “trust with commitment [= contract].” Tallant [71] expands on their work with the terminology of contractual trust.

Although they do not refer to contractual trust in their work, Hoffman [29] provide support to formalize trust in automation (beyond AI) as multi-dimensional (which we interpret as multi-contractual), rather than a binary variable or sliding scale.

What are useful contracts? The European Commission has outlined detailed guidelines on what should be required from AI models for them to be trustworthy (see Table 1, col. 1–2). Each of these requirements can be used to specify a useful contract.
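To summarize the view developed so far, contractual trust can be pictured as a set of contracts, each conditioned on a context. The sketch below is purely illustrative; the class and field names are assumptions made here for exposition, not an interface proposed in this work or in any library.

# Schematic illustration only: contractual trust as a set of
# (behavior, context) pairs that the trustor believes the AI will uphold.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Contract:
    behavior: str  # the anticipated behavior, e.g. "predictions are correct"
    context: str   # the context the contract is conditioned on, e.g. "inputs similar to training data"

@dataclass
class TrustState:
    trusted_contracts: set = field(default_factory=set)

    def trusts(self, contract: Contract) -> bool:
        # Trust here means: the user believes this contract will be upheld
        # (and, per the basic definition, accepts vulnerability if it is not).
        return contract in self.trusted_contracts

# Example: the medical-classification model above may be trusted for correctness
# only on inputs resembling its training data, and separately trusted to surface
# useful attribute correlations regardless of end-task accuracy.
state = TrustState({
    Contract("predictions are correct", "inputs similar to training data"),
    Contract("reveals attribute correlations", "any input distribution"),
})
print(state.trusts(Contract("predictions are correct", "inputs similar to training data")))  # True
print(state.trusts(Contract("predictions are correct", "rare feature values")))              # False

Representing the context as a field of the contract itself mirrors the convention above of treating each contract as conditioned on, and thus including, the context of the interaction.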
Another area of research that is relevant for defining contracts is the work that proposes standardized documentation to communicate the performance characteristics of trained AI models. Examples of such documentation are: data statements [6], datasheets for datasets [18], model cards [57], reproducibility checklists [59], fairness checklists [50], and factsheets [2].
We illustrate the connection between these documentation approaches and the European requirements in Table 1. For example, if transparency is the stated contract, then all of the mentioned documentation approaches could be used to specify the information that AI developers need to provide so that they can evaluate and increase users’ trust in the transparency of an AI system.
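As a toy illustration of the transparency example (the mapping below is a sketch assembled only from the documentation types listed above, not a reproduction of Table 1):

# Sketch of the transparency example in the text; not a reproduction of Table 1.
documentation_for_contract = {
    "transparency": [
        "data statements",
        "datasheets for datasets",
        "model cards",
        "reproducibility checklists",
        "fairness checklists",
        "factsheets",
    ],
}

def information_to_provide(contract: str) -> list:
    # The AI developer consults the stated contract to decide which
    # documentation to provide so that users can evaluate that contract.
    return documentation_for_contract.get(contract, [])

print(information_to_provide("transparency"))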
Explanation and analysis types depend on the contract. We argue that “broad trust” is built on many contracts, each involving many factors and requiring different evaluation methods. For example, the models’ efficiency in terms of the number of individual neurons responsible for a prediction is relevant for sustainability, but likely not for, e.g., ensuring universal design.
We have previously illustrated that the addition of an explanation of the model’s behavior can increase users’ trust based on one contract (§3.1). Just as different evaluation methods are needed for different types of contractual trust, so are different types of explanations. In Table 1, we outline different established types of explanatory methods and analyses that could be suitable for increasing different types of contractual trust derived from the European requirements.
Conclusions. The formalization of contracts allows us to clarify the goal of anticipation in Human-AI trust: contracts specify the behavior to be anticipated, and to trust the AI is to believe that a set of contracts will be upheld.
Specific contracts have been outlined and explored in the past when discussing the integration of AI models in society. We advocate for adoption of the taxonomy of contracts in Human-AI trust, for three reasons: (1) it has, though recent, precedence in sociology; (2) it opens a general view of trust as a multi-dimensional transaction, for which all relevant dimensions should be explored before integration in society; and importantly, (3) the term implies an obligation by the AI developer to carry out a prior or expected agreement, even in the case of a social contract.
The guidelines are available at https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai.