COGNITIVE SCIENCE 11, 65-99 (1987)
Why a Diagram is (Sometimes) Worth
Ten Thousand Words
JILL H. LARKIN HERBERT A. SIMON Carnegie-Mellon University
We distinguish diagrammatic from sentential paper-and-pencil representations of information by developing alternative models of information-processing systems that are informationally equivalent and that can be characterized as sentential or diagrammatic. Sentential representations are sequential, like the propositions in a text. Diagrammatic representations are indexed by location in a plane. Diagrammatic representations also typically display information that is only implicit in sentential representations and that therefore has to be computed, sometimes at great cost, to make it explicit for use. We then contrast the computational efficiency of these representations for solving several illustrative problems in mathematics and physics.
When two representations are informationally equivalent, their computational efficiency depends on the information-processing operators that act on them. Two sets of operators may differ in their capabilities for recognizing patterns, in the inferences they can carry out directly, and in their control strategies (in particular, the control of search). Diagrammatic and sentential representations support operators that differ in all of these respects. Operators working on one representation may recognize features readily or make inferences directly that are difficult to realize in the other representation. Most important, however, are differences in the efficiency of search for information and in the explicitness of information. In the representations we call diagrammatic, information is organized by location, and often much of the information needed to make an inference is present and explicit at a single location. In addition, cues to the next logical step in the problem may be present at an adjacent location. Therefore problem solving can proceed through a smooth traversal of the diagram, and may require very little search or computation of elements that had been implicit.
According to Bartlett's Quotations, "a picture is worth 10,000 words" is a Chinese proverb. On inquiry, we find that the Chinese seem not to have heard of it, but the proverb is certainly widely known and widely believed in our culture. In particular, problem solvers in domains like physics and engineering make extensive use of diagrams, a form of pictures, in problem solving, and many distinguished scientists and mathematicians (e.g., Einstein, Hadamard) have denied that they "think in words." To understand why it is advantageous to use diagrams, and when it is, we must find some way to contrast diagrammatic and non-diagrammatic representations in an information-processing system.
The first author was supported in part by grant MDR-8470166 from the NSF Directorate for Science and Engineering Education and by the Guggenheim Foundation.
Correspondence and requests for reprints should be sent to Larkin, Department of Psychology, Carnegie-Mellon University, Pittsburgh, PA 15213.
When they are solving problems, human beings use both internal representations, stored in their brains, and external representations, recorded on paper, on a blackboard, or on some other medium. Some investigators (e.g., Pylyshyn, 1973) have argued that all internal representations are propositional, while others (e.g., Anderson, 1978) have argued that there is no operational way in which an internal propositional representation can be distinguished from a diagrammatic one. Although our discussion may be relevant to this current controversy about the distinguishability of different internal representations, our analysis explicitly concerns external representations.
We consider external problem representations of two kinds, both of which use a set of symbolic expressions to define the problem.
1. In a sentential representation, the expressions form a sequence corresponding, on a one-to-one basis, to the sentences in a natural-language description of the problem. Each expression is a direct translation into a simple formal language of the corresponding natural-language sentence.
2. In a diagrammatic representation, the expressions correspond, on a one-to-one basis, to the components of a diagram describing the problem. Each expression contains the information that is stored at one particular locus in the diagram, including information about relations with the adjacent loci.
The fundamental difference between our diagrammatic and sentential representations is that the diagrammatic representation preserves explicitly the information about the topological and geometric relations among the components of the problem, while the sentential representation does not. A sentential representation may, of course, preserve other kinds of relations, for example, temporal or logical sequence. An outline may reflect hierarchical relations.
We consider problems presented in these two representations and ask about the relative difficulty of solution. We start with the assumption that the problem is solved using the given representation (sentential or diagrammatic). In fact, of course, one way to solve a problem in a poor representation is to translate it into a better one. One may be able to use the information in a verbal description to draw or image a diagram or use a diagram to infer verbal statements. But in order to understand what makes a good representation, we ask what is required for solution without such translation.
1. FORMALIZING THE QUESTION
To compare diagrams with sentences, we need to define what we mean by representation and by a "better" representation.
1.1 What Does a "Better" Representation Mean?
1.1.1 Informational and Computational Efficiency. At the core of our analysis lie the wholly distinct concepts of informational and computational equivalence of representations (Simon, 1978). Two representations are informationally equivalent if all of the information in the one is also inferable from the other, and vice versa. Each could be constructed from the information in the other. Two representations are computationally equivalent if they are informationally equivalent and, in addition, any inference that can be drawn easily and quickly from the information given explicitly in the one can also be drawn easily and quickly from the information given explicitly in the other, and vice versa.
"Easily" and "quickly" are not precise terms. The ease and rapidity of inference depend upon what operators are available for modifying and augmenting data structures, and upon the speed of these operators. When we compare two representations for computational equivalence, we need to compare both data and operators. The respective value of sentences and diagrams depends on how these are organized into data structures and on the nature of the processes that operate upon them.
1.1.2 Representations. A representation consists of both data structures and programs operating on them to make new inferences. The data structures we consider are node-link structures that include schemas employing attribute-value pairs. (Such structures have been called variously list structures, colored directed graphs, scripts, and frames. The differences, when there are any, are inconsequential for our purposes.) We can think of these structures as being represented in a list-processing language like LISP. Programs are represented as production systems. Each instruction has the form: C → A, conditions C with associated actions A. The conditions are tests on some parts of the data structures; whenever such tests are satisfied by the appropriate data structures, the actions of the production are executed. Actions modify data structures, that is, they make and record inferences. Although the data structures we shall postulate are stored externally, on paper, the productions that operate on them are in the problem solver's memory.
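The condition-action cycle described above can be illustrated with a minimal sketch. It is not the authors' system: the tuple encoding of data elements, the "left-of" relation, and the names transitive_left_of and run are assumptions introduced only for illustration. Each production tests the data structure (its condition C) and returns the elements it would add (its action A).

```python
def transitive_left_of(data):
    """Hypothetical production.

    C: (x left-of y) and (y left-of z) are both present.
    A: record the inference (x left-of z).
    """
    facts = [e for e in data if e[0] == "left-of"]
    return [("left-of", x, z)
            for (_, x, y1) in facts
            for (_, y2, z) in facts
            if y1 == y2]


def run(productions, data):
    """Fire productions repeatedly until no new element is inferred."""
    data = list(data)          # work on a copy of the data structure
    changed = True
    while changed:
        changed = False
        for production in productions:
            for inferred in production(data):
                if inferred not in data:   # record each inference once
                    data.append(inferred)
                    changed = True
    return data


given = [("left-of", "A", "B"), ("left-of", "B", "C")]
result = run([transitive_left_of], given)
```

Here the data structure plays the role of the external record, while the production stands in for knowledge held in the problem solver's memory. Flat tuples are a simplification of the node-link, attribute-value structures the text describes.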
Since data structures for a problem are complex, we must also provide for an attention management system that determines what portion of the data structure is currently attended to and can trigger the productions of the program. The nature of attention management depends crucially on the
linkages provided in the data structure since this is the only information available for guiding shifts in attention.
Later we describe systems for solving physics and geometry problems. In these systems the productions contain knowledge of the laws and principles of physics or geometry, while the data structures contain knowledge about the particular problem being solved. This separation corresponds to the usual division of labor between the knowledge a problem solver holds in memory and the knowledge he or she commits to paper. In both cases we will also need to consider how attention is managed in keeping with the data structure.
In general the computational efficiency of a representation depends on all three of these factors (data structure, program, attention management) and on how well they work together. Whether a diagram (or any other representation) is worth 10,000 words depends on what productions are available for searching the data structure, for recognizing relevant information and for drawing inferences from that information (Simon, 1978). This point has been made again recently by Anderson (1984) in arguing that the distinction between representations is not rooted in the notations used to write them, but in the operations used on them.
1.2 Diagrams and Sentences
We are concerned with contrasting the operation of an inference program, human or computer, when using two different data structures with the same informational content. To assure informational equivalence, we start with a problem stated in natural language, translate it first into a sequence of more formal sentences, and then translate it into a diagram.
1.2.1 Data Structures. Producing a formal sentential representation from a verbal problem statement is relatively straightforward, using simple analogs of propositional coding like that of (Kintsch & , 1978). But how can we produce data structures that capture important features of diagrammatic problem representations? Consider a situation described by a sequence of ordinary English sentences (e.g., a verbal problem statement). Associate with each sentence a location (perhaps x and y coordinates in a two-dimensional reference frame). Now sentences are indexed by location: a program using this data structure can choose to "look" at a particular location and thereby access all information present there (i.e., all information elements indexed by that location). In short we make the following definitions:
A data structure in which elements appear in a single sequence is what we will call a sentential representation.
A data structure in which information is indexed by two-dimensional location is what we call a diagrammatic representation.
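As a rough illustration of these two definitions, the sketch below stores the same elements once as a single sequence and once indexed by (x, y) location. The statements, the coordinates, and the function name look are hypothetical and are chosen only to make the contrast concrete.

```python
from collections import defaultdict

# Hypothetical problem elements: each statement is paired with the (x, y)
# location at which it would be drawn. Both are illustrative only.
statements = [
    ("weight W1 hangs from rope R1", (2, 0)),
    ("rope R1 passes over pulley P1", (2, 5)),
    ("pulley P1 is attached to the ceiling", (2, 5)),
    ("rope R1 is attached to weight W2", (4, 0)),
]

# Sentential representation: elements appear in a single sequence.
sentential = [text for text, _ in statements]

# Diagrammatic representation: elements are indexed by 2-D location.
diagrammatic = defaultdict(list)
for text, location in statements:
    diagrammatic[location].append(text)


def look(diagram, location):
    """Return every element stored at one location (empty if none)."""
    return list(diagram.get(location, []))


at_pulley = look(diagrammatic, (2, 5))
```

A single call to look retrieves everything recorded at the pulley's location, whereas the sequence must be scanned to collect the same elements.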
Of course, when a sentence or diagram is analyzed internally it may acquire different linkages (e.g., a person may form a "mental picture" upon reading a sequence of sentences), but here we are concerned with the external representations.
1.2.2 Programs. The program operating on the data structure employs the following kinds of processes: (1) Search operates on the node-link data structures, seeking to locate sets of elements that satisfy the conditions of one or more productions. This process requires attention management. (2) Recognition matches the condition elements of a production to data elements located through search. Recognition depends on a match between the elements in the data structure and the conditions of the productions in the program. (3) Inference executes the associated action to add new (inferred) elements to the data structure.
How do sentential and diagrammatic representations, respectively, affect the three components of information processing mentioned above: search, recognition, and inference?
Search. Consider first the sentential data structure consisting of a simple list of items. Unless an index is manufactured and added explicitly to this list, finding elements matching the conditions of any inference rule requires searching linearly down the data structure. Furthermore, the several elements needed to match conditions for any given rule may be widely separated in the list. Search times in such a system depend sensitively upon the size of the data structure.
Search in a diagram can be quite different. In this representation an item has a location. If the conditions of an inference rule are only satisfied by structures at or near a single location, then the tests for satisfaction can all be performed on the limited set of structures that belong to the current location, and no search is required through the remaining data. Often part of the search process involves identifying multiple attributes of the same items, for example, that a rabbit is both white and furry. Therefore one factor in the computational cost of search is the ease with which such attributes can be collected. Of course, some search may be required to find the right location. (As an example of such a system, see the model of chess perception constructed by Simon and Barenfeld, 1969.)
The two systems just described are not, in general, computationally equivalent. As we have described them, we would expect the second to exhibit efficiencies in search that would be absent from the first. Differences in search strategies associated with different representations are one major source of computational inequivalence.
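The difference in search cost can be made concrete with a small sketch that counts how many elements each strategy examines. Everything here is an assumption made for illustration: the (object, attribute, value) encoding, the grid of 500 objects, and the function names. The sketch also assumes the target location is already known, so it leaves out the cost of finding the right location that the text mentions.

```python
def find_in_sequence(sequence, wanted):
    """Linear search: return the matches and the number of elements examined."""
    examined = 0
    matches = []
    for element in sequence:
        examined += 1
        if wanted(element):
            matches.append(element)
    return matches, examined


def find_at_location(index, location, wanted):
    """Indexed search: examine only the elements stored at one location."""
    local = index.get(location, [])
    matches = [element for element in local if wanted(element)]
    return matches, len(local)


# Hypothetical data: 500 objects on a 25-column grid, two attributes each.
n = 500
locations = {i: (i % 25, i // 25) for i in range(n)}

# In the sequence, the two attributes of one object are widely separated.
sequence = ([(i, "color", "white") for i in range(n)]
            + [(i, "texture", "furry") for i in range(n)])

# The location index groups the attributes of each object together.
index = {}
for element in sequence:
    index.setdefault(locations[element[0]], []).append(element)

target = 82  # the object stored at location (7, 3)
is_target = lambda element: element[0] == target
seq_matches, seq_cost = find_in_sequence(sequence, is_target)
loc_matches, loc_cost = find_at_location(index, locations[target], is_target)
```

Both searches return the same two attribute elements, but the sequential scan examines all 1000 elements while the indexed lookup examines only the 2 stored at that location. The sequential cost grows with the size of the data structure; the indexed cost depends only on how much is stored at the location.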
Recognition. The effects of different representations on search are at least equaled by their effects on recognition. Human abilities to recognize
information are highly sensitive to the exact form (representation) in which the information is presented to the senses (or to memory). For example, consider a set of points presented either in a table of x and y coordinates or as geometric points on a graph. Visual entities such as smooth curves, maxima and discontinuities are readily recognized in the latter representation, but not in the former.
Ease of recognition may be strongly affected by what information is explicit in a representation, and what is only implicit. For example, a geometry problem may state that a pair of parallel lines is cut by a transversal. Eight angles, four exterior and four interior, are thereby created but not mentioned explicitly. Moreover, without drawing a diagram it is not easy to identify which pairs of angles are alternate interior angles, information that may be needed to match the conditions of an inference rule. All of these entities are readily identified from a diagram by simple processes, once the three lines are drawn. The process of drawing the diagram makes these new inferences which are then displayed explicitly in the diagram itself. We will see later how this explicitness facilitates geometry proofs when a diagrammatic representation is used. Of course, the same information can also be inferred from the sentential representation, but these latter inference processes may require substantial computation, and the cost of this computation must be included in any assessment of the relative efficiency of the two representations.
Although our focus here is the contrast between diagrammatic and sentential representations, the human recognition process can often be specific to particular representations within these broad categories. For example, although the Roman, Cyrillic, and Greek alphabets are nearly isomorphic, a person who can read fluently in one of these alphabets cannot generally read the same information readily when it is transcribed into one of the others. The oral Serbian and Croatian languages are essentially identical, no farther apart than the English and American dialects of the English language. Serbian is written in the Cyrillic alphabet, while Croatian is written in the Roman alphabet. As a consequence, Serbs and Croats can read each other's newspapers only with the greatest difficulty.
The difficulty does not disappear for a person who knows both alphabets well, but each only in the context of particular languages. For example, someone who reads Russian fluently when it is written in the Cyrillic alphabet and English in the Roman alphabet will have great difficulty in reading Russian transcribed to the Roman alphabet or English to the Cyrillic. (For similar effects of chess notation, see Chase and Simon [1973].)
It follows that we will be unable to recognize knowledge that is relevant to a situation and retrieve it from long-term memory if the situation is not presented in a representation matching the form of existing productions. This specificity of access is presumably remediable by training, but only at the cost of acquiring whole new sets of productions with condition sides
adapted to the specific representation that is employed. While the specificity favors no particular form of representation, it does severely limit the immediate substitutability of one representation for another.
Because a representation is useful only if one has the productions that can use it, we can readily understand the common complaint of physics professors that students "refuse to draw diagrams" or "don't appreciate their value." If the students lack productions for making physics inferences from diagrams, they may not only fail to "appreciate" the value of diagrams, but will find them largely useless.
Inference. In view of the dramatic effects that alternative representations may produce on search and recognition processes, it may seem surprising that the differential effects on inference appear to be less strong. Inference is largely independent of representation if the information content of the two sets of inference rules is equivalent, i.e., the two sets are isomorphs as they are in our examples. But it is certainly possible to make inference rules that are more or less powerful, independently of representation.
Examples of this phenomenon are suggested by the everyday use of the verb "see" when no explicit visual processes are present. What is this metaphorical "seeing" and how might it connect to information-processing differences between sentential and diagrammatic representations? We speculate that this metaphor refers to inferences that are qualitatively like perceptually "seeing" in that they come about through productions with great computational efficiency. This efficiency might arise from low search and recognition costs, or from very powerful inference rules or from both.
Consider, for example, a physical chessboard which we would represent as a set of squares, each with an (x,y) location and connections to adjacent squares. With each square is associated the name of any piece on it. Any person can "see" on what squares the pieces lie and locate adjacent or nearby squares. These inferences come from primitive production rules that everyone has. But a chess expert may "see" things in the board not evident to the non-expert observer. For example, an important feature of a chess position is an open file: a sequence of squares that are vacant, running from the player's side of the board toward the opponent's side. In what sense is this "seeing" if everyone cannot see it? This recognition could be accomplished by a production that, upon noticing an open square in the first row, would trace this square to the "North" until a piece was encountered, then store this feature in memory, indexed to its location on the board. For the chess player who has such a production in his or her repertory, an open file is "seen," meaning that it is easily recognized. But for a person without such a set of productions only the individual unoccupied