Cognitive Psychology

by Eamon Fulcher

Chapter 9: Computer models of cognition and connectionism

Chapter summary

There is a long history of the use of the computer metaphor in cognitive psychology. Indeed, developments in computer science were said to inspire, in part, the cognitive revolution. In this chapter we explore the computer metaphor by comparing aspects of the operation of the modern computer and human mental processes. The most recent development in this area is the connectionist approach and we outline the principles involved and evaluate the approach. The debate about whether or not computers could in principle be considered to think is discussed and we examine the implication of this debate for cognitive psychology.

Section 1

The computer metaphor

The use of the computer metaphor is pervasive in cognitive psychology. Here you will be studying exactly how the computer metaphor is applied and the sorts of models that have been developed that simulate some aspect of cognitive psychology.

Computers carry out their tasks according to a specially written program. A program is a set of instructions, written in a programming language, that the programmer uses to get the computer to carry out particular tasks. Most of the tasks involve either numerical operations or operations on words (non-numeric text is known as a string by programmers).

An algorithm is a set of routines in the program that describes the series of steps that need to be taken in order to achieve some task. People may use algorithms every day of their lives. For example, consider the algorithm for making a cup of coffee:

1. Fill the kettle with water

2. Turn on the kettle

3. Put some coffee in a cup

4. When the kettle has boiled, pour some water into the cup

5. Pour some milk into the cup

We also use algorithms when we try to teach someone a skill. However, because computers cannot think by themselves the instructions need to be very precise indeed. So, while most people would be able to follow the algorithm described above, some steps can be broken down further still. For example, what do we mean by "fill" in step 1? Do we really fill the kettle or is this a figure of speech? Also, in steps 3, 4 and 5, how much is "some" exactly? It follows that the instructions we give a computer must be completely unambiguous. In other words, computers are very stupid and will interpret every command literally.
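To make the point about precision concrete, the coffee algorithm can be sketched as a short Python program. The function name make_coffee and every quantity in it (250 ml of water, 2 g of coffee, 20 ml of milk) are illustrative assumptions rather than part of the original algorithm; the sketch only shows that a program forces each "fill" and "some" to become an explicit value.

```python
def make_coffee(water_ml=250, coffee_g=2, milk_ml=20):
    """Return the ordered, fully specified steps for one cup of coffee.

    All default quantities are illustrative assumptions.
    """
    steps = [
        f"Pour {water_ml} ml of water into the kettle",
        "Switch the kettle on",
        f"Put {coffee_g} g of coffee in a cup",
        "Wait until the kettle has boiled",
        f"Pour {water_ml} ml of boiled water into the cup",
        f"Pour {milk_ml} ml of milk into the cup",
    ]
    return steps


# Print the numbered steps in the order the computer would carry them out.
for number, step in enumerate(make_coffee(), start=1):
    print(number, step)
```

Notice that even this version hides decisions (which cup? how is "boiled" detected?), which is exactly the sense in which instructions for a computer must be broken down further than instructions for a person.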

And the point is?

The aim of this discussion is to illustrate the similarities that might exist between algorithms and the way people think. If human thought can be described as a collection of algorithms then three things might be possible:

•  First, we may have identified a specific way of describing mental processes. Once a psychologist is familiar with what an algorithm is, he or she can understand the details of theories that are described as algorithms.

•  Second, it may be possible to simulate human thought on the computer.

•  Third, it may be possible to design machines that can do things that humans do.

 

The first two describe the information processing approach to theory development. The third is the subject of the study of artificial intelligence or AI.

Artificial intelligence (AI) is the study of how to make machines do the sorts of things that humans do. Many different types of AI program have been developed, from playing chess to filtering out background noise (as in the human ability of selective attention).

Many computer models of psychological processes have been developed and commonly these are written as computer programs. There are a number of advantages of using this approach (as opposed to expressing a theory as a handful of general statements), but there are a number of problems with the approach too.

Strengths

The greatest strengths of the computer metaphor are:

•  Precision

Theories that describe mental processes or behaviour can be expressed in a number of ways. They can be expressed as a series of statements, as a diagram (such as a box-and-arrow diagram), or as an algorithm with an accompanying computer program. The latter method requires greater precision in determining how the theory can be put together (since one has to develop a computer simulation of it), and it also generates more precise predictions. Very often when transposing a theory from a series of statements into a computer program one finds that some features either do not work as expected or they are incompatible with other features. In this way, computer simulation is a very good way of testing the internal consistency of a theory.

•  Explicitness of assumptions

Tied in with precision of the inner workings of the model is the need to make one's assumptions explicit. This means that when developing a computer simulation, one has to make decisions about basic assumptions in cognitive psychology. These assumptions become apparent in the description of the model.

•  Novel experimental predictions

One can quite easily generate a theory as a series of statements and overlook interesting predictions of the theory. Computer models have the advantage that when they are running their behaviour can be quite unexpected and hence lead to testable predictions.

These positive features make computer models of cognition much easier to evaluate and test than other types of theories. For example, one criticism of box-and-arrow theories is that it might not be clear what an arrow represents. It might represent passing information from one module to another, or it might represent a particular process. That process may or may not be compatible with other processes.

Weaknesses

Several weaknesses of the computer metaphor are the following:

•  Algorithms are serial

The algorithm may be only one way of describing how mental processes operate. Because of its design, the conventional computer can only do one thing at a time. Although it can carry out calculations at an enormous speed, it has only one processor. Hence algorithms are descriptions of serial processes. If we use an algorithm, or the computer metaphor, to develop a theory of some cognitive process we might restrict ourselves to a theory of seriality. For example, a theory of attention might suppose we can only attend to one thing at a time, whereas the evidence argues against this. Furthermore, the brain has something like 10 billion processors or neurons. Each neuron can be understood as carrying out a simple calculation, but their combined operation can lead to a different kind of process, namely, parallel processing. Algorithms follow a sequence of logical steps. In analysing a visual input, for example, (which is encoded as thousands of dots or ‘pixels'), the traditional method is to examine the input dot by dot; the processing of each dot is carried out serially. What might be required is a computer that processed information in parallel (i.e. one that could carry out many computations at any one time). In this example, this would mean that the computer analyses each pixel of an image in parallel, and this would speed up the process significantly. The implication is that the serial nature of the algorithm makes it inappropriate as a metaphor for mental processing.

•  Algorithms are difficult to devise

A second problem with the algorithmic approach is that in order to design a computer to do the sorts of things that people can do, one first needs to know what the algorithm is, and this will be based on the underlying theory. However, algorithms for doing things we find easy, such as selectively attending to information in the environment, understanding speech, perceiving depth and so on, are not well understood. Imagine how easy it is for us to understand speech even when it is spoken with an accent, or at different rates, or in a noisy environment. An algorithm for speech recognition cannot be just written down; it requires extensive research. The quest for algorithms that may describe some aspect of human ability or intelligence is the main goal of AI. So, while the technology of computer modelling is available, progress in this area is slowed by the fact that our theories are too underdeveloped and the complexity of many human abilities has been underestimated.

•  Human knowledge is acquired, not built in

Having developed a theory and designed an algorithm the next task is to program the computer. In programming the computer model, the knowledge the model needs to simulate the cognitive process has to be built-in by the programmer. A criticism here of this approach is that human ability isn't just built-in or hardwired, it is acquired through learning.

•  Human brains look nothing like computers

Another problem with the computer metaphor concerns its similarities (or lack thereof) with the brain. Indeed, there seems to be no clear idea how a program could be represented in the brain. Cognitive psychologists and AI researchers have tended to ignore or avoid any serious attempt to specify how their models might be implemented in the brain. The algorithm or program (the software) is thought of as a metaphor of the mind, and the hardware (the computer's nuts and bolts) as the equivalent of the brain. An implication of this is that memory has been viewed as a filing cabinet or as a library, passively storing information in individual compartments which are searched during recall. As Rumelhart and Norman (1981) record:

Of course we have always realised that someday our theories of memory would have to be brought into line with our knowledge of brain function, but we assumed that the hardware of the brain was general enough to support almost any proposal that we found useful to postulate. (p. 2)

Despite such problems, many AI researchers still view the algorithm as a valid metaphor of mind, yet others are severely critical. In considering these kinds of problems we see that the conventional computer is not a good metaphor of the brain or of human information processing in general, the point being that computers are often laboriously slow and inefficient at doing those things that humans find so easy.

To the lay person, a researcher in AI might conjure up a sort of Dr Frankenstein image, one who is attempting to create a machine so intelligent that it threatens human existence. Popular cinema often subscribes to this picture: it depicts a computer or a group of androids as machines with human-like intelligence, emotions and intentions (for example, the movie I, Robot, which is loosely based on the Asimov book of the same name). In reality, however, current achievements in AI are significantly inferior to their fictional counterparts. The question here is why. Is it because the algorithms underlying human skills and intelligence are too complex and that we simply do not know at present what they are, or is it because the computer analogy is inappropriate and we can never build serial computers to behave in human-like ways?

Section 2

Foundations of connectionism

The connectionist metaphor has, to a large extent, taken the place of the computer metaphor. Here you will learn about why this is so and the principles of connectionist models. One idea is that the brain is a massively parallel, richly interconnected set of processors. Neurons, it is argued, may act as simple processors, with each neuron making a very small contribution to the overall storage of memory. This sharing of information gives the brain a distributed representation: a single memory may involve thousands of neurons (and a single neuron may be involved in many memories). In addition, neural networks in the brain acquire knowledge through learning – they do not embody pre-programmed knowledge. This is extremely powerful because it implies that we may no longer need to search for an appropriate algorithm, but rather get the neural network to ‘discover' one.

This approach has been referred to as parallel distributed processing, as artificial neural networks, and as connectionism (the latter term being most used by psychologists). The properties that emerge from this kind of processing may be more cognitive-like than the algorithms designed for the conventional computer. Indeed, such an approach has shown great promise, and has initiated a trend towards modelling human intelligence based on the theory of neural networks.

Unlike the traditional computer metaphor, connectionist models are said to have emergent properties. These are useful features of the behaviour of the model that have not been built-in explicitly. So, while the behaviour of the traditional computer model is predicted by the algorithms that have been programmed in, the behaviour of the connectionist model is not. The list below describes some of these emergent properties, and it is intriguing that these properties are not explicitly built into the model. They emerge from the combined behaviour of the set of units in the model.

Properties of connectionist models

Connectionist models are mathematical models of the computations and activities thought to be occurring in the brain. They exploit the notion that the brain achieves parallel processing with distributed representational properties. They are also systems that learn about the environment they are exposed to. The kinds of properties that emerge from a connectionist network are generalisation, optimisation and fault tolerance.

Generalisation

•  Heteroassociation

Generalisation is the ability to correctly categorise, identify or retrieve information for stimuli that are novel or incomplete in some way. Classifying objects or concepts into distinct categories may be viewed as a problem of pattern recognition. This includes identifying a pattern as a member of a class or set of patterns (such as handwritten letters, spoken words, etc.). Classification is the grouping of a set of objects on the basis of similarity. Such groupings may be found in much of human information processing, such as in speech recognition and concept formation. Objects of a group share a common label and are assumed to have many other common attributes. Once a network has been trained on a set of examples, it is able to correctly classify new items.

•  Autoassociation

Conventional computers store information by assigning an arbitrary location in memory – the computer simply records the location of the memory for future reference. Neural networks, on the other hand, store new information in ‘locations' according to its relationship with existing information (i.e., they store information through its meaning). Human memory is also said to have this content-addressable nature. For example, when cued with a fragment of a particular memory, we can recall its whole. We know that a person's identity (name) can be discovered when only a small amount of information is given. Also we can visually recognise an object even though it may be partially hidden. Human speech is often poorly articulated, frequently occurs among much background noise, is occasionally spoken in unfamiliar accents; the message, too, may be delivered in an ungrammatical form, yet it may still be correctly perceived. This means that information may be imprecise, incomplete or ‘noisy' yet still be understood. Neural networks performing this type of association may be said to act as ‘clean-up' systems, where distorted or ‘dirty' input patterns activate their errorless, ‘clean' prototypes. A major attraction of neural networks is their content-addressability.

Optimisation

All processing is ultimately dependent upon selecting an output (response) for a particular input (stimulus) in a given context. It is desirable for a system to respond with the optimal output. For conventional methods this means searching among a vast number of options, which may take an impractical amount of time. Rather than selecting an output through serial search, neural networks can achieve a parallel search for the optimal output. Some neural network approaches have been shown to be highly likely to settle into an optimum response, although near optimal responses may be common.

Fault tolerance

The human and animal brain is noted for its tolerance to damage. If neurons in a particular region are damaged the whole region does not stop functioning, except in cases of extreme damage. Modern digital computers generally do not have this capacity. Neural networks, on the other hand, have been shown to be resistant to the removal of some of their units (Hinton and Sejnowski, 1986; Hinton and Shallice, 1990), showing a graceful degradation (a gradual decrease of efficiency) as the amount of damage increases. This is something that is known to occur in the brain, e.g., as in the case of Alzheimer's disease.

Principles of a connectionist network

A connectionist network consists of a set of ‘artificial neurons'; these are processing units that do very simple calculations and are said to mimic (albeit in highly simplified form) the activity of real neurons. Neurons in the brain are connected to many other neurons. A neuron will have many inputs from other neurons and will also output to many other neurons. When a neuron receives its input from other neurons it makes a decision: to fire or not to fire (thus it is an all-or-none affair). If the neuron fires then it delivers a signal (the same signal) to other neurons with which it is connected. Firing patterns go in one direction – thus a connection may either be an input to the neuron or an output from the neuron. The decision whether to fire or not is dependent upon the particular pattern of signals it receives in its inputs. Some connections are excitatory in that they try to force other neurons to fire and some are inhibitory in that they try to prevent other neurons from firing. In addition, some neurons may have a greater (excitatory or inhibitory) influence over a particular neuron than others. The degree of excitation or inhibition of one neuron upon another is known as a weight. It is the strength of a connection between neurons.
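The all-or-none decision of a single unit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name unit_fires, the threshold of 0.4 and the three example weights are invented for the example, with positive weights standing for excitatory connections and negative weights for inhibitory ones.

```python
def unit_fires(inputs, weights, threshold=0.4):
    """All-or-none decision: return 1 (fire) if the weighted sum of the
    incoming signals exceeds the threshold, otherwise 0 (do not fire)."""
    net_input = sum(signal * weight for signal, weight in zip(inputs, weights))
    return 1 if net_input > threshold else 0


# Hypothetical connection strengths: two excitatory, one inhibitory.
weights = [0.5, -0.25, 0.75]

print(unit_fires([1, 1, 0], weights))   # net input 0.25, below threshold
print(unit_fires([1, 0, 1], weights))   # net input 1.25, above threshold
```

The same unit therefore responds differently to different patterns of input, and changing a weight changes which patterns make it fire.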

Connection weights

Learning in a connectionist network takes place on the connection weights. These are increased and decreased when the network is exposed to stimuli. How are appropriate weight values chosen? What is required is a simple rule that will help determine these values. We consider first the Hebb rule. This rule is named after the psychologist Donald Hebb who had a number of ideas about how real neurons learn:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells, such that A's efficiency, as one of the cells firing B, is increased. (Hebb, 1949, p. 62)

Put simply, this states that when two neurons are firing the connection weight between them is increased. In addition, a modification to the Hebb rule is that when one unit fires and the other unit does not fire then the weight between them is decreased. Consider the network in Figure 9.1. This is a simple network that has to learn to discriminate between men and women. The bottom row of units represents inputs, features of people such as things that they do or other features that they have. The top row of units represents the output units. These are the units that make the final decision about whether the features correspond to those of a man or those of a woman. The lines between the two rows represent the weights and the arrows indicate the direction of the signals (from input units to output units). Suppose that the network is presented with the features of one particular woman (e.g., she has long hair, is not a soccer fan, reads Cosmopolitan magazine, does not drink beer, wears trousers, and drives a hatchback). The features present are represented by filled circles, and these units are said to be excitatory. Unfilled circles are features not present in the input and are inhibitory.

 

Figure 9.1 An example of a classification network

 

According to the Hebb rule, weights between the active inputs (excitatory units) and the active output unit (woman) are increased, and weights between inhibitory units (unfilled circles) and the active output unit are decreased. One can view the connection weights as being correlational: when they are both active they are highly correlated. If many more examples were presented to the network, some connection weights would become large and positive (e.g. that between long hair and woman, presumably), some would be close to zero because some women do and some women don't have these features (e.g. that between drives hatchback and woman), and some weights would be large and negative (e.g. that between reads Cosmo and man).
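A minimal Python sketch of this kind of Hebbian learning is given here. It rests on several assumptions that are not in the text: units are coded +1 (active) or -1 (inactive), so that a weight grows whenever the two units it joins agree and shrinks whenever they disagree; the learning rate is 0.25; and the four training examples are invented people, not data. The names hebb_train and classify are hypothetical.

```python
FEATURES = ["long hair", "soccer fan", "reads Cosmopolitan",
            "drinks beer", "wears trousers", "drives hatchback"]
OUTPUTS = ["woman", "man"]


def hebb_train(examples, rate=0.25):
    """Return weights[output][feature] after one pass of Hebbian updates."""
    weights = [[0.0] * len(FEATURES) for _ in OUTPUTS]
    for features, label in examples:
        for o, name in enumerate(OUTPUTS):
            target = 1 if name == label else -1      # active or inactive output
            for f, value in enumerate(features):
                weights[o][f] += rate * target * value
    return weights


def classify(weights, features):
    """Pick the output unit that receives the largest weighted input."""
    totals = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return OUTPUTS[totals.index(max(totals))]


# Invented training examples (hypothetical people, not data from the chapter).
examples = [
    ([+1, -1, +1, -1, +1, +1], "woman"),   # the woman described in the text
    ([+1, -1, +1, -1, +1, -1], "woman"),
    ([-1, +1, -1, +1, +1, +1], "man"),
    ([-1, +1, -1, +1, +1, -1], "man"),
]
weights = hebb_train(examples)
print(weights[0])                                    # weights into 'woman'

# A new person the network has never seen.
print(classify(weights, [+1, -1, -1, -1, +1, -1]))
```

With these examples the weights into the woman unit come out large and positive for long hair, large and negative for drinks beer, and zero for drives hatchback, since the invented men and women are equally likely to drive one. Setting one input to 0 (an inoperable unit), as in classify(weights, [+1, 0, -1, -1, +1, -1]), still yields the same answer for this person.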

As the network begins to learn, it develops the power to generalise. So, if we presented the details of someone new, the network would make a good guess at whether it was a man or a woman. Furthermore, if one input unit was made inoperable then the network could still produce a decent guess. This is the property of fault tolerance. Other types of networks have been designed, such as the Perceptron, the Adaline, the Multi-Layered Perceptron, and the weightless network (Kan and Aleksander, 1987). Other rules for changing the connection weights have also been devised, such as the Delta rule (Widrow, 1962) and error back propagation (Rumelhart et al., 1986). Some networks, such as the Perceptron, are limited in terms of the kinds of patterns they can discriminate between (Minsky and Papert, 1969). However, many of these problems have been overcome by the application of more sophisticated learning rules, such as error back propagation.

 

The physicist and Nobel Prize winner John Hopfield investigated another type of network (Hopfield, 1982). This work also gave the field a major impetus. The model analysed by Hopfield is a fully interconnected, dynamic system in which each unit receives input from all other units (feedback), except from itself (see Figure 9.2). The weight between each pair of units can be modified according to the Hebb rule or Delta rule. This type of system has a distributed representation of memory, i.e. each unit is involved in the storage of many memories, and each memory involves many units.

 

Figure 9.2 Hopfield's fully interconnected network

 

 

The pattern of activation of the set of units at any time is the state of the network. On presentation of an input, each node makes the decision whether to excite or inhibit other units. They then pass their signals to each of the other units and the process is repeated. Through this iterative process, the activation of each unit, and hence the state of the system, may continually change, moving over time until it reaches a stable limit point (a process known as relaxation) – this is a point at which the activity of the set of units is unchanging. Each stable point has a region known as the basin of attraction, which a stable system will aim towards, usually finding the ‘nearest'. One can think of basins of attraction as valleys at various distances apart and the state of the system as a ball (Figure 9.3). The ball moves in time towards one of the valleys (the nearest or the deepest) whereupon it settles and moves no more. A system with a number of stable states can be viewed as a content-addressable memory, a memory accessed by content (as in the brain).
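The relaxation process can be illustrated with a small Python sketch of a Hopfield-style network. The details are assumptions made for the example: units take the values +1 or -1, weights are set by a Hebbian rule, units are updated one at a time in a fixed order, and the two eight-unit memories are invented. The names store and relax are hypothetical.

```python
def store(patterns):
    """Build Hebbian weights for +1/-1 patterns; no unit connects to itself."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def relax(weights, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(weights):
            net_input = sum(w * s for w, s in zip(row, state))
            new_value = 1 if net_input >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:          # stable point reached
            break
    return state


memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = store([memory_a, memory_b])

cue = [1, 1, 1, -1, -1, -1, -1, -1]      # memory_a with its fourth unit flipped
print(relax(weights, cue) == memory_a)   # prints True
```

Here the distorted cue lies inside the basin of attraction of memory_a, so the state settles on that memory; presenting either stored memory unchanged leaves the state where it is, which is what makes it a stable point.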

 

Figure 9.3 The basins of attraction metaphor.

 

The valleys represent the stable states of the system (memories) and the ball represents the current state of the system. The surrounding ‘terrain' conveys the areas from which each basin will attract the ball. The arrow indicates the direction the ball will follow (i.e. to the nearest or deepest basin).

The Hopfield model epitomises the connectionist metaphor of memory. Memories are accessed by their content: partial information, processed in parallel, is used to reinstate a memory whole. This is in stark contrast to the previous view where memories are accessed by physical location (which gives rise to the library metaphor of memory, with individual memories or books being retrieved through the process of serial search). With this new approach some questions concerning human memory can be addressed with fresh insight.

If connectionist models are to be of any value to psychology, they must explain phenomena in the literature on various aspects of human memory, and they must also add to our understanding.

Connectionist models have been applied to a range of cognitive phenomena, including dream sleep (Crick and Mitchison, 1983), grammar acquisition (McClelland and Rumelhart, 1986), synthetic grammar acquisition (Cleeremans and McClelland, 1991), neuropsychological modelling (Hinton and Shallice, 1990), classical conditioning (Grossberg and Levine, 1987), and models of emotion (Fulcher, 2002). Here we will only consider models of reading to give a flavour of the approach.

Connectionist models of reading

One of the earliest connectionist models of reading was that of McClelland and Rumelhart (1981). Their network simulated reading by converting visual letter features into word units through a hierarchical process. Letters are presented to the network, which first analyses the features of each letter and representations of individual letters become active when sufficient features are recognised. Representations of words then become active when certain patterns of letter representations are activated. The network determines its output through a ‘winner-takes-all' process, in which the response (word naming) corresponds to the most active word unit, which will be the one with the best fit to letter patterns. The weights in the network are pre-programmed and equate to knowledge of the relationships between letter features, letters and words. The model can account for a broad array of observations on reading performance. However, it cannot account for certain ‘migration errors' that result from the spatial representation of text, and this is because the input is not spatially represented. An example of a migration error (from Ellis and Humphreys, 1999):
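The ‘winner-takes-all' step can be sketched in Python as follows. This is a deliberately reduced illustration, not the McClelland and Rumelhart model itself: the feature and letter levels are collapsed into a table of position-specific letter evidence, there is no inhibition or feedback between levels, and the four-word lexicon and the evidence values are invented. The names word_activations and winner_takes_all are hypothetical.

```python
WORDS = ["WORK", "WORD", "WEAK", "FORK"]   # hypothetical four-word lexicon


def word_activations(letter_evidence):
    """Sum position-specific letter evidence for each word unit.

    letter_evidence[p] maps a letter to how strongly it is supported at
    position p (a stand-in for the feature and letter levels).
    """
    return {
        word: sum(letter_evidence[p].get(letter, 0.0)
                  for p, letter in enumerate(word))
        for word in WORDS
    }


def winner_takes_all(activations):
    """Name the most active word unit."""
    return max(activations, key=activations.get)


# The last letter is ambiguous: its visible features fit K better than D.
evidence = [{"W": 1.0}, {"O": 1.0}, {"R": 1.0}, {"K": 0.75, "D": 0.25}]
activations = word_activations(evidence)
print(activations)
print(winner_takes_all(activations))
```

WORK receives the most support (3.75) and is named, with WORD (3.25) a close competitor. Because the evidence table is indexed only by letter position within a single word, a sketch like this has no way of representing where a word sits on the page, which is the limitation that migration errors expose.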

PSYCHMENT DEPARTOLOGY

 

This can be read as PSYCHOLOGY DEPARTMENT before the reader realises what the text actually says. This shows that reading errors can be based on the relative positions of words.

To overcome this and a few other problems, McClelland (1986) developed PABLO (Programmable Blackboard Model), which has position-specific input and output units, a knowledge store, and modifiable (programmable) connections between the knowledge store and the position-specific information of words. Another model that takes positional information into account is BLIRNET (Mozer, 1987), which is also a hierarchical scheme.

Visual letter features activate position-coded units, which feed combination units that process several recognised features in parallel. Word recognition occurs when the activity of letter cluster units of a particular word reaches a threshold. BLIRNET can account for many reading errors, including migration errors.

Connectionist models of reading have also been designed around theories of the processing routes involved in reading (see Chapter 6, Section 2). For example, NETtalk (Sejnowski and Rosenberg, 1987) models the direct-access route. Inputs are letters and spaces between words, and the output is a phonemic representation of the words. The network uses a ‘moving window' that corresponds to perceptual span and the reading of one letter at a time, and from left to right. The network was trained on 20,000 words and was correct about 75 per cent of the time. Although NETtalk is often quoted as successful in its performance, there is little evidence that the moving window approach is psychologically plausible. Rather, letters may be read in parallel (see also Chapter 6, Section 2). Furthermore, the training set designed for this simulation was not broken down into different word types, such as high and low frequency words. Seidenberg and McClelland (1989) used a network where the letters of words were presented in parallel.

The network has orthographic units for visual word recognition, which then activate their phonological representations. The model was tested on high and low frequency words, and is able to account for much of the evidence on reading, such as regularity effects in word naming times (words that have an irregular correspondence between their pronunciation and spelling take longer to name). There is some data that is difficult for the model, such as human data on reading nonwords (the model performs much worse than humans). Plaut et al. (1996) developed the above model further. Their approach is a modular one, in which separate networks, or modules, carry out different functions. According to Plaut et al. (1996) both humans and an adequate model of reading ‘gradually learn to be sensitive to the statistical structure among orthographic, phonological, and semantic representations and in which these representations simultaneously constrain each other in interpreting a given input' (p. 429). In other words, learning to read involves identifying regularities in the way words are spelled, in the way they are pronounced and in the way they are used within a sentence to convey meaning. Furthermore, the different kinds of learned regularities each inform the reader about the identity of the word. Some learned regularities will instantly rule out certain phonological outputs thereby speeding up the ‘search' for the correct output. Networks that are able to utilise information from more than one source may need to be modular in structure.

Simulating acquired dyslexia by ‘lesioning' a connectionist network

Recall that one advantage of the connectionist approach is that a connectionist network is tolerant to a degree of damage. In other words, unlike the computer metaphor where damage to one component might bring the machine to a halt, a connectionist network will still be able to produce an output even though a number of its units are damaged. More interestingly, the output it produces, although it may be in error, is likely to be a near miss.

One way of testing connectionist models of reading is to mimic brain damage (for example, acquired dyslexia) and compare the resulting performances with those of individuals with various forms of dyslexia. Mozer (1991) attempted to model neglect dyslexia by lesioning a modified version of BLIRNET, known as MORSEL. This model has two modules, one for word recognition and one for attention. The word recognition module is sensitive to spatial information (word and letter order), while the attention module controls attention to individual spatial locations. This attention serves to avoid mixing up letters and making errors (miscombinations of letters). When the attention module is lesioned by making the left side of the module less likely to become activated by its inputs, a certain pattern of errors is produced that mimics those made in neglect dyslexia. For example, it made errors of the form hand being read as sand. This error occurred more often with non-words, e.g. tand being read as sand, and this, and other errors made by the model, are consistent with those made in neglect dyslexia.

The model developed by Plaut et al. (1996), discussed earlier, has been tested on neuropsychological data from surface dyslexia. Lesions were simulated by removing a portion of units and connections, and by adding noise to some of the weights (changing the values of some weights by small amounts). Lesions affected performance on exception words, which accords with the data on surface dyslexia. However, the increase in regularisation errors observed in surface dyslexia was not mirrored by the model's behaviour. Plaut et al. argue that this type of error is likely to occur during semantic processing (which this model does not simulate in an adequate form).
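The two kinds of simulated damage mentioned here, removing connections and adding noise to weights, can be sketched generically in Python. This is not the procedure used by Plaut et al.; the function name lesion, the proportion removed (20 per cent), the noise level (standard deviation 0.05), the fixed random seed and the toy weight matrix are all assumptions chosen for illustration.

```python
import random


def lesion(weights, remove_fraction=0.2, noise_sd=0.05, seed=0):
    """Return a damaged copy of a weight matrix.

    A fraction of connections is removed (set to 0) and each surviving
    weight is perturbed by a small amount of Gaussian noise.
    """
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(weights) for j in range(len(row))]
    n_removed = round(remove_fraction * len(positions))
    removed = set(rng.sample(positions, n_removed))
    damaged = []
    for i, row in enumerate(weights):
        damaged.append([
            0.0 if (i, j) in removed else w + rng.gauss(0.0, noise_sd)
            for j, w in enumerate(row)
        ])
    return damaged


intact = [[1.0] * 5 for _ in range(4)]        # toy 4 x 5 weight matrix
damaged = lesion(intact)
removed_count = sum(w == 0.0 for row in damaged for w in row)
print(removed_count, "of 20 connections removed")
```

In a simulation of this kind the damaged weights would then be substituted into the trained network and its reading performance compared with that of the intact network, typically across several severities of lesion.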

Finally, Hinton and Shallice (1990) have modelled deep dyslexia. Their model consists of a layer of grapheme units (position-specific letter units) that feed a layer of sememe units (attractor units that contain semantic information), which in turn interact with a set of ‘clean-up' units (these units reduce the errors made in the semantic units and enable generalisation of meaning). The sememe and clean-up units act together to form an attractor network. Recall that an attractor network represents information as basins of attraction or ‘attractor states' in a network. Attractor states represent the stimuli the network has been trained to recognise. This means that when an incomplete pattern (e.g. a word with one letter missing) is presented to the network, the state of the network will move towards the nearest attractor state. So, for example, when presented with the word B*OK, the network will output BOOK. However, Hinton and Shallice (1990) employ the attractor analogy to semantic information. This means that attractors such as CAT and DOG are nearby attractors but CAT and MUG are distant attractors.
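The idea that an incomplete input settles on the nearest attractor can be illustrated without building a full network. The Python sketch below replaces the network dynamics with a simple letter-by-letter mismatch count, so it is only an abstraction of what an attractor network does; the stored word list and the names mismatch and nearest_attractor are hypothetical, and the sketch measures visual (letter) similarity only, not the semantic similarity that Hinton and Shallice build into their attractors.

```python
STORED_WORDS = ["BOOK", "BOAT", "COOK", "MUG"]   # hypothetical trained attractors


def mismatch(cue, word):
    """Count positions where a known cue letter disagrees with the word.

    '*' marks a missing letter and matches anything; a difference in
    length counts as one mismatch per extra position.
    """
    differences = abs(len(cue) - len(word))
    for cue_letter, word_letter in zip(cue, word):
        if cue_letter != "*" and cue_letter != word_letter:
            differences += 1
    return differences


def nearest_attractor(cue):
    """Return the stored word with the fewest mismatches to the cue."""
    return min(STORED_WORDS, key=lambda word: mismatch(cue, word))


print(nearest_attractor("B*OK"))   # prints BOOK
```

In a semantic version of the same idea, distance would be measured over meaning features rather than letters, so that CAT would lie close to DOG and far from MUG.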

After training on a set of words the network was lesioned. Several types of error emerged:

•  visual errors: CAT – COT (caused by lesions within the grapheme units);

•  semantic errors: CAT – DOG (caused by lesions between sememe and clean-up units);

•  mixed errors: CAT – RAT (caused by both types of lesions);

•  unrelated errors: CAT – MUG (caused by both types of lesions).

 

Visual, semantic and mixed errors were more common than unrelated errors, and indeed these errors match those found in deep dyslexia.

Evaluation: the problem domain of connectionism

Learning in a connectionist network appears to be of an implicit nature, that is to say learning is through ‘mere exposure' to material rather than through direct efforts to learn a set of rules. Connectionist networks are useful at (1) modelling the process by which memory wholes are automatically activated by their parts, and (2) inferring the rules of category membership from a set of exemplars the system has been exposed to. These are behaviours that appear to be rule governed, and although they may be described by rules, they may not be produced on the basis of explicitly stored rules.

The common feature of these forms of learning is that they involve the association of a stimulus (a verb, an exemplar, a representation of time, a particular date, the visual form of a word, the vocalised form of a word, etc.) with a response (past tense form of a verb, a category label, a choice of two alternatives, the name of a particular day, or a word label, and so on). Such associations may involve skills in which the way a response is computed is difficult or impossible to verbalise. Thus we may not be aware of the process by which we transform an irregular verb into its past tense form, but we certainly are aware of utterances that contain erroneous transformations (e.g. ‘I goed shopping'). If neural networks merely involve the learning of stimulus–response associations, are they anything more than sophisticated behaviourist models? Lachter and Bever (1988), critics of connectionism, argue that they are models of the Hullian (behaviourist) tradition that implement learning through many S–R associations. Lachter and Bever mockingly rename the collective units of a connectionist network as ‘massively parallel rodents' (MPRs), where each processing unit could be replaced by a rat! Each rat receives a pinch on its tail (an input) and either freezes or lunges at the tail of the rat in the next layer (outputs 0 or 1).

Performance feedback of the network of MPRs is a conditioning schedule of the Skinnerian variety, where the reinforcement (error signal) is specific to each rat. If Lachter and Bever are correct then neural networks might be, at best, limited to describing forms of animal learning rather than human linguistic mechanisms (and indeed, there are several connectionist models of classical conditioning).
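The picture of a unit that outputs 0 or 1 and is corrected by its own error signal can be sketched as a single threshold unit trained with the classic perceptron rule. This is an illustrative assumption, not the learning procedure of any particular model discussed in this chapter; the function names and the choice of the logical AND task are hypothetical.

```python
def unit_output(weights, bias, inputs):
    """Output 1 ('lunge') if the summed input exceeds zero, else 0 ('freeze')."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0 else 0

def train_unit(examples, epochs=20, rate=1.0):
    """Adjust the weights after each example in proportion to the error."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # The error signal is specific to this unit: target minus output.
            error = target - unit_output(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Stimulus-response pairs for logical AND: respond only when both inputs are on.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_unit(AND_EXAMPLES)
```

After training, the unit gives the correct response to all four stimuli, although nothing resembling an explicit rule for AND is stored anywhere in it.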

Behind Lachter and Bever's critique of connectionist networks might be a concern that connectionism signifies a return to behaviourism. However, such a view is without foundation. According to behaviourist philosophy, internal representations were inferred and not directly observable; they were therefore unacceptable as phenomena of study.

Only the response (R) to the stimulus (S) was observable and hence worthy of study. Connectionist networks are intrinsically concerned with internal representations and cannot, therefore, be compatible with behaviourism (see also Pinker and Prince, 1988).

Connectionism is clearly a lively area of research and is having ever more impact in psychology, and theoretical psychology is the major beneficiary. Connectionist networks help us to better model aspects of human behaviour. These models provide detailed accounts of specific behaviours and may be used to test other hypotheses. In addition, they inspire experiments that might not otherwise have been thought of and the results of which could enhance ideas in applied areas. In short, ‘The neural network of the future can be to the psychologist what a row of test tubes is to the chemist: a test-bed for new conjectures' (Aleksander and Burnett, 1987). Another advantage of developing connectionist models is that important links are made between aspects of behaviour, cognition and neurophysiology. The bridging of these areas is, perhaps, something long overdue. Until recently, cognitive psychologists had been ignoring the work in neuropsychology. Clearly, there is much to gain from this kind of enterprise.

Section 3

Thinking machines

If we take the computer metaphor to its extreme, or consider the computer not as a metaphor but as closely corresponding to human mental processes, then the question that arises is whether machines can think. The answer to this question may have major implications for the validity of the approach. Here you will study the views of the most outspoken supporters and critics of the view that machines can think or will be able to think at some point in the future.

Many cognitive researchers in the 1960s were of the view that such machines would be developed in the near future. Despite several major advances in the processing power of the modern computer, the desktop PC sitting in front of me is no closer to being able to think than the radio sitting next to it. Yet could a combination of even better computing power and some sophisticated programming yield a thinking machine? For example, what about Deep Blue, the chess program that beat the world champion Kasparov in 1997 – can it be said to think?

The Turing test

If a researcher in AI claimed to have created a thinking machine, how could we put that claim to the test? Alan Turing, a British mathematician, devised a test for answering this question (Turing, 1950). The test, which later became known as the Turing test, is designed to discover whether a human observer can tell whether the person or thing they are interacting with is a human or a computer.

The test is set up as a game, which Turing called the imitation game. There are three players: a person, a machine and an interrogator. The person and the machine are hidden from the view of the interrogator, and all communication between players is typewritten. The aim of the interrogator is to identify which player is the machine and which player is the person. The aim of the machine is to convince the interrogator that it is the person and that the person is a machine, and it is the aim of the person to convince the interrogator that the machine is a machine and he or she is the person.

The point of the exercise was to show that in order to make the claim that a machine can think, it must be able to understand and must be able to generate germane responses that could fool a person into thinking it was human. But if a computer were able to perform in such a way that an expert could not discriminate between its performance and that of a human, to what extent does this mean that the machine can think?

The ‘strong AI' view is that if a machine passed the Turing test it ‘would not merely be a model of the mind; it would literally be a mind, in the same sense that a human mind is a mind' (Searle, 1990, p. 26). The ‘weak AI' view, which Searle (1980, 1990) adheres to, is that such a machine could not be considered as having a mind. He argues that it is more realistic to ‘think of computer models as being useful in studying the mind in the same way that they are useful in studying the weather, economics, or molecular biology' (Searle, 1990, p. 26). Searle takes us through the ‘Chinese room thought experiment' to explain how he arrives at the position that conventional computing machines cannot, and will never be able to, think.

The Chinese room

Searle (1980) asks us to consider a language that we do not understand. In his case (and in mine) it is Chinese. To Searle (and to me), Chinese writing looks like ‘meaningless squiggles'. He then asks us to imagine that we are placed in a room containing hundreds of Chinese symbols and a rule book in English for matching Chinese symbols with other Chinese symbols. People outside the room do understand Chinese and these people submit Chinese symbols into the room and wait for us to hand them back another Chinese symbol. Our task is to use the rule book to identify which symbol to return based on the symbol we have been given.

This scenario is Searle's analogy of the modern computer. The people outside the room are the programmers, the person inside the room is the computer, the rule book is the computer program, the Chinese symbols posted into the room are queries, and the symbols returned are the answers. Suppose that the symbols posted to the room represent the question ‘What is your favourite colour?' and the symbols handed back represent the statement ‘My favourite colour is blue but I also like green a lot.' To the person outside of the room it appears as though the person inside the room has understood the question, yet to the person inside the room the symbols are meaningless. Remember in our earlier discussion about algorithms we noted that the computer follows a strict set of rules when doing something. It matters not whether the task is to search a database for a particular author or to carry out a t-test in a statistics package.
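The rule-following at the heart of the analogy can be sketched as a simple lookup table. The symbol strings below are made-up placeholders standing in for Chinese characters, and the names `RULE_BOOK` and `chinese_room` are hypothetical; the sketch illustrates the thought experiment and is not a model of any real dialogue program.

```python
# The rule book pairs incoming symbol strings with symbol strings to hand back.
# The program never refers to what any of the symbols mean.
RULE_BOOK = {
    "squiggle-17 squiggle-4 squiggle-9": "squoggle-2 squoggle-31 squoggle-8",
    "squiggle-3 squiggle-22": "squoggle-5",
}

# Symbol handed back when the rule book has no matching entry (an assumption).
NO_MATCH = "squoggle-0"

def chinese_room(symbols):
    """Return whatever the rule book pairs with the incoming symbols."""
    return RULE_BOOK.get(symbols, NO_MATCH)

reply = chinese_room("squiggle-17 squiggle-4 squiggle-9")
```

To someone outside who can read the symbols, `reply` may look like a sensible answer, yet the function has done nothing more than match one string against another.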

The computer does not process any meaning: it just follows a set of instructions. Just as the person in the Chinese room does not understand Chinese, so the computer does not understand the information it is processing. Searle argues, then, that computers are essentially syntactic (rule-based systems) but human minds deal with meaning. Since syntax alone cannot produce meaning (see Chapter 6 on language) then computer programs cannot create minds. Searle (1990) concludes with an amusing but salient point: ‘one can imagine a computer simulation of the oxidation of hydrocarbons in a car engine or the action of digestive processes in a stomach when it is digesting pizza. And the simulation is no more the real thing in the case of the brain than it is in the case of the car or the stomach. Barring miracles, you could not run your car by doing a computer simulation of the oxidation of gasoline, and you could not digest pizza by running the program that simulates such digestion. It seems obvious that a simulation of cognition will similarly not produce the effects of the neurobiology of cognition' (p. 29).

 

Simulating angst: I am suffering the angst of post industrial society under late capitalism

 

For Searle, one could write a computer program to print out sentences such as ‘I am suffering the angst of postindustrial society under late capitalism', thereby simulating the feeling of angst, but simulation is not the same thing as duplication. AI programs have been successful at solving complex mathematical problems, solving tactical problems, playing chess, proving theorems and engaging in some forms of dialogue. Yet what computers lack is the vast store of background knowledge that people bring to problems as well as their commonsense ability to use their knowledge as the situation demands (Dreyfus, 1972). Another problem with computers of the 1970s and 1980s was that computer systems designed to process visual information took an enormous amount of time to recognise simple objects, a feat that takes biological visual systems just seconds. Yet the processors of the computer are about a million times faster than those of the brain.

 

If this topic is covered in your course then try to get hold of a copy of Searle's original article – it is a very accessible read.

Churchland and Churchland (1990) argue that machines may be able to think, and that computers based on connectionist networks are more likely to achieve this than the conventional (rule-based) digital computer.

They argue that the brain has the ability to think because of the following:

•  The nervous system processes in parallel rather than in serial. For example, the retina sends information to the brain in the form of about a million distinct signals. The largest number of parallel signals the computer processes at any one time is a meagre 16 or 32, depending upon whether it is a 16- or 32-bit machine.

•  Neural processes are dynamic. Neurons that project to another area of the brain receive signals back from the same area. These ‘recurrent' signals allow the system to modulate its own sensory processing.

 

Connectionist networks share more of these features with the brain than does the modern computer. Furthermore, connectionist systems are not rule governed and do not process symbols. The Churchlands argue, therefore, that Searle's Chinese room does not apply to computers based on connectionist principles.

The Chinese gym

Searle's reply to the Churchlands' argument was to suppose the Chinese room was replaced by a Chinese gym in which there are lots of people all arranged to work on the symbols in parallel (thereby mimicking a connectionist network). The people in the gym could still send back messages that have meaning to the people outside the gym, yet to those inside the symbols are still meaningless. The Churchlands reply that the same is true of the brain – no single neuron in a Chinese speaker's brain understands Chinese, although their whole brain does. To the Churchlands the brain is a computer but in a radically different style, and it is still not known how the brain deals with meaning. They conclude that there is no principled reason why science could not construct an artificial system capable of thought by using what is known about the nervous system.

Typical Exam Questions

1. Evaluate the traditional computer metaphor of cognition.

2. What is a connectionist network?

3. Can a machine think?

Section 5

Further reading

The original collection of papers that inspired many psychologists to become interested in connectionism is quite an accessible read:

McClelland, J. and Rumelhart, D. E. (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 2. Psychological and Biological Models. Cambridge, MA: MIT Press.

See also:

Churchland, P. S. and Churchland, P. M. (1990) ‘Could a machine think? Recent arguments and new prospects', Scientific American, 262, 32–7.

Ellis, R. and Humphreys, G. W. (1999) Connectionist Psychology: A Text with Readings. Hove: Psychology Press.

Searle, J. R. (1990) ‘Is the brain's mind a computer program?', Scientific American, 262(1), 26–37.


This book was first published in 2003 by Crucial, a division of Learning Matters Ltd [ISBN 1 903337 13 5] © 2003 Eamon Fulcher; © 2009 GEFT Consultancy Services (geft.co.uk).

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior permission in writing from Geft Consultancy Services, who may be contacted via www.geft.co.uk.