High-Level Vision (Biological)
1. Give an example of a superordinate, a basic, and a subordinate category in the domain of (a) food, (b) furniture.
(a) superordinate = fruit, basic = apple, subordinate = Golden Delicious
(b) superordinate = furniture, basic = chair, subordinate = armchair
2. What is the difference between the “viewer-centred” and “object-centred” approach to object recognition?
In the viewer-centred approach, the 3D object is modelled as a set of 2D images, showing different views of the object.
In the object-centred approach, a single 3D model is used to describe the object.
3. What is a geon? What is a structural description?
Geons are simple three-dimensional shapes such as spheres, cubes, cylinders, cones or wedges.
A structural description is a representation of an object in terms of its component geons and their relative locations and sizes.
4. An object recognition system encodes objects using 2-element feature vectors. Four objects from two classes are encoded as follows:
Object   Class   Feature Vector
1        A       (7,7)
2        A       (7,4)
3        B       (3,4)
4        B       (1,4)
A new object, of unknown class, has a feature vector (3,7). Determine the classification of the new object using a (1) nearest mean classifier, (2) nearest neighbour classifier, (3) k-nearest neighbour classifier, with k=3. Use the Euclidean distance as the similarity measure.
(1) nearest mean classifier
Prototype of class A = $\left(\frac{7+7}{2}, \frac{7+4}{2}\right) = (7, 5.5)$
Prototype of class B = $\left(\frac{3+1}{2}, \frac{4+4}{2}\right) = (2, 4)$
Distance of new object from prototypes:
From prototype of class A: $\sqrt{(7-3)^2 + (5.5-7)^2} = \sqrt{18.25} \approx 4.27$
From prototype of class B: $\sqrt{(2-3)^2 + (4-7)^2} = \sqrt{10} \approx 3.16$
Hence, the new object is class B.
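The nearest mean calculation above can be checked with a short Python sketch. The function and variable names (`class_prototypes`, `nearest_mean`, `exemplars`) are illustrative choices, not part of the question; only the four feature vectors, the class labels and the use of Euclidean distance come from the problem statement.

```python
import math

# Training exemplars from the question: (feature vector, class label).
exemplars = [((7, 7), "A"), ((7, 4), "A"), ((3, 4), "B"), ((1, 4), "B")]

def class_prototypes(data):
    """Return the mean feature vector (prototype) of each class."""
    grouped = {}
    for vector, label in data:
        grouped.setdefault(label, []).append(vector)
    return {
        label: tuple(sum(component) / len(vectors) for component in zip(*vectors))
        for label, vectors in grouped.items()
    }

def nearest_mean(data, query):
    """Classify query by the closest class prototype (Euclidean distance)."""
    prototypes = class_prototypes(data)
    distances = {label: math.dist(proto, query) for label, proto in prototypes.items()}
    best_label = min(distances, key=distances.get)
    return best_label, distances

label, distances = nearest_mean(exemplars, (3, 7))
print(label, {k: round(v, 2) for k, v in distances.items()})
```

Note that `math.dist` requires Python 3.8 or later; on older versions the distance could be computed with `math.sqrt` over the squared component differences.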
(2) nearest neighbour classifier
Distance of new object from exemplars:
Object   Class   Feature Vector   Distance to (3,7)
1        A       (7,7)            $\sqrt{(7-3)^2 + (7-7)^2} = 4$
2        A       (7,4)            $\sqrt{(7-3)^2 + (4-7)^2} = 5$
3        B       (3,4)            $\sqrt{(3-3)^2 + (4-7)^2} = 3$
4        B       (1,4)            $\sqrt{(1-3)^2 + (4-7)^2} \approx 3.6$
The closest exemplar is object 3.
Since object 3 is of class B, the new object is also class B.
(3) 3-nearest neighbour classifier
The three closest exemplars are 3, 4 and 1.
Object 3 is class B.
Object 4 is class B.
Object 1 is class A.
The majority are class B, so the new object is classified as B.
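Both the nearest neighbour and the 3-nearest neighbour results can be reproduced with one small function, since nearest neighbour is just the k=1 case. This is a minimal sketch: the name `knn_classify` is hypothetical, and ties in the vote are not handled specially (none occur for this data).

```python
import math
from collections import Counter

# Training exemplars from the question: (feature vector, class label).
exemplars = [((7, 7), "A"), ((7, 4), "A"), ((3, 4), "B"), ((1, 4), "B")]

def knn_classify(data, query, k):
    """Majority vote among the k exemplars closest to query (Euclidean distance)."""
    ranked = sorted(data, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

nn_label = knn_classify(exemplars, (3, 7), k=1)    # nearest neighbour
knn_label = knn_classify(exemplars, (3, 7), k=3)   # 3-nearest neighbours
print(nn_label, knn_label)
```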
5. How do the following properties of cortical neurons change when moving along the ventral pathway from more peripheral areas to higher areas: (1) receptive field size, (2) sensitivity to stimulus location, (3) complexity of preferred stimulus?
(1) increases
(2) decreases
(3) increases
6. Describe the inputs to (a) a simple cell, and (b) a complex cell and explain how these inputs give rise to the observed response properties.
Simple cell: input is from a number of centre-surround cells whose RFs lie on a common line. These centre-surround neurons are all activated by a bar/edge at the correct orientation, so the simple cell responds to an oriented bar/edge at a specific orientation.
Complex cell: input is from a number of simple cells with the same orientation preference within a small spatial region. A bar/edge at the correct orientation and location to activate any one of these simple cells will make the complex cell respond; hence, the complex cell responds to oriented edges with some tolerance to exact location.
7. Describe the mathematical operation that is performed by neurons in the HMAX model in the (1) simple cells, (2) complex cells.
(1) simple cells output the SUM of their inputs.
(2) complex cells output the MAXIMUM value of their inputs.
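The effect of combining a sum stage with a max stage can be illustrated with a toy one-dimensional example. This is only a sketch under stated assumptions, not the HMAX implementation: the weights in `template`, the two test signals and all function names are hypothetical, and the answer above states only that simple cells sum their inputs and complex cells take the maximum.

```python
def simple_cell(inputs, weights):
    """Simple-cell-style unit: (weighted) sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def complex_cell(simple_responses):
    """Complex-cell-style unit: maximum over its inputs."""
    return max(simple_responses)

# Assumed template: responds to a step from bright (1) to dark (0).
template = [1.0, -1.0]

def simple_layer(signal):
    """Apply the same simple cell at every position of a 1D signal."""
    return [simple_cell(signal[i:i + 2], template) for i in range(len(signal) - 1)]

edge_left = [1, 0, 0, 0]    # edge near the left of the signal
edge_right = [1, 1, 1, 0]   # same edge shifted to the right

s_left = simple_layer(edge_left)     # simple responses depend on position
s_right = simple_layer(edge_right)
c_left = complex_cell(s_left)        # complex response does not
c_right = complex_cell(s_right)
print(s_left, s_right, c_left, c_right)
```

The simple-cell responses peak at different positions for the two signals, while the max over them is identical, which is the position tolerance described for complex cells in question 6.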
8. Write down Bayes’ theorem, and explain the interpretation of each term in relation to a computer vision system.
$p(H|E) = \frac{p(E|H)\,p(H)}{p(E)}$
p(H|E) is the (posterior) probability that hypothesis H is true, given the image evidence E. This is what the vision system needs to evaluate (generally, we want to find the most likely hypothesis that explains the image data.)
p(E|H) is the likelihood that if hypothesis H were true, the image would contain particular evidence E. (Calculating this quantity is based upon our scientific knowledge of the image formation process, e.g. that a certain set of surface properties and illumination conditions would result in a certain image being formed.)
p(H) is the prior probability of the hypothesis, i.e. our assumption about how likely H is before any evidence is observed. If H is extremely improbable, then stronger evidence is required to support it.
p(E) is the probability that the evidence E would be found in images anyway, regardless of whether or not hypothesis H is true. Thus if p(E) is large (e.g. images contain some bright regions no matter what), then this reduces our confidence in inferring any particular hypothesis H as a result of observing E.
9. A production line produces two objects (objA and objB) which are sorted into separate bins using a computer vision system controlling a robot arm. The two objects have distinct shapes from most viewpoints. However, if objA happens to lie at a particular orientation (oriA), and objB lies at oriB, then the images of the two objects are indistinguishable.
It is known that the production line produces three times as many of objA as of objB. It is also known that the probability of objA lying at oriA is 0.1, while the probability of objB lying at oriB is 0.2.
Using Bayes’ theorem, determine into which bin the robot should sort an object which could be either objA at oriA or objB at oriB in order to minimise the number of errors.
p(objA)=0.75
p(objB)=0.25
p(I|objA)=0.1
p(I|objB)=0.2
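$p(\mathrm{objA}|I) = \frac{p(I|\mathrm{objA})\,p(\mathrm{objA})}{p(I)} = k(0.1 \times 0.75) = 0.075k$
$p(\mathrm{objB}|I) = \frac{p(I|\mathrm{objB})\,p(\mathrm{objB})}{p(I)} = k(0.2 \times 0.25) = 0.05k$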
Hence, indistinguishable images are most likely to contain object A, so the robot should sort such objects into the objA bin.
Note that $k = \frac{1}{p(I)}$ is the same for both possibilities, so its value is not needed to answer this particular question. However, if we needed to calculate the absolute posteriors then we would need to calculate p(I).
$p(I) = p(I|\mathrm{objA})\,p(\mathrm{objA}) + p(I|\mathrm{objB})\,p(\mathrm{objB}) = (0.1 \times 0.75) + (0.2 \times 0.25) = 0.125$
So
$p(\mathrm{objA}|I) = \frac{0.075}{0.125} = 0.6$
$p(\mathrm{objB}|I) = \frac{0.05}{0.125} = 0.4$
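The posterior calculation for the ambiguous image can be checked numerically. The sketch below assumes, as in the worked answer, that objA and objB are the only two hypotheses; the function name `posteriors` and the dictionary layout are illustrative choices.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' theorem over a set of mutually exclusive hypotheses.

    Returns the posterior of each hypothesis and the evidence p(I).
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())  # p(I), the normalising constant
    return {h: joint[h] / evidence for h in joint}, evidence

priors = {"objA": 0.75, "objB": 0.25}      # three times as many objA as objB
likelihoods = {"objA": 0.1, "objB": 0.2}   # p(I|objA), p(I|objB)

post, p_I = posteriors(priors, likelihoods)
best_bin = max(post, key=post.get)         # bin that minimises errors
print(best_bin, {h: round(p, 3) for h, p in post.items()}, round(p_I, 3))
```

Choosing the hypothesis with the largest posterior is what minimises the expected number of sorting errors, which is why only the relative sizes of the two products matter for the decision itself.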