
This Week
• Physics of image formation (light, reflectance, optics)
• Geometry of image formation (camera models, projective geometry)
• Digital images (digitisation, representation)
• The Eye
Part a
Physics of Image Formation (Part 1)
7CCSMCVI / 6CCS3COV: Computer Vision
Image Formation

Overview of image formation
Light from a radiation source is reflected by surfaces in the world.
Reflected light passes through some optics and is recorded by a sensor.
[Figure: a scene point P is illuminated by a LIGHT SOURCE (Illuminance, E); light reflected from the SURFACE (Luminance, L) passes through the OPTICS and is recorded at P’ on the SENSOR]
Ingredients of image formation
The image that is formed is affected by two sets of parameters:
Radiometric parameters
Determine the intensity/colour of a given location in the image
• illumination (type, number, location, intensity, colour-spectrum)
• surface reflectance properties (material, orientation)
• sensor properties (sensitivity to different electromagnetic frequencies)
Geometric parameters
Determine where on the image a scene point appears
• camera position and orientation in space
• camera optics (e.g. focal length)
• projection geometry (mapping from 3D to 2D)
Images are formed when a SENSOR registers RADIATION that has interacted with PHYSICAL OBJECTS
Contents
Physics of image formation
• light
• reflectance
• optics

Colour perception
• The radiation that drives the human construct of colour is fundamentally colourless.
• The sensation of colour is determined by the human visual system, based on the product of light and reflectance.
• However, it is a convenient short-hand to refer to electromagnetic radiation as having colour, e.g. to say “red light” or “blue light”.
Illuminance
Light is produced in different amounts at different wavelengths by each light source.
[Figure: emission spectra of the morning sun, the afternoon sun, and a fluorescent light across 400–700nm]
Light and Colour
• At the earth’s surface the intensity of the electromagnetic radiation emanating from the sun has a peak within the 400-700nm range.
• The human eye has evolved a specific sensitivity to this part of the electromagnetic spectrum.
• Hence, visible light is that part of the electromagnetic spectrum with a wavelength (λ) between 400 and 700nm.
• Cameras and Computer Vision systems also concentrate on this part of the spectrum (but not exclusively, e.g. infra-red cameras for detecting body heat, X-rays for imaging inside the body).

Colour mixing
Mixing light: additive
e.g. a green light plus a blue light plus a red light gives light containing a broad spectrum of wavelengths, i.e. white.
The illumination from different light sources adds.
Mixing pigments: subtractive
e.g. a green pigment plus a blue pigment plus a red pigment gives a pigment that absorbs light over a broad spectrum, leaving black.
The reflection from different surfaces subtracts.
Measuring surface properties
The biological vision system perceives the colour of surfaces in the world, not the colour of the light entering the eyes.
e.g.:
[Figure: one surface looks green-blue, another looks yellow-orange]
Luminance
Light is differentially reflected at each wavelength, which gives objects their natural colours.
albedo = fraction of light reflected at a particular wavelength
An object looks green because it absorbs red and blue light, leaving more green in the reflected light.

Measuring surface properties
The biological vision system perceives the colour of surfaces in the world, not the colour of the light entering the eyes.
This is an ill-posed problem: we record L but need to recover R and don’t know E.
L(x,y,λ) = fn (E(x,y,λ), R(x,y,λ))
where each quantity is an intensity at a particular location (x,y) and wavelength (λ):
Luminance (L), the amount of light striking the sensor, depends on the Illuminance (E), the amount of light striking the surface, as well as the Reflectance (R), which depends on material properties.
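The slides only say that L is some function of E and R. Assuming, purely for illustration, a simple multiplicative model L = E × R at each wavelength, the short Python/NumPy sketch below (not part of the original slides) shows why the problem is ill-posed: two different illuminant/reflectance pairs can produce exactly the same recorded luminance.

```python
import numpy as np

# Wavelength samples (nm) and an assumed multiplicative model L = E * R.
wavelengths = np.array([450, 550, 650])

# Scene 1: one illuminant and surface reflectance.
E1 = np.array([1.0, 0.8, 0.6])
R1 = np.array([0.2, 0.5, 0.8])

# Scene 2: a different illuminant and a different surface...
E2 = np.array([0.5, 0.8, 1.2])
R2 = np.array([0.4, 0.5, 0.4])

L1 = E1 * R1   # [0.2, 0.4, 0.48]
L2 = E2 * R2   # [0.2, 0.4, 0.48]

# ...yet the sensor records identical luminance spectra:
print(np.allclose(L1, L2))  # True: R cannot be recovered from L without knowing E
```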
Measuring surface properties
If the colour/intensity of the light source changes, the colour/intensity of the reflected light also changes (i.e. L varies with E).
If the colour/reflectance of the surface changes, the colour/intensity of the reflected light also changes (i.e. L varies with R).
Measuring surface properties
The biological vision system perceives the colour of surfaces in the world, not the colour of the light entering the eyes.
e.g.:
Light entering eyes is green-yellow

Colour constancy: biological
However, the human visual system does seem able to recover surface colour, since despite large changes in illumination (and consequently the intensity spectrum that enters our eyes), we usually experience the colour of an object as being constant.
[Figure: the same scene under artificial light, hazy daylight, and clear blue sky]
We are not normally aware of this variation because colour constancy mechanisms discount the effects of illumination, and infer the colour of the objects.
Part b
Physics of Image Formation (Part 2)
Colour constancy: artificial
To recover the surface colour of a particular location, R(x,y,λ), we need to know the colour of the illumination at that point, E(x,y,λ).
Many ways of approximating E have been suggested:
– Average reflectance across scene is known (often fails)
– Fixing brightest image patch to be white
– Gamut (collection of all colours) falls within known range
– Known reference colour (colour chart, skin colour…)
– Specular reflections have the colour of the illumination
None of these works particularly well.
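As an illustration (not part of the slides), here is a minimal Python/NumPy sketch of the first two heuristics: the grey-world assumption (average reflectance across the scene is grey) and the white-patch assumption (the brightest image patch is white). The function names and example values are invented for this sketch.

```python
import numpy as np

def grey_world(img):
    """Estimate the illuminant E as the per-channel mean, assuming the
    average reflectance in the scene is grey, then divide it out."""
    E = img.reshape(-1, 3).mean(axis=0)          # estimated illuminant colour
    return img / (E + 1e-8) * E.mean()           # remove colour cast, keep brightness

def white_patch(img):
    """Estimate E from the brightest value in each channel, assuming the
    brightest patch in the image is white."""
    E = img.reshape(-1, 3).max(axis=0)
    return img / (E + 1e-8)

# Example: a random "scene" viewed under a reddish illuminant.
rng = np.random.default_rng(0)
reflectance = rng.uniform(0, 1, size=(64, 64, 3))
illuminant = np.array([1.0, 0.7, 0.5])
observed = reflectance * illuminant

corrected = grey_world(observed)
print(observed.reshape(-1, 3).mean(axis=0))   # channel means biased towards red
print(corrected.reshape(-1, 3).mean(axis=0))  # roughly equal channel means again
```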

Focusing Light
Light spreads out from a point.
Without some kind of optics each location on the sensor will register light coming from many different points in the world.
No image will be formed.
Pinhole camera
Restricts the flow of light (using a small hole) so that only one ray from each point in the world reaches the sensor.
Note: the image is inverted.
“Focus” means that all rays coming from a scene point converge into a single image point.
“Exposure” is the time needed to allow enough light through to form an image (the smaller the aperture, the longer the exposure time). The longer the exposure, the more blurred an image is likely to be.
Contents
Physics of image formation
• light
• reflectance
• optics

Lensed camera
• With a large pinhole, light rays from the same point project to different locations on the image. The image is blurred.
By focusing rays from the same point onto a single image location, a lens can keep the image sharp while gathering more light.
Cost: image focused for only a restricted range of object positions
Thin lenses
A Lens works by refracting light.
optical axis
F’ = focal point, f = focal length, O=optical centre
• Rays passing through O are not refracted
• Rays parallel to the optical axis are refracted to pass through F’.
• Rays passing through point F are refracted to be parallel to the optic axis.
f depends on the curvature of the lens and the material it is made from.
Pinhole camera
Restricts the flow of light (using a small hole) so that only one ray from each point in the world reaches the sensor.
Note: image inverted
Small pinhole: sharp focus but dim image (long exposure time).
Large pinhole: brighter image (shorter exposure) but blurred.
To produce an image that is both bright and in focus requires a lens

Depth of focus
[Figure: three thin-lens diagrams showing the image depth |z’| (between F’ and 2F’) for objects at different depths |z| relative to F and 2F]
1/f = 1/|z| + 1/|z’|
For a lens with a fixed focal length, the depth of the image plane required to bring an object into focus varies inversely with the depth of the object.
At the extremes: if the object is at infinity the image is formed at F’, if the object is at F (or closer) no image can be formed.
For a short focal length camera, almost all objects will be located more than 2 focal lengths from the lens (top figure)
Placing the receptor plane at a fixed depth in the range z’ ∈ [F’, 2F’] will provide an acceptable image for objects in a wide range of depths greater than 2F.
Focal range
[Figure: an object appears to be in focus at two different positions; the range of object positions for which this holds is the focal range]
Focal range is defined by the range of object locations such that blurring due to the difference between the receptor plane and the focal plane is less than the resolution of the receptor device.
Decreasing the aperture size increases the focal range, but it decreases the amount of light available to the receptor.
As the aperture decreases to a pinhole we recover the infinite focal range of the pinhole camera!
Thin lenses
An object located at distance z from the lens has an image formed at depth z’ from the lens according to the “thin lens equation”:
1/f = 1/|z| + 1/|z’|
Hence, depth of the image depends on the depth of the object as well as the focal length of the lens.
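A small Python sketch of the thin lens equation (illustrative only; the focal length and object depths below are example values), solving for the image depth |z’| given f and |z|:

```python
def image_depth(f, z):
    """Thin lens equation 1/f = 1/|z| + 1/|z'|, solved for |z'|.
    f and z are in the same units (e.g. mm); z must be greater than f,
    otherwise no real image is formed."""
    if z <= f:
        raise ValueError("object at or inside the focal point: no image formed")
    return 1.0 / (1.0 / f - 1.0 / z)

# Examples (focal length 50 mm):
print(image_depth(50, 10_000))  # distant object -> image just beyond F' (~50.25 mm)
print(image_depth(50, 100))     # object at 2F   -> image at 2F' (100 mm)
```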

Contents
Geometry of image formation
• perspective camera model
• projective geometry
• intrinsic and extrinsic parameters
Geometric camera models
Given the coordinates of a point in the scene, what are the coordinates of this point in the image?
i.e. given P = (x,y,z) how do we calculate P’=(x’,y’,z’)?
To answer, we need a mathematical model of the geometric projection implemented by the camera, often called simply a camera model.
Part c
Geometry of Image Formation (Part 1)

Perspective camera model
P: a scene point with coordinates (x,y,z); P’: its image with coordinates (x’,y’,z’); O: origin (pinhole / centre of lens)
The image plane Π’ is located at a distance f’ from the pinhole along the vector k. The optical axis is the line perpendicular to the image plane and passing through O.
C’: image centre (intersection of optical axis and image plane)
Equation of projection (2D)
From similar triangles: x’/x = f’/z
Hence: x’ = (f’/z) x
Perspective (or pinhole) camera model
A lens follows the pinhole model for objects that are in focus.
The pinhole camera is therefore an acceptable mathematical approximation (i.e. a model) of image formation in a real camera.
For a pinhole camera everything is in focus regardless of image plane depth. The pinhole model arbitrarily assumes that the image plane is at distance f’.

Virtual image
A pinhole camera creates inverted images.
It is traditional to draw the image plane in front of the pinhole, at the same distance from it as the actual image plane.
The resulting “virtual” image is identical to the real image except that it is the right way up.
This does not change the mathematics of the perspective camera model.
Equation of projection (3D)
x’ = f’ x / z
Similarly, for the y coordinate: y’ = f’ y / z
Since P’ lies on the image plane: z’ = f’
All coordinates are relative to the camera reference frame [mm], i.e. axes rigidly attached to the camera with origin at the pinhole.
We can re-write these equations in matrix form; it is a convention to use homogeneous coordinates:
\[ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \frac{f'}{z} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \]
where the 3×4 matrix is the projection operator.
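The projection equations can be sketched in a few lines of Python/NumPy (an illustration, not part of the slides); the function below simply applies the homogeneous-coordinate projection operator shown above:

```python
import numpy as np

def project(P, f_prime):
    """Perspective (pinhole) projection of a 3D point P = (x, y, z),
    given in camera coordinates, onto the image plane at depth f'."""
    x, y, z = P
    # Projection operator applied to homogeneous coordinates (x, y, z, 1):
    M = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    return (f_prime / z) * (M @ np.array([x, y, z, 1.0]))  # (x', y', z') with z' = f'

print(project((2.0, 1.0, 10.0), f_prime=0.05))  # -> [0.01, 0.005, 0.05]
```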

Distant objects appear smaller
The apparent size of an object depends on its distance
In world:
A and C have equal length, B twice this length
In image:
B’ and C’ have equal length, A’ half this length
Vanishing points
[Figure: two views of a virtual image plane and camera centre, with lines on the ground plane converging to a vanishing point]
Vanishing point = projection of a point at infinity.
Parallel lines have the same vanishing point.
Projective geometry
Euclidean geometry describes objects “as they are”.
It describes transformations within a 3D world (i.e. translations and
rotations)
These mappings do not change the shape of an object (i.e. lengths, angles, and parallelism are preserved).
Projective geometry describes objects “as they appear”.
It describes the transformation from the 3D world to a 2D image (i.e.
scaling and shear in addition to translations and rotations)
This mapping does distort the shape of an object (i.e. lengths, angles, and parallelism are not preserved).
Some properties of (i.e. distortions caused by) projective geometry are described on the following slides…

Vanishing points provide cues to size
Our visual system is good at detecting vanishing points and using this to extract information about depth and object size. Can be used in computer vision too.
Example of projective distortion
Reality is distorted by projection.
In world:
• rail tracks parallel
• ties equal in length
• ties perpendicular to tracks
• ties evenly spaced
In image:
• rail tracks converge at a vanishing point
• ties get shorter with distance
• ties not at right angles to tracks
• ties get closer together at longer distances

Contents
Geometry of image formation
• perspective camera model
• projective geometry
• intrinsic and extrinsic parameters
Mapping from world to image coordinates
Where a point in the 3D world appears on an image depends not just on the camera model, but also on:
• Intrinsic parameters
• Depend on properties of camera
– focal length, sensor dimensions/resolution
• Extrinsic parameters
• Depend on the camera location
– translation and rotation of camera relative to world
Part d
Geometry of Image Formation (Part 2)

Relating pixels to camera coordinates
The mapping from camera coordinates to pixel image coordinates (derived below) can be written in matrix form:
\[ \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} \alpha & 0 & o_x \\ 0 & \beta & o_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \]
(intrinsic camera parameters × projection operator)
This is only an approximation, due to each individual camera having small manufacturing errors (i.e. the image plane may not be perfectly perpendicular to the optical axis, or may be rotated slightly about the optical axis (‘skew’)).
Relating pixels to world coordinates
So far it has been assumed that the location of the scene point P=(x,y,z)T is given in camera coordinates.
However, more generally scene points may be known relative to some external reference frame.
An external reference frame is especially useful when:
• the camera is moving
• multiple cameras are used (e.g. for stereopsis)
To convert coordinates from one reference frame to another requires a translation and a rotation.
Relating pixels to camera coordinates
To convert from the camera reference frame [mm] to the image reference frame [pixel], we need to take account of:
• Origin of image (in corner, not centre)
• Pixel size (sx, sy [mm/pixel])
u = −x’/sx + ox = −(f’/sx)(x/z) + ox = α x/z + ox
v = −y’/sy + oy = −(f’/sy)(y/z) + oy = β y/z + oy
Where:
• u, v are integer image coordinates [pixel]
• α, β are magnification factors in the x’ and y’ directions (α = −f’/sx, β = −f’/sy)
• C’ = (ox, oy)T is the image centre / principal point
ox, oy, α, β are the 4 intrinsic camera parameters; α/β = aspect ratio of the camera.
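A short sketch of this conversion in Python (illustrative only: the focal length, pixel size and principal point values below are made up, not taken from the slides):

```python
import numpy as np

def camera_to_pixel(P, f_prime, sx, sy, ox, oy):
    """Map a scene point P = (x, y, z) in camera coordinates [mm]
    to integer pixel coordinates (u, v) using the intrinsic parameters."""
    x, y, z = P
    alpha = -f_prime / sx          # magnification in the x' direction
    beta = -f_prime / sy           # magnification in the y' direction
    u = alpha * x / z + ox
    v = beta * y / z + oy
    return int(round(u)), int(round(v))

# Hypothetical camera: f' = 8 mm, 0.01 mm pixels,
# principal point at the centre of a 640x480 image.
print(camera_to_pixel((100.0, 50.0, 2000.0), f_prime=8.0,
                      sx=0.01, sy=0.01, ox=320, oy=240))   # -> (280, 220)
```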

Relating pixels to world coordinates
In matrix form, using homogeneous coordinates, the transformation from camera coordinates (x, y, z) to world coordinates (X, Y, Z) is:
\[ \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} R & t \\ 0^{T} & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \]
Where:
• R is a 3×3 rotation matrix
• t is a 3×1 translation vector
• 0 is a 3×1 vector of zeros
Relating pixels to world coordinates
To go from the coordinates of a point in the world reference frame to the camera reference frame we require the inverse mapping:
\[ \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} R^{T} & -R^{T}t \\ 0^{T} & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \]
Relating pixels to world coordinates
P = t + R p
Where:
• P: coordinates of the point in the world reference frame OXYZ
• p: coordinates of the point in the camera reference frame Oxyz
• t: coordinates of the origin of frame Oxyz (given in frame OXYZ)
• R: rotation matrix describing the orientation of frame Oxyz with respect to frame OXYZ
R and t are the rotation matrix and translation vector going from OXYZ to Oxyz.

Mapping between 2D and 3D
Given a point in 3D space we can model where that point will appear in a 2D image.
» A well-posed, forward problem
However, given a point in a 2D image we cannot determine the corresponding point in 3D space (depth information has been lost).
» An ill-posed, inverse problem
To recover depth, we need extra information, e.g.:
• another image (the need for stereo vision), or
• prior knowledge about the structure of the scene.
Part e
Digital Images
Relating pixels to world coordinates
The complete transformation is thus:
\[ \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} \alpha & 0 & o_x \\ 0 & \beta & o_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R^{T} & -R^{T}t \\ 0^{T} & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \]
• intrinsic camera parameters (2D → 2D): maps points on the image plane into pixel image coordinates
• projection operator (3D → 2D): projects points in the camera reference frame onto the image plane
• extrinsic camera parameters (3D → 3D): transforms the world coordinates to the camera reference frame
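The complete world → pixel chain can be sketched directly from the three matrices above. The Python/NumPy example below is illustrative only: the rotation, translation and intrinsic values are made up, and the explicit 1/z factor in the equation becomes the final homogeneous divide.

```python
import numpy as np

def world_to_pixel(P_world, R, t, alpha, beta, ox, oy):
    """Project a world point (X, Y, Z) to pixel coordinates (u, v):
    extrinsics (3D->3D), projection (3D->2D), intrinsics (2D->2D)."""
    # Extrinsic: world -> camera reference frame, p = R^T (P - t)
    extrinsic = np.eye(4)
    extrinsic[:3, :3] = R.T
    extrinsic[:3, 3] = -R.T @ t

    # Projection operator (3D -> 2D, up to the 1/z scale)
    projection = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0],
                           [0, 0, 1, 0]], dtype=float)

    # Intrinsics: image plane -> pixel coordinates
    K = np.array([[alpha, 0, ox],
                  [0, beta, oy],
                  [0, 0, 1]], dtype=float)

    P_h = np.append(np.asarray(P_world, dtype=float), 1.0)   # homogeneous point
    uvw = K @ projection @ extrinsic @ P_h
    return uvw[0] / uvw[2], uvw[1] / uvw[2]                   # divide by z

# Hypothetical example: camera translated along the world Z axis, no rotation.
R = np.eye(3)
t = np.array([0.0, 0.0, -1000.0])
print(world_to_pixel((100.0, 50.0, 1000.0), R, t,
                     alpha=-800.0, beta=-800.0, ox=320, oy=240))  # -> (280.0, 220.0)
```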

Digital image formation
[Figure: LIGHT SOURCE → Illuminance (E) → SURFACE (scene point P) → Luminance (L) → OPTICS → SENSOR (CCD) → DIGITAL IMAGE (point P’), e.g. the 7×7 array of pixel values:]
  0  10  10  15  50  70  80
  0   0 100 120 125 130 130
  0  35 100 150 150  80  50
  0  15  70 100  10  20  20
  0  15  70   0   0   0  15
  5  15  50 120 110 130 110
  5  10  20  50  50  20 250
Digital image representation (greyscale)
A digital image is a 2D array (matrix) of numbers, e.g. for image coordinates x = 58…64 (columns) and y = 41…47 (rows):
   7  10  10  15  50  70  80
   1   0  67 123  25  30 130
   2  35 100 150 150  80  50
   8  15  70 100  10  20  20
  12  15  76   5  17   0  15
   5  15  50 120 110 130 110
   5  10  20  50  50  20 250
The scene, which is a continuous function, is sampled at discrete points (called picture elements, or pixels for short).
Value of each pixel = measure of light intensity at that point.
Typically: 0 = black, 255 = white (integers). Or: 0 = black, 1 = white (floats).
Contents
Digital images
• digitisation
• representation

Pixelisation and Quantization
Pixelisation: intensity values averaged at each sampling point (i.e. within each grid location in the sensor array).
Quantization: intensity values represented using a finite number of discrete values (e.g. rounded to the nearest integer value).
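As an illustration (not from the slides), quantization of an 8-bit greyscale image to a smaller number of discrete levels might be sketched as follows; the function name and example values are invented for this sketch.

```python
import numpy as np

def quantize(img, levels):
    """Quantize a greyscale image with values in [0, 255] to the given
    number of discrete levels, mapping each level back into [0, 255]."""
    img = np.asarray(img, dtype=float)
    step = 256 / levels
    return (np.floor(img / step) * step + step / 2).astype(np.uint8)

example = np.array([[0, 64, 128, 255]])
print(quantize(example, 4))   # 4 levels -> values 32, 96, 160, 224
print(quantize(example, 2))   # 2 levels -> values 64 and 192 (near-binary)
```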
Effects of pixelization
[Figure: the same image sampled at 128×128, 64×64, 32×32, 16×16, 8×8, and 4×4 pixels]
Image axes and notation
A digital image is a matrix of pixel values, I.
A pixel value is I(x,y) or I(p)
• where p = (x,y)T is a point in the image
• Origin is the top left corner
• (1,1) is the usual convention, but (0,0) may be used when more convenient (e.g. when using a programming language that indexes from 0)
Note: in mathematics (and Matlab) I(r,c) would refer to the rth row and the cth column of I, so I is the transpose of a matrix in standard format.

Image formats
Binary or Monochrome:
• 1 binary value per pixel, 1-bit ⇒ 2 discrete levels
• I(p) = 0 or 1
Greyscale or Intensity:
• 1 real value per pixel, typically 8-bit ⇒ 256 discrete levels
• I(p) ∈ [0,1]
Colour:
• 3 real values per pixel, i.e. 3 colour channels, e.g. RGB
• IR(p) ∈ [0,1], IG(p) ∈ [0,1], IB(p) ∈ [0,1]
• Each colour channel is a greyscale image representing the intensity of light at a particular wavelength (R=645.2nm, G=526.3nm, B=444.4nm)
• 24-bit ‘True Color’ ⇒ 8 bits for each colour
Colour image representation (RGB)
IR IG IB
In MATLAB represented as a 3D matrix, so I(x,y,c) is the intensity in the c-channel at location x,y.
Note: many other formats are used to represent colour images.
Effects of quantization
[Figure: the same image quantized to 256, 128, 64, 32, 16, 8, 4, and 2 grey levels (8 bits/pixel down to 1 bit/pixel); with 2 grey levels the result is a binary image]

Part f
Camera Sensors
Contents
Camera Sensors
• Charge Coupled Devices (CCDs)
• Demosaicing and Interpolation
Switching between formats
RGB to Greyscale
Take weighted average over three colour channels:
Igrey=rIRgIGbIB rgb
where r, g, b are three weighting factors (if r=g=b=1, this is simply the mean).
Greyscale to binary
Apply a threshold t:
Ibin(p) = 1 if Igrey(p) > t, and 0 otherwise
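A minimal Python/NumPy sketch of both conversions (the function names are invented for this sketch; by default r = g = b = 1, i.e. the simple mean):

```python
import numpy as np

def rgb_to_grey(img, r=1.0, g=1.0, b=1.0):
    """Weighted average over the three colour channels:
    Igrey = (r*IR + g*IG + b*IB) / (r + g + b)."""
    return (r * img[..., 0] + g * img[..., 1] + b * img[..., 2]) / (r + g + b)

def grey_to_binary(img, t):
    """Threshold: 1 where Igrey(p) > t, 0 otherwise."""
    return (img > t).astype(np.uint8)

rgb = np.array([[[0.9, 0.2, 0.1],     # a reddish pixel
                 [0.1, 0.1, 0.1]]])   # a dark pixel
grey = rgb_to_grey(rgb)               # [[0.4, 0.1]]
print(grey_to_binary(grey, t=0.25))   # [[1, 0]]
```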

Charge Coupled Device (CCD)
● A semiconductor carrying a two- dimensional matrix of photo-sensors (photo-diodes)
● Each photo-sensor is a small (usually square) electrically isolated capacitive region that can accumulate charge
● Photons incident on a photo-sensor are converted into electrons (via the photoelectric effect)
● The charge accumulated by each element is proportional to the incident light intensity and exposure time
● The pattern of charge across the CCD corresponds to the pattern of incident light intensity (i.e. it is an image)
Colour from CCD devices
To improve efficiency, microlenses are fabricated on chip to focus incoming light onto each sensor
The photo-sensors in the CCD array are not selective to the wavelength of incoming light.
Colour sensitivity is achieved by adding a colour filter that allows through light from a small band of frequencies associated with a specific colour.

CCD colour masks
Demosaicing
The Bayer mask (GRGB) is the most common colour filter used in digital cameras.
There are twice as many green filters as red and blue filters: this improves sensitivity to green, the part of the spectrum to which humans are most sensitive.
The Bayer mask samples each colour only at specific locations.
The result is three colour channels with missing values: the individual colour channels are subsampled.
Demosaicing is the process of filling in the missing values, so that we have R, G and B values at every pixel.
Colour from CCD devices
3 CCD solution
A prism splits light into 3 beams of different colour. 3 separate CCD devices record each colour
• Expensive, bulky
• High image quality
1 CCD solution
An array of coloured filters is placed over the pixels of single CCD to make different pixels selective to different colours
• Cheap, Compact
• Coarser sampling: lower image quality

Demosaicing and Interpolation
Demosaicing is a method of interpolation.
The same interpolation methods can be used when scaling an image
Nearest Neighbour Interpolation
Copies an adjacent pixel value from the same colour channel. • V. Fast.
• V. Inaccurate.
Demosaicing
[Figure: raw output from the image sensor (Bayer mosaic) and the image after demosaicing]
Demosaicing is a process that computes the colour (RGB values) at every pixel based on the local red, green and blue values in the subsampled images.

Smooth Hue Transition Interpolation
Interpolation of green pixels: same as in bilinear interpolation.
Interpolation of red/blue pixels: bilinear interpolation of the ratio (“hue”) between red/blue and green.
B12 = G12 × [ (B6/G6) + (B8/G8) + (B16/G16) + (B18/G18) ] / 4
Edge-Directed Interpolation
The region around a pixel is analysed to determine if a preferred interpolation direction exists
Interpolation performed along axis where change in value is lowest (i.e. not across edges)
e.g. calculating G8:
calculate horizontal and vertical gradients: ΔH = |G7 − G9|, ΔV = |G3 − G13|
perform interpolation:
if ΔH < ΔV: G8 = (G7 + G9)/2
else if ΔH > ΔV: G8 = (G3 + G13)/2
else: G8 = (G3 + G7 + G9 + G13)/4
Interpolation of red/blue pixels: same as in smooth hue transition interpolation
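A sketch (illustrative only) of edge-directed interpolation of a missing green value from its four green neighbours; using the slide's numbering, g_left/g_right correspond to G7/G9 and g_up/g_down to G3/G13:

```python
def edge_directed_green(g_left, g_right, g_up, g_down):
    """Interpolate a missing green value along the axis where the change
    in value is lowest (i.e. not across edges)."""
    dH = abs(g_left - g_right)   # horizontal gradient
    dV = abs(g_up - g_down)      # vertical gradient
    if dH < dV:
        return (g_left + g_right) / 2
    elif dH > dV:
        return (g_up + g_down) / 2
    else:
        return (g_left + g_right + g_up + g_down) / 4

# Near a vertical edge the horizontal change is large,
# so the value is taken from the vertical neighbours:
print(edge_directed_green(10, 200, 100, 104))   # -> 102.0
```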
Bilinear Interpolation
Takes average value of nearest two or four pixels from the same colour channel.
• Fast.
• Accurate in smooth regions, inaccurate at edges
e.g. (red channel of a Bayer pattern):
R9 = (R2 + R4 + R14 + R16) / 4
R24 = (R18 + R30) / 2
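A minimal sketch of bilinear demosaicing for a single colour channel (illustrative; the mask construction and border handling are simplified). Averaging the measured same-channel values in each missing pixel's 3×3 neighbourhood reproduces the 2- or 4-neighbour averages shown above:

```python
import numpy as np

def bilinear_fill(channel, mask):
    """Fill missing values in one colour channel of a Bayer mosaic.
    `channel` holds the sampled values, `mask` is True where the channel
    was actually measured.  Each missing pixel gets the average of the
    measured values in its 3x3 neighbourhood (2 or 4 of them on a Bayer
    grid); borders are ignored for simplicity."""
    out = channel.astype(float)
    h, w = channel.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not mask[y, x]:
                nbhd = channel[y - 1:y + 2, x - 1:x + 2]
                nbhd_mask = mask[y - 1:y + 2, x - 1:x + 2]
                out[y, x] = nbhd[nbhd_mask].mean()
    return out

# Tiny example: green sites of a Bayer mosaic form a checkerboard (True = measured).
green_mask = np.indices((4, 4)).sum(axis=0) % 2 == 0
green = np.where(green_mask, 100, 0)        # all measured greens equal 100
print(bilinear_fill(green, green_mask))     # interior missing values become 100
```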

Contents
The Eye
• photoreceptors
• sampling
• ganglion cells
• image processing
The Eye
• Cornea performs the initial bulk of the refraction (at fixed focus)
• Lens performs further refraction and can be stretched to change its shape and hence change focal length
• Iris allows the eye to regulate the amount of light that can enter, in order both to protect from over-stimulation and to improve focus (as with the pinhole camera).
• Optic nerve: 1 million nerve fibers transmit information sensed by eye to the brain.
Part g
The Eye

Photoreceptor types
Human eyes have 2 classes of photoreceptor:
• Rods:
  • high sensitivity (can operate in dim light)
• Cones:
  • low sensitivity (require bright light)
  • 3 sub-types that are sensitive to different wavelengths
Hence cones provide colour information:
• blue: short-wavelength cones, peak sensitivity ~420nm
• green: medium-wavelength cones, peak sensitivity ~540nm
• red: long-wavelength cones, peak sensitivity ~570nm
Distribution of photoreceptors
cone density near fovea
Photoreceptor types are not evenly distributed across the retina.
• blind spot: no photoreceptors (location where the optic nerve leaves the eye)
• fovea: no rods, high density of cones; #(blue) << #(red) <= #(green)
• periphery: high concentration of rods, few cones; more rods overall

The Retina
The retina contains light-sensitive photoreceptors (approx. 130 million). These photoreceptors are farthest from the light, but the intervening cells are transparent.
Photoreceptors transduce light to electrical signals (voltage changes). Transduction = the transformation of one form of energy to another.

Retinal processing
Ganglion cells produce the output of the retina (the axons of all ganglion cells combine to form the optic nerve). Each eye has approx. 1 million ganglion cells and 100 million photoreceptors.
One ganglion cell collects input from multiple photoreceptors:
• large convergence in the periphery
• smaller convergence in the fovea
This is another reason for the lower acuity in the periphery compared to the fovea.

Ganglion cell responses
Ganglion cells have centre-surround receptive fields.
Receptive Field (RF) = area of visual space (i.e. area of retina) from which a neuron receives input. Or, more generally, the properties of a visual stimulus that produce a response from a neuron.
Two types of ganglion cell:
• On-centre, off-surround: active if the central stimulus is brighter than the background
• Off-centre, on-surround: active if the central stimulus is darker than the background
This behaviour is generated by the ganglion cell being excited (or inhibited) by photoreceptors in the centre of its receptive field and being inhibited (or excited) by photoreceptors surrounding its receptive field.

Foveal vs peripheral vision
The fovea is a small region of high resolution containing mostly cones.
Fovea:
• high resolution (acuity), due to the high density of photoreceptors
• colour, due to the photoreceptors being cones
• low sensitivity, due to the response characteristics of cones
Periphery:
• low resolution (acuity), due to the low density of photoreceptors
• monochrome, due to the photoreceptors being rods
• high sensitivity, due to the response characteristics of rods
Far more of the brain is devoted to processing information from the fovea than from the periphery.

Centre-surround RF: edge enhancement
A and E: on a plane surface, input to centre and surround cancels (output at the resting, spontaneous, state, greater than zero).
B and D: at a contrast discontinuity, input to centre and surround is unequal (output increased or decreased).

Centre-surround RF: edge enhancement
[Figure: input image and retinal response]
Centre-surround cells respond weakly where the input is uniform, and strongly where the input changes. Hence, strong response near edges.

Centre-surround RF function
[Figure: on-centre (excitatory centre, inhibitory surround) and off-centre (inhibitory centre, excitatory surround) receptive fields]
Functional consequences:
→ accentuates edges
→ invariance to ambient light
→ efficient coding

Centre-surround RF: efficient coding
Cells with RFs that fall on plane surfaces are only weakly active.
Cells with RFs that fall on areas where contrast is changing are strongly active.
Natural images have strong spatial correlation (i.e. little intensity change over short distances).
[Figure: spatial correlation of natural images as a function of distance]
By only signalling where intensity changes, centre-surround RFs:
• minimise neural activity (efficient in terms of energy consumption)
• minimise bandwidth (efficient in terms of information coding)
This is often referred to as “redundancy reduction” or “decorrelation” or “predictive coding”.

Centre-surround RFs: colour opponent cells
Ganglion cells combine inputs from both rods and cones in a centre-surround configuration. Input from cones produces colour opponent cells.
e.g. a red on-centre, green off-surround (or red ON/green OFF) cell responds most strongly to red light falling within its RF centre.

Centre-surround RF: invariance to lighting
Ambient lighting conditions, i.e. the Illuminance (E), are generally irrelevant for most perceptual tasks, e.g. an object should look the same in bright light and dull light.
Centre-surround RFs measure the change in intensity (contrast) between adjacent locations. This contrast remains constant independent of lighting conditions.
[Figure: centre-surround responses to the same edge under bright and dull light]

Modelling centre-surround RFs
The standard way of modelling ganglion cell receptive fields is by using a Difference of Gaussians (DoG) operator. The DoG operator is also used in computer vision (see later lecture). Subtracting a broad Gaussian from a narrow Gaussian gives an on-centre / off-surround field; reversing the subtraction gives an off-centre / on-surround field. (A small code sketch of a DoG kernel is given at the end of this section.)

Summary
Image formation is described by the pinhole (perspective) camera model.
A point in the 3D world projects to a point in the 2D image dependent on the extrinsic camera parameters, the projection operator, and the intrinsic camera parameters.
A lens is required to collect sufficient light to make an image that is both in focus and bright.
Light reflected from an object is transduced into an electronic signal by a sensor (CCD array, retinal rods and cones).
The image is sampled at discrete locations (pixels); sampling is uniform in a camera, non-uniform in the retina (periphery vs fovea).
Following image formation the image needs to be analysed: in the biological vision system the 1st stage of analysis is performed by centre-surround cells; in artificial vision systems .... (next week)

Centre-surround RFs: colour opponent cells
A number of different colour combinations occur in the human retina, giving rise to a number of different types of colour-opponent cell: red-green opponent cells and blue-yellow opponent cells. “Yellow” is produced by averaging the outputs of the red and green cones.
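The Difference of Gaussians model mentioned under "Modelling centre-surround RFs" above can be sketched in a few lines of Python/NumPy (illustrative parameter values; not part of the original slides):

```python
import numpy as np

def dog_kernel(size=15, sigma_centre=1.0, sigma_surround=2.0):
    """Difference of Gaussians model of a centre-surround receptive field:
    a narrow excitatory Gaussian minus a broader inhibitory Gaussian.
    Swapping the sign gives an off-centre / on-surround field."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    centre = np.exp(-r2 / (2 * sigma_centre**2)) / (2 * np.pi * sigma_centre**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return centre - surround

kernel = dog_kernel()
print(kernel.sum())                  # approximately 0: weak response to uniform input
print(kernel[7, 7] > 0)              # excitatory centre (on-centre / off-surround)
```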