
This Week
• Low-Level Vision – from a biological perspective
• Biological Visual System overview
• Primary visual cortex (V1)
– physiology
» input selectivities / classical receptive fields
– models of orientation selectivity » Gabor filters
– extra-classical receptive field properties » contextual influences / change detection
• Mid-level Vision
• grouping and segmentation
• top-down and bottom-up influences (Gestalt Laws)
• extra-classical receptive field properties
– neural implementation of border ownership and grouping
Part a
The Biological Visual System
7CCSMCVI / 6CCS3COV: Computer Vision
Low- and Mid-Level Vision (Biological)

Human visual system (beyond the retina)
Retina
Optic nerve
Lateral Geniculate Nucleus (LGN) part of the thalamus
Pathways from Retina to Cortex
Cerebral Cortex
Primary visual cortex (V1)
• The right visual field (RVF) projects to the left side of each retina
• ganglion cells from the left side of the left eye project to the left LGN
• ganglion cells from the left side of the right eye cross over at the optic chiasm and go to the left LGN
• Hence, the RVF projects to the left LGN
• The left LGN projects to the left V1 (striate cortex)
• NOTE: it is not the right eye that projects to the left V1; it is the RVF (as seen by both eyes).
Contents
The Biological Visual System Overview
• from eye to cortex
• cortical visual system:
– areas and pathways

What does the Cerebral Cortex do?
All higher cognitive functions:
• Perception
• Knowledge
• Language
• Memory
• Reasoning
• Decision Making
Basic facts:
• A folded sheet approx. 1.7 mm thick, with an area of approx. 0.25 m²
• Contains 10¹⁰ neurons (approx. 10⁵ neurons/mm³) and 4×10¹³ synapses
• Is about 2% of human body mass but accounts for 20% of human energy consumption
• Approx. 50% of cerebral cortex is devoted to vision
• Is divided into areas which specialise in different functions. Approx. 30 areas are devoted to different aspects of visual information processing.
The Cortical Visual System: areas
Lots of distinct cortical areas. Very complex interconnectivity.
What does LGN do?
Lateral geniculate nucleus transmits information from retina to cortex.
• Traditionally viewed as merely a relay station.
• Current evidence suggests it does more than just relay information.
• However, what computation occurs in the LGN is not currently known.
LGN cells have centre-surround RFs, like retinal ganglion cells.

Part b
Primary Visual Cortex
Contents
Primary visual cortex (V1)
• physiology
– input selectivities / classical receptive fields
• organisation
– hypercolumns
The Cortical Visual System: pathways
“What” and “Where” pathways
Hierarchically organised:
• simple, local, RFs at V1
• complex, large, RFs in higher areas
Where (or How):
• V1 to parietal cortex
• spatial / motion information
What:
• V1 to inferotemporal cortex
• identity / category information

V1 RFs: range of selectivities
V1 cells have receptive fields selective for:
• colour
• orientation
• direction of motion
• spatial frequency
• eye of origin
• binocular disparity
• position
Some of these properties are similar to those of cells in the LGN and retinal ganglion cells (e.g., colour, eye of origin, position).
Other properties are seen for the first time in V1 (e.g., orientation selectivity, binocular disparity, and direction of motion).
V1 RFs: centre-surround / colour
Centre-surround / colour opponent RFs are similar to those in the LGN/retina and perform the same function: detecting changes in intensity or regions of uniform colour.
[Figure: RF types — broad-band centre-surround; colour-opponent centre-surround (e.g. R+ centre, G− surround); colour-opponent centre-only (R+G−); double-opponent (e.g. R+G− centre, G+R− surround)]
Double-opponent (DO) cells are not seen prior to V1.
They detect locations where colour changes (e.g. where centre is red and background green).
They have larger receptive fields than simple opponent cells.
Cortical area V1
LGN neurons mainly project to the primary visual cortex.
Primary visual cortex is also known as:
• V1
• striate cortex
• area 17
V1 therefore performs the initial (low-level) processing on the incoming information.
Processing is carried out by cells with a larger range of receptive field properties than are found in the retina and LGN.
[Figure: pathway from the retina, via the optic nerve and the LGN, to primary visual cortex (V1) in the cerebral cortex]

V1 RFs: orientation (simple cells)
This particular example is also direction selective as it produces a strong response only when the stimulus is at a particular angle and moving in a particular direction through a particular location.
V1 RFs: orientation (simple cells)
Optimum response to an appropriately oriented stimulus, with correct contrast, placed at a specific position within the receptive field:
Different cells can be selective for different stimuli:
In general, simple cells act as edge and bar detectors.
V1 RFs: orientation selectivity
In contrast to retinal and LGN cells, many V1 cells display a marked preference for a particular orientation.
Orientation selectivity is sharply tuned with responses falling to zero when a line is tilted ±15-30° from the “preferred orientation”.
[Figure: orientation tuning curve — response vs stimulus orientation]

V1 RFs: orientation (hyper-complex cells)
Optimum response depends not only upon stimulus orientation but also on length.
Maximum response occurs when the length of the bar matches the width of the receptive field (i.e. these cells are selective for length as well as orientation)
This is called “End-stopping”.
V1 RFs: orientation selectivity (summary)
V1 RFs: orientation (complex cells)
Optimum response to an appropriately oriented stimulus, of any contrast, placed anywhere within the receptive field.
Complex cells thus have a greater positional invariance compared to simple cells.
Usually they respond most strongly to moving bars/edges and can be direction selective:
Complex cells act as edge and bar detectors with some tolerance to location.

V1 RFs: eye of origin
Monocular cells
Retinal and LGN cells receive input from one eye only.
Cells in V1 that receive the input from LGN are also monocular.
Binocular cells
Other cells in V1 are binocular: they receive input (via monocular cells) from the two eyes.
Binocular cells have two receptive fields (Left eye & Right eye) and these are matched in type (e.g. same direction of motion preference, same orientation preference, both simple or both complex).
Hence, binocular cells respond maximally when corresponding regions in each eye are stimulated by stimuli of similar appearance.
V1 RFs: disparity
For this neuron, maximum response occurs when a bar is presented simultaneously to both eyes, with the correct orientation, direction of motion and in the correct position on each retina.
V1 RFs: spatial frequency tuning
[Figure: spatial frequency tuning curve — firing rate (spikes/second) vs spatial frequency (cycles/degree, 0.125–8)]

V1 RFs: conjunctions
V1 cells have receptive fields selective for:
• colour
• orientation
• direction of motion
• spatial frequency
• eye of origin
• binocular disparity
• position
Some cells are responsive to more than one of these properties, e.g. orientation and direction of motion, or orientation and colour.
All are selective for at least one of these properties plus position (as the stimulus needs to be at a particular location on the retina).
V1 organisation: Hypercolumns
All neurons with RFs at the same location on the retina are located in the same region of V1.
Approx. 1 mm² of V1 is required to represent the whole range of RF types for one particular position.
Such a 1 mm² area of V1 is called a “hypercolumn”.
Hence, each hypercolumn contains the requisite neural machinery to simultaneously analyse multiple attributes of an image (colour, orientation, direction of motion, spatial frequency, eye of origin, binocular disparity) falling on a localised region of the retina.
V1 RFs: disparity
Binocular cells are tuned to disparity: the difference in the preferred location in each eye
This enables them to encode the depth of the stimulus
[Figure: disparity-tuned neurons — left-eye and right-eye RFs for “near” and “far” cells; “crossed” disparity in front of the plane of fixation, “uncrossed” disparity behind it]

Part c
Gabor Filters
Contents
Gabor filters
• models of V1 orientation selectivity
– edge detection
– image components
– applications in computer vision
V1 organisation: retinotopic maps
Adjacent hypercolumns analyse information from adjacent areas of retina.
Hence, the spatial position of the ganglion cells within the retina is preserved by the spatial organisation of the neurons within V1.
This spatial layout is called retinotopic organization because the topological organization of the receptive fields in V1 parallels the organization of the retina.
The map is distorted primarily due to cortical magnification of central vs. peripheral areas.
There is more cortical area (more hypercolumns) devoted to the fovea than to peripheral vision.
This mirrors the fact that the vast majority of retinal ganglion cells are devoted to the fovea.

Gabor: model of orientation selective RFs
A 2-D Gabor function is a Gaussian multiplied by a sinusoid:
x’22 y’2
Gx,y=exp− 22 cos2x’ f
where: x ‘ = xcos  ysin  y’=−xsinycos
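To make the formula concrete, here is a minimal NumPy sketch of this function (the kernel size, the function name, and radian units for θ are my own choices, not from the slides):

```python
import numpy as np

def gabor_kernel(size, sigma, f, psi, theta, gamma):
    """2-D Gabor: a Gaussian envelope multiplied by a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate the coordinate frame to the preferred orientation theta
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    gaussian = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    sinusoid = np.cos(2 * np.pi * f * x_r + psi)
    return gaussian * sinusoid
```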
Gabor: model of orientation selective RFs
Simple cell RFs actually look more like this:
These are very similar to a mathematical function called a Gabor.

Gabor: model example
To simulate the activity of simple cells in response to an image, we can convolve the image using a Gabor function as the mask.
[Figure: image ∗ Gabor mask = simple-cell response map; parameters: σ = 2, f = 1/6, ψ = 0, θ = 90°, γ = 0.25]
This provides a crude model of the responses of all simple cells selective for the same orientation, spatial frequency and phase across all cortical hypercolumns.
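As a rough illustration, the sketch below performs this convolution, reusing the gabor_kernel function sketched earlier; the random test image is a stand-in for real input:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(256, 256)            # stand-in for a real input image
# parameters from the slide: sigma=2, f=1/6, psi=0, theta=90 deg, gamma=0.25
g = gabor_kernel(size=13, sigma=2, f=1/6, psi=0,
                 theta=np.deg2rad(90), gamma=0.25)
# each output pixel models a simple cell with this RF centred at that location
simple_response = convolve2d(image, g, mode='same', boundary='symm')
```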
Gabor: model example
To simulate the activity of complex cells in response to an image, we can pool the outputs generated by convolving the image with Gabor functions of varying phases.
[Figure: image convolved with a quadrature pair of Gabor masks; parameters: σ = 2, f = 1/6, ψ = 0 and 90°, θ = 90°, γ = 0.25]
The energy model takes the square root of the sum of the squared outputs of a quadrature pair of Gabor filters.
Result is invariant to phase.
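A sketch of the energy model under the same assumptions (a quadrature pair: identical Gabors at phases 0° and 90°), reusing gabor_kernel, convolve2d, and image from the sketches above:

```python
# quadrature pair: same orientation/frequency, phases 0 and 90 degrees
g0 = gabor_kernel(13, sigma=2, f=1/6, psi=0,
                  theta=np.deg2rad(90), gamma=0.25)
g90 = gabor_kernel(13, sigma=2, f=1/6, psi=np.pi / 2,
                   theta=np.deg2rad(90), gamma=0.25)
r0 = convolve2d(image, g0, mode='same', boundary='symm')
r90 = convolve2d(image, g90, mode='same', boundary='symm')
# square root of the sum of squares: invariant to stimulus phase
complex_response = np.sqrt(r0**2 + r90**2)
```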
Gabor: model of orientation selective RFs The Gabor function has 5 parameters:
σ = the standard deviation of the Gaussian function
f = the frequency of the sinusoidal function
ψ = the phase of the sinusoidal function
θ = the orientation
γ = the spatial aspect ratio

Gabors: wavelet transforms
To detect edges at multiple spatial scales, we can use Gabor functions with different frequencies and standard deviations.
Convolving a signal (e.g. an image) with a family of similar masks sensitive to different frequencies is known as a “Continuous Wavelet Transform”.
The “wavelet” is the mathematical function used to generate the masks (e.g. a Gabor).
Hence, we can think of the simple cells in V1 as performing a continuous wavelet transform.
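As an illustrative sketch, a small Gabor filter bank in this wavelet spirit, again reusing the functions above (the choice that frequency halves while σ doubles at each scale is an assumption for illustration):

```python
# self-similar family: frequency halves and sigma doubles at each scale
multiscale = []
for scale in range(4):
    f_s = (1 / 6) / 2**scale
    sigma_s = 2 * 2**scale
    size_s = int(8 * sigma_s) | 1      # odd kernel size covering the envelope
    g_s = gabor_kernel(size_s, sigma_s, f_s, 0, np.deg2rad(90), 0.25)
    multiscale.append(convolve2d(image, g_s, mode='same', boundary='symm'))
```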
Gabors: as image components
We can think of an image as being made up of a superposition of various image components or elementary features.
e.g. A letter I as a combination of three lines (Gabor image components):
[Figure: three oriented Gabor image components (A), each with activation ×1 (y), summing to the image (x) of a letter I]
Gabor: model example
To detect edges at multiple orientations, we can sum the outputs from simulated complex cells at multiple orientations.
[Figure: complex-cell (energy) outputs at four orientations summed into a single edge map; parameters: σ = 2, f = 1/6, ψ = 0 and 90°, θ = 0°, 45°, 90°, 135°, γ = 0.25]
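A sketch of this orientation pooling, built from the energy-model sketch above:

```python
# sum energy-model outputs over orientations 0, 45, 90, 135 degrees
edge_map = np.zeros_like(image)
for theta in np.deg2rad([0, 45, 90, 135]):
    e0 = convolve2d(image, gabor_kernel(13, 2, 1/6, 0, theta, 0.25),
                    mode='same', boundary='symm')
    e90 = convolve2d(image, gabor_kernel(13, 2, 1/6, np.pi / 2, theta, 0.25),
                     mode='same', boundary='symm')
    edge_map += np.sqrt(e0**2 + e90**2)
```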

Image components
This is a very general concept, e.g.:
[Figure: a set of image components (A), their activations (y), and the resulting image (x)]
Image components: converting arrays to vectors
Converting image arrays to vectors (e.g. a small patch of pixel values into a column vector) allows this idea to be expressed mathematically, as:
A y ≈ x
where:
A = an m by n matrix of weight values, the columns of which represent image components
y = an n by 1 vector describing the activation of each component in the image
x = an m by 1 vector representing the image
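To make the notation concrete, a toy NumPy example (the small A, y and x are invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.5],      # m=4 pixels, n=3 components:
              [0.0, 1.0, 0.5],      # each column is one image component
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
y = np.array([1.0, 2.0, 0.0])       # activation of each component
x = A @ y                           # superposition gives the image vector
# given x and A, recover activations by minimising ||x - Ay||
y_hat, *_ = np.linalg.lstsq(A, x, rcond=None)
```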
Gabors: as image components
Combining different components in different proportions can generate numerous different images.
e.g. A letter H as a combination of three lines (Gabor image components):
[Figure: the same three Gabor image components (A), each with activation ×1 (y), summing to the image (x) of a letter H]

Image components
Generally we know X, and want to determine how this set of images can be represented, i.e. we want to find the factors A and Y of X:
A Y ≈ X
Finding the factors A and Y is an ill-posed problem (there are lots of possible solutions).
By placing additional constraints on the factors A and Y different solutions, with different properties, can be found.
There are several standard “matrix factorisation” methods that can find A and Y under different constraints, e.g.:
• Principal Component Analysis (PCA)
• Independent Component Analysis (ICA)
• Non-negative Matrix Factorization (NMF)
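For illustration, a minimal scikit-learn sketch of two of these factorisations (the random X is a stand-in for real image data; scikit-learn stores one sample per row, hence the transposes):

```python
import numpy as np
from sklearn.decomposition import PCA, NMF

X = np.random.rand(64, 500)          # m=64 pixels per patch, p=500 patches
pca = PCA(n_components=16)           # orthogonality constraint
Y_pca = pca.fit_transform(X.T).T     # n x p activations (PCA first centres X)
A_pca = pca.components_.T            # m x n components
nmf = NMF(n_components=16, max_iter=500)   # non-negativity constraint
Y_nmf = nmf.fit_transform(X.T).T
A_nmf = nmf.components_.T            # A_nmf @ Y_nmf approximates X
```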
Gabors as components of natural images
What are the components of natural images?
If we create X by randomly selecting small image patches, then find the factors using two constraints:
1. that information be preserved (i.e. that ||X−AY|| is minimised)
2. that the representation be sparse (i.e. the number of non-zero values in each column of Y is minimised)
[Figure: X — columns are randomly selected natural-image patches; A — the learned image components]
Then the columns of A are very similar to Gabor functions.
Gabors therefore appear to capture the intrinsic structure of natural images
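A hedged sketch of this sparse factorisation using scikit-learn's dictionary learner; with the random stand-in patches below the components will be meaningless, but on real natural-image patches they come out Gabor-like:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

patches = np.random.randn(5000, 64)        # stand-in: 5000 flattened 8x8 patches
learner = MiniBatchDictionaryLearning(
    n_components=100,                      # overcomplete set of components
    alpha=1.0)                             # weight of the sparsity penalty
Y = learner.fit_transform(patches)         # sparse activations, one row per patch
A = learner.components_                    # learned components, one per row
reconstruction = Y @ A                     # approximates the patches
```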
Image components
If we have lots of images, then this equation becomes:
A Y ≈ X
where:
A = an m by n matrix of weight values, the columns of which represent image components
Y = an n by p matrix, each column of which contains the activation of each component in the corresponding image
X = an m by p matrix, each column of which contains the pixel values for one image

Image components: Computer Vision Applications
Image Compression
The ability to reconstruct an image using image components allows an image to be efficiently encoded.
e.g. JPEG:
• reconstructs each image patch as a linear combination of a set of image components (A y ≈ x)
• image components are cosine functions, not Gabor functions
• activations are quantized rather than sparse
• y requires less storage space than x
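A toy, JPEG-flavoured illustration of this using SciPy's DCT (the uniform quantisation step of 16 is an arbitrary assumption, not the real JPEG quantisation table):

```python
import numpy as np
from scipy.fft import dctn, idctn

patch = np.random.rand(8, 8) * 255          # stand-in for an 8x8 image patch
coeffs = dctn(patch, norm='ortho')          # activations of cosine components
quantised = np.round(coeffs / 16) * 16      # quantised activations store compactly
reconstructed = idctn(quantised, norm='ortho')   # close to the original patch
```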
Image components: Computer Vision Applications
Image Denoising
Reconstructing a noisy image using image components produces a reconstructed image that has less noise:
• image components are Gabor functions (or curvelets, wedgelets, or other functions)
• activations are found that produce the most accurate reconstruction of the noisy image (i.e. that minimise ||x−Ay||)
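As a sketch, one standard way to find such a sparse y is orthogonal matching pursuit; the dictionary and signal below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

A = np.random.randn(64, 100)                 # dictionary: one component per column
A /= np.linalg.norm(A, axis=0)               # unit-norm components
x_clean = A[:, 3] + 0.5 * A[:, 42]           # patch built from two components
x_noisy = x_clean + 0.1 * np.random.randn(64)
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False)
omp.fit(A, x_noisy)                          # sparse y minimising ||x - Ay||
x_denoised = A @ omp.coef_                   # reconstruction sheds most noise
```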
Gabors as efficient code
The sparsity constraint used to find the components of natural images, means that only a few components are present in each image.
If components are represented by neurons, then this means that only a few neurons need to be active in order to represent an image.
Hence, Gabor RFs result in an efficient code which minimises the number of active neurons needed to transmit a given signal.

Part d
Non-Classical RFs
Contents
Non-Classical Receptive Fields in V1
• neural implementation
• influences on perception
Image components: Computer Vision Applications
Image Inpainting
Reconstruct missing parts of an image using a sparse subset of image components that represent non-corrupted image patches:
• image components learnt from non-corrupted parts of the image
• activations are found that produce the most accurate reconstruction (i.e. that minimise ||x−Ay||)
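A minimal sketch of the masked reconstruction (all names and sizes below are illustrative):

```python
import numpy as np

A = np.random.randn(64, 20)            # stand-in for components learnt elsewhere
x = A @ np.random.randn(20)            # a patch well explained by the components
observed = np.ones(64, dtype=bool)
observed[10:20] = False                # these pixels are missing/corrupted
# fit activations using only the observed pixels, minimising ||x - Ay|| there
y_hat, *_ = np.linalg.lstsq(A[observed], x[observed], rcond=None)
x_inpainted = A @ y_hat                # reconstruction fills in the missing pixels
```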

V1 non-classical RFs (orientation tuned cells)
The ncRF is caused by lateral (and feedback) connections:
• excitation from collinear / co-circular neighbours (solid lines)
• inhibition from near, parallel, iso-oriented neighbours (dashed lines)
Thick lines indicate the strongest connections (greatest influence).
This pattern of lateral connectivity is sometimes called the “association field”.
V1: contextual influences
• context has no effect in isolation
• context can enhance response
• context can reduce response
V1 non-classical RFs
Classical Receptive Field (cRF) = the region of visual space that can elicit a response from a neuron.
However, this response can be modulated by stimuli appearing outside the neuron’s classical receptive field.
The region of visual space that can modulate the response is called the non-classical receptive field (ncRF).
The result is that neuronal responses are influenced by context, i.e., neuronal activity at one location depends on activity at distant locations.

V1: contextual influences
Contour Integration
Which of these images contains a circle?
Lateral excitatory connections enhance responses to aligned elements, making these easier to perceive.
V1: contextual influences
Surround Suppression
Which of the central Gabor patches is easier to see?
V1: contextual influences
Collinear Facilitation
Which of the central Gabor patches is easier to see?
Contour Integration
Which of these images contains a straight line?

V1: contextual influences
Texture Segmentation
Where is there a boundary in each figure?
Similar to pop-out, but with two regions. Lateral inhibitory connections suppress responses to similar elements. At the boundary between dissimilar elements, neurons receive less inhibition, making the boundary easier to perceive.
V1: contextual influences
The neural correlates of this effect:
Lateral inhibitory connections suppress responses to similar elements, making boundary between dissimilar elements easier to perceive.
V1: contextual influences
Pop-out
Which is the odd element in each figure?
Lateral inhibitory connections suppress responses to similar elements, making dissimilar elements easier to perceive.
[Figure: feature search, dissimilar distractors, and conjunction search examples]

Part e
Introduction to Mid-Level Vision
Contents
Mid-level Vision
• grouping and segmentation
Modelling contextual influences in V1
Contour Detection
[Figure: input image → filtering → filtering + lateral interactions]
Edge-detection methods based on convolution fail to detect only meaningful edges (such as object boundaries).
More complex operations, like those due to the ncRF, are required for accurate identification of meaningful edges.
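One simple way to caricature such an ncRF operation is divisive surround suppression of the filter outputs; the sketch below is an assumption-laden toy, not the specific model behind these slides:

```python
import numpy as np
from scipy.ndimage import uniform_filter

# stand-in for the rectified output of one oriented Gabor channel
response = np.abs(np.random.randn(256, 256))
# suppress each unit by the average response in its neighbourhood, so
# responses inside uniform textures shrink while isolated or boundary
# responses survive
surround = uniform_filter(response, size=15)
suppressed = response / (1.0 + surround)
```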

Mid-Level Vision
[Figure: processing pipeline — Image Formation → Low-Level Vision (noise suppression, edge enhancement, feature extraction, efficient coding) → Mid-Level Vision (grouping, e.g. “one group” vs “another group”) → High-Level Vision (object recognition, object classification, object localisation, scene understanding); example: a boat passing under Westminster Bridge]

Grouping and Segmentation: Names
• Image Parsing
• Perceptual Organisation
• Binding
• Perceptual Grouping
• Figure-Ground Segmentation (restricted case):
– segment one object (the figure) from everything else (the ground / background)
Part f
Gestalt Laws
Mid-Level Vision
The role of Mid-Level vision is to:
• group together those elements* of an image that “belong together”, and to
• segment (differentiate) these elements* from all others.
*elements (or tokens) can be whatever we need to group (pixel intensities, pixel colours, edges, features, etc.)
Grouping and segmentation are “two sides of the same coin”: doing one implies doing the other
Terms are used interchangeably, and there are also a number of other names for the same process…

Grouping and Segmentation: Approaches
Top-down (coming from internal knowledge, prior experience):
• elements belong together because they lie on the same object
Bottom-up (coming from the image properties):
• elements belong together because they are similar or locally coherent
These approaches are NOT mutually exclusive
Both bottom-up and top-down influences are required for general purpose image segmentation
In humans we can observe (with the right choice of image) the influences of individual bottom-up and top-down cues to perceptual grouping.
Top-down Influences: Knowledge
Meaningfulness or Familiarity:
Elements that, when grouped, form a familiar object tend to be grouped together.
Contents
Gestalt Laws
• image segmentation in the brain
• top-down influences
• bottom-up influences (the Gestalt Laws)
– neural implementation via V1 lateral connections

Bottom-up Influences: Gestalt Laws
Many bottom-up factors influence how elements are grouped.
These influences are called the “Gestalt Laws” or the “Gestalt Grouping Principles”.
Gestalt Laws are not really laws (that must be obeyed) but heuristics (“rules of thumb” that are often obeyed).
Original Gestalt Laws (proposed in the early 20th century) include:
• Proximity
• Similarity
• Closure
• Continuity (or Good Continuation)
• Common Fate
• Symmetry
More recent laws (hence, not true Gestalt Laws) include:
• Common Region
• Connectivity
Gestalt Laws: Proximity
Elements that are near to each other tend to be grouped.
Top-down Influences: Expectation

Gestalt Laws: Closure
Elements that, when grouped, result in closed boundaries tend to be seen as belonging together.
These are perceived as a square and triangle (left), and an ellipse and a rectangle (right), not as a combination of strange shapes, e.g.:
Potential closure is sufficient to result in grouping:
Gestalt Laws: Continuity
Elements that, when grouped, result in straight or smoothly curving lines, or smooth surfaces, tend to be seen as belonging together
We even “invent” illusory contours (groups of white elements) to conform to our expectations about continuity.
Gestalt Laws: Similarity
Elements that are similar tend to be grouped together.
Similar shape
Similar colour
Similar orientation
Similar size
Similar luminance

Gestalt Laws: Continuity
Gestalt Laws: Common Fate
Elements that have coherent motion tend to be grouped together.
The presence of an occluding surface (the elements of which can be grouped together and segmented from the rest of the image) enables us to group the blobs into meaningful shapes.
Gestalt Laws: Continuity
Seen as a collection of unrelated blobs.

Gestalt Laws: Common Region
Elements that lie inside the same closed region tend to be grouped together.
c.f.:
Gestalt Laws: Connectivity
Elements that are connected by another element tend to be grouped together.
c.f.:
Gestalt Laws: Symmetry
Elements that form symmetric groups tend to be grouped together.
c.f.:

Gestalt Laws: Summary
Many other bottom-up cues have been shown to affect segmentation!
Not all are clearly distinct, e.g.:
Law of Proximity = Law of Similarity for location
Law of Common Fate = Law of Similarity for motion
No clear rules about what happens when multiple Gestalt principles are in conflict, e.g.:
Law of Proximity vs Law of Common Region (see earlier)
Law of Proximity vs Law of Connectivity (see earlier)
Law of Proximity vs Law of Symmetry (see earlier)
Law of Similarity vs Law of Continuity:
Gestalt Laws via V1 Lateral Connections
Lateral inhibitory connections give rise to texture segmentation and pop-out
= Gestalt Law of Similarity
Lateral excitatory connections give rise to contour integration
= Gestalt Law of Continuity
Gestalt Laws: Summary

Contents
Border Ownership
• effects on perception
• neural implementation in V2
Object Segmentation: border ownership
In most tasks we want to group elements into objects.
An object consists of one or more surfaces plus a boundary. The boundary is “owned” by the object.
The background appears borderless (and hence shapeless).
e.g. the tennis player’s leg has a border, but the grass does not and appears to continue behind the leg:
Part g
Border Ownership

Object Segmentation: border ownership
You see the face as a profile (looking right) OR a frontal view (looking at you), and you can switch your perception between these different interpretations.
However, you see only one of these two interpretations at any one time.
When you see the profile view the right-hand border is owned by the face, when you see the frontal view the border seems to be owned by a white occluding surface.
V2 Border-ownership cells
V2 is the cortical area that comes next in the hierarchy after V1.
V2 contains cells that encode border-ownership.
e.g. This cell is selective for a particular edge orientation, but only responds strongly when the perceived figure is to the lower left of its RF.
Object Segmentation: border ownership
Where is the arrow?
Difficult to spot as letters are seen as foreground.
Birds or fish?
Can’t see both at the same time, as border can only be owned by one or the other interpretation.

Summary
Cortical Visual system is very complex
Consists of hierarchies of processing stages (cortical areas):
• 1st stage is V1
• 2nd stage is V2
Summary
cRFs (pattern of feedforward connections)
In V1:
• large variety (colour, orientation, motion, frequency, disparity, …), mapped across the cortical surface:
– a patch of V1 represents a particular spatial location
– neighbouring patches represent neighbouring locations
• a hypercolumn is a patch which contains all the RFs for the same location
Gabor functions:
– provide a good model of orientation selectivity (convolution of an image with a Gabor performs edge detection)
– provide efficient codes for natural images
Border-ownership via V2 Lateral Connections
V2 contains cells that encode border-ownership.
The response of V2 border- ownership cells is influenced by information outside the cRF
i.e. border-ownership is determined using the cell’s non-classical RFs

Summary
Mid-Level Vision = grouping image elements together
Which elements are grouped together is influenced by:
• prior experience and object knowledge (top-down influences), and
• Gestalt Laws applied to the image properties (bottom-up influences)
ncRFs in V1 and V2:
• implement some Gestalt principles and produce border-ownership.
Summary
ncRFs (pattern of lateral, and feedback, connections):
• enable stimuli outside a neuron’s cRF to modulate its response
• nodes with similar cRFs at neighbouring locations are connected via:
– inhibitory connections (generate pop-out, texture segmentation)
– excitatory connections (generate contour enhancement)
• increase the salience of locations where the image changes