Contextual image classification, a topic of pattern recognition in computer vision, is an approach to classification based on contextual information in images. "Contextual" means that this approach focuses on the relationship between nearby pixels, which is also called the neighbourhood. The goal of this approach is to classify images by using their contextual information.
Introduction
As in language processing, a single word may have multiple meanings unless the context is provided, and the patterns within the sentences are the only informative segments we care about. For images, the principle is the same: find the patterns and associate proper meanings with them.
If only a small portion of an image is shown, it is very difficult to tell what the image is about. Even another small portion of the image may still be difficult to classify. However, as the amount of context in the image increases, recognition becomes easier, and given the full image almost everyone can classify it easily.
During segmentation, methods that do not use contextual information are sensitive to noise and variations, so the segmentation result contains a great deal of misclassified regions, and often these regions are small (e.g., one pixel).
Compared to such techniques, this approach is robust to noise and substantial variations, because it takes the continuity of the segments into account.
Several methods of this approach will be described below.
Applications
Functioning as a post-processing filter to a labelled image
This approach is very effective against small regions caused by noise, which are usually formed by a few pixels or a single pixel. The most probable label is assigned to these regions.
However, this method has a drawback: small regions can also be correct regions rather than noise, and in that case the method actually makes the classification worse.
This approach is widely used in remote sensing applications.
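A minimal sketch of such a post-processing filter, assuming the labelled image is a NumPy array of integer class labels: each pixel is reassigned the most frequent label in its 3×3 neighbourhood, which removes isolated one-pixel regions (the function name and the toy example are only illustrative).

import numpy as np

def majority_filter(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Reassign each pixel the most frequent label in its 3x3 neighbourhood."""
    h, w = labels.shape
    padded = np.pad(labels, 1, mode="edge")   # border pixels also get a full window
    out = np.empty_like(labels)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3].ravel()
            out[i, j] = np.bincount(window, minlength=n_classes).argmax()
    return out

# Example: an isolated misclassified pixel inside a homogeneous area is relabelled.
img = np.zeros((5, 5), dtype=int)
img[2, 2] = 1
print(majority_filter(img, n_classes=2))   # the centre pixel becomes 0 again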
Improving the post-processing classification
This is a two-stage classification process (a minimal sketch follows the two steps):
For each pixel, label the pixel and form a new feature vector for it.
Use the new feature vector and combine the contextual information to assign the final label to the pixel.
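A sketch of the two stages under toy assumptions (the class means used in stage one are assumed to be known, and a majority vote over the neighbourhood labels stands in for the second-stage contextual classifier):

import numpy as np

def stage_one(image: np.ndarray, class_means: np.ndarray) -> np.ndarray:
    """Stage 1: label each pixel with the class whose mean grey level is nearest."""
    dist = np.abs(image[..., None] - class_means[None, None, :])
    return dist.argmin(axis=-1)

def stage_two(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Stage 2: the new feature vector of a pixel is the stage-one labels of its
    3x3 neighbourhood; the final label here is simply the most frequent of them."""
    h, w = labels.shape
    padded = np.pad(labels, 1, mode="edge")
    out = np.empty_like(labels)
    for i in range(h):
        for j in range(w):
            neighbourhood = padded[i:i + 3, j:j + 3].ravel()
            out[i, j] = np.bincount(neighbourhood, minlength=n_classes).argmax()
    return out

image = np.array([[10, 12, 90], [11, 95, 92], [13, 94, 91]], dtype=float)
class_means = np.array([10.0, 90.0])            # assumed known class means
final_labels = stage_two(stage_one(image, class_means), n_classes=2)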
Merging the pixels in earlier stages
Instead of using single pixels, neighbouring pixels can be merged into homogeneous regions, which benefit from the contextual information. These regions are then provided to the classifier.
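A sketch of one simple way to do this merging, assuming a single-band grey-level image and a user-chosen homogeneity tolerance: regions are grown over 4-connected neighbours whose grey level stays close to the running region mean, and each region is then summarised by its mean value for the classifier (the tolerance value and function name are illustrative only).

import numpy as np
from collections import deque

def grow_regions(image: np.ndarray, tol: float = 10.0):
    """Merge 4-connected pixels into homogeneous regions by simple region growing.
    Returns a region-label map and the mean grey level of every region."""
    h, w = image.shape
    region = -np.ones((h, w), dtype=int)
    means = []
    for si in range(h):
        for sj in range(w):
            if region[si, sj] != -1:
                continue
            rid = len(means)
            region[si, sj] = rid
            total, count = float(image[si, sj]), 1
            queue = deque([(si, sj)])
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < h and 0 <= nj < w and region[ni, nj] == -1
                            and abs(image[ni, nj] - total / count) <= tol):
                        region[ni, nj] = rid
                        total += float(image[ni, nj])
                        count += 1
                        queue.append((ni, nj))
            means.append(total / count)
    return region, np.array(means)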
Acquiring pixel feature from neighbourhood
The original spectral data can be enriched by adding the contextual information carried by the neighbouring pixels, or even replaced on some occasions. This kind of pre-processing method is widely used in textured image recognition. Typical approaches include mean values, variances, texture descriptions, etc.
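A small sketch of this kind of pre-processing, enriching each pixel's grey level with the mean and variance of its 3×3 neighbourhood (scipy.ndimage.uniform_filter computes the local means; texture descriptors would be added in the same fashion):

import numpy as np
from scipy.ndimage import uniform_filter

def neighbourhood_features(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Stack the original grey level with its local mean and local variance.
    Returns an (H, W, 3) feature array."""
    image = image.astype(float)
    local_mean = uniform_filter(image, size=size)
    local_sq_mean = uniform_filter(image ** 2, size=size)
    local_var = local_sq_mean - local_mean ** 2      # Var[X] = E[X^2] - E[X]^2
    return np.dstack([image, local_mean, local_var])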
Combining spectral and spatial information
The classifier uses the grey level and the pixel neighbourhood (contextual information) to assign labels to pixels. In this case the information is a combination of spectral and spatial information.
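One possible way to build such a combined feature vector, assuming a single-band image: the spectral part is the pixel's own grey level and the spatial part is the grey levels of its 3×3 neighbourhood, concatenated per pixel. This is also one way to realise the vector ξ used in the next section.

import numpy as np

def spectral_spatial_features(image: np.ndarray) -> np.ndarray:
    """For every pixel, concatenate its grey level (spectral) with the grey levels
    of its 3x3 neighbourhood (spatial). Returns an (H*W, 9) array whose first
    column is the centre pixel itself."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    offsets = [(0, 0)] + [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
    feats = np.empty((h * w, len(offsets)))
    for k, (di, dj) in enumerate(offsets):
        feats[:, k] = padded[1 + di:1 + di + h, 1 + dj:1 + dj + w].ravel()
    return feats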
Powered by the Bayes minimum error classifier
Contextual classification of image data is based on the Bayes minimum error classifier (also known as a naive Bayes classifier).
Present the pixel:
A pixel is denoted as {\displaystyle x_{0}}.
The neighbourhood of each pixel {\displaystyle x_{0}} is a vector, denoted as {\displaystyle N(x_{0})}.
The values in the neighbourhood vector are denoted as {\displaystyle f(x_{i})}.
Each pixel is represented by the vector
{\displaystyle \xi =\left(f(x_{0}),f(x_{1}),\ldots ,f(x_{k})\right)}
{\displaystyle x_{i}\in N(x_{0});\quad i=1,\ldots ,k}
The labels (classification) of the pixels in the neighbourhood {\displaystyle N(x_{0})} are represented as a vector
{\displaystyle \eta =\left(\theta _{0},\theta _{1},\ldots ,\theta _{k}\right)}
{\displaystyle \theta _{i}\in \left\{\omega _{1},\omega _{2},\ldots ,\omega _{R}\right\}}
where {\displaystyle \omega _{s}} denotes the assigned class and {\displaystyle R} is the number of classes.
The vector of labels in the neighbourhood {\displaystyle N(x_{0})} without the pixel {\displaystyle x_{0}} itself is
{\displaystyle {\hat {\eta }}=\left(\theta _{1},\theta _{2},\ldots ,\theta _{k}\right)}
The neighbourhood:
The size of the neighbourhood is not limited in principle, but it is usually kept relatively small for each pixel {\displaystyle x_{0}}. A reasonable size of neighbourhood is {\displaystyle 3\times 3} with 4-connectivity or 8-connectivity, with {\displaystyle x_{0}} placed at the centre.
The calculation:
Apply the minimum error classification to a pixel {\displaystyle x_{0}}: if the posterior probability of a class {\displaystyle \omega _{r}} given the pixel {\displaystyle x_{0}} is the highest among all classes, assign {\displaystyle \omega _{r}} as its class:
{\displaystyle \theta _{0}=\omega _{r}\quad {\text{ if }}\quad P(\omega _{r}\mid f(x_{0}))=\max _{s=1,2,\ldots ,R}P(\omega _{s}\mid f(x_{0}))}
The contextual classification rule is described below; it uses the feature vector {\displaystyle \xi } rather than {\displaystyle f(x_{0})}:
{\displaystyle \theta _{0}=\omega _{r}\quad {\text{ if }}\quad P(\omega _{r}\mid \xi )=\max _{s=1,2,\ldots ,R}P(\omega _{s}\mid \xi )}
Use the Bayes formula to calculate the posterior probability {\displaystyle P(\omega _{s}\mid \xi )}:
{\displaystyle P(\omega _{s}\mid \xi )={\frac {p(\xi \mid \omega _{s})P(\omega _{s})}{p\left(\xi \right)}}}
The number of feature vectors equals the number of pixels in the image, since the classifier uses one vector for each pixel {\displaystyle x_{i}}, generated from that pixel's neighbourhood.
The basic steps of contextual image classification are:
Calculate the feature vector {\displaystyle \xi } for each pixel.
Calculate the parameters of the probability distributions {\displaystyle p(\xi \mid \omega _{s})} and {\displaystyle P(\omega _{s})}.
Calculate the posterior probabilities {\displaystyle P(\omega _{r}\mid \xi )} and all labels {\displaystyle \theta _{0}} to obtain the image classification result.
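A sketch of these steps under simplifying assumptions that the text does not spell out: the components of ξ are modelled as conditionally independent Gaussians given the class (a naive Bayes approximation of p(ξ | ω_s)), and the parameters are estimated from pixels with known training labels. Any feature extractor that produces one neighbourhood vector per pixel (such as the spectral–spatial sketch above) could supply the array xi.

import numpy as np

class ContextualNaiveBayes:
    """Bayes minimum error classification of neighbourhood feature vectors xi,
    assuming the components of xi are independent Gaussians given the class."""

    def fit(self, xi: np.ndarray, labels: np.ndarray):
        """Estimate the priors P(omega_s) and the per-class Gaussian parameters
        of p(xi | omega_s) from labelled training pixels."""
        self.classes = np.unique(labels)
        self.priors = np.array([(labels == c).mean() for c in self.classes])
        self.means = np.array([xi[labels == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([xi[labels == c].var(axis=0) + 1e-6 for c in self.classes])
        return self

    def predict(self, xi: np.ndarray) -> np.ndarray:
        """Assign each vector the class with the largest posterior P(omega_s | xi).
        The evidence p(xi) is the same for every class, so it is dropped."""
        log_post = []
        for prior, mu, var in zip(self.priors, self.means, self.vars):
            log_lik = -0.5 * (np.log(2 * np.pi * var) + (xi - mu) ** 2 / var).sum(axis=1)
            log_post.append(np.log(prior) + log_lik)
        return self.classes[np.argmax(np.stack(log_post, axis=1), axis=1)]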
Algorithms
Template matching
Template matching is a "brute force" implementation of this approach. The concept is to first create a set of templates and then look for small parts of the image that match a template.
This method is computationally expensive and inefficient. It keeps an entire list of templates during the whole process, and the number of combinations is extremely high. For an {\displaystyle m\times n} pixel image, there could be a maximum of {\displaystyle 2^{m\times n}} combinations, which leads to high computation cost. This method is a top-down approach and is often called table look-up or dictionary look-up.
Lower-order Markov chain
The Markov chain can also be applied in pattern recognition. The pixels in an image can be treated as a set of random variables, and a lower-order Markov chain is used to find the relationships among the pixels. The image is treated as a virtual line, and the method uses conditional probabilities along that line.
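A sketch of a first-order chain over quantized grey levels, assuming the "virtual line" is obtained by scanning the image row by row and that grey levels lie in [0, 255]; the transition probabilities P(current | previous) are estimated by counting consecutive pairs.

import numpy as np

def transition_matrix(image: np.ndarray, n_levels: int = 8) -> np.ndarray:
    """Estimate first-order transition probabilities between quantized grey
    levels along a row-by-row scan of the image."""
    seq = (image.ravel().astype(float) / 256.0 * n_levels).astype(int)
    seq = seq.clip(0, n_levels - 1)
    counts = np.zeros((n_levels, n_levels))
    for prev, curr in zip(seq[:-1], seq[1:]):
        counts[prev, curr] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)  # rows: P(curr | prev)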
Hilbert space-filling curves
The Hilbert curve runs in a unique pattern through the whole image: it traverses every pixel without visiting any of them twice and yields a continuous curve. It is fast and efficient.
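A sketch of visiting the pixels of a 2^p × 2^p image in Hilbert-curve order, using the standard iterative index-to-coordinate mapping; the resulting one-dimensional sequence can then be fed to the same chain model as above.

def hilbert_d2xy(n: int, d: int):
    """Map position d along the Hilbert curve to (x, y) on an n x n grid,
    where n is a power of two."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(image):
    """Return the pixel values of a square image visited in Hilbert-curve order."""
    n = len(image)                       # assumes an n x n image, n a power of two
    return [image[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]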
Markov meshes
The lower-order Markov chain and the Hilbert space-filling curve mentioned above treat the image as a line structure. Markov meshes, however, take the two-dimensional information into account.
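A sketch of a second-order Markov mesh over a labelled image, where each pixel is conditioned on its left and upper neighbours and the conditional probabilities are estimated by counting (the choice of these two "past" neighbours is one common causal mesh; other neighbourhoods are possible).

import numpy as np

def markov_mesh_probabilities(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Estimate P(current label | left neighbour, upper neighbour) from a
    labelled image. The returned array is indexed as [left, up, current]."""
    counts = np.zeros((n_classes, n_classes, n_classes))
    h, w = labels.shape
    for i in range(1, h):
        for j in range(1, w):
            counts[labels[i, j - 1], labels[i - 1, j], labels[i, j]] += 1
    totals = counts.sum(axis=2, keepdims=True)
    return counts / np.where(totals == 0, 1, totals)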
Dependency tree
The dependency tree is a method that uses tree dependence to approximate probability distributions.
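In a common form of this approximation (a Chow–Liu style factorization, which the text does not spell out), the joint distribution of the neighbourhood values is replaced by a product of pairwise conditionals along the tree:
{\displaystyle p(\xi )\approx p(f(x_{0}))\prod _{i=1}^{k}p(f(x_{i})\mid f(x_{j(i)}))}
where {\displaystyle x_{j(i)}} denotes the parent of {\displaystyle x_{i}} in the dependency tree.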
External links
Advanced Vision homepage
The Use of Context in Pattern Recognition
Image Analysis and Understanding: contextual image classification Archived 2004-12-10 at the Wayback Machine