--- Technical Paper on ‘Visual Search’ by Group C6 of B.Tech. (CSE) for Minor Project, November 2008 ---




                                   VISUAL SEARCH
                 Lov Loothra, Ashish Goel, Prateek and Shikha Vashistha
           Department of Information Technology and Computer Science Engineering
                   Amity School of Engineering and Technology, Bijwasan

Abstract – This paper describes the implementation of an application which accepts an image as input from the user and finds images that are similar to it in a specified directory. Similar images may be defined as images that bear an exact (pixel-to-pixel) resemblance to the query image, or images that depict some likeness to the query image in terms of their intensities (color), overall shape (texture), or a combination of these two factors. The application also aims to index, or sort, the images of the database in order of their similarity to the query image, i.e., from the most similar to the least similar image.

Index Terms – edge detection, Hausdorff distance, image codification, image comparison, image indexing, image similarity

1. INTRODUCTION
As of now, almost all popular search engines are text or tag based, i.e., they search for a web page, an image, a video, etc. on the basis of keywords used to describe or store them. This provides extremely accurate and practical results when we want to search for a particular topic or for information contained in a web page. But the same method usually leads to somewhat inaccurate results when we are specifically searching for images, videos or related media, for the simple reason that one person's description may not be accurate enough to cover all keywords.

Instead, if we use an image itself as the search 'keyword' and check for images that are similar to it, we are bound to get more accurate results. This is especially useful when the user knows what he wants to obtain as a result of the search: it could be an image similar to the one he inputs, an image of higher quality (better resolution), or an image that 'contains' the image he has input.

2. IMAGE & IMAGE SIMILARITY
A digital image is a function f(x, y) which has been discretized in spatial coordinates and brightness. It can also be represented as a matrix, in which the row and column indices identify a point in the image, and the corresponding matrix value identifies the level of gray (or color) at that point (pixel).

The volume of data required for the storage (and processing) of an image makes it convenient to work on a codification of the image, i.e., on a minimal set of data which respects (and allows us to reconstruct) the most important characteristics of the image. Besides, codification usually allows the deletion of redundant information, and it is easy to perform enhancement and analysis of the image directly on its codified representation.

Obviously, the level of reduction of the original image data can be associated with a relative loss of information. It is always convenient that the codification admits inversion (i.e., recovering the original image, or an approximation of it, with the slightest error). Also, despite modifications made to the image, such as color, scale or texture changes, it would be important to maintain codification invariability. But this, at the same time, requires the codified representation to store some extra information to make such an inversion possible.

Traditionally, the problem of image similarity analysis – i.e., the problem of finding the subset of an image bank with characteristics similar to a given image – has been solved by computing a "signature" (codification) of each image to be compared, so that correspondence between the signatures can be analyzed by means of a distance function that measures the degree of approximation between the two given signatures.

Traditional methods to compute signatures are based on some attributes of the image (for example, the color histogram, recognition of a fixed pattern, the number of components of a given type, etc.). This "linearity" of the signature makes it really difficult to obtain data about attributes which were not considered in the signature (and which could be relevant to the similarity or difference between two images). For instance, if we only take color histograms into account, we would not capture image texture, nor would we be able to recognize similar objects painted in different colors.

There are several well-researched methods in the domain of image processing that can be used to formulate a working visual-query based database search application. The techniques used in our project are briefly described below. Furthermore, this paper elucidates the nuances of the actual implementation of the visual search application.
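As a toy illustration of the signature-and-distance idea described above (this is not the codification used in this project; the histogram signature, bin count and all names below are purely illustrative), a coarse intensity-histogram signature compared with an L1 distance might look like:

```python
# Toy illustration of image "signatures" compared by a distance function.
# A signature here is a normalized 4-bin intensity histogram.

def signature(pixels, bins=4):
    """Build a normalized intensity histogram (the 'codification')."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def distance(sig_a, sig_b):
    """L1 distance between signatures: 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))

img_a = [10, 20, 200, 210]    # mostly dark + bright pixels
img_b = [12, 25, 198, 205]    # similar intensity distribution
img_c = [128, 130, 126, 129]  # all mid-gray

sa, sb, sc = signature(img_a), signature(img_b), signature(img_c)
print(distance(sa, sb))  # 0.0: histograms identical though pixels differ
print(distance(sa, sc))  # 2.0: very different distributions
```

Note how the first pair gets a distance of 0 despite differing pixel by pixel: this is exactly the "linearity" limitation discussed above, where attributes not captured by the signature become invisible to the comparison.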


3. HASHING
A cryptographic hash function is a transformation that takes an input (or 'message') and returns a fixed-size string, which is called the hash value. The ideal hash function has three main properties: it is extremely easy to calculate a hash for any given data; it is extremely difficult, or almost impossible in a practical sense, to calculate an input that has a given hash; and it is extremely unlikely that two different messages, however close, will have the same hash.

By computing and then comparing the hash of each image, it can be quickly ascertained whether the images are identical or not.

4. COLOR MAP
A pixel-by-pixel comparison of two images can also determine whether two images are alike. This, however, becomes highly inefficient for large images and at the same time does not take into account regional or spatial similarity or dissimilarity. Hence we use Color Maps. In our implementation, a Color Map represents an image divided into blocks. These blocks (of a predetermined size) are made of a group of pixels and are used to represent the average pixel intensity of a particular area of the image.

Corresponding blocks of two image maps can then be compared to determine similarity or dissimilarity.

5. EDGE DETECTION
Edges characterize boundaries and are, therefore, a problem of fundamental importance in image processing. Edges in images are areas with strong intensity contrasts – a jump in intensity from one pixel to the next. Detecting the edges of an image significantly reduces the amount of data and filters out useless information, while preserving the important structural properties of the image.

6. HAUSDORFF DISTANCE
The Hausdorff distance [1] measures the extent to which each point of a 'model' set lies near some point of an 'image' set and vice versa. Thus, this distance can be used to determine the degree of resemblance between two objects that are superimposed on one another. Computing the Hausdorff distance between all possible relative positions of the query image and the database image can solve the problem of detecting image containment. The Hausdorff distance computation differs from many other shape comparison methods in that no correspondence between the query image and the database image(s) is derived [1]. The method is quite tolerant of the small position errors that occur with edge detectors and other feature extraction methods. Moreover, the method extends naturally to the problem of comparing a portion of a model against an image.

7. DETAILS OF IMPLEMENTATION
The application, while searching, considers:
     Exact match(es) (of the Source Image)
     Color
     Texture (Shape)

The first point involves searching the target directory for an image or images that are exact replicas of the query image. This is accomplished using the hashing technique (explained below). The second and third points involve searching for non-exact images that bear some degree of resemblance to the query image. For this, the images (query and database) are first subjected to the edge-detection filter and, subsequently, the Hausdorff metric of the filtered database images with respect to the query image is computed. Also, the generated Color Maps of the images are compared trivially to generate a difference metric. These are used to determine the degree of similarity. The nuances of the implementation of the above techniques are detailed below.

7.1 HASHING TECHNIQUE
The SHA hash functions are a set of cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U.S. Federal Information Processing Standard. SHA stands for Secure Hash Algorithm. The five algorithms are denoted SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. The latter four variants are sometimes collectively referred to as SHA-2. SHA-1 produces a message digest that is 160 bits long; the numbers in the other four algorithm names denote the bit lengths of the digests they produce. The classes used for computing these hashes are predefined in System.Security.Cryptography [6], which can be freely used in any .NET or Visual Studio implementation.

Hashing is a faster way to compare the images, allowing the tests to complete in a timely manner, than comparing the individual pixels of each image using GetPixel(x, y) [5][6]. Hashes of two images should match if and only if the corresponding images also match. Small changes to the image result in large, unpredictable changes in the hash. This property of the generated hashes can be used to find exact matches (duplicates) of the query image.

The ComputeHash [6] method of this class takes a byte array of data as an input parameter and produces a 256-bit hash of that data. By computing and then comparing the hash of each image, it can quickly be determined whether the images are identical or not. The problem was hence to devise a way to convert the image data stored in the Bitmap [5][6] objects to a form suitable for passing to the ComputeHash method, namely, a byte array. The ImageConverter [6] class was thus used to convert the Image (or Bitmap) objects to the hash-able byte array.
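A minimal Python analogue of the duplicate-detection step (hashlib standing in for System.Security.Cryptography, and plain byte strings standing in for the byte arrays produced by ImageConverter) might look like:

```python
import hashlib

def image_hash(image_bytes: bytes) -> str:
    """SHA-256 digest of raw image data (analogue of ComputeHash)."""
    return hashlib.sha256(image_bytes).hexdigest()

# Stand-ins for byte arrays obtained from an image file / Bitmap object.
query = bytes([10, 20, 30, 40] * 4)
exact_copy = bytes([10, 20, 30, 40] * 4)
near_copy = bytes([10, 20, 30, 41] * 4)   # one component differs

print(image_hash(query) == image_hash(exact_copy))  # True: exact duplicate
print(image_hash(query) == image_hash(near_copy))   # False: any change scrambles the hash
```

As the section notes, even a one-byte change produces a completely different digest, which is why hashing can only find exact replicas, never near-matches.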




Examples: [7.1.1], [7.1.2].

7.2 COLOR MAPS
Color Maps can be easily and efficiently generated for small images by taking the respective Red, Green and Blue averages of a block (16x16 in our implementation) at a time, dynamically, using the running average:

IntnstyAvg = (IntnstyAvg * (p - 1) + CIntnsty)/p

where p represents the number of pixels considered so far in the block, and CIntnsty represents the intensity value of the current pixel.

However, this method deteriorates quickly as image size increases and the number of pixels goes up to a few million. The most practical and efficient solution is to scale the image down to a fixed size. For this we need to know the scale factor, sf, based on the image dimensions and the fixed size itself:

MAX_DIM = Max(Img_Width, Img_Height)
      sf = FIXED_SIZE / MAX_DIM

Therefore, we have:

 New_Width = sf * Img_Width
New_Height = sf * Img_Height

Once an image is scaled, the intensity average for each block is computed and stored. The intensity of a particular pixel is obtained by the trivial GetPixel(x, y) method. The stored values of corresponding regional blocks (say A1, B1 for two images A, B) can then be compared by a simple absolute difference scaled over the 8 bits used to represent each color component (RGB):

Difference = 1 - |Blk_A1_Avg - Blk_B1_Avg| / 255

Examples: [7.2.1], [7.2.2].

7.3 SOBEL EDGE DETECTION
There are many ways to perform edge detection. However, most of the different methods may be grouped into two categories: gradient and Laplacian. The gradient method detects edges by looking for the maximum and minimum in the first derivative of the image. The Laplacian method searches for zero crossings in the second derivative of the image to find edges.

Suppose we have a signal with an edge shown by a jump in intensity, as in [FIG 7.3.1]. If we take the gradient of this signal (which, in one dimension, is just the first derivative with respect to t), we get a signal as shown in [FIG 7.3.2].

Clearly, the derivative shows a maximum located at the center of the edge in the original signal. This method of locating an edge is characteristic of the 'gradient filter' family of edge detection filters and includes the Sobel method [3]. A pixel location is declared an edge location if the value of the gradient exceeds some threshold. As mentioned before, edges will have higher pixel intensity values than those surrounding them.

Based on this one-dimensional analysis, the theory can be carried over to two dimensions as long as there is an accurate approximation to calculate the derivative of a two-dimensional image. The Sobel operator performs a 2-D spatial gradient measurement on an image. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image.

The Sobel edge detector uses a pair of 3x3 convolution masks [3], one estimating the gradient in the x-direction (columns, Gx) [FIG 7.3.3] and the other estimating the gradient in the y-direction (rows, Gy) [FIG 7.3.3]. A convolution mask is usually much smaller than the actual image. As a result, the mask is slid over the image, manipulating a square of pixels at a time. An approximate magnitude can then be calculated using: |G| = |Gx| + |Gy| [3].

The actual algorithm involves the computation of the grayscale of the image (if required) followed by the application of the gradient masks.

In our implementation, we used the Bitmap class to represent the image. The GetPixel(x, y) method was used to obtain the Color [5][6] value of the pixel located at (x, y). The working loop traversed the entire dimensions of the image and obtained the Color value (a 24-bit value for modern images). By taking the average of the RGB components of the Color value, we converted it to an 8-bit grayscale. The computed value was then stored in a matrix as a simple integer between 0 and 255 for easy recall.

The active pixel region, consisting of the current pixel location (say x, y), was then subjected to the gradient computation. The region included the 8 pixels adjacent to the active pixel, for a total of 9 pixels, which could be directly correlated (using the Hadamard product) with the 3x3 gradient matrices and summed to produce the gradient values in the x and y directions. The computed gradient was then compared against the limits of the 8-bit Bitmap, i.e., 0 and 255, and an appropriate intensity value was assigned.

Examples: [7.3.4], [7.3.5].
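The running block average and the per-block comparison of section 7.2 can be sketched in pure Python on a grayscale matrix as follows (a 2x2 block keeps the example small; the project uses 16x16 blocks on the scaled-down image, and all names here are illustrative):

```python
def block_averages(img, block=2):
    """Compute the Color Map: average intensity of each block x block tile,
    maintained as a running mean, as in
    IntnstyAvg = (IntnstyAvg * (p - 1) + CIntnsty) / p."""
    h, w = len(img), len(img[0])
    avgs = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            avg, p = 0.0, 0
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    p += 1
                    avg = (avg * (p - 1) + img[y][x]) / p
            avgs.append(avg)
    return avgs

def block_similarity(map_a, map_b):
    """Per-block 1 - |A - B| / 255, averaged over all blocks."""
    sims = [1 - abs(a - b) / 255 for a, b in zip(map_a, map_b)]
    return sum(sims) / len(sims)

a = [[0, 0, 255, 255],
     [0, 0, 255, 255]]       # dark left half, bright right half
b = [[10, 10, 245, 245],
     [10, 10, 245, 245]]     # slightly shifted intensities

print(block_averages(a))     # [0.0, 255.0]
print(block_similarity(block_averages(a), block_averages(b)))
```

Because only block averages are compared, two images with the same regional intensity layout score as highly similar even when individual pixels differ.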

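The grayscale-and-gradient pass of section 7.3 can be sketched like this (the masks below are the standard Sobel pair, which the paper's [FIG 7.3.3] is assumed to show; the threshold value is an illustrative assumption):

```python
# Sketch of the Sobel step from section 7.3: each 3x3 mask is correlated
# with the neighborhood of every interior pixel, and the magnitude
# |G| = |Gx| + |Gy| is thresholded to a 0/255 edge map.

GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[ 1,  2,  1],
      [ 0,  0,  0],
      [-1, -2, -1]]

def sobel(gray, threshold=128):
    """gray: 2-D list of 0-255 intensities; returns a 0/255 edge map."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]        # borders left at 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = 255 if abs(gx) + abs(gy) >= threshold else 0
    return out

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel(img)
print(edges[1])  # the columns around the step light up; flat regions stay 0
```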



7.4 CANNY EDGE DETECTION
The Canny edge detection algorithm [2] is known to many as the optimal edge detector. It enhances the many edge detectors already available. It is important that edges occurring in images should not be missed and that there be NO responses to non-edges. Likewise, it is also important that the edge points be well localized. In other words, the distance between the edge pixels as found by the detector and the actual edge is to be at a minimum.

The detector draws upon the implementation of the Sobel filter discussed previously. But before applying the Sobel filter to the image, there is a need to eliminate noise from it. This noise removal is done with the help of a Gaussian filter, which basically blurs the image. This is done by applying a Gaussian mask over the image. For the purpose of implementation, we used a 3x3 mask [FIG 7.4.1] and slid it over the image, manipulating a square of pixels at a time by simple convolution.

After the application of the Gaussian and Sobel filters, we obtain an image (over an 8-bit grayscale) that approximates the intensity-change areas of the image. The problem now is to suppress any gray value that is not a maximum when viewed w.r.t. its neighbors along the edge. This is known as non-maximum suppression and is done by determining the edge direction and then following it to remove the regional non-maximums. This step was clubbed with the implementation of the Sobel filter, as the direction could be trivially deduced as θ = tan⁻¹(Gy/Gx), with the appropriate exception being made when Gx computes to 0, as: orientation = (Gy == 0) ? 0 : 90.

Once the edge direction is known, the next step is to relate it to a direction that can be traced in an image. So if the pixels of a 5x5 image are aligned as in [FIG 7.4.2], then, looking at the centre pixel, a, it can be seen that there are only four possible directions when describing the surrounding pixels:

     0 degrees (in the horizontal direction),
     45 degrees (along the positive diagonal),
     90 degrees (in the vertical direction), or
     135 degrees (along the negative diagonal).

Hence the obtained direction is resolved into whichever of these four directions it is closest to. As an example, if the orientation angle is found to be 3 degrees, it is made zero degrees. The resolved angle is stored in an array for further reference and recall.

Following the computation of the edge directions, we are now in a position to perform non-maximum suppression [2]. We need to trace along the edge in the edge direction and suppress any pixel value (set it equal to 0) that is not considered to be an edge (i.e., has a value less than its neighbors). This gives a thin line in the output image. It is accomplished by simply comparing the current pixel value under consideration with its two nearest neighbors in the one (of the four possible) direction determined previously. The lower values can be ignored.

Finally, hysteresis is used as a means of eliminating streaking [2]. Streaking is the breaking up of an edge contour caused by the operator output fluctuating above and below a particular threshold. If a single threshold, T1, is applied to an image, and an edge has an average strength equal to T1, then, due to noise, there will be instances where the edge dips below the threshold. Equally, it will also extend above the threshold, making the edge look like a dashed line.

To avoid this, hysteresis uses two thresholds: a high one, T1, and a low one, T2. Any pixel in the image that has a value greater than T1 is presumed to be an edge pixel and is marked as such immediately. Then, any pixels that are connected to this edge pixel and that have a value greater than T2 are also selected as edge pixels. In other words, to follow an edge you need a gradient above T1 to start, and you do not stop until the gradient drops below T2. This step is very similar to the following of edges and suppression of non-maximums, and hence the two can be clubbed together in the final implementation.

Example: [7.4.3].

7.5 HAUSDORFF DISTANCE COMPUTATION
Given two finite point sets A = {a1,...,ap} and B = {b1,...,bq}, the Hausdorff distance between them is defined as:

  H(A, B) = max(h(A, B), h(B, A)) [1]

where h(A, B) = max a є A min b є B ||a - b||, and ||·|| is some underlying norm on the points of A and B (for a visual representation of the Hausdorff distance refer to [7.5.1]).

The function h(A, B) is called the directed Hausdorff distance [1] from A to B. It identifies the point a є A that is farthest from any point of B, and measures the distance from a to its nearest neighbor in B (using the given norm ||·||, Euclidean in this case). That is, h(A, B) in effect ranks each point of A based on its distance to the nearest point of B, and then uses the largest-ranked such point as the distance (the most mismatched point of A). Intuitively, if h(A, B) = d, then each point of A must be within distance d of some point of B, and there also is some point of A that is exactly distance d from the nearest point of B (the most mismatched point).
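The Gaussian smoothing pass of section 7.4 can be sketched as follows (the 1/16 weight matrix below is a common 3x3 Gaussian-style mask and is an assumption; the exact mask of [FIG 7.4.1] may differ):

```python
# Sketch of the noise-removal step in section 7.4: a 3x3 mask slid over
# the image by simple convolution, damping isolated noisy pixels.

MASK = [[1, 2, 1],
        [2, 4, 2],
        [1, 2, 1]]   # weights sum to 16

def gaussian_blur(gray):
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]          # borders left unfiltered
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(MASK[j][i] * gray[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = acc // 16
    return out

# A single noisy spike in a flat region is spread out and damped.
img = [[0] * 5 for _ in range(5)]
img[2][2] = 160
blurred = gaussian_blur(img)
print(blurred[2][2])  # 40: the spike keeps only 4/16 of its value
print(blurred[2][1])  # 20: a neighbor receives 2/16 of it
```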

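The θ = tan⁻¹(Gy/Gx) orientation of section 7.4, resolved to one of the four traceable directions, might be sketched as (the wrap-around handling is an assumption of this sketch):

```python
import math

def edge_direction(gx, gy):
    """Resolve the gradient orientation to 0, 45, 90 or 135 degrees,
    with the Gx == 0 special case handled as in section 7.4."""
    if gx == 0:
        return 0 if gy == 0 else 90
    theta = math.degrees(math.atan(gy / gx)) % 180
    # Snap to the nearest of the four traceable directions.
    for d in (0, 45, 90, 135):
        if abs(theta - d) <= 22.5:
            return d
    return 0   # angles in (157.5, 180) wrap back to horizontal

print(edge_direction(100, 5))    # a 3-degree orientation resolves to 0
print(edge_direction(100, 100))  # 45
print(edge_direction(0, 50))     # 90
print(edge_direction(100, -95))  # 135
```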

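The hysteresis step of section 7.4 can be sketched on a single scan line for brevity (a real implementation tracks 2-D connectivity; T_HIGH and T_LOW play the roles of T1 and T2, and their values are illustrative):

```python
# Sketch of hysteresis thresholding (section 7.4) on one scan line:
# pixels above T_HIGH seed edges; connected pixels above T_LOW are kept.

T_HIGH, T_LOW = 100, 40

def hysteresis_1d(grads):
    edge = [g >= T_HIGH for g in grads]            # strong seeds
    changed = True
    while changed:                                 # grow along connections
        changed = False
        for i, g in enumerate(grads):
            if not edge[i] and g >= T_LOW and (
                    (i > 0 and edge[i - 1]) or
                    (i + 1 < len(grads) and edge[i + 1])):
                edge[i] = True
                changed = True
    return [255 if e else 0 for e in edge]

# A contour whose strength dips below the high threshold mid-run: a single
# threshold at 100 would break it into a dashed line; hysteresis keeps it.
row = [10, 120, 60, 50, 110, 10]
print(hysteresis_1d(row))  # [0, 255, 255, 255, 255, 0]
```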



The Hausdorff distance, H(A, B), is the maximum of h(A, B) and h(B, A). Thus it measures the degree of mismatch between two sets by measuring the distance of the point of A that is farthest from any point of B, and vice versa. Intuitively, if the Hausdorff distance is d, then every point of A must be within a distance d of some point of B, and vice versa. Thus the notion of resemblance encoded by this distance is that each member of A be near some member of B, and vice versa. Unlike most methods of comparing shapes, there is no explicit pairing of points of A with points of B (for example, many points of A may be close to the same point of B) [1].

The extraction of the point sets from the images is based on the result of the Canny edge detector. The implementation uses those points of the Canny-filtered image that actually constitute an edge. These points can be trivially determined by checking for only the non-zero intensity pixels.

The function h(A, B) can be trivially computed in time O(pq) for two point sets of size p and q respectively, using the following brute-force algorithm:

1. h = 0
2. for every point ai of A,
      2.1 shortest = INF
      2.2 for every point bj of B
            dij = d(ai, bj)
            if dij < shortest then
                   shortest = dij
      2.3 if shortest > h then
                   h = shortest

Our implementation used a slightly modified version of the above algorithm, which makes certain assumptions and eliminations in the computation of the Hausdorff metric. The steps to improve computation time are summarized below.

7.5.1 Termination at Zero Distance
This builds on the fact that the result of the distance norm (the Euclidean norm was used in our implementation, i.e., d = √((x1 - x2)² + (y1 - y2)²)) can never be less than 0. Hence, once the inner loop of the above algorithm (Loop 2.2) computes the shortest distance to be 0, we can safely stop considering any further points from B for the particular point ai є A. This considerably speeds up the computation by skipping a significant chunk of points that no longer need to be considered.

7.5.2 Threshold Distance Window
We can eliminate the need to consider a point if it lies outside a particular threshold distance window, or block. This can be understood with the help of an example. Given a threshold distance τ and the point (Bx, By), we need only consider it for distance computation from the point (Ax, Ay) iff: (Ax - τ) ≤ Bx ≤ (Ax + τ) AND (Ay - τ) ≤ By ≤ (Ay + τ). This speeds up computations for smaller values of τ and limits the maximum possible Hausdorff distance. Visual inaccuracies may occur when seemingly similar but translated images are compared under this assumption.

7.5.3 Termination at Infinite Distance
It can be noted that the outer loop of the algorithm (Loop 2) retains the maximum distance found so far. This assumption builds on the previous one in the sense that, given the boundaries of the threshold distance window, there may be a few points of A which are not in the vicinity of any point of B. For such a point, the computed distance retains the initial value of infinity. Further consideration of any point thereafter is trivially meaningless, as the maximum value of infinity has already been retained, and the computation can terminate.

7.5.4 Scaling
Even after the application of the above techniques, the computational efficiency rapidly deteriorates as image size increases and the number of pixels goes up to a few million. Hence, as was discussed in section 7.2, the image is scaled down to a fixed size on the basis of a scale factor to effectively reduce the number of pixels.

The above assumptions do affect the overall accuracy of the Hausdorff metric but are useful nonetheless for a much required speed-up.

7.6 CONCLUSION AND OBSERVATIONS
Hence, given any two images under consideration, we can easily compute their hash values and their mutual Hausdorff metric (after Canny filter application). While, on the one hand, the hash value comparison can trivially determine whether or not the given images are exact in all respects, the Hausdorff metric signifies the 'closeness' of the two images. A Hausdorff metric of 0 indicates exactness as far as features are concerned, whereas larger values reveal increasing dissimilarity between the images.

This implementation can be extended intuitively to consider a database of images.

Examples:
[7.6.1] Source Database
[7.6.2] Filtered images
[7.6.3] Hausdorff distances computed w.r.t. Firefox_Logo_Normal (Source Image)
Results sorted in order of decreasing similarity.


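The two speed-ups described above can be sketched as follows. This is a minimal Python illustration (the paper's implementation is in C#); the function names and sample point sets are our own, not the paper's. The inner loop breaks as soon as an exact hit yields a distance of 0 (7.5.1), and points of B outside the τ-window around the current point of A are skipped (7.5.2). Note that a τ chosen too small can leave a point of A with no candidates at all, so its shortest distance stays infinite.

```python
import math

def directed_hausdorff(A, B, tau=None):
    """Directed Hausdorff distance h(A, B) over the Euclidean norm,
    with the zero-distance early exit and the threshold-window skip."""
    h = 0.0
    for (ax, ay) in A:
        shortest = math.inf
        for (bx, by) in B:
            # 7.5.2: consider (bx, by) only if it lies inside the
            # threshold distance window around (ax, ay).
            if tau is not None and not (ax - tau <= bx <= ax + tau
                                        and ay - tau <= by <= ay + tau):
                continue
            d = math.hypot(ax - bx, ay - by)
            if d < shortest:
                shortest = d
            # 7.5.1: the Euclidean norm can never be less than 0, so an
            # exact hit ends the inner loop for this point of A.
            if shortest == 0.0:
                break
        if shortest > h:
            h = shortest
    return h

def hausdorff(A, B, tau=None):
    # H(A, B) = max(h(A, B), h(B, A)), as defined in section 7.5.
    return max(directed_hausdorff(A, B, tau), directed_hausdorff(B, A, tau))
```

For example, hausdorff([(0, 0), (1, 0)], [(0, 0), (1, 1)]) evaluates to 1.0: the most mismatched point in either direction lies at distance 1 from its nearest neighbour in the other set.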


8. SUMMARY OF IMPLEMENTATION

A summary of the implementation is presented below in the form of pseudo-code.

8.1 Input Source Image, SI
8.2 Input Target Directory, TD
-- Preprocessing Phase
8.3 For each image in the TD:
          8.3.1 Compute & store the hash value (HV)
          8.3.2 Compute & store the Color Details (CD)
          8.3.3 Apply the Canny (Sobel-based) filter
          8.3.4 Compute the locations of the non-zero pixels and store them in a matrix
-- Preparation Phase
8.4 Compute the HV for SI
8.5 Compute & store the Color Details of SI
8.6 Apply the Canny filter to SI
8.7 Compute & store the locations of the non-zero pixels
-- Comparison Phase
8.8 For each image in the TD:
          8.8.1 Compare the HV of SI with the stored HV of the image
          8.8.2 Compare the CD of SI with the stored CD of the image
          8.8.3 Compute the Hausdorff metric between SI and the image using the stored locations of the non-zero pixels
          8.8.4 Assign a rank to the image based on the HV comparison, the computed Hausdorff metric and the Color Details
-- Sorting Phase
8.9 Sort the images of TD based on rank
8.10 Display the images in sort order

9. REFERENCES

[1] Daniel P. Huttenlocher, Gregory A. Klanderman, and William J. Rucklidge. Comparing Images Using the Hausdorff Distance. IEEE Trans. Pattern Analysis and Machine Intelligence, September 1993.

[2] J. Canny. A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, November 1986.

[3] I. Sobel and G. Feldman. 'A 3x3 Isotropic Gradient Operator for Image Processing'. Presented at a talk at the Stanford Artificial Intelligence Project in 1968; Pattern Classification and Scene Analysis, 1973.

[4] H. Alt, B. Behrends and J. Blomer. Measuring the Resemblance of Polygon Shapes. Proc. Seventh ACM Symposium on Computational Geometry, 1991.

[5] Herbert Schildt. C# 2.0: The Complete Reference, Second Edition. Tata McGraw-Hill, 2006.

[6] MSDN Library. msdn.microsoft.com/en-us/library/default.aspx
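As a rough illustration of the exact-match portion of this pipeline (steps 8.3.1, 8.4, 8.8.1 and 8.9), the sketch below hashes each image's raw bytes and ranks exact duplicates of the source image first. It is written in Python with SHA-256 standing in for the ComputeHash call of section 7.1; the directory mapping and the byte strings are hypothetical stand-ins for the Bitmap data of the C# implementation.

```python
import hashlib

def image_hash(data: bytes) -> str:
    # Stand-in for ComputeHash over the Bitmap's byte array (section 7.1):
    # a 256-bit digest of the raw image bytes.
    return hashlib.sha256(data).hexdigest()

def rank_by_exact_match(source: bytes, directory: dict) -> list:
    """directory maps image name -> raw bytes (a stand-in for a Bitmap).
    Returns the image names with exact matches (equal hashes) ranked
    ahead of the rest; ties are broken by name for determinism."""
    src_hv = image_hash(source)                                   # step 8.4
    stored = {name: image_hash(d) for name, d in directory.items()}  # step 8.3.1
    # Steps 8.8.1 / 8.9: non-matches sort after matches (False < True).
    return sorted(stored, key=lambda name: (stored[name] != src_hv, name))
```

In the full pipeline, this hash-based rank would be combined with the Color Detail difference and the Hausdorff metric (steps 8.8.2 to 8.8.4) before the final sort.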






FIGURES

[7.1.1]
[7.1.2]
[7.2.1]
[7.2.2]
[7.3.1]
[7.3.2]
[7.3.3]
[7.3.4]
[7.3.5]
[7.4.1]
[7.4.2]
[7.4.3]
[7.5.1]
[7.6.1]
[7.6.2]
[7.6.3]


Visual Search

  • 1. --- Technical Paper on ‘Visual Search’ by Group C6 of B.Tech. (CSE) for Minor Project, November 2008 --- VISUAL SEARCH Lov Loothra, Ashish Goel, Prateek and Shikha Vashistha Department of Information Technology and Computer Science Engineering Amity School of Engineering and Technology, Bijwasan Abstract – This paper describes the on a codification of the image, trying to work on a implementation of an application which accepts an minimal set of data which respects (and allows to image as input from the user and finds images that reconstruct) the most important characteristics of the are similar to it from a specified directory. Similar image. Besides, codification usually allows the images may be defined as images that bear an deletion of redundant information and it is easy to exact (pixel to pixel) resemblance to the query work on the improvement and analysis of the image image or images that depict some likeness to the directly on the codified representation of the same. query image in terms of their intensities (color), overall shape (texture) or a combination of these Obviously, the reduction level of the image original two factors. The application also aims to index or data can be associated to a relative loss of sort the images of the database in order of their information. It is always convenient that the similarity to the query image, i.e., from the most codification admits inversion (i.e., recovering the similar to the least similar image. original image or an approximation of that original image with the slightest error). Also, despite Index Terms – edge detection, hausdorff distance, modifications made to the image, such as color, scale image codification, image comparison, image or texture changes, it would be important to maintain indexing, image similarity codification invariability. But this, at the same time, requires the codified representation to store some 1. INTRODUCTION extra information to make such an inversion possible. 
As of now, almost all popular search engines are text Traditionally, the problem of image similarity or tag based, i.e., they search for a web page, an analysis – i.e., the problem of finding the subset of an image, a video etc. on the basis of keywords used to image bank with similar characteristics to a given describe/store them. This provides for extremely image – has been solved by computing a "signature" accurate and practical results when we want to search (codification) of each image to be compared, so then, for a particular topic or information contained in a correspondence between the signatures could be web page. But the same method usually leads to analyzed by means of a distance function that somewhat inaccurate results when we’re specifically measures the degree of approximation between the searching for images, videos or related media for the two given signatures. simple reason that one person’s description may not be accurate enough to cover all keywords. Traditional methods to compute signatures are based on some attributes of the image (for example, color Instead, if we use an image itself as the search histogram, recognition of a fixed pattern, number of ‘keyword’ and check for images that are similar to it, components of a given type, etc). This "linearity" of we’re bound to get more accurate results. This is the signature makes it really difficult to obtain data especially useful when the user knows what he wants about attributes which were not considered in the to obtain as a result of the search: it could be an signature (and which could be relevant to the image similar to the one he inputs, an image of higher similarity or difference between two images). For quality (better resolution) or an image that ‘contains’ instance, if we only take into account color the image he’s input. histograms, we would not take into account image texture, nor we would be able to recognize similar 2. IMAGE & IMAGE SIMILARITY objects painted in different colors. 
A digital image is a function f (x, y) which has been discretized in spatial coordinates and brightness. There are several well-researched methods in the It can also be represented as a matrix, in which the domain of image processing that can be used to rates of line and column identify a point in the image, formulate a working visual-query based database and the content value in the matrix identifies the level search application. The techniques used in our project of gray (or color) in that point (pixel). are briefly described below. Furthermore, this paper elucidates the nuances of the actual implementation The volume of the required data for the storage (and of the visual search application. processing) of an image, makes it convenient to work
  • 2. --- Technical Paper on ‘Visual Search’ by Group C6 of B.Tech. (CSE) for Minor Project, November 2008 --- 3. HASHING 7. DETAILS OF IMPLEMENTATION A cryptographic hash function is a transformation The application, while searching, considers: that takes an input (or 'message') and returns a fixed-  Exact match(es) (of the Source Image) size string, which is called the hash value. The ideal  Color hash function has three main properties - it is  Texture (Shape) extremely easy to calculate a hash for any given data, it is extremely difficult or almost impossible in a The first point involves searching the target directory practical sense to calculate a text that has a given for an image or for images that are exact replicas of hash, and it is extremely unlikely that two different the query image. This is accomplished using the messages, however close, will have the same hash. hashing technique (explained below). The second and third points involve searching for non-exact images By computing and then comparing the hash of each that bear some degree of resemblance to the query image, it can be quickly ascertained whether the image. For this, the images (query and database) are images were identical or not. first subjected to the edge-detection filter and, subsequently, the Hausdorff metric of the filtered 4. COLOR MAP database images with respect to the query image is A pixel by pixel image comparison of two images can computed. Also, the generated Color Maps of the also determine whether two images are alike. This, images are compared trivially to generate difference however, becomes highly inefficient for large images metric. These are used to determine the degree of and at the same time doesn’t take into account the similarity. The nuances of the implementation of the regional or spatial similarity or dissimilarity. Hence above techniques are detailed below. we use Color Maps. In our implementation, a Color Map represents an image divided into blocks. 
These 7.1 HASHING TECHNIQUE blocks (of a predetermined size) are made of a group The SHA hash functions are a set of cryptographic of pixels and are used to represent the average pixel hash functions designed by the National Security intensity of a particular area of the image. Agency (NSA) and published by the NIST as a U.S. Federal Information Processing Standard. SHA stands Corresponding blocks of two image maps can then be for Secure Hash Algorithm. The five algorithms are compared to determine similarity or dissimilarity. denoted SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. The latter four variants are sometimes 5. EDGE DETECTION collectively referred to as SHA-2. SHA-1 produces a Edges characterize boundaries and are, therefore, a message digest that is 160 bits long; the number in problem of fundamental importance in image the other four algorithm names denote the bit length processing. Edges in images are areas with strong of the digest they produce. The classes used for intensity contrasts – a jump in intensity from one computing these hashes are predefined in pixel to the next. Detecting the edges of an image System.Security.Cryptography [6] which significantly reduces the amount of data and filters can be freely used in any .NET or Visual Studio out useless information, while preserving the implementation. important structural properties in an image. Hashing is a faster method to compare the images to 6. HAUSDORFF DISTANCE allow the tests to complete in a timely manner, rather than comparing the individual pixels in each image The Hausdorff distance [1] measures the extent to using GetPixel (x, y) [5][6]. Hashes of two which each point of a ‘model’ set lies near some point images should match if and only if the corresponding of an ‘image’ set and vice versa. Thus, this distance images also match. Small changes to the image result can be used to determine the degree of resemblance in large unpredictable changes in the hash. 
This between two objects that are superimposed on one property of the generated hashes can be used to find another. Computing the Hausdorff distance between exact matches (duplicates) of the query image. all possible relative positions of the query image and the database image can solve the problem of detecting The ComputeHash [6] method of this class takes a image containment. The Hausdorff distance byte array of data as an input parameter and produces computation differs from many other shape a 256 bit hash of that data. By computing and then comparison methods in that no correspondence comparing the hash of each image, it would be between the query image and database image(s) is quickly able to tell if the images were identical or not. derived [1]. The method is quite tolerant of small The problem was hence to device a way to convert position errors as occur with edge detectors and other the image data stored in the Bitmap [5][6] objects to feature extraction methods. Moreover, the method a suitable form for passing to the ComputeHash extends naturally to the problem of comparing a method, namely, a byte array. The portion of a model against an image. ImageConvertor [6] class was thus used to allow -2-
  • 3. --- Technical Paper on ‘Visual Search’ by Group C6 of B.Tech. (CSE) for Minor Project, November 2008 --- us to convert the Image (or Bitmap) objects to the the gradient of this signal (which, in one dimension, hash-able byte array. is just the first derivative with respect to t) we get a signal as shown by [FIG 7.3.2]. Examples: [7.1.1], [7.1.2]. Clearly, the derivative shows a maximum located at 7.2 COLOR MAPS the center of the edge in the original signal. This method of locating an edge is characteristic of the Color Maps can be easily and efficiently generated ‘gradient filter’ family of edge detection filters and for small images by taking the respective Red, Green includes the Sobel method [3]. A pixel location is and Blue averages of a Block (16x16 in our declared an edge location if the value of the gradient implementation) at a time dynamically using: exceeds some threshold. As mentioned before, edges will have higher pixel intensity values than those IntnstyAvg = surrounding it. (IntnstyAvg * (p – 1) + CIntnsty)/p Based on this one-dimensional analysis, the theory where p represent the current pixel location, and can be carried over to two-dimensions as long as CIntensity represents the present calculated intensity there is an accurate approximation to calculate the value. derivative of a two-dimensional image. The Sobel operator performs a 2-D spatial gradient measurement However this method fast deteriorates as image size on an image. Typically it is used to find the increases and the number of pixels go up to a few approximate absolute gradient magnitude at each million. The most practical and efficient solution is to point in an input grayscale image. Scale the image down to a fixed size. 
For this we need to know the scale factor, sf, based on the image The Sobel edge detector uses a pair of 3x3 dimensions and the size itself: convolution masks [3], one estimating the gradient in MAX_DIM = Max(Img_Width, Img_Height) the x-direction (columns, Gx) [FIG 7.3.3] and the sf = FIXED_SIZE / MAX_DIM other estimating the gradient in the y-direction (rows, Gy) [FIG 7.3.3]. A convolution mask is usually much So therefore, we have: smaller than the actual image. As a result, the mask is slid over the image, manipulating a square of pixels at New_Width = sf * Img_Width a time. An approximate magnitude can then be New_Height = sf * Img_Height calculated using: |G| = |G x| + |Gy| [3] Once an image is scaled the Intensity Average for a The actual algorithm involves the computation of the block is computed and stored. The intensity of a grayscale of the image (if required) followed by the particular pixel is obtained by the trivial application of the gradient masks. GetPixel(x, y) method. These stored values of the regional blocks (say A1, B1 for two images A, B) In our implementation, we used the Bitmap class to can then be compared by a simple absolute difference represent the image. The GetPixel(x,y) method scaled over the 8-bits used to represent the color was used to obtain the value of the Color[5][6] of component (RGB): the pixel located at x, y. The working loop traversed the entire dimensions of the image and obtained the Difference = Color value (24 bit value for modern images). By 1 - |Blk_A1_Avg - Blk_B1_Avg| / 255 taking the average of the RGB component of the Color value, we converted it to an 8-bit grayscale. The computed value was then stored in a matrix as a Examples: [7.2.1], [7.2.2]. simple integer between 0 – 255 for easy recall. 7.3 SOBEL EDGE DETECTION The active pixel region, consisting of the current There are many ways to perform edge detection. 
pixel location (say x, y) was then subjected to a However, most of the different methods may be gradient. The region included 8 pixels adjacent to the grouped into two categories: gradient and Laplacian. active pixel for a total of 9 pixels which could be The gradient method detects the edges by looking for directly correlated (using Hadamard product) with the the maximum and minimum in the first derivative of 3x3 gradient matrices and summed to produce the the image. The Laplacian method searches for zero gradient values in x and y directions. The computed crossings in the second derivative of the image to find gradient was then compared to the threshold of the 8- edges. bit Bitmap, i.e., 0 & 255 and an appropriate intensity value was assigned. Suppose we have a signal, with an edge shown by the jump in intensity as shown in [FIG 7.3.1]. If we take Examples: [7.3.4], [7.3.5]. -3-
  • 4. --- Technical Paper on ‘Visual Search’ by Group C6 of B.Tech. (CSE) for Minor Project, November 2008 --- 7.4 CANNY EDGE DETECTION along the edge in the edge direction and suppress any [2] pixel value (set it equal to 0) that is not considered to The Canny edge detection algorithm is known to be an edge (i.e., has a value less than its neighbor). many as the optimal edge detector. It enhances the This will give a thin line in the output image. This is many edge detectors already available. It is important accomplished by simply comparing the current pixel that edges occurring in images should not be missed value under consideration with its two nearest and that there be NO responses to non-edges. neighbors in one (of the four possible) direction that Likewise, it is also important that the edge points be has been determined previously. The lower values well localized. In other words, the distance between can be ignored. the edge pixels as found by the detector and the actual edge is to be at a minimum. Finally, hysteresis is used as a means of eliminating streaking [2]. Streaking is the breaking up of an edge The detector draws upon the implementation of the contour caused by the operator output fluctuating Sobel filter discussed previously. But before applying above and below a particular threshold. If a single the Sobel filter to the image, there is a need to threshold, T1 is applied to an image, and an edge has eliminate noise from the image. This noise removal is an average strength equal to T1, then, due to noise, done with the help of a Gaussian filter which there will be instances where the edge dips below the basically blurs the image. This is done by applying a threshold. Equally it will also extend above the Gaussian mask over the image. For the purpose of threshold making an edge look like a dashed line. 
implementation, we used a 3x3 mask [FIG 7.4.1] and slid it over the image; manipulating a square of pixels To avoid this, hysteresis uses 2 thresholds: high and at a time by simple convolution. low. Any pixel in the image that has a value greater than T1 is presumed to be an edge pixel, and is After the application of the Gaussian and Sobel marked as such immediately. Then, any pixels that filters, we obtain an image (over an 8-bit grayscale) are connected to this edge pixel and that have a value that approximates the intensity change areas of the greater than T2 are also selected as edge pixels. To image. The problem statement now is to remove the follow an edge, start with a gradient of T2 and stop gray factor which is a local maximum but a non- when you get a gradient below T1. This step is very maximum when viewed w.r.t. its neighbors. This is similar to the following of edges and suppression of known as non-maximum suppression and is done by non-maximums and hence can be clubbed together in determining the edge direction and then following it the final implementation. to remove the regional non-maximums. This step was clubbed with the implementation of the Sobel filter as Example: [7.4.3]. the direction could be trivially deduced as: θ = tan-1 Gy/Gx, with appropriate exceptions being 7.5 HAUSDORFF DISTANCE COMPUTATION made when Gx and/or Gy compute to 0, as: orientation = (Gy == 0) ? 0 : 90. Given two finite point sets A = {a1,...ap} and B = {b1,...bq}, the hausdorff distance between Once the edge direction is known, the next step is to them is defined as: relate the edge direction to a direction that can be traced in an image. 
So if the pixels of a 5x5 image are H(A, B) = max(h(A, B), h(B, A)) [1] aligned as in [FIG 7.4.2], then, it can be seen by looking at the centre pixel, a, there are only four where h(A, B) = max a є A min b є B || a - possible directions when describing the surrounding b || and || - || is some underlying norm on the pixels: points of A and B (for a visual representation of hausdorff distance refer [7.5.1]).  0 degrees (in the horizontal direction),  45 degrees (along the positive diagonal), The function h(A, B) is called the directed  90 degrees (in the vertical direction), or hausdorff distance [1] from A to B. It identifies the  135 degrees (along the negative diagonal) point a є A that is farthest from any point of B, and Hence the obtained direction is now resolved into one measures the distance from a to its nearest neighbor of these four directions depending on which direction in B (using the given norm || - ||, Euclidean in this it is closest to. As an example, if the orientation angle case). That is, h(A, B) in effect ranks each point of is found to be 3 degrees, make it zero degrees. The A based on its distance to the nearest point of B, and resolved angle is stored in an array for further then uses the largest ranked such point as the distance reference and recall. (the most mismatched point of A). Intuitively, if h(A, B) = d, then each point of A must be within Following the computation of the edge directions, we distance d of some point of B, and there also is some are now in a position to perform non-maximum point of A that is exactly distance d from the nearest suppression [2]. Therefore, we now need to trace point of B (the most mismatched point). -4-
The Hausdorff distance, H(A, B), is the maximum of h(A, B) and h(B, A). It thus measures the degree of mismatch between two sets, by measuring the distance of the point of A that is farthest from any point of B and vice versa. Intuitively, if the Hausdorff distance is d, then every point of A must be within a distance d of some point of B and vice versa. The notion of resemblance encoded by this distance is therefore that each member of A be near some member of B and vice versa. Unlike most methods of comparing shapes, there is no explicit pairing of points of A with points of B (for example, many points of A may be close to the same point of B) [1].

The extraction of the point sets from the images is based on the result of the Canny edge detector. The implementation uses those points of the Canny-filtered image that actually constitute an edge. These points can be trivially determined by checking for only the non-zero intensity pixels.

The function h(A, B) can be computed in time O(pq) for two point sets of size p and q respectively using the following brute-force algorithm:

1. h = 0
2. for every point ai of A:
   2.1 shortest = INF
   2.2 for every point bj of B:
         dij = d(ai, bj)
         if dij < shortest then shortest = dij
   2.3 if shortest > h then h = shortest

Our implementation used a slightly modified version of the above algorithm which makes certain assumptions and eliminations in the computation of the Hausdorff metric. The steps taken to improve computation time are summarized below.

7.5.1 Termination at Zero Distance

This builds on the fact that the result of the distance norm (the Euclidean norm was used in our implementation, i.e., d = √((x1 − x2)² + (y1 − y2)²)) can never be less than 0. Hence, once the inner loop of the above algorithm (Loop 2.2) computes the shortest distance to be 0, we can safely stop considering any further points from B when computing the distance from the particular point ai ∈ A. This considerably speeds up the computation by skipping a significant chunk of points that no longer need to be considered.

7.5.2 Threshold Distance Window

We can eliminate the need to consider a point if it lies outside a particular threshold distance window or block. This can be understood with the help of an example. Given a threshold distance τ and the point (Bx, By), we need only consider it for distance computation from the point (Ax, Ay) iff: (Ax − τ) ≤ Bx ≤ (Ax + τ) AND (Ay − τ) ≤ By ≤ (Ay + τ). This speeds up computation for smaller values of τ and limits the maximum possible Hausdorff distance. Visual inaccuracies may occur when seemingly similar but translated images are compared under this assumption.

7.5.3 Termination at Infinite Distance

It can be noted that the outer loop of the algorithm (Loop 2) retains the maximum distance seen so far. This assumption builds on the previous one in the sense that, given the boundaries of the threshold distance window, there may be a few points from A which are not in the vicinity of any point from B. Hence the computed distance will retain the initial value of infinity. Further consideration of any point hereafter is meaningless, as the maximum value of infinity has already been retained.

7.5.4 Scaling

Even after the application of the above techniques, the computation efficiency rapidly deteriorates as image size increases and the number of pixels goes up to a few million. Hence, as was discussed in section 7.2, the image is scaled down to a fixed size on the basis of a scale factor to reduce the number of pixels significantly.

The above assumptions do affect the overall accuracy of the Hausdorff metric but are useful nonetheless for a much-required speed-up.

7.6 CONCLUSION AND OBSERVATIONS

Hence, given any two images under consideration, we can easily compute their hash values and their mutual Hausdorff metric (after Canny filter application). While on the one hand the hash value comparison can trivially determine whether or not the given images are exact in all respects, the Hausdorff metric signifies the 'closeness' of the two images. A Hausdorff metric of 0 indicates exactness as far as features are concerned, whereas larger values reveal increasing dissimilarity between images.

This implementation can be extended intuitively to consider a database of images.

Examples:
[7.6.1] Source Database
[7.6.2] Filtered images
[7.6.3] Hausdorff distances computed w.r.t. Firefox_Logo_Normal (Source Image)

Results sorted in order of decreasing similarity.
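The speed-ups of sections 7.5.1–7.5.3 can be folded into the brute-force loop as follows. This is an illustrative Python sketch, not the project's C# code; `tau` stands for the threshold half-width of section 7.5.2:

```python
import math

INF = float("inf")

def h_optimized(A, B, tau):
    """Directed Hausdorff h(A, B) with the speed-ups of sections 7.5.1-7.5.3:
    early exit at zero distance, a threshold window of half-width tau,
    and termination once the running maximum is stuck at infinity."""
    h = 0.0
    for (ax, ay) in A:
        shortest = INF
        for (bx, by) in B:
            # 7.5.2: skip points of B outside the threshold window around (ax, ay)
            if not (ax - tau <= bx <= ax + tau and ay - tau <= by <= ay + tau):
                continue
            d = math.hypot(ax - bx, ay - by)
            if d < shortest:
                shortest = d
            if shortest == 0.0:  # 7.5.1: the norm can never go below zero
                break
        if shortest > h:
            h = shortest
        if h == INF:             # 7.5.3: no point of B near this point of A,
            break                # so the maximum can only stay at infinity
    return h
```

With a large `tau` this reduces to the plain brute-force algorithm; with a small `tau`, a point of A that has no neighbour of B in its window drives the result to infinity, as section 7.5.3 describes.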
8. SUMMARY OF IMPLEMENTATION

A summary of the implementation is presented below in the form of pseudo-code.

8.1 Input Source Image, SI
8.2 Input Target Directory, TD
-- Preprocessing Phase
8.3 For each image in the TD:
    8.3.1 Compute & store the hash value (HV)
    8.3.2 Compute & store Color Details (CD)
    8.3.3 Apply the Canny (Sobel-based) filter
    8.3.4 Compute the location of non-zero pixels and store in a matrix
-- Preparation Phase
8.4 Compute HV for SI
8.5 Compute & store Color Details of SI
8.6 Apply Canny filter to SI
8.7 Compute & store location of non-zero pixels
-- Comparison Phase
8.8 For each image in the TD:
    8.8.1 Compare HV of SI with the stored HV of the image
    8.8.2 Compare CD of SI with the stored CD of the image
    8.8.3 Compute Hausdorff metric b/w SI and the image using the stored location of non-zero pixels
    8.8.4 Assign rank to image based on HV comparison, computed Hausdorff metric and the Color Details
-- Sorting Phase
8.9 Sort images of TD based on rank
8.10 Display images in sort order

9. REFERENCES

[1] Daniel P. Huttenlocher, Gregory A. Klanderman, and William J. Rucklidge. Comparing Images Using the Hausdorff Distance. IEEE Trans. Pattern Analysis and Machine Intelligence, September 1993.

[2] J. Canny. A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, November 1986.

[3] I. Sobel, G. Feldman. 'A 3x3 Isotropic Gradient Operator for Image Processing'. Presented at a talk at the Stanford Artificial Project in 1968; Pattern Classification and Scene Analysis, 1973.

[4] H. Alt, B. Behrends and J. Blomer. Measuring the Resemblance of Polygon Shapes. Proc. Seventh ACM Symposium on Computational Geometry, 1991.

[5] Herbert Schildt. C# 2.0: The Complete Reference, Second Edition. Tata McGraw-Hill, 2006.

[6] MSDN Library. msdn.microsoft.com/en-us/library/default.aspx
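A minimal sketch of the comparison and sorting phases of section 8 follows, in Python (the actual implementation was in C#). Here `hausdorff_metric` and `color_distance` are hypothetical callables standing in for the computations described earlier, and a byte-level MD5 stands in for the hash value (HV):

```python
import hashlib
from pathlib import Path

def file_hash(path):
    """Byte-level hash; identical files (exact copies) collide.
    Stands in for the hash value (HV) of the summary above."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def rank_images(source, target_dir, hausdorff_metric, color_distance):
    """Comparison and sorting phases (steps 8.8-8.10): an exact hash match
    ranks first; otherwise images are ordered by the Hausdorff metric,
    with the colour distance as a tie-breaker."""
    src_hash = file_hash(source)
    ranked = []
    for img in sorted(Path(target_dir).iterdir()):
        if file_hash(img) == src_hash:
            key = (0, 0.0, 0.0)  # exact duplicate of the source image
        else:
            key = (1, hausdorff_metric(source, img), color_distance(source, img))
        ranked.append((key, img))
    ranked.sort(key=lambda pair: pair[0])
    return [img for _, img in ranked]
```

The tuple key makes the ordering explicit: exact matches first, then increasing Hausdorff metric, then increasing colour distance, mirroring the rank assignment of step 8.8.4.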
FIGURES

[7.1.1] [7.1.2]
[7.2.1] [7.2.2]
[7.3.1] [7.3.2] [7.3.3]
[7.3.4] [7.3.5]
[7.4.1] [7.4.2] [7.4.3]
[7.5.1]
[7.6.1] [7.6.2] [7.6.3]