An Object-Level Feature Representation Model for the Multi-target Retrieval of Remote Sensing Images
- DOI : 10.5626/JCSE.2014.8.2.65
- Author: Zeng Zhi, Du Zhenhong, Liu Renyi
- Organization: Zeng Zhi; Du Zhenhong; Liu Renyi
- Publish: Journal of Computing Science and Engineering, Volume 8, Issue 2, pp. 65-77, 30 June 2014
To address the problem of multi-target retrieval (MTR) of remote sensing images, this study proposes a new object-level feature representation model. The model provides an enhanced image representation that improves the efficiency of MTR. Generating the model involves object-oriented image segmentation, feature parameter calculation, and symbolic image database construction. The proposed model uses the spatial representation method of the extended nine-direction lower-triangular (9DLT) matrix to combine spatial relationships among objects, and organizes the image features according to MPEG-7 standards. A similarity metric is also proposed that improves the precision of similarity retrieval. The method provides a trade-off strategy that supports flexible matching on either the target features or the spatial relationships between the query target and the image database. We implement this retrieval framework on a dataset of remote sensing images. Experimental results show that the proposed model achieves competitive, high retrieval precision.
Remote sensing, Image processing, Spatial representation, 9DLT, Content-based remote sensing image retrieval
Along with the rapid progress of satellite sensor technology and their application to high-resolution remote sensing images in Earth observation systems, a large amount of remote sensing data have become readily available for acquisition. In terms of spatial information, terrain geometry, and texture information, high-resolution remote sensing images have more features than middle or low-resolution images. To use the image database fully and to retrieve interesting information automatically and intelligently, a new efficient technology for multi-target retrieval (MTR) in an image, particularly in a specified region, is expected to be developed.
The number of image-processing applications for target retrieval is increasing, such as query by image content from IBM. Most studies in this area have focused on content-based image retrieval (CBIR) and content-based remote sensing image retrieval (CBRSIR), and have achieved significant results. In these processes, the contents of an image, which specify several low-level features, such as color, texture, shape, longitude and latitude, and spatial relationships among objects, are the bases of multidimensional image feature vectors. Because imaging conditions differ across the various forms of remote sensing images, image contents cannot be expressed exactly by using only a single feature. Therefore, constructing the comprehensive features of an image is the key to improving extraction performance. However, if the combined features cannot be refined into a unified model, then the accuracy of similarity extraction and the efficiency of retrieval will suffer. For example, if we focus more on spatial relationships, then less detail is needed for each target, and MTR becomes more efficient than comparing the features of every single object. To effectively support the information retrieval process for remote sensing images, an object-level model is proposed, which can represent the contents of an image with overall accuracy. By using this model, we can retrieve and operate on the information pre-stored in a symbolic image database (SID) with high efficiency, and neglect intrinsic information, such as color, texture, and shape. To date, research on feature representation models of image data for MTR remains limited. To build feature indices and to realize rapid retrieval, we propose an object-level feature representation model, based on previous research on CBRSIR and with reference to MPEG-7 standards, starting with representing the contents of an image at the object level, particularly the spatial relationships among targets.
The rest of this paper is organized as follows. Section Ⅱ discusses related literature on representation techniques of image contents for CBIR or CBRSIR. Section Ⅲ introduces calculation and representation feature values, and mainly describes the spatial representations of the extended nine-direction lower-triangular matrix (9DLT). Section Ⅳ presents a model of image content feature representation. Section Ⅴ proposes an MTR model and similarity calculation. The last section presents several experiments to validate the accuracy of the content-based feature representation model and the efficiency of image target extraction. A conclusion to the study is also presented in this section.
In the past three decades, academia has achieved a large number of results on CBIR and CBRSIR. At present, CBIR has many successful applications in the fields of facial recognition, medical diagnosis, and trade registration. Most of these systems have adopted a single feature or combined features as image indices [3-10]. CBRSIR is similar to CBIR, because both contain visual and geographic information. Several systems have focused on the issue of spectral retrieval, such as texture representation, and different combinations with spectral bands. A special feedback approach has been employed to precisely describe the desired search characteristic in a remote sensing image. Some researchers even presented a code stream of images for remote sensing image retrieval. In addition, other scholars combined a scheme with an automatic classifier, and proposed the use of a new feature, 'texton histograms', to capture the weak-textured characteristic of remote sensing images for image retrieval. Meanwhile, others applied a texture analysis approach, called the local binary pattern operator, to implement image retrieval. Some of these studies even applied independent component analysis to extract independent components of feature values via linear combinations to realize multi-spectral image retrieval, or adopted principal component analysis and a clustering technique to index remote sensing images for image retrieval. Considering various features, such as color, texture, and spectra, a prototype model for CBRSIR based on color moment and gray level co-occurrence matrix features was proposed. A number of researchers combined several properties (color, texture, and points of interest) that were automatically extracted and immediately indexed images. In addition, some researchers proposed a framework based on a domain-dependent ontology to perform semantic retrieval in image archives. Other scholars also presented a universal semantic data model for image retrieval.
Regardless of how a feature vector is established, this vector still depends upon the representation of contents in images. To date, the contents of images can be represented in numerous ways. Some approaches adopt a quad-tree structure or a quin-tree method that splits large-scale remote sensing images into sub-images, to extract multiple features, such as color and texture [22, 23]. Others use the 2D C-string to represent spatial knowledge of an image database, or the spanning representation of an object to realize spatial inference and similarity retrieval in an image database, through directional relation referenced frames. Others depict the relationships among spatial objects by using the methods of the nine-direction spanning area or 9DLT, represent image colors by using pyramid technology, or express an image by employing a symbol index, which is established in image space stratification. All the aforementioned representations include color, space, and sub-images, and belong to the feature representation methods for image contents. Implementing rapid and accurate retrieval with a massive remote sensing image is difficult, because its features include various data types, resolution scales, and data sources. In our investigation, we analyze the contents of an image based on the MPEG-7 standard to organize the features of the image, build an SID, and index the SID to accelerate target retrieval.
The key to improving image retrieval efficiency is the index technique, which involves obtaining objects after image segmentation and building an SID for the image database. In the present study, we adopt a mature algorithm, called object-oriented multiscale image segmentation. The object-oriented image processing algorithm is a synthetic algorithm that fuses spectral characteristics, geometric information, and structural information. This algorithm regards an object as the minimal processing unit, and retrieves its multiple characteristics to form a logical relationship among images and objects. Then, we analyze the image from the local to the global level, and ultimately arrive at an understanding of it. In general, multiscale image segmentation starts from any pixel and uses a bottom-up region merging method to form objects. Small objects can be merged to form large objects, provided that the heterogeneity of the merged object is less than a given threshold. In this case, heterogeneity is decided by differences in the spectra and shapes of objects. However, various features correspond to different scales of observation, and each feature can be extracted and accurately analyzed in an image layer at a proper scale. In particular, we use the threshold value method on multiple scales to segment an image.
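The bottom-up merging step can be illustrated with a minimal sketch. It assumes a single-band image and uses only the difference between region means as the heterogeneity measure, whereas the method described above also accounts for shape. The function name `merge_regions`, the greedy merge order, and the sample values are illustrative assumptions, not the segmentation algorithm used in this study.

```python
import numpy as np

def merge_regions(img, threshold):
    """Greedy bottom-up merging of 4-connected regions (illustrative only)."""
    h, w = img.shape
    labels = np.arange(h * w).reshape(h, w)  # every pixel starts as its own region
    while True:
        means = {int(r): img[labels == r].mean() for r in np.unique(labels)}
        # Collect pairs of different labels that touch horizontally or vertically.
        pairs = set()
        for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
            touching = a != b
            pairs.update(zip(a[touching].tolist(), b[touching].tolist()))
        # Pick the adjacent pair with the smallest heterogeneity below the threshold.
        best = None
        for p, q in pairs:
            cost = abs(means[p] - means[q])
            if cost < threshold and (best is None or cost < best[0]):
                best = (cost, p, q)
        if best is None:
            return labels  # no admissible merge is left
        labels[labels == best[2]] = best[1]  # merge region q into region p

# Hypothetical 2 x 4 image with a dark left half and a bright right half.
img = np.array([[0, 0, 10, 10],
                [0, 1, 10, 11]], dtype=float)
segments = merge_regions(img, threshold=3.0)
```

With this threshold the two halves stay separate, because their means differ by far more than 3, while the pixels inside each half are merged into one object.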
After processing the calibration, segmentation, and raster vectorization of the image based on a specified region of latitude and longitude, the basic unit of the image is no longer a single pixel, but a polygon that is composed of homogeneous pixels. Each polygon can be used to calculate the spectral information of pixels, including shape, texture, color information, and topological relationships among the polygons. Next, we will introduce the method for calculating the feature vectors to implement the representation model.
Shape is a key feature used to differentiate two objects. It is also the basis for characteristic retrieval and for the classification process mentioned in the latter part of this paper. In general, in the field of object-level content retrieval, shape remains the most basic feature for distinguishing objects. At present, two approaches are used to describe shapes: parametric and geometric approaches. In the present investigation, we adopt a geometric approach to characterize the shapes of different objects, namely, the model of centroid radii representation.
For an arbitrary polygon, such as the one shown in Fig. 1(a), the results of resampling it at the angle interval θ around the centroid, counterclockwise from the y axis, are shown in Fig. 1(b). Let lk be the distance between the centroid of the polygon and the k-th boundary sampling point. The shape descriptor of the polygon can be expressed by a centroid-radius model, as follows:
The condition for measuring similarity between two polygons based on shape is as follows: two polygons are similar if and only if the numerical difference between their centroid radii in every direction is less than a given small threshold value ε. That is, when two polygons are similar, their shape descriptors must satisfy the following regulation:
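As a sketch consistent with the prose description, the descriptor (1) and the similarity regulation (2) can be written as below. The symbol S, the count n = 2π/θ (with θ in radians), and the superscripts A and B for the two polygons are assumptions introduced here for illustration.

```latex
S = (l_1, l_2, \ldots, l_n), \qquad n = \frac{2\pi}{\theta} \tag{1}

\left| l_k^{A} - l_k^{B} \right| < \varepsilon, \qquad k = 1, 2, \ldots, n \tag{2}
```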
To ensure scale invariance of shape when using regulation (2), we need to normalize the Euclidean distance between the centroid and each vertex to the range [0, 1]. In this study, we consider most of the possible transformations between two feature vectors. One of these is a rotation between two shapes, so the distance should be independent of rotation, that is, of the choice of starting and ending points.
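The descriptor, its normalization, and a rotation-tolerant comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the polygon is simple and contains its centroid, radii are found by ray casting, normalization divides by the largest radius, and rotation is handled by trying every cyclic shift. The function names and the sample polygons are hypothetical.

```python
import math

def centroid(poly):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

def ray_distance(c, d, p, q):
    """Distance along the ray c + t*d to segment pq, or None if it is missed."""
    ex, ey = q[0] - p[0], q[1] - p[1]
    den = d[0] * ey - d[1] * ex
    if abs(den) < 1e-12:  # ray is parallel to the edge
        return None
    wx, wy = p[0] - c[0], p[1] - c[1]
    t = (wx * ey - wy * ex) / den
    s = (wx * d[1] - wy * d[0]) / den
    return t if t > 0 and -1e-9 <= s <= 1 + 1e-9 else None

def centroid_radii(poly, n):
    """Radii sampled every 2*pi/n, counterclockwise from the y axis, scaled to [0, 1]."""
    c = centroid(poly)
    edges = list(zip(poly, poly[1:] + poly[:1]))
    radii = []
    for k in range(n):
        ang = 2 * math.pi * k / n
        d = (-math.sin(ang), math.cos(ang))  # angle 0 points along +y
        hits = [t for p, q in edges if (t := ray_distance(c, d, p, q)) is not None]
        radii.append(max(hits))  # outermost boundary crossing
    top = max(radii)
    return [r / top for r in radii]

def similar(a, b, eps):
    """True if some cyclic shift of a is within eps of b in every direction."""
    n = len(a)
    return any(all(abs(a[(i + s) % n] - b[i]) < eps for i in range(n)) for s in range(n))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
big_square = [(0, 0), (6, 0), (6, 6), (0, 6)]
diamond = [(1, 0), (2, 1), (1, 2), (0, 1)]   # the square rotated by 45 degrees
rectangle = [(0, 0), (4, 0), (4, 1), (0, 1)]
r_square = centroid_radii(square, 8)
```

In this sketch the scaled square and the rotated square compare as similar to `square`, while the elongated rectangle does not.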
After transforming image shape into matrix space, we store the data using the antipole tree structure.
Different features in high-resolution images typically have similar spectral appearances to human vision. Fluctuation in the mean value of a spectral feature may also cause the spectra of different homogeneous samples to converge to similar modes in the feature space, thus resulting in spectra with similar features. This phenomenon is attributed to the human eye being insensitive to some portion of visible light. Therefore, we can improve the reliability of retrieval results by using features, such as shape, texture, and spatial relationships as references. Texture is a significant geometric (spatial) feature that can be used to distinguish among different objects and regions, and it reflects the pattern of variation in gray space. A 2D Gabor filter is suitable for narrow-band coding of texture, because of its adjustable filtering direction, bandwidth, and band-center frequency, and its optimal time-domain and spatial-domain analysis abilities. After finishing the gray-scaling and normalization processes on the segmented image, we apply a Gabor filter to extract the texture feature of the objects. The Gabor filter function g(x, y) is a 2D Gaussian function that is modulated by a complex sinusoid. Its Fourier transform G(u, v) can be expressed by the following equations:
where σx and σy are the Gabor filter spatial range and bandwidth of the frequency domain, respectively. In this case, (f, 0) is the central frequency of the filter in the orthogonal coordinates of the frequency domain. Let g(x, y) be the mother function that generates the Gabor filter family. The set of functions gm,n(x, y), which is a complete non-orthogonal set, can be generated through rotation and scaling, according to Eq. (4).
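For orientation, the Gabor function and its Fourier transform are commonly written in the form below, here with the central frequency (f, 0). The normalization constant and the relations σu = 1/(2πσx) and σv = 1/(2πσy) are assumptions taken from the usual Gabor texture formulation rather than from this text. The last line is a sketch of the scaled and rotated family that Eq. (4) refers to.

```latex
g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}
          \exp\!\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j f x\right]

G(u, v) = \exp\!\left\{-\frac{1}{2}\left[\frac{(u - f)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\},
\qquad \sigma_u = \frac{1}{2\pi\sigma_x}, \quad \sigma_v = \frac{1}{2\pi\sigma_y}

g_{m,n}(x, y) = a^{-m}\, g(x', y')
```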
Here, x' = a^(−m) (x cos θn + y sin θn), y' = a^(−m) (−x sin θn + y cos θn), a > 1, θn = nπ/K, m = 0, 1, ..., S − 1, and n = 0, 1, ..., K − 1. Parameter θn is the counterclockwise rotation angle of the filter axis. S and K are the total numbers of scales and rotations, respectively. After obtaining the energy value of each filter from its convolution with the image, we calculate the mean value and the mean square deviation of the filtering energy of each object. Finally, we mark the texture feature vector of the object, as shown in Eq. (5).
In Eq. (5), K is the number of central frequencies and L is the number of directional angles, with k = 0, 1, ..., K − 1 and l = 0, 1, ..., L − 1; Ek,l(x, y) is the filtering energy value of the filter (k, l). Ek,l(x, y) must be normalized, to ensure that the energy value of each element in the energy information is not affected by the actual size. The energy value Ek,l(x, y) is commonly calculated from the gray value p(x, y) at location (x, y). Finally, the mean value μ of the energy and the mean square deviation σ over the target object (n × n pixels) can be obtained as Eqs. (6) and (7), respectively.
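The per-filter statistics can be sketched with NumPy as follows. This is an illustration under assumptions, not the implementation used in this study: the energy is taken as the magnitude of the complex filter response, the convolution is circular (computed with the FFT), the square root of the mean squared deviation is used for σ, and the kernel size, σ values, frequencies, and function names are hypothetical.

```python
import numpy as np

def gabor_kernel(size, f, theta, sigma_x, sigma_y):
    """Complex Gabor kernel with central frequency f, rotated by theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
    return envelope * np.exp(2j * np.pi * f * xr) / (2 * np.pi * sigma_x * sigma_y)

def filter_energy(patch, kernel):
    """Magnitude of the filter response (circular convolution via the FFT)."""
    response = np.fft.ifft2(np.fft.fft2(patch) * np.fft.fft2(kernel, s=patch.shape))
    return np.abs(response)

def texture_vector(patch, freqs, n_angles, size=9, sigma=2.0):
    """Concatenate (mean, standard deviation) of the energy for every filter (k, l)."""
    features = []
    for f in freqs:
        for l in range(n_angles):
            theta = l * np.pi / n_angles
            energy = filter_energy(patch, gabor_kernel(size, f, theta, sigma, sigma))
            features += [energy.mean(), energy.std()]
    return np.array(features)

# Hypothetical 16 x 16 object patch with gray values already scaled to [0, 1].
rng = np.random.default_rng(0)
patch = rng.random((16, 16))
vec = texture_vector(patch, freqs=[0.1, 0.2], n_angles=4)
```

With two frequencies and four angles the vector has 2 × 4 × 2 = 16 entries, one (μ, σ) pair per filter.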
The spatial representation of an image describes the spatial relationships among objects, which makes images with multiple targets easy to distinguish. The spatial relationships in an image can be classified into two categories: positional and directional relationships. The former can be represented by a 2D string, whereas the latter can be represented by 9DLT methods. For a calibrated remote sensing image within the region of a certain latitude and longitude, the directional relationship relative to the four corners among the objects is confirmed. In this section, we introduce problem definitions and preliminary concepts, through formal methods.
DEFINITION 1. Let α = (α1, α2, ..., αk) be a set of objects in the same image. Hence, each αi is an element of α.
DEFINITION 2. The spatial relationship between two objects can be defined as one of the codes in nine directions, which is called 9DLT.
DEFINITION 3 (The 9DLT matrix). Let V = {v1, v2, v3, ..., vm} be composed of m distinct objects, and let Z be composed of z1, z2, z3, ..., zs in order, where ∀ i = 1, 2, ..., s, zi ∈ V. Suppose C is the collection of 9D encodings, as shown in Fig. 2(a). Each direction code can then be used to specify the spatial relationship between two objects. Thus, a 9DLT matrix T is an s × s matrix composed of entries tij, each of which belongs to the collection of 9D encodings C. The item tij at row i, column j represents the direction code from zj to zi, only when i and j satisfy the condition 1 ≤ j < i ≤ s.
As shown in Fig. 2(a), let R be the reference object, expressed by code 0; the direction codes 1 to 8 are then defined at 45° intervals, counterclockwise starting from north. Each object from the source image is represented by its centroid in a 9DLT expression. Fig. 2(b) shows a feature image that contains four objects. Fig. 3(a) exhibits the direction map in the grid between the objects, whereas the direction codes of the LT matrix in Fig. 3(b) demonstrate the spatial relationships among objects. The 9DLT string is (A, B, C, D, 6, 6, 6, 7, 5, 4) in column order. A relationship between any two objects exists in the matrix.
DEFINITION 4. A pattern consists of a set of objects and the spatial relationships among these objects. For example, α = (α1, α2, ..., αk, αr1, αr2, ..., αrm) is a pattern, where αi is an object and αrj is the corresponding spatial relationship, with 1 ≤ i ≤ k, m = k(k−1)/2, 1 ≤ j ≤ m, and k ≥ 2. That is, the spatial relationships between any two objects in this pattern are recorded. The length of a pattern is equal to the number of objects. A pattern with a length equal to k is called a k-pattern.
Constraints: (1) The items or objects in a pattern are stored in alphabetical order. (2) No spatial relationship exists, if the length of a pattern is equal to 1.
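The construction of a 9DLT string can be sketched as follows. The sketch assumes that each centroid is snapped to a grid cell (x to the east, y to the north) and that a direction code is chosen from the signs of the coordinate differences, with code 1 for north and codes increasing counterclockwise. The cell coordinates are hypothetical values chosen to reproduce the string quoted in the text; they are not read from Fig. 2(b).

```python
# (sign of dx, sign of dy) -> direction code; 0 is the reference position,
# 1 is north, and the codes increase counterclockwise in 45-degree steps.
CODES = {(0, 0): 0, (0, 1): 1, (-1, 1): 2, (-1, 0): 3, (-1, -1): 4,
         (0, -1): 5, (1, -1): 6, (1, 0): 7, (1, 1): 8}

def sign(v):
    return (v > 0) - (v < 0)

def direction_code(src, dst):
    """Direction code of dst as seen from src."""
    return CODES[(sign(dst[0] - src[0]), sign(dst[1] - src[1]))]

def to_9dlt_string(objects):
    """Objects in alphabetical order, then the lower-triangular codes in column order."""
    names = sorted(objects)
    codes = [direction_code(objects[names[j]], objects[names[i]])
             for j in range(len(names)) for i in range(j + 1, len(names))]
    return tuple(names) + tuple(codes)

# Hypothetical grid cells for four objects.
cells = {"A": (0, 3), "B": (1, 1), "C": (2, 1), "D": (1, 0)}
string = to_9dlt_string(cells)
print(string)  # ('A', 'B', 'C', 'D', 6, 6, 6, 7, 5, 4)
```

Column order means that all codes measured from A come first (A to B, A to C, A to D), followed by those from B, and finally the single code from C to D.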
The 9DLT expression is in accordance with the definition of the pattern.
DEFINITION 5. Pattern α = (α1, α2, ..., αi, αr1, αr2, ..., αrm) is a sub-pattern of pattern β = (β1, β2, ..., βj, βr1, βr2, ..., βrn) if (α1, α2, ..., αi) is a subset of (β1, β2, ..., βj), where j ≥ i ≥ 2, and the spatial relationship between any two items in α is the same as in β. Pattern β then contains pattern α, written β ⊇ α. The number of sub-patterns is N = C(i, i) + C(i, i−1) + ··· + C(i, 2). For example, pattern α = (A, B, C, 6, 6, 7) is a sub-pattern of pattern β = (A, B, C, D, 6, 6, 6, 7, 5, 4), because (A, B, C) is a subset of (A, B, C, D), and the code values of the spatial relationships of objects A, B, and C are the same as the corresponding code values in pattern β.
DEFINITION 6. The minimum support is the number of objects that satisfy the spatial relationships, which is equal to the required number of search objects.
Inference 1. Two k-patterns can be joined only if k−1 objects and the corresponding relationships between them are the same, where k ≥ 2.
Inference 2. Suppose a pattern does not contain any (k−1)-pattern; then, this pattern cannot be contained in the k-pattern.
Inference 3. The pattern of feature images and their specific sub-patterns can be obtained from a 9DLT string. By contrast, if a (k−1)-pattern, k−1 objects, and the spatial relationships in the object sets are given, then the relative candidate sets of the k-pattern can be acquired.
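Definition 5 can be made concrete with a short sketch that restricts a pattern to a subset of its objects and compares the result. The flat tuple layout follows the patterns in the text; the helper names are hypothetical, and the count of sub-patterns is computed as the number of object subsets of size at least 2, which is one reading of the formula above.

```python
from itertools import combinations
from math import comb

def split(pattern):
    """Split (objects..., codes...) into its k objects and k(k-1)/2 codes."""
    k = 2
    while k + k * (k - 1) // 2 < len(pattern):
        k += 1
    return pattern[:k], pattern[k:]

def pair_index(k, a, b):
    """Position of the code for object positions a < b in the column-ordered list."""
    return a * k - a * (a + 1) // 2 + (b - a - 1)

def restrict(pattern, names):
    """Sub-pattern of `pattern` over the given object names."""
    objs, codes = split(pattern)
    positions = [objs.index(n) for n in sorted(names)]
    sub = [codes[pair_index(len(objs), a, b)] for a, b in combinations(positions, 2)]
    return tuple(sorted(names)) + tuple(sub)

def is_sub_pattern(alpha, beta):
    a_objs, _ = split(alpha)
    b_objs, _ = split(beta)
    return set(a_objs) <= set(b_objs) and restrict(beta, a_objs) == tuple(alpha)

def count_sub_patterns(k):
    """Number of object subsets of size 2..k, i.e. C(k, k) + ... + C(k, 2)."""
    return sum(comb(k, t) for t in range(2, k + 1))

beta = ("A", "B", "C", "D", 6, 6, 6, 7, 5, 4)
alpha = ("A", "B", "C", 6, 6, 7)
```

Here `is_sub_pattern(alpha, beta)` holds, matching the example in Definition 5, and changing any code in `alpha` makes the check fail.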
Generating candidate sets can significantly help object retrieval. To extract the images that contain the objects (A, B, C) with minimum support 3 and spatial relationship αr from the image database, two 2-patterns are required, namely, (A, B, 4) and (A, C, 5), in which the same object A belongs to both patterns, and satisfies the joining condition. Then, we can calculate the candidate 3-pattern (A, B, C, 4, 5, Δ). As shown in Fig. 4, the possible results are (A, B, C, 4, 5, 7), (A, B, C, 4, 5, 8), and (A, B, C, 4, 5, 6). The possible direction codes of the relationship between B and C are 7, 8, and 6; therefore, the spatial representation model is ABC(4, 5, X: {7, 8, 6}).
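The join of two 2-patterns into candidate 3-patterns can be sketched by reasoning on each axis separately. The sketch assumes the sign-based reading of the direction codes (code 1 is north, codes increase counterclockwise, x to the east, y to the north); the helper names are hypothetical, and this is one way to enumerate the feasible third code rather than the procedure shown in Fig. 4.

```python
CODES = {(0, 0): 0, (0, 1): 1, (-1, 1): 2, (-1, 0): 3, (-1, -1): 4,
         (0, -1): 5, (1, -1): 6, (1, 0): 7, (1, 1): 8}
SIGNS = {code: s for s, code in CODES.items()}

def axis_options(s1, s2):
    """Possible signs of (c - b) on one axis, given sign(b - a) = s1 and sign(c - a) = s2."""
    if s1 != s2:
        return [(s2 > s1) - (s2 < s1)]   # the order of b and c is forced
    return [0] if s1 == 0 else [-1, 0, 1]  # same side of a: any order is possible

def join(p, q):
    """Join 2-patterns (a, b, r_ab) and (a, c, r_ac) into candidate 3-patterns."""
    (a, b, r_ab), (a2, c, r_ac) = p, q
    if a != a2 or b == c:
        return []            # the joining condition is not satisfied
    if b > c:
        return join(q, p)    # keep the objects in alphabetical order
    (bx, by), (cx, cy) = SIGNS[r_ab], SIGNS[r_ac]
    return [(a, b, c, r_ab, r_ac, CODES[(sx, sy)])
            for sx in axis_options(bx, cx) for sy in axis_options(by, cy)]

candidates = join(("A", "B", 4), ("A", "C", 5))
third_codes = sorted(t[5] for t in candidates)
```

For the example in the text, B lies southwest of A and C lies directly south of A, so C must be east of B while its north-south order relative to B is open; the sketch returns the third codes 6, 7, and 8.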
Similarly, the 9DLT string of each image in the image database is known. That is, the spatial relationships between objects have been confirmed, and the problem of finding all images that satisfy the minimum support becomes a process of matching patterns. In fact, the process can be converted into searching the LT matrix for an inclusion relationship. As shown in Fig. 5, depending on the given objects and the minimum support, the position of the matching matrix P in the LT matrix may cover only a part of the relation direction codes. The range of mapping to the candidate matrix C is also within k × k.
The description of the match algorithm is as follows.
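One possible reading of this matching step, offered as a sketch rather than as the authors' algorithm, expands each stored 9DLT string into its lower-triangular entries and checks every queried pair against a set of admissible codes. The database contents, the query format (a mapping from object pairs to allowed codes), and the function names are hypothetical.

```python
def lt_matrix(string):
    """Expand a 9DLT string into {(z_j, z_i): code} for every pair with j < i."""
    k = 2
    while k + k * (k - 1) // 2 < len(string):
        k += 1
    objs, codes = string[:k], iter(string[k:])
    return {(objs[j], objs[i]): next(codes) for j in range(k) for i in range(j + 1, k)}

def match(sid, query):
    """Ids of images whose LT matrix satisfies every (pair -> allowed codes) entry."""
    hits = []
    for image_id, string in sid.items():
        lt = lt_matrix(string)
        # A pair that is absent from the image yields None and therefore fails.
        if all(lt.get(pair) in allowed for pair, allowed in query.items()):
            hits.append(image_id)
    return hits

# Hypothetical symbolic image database: image id -> 9DLT string.
sid = {
    "img1": ("A", "B", "C", "D", 6, 6, 6, 7, 5, 4),
    "img2": ("A", "B", "C", 4, 5, 7),
    "img3": ("A", "B", "C", 4, 4, 7),
    "img4": ("A", "C", 5),
}
# Query for ABC(4, 5, X: {6, 7, 8}), i.e. three objects (minimum support 3).
query = {("A", "B"): {4}, ("A", "C"): {5}, ("B", "C"): {6, 7, 8}}
result = match(sid, query)
```

Only `img2` satisfies all three constraints in this toy database; `img4` is rejected because it lacks object B, which reflects the minimum support requirement.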
Typically, a data model is a framework that can be used to provide a representation for information, and an operation method in the database system. The object-level feature representation model belongs to a section of this data model. For remote sensing images, the data model also includes metadata, such as location, resolution, and light intensity. However, the standard of measurement for a content retrieval system determines the efficiency and accuracy of extraction. Hence, each image needs a good model with an efficient content-based representation. Moreover, selecting a formula for similarity calculation is also vital. Based on this concept, we present the object-level feature representation model for the image data in the next section.
According to MPEG-7 standards and the object-oriented concept, the object-level feature representation model for image data is described as a structural tree via layers. As Fig. 6 shows, the first layer is the object name, while the second layer is the feature name of the feature information that the object contains. Further down are the layers for sub-features, feature attributes, attribute values, etc. Constructing this structural tree is convenient for indexing feature information.
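The layered tree can be illustrated with a nested mapping that is flattened into path-to-value entries, which is the form an index would use. Every key and value below is a hypothetical placeholder; the sketch does not reproduce the MPEG-7 description scheme or the exact layers of Fig. 6.

```python
# Layer 1: object name; layer 2: feature name; deeper layers: attributes and values.
feature_tree = {
    "obj_001": {
        "shape": {"centroid": (120.5, 88.0), "radii": [0.71, 1.0, 0.71, 1.0]},
        "texture": {"mean": [0.42, 0.37], "deviation": [0.05, 0.08]},
        "color": {"mean": 0.61, "deviation": 0.12},
        "space": {"local": (3, 4), "direction": {"obj_002": 6}},
    }
}

def flatten(node, prefix=()):
    """Yield (path, value) pairs for every leaf of the nested feature tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, prefix + (key,))
    else:
        yield prefix, node

index = dict(flatten(feature_tree))
```

A lookup such as `index[("obj_001", "color", "mean")]` then reaches an attribute value directly, without walking the tree.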
The overall model of the feature image can be represented by a formal method, as follows:
EA stands for the description of the object extraction algorithm, and MA stands for the description of the object matching algorithm.
We adopt the centroid-radii model Fshape = (objID, Centroid, Radii) in Section Ⅲ-A, the 9DLT extended model Fspace = (objID, Flocal, Fdirec) in Section Ⅲ-C, and the calculated values of the energy at different direction angles, Ftexture, in Section Ⅲ-B. We choose parameters with reference to the methods in the literature. The color feature of an object can be expressed as Fcolor = {μcolor, σcolor}, through the mean value and the mean square deviation of its color.
Through this model, we can express the content of multiple targets in an image by using multiple records to represent a single object, such as color, shape, and texture features. Then, a logical expression is implemented by the spatial relationship among objects. Thus, we transform the MTR problem into a record-querying problem to enable the image indexing technology to further accelerate target retrieval in CBRSIR.
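The idea of turning MTR into a record query can be sketched with an in-memory SQLite database. The table layout, column names, index, and sample rows are assumptions made for illustration; the paper does not specify a storage schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE objects (image_id TEXT, label TEXT, color_mean REAL, color_dev REAL);
    CREATE TABLE relations (image_id TEXT, src TEXT, dst TEXT, code INTEGER);
    CREATE INDEX relations_idx ON relations (src, dst, code);
""")
# One record per object and one record per direction code (hypothetical values).
con.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    ("img1", "A", 0.61, 0.12), ("img1", "B", 0.30, 0.05),
    ("img2", "A", 0.58, 0.10), ("img2", "B", 0.33, 0.07),
])
con.executemany("INSERT INTO relations VALUES (?, ?, ?, ?)", [
    ("img1", "A", "B", 6), ("img2", "A", "B", 4),
])

# Find images where B has direction code 4 from A and A's mean color is in a range.
rows = con.execute("""
    SELECT r.image_id
    FROM relations AS r
    JOIN objects AS a ON a.image_id = r.image_id AND a.label = r.src
    JOIN objects AS b ON b.image_id = r.image_id AND b.label = r.dst
    WHERE r.src = ? AND r.dst = ? AND r.code = ?
      AND a.color_mean BETWEEN ? AND ?
""", ("A", "B", 4, 0.5, 0.7)).fetchall()
matches = [row[0] for row in rows]
```

The spatial constraint acts as the logical expression that links the object records, and the index on `(src, dst, code)` stands in for the indexing step that accelerates retrieval.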
[Fig. 1.] Model of the centroid radii. (a) Resampled polygon with θ interval around, and counterclockwise to, the y axis; and (b) expression of the resampling result.
[Fig. 2.] Representation of nine-direction lower-triangular. (a) Nine-direction code and (b) symbolic figure of the object.
[Fig. 3.] Map of the matrix expression in nine-direction lower-triangular (9DLT). (a) Direction map in grids between objects and (b) matrix expression of four objects in 9DLT.
[Fig. 4.] Generation of candidate 3-pattern from 2-pattern.
[Fig. 5.] Map of the candidate matrix to match the threshold.
[Fig. 6.] Model of representation of feature objects.