An Object-Level Feature Representation Model for the Multi-target Retrieval of Remote Sensing Images
 DOI : 10.5626/JCSE.2014.8.2.65
 Author: Zeng Zhi, Du Zhenhong, Liu Renyi
 Organization: Zeng Zhi; Du Zhenhong; Liu Renyi
 Publish: Journal of Computing Science and Engineering, Volume 8, Issue 2, pp. 65-77, 30 June 2014

ABSTRACT
To address the problem of multi-target retrieval (MTR) of remote sensing images, this study proposes a new object-level feature representation model. The model provides an enhanced application image representation that improves the efficiency of MTR. Generating the model in our scheme includes processes such as object-oriented image segmentation, feature parameter calculation, and symbolic image database construction. The proposed model uses the spatial representation method of the extended nine-direction lower-triangular (9DLT) matrix to combine spatial relationships among objects, and organizes the image features according to MPEG-7 standards. A similarity metric method is proposed that improves the precision of similarity retrieval. Our method provides a trade-off strategy that supports flexible matching on the target features, or on the spatial relationship between the query target and the image database. We implement this retrieval framework on a dataset of remote sensing images. Experimental results show that the proposed model achieves competitive and high retrieval precision.

KEYWORD
Remote sensing, Image processing, Spatial representation, 9DLT, Content-based remote sensing image retrieval

Ⅰ. INTRODUCTION
Along with the rapid progress of satellite sensor technology and its application to high-resolution remote sensing images in Earth observation systems, a large amount of remote sensing data has become readily available. In terms of spatial information, terrain geometry, and texture information, high-resolution remote sensing images have more features than middle- or low-resolution images. To use the image database fully and to retrieve interesting information automatically and intelligently, a new efficient technology for multi-target retrieval (MTR) in an image, particularly in a specified region, is expected to be developed.
The number of image-processing applications for target retrieval is increasing, such as query by image content from IBM [1]. Most studies in this area have focused on content-based image retrieval (CBIR) and content-based remote sensing image retrieval (CBRSIR), and have achieved significant results. In these processes, the contents of an image, which specify several low-level features, such as color, texture, shape, longitude and latitude, and spatial relationships among objects, are the bases of multidimensional image feature vectors. Because imaging conditions differ across the various forms of remote sensing images, image contents cannot be expressed exactly by using only a single feature. Therefore, constructing the comprehensive features of an image is the key to improving extraction performance [2]. However, if the combined features cannot be purified to form a unified model, then the accuracy of the similarity extraction and the efficiency improvement of the images will be affected. For example, if we focus more on spatial relationships, then the detail required of each target will be minimal. Thus, the efficiency of MTR will be higher than that of comparing the features of a single object. To effectively reveal the information retrieval process for remote sensing images, an object-level model is proposed, which can represent the contents of an image with overall accuracy. By using this model, we can retrieve and operate on the information pre-stored in a symbolic image database (SID) with high efficiency, and neglect intrinsic information, such as color, texture, and shape. To date, research on feature representation models of image data for MTR remains limited. To build feature indices and to realize rapid retrieval, we propose an object-level feature representation model, based on previous research on CBRSIR and with reference to MPEG-7 standards, starting with representing the contents of an image as object-level features, particularly the spatial relationships among targets.
The rest of this paper is organized as follows. Section Ⅱ discusses related literature on representation techniques of image contents for CBIR or CBRSIR. Section Ⅲ introduces the calculation and representation of feature values, and mainly describes the spatial representation of the extended nine-direction lower-triangular (9DLT) matrix. Section Ⅳ presents a model of image content feature representation. Section Ⅴ proposes an MTR model and similarity calculation. The last section presents several experiments to validate the accuracy of the content-based feature representation model and the efficiency of image target extraction. A conclusion to the study is also presented in this section.
Ⅱ. RELATED STUDIES
In the past three decades, academia has achieved a large number of results on CBIR and CBRSIR. At present, CBIR has many successful applications in the fields of facial recognition, medical diagnosis, and trade registration. Most of these systems have adopted a single feature or combined features as image indices [3-10]. CBRSIR is similar to CBIR, because both contain visual and geographic information. Several systems have focused on the issue of spectral retrieval, such as texture representation and different combinations with spectral bands [11]. A special feedback approach has been employed to precisely describe the desired search characteristic in a remote sensing image [12]. Some researchers presented a code stream of images for remote sensing image retrieval [13]. In addition, other scholars combined a scheme with an automatic classifier, and proposed the use of a new feature, ‘texton histograms’, to capture the weak-textured characteristic of remote sensing images for image retrieval [14]. Meanwhile, others applied a texture analysis approach, called the local binary pattern operator, to implement image retrieval [15]. Some studies applied independent component analysis to extract independent components of feature values via linear combinations to realize multispectral image retrieval [16], or adopted principal component analysis and a clustering technique to index remote sensing images for image retrieval [17]. Considering various features, such as color, texture, and spectra, a prototype model for CBRSIR based on color moment and gray-level co-occurrence matrix features was proposed [18]. A number of researchers combined several properties (color, texture, and points of interest) that were automatically extracted, and immediately indexed the images [19]. In addition, some researchers proposed a framework based on a domain-dependent ontology to perform semantic retrieval in image archives [20].
Other scholars also presented a universal semantic data model for image retrieval [21]. Regardless of how a feature vector is established, this vector still depends upon the representation of contents in images. To date, the contents of images can be represented in numerous ways. Some approaches adopt a quadtree structure or a quintree method that splits large-scale remote sensing images into sub-images, to extract multiple features, such as color and texture [22, 23]. Others use the 2D C-string to represent spatial knowledge of an image database [24], or the spanning representation of an object to realize spatial inference and similarity retrieval in an image database, through directional relation referenced frames [25]. Others depict the relationships among spatial objects by using the methods of the nine-direction spanning area [26] or 9DLT [27], represent image colors by using pyramid technology [28], or express an image by employing a symbol index, which is established in image space stratification [29]. All the aforementioned representations, which cover color, space, and sub-images, belong to the family of feature representation methods for image contents. Implementing rapid and accurate retrieval over a massive remote sensing image collection is difficult, because its features include various data types, resolution scales, and data sources. In our investigation, we analyze the contents of an image based on the MPEG-7 standard to organize the features of the image, build an SID, and index the SID to accelerate target retrieval.
Ⅲ. CALCULATING FEATURE VALUE
The key to improving image retrieval efficiency is the index technique, which involves obtaining objects after image segmentation and building an SID for the image database. In the present study, we adopt a mature algorithm, called object-oriented multiscale image segmentation. The object-oriented image processing algorithm is a synthetic algorithm that fuses spectral characteristics, geometric information, and structural information. This algorithm regards an object as the minimal processing unit, and retrieves its multiple characteristics to form a logical relationship among images and objects. Then, we analyze the image from the local to the entire level, and ultimately arrive at an understanding of it. In general, multiscale image segmentation begins with any pixel and uses a region merging method, from the bottom to the top, to form objects. Small objects can be merged to form large objects, and each merge must satisfy the requirement that the heterogeneity of the merged object is less than a given threshold. In this case, heterogeneity is decided by differences in the spectra and shapes of objects. However, various features correspond to different scales of observation, in which each feature can be extracted and accurately analyzed in an image layer on a proper scale [30]. In particular, we use the threshold value method on multiple scales to segment an image.
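The bottom-up merging step described above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the segmentation algorithm used in this study: it works on a single row of pixel values, uses only spectral heterogeneity (population standard deviation) and ignores the shape term, and the names `merge_regions` and `threshold` are hypothetical.

```python
from statistics import pstdev


def merge_regions(values, threshold):
    """Greedy bottom-up merging of adjacent regions in a 1D row of pixel values.

    Every pixel starts as its own region. The adjacent pair whose union has
    the lowest heterogeneity (population standard deviation, an assumed
    spectral-only measure) is merged repeatedly, as long as that
    heterogeneity stays below `threshold`.
    """
    regions = [[v] for v in values]
    while len(regions) > 1:
        # Heterogeneity of each candidate merge of two neighbouring regions.
        costs = [pstdev(regions[i] + regions[i + 1])
                 for i in range(len(regions) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] >= threshold:
            break  # no remaining merge satisfies the heterogeneity limit
        regions[best:best + 2] = [regions[best] + regions[best + 1]]
    return regions


row = [10, 11, 10, 50, 52, 51]
fine = merge_regions(row, threshold=2.0)      # two homogeneous objects
coarse = merge_regions(row, threshold=100.0)  # a larger scale merges everything
```

Raising the threshold plays the role of moving to a coarser segmentation scale: more merges are accepted and the resulting objects are larger.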
After processing the calibration, segmentation, and raster vectorization of the image based on a specified region of latitude and longitude, the basic unit of the image is no longer a single pixel, but a polygon that is composed of homogeneous pixels. Each polygon can be used to calculate the spectral information of pixels, including shape, texture, color information, and topological relationships among the polygons. Next, we will introduce the method for calculating the feature vectors to implement the representation model.
> A. Shape
Shape is a key feature used to differentiate two objects. It is also the basis for the characteristic retrieval and classification processes mentioned in the latter part of this paper. In general, in the field of object-level content retrieval, shape remains the most basic feature for distinguishing objects. At present, two approaches are used to describe shapes: parametric and geometric approaches. In the present investigation, we adopt a geometric approach to characterize the shapes of different objects, namely, the model of centroid radii representation [31].
For an arbitrary polygon, such as the one shown in Fig. 1(a), the result of resampling its boundary at angular intervals of θ, counterclockwise from the y axis, is shown in Fig. 1(b). Let l_{k} be the distance between the centroid of the polygon and the k-th boundary sampling point. The shape descriptor of the polygon can be expressed by a centroid-radius model, as follows:
$$ L = (l_0, l_1, \ldots, l_{n-1}), \qquad (1) $$
where n is the number of sampling directions. Two polygons are similar in shape if and only if the numerical difference between their centroid radii in all directions is less than a given small threshold value ε. That is, when two polygons A and B are similar, their shape descriptors must satisfy the following condition:
$$ \left| l_k^{A} - l_k^{B} \right| < \varepsilon, \quad k = 0, 1, \ldots, n-1. \qquad (2) $$
To ensure scale invariance of shape when using condition (2), we need to normalize the Euclidean distance between the centroid and each boundary point to the range [0, 1]. In this study, we discuss most of the possible transformations between two feature vectors. One of these transformations involves the possible rotations between two shapes, and the distances that are independent of rotation, including the starting and ending points.
After transforming image shape into matrix space, we store data, using the antipole tree structure [32].
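The centroid-radii descriptor and the threshold test on it can be sketched as follows. This is a minimal illustration rather than the implementation used in the study, and it rests on several assumptions: the polygon is simple and contains its area centroid, each radius is taken to the farthest boundary crossing along the ray, the radii are normalized by the longest one, and rotation alignment and the antipole tree are omitted. The function names are hypothetical.

```python
import math


def polygon_centroid(vertices):
    """Area centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def centroid_radii(vertices, theta_deg):
    """Radii sampled every theta_deg degrees, counterclockwise from the y axis,
    normalized to [0, 1] by the longest radius (an assumed normalization)."""
    cx, cy = polygon_centroid(vertices)
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    radii = []
    for k in range(int(round(360.0 / theta_deg))):
        phi = math.radians(k * theta_deg)
        dx, dy = -math.sin(phi), math.cos(phi)   # unit direction of the ray
        hits = []
        for (px, py), (qx, qy) in edges:
            ex, ey = qx - px, qy - py
            denom = dx * ey - dy * ex
            if abs(denom) < 1e-12:               # ray parallel to this edge
                continue
            wx, wy = px - cx, py - cy
            t = (wx * ey - wy * ex) / denom      # distance along the ray
            u = (wx * dy - wy * dx) / denom      # position along the edge
            if t > 0 and -1e-9 <= u <= 1 + 1e-9:
                hits.append(t)
        radii.append(max(hits) if hits else 0.0)
    longest = max(radii)
    return [r / longest for r in radii]


def similar_shape(a, b, eps):
    """True if the radii differ by less than eps in every sampled direction."""
    return len(a) == len(b) and all(abs(x - y) < eps for x, y in zip(a, b))


square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
shifted_square = [(10, 10), (16, 10), (16, 16), (10, 16)]
rectangle = [(-2, -1), (2, -1), (2, 1), (-2, 1)]
same = similar_shape(centroid_radii(square, 45), centroid_radii(shifted_square, 45), 0.01)
different = similar_shape(centroid_radii(square, 90), centroid_radii(rectangle, 90), 0.1)
```

Because the radii are normalized, the translated and enlarged square yields the same descriptor as the unit square, whereas the rectangle does not pass the threshold test.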
> B. Texture
Different features in high-resolution images typically have similar spectral appearances to human vision. Fluctuation in the mean value of a spectral feature may also cause the spectra of different homogeneous samples to converge to similar modes in feature space, thus resulting in spectra with similar features. This phenomenon is attributed to the human eye being insensitive to some portions of visible light. Therefore, we can improve the reliability of retrieval results by using features such as shape, texture, and spatial relationships as references. Texture is a significant geometric (spatial) feature that can be used to distinguish among different objects and regions, and that reflects the pattern of variation in gray space. A 2D Gabor filter is suitable for narrowband coding of texture, because of its adjustable filtering direction, bandwidth, and band-center frequency, and its optimal time-domain and spatial-domain analysis abilities. After grayscale conversion and normalization of the segmented image, we apply a Gabor filter to extract the texture feature of the objects. The Gabor filter function g(x, y) is a 2D Gaussian function modulated by a complex sinusoid. The function g(x, y) and its Fourier transform G(u, v) can be expressed by the following equations:
$$ g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left[ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi j f x \right], \qquad G(u, v) = \exp\left\{ -\frac{1}{2}\left[ \frac{(u - f)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} \right] \right\}, \qquad (3) $$
where σ_{u} = 1/(2πσ_{x}), σ_{v} = 1/(2πσ_{y}); σ_{x} and σ_{y} determine the spatial range of the Gabor filter, and σ_{u} and σ_{v} its bandwidth in the frequency domain. In this case, (f, 0) is the central frequency of the filter in the orthogonal coordinates of the frequency domain. Let g(x, y) be the mother function that generates the Gabor filter family. The set of functions g_{m,n}(x, y), which is a complete nonorthogonal set, can be generated through rotation and scaling, according to Eq. (4):
$$ g_{m,n}(x, y) = a^{-m}\, g(x', y'), \qquad (4) $$
where x' = a^{−m}(x cos θ_{n} + y sin θ_{n}), y' = a^{−m}(−x sin θ_{n} + y cos θ_{n}), a > 1, θ_{n} = nπ/K, m = 0, 1, ..., S − 1, and n = 0, 1, ..., K − 1. Parameter θ_{n} is the counterclockwise rotation angle along the filter axis, and S and K are the total numbers of scales and rotations, respectively. After obtaining the energy value of each filter from its convolution with the image, we calculate the mean value and the mean square deviation of the filtering energy for each object. Finally, we form the texture feature vector of the object, as shown in Eq. (5), where K is the number of central frequencies, L is the number of directional angles, k = 0, 1, ..., K − 1, l = 0, 1, ..., L − 1, and E_{k,l}(x, y) is the filtering energy value of the filter (k, l). E_{k,l}(x, y) must be normalized, to ensure that the energy value of each element in the energy information is not affected by the actual size. The energy value E_{k,l}(x, y) is commonly calculated from the gray value p(x, y) at location (x, y). Finally, the mean value μ of the energy and the mean square deviation σ over the target object (n × n pixels) can be obtained as Eqs. (6) and (7), respectively:
$$ \mu = \frac{1}{n^2} \sum_{x=1}^{n} \sum_{y=1}^{n} E_{k,l}(x, y), \qquad (6) $$
$$ \sigma = \sqrt{ \frac{1}{n^2} \sum_{x=1}^{n} \sum_{y=1}^{n} \left( E_{k,l}(x, y) - \mu \right)^2 }. \qquad (7) $$
> C. Spatial Representation of the Extended 9DLT
The spatial representation of an image describes the spatial relationships among objects, which makes images with multiple targets easy to distinguish. The spatial relationships in an image can be classified into two categories: positional and directional relationships. The former can be represented by a 2D string, whereas the latter can be represented by 9DLT methods [25]. For a calibrated remote sensing image within the region of a certain latitude and longitude, the directional relationships among the objects, relative to the four corners, are fixed. In this section, we introduce problem definitions and preliminary concepts through formal methods.
DEFINITION 1. Let α = (α_{1}, α_{2}, ..., α_{k}) be a set of objects in the same image. Hence, each α_{i} is an element of α.
DEFINITION 2. The spatial relationship between two objects can be defined as one of the codes in nine directions, which is called 9DLT.
DEFINITION 3 (The 9DLT matrix). Let V = {v_{1}, v_{2}, v_{3}, ..., v_{m}} be a set of m distinct objects, and let Z be composed of z_{1}, z_{2}, z_{3}, ..., z_{s} in order, where ∀i = 1, 2, ..., s, z_{i} ∈ V. Suppose C is the collection of 9D encodings shown in Fig. 2(a). Each direction code can then be used to specify the spatial relationship between two objects. Thus, a 9DLT matrix T is an s × s matrix composed of items t_{ij} that belong to the collection of 9D encodings C. The item t_{ij} at row i, column j represents the direction code from z_{j} to z_{i}, and is defined only when 1 ≤ j < i ≤ s.
As shown in Fig. 2(a), let R be the reference object, expressed by 0; the direction codes 1 to 8 are defined at 45° intervals, counterclockwise from north. Each object from the source image is represented by its centroid in a 9DLT expression. Fig. 2(b) shows a feature image that contains four objects. Fig. 3(a) exhibits the direction map in the grid between the objects, whereas the direction codes of the LT matrix in Fig. 3(b) give the spatial relationships among the objects. The 9DLT string is (A, B, C, D, 6, 6, 6, 7, 5, 4) in column order. A relationship between every two objects exists in the matrix.
DEFINITION 4. A pattern consists of a set of objects and the spatial relationships among these objects. For example, α = (α_{1}, α_{2}, ..., α_{k}, αr_{1}, αr_{2}, ..., αr_{m}) is a pattern, where α_{i} is an object and αr_{j} is the corresponding spatial relationship, with 1 ≤ i ≤ k, m = k(k − 1)/2, 1 ≤ j ≤ m, and k ≥ 2. That is, the spatial relationships between any two objects in this pattern are recorded. The length of a pattern is equal to the number of objects. A pattern of length k is called a k-pattern.
Constraints: (1) The items or objects in a pattern are stored in alphabetical order. (2) No spatial relationship exists if the length of a pattern is equal to 1.
The 9DLT expression is in accordance with the definition of the pattern.
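How a 9DLT string could be assembled from object centroids is sketched below. The code layout is an assumption read from the description of Fig. 2(a): 0 for a coincident position, 1 for north, and 2 to 8 at 45° steps counterclockwise (2 = north-west, 3 = west, and so on), with each code owning a 45° sector centred on its direction and the y axis pointing north. The function names are hypothetical.

```python
import math
from itertools import combinations


def direction_code(src, dst):
    """Direction code of centroid dst as seen from centroid src.

    Assumed layout: 0 = same position, 1 = north, then 2..8 every
    45 degrees counterclockwise; coordinates are (x, y) with y pointing north.
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    if dx == 0 and dy == 0:
        return 0
    # Angle measured counterclockwise from north, in [0, 360).
    angle = math.degrees(math.atan2(-dx, dy)) % 360.0
    # Shift by half a sector so that each code is centred on its direction.
    return int(((angle + 22.5) % 360.0) // 45.0) + 1


def nine_dlt_string(centroids):
    """Labels in alphabetical order, followed by the lower-triangular
    direction codes read in column order."""
    labels = sorted(centroids)
    # combinations() yields (A,B), (A,C), ..., (B,C), ...: the column order
    # of the lower-triangular matrix, with the column object as reference.
    codes = [direction_code(centroids[a], centroids[b])
             for a, b in combinations(labels, 2)]
    return labels + codes


example = nine_dlt_string({"A": (0, 0), "B": (0, -2), "C": (3, 0)})
```

For k objects the string carries k labels and k(k − 1)/2 codes, which matches the length of the pattern in Definition 4.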
DEFINITION 5. Pattern α = (α_{1}, α_{2}, ..., α_{i}, αr_{1}, αr_{2}, ..., αr_{m}) is a subpattern of pattern β = (β_{1}, β_{2}, ..., β_{j}, βr_{1}, βr_{2}, ..., βr_{n}) if (α_{1}, α_{2}, ..., α_{i}) is a subset of (β_{1}, β_{2}, ..., β_{j}), where j ≥ i ≥ 2, and the spatial relationship between any two items in α is the same as in β. Pattern β then contains pattern α, written β ⊇ α. The number of subpatterns is N = C_{i}^{i} + C_{i}^{i−1} + ··· + C_{i}^{2}. For example, pattern α = (A, B, C, 6, 6, 7) is a subpattern of pattern β = (A, B, C, D, 6, 6, 6, 7, 5, 4), because (A, B, C) is a subset of (A, B, C, D), and the code values of the spatial relationships among objects A, B, and C are the same as the corresponding code values in pattern β.
DEFINITION 6. The minimum support is the number of objects that satisfy the spatial relationships, which is equal to the required number of search objects.
Inference 1. Two k-patterns can be joined only if k − 1 objects and the corresponding relationships between them are the same, and k satisfies the condition k ≥ 2.
Inference 2. If a pattern does not contain any (k − 1)-pattern, then it cannot be contained in the k-pattern.
Inference 3. The pattern of a feature image and its specific subpatterns can be obtained from a 9DLT string. Conversely, if a (k − 1)-pattern, k − 1 objects, and the spatial relationships in the object sets are given, then the relative candidate sets of the k-pattern can be acquired.
Generating candidate sets can significantly help object retrieval. To extract the images with the objects (A, B, C), minimum support 3, and spatial relationship αr from the image database, two 2-patterns are required, namely, (A, B, 4) and (A, C, 5), in which the same object A belongs to both patterns and satisfies the joining condition. Then, we can calculate the candidate 3-pattern (A, B, C, 4, 5, Δ). As shown in Fig. 4, the possible results are (A, B, C, 4, 5, 7), (A, B, C, 4, 5, 8), and (A, B, C, 4, 5, 6). The direction codes of the possible relationship between B and C are 7, 8, and 6; therefore, the spatial representation model is ABC(4, 5, X: {7, 8, 6}).
Similarly, the 9DLT string of each image in the image database is known. That is, the spatial relationships between objects have been confirmed, and the problem of finding all images that satisfy the minimum support is a process of matching patterns. In fact, the process can be converted into searching the LT matrix for an inclusion relationship. As shown in Fig. 5, depending on the given objects and the minimum support, the matching matrix P may cover only a part of the relation direction codes in the LT matrix. The range of mapping to the candidate matrix C is also within k × k.
The description of the match algorithm is as follows.
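The subpattern relation and the inclusion test on 9DLT strings can be sketched as follows. This is only an illustration of the definitions above under stated assumptions, not the authors' match algorithm: a pattern is a Python list of string labels followed by integer codes in column order, labels are stored alphabetically as required by constraint (1), and a query code may be a set of admissible values to express a candidate such as ABC(4, 5, X: {7, 8, 6}). All function names are hypothetical.

```python
from itertools import combinations


def split_pattern(pattern):
    """Split a pattern into its labels and a {(a, b): code} relation map."""
    labels = [p for p in pattern if isinstance(p, str)]
    codes = [p for p in pattern if not isinstance(p, str)]
    return labels, dict(zip(combinations(labels, 2), codes))


def subpattern(pattern, objects):
    """Subpattern of `pattern` restricted to the given objects
    (objects absent from the pattern are ignored)."""
    labels, rel = split_pattern(pattern)
    wanted = set(objects)
    chosen = [label for label in labels if label in wanted]
    return chosen + [rel[pair] for pair in combinations(chosen, 2)]


def matches(image_pattern, query):
    """True if `query` is contained in `image_pattern`.

    A query code may be a set of allowed direction codes.
    """
    q_labels, q_rel = split_pattern(query)
    labels, rel = split_pattern(image_pattern)
    if not set(q_labels) <= set(labels):
        return False  # minimum support not reached: an object is missing
    for pair, wanted in q_rel.items():
        allowed = wanted if isinstance(wanted, (set, frozenset)) else {wanted}
        if rel.get(pair) not in allowed:
            return False
    return True


def retrieve(database, query):
    """Identifiers of all images whose 9DLT string contains the query."""
    return [image_id for image_id, pattern in database.items()
            if matches(pattern, query)]


beta = ["A", "B", "C", "D", 6, 6, 6, 7, 5, 4]
alpha = subpattern(beta, "ABC")
database = {
    "img1": ["A", "B", "C", 4, 5, 8],
    "img2": ["A", "B", "C", 4, 5, 2],
    "img3": ["A", "B", 4],
}
found = retrieve(database, ["A", "B", "C", 4, 5, {6, 7, 8}])
```

With the example string from Fig. 3, restricting β to (A, B, C) reproduces the subpattern given in Definition 5, and the candidate query keeps only the image whose B-to-C code falls in the admissible set.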
Ⅳ. THE OBJECT-LEVEL FEATURE REPRESENTATION MODEL
Typically, a data model is a framework that provides a representation for information and an operation method in the database system. The object-level feature representation model is one part of this data model. For remote sensing images, the data model also includes metadata, such as location, resolution, and light intensity. However, the standard of measurement for a content retrieval system determines the efficiency and accuracy of extraction. Hence, each image needs a good model with an efficient content-based representation. Moreover, selecting a formula for similarity calculation is also vital. Based on this concept, we present the object-level feature representation model for the image data in the next section.
According to MPEG-7 standards and the object-oriented concept, the object-level feature representation model for image data is described as a structural tree via layers [33]. As Fig. 6 shows, the first layer is the object name, while the second layer is the feature name of the feature information that the object contains. Further down are the layers for sub-features, feature attributes, attribute values, etc. Constructing this structural tree is convenient for indexing feature information.
The overall model of the feature image can be represented by a formal method, as follows:
where EA stands for the description of the object extraction algorithm, and MA stands for the description of the object matching algorithm.
We adopt the centroid-radii model F_{shape} = (objID, Centroid, Radii) in Section Ⅲ-A, the 9DLT extended model F_{space} = (objID, F_{local}, F_{direc}) in Section Ⅲ-C, and the calculated energy values for the different direction angles, F_{texture}, in Section Ⅲ-B. We choose parameters that refer to the methods in the literature [34]. The color feature of an object can be expressed as F_{color} = {μ_{color}, σ_{color}}, through the mean value and the mean square deviation of its color.
Through this model, we can express the content of multiple targets in an image by using multiple records to represent each single object, such as its color, shape, and texture features. Then, a logical expression is implemented by the spatial relationships among objects. Thus, we transform the MTR problem into a record-querying problem, which enables image indexing technology to further accelerate target retrieval in CBRSIR.
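The record-oriented view of the model can be sketched as a pair of small data structures. This is a hypothetical layout for illustration only: the class and field names are assumptions, the feature records of one object are collapsed into a single structure for brevity, and the query checks only that the required objects are present (the minimum support), leaving the spatial matching aside.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class ObjectRecord:
    obj_id: str       # first layer of the tree: object name
    centroid: tuple   # F_shape: centroid (x, y)
    radii: list       # F_shape: normalized centroid radii
    texture: list     # F_texture: per-filter energy statistics
    color: tuple      # F_color = (mu_color, sigma_color)


@dataclass
class ImageRecord:
    image_id: str
    objects: dict = field(default_factory=dict)     # obj_id -> ObjectRecord
    dlt_string: list = field(default_factory=list)  # F_space as a 9DLT string


def color_feature(pixels):
    """Mean and mean square deviation of an object's pixel values."""
    return (mean(pixels), pstdev(pixels))


def images_with_objects(database, required):
    """Record query: images that contain every required object name."""
    return [img.image_id for img in database if set(required) <= set(img.objects)]


lake = ObjectRecord("A", (4.0, 2.0), [1.0, 0.8, 1.0, 0.8], [], color_feature([10, 20, 30]))
road = ObjectRecord("B", (9.0, 7.0), [1.0, 0.2, 1.0, 0.2], [], color_feature([90, 90, 90]))
sid = [
    ImageRecord("img1", {"A": lake, "B": road}, ["A", "B", 8]),
    ImageRecord("img2", {"A": lake}, ["A"]),
]
hits = images_with_objects(sid, ["A", "B"])
```

Storing the features as plain records is what allows an ordinary index over the symbolic image database to answer the object-level part of a query before any spatial matching is attempted.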

[Fig. 1.] Model of the centroid radii. (a) Resampled polygon with θ interval around, and counterclockwise to, the y axis; and (b) expression of the resampling result.

[Fig. 2.] Representation of nine-direction lower-triangular. (a) Nine-direction code and (b) symbolic figure of the object.

[Fig. 3.] Map of the matrix expression in nine-direction lower-triangular (9DLT). (a) Direction map in grids between objects and (b) matrix expression of four objects in 9DLT.

[Fig. 4.] Generation of candidate 3-pattern from 2-pattern.

[Fig. 5.] Map of the candidate matrix to match the threshold.

[Fig. 6.] Model of representation of feature objects.