Query by Visual Example: A Comparative Study of the Efficacy of Image Query Paradigms in Supporting Visual Information Retrieval

Venters Colin C.

OA학술지
Journal of Information Management

Query by Visual Example: A Comparative Study of the Efficacy of Image Query Paradigms in Supporting Visual Information Retrieval

Author: Venters Colin C.
Organization: Venters Colin C.
Publish: Journal of Information Management Volume 42, Issue3, p71~94, 30 July 2011

ABSTRACT

Query by Visual Example: A Comparative Study of the Efficacy of Image Query Paradigms in Supporting Visual Information Retrieval

시각적 실례에 의한 질의는 내용기반 이미지 검색 환경에서 질의 표현을 위한 중요한 질의 패러다임이다. 이미지 및 스케치에 의한 질의는 질의표현을 가능하게 하는 방법으로서 오랫동안 알려졌다. 하지만 이 방법이 질의를 쉽게 작성하는 데 얼마나 도움을 주는지에 대한 효율성에 대한 실험적 입증은 아직 미미하다. 정보검색시스템에 표현하는 탐색자의 능력은 검색과정의 기본이다. 이 연구의 목적은 탐색자의 정보 문제와 효율적이고도 효과적인 시각적 질의 작성을 지원하기 위해 필요한 질의 방법들 간의 지식 격차의 원인이 되는 다양한 정보요구를 지원하는 데 있어서 유용성 실험을 통해 이미지에 의한 질의와 스케치방법에 의한 질의 조사하기 위함이었다. 본 연구 결과는 이미지에 의한 질의가 시각적 질의 작성에 실행 가능한 접근방식임을 제시한다. 반면에, 본 연구결과를 통해 탐색자의 정보 문제와 시각적인 질의 작성에 도움을 주는 스케치 패러다임에 의한 질의표현 능력 간에 상당한 불일치가 있다는 것을 알 수 있다. 효율(시간)과 유효성(오류)에 초점을 둔 유용성 실험결과와 이용자의 만족도는 큰 차이점이 있다고 보여준다(p<0.001). 이는 다음 세 가지 측정(시간, 오류, 이용자의만족도)에 대한 두 가지 질의 방식(이미지에 의한 질의, 스케치에 의한 질의) 사이에서 나타난 시간(Z=-3.597,p<0.001), 오류(Z=-3.317, p<0.001), 그리고 만족도(Z=-10.223, p<0.001)에서 드러난다. 본 연구결과는 또한질의도구를 참가자가 인지하는 유용성에 큰 차이가 있다는 것을 보여준다(Z=-4.672, p<0.001).

KEYWORD

Content-based Image Retrieval , Image Retrieval , Query by Image , Query by Visual Example , Query by Sketch , Visual Information Retrieval , Usability

본문

Collapse all

1. Introduction

Advances in technology have been the catalyst for an unprecedented increase in the volume of institutional and personal image data. However, while there have been significant technological advances with image data capture and storage, developments in effective image retrieval have not kept pace. To address this, research into the retrieval of image data in the last two decades has focused primarily on the use of visual properties and characteristics; commonly referred to as content-based image retrieval.

Within this field, a number of key research areas have been identified that are advocated as crucial to its advancement(Gudivada and Raghaven 1995; Rui, Huang, and Chang 1999;Hanjalic, Sebe, and Chang 2006; Datta et. al.2008). Despite a long recognition as a key area of research, query interfaces to support effective information retrieval including their expressive power and the impact of the query on retrieval performance, especially with real end users, has still to receive serious attention by the research community(Flickner et. al.1997). Jaimes, Sebe and Gatica-Perez(2006) argue that in order to gain acceptance, emerging image retrieval systems need to acquire a human-centered perspective.

Jaimes and Chang(2002) state that content-based image retrieval systems should consist of three basic components at the highest level of abstraction: the database, the indexing information, and the user interface.When users’ conduct searches, they interact with the system to express their information requirements through a user interface. They suggest that two important aspects of the user interface are its expressive power and its ease of use; where expressive power relates to what can be expressed using the language and ease of use is concerned with how difficult it is for the user to formulate the desired query using the language. This paper reports the results of an investigation that aimed to explore the efficacy of the query by image and query by sketch methods in supporting a range of information problems. Section two provides the background to the research outlining the query paradigms and related experimental evaluation studies. Section three outlines the methods and procedures used in the experiment. Section four presents the results that emerged from the experiment. Section five discusses the implications of the results and draws conclusions.

2. Research Background

The origins of content-based image retrieval lie in the fields of artificial intelligence,computer vision, image processing, pattern recognition, and signal processing. Advancement of the approach has been attributed to the early experiments conducted by Kato(1991) into the automatic retrieval of images by colour and shape feature. The process involves a direct matching operation between a query image and a database of stored images. For each image a feature vector is computed for its unique visual properties

Visual Feature Representation

where f is the feature vector, I is the representation of the image content, and d is the dimensionality. Similarity is calculated by comparing the feature vector of the query image against the feature vectors of images stored in a database using a distance model such as Euclidean distance. Other commonly used metric distance functions include Kullback-Leibler divergence, Mahalanobis distance, and minkowsky distance. The most common image matching feature characteristics are colour, shape, and texture. A number of review papers have surveyed the feature and matching techniques utilized in the process of content-based image retrieval and readers are directed to those publications for detailed descriptions of the techniques and approaches used(Idris and Panchanathan 1997; Del Bimbo 1999; Lew and Huang 2001; Castelli and Bergman 2002; Feng, Siu, and Zhang 2003). Nevertheless, the ability to provide effective and efficient image retrieval using this approach has proved challenging and its efficacy as a primary retrieval method is open to debate.

   2.1 Query by Visual Example

Image retrieval is the process of identifying relevant images from a database that satisfies an information need. In the context of this research, an information need is a requirement to find relevant information in response to a searcher’s specific information problem. This is formally expressed as a query Qn using the input language provided by the retrieval system. To provide both structure and access mechanisms to image data, concept-based image retrieval queries are expressed as keywords using either controlled or uncontrolled vocabularies supported by the retrieval system.In contrast, content-based image retrieval requires searchers to express their information problems as visual queries. Query by visual example(QVE) is the general term used to describe the overall approach and was first coined by Hirata and Kato(1992) to describe the interaction paradigm of their sketch query tool developed for the Art Museum image retrieval system developed by Kato, Kurita and Shimogaki(1989). Under this umbrella term,a number of query paradigms have been proposed for specifying a query Qn in an image database I. Del Bimbo(1999) and Lew and Huang(2001) distinguish between different methods of QVE including query by browsing, query by icon, and query by colour. This research explores the usability of the query by image and query by sketch methods, which are outlined in Section 2.1.1 and Section 2.1.2.

To initiate a content-based image retrieval search, the user provides, selects or creates the visual representation of their information need. Smeulders et. al.(2000) defined an abstract query space to represent user interaction in a content-based image retrieval system:

Abstract Query Space

where I_Q is the selection of images from the database I, F_Q is the selection of features derived from the images in I_Q, S_Q is the similarity function, and Z_Q is a set of labels to capture the goal dependent semantics. Rui and Huang(2001) proposed an alternative model to represent the query(Equation 3). They state that an image object O can be modeled as a function of the image data D, features F, and representations R:

Image Object Model

where D is the raw image data, F is a set of visual features associated with the image object, such as colour, texture or shape, and R is a set of representations for a given feature such as colour histogram or colour moments. However, they suggest that specifying object O, feature F, and R representation level weights imposes a dual burden on the user of the retrieval system and its developers(<Figure 2>). For example, specifying the object level weights U imposes a burden on the user, as it requires them to have comprehensive knowledge of the visual features used in the retrieval system and how they relate to their information need. Similarly,specifying features and their representation level weights V_i and W_ij imposes a burden on system developers who do not know which r_ij and r_ijk match users’ perceptions of the image content.

In general, the majority of content-based image retrieval systems exemplify the interaction model proposed by Rui and Huang(2001). This approach is generally considered sufficient for the formulation of an initial query as it leads to a set of results which, it is suggested, can be iteratively refined to satisfy the searcher’s information need. Yoshitaka and Ichikawa(1999) state that QVE provides the user with an intuitive method of query representation and a form of expressing a query condition that is close to that of the information problem. In contrast, Boujemaa, Fauqueur, and Gouet(2004) argue that while QVE has provided a framework for visual information retrieval research, it does not support the rich variety of visual search strategies required for effective image retrieval. However, there is little empirical evidence to support either of the polarised positions concerning the efficacy of QVE in supporting the formulation of visual queries in a content-based image retrieval environment.

Eidenberger and Breitender(2002) argue that user interfaces for content-based image retrieval systems are overly complicated and that casual users would be overtaxed by the demands of selecting similarity measures, retrieval features, and setting of weights. They suggest that in order to improve the acceptance of future systems, simpler user interfaces are required, proposing a framework for their design as part of the VizIR project; an open-framework to develop a Java API for visual information retrieval. The framework includes a description of the visual components and their class structure, and the communications between panels, visual components and the query engines based on the Multimedia Retrieval Mark-up Language(MRML) proposed by Muller et. al.(2000). The proposed framework implements two methods of the QVE paradigm, which they consider are the most intuitive: query by image and query by sketch. While there is considerable merit in their proposal to develop a framework for user interfaces for visual information retrieval, their decision to implement the two query methods does not take into account whether these methods support the formulation of searcher’s information problems.

2.1.1 Query by Image

Query by image allows the user to provide an example image as a representation of their query and is the most commonly implemented query method for content-based image retrieval. The example can be either an image selected externally or internally from the system. Di Lecce and Guerriero(1999) characterize this as query by external image example and query by internal image example. With query by external image example the searcher provides an external example image to the system. This method is generally perceived as the simplest approach to query formulation but it relies heavily on the end user having a suitable representative image to use as the basis of their query in the first instance. In contrast, query by internal image example allows the searcher select an image from the system’s database and is a form of browsing. Typically, the user is presented with a randomly generated list in an n-dimensional matrix and then iteratively refines the retrieval results by selecting images that satisfy their information need. Systems that adopt this approach automatically compute a similarity ranking on image features in the set and disregard the least similar images after each iteration until the user is left with a set of relevant images(Laaksonen et. al. 2000; Vendrig, Worring and Smeulders 2001).

Other approaches to query by internal image example include whether the data is unstructured or structured. With an unstructured approach, the searcher is presented with a complete view of all the images contained in the database and selects images that satisfy their information requirements. The view and structure of the image database can be modified using a relevance feedback mechanism. An early example of an unstructured browsing technique was proposed by Gecsei and Martin(1989). They suggested that users could progressively refine their search to one or two highly relevant surrogates with this approach. However, this was not demonstrated empirically. Similar approaches were proposed by Santini and Jain(1997) and Nakazato and Huang(2001) who extended this to 3D feature spaces including a 3D projection-based immersive VR using NCSA CAVE. However, they both acknowledge the difficulty of this approach in locating a suitable image to use as the basis of the initial query since no knowledge about the past or anticipated use of the system is available, the initial query space Q⁰ should not be biased toward specific images or make some image pairs a priori more similar than others (Smeulders et. al. 2000). With a structured approach, images are clustered into predefined, hierarchal classes based on a visual property or characteristic such as colour, shape or texture. The images are presented as thumbnails and the user can move through the hierarchy until satisfied. An example of this approach was proposed by Lai(2000) who clustered images into hierarchies using a set of ten colours based on an extended perceptual colour model proposed by Berlin and Kay(1991).

The concept of browsing has long been recognized in the field of information retrieval as a fundamental part of human information seeking behaviour(Wilson 1981). Marchionini(1995) states that browsing is an approach to information seeking that is both informal and opportunistic. Frost et al.(2000) suggest that one of the primary attractions of browsing is that it allows users to recognise what is interesting rather than formulating a precise information query in advance. Evidence suggests that on average users were willing to browse through approximately one hundred thumbnail images before refining the search parameters(2000). Nevertheless, the viability of browsing as an access strategy is heavily dependent on the size of the image collection and the effectiveness of any classification by which a physical order is imposed(1995).

2.1.2 Query by Sketch

Query by sketch allows the user to draw an approximate outline of the desired image with a graphic editing tool provided by the retrieval system. The sketch can represent either a completed object or scene. Kato et. al.(1989) were the first to provide a query by sketch interface for formulating visual queries but other examples of this approach have since been proposed by Chuah, Roth, and Kerpedjiev(1997), Muller, Eickeler, and Rigoll(1998), Di Sciascio and Mongiello(1999). Jaimes and Chang(2002) suggest that a sketch is a good way of making a precise request to a system because the user has total freedom in arranging different components when creating the query. In theory, any image could be retrieved using a sketch-based query as the user could potentially draw an exact replica of the image of interest. Nevertheless, Korfhage(1997) argued that queries formulated by this method are simplistic, relatively crude sketches of the desired query image and that the query method had limited functionality for expressing more complex image queries.

To address the limitations of the query by sketch method, Ko and Byun(2002) proposed query by gesture as an alternative. By this method, the user formulates a query by drawing the shape of the object using their finger in front of a CCD(Charge-Coupled Device)camera, which captures finger motion at 160×120 image frames at a rate of five frames per second. After drawing a shape the user can then select colours from a palette and the number of k nearest neighbour retrieved results;the system also supported a modified probabilistic feature relevance feedback mechanism proposed by Ko, Peng, and Byun(2001) which allows the progressive refinement of query results. However, the results of a retrieval experiment using six queries against a database of three thousand images from the Corel stock photography library showed that average retrieval precision was approximately forty-five percent without relevance feedback which increased to sixty-four percent after five iterations. The results suggest that their proposed approach significantly affects the retrieval performance of the system, which is notably poor even after five iterations, and the results are far from encouraging.

Long, Zhand and Feng(2003) suggest that a coarse sketch is sufficient as the query can be refined based on retrieval results. However, it has been demonstrated that the effectiveness of shape matching features are highly sensitive to noise and pixel arrangement where shape similarity is computed by evaluating the correlation between a linear sketch and edge images in the database(Del Bimbo and Pala 1997). As a result, the query image has an adverse effect on the retrieval results.Eakins(1992) argued that the shape drawn by the user must be an accurate representation of the shapes in the database, which he suggests is nearly impossible in practice although this was not demonstrated empirically.

At a theoretical level, query by sketch would appear to be a natural method for formulating visual queries. Despite its theoretical expressive power, the queries are difficult to formulate; drawing is a difficult task, particularly for users with limited skills. Blaser(1997) argues that the task can be simpler for users with some training or practice in drawing but, even then, using a computer interface to draw is usually unnatural. Regardless of whether the user is able to draw a good example to perform the query, the success of the system will depend on how the query is interpreted by the system. No research has been done to examine the effect of real user sketch-based queries on retrieval performance.

3. Experiment

In order to explore the viability of the query methods in supporting visual query formulation a usability experiment was conducted.The aim of the experiment was to investigate to what extent query by image and query by sketch were effective methods for visual query formulation; the experiment was restricted to query by internal example. Retrieval performance was not measured or the focus of this study; the performance of the retrieval engine was investigated by Eakins, Boardman,and Graham(1998), which demonstrated that their approach, albeit for a prototype system under specific laboratory conditions, was capable of providing effective shape retrieval for the majority of potentially citeable images.

This experiment focused on usability as defined in the ISO 9241 standard: effectiveness, efficiency and satisfaction(ISO 1998). Effectiveness is the extent to which a goal or task can be achieved. Indicators of effectiveness include quality of solution and errors. Efficiency is the amount of effort required to complete a task. Indicators of efficiency include task completion time and learning time. Satisfaction is the level of comfort that the user feels when using a product or system and how effective that product or system was in supporting their goals. All three measures of usability were included in the experiment as Frøkjær, Hertzum, and Hornbæk(2000) argue that studies which focus solely on one or two aspects risk making unfounded assertions about the overall usability of the system or product under investigation.

The sample population was drawn from the fourteen UK Patent Information Network(PIN) offices in order to produce a group who shared similar task orientation, motivation, system expertise, and domain knowledge. All were intermediary searchers acting on behalf of clients and who deal with the full range of trade mark information queries from non-specific to specific. The experiment included twenty-eight participants, four(14%) male and twenty-four(86%) female. The age range of participants was twenty-four to fifty eight

and had worked in the field of patent and trade mark information retrieval for periods ranging from less than one year to more than fifteen years. The participants had between one and fifteen years of experience of working with information systems with their level of information retrieval skill being described as either intermediate(50%) and expert(50%). All twenty eight participants had used a text-based image retrieval system and none had used a content-based image retrieval system prior to the experiment.

The experiment was a within-subjects experimental design. Participants were randomly assigned, using a number generator, to condition A or condition B, where condition A was query by image and condition B was query by sketch to counterbalance learning sequence effects. The independent variable was query method. The dependent variables were task completion time, number of errors,and user satisfaction. The user satisfaction dependent also included the dependent variable system capabilities. For the purpose of the experiment, task completion time was defined as the total time taken to complete a task. The number of errors was defined as the total number of errors generated during a task. For the purpose of this experiment two error types were defined: E₁ and E₂. A type E₁ error occurred when a participant could not take a task to completion. A type E₂ error occurred when an action did not lead to progress in performing the desired task.

Two questionnaires were designed to collect data on the sample population and their subjective opinions: a pre-session and post-session.The pre-session questionnaire collected demographic and background data. The postsession questionnaire was a modified version of the questionnaire for user interaction satisfaction(56) and was designed to ascertain the subjective satisfaction of participants. The granularity of the rating scale was modified from its linear scale of 1-9 and NA to a negative and positive scale, as studies have suggested that this scale reading corresponds more accurately to the opinions of users(Hix and Hartson 1993). The scale contained no neutral value in order to reduce the error of central tendency thereby forcing participants’to make a negative or positive choice. Statistics were calculated for measures of central tendency, dispersion, and distribution. The data was analysed using the Wilcoxon signed-rank test at an alpha level of p=0.001.

Two separate sessions were held over consecutive days and each evaluation session lasted approximately one hour. The experiment consisted of six tasks per session: three benchmark and three intervening. Benchmark tasks are primary, representative tasks, which the system is specifically designed to support.Intervening representative tasks are secondary, inter-related tasks that users would be expected to perform, such as printing or saving search results. This class of task is not tested but is suggested as important in adding breadth and

[Figure 1] Query by Image Specific Task

[Figure 2] Query by Sketch Specific Task

depth to the evaluation(Preece et. al. 1994). The tasks were based on specific, semi-specific,and non-specific device mark image queries and took the form of paper-based examples, both final-copy or sketches, and descriptions of image queries. Benchmark tasks one T₁ was a specific query where the participant was presented with a final paper-based copy of a candidate mark image being registered(<Figure 1; Figure 2>).

Benchmark tasks two T₂ was a semi-specific query where the participant had examples of the overall shape and features of interest in the form of a paper-based sketch(<Figure 3; Figure 4>).

Benchmark tasks three T₃ was non-specific query where the participant had a general description of the overall shape or features of the device mark. The query by image task was to find images that contained ‘wreath like shapes.’ Similarly, the query by sketch task was to find an S shape that resembles a car racing track. The benchmark tasks were randomly selected, by a number generator,

[Figure 3] Query by Image Semi-Specific Task

from fifteen hundred real queries submitted to the Patent Information Office and clustered by query type. The tasks were ordered from the specific to the non-specific to provide an increasing level of task difficulty. The number of tasks was restricted by the time constraints of the experiment.

4. Results

The following sections present a summary of the results of the experiment and are organized by the dependent variables of time, errors, and user satisfaction.

[Table 1] Task Completion Time

Task Completion Time

[Figure 4] Query by Sketch Semi-Specific Task

   4.1 Time

Task completion time measured the total time taken to complete a task. <Table 1>shows the mean ( x ), ( σ ), variance ( σ² ), and mean difference ( xdf ) for the dependent variable task completion time.

The results show that task T₁ and T₂ were completed in the shortest period of time using the query by image method with a mean difference of xdf =3.7986 and xdf =4.4868 between the two methods respectively. Task T₃ was completed in a shorter period of time using the query by sketch method with a mean difference of xdf =4.42766 between the two methods. The results also reveal that there was a notable degree of uniformity and consistency in overall task completion time scores for the query by sketch method with a marginal increase on task T₃. However, the variance in task completion time for T₃ using the query by sketch method reveals that the values are widely spread ( σ₂2=9.06671). In contrast, there was a marked increase in task completion time for task T₃ using the query by image method in comparison to the preceding two tasks with a mean difference of xdf =11.15696 between the highest and lowest scores. <Figure 5> and <Figure 6> show the distribution of scores for task completion time highlighting the significant increase in

[Figure 5] Query by Image Task Time

[Figure 6] Query by Sketch Task Time

[Table 2] Wilcoxon Signed Rank Scores for Task Completion Time by Task Type

Wilcoxon Signed Rank Scores for Task Completion Time by Task Type

task completion time for task T₃ using the query by image method and the notable difference in the distribution in time for T₃ using the query by sketch method.

A wilcoxon signed rank test for individual task completion time reveals that there was a significant difference between the two query methods(<Table 2>).

<Table 3> shows the mean x, standard deviationσ , and variance for the combined task completion time values of the query tools. <Figure 7> shows the distribution of data values for the combined scores. The results show that overall, tasks were completed in a shorter

[Table 3] Combined Task Completion Time Scores

Combined Task Completion Time Scores

[Figure 7] Distribution of Combined Task Completion Scores

time using the query by image method. However, the variance score for the query by image method shows the task completion time was much more widely distributed in comparison to the query by sketch method when the data is taken as a whole.

The results of a wilcoxon signed-rank test for the combined task completion time scores revealed a significant difference between query by image and query by shape Z=-3.597, p<0.001.

   4.2 Errors

Two error types were defined for this experiment: E₁ and E₂. A type E₁ error occurred when a participant could not take a task to completion. A type E2 error occurred when an action did not lead to progress in performing the desired task. In this experiment, no E₂ type errors were produced by the sample population. In contrast, eleven E₁ type errors occurred on task T₃ using the query by sketch method, which was only completed successfully by seventeen out of the twenty- eight participants with the other eleven participants abandoning the task. The results of this are discussed in the next section. Tasks T₁ and T₂ were all successfully completed by the twenty-eight participants using both the query methods. A wilcoxon signed rank test for E₂ type errors on individual tasks reveals that

[Table 4] Wilcoxon Signed Rank Scores for E2 Type Errors

Wilcoxon Signed Rank Scores for E2 Type Errors

there was a significant difference between the two query methods on task T₃, Z=-3.317, p<0.001 (<Table 4>).

   4.3 Satisfaction

<Table 5> shows the mean x and standard deviation σ for overall reaction to the query methods. The results reveal that the query by image method was rated positively on all five measures. Query by image was considered wonderful ( x=3.43), satisfying ( x=2.50), stimulating ( x=2.29), easy ( x=3.39) and flexible ( x=3.34). In contrast, the query by sketch method was rated negatively on four out of the five semantic differentials. The method was considered, terrible ( x=-1.68),frustrating ( x=-2.61), dull ( x=-1.32), and

[Table 5] Overall Satisfaction Mean and Standard Deviation

Overall Satisfaction Mean and Standard Deviation

difficult ( x=-3.46). However, the method was rated positively on the rigid-flexible semantic differential ( x=1.25). The difference between the scores for the two query methods on the difficult-easy semantic differential is diametrically opposed with the query by image method being rated highly positively ( x=3.39) and the query by sketch method being rated highly negatively ( x=-3.46). In terms of overall reaction, this data suggests that this sample population considers the query by sketch method insufficient for visual query expression and its ease of use is highly questionable.

<Figure 8> and <Figure 9> show the distribution of scores for overall users reaction to the query methods. The results reveal that there is a clustering between the individual

[Figure 8] Distribution Scores Query by Image

scores within range of one or two points in the inter-quartile range and that the scores are not widely distributed which suggests a degree of consensus between participants’ ratings.

A wilcoxon signed rank test on individual measures shows that there was a significant difference, p<0.001, in user satisfaction between the two tools <Table 6>.

[Table 6] Wilcoxon Signed Rank Scores for Individual Measures of Satisfaction

Wilcoxon Signed Rank Scores for Individual Measures of Satisfaction

[Table 7] Combined Scores for Overall User Satisfaction

Combined Scores for Overall User Satisfaction

[Figure 9] Distribution Scores Query by Sketch

<Table 7> shows the mean x, standard deviation σ, and variance σ² for overall reaction to the query methods. The results reveal that the query by image method.

<Figure 10> shows the distribution of scores for overall user reaction. The results show that there is a tight clustering in the scores for overall satisfaction with the query

[Figure 10] Distribution of Combined Scores for Overall User Satisfaction

[Table 8] Efficacy of the Query Methods

Efficacy of the Query Methods

by image method suggesting a strong degree of consensus in participants’ ratings. Similarly, the distribution of scores for the query by sketch method is within range of one or two points in the inter-quartile range and that the scores are not widely distributed.

The results of a wilcoxon signed-rank test for the combined satisfaction scores revealed a significant difference between query by image and query by shape Z=-10.223, p<0.001.

In addition to overall satisfaction, participants were also asked to consider the efficacy of the query methods. <Table 8> shows the mean x, standard deviation σ, and variance for participants perceived usefulness of the query tools. The results show that query by image was rated positively ( x=2.61). In contrast, query by sketch was rated highly negatively ( x=-3.32).

The results of a wilcoxon signed-rank test for the efficacy scores revealed a significant difference between query by image and query by shape Z=-4.672, p<0.001.

5. Discussion

Query by image performed well across all three measures of usability. From an effectiveness perspective, task completion time scores reveal that specific and semi-specific tasks were completed in a shorter time using this method. However, there was a marked increase for non-specific tasks. This increase in task completion time can be attributed specifically to the level of task difficulty. As the results of the individual task completion time scores <Table 1> show, there was a marked difference between specific and non-specific tasks. This is not a surprising result in itself as the tasks were specifically designed to provide an increasing level of complexity. As a result, it could be predicted that it would take participants longer to complete tasks that had a higher cognitive load as a result of the increased level of subjective interpretation introduced by the task. Whether this can be solely attributed to the complexity of the task and the increasing level of cognitive processing required by the searcher to solve the information problem or whether it is related to limitations of the query method itself is open to discussion.

Query by sketch is a common approach to visual query formulation. However, there is considerable disagreement between researchers as to the efficacy of this method in supporting visual query formulation. It is broadly considered either as an aid or a hindrance to effective and efficient query formulation. Despite being one of the main query methods used in content-based image retrieval prototype systems the results of this study suggest that the query by sketch method does not support visual query formulation. Examination of the indicators of usability shows extremely disappointing performance measures. In terms of satisfaction, participants overall satisfaction ratings for query by sketch were considerably less favourable across all five high-level user interface factors in comparison to query by image. The method was only rated positively on the ‘rigid v flexible’ semantic differential,which suggests that participants still considered that the query method offered some perceived degree of flexibility, at least from a theoretical perspective. However, the utility of the tool to aid visual query formulation is clearly not reflected in their ratings scores for other measures of satisfaction.

Despite its perceived expressive power, it was observed that even specific queries were extremely difficult to formulate. Drawing can be a difficult and challenging task particularly for people with limited natural ability. It was observed that queries formulated by this method were relatively crude, poorly formulated sketches, even when the task showed the exact query image. This supports the position taken by Korfhage(1997) who argued that query by sketch lacked the functionality to express an information problem beyond simple shapes. To what extent this may also be exacerbated by the input device is unclear and a comparison of different methods of sketching or drawing with a mouse and a graphics tablet using pencil and paper as a control would prove insightful. Whether this can be overcome with training and practice as Blaser(1997) suggests is unclear and provides an opportunity for further research to investigate the limits of the approach in supporting visual query formulation. In the context of this study, it was observed that participants found the task of drawing by this method challenging and frustrating. It was also observed that the quality of the query, particularly in terms of its structure, had a notable effect on the retrieval results. This is contrary to this position taken by Long, Zhand and Feng(2003) who suggested that a rough sketch would prove sufficient to produce a set of results, which could be refined. Similarly, while the results of retrieval performance experiments suggest that the retrieval effectiveness of existing prototype systems is extremely encouraging under specific laboratory conditions they also suggest that the quality of the query has an effect on retrieval performance. To what extent it affects the retrieval result is unclear as little empirical evidence exits into the effect of query formulation and its effect on retrieval performance.

For this experiment two error conditions were defined. Query by sketch produced a number of E1 type errors during the experiment where participants could not take a task to completion. This task involved using the query method to formulate a non-specific query with eleven participants abandoning the task. In comparison to the findings of Bird, Elliott, and Hayward(1999), it was noted that participants demonstrated considerable frustration in their attempts to produce a query of sufficient quality and the effect that this had on the retrieval results, blaming their ability to use the tool correctly. Given the expertise of the sample population with information systems and user interfaces, it is interesting to note that they attribute their ability as the single point of failure in the system rather than the usability of the system itself. This observation is not uncommon where users attribute usability problems experienced with a system to their ability to use it effectively. In addition, an error classification with a finer level of granularity, such as that proposed by Norman(1981), may have revealed more interesting data regarding the type of errors experienced by participants during the course of the experiment. Nevertheless, the development of such taxonomy was beyond the scope of this research.

These findings have important implications for the efficacy and utility of content-based image retrieval as an approach although there is a need to further examine the usefulness of both the tools in relation to retrieval performance which will be the focus of future research work.

6. Conclusion

This paper investigated the efficacy of the query by image and query by sketch methods in supporting a range of information problems through a usability experiment in order to contribute to the gap in knowledge regarding the relationship between searchers’ information problems and the query methods required to support efficient and effective visual query formulation. The results of the experiment suggest that query by image is a viable approach to visual query formulation.In contrast, the results strongly suggest that there is a significant mismatch between the searchers information problems and the expressive power of the query by sketch paradigm in supporting visual query formulation.

Content-based retrieval research has generally operated in a vacuum isolated from the real activities and tasks of the searcher focusing primarily on the technical aspects of delivering effective image retrieval. As a consequence,the design of query interfaces for content-based image retrieval has been driven by the underlying retrieval mechanism and not by the requirements of the end user. The importance of the user in the design and the evaluation of content-based image systems have been grossly underestimated by the research community. What the information needs of users of image collections are and how successfully these can be expressed in a content-based image retrieval environment is far from clear. This is a complex field and no single solution or approach will solve the multi-dimensional issues to achieve effective and efficient retrieval of the data. The fusion of work from a diverse range of disciplines,including mathematics and statistics, information and computer science, neuroscience and psychology, will all contribute to the development of accessible image retrieval systems.Nevertheless, research into query interfaces for content-based image retrieval is still in its infancy and substantial research is required to understand the relationship between the user’s information problem and methods of query formulation and refinement.

참고문헌

1. Berlin B, Kay P 1991 Basic Color Terms: Their Universality and Evolution
2. Bird C, Elloitt P. J, Hayward P. M 1999 “Content-Based Retrieval forEuropean Image Libraries.”
3. Blaser A. D 1997 “User Interaction in a Sketch-Based GIS User Interface.” [Lecture Notes In Computer Science] Vol.1329 P.505-506
4. Bordogna G, Pagani. M 2010 “A Flexible Content-based Image Retrieval Model and Customizable System for the Retrieval of Shapes.” [Journal of the American Society for Information Science and Technology] Vol.61 P.907-926
5. Boujemaa N, Fauqueur J, Gouet. V 2004 “What’s Beyond Query by Example?” Lecture Notes in Computer Science
6. Castelli V, Bergman. L. D 2002 Image Databases: Search and Retrieval of Digital Imagery P.1-10
7. Chin J, Diehl V, Norman L 1988 “Development of an Instrument MeasuringUser Satisfaction of the Human-Computer Interface.” P.213-218
8. Datta R, Joshi D, Li J, Wang. J. Z 2008 “Image Retrieval: Ideas Influences and Trends of the New Age.” [ACM Computing Surveys] Vol.40
9. Del Bimbo A, Pala. P 1997 “Visual Image Retrieval by Elastic Matching of User Sketches.” [IEEE Transactions on Pattern Analysis and Machine Intelligence] Vol.19 P.121-132
10. Di Lecce V, Guerriero. A 1999 “An Evaluation of the Effectiveness of Image Features for Image Retrieval.” [Journal of Visual Communication and Image Representation] Vol.10 P.351-362
11. Eakins J. P 1992 “Pictorial Information Systems:Prospects and Problems.” P.102-123
12. Eakins J. P, Boardman J. M, Graham. M. E 1998 “Similarity Retrieval of Trademark Images.” [IEEE MultiMedia] Vol.5 P.53-63
13. Eidenberger H, Breiteneder C 2003 “VizIR: A Framework for Visual Information Retrieval.” [Journal of Visual Languages and Computing] Vol.14 P.443-469
14. Enser P. G. B 1995 “Progress in Documentation Pictorial Information Retrieval.” [Journal of Documentation] Vol.51 P.126-170
15. Feng D, Siu W. C, Zhang H. J 2003 Multimedia Information Retrieval and Management: Technological Fundamentals and Applications P.1-26
16. Flickner M, Sawhney H, Niblack W, Ashley J, Huang Q, Dom B, Gorkani M, Hafner J, Lee D, Petkovic D, Steele D, Yanker. P 1995 “Query by Image and Video Content: The QBIC System.” [IEEE Computer] Vol.28 P.23-32
17. Frøkjær E, Hertzum M, Hornbæk K 2000 “Measuring Usability: Are EffectivenessEfficiency and SatisfactionReally Correlated?” P.345-352
18. Gecsei J, Martin. D 1989 “Browsing Assess to Visual Information.” [Optical Information Systems] Vol.9 P.237-241
19. Gudivada V. N, Raghaven. V. V 1995 “Content-based Image Retrieval Systems.” [IEEE Computer] Vol.September 1995 P.20
20. Hanjalic A, Sebe N, Chang E. Y 2006 “Multimedia Content Analysis Managementand Retrieval: Trends and Challenges.”
21. Hirata K, Kato T 1992 “Query byVisual Example.” P.56-71
22. Hix D, Hartson. H. R 1993 Developing User Interfaces: Ensuring Usability Through Product and Process
23. Idris F, Panchanathan S 1997 “Review of Image and Video Indexing Techniques.” [Journal of Visual Communication and Image Representation] Vol.8 P.146-166
24. 1998 ISO 9241
25. Jaimes A, Chang S-F 2002 “Concepts and Techniques for Indexing Visual Concepts.” In: Image Databases: Search and Retrieval of Digital Imagery P.497
26. Jaimes A, Sebe N, Gatica-Perez D 2006 P.855-864
27. Kato T 1991 “Intelligent Visual Interaction with Image Database Systems: Toward the Multimedia Personal Interface.” [Journal of Information Processing] Vol.14 P.134-143
28. Kato T, Kurita T, Shimogaki H 1989 “Multimedia Interaction with ImageDatabase Systems.” P.271-278
29. Kimia B. B, Castelli V, Bergman L. D 2002 “Shape Representation for Image Retrieval.” In: Image Databases:Search and Retrieval of Digital Imagery P.345-345
30. Ko B. C, Byun H 2002 “Concepts and Techniques for Indexing Visual Concepts.” In: Image Databases: Search and Retrieval of Digital Imagery [Journal of Visual Languages and Computing] Vol.13 P.375-390
31. Ko B. C, Peng J, Byun H 2001 “Region-Based Image Retrieval Using Probabilistic Feature Relevance Learning.” [Pattern Analysis and Applications] Vol.4 P.174-184
32. Korfhage R. R 1997 Information Storage and Retrieval
33. Laaksonen J, Koskela M, Laakso S, Oja E 2000 “PicSOM: Content-based Image Retrieval with Self-organizing maps.” [Pattern Recognition Letters] Vol.21 P.1199-1207
34. Lai T-S 2000 “CHROMA: A Photographic Image Retrieval System.”
35. Lew M. S, Huang. T. S, Lew M. S 2001 “Visual Information Retrieval: Paradigms Applicationsand Research Issues.” In: Principles of Visual Information Retrieval Advances in Pattern Recognition P.3-9
36. Loncaric S 1998 “A Survey of Shape Analysis Techniques.” [Pattern Recognition] Vol.31 P.983-1001
37. Preece J, Rogers Y, Sharp H, Benyon D, Holland S, Carey T 1994 Humancomputer Interaction
38. Mingqiang Y, Kidiyo K, Joseph R, Yin P-Y 2008 A Survey of Shape Feature Extraction Techniques. Pattern Recognition Techniques Technology and Applications P.43-90
39. Muller W, Muller H, Marchand-Maillet S, Pun T, Squire D. M, Pecenovic Z, Giess C, de Vries A. P 2000 “MRML:An Extensible Communication Protocolfor Interoperability and Bench-markingof Multimedia Information RetrievalSystems.” P.300-311
40. Nakazato M, Huang T. S 2001 “AnInteractive 3d Visualization for Content-based Image Retrieval.” P.44-47
41. Rui Y, Huang T. S, Lew M. S 2001 “Relevance Feedback Techniques.” In: Principles of Visual Information Retrieval P.219-258
42. Santini S, Jain R 1997 “Image Databasesare not Databases with Images.” [Proceedings of the 9th InternationalConference on Image Analysis andProcessing.] Vol.II P.38-45
43. Smeulders A. W. M, Worring M, Santini S, Gupta A, Jain. R 2000 “Contentbased Image Retrieval at the End of the Early Years.” [IEEE Transaction on Pattern Analysis and Machine Intelligence] Vol.22 P.1349-1380
44. Veltkamp R. C, Hagedoorn. M 2001 “State of the Art in Shape Matching.” In: Principles of Visual Information Retrieval P.87-120
45. Vendrig J, Worring M, Smeulders. A. W. M 2001 “Filter Image Browsing:Interactive Image Retrieval by Using Database Overviews.” [Multimedia Tools and Applications] Vol.15 P.83-103
46. Wilson T. D 1981 “On User Studies and Information Needs.” [Journal of Documentation] Vol.37 P.3-15
47. Yoshitaka A, Ichikawa. T 1999 “A Survey on Content-Based Retrieval for Multimedia Databases.” [IEEE Transactions on Knowledge and Data Engineering] Vol.11 P.81-93