Fast and Accurate Rigid Registration of 3D CT Images by Combining Feature and Intensity
 DOI : 10.5626/JCSE.2012.6.1.1
 Author: June Naw Chit Too, Cui Xuenan, Li Shengzhe, Kim Hakil, Kwack KyuSung
 Organization: June Naw Chit Too; Cui Xuenan; Li Shengzhe; Kim Hakil; Kwack KyuSung
 Publish: Journal of Computing Science and Engineering Volume 6, Issue1, p1~11, 30 March 2012

ABSTRACT
Computed tomography (CT) images are widely used for the analysis of the temporal evaluation or monitoring of the progression of a disease. The followup examinations of CT scan images of the same patient require a 3D registration technique. In this paper, an automatic and robust registration is proposed for the rigid registration of 3D CT images. The proposed method involves two steps. Firstly, the two CT volumes are aligned based on their principal axes, and then, the alignment from the previous step is refined by the optimization of the similarity score of the image’s voxel. Normalized cross correlation (NCC) is used as a similarity metric and a downhill simplex method is employed to find out the optimal score. The performance of the algorithm is evaluated on phantom images and knee synthetic CT images. By the extraction of the initial transformation parameters with principal axis of the binary volumes, the searching space to find out the parameters is reduced in the optimization step. Thus, the overall registration time is algorithmically decreased without the deterioration of the accuracy. The preliminary experimental results of the study demonstrate that the proposed method can be applied to rigid registration problems of real patient images.

KEYWORD
Image registration , Computed tomography , Principal axes , Normalized cross correlation

I. INTRODUCTION
The goal of image registration is to find out the geometric transformation of two or more images of the same scene that are taken at different instances, from different sensors or from different viewpoints and align them into the same spatial coordinate. The alignment is done by taking one image as
reference and the other one as afloat (image to be registered) and then, transform the float image to the same spatial coordinate of the reference image. It is one of the essential processes for different application fields in image processing such as medical imaging or remote sensing [1]. Medical images that are obtained from different modalities have different characteristics and properties. Integration or subtraction of images that are obtained from different modalities or the same modality is often desired for the evaluation of surgical or clinical diagnostic setting and other treatment problems. This is a critical task for medical image applications, as a registration technique is needed to complete this task for consistent and repeatable analyses.Many registration techniques have been developed for the various medical imaging problems. The development of effective registration techniques has been demanded for efficient clinical applications. Registration can be divided into rigid or nonrigid, according to the nature of the transformation [2]. Rigid registration is simpler, as its transformation in three dimensions is defined by six parameters: three for rotations and three for translations. Conversely, nonrigid registration is more complex. Besides, rotations and translations it involves voxeldependent distortion. Although rigid transformation is simple compared to nonrigid transformation, it is a crucial task for registering rigid structures such as bones, and it is often performed as a preliminary step for some nonrigid registration [3, 4].
From the aspect of the type of features that are used in a registration, the existing automatic registration methods are further divided as featurebased or intensitybased methods. Featurebased methods extract the corresponding features or attributes between a reference and the float images for registration. Common features include points, edges, contours or surfaces. These features may be specified manually or extracted automatically. Among the featurebased rigid registration methods, the principal axis registration (PAR) method [5] is the most motivating one as it can easily find the transformation parameters by calculating the principal axis of a common feature that is extracted from a reference and the float images.
PAR registration consists of translating the centroid of a float onto the reference image, and then, rotating it about the centroid to align the 3D principal axes of both the images. Hence, good features need to be extracted to obtain the accurate principal axes and centroids. In most of the 3D cases, a binary structure image [6, 7] is used as features while the contour features [8] are extracted for 2D principal axis registration. The main advantages of the featurebased principal axis registration methods are simple, and they are less computationally complex. Despite good properties, these approaches have been less popular as they can lead to inaccuracies in a medical image registration.
Intensitybased methods that uses full image content is the most interesting work, as they provide more reliable results with higher accuracy and this makes thees methods suitable for clinical application [9]. These approaches are commonly known as the voxel similarity based methods for 3D cases. These methods solve the registration task as a mathematical optimization problem. They use a similarity measure as a cost function to gauge the quality of the alignment between the reference and the float images for any given transformation. Obviously, finding out the optimized parameters between the two images needs the maximization of the similarity or the minimization of the difference. Common similarity metrics are mutual information (MI), normalized mutual information (NMI), cross correlation (CC), normalized cross correlation (NCC), sum of squared differences (SSD) and sum of absolute differences (SAD) [10, 11]. MI or NMI is considered as the most suitable metric for the multimodal image registration [12, 13] while the rest of them are appropriate for monomodality image registration [1416].
There are many mathematical optimizers that are suggested for image registration. Common optimizers include Powell’s, downhill simplex method, steepest descent, quasiNewton, LevenbergMarquardt, and conjugategradient method [17]. These approaches usually apply the voxel value of original intensity images and use an optimizer. Gradient NMIs based registration has been proposed for the registration of 3D CT data to 2D endoscopic image [18]. This approach calculates the NMIs in the gradient images as well as in the original intensity images and it adopted two optimizers, Powell’s and downhill simplex method. The voxel similarity method has been known to provide high accuracy for clinical application but due to their high computational cost, their usage in 3D clinical applications has been limited.
In order to diminish the computational complexity of the voxel similarity based methods, multiresolution approach has been developed by using the hierarchical structure of the input images [1921]. The hierarchical structure is constructed by consecutively subsampling the pyramid of two input images. The registration is fulfilled from low to high resolutions in the pyramid. The transformation parameters are searched in the optimization process, starting from the maximum similarity score or minimum difference between the two volumes at the lowest resolution. Then, the parameters obtained from the first (or lowest) resolution level are passed to the next resolution level. The registration is performed until the highest resolution level of the pyramid is reached, where the optimal parameter is achieved. As the level increases, the subsampling interval decreases, and the accurate registration result is obtained with a higher computational complexity. Many toolkit and toolboxes have been progressively developed for the voxel similarity method and for the multiresolution methods [22, 23]. Hence, the computational complexity of the multiresolution approach is reduced compared to the original voxel similarity method by using full image content. However, the accuracy of the registration depends on the definition of the level of the pyramid structure. The computational cost for a 3D volume is still an issue for realworld medical image applications.
Equivalent meridian plane (EMP) based MI registration method [24, 25] integrates the featurebased and voxel based methods to register the multimodal brain volumes, where the volumes are roughly aligned on the basis of the EMP which is determined by principal component analysis (PCA). Then, the registration is refined by maximizing the MI between the EMP of a float volume and the 2D XYcoordinate plane of the reference volume. Powell’s optimization method is performed for the enhancement of searching during the registration process. Even though this method is fast thanks to the 2Dplane based registration, the registration result depends on the correctness of EMP. This infers that it cannot guarantee an optimal solution for every case.
The purpose of this study is to provide a novel method for a rigid registration of 3D CT images. The registration domains in this study are rigid, threedimensional, and monomodal images of the same patient. The proposed method aims to combine the advantages of the featurebased and the voxel similarity based methods for the enhancement of the overall performance in the aspects of both accuracy and computational cost. Section II describes the methodology of the proposed method in detail. Experimental results and analysis are presented in section III. Finally, conclusions are drawn in section IV.
II. PROPOSED FEATUREVOXEL BASED IMAGE REGISTRATION
The proposed featurevoxel based image registration method consists of two steps: initialization and optimization step as shown in Fig. 1. In the initialization step, six transformation parameters are calculated based on the centroid and the principal axes of two CT volumes. The
estimated parameters are passed to the optimization step in order to obtain the optimal parameters. Differing from the previous EMPMI approach, the proposed method uses the whole 3D volume in order to provide trustworthy registration result. It calculates the NCC score between the reference and the float volume by using the downhill simplex method.
> A. Estimation of Initial Transformation Parameters
Let
V_{r} andV_{f} be thereference volume and the float volume, respectively, for registration. As image registration is the alignment of two images into the same spatial coordinates, the geometric mapping between the two points ∈x _{r} andV _{r} ∈x _{f} can be expressed as,V _{f}Where,
andx _{r} denote the 3D coordinates of voxels fromx _{f}V_{r} andV_{f} , respectively and is the geometric transformation matrix between them. The type of the transformation matrixM depends on the registration domain. In this paper, the relationship between the two input volumes is assumed rigid where the matrix is defined by six parameters which are three translations and three rotations. A rigid transformation matrix,M , can be written as the concatenation of a translation vectorM and rotation matrixT (t) in the homogeneous coordinates. Thus,R has the formM Hence, the vector representation of a rigid body transform of Equation (1) can be rewritten as,
A common way to describe a rotation matrix is the decomposed form of a sequence of the following matrices
Where,
α, β, γ represent the Euler angles about the 3D axes, respectively. The rotation matrix can be decomposed as =R R _{γ}R _{β} . Thus, the related position betweenR _{α}the reference volume
V_{r} and the floating volumeV_{f} is determined by a set of parameters = {P t_{x} ,t_{y} ,t_{z} ,α ,β ,γ } in whicht_{x} ,t_{y} ,t_{z} are the translation quanta andα ,β ,γ are the rotation angles of the floating volume along the 3D axes that is related to the reference volume, respectively. Now, the registration problem for the images that are related rigidly is to find out the optimal parameter set that best aligns the float volume to the reference volume.P The initial parameters of the two volumes are computed based on the transformation of their principal axes. Fig. 2a illustrates a 3D object with its principal axes where eigenvectors,
v_{1}, v_{2}, v_{3}, directly represent the orthogonal coordinate system within the object while the corresponding eigenvalues,λ_{1}, λ_{2}, λ_{3}, give the information about the length of the respective axes [7]. The rotation of the object is shown in Fig. 2b. Consequently, the rotation of that object is equal to the rotation of its principal axis and translation is the displacement of its centroid. The three steps involved for the extraction of the principal axis of 3D object are: feature extraction, centroid calculation, and principal axis calculation.A binary volume is used as the feature to represent the geometric shape of 3D to extract the principal axes. Let
B (x, y, z ) be the binary volume of a 3D volumeV (x, y, z ) as,Where,
x, y, z represent the coordinates of the image’s voxel and τ is a threshold value that defines the binary volume.The bone is the most special feature in CT images. Therefore for knee CT registration, the bone region must be extracted from unwanted tissues, such as muscle or fat. The global threshold value, τ, of the bone mask is chosen as 300 Hounsfield units (HU) for the intensity value range of CT image from 1,024 HU to 2,163 HU. Fig. 3b shows the candidate bone region with some residuals that are to be removed and holes that are to be filled after thresholding. The residuals are removed by 3D labeling and the holes are filled by the morphological operation. Fig. 3c shows the feature of a bone region that will be used for registration.
The centroid of
B (x, y, z ) is defined by the mean value of each coordinate point. It is important for the extraction of the principal axes and for the computation of the translations. The centroidC = (x_{c}, y_{c}, z_{c} ) is calculated as,The principal axes of the binary object are defined by the eigenvectors of its inertia matrix [6]. In fact, inertia matrix is the matrix that can be composed by the variances and covariance between the axes. The inertia matrix,
I of binary volumeB (x, y, z ) is given by,Where,
and
Here,
I_{xy} =I_{yx} ,I_{yz} =I_{zy} andI_{zx} =I_{xz} . Therefore, the inertia matrix,I is real and symmetric. The matrix form of its eigenvectors can be expressed as,After the three eigenvectors are calculated and form the matrix
E , three rotation angles for each principal axis are computed by equating the matrix to the rotation matrixE (i.e.,R =E ). The rotation anglesR α, β, γ can be calculated by using the following equations,To calculate a set of six parameters
_{0} = {P t _{x0},t _{y0},t _{z0},α _{0},β _{0},γ _{0}} that approximately aligns the floating volumeV_{f} to the reference volumeV_{r} , the centroids and the rotation angles of each volume must be computed first. The calculation is performed by using the following steps:Step 1: Define the binary volumes
B_{r} andB_{f} that represents the geometric shapes ofV_{r} andV_{f} , respectively.Step 2: Compute the centroid
C_{r} ofB_{r} andC_{f} ofB_{f} .Step 3: Calculate inertia matrixes
andI _{r} of their own binary volumeI _{f}B_{r} andB_{f} . Then, find the eigenvectors andE _{r} from their respectiveE _{f} andI _{r} to extract the rotation angles.I _{f}Step 4: Extract the rotation angles
θ_{r} = {α_{r}, β_{r}, γ_{r} } ofV_{f} andθ_{f} = {α_{f}, β_{f}, γ_{f} } ofV_{r} by equating andE _{r} toE _{f} .R Step 5: Take the difference of
C_{r} andC_{f} as =T _{0}C_{r} C_{f} , where, _{0} = {P t _{x0},t _{y0},t _{z0}} is the set of initial translation parameters.Step 6: Compute the subtraction between
andθ _{r} asθ _{f} =θ _{0} ？θ _{r} , so that a set of rotation parametersθ _{f} _{0} = {θ α _{0},β _{0},γ _{0}} is achieved.> B. Parameter Optimization
After the computation of the initial transformation parameters, which approximately estimates the coarse result to the ideal value, the task that remains is to find out the optimal transformation parameters for the two volumes. The process of finding out the optimal parameters that involves the computation of the voxel similarity measures between the reference and the float volumes. It is followed by the optimization step. The purpose of the similarity measure is to return a value that indicating how matched the two images are [9]. The cost function to determine the optimal parameter matrix is defined by,
Where,
S is the similarity metric. The selection ofS depends on the number of modalities that are included in the registration or the images’ characteristics. In this paper, the validation of registration is assessed by the correlation coefficient between the two volumes as the images to be registered are monomodality images. Therefore, a NCC is adopted to quantify the degree of the volumes’ similarity in the optimization process. The NCC function is defined as,Where,
n is the total number of voxels,i andj are the voxel indices,V_{r} (x_{i} ) andV_{f} (x_{j} ) are the intensity at pointsx_{i} andx_{j} in whichx representsx ,y ,z  coordinates ofV_{r} andrepresent the mean intensity values and σ_{r} and σ_{f} are the standard deviation of the respective volumes. They can be computed as,
In principle, the final transformation parameters are considered optimal when the NCC score between
V_{r} andV_{f} is maximum. However, the sign of the cost function is inverted as in Equation (16), as the downhill simplex optimizer tends to converge to the minimum score. This paper adopts the downhill simplex method as it is a commonly used and welldefined optimization method. Moreover, it does not require derivative information. The method initializes the optimization process withN + 1 points and it defines an initial simplex in anN dimensional parameter space. This simplex is then deformed iteratively. The simplex is formed by reflection, expansion or contraction steps in the iteration to shift the vertices of the simplex towards the minimum of the cost function,S . The implementation of a downhill simplex method is based on the algorithm given in [26]. In this work, the simplex is initialized with the six transformation parameters that are obtained from the previous step. This simplex will be iteratively optimized until the score of NCC reaches a minimum. Trilinear interpolation was used during registration for noninteger grids due to the translation and rotation.III. EXPERIMENTAL RESULTS
> A. Experimental Setup and Evaluation Method
The efficacy of the proposed method is evaluated on both phantom and synthetic data sets. As the purpose of this study is 3D rigid medical image registration, the phantom was simply created in a rigid structure. Fig. 4 describes the phantom that consisting of three erasers inside a box and mounted on a rectangular acrylic plate. The phantom was scanned in a CT scanner provided by Ajou University Hospital, Korea. The reference volume was first scanned without any transformation of the plate. Five floating volumes were obtained by scanning with a random transform of the phantom in x, y, z axes. It is illustrated in Fig. 5. Each CT volume is Digital Imaging
and Communications in Medicine (DICOM) image type. It had a dimension of 512 × 512 × 279 and slice thickness of 1 mm, with a 16bit depth and 0.6836 mm pixel spacing. One particular medical CT volume of knee acquisitions was selected for the synthetic data study. We created a database of transformed volume by applying a rigid transformation to it. The knee CT volume had a 16bit depth, 512 × 512 × 48 in dimensions, 3 mm in slice thickness and 0.6836 × 0.6836 mm in pixel spacing. The magnitude of the rigid transformations varied between ±10 degree for rotation and ±10 pixel for translation. Within these ranges, 40 volumes were randomly generated to validate the registration.
The proposed method is compared with a voxel similarity measure based approach (voxelbased), and a multiresolution based approach. The voxelbased method establishes a voxeltovoxel mapping between the reference and the float volume. On the other hand, the multiresolution registration reduces the computational complexity of the voxelbased registration method which is accurate but at the same time it is timeconsuming. The 4level volume pyramid for multiresolution registration is obtained by resampling both the reference and the float volume along each
x, y, z direction. The registration is performed from low to high resolution with the initial values set to one for both the voxelbased and the multiresolution based methods. For a comparison purpose, NCC and the downhill simplex method are adopted in both methods and the proposed method. The downhill simplex method was implemented in C function andcalled from the MATLAB (MathWorks, Natick, MA, USA) as mex file. The remaining parts of the programs were implemented in MATLAB 7.11 and they run on a PC with Intel® Core™ i7, 2.67 GHz processor and 8 GB RAM. The registration time, excluding the time of the loading data sets, is in seconds.
NCC scores between the reference and the registered volumes are computed for the evaluation of the accuracy of the proposed method on phantom images. The closer the score is to one, the higher the accuracy of the registration of the two volumes. In the case of synthetic image registration, the accuracy is measured by the mean and standard deviation in the differences between the computed transformation parameter values (
) and the groundtruths. The lower the mean and the standard deviation in the differences, the higher is the accuracy.P > B. Registration on Phantom Data Sets
Registration of the phantom is performed while setting up the first scan as the reference volume and the remaining five scans as float volumes. The reference volume is named as set_1 and the remainders are serially named as set_2 to set_6. In order to accomplish registration, the appropriate volume of interest (VOI) is selected from both the reference and the float volumes. The size of VOI for each volume is 231 × 131 × 111 and the registration process is taken on those VOIs. Fig. 6a and 6b show slices of VOI of the original reference and float volumes. For the phantom study, the binary volume which is needed for the extraction of the principal axes is simply obtained by using Otsu [27] thresholding method that is followed by 3D labeling. Otsu’s method [27] is a nonparametric and unsupervised method that provides an automatic global threshold value τ in Equation (4). Fig. 6c and 6d shows the binary results after the application of the Otsu’s method [27]. Some unwanted residual remains apart from the desired region. Those residuals are removed by labeling the connected component of the 3D binary volume. Fig. 6e and 6f show the binary volumes that are ready to extract the principal axes, after the labeling process. Fig. 6g and 6h show slices of the segmented regions for reference and float volumes, respectively.
The superimposed 3D visualization of the registration results for the reference (set_1 in dark gray) and the float (set_6 in light gray) volumes are presented in Fig. 7. Table 1 compares the NCC score and the processing time of the proposed method against the principal axes based approach, the voxelbased, the multiresolution based method and EMP based MI method.
The average NCC score and registration time are illustrated in Fig. 8. The figure reveals that the voxelbased approach provides high accuracy result with high computational complexity. Multiresolution based approach reduces the computational cost but the average NCC is low. This implies that it may produce unsatisfied results.
Both the principal axes based approach and the EMPMI based approach provide a fast processing time result but a low correlation score is obtained. Although the proposed method is not as fast as the other two methods, it provides
efficient registration result as the average NCC score is as high as the voxelbased method. Moreover, Table 2 describes the before and afterregistration score of different similarity measures for each phantom data set.
As NCC, the higher the score of NMI, the higher is the accuracy of the registration. In contrast to NCC and NMI, the higher accuracy of registration is obtained when the score of SSD is small. To evaluate the proposed method with different similarities, the correlation score of the reference (set_1) and the registered volume which is transformed the float volume (set_2set_6) by the parameters resulted from different similarity measures. The average correlation score of NCC and NMI is 0.98 and the average correlation of SSD is 0.96. The average registration time of NCC is 38.1 seconds, while those of NMI and SSD are 53.2 and 23.7 seconds, respectively. Therefore, NCC provides the similar average score with NMI, higher score of SSD and it acquires a faster average registration time.
> C. Registration on Synthetic Data Set
The registration is also evaluated on the synthetic datasets that are randomly generated by rotating and translating one knee CT volume along the
x, y, z axes within the given ranges. The appropriate 307 × 231 × 84 VOI size of the CT image is selected for registration.A set of new float images were created by the application of the random rigid transformation with a rotation that varies from 10 degree to +10 degree and a translation which varies from 10 pixels to +10 pixels along all the axes. As the registrations are performed with the groundtruths, the accuracy is evaluated by the mean and
standard deviation (SD) of rotation and translation errors. Standard deviation shows the amount of variation from the mean. The smaller the standard deviation, the higher is the accuracy of registration. Table 3 shows the mean (SD) of the transformation errors and the average processing time in the registration of the set of new float images with the reference images by different approach. Although the processing time of the principal axes based approach and the EMPMI based approach is faster than the voxelbased, multiresolution based and the proposed methods, they provide a big mean (SD) value of synthetic data sets. The proposed method provides more accuracy result and relevant processing time of registration. Fig. 9 shows the registration result of one of the synthetic data sets.
IV. CONCLUSIONS AND FUTURE WORK
This paper presents a novel method to automate the process of registering 3D CT images taken at different instances. The proposed twostep method first finds out the initial transformation parameters based on the principal axes of the binary structure of the input images. The parameters acquired in the first step are then optimized to provide the parameters that best align the float volume to the spatial coordinates of the reference volume. The objective of this preliminary study using phantom and synthetic data is to validate the performance of the proposed method in terms of accuracy and processing time. The experiments on phantom and synthetic data show that the algorithm provides robust and faster results compared to the voxel similarity measure based approach and the multiresolution registration method. The experimental results demonstrated that the proposed method can be applied for the rigid registration of real images. In future, real patient data sets are to be examined. Further, a more robust segmentation method is to be employed to obtain better results for the patient images, as the current correlation calculation is limited only to the bone region.

[Fig. 1.] Overall flowchart of the proposed method.

[Fig. 2.] 3D object with its principal axes (a) and rotated object (b); the rotation of an object is equal to the rotation of itsprincipal axes.

[Fig. 3.] (a) Original image (b) candidate bone region after global thresholding (c) the bone region B(x y z) for registration.

[Fig. 4.] A phantom (a) (b).

[Fig. 5.] Slices of phantom computed tomography data sets: set_1 (a) set_3 (b).

[Fig. 6.] Slice 40 of (a) original reference volume (set_1) (b) original float volume (set_6) (c) reference binary volume and (d) float binary volume after the Otsu’s method [27] (e) referencebinary volume and (f) float binary volume after 3D labeling (g) segmented reference volume (h) segmented float volume.

[Fig. 7.] The comparison of a superimposed 3D visualization of (a) reference (set_1) and float volume (set_6), (b) reference and the registered volume of the proposed method.

[Table 1.] Correlation score and registration time for phantom data sets (set_1 as reference and the remaining volumes as float) : time insecond

[Fig. 8.] Comparison of (a) average normalized cross correlation (NCC) score (b) average registration time of five phantom data sets. EMP: equivalent meridian plane MI: mutual information.

[Table 2.] Evaluation of the proposed method with different measures

[Table 3.] Mean (standard deviation) of the transformation error measurement of forty synthetic data sets

[Fig. 9.] The superimposed 3D visualization of (a) reference and float volume (b) reference and registered volume using the proposed method.