Data Interpretation Methods for Petroleomics
 DOI : 10.5478/MSL.2012.3.3.63
 Author: Islam Annana, Cho Yunju, Ahmed Arif, Kim Sunghwan
 Organization: Islam Annana; Cho Yunju; Ahmed Arif; Kim Sunghwan
 Publish: Mass Spectrometry Letters Volume 3, Issue 3, p63~67, 20 Sep 2012

ABSTRACT
The demand for heavy and unconventional crude oil as an energy source is steadily increasing, and so is the importance of petroleomics: the pursuit of detailed knowledge of heavy crude oil. Resolving the complex composition of crude oil requires techniques with ultrahigh resolving power. Therefore, ultrahigh-resolution mass spectrometry, represented by Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), has been successfully applied to the study of heavy and unconventional crude oils. The analysis of crude oil by FT-ICR MS has pushed the limits of instrumental and methodological capabilities. A single high-resolution mass spectrum of crude oil may routinely contain over 50,000 peaks, and visualizing and effectively studying such large data sets is not trivial. Data processing and visualization methods such as Kendrick mass defect analysis, van Krevelen analysis, and statistical analyses have therefore played an important role. In this regard, it is not an overstatement to say that the success of FT-ICR MS in the study of crude oil has depended critically on data processing methods. This review offers an introduction to petroleomic data interpretation methods.

KEYWORD
Mass spectrometry, Laser desorption, Petroleum, Data interpretation

Introduction
Energy has played a key role in the development of modern human society for many years. A person in a modern industrialized society consumes approximately ten times more energy than one in an agricultural society because modern humans rely on energy for food production, heating, construction, and transportation.1-3 Fossil fuels, especially crude oil, have been among the most heavily used energy resources. It is well known that oil is a limited resource, and hence people are continuously looking for alternative energy sources.4,5 However, the transition of our society to new energy sources will take decades,6 so it is logical to identify and use energy resources that are more immediately available. In fact, as the world's crude oil deposits become heavier, the crude oils are utilized less efficiently,7 because heavy crude oils yield a smaller amount of economically viable components than lighter ones. Therefore, it is very important to devise methods to increase the efficiency of utilizing heavy crude oils.8
Understanding the heavy components of crude oil at the molecular level is very important for improving petroleum processing.9,10 Petroleomics refers to the research effort in which detailed knowledge of heavy crude oil is pursued.11,12 High-resolution mass spectrometry, especially Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), has been one of the key components of petroleomics.13 It has not been easy to study heavy (typically molecular weight over 400 Da) and/or polar compounds by traditional analytical methods such as gas chromatography-mass spectrometry (GC-MS).14 However, FT-ICR MS coupled to various ionization sources has made it possible to observe and study heavy compounds up to about 800 Da, with and without polar functional groups.15-17 For example, atmospheric pressure photoionization (APPI) coupled to FT-ICR MS is a powerful tool for studying aromatic and/or sulfur-containing compounds.10,18 Thousands of compounds are routinely observed with this technique.19
The successful application of FT-ICR MS to the study of crude oil at the molecular level has been possible partly because of key advances in data processing methods.20-22 If these methods had not been developed, it would have been very laborious to process and study the thousands of peaks contained in each crude oil spectrum. The complexity of the high-resolution mass spectra reflects the complex nature of petroleum: a single crude oil sample may produce a spectrum with over 100,000 peaks. Therefore, the development of data interpretation methods has played a crucial role in the study of crude oils by high-resolution MS.23 For example, Kendrick mass defect (KMD) and van Krevelen analyses have been important data processing methods for simplifying and visualizing crude oil spectra.24,25 Other important methods have also been developed to visualize the complex spectra.26,27 The objective of this paper is to review developments in the interpretation of the very complex crude oil spectra provided by FT-ICR MS.
Kendrick mass defect plot
The Kendrick mass is calculated by multiplying the observed m/z values by the ratio of the nominal mass to the exact mass of a given repeating unit.28 Typically, CH_{2} is used as the unit for the KMD calculation; for the CH_{2}-based KMD, the observed m/z values are multiplied by 14.00000/14.01565. In the convention used in this review, the nominal Kendrick mass is the Kendrick mass rounded down to the nearest integer, and the Kendrick mass defect (KMD) is the Kendrick mass minus this nominal Kendrick mass. Therefore, the digits after the decimal point of the Kendrick mass define the KMD value. (Another common convention rounds the Kendrick mass to the nearest integer and subtracts the Kendrick mass from that nominal value; the resulting KMD values differ from those used here in sign and offset, but peaks are grouped in the same way.) For the CH_{2}-based KMD, adding (CH_{2})_{n} to or removing it from a given molecular formula does not change the KMD value. In other words, the Kendrick masses of elemental formulae differing only by (CH_{2})_{n} differ from each other only by whole numbers, so each series of peaks differing only by (CH_{2})_{n} shares a single KMD value characteristic of that series. For example, the KMD values of benzene (C_{6}H_{6}), toluene (C_{7}H_{8}), and phenol (C_{6}H_{6}O) can be calculated as follows. Benzene and toluene have the same KMD value because their elemental compositions differ by CH_{2}, whereas phenol has a different KMD value.
KMD (C_{6}H_{6}) = 78.04695 × (14.00000/14.01565) − 77 = 0.9598 (1)
KMD (C_{7}H_{8}) = 92.06260 × (14.00000/14.01565) − 91 = 0.9598 (2)
KMD (C_{6}H_{6}O) = 94.0418 × (14.00000/14.01565) − 93 = 0.9368 (3)
A KMD plot is typically generated by plotting KMD values against nominal Kendrick mass. An example of a KMD plot is presented in Figure 1. The dots aligned along each line parallel to the x-axis have elemental compositions differing from each other by (CH_{2})_{n}. The KMD values are particularly useful because they can be calculated directly from the measured masses, even before elemental compositions are calculated and assigned.25 This feature allows one to use KMD values to sort the masses of a calibrated mass spectrum into series that differ by a given neutral unit. The relative abundance of each peak can be represented by the color or size of its dot.
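As an illustration of the calculation described above, the following Python sketch computes CH_{2}-based Kendrick masses and KMD values directly from m/z values and bins the peaks into homologous series. It assumes the convention of Eqs. (1)-(3), in which the KMD is the fractional part of the Kendrick mass. The function names, the rounding-based binning, and the short peak list are illustrative assumptions and are not taken from any published software.

```python
import math
from collections import defaultdict

CH2_NOMINAL = 14.00000
CH2_EXACT = 14.01565  # 12.00000 + 2 * 1.007825


def kendrick_mass(mz):
    """Rescale an observed m/z so that CH2 weighs exactly 14."""
    return mz * CH2_NOMINAL / CH2_EXACT


def kmd(mz):
    """KMD as the fractional part of the Kendrick mass (assumed convention)."""
    km = kendrick_mass(mz)
    return km - math.floor(km)


def group_by_kmd(mz_values, decimals=3):
    """Bin peaks whose KMD agrees to `decimals` places (a simple tolerance)."""
    series = defaultdict(list)
    for mz in mz_values:
        series[round(kmd(mz), decimals)].append(mz)
    return dict(series)


# Hypothetical peak list: benzene, toluene, xylene, phenol, cresol
peaks = [78.04695, 92.06260, 106.07825, 94.04186, 108.05751]
for key, members in sorted(group_by_kmd(peaks).items()):
    print(key, members)
```

With this peak list, the three hydrocarbons fall into one bin and the two oxygen-containing compounds into another, without any elemental formula being assigned. Binning by rounding is only a simple stand-in for a proper mass tolerance; values close to a bin edge would need more careful handling.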
A method for higher-order KMD analysis has also been reported.29 In this analysis, the masses from a spectrum are first grouped by the KMD values of one base unit (e.g., the CH_{2} series), and the resulting groups are then sorted by a second KMD series (e.g., the H_{2} series). Peaks belonging to different classes are plotted with the CH_{2}-based KMD values as the abscissa and the ratio of the CH_{2}-based KMD to the H_{2}-based KMD of the CH_{2}-based KMD as the ordinate.
Van Krevelen diagram
The van Krevelen diagram was originally used to study bulk elemental analysis data of coals.30 In the original van Krevelen plot, the bulk hydrogen-to-carbon ratio (H/C ratio) was plotted as the ordinate and the bulk oxygen-to-carbon ratio (O/C ratio) as the abscissa, so that each sample appeared as a single dot in the diagram.30 Later, Kim et al. applied the van Krevelen diagram to plot the elemental compositions obtained by FT-ICR MS.24 To construct a van Krevelen diagram from high-resolution mass spectrometry data, the accurate masses obtained from the spectrum are first converted into elemental formulae. Then the molar hydrogen-to-carbon ratio (H/C ratio) and the molar oxygen-to-carbon ratio (O/C ratio) of each formula are plotted as the ordinate and the abscissa, respectively. In this way, each peak (or formula) observed in a crude oil spectrum is plotted as a dot in the diagram. Each heteroatom class can be plotted in its own van Krevelen diagram, and the diagram can be used to estimate the major components observed in complex mass spectra.24
The relative abundances of the peaks observed in a given spectrum can be color-coded and presented as a contour plot.31 An example of a van Krevelen diagram constructed from a crude oil spectrum is presented in Figure 2. In this diagram, the molar hydrogen-to-carbon ratios (H/C ratio) are plotted as the ordinate and the molar nitrogen-to-carbon ratios (N/C ratio) as the abscissa. The van Krevelen diagram is very effective in displaying classes that contain the same heteroatoms but in different numbers. The N_{1} and N_{2} classes of compounds are displayed in Figure 2, and they are clearly separated in the diagram by their N/C ratios.
To be plotted together in a van Krevelen plot, the elemental formulae must contain the same types of heteroatoms. Therefore, the van Krevelen plot cannot be used to compare different heteroatom classes. For example, the O_{2} and N_{1} classes cannot be compared with this technique.
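The construction described in this section can be sketched in Python as follows. The sketch assumes that elemental formulae have already been assigned and are written as simple strings such as C20H27N; the parser, the function names, and the example formulae are hypothetical and serve only to show how the abscissa and ordinate are obtained.

```python
import re


def parse_formula(formula):
    """Return element counts for a simple formula string such as 'C20H27N'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen_point(formula, hetero="O"):
    """Return (hetero/C, H/C), i.e. (abscissa, ordinate), for one formula."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("formula contains no carbon: " + formula)
    return counts.get(hetero, 0) / carbon, counts.get("H", 0) / carbon


# Hypothetical N1 and N2 class formulae, plotted as N/C versus H/C
formulae = ["C20H27N", "C25H33N", "C20H24N2", "C25H30N2"]
for f in formulae:
    n_to_c, h_to_c = van_krevelen_point(f, hetero="N")
    print(f, round(n_to_c, 3), round(h_to_c, 3))
```

For formulae with the same carbon number, the N_{2} formula has twice the N/C ratio of the N_{1} formula, which is why classes with different numbers of the same heteroatom spread apart along the abscissa.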
Double-bond equivalence vs. carbon number plot
Double-bond equivalence (DBE) represents the number of double bonds plus rings in a given molecular formula and can be calculated by the following equation for an elemental formula C_{c}H_{h}N_{n}O_{o}S_{s}:
DBE = c − h/2 + n/2 + 1 (4)
DBE is very important because it enables us to predict chemical structures from elemental formulae. For example, a compound with a benzene ring has a DBE value of 4, and compounds with naphthalene and anthracene core structures have DBE values of 7 and 10, respectively. This means that each additional fused aromatic ring increases the DBE value by 3. Therefore, if a series of elemental formulae differ from each other by a DBE value of 3, one can infer that the compounds in the series are likely to have aromatic structures.
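The DBE values quoted above can be checked with a short Python sketch of Eq. (4). The function name and the choice of example compounds are illustrative; oxygen and sulfur counts are not needed because they do not appear in Eq. (4).

```python
def dbe(c, h, n=0):
    """Double-bond equivalence from Eq. (4): DBE = c - h/2 + n/2 + 1."""
    return c - h / 2 + n / 2 + 1


# Aromatic cores mentioned in the text, given as (carbon, hydrogen) counts
cores = {
    "benzene C6H6": (6, 6),
    "naphthalene C10H8": (10, 8),
    "anthracene C14H10": (14, 10),
}
for name, (c, h) in cores.items():
    print(name, dbe(c, h))

# A nitrogen-containing example: pyridine, C5H5N
print("pyridine C5H5N", dbe(5, 5, n=1))
```

The three hydrocarbon cores differ by C_{4}H_{2} from one to the next, and each step raises the DBE by 3, as stated in the text.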
The DBE values calculated from elemental formulae can be plotted against carbon number; the result is often called a "DBE vs carbon number plot". In the plot, the relative abundance of each peak can be color-coded. An example of a DBE vs carbon number plot generated from the high-resolution mass spectrum of a crude oil is shown in Figure 3. The DBE vs carbon number plot can be a useful tool for deducing the structures of compounds present in a crude oil sample.32 In particular, the concept of planar limits can be used for structural interpretation. In a given DBE vs carbon number plot, the planar limit can be defined as the line connecting the maximum observed DBE values at the respective carbon numbers.33,34 The planar limit is marked in Figure 3. The slopes and intercepts of the planar limits vary with the structural features of the observed compounds. Structures of the molecules in the saturates, aromatics, resins, and asphaltenes (SARA) fractions have been proposed on the basis of the slopes and intercepts of the planar limits and the observed elemental formulae.32
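One possible way to extract a planar limit numerically is sketched below in Python. The sketch takes the maximum DBE observed at each carbon number and summarizes those points by a least-squares line; the use of a least-squares fit, the function name, and the synthetic data points are assumptions made for illustration and may differ from the procedure used in the cited studies.

```python
def planar_limit(points):
    """Fit a line through the maximum DBE found at each carbon number.

    points: iterable of (carbon_number, dbe) pairs.
    Returns (slope, intercept) of the least-squares line.
    """
    max_dbe = {}
    for carbon, value in points:
        if value > max_dbe.get(carbon, float("-inf")):
            max_dbe[carbon] = value
    xs = sorted(max_dbe)
    ys = [max_dbe[x] for x in xs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct carbon numbers")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (carbon number, DBE) pairs; lower-DBE points do not affect the limit
observed = [
    (20, 10), (20, 7), (20, 4),
    (24, 13), (24, 9),
    (28, 16), (28, 12), (28, 6),
    (32, 19), (32, 15),
]
slope, intercept = planar_limit(observed)
print(slope, intercept)
```

The slope and intercept obtained in this way are the quantities that the text relates to the structural features of the observed compounds.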
Another interesting concept that originates from the DBE vs carbon number plot is the compositional boundary.35,36 The compositional boundary indicates the maximum DBE value that any synthetic or natural chemical compound can have. The line defining the compositional boundary can be calculated by the following equation,35,36 which corresponds to Eq. (4) with h = 0 and n = 0.
Compositional boundary (maximum DBE) = carbon number + 1 (5)
The compositional boundary can be used when elemental formulae are assigned. It was reported that 10% of the possible elemental formulae for masses below 1000 Da could be excluded after the concept of compositional boundary was applied.35,36
Application of statistical analysis
In petroleomic studies, it is very important to analyze multiple samples and compare the resulting high-resolution mass spectra, because the relationship between the chemical and/or physical properties of crude oils and the mass spectral information cannot be fully understood from only a few samples. Instead, many crude oil spectra must be analyzed and interpreted, which means that a very large number of peaks, easily exceeding 1,000,000, must be processed at a time. Extracting relational information between the observed peaks and the properties of crude oils is therefore a great analytical challenge. Statistical analyses have been used successfully to better understand large data sets, and hence it is reasonable to expect that they can also be applied to large amounts of petroleomic data.
A statistical analysis program was developed and applied to the study of 20 different samples.37 In that study, principal component analysis (PCA) was successfully applied to group the samples on the basis of their chemical compositions and enabled the identification of compositional differences. Additionally, hierarchical clustering analysis (HCA) was successfully used to compare the samples. Figure 4 shows the results obtained from HCA of petroleum samples, presented as a heat map with clustering.
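To make the idea of hierarchical clustering concrete, a minimal Python sketch is given below. It is not the program described in the cited study; it assumes Euclidean distances between sample vectors and average linkage between clusters, and the sample names and relative abundances are invented for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two abundance vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clustering(data):
    """Agglomerative clustering; returns the merges as (a, b, distance)."""
    clusters = [(name,) for name in data]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        merges.append((a, b, average_linkage(a, b, data)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical relative abundances of three compound classes in four oils
samples = {
    "oil_A": [0.60, 0.25, 0.15],
    "oil_B": [0.58, 0.27, 0.15],
    "oil_C": [0.20, 0.30, 0.50],
    "oil_D": [0.25, 0.27, 0.48],
}
for a, b, distance in hierarchical_clustering(samples):
    print(a, b, round(distance, 4))
```

The order of the merges and their distances are the information that a dendrogram displays next to the heat map. For real data sets with thousands of peaks per sample, an optimized library implementation would normally be preferred over this quadratic-search sketch.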
For the statistical analyses to be more effective, the data obtained by petroleomic techniques should be more quantitative. At this moment, however, the data provided by FT-ICR MS are semiquantitative at best. Therefore, more effort should be devoted to improving the quantitative nature of petroleomic data.
Correlation analysis
Correlation analysis is an important statistical method for identifying relationships between two variables. It has been applied to test the key assumption of petroleomics, namely that the spectral findings from high-resolution mass spectrometry are related to crude oil properties.38 If this assumption were not valid, the practical usefulness of petroleomics would be greatly limited. The assumption was validated by seeking correlations between the peaks identified by high-resolution MS and the chemical and physical properties of crude oils.38
The results of the correlation analysis were presented using a Circos diagram (Figure 5). The Circos diagram was originally developed for genomic research and is very effective in visualizing complex data. In the diagram shown in Figure 5, the outermost shell designates the heteroatom classes. Dots located in the second shell, inside the class shell, represent the peaks in each heteroatom class that correlate significantly with the property. Peaks located near the line just inside the second shell have a correlation value (P value) of +1, and those located near the third shell have a correlation value of −1. A value of +1 denotes a positive correlation with the property, and a value of −1 denotes a negative correlation. The circle at the center shows the distribution of peaks in the studied spectra.
In the previous study,38 the high-resolution mass spectra of crude oils were shown to correlate with important chemical and physical properties of the oils, such as sulfur content, nitrogen content, and total acid number. This opens the door to the prediction of crude oil properties on the basis of chemical composition.
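The kind of correlation described in this section can be illustrated with a short Python sketch that computes a Pearson correlation coefficient between the abundance of one peak across several samples and a bulk property of those samples. The choice of the Pearson coefficient, the function name, and all numerical values are assumptions made for illustration; they are not data or code from the cited study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sulfur contents (wt%) of four crude oils
sulfur = [0.5, 1.0, 2.0, 3.0]
# Hypothetical relative abundances of two peaks in the same four oils
peak_abundance = {
    "S1-class peak": [1.0, 2.1, 3.9, 6.2],
    "HC-class peak": [9.0, 7.9, 6.1, 4.0],
}
for name, abundances in peak_abundance.items():
    print(name, round(pearson(abundances, sulfur), 3))
```

In this invented example the first peak would be placed near the +1 line of a diagram such as Figure 5 and the second near the −1 line. With only a few samples such coefficients are not statistically robust, which is one reason why many spectra must be analyzed.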
Conclusions and future studies
The application of FT-ICR MS to the analysis of crude oils has opened a new era in the molecular-level understanding of these materials. However, many aspects of these complex mixtures remain unresolved. For example, a quantitative understanding of the numerous compounds observed by FT-ICR MS is still very difficult to achieve. Future research will need to focus on (i) quantitative interpretation, (ii) improved separation, and (iii) combining the data with those obtained by other techniques, such as ion mobility mass spectrometry, for structural interpretation.39,40

1. Schaefer C., Weber C., Voss A. 2003 [Energy] Vol.28 P.411

2. Klass D. L. 2003 [Energy Policy] Vol.31 P.353

3. Thoma M. 2004 [Energy Econ.] Vol.26 P.463

4. Dahlquist E., Thorin E., Yan J. 2007 [Int. J. of Energy Re.] Vol.31 P.1226

5. Szklo A., Schaeffer R. 2006 [Energy] Vol.31 P.2513

6. Barrow M. P. 2010 [Biofuels] Vol.1 P.651

7. Headley J. V., Peru K. M., Barrow M. P. 2009 [Mass Spectrom. Rev.] Vol.28 P.121

8. Ortiz-Cruz A., Rodriguez E., Ibarra-Valdez C., Alvarez-Ramirez J. 2012 [Energy Policy] Vol.41 P.365

9. Hsieh M., Philp R. P., del Rio J. C. 2000 [Org. Geochem.] Vol.31 P.1581

10. Chiaberge S., Fiorani T., Savoini A., Bionda A., Ramello S., Pastori M., Cesti P. [Fuel Proc. Tech.]

11. Hsu C. S., Hendrickson C. L., Rodgers R. P., McKenna A. M., Marshall A. G. 2011 [J. Mass Spec.] Vol.46 P.337

12. Rodgers R. P., McKenna A. M. 2011 [Anal. Chem.] Vol.83 P.4665

13. Marshall A. G., Rodgers R. P. 2008 [Proc. Natl. Acad. Sci.] Vol.105 P.18090

14. Yassaa N., Meklati B. Y., Brancaleoni E., Frattoni M., Ciccioli P. 2001 [Atm. Environ.] Vol.35 P.787

15. Kujawinski E. B. 2002 [Environ. Foren.] Vol.3 P.207

16. Miyabayashi K., Naito Y., Tsujimoto K., Miyake M. 2004 [Int. J. Mass Spec.] Vol.235 P.49

17. Marshall A. G., Hendrickson C. L., Jackson G. S. 1998 [Mass Spec. Rev.] Vol.17 P.1

18. Niedner-Schatteburg G., Šilha J., Schindler T., Bondybey V. E. 1991 [Chem. Phys. Lett.] Vol.187 P.60

19. Marshall A. G., Rodgers R. P. 2004 [Acc. Chem. Res.] Vol.37 P.53

20. Xian F., Corilo Y. E., Hendrickson C. L., Marshall A. G. 2012 [Int. J. Mass Spec.] Vol.325-327 P.67

21. Carlsohn E., Angstrom J., Emmett M. R., Marshall A. G., Nilsson C. L. 2004 [Int. J. Mass Spec.] Vol.234 P.137

22. Rodgers R. P., Hendrickson C. L., Emmett M. R., Marshall A. G., Greaney M., Qian K. 2001 [Canadian J. of Chem.] Vol.79 P.546

23. Blakney G. T., Hendrickson C. L., Marshall A. G. 2011 [Int. J. Mass Spec.] Vol.306 P.246

24. Kim S., Kramer R. W., Hatcher P. G. 2003 [Anal. Chem.] Vol.75 P.5336

25. Hughey C. A., Hendrickson C. L., Rodgers R. P., Marshall A. G. 2001 [Anal. Chem.] Vol.73 P.4676

26. Gorshkov M. V., Nikolaev E. N. 1993 [Int. J. Mass Spec. Ion Proc.] Vol.125 P.1

27. Kazazic S., Zhang H.-M., Schaub T. M., Emmett M. R., Hendrickson C. L., Blakney G. T., Marshall A. G. 2010 [J. Am. Soc. Mass Spec.] Vol.21 P.550

28. Kendrick E. 1963 [Anal. Chem.] Vol.35 P.2146

29. Roach P. J., Laskin J., Laskin A. 2011 [Anal. Chem.] Vol.83 P.4924

30. van Krevelen D. 1950 [Fuel] Vol.29 P.269

31. Bae E., Na J. G., Chung S. H., Kim H. S., Kim S. 2010 [Energy Fuels] Vol.24 P.2563

32. Kim Y. H., Kim S. 2010 [J. Am. Soc. Mass Spec.] Vol.21 P.386

33. Cho Y., Kim Y. H., Kim S. 2011 [Anal. Chem.] Vol.83 P.6068

34. Purcell J. M., Merdrignac I., Rodgers R. P., Marshall A. G., Gauthier T., Guibard I. 2010 [Energy Fuels] Vol.24 P.2257

35. Hsu C. S., Lobodin V. V., Rodgers R. P., McKenna A. M., Marshall A. G. 2011 [Energy Fuels] Vol.25 P.2174

36. Lobodin V. V., Marshall A. G., Hsu C. S. 2012 [Anal. Chem.] Vol.84 P.3410

37. Hur M., Yeo I., Park E., Kim Y. H., Yoo J., Kim E., No M. H., Kim J., Kim S. 2010 [Anal. Chem.] Vol.82 P.211

38. Hur M., Yeo I., Kim E., No M. H., Koh J., Cho Y. J., Lee J. W., Kim S. 2010 [Energy Fuels] Vol.24 P.5524

39. Fernandez-Lima F. A., Becker C., McKenna A. M., Rodgers R. P., Marshall A. G., Russell D. H. 2009 [Anal. Chem.] Vol.81 P.9941

40. Ahmed A., Cho Y. J., No M. H., Koh J., Tomczyk N., Giles K., Yoo J. S., Kim S. 2010 [Anal. Chem.] Vol.83 P.77

[Figure 1.] Kendrick mass defect diagram of a crude oil.

[Figure 2.] Van Krevelen diagram of nitrogen containing (N1 and N2) classes.

[Figure 3.] DBE vs carbon number plot and the planar limit observed in the plot.

[Figure 4.] Diagram showing the heat map and clustering resulting from hierarchical clustering analysis (HCA) of crude oil spectra.

[Figure 5.] Circos diagram showing correlational relationship between high resolution mass spectral peak information and physical property of crude oil.