A CNN Image Classification Analysis for ‘Clean-Coast Detector’ as Tourism Service Distribution
- Author: CHANG Mona, XING Yuan Yuan, ZHANG Qi Yue, HAN Sang-Jin, KIM Mincheol
- Published: Journal of Distribution Science, Volume 18, Issue 1, pp. 15-26, Dec 2020
Purpose: This study analyzes image classification using a Convolutional Neural Network (CNN) and Transfer Learning for Jeju Island and suggests related implications. As the biggest tourist destination in Korea, Jeju Island frequently encounters environmental issues caused by marine debris along the seaside. The ever-increasing volume of plastic waste requires multidirectional management and protection. Research design, data and methodology: In this study, a deep learning CNN algorithm was trained on a number of images of clean and polluted Jeju beaches. In the process of validating and testing on pre-processed images, we explored the model's applicability to coastal tourism applications through the probabilities with which it classifies images and predicts clean shores. Results: We transformed and augmented a small dataset of 194 images into 3,880 images. The results on the pre-trained test set were 85%, 70% and 86%, and accuracy increased through the process. We finally obtained rapid convergence to 97.73% and 100% (20/20) accuracy on the actual training and validation sets. Conclusions: The tested algorithms are expected to be implemented in applications for tourism service distribution aimed at reducing coastal waste, or in CCTVs as a detector or indicator that helps residents and tourists protect clean beaches on Jeju Island.
Marine Debris, Clean Coast Detector, Convolution Neural Network, Tourism Application, Environmental Management
Marine debris (marine litter) is man-made waste that is intentionally or accidentally released into a lake, sea, or waterway, and can be defined as a persistent manufactured or processed solid that has been abandoned or disposed of in marine and coastal environments (Lee et al., 2016; Terzi & Seyhan, 2017; Research Institute of Ships & Engineering, 2019). According to the United Nations Environment Program (UNEP), about 8 million tons of marine debris are produced annually worldwide, of which 80% is land-produced municipal waste (Baker et al., 2019). Waste that is not completely disposed of is not only visually problematic. Unless it is addressed in a timely manner, and unless human consciousness about the processes and responsibilities of production, consumption and disposal changes, it also poses a threat to the entire environmental ecosystem of humans, animals and plants (Gall & Thompson, 2015).
In particular, the amount of marine debris in Jeju has surpassed 10,000 tons annually for the past four to five years and is emerging as a new local issue (Kim, 2019). According to the results announced by the Korean National Coastal Garbage Monitoring in 2018, plastics account for 59% of the marine debris. Jeju's coastal waste is easily attributed to external sources (Lee et al., 2016), that is, to foreign countries (especially China). However, the share of waste estimated to originate from other countries is actually less than 10% (Kim, 2019). This means that most of the garbage is produced inside Jeju Island and is closely related to the increase in tourists and in the total population of Jeju Island, which has expanded rapidly in recent years (Lee et al., 2016; Chang et al., 2019; Choi et al., 2019).
Against this background, various academic studies have examined the seriousness of the marine debris problem that threatens humans and the ecosystem of the earth. Among them, Alkalay et al. (2007) presented the concept of the Clean Coast Index (CCI), which grades the cleanliness of coastal areas in five levels by measuring the number of plastic parts per square meter of coastal waste. Since then, the CCI has often been cited in marine environmental studies (Laglbauer et al., 2014; Fernandino et al., 2015; Campbell et al., 2016; Terzi & Seyhan, 2017). In addition to attempts to monitor, measure or predict the severity of plastic waste, especially in marine debris, recent research using artificial intelligence, in particular CNN algorithms, has been notable (Abbasi & El Hanandeh, 2016; Munari et al., 2016; Azarmi et al., 2018). The Convolutional Neural Network (CNN) is known as a model that applies filtering techniques to artificial neural networks to process images or speech more effectively (LeCun et al., 1989), and it is evolving into a number of advanced model algorithms used in Deep Learning (Chen et al., 2018; Raghavendra et al., 2018; Xie et al., 2019).
However, most existing research on marine waste measurement or indexing methods has been conducted in the field of environmental engineering, and it is very difficult to find previous studies that approach this problem from the perspective of tourism. Furthermore, the mechanism of Artificial Intelligence, which is layered into the nested stages of Artificial Intelligence, Machine Learning, and Deep Learning, requires background knowledge of computer or data science, and is considered far too convoluted for researchers in the social sciences to access.
Therefore, this study intends to find an experimental approach that can be applied to the tourism industry. To do this, we label an image dataset of Jeju beaches, consisting of clean and beautiful views as well as dirty ones with plastic debris, and train on it using the CNN method. Through this, we explore the possibility of developing applications or sensors that can inform us of the cleanliness of the coast. Once a successful deep learning algorithm is developed, it may be useful for residents and tourists who would like to visit the clean coast of Jeju, and helpful for the department in charge at the provincial government.
Marine debris is recognized as a global environmental problem because it flows into the fluid environment of the sea, not only directly affecting neighboring countries but also spreading throughout streams, rivers and lakes (Paler et al., 2019). Recently, as the seriousness of the effects of microplastics on marine animals has become known, it has become urgent to characterize plastic marine garbage by properties such as size, color and material (Baker et al., 2019; Ebere et al., 2019; Fallati et al., 2019; Fulton et al., 2019; Hartmann et al., 2019). According to local reports in Korea, plastics account for 80% of all marine debris and take about 500 years to disintegrate at sea (Ministry of Oceans and Fisheries, 2019), even while waste reduction policies are being implemented (Jian, 2012). If this debris is not collected and processed, it will adversely affect the environment (Choi et al., 2018; Hartmann et al., 2019). This is why the United Kingdom, the United States, Japan, and many European countries, along with China and Australia, are fighting against plastics and establishing new regulations and codes for plastic waste (Choi et al., 2018; Lam et al., 2018).
In particular, marine scientists continue to raise concerns about the high proportion of plastics in marine debris, which threatens the survival of humans as well as marine life (Schulyer et al., 2014; Gall & Thompson, 2015; Paler et al., 2019). Plastic waste flowing into the sea not only threatens the lives of marine animals, but also damages fishing businesses and causes accidents for those who visit the sea for recreational purposes. Moreover, plastic waste spreads various pathogens to humans (Kim et al., 2010; Campbell et al., 2016; Baker et al., 2019). Indeed, the UNEP stated in a 2019 report that fishery damage from marine debris would cost about $72 million a year in Europe alone, plus $735 million for waste disposal (Baker et al., 2019).
As part of the solution, researchers suggest that the flow of plastics from production to distribution, consumption, and recycling should be monitored by labeling plastics according to their material or size (Choi et al., 2018). Furthermore, they argue that a classification standard covering size, shape and color, together with the physical and chemical properties of plastic, needs to be established, and that such a basic design requires consensus within the community (Hartmann et al., 2019; Ebere et al., 2019; Paler et al., 2019). In this context, Alkalay et al. (2007) proposed the Clean-Coast Index (CCI) for Israel's Clean Coast Campaign Program, which uses a 10-metre stretch of coast as the unit for indicating cleanliness. The stretch is divided horizontally at meter intervals, plastic items the size of a bottle cap or larger (more than 2 cm) are counted as waste, and the index is graded as follows (see Table 1).
The index derivation formula of the Clean-Coast Index (CCI) is as follows (Paler et al., 2019):
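In its commonly cited form, the index divides the number of plastic items counted on a transect by the transect area and multiplies the result by a coefficient K = 20. The sketch below assumes that form; the value of K, the five grade bands and their boundaries, and the function names `clean_coast_index` and `cci_grade` are assumptions for illustration and are not taken from this text.

```python
# Sketch of the Clean-Coast Index. K = 20 and the grade bands follow the
# commonly cited form of the index and are assumptions, not values from this text.
K = 20  # scaling coefficient (assumed)

# (upper bound, grade) pairs; band edges are assumed
GRADES = [
    (2.0, "very clean"),
    (5.0, "clean"),
    (10.0, "moderate"),
    (20.0, "dirty"),
]


def clean_coast_index(plastic_items: int, transect_area_m2: float) -> float:
    """Return CCI = (plastic items / transect area) * K."""
    if transect_area_m2 <= 0:
        raise ValueError("transect area must be positive")
    return plastic_items * K / transect_area_m2


def cci_grade(cci: float) -> str:
    """Map a CCI value to one of five cleanliness grades."""
    for upper, grade in GRADES:
        if cci <= upper:
            return grade
    return "extremely dirty"
```

Under these assumptions, a 100 m² transect with 30 counted items gives an index of 6.0, which falls in the middle grade.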
In this study, we performed a predictive experiment to classify images of “Clean Beach” and “Polluted Beach” on Jeju Island, as a step that precedes measuring the level of pollution on the beach. Designing each layer and building a model that generalizes to a specific dataset requires expertise and skill, and consumes plenty of time and input data (big data) for training, so it can be very difficult when the data is small. By retraining a ready-designed model such as Inception-v3 instead, an effective generalization result can be achieved with a relatively small amount of input data and in a short time. This retraining of an already trained model for a different task is called ‘Transfer Learning’ (Lee et al., 2019). The Inception module lies in the middle of a spectrum between regular convolutions and depth-wise separable convolutions. The Xception network makes the underlying hypothesis stronger (decoupling spatial correlations from cross-channel correlations) by replacing the traditional Inception module with depth-wise separable convolutions, and it performs better than Inception V3 with a similar number of parameters (Chollet, 2017). In this study, a CNN model pre-trained with Xception was used as the body, and a user-defined head was created to decode the final features. The Xception V1 model with weights pre-trained on ImageNet is instantiated as follows (Chollet, 2017).
keras.applications.xception.Xception(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
In this experiment, we built a small dataset consisting of 194 images of “Clean and Polluted Beaches” to utilize the model. All of these images were crawled from Google by searching for clean beaches and polluted beaches separately, with keywords such as 'Jeju Beach', 'Jeju Clean Beach', 'Jeju Polluted Beach', and 'Plastic Waste of Jeju Beach'. A Google Chrome extension called “Fatkun” provides batch image downloading, and Fatkun Batch Download Image 5.3.10 was used in this study to download the desired images all at once. Duplicate images and images not from Jeju Island were filtered out, and the 97 remaining images in each class were labeled from 0 to 96 and stored in the corresponding ‘train’ folder. The validation set (folder) contained a total of 20 images (10 per class) and the test set (folder) contained 5 per class, a total of 10 shuffled images. The image sets for training and validation go through image preprocessing and augmentation, while the images in the test set were used only at the final stage to evaluate the generalization of the model. The pre-trained model takes a 299x299x3 image as input and outputs a 2048x1 feature vector after global average pooling. We then added a fully connected layer of 10 units and an output layer of 1 unit for binary classification. L2 regularization was used at both layers to prevent overfitting. Retraining was done using Keras with TensorFlow as the backend.
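The two-layer head described above can be sketched in plain NumPy to make the shapes concrete. This is a minimal illustration, not the authors' Keras code: the ReLU activation on the hidden layer, the sigmoid on the output, the weight initialization, and the names `init_head`, `head_forward` and `count_parameters` are all assumptions.

```python
import numpy as np

FEATURE_DIM = 2048   # Xception feature vector after global average pooling
HIDDEN_UNITS = 10    # fully connected layer described in the text


def init_head(rng: np.random.Generator) -> dict:
    """Create small random weights for the two-layer head (init scheme assumed)."""
    return {
        "W1": rng.normal(0.0, 0.01, size=(FEATURE_DIM, HIDDEN_UNITS)),
        "b1": np.zeros(HIDDEN_UNITS),
        "W2": rng.normal(0.0, 0.01, size=(HIDDEN_UNITS, 1)),
        "b2": np.zeros(1),
    }


def head_forward(params: dict, features: np.ndarray) -> np.ndarray:
    """Map (n, 2048) feature vectors to (n,) polluted-beach probabilities."""
    # Hidden layer with ReLU (the activation is an assumption)
    hidden = np.maximum(0.0, features @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    # Sigmoid output for binary classification
    return (1.0 / (1.0 + np.exp(-logits))).ravel()


def count_parameters(params: dict) -> int:
    """Total number of trainable values in the head."""
    return sum(p.size for p in params.values())
```

With these sizes the head has 2048 x 10 + 10 + 10 + 1 = 20,501 trainable parameters, which is why it can be fitted on a few thousand feature vectors while the Xception body stays frozen.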
Image preprocessing resized each image to a 299x299 square using OpenCV-Python. The processed images are stored in a numpy array in `polluted_beach_dataset.py` and then standardized to N(0, 1) for each channel. We then augment the set of 194 training images (97 clean-shore and 97 trash-contaminated images) over 20 epochs by randomly rotating, shifting, skewing, zooming or flipping the images to be trained (194x20 = 3,880 trainable images). The pre-trained model is then used to extract feature vectors from the augmented data (aug_and_feature_extract.py). In this way, we did not need to carry out real-time augmentation or repeatedly run the entire model, and only the two-layer head had to be computed later. All dataset tasks in this paper were performed in Python using Jupyter Notebook, an interactive, browser-based computing tool.
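The per-channel standardization and the 20-fold expansion can be sketched as follows. This is a simplified stand-in rather than the contents of `polluted_beach_dataset.py` or `aug_and_feature_extract.py`: only a random horizontal flip and a small wrap-around shift are implemented (rotation, skew and zoom are omitted), and the function names are hypothetical.

```python
import numpy as np


def standardize_per_channel(images: np.ndarray) -> np.ndarray:
    """Scale each colour channel of an (n, h, w, 3) array to zero mean, unit variance."""
    images = images.astype(np.float64)
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return (images - mean) / (std + 1e-8)


def random_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random horizontal flip and a small random shift.

    np.roll wraps pixels around the border, a simplification of a true shift.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    dy, dx = rng.integers(-2, 3, size=2)
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))


def expand_dataset(images, labels, copies, rng):
    """Return `copies` augmented variants of every image, with matching labels."""
    augmented = np.stack(
        [random_augment(img, rng) for img in images for _ in range(copies)]
    )
    return augmented, np.repeat(labels, copies)
```

Calling `expand_dataset` with 194 images and `copies=20` yields 3,880 images, and consecutive blocks of 20 rows share the same source image.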
After the augmentation and feature extraction above (194 x 20 = 3,880), x_train is an array of 3,880x2,048 and y_train is a vector of 3,880x1. In other words, the 194 original images were expanded into 3,880 observations. When fitting the model, the training set is not shuffled, which avoids the possibility of selecting more than one image with the same origin in the same batch, and each augmented image is a random variant that differs from the original 194 images. Adam is used as the optimizer with a base learning rate of 1e-3, and binary cross-entropy is used as the loss function. In addition, 0.01 is used as the weight decay for L2 regularization, and the user-defined head is stored in the model directory. As shown in Figure 4, the results on the pre-trained test set were 85% (0.85), 70% (0.70) and 86% (0.86), and accuracy increased through the process. We then see rapid convergence to 97.73% and 100% (20/20) accuracy on the actual training and validation sets, respectively.
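The training objective described in this paragraph, a cross-entropy loss for two classes plus an L2 penalty with coefficient 0.01, optimized by Adam at a learning rate of 1e-3, can be written out in NumPy as a rough sketch. The Adam moment coefficients (0.9 and 0.999), the epsilon values, the form of the penalty (coefficient times the sum of squared weights) and the function names are assumptions based on common defaults, not details given in the text.

```python
import numpy as np


def bce_with_l2(y_true, y_prob, weights, l2=0.01, eps=1e-7):
    """Binary cross-entropy plus an L2 penalty on the given weight matrices."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    bce = -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    penalty = l2 * sum(np.sum(w ** 2) for w in weights)
    return float(bce + penalty)


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1); returns new param, m and v."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Because of the bias correction, the first Adam step moves each parameter by roughly the learning rate regardless of the gradient's scale, which is one reason a base rate of 1e-3 is a common starting point.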
Subsequently, the confusion matrix (see Figure 5) shows the accuracy of the experiment at a glance. In addition, the classification results of training and prediction (see Figures 6 and 7) show that error and loss are reduced in both the training and validation sets. Finally, the results of classifying 10 randomly given images are shown in Figure 8.
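A confusion matrix of this kind can be derived from predicted probabilities with a few lines of Python. The sketch below treats "polluted" as the positive class (label 1) and uses a decision threshold of 0.5. Both choices, and the function names, are assumptions for illustration.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN with 'polluted' (label 1) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["tn" if label == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of correct predictions, or 0.0 for an empty input."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0
```

With a threshold of 0.5, any polluted image scored below 0.5 is counted as a false negative, so the choice of threshold directly shapes the matrix.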
The 10 images in the test folder, which were never presented during training and validation, were used to assess the model's generalization (accuracy), and the predicted probability of Polluted Beach was correct in almost every case, as shown in Figure 8. In particular, the size of the debris in image #3, with Polluted Prob 0.39 (mistakenly recognized as a clean beach), contrasts with the size of the garbage in image #6 (Polluted Prob 0.74). Based on this experience, we found a new question to be solved in the future: whether marine waste should be sorted by size or by quantity.
Since the WTO began discussing the concept of sustainable tourism in 1993, the desire to find alternatives to mass tourism has grown around the world (Carlo & Ko, 2018). Since then, there has been a growing voice of concern about overtourism in relation to environmental issues such as global warming, as the number of tourists has increased with economic development and the advancement of the aviation industry. The extent of the political, economic, social, cultural and environmental impacts of the tourism business varies between countries, and the interests of each industry are intertwined, making it difficult to state the importance of tourism in each country. However, it is hard to deny that Jeju is the largest tourist destination in Korea and needs broad discussion of environmental concerns (Chang et al., 2019; Choi et al., 2019). Moreover, the damage caused by marine plastic waste has recently been reported frequently in the press and media in Jeju province (Chon & Choi, 2019; Park, 2019). This is the time when awareness is needed among the subjects of tourism (local residents) and its objects (tourists), as well as among tourism companies and organizations, before it gets too late.
The purpose of this study is to seek a solution, from an environmental and tourism perspective, to the marine debris caused by the growth of the resident population, including tourists, on Jeju Island. Deep learning, a kind of Artificial Intelligence-based technology, is a field of Machine Learning within computer science. It is an automated technology that identifies the object of interest in an image by using the principle of probability to distinguish it from its background. As an experiment, we needed only a small number of images in this study, using the Convolutional Neural Network algorithm most commonly used for image classification (LeCun et al., 1990; Srivastava et al., 2014; Lee, 2018; Park & Bae, 2019) and continuously improved activation functions to identify images of the dirty Jeju coast with marine debris. After analyzing the image dataset consisting of 194 training images, 20 validation images and 10 test images, we tested the accuracy performance and obtained accuracies of 80% and 86% on the test and validation data, respectively, by logistic regression.
As our discussion and test suggest, Deep Learning technology for reading and classifying photographs or videos will find a variety of applications in the Jeju tourism industry in the near future. Jeju Island is an environment where all slopes are close to the beautiful seaside. In particular, over 20 beaches along 258 km of coastline are adjacent to the walking trail called 'Olle', making them popular spots for tourists regardless of the season. We expect that recent photos posted on social media by visitors (or nearby merchants) at a particular beach or seaside can be categorized by a Deep Learning algorithm and used in an indicator app that shows the cleanliness of the Jeju coast. Businesses or management offices near the coast will then try to remove waste (especially plastic marine debris) to get rid of the stigma of a less clean coast. In addition, such an app can provide tourists with real-time guidance to the cleanest beaches and invite them to participate in reducing marine debris. If this function is combined with closed-circuit cameras (CCTV) on the coast, real-time image transmission and the classification model can together support the meaningful task of 'maintaining and managing the clean coast on Jeju Island'.
Although the CCI (Clean Coast Index) classifies and scales coastal cleanliness in five grades (Alkalay et al., 2007), measuring plastic waste on a wide range of shores in real time is a time- and labor-intensive task. In contrast, image classification using Deep Learning technology can develop into an app in which anyone can participate by taking a picture or video and uploading it in real time, using devices such as smartphones and CCTVs. Until recently, most of the studies that applied CNN algorithms to marine debris detection were carried out by researchers from earth or environmental science and engineering departments, using UAVs (Unmanned Aerial Vehicles), known as drones (Tran, 2018; Bak et al., 2019; Fallati et al., 2019; Fulton et al., 2019).
This study is worthwhile in that it may be the very first door opener to a new way of studying Jeju tourism: it suggests feasible ideas for reducing marine debris through an experiment with Deep Learning algorithms that can achieve good accuracy even with small image datasets, using Transfer Learning (Lee et al., 2019) and the Xception model (Chollet, 2017). If upgraded functions with higher accuracy are developed in the future, a Deep Learning-based CCI (Clean Coast Index) is expected to become usable in the market. In addition, we would like to reiterate that continued effort is needed to understand and develop Deep Learning algorithms, since there are no existing studies comparable to this one in the practical tourism dimension.
[Table 1:] Clean-Coast Index