1.1 Associative Concept Dictionary
Large-scale concept dictionaries such as
Our proposed Japanese concept dictionary, which is based on the results of large-scale human association experiments, is a model of the hierarchical structure of human concepts and the semantic relations among concepts. In comparison with
Text summarization methods generally require deep semantic processing and background knowledge in order to approach the level of human results. Much of the previous work has used superficial clues (Watanabe, 1996) and ad hoc heuristics. In summarizing texts, the word frequency approach or connectionist approach (Hashida
The Contextual Semantic Network is used in our research to calculate the importance scores for sentences given in the input document. The results are compared with those from human summarization experiments and those from conventional methods using word frequencies. The comparison shows that the system does well in summarization tasks.
Word sense disambiguation is one of the most difficult problems in NLP because it requires contextual meaning. Much of the previous work on such disambiguation has used the co-occurrence of words in context. Several machine learning algorithms, such as the Naive Bayes method or the Support Vector Machine (Murata
Human beings can assign an appropriate word sense to an ambiguous word in a sentence based on the words that follow it. We propose a Dynamic Contextual Network Model for word sense disambiguation, in which the Contextual Semantic Network has a structure that changes dynamically as each word of the input sentence is processed sequentially. In this model, the network architecture is based on the proposed concept dictionary, which includes the semantic relations among concepts/words, and these relations can be represented using quantitative distances between them. In the word sense disambiguation process, an interactive activation model is used to identify a word’s meaning on the Contextual Semantic Network.
2. Associative Concept Dictionary
Background knowledge is crucial for computers to better understand contextual information as well as the syntactic or shallow semantic information from texts. The Associative Concept Dictionary (hereafter ACD) was created based on the results of large-scale online association experiments in which many participants simultaneously used the campus network at Shonan Fujisawa Campus of Keio University. The details of the ACD were described in the Journal of NLP (Okamoto & Ishizaki, 2001), which was written in Japanese, and the essential parts of the paper are described below.
In the experiments, the stimulus words were fundamental nouns taken from Japanese elementary school textbooks, with homonyms excluded. Each stimulus word was presented to fifty participants. In the experiment, a participant was asked to produce associations for ten stimulus words under a given set of semantic relations, such as hypernym, hyponym, part/material, attribute, synonym, action, and situation, and to enter the associated words using a Japanese input system.
All of the associated words in the ACD have distances to the stimulus words calculated by using the following linear programming method. The distance D(x,y) between concepts, x and y, is shown by means of the following equation (Okamoto & Ishizaki, 2001):
where
The ACD is built using the quantified distances and is organized in a hierarchical structure in terms of hypernyms and hyponyms. The attribute information is used to explain the features of a given word. The ACD also includes the action and situation concepts related to the stimulus words. It contains 1,100 stimulus words, and the total number of associated words is about 280,000; about 64,000 of these are distinct when overlapping words are counted only once. The experiments have been going on for more than ten years, and the total number of participants has been increasing every year, currently exceeding 5,000 people. Figure 1 shows the entry for the stimulus word “魚: fish” in the ACD. The second column lists the semantic relations given to the participants, and the third column shows the words associated with “fish”. The numerals in <1> represent the number of participants who gave the same associated word. The numerals in <2> are the averages of the participants’ association order, where 1.0 is first place. The numerals in <3> are the conceptual distances calculated using equation (1). We have distributed, at no cost, a CD-ROM containing the ACD to more than fifty research organizations that sent email requests to Prof. Ishizaki, one of the authors, and agreed to a simple contract.
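The column layout of Figure 1 maps naturally onto a small data structure. The sketch below is purely illustrative: the relation labels follow the paper, but the entry contents and all numeric values are hypothetical stand-ins, not data from the actual ACD.

```python
from dataclasses import dataclass

@dataclass
class Association:
    word: str          # associated word (third column)
    frequency: int     # <1> number of participants who gave this word
    mean_order: float  # <2> average association order (1.0 = first place)
    distance: float    # <3> conceptual distance from equation (1)

# Hypothetical ACD entry for the stimulus word "fish"; values are invented.
acd = {
    "fish": {
        "hypernym": [Association("living-thing", 23, 1.4, 0.8)],
        "hyponym":  [Association("salmon", 17, 2.1, 1.2)],
        "action":   [Association("swim", 30, 1.2, 0.6),
                     Association("eat", 12, 2.5, 1.5)],
    }
}

def nearest(entry, relation):
    """Return the associated word with the smallest conceptual distance."""
    return min(entry[relation], key=lambda a: a.distance)

print(nearest(acd["fish"], "action").word)  # swim
```

A lookup like `nearest` is the basic operation the CSN construction relies on: for each relation, the closest associated concepts are the ones added to the network first.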
We compared the associative concept dictionary with
3. Summarization method using the Associative Concept Dictionary
Text summarization has conventionally been accomplished by extracting important sentences from a document based on various superficial cues. For example, in such conventional methods, the frequency of the occurrence of a given word in a document has often been used in calculating the importance scores of sentences. In this research, the Contextual Semantic Network (hereafter CSN) is developed using the ACD. A spreading activation model is used to calculate a word’s score on the CSN, where the activation values on the network are calculated using quantitative distances among the concepts.
3.1 Extraction of important sentences based on word scores
3.1.1 Important sentence extraction using CSN
In the proposed summarization method, the CSN is used to calculate the importance scores of the sentences in the input document. One cannot rely only on information obtained from word co-occurrence in its context; a comparatively rich network with quantitative distances and contextual information must also be used to extract important sentences.
The following steps are the procedural details of the CSN construction for an input document in summarization tasks (Figure 2). These steps are used again for word sense disambiguation in Section 4.1.1.
For example, let the input sentences be “ガラパゴスにはゾウガメがいる.その亀は島の中を歩いている: There is a giant tortoise in Galapagos. This turtle is walking around the island.” In this text, “ゾウガメ: giant tortoise” is a hyponym of “亀: turtle”. The importance score of “turtle” is calculated using the distance between “turtle” and “giant tortoise” in the CSN. “ガラパゴス: Galapagos” is an island and has a situation relation with “ゾウガメ: giant tortoise”. “歩く: walk” is a verb concept of both “giant tortoise” and “turtle”. We can construct an intra-document network since all of these words are included in the ACD. In addition, some hypernym words are added to the CSN, such as “生物: living-thing”, which is a hypernym of “turtle”.
The activation value of each node is calculated using a spreading activation method on the CSN. The initial values (
where the calculation is repeated until Σ
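The spreading-activation computation can be sketched as follows for the tortoise example above. This is a generic formulation under assumptions of ours, not the paper's exact update rule: edge weights are taken as inverse conceptual distances (all distance values here are invented), document words receive a constant source input, and a 0.5 retention factor is used; iteration stops when the total change falls below a small threshold.

```python
import numpy as np

# Toy CSN: nodes and hypothetical conceptual distances from the ACD.
nodes = ["giant tortoise", "turtle", "Galapagos", "walk", "living-thing"]
edges = {
    ("giant tortoise", "turtle"): 0.9,     # hyponym/hypernym
    ("giant tortoise", "Galapagos"): 1.3,  # situation
    ("turtle", "walk"): 1.1,               # action
    ("turtle", "living-thing"): 1.0,       # hypernym added from the ACD
}

idx = {w: i for i, w in enumerate(nodes)}
n = len(nodes)
W = np.zeros((n, n))
for (u, v), d in edges.items():
    W[idx[u], idx[v]] = W[idx[v], idx[u]] = 1.0 / d  # closer concepts spread more

# Constant source input: 1 for words in the document, 0 for added hypernyms.
source = np.array([1.0, 1.0, 1.0, 1.0, 0.0])
a = source.copy()
for _ in range(200):
    spread = (W @ a) / W.sum(axis=1).clip(min=1.0)  # normalized incoming activation
    new = 0.5 * source + 0.5 * spread
    if np.abs(new - a).sum() < 1e-6:  # repeat until the total change is negligible
        a = new
        break
    a = new
```

Note that “living-thing”, which does not occur in the document, still receives activation through “turtle”, which is the point of using the network rather than raw frequencies.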
3.1.2 Extracting important sentences based on word frequency
We use the following important-sentence extraction methods as baselines for comparison with our proposed method. The input texts are morphologically analyzed using the Japanese dependency structure analyzer CaboCha. The importance scores are calculated using the root morphemes of nouns, adjectives, adverbs, and verbs. Pronouns and certain nouns (numbers, counter suffixes, and so on) are excluded from the calculations.
In this method, the word
where
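The frequency baseline amounts to scoring each sentence by the document-level frequencies of its content words. A minimal sketch with toy English tokens (the actual system tokenizes Japanese with CaboCha and keeps only content-word root morphemes):

```python
from collections import Counter

# Each sentence is already reduced to its content-word root morphemes.
sentences = [
    ["tortoise", "live", "galapagos"],
    ["tortoise", "walk", "island"],
    ["island", "volcanic"],
]

# Document-level word frequencies.
freq = Counter(w for s in sentences for w in s)

# Sentence score = sum of the document frequencies of its words.
scores = [sum(freq[w] for w in s) for s in sentences]

# Rank sentence indices from most to least important.
ranking = sorted(range(len(sentences)), key=lambda i: -scores[i])
```

With this toy input the second sentence ranks first, because both “tortoise” and “island” recur in the document.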
3.1.3 Important sentence extraction by human participants
We used eight documents extracted from Japanese elementary school textbooks, because the ACD is built from the basic nouns in these textbooks. These documents contain about 17 sentences on average (range 10-23). Each document focuses on a single topic in natural science and consists of a title and body text.
We carried out an experiment with forty participants, all of whom were native Japanese speakers and students at Keio University. We asked the participants to choose the five most important sentences from each document and to arrange them in order of importance. Importance scores from 5 to 1 were assigned: 5 to the most important sentence and 1 to the fifth most important. The sentences were then ranked by the sum of the importance scores given by all the participants.
Next, we calculated the degree of coincidence using Kendall’s coefficient of concordance (W).
[Table 1.] Kendall’s coefficients of concordance (W) for sentence rankings
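Kendall's W for k raters ranking n items is 12*S / (k^2 * (n^3 - n)), where S is the sum of squared deviations of the items' rank totals from their mean; W = 1 means perfect agreement and W = 0 means none. A minimal implementation (toy rankings, ignoring tie corrections):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W.

    rankings: one list of ranks (1..n) per rater, all over the same n items.
    """
    k, n = len(rankings), len(rankings[0])
    # Total rank received by each item across all raters.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)  # deviation of rank sums
    return 12 * s / (k ** 2 * (n ** 3 - n))

print(kendalls_w([[1, 2, 3], [1, 2, 3]]))  # 1.0 (perfect agreement)
```

Documents whose W is low (participants disagree on sentence importance) are exactly those where any automatic ranking is hardest to validate, which is relevant to the discussion of Table 2 below.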
3.2 Evaluations and discussion
3.2.1 Evaluation of our method and conventional methods
In order to show the effectiveness of the sentence rankings obtained by our system, we compared our results with those extracted by the participants and with those obtained automatically by the conventional word-frequency methods. First, importance scores from 10 to 6 were given, respectively, to the top five sentences chosen by the participants. For a sentence chosen both by the participants and by one of the automatic methods, the value of correspondence (C) is calculated using the following equation (Okamoto & Ishizaki, 2003).
where
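The exact form of the correspondence equation is given in Okamoto & Ishizaki (2003). As an illustrative stand-in only, and not the authors' equation, one plausible reading is the fraction of the human importance mass (scores 10 down to 6 for the top five sentences) that the automatic method's top five recovers:

```python
def correspondence(human_scores, system_top5):
    """Hypothetical correspondence measure (NOT the paper's exact C).

    human_scores: {sentence_id: importance score 10..6} for the human top five.
    system_top5:  set of sentence ids chosen by the automatic method.
    """
    total = sum(human_scores.values())  # 10 + 9 + 8 + 7 + 6 = 40
    matched = sum(v for s, v in human_scores.items() if s in system_top5)
    return matched / total
```

Under this sketch, a system that recovers the humans' two most important sentences but misses the rest would score (10 + 9) / 40.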
Table 2 compares our method with the word-frequency method. The “Our method” rows in Table 2 were obtained by comparing the top five sentences chosen by our method using the CSN with those chosen by the participants. The “Word frequency method” rows were obtained by comparing the top five sentences chosen by word frequency with those chosen by the participants. CS in the table is the number of sentences among the top five extracted by each automatic method that correspond to sentences extracted by the participants. In Table 2, the C values for our method were high, showing that the method using the CSN was more effective than the word-frequency method. The C values for some documents (D3, D4, and D7) are relatively lower than those for the other documents. These results correspond with Kendall’s coefficients of concordance (W), which were also relatively low for those documents (see Table 1).
[Table 2.] Values of correspondence (C) and correspondence of sentences (CS) extracted by our method and the word frequency methods
4. Word sense disambiguation using the Associative Concept Dictionary
Numerous Japanese ideographs have several different meanings and pronunciations. For example, the ideograph “額” has two pronunciations, /hitai/ and /gaku/: the former represents “the forehead of the human body,” and the latter “the amount of money” or “a frame.” In the association experiments, such ideographs were given as stimulus words together with their pronunciations to avoid ambiguity.
4.1 Homographic ideogram understanding by a Dynamic Contextual Network Model
A Dynamic Contextual Network Model (hereafter DCNM) disambiguates word senses by means of a spreading activation model on the CSN. This network model is not static but dynamic: the network structure changes depending on the context of the words in the sentence. In the proposed model, when the network is not yet rich enough to disambiguate word senses and the model cannot decide the appropriate meaning of a homographic ideograph, it sequentially takes another word from the input sentence and adds it to the network to enhance it. Using this dynamic model, we can assign appropriate senses to ambiguous words by comparing the activation values and choosing the best candidate meaning for the homographic ideograph.
4.1.1 Construction of CSN for homographic ideograph
The CSN is constructed using the ACD for word sense disambiguation, starting from the ambiguous words and the dependency structure of the input text. Then, concepts associated with those words are added to the CSN using the semantic relations and quantitative distances in the ACD. The following steps (C and D) are added to steps A and B of the CSN construction using the content words in Section 3.1.1.
Figure 3 shows an example of a CSN. The square-shaped nodes are the input words in the sentence, including the two meanings of the homographic ideograph, and the two meanings are connected with an inhibitory link. The oval-shaped nodes are added using the ACD and are connected with excitatory links. The dotted lines are inhibitory links. The associated-word nodes of a homographic ideograph are connected to those of the other homographic ideograph using inhibitory links. “Museum” is a situational concept of “frame”. “Picture” and “face” do not exist in the sentence but are obtained from the ACD.
4.1.2 Activation value calculation in the Network
The activation value of each node is calculated based on the interactive activation model (McClelland & Rumelhart, 1981) on the CSN. We define the maximum activation level as
where
where the decay parameter
where
where
where m is the minimum activation level of the node and is set to
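One update cycle of this kind of model can be sketched as follows, using the conventional interactive activation (IAC) update rule with the default parameter values from McClelland & Rumelhart (1981); the paper's own parameter settings and network weights may differ.

```python
def iac_step(a, W, M=1.0, m=-0.2, rest=-0.1, decay=0.1):
    """One synchronous update of the interactive activation model.

    a: current activation per node; W: weight matrix where positive entries
    are excitatory links and negative entries are inhibitory links.
    Parameter values are conventional IAC defaults, not the paper's.
    """
    new = []
    for i, ai in enumerate(a):
        # Only nodes with positive activation send output.
        net = sum(W[i][j] * max(aj, 0.0) for j, aj in enumerate(a))
        if net > 0:
            delta = (M - ai) * net - decay * (ai - rest)   # drive toward max M
        else:
            delta = (ai - m) * net - decay * (ai - rest)   # drive toward min m
        new.append(min(M, max(m, ai + delta)))             # clamp to [m, M]
    return new
```

With two mutually inhibitory sense nodes, the node that starts with even a small activation advantage suppresses the other over successive cycles, which is the mechanism the DCNM uses to separate the two readings of a homographic ideograph.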
4.1.3 Enhancement of the Contextual Semantic Network
Human beings try to assign an appropriate sense to a given ambiguous word in a sentence using the contextual information that precedes the word; when they cannot disambiguate the word, they use the words that follow it. In our model, if there is only a slight difference between the activation values of the two sense nodes for a given input word, the model enhances the network by adding the words that follow it in the input sentence. Figure 4 shows an example of enhancing the CSN using the words that follow the ambiguous word.
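The enhancement loop can be sketched as follows. Here `activation` and `add_word` are assumed callback interfaces to the CSN (not part of the actual system), and the threshold value is hypothetical.

```python
def disambiguate(candidates, upcoming_words, activation, add_word, threshold=0.05):
    """Sketch of the DCNM decision loop.

    candidates: sense labels of the homographic ideograph.
    upcoming_words: words following the ambiguous word in the sentence.
    activation(sense) -> current activation of that sense node on the CSN.
    add_word(word) -> enhance the CSN with one more input word.
    """
    words = iter(upcoming_words)
    while True:
        scores = {c: activation(c) for c in candidates}
        best, second = sorted(scores.values(), reverse=True)[:2]
        if best - second >= threshold:           # difference is decisive
            return max(scores, key=scores.get)
        try:
            add_word(next(words))                # enhance with the next word
        except StopIteration:                    # sentence exhausted
            return max(scores, key=scores.get)
```

If the two sense activations are nearly tied, the model pulls in the next word of the sentence and re-evaluates, mirroring how a human reader defers the decision until later context arrives.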
4.2 Simulation for WSD by the proposed method
Let us take as input the following Japanese sentence: “壁のピカソの絵の額が落ちて,頭に当たって額から血が出た,” and its translation in English: “The frame of Picasso’s picture dropped from the wall, struck my head, and my forehead bled.”
In this sentence, “frame” and “forehead” are English expressions that correspond to the homographic ideograph “額” in Japanese. The input word order follows the Japanese order. At first, we construct the CSN based on the ACD for the first input word “wall”. Then, the nouns in the sentence, including the ambiguous words, are added sequentially to the network and all the nodes activate each other by the method outlined in Section 4.1.2.
Figure 5 shows the activation values of the homographic ideographs and the other input words in the simulation, where each input word spans twenty time cycles. The horizontal axis represents time, and the vertical axis represents the activation values. The words in the rectangles are the major words selected from the input sentence. In the first half of the sentence, “frame” has a larger activation value than “forehead” (see the broken-line circle in Figure 5), so we can assign the pronunciation and meaning of “frame” to the Japanese ideograph “額.” However, since “forehead” has a larger value than “frame” in the second half of the sentence (see the thick-line circle in Figure 5), in that case we assign “forehead” to the Japanese ideograph “額.”
5. Conclusion
In this article, we have proposed a simulation model for word sense disambiguation and an application system for document summarization. Both use contextual information and the quantitative distances among the concepts in the ACD. The summarization method performs better than the conventional ones, and the disambiguation method can dynamically identify the meaning of ambiguous words according to the input words.
The ACD is currently small compared with other concept dictionaries. We will extend it to a large-scale dictionary by automatically extracting concepts from corpora. This extension will be useful for higher-level contextual understanding systems.
The evaluation of the word sense disambiguation method using the DCNM from an NLP point of view is future work. As a preliminary evaluation, we compared it with the multinomial Naive Bayes text classification method and obtained success rates of roughly 90% for both methods. We need more test data to conduct a more precise evaluation.
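A multinomial Naive Bayes WSD baseline of the kind used in this preliminary evaluation can be sketched as follows (with add-one smoothing; the training pairs below are toy data, not the actual test set):

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Train a multinomial Naive Bayes sense classifier.

    examples: (context_words, sense) pairs; returns a classify(words) closure.
    """
    sense_counts = Counter(s for _, s in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, s in examples:
        word_counts[s].update(words)
        vocab.update(words)

    def classify(words):
        def logp(s):
            total = sum(word_counts[s].values())
            prior = math.log(sense_counts[s] / len(examples))
            # Add-one smoothed multinomial likelihood of the context words.
            return prior + sum(
                math.log((word_counts[s][w] + 1) / (total + len(vocab)))
                for w in words)
        return max(sense_counts, key=logp)

    return classify

# Toy senses of the ideograph 額: "frame" vs. "forehead".
classify = train_nb([(["picture", "wall"], "frame"),
                     (["head", "blood"], "forehead")])
```

Unlike the DCNM, this baseline treats the context as a bag of words with no semantic relations or distances, which is exactly what the comparison is meant to probe.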
Human word sense disambiguation processes have been observed using MEG (Magnetoencephalogram) (Ihara