Current Location: > Detailed Browse

Identifying Feature Words Based on Abstracts and Citation Text Corpus of Breakthrough Research postprint

请选择邀稿期刊:
Abstract: [Purpose/significance] Based on the author's descriptive evaluation of his research and the critical citations of later researchers, the abstract and citation corpus of the breakthrough research are used to extract the feature words. Feature words can be used to understand the abstract and citation corpus features of the breakthrough research and contribute to the identification of breakthrough research. [Method/process] Key documents selected by Science as "Breakthrough of the Year" and "key publications" of Nobel Prize winners were selected as breakthrough research corpus data. Feature words were extracted by integrating abstracts and citation corpus of the paper. In the feature word extraction, the Stanford CoreNlp tool was used to perform word frequency statistics on the corpus, and the feature words were filtered in combination with expert opinions. Then we used the semantic relationship of medical texts to semantically expand feature words, which were used as the seed words. Finally, the retrieval and recognition effects of the abstract and citation feature words were further compared by the recall rate and the precision rate. [Result/conclusion] In the breakthrough research corpus, we selected 8 feature tokens of abstract corpora and 8 feature tokens of citation corpora. In the retrieval and recognition of feature words, the recall rate of the extended feature words of abstracts and citations is the highest, the precision of citation feature words is the highest. The comprehensive effect of the recall rate and precision of citation expansion feature words are better.

Version History

[V1] 2023-04-01 16:15:55 ChinaXiv:202304.00214V1 Download
Download
Preview
License Information
metrics index
  •  Hits1046
  •  Downloads584
Comment
Share