Obtaining reference’s topic congruity in Indonesian publications using machine learning approach

Subroto I.M.I., Haviana S.F.C.

Abstract

Several criteria determine whether an article is good enough for publication. Some concern aspects such as formatting and clarity, but the main one is how the content of the article is constructed. The consistency of the topic on which the article is written shows how the authors develop the main idea in the content. One indicator of this consistency is the congruity between the article’s topic and the topics of the literature cited in the document and listed in its bibliography. This work attempts to automate topic detection for an article’s references and then to obtain their congruity with the topic of the article’s title through metadata extraction and text classification. This is done by extracting metadata from an article file with GROBID to obtain all possible reference titles, then classifying their topics with a supervised classification model. We found that some refinements to the overall approach should be considered in the next step of this work.
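
The pipeline described in the abstract (reference titles from GROBID, a supervised topic classifier, a congruity measure) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the TEI fragment is hand-written in the general shape of GROBID's TEI XML output, the training titles and topic labels are hypothetical, a small multinomial naive Bayes model stands in for the unspecified supervised classifier, and congruity is taken to be the fraction of references whose predicted topic equals the predicted topic of the article title.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment shaped like GROBID TEI output (illustrative only).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct><analytic><title level="a" type="main">Neural network text classification</title></analytic></biblStruct>
    <biblStruct><analytic><title level="a" type="main">Support vector machine for document classification</title></analytic></biblStruct>
    <biblStruct><analytic><title level="a" type="main">Power transformer fault detection</title></analytic></biblStruct>
  </listBibl></div></back></text>
</TEI>"""

# Hypothetical labelled titles used to train the stand-in classifier.
TRAIN_TEXTS = [
    "machine learning classification of text documents",
    "deep neural network for image recognition",
    "power system fault analysis",
    "transformer voltage control in power grid",
]
TRAIN_LABELS = ["computer science", "computer science",
                "electrical engineering", "electrical engineering"]


def reference_titles(tei_xml):
    """Return the first title of every biblStruct in the reference list."""
    root = ET.fromstring(tei_xml)
    titles = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        title = bibl.find(".//tei:title", TEI_NS)
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def topic_congruity(model, article_title, ref_titles):
    """Fraction of references predicted to share the article title's topic."""
    if not ref_titles:
        return 0.0
    article_topic = model.predict(article_title)
    matches = sum(model.predict(t) == article_topic for t in ref_titles)
    return matches / len(ref_titles)


model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
titles = reference_titles(SAMPLE_TEI)
score = topic_congruity(model, "Text classification using machine learning", titles)
print(titles)
print(score)
```

Classifying titles alone keeps the sketch short. A real system would need far more training data per topic and would have to handle references for which no title could be extracted.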

Journal
International Conference on Electrical Engineering, Computer Science and Informatics (EECSI)
Page Range
428-431
Publication date
2019
References
ParsCit: An open-source CRF reference string parsing package

Councill I.G., Kan M.-Y., Giles C.L.

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Beel J., Collins A., Sheridan P., Tkaczyk D.

CERMINE: Automatic extraction of structured metadata from scientific literature

Bolikowski L., Dendek P.J., Fedoryszak M., Szostek P., Tkaczyk D.

GROBID: Combining automatic bibliographic data recognition and term extraction for scholarship publications

Lopez P.

Evaluation of header metadata extraction approaches and tools for scientific PDF documents

Beel J., Breitinger C., Gipp B., Lipinski M., Yao K.

Automatic extraction of titles from general documents using machine learning

Cao Y., Hu Y., Li H., Meyerzon D., Teng L., Zheng Q.

Information extraction from research papers by data integration and data validation from multiple header extraction sources

Latif S., Saleem O.

Reference metadata extraction from scientific papers

Guo Z., Jin H.

Accurate information extraction from research papers using conditional random fields

McCallum A., Peng F.

Metadata extraction from bibliographies using bigram HMM

Deng Z., Yang D., Yin P., Zhang M.