Obtaining reference’s topic congruity in Indonesian publications using machine learning approach

Haviana S.F.C., Subroto I.M.I.

Abstract

Several criteria determine whether an article is good enough for publication. Some concern aspects such as formatting and clarity, but the main one is how the content of the article is constructed. The consistency of the topic on which the article is written shows how the authors develop the main idea in the content. One indication of this consistency is congruity between the article’s topic and the topics of the literature cited in the document and listed in its bibliography. This work attempts to automate topic detection on an article’s references and then obtain their congruity with the topic of the article’s title through metadata extraction and text classification. This is done by extracting the metadata of an article file with GROBID to obtain all possible reference titles, then classifying their topics with a supervised classification model. We found that some refinements to the whole approach should be considered in the next step of this work.
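The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes GROBID has already produced TEI XML for the article, it substitutes a tiny hand-written Naive Bayes classifier for the unspecified supervised model, and it defines congruity as the share of reference titles whose predicted topic matches the predicted topic of the article title. That definition, the training titles, the topic labels, and the names `reference_titles`, `NaiveBayes`, and `congruity` are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}  # namespace used by TEI XML


def reference_titles(tei_xml):
    """Return one title per <biblStruct> in the TEI reference list."""
    root = ET.fromstring(tei_xml)
    titles = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI):
        # Prefer the article-level title, fall back to the monograph title.
        node = bibl.find("tei:analytic/tei:title", TEI)
        if node is None:
            node = bibl.find("tei:monogr/tei:title", TEI)
        if node is not None and node.text:
            titles.append(node.text.strip())
    return titles


class NaiveBayes:
    """Toy multinomial Naive Bayes, a stand-in for the supervised model."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            s = math.log(self.label_counts[label] / sum(self.label_counts.values()))
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    s += math.log((counts[word] + 1) / total)
            return s

        return max(self.label_counts, key=score)


def congruity(article_title, ref_titles, model):
    """Assumed metric: fraction of references sharing the title's topic."""
    if not ref_titles:
        return 0.0
    topic = model.predict(article_title)
    return sum(model.predict(t) == topic for t in ref_titles) / len(ref_titles)


# Hypothetical training data and a hand-written TEI fragment for illustration.
model = NaiveBayes().fit(
    ["metadata extraction from scientific papers",
     "reference string parsing with conditional random fields",
     "power system load forecasting",
     "electric motor speed control"],
    ["informatics", "informatics", "electrical", "electrical"],
)

SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back><listBibl>
<biblStruct><analytic><title level="a" type="main">Reference metadata extraction from scientific papers</title></analytic></biblStruct>
<biblStruct><monogr><title level="m">Speed control of electric motor</title></monogr></biblStruct>
</listBibl></back></text></TEI>"""

titles = reference_titles(SAMPLE_TEI)
score = congruity("Topic congruity of reference metadata", titles, model)
```

With this toy data, one of the two extracted references falls in the same topic as the article title, so `score` is 0.5. A real run would replace the sample fragment with GROBID's actual output and the toy model with a classifier trained on labeled publication titles.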

Journal
International Conference on Electrical Engineering, Computer Science and Informatics (EECSI)
Page Range
428-431
Publication date
2019

References 10

Cited By 2

Councill I.G., Lee Giles C., Kan M.-Y.

ParsCit: An open-source CRF reference string parsing package

Tkaczyk D., Collins A., Sheridan P., Beel J.

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers

Tkaczyk D., Szostek P., Fedoryszak M., Dendek P.J., Bolikowski L.

CERMINE: Automatic extraction of structured metadata from scientific literature

Lopez P.

GROBID: Combining automatic bibliographic data recognition and term extraction for scholarship publications

Lipinski M., Yao K., Breitinger C., Beel J., Gipp B.

Evaluation of header metadata extraction approaches and tools for scientific PDF documents

Hu Y., Zheng Q., Li H., Cao Y., Teng L., Meyerzon D.

Automatic extraction of titles from general documents using machine learning

Saleem O., Latif S.

Information extraction from research papers by data integration and data validation from multiple header extraction sources

Guo Z., Jin H.

Reference metadata extraction from scientific papers

Peng F., McCallum A.

Accurate information extraction from research papers using conditional random fields

Yin P., Zhang M., Deng Z., Yang D.

Metadata extraction from bibliographies using bigram HMM
