Obtaining reference’s topic congruity in Indonesian publications using machine learning approach
Subroto I.M.I., Haviana S.F.C.
Abstract
Several criteria determine whether an article is suitable for publication. Some concern aspects such as formatting and clarity, but the main one is how the content of the article is constructed. The consistency of an article’s topic shows how the authors develop the main idea in the content. One indicator of this consistency is the congruity between the article’s topic and the topics of the references cited in the document and listed in its bibliography. This work attempts to automate topic detection for an article’s references and then to measure their congruity with the topic of the article’s title, using metadata extraction and text classification. The metadata of an article file is extracted with GROBID to obtain all candidate reference titles, and the topic of each title is then classified with a supervised classification model. We found that the overall approach needs some refinements, which should be considered in the next step of this work.
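The congruity step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how congruity is scored, so this sketch assumes it is the fraction of reference titles whose predicted topic matches the predicted topic of the article title. The names `topic_congruity`, `keyword_classifier`, and `KEYWORDS` are hypothetical, and the keyword lookup is only a stand-in for the supervised classification model. The reference titles are assumed to have been extracted already (for example with GROBID).

```python
from typing import Callable, Iterable

# Hypothetical topic vocabulary; a stand-in for a trained supervised model.
KEYWORDS = {
    "information_extraction": {"extraction", "metadata", "parsing", "reference"},
    "networking": {"routing", "wireless", "protocol"},
}


def keyword_classifier(title: str) -> str:
    """Toy classifier: pick the topic sharing the most words with the title.

    Ties go to the topic listed first in KEYWORDS.
    """
    tokens = set(title.lower().split())
    scores = {topic: len(tokens & words) for topic, words in KEYWORDS.items()}
    return max(scores, key=scores.get)


def topic_congruity(
    article_title: str,
    reference_titles: Iterable[str],
    classify: Callable[[str], str],
) -> float:
    """Assumed metric: share of references whose topic equals the article's topic."""
    refs = list(reference_titles)
    if not refs:
        return 0.0  # no references, so nothing can be congruent
    article_topic = classify(article_title)
    matches = sum(1 for title in refs if classify(title) == article_topic)
    return matches / len(refs)


example_refs = [
    "Metadata extraction from bibliographies using bigram HMM",
    "A wireless routing protocol",
    "CRF reference string parsing",
    "Energy aware wireless protocol design",
]
score = topic_congruity(
    "Reference metadata extraction from scientific papers",
    example_refs,
    keyword_classifier,
)
print(score)
```

Passing the classifier as a callable keeps the scoring independent of the model, so a trained supervised classifier could replace the toy lookup without changing `topic_congruity`.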
ParsCit: An open-source CRF reference string parsing package
Councill I.G., Kan M.-Y., Lee Giles C.
Machine Learning vs. Rules and Out-of-the-Box vs. Retrained: An Evaluation of Open-Source Bibliographic Reference and Citation Parsers
Beel J., Collins A., Sheridan P., Tkaczyk D.
CERMINE: Automatic extraction of structured metadata from scientific literature
Bolikowski L., Dendek P.J., Fedoryszak M., Szostek P., Tkaczyk D.
GROBID: Combining automatic bibliographic data recognition and term extraction for scholarship publications
Lopez P.
Evaluation of header metadata extraction approaches and tools for scientific PDF documents
Beel J., Breitinger C., Gipp B., Lipinski M., Yao K.
Automatic extraction of titles from general documents using machine learning
Cao Y., Hu Y., Li H., Meyerzon D., Teng L., Zheng Q.
Information extraction from research papers by data integration and data validation from multiple header extraction sources
Latif S., Saleem O.
Reference metadata extraction from scientific papers
Guo Z., Jin H.
Accurate information extraction from research papers using conditional random fields
McCallum A., Peng F.
Metadata extraction from bibliographies using bigram HMM
Deng Z., Yang D., Yin P., Zhang M.