NAMED ENTITY RECOGNITION (NER) FOR NEWS ARTICLES
Keywords:
Named Entity Recognition (NER), Conditional Random Fields (CRF), CoNLL-2003 Dataset, Information Extraction, Natural Language Processing (NLP), Feature Engineering
Abstract
Named Entity Recognition (NER) automates the extraction and categorization of named entities from text, supporting efficient information retrieval and analysis across domains. This paper presents a study of NER techniques, focusing on their application to news articles. The project uses Conditional Random Fields (CRF), a discriminative probabilistic model for sequence labeling, together with preprocessing and feature engineering steps for accurate entity recognition. The CoNLL-2003 dataset serves as the benchmark for training and evaluating the CRF model on entities such as persons, organizations, and locations.
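The abstract does not list the features used, so the following is only a minimal sketch of what feature engineering for a CRF sequence labeler often looks like. The function names (`token_features`, `sentence_features`) and the specific feature set are illustrative assumptions, not the authors' implementation. Each token is mapped to a dictionary of features describing the word itself and its immediate neighbors, which is the input format accepted by common CRF toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Build a feature dict for the token at position i (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": True,
        "lower": word.lower(),          # case-normalized word form
        "suffix3": word[-3:],           # last three characters
        "is_title": word.istitle(),     # e.g. "German"
        "is_upper": word.isupper(),     # e.g. "EU"
        "is_digit": word.isdigit(),
    }
    if i > 0:
        prev = tokens[i - 1]
        feats["prev_lower"] = prev.lower()
        feats["prev_is_title"] = prev.istitle()
    else:
        feats["BOS"] = True             # beginning of sentence
    if i < len(tokens) - 1:
        nxt = tokens[i + 1]
        feats["next_lower"] = nxt.lower()
        feats["next_is_title"] = nxt.istitle()
    else:
        feats["EOS"] = True             # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """Convert a tokenized sentence into one feature dict per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # A short tokenized news-style sentence, used here only as sample input.
    sample = ["EU", "rejects", "German", "call"]
    for tok, feats in zip(sample, sentence_features(sample)):
        print(tok, feats)
```

A list of such per-sentence feature lists, paired with the corresponding label sequences (for example the BIO-style tags in CoNLL-2003), could then be passed to a CRF trainer. Which features actually help on news text is an empirical question that the sketch does not settle.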
License
Copyright (c) 2024 Tejal Chavan, Seema Patil (Author)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.