Data Science, Bernardo Najlis

Named Entity Recognition from Online News

Posted on May 27, 2018 by bnajlis

This is a project from the Natural Language Processing course in my Master's in Data Science program. The project aimed to build a series of models that extract named entities (people, locations, organizations, and dates) from news headlines obtained online. We created two models: a traditional Natural Language Processing model using Maximum Entropy, and a Deep Neural Network model using pre-trained word embeddings. Both models achieved similar accuracy, but their requirements and limitations differ, and these differences can help determine which type of model is best suited to a specific use case.
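A Maximum Entropy classifier is, mathematically, multinomial logistic regression over binary feature functions. The following is a minimal, standard-library-only Python sketch of that idea. It is not the code from the project repository. It assumes each token has already been turned into a set of string features; the feature names, labels, and toy training pairs are hypothetical. The weights are fitted with stochastic gradient ascent on an L2-regularized log-likelihood.

```python
import math
from collections import defaultdict


def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


class MaxEntClassifier:
    """Maximum Entropy (multinomial logistic regression) over sparse binary features."""

    def __init__(self, labels, learning_rate=0.5, l2=0.01):
        self.labels = list(labels)
        self.lr = learning_rate
        self.l2 = l2
        # One weight per (feature, label) pair; unseen pairs default to 0.0.
        self.weights = defaultdict(float)

    def probabilities(self, features):
        # Score of a label = sum of weights of the active features for that label.
        scores = {
            label: sum(self.weights[(f, label)] for f in features)
            for label in self.labels
        }
        return softmax(scores)

    def fit(self, data, epochs=50):
        # data: list of (feature_set, gold_label) pairs.
        for _ in range(epochs):
            for features, gold in data:
                probs = self.probabilities(features)
                for label in self.labels:
                    # Log-likelihood gradient: observed count minus expected count.
                    grad = (1.0 if label == gold else 0.0) - probs[label]
                    for f in features:
                        key = (f, label)
                        # L2 penalty applied only to weights of active features.
                        self.weights[key] += self.lr * (grad - self.l2 * self.weights[key])
        return self

    def predict(self, features):
        probs = self.probabilities(features)
        return max(probs, key=probs.get)


# Hypothetical toy data: one feature set per token, with its entity label.
train = [
    ({"w=obama", "is_title"}, "PER"),
    ({"w=trump", "is_title"}, "PER"),
    ({"w=toronto", "is_title"}, "LOC"),
    ({"w=canada", "is_title"}, "LOC"),
    ({"w=google", "is_title"}, "ORG"),
    ({"w=monday", "is_title"}, "DATE"),
    ({"w=visits", "is_lower"}, "O"),
    ({"w=in", "is_lower"}, "O"),
]

clf = MaxEntClassifier(["PER", "LOC", "ORG", "DATE", "O"]).fit(train)
print(clf.predict({"w=toronto", "is_title"}))
```

In practice, a library implementation such as NLTK's `MaxentClassifier` or scikit-learn's multinomial `LogisticRegression` would typically be used instead of a hand-written trainer.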

The final conclusion is that, because the Deep Learning model is less dependent on language-specific grammar rules, it generalizes better (provided that embeddings and some labeled corpora are available in the target language), whereas the Maximum Entropy model will perform poorly on a language for which there is no domain knowledge to create the required features.
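To make the dependence on domain knowledge concrete, here is a hypothetical feature extractor of the kind a Maximum Entropy tagger for English headlines might use. The cue lists and feature names are illustrative assumptions, not the project's actual feature set. Most of these cues encode English conventions such as capitalization, month names, and honorifics, so for a script without letter case, such as Chinese, almost none of them fire.

```python
import re

# Illustrative, English-specific cue lists (hypothetical, not from the project).
ENGLISH_MONTHS = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
}
PERSON_TITLES = {"mr.", "mrs.", "ms.", "dr.", "president"}
ORG_SUFFIXES = {"inc.", "corp.", "ltd."}


def token_features(tokens, i):
    """Return a set of binary feature names for tokens[i]."""
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"

    features = {"w=" + word.lower()}  # word identity works in any language
    if word[:1].isupper():
        features.add("is_capitalized")  # relies on letter case
    if len(word) > 1 and word.isupper():
        features.add("all_caps")  # e.g. acronyms such as "NASA"
    if word.lower() in ENGLISH_MONTHS:
        features.add("is_month")  # English vocabulary cue for DATE
    if re.fullmatch(r"\d{4}", word):
        features.add("looks_like_year")
    if prev_word in PERSON_TITLES:
        features.add("prev_is_title")  # English honorific cue for PER
    if next_word in ORG_SUFFIXES:
        features.add("next_is_org_suffix")  # English company-suffix cue for ORG
    return features


headline = ["Mr.", "Smith", "visits", "Toronto", "in", "May", "2018"]
for i, tok in enumerate(headline):
    print(tok, sorted(token_features(headline, i)))

# A Chinese headline fragment: only the word-identity feature fires.
print(sorted(token_features(["北京", "欢迎", "你"], 0)))
```

Every cue beyond word identity had to be written by someone who knows English. A model built on pre-trained embeddings sidesteps most of this, since the embedding table is learned from text in the target language.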

This is our deck for the final presentation:

This is our final report / paper with our results and conclusion:

All source code for this project can be found in this GitHub repository: https://github.com/bnajlis/named_entity_recognition