Topic Modelling and Sentiment Analysis via News Headlines, NLP Methods on Australian Broadcasting Commission

Zaur Gouliev, Fernando Perez-Tellez

Abstract


The main aim of this paper is to provide a holistic overview, implementation and comparison of some of the main supervised and unsupervised machine learning methods used in natural language processing (NLP) to extract topics and sentiment from headlines. The paper employs supervised learning methods, namely logistic regression and a support vector machine (SVM) classifier, and unsupervised learning methods, namely K-means clustering and latent Dirichlet allocation (LDA). To demonstrate these NLP applications, we use an extensive dataset of one million news headlines, made available online by the Australian Broadcasting Commission, which spans 17 years and allows for rich analysis. Our results show that logistic-regression-based models using lexicon-based emotion classifiers achieve high accuracy for sentiment analysis, reaching 93%, while the clustering-based K-means technique scored 75% for topic modelling. A detailed explanation of these methods is given, and their limitations, assumptions and ethical considerations, along with suggestions for future work, are discussed.

Keywords


News headlines, machine learning, natural language processing, sentiment analysis
