Data Scientist - me@johnbrugman.com - (626) 825-2537
The past two decades have seen a meteoric rise of social media, which lets individuals reach ever-increasing numbers of peers at little to no cost. This amplification has significantly strengthened the voice of many who had previously been disenfranchised; however, it has also led to an unprecedented amount of opinion being portrayed as fact, and of outright falsehoods being touted as gospel. Using NLP techniques, we were able to predict which tweets about a recent disaster were likely to be true or false. We cannot know for sure without fact checking, and in this social media age news spreads much faster than it can be checked. By training a model on a database of previously labeled tweets, we can use context clues to judge whether a tweet's pattern of words looks more like the truthful examples or the false ones. This is especially helpful when there are conflicting reports, because it gives us an idea of which news sources are reliable.
We modeled our database of text with a Multinomial Naive Bayes classifier. Applying this model to our scraped tweets, we obtained an accuracy of 80% and a precision of 82.2%. This was significantly better than we expected, and with a more careful implementation we are confident we can improve the accuracy further. We also implemented a translator and modeled tweets in other languages, so that the approach can help with global news and not just English-language sources.
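
To make the approach concrete, here is a minimal sketch of a Multinomial Naive Bayes text classifier with accuracy and precision helpers, written in plain Python. It is not our project code: the tokenizer, the class and function names, the add-one smoothing value, and the six toy tweets are all assumptions made up for illustration, and the label convention (1 = likely true, 0 = likely false) is likewise assumed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every token seen in any training tweet.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        class_counts = Counter(labels)
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_score(self, text, c):
        """Log prior plus the sum of smoothed log word likelihoods."""
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokenize(text):
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, texts):
        return [max(self.classes, key=lambda c: self.log_score(t, c)) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the tweets predicted positive, the fraction that truly are."""
    truths = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in truths) / len(truths) if truths else 0.0


# Toy, made-up training data: 1 = likely true, 0 = likely false.
train_texts = [
    "Wildfire forces evacuation of hillside homes, officials confirm",
    "Officials confirm flooding closed the highway this morning",
    "Earthquake damage reported downtown, emergency crews on scene",
    "OMG the whole city is gone, aliens started the fire!!!",
    "Heard the flood was fake lol, total hoax",
    "My friend says the earthquake was a secret government test",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = MultinomialNB().fit(train_texts, train_labels)
predictions = model.predict([
    "Officials confirm evacuation, emergency crews on scene",
    "lol total hoax, aliens",
])
print(predictions)
```

In practice the same idea is usually run through a library implementation on a held-out test set; the hand-rolled version here is only meant to show what the classifier counts and how the two reported metrics are defined.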