Analyzing and quantifying unstructured data, such as text, is at the core of natural language processing. In this short video, Director of Data Science Max Margenot explains how to preprocess a text document using tokenization (splitting the text into individual words) and stemming (reducing each word to its root form) to create a bag of words: a count of how often each stem appears. That representation can then feed into whatever sort of model you want, including sentiment models.
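For readers who want to see the idea before watching, here is a minimal sketch of the tokenize, stem, and count pipeline using only the Python standard library. It is not the code from the video: the function names `tokenize`, `naive_stem`, and `bag_of_words` are hypothetical, and the suffix-stripping stemmer is a deliberately crude stand-in for a real stemming algorithm such as Porter's.

```python
import re
from collections import Counter

# Assumption: a tiny, hand-picked suffix list. A real stemmer uses many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Map each stem to the number of times it occurs in the text."""
    return Counter(naive_stem(token) for token in tokenize(text))


doc = "Traders traded stocks while trading volumes jumped; stocks jumped again."
bag = bag_of_words(doc)
print(bag.most_common())
```

Note that "traded" and "trading" collapse to the same stem while "traders" does not, which shows both the benefit and the roughness of stemming. The resulting counts can be used as features for a downstream model.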
Learn more by subscribing to our Quantopian channel, where you can access all of our videos.
As always, if there are any topics you would like us to cover in future videos, please comment below or send us a quick note at [email protected].