Machine Learning and Natural Language Processing to Predict Stock Prices

$5,100,000,000,000 (or 5.1 trillion dollars) is the total amount of currency traded on the global foreign exchange markets on an average day (Triennial Central Bank Survey, 2016). And that is only one of many financial instruments that can be traded or invested in across the various financial markets worldwide. With so many investments made on the financial markets every day, it becomes crucial for anyone who wants to invest to analyze the available data very efficiently. According to the efficient market theory, the price of a stock at any point in time reflects all publicly available information at that point in time (Malkiel, 1989). If this theory held, it would be impossible to predict stock prices. Research by Chan (2003), however, suggests that some stock prices underreact to news. Analyzing data rapidly and efficiently, and thus obtaining available information, enables investors to trade more effectively than their competitors.

Based on this data, investors can make predictions about stock prices, which in turn enables effective trading. Analyzing the relationships between text data and stock prices using natural language processing (from here on NLP) can be an important tool for anticipating stock prices. Predictive and analytical methods based on mathematical models, such as machine learning and NLP, have been used for a long time (e.g. West & Cho, 1995). Improved machine learning and NLP techniques have helped traders to better analyze the available real-time and historical data. Machine learning can offer a more efficient way to analyze data than conventional statistical methods.

In this paper, we discuss some of the machine learning and NLP techniques that have been used to provide traders with better analysis of the future value of financial instruments, specifically stocks.

