Assessing Opinion Mining in Stock Trading

Sathish Nagappan (srn), Govinda Dasu (gdasu)

I. Introduction

We hypothesize that money should go where the public wants and needs it to go, and that the firm that is best and fastest at determining these human demands will yield the highest percentage growth. To test this hypothesis we set up two predictors: (1) the first considered only numerical data such as historical prices, EBITDA, and P/E ratio, and (2) the second additionally considered human news and opinions about companies, their products, and their services.

The first task was to derive an accurate first predictor. As our baseline we used SVR with an RBF kernel, which led us to SGD with varying numbers of iterations to better approximate the RBF kernel. After implementing this by grouping together all stocks from the major indices, we realized that we should consider stocks individually and take their time-series structure into account. This resulted in an ARIMAX model with AIC backward search selection (predictor 1).

Next, we moved to predictor 2. For each company we added NLP features, such as indicators of specific n-grams that give insight into the positivity of the stream of relevant text about the company's products and services. Ultimately this led to ARIMAX with these NLP features and combination feature selection (predictor 2). This allowed us to compare the relative success of the model with and without NLP features.

II. Data and Cross Validation

The numerical data was obtained from Bloomberg and the headline data from FactSet. We retrieved a list of all stocks in the S&P 500, Russell 1000, and NASDAQ 100. Each training example was indexed by company ticker and date and had 28 features, such as P/E ratio, EBITDA, price, and volatility. The target for each training example was the one-day percent change in closing price. We retrieved 2M training examples spanning 2009 to the present.

Our method of evaluation comes from the concept of score, defined as follows:

S = 1 - \frac{\sum_{i=1}^{m}\left(y_{\text{true}}^{(i)} - y_{\text{predicted}}^{(i)}\right)^{2}}{\sum_{i=1}^{m}\left(y_{\text{true}}^{(i)} - \bar{y}_{\text{true}}\right)^{2}}

Perfect prediction yields a score of 1. Less accurate predictions score lower and can be arbitrarily negative.

We used a variant of holdout cross validation: we tested our models on the last 6 months of data and trained on the remainder. Due to computational complexity, our initial algorithm was trained on the first 1.5 years of 2012–2013 and tested on the last 6 months, totaling 486k training examples. Our later algorithms were trained on the first 3.5 years of 2009–2013 and tested on the last 6 months. Randomly selecting a subsample for cross validation would have yielded an unrealistic and unfair advantage, since we would be using future information to predict the past.
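The modeling steps above can be made concrete with short sketches. The first pairs the baseline, SVR with an RBF kernel, against SGD on random Fourier features, one standard way to approximate an RBF kernel at scale. This is a minimal sketch assuming scikit-learn; the synthetic X, y, and all hyperparameter values are placeholders, not our actual settings.

import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data: m examples of the 28 numerical features
# (P/E ratio, EBITDA, price, volatility, ...); y is the one-day
# percent change in closing price.
rng = np.random.RandomState(0)
X, y = rng.randn(1000, 28), rng.randn(1000)

# Baseline: exact RBF-kernel SVR. Training cost grows roughly
# quadratically with m, which is prohibitive at 2M examples.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, gamma="scale"))
svr.fit(X, y)

# SGD on random Fourier features, which approximate the RBF kernel;
# max_iter corresponds to the "varying numbers of iterations" in Section I.
sgd = make_pipeline(
    StandardScaler(),
    RBFSampler(gamma=0.1, n_components=500, random_state=0),
    SGDRegressor(max_iter=1000, tol=1e-4, random_state=0),
)
sgd.fit(X, y)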
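Predictor 1 fits a separate ARIMAX model per ticker: an ARIMA model of the return series with the numerical features entering as exogenous regressors. A minimal sketch assuming statsmodels' SARIMAX; the single-ticker synthetic frame, the column names "ret", "pe", and "ebitda", and the (1, 0, 1) order are illustrative assumptions, not our fitted specification.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical single-ticker frame: "ret" is the one-day percent change
# in closing price; "pe" and "ebitda" stand in for the exogenous features.
rng = np.random.RandomState(0)
dates = pd.bdate_range("2009-01-01", periods=500)
df = pd.DataFrame(
    {"ret": rng.randn(500), "pe": rng.randn(500), "ebitda": rng.randn(500)},
    index=dates,
)

# ARIMAX = ARIMA(p, d, q) on the returns plus exogenous regressors.
fit = SARIMAX(df["ret"], exog=df[["pe", "ebitda"]], order=(1, 0, 1)).fit(disp=False)
print(fit.aic)

# A one-step-ahead forecast needs the exogenous values for the next day.
print(fit.forecast(steps=1, exog=df[["pe", "ebitda"]].iloc[[-1]]))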
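The AIC backward search can then be sketched as a greedy loop over the exogenous columns: at each pass, drop the feature whose removal lowers AIC the most, and stop when no removal helps. This is one plausible reading of "AIC backward search selection"; the exact procedure we used may have differed.

from statsmodels.tsa.statespace.sarimax import SARIMAX

def backward_aic(endog, exog, order=(1, 0, 1)):
    """Greedy backward elimination of exogenous features by AIC."""
    cols = list(exog.columns)
    best = SARIMAX(endog, exog=exog[cols], order=order).fit(disp=False).aic
    while len(cols) > 1:
        # AIC after dropping each remaining feature in turn.
        trials = {
            c: SARIMAX(endog, exog=exog[[x for x in cols if x != c]],
                       order=order).fit(disp=False).aic
            for c in cols
        }
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:
            break  # no single removal improves AIC
        best, cols = trials[drop], [x for x in cols if x != drop]
    return cols, best

# e.g. backward_aic(df["ret"], df[["pe", "ebitda"]]) with the frame above.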
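The NLP features of predictor 2 are described as indicators of specific n-grams in the stream of text about a company. A sketch assuming scikit-learn's CountVectorizer with a fixed vocabulary; the example headlines and the n-grams themselves are invented for illustration.

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical n-grams whose presence signals positive or negative
# opinion about a company's products and services.
vocab = ["record sales", "beats expectations", "product recall", "lawsuit"]
vectorizer = CountVectorizer(vocabulary=vocab, ngram_range=(1, 2), binary=True)

headlines = [
    "Acme beats expectations on record sales",
    "Acme faces lawsuit over product recall",
]
# Each row is a 0/1 indicator vector over the chosen n-grams.
print(vectorizer.transform(headlines).toarray())
# [[1 1 0 0]
#  [0 0 1 1]]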
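The score S defined in Section II is the standard coefficient of determination (R^2). A direct NumPy transcription, equivalent to scikit-learn's r2_score:

import numpy as np

def score(y_true, y_pred):
    """S = 1 - sum((y_true - y_pred)^2) / sum((y_true - mean(y_true))^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect prediction: 1.0
print(score([1.0, 2.0, 3.0], [3.0, 3.0, 3.0]))  # worse than the mean: -1.5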
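Finally, the time-ordered holdout: train on everything up to six months before the last date, test on the remainder. A sketch in pandas with a synthetic date-indexed frame; in our data the index also carries the ticker.

import numpy as np
import pandas as pd

# Synthetic date-indexed frame standing in for the real (ticker, date) data.
dates = pd.bdate_range("2009-01-01", "2013-12-31")
df = pd.DataFrame({"feat": np.random.randn(len(dates))}, index=dates)

# Split at six months before the final date. Random subsampling would
# leak future information into the training set.
cutoff = df.index.max() - pd.DateOffset(months=6)
train, test = df[df.index <= cutoff], df[df.index > cutoff]
print(len(train), len(test))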