Predict the outcome of EURO 2016 soccer matches for fun and profit
Your task is to design machine learning models to predict the outcome of EURO 2016 soccer matches. Briefly explain how you chose the features and the models, and how accurate your model is.
The data is in “data.csv” (we’ve also provided “eurocup_predict.py”, the code used to generate it). To compute the accuracy of your predictions, split the data into a training part (games played before 2016) and a test part (games played in 2016 or later).
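A minimal sketch of the date-based split, assuming `data.csv` has a `Date` column in `YYYY-MM-DD` form. The function name `split_by_date` and its `cutoff` default are illustrative and not part of the provided `eurocup_predict.py`. Games from 2016 itself go to the test side, since EURO 2016 was played in 2016.

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, cutoff: str = "2016-01-01"):
    """Hypothetical helper: training rows are games before `cutoff`,
    test rows are games on or after it."""
    dates = pd.to_datetime(df["Date"])
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[dates < cutoff_ts].reset_index(drop=True)
    test = df[dates >= cutoff_ts].reset_index(drop=True)
    return train, test


# Usage (assumes data.csv is in the working directory):
# games = pd.read_csv("data.csv")
# train, test = split_by_date(games)
```

Splitting by date rather than at random keeps the evaluation honest: the model never sees games from the period it is asked to predict.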
HINT:
1. In “eurocup_predict.py” we provide some functions that parse the data from http://www.eloratings.net/ into a pandas DataFrame with the following columns: Team1, Team2, Score1, Score2, ELO1, ELO2, Date. Try modifying the code to include more features in the model.
| Team1 | Team2 | Score1 | Score2 | ELO1 | ELO2 | Date |
|---|---|---|---|---|---|---|
| England | Scotland | 4 | 2 | 1814 | 1786 | 1873-03-08 |
ELO1 and ELO2 are the Elo ratings of Team1 and Team2, respectively. The Elo rating system is a method for calculating the relative skill levels of players in games such as chess (https://en.wikipedia.org/wiki/Elo_rating_system).
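The textbook Elo expected score of Team1 against Team2 is E = 1 / (1 + 10^((ELO2 - ELO1) / 400)). The sketch below uses it as a simple baseline classifier ("the higher-rated team wins"), which any learned model should beat. It assumes the plain formula only; eloratings.net applies its own adjustments (for example for home advantage), which are ignored here. Both function names are hypothetical.

```python
def elo_expected_score(elo1: float, elo2: float) -> float:
    """Textbook Elo expected score of Team1 against Team2, in (0, 1)."""
    return 1.0 / (1.0 + 10 ** ((elo2 - elo1) / 400.0))


def elo_baseline_predict(elo1: float, elo2: float) -> int:
    """Predict 1 (Team1 wins) if its expected score exceeds 0.5, else -1.

    Equal ratings (expected score exactly 0.5) are arbitrarily predicted as -1.
    """
    return 1 if elo_expected_score(elo1, elo2) > 0.5 else -1


# England (1814) vs Scotland (1786), ratings from the example row above
print(round(elo_expected_score(1814, 1786), 3))  # prints 0.54
print(elo_baseline_predict(1814, 1786))          # prints 1
```

Because E(ELO1, ELO2) + E(ELO2, ELO1) = 1, this baseline gives opposite predictions for the two mirrored orderings of a game with unequal ratings.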
- Encode the outcome as 1 (win) and -1 (loss), and discard games in which both teams score the same number of goals (draws).
- You may want to split each game into two training data points when you generate the feature matrix. For example, for the game above, England (4) vs Scotland (2), consider including two symmetric instances: England vs Scotland with outcome 1, and Scotland vs England with outcome -1.
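A minimal sketch of the two hints above, assuming the column names produced by `eurocup_predict.py`: draws are dropped, each remaining game becomes two mirrored rows with a ±1 label, and one extra derived feature (the rating gap `ELO_diff`) is added as an example of feature engineering. The function name `build_training_rows` and the output column names are hypothetical.

```python
import pandas as pd


def build_training_rows(games: pd.DataFrame) -> pd.DataFrame:
    """Turn each decisive game into two mirrored rows with a +1/-1 label."""
    # Discard draws: only games with a winner get a label.
    decisive = games[games["Score1"] != games["Score2"]]

    # Row seen from Team1's side: 1 if Team1 won, -1 if it lost.
    forward = pd.DataFrame({
        "Team": decisive["Team1"],
        "Opponent": decisive["Team2"],
        "ELO": decisive["ELO1"],
        "ELO_opp": decisive["ELO2"],
        "Date": decisive["Date"],
        "Outcome": (decisive["Score1"] > decisive["Score2"]).astype(int) * 2 - 1,
    })
    # Mirrored row: swap the teams and flip the label (indices align with forward).
    backward = pd.DataFrame({
        "Team": decisive["Team2"],
        "Opponent": decisive["Team1"],
        "ELO": decisive["ELO2"],
        "ELO_opp": decisive["ELO1"],
        "Date": decisive["Date"],
        "Outcome": -forward["Outcome"],
    })
    rows = pd.concat([forward, backward], ignore_index=True)

    # Example extra feature: the rating gap from the row's own perspective.
    rows["ELO_diff"] = rows["ELO"] - rows["ELO_opp"]
    return rows


# Usage (assumes data.csv is in the working directory):
# rows = build_training_rows(pd.read_csv("data.csv"))
# X, y = rows[["ELO", "ELO_opp", "ELO_diff"]], rows["Outcome"]
```

Since both mirrored rows keep the game's `Date`, splitting the resulting rows by date keeps each pair on the same side of the train/test boundary, so no game leaks into both sets. Mirroring also balances the labels exactly, which makes plain accuracy a meaningful metric.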