MAFS 6010Y Assignment 1

Can we use a bandit algorithm to choose which stocks to invest in? (30%)
Deadline: 2:00PM on 28 Feb
Submit via Canvas. Late submissions will incur a 20% deduction.
Files to submit:
raw code (a Python .py file is preferred)
dataset (required if you use a real dataset)
a short report (reward, techniques used, agent algorithm)

Bandit algorithm
(try to train an agent driven by long-term rewards)
Problem setting:
Imagine you are in a casino facing multiple slot machines, each configured with an unknown probability of paying out a reward on a single play. The question is: what is the best strategy to achieve the highest long-term reward?
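
The casino setting can be simulated in a few lines. The sketch below is only an illustration: the class name `BernoulliBandit`, the helper `play_uniformly`, and the payout probabilities are all hypothetical choices, not part of the assignment. It shows the baseline an agent should beat: playing arms at random earns roughly the mean payout probability.

```python
import random


class BernoulliBandit:
    """Slot machines: arm i pays 1 with hidden probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self._rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, arm):
        """Play one arm once and return a reward of 1 or 0."""
        return 1 if self._rng.random() < self.probs[arm] else 0


def play_uniformly(bandit, n_rounds, seed=1):
    """Baseline: choose arms uniformly at random; return the average reward."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rounds):
        total += bandit.pull(rng.randrange(bandit.n_arms))
    return total / n_rounds


# Hypothetical payout probabilities; the agent would not know these.
bandit = BernoulliBandit(probs=[0.2, 0.5, 0.8])
baseline = play_uniformly(bandit, n_rounds=10_000)
print(f"average reward of random play: {baseline:.3f}")
```

A good strategy should end up well above this baseline by playing the best arm most of the time.
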
Strategy
Stochastic bandits: explore-first, epsilon-greedy
Bayesian bandits: Thompson Sampling
Contextual bandits:
Train your algorithm to maximize long-term rewards
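
As a rough illustration of the stochastic and Bayesian strategies named above, here is a minimal sketch of explore-first, epsilon-greedy, and Beta-Bernoulli Thompson Sampling on a toy problem. The function names, the `pull` callback, the payout probabilities in `TRUE_PROBS`, and the parameter defaults are assumptions made for this example only.

```python
import random


def explore_first(pull, n_arms, n_rounds, n_explore=100):
    """Play every arm n_explore times, then commit to the best one.

    Assumes n_rounds >= n_arms * n_explore.
    """
    sums = [0.0] * n_arms
    total = 0.0
    for arm in range(n_arms):
        for _ in range(n_explore):
            reward = pull(arm)
            sums[arm] += reward
            total += reward
    # Every arm was played equally often, so sums rank the arms like means.
    best = max(range(n_arms), key=lambda a: sums[a])
    remaining = n_rounds - n_arms * n_explore
    for _ in range(remaining):
        total += pull(best)
    counts = [n_explore] * n_arms
    counts[best] += remaining
    return total / n_rounds, counts


def epsilon_greedy(pull, n_arms, n_rounds, epsilon=0.1, seed=0):
    """Explore a random arm with probability epsilon, else exploit."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: means[a])
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / n_rounds, counts


def thompson_sampling(pull, n_arms, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling with a Beta(1, 1) prior per arm."""
    rng = random.Random(seed)
    wins = [0] * n_arms
    losses = [0] * n_arms
    total = 0.0
    for _ in range(n_rounds):
        # Draw one plausible payout rate per arm from its posterior.
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: draws[a])
        reward = pull(arm)
        if reward:
            wins[arm] += 1
        else:
            losses[arm] += 1
        total += reward
    return total / n_rounds, [w + l for w, l in zip(wins, losses)]


# Hypothetical test bed: three arms with payout rates unknown to the agent.
TRUE_PROBS = [0.2, 0.5, 0.8]
env_rng = random.Random(42)


def pull(arm):
    return 1 if env_rng.random() < TRUE_PROBS[arm] else 0


ef_avg, ef_counts = explore_first(pull, 3, 5000)
eg_avg, eg_counts = epsilon_greedy(pull, 3, 5000)
ts_avg, ts_counts = thompson_sampling(pull, 3, 5000)
print("explore-first:", ef_avg, ef_counts)
print("epsilon-greedy:", eg_avg, eg_counts)
print("Thompson Sampling:", ts_avg, ts_counts)
```

All three should concentrate their plays on the last arm; they differ in how much reward they give up while exploring.
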

Stock investment
Buy one or more financial products and try to make a profit
Investment:
Stock selection:
Follow the trend
Qualitative: company value, company culture, etc.
Quantitative: technical indicators (RSI, MA, crossover, etc.)
Direction and amount
Buy, sell or hold
What amount (weight in [-1, 1])
Execution
Profit realization (closing)
Evaluation: return rate, Sharpe ratio, max drawdown, turnover rate…
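
The evaluation measures listed above can be computed from a series of per-period returns and position weights. The sketch below is one possible set of definitions, not the required one: the function names are hypothetical, the Sharpe ratio is annualized assuming 252 trading periods per year, and turnover is taken here as the average absolute change in a single position weight (other definitions exist).

```python
import math
import statistics


def total_return(returns):
    """Compound per-period simple returns into one overall return rate."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio; needs at least two returns."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        return float("nan")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def turnover(weights):
    """Average absolute change in position weight between consecutive periods."""
    changes = [abs(b - a) for a, b in zip(weights, weights[1:])]
    return sum(changes) / len(changes)


# Made-up numbers purely to exercise the functions.
example_returns = [0.10, -0.50, 0.20]
example_weights = [0.0, 1.0, 1.0, -1.0]
print("return rate:", total_return(example_returns))
print("Sharpe ratio:", sharpe_ratio(example_returns))
print("max drawdown:", max_drawdown(example_returns))
print("turnover:", turnover(example_weights))
```

Any of these can serve as the reward signal or as a reporting metric, depending on the investment goal you choose.
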

In our case
Train an agent that can invest like a human being (i.e. simulate the thought process you follow when investing in stocks)
Let’s consider…
What is your investment goal: maximize your return rate, minimize max drawdown, or anything else that is reasonable [Reward]
What information can I (the agent) see to help make decisions: the current price, economic indicators, my own trading history [State]
What actions can you choose based on the information: 1/0, 1/-1/0 (e.g. buy/sell/hold), etc. [Action]
How to let the agent learn to make decisions: algorithms such as epsilon-greedy, greedy, upper confidence bound (UCB), contextual bandit, etc.
How can we tell whether the agent is an advanced investor: average reward per episode [Output]
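
One way the questions above might map onto code is sketched below with a UCB1 agent. Everything specific here is an assumption for illustration: each arm is one stock, the action is which stock to hold for the day, the reward is 1 on an up day and 0 otherwise (which keeps rewards in [0, 1] as UCB1 expects), and the daily returns are synthetic Gaussian draws with deliberately exaggerated drifts rather than real market data. The names `UCB1Agent` and `simulate` are hypothetical.

```python
import math
import random


class UCB1Agent:
    """UCB1: pick the arm with the highest mean reward plus exploration bonus."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # how often each stock was chosen
        self.means = [0.0] * n_arms  # average reward seen per stock

    def act(self):
        # Try every arm once before trusting the confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)
        scores = [m + math.sqrt(2.0 * math.log(t) / n)
                  for m, n in zip(self.means, self.counts)]
        return scores.index(max(scores))

    def learn(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(n_episodes=40, days_per_episode=250, seed=0):
    rng = random.Random(seed)
    # Hypothetical, exaggerated daily drift per stock; not real market data.
    drifts = [-0.005, 0.0, 0.005]
    volatility = 0.01
    agent = UCB1Agent(n_arms=len(drifts))
    episode_avg = []
    for _ in range(n_episodes):
        total = 0.0
        for _ in range(days_per_episode):
            arm = agent.act()                          # action: stock to hold today
            daily_return = rng.gauss(drifts[arm], volatility)
            reward = 1.0 if daily_return > 0 else 0.0  # reward: 1 on an up day
            agent.learn(arm, reward)                   # only the held stock is observed
            total += reward
        episode_avg.append(total / days_per_episode)   # output: average reward per episode
    return agent, episode_avg


agent, episode_avg = simulate()
print("times each stock was chosen:", agent.counts)
print("first / last episode average reward:", episode_avg[0], episode_avg[-1])
```

The agent keeps its estimates across episodes, so the per-episode average reward should rise as it settles on the stock with the highest drift. A plain bandit like this has no market state; adding price or indicator features would move it toward the contextual setting.
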

To wrap up
Define your problem and fit it into the bandit problem setting

Submit code
That outputs a reward chart
That trains an algorithm that converges
That I can understand
That may include some innovation
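
For the reward chart, a smoothed curve usually makes convergence easier to judge than raw per-episode rewards. The following is a minimal sketch under stated assumptions: `running_mean` and `plot_rewards` are hypothetical helper names, the output file name `reward_chart.png` is arbitrary, and matplotlib is assumed to be installed (it is imported only inside the plotting function).

```python
def running_mean(values, window):
    """Trailing moving average; shorter windows are used at the start."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


def plot_rewards(episode_rewards, window=10, path="reward_chart.png"):
    """Save a line chart of per-episode reward and its moving average."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(episode_rewards, alpha=0.4, label="per-episode reward")
    ax.plot(running_mean(episode_rewards, window),
            label=f"{window}-episode moving average")
    ax.set_xlabel("episode")
    ax.set_ylabel("average reward")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_rewards(rewards)` with a list of per-episode average rewards writes the chart to disk; a moving average that flattens out is one informal sign that training has converged.
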
