Completed

Reinforcement Learning problem

A reinforcement learning agent has two actions (a1, a2) and three states (S1, S2, S3).

After a period of interacting with the environment, we have the following values of the Q function:

Q1(S1,a1) = -2

Q2(S1,a2) = -6

Q3(S2,a1) = -4

Q4(S2,a2) = -2

Q5(S3,a1) = -4

Q6(S3,a2) = -2

Now the agent is in state S2 and it chooses action a1, receiving a reward of -1.

Assuming it stays in S2, what is the probability that action a1 is chosen again?

ε = δ = 0.1 and discount factor γ = 0.9

I think we have to use temporal difference learning.
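A minimal sketch of how the question could be worked through, not a definitive solution. It assumes several things the post does not state: that δ is the learning rate, that ε is the exploration rate of an ε-greedy policy, that the temporal difference method is one-step Q-learning, and that exploration samples uniformly from both actions. The function names are hypothetical.

```python
# Q-table from the post, keyed by (state, action).
Q = {
    ("S1", "a1"): -2.0, ("S1", "a2"): -6.0,
    ("S2", "a1"): -4.0, ("S2", "a2"): -2.0,
    ("S3", "a1"): -4.0, ("S3", "a2"): -2.0,
}
ACTIONS = ("a1", "a2")

ALPHA = 0.1    # assumption: δ in the post is the learning rate
EPSILON = 0.1  # assumption: ε is the ε-greedy exploration rate
GAMMA = 0.9    # discount factor given in the post


def q_learning_update(q, state, action, reward, next_state):
    """Return the new Q(state, action) after one Q-learning step."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + GAMMA * best_next - q[(state, action)]
    return q[(state, action)] + ALPHA * td_error


def epsilon_greedy_prob(q, state, action):
    """Probability of picking `action` in `state` under ε-greedy,
    assuming exploration samples uniformly from all actions."""
    greedy = max(ACTIONS, key=lambda a: q[(state, a)])
    explore = EPSILON / len(ACTIONS)
    return (1.0 - EPSILON) + explore if action == greedy else explore


# The agent takes a1 in S2, receives reward -1, and stays in S2.
Q[("S2", "a1")] = q_learning_update(Q, "S2", "a1", reward=-1.0, next_state="S2")
new_q = Q[("S2", "a1")]
p_a1 = epsilon_greedy_prob(Q, "S2", "a1")
print(new_q, p_a1)
```

Under these assumptions, Q(S2,a1) moves from -4 to about -3.88, so a2 (at -2) remains the greedy action in S2 and a1 is chosen only through exploration, with probability ε/2 = 0.05. If exploration were instead restricted to non-greedy actions, the probability would be ε = 0.1.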

Skills: Algorithm

About the employer:
(7 reviews) Athens, Greece

Project ID: #1684809

Awarded to:


Please read PM

$70 USD in 1 day
(0 reviews)

2 freelancers are bidding an average of $85 for this job


Hi, let's do this.

$100 USD in 2 days
(18 reviews)