In Progress

Reinforcement Learning problem

A reinforcement learning agent has two actions (a1, a2) and three states (S1, S2, S3).

After a period of interacting with the environment, we have the following values of the Q function:

Q1(S1,a1) = -2

Q2(S1,a2) = -6

Q3(S2,a1) = -4

Q4(S2,a2) = -2

Q5(S3,a1) = -4

Q6(S3,a2) = -2

Now the agent is in state S2 and chooses action a1, receiving a reward of -1.

Assuming the agent stays in S2, what is the probability that action a1 will be chosen again?

ε = δ = 0.1 and discount factor γ = 0.9

I think we have to use temporal difference learning.
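One way to approach the question with temporal difference learning is sketched below. The posting does not define δ or name the exact algorithm, so this sketch assumes δ is the learning rate (called `ALPHA` here), that the update is one-step Q-learning, and that exploration is ε-greedy with the random pick drawn uniformly from both actions. The function names are illustrative only.

```python
# Sketch under assumptions: delta is read as the learning rate (ALPHA),
# the update is one-step Q-learning, and the policy is epsilon-greedy
# with the exploratory action drawn uniformly from all actions.
Q = {
    ("S1", "a1"): -2.0, ("S1", "a2"): -6.0,
    ("S2", "a1"): -4.0, ("S2", "a2"): -2.0,
    ("S3", "a1"): -4.0, ("S3", "a2"): -2.0,
}
ACTIONS = ("a1", "a2")
EPSILON, ALPHA, GAMMA = 0.1, 0.1, 0.9


def q_learning_update(q, state, action, reward, next_state):
    """Apply one Q-learning step in place and return the updated value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + GAMMA * best_next - q[(state, action)]
    q[(state, action)] += ALPHA * td_error
    return q[(state, action)]


def epsilon_greedy_probability(q, state, action):
    """Probability that epsilon-greedy selects `action` in `state`."""
    best = max(q[(state, a)] for a in ACTIONS)
    greedy = [a for a in ACTIONS if q[(state, a)] == best]
    prob = EPSILON / len(ACTIONS)  # share from the uniform random pick
    if action in greedy:
        prob += (1.0 - EPSILON) / len(greedy)  # share from the greedy pick
    return prob


# The agent is in S2, takes a1, receives -1, and stays in S2.
new_q = q_learning_update(Q, "S2", "a1", reward=-1.0, next_state="S2")
p_a1 = epsilon_greedy_probability(Q, "S2", "a1")
print(f"Q(S2,a1) after update: {new_q:.2f}")
print(f"P(a1 | S2): {p_a1:.2f}")
```

Under these assumptions the TD target is -1 + 0.9 × (-2) = -2.8, so Q(S2,a1) moves from -4 to about -3.88. That is still below Q(S2,a2) = -2, so a1 is only chosen through exploration, with probability ε/2 = 0.05. If the exploratory pick is instead restricted to non-greedy actions, the answer would be ε = 0.1.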

Skills: Algorithm

About the Employer:
(7 reviews) Athens, Greece

Project ID: #1684809

Awarded to:

conatus

Please read PM

$70 USD in 1 day
(0 Reviews)
1.7

2 freelancers are bidding an average of $85 for this job

dobreiiita

Hi, let's do this.

$100 USD in 2 days
(18 Reviews)
4.5