In Progress

Reinforcement Learning problem

A reinforcement learning agent has two actions (a1, a2) and three states (S1, S2, S3).

After a period of interacting with the environment, we have the following values of the Q function:

Q1(S1,a1) = -2

Q2(S1,a2) = -6

Q3(S2,a1) = -4

Q4(S2,a2) = -2

Q5(S3,a1) = -4

Q6(S3,a2) = -2

Now the agent is in state S2 and it chooses action a1, receiving reward -1.

Assuming it stays in S2, what is the probability that action a1 is chosen again?

ε = δ = 0.1 and discount factor γ = 0.9

I think we have to use temporal difference learning.
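One way to read the question, as a rough sketch rather than the intended solution: the post does not say what δ is or which TD method is meant, so the code below assumes δ is the learning rate (often written α), that the update is one-step Q-learning, and that actions are picked ε-greedily with exploration drawn uniformly over both actions. The function names `q_learning_update` and `epsilon_greedy_prob` are made up for illustration.

```python
# Q-values from the problem statement, keyed by (state, action).
Q = {
    ("S1", "a1"): -2.0, ("S1", "a2"): -6.0,
    ("S2", "a1"): -4.0, ("S2", "a2"): -2.0,
    ("S3", "a1"): -4.0, ("S3", "a2"): -2.0,
}
ACTIONS = ("a1", "a2")


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Apply one Q-learning (off-policy TD) update in place; return new Q(s, a).

    Assumption: alpha is the quantity the post calls delta.
    """
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


def epsilon_greedy_prob(Q, s, a, epsilon):
    """Probability of picking action a in state s under epsilon-greedy.

    Assumption: with probability epsilon the agent picks uniformly among
    ALL actions (greedy one included), otherwise it picks the greedy action.
    """
    greedy = max(ACTIONS, key=lambda b: Q[(s, b)])
    p = epsilon / len(ACTIONS)
    if a == greedy:
        p += 1.0 - epsilon
    return p


# Agent is in S2, takes a1, gets reward -1, and stays in S2.
new_q = q_learning_update(Q, "S2", "a1", r=-1.0, s_next="S2",
                          alpha=0.1, gamma=0.9)
p_a1 = epsilon_greedy_prob(Q, "S2", "a1", epsilon=0.1)
print(new_q, p_a1)
```

Under these assumptions the update is -4 + 0.1 · (-1 + 0.9 · (-2) - (-4)) = -4 + 0.12, so Q(S2,a1) moves to about -3.88. That is still below Q(S2,a2) = -2, so a2 stays the greedy action and a1 is chosen only through exploration, with probability ε/2 = 0.05. If exploration is instead defined as picking only among non-greedy actions, the answer would be ε = 0.1.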

Skills: Algorithm


About the Employer:
( 7 reviews ) Athens, Greece

Project ID: #1684809

Awarded to:


Please read PM

$70 USD in 1 day
(0 Reviews)

2 freelancers are bidding an average of $85 for this job


Hi, let's do this.

$100 USD in 2 days
(18 Reviews)