In Progress

Monte Carlo Algorithm

Consider a 2x2 grid world (see attachment).

The cells S1, S2, S3, S4 are the states.

In each state the agent can choose one of the following actions: up, down, left, right.

S1 is the terminal state. In any other state, the agent moves to the adjacent cell in the direction of the chosen action.

For example, if the agent is in S3 and chooses the action "Right", it moves to S4 with probability 1 and receives a reward of -1.

If the selected action would take the agent outside the grid, it hits a wall and instead moves in the opposite direction, with a reward of -2. For example, choosing "Right" in S4 moves the agent left to S3.
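The movement rules above can be sketched as a small transition function. This is only an illustration: the attachment is not reproduced here, so the layout (S1 and S2 on the top row, S3 and S4 on the bottom row) is an assumption chosen to match the two examples in the description, and the names `GRID`, `MOVES` and `step` are hypothetical.

```python
# Assumed layout (the attachment is not available):
#   S1 S2
#   S3 S4
GRID = {"S1": (0, 0), "S2": (0, 1), "S3": (1, 0), "S4": (1, 1)}
CELL = {pos: name for name, pos in GRID.items()}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
TERMINAL = "S1"


def step(state, action):
    """Return (next_state, reward) for one deterministic move."""
    if state == TERMINAL:
        raise ValueError("no action is taken from the terminal state")
    row, col = GRID[state]
    d_row, d_col = MOVES[action]
    target = (row + d_row, col + d_col)
    if target in CELL:
        # Normal move inside the grid: reward -1.
        return CELL[target], -1
    # Wall hit: the agent moves in the opposite direction, reward -2.
    bounced = (row - d_row, col - d_col)
    return CELL[bounced], -2


print(step("S3", "right"))  # ('S4', -1)
print(step("S4", "right"))  # ('S3', -2)
```

Both printed transitions are the ones given in the description; every other transition follows from the assumed layout.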

Initially, Q(S, a) = 0 for every state S and action a.

Apply the every-visit Monte Carlo algorithm with exploring starts to one episode of 3 steps.

What will be the policy of the agent after the episode and why?
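As a rough sketch of how the every-visit update plays out, the code below processes one hypothetical 3-step episode. Several things here are assumptions, not part of the task: the episode itself (it starts in S4 with "right" and uses only the two transitions quoted in the description), a discount factor of 1, and the helper names `every_visit_mc_update` and `greedy_actions`. A different starting pair would give different values.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]


def every_visit_mc_update(episode, gamma=1.0):
    """Average the return over every visit of each (state, action) pair.

    episode: list of (state, action, reward) tuples in time order.
    gamma:   discount factor (assumed to be 1, the task does not give one).
    """
    returns = defaultdict(list)
    g = 0.0
    # Walk backwards so g is the return that follows each step.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        returns[(state, action)].append(g)
    return {pair: sum(gs) / len(gs) for pair, gs in returns.items()}


def greedy_actions(q, state):
    """All actions that maximise Q in a state; unvisited pairs keep Q = 0."""
    values = {a: q.get((state, a), 0.0) for a in ACTIONS}
    best = max(values.values())
    return [a for a in ACTIONS if values[a] == best]


# Hypothetical episode: S4 -right-> S3 (-2), S3 -right-> S4 (-1), S4 -right-> S3 (-2)
episode = [("S4", "right", -2), ("S3", "right", -1), ("S4", "right", -2)]
q = every_visit_mc_update(episode)

print(q[("S4", "right")])       # -3.5, the mean of the returns -5 and -2
print(q[("S3", "right")])       # -3.0
print(greedy_actions(q, "S4"))  # ['up', 'down', 'left']
```

For this particular episode, only the pairs that were visited receive negative estimates, so a greedy policy would avoid "right" in S3 and S4 and remain indifferent among the untried actions, which still have Q = 0.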

Skills: Algorithm


About the Employer:
(7 reviews) Athens, Greece

Project ID: #1085542

Awarded to:

dobreiiita

Hi, please check your inbox. Thanks.

$35 USD in 0 days
(18 Reviews)
4.8

3 freelancers are bidding an average of $35 for this job

ronobir1

Hello friend, please check PM.

$40 USD in 1 day
(6 Reviews)
3.4

topcoder0

I can do it

$30 USD in 1 day
(0 Reviews)
0.0