Question
The following MDP world consists of 5 states and 3 actions:
(1, 1): Actions: down, right
(1, 2): Action: Exit = -10
(2, 1): Actions: down, right
(2, 2): Action: Exit = -10
(3, 1): Action: Exit = 10
When taking action down, it is successful with probability 0.7; with probability 0.2 you go right instead, and with probability 0.1 you stay in place.
When taking action right, it is successful with probability 0.7; with probability 0.2 you go down instead, and with probability 0.1 you stay in place.
When taking action Exit, it is successful with probability 1.0.
The only reward is when taking action Exit, and there is no discounting.
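Since the only reward comes from Exit and there is no discounting, the value-iteration backup for a non-exit state s reduces to

V_{k+1}(s) = max over a in {down, right} of Σ_{s'} T(s, a, s') · V_k(s'),

with V_0 = 0 everywhere, and V_k(s) equal to the exit reward for every exit state s and every k ≥ 1.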
Calculate the values of the states using the Value Iteration algorithm for the required time step:
Provide the value of state (1,2) at time step 4 (to 3 decimal places):
V₄(1,2) =
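A minimal Python sketch of the computation, assuming the grid reading above: (1,1) and (2,1) are the non-exit states, a move action slips to the perpendicular direction with probability 0.2 and stays put with probability 0.1, and exit states are absorbing. The state tuples and the TRANSITIONS table are my reconstruction from the transcription, not code given in the source.

```python
# Finite-horizon value iteration for the 5-state grid MDP above.
# Exit states and their rewards (assumed grid reading, see lead-in).
EXIT_REWARD = {(1, 2): -10.0, (2, 2): -10.0, (3, 1): 10.0}

# Transition model for the non-exit states: action -> [(prob, next_state)].
TRANSITIONS = {
    (1, 1): {
        "down":  [(0.7, (2, 1)), (0.2, (1, 2)), (0.1, (1, 1))],
        "right": [(0.7, (1, 2)), (0.2, (2, 1)), (0.1, (1, 1))],
    },
    (2, 1): {
        "down":  [(0.7, (3, 1)), (0.2, (2, 2)), (0.1, (2, 1))],
        "right": [(0.7, (2, 2)), (0.2, (3, 1)), (0.1, (2, 1))],
    },
}

def value_iteration(steps):
    """Return the list [V_0, V_1, ..., V_steps] of value tables."""
    states = list(TRANSITIONS) + list(EXIT_REWARD)
    tables = [{s: 0.0 for s in states}]  # V_0 = 0 everywhere
    for _ in range(steps):
        prev, cur = tables[-1], {}
        for s in states:
            if s in EXIT_REWARD:
                # Exit is the only action here: collect the reward, then terminal.
                cur[s] = EXIT_REWARD[s]
            else:
                # No move reward, no discount: back up the expected next value.
                cur[s] = max(
                    sum(p * prev[nxt] for p, nxt in outcomes)
                    for outcomes in TRANSITIONS[s].values()
                )
        tables.append(cur)
    return tables

V = value_iteration(4)
print(f"V_4(1,2) = {V[4][(1, 2)]:.3f}")  # -10.000 (exit state)
print(f"V_4(1,1) = {V[4][(1, 1)]:.3f}")  # value of the non-exit corner
```

Under this reading, (1,2) is an exit state whose only action is Exit with reward -10, so V_k(1,2) = -10.000 for every k ≥ 1, and in particular at time step 4.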