Question
The following MDP world consists of 5 states and 3 actions:
(1, 1): Actions: down, right
(1, 2): Action: Exit = -10
(2, 1): Actions: down, right
(2, 2): Action: Exit = -10
(3, 1): Action: Exit = 10
When taking action down, it is successful with probability 0.7; with probability 0.2 you go right instead, and with probability 0.1 you stay in place.
When taking action right, it is successful with probability 0.7; with probability 0.2 you go down instead, and with probability 0.1 you stay in place.
When taking action Exit, it is successful with probability 1.0.
The only reward is when taking action Exit, and there is no discounting.
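Since the only reward comes from Exit and there is no discounting, the value-iteration backup for a non-exit state s reduces to

V_{k+1}(s) = max over a in {down, right} of Σ_{s'} T(s, a, s') · V_k(s'),

with V_0 = 0 everywhere, and V_k(s) equal to the exit reward for every exit state s and every k ≥ 1.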
Calculate the values of the states using the Value Iteration algorithm for the required time step:
Provide the value of state (1,2) at time step 4 (to 3 decimal places):
V₄(1,2) =
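A minimal Python sketch of the computation, assuming the grid reading above: (1,1) and (2,1) are the non-exit states, a move action slips to the perpendicular direction with probability 0.2 and stays put with probability 0.1, and exit states are absorbing. The state tuples and the TRANSITIONS table are my reconstruction from the transcription, not code given in the source.

```python
# Finite-horizon value iteration for the 5-state grid MDP above.
# Exit states and their rewards (assumed grid reading, see lead-in).
EXIT_REWARD = {(1, 2): -10.0, (2, 2): -10.0, (3, 1): 10.0}

# Transition model for the non-exit states: action -> [(prob, next_state)].
TRANSITIONS = {
    (1, 1): {
        "down":  [(0.7, (2, 1)), (0.2, (1, 2)), (0.1, (1, 1))],
        "right": [(0.7, (1, 2)), (0.2, (2, 1)), (0.1, (1, 1))],
    },
    (2, 1): {
        "down":  [(0.7, (3, 1)), (0.2, (2, 2)), (0.1, (2, 1))],
        "right": [(0.7, (2, 2)), (0.2, (3, 1)), (0.1, (2, 1))],
    },
}

def value_iteration(steps):
    """Return the list [V_0, V_1, ..., V_steps] of value tables."""
    states = list(TRANSITIONS) + list(EXIT_REWARD)
    tables = [{s: 0.0 for s in states}]  # V_0 = 0 everywhere
    for _ in range(steps):
        prev, cur = tables[-1], {}
        for s in states:
            if s in EXIT_REWARD:
                # Exit is the only action here: collect the reward, then terminal.
                cur[s] = EXIT_REWARD[s]
            else:
                # No move reward, no discount: back up the expected next value.
                cur[s] = max(
                    sum(p * prev[nxt] for p, nxt in outcomes)
                    for outcomes in TRANSITIONS[s].values()
                )
        tables.append(cur)
    return tables

V = value_iteration(4)
print(f"V_4(1,2) = {V[4][(1, 2)]:.3f}")  # -10.000 (exit state)
print(f"V_4(1,1) = {V[4][(1, 1)]:.3f}")  # value of the non-exit corner
```

Under this reading, (1,2) is an exit state whose only action is Exit with reward -10, so V_k(1,2) = -10.000 for every k ≥ 1, and in particular at time step 4.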