Vitamin 10 | Data 102

.pdf

School

University of California, Berkeley *

*We aren’t endorsed by this school

Course

102

Subject

Computer Science

Date

May 8, 2024

Type

pdf

Pages

Uploaded by MegaWorld8301 on coursehero.com

5/6/24, 7 : 57 PM Submit Vitamin 10 | Gradescope Page 1 of 11 https://www.gradescope.com/courses/742764/assignments/4324403/submissions/new 10/10 Questions Answered Saved on Apr 17 at 1:22 PM Vitamin 10 Q1 1 Point Calculate the Hoe ff ding bound for a variable which is the sample mean of 10 random variables which are each bounded between 3 and 8. In other words, compute a bound for the following probability where is the expected value of . Explanation Hoe ff ding’s inequality states . Remember that is the sample mean of 10 variables which are bounded on the interval . Therefore, . Then, we can rewrite the probability as which is a direct expression of Hoe ff ding's inequality where . This is Z P ( Z − μ ≥ t ) μ Z 10 10 E [ X ] i 25 2 1 t >=0 min e 10 t E [ e ] t ( Z − μ ) exp − ( 5 4 t 2 ) P ( X − E [ X ]) ≥ ϵ ≤ ( n 1 ∑ i =1 n i i ) exp − ( ( b − a ) 2 2 n ϵ 2 ) Z X i [ a , b ] = [3, 8] Z = X 10 1 ∑ i =1 10 i P ( Z − μ ≥ t ) P ( X − E [ X ]) ≥ t ( n 1 ∑ i =1 10 i i ) n = 10 2 ⋅ 10 t 2

5/6/24, 7 : 57 PM Submit Vitamin 10 | Gradescope Page 2 of 11 https://www.gradescope.com/courses/742764/assignments/4324403/submissions/new then upper bounded by . Save Answer Last saved on Apr 17 at 9:16 AM Q2 Bandits 2 Points Let be the set of arms in the setting of the bandits problem. For each arm , let the random variable be the random reward received for pulling arm at time and let be the expected reward from pulling arm . Finally, let the random variable be the arm pulled at time (so is the reward received at time ). All expectations (e.g., , ) are taken over the randomness in the payouts as well as the arm pulled at each timestep. Q2.1 Regret 1 Point Select all of the definitions which are true. exp − = ( (8 − 3) 2 2 ⋅ 10 t 2 ) exp( − = ( 5 2 20 t 2 ) exp − ( 5 4 t 2 ) A a ∈ A X i a a i μ = a E[ X ] i a a A i i X i A i i E[ X ] i a E[ X ] i A i X i a A i The regret is . R ( t ) X − X i =1 ∑ t ( a ∈ A max i a i A i ) The regret is . R ( t ) μ − X i =1 ∑ t ( a ∈ A max a i A i ) The regret is . R ( t ) X − E[ X ] i =1 ∑ t ( a ∈ A max i a i A i )

5/6/24, 7 : 57 PM Submit Vitamin 10 | Gradescope Page 3 of 11 https://www.gradescope.com/courses/742764/assignments/4324403/submissions/new Explanation This is by the definition of regret. Simplifying gets the second correct option. Save Answer Last saved on Apr 17 at 9:43 AM Q2.2 Expected Regret and Pseudo-Regret 1 Point Select all of the definitions which are true. i =1 The regret is . R ( t ) μ − E[ X ] i =1 ∑ t ( a ∈ A max a i A i ) The regret is R ( t ) t μ − a ∈ A max a X i =1 ∑ t i A i The regret is R ( t ) t μ − a ∈ A max a E[ X ] i =1 ∑ t i A i μ = i =1 ∑ t a ∈ A max a t μ a ∈ A max a The expected regret is the expectation of the regret over the randomness in the payout, conditioned on the sequence of choices made. The expected regret is the expectation of the regret over the randomness in the payout and the sequence of choices made. X − X ∑ t ( a A )

5/6/24, 7 : 57 PM Submit Vitamin 10 | Gradescope Page 4 of 11 https://www.gradescope.com/courses/742764/assignments/4324403/submissions/new Explanation By the definitions of expected regret and pseudo-regret. Pseudo-regret is conditioned on the sequence of choices made, so it is , where is the arm pulled at timestep . The expected regret is . X − X i =1 ∑ ( a ∈ A max i a i A i ) The expected regret is . X − E[ X ] i =1 ∑ t ( a ∈ A max i a i A i ) The expected regret is μ − E[ X ] i =1 ∑ t ( a ∈ A max a i A i ) The pseudo-regret is the expectation of the regret over the randomness in the payout, conditioned on the sequence of choices made. The pseudo-regret is the expectation of the regret over the randomness in the payout and the sequence of choices made. The pseudo-regret is . X − X i =1 ∑ t ( a ∈ A max i a i A i ) The pseudo-regret is . X − E[ X ] i =1 ∑ t ( a ∈ A max i a i A i ) The pseudo-regret is μ − E[ X ] i =1 ∑ t ( a ∈ A max a i A i ) μ − μ i =1 ∑ t ( a ∈ A max a a i ) a i i

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version