

Question 1163991: In a poll conducted prior to a certain election, 37% of respondents said they preferred candidate 𝐴, 33% said they preferred candidate 𝐵, and 30% said they preferred candidate 𝐶. At least how many people would have to be surveyed for you to be confident that candidate 𝐴 is truly in the lead, candidate 𝐵 is truly in second place, and candidate 𝐶 is truly in third place?
𝑛>____?
(Enter a number only.)
Answer by CPhill(2264):
```python
from scipy.stats import norm

def find_n_multinomial_order(p, conf=0.95):
    p_a, p_b, p_c = p

    # Hypothesis: pi_a > pi_b AND pi_b > pi_c
    # We use the Wald-type approach for differences:
    # d1 = p_a - p_b (must be > 0)
    # d2 = p_b - p_c (must be > 0)

    # Variance of the difference of sample proportions p_i - p_j is
    # [p_i(1-p_i) + p_j(1-p_j) + 2*p_i*p_j] / n,
    # because cov(p_i, p_j) = -p_i * p_j / n in a multinomial sample.

    # Two-sided critical value: 1.96 for conf=0.95, 1.645 for conf=0.90
    z = norm.ppf(1 - (1 - conf) / 2)

    # We want the lower bound of the CI for p_a - p_b to be > 0:
    # (p_a - p_b) - z * sqrt([p_a(1-p_a) + p_b(1-p_b) + 2*p_a*p_b] / n) > 0
    # => n > z^2 * [p_a(1-p_a) + p_b(1-p_b) + 2*p_a*p_b] / (p_a - p_b)^2
    var_ab = p_a * (1 - p_a) + p_b * (1 - p_b) + 2 * p_a * p_b
    n1 = (z**2 * var_ab) / (p_a - p_b)**2

    # Same condition for the lower bound of the CI for p_b - p_c:
    var_bc = p_b * (1 - p_b) + p_c * (1 - p_c) + 2 * p_b * p_c
    n2 = (z**2 * var_bc) / (p_b - p_c)**2

    return n1, n2

p = [0.37, 0.33, 0.30]
n1, n2 = find_n_multinomial_order(p)
print(f"95% Confidence: n1={n1}, n2={n2}")

# Let's check 90% as well
n1_90, n2_90 = find_n_multinomial_order(p, 0.90)
print(f"90% Confidence: n1={n1_90}, n2={n2_90}")
```
```text
95% Confidence: n1=1676.7967752329873, n2=2685.1797156651887
90% Confidence: n1=1180.9697177126488, n2=1891.17487441269
```
To determine the minimum sample size ($n$) required to be confident that the candidates are truly in the order $A > B > C$, we need to ensure that the differences between their support levels are **statistically significant**.
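In a multinomial distribution (where each respondent chooses exactly one of three categories), the sample proportions are negatively correlated, with $\text{Cov}(p_i, p_j) = -p_ip_j/n$. The variance of the difference between two sample proportions $p_i$ and $p_j$ therefore picks up a $+2p_ip_j$ term: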
$$\text{Var}(p_i - p_j) = \frac{p_i(1-p_i) + p_j(1-p_j) + 2p_ip_j}{n}$$
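As a quick sanity check on the covariance term, here is a minimal Monte Carlo sketch. It assumes, for illustration only, that the true shares equal the observed 37/33/30 split and that the poll size is n = 1000; it simulates many polls with NumPy's `Generator.multinomial` and compares the empirical variance of the difference with the formula. The helper names `simulated_var_diff` and `formula_var_diff` are hypothetical.

```python
import numpy as np

def simulated_var_diff(p, i, j, n, trials=200_000, seed=0):
    """Empirical variance of (p_i - p_j) across simulated polls of size n."""
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, p, size=trials)  # shape (trials, len(p))
    diffs = (counts[:, i] - counts[:, j]) / n    # sample-proportion differences
    return diffs.var()

def formula_var_diff(p, i, j, n):
    """Var(p_i - p_j) = [p_i(1-p_i) + p_j(1-p_j) + 2 p_i p_j] / n."""
    pi, pj = p[i], p[j]
    return (pi * (1 - pi) + pj * (1 - pj) + 2 * pi * pj) / n

p = [0.37, 0.33, 0.30]  # assumed true shares (the observed poll result)
n = 1000                # arbitrary poll size for the check
print(simulated_var_diff(p, 0, 1, n), formula_var_diff(p, 0, 1, n))
```

The two printed values should agree closely. Dropping the $2p_ip_j$ term would give about 0.000454 for $A$ vs $B$ at this $n$ instead of about 0.000698, so ignoring the negative covariance would understate the variance by roughly a third.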
To be confident in the ranking, we must satisfy two conditions simultaneously:
1. **Condition 1 ($A > B$):** The observed difference $p_A - p_B = 0.04$ must exceed its margin of error, so that the lower bound of its confidence interval is greater than $0$.
2. **Condition 2 ($B > C$):** The observed difference $p_B - p_C = 0.03$ must exceed its margin of error, so that the lower bound of its confidence interval is greater than $0$.
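The two conditions can be checked directly for any proposed sample size. The sketch below assumes the same Wald interval as the code above and uses the standard library's `statistics.NormalDist().inv_cdf` to get the same two-sided critical value that SciPy's `norm.ppf` returns; `lower_bound` and `ranking_confirmed` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist

def lower_bound(p_hi, p_lo, n, z):
    """Lower end of the Wald interval for p_hi - p_lo (both from the same poll)."""
    var = p_hi * (1 - p_hi) + p_lo * (1 - p_lo) + 2 * p_hi * p_lo
    return (p_hi - p_lo) - z * sqrt(var / n)

def ranking_confirmed(p, n, conf=0.95):
    """True only if both the A-B and the B-C lower bounds are above 0."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for conf=0.95
    p_a, p_b, p_c = p
    return lower_bound(p_a, p_b, n, z) > 0 and lower_bound(p_b, p_c, n, z) > 0

p = (0.37, 0.33, 0.30)
for n in (2000, 2685, 2686):
    print(n, ranking_confirmed(p, n))
# 2000 False   (A-B condition holds, B-C does not)
# 2685 False
# 2686 True
```

At $n = 2000$ the $A$ vs $B$ bound is already positive, so it is the $B$ vs $C$ condition that decides the answer.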
Using a standard **95% confidence level** ($Z \approx 1.96$):
* For $A$ vs $B$: $n > \frac{1.96^2 \times (0.37 \cdot 0.63 + 0.33 \cdot 0.67 + 2 \cdot 0.37 \cdot 0.33)}{0.04^2} = \frac{1.96^2 \times 0.6984}{0.0016} \approx 1676.8$
* For $B$ vs $C$: $n > \frac{1.96^2 \times (0.33 \cdot 0.67 + 0.30 \cdot 0.70 + 2 \cdot 0.33 \cdot 0.30)}{0.03^2} = \frac{1.96^2 \times 0.6291}{0.0009} \approx 2685.2$
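The two thresholds can also be computed in closed form. Expanding the numerator gives $p_i(1-p_i) + p_j(1-p_j) + 2p_ip_j = p_i + p_j - (p_i - p_j)^2$, which the sketch below uses before rounding up to the smallest whole number strictly above the threshold. This is a minimal sketch under the same Wald-interval assumption; `min_n_for_gap` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def min_n_for_gap(p_i, p_j, conf=0.95):
    """Smallest integer n with n > z^2 * [p_i + p_j - d^2] / d^2, d = p_i - p_j."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    d = p_i - p_j
    # p_i(1-p_i) + p_j(1-p_j) + 2 p_i p_j simplifies to p_i + p_j - d^2
    threshold = z**2 * (p_i + p_j - d**2) / d**2
    return math.floor(threshold) + 1  # strict inequality: n must exceed threshold

n_ab = min_n_for_gap(0.37, 0.33)
n_bc = min_n_for_gap(0.33, 0.30)
print(n_ab, n_bc)  # 1677 2686
```

With `conf=0.90` the same helper gives 1181 and 1892, in line with the 90% figures printed by the original code.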
The smaller gap (3 points between candidate $B$ and candidate $C$) gives the more restrictive condition. Since both conditions must hold to be confident in the entire 1-2-3 ranking, $n$ must exceed the larger threshold: $n > 2685.2$. For a whole number of respondents this means $n > 2685$, i.e., at least 2,686 people.

**Answer:** 2685