Question 575425
I'll assume that all five students (A, B, ..., E) got the other 42 questions correct.


Suppose we fix student A's set of responses. We want the probability that student B produced the same set of responses (the same 42 right and the same 8 wrong) purely by chance. If the probability of answering any one question correctly is 1/5, it is pretty simple:


*[tex \LARGE P(B = A) = \left(\frac{1}{5}\right)^{42} \left(\frac{4}{5}\right)^8]. We don't multiply by 50C8 as we would for a binomial probability, because it matters which questions were missed: B has to miss the same 8 questions as A, not just any 8. This probability is roughly 7.38*10^(-31), pretty small. Students C, D, and E also have to match A, so raise this number to the fourth power (one factor for each of B, C, D, E); you'll get about 3*10^(-121), a truly minuscule number.
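If you want to check the arithmetic, here is a minimal Python sketch. It assumes the setup used in this answer (50 questions, 5 equally likely choices, 42 right and 8 wrong); the variable names are just for illustration.

```python
# Chance that B reproduces A's right/wrong pattern under pure guessing.
# Assumed setup: 5 equally likely choices, 42 questions right, 8 wrong.
p_correct = 1 / 5
n_right, n_wrong = 42, 8

# One fixed pattern of rights and wrongs, so there is no 50C8 factor.
p_match = p_correct**n_right * (1 - p_correct)**n_wrong

# B, C, D and E must each match A; assuming they work independently,
# the four probabilities multiply.
p_all_four = p_match**4

print(f"P(B = A)                 = {p_match:.3e}")
print(f"P(B, C, D, E all match A) = {p_all_four:.3e}")
```

Both numbers are well within the range of ordinary floating point, so no special big-number handling is needed here.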


However, the probability of a correct answer is not 20%, since students generally aren't guessing at random. Assume the probability of a correct answer on each question is p; then


*[tex \LARGE P(B = A) = p^{42}(1-p)^8]


If p = .99, then P(B = A) = 6.56*10^(-17). You can use calculus or inequalities to maximize P(B = A) over p: the maximum is at p = 42/50 = .84, where P(B = A) is about 2.8*10^(-10), so it is still small. Note that you have to raise this probability to the fourth power to account for the four students B, C, D, E, who are supposedly independent of each other. They're probably cheating.
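To see how P(B = A) behaves for different values of p, here is a rough Python sketch. The function name `match_probability` and the grid search are my own illustration, and it assumes every question has the same p, which is a simplification.

```python
def match_probability(p, n_right=42, n_wrong=8):
    """P(B = A) = p^42 (1-p)^8 when each question is answered correctly
    with the same probability p (an assumption of this model)."""
    return p**n_right * (1 - p)**n_wrong

# The value used in the text.
print(f"p = 0.99: {match_probability(0.99):.3e}")

# Calculus gives the maximizer p = 42/50; a coarse grid search over
# p = 0.001, 0.002, ..., 0.999 is a quick sanity check of that.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=match_probability)
best_value = match_probability(best_p)
print(f"best p on grid: {best_p}, P(B = A) = {best_value:.3e}")

# Four students (B, C, D, E) matching A, assuming independence.
print(f"fourth power at best p: {best_value**4:.3e}")
```

Even in the most favorable case for the students, the probability that all four match A by chance stays astronomically small.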


There are definitely exceptions though, e.g. if the eight questions were extremely difficult while the other 42 were extremely easy. Then I wouldn't be all that suspicious.