An e-mail filter is planned to separate valid e-mails from spam.
The word "free" occurs in 60% of the spam messages and in only 4% of the valid messages.
Also, 20% of the messages are spam. Determine the probabilities:
(a) the message contains "free".
(b) the message is spam, given that it contains "free".
(c) the message is valid, given that it does not contain "free".
~~~~~~~~~~~~~
A couple of notes before we start:
(1) 20% (or 0.2) of messages are spam, hence (every message being either spam or valid) 80% (or 0.8) of messages are valid.
(2) 4% (or 0.04) of valid messages contain "free", hence 1-0.04 = 0.96 of valid messages DO NOT contain "free".
(a) By the law of total probability,
P(a message contains "free") = 0.6*P(spam) + 0.04*P(valid) = 0.6*0.2 + 0.04*0.8 = 0.12 + 0.032 = 0.152. ANSWER
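For those who like to check the arithmetic by machine, here is a minimal Python sketch of part (a). The variable names are my own choice, not part of the problem; exact fractions are used so that no rounding enters the check.

```python
from fractions import Fraction

# Given data from the problem statement (variable names are illustrative)
p_spam = Fraction(20, 100)             # P(spam)
p_valid = 1 - p_spam                   # P(valid) = 0.8, see note (1)
p_free_given_spam = Fraction(60, 100)  # P("free" | spam)
p_free_given_valid = Fraction(4, 100)  # P("free" | valid)

# Law of total probability over the two classes, spam and valid
p_free = p_free_given_spam * p_spam + p_free_given_valid * p_valid

print(p_free, float(p_free))  # 19/125 0.152
```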
(b) This is a conditional probability:
P(spam | contains "free") = P(spam AND contains "free") / P(contains "free")
= (0.6*0.2) / 0.152 = 0.12 / 0.152 = 0.7895 (rounded). ANSWER
Note on the calculation: the denominator of the fraction, P(contains "free"), was already calculated in part (a) as 0.152.
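Part (b) is an instance of Bayes' rule. As a hedged sketch, the small helper below (the name posterior_spam is hypothetical, made up for this illustration) recomputes the answer from the given percentages.

```python
from fractions import Fraction

def posterior_spam(p_spam, p_free_given_spam, p_free_given_valid):
    """Return P(spam | "free") by Bayes' rule for a two-class split."""
    joint_spam = p_free_given_spam * p_spam          # P(spam AND "free")
    joint_valid = p_free_given_valid * (1 - p_spam)  # P(valid AND "free")
    # The denominator is P("free"), the same quantity as in part (a)
    return joint_spam / (joint_spam + joint_valid)

p_spam_given_free = posterior_spam(Fraction(1, 5), Fraction(3, 5), Fraction(1, 25))

print(p_spam_given_free, round(float(p_spam_given_free), 4))  # 15/19 0.7895
```

So the exact value is 15/19, which rounds to 0.7895.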
(c) This is again a conditional probability:
P(valid | does not contain "free") = P(valid AND does not contain "free") / P(does not contain "free")
= ((1-0.04)*0.8) / (1-0.152) = 0.768 / 0.848 = 0.9057 (rounded). ANSWER
Note on the calculation: the factor (1-0.04) comes from note (2) and is multiplied by P(valid) = 0.8 from note (1); (1-0.152) is P(does not contain "free"), the complement of the result of part (a).
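As a cross-check of part (c), the sketch below computes P(does not contain "free") directly from the two classes instead of taking the complement of part (a), and then forms the conditional probability. Again, the variable names are only illustrative.

```python
from fractions import Fraction

p_valid = Fraction(4, 5)                     # P(valid), note (1)
p_nofree_given_valid = 1 - Fraction(1, 25)   # 0.96, note (2)
p_nofree_given_spam = 1 - Fraction(3, 5)     # 0.4, complement of 60%

# P(no "free") by total probability; it should equal 1 - 0.152 = 0.848
p_nofree = p_nofree_given_valid * p_valid + p_nofree_given_spam * (1 - p_valid)

# Conditional probability P(valid | no "free")
p_valid_given_nofree = p_nofree_given_valid * p_valid / p_nofree

print(p_nofree, p_valid_given_nofree, round(float(p_valid_given_nofree), 4))
# 106/125 48/53 0.9057
```

Both routes give the same denominator, 106/125 = 0.848, and the exact answer is 48/53.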
Solved and completed.