9 votes

How do you compute the probability of covering an entire population given you take an arbitrary number of random samples?

I suck at probability, so I thought I would ask here.

To clarify, given a population of size P, a sample size of K, and an arbitrary number of trials N, how do I compute the probability of having included each member of the population at least once in the experiment?

This problem is difficult to wrap my head around. It seems like it uses a combination of combinatorics and dependent events, which really throws me off.

Edit: This problem isn't the coupon collector's problem (please see some of my responses below). Think of the coupon collector's problem as being a special case of this problem where K = 1. My question is meant to cover an arbitrary K >= 1.

13 comments

  1. [4]
    GraveAccent
    (edited )
    Link
    After some thinking and googling, I found this result: https://math.stackexchange.com/questions/28905/expected-time-to-roll-all-1-through-6-on-a-die In which the best answer says that THIS type of...

    After some thinking and googling, I found this result:

    https://math.stackexchange.com/questions/28905/expected-time-to-roll-all-1-through-6-on-a-die

    In which the best answer says that THIS type of problem is known as the "Coupon Collectors" problem -- something I've never heard of before!!

    https://en.wikipedia.org/wiki/Coupon_collector's_problem

    Now i'm watching youtube videos to understand it better ha

    Edit:
    So, using a sample size of 1, the number of trials needed for population size P is:

    P*( γ + ln (P))

    Where γ is the Euler–Mascheroni constant = 0.5772156649015

    6 votes
    1. [3]
      Emerald_Knight
      Link Parent
      I've found the coupon collector's problem wikipedia page and the problem space is similar in some respects, but the key difference is that in my scenario I want to calculate the probability when...

      I've found the coupon collector's problem wikipedia page and the problem space is similar in some respects, but the key difference is that in my scenario I want to calculate the probability when grabbing a batch of coupons from the bin before replacing them. This fundamentally impacts the nature of the problem by throwing in an additional dependent event.

      I suspect that the solution to the coupon collector's problem will play a part, but it's definitely not a complete solution to the problem I've described.

      1 vote
      1. [2]
        GraveAccent
        Link Parent
        Well the math I posted above is again for sample size 1. To find K>=1, you need to change the expectation of each of the remaining examples. Its essentially going through the Coupon Collectors...

        Well the math I posted above is again for sample size 1. To find K>=1, you need to change the expectation of each of the remaining examples. Its essentially going through the Coupon Collectors equations, only changing the summation of probabilities.

        I found this link, you could try walking through this math:

        https://math.stackexchange.com/questions/131664/coupon-collector-problem-with-batched-selections

        2 votes
        1. Emerald_Knight
          Link Parent
          Yeah, I see that now. Thanks for the link! I'll be sure to see if I can wrap my head around this with this discussion in mind.

          Yeah, I see that now. Thanks for the link! I'll be sure to see if I can wrap my head around this with this discussion in mind.

  2. Emerald_Knight
    Link
    Example scenario: The probability of the numbers 1-6 appearing at least once each after N rolls of a pair of dice.

    Example scenario: The probability of the numbers 1-6 appearing at least once each after N rolls of a pair of dice.

    2 votes
  3. [2]
    unknown user
    Link
    The solution is beyond me, but it looks like maybe this has been asked and answered over on Mathematics Stack Exchange? It's certainly an interesting question.

    The solution is beyond me, but it looks like maybe this has been asked and answered over on Mathematics Stack Exchange? It's certainly an interesting question.

    2 votes
    1. Emerald_Knight
      Link Parent
      Unfortunately that problem only applies to single-item draws, not batch-item draws. The nature of the batch draws fundamentally alters the problem. It's a tricky one :)

      Unfortunately that problem only applies to single-item draws, not batch-item draws. The nature of the batch draws fundamentally alters the problem. It's a tricky one :)

  4. [4]
    aphoenix
    Link
    This is a restatement of the Coupon Collector's Problem which is fairly in depth. Wikipedia does give a pretty decent solution. Edit: not immediately "getting" the Coupon Collector's problem...

    This is a restatement of the Coupon Collector's Problem which is fairly in depth. Wikipedia does give a pretty decent solution.

    Edit: not immediately "getting" the Coupon Collector's problem certainly doesn't mean you suck at probability, which is deceptively difficult.

    2 votes
    1. [3]
      Emerald_Knight
      Link Parent
      The coupon collector's problem assumes a single item is drawn and immediately replaced. My problem is just different enough to matter. In my case, the items are drawn in batches before being...

      The coupon collector's problem assumes a single item is drawn and immediately replaced. My problem is just different enough to matter. In my case, the items are drawn in batches before being replaced.

      To give an extreme example, if I draw 50 items from a 50-item pool before replacement, then I have a 100% probability of getting them all in the first trial. If, however, I draw them individually and replace them each time, then the probability will be significantly less than 100%.

      1 vote
      1. [2]
        aphoenix
        Link Parent
        I'm pretty sure that a solution exists for batch selection in the Coupon Collector's Problem (though obviously not in the Wikipedia article that I linked). Have a read through this answer on the...

        I'm pretty sure that a solution exists for batch selection in the Coupon Collector's Problem (though obviously not in the Wikipedia article that I linked).

        Have a read through this answer on the math stack exchange - I think it'll get you most of the way there.

        Again, just to hammer on an earlier point:

        I suck at probability

        That's probably an unfair thing to say about yourself considering that I have a math degree and I had to read the solution twice to get it.

        2 votes
        1. Emerald_Knight
          Link Parent
          The problem is that I suck at probability in general ;)

          That's probably an unfair thing to say about yourself considering that I have a math degree and I had to read the solution twice to get it.

          The problem is that I suck at probability in general ;)

  5. Rocket_Man
    Link
    This is neat, I think I know how to calculate the probability of a single person being chosen over an arbitrary number of trials. However I'm not sure that's the same as ensuring every person was...

    This is neat, I think I know how to calculate the probability of a single person being chosen over an arbitrary number of trials. However I'm not sure that's the same as ensuring every person was chosen.

  6. arghdos
    Link
    This is a good question. I'm pretty sure you could get some bounds on the probabilities by putting together a Markov chain, where you states (using the dice rolling example) were the number of...

    This is a good question. I'm pretty sure you could get some bounds on the probabilities by putting together a Markov chain, where you states (using the dice rolling example) were the number of unique numbers (U) you have seen thus far.

    For example, at U=0, you have Pn=1 chance of seeing a new number, and Po=0 chance of seeing a repeat.
    At U=1, you have Pn=5/6 chance of seeing a new number, and Po=1/6 chance of seeing a repeat.
    For U=2, Pn=4/6, Po=2/6.
    ...
    These form the conditional probabilities for the chain. Strictly speaking, you have 0% probabilities of jumping from U=0 directly to U=2, or U decreasing, etc. (this forms the probability matrices you see on the wiki).

    Then you can set the initial state to x=[1, 0, 0, 0, 0, 0, 0] (i.e., U=0), and increment the system (see example) to see the change in the probabilities over the number of rolls.

    You could do some fancier math to get bounds on how many rolls it would take before you would expect only a 1% chance (or whatever) of seeing all the unique numbers, but my Markov class was a long time ago and I've forgotten most of it :P