25 votes

Computer scientists invent an efficient new way to count

Posted May 17 by skybrian

Tags: algorithms.streaming, science.computer, author.steve nadis, source.quanta magazine

https://www.quantamagazine.org/computer-scientists-invent-an-efficient-new-way-to-count-20240516/

Link information

This data is scraped automatically and may be incorrect.

Published: May 16 2024
Word count: 1067 words

6 comments

[6]
skybrian (OP)
May 17
Link
From the article: The paper is here and Donald Knuth wrote about it here.

From the article:

The CVM algorithm, named for its creators — Sourav Chakraborty of the Indian Statistical Institute, Vinodchandran Variyam of the University of Nebraska, Lincoln, and Kuldeep Meel of the University of Toronto — is a significant step toward solving what’s called the distinct elements problem, which computer scientists have grappled with for more than 40 years. It asks for a way to efficiently monitor a stream of elements — the total number of which may exceed available memory — and then estimate the number of unique elements.

“The new algorithm is astonishingly simple and easy to implement,” said Andrew McGregor of the University of Massachusetts, Amherst. “I wouldn’t be surprised if this became the default way the [distinct elements] problem is approached in practice.”

The paper is here and Donald Knuth wrote about it here.

12 votes
1. [5]
  Omnicrola
  May 17
  Link Parent
  This is really interesting! I don't have an immediate application for it, but I'm definitely going to mentally file it away for later use.
  
  This is really interesting! I don't have an immediate application for it, but I'm definitely going to mentally file it away for later use.
  
  2 votes
  1. [4]
    vord
    May 17
    Link Parent
    That's really the kicker that limits its usefulness for many applications. Not to say there isn't some great ways it can be applied, but there's an old idiom in the USA: "Close Only Counts in...
    
    then estimate the number of unique elements.
    
    That's really the kicker that limits its usefulness for many applications. Not to say there isn't some great ways it can be applied, but there's an old idiom in the USA: "Close Only Counts in Horseshoes and Hand Grenades."
    
    This algorithm is useless if you need complete accuracy.
    
    3 votes
    
    cykhic
    May 17
    Link Parent
    I don't understand why you say this. Rather than being useless, this algorithm makes a good estimate (arguably, the best possible estimate) with tight and well-understood error bars, given memory...
    
    I don't understand why you say this.
    
    Rather than being useless, this algorithm makes a good estimate (arguably, the best possible estimate) with tight and well-understood error bars, given memory constraints.
    
    In particular, this algorithm is strictly better than deterministic (i.e. completely accurate) algorithms --- in every case where a deterministic algorithm has enough space to compute the true answer, this algorithm would give the true answer as well.
    
    “Of course,” Variyam said, “if the [memory] is so big that it fits all the words, then we can get 100% accuracy.”
    
    17 votes
    
    Wulfsta
    May 17 (edited May 17)
    Link Parent
    This sort of estimate is used all the time for things such as inexact number of rows in a database table - I think PostgreSQL is an example. Not exactly the same problem, but certainly close....
    
    This sort of estimate is used all the time for things such as inexact number of rows in a database table - I think PostgreSQL is an example. Not exactly the same problem, but certainly close.
    
    Edit: Another example is HyperLogLog for this type of probabilistic counting.
    
    8 votes
    
    Omnicrola
    May 17
    Link Parent
    I've always heard this from my dad as "Close only counts in Horseshoes, hand grenades, and H-Bombs" xD
    
    : "Close Only Counts in Horseshoes and Hand Grenades."
    
    I've always heard this from my dad as "Close only counts in Horseshoes, hand grenades, and H-Bombs" xD
    
    2 votes