Acoustic cameras, motion amplification, and reading someone’s pulse through a video call
Link information
- Title: How To Make A Legit Sound Camera
- Authors: Benn Jordan
- Duration: 16:15
- Published: Sep 29 2024
A really interesting video from a tech perspective - the creator is clearly just experimenting with something they find fascinating - but one that’s stuck with me because of the privacy implications; the comments on the PimEyes thread reminded me to post it.
The idea of being able to see someone’s heart rate through a work call, or by glancing at them with smart glasses, feels way more invasive to me than the practical consequences would probably justify. Perhaps because it’s something so literally, physically internal being exposed? Or perhaps because it seems like the thin end of the wedge on what we’re inadvertently exposing as data analysis advances?
It starts with using a cheap mic array and some clever processing to pinpoint sources of audio in a recording, with a nice demo locating bird calls in a cluster of trees where the bird can’t be seen, and then shifts to using motion isolation and amplification to do similar things directly from the pixel data.
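For the curious, the audio-localisation half boils down to delay-and-sum beamforming: a sound from a given direction reaches each mic in the array at a slightly different time, so undoing those delays and summing makes the copies reinforce only when you "aim" at the true source. A minimal sketch - none of this is the video's actual code, and the array geometry, sample rate, and numbers are all made up for illustration:

```python
import numpy as np

# Minimal delay-and-sum beamformer for a linear mic array. Everything
# here (geometry, sample rate, names) is illustrative, not from the video.

C = 343.0        # speed of sound, m/s
FS = 48_000      # sample rate, Hz
N_MICS = 8
SPACING = 0.04   # mic spacing, m (half a wavelength at ~4.3 kHz)

def steered_power(signals, angle_deg):
    """Power of the summed array output when 'aimed' at angle_deg.

    signals: (N_MICS, n_samples) array of synchronised recordings.
    A plane wave from angle theta reaches mic m with an extra delay of
    m * SPACING * sin(theta) / C; undoing that delay makes the copies
    add coherently only when we steer at the true direction.
    """
    theta = np.deg2rad(angle_deg)
    delays = np.arange(N_MICS) * SPACING * np.sin(theta) / C
    shifts = np.round(delays * FS).astype(int)
    aligned = np.stack([np.roll(sig, -s) for sig, s in zip(signals, shifts)])
    beam = aligned.sum(axis=0)
    return float(np.mean(beam ** 2))

# Synthetic test: a 2 kHz tone arriving from 25 degrees, plus noise.
t = np.arange(FS) / FS
true_delays = np.arange(N_MICS) * SPACING * np.sin(np.deg2rad(25)) / C
signals = np.stack([np.sin(2 * np.pi * 2000 * (t - d)) for d in true_delays])
signals += 0.5 * np.random.default_rng(0).standard_normal(signals.shape)

angles = np.arange(-90, 91)
powers = [steered_power(signals, a) for a in angles]
print("estimated direction:", angles[int(np.argmax(powers))], "degrees")
```

Sweep a 2-D grid of directions instead of a single angle and paint the resulting power onto camera pixels, and you have the "sound camera" overlay.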
By the end, it’s at a point where the software can read someone’s heart rate through a normal video call with reasonable accuracy.
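The video doesn't publish its pipeline, but a common published way to do this is remote photoplethysmography (rPPG): blood volume changes subtly modulate skin colour, so you track the mean green-channel intensity of a skin patch over time, band-pass to plausible heart rates, and take the dominant frequency. A generic sketch; the face box, frame rate, and band limits below are all assumptions, not the video's values:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Generic rPPG sketch: not the video's actual pipeline, which isn't
# published. The face box, frame rate, and band limits are assumptions.

FPS = 30.0  # webcam frame rate

def heart_rate_bpm(frames, box):
    """Estimate pulse from a stack of RGB frames.

    frames: iterable of HxWx3 uint8 frames; box: (y0, y1, x0, x1)
    bounding a patch of skin (e.g. the forehead).
    """
    y0, y1, x0, x1 = box
    # Blood volume changes modulate skin colour most strongly in the
    # green channel, so track its mean over the patch, frame by frame.
    trace = np.array([f[y0:y1, x0:x1, 1].mean() for f in frames])
    trace -= trace.mean()

    # Keep only plausible heart rates: 0.7-4 Hz, i.e. 42-240 bpm.
    b, a = butter(3, [0.7, 4.0], btype="band", fs=FPS)
    filtered = filtfilt(b, a, trace)

    # The dominant frequency in that band is the pulse estimate.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / FPS)
    mask = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[mask][np.argmax(spectrum[mask])]
```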
It’s also all done by someone who’s clearly technically skilled and familiar with audio and video work, but very clearly not an experienced programmer. The kind of analysis he’s running is nowhere close to state of the art from a data-processing perspective, and that makes me think we’re exposing way, way more than we realise simply by existing within range of a camera.
The analysis is not state of the art, but imo the main step up, and the reason it feels like magic, is not the specific algorithm being run. It's simply that any algorithm extracting data from multiple frames significantly increases the signal-to-noise ratio of its source data compared to any computation on a single frame. So I wouldn't expect further seemingly magical improvements, outside of machine learning, which can extrapolate and make up plausible information.
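That argument is easy to sanity-check: with independent per-frame noise, averaging N frames shrinks the noise on the estimate by a factor of sqrt(N), so a fluctuation invisible in any single frame emerges over enough of them. A toy simulation, with all the numbers invented:

```python
import numpy as np

# Toy version of the SNR argument: a per-frame signal 10x smaller than
# the sensor noise is invisible in one frame but obvious after
# averaging, with the noise on the mean shrinking as 1/sqrt(N).

rng = np.random.default_rng(0)
signal = 0.1      # tiny brightness fluctuation we want to detect
noise_std = 1.0   # per-frame sensor noise

for n in (1, 100, 10_000):
    frames = signal + noise_std * rng.standard_normal(n)
    print(f"{n:>6} frames: mean={frames.mean():+.3f}, "
          f"expected noise on mean={noise_std / np.sqrt(n):.3f}")
```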
I build ML models for a living, and I’ve still been surprised by some of the things the field’s managed in the last couple of years, so that’s definitely colouring my perception!
My immediate thought was that if your recording is revealing all the info a human could recognise (who you are, your facial expression, mannerisms, etc.) and additional info that you either physically can’t control or wouldn’t think to because it’s imperceptible to others, that’s a ton of data to train a classifier on. Take someone’s TikTok account, or recordings of their webcam from the last few months’ meetings, or surveillance footage of them every time they’re in your building, and that’s a solid baseline for them as an individual on all those metrics.
What would your target variable be to correlate with all this data? No idea! I don’t even know if there’s anything nefarious that could meaningfully be done with it - but I’ve seen enough to be shocked by what you can glean from the things people knowingly, willingly share, so I’ve got a healthy concern about the idea that there’s a bunch more latent information hiding in plain sight too.
The heart-rate detection is pretty interesting - a decade ago I was studying digital image processing at university, and a classmate was implementing the color amplification algorithm for heart-rate detection in Matlab as a project. It worked, but only with a real digital camera (not a webcam - though webcams were worse on average at that time), and you needed to stand perfectly still. Despite Benn Jordan's claims, the short snippet he shows doesn't seem to work very well with his webcam either. However, at the time I had no idea that simply amplifying motion instead of colors would yield better results with cheaper equipment.
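For anyone curious, that color amplification algorithm is the Eulerian video magnification work (Wu et al., SIGGRAPH 2012), and its core step is small enough to sketch. This version skips the published method's Laplacian pyramid and uses a plain blur instead, and all the parameters are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import butter, filtfilt

# Stripped-down Eulerian color magnification. The published method
# uses a Laplacian pyramid; a single Gaussian blur stands in for the
# coarse pyramid levels here. All parameters are illustrative.

FPS = 30.0
ALPHA = 50.0  # amplification factor, tuned by eye in practice

def magnify(video, lo=0.7, hi=4.0):
    """video: (T, H, W) float array of single-channel frames."""
    # Blur spatially: the color signal lives at coarse scales, and
    # blurring also suppresses per-pixel sensor noise.
    blurred = np.stack([gaussian_filter(f, sigma=5.0) for f in video])

    # Temporally band-pass every pixel around the heart-rate band.
    b, a = butter(2, [lo, hi], btype="band", fs=FPS)
    band = filtfilt(b, a, blurred, axis=0)

    # Add the amplified, otherwise invisible band back into the video.
    return video + ALPHA * band
```

Amplifying a temporally band-passed motion estimate instead of the raw color signal is the same recipe applied to position rather than intensity, which would fit it holding up better on cheaper cameras.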
The acoustic imaging has been done before by DIYers; it's simpler and cheaper than it used to be thanks to the availability of MEMS microphones with digital output. It's pretty useful for finding sources of noise pollution, and I wonder how useful it would be for tracking down things like weird noises when diagnosing a car.
edit: regarding the last part, about sound creating heat: what he's actually measuring is simply the tweeter's voice coil heating up, or at best the membrane surround heating from flexing - nothing to do with the sound itself.