For the app I just downloaded, they can't be doing image processing in any normal sense, because a finger placed on the lens can't produce a normal image. Contact with the lens does, however, keep the unfocused object stationary, so pixel-to-pixel correspondence from one frame to the next is assured, or at least encouraged.
So, yeah, they seem to be looking at frame-by-frame variation and trends in the color values of each pixel, and drawing inferences from that.
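To make that guess concrete, here is a minimal sketch of how such an inference could work. I have no idea what the app actually does; this assumes each frame is boiled down to one number (the average of a single color channel) and that the pulse shows up as a small periodic wobble in that number. The function names, the 40 to 200 bpm search range, and the synthetic test signal are all my own inventions for illustration.

```python
import math

def mean_channel(frame):
    """Average one color channel over a frame given as rows of pixel values."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count

def estimate_bpm(samples, fps, lo_bpm=40, hi_bpm=200):
    """Return the whole-number rate (beats per minute) whose sinusoid
    correlates most strongly with the per-frame samples.

    Hypothetical approach: a brute-force scan over candidate rates,
    not any particular app's method.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant brightness level
    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        freq = bpm / 60.0  # candidate rate in Hz
        re = sum(c * math.cos(2 * math.pi * freq * i / fps)
                 for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * freq * i / fps)
                 for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

def synthetic_pulse(bpm, fps, seconds, level=128.0, swing=2.0):
    """Made-up stand-in for real camera data: a steady level plus a small sine wobble."""
    return [level + swing * math.sin(2 * math.pi * (bpm / 60.0) * i / fps)
            for i in range(int(fps * seconds))]
```

Feeding it ten seconds of the synthetic signal, as in estimate_bpm(synthetic_pulse(72, 30, 10), 30), recovers 72. Real finger-on-lens data would be noisier and would drift, so a real implementation would presumably need some filtering that this sketch leaves out.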
ISTR reading, more than ten years ago, about using frame-to-frame comparisons to extract useful 3D data from ordinary video. At the time, it took some fairly fancy hardware to do such things in anywhere near real time. Coverage ebbed rather quickly, leading me to suspect that the technology may have become classified for its military potential.
Now, everybody and his brother has a phone that's certainly capable of real-time image processing, so some cats are out of the bag...
Mike Halloran
Pembroke Pines, FL, USA