Here's the long and short of it in my mind. A few clarifications:
Yes, full-time inspection. Otherwise, how do you know about the lift they installed below?
Large project? Then multiple inspectors, each bringing their own judgement. You should also have multiple Proctors, and even more gradations, for these individuals to refer to, so that compaction can be verified in a reasonable and efficient manner.
Low concrete tests: the difference is that soil can be improved right now! By the time a low concrete test comes back, there is nothing you can do to improve the concrete. A competent inspector on site can almost always get the job done cheaply and immediately with water, vibratory rollers, dump trucks, etc.
Basically, I think inspectors these days are armed with enough tools to help with the minor adjustments that can be made while they're still on site (water, rolling pattern, fill variability) to ensure that compaction is met. When I fail a contractor on a test, I am communicating one thing to the engineer: there is a soft area. If an inspector cannot find a way to meet compaction on a site where it could be met, and fails you with a 93%, he is not worth his salt. If he cannot find a way to meet compaction on a site where something is terribly wrong with the fill, then the problem is documented, and when your foundation cracks, we know why. If I leave the site with a failing report saying the problem still isn't solved, I am communicating a concern, not a number.
Now, about the statistical methods. I see two scenarios, both of them flexible: a set standard of 95% compaction, coupled with the inspector's judgement for those 94s; and a statistical average, where low tests don't need anyone's judgement because they may be offset by higher tests in a completely different area. The first system requires judgement by the inspector; the second does not. The second system allows for soft areas on record; the first does not. Personally, I would rather play a role where my observations and judgement can do some good for the project than be just a button-pusher.
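The difference between the two schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real acceptance procedure: the 95% spec, the test IDs, and the test values are all hypothetical, and the inspector's "judgement" in the first scheme is reduced to flagging low tests for review.

```python
from statistics import mean

SPEC = 95.0  # required relative compaction, percent (assumed spec value)

def per_test_review(results, spec=SPEC):
    """Scheme 1: every test below spec is flagged for the inspector's attention."""
    return [(test_id, pct) for test_id, pct in results.items() if pct < spec]

def lot_average_passes(results, spec=SPEC):
    """Scheme 2: the lot passes if the mean of all tests meets spec."""
    return mean(results.values()) >= spec

# Hypothetical field tests from one lift; test "D" sits over a soft area.
lift = {"A": 97.0, "B": 98.0, "C": 96.0, "D": 91.0}

print(per_test_review(lift))     # [('D', 91.0)]
print(lot_average_passes(lift))  # True, because the mean is 95.5
```

Under the averaging rule the lift passes and the 91% spot never reaches anyone's desk; under the per-test rule it gets flagged while the rollers are still on site.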
Now, having said all that, I think BigH's point about a 93% in a large unit being de minimis is a good one. That is where judgement comes in. If they think it is an issue, it shouldn't take more than one or two more passes on a unit like that to achieve their 95%; that is why they're there. It's when they don't effect a fix for the minor problems, and the issue lands on the engineer's desk to be dealt with after the fact, that things get convoluted. That, I think, is partly what led to this very discussion.
Good topic!