Given a bit of mulling time, I think I'm beginning to understand. Someone has been a bit creative here.
The second statement is a fairly standard description of the operating characteristics of an industrial batch sampling scheme - you decide how many samples you're going to test from each batch, how many of those have to pass for you to pass the whole batch, and how many have to fail for you to fail it.
Depending on the decisions you make, the scheme will be more or less good at telling the difference between a good batch and a bad one.
What you normally do is to define two quality levels: the percentage of a batch which would have to be defective for you to want to be pretty sure that the sampling scheme failed it, and the percentage that you could accept, and would want to be pretty sure that the sampling scheme would let through. You then calculate the probability that the scheme will come up with the wrong answer given a population at each of those percentages (knowing what you do about process variation). That gives you two measures, the producer's and consumer's risk levels, which define how good the scheme is at telling good and bad batches apart.
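For what it's worth, here's a minimal sketch of that calculation in Python, assuming a simple single-sampling plan (inspect n items, accept the batch if at most c are defective) with binomial sampling; the plan size and the two quality levels are numbers I've invented purely for illustration.

    from scipy.stats import binom

    # Invented single-sampling plan: inspect n items, accept the batch if
    # at most c of them turn out to be defective.
    n, c = 50, 3

    p_good = 0.02  # a defect rate we'd be happy to let through
    p_bad = 0.15   # a defect rate we'd want to be pretty sure of rejecting

    def p_accept(p):
        """Probability that the plan accepts a batch whose true defect rate is p."""
        return binom.cdf(c, n, p)

    producers_risk = 1 - p_accept(p_good)  # good batch wrongly rejected
    consumers_risk = p_accept(p_bad)       # bad batch wrongly accepted
    print(f"producer's risk ~ {producers_risk:.3f}, consumer's risk ~ {consumers_risk:.3f}")

Plot p_accept over a range of defect rates and you get the scheme's operating characteristic curve; the two risks are just that curve read off at the two quality levels.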
I think the cute thing that's been done here is to take the same theory, but apply it to a different scenario. Instead of taking the various items of a production batch as the population, I think what they've done is to take successive instances of a test being carried out on the same vehicle.
A vehicle is presented for emissions testing. A smoke probe is stuck up its exhaust pipe and it's revved right up to the governor a specified number of times. Each of these accelerations is a sample drawn from the population of "every time the driver stamps on the pedal".
Associated with the test is a set of rules telling you how to turn this series of measurements into a pass or a fail.
If you then say that you want to be sure of failing any vehicle which is smoky on 65% or more of its accelerations, and of passing any vehicle which is smoky on only 40% or fewer of them, you can use the same process as the industrial statistician does to find out how effective your sampling scheme is.
In this case, the scheme designer is saying that he expects to pass only 10% of vehicles that are so smoky they should have failed, and to fail only 5% of vehicles which are clean enough that they should have passed.
A sampling regime that did many more accelerations would be much less prone to getting the wrong answer, but likely to destroy rather more engines in the process - hence the compromise.
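To put rough numbers on that compromise, here's a sketch of the calculation, assuming a simple decision rule of the form "do n accelerations and fail the vehicle if more than c of them are smoky" - that rule is my invention, not necessarily what the scheme actually uses - with the 65%/40% figures from above:

    from scipy.stats import binom

    P_SMOKY = 0.65  # a vehicle we definitely want to fail blows smoke this often
    P_CLEAN = 0.40  # a vehicle we definitely want to pass blows smoke this often

    def risks(n, c):
        """For "do n accelerations, fail if more than c are smoky", return
        (P(a 65%-smoky vehicle scrapes through), P(a 40%-smoky vehicle is failed))."""
        pass_bad = binom.cdf(c, n, P_SMOKY)
        fail_good = 1 - binom.cdf(c, n, P_CLEAN)
        return pass_bad, fail_good

    # Both error rates shrink as the number of accelerations grows, which is
    # exactly the compromise: better discrimination costs more full-throttle revs.
    for n in range(6, 43, 6):
        c = round(n * 0.52)  # split the difference between 40% and 65% (arbitrary)
        pass_bad, fail_good = risks(n, c)
        print(f"n={n:2d}, c={c:2d}: pass-a-smoky={pass_bad:.3f}, fail-a-clean={fail_good:.3f}")

Under that (invented) rule you need a few dozen accelerations before both risks drop to roughly the 10% and 5% levels quoted; a different decision rule will move the numbers about, but the shape of the trade-off is the same.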
I believe the current methodology in the UK is to perform three accelerations and to average the readings from these. If the mean is below a given threshold, the test passes. Otherwise, another acceleration is carried out, and the average of the last three accelerations is checked against the threshold. If this fails, another acceleration is done, and the vehicle is declared a failure only if you still haven't got a passing "average of the last three" after you've done six accelerations.
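That rolling-average scheme is easy to simulate. Here's a sketch; the smoke limit and the distribution of readings are made up, so the numbers it prints are purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    SMOKE_LIMIT = 1.5  # an invented opacity limit, just for the sketch

    def rolling_average_test(readings, limit=SMOKE_LIMIT):
        """Pass as soon as the mean of the last three accelerations is below
        the limit; fail if that hasn't happened by the sixth acceleration."""
        for i in range(3, min(len(readings), 6) + 1):
            if np.mean(readings[i - 3:i]) < limit:
                return True
        return False

    # One hypothetical vehicle: its readings are drawn from a made-up
    # lognormal distribution whose mean sits just under the limit.
    trials = 100_000
    readings = rng.lognormal(mean=0.3, sigma=0.4, size=(trials, 6))
    pass_rate = np.mean([rolling_average_test(r) for r in readings])
    print(f"simulated pass rate for this (invented) vehicle: {pass_rate:.3f}")

Vary the distribution's parameters and you trace out the scheme's own operating characteristic: the probability of passing as a function of how smoky the vehicle really is.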
At first sight, this scheme looks like one with a sample size of four, with up to three defectives being acceptable. In fact, I'm not sure this works out because the samples, being averages with measurements in common with one another, are heavily cross-contaminated, rather than being "drawn at random from the population" as the theory demands.
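You can see that cross-contamination directly by looking at how strongly the four candidate "averages of the last three" are correlated (again with made-up readings):

    import numpy as np

    rng = np.random.default_rng(1)
    trials = 100_000
    readings = rng.lognormal(mean=0.3, sigma=0.4, size=(trials, 6))

    # The four possible "averages of the last three" (after accelerations
    # 3, 4, 5 and 6) each share two readings with their neighbour.
    rolling = np.stack([readings[:, i - 3:i].mean(axis=1) for i in range(3, 7)], axis=1)
    print(np.corrcoef(rolling.T).round(2))
    # Adjacent averages share 2 of their 3 readings, so they come out with a
    # correlation of about 2/3; averages two steps apart share 1 reading (~1/3).

So treating them as four independent samples flatters how much information six accelerations really give you.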
Hope that helps with the question you had in mind. Apologies if I've gone off on a long tangent.
A.