Previous diary:
GlobalQuality
Future diaries in this set (titles may change):
- The New MakeOrBuy
- Management Control Systems
- Management System Standards and Auditing
- Quality Management Gone Wrong
- Management metastandards and globalization
- Do the Chinese have standards?
- Standards as a Progressive policy tool
There are few topics that engender as much confusion as the concept of an 'inspection'. We want to 'inspect' all of the arrivals at our ports to find terrorist weapons. We want to 'inspect' all food imports from China. We want to inspect all sorts of things. But what does that mean? What is an inspection? How effective is it? How much does it cost? Hopefully, the following discussion will answer some of those questions.
Where and How to Inspect?
There are a lot of places in the supply chain where inspections can be done, each with advantages and disadvantages. The first diary examined the choice between outgoing and incoming inspection. However, there are other possibilities. The buyer can trust the supplier to inspect the product before they ship it. The buyer can send one of their own people to inspect the product at the supplier's factory. The buyer can have the product inspected when it arrives at the port of departure. The buyer can also inspect it at the port of arrival (Customs may also inspect it here - but for other reasons).
Typically there are three main factors that drive the decision: where is there likely to be something to find, where is it least costly, and where do you have the greatest confidence in the inspection and the inspectors?
For effectiveness, the key locations are at the raw materials and when the product exits the factory. Raw materials inspections are critical because materials defects may be hidden by subsequent production processes (e.g., slap some paint on it). As much as one would hope that suppliers would thoroughly vet their own suppliers, manufacturers often get busy, lazy and desperate. A response to my previous diary made this point.
Even if you think you're buying from the manufacturer, the manufacturer may be obtaining product from a distributor because of his own material shortages or capacity problems. No distributor does anything beyond cursory visual inspection (and sometimes not even that), and almost no customers go beyond visual inspection at receiving any longer.
As far as cost goes, the cheapest solution is almost always to inspect the product before it is shipped. If it fails the inspection, you avoid the waste of shipping junk and inconveniencing a customer. You also learn that you have a problem much sooner - while you can still fix it. Trust, on the other hand, has two components: competence and honesty. Suppliers tend to be competent, but their honesty is not always certain. Purchasers tend to trust their own staff, but the staff's ability to diagnose complex supplier-generated quality problems may be shaky. It all depends. The following table summarizes some of the key choices and issues.

Inspection Methods
Inspection is a complex subject. Not only are there a great many different things to look for, but there are a lot of different tests and tools available. However, for all the apparent complexity, the basic logic of inspection breaks down to a few major issues and options.
Statistical Inspections
The concept of using statistical analysis of small samples to make pass/fail decisions seems wrong to many people. Worse, there is a germ of truth in their fear. There are, in fact, two distinct scenarios for sampling. One works fine with small samples. The other requires larger samples exactly the way most people would expect. No wonder people are confused.
Before WWII, people who pioneered statistics designed methods that used random sampling to reduce the effort required to inspect for certain types of problems - with a high degree of accuracy. However, to appreciate their logic you must divide inspection situations into two distinct categories, based on the type of data that you plan to study: numeric variables (measurements, such as weight, that can fall anywhere in a range) and attributes (yes/no characteristics that an item either has or lacks).
The techniques and logic for drawing conclusions from samples differ profoundly between the two categories, and this difference greatly affects the cost and accuracy. One way to explain this is with an example:
Assume that you routinely receive shipments of packaged cakes in cargo containers with 1000 cakes per container.
Sampling Numeric Variables - Let's say that you want to inspect arriving containers to see if the supplier was honest about the weight of the cakes. You select a random sample from the container. The random sample will likely include a cake or two that is packed deep in the container, so you will probably wait until you unload the container. Let's say that the agreed-upon weight is 1000g and you consider any cake under 980g to be underweight. You are willing to accept a container if no more than 1% of the cakes, on average (over a continuous series of container shipments), are underweight.
Weight is a numeric variable that can vary freely over a range (i.e., it is a 'real number'). The first cake might weigh 987g. The next could weigh 1023.5g and so on. So, assume that you took the following random samples from each of two containers:
The sample for Container 1 exhibits a modest variation in the individual weights (as measured by the standard deviation). The lowest is 991g and the highest is 1006g. The average is slightly low but the weights are pretty consistent. Meanwhile, the sample from Container 2 has an average that is close to the target, but there's more variation in the weights. None of the cakes were actually underweight in either sample. So should we accept Container 1, Container 2, both or none?
The diagram depicts the problem in graphical form and shows the samples for Container 2. A statistician knows (as perhaps a lay person does not) that, if the sample variability is large, it is likely that underweight cakes will be present. The main purpose of the sample is not to discover individual defects but, rather, to map the size and pattern of the variation. If we know how much the cake weights vary, we can predict whether defects are present - even without seeing one. This is so important that I want to repeat it in highlight.
When we randomly sample numeric variables, we're not trying to find defects. We are trying to get enough data to map the way the variable is distributed. With that, we can predict the likelihood that a given number of defects will be present. It turns out that we don't need a lot of data points to see the pattern of variation, so we can rely on surprisingly small samples.
The extra information contained in the pattern of variation is the power that drives all statistical analysis. Statisticians have found a host of ways to use that pattern to understand and predict things that would otherwise be just a confusing mush of contradictory data. Note, for example, that the variation increased from container 1 to container 2. That is a red flag that something about the cake-making process may have changed for the worse. Perhaps a key baker has left. Perhaps a machine is wearing out. Perhaps the procedures are poorly documented. It raises a lot more questions than just whether the cakes are underweight.
So how small is small? If you use sqconline.com's handy-dandy calculator devised for military acceptance testing (with variability distribution = unknown, batch size = 1000 and Acceptable Quality Level = 1%), you will see that you only need to sample 35 cakes from the 1000 in a container. That's 3.5% of the shipment. Thirty-five is so few that it is quite plausible that we won't find one that is underweight. Yet it is enough to map the distribution and predict whether we should reject the container or not. It turns out that we should accept Container 1 easily (mainly because its weights show such small variation) and reject Container 2 (because the weights vary so much that underweight cakes are highly likely).
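To make the logic concrete, here is a minimal Python sketch of the variables approach. Treat it as an illustration only. The two samples are numbers I made up. The weights are assumed to be roughly normally distributed. The calculation is a simplified estimate, not the formal acceptance procedure behind the calculator. The function name and the 10-cake sample size are likewise my own choices.

```python
from statistics import NormalDist, mean, stdev

LOWER_LIMIT_G = 980.0   # cakes below this weight count as underweight
MAX_UNDERWEIGHT = 0.01  # acceptable fraction underweight (the 1% level)

def predicted_underweight_fraction(weights_g):
    """Estimate the share of the container below the limit from a sample.

    Assumes weights are approximately normal and uses the sample mean and
    sample standard deviation as estimates of the container's values.
    """
    fitted = NormalDist(mean(weights_g), stdev(weights_g))
    return fitted.cdf(LOWER_LIMIT_G)

# Hypothetical samples (made-up numbers): tight spread vs. wide spread.
# Neither sample contains a single cake under 980g.
tight_sample = [991, 994, 996, 997, 998, 999, 1000, 1001, 1003, 1006]
wide_sample = [982, 985, 990, 995, 1000, 1002, 1008, 1012, 1016, 1020]

for name, sample in [("tight", tight_sample), ("wide", wide_sample)]:
    share = predicted_underweight_fraction(sample)
    verdict = "accept" if share <= MAX_UNDERWEIGHT else "reject"
    print(f"{name}: predicted underweight share = {share:.4f} -> {verdict}")
```

Neither made-up sample contains an underweight cake, yet the wide one is rejected because its spread predicts that more than 1% of the container falls below 980g. That is the whole trick: the decision comes from the pattern of variation, not from catching a defect in the act.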
Sampling Attribute Variables - What if you wanted to inspect a feature that doesn't vary over a continuous range? Say we want to examine the cake for the presence of animal hair (don't ask!). We know that our inspection will either find hair(s) or not. So, if we take a sample, we might get results like the following.
What does this mean? We tried cakes in one container and they were all OK. We tried cakes in another container and there was one with hair. Does that mean that container 2 is bad and container 1 is good? The truth is that we don't really know. As shown in the following diagram, our random sample may have picked up a bad cake by accident. In the left panel, the sample found no problems, even though quite a few were present. In the right panel, the sample happened to stumble on the only bad cake in the shipment. If 'finding a bad cake' were the criterion, we would reject the shipment on the right and keep the one on the left - which is clearly wrong.

So why do small samples work with numeric values and not with attributes? The answer is simple. Numeric values give us extra information about the pattern of variation. That extra information allows us to predict potential defects. This is again so important that I want to highlight it.
When we randomly sample for attributes, we will either find them or not. The sample doesn't contain any extra information that would allow us to infer or predict their possible presence. Therefore, we have to use a larger (possibly much larger) sample in order to conclude that defects aren't present.
When we are seeking an attribute, we will have to test a much larger number of cakes to be sure about the incidence of defects. We can use sqconline.com's different handy-dandy calculator (batch size = 1000, AQL = 1%) to find out how big a sample we have to use to know whether the container has less than 1% defectives on average. It turns out that we need to sample 80 cakes to achieve that level of confidence. That's more than double the sample size that we needed when we tested for weight.
Suppose we want to reject the shipment if it indicated that there might be more than one (1 in 1000 or 0.1%) defect on average over a series of shipments. If we were measuring weight, the calculator tells us that we still only need to sample 35 cakes. That's enough to estimate the distribution pattern of the variation. However, if we are looking for those pesky hairs at the 0.1% level, we would need to test 125 cakes out of each shipment, nearly 4 times as many.
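If you want to see why attribute samples have to be bigger, the following Python sketch computes the chance that a random sample comes back completely clean even though bad cakes are present. It assumes a container of 1000 cakes, a fixed number of bad ones, and sampling without replacement (the hypergeometric case). The function name and the defect counts are hypothetical. It shows raw detection odds only, not the accept/reject rules of any published sampling plan.

```python
from math import comb

def prob_sample_finds_none(lot_size, defectives, sample_size):
    """Hypergeometric chance that a random sample contains no defectives."""
    clean = lot_size - defectives
    if sample_size > clean:
        return 0.0  # the sample is too big to avoid every defective
    return comb(clean, sample_size) / comb(lot_size, sample_size)

LOT_SIZE = 1000
for defectives in (1, 10, 50):
    for sample_size in (35, 80, 125):
        miss = prob_sample_finds_none(LOT_SIZE, defectives, sample_size)
        print(f"{defectives:>3} bad cakes, sample of {sample_size:>3}: "
              f"{miss:.1%} chance the sample looks clean")
```

With a single bad cake in the container, a 35-cake sample misses it 965 times out of 1000. A clean attribute sample, by itself, says very little about rare defects.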
Safety - Inspecting for Rare Attributes
The difference between testing for numerical values and testing for attributes is hugely important as we seek perfection (and safety) in our products. What if not even one defective is allowable, let alone 1 in 1000? There are lots of situations where a single defect could be dangerous or disabling. One bad component can mean that an electronic device won't even turn on. A small crack in one part might cause it to fail and bring down an airplane. Ironically, if defects are widespread, it's actually less scary. A bunch of defectives will probably be noticed pretty quickly. One lonely defective may escape attention until it is too late.
If we decided that we can't accept even one defect, what must we do? Look again at the last diagram. If there are a few defects, there is a good chance that a large, albeit partial, sample will clue us in. However, if defects are very rare and none are acceptable, the only option is to test every item. We have to do a 100% inspection. Unfortunately, even with a 100% sample, we may not be safe. Inspectors can miss things, so a 100% inspection may not be 100% effective. Many quality experts argue against doing 100% inspections unless they are done by well-maintained, frequently-calibrated, fully-automated inspection equipment. They argue that anything else just leads to a false sense of security. If we believe everything is OK, we may be less observant and miss evidence of a problem later on.
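A quick back-of-the-envelope sketch shows why a 100% inspection is not the same as 100% protection. It assumes that every item is inspected and that each defective is caught independently with some fixed detection rate. The rates below are illustrative guesses, not measured figures, and the function name is hypothetical.

```python
def prob_defect_escapes(defectives, detection_rate):
    """Chance at least one defective slips through a 100% inspection.

    Assumes every item is inspected and each defective is caught
    independently with probability `detection_rate` (an illustrative
    figure, not a measured one).
    """
    return 1.0 - detection_rate ** defectives

for rate in (0.80, 0.95, 0.999):
    for defectives in (1, 5):
        escape = prob_defect_escapes(defectives, rate)
        print(f"detection rate {rate:.1%}, {defectives} defective(s): "
              f"{escape:.1%} chance one gets through")
```

Even an inspector who catches 95% of what passes in front of them lets a lone defective through one time in twenty, which is why the experts want the 100% inspections left to well-calibrated machines.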
Replacing Attributes with Variables
The practical difficulties of doing attribute-based inspections for rare events have prompted the experts to look for workarounds. For example, it would be very convenient if we could somehow change the desired attribute to a numeric variable. If we do that, we can once again exploit the pattern of variation to give us a little bit more information that can help us predict problems from smaller samples. So how can we do that? There are many possibilities, including the following:
- We can look for a variable that is somehow related to the attribute we are seeking. For example, if we are looking for baked goods that are overcooked, we may be able to tie that to the fact that the crust will be darker. Accordingly, we might measure the amount of light reflected back from the product.
- We can look for a variable that we can use as a proxy for 'goodness'. We know, for example, that gold has a precise density (its weight divided by its volume). That is a basic physical property of the element. If we measure a purported gold bar's density very carefully, we should always get that exact value, with very low variation. If we get any other value we can be fairly certain that something else is there. We won't know what has been added, but we will know it wasn't gold.
Using proxies for critical quality measures represents a convenient shortcut. It allows us to use variables to predict problems with much smaller samples. The weakness of this approach lies in its dependence on an associative relationship that we assume, but do not directly test. If the bakery decided to put an egg wash glazing on all of its baked goods, the shininess might fool our inspection. If a crook swapped some of the gold for a metal of almost identical density, such as tungsten, they might fool us. (Lead would not do the job, since it is considerably less dense than gold.) It's a clever approach and potentially very useful, but not foolproof.
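Here is a small Python sketch of the density proxy and of how it can be fooled. The densities are approximate handbook values. The 0.5% tolerance, the 100 cm^3 bar, the function names, and the choice of copper and tungsten as fillers are all my own assumptions for illustration.

```python
GOLD_DENSITY = 19.3  # g/cm^3, approximate handbook value

def looks_like_gold(mass_g, volume_cm3, tolerance=0.005):
    """Proxy test: is the measured density within tolerance of gold's?"""
    density = mass_g / volume_cm3
    return abs(density - GOLD_DENSITY) / GOLD_DENSITY <= tolerance

def blended_density(parts):
    """Density of a bar built from (volume_fraction, density) pairs."""
    return sum(fraction * density for fraction, density in parts)

# Approximate handbook densities in g/cm^3 (assumed for illustration).
COPPER, TUNGSTEN = 8.96, 19.25
VOLUME_CM3 = 100.0

bars = {
    "pure gold": [(1.0, GOLD_DENSITY)],
    "30% copper by volume": [(0.7, GOLD_DENSITY), (0.3, COPPER)],
    "30% tungsten by volume": [(0.7, GOLD_DENSITY), (0.3, TUNGSTEN)],
}
for label, parts in bars.items():
    mass_g = blended_density(parts) * VOLUME_CM3
    print(f"{label}: passes density test = {looks_like_gold(mass_g, VOLUME_CM3)}")
```

The copper-filled bar is caught because its density drops well below gold's. The tungsten-filled bar sails through, because the proxy only ever measured density and never measured 'goldness' itself.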
Testing Technology
All inspection depends on our ability to 'inspect' the product. That, in turn, depends on our ability to use instruments to measure or assess important characteristics. There are a wide range of options and issues that determine the accuracy and feasibility of the inspection.
- Destructive vs Non-Destructive Inspection - Destructive tests alter the product in a way that compromises its intended use or value. Non-destructive tests gather information without affecting the product or service. Destructive tests are prohibitively expensive if you are searching for rare defects (i.e., in 100% inspection scenarios). If you have to open every can of soup to look for foreign objects, you had best do it beside the pot and a stove - and just before lunch.
- Human Inspection - Most older inspection methods were manual or used simple instruments, wielded by a human. Unfortunately, these tests tend to be either superficial or comparatively 'destructive'. They are superficial if we rely on our eyes, ears, etc. They can be destructive if we use our hands. Humans are clumsy testers. We have to open the package to look inside. We have to poke the fruit to see if it's ripe. Our limited senses force us to damage the item in order to inspect it. The one place that humans excel is in our ability to synthesize confusing or conflicting information to arrive at a subjective assessment.
- Online vs Offline Inspection - Online inspections are done as the product or service is being produced or in the normal course of required handling. Offline inspections extract items from the production process specifically for testing. If you use specialized instruments, or rigidly controlled test procedures, those usually have to be done offline with random sampling. Online inspections can range from random sampling to 100% inspection (if the test is fast and easy).
- Automated Testing - Relentless advances in automated inspection technology are dramatically changing the mix of potential inspections. The new technologies that permit rapid, high quality non-destructive tests (NDT) are especially exciting. They get quick answers without damaging the test item. As these instruments improve, they allow us to do online 100% inspections with more ease than we now do random sampling. There's a lot of information on the Internet: Wikipedia page on NDT, NDT in Food Processing, and Powerpoint summary of NDT in metal products.
As new testing methods are developed, we should be able to institute online testing to find those elusive rare defects much more efficiently. However, most of this new technology is fairly expensive and suppliers in developing countries probably won't be able to afford it in the near future. To deal with their quality issues, we will have to look elsewhere in the short run.
Unk-Unks
Some defects occur in ways and places that no one would expect. Even unscrupulous suppliers don't usually plan to create unsafe products. They are more likely to do it accidentally or negligently. Anything else would make them terrorists and that's a whole different deal. If mistakes are inadvertent, then it follows that many of them will occur in ways that no one expects. Who would have thought that anyone would add melamine to pet food?
If the defects tend to be yes/no attributes, are rare, and no one would reasonably think to look for them, then is it reasonable to expect an inspection to find them? The answer is no. This type of defect falls in a class of problems called "unknown unknowns" or 'unk-unks'. The silliness of the Rumsfeld usage of this term aside, it's a perfectly valid and very useful concept. In fact, it is part of a hierarchy of uncertainties that confront all inspection schemes:
- Known and Unfound - We know that something may be there, but since we haven't found it, we don't know whether it is really not there or if we just missed it.
- Known and Unfindable - We know something might be there, but we don't have the means to find it.
- Known Unknowns - We know that something might be there, but we've no idea what it might be.
- Unknown Unknowns - We don't even suspect that something might be there, so it doesn't matter whether we could find it or not. We'll never look.
In an increasingly technological world, there are a growing number of things that fall in the second, third and fourth categories. We might make some inroads on the known and unfindable by developing better tests and more sophisticated inspection technology. We may even make a dent in the known unknowns if we engage in systematic and scientific research. However, it is hard to see how we will ever anticipate unk-unks - until they bite us.
This fact is so common that quality experts use the term 'special process' for situations that are fertile ground for these types of defects:
"processes where deficiencies become apparent only after the product is in use or the service has been delivered" - ISO 9001:2000, section 7.5.2
In other words, there are processes that spit out products that cannot be inspected. Special processes abound. There are millions upon millions of them in every developed economy - in manufacturing and in services. A weld can have a hidden crack. A plastics maker can forget a small, but critical, ingredient. A baker can get the recipe just slightly wrong. In services, a lawyer can forget to close a subtle loophole in a contract. All of the delivered outputs will look perfectly fine to a casual inspection - until they are used, or until the passage of time reveals their fatal flaw.
The Answer - Inspect the process instead of the product
If inspections aren't foolproof and if they can't protect us from a special process gone wrong, what should we do? We clearly enjoy huge quantities of perfectly good and safe products. This suggests that suppliers must have found another approach. The answer, for at least the past 50 years, has been to turn the inspection lens away from the output products and toward the production process. The typical logic for dealing with special processes goes something like this:
- Design a production process to produce a given product in a rigidly controlled fashion. Pin down every production detail - from raw materials specifications, to recipes, to measuring tools, to machine settings, to operator procedures, to operator training, to ...
- As you are engineering the production process, take the interim outputs and subject them to thorough testing. Test for everything you can think of. Test for things you can't even imagine. Look to see if any tweaks in procedures or settings could change the final product. Test the interim products long enough that any hidden defects will have time to emerge. Apply destructive tests with gusto.
- When you know that the process is capable of producing a good product, lock everything down - tight! Establish a detailed, comprehensive standard for the process. Write down every required machine setting. Document every procedure. Create a formal training syllabus and religiously train every worker before allowing them to come near the process.
- Set up instruments and procedures to monitor the process. Forget about inspecting the product. Set up continuous inspections (a.k.a. monitoring) to make sure that the process always follows the plan - with no deviations - ever! Check the machine settings constantly. Audit the worker's procedures frequently. Keep records of times, temperatures, pressures, etc. and frequently check them against the process standard.
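The monitoring step can be sketched in a few lines of Python. Everything here is hypothetical: the parameter names, the allowed ranges, and the batch log are invented to show the idea of checking process records against a locked-down standard, not taken from any real bakery.

```python
# Hypothetical process standard: allowed (low, high) range per parameter.
PROCESS_STANDARD = {
    "oven_temp_c": (175.0, 185.0),
    "bake_time_min": (24.0, 26.0),
    "batter_weight_g": (1040.0, 1060.0),
}

def find_deviations(reading, standard=PROCESS_STANDARD):
    """Return (parameter, value, allowed range) for each out-of-standard item.

    A parameter missing from the reading is reported with value None,
    because an unrecorded setting cannot be shown to be in control.
    """
    deviations = []
    for parameter, (low, high) in standard.items():
        value = reading.get(parameter)
        if value is None or not low <= value <= high:
            deviations.append((parameter, value, (low, high)))
    return deviations

# Made-up log entries for three batches.
batch_log = [
    {"batch": 101, "oven_temp_c": 180.2, "bake_time_min": 25.0, "batter_weight_g": 1051.0},
    {"batch": 102, "oven_temp_c": 188.4, "bake_time_min": 25.1, "batter_weight_g": 1049.5},
    {"batch": 103, "oven_temp_c": 179.8, "bake_time_min": 24.9},
]
for entry in batch_log:
    problems = find_deviations(entry)
    status = "OK" if not problems else f"HOLD {problems}"
    print(f"batch {entry['batch']}: {status}")
```

Note that the check never looks at a finished cake. A batch is held because the process drifted (or because a record is missing), whether or not anything looks wrong with the product.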
This is a total shift in focus. Instead of applying inspections to the product, we can use inspections to control the processes that generate the products. The development of this concept in the late 1930s was arguably one of the greatest technical achievements of the 20th century. The quality management systems (QMS) that evolved from this insight are the reason that our products, foods and even services are as good as they are. Without this insight, I doubt that our technical society would even be possible.
Finally - the Melamine
The melamine in pet food scandal appears to be a case study in the issues described above. I am not an expert in food safety and I am working from press reports (always dangerous), so some of my interpretations may be off the mark. If so, please be gentle in the corrections and I will update.
From Internet accounts, gluten is wheat (or rice or some other grains) whose kernels have been crushed, made into a dough and washed with plain water to eliminate excess starch and bran. By this definition, there is no reason that anything would be present except for material from the natural grain. Moreover, wheat gluten was first developed as a product in China - so sourcing from China made a certain amount of sense. Presumably, they would be the most knowledgeable manufacturers.
Wheat gluten (like wheat) is inspected by offline tests. The variable that most interests buyers is the level of protein in the gluten. The more protein, the greater the nutritional value. Also, the higher the protein level, the less likely that the gluten has significant impurities. Unfortunately, it appears that there are no cost-effective direct tests for either the 'glutenness' (an attribute) or protein level (a variable). There is, however, a stable relationship between the amount of protein and the amount of nitrogen in a given type of food.
Hence, the inspection strategy has been to use nitrogen as a proxy for the variable (or attribute) that is really being sought. Nitrogen can be tested, but the tests are not trivial and have gone through a rapid evolution in recent years. The oldest test, the Kjeldahl method, takes several hours and produces some awkward byproducts. The more recent Dumas method is much quicker, but still offline. It only appeared in major applications in North America in the mid-1990s. The newest Near-infrared (NIR) methods are faster, but they are not quite as precise.
With an inspection system dedicated to searching for nitrogen, the food supply system was a sitting duck for any vendor that picked up on melamine. Melamine is cheap, has a very high nitrogen level and blends easily into a brick of brown gunk. Since the instrumentation was looking for nitrogen, the Chinese gluten plus melamine tested as really good stuff. This method of spiking the food to fool the inspection didn't occur to the purchasers or their inspectors. For them, it was an unk-unk.
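A rough Python sketch shows how little melamine it takes to flatter a nitrogen-based protein reading. The nitrogen share of melamine is computed from its chemical formula (C3H6N6). The nitrogen-to-protein factor of 6.25 is the commonly quoted general figure and is an assumption here (labs use product-specific values), as are the 60% true protein level and the spike percentages.

```python
# Standard atomic masses (g/mol) and melamine's formula, C3H6N6.
C, H, N = 12.011, 1.008, 14.007
MELAMINE_N_FRACTION = 6 * N / (3 * C + 6 * H + 6 * N)  # about two-thirds

# Assumed nitrogen-to-protein conversion factor; 6.25 is the commonly
# quoted general figure, and labs use product-specific values.
N_TO_PROTEIN = 6.25

def apparent_protein(true_protein_fraction, melamine_fraction):
    """Protein level a nitrogen-based test would report for a spiked mix.

    `true_protein_fraction` is the real protein share of the genuine
    material; `melamine_fraction` is the share of the mix (by mass)
    that is melamine.
    """
    genuine_share = 1.0 - melamine_fraction
    nitrogen = (genuine_share * true_protein_fraction / N_TO_PROTEIN
                + melamine_fraction * MELAMINE_N_FRACTION)
    return nitrogen * N_TO_PROTEIN

for spike in (0.0, 0.01, 0.02, 0.05):
    reported = apparent_protein(0.60, spike)
    print(f"{spike:.0%} melamine: test reports {reported:.1%} protein")
```

Under these assumptions, each percentage point of melamine adds several points of phantom protein, because melamine carries far more nitrogen per gram than real protein does. The test is doing exactly what it was designed to do. It was just never designed to ask where the nitrogen came from.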
Worse, the clinical pathology of melamine was also covert. This article suggests that melamine by itself is harmless and remains so. If, however, it is mixed with small concentrations of variants on uric acid, it produces crystals in pets' kidneys that can be fatal. Neither melamine nor uric acid is damaging alone - only in combination. Several news reports have suggested (so far without substantiation) that the Chinese producers in question may have used waste melamine rather than pure melamine to spike the wheat gluten. Who knows what other contaminants were in that waste? Uric acid, perhaps? The use of melamine (indeed the use of anything other than real wheat gluten) opened the door to bad things. This underscores why all food and drug production is broadly considered to be a special process.
Given the history, the testing technology, and the surprise factor, it is quite possible that the Chinese exporters might have continued to ship melamine-laden wheat gluten for a long time. It appears that the idea to look for contaminants only surfaced when pets began dying. It may only be chance that the particular waste melamine they bought had the deadly second ingredient. More nitrogen inspections probably wouldn't have made any difference.
So what would have helped? Just one thing I can think of - a locked-down quality management system applied at every step of the food production supply chain. The challenge is to find ways to force Chinese (and all other) manufacturers to adopt this approach and to be diligent in applying it. However, that is grist for a future post.
At the end of the day, reliance on incoming inspection is almost always a weak approach. The real root cause IMHO is that the original Chinese producers were allowed to participate in food production even though they lacked any credible form of QMS (quality management system). Any working QMS would, by its very nature, have prevented the production of material that was this far from the specification. The fact that food producers in the developed world are required by law to implement a QMS is one reason why the melamine substitution exploit is far less likely in North America, Europe or Japan.
A question to chew on ...
If locked-down quality management systems are the only real protection, how do we get suppliers in developing countries to implement them - properly? Manufacturing in China is growing so fast that it's not surprising if their management systems cannot keep pace.