In the early 1930's, a subsidiary of Union Carbide built the "Hawk's Nest Tunnel" through Gauley Mountain in West Virginia as part of a hydroelectric project. In order to accomplish this, the workers drilled through one mile of almost pure silica. Five thousand people worked on this project; no safety precautions were taken to prevent respirable-silica exposure. Approximately 1,200 workers developed silicosis, and approximately 400 - 600 of these perished from the disease. This is known as the "Hawk's Nest incident," and it is considered America's worst industrial disaster [Jack 2005 at pp 8 - 9].
...during 2002-2004, the 20,479 new silicosis claims in Mississippi are over five times greater than the total number of silicosis cases one would expect over the same period in the entire United States [Jack 2005 at pp 12-13].
This explosion in the number of silicosis claims in Mississippi suggests a silicosis epidemic 20 times worse than the Hawk's Nest incident. Indeed, these claims suggest perhaps the worst industrial disaster in recorded world history [Jack 2005 at pg 13].
--Jack, JG (the Honorable),
"Order No. 29: Addressing subject-matter jurisdiction, expert testimony and sanctions,"
In re Silica Product Liability Litigation (2:03-MD-1553), June 30, 2005.
Medical-Legal Screening Processes are Inherently Flawed
Robert C. Herrick MBA, CPCU
The opinions expressed below are those of the author and are not necessarily those of his employer.
Judge Jack's findings regarding screening in the silica litigation were scathing, and they are widely credited with exposing a complex of cursory, potentially fraudulent screening activities that massively inflated the claims made in that litigation. The spill-over effect in asbestos litigation, which shared many of the same practices and even many of the same claimants, was nearly immediate, and it probably accounts for much of the steep drop-off in claims filed over the past four years.
The screenings in the silica litigation were clearly beyond the pale, with single doctors diagnosing tens of thousands of claimants in the course of a single year. However, silica and asbestos are not the only mass torts for which screening activities take place. When some mass torts gather steam, it is partially the result of the emergence of firms involved in identifying potential claimants through some screening process. The best example remains asbestos, followed closely by silica, but these are by no means the only examples. In products ranging from exterior siding to pharmaceuticals, one can observe the emergence of 'claims handlers' or 'screeners' that act to identify potential litigants or claims makers. Many of these screeners have little if any connection to the world of medicine, but some do. When the process involves physician-conducted screening in support of mass tort litigation, it is referred to as medical-legal screening.
Where medical-legal screening is fraudulent, details of the fraud are likely to eventually emerge, allowing a correction to follow. That correction took place in the silica litigation and, to a lesser extent, in asbestos. In other cases the screening is more controlled and more deliberate, although rarely are the details of the screening process revealed. But suppose medical-legal screening is done without fraud. Is this good enough, or is the very concept of screening at odds with concepts like equity and preponderance of proof?
An exception to obviously sloppy and fraudulent medical-legal screening can be found in welding rod fume litigation, where one of the doctors involved in screening has been quite open about the process. By looking at the outcomes of that process we can see whether a more controlled medical-legal screening activity produces acceptably better results, from either the point of view of medicine or that of equity. Addressing that question is the subject of this brief paper.
Associating Welding and PD
Dr. Brad A. Racette is a neurologist in the Department of Neurology at the Washington University School of Medicine who has been investigating a possible link between manganese inhalation via exposure to welding rod fumes and Parkinson's Disease (PD) or other forms of parkinsonism [1] for about a decade. His first published work in this area [2] examined a group of 15 career welders who presented themselves in the course of Dr. Racette's work at the Movement Disorders Center at the WU School of Medicine. These welders were part of a larger group of over 900 new patients seen between 1996 and 2000. He observed that the average age at diagnosis of PD in this small group was about seventeen years younger than is typical, and found that the only clinical difference between these 15 welders and idiopathic PD was this early onset of symptoms. He noted the absence of classic symptoms of a disease called manganism and speculated that there might be some involvement between their occupations, the consequent exposure to manganese fumes, and the early age at onset. He reported his findings in 2001:
Our study suggests that welding may be a risk factor for a parkinsonism syndrome that is ... clinically indistinguishable from idiopathic PD except for age at onset. We believe that welding exposure acts as an accelerant to cause PD. Our findings do not prove that manganese is the toxic agent, and other components of the fume could be responsible for parkinsonism in welders [Racette 2001 at 12].
Evidently, this report resonated with plaintiff's attorneys looking for the 'next asbestos,' because by 2005 Dr. Racette was reporting the results of a very large screening process he had undertaken, funded in part by certain plaintiff's attorneys.
"Racette Screening"
In 2005 Dr. Racette published a second article [3] in which he reported results for a much larger group of welders. Over the period August 2002 to March 2003 his group screened a total of 1,950 welders, including as many as 580 in a single day [Racette 2006, at 360]. All of the welders had been referred by an attorney. Most, but not all, of the patients screened had symptoms. Most, but not all, of the patients were welders. His 2005 article addresses the subset of 1,423 patients who resided in Alabama. From this group, results were published for 1,090. The fate of the missing 333 is unclear; the only mentioned reason for dropping patients from the study was exclusion based on a perception of malingering, although a 23% malingering rate would seem large, at least outside of a litigation context.
Dr. Racette described the process in two papers, including one in 2006 [4]. As he describes it, the screening was largely done from videotaped examinations, presumably reviewed at a far slower pace than the peak rate of 580 examinations a day. Two sets of criteria were used, labeled 'conservative' and 'liberal.' The liberal criteria were a subset of the conservative, so all conservative diagnoses would also be liberal diagnoses, but not necessarily vice versa. Dr. Racette makes two important claims:
1. The sensitivity of the study was 56% and the specificity ranged from 91% to 100%. We discuss what those two terms mean in the next section, as well as their implications for medicine and for the law.
2. The conservative criteria implied an age-standardized 'prevalence ratio' of 7.60, comparing the prevalence of diagnosed parkinsonism among active male Alabama welders with the prevalence of PD among males in the general population of a county in Mississippi. This is a claim that Alabama welders are nearly 8 times as likely to get PD as the general population. (A crude version of this calculation is sketched below.)
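For readers unfamiliar with the term, the following is a minimal Python sketch of a crude (not age-standardized) prevalence ratio. All of the counts are hypothetical placeholders of my own choosing, not figures from the Racette study; they are picked only so the arithmetic lands near the reported 7.6.

```python
# Crude prevalence ratio: how much more common a diagnosis is in one group
# than another. Counts below are hypothetical, for illustration only.
cases_welders, n_welders = 50, 1000        # hypothetical screened welders
cases_reference, n_reference = 66, 10000   # hypothetical reference population

prev_welders = cases_welders / n_welders           # 5.0%
prev_reference = cases_reference / n_reference     # 0.66%
prevalence_ratio = prev_welders / prev_reference   # ~7.6

print(f"welder prevalence:      {prev_welders:.2%}")
print(f"reference prevalence:   {prev_reference:.2%}")
print(f"crude prevalence ratio: {prevalence_ratio:.2f}")
```

An age-standardized ratio, such as the one Dr. Racette reports, computes the same comparison within age bands and then weights the bands by a standard population, but the underlying idea is the division shown here.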
It is important to understand the basis on which Dr. Racette reports his estimates of sensitivity and specificity. First we need to define our terms.
Sensitivity and Specificity
Sensitivity and specificity are measures used by statisticians to describe how a testing or screening process performs when there are two outcomes. Consider a test that will identify things as either in or out of some classification. In the Racette study the classification is a diagnosis of PD. For any given test there are four possible outcomes:
True positive. The test correctly classifies the subject as being in the class. This welder who has PD has been diagnosed as such.
True negative. The test correctly classifies the subject as not being in the class. This welder who does not have PD does not get diagnosed as having PD.
False positive. The test classifies a subject as belonging in the class when it should not be included. This healthy welder has been diagnosed with PD.
False negative. The test classifies a subject as not belonging in the class when it should be included. This welder with PD is diagnosed as healthy.
Sensitivity is the proportion of actual positives that are correctly identified as such. Mathematically, it is the ratio of true positives to the sum of true positives and false negatives. In the case of welding fume screening, it is the percentage of welders that have PD that are actually diagnosed as such. A sensitivity of 56% implies that the test misses almost as many as it finds, and is not a particularly powerful result.
Specificity is the proportion of actual negatives that are correctly identified. Mathematically, it is the ratio of true negatives to the sum of true negatives and false positives. In welding fume screening it is the percentage of welders that do not have PD that are diagnosed as healthy. A specificity of 100% implies no false positives, and is a very strong claim for a screening process.
What is a reasonable estimate of the real specificity of Racette's screening method?
Dr. Racette wrote:
We compared the agreement of these quantitative criteria with the diagnosis of parkinsonism by in-person examiner. Of the 48 subjects undergoing in-person and video diagnoses, the conservative criteria diagnosed three subjects with definite parkinsonism and six subjects with probable parkinsonism. The liberal criteria diagnosed three subjects with definite parkinsonism and nine subjects with probable parkinsonism. The sensitivity and specificity of the liberal criteria compared to the in-person examination were 56 and 91%, respectively. The sensitivity and specificity of the conservative criteria compared to the in-person examination were 56 and 100%, respectively [Racette 2006 at 359].
Leaving aside the question of misdiagnosis by the in-person examiner, it seems reasonable to think critically about the claimed specificity. The following table may make it easier to understand what these numbers mean:
|              | Definite PD | Probable PD | Total |
|--------------|-------------|-------------|-------|
| Conservative | 3           | 6           | 9     |
| Liberal      | 3           | 9           | 12    |
What do these figures imply? Working backward from the reported numbers: the conservative criteria produced 9 diagnoses with a claimed specificity of 100%, so all 9 must have been true positives, and a sensitivity of 56% then implies that the in-person examiner diagnosed about 16 of the 48 subjects with parkinsonism (9/16 ≈ 56%), leaving about 32 without it. The liberal criteria had the same 56% sensitivity, so they found the same 9 true positives; their 12 total diagnoses therefore included about 3 false positives, which is just what a 91% specificity on roughly 32 true negatives implies (29/32 ≈ 91%). Recall that the criteria were defined so that any conservative diagnosis is also a liberal diagnosis, which means the equal numbers of Definite PD diagnoses are not a coincidence. It also means that the three additional diagnoses made under the liberal criteria but not the conservative criteria appear to have all been false positives. That is a very poor result at the margin. It gives some sense of the volatility of at least the liberal criteria, and because of the close overlap of the liberal and conservative criteria, may suggest something about the volatility of the conservative criteria as well.
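The back-calculation above is short enough to verify directly. The sketch below reconstructs the implied two-by-two tables from the published diagnosis counts and the stated sensitivity and specificity; the in-person totals (about 16 positives and 32 negatives) are inferences from those published figures, not numbers reported by the study itself.

```python
# Reconstruct the confusion matrices implied by the reported figures.
# Inferred (not reported): ~16 of the 48 subjects were diagnosed in person,
# obtained from 9 conservative true positives / 0.56 sensitivity.
subjects = 48
inperson_pos = round(9 / 0.56)          # ~16 in-person diagnoses (inferred)
inperson_neg = subjects - inperson_pos  # ~32 in-person negatives

for label, video_pos, stated_spec in [("conservative", 9, 1.00),
                                      ("liberal", 12, 0.91)]:
    fp = round(inperson_neg * (1 - stated_spec))  # implied false positives
    tp = video_pos - fp                           # implied true positives
    sens = tp / inperson_pos
    spec = (inperson_neg - fp) / inperson_neg
    print(f"{label:>12}: TP={tp} FP={fp} "
          f"sensitivity={sens:.0%} specificity={spec:.0%}")
```

Running this reproduces the reported 56%/100% and 56%/91% pairs, with 0 and 3 implied false positives respectively.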
These findings are based on what is clearly a very small sample. What would have been the effect if even one of the nine conservative diagnoses had been a false positive? The measured sensitivity would have dropped to 50% (8 of 16) and the specificity to about 97% (31 of 32). One might then ask whether it is reasonable to take a sample this small and represent that there is no possibility of a false positive. Probably not.
If the real specificity of this test were 95%, the expected number of false positives among roughly 32 true negatives would be 1.6, yet the chance of observing none at all, and hence recording a 'perfect' 100%, would still be about one in five [5]. A perfect score on a sample this small is therefore weak evidence that the true specificity really is 100%. It seems reasonable to believe that while the specificity of Dr. Racette's method is undoubtedly high, it is unlikely to be exactly 100%, and a true figure in the low to mid 90s cannot be ruled out. A much larger sample would be needed to nail this number down.
Obviously the same small-sample argument applies to the sensitivity, which is measured on only about 16 subjects the in-person examiner diagnosed with parkinsonism; an observed 56% (9 of 16) carries a standard error of roughly twelve percentage points [6].
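A few lines of Python make the small-sample point concrete. The sample sizes used here (32 in-person negatives, 16 in-person positives) are the figures inferred above, not numbers reported by the study, and the candidate "true" specificities are illustrative.

```python
import math

# Probability of observing a 'perfect' specificity (zero false positives)
# among n_neg true negatives, for several assumed true specificities.
n_neg = 32   # in-person negatives (inferred above)
for true_spec in (0.99, 0.97, 0.95, 0.90):
    p_perfect = true_spec ** n_neg
    print(f"true specificity {true_spec:.0%}: "
          f"P(observing 100% on {n_neg} negatives) = {p_perfect:.0%}")

# Standard error of the observed 56% sensitivity, measured on 16 positives.
n_pos, sens = 16, 9 / 16
se = math.sqrt(sens * (1 - sens) / n_pos)
print(f"observed sensitivity {sens:.0%} +/- {se:.0%} (one standard error)")
```

Under these assumptions, even a true specificity of 95% yields a "perfect" observed score about 19% of the time, and the sensitivity estimate is uncertain by more than ten percentage points in either direction.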
Implications
When thinking about the sensitivity and specificity of tests it is useful to think about two things: how many false positives and false negatives there are in absolute terms, and what each of them costs. This is particularly true where the natural incidence rate is low in relation to the rate of false positives. It is also useful to consider cost from various perspectives. Since this was a report of a medical-legal screening procedure, medicine and the law are the two perspectives to consider.
Assume that the true sensitivity is 60% and the true specificity is 95%. These are just illustrative numbers, but they are plausible ones in this example. Assume the true rate of PD in the population is 1%. Again, this is illustrative but in the ballpark [7]. With these assumptions, there are approximately 11 true cases of PD in the group of 1,090 welders, of which 6 or 7 are correctly identified and 4 or 5 are missed. Doctors would probably find that a dismaying result, particularly if early diagnosis would have helped slow the progress of the disease. Plaintiff attorneys might mourn the potential loss of clients, but in all likelihood those misdiagnoses will be corrected eventually and those clients will return.
On the other hand, there are roughly 54 false positives (about 1,079 welders without PD x 5%), for a total of roughly 60 positive diagnoses of PD, or about 5.5 times too many. Computing the age-adjusted prevalence ratio is beyond this short paper, but 5.5 seems, at first blush, to be in the neighborhood of the 7.6 ratio that the Racette study reported. In order for the number of true positives and false positives to be approximately equal, the specificity of the test would need to be about 99.5%. The implication for medicine is that the false positives this method produces would generate substantially more demand for at least some medical resources, and would cause needless anxiety in patients who are misdiagnosed. Plaintiff attorneys, on the other hand, would have a much larger base of potential clients, and could deploy the threat of that large base to suit their own purposes.
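These figures can be reproduced with a few lines of arithmetic. The sketch below uses only the illustrative assumptions stated above (1,090 welders, 1% true prevalence, 60% sensitivity, 95% specificity); the helper function and its name are mine, not anything from the study.

```python
def screening_outcomes(n: int, prevalence: float,
                       sensitivity: float, specificity: float):
    """Expected counts from screening n subjects with the given test."""
    true_cases = n * prevalence
    non_cases = n - true_cases
    true_pos = true_cases * sensitivity        # cases correctly flagged
    false_neg = true_cases - true_pos          # cases missed
    false_pos = non_cases * (1 - specificity)  # healthy subjects flagged
    ppv = true_pos / (true_pos + false_pos)    # chance a 'positive' is real
    return true_pos, false_neg, false_pos, ppv

tp, fn, fp, ppv = screening_outcomes(1090, 0.01, 0.60, 0.95)
print(f"true positives:  {tp:.1f}")   # ~6.5
print(f"false negatives: {fn:.1f}")   # ~4.4
print(f"false positives: {fp:.1f}")   # ~54.0
print(f"probability a positive diagnosis is correct: {ppv:.0%}")  # ~11%
```

Under these assumptions, roughly nine out of ten positive screening results are false.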
The situation can be illustrated further by assuming a 60% sensitivity, 2,000 welders screened, a natural rate of PD of 1%, and three different specificities; the sketch following this paragraph works through those numbers. There are several real-world examples where tests of high sensitivity and specificity are, nonetheless, not adequate for medical or public-policy purposes: mammograms for breast cancer, PSA tests for prostate cancer, and anti-terrorist data mining have all been in the news recently, and all are good examples of the pitfalls of tests with less than exceptional specificity.
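Here is a minimal sketch of that comparison. The population size, prevalence, and sensitivity are the assumptions stated above; the three specificities (90%, 95%, and 99%) are my own illustrative choices, since specific values are not fixed in the text.

```python
# Expected outcomes when screening 2,000 welders with 1% true PD prevalence
# and 60% sensitivity, at three illustrative specificities.
n, prevalence, sensitivity = 2000, 0.01, 0.60
cases = n * prevalence          # 20 true cases in the screened group
true_pos = cases * sensitivity  # 12 of them correctly identified

print(f"{'specificity':>12} {'false pos':>10} {'total pos':>10} {'PPV':>6}")
for specificity in (0.90, 0.95, 0.99):
    false_pos = (n - cases) * (1 - specificity)
    total_pos = true_pos + false_pos
    ppv = true_pos / total_pos  # share of positive screens that are real
    print(f"{specificity:>12.0%} {false_pos:>10.0f} "
          f"{total_pos:>10.0f} {ppv:>6.0%}")
```

Even at 99% specificity, most of the positive screens in this scenario would still be false.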
In summary, even in the total absence of fraud, unless a screening process is virtually free of false positives or the true incidence comfortably exceeds the false-positive rate, a 'positive' result in medical-legal screening is more likely than not to be false, and perhaps more likely by several hundred percent. The result of a medical-legal screening, therefore, seems a flimsy basis for bringing suit in a world of "a preponderance of proof."
Endnotes
[1] Parkinsonism is an umbrella term for a constellation of movement disorders. Parkinson's Disease is a form of parkinsonism for which the cause is presently unknown. The term is sometimes written as 'Idiopathic Parkinson's Disease' or iPD to reflect the unknown nature of the cause. Manganism is a rare form of parkinsonism caused by overexposure to manganese, usually the result of inhalation of dust or fumes in mining, milling or manufacturing environments.
[2] Racette BA, McGee-Minnich L, Moerlein SM, Mink JW, Videen TO, Perlmutter JS, "Welding-related parkinsonism: clinical features, treatment, and pathophysiology," Neurology 2001; 56:8-13
[3] Racette BA, Tabbal SD, Jennings D, Good L, Perlmutter JS, Evanoff BA, "Prevalence of parkinsonism and relationship to exposure in a large sample of Alabama welders," Neurology 2005; 64(2):230-235
[4] Racette BA, Tabbal SD, Jennings D, Good LM, Perlmutter JS, Evanoff BA, "A rapid method for mass screening for parkinsonism," NeuroToxicology 2006; 27:357-361
[5] Under a binomial model with 32 true negatives and a true specificity of 95%, the probability of observing zero false positives is 0.95^32, or about 19%; at a true specificity of 97% it is about 38%.
[6] The standard error of a binomial proportion is sqrt(p(1-p)/n); with p = 0.56 and n = 16 this is about 0.12, so a two-standard-error range around the observed sensitivity runs from roughly one-third to four-fifths.
[7] PD is a common diagnosis in the US, affecting about 1 million people. The lifetime incidence rate in the US population is on the order of 1 - 2% [Olanow CW, "Manganese-induced Parkinsonism and Parkinson's Disease," Ann. N.Y. Acad. Sci 2004; 1012:209-223 at 217], and 0.8% of all death certificates list PD as the cause of death [The CDC reports the fifteen leading causes of death in the US annually in the National Vital Statistics Report based on a review of death certificates from all fifty states, which are thought to capture some 99% of all US causes of death. In the report for 2005, published 24 Apr 2008, Parkinson's disease ranked #14, representing 0.8% of all 2.4 million deaths].