Screening tests: 90% "false positives"?
- Jérémie Lengrais
- Sep 20, 2020
- 6 min read

This week, Le Monde questions the hypersensitivity of COVID-19 tests, via the thousand-euro question:
“Can we really say (…) that 90% of confirmed cases (…) are false positives?”
And surprisingly, among several explanations, the newspaper neglects to mention that in medicine, a screening test in which 90% of positive results are false (we have reworded this to make clear which percentage interests Le Monde here) is... quite common. And often perfectly normal.
Indeed, even if it is counterintuitive, there is no contradiction in all of the following facts being true at the same time:
• The T screening test is very effective: out of 100 sick individuals, it only "misses" one (false negative).
• The T screening test rarely makes a mistake: out of 100 healthy individuals, only one is wrongly diagnosed as sick (false positive).
• And yet: the vast majority of cases diagnosed positive by the T test are wrong (more false positives than true positives).
For screening tests, this "paradox" is actually the rule, for two reasons specific to this field of application which we will see together.
Specific algorithmic models
First, let's remember that behind a screening test, as is often the case, lies an algorithmic model: a model that takes as input a patient's health data via samples (blood, saliva, etc.) and, based on this data, outputs a decision, in our case binary: positive (contagious) / negative (not contagious). In data science, this decision-making process is called a classification, and the final categorization of the patient is a label.
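To make this concrete, here is a minimal sketch in Python of what such a decision process boils down to: the model turns a patient's measurements into a score, and a threshold turns that score into a binary label. The marker names, weights, and threshold below are entirely invented for illustration.

```python
# Minimal sketch of a screening test seen as a binary classifier.
# Marker names, weights, and threshold are invented for the example.

def contagion_score(sample: dict) -> float:
    """Hypothetical score in [0, 1]: higher means 'more likely contagious'."""
    return 0.7 * sample["marker_a"] + 0.3 * sample["marker_b"]

def classify(sample: dict, threshold: float = 0.5) -> str:
    """Turn the continuous score into a binary label: the 'classification'."""
    return "positive" if contagion_score(sample) >= threshold else "negative"

print(classify({"marker_a": 0.9, "marker_b": 0.4}))  # -> positive
print(classify({"marker_a": 0.2, "marker_b": 0.1}))  # -> negative
```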
However, the algorithmic models behind screening tests, unlike other industrial algorithmic models, are subject to two particular constraints:
1. An imbalance in the populations concerned: sick/healthy individuals. A disease generally affects only a fraction of the population, on the order of one percent or even much less. There are many more healthy individuals than sick individuals. And therefore, in absolute terms, there are more diagnostic errors among the (numerous) healthy individuals than correct diagnoses among the (small) fraction of sick individuals.
In addition to this, for COVID-19, the tested populations are not "filtered" as for other health conditions (think of screening for breast cancer or colon cancer), given the health emergency and how ordinary the first symptoms are: the individuals tested are not only those who are strongly presumed to have the virus, but potentially everyone.
For example, for a disease affecting 1% of the population, out of 1000 randomly selected individuals:
• If the test generates 1% false negatives among sick individuals, there will be approximately 10 true positives (1000 × 1% × 99%).
• In parallel, if the false positive rate is 1% among healthy individuals, there will be approximately 10 people wrongly diagnosed as ill (1000 × 99% × 1%).
This results in as many true positives as false positives. Varying the parameters in this example helps to understand that it is easy to reach a situation where the number of false positives exceeds the number of true positives.
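This back-of-the-envelope calculation can be replayed in a few lines of Python. The figures (1% prevalence, 1% false-negative rate, 1% false-positive rate) are the illustrative ones from the example above, not those of any real COVID-19 test.

```python
# Replaying the worked example: 1,000 randomly selected individuals,
# 1% of whom are sick, with a test that misses 1% of the sick and
# wrongly flags 1% of the healthy (illustrative figures only).
population = 1_000
prevalence = 0.01            # fraction of the population that is sick
false_negative_rate = 0.01   # sick individuals the test misses
false_positive_rate = 0.01   # healthy individuals wrongly flagged

sick = population * prevalence              # 10 people
healthy = population - sick                 # 990 people

true_positives = sick * (1 - false_negative_rate)   # ~9.9
false_positives = healthy * false_positive_rate     # ~9.9

wrong_share = false_positives / (true_positives + false_positives)
print(f"true positives:  {true_positives:.1f}")
print(f"false positives: {false_positives:.1f}")
print(f"share of positive results that are wrong: {wrong_share:.0%}")  # ~50%
```

Lower the prevalence to 0.1% and, with the same error rates, the share of positive results that are wrong climbs to roughly 91%: precisely the order of magnitude that puzzled Le Monde.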
That's the simple mathematical explanation: absolute numbers versus relative weights. However, it's worth remembering that imbalances in populations or cohorts, while they lead to elementary mathematical effects, can play powerful tricks on even the most brilliant minds. For example, they appear alongside confounding factors in Simpson's paradox, one of the simplest yet most fascinating statistical traps (Editor's note: we've included external links on this topic at the end of this article).
2. The public health objective of screening tests: priority is generally given to minimizing the number of false negatives, that is, the number of people who, after being tested, believe they are healthy when they are actually infected. In data science, this objective is measured via recall, and differs from precision, which focuses on false positives.
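For readers unfamiliar with these two metrics, here is how they are defined from the confusion counts of a binary test; the figures in the comments reuse those of the earlier illustrative example.

```python
# Textbook definitions of precision and recall from the confusion counts.
def precision(tp: float, fp: float) -> float:
    """Among individuals declared positive, the share that really are sick."""
    return tp / (tp + fp)

def recall(tp: float, fn: float) -> float:
    """Among individuals who really are sick, the share the test catches."""
    return tp / (tp + fn)

# With the earlier example (~9.9 true positives, ~9.9 false positives,
# ~0.1 false negatives), recall is excellent while precision is mediocre.
print(recall(tp=9.9, fn=0.1))     # 0.99
print(precision(tp=9.9, fp=9.9))  # 0.5
```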

Minimizing false negatives
Let us look further into this second aspect, undoubtedly the most interesting in the context of this article, because it goes beyond a simple mathematical misreading and touches on preconceived ideas about algorithmic models.
The health objective of screening tests may seem obvious, but it is in fact a choice that is both constrained and informed, the same choice faced by any algorithmic model: which type of error should we prioritize minimizing?
To understand this, we need to quickly revisit two fundamental aspects of statistical algorithmic models, which are counterintuitive for the general public:
There is no perfect generalizable algorithmic model. In other words, there is no model that generates no errors (false positives and false negatives) in a large-scale, real-world deployment (which is what these screening tests are). This is not due to an engineering or mathematical error; it is due to the complexity of the phenomena being studied, of which, by nature, every algorithmic model is a simplification. As George Box famously said, "All models are wrong, but some are useful."
Beyond a certain model quality threshold, reducing the number of false negatives inevitably comes at the cost of increasing the number of false positives. And vice versa. A choice must then be made: either more false positives or more false negatives. This trade-off is the precision/recall trade-off, very common in data science.
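A minimal sketch of this trade-off, on an entirely made-up handful of scores: sliding the decision threshold down catches more of the sick individuals (recall rises) but flags more of the healthy ones (precision falls).

```python
# Illustrating the precision/recall trade-off by moving the decision threshold.
# Scores and ground-truth labels are invented; 1 = actually sick, 0 = healthy.
samples = [
    (0.95, 1), (0.80, 1), (0.60, 1), (0.40, 1),   # sick individuals
    (0.70, 0), (0.55, 0), (0.30, 0), (0.10, 0),   # healthy individuals
]

def metrics(threshold):
    tp = sum(1 for score, sick in samples if score >= threshold and sick)
    fp = sum(1 for score, sick in samples if score >= threshold and not sick)
    fn = sum(1 for score, sick in samples if score < threshold and sick)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall

for threshold in (0.75, 0.50, 0.25):
    p, r = metrics(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.75  precision=1.00  recall=0.50
# threshold=0.50  precision=0.60  recall=0.75
# threshold=0.25  precision=0.57  recall=1.00
```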
For screening tests, this trade-off amounts to asking the following question when finalizing the algorithmic model:
1. Do we prefer to accept a few more false positives: healthy individuals wrongly diagnosed as suffering from a pathology?
2. Or do we prefer to suffer a few more false negatives: sick individuals wrongly diagnosed as healthy?
In healthcare, the balance generally tips heavily toward the first answer, because the social cost of a false positive is relatively low while the social cost of a false negative is potentially immense. Wrongly diagnosed with a condition, an individual gets away with a scare, later dispelled by a follow-up test. The cost of a false negative, however, is sometimes fatal.
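One way to make this choice explicit, sketched below with purely illustrative cost figures and error counts: if a missed contagious individual is deemed, say, a hundred times more costly to society than a needless quarantine, the setting that minimizes the expected cost is the "lenient" one, even though it produces by far the most false positives.

```python
# Choosing an operating point by weighting the two kinds of errors.
# All costs and error counts below are made up for the sake of illustration.
COST_FALSE_NEGATIVE = 100.0   # a missed contagious individual (assumed far costlier)
COST_FALSE_POSITIVE = 1.0     # a healthy person needlessly isolated

def expected_cost(fn, fp):
    return COST_FALSE_NEGATIVE * fn + COST_FALSE_POSITIVE * fp

# Hypothetical error counts per 1,000 tested people for three threshold settings.
settings = {
    "strict  (few positives)":  {"fn": 3.0, "fp": 2.0},
    "balanced":                 {"fn": 1.0, "fp": 10.0},
    "lenient (many positives)": {"fn": 0.2, "fp": 40.0},
}

for name, errors in settings.items():
    print(f"{name:26s} expected cost = {expected_cost(**errors):6.1f}")
best = min(settings, key=lambda name: expected_cost(**settings[name]))
print("chosen setting:", best)   # 'lenient', despite far more false positives
```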
COVID-19 and UIA-752
It is worth reiterating that this precision/recall trade-off is specific to each algorithmic model, and to each application domain.
To convince ourselves of this, let us consider, for example, an air-defense algorithmic model, responsible for detecting, on the basis of incoming data (radar data, for example), whether an object approaching an airspace is a missile (positive label) or another, non-military object, an aircraft for example (in which case the detection should be negative).
In this example, the trade-off between precision and recall is clearly much more delicate: the cost of a false negative (an enemy missile mistaken for an aircraft) can be measured in human lives on the ground. Conversely, a false positive (a harmless object mistaken for an enemy missile, triggering an anti-aircraft response) can be measured in human lives in the air. Recent dramatic examples (such as UIA Flight 752 in January 2020 over Iran) remind us that these issues are anything but a simple exercise in intellectual projection.

In conclusion
So, in summary, here's what lies behind this "paradox of 90% false positives": a mathematical side effect that makes us forget that the goal of screening tests is not to minimize the number of false positives, but to minimize the number of unknowingly contagious individuals (false negatives). These two objectives are not only different; in the way they are built and deployed, they are even, at the margin, contradictory. And this isn't a paradox or an anomaly: with any algorithmic model, choices must be made, including the choice of its errors.
Why then this magnifying-glass effect today on a characteristic that is, all things considered, quite standard for screening tests? There is a media factor, of course: the COVID-19 tidal wave and its anxieties, and the public figures around the world (athletes in particular) who wrongly tested positive. The (mis)information surrounding any such phenomenon calls for a sustained effort of explanation.
But there are also entirely legitimate questions related to the specific characteristics of COVID-19 as a severe contagious disease, for which the social cost of a false positive is multiplied tenfold by the health and logistical constraints (quarantine) required to contain the disease. This cost is compounded by the fact that, most of the time, there is neither the possibility (few different types of tests) nor the practical necessity (the disease progresses rapidly) of correcting these diagnostic errors before incurring the consequences.
Does all this point to a need to adjust algorithmic models to counteract the percentage of false positives? This seems unlikely as long as the marginal (health) cost to society of a false negative remains significantly higher than the marginal (economic) cost of a false positive. This is another debate we will refrain from expanding on in this scientific blog.
PS: We hope you enjoyed reading this article, which once again demonstrates the strong interdependence between data science and its industrial applications. Some concepts have been simplified due to space constraints; we will add more details gradually.
-----------------------------
In addition to this article:
On Simpson's paradox, here are two videos from excellent French-speaking science communicators:
https://www.youtube.com/watch?v=vs_Zzf_vL2I (Amazing Science)
https://www.youtube.com/watch?v=0NbyYOcIwAY (Science4all)




