Reflections on the Nature and Future of Systematic Review in Healthcare.
Barry Robson PhD DSc
(Rough Draft, Version 1.0, Feb 22 2015)
1. Systematic Review
A systematic review (SR) is “a high-level overview of primary research on a particular research question that tries to identify, select, synthesize and appraise all high quality research evidence relevant to that question in order to answer it” . The systematic review is an approach to decision making in healthcare that is promoted by the Cochrane Collaboration . The Cochrane Collaboration prepares, maintains and promotes systematic reviews to inform healthcare decisions as “Cochrane Reviews”. The approach is based upon the views expressed by Archie Cochrane  as to what strategy should be applied in healthcare in order to best benefit patients, and not least to avoid harming them. The general approach is consistent with that of David Sackett  and Muir Gray , who framed, implemented, and refined Cochrane’s proposed methods as “Evidence Based Medicine”.
2. The Scope of Systematic Review.
The general kind of activity inherent in SR has been a task for scholars for centuries when examining diverse kinds of prior art and knowledge before undertaking any supposed new research. Strictly speaking, however, SR as a modern practice in its own right extends neither to identifying the possibility of the endeavor in the first place, nor to the issue of whether the endeavor is a good or useful thing to do. An SR practitioner per se (though he or she may be other things at the same time, including a general researcher, inventor or entrepreneur) is commissioned to complete one link in the chain of an actual or intended workflow. He or she examines reports on scientific studies that provide evidence considered relevant as to whether some proposition X presented is indeed the case. “Scientific” here includes domains as diverse as sociology, archeology, anthropology, economics, business intelligence, and marketing, and many if not all of the principles extend further to other kinds of scholarly pursuit, and onward still to activities as broad and diverse as student learning, espionage, and security surveillance. Experienced scientific researchers perform tasks similar to SR, but related to learning and surveillance, to maintain mastery of the field or for professional enhancement. In the present case, it is the notion of SR as a specific link in an organized workflow to enable a specific decision point in the field of medicine, including biomedicine, pharmaceutical science, and healthcare, that is of interest. For such purposes, and especially but not solely for supporting decisions in everyday clinical practice, the SR principles adopted by the Cochrane Collaboration are embodied in the “Cochrane Handbook for Systematic Reviews of Interventions” .
3. The Current Challenges of Systematic Review.
In many respects SR would seem to be a more disciplined form of the same kind of everyday activity to which the human brain is well suited, i.e. it invokes the same fundamental challenges that we constantly encounter in seeking to understand the everyday world and in seeking to formulate that understanding as actionable knowledge. Admittedly, in the simplest SR scenario one “merely” seeks, reads, and tries to understand some kind of report A of a scientific or similar study, and from that provides an argument as to why the conclusions are relevant to, and actionable for, some kind of endeavor. Like many aspects of everyday life, however, the task is rendered more difficult because there is usually not just a single source A available, but also B, C, D, etc. that the notion of “all and best evidence” behooves us to take into consideration. According to the Cochrane Collaboration, “Systematic reviews seek to collate all evidence that fits pre-specified eligibility criteria in order to address a specific research question” and “Systematic reviews aim to minimize bias by using explicit, systematic methods” . This is, of course, more easily said than done. Each source may provide ways to address only pieces of the puzzle, and there are several issues relating to the matter of which of these are “eligible”. “Eligibility”, moreover, is a matter to be gleaned objectively from the analysis. In practice, any answer to a piece of the puzzle from one source is not necessarily consistent with those rendered by other sources, but there is no case for saying only one is clearly right. Also, one or more sources such as D may also be some kind of review of the others as primary sources, more analogous to a juror than to witnesses A, B, C, etc. in a court of law. The analogy of a court of law is an insightful one, to which there is frequent allusion below.
For example, in regard to the above juror and witness analogies, we can immediately extend to the idea that even information from witnesses might have a juror aspect, i.e. be “contaminated” by some similar aspect of belief and prior prejudice. Such components of information may not rest on prior beliefs shared by all, may be anecdotal and opinionated, and may also be subject to misconception and possibly to additional errors about what was actually the case. But such review content of any source, whether explicit, tacit, or somewhat concealed in a document, is not to be discarded lightly, because it may still be of value in matters of unobvious interpretation and by inclusion of further relevant knowledge.
4. How “Hard” is Systematic Review?
The word “hard” here is certainly meant in the occupational task sense, but also as used in mathematics, and particularly as used in computation. Section 3 above suggests that fully automating the task would not be easy, since many aspects of Artificial Intelligence are implied. Nonetheless, a useful question to ask is about the extent to which, and for what parts of the task, a systematic reviewer could be replaced by an artificially intelligent agent. One can envision a test somewhat like the Turing test  in which an expert in SR is asked whether an SR study and report was done by a human expert, or by a smart computer program with access to the web. The real importance of this question lies in helping to consider which parts will be relatively easy for a computer to do in principle, compared with a human being, and which parts will be relatively easy for a human being, and, for the immediate future, in focusing attention on what algorithms, in the general “recipe” or “protocol” sense, should be adopted by human systematic reviewers.
5. Meta-Analysis and Probabilistic Statements.
A discipline related to SR is “meta-analysis” (MA), and there has in the past been some confusion as to the nature of that relation. Interpreting “analysis” broadly, as both qualitative and quantitative, systematic review is seen as a subset of MA . Interpreted as “data meta-analysis”, i.e. as an aspect of data analytics including, amongst other quantitative aspects, statistics and quantitative decision support, MA is “just” a tool. In the larger modern picture, however, these two, i.e. overall reasoning and data analytics, are less distinct. This is because tools of artificial intelligence and text analytics see natural language and thought as a matter of probabilistic semantics . The overall conclusion depends on the perceived degree of truth of statements providing pro and contra evidence. In the present author’s view, this is in the sense of the argumentation model, in which, as in a court of law, final conclusions as to the truth of a proposition are reached by combination of evidence for and against, each piece being a statement contributing evidence with its own weight. These weights may have diverse origins, being associated with uncertainty as probability in diverse manifestations of belief, scope, and statistical census. How might we envisage quantifying this, at least in principle, and in general terms?
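As one way to make the question concrete, the following minimal Python sketch combines pieces of pro and contra evidence as weights on a log-odds scale. It assumes, purely for illustration, that each piece of evidence can be summarized as a likelihood ratio and that the pieces are independent; the function name combine_evidence and the numerical weights are hypothetical and are not taken from any cited method.

```python
import math


def combine_evidence(prior_probability, likelihood_ratios):
    """Combine independent pieces of evidence on a log-odds scale.

    Each likelihood ratio above 1 counts for the proposition and each
    ratio below 1 counts against it (an illustrative assumption).
    """
    log_odds = math.log(prior_probability / (1.0 - prior_probability))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert the accumulated log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical weights: two statements for the proposition, one against.
evidence = [4.0, 2.5, 0.5]
posterior = combine_evidence(0.5, evidence)
print(round(posterior, 3))
```

With an even prior, the three weights multiply to odds of 5 to 1 in favor, so the sketch reports a probability of about 0.833. Real evidence is rarely independent, which is one reason the weights in practice entangle belief, scope, and statistical census.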
6. The Four Pillars of Evidence.
In one sense, matters are simple, at least in the “from-a-thousand-feet” perspective. The SR process is just another kind of black box that takes prior evidence as input and makes predictions to support decisions. As with diagnostic tests, simple EBM measures as predictors, more complicated risk factors, a still more elaborate clinical decision support program, and importantly as with the correctness and error types of single statistical procedures, the argumentation model ultimately depends on the quantification of, ideally, four forms of positivity and negativity of statements. Even when rendered semantically, but most conveniently then categorically, e.g. as when seeing “dogs chase cats” as “dogs are cat-chasers”, the real associated probability of interest is not a single real or scalar quantity, but at the very least a real valued 2 x 2 matrix, or a vector, or a scalar that is complex. It is complex in the sense of having one or more imaginary parts, i.e. components multiplied by imaginary numbers. We can relate this, though not directly, to “all dogs chase cats”, “no dogs chase cats”, “other than dogs chase cats”, and “other than dogs do not chase cats”. The fact that these are not necessarily stated with probability one means that the existential “some” as well as the universal “all” description is implied. For example, arguably and theoretically it is the extent to which “some dogs are cat chasers” and “some cat chasers are dogs” have measurable probabilities that are equal in value. Such ideas also map to the well-known true and false positives, and true and false negatives, and ideally we can test their positive and negative predictive power and other quality measures as a new version of the grid in which the probabilities are now based on counts of success and failure.
Each source of information, and the final integrated result, relates to four pillars of evidence captured in the cells of the 2 x 2 grid, and we might imagine both the near-final result, and each piece of contributing evidence, as associated with such a grid. The overall judgment process involves putting together a final such 2 x 2 grid by integrating many such grids.
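The 2 x 2 grid of counts and the familiar quality measures derived from it can be sketched as follows. This is a minimal illustration in Python; the class name Grid2x2, its field names, and the counts are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid2x2:
    """Counts in the four cells of a 2 x 2 grid (hypothetical naming)."""
    true_pos: float
    false_pos: float
    false_neg: float
    true_neg: float

    def sensitivity(self):
        # Fraction of actual positives that were called positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self):
        # Fraction of actual negatives that were called negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_predictive_value(self):
        # Fraction of positive calls that were correct.
        return self.true_pos / (self.true_pos + self.false_pos)

    def negative_predictive_value(self):
        # Fraction of negative calls that were correct.
        return self.true_neg / (self.true_neg + self.false_neg)


# Invented counts for illustration only.
grid = Grid2x2(true_pos=90, false_pos=30, false_neg=10, true_neg=870)
print(grid.sensitivity(), grid.positive_predictive_value())
```

All four cells are needed: sensitivity and specificity use the columns of the grid, while the predictive values use the rows, so a source that reports only some of the cells supports only some of the measures.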
Correspondingly, in a court of law, we have perhaps many statements of evidence that the accused committed the crime, that what the accused did was not the crime, that someone other than the accused committed the crime, and that what someone other than the accused did was not the crime. That the defense should suddenly present evidence that someone else committed the crime is important, and no less so would be the consequent counter-argument that it was not the crime, or not the crime under consideration, that the new suspect committed. In short, similar considerations apply to SR as to any decision making or predictive process, and they are similarly probabilistic, albeit with probability that entangles matters of credibility, belief, uncertainty, error, and scope. Combining many conceptual 2 x 2 grids from many studies and diverse types of evidence is not easy. Moreover, very often we do not obtain four pillars of evidence from any one specific cited study. For example, in classical and frequentist statistics, the investigator merely has to prove that the null hypothesis, i.e. that in some deliberately vague way the drug or whatever does not work, is improbable.
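The crudest conceivable way of integrating several grids is to add them cell by cell, as in the sketch below. This is an illustrative assumption only, not a recommended integration method: naive pooling ignores differences between studies in size, quality, and population, and it presupposes that every study reports all four cells, which, as noted above, is often not the case. The function names and counts are hypothetical.

```python
def pool_grids(grids):
    """Cell-wise sum of grids given as (tp, fp, fn, tn) tuples."""
    return tuple(sum(cells) for cells in zip(*grids))


def odds_ratio(grid):
    """Odds ratio of a single 2 x 2 grid of counts."""
    true_pos, false_pos, false_neg, true_neg = grid
    return (true_pos * true_neg) / (false_pos * false_neg)


# Three invented studies, each reporting all four cells.
studies = [
    (20, 10, 5, 65),
    (45, 15, 15, 125),
    (8, 4, 2, 36),
]

pooled = pool_grids(studies)
print(pooled)  # (73, 29, 22, 226)
print(round(odds_ratio(pooled), 2))
```

Summing raw counts can mislead when the studies differ systematically, which is part of why combining grids from diverse sources is described here as not easy.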
Here, a moment should be taken to put aside probabilistic aspects of semantics that are certainly relevant, and which are called upon in practice, but are not core. Essentially, one needs to distinguish probabilities of association in text analytics from probabilistic generators or analyzers of grammatical structure. Here we can put aside digressions into primary sources of linguistic theory, since it is the sense of relevance in the field that matters. Traditionally, authors like Bach  identify two major theses on the character of natural language and thought, Chomsky’s thesis, i.e. natural languages can be described as formal systems, and Montague’s thesis, i.e. natural languages can be described as interpreted formal systems, but there is a growing feeling that information theoretic models are more appropriate [7, 9, 10]. For example, “The Harris-Jelinek thesis: natural languages can be described as information theoretic systems, using stochastic models that express the distributional properties of their elements” [10, 11].
What is still important is that these latter notions are nonetheless relevant. The task would not be difficult if all sources were in structured tabular form expressed in a universal format and ontology. It would be easier still if already expressed as probabilistic statements in a 2 x 2 grid, though that still leaves open the efficacy of the definitions implied in the statements, and the interpretation of the kind of probability presented with each statement. In practice, however, the hard part of the process within the “black box” is in converting source natural text and pictures, i.e. “unstructured information”, into understanding expressed as evidence with weights. Correct parsing of source text depends on a probabilistic grammar in the sense of the preceding paragraph, but also on context, and particularly it depends on our prior idea of the most likely knowledge or opinion which the author was intending to express. This depends on our common sense and world view, and our opinion of the common sense and world view of the author. For example, as is well known, much of humor depends on misinterpretation and improbable, but on reconsideration most likely correct, re-interpretation: “One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.” (Groucho Marx) . The linguistic issue is to what the preposition “in” applies, but ultimately the issue is also ontological, depending on how any interpretation of parsing fits the perceived degree of truth of the statement “elephants can wear pajamas”. Indeed, we need to consider the degrees of truth of four statements: that elephants belong to the set of pajama-wearing things, that entities other than elephants belong to the set of pajama-wearing things, that elephants do not belong to the set of pajama-wearing things, and that entities other than elephants do not belong to the set of pajama-wearing things.
As with analogies in diagnosis and courts of law, the latter pillars matter, since a priori, it is Groucho, not the elephant, that most likely belongs to the pajama-wearing set.
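The interplay between grammatical likelihood and prior world knowledge in the elephant example can be sketched as a simple weighting of the two candidate readings. The probabilities below are invented for illustration, and the function name most_plausible_reading is hypothetical; the point is only that a very small prior for “elephants wear pajamas” can override a grammatical preference, assumed here, for attaching “in my pajamas” to the nearer noun.

```python
def most_plausible_reading(readings):
    """Rank candidate parses by grammar probability times world prior.

    `readings` maps a description of each parse to a pair
    (grammar_probability, world_prior); both numbers are assumptions.
    """
    scores = {name: grammar * prior for name, (grammar, prior) in readings.items()}
    total = sum(scores.values())
    # Normalize so that the candidate readings sum to one.
    posterior = {name: score / total for name, score in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


readings = {
    "speaker wore the pajamas": (0.4, 0.999),
    "elephant wore the pajamas": (0.6, 0.001),
}

best, posterior = most_plausible_reading(readings)
print(best)
```

Even with the grammar favoring the elephant reading by 0.6 to 0.4, the world prior leaves the speaker reading with well over 99 percent of the weight.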
7. Towards Automated Systematic Review.
The author and colleagues proposed a web based universal exchange and inference language for healthcare and biomedicine called Q-UEL . It seeks to address many of the challenges that would arise in automating SR on a computer, so partly, and later extensively, automated SR is becoming an important test case and use case. It relies heavily on automated text analytics of web entries, including scientific papers, to auto-surf and spawn captured statements of knowledge as XML-like tags, and it addresses all the above probabilistic semantic and 2 x 2 grid ideas. Though XML-like, Q-UEL is suited to manage uncertainty in medical information and decision making, because the language also has algebraic force. It is quantitative and more specifically usually probabilistic, based on notions of information and probability that have been widely used in bioinformatics for many years  and on their re-expression in recent tools for mining archives of many electronic health records as “Big Data” in medicine .
Importantly, it is also based on the Dirac notation  and the vector-matrix algebra that it implies, widely used in quantum mechanics since the 1940s . Its formulation in terms of a special kind of complex value based on a complex number rediscovered by Dirac, the hyperbolic imaginary number h such that hh = +1, has been shown to be applicable to inference and decision making in the classical everyday world of human experience . Compared with the conservative field of EBM and public health reporting, clinical decision support systems have a tradition of being innovative and somewhat controversial, a tradition [13, 19] that goes back to the MYCIN project . Q-UEL is targeted at the medical part of the emerging Semantic Web [21, 22], and the most important XML-like tag form,
<subject expression | relationship expression | object expression>
relates not only to Dirac’s bra-operator-ket, but also to the Semantic Web’s semantic triple subject-relationship-object .
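To illustrate the correspondence between the tag form and a semantic triple, the following Python sketch parses a flat tag of the three-field form shown above into a subject-relationship-object triple. It is an assumption-laden toy: real Q-UEL tags carry expressions, nesting, and probabilities that this parser does not handle, and the names Tag and parse_tag are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A flat <subject | relationship | object> tag (illustrative only)."""
    subject: str
    relationship: str
    obj: str

    def as_triple(self):
        # The same content viewed as a subject-relationship-object triple.
        return (self.subject, self.relationship, self.obj)


def parse_tag(text):
    """Parse '<a | b | c>' into a Tag; nested tags are not supported."""
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("a tag must be enclosed in < and >")
    parts = [part.strip() for part in text[1:-1].split("|")]
    if len(parts) != 3:
        raise ValueError("expected subject | relationship | object")
    return Tag(*parts)


tag = parse_tag("<dogs | chase | cats>")
print(tag.as_triple())  # ('dogs', 'chase', 'cats')
```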
Strictly speaking, Q-UEL is aimed at SR and other manifestations of inference and reasoning in a future probabilistic Semantic Web, a “Thinking Web” WW4 that will be essential to handle measure, risk, and uncertainty in medicine, and quantitative Evidence Based Medicine and epidemiology in general. Problems regarding probability on the Semantic Web are still a major bottleneck , resolution of which is a Q-UEL mission . A Bayes’ Net approach to decision support and to the probabilistic Semantic Web is perhaps most popular, but certainly not alone, and diversity of exotic offerings is the norm [13, 24]. The problem is that a Bayes’ Net has unrealistic restrictions, confining semantic relationships in knowledge networks to a “directed acyclic graph” of relationships between things, states, events, observations and measurements . Quantum mechanics, and indeed classical physics, sees no such restriction in the real world. Dirac regarded his methods as applicable to all systems of human thought where numbers were involved .
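The directed acyclic graph restriction can be made concrete with a short check for cycles. In the sketch below, written in Python with hypothetical node names, a network in which two conditions reinforce each other contains a cycle and so could not be expressed directly as a Bayes’ Net.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (source, target) pairs has a cycle."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}

    def visit(node):
        # Depth-first search; meeting an in-progress node means a cycle.
        state[node] = in_progress
        for neighbour in graph[node]:
            if state[neighbour] == in_progress:
                return True
            if state[neighbour] == unvisited and visit(neighbour):
                return True
        state[node] = done
        return False

    return any(state[node] == unvisited and visit(node) for node in graph)


# Invented example networks.
chain = [("smoking", "lung disease"), ("lung disease", "breathlessness")]
feedback = [("obesity", "inactivity"), ("inactivity", "obesity")]

print(has_cycle(chain))     # False
print(has_cycle(feedback))  # True
```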
Q-UEL reasoning systems are currently embodied as Hyperbolic Dirac Nets  related to Bayes’ Nets, and in POPPER, a “Simple Programming Language for Probabilistic Semantic Inference in Medicine” . Other Q-UEL applications relate to source representation and interoperability, i.e. use in clinical practice for representing, communicating, and data mining electronic health records as requested by the US President’s Council of Advisors on Science and Technology (PCAST) . It was seen from the outset as global and capable of queries anywhere, anytime, by authorized persons, and so beneficially placed on the web . The Semantic Web community has (although somewhat belatedly compared with Q-UEL) responded , but only Q-UEL has so far implemented probability, as well as a process called “disaggregation” (reversible shredding) for privacy, consent, and authority mechanisms that PCAST requested . In practice, the government recommendations seem to be not so much towards Semantic Web approaches but toward using Entity Attribute Value models [32-34]. This trend is also interesting, because while these models are often criticized for neglecting relationships, Q-UEL is more precisely a probabilistic relation entity attribute value model.
Because Q-UEL tags comprise statements of knowledge and also metastatements (rules of grammar, linguistic definitions, and logic that manipulate statements), and because both download to build inference networks and engines, Q-UEL also relates to the OPENCOG initiative . “OpenCog, as a software framework, aims to provide research scientists and software developers with a common platform to build and share artificial intelligence programs” . That project uses the notion of “atoms” that are typically nouns like “cat”, and bonds as relationships between them . We can use this idea to put together a brief account of Q-UEL. In terms of Q-UEL and its applications like POPPER, we have in effect <atom | relationship | atom> as <attribute | relationship | attribute>. In Q-UEL, an attribute can be a quality rather than something concrete, or a “noun phrase” with its own ontological graph structure, and more generally and most frequently, as noted above, Q-UEL uses <subject expression | relationship expression | object expression> where the expressions are typically logical expressions of attributes as arguments. Following quantum mechanical rules, we can embed (nest) examples of these as arguments in other examples, making up a nested parsed structure of a sentence or knowledge graph. Better still, Q-UEL’s probabilistic tags carry h-complex probabilities expressing a dual of probabilities x and y such that (x,y) = (y,x)*. Each is subject to complex conjugation *, which changes the sign of the imaginary part, such that in the case of a Hermitian relationship, as is usually the case [13, 28],
<subject expression | relationship expression | object expression> = <object expression | relationship expression | subject expression>*
along with the active-passive transform of semantic equivalence such that
<subject expression | relationship expression | object expression> = <object expression | (relationship expression)* | subject expression>
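The algebra behind these relations can be sketched with a small class for h-complex (split-complex) numbers. The encoding of a dual (x, y) as (x + y)/2 + h(x - y)/2 used below is an assumption chosen because it satisfies the stated property (x,y) = (y,x)*; the class and method names are hypothetical and this is not presented as the Q-UEL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperbolic:
    """A number a + h*b where the hyperbolic imaginary unit obeys h*h = +1."""
    real: float
    imag: float  # coefficient of h

    def __mul__(self, other):
        # (a + h b)(c + h d) = (ac + bd) + h (ad + bc), because h*h = +1.
        return Hyperbolic(
            self.real * other.real + self.imag * other.imag,
            self.real * other.imag + self.imag * other.real,
        )

    def conjugate(self):
        # Conjugation changes the sign of the imaginary part.
        return Hyperbolic(self.real, -self.imag)

    @classmethod
    def from_dual(cls, x, y):
        # Assumed encoding of a pair of probabilities (x, y).
        return cls((x + y) / 2, (x - y) / 2)

    def to_dual(self):
        # Recover the pair (x, y) from the assumed encoding.
        return (self.real + self.imag, self.real - self.imag)


h = Hyperbolic(0.0, 1.0)
print(h * h)  # Hyperbolic(real=1.0, imag=0.0)

forward = Hyperbolic.from_dual(0.75, 0.25)
backward = Hyperbolic.from_dual(0.25, 0.75)
print(forward.conjugate() == backward)  # True
```

Under this encoding, conjugation simply swaps the two probabilities, which is the behavior the Hermitian relation above requires when subject and object exchange places.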
There is, of course, more to be said about their detailed use. It is these entities that make up a POPPER inference network , and for the simple case of conditional relationships or their inverses, where the relationship is “if”, “are”, “is”, or “causes”, i.e. as in a Bayes’ Net, we have a simpler Hyperbolic Dirac Net , effectively a Bayes’ Net without the unrealistic restrictions.
References
(Dates are in US format, month first)
1. http://community.cochrane.org/about-us/evidence-based-health-care (last accessed 2/22/2015).
2. Cochrane A. L., Effectiveness and Efficiency : Random Reflections on Health Services. London: Nuffield Provincial Hospitals Trust, 1972. Reprinted in 1989 in association with the BMJ. Reprinted in 1999 for Nuffield Trust by the Royal Society of Medicine Press, London, ISBN 1-85315-394-X. (temporarily out of print; new edition scheduled for early 2013)
3. Sackett D.L., Rosenberg W.M.C., Gray J.A.M., Haynes R.B., Richardson W.S. Evidence based medicine: what it is and what it isn’t. BMJ 312: 71–2 (1996).
4. Gray J.A.M. Evidence-based healthcare: how to make health policy and management decisions. London: Churchill Livingstone (1997).
5. http://en.wikipedia.org/wiki/Turing_test (last accessed 2/23/2015).
6. Higgins, J.P.T. and Green, S. (Eds.) Cochrane Handbook for Systematic Reviews of Interventions, http://handbook.cochrane.org/ (last accessed 2/22/2015).
7. The Himmelfarb Health Sciences Library, https://himmelfarb.gwu.edu/tutorials/studydesign101/systematicreviews.html#cochrane (last accessed 2/22/2015).
8. Goodman, N. D. and Lassiter, D. Probabilistic Semantics and Pragmatics: Uncertainty in Language and Thought, https://web.stanford.edu/~ngoodman/papers/Goodman-HCS-final.pdf (last accessed 2/22/2015).
9. Bach, E. Informal Lectures on Formal Semantics, Albany: SUNY Press (1989).
10. van Eijck, J. and Lappin, S., Probabilistic Semantics for Natural Language, http://www.dcs.kcl.ac.uk/staff/lappin/papers/vaneijck-lappinLIRA12.pdf (last accessed 2/22/2015).
11. Pustejovsky, J., Distinguishing Possible and Probable in Linguistic Theory, (2014) http://jamespusto.com/wp-content/uploads/2014/07/Prague-Prelim-2014.pdf (last accessed 2/22/2015).
13. B. Robson, T. P. Caruso and U. G. J. Balis, Suggestions for a Web Based Universal Exchange and Inference Language for Medicine, Computers in Biology and Medicine, 43(12), 2297 (2013).
14. B. Robson, Analysis of the Code Relating Sequence to Conformation in Globular Proteins: Theory and Application of Expected Information, Biochem. J. 141, 853-867 (1974).
15. I. M. Mullins, M.S. Siadaty, J. Lyman, K. Scully, G.T. Garrett, G. Miller, R. Muller, B. Robson, C. Apte, S. Weiss, I. Rigoutsos, D. Platt, and S. Cohen, Data mining and clinical data repositories: Insights from a 667,000 patient data set, Computers in Biology and Medicine, 36(12) 1351 (2006).
16. P.A.M. Dirac (1939). A new notation for quantum mechanics, Mathematical Proceedings of the Cambridge Philosophical Society 35 (3): 416–418
17. P. A. M. Dirac, The Principles of Quantum Mechanics, First Edition, Oxford University Press, Oxford (1930).
18. S. Deckelman and Robson, B. “Split-Complex Numbers and Dirac Bra-Kets” Communications in Information and Systems (CIS), in press (2015).
19. R. A. Greenes (Ed.), Clinical Decision Support, Academic Press (2006).
20. B. Buchanan, E.H. Shortliffe, Rule Based Expert Systems. The Mycin Experiments of the Stanford Heuristic Programming Project, Addison-Wesley: Reading, Massachusetts (1982).
21. http://en.wikipedia.org/wiki/Semantic_Web (last accessed 3/30/2013).
22. http://en.wikipedia.org/wiki/Resource_Description_Framework (last accessed 4/10/2013).
23. http://en.wikipedia.org/wiki/Triplestore (last accessed 6/5/2013).
24. L. Predoiu and H. Stuckenschmidt, Probabilistic Models for the SW – A Survey. http://ki.informatik.uni-mannheim.de/fileadmin/publication/Predoiu08Survey.pdf (last accessed 4/29/2010) (2009).
25. J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Francisco CA: Morgan Kaufmann (1988).
27. B. Robson, Hyperbolic Dirac Nets for Medical Decision Support. Theory, Methods, and Comparison with Bayes Nets, Computers in Biology and Medicine, 51: 183 (2013).
28. Robson, B. POPPER, a Simple Programming Language for Probabilistic Semantic Inference in Medicine. Computers in Biology and Medicine, 56, 107 (2015).
29. http://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-health-it-report.pdf (last accessed 2/2/2014).
30. http://yosemitemanifesto.org/ (last accessed 7/5/2014).
31. Robson, B., Caruso, T, and Balis, U. G. J. “Suggestions for a Web Based Universal Exchange and Inference Language for Medicine. Continuity of Patient Care with PCAST Disaggregation.” Computers in Biology and Medicine, 56, 51 (2015).
32. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2110957/ (last accessed 1/5/2014).
33. http://www.ehps-net.eu/article/intermediate-data-structure-ids-longitudinal-historical-microdata-version-4 (last accessed 1/5/2014).
34. G. Alter, and K. Mandemakers, The Intermediate Data Structure (IDS) for Longitudinal Historical Microdata, version 4. Historical Life Course Studies, Vol.1, 1-26. http://hdl.handle.net/10622/23526343-2014-0001?locatt=view:master
35. http://wiki.opencog.org/w/About_OpenCog (last accessed 2/22/2015).
36. http://wiki.opencog.org/w/Atom_types (last accessed 2/22/2015).