{{expertneeded}}

'''Causal inference''' is the process of drawing a conclusion about a [[causal]] connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of [[association (statistics)|association]] is that the former analyzes the response of the effect variable when the cause is changed.<ref name=Pearl_Journal>{{cite journal|last=Pearl|first=Judea|title=Causal inference in statistics: An overview|journal=Statistics Surveys|date=1 January 2009|volume=3|issue=|pages=96–146|doi=10.1214/09-SS057|url=http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf}}</ref><ref name=Morgan_book>{{cite book|last=Morgan|first=Stephen|author2=Winship, Chris|title=Counterfactuals and Causal inference|publisher=Cambridge University Press|year=2007|isbn=978-0-521-67193-4}}</ref> The science of why things occur is called [[etiology]]. Causal inference is an example of [[causal reasoning]].

==Definition==
Inferring the [[cause]] of something has been described as:
*"...reason[ing] to the conclusion that something is, or is likely to be, the cause of something else".<ref name=EB>{{cite web|title=causal inference|url=http://www.britannica.com/EBchecked/topic/1442615/causal-inference|publisher=Encyclopædia Britannica, Inc.|accessdate=24 August 2014}}</ref>
*"Identification of the cause or causes of a phenomenon, by establishing covariation of cause and effect, a time-order relationship with the cause preceding the effect, and the elimination of plausible alternative causes."<ref name=psy>{{cite book|author1=John Shaughnessy|author2=Eugene Zechmeister|author3=Jeanne Zechmeister|title=Research Methods in Psychology|date=2000|publisher=McGraw-Hill Humanities/Social Sciences/Languages|isbn=0077825365|pages=Chapter 1 : Introduction|url=http://www.mhhe.com/socscience/psychology/shaugh/ch01_concepts.html|accessdate=24 August 2014}}</ref>

==Methods==
Epidemiological studies employ different [[epidemiological method]]s of collecting and measuring evidence of risk factors and effect and different ways of measuring association between the two. A [[hypothesis]] is formulated, and then tested with statistical methods (see [[Statistical hypothesis testing]]). It is [[statistical inference]] that helps decide if data are due to chance, also called [[random variation]], or indeed correlated and if so how strongly. However, [[correlation does not imply causation]], so further methods must be used to infer causation.
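The first step — deciding whether an observed association could be due to chance alone — can be illustrated with a simple permutation test. This is a minimal sketch on simulated data, not a method prescribed by any particular epidemiological study; the variable names and effect size are illustrative.

```python
import numpy as np

# Simulated exposure/outcome data with a genuine association
# (all names and parameters here are illustrative).
rng = np.random.default_rng(42)
n = 500
exposure = rng.normal(size=n)
outcome = 0.5 * exposure + rng.normal(size=n)  # outcome partly driven by exposure

observed = np.corrcoef(exposure, outcome)[0, 1]

# Permutation test: shuffling the outcomes destroys any real association,
# so the permuted correlations show what chance alone would produce.
perm_corrs = np.array([
    np.corrcoef(exposure, rng.permutation(outcome))[0, 1]
    for _ in range(2000)
])
p_value = np.mean(np.abs(perm_corrs) >= abs(observed))

print(f"observed correlation: {observed:.3f}, permutation p-value: {p_value:.4f}")
```

A small p-value indicates the correlation is unlikely to be random variation, but it says nothing about the direction of causation or whether a confounder produced the association — that is exactly why further methods are needed.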

Common frameworks for causal inference are [[structural equation modeling]] and the [[Rubin causal model]].{{citation needed|date=August 2014}}

==In epidemiology==
[[Epidemiology]] studies patterns of health and disease in defined populations of [[living beings]] in order to [[infer]] causes and effects. An association between an [[Exposure (environmental hazard)|exposure]] to a putative [[risk factor]] and a disease may be suggestive of, but is not equivalent to, causality, because [[correlation does not imply causation]]. Historically, [[Koch's postulates]] have been used since the 19th century to decide whether a microorganism was the cause of a disease. In the 20th century the [[Bradford Hill criteria]], described in 1965,<ref name="bh65">{{cite journal |last=Hill |first=Austin Bradford |year=1965 |title=The Environment and Disease: Association or Causation? |journal=[[Proceedings of the Royal Society of Medicine]] |volume=58 |pages=295–300 |url=http://www.edwardtufte.com/tufte/hill |pmid=14283879 |pmc=1898525 |issue=5}}</ref> have been used to assess causality of variables outside microbiology, although even these criteria are not exclusive ways to determine causality.

In [[molecular epidemiology]] the phenomena studied are on a [[molecular biology]] level, including genetics, where [[biomarkers]] are evidence of cause or effects.

A recent trend{{when|date=August 2014}} is to identify [[evidence]] for the influence of an exposure on [[molecular pathology]] within diseased [[Tissue (biology)|tissue]] or cells, in the emerging interdisciplinary field of [[molecular pathological epidemiology]] (MPE).{{third-party-inline|date=August 2014}} Linking the exposure to molecular pathologic signatures of the disease can help to assess causality.{{third-party-inline|date=August 2014}} Considering the inherent [[heterogeneity]] of a given disease (the unique disease principle), disease phenotyping and subtyping are trends in the biomedical and [[public health]] sciences, exemplified by [[personalized medicine]] and [[precision medicine]].{{third-party-inline|date=August 2014}}

==In computer science==
Determination of cause and effect from joint observational data for two time-independent variables, say X and Y, has been tackled by comparing the evidence for models in the two possible directions, X → Y and Y → X. One approach is to incorporate an independent noise term in the model and compare the evidence for each direction.

Here are some of the noise models for the hypothesis X → Y with the noise E:
* Additive noise:<ref>Hoyer, Patrik O., et al. "Nonlinear causal discovery with additive noise models." NIPS. Vol. 21. 2008.</ref> <math>Y = F(X)+E</math>
* Linear noise:<ref>Shimizu, Shohei, et al. "DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model." The Journal of Machine Learning Research 12 (2011): 1225-1248.</ref> <math>Y = pX + qE</math>
* Post-non-linear:<ref>Zhang, Kun, and Aapo Hyvärinen. "On the identifiability of the post-nonlinear causal model." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.</ref> <math>Y = G(F(X)+E)</math>
* Heteroskedastic noise: <math>Y = F(X)+E \cdot G(X)</math>
* Functional noise:<ref name="Mooij">Mooij, Joris M., et al. "Probabilistic latent variable models for distinguishing between cause and effect." NIPS. 2010.</ref> <math>Y = F(X,E)</math>

The common assumptions in these models are:
* There are no other causes of Y.
* X and E have no common causes.
* The distribution of the cause is independent of the causal mechanism.

On an intuitive level, the idea is that the factorization of the joint distribution P(Cause,Effect) into P(Cause)*P(Effect | Cause) typically yields models of lower total complexity than the factorization into P(Effect)*P(Cause | Effect). Although the notion of “complexity” is intuitively appealing, it is not obvious how it should be precisely defined.<ref name="Mooij"/> A different family of methods attempts to discover causal "footprints" from large amounts of labeled data, and allows the prediction of more flexible causal relations.<ref>Lopez-Paz, David, et al. "Towards a learning theory of cause-effect inference" ICML. 2015</ref>
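For the additive noise model, the asymmetry can be illustrated as follows: regress each variable on the other and check in which direction the residuals look independent of the putative cause. This is a toy sketch of the approach of Hoyer et al., using a crude correlation-based dependence score in place of the kernel independence tests (such as HSIC) used in practice; the data-generating process and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
x = rng.uniform(-2, 2, n)
y = x**3 + 0.3 * rng.normal(size=n)  # true model: X -> Y with additive noise

def dependence(u, v):
    """Crude dependence score: the largest absolute correlation among a few
    simple transforms. A real implementation would use a kernel independence
    test such as HSIC instead."""
    pairs = [(u, v), (u**2, v), (u, v**2), (u**2, v**2)]
    return max(abs(np.corrcoef(a, b)[0, 1]) for a, b in pairs)

def residuals(cause, effect, degree=5):
    """Fit effect = F(cause) + E with a polynomial and return the residuals E."""
    coefs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coefs, cause)

# Score both candidate directions: the residuals should look independent of
# the regressor only in the correct causal direction.
score_xy = dependence(residuals(x, y), x)  # hypothesis X -> Y
score_yx = dependence(residuals(y, x), y)  # hypothesis Y -> X

direction = "X -> Y" if score_xy < score_yx else "Y -> X"
print(f"inferred direction: {direction}")
```

In the forward direction the polynomial absorbs the cubic relationship and leaves residuals that are just the independent noise; in the backward direction no function of Y can produce residuals independent of Y, so the dependence score stays high.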

==In statistics and economics==
{{Main|Causality#Statistics and economics}}

In [[statistics]] and [[economics]], causality is often tested for using [[regression analysis|regression]]. Several methods can be used to distinguish actual causality from spurious indications of causality. First, the [[explanatory variable]] could be one that conceptually could not be caused by the [[dependent variable]], thereby avoiding the possibility of being misled by [[reverse causation]]: for example, if the independent variable is rainfall and the dependent variable is the [[futures price]] of some agricultural commodity. Second, the [[instrumental variables]] technique may be employed to remove any reverse causation by introducing a role for other variables (instruments) that are known to be unaffected by the dependent variable. Third, the principle that effects cannot precede causes can be invoked, by including on the right side of the regression only variables that precede in time the dependent variable. Fourth, other regressors are included to ensure that [[confounding variable]]s are not causing a regressor to spuriously appear to be significant. Correlation by coincidence, as opposed to correlation reflecting actual causation, can be ruled out by using large [[sample size|samples]] and by performing [[cross-validation (statistics)|cross validation]] to check that correlations are maintained on data that were not used in the regression.
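The instrumental variables technique can be sketched on simulated data: in this hypothetical setup, an unobserved confounder biases the ordinary least squares (OLS) estimate, while a two-stage least squares estimate built on an instrument recovers the true effect. The data-generating process and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_effect = 2.0

# Hypothetical data-generating process: u confounds both x and y, while
# z is a valid instrument (it moves x, is unrelated to u, and affects y
# only through x).
u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument
x = z + u + rng.normal(size=n)              # regressor, confounded by u
y = true_effect * x + 2.0 * u + rng.normal(size=n)

def ols_slope(regressor, response):
    """Slope from an OLS regression of response on [1, regressor]."""
    design = np.column_stack([np.ones(len(regressor)), regressor])
    coefs, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefs[1]

# Naive OLS is biased upward because u raises both x and y.
b_ols = ols_slope(x, y)

# Two-stage least squares: stage 1 predicts x from the instrument z;
# stage 2 regresses y on the predicted, confounder-free part of x.
x_hat = np.polyval(np.polyfit(z, x, 1), z)
b_iv = ols_slope(x_hat, y)

print(f"true effect: {true_effect}, OLS: {b_ols:.2f}, 2SLS: {b_iv:.2f}")
```

The instrument works because the stage-1 prediction keeps only the variation in x that comes from z, discarding the variation contributed by the confounder u.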

==Education==
Graduate courses on causal inference have been introduced into the curricula of many schools.

*[[Arizona State University]], Department of Statistics
*[[Duke University]], Department of Political Science<ref>{{Cite news|url=https://polisci.duke.edu/courses/POLSCI748|title=Introduction to Causal Inference|date=2015-05-21|work=Political Science|access-date=2018-08-26|language=en}}</ref>
*[[Saint Louis University]], College of Public Health & Social Justice
* [[Carnegie Mellon University]], Department of Philosophy
* [[Harvard University]], School of Public Health
* [[Johns Hopkins University]], Department of Computer Science, Bloomberg School of Public Health
* [[London School of Hygiene & Tropical Medicine]]
* [[Karolinska Institutet]], Department of Medical Epidemiology and Biostatistics
* [[McGill University]], Department of Epidemiology, Biostatistics and Occupational Health
* [[New York University]], Department of Applied Statistics, Social Sciences, and Humanities
* [[Northwestern University]], Department of Sociology and Kellogg School of Management
* [[University of Pittsburgh]], Department of Psychology in Education
* [[University of Groningen]], Department of Statistics & Measurement Theory
* [[University of California, Los Angeles]], Department of Epidemiology and Department of Computer Science
* [[University of California, Berkeley]], School of Public Health
* [[University of Copenhagen]], Department of Public Health
* [[University of Pennsylvania]], Department of Biostatistics and Epidemiology
* [[University of Texas]], Department of Educational Psychology <ref>https://education.utexas.edu/departments/educational-psychology/graduate-programs/quantitative-methods/required-courses-doctoral</ref>
* [[The University of British Columbia]], School of Population and Public Health
* [[Vanderbilt University]], Department of Leadership, Policy, and Organizations, Department of Biostatistics
* [[Stevens Institute of Technology]], Department of Computer Science <ref>http://www.skleinberg.org/teaching/CI15/index.html</ref>
* [[University of North Carolina at Chapel Hill]], Department of Biostatistics <ref>http://www.bios.unc.edu/~mhudgens/bios/776/2017/bios776.html</ref>
* [[University of California, Irvine]], Department of Statistics <ref>https://www.ics.uci.edu/~sternh/courses/265/</ref>

== See also ==
* [[Granger causality]]
* [[Multivariate statistics]]
* [[Partial least squares regression]]
* [[Pathogenesis]]
* [[Pathology]]
* [[Regression analysis]]
* [[Transfer entropy]]

== References ==
{{Reflist}}

==External links==
{{Commonscat}}
*[http://clopinet.com/isabelle/Projects/NIPS2013/ NIPS 2013 Workshop on Causality]
*[http://webdav.tuebingen.mpg.de/causality/ Causal inference at the Max-Planck-Institute for Intelligent Systems Tübingen]

{{Portal bar|Science}}

[[Category:Causal inference| ]]
[[Category:Graphical models]]
[[Category:Regression analysis]]
[[Category:Inductive reasoning]]
[[Category:Philosophy of statistics]]