Chi-square automatic interaction detection
Chi-square automatic interaction detection (CHAID)[1] is a decision tree technique based on adjusted significance testing (Bonferroni correction, Holm-Bonferroni testing).[2] [3]
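To make the idea of adjusted significance testing concrete, the following is a minimal Python sketch, not Kass's published procedure. It scores each candidate predictor with a Pearson chi-square test against the dependent variable and applies a plain Bonferroni correction, multiplying each p-value by the number of predictors tested. The function names, the example tables, and the 0.05 threshold are illustrative assumptions; the category-merging step of CHAID and the multipliers it uses are omitted.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic (stdlib only).

    Uses the recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    with x = stat / 2, starting from a = 1/2 (odd df) or a = 1 (even df).
    """
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def best_predictor(tables, alpha=0.05):
    """Pick the predictor with the smallest Bonferroni-adjusted p-value.

    Returns (name or None, adjusted p-values). None means no predictor is
    significant at `alpha`, so the node would not be split.
    """
    m = len(tables)  # number of tests, used as the Bonferroni multiplier
    adjusted = {}
    for name, table in tables.items():
        stat, df = pearson_chi2(table)
        adjusted[name] = min(1.0, m * chi2_sf(stat, df))
    winner = min(adjusted, key=adjusted.get)
    return (winner if adjusted[winner] <= alpha else None), adjusted


# Hypothetical data: rows are predictor categories, columns are response yes/no.
example_tables = {
    "region": [[30, 70], [32, 68], [28, 72]],
    "age_band": [[10, 90], [30, 70], [50, 50]],
}
chosen, adjusted_p = best_predictor(example_tables)
print(chosen, adjusted_p)
```

In this made-up example the response rate barely changes across regions but rises steadily with age band, so only the age band survives the correction and would be used for the split.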
History
CHAID is based on a formal extension of AID (Automatic Interaction Detection)[4] and THAID (THeta Automatic Interaction Detection)[5] [6] procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed by Belson in the UK in the 1950s.[7]
In 1975, the CHAID technique itself was developed in South Africa. It was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic.[2]
A history of earlier supervised tree methods can be found in Ritschard, including a detailed description of the original CHAID algorithm and the exhaustive CHAID extension by Biggs, De Ville, and Suen.[3] [1]
CHAID uses multiway splitting to partition cases into discrete groups and to show how those groups affect the dependent variable. Applied studies that chose CHAID as their data mining technique cite five main reasons:
1. A large proportion of the input data was categorical;
2. It is efficient on large datasets;
3. Its output is highly visual and easy to interpret;
4. The business rules it generates are easy to implement and integrate into business processes; and
5. It handles input data quality issues efficiently.[8] [9]
Properties
CHAID can be used for prediction (in a similar fashion to regression analysis; this version of CHAID was originally known as XAID) as well as for classification and for detecting interactions between variables.[4] [5] [6]
In practice, CHAID is often used in direct marketing to select groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.[citation needed]
Like other decision tree methods, CHAID produces output that is highly visual and easy to interpret. Because it uses multiway splits by default, it needs fairly large samples to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.[citation needed]
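The sample-size concern can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that every split divides a node evenly; real trees split unevenly, so the smallest groups are smaller still. The function name and the figure of 1,000 cases are hypothetical.

```python
def average_leaf_size(n_cases, ways, depth):
    """Average cases per group after `depth` levels of `ways`-way splits.

    Assumes perfectly even splits, which is an optimistic simplification.
    """
    return n_cases / ways ** depth


# Compare binary splits with four-way splits on 1,000 cases.
for ways in (2, 4):
    sizes = [average_leaf_size(1000, ways, depth) for depth in range(4)]
    print(f"{ways}-way splits, depths 0-3: {sizes}")
```

With 1,000 cases, three levels of four-way splits leave about 16 cases per group on average, compared with 125 for binary splits.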
One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric.[citation needed]
See also
- Bonferroni correction
- Chi-squared distribution
- Decision tree learning
- Latent class model
- Market segment
- Multiple comparisons
- Structural equation modeling
References
1. Ritschard, Gilbert (2013). "CHAID and Earlier Supervised Tree Methods". Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences, McArdle, J.J. and G. Ritschard (Eds). New York: Routledge: 48–74.
2. Kass, G. V. (1980). "An Exploratory Technique for Investigating Large Quantities of Categorical Data". Applied Statistics. 29 (2): 119–127. doi:10.2307/2986296. JSTOR 2986296.
3. Biggs, David; De Ville, Barry; Suen, Ed (1991). "A method of choosing multiway partitions for classification and decision trees". Journal of Applied Statistics. 18 (1): 49–62. Bibcode:1991JApSt..18...49B. doi:10.1080/02664769100000005. ISSN 0266-4763.
4. Morgan, James N.; Sonquist, John A. (1963). "Problems in the Analysis of Survey Data, and a Proposal". Journal of the American Statistical Association. 58 (302): 415–434. doi:10.1080/01621459.1963.10500855. ISSN 0162-1459.
5. Messenger, Robert; Mandell, Lewis (1972). "A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis". Journal of the American Statistical Association. 67 (340): 768–772. doi:10.1080/01621459.1972.10481290. ISSN 0162-1459.
6. Morgan, James N. (1973). THAID, a sequential analysis program for the analysis of nominal scale dependent variables. Robert C. Messenger. Ann Arbor, Mich. ISBN 0-87944-137-2. OCLC 666930.
7. Belson, William A. (1959). "Matching and Prediction on the Principle of Biological Classification". Applied Statistics. 8 (2): 65–75. doi:10.2307/2985543. JSTOR 2985543.
8. Behera, Desik (Nov 2012). "Acquiring Insurance Customer: The CHAID Way". Research Gate. Retrieved 7 Aug 2025.
9. Kotane, Inta (September 2024). "Application of CHAID Decision Trees and Neural Networks Methods in Forecasting the Yield of Cereal Industry Companies". Research Gate. doi:10.17770/het2024.28.8264. Retrieved 7 August 2025.
Bibliography
- Press, Laurence I.; Rogers, Miles S.; & Shure, Gerald H.; An interactive technique for the analysis of multivariate data, Behavioral Science, Vol. 14 (1969), pp. 364–370
- Hawkins, Douglas M.; and Kass, Gordon V.; Automatic Interaction Detection, in Hawkins, Douglas M. (ed), Topics in Applied Multivariate Analysis, Cambridge University Press, Cambridge, 1982, pp. 269–302
- Hooton, Thomas M.; Haley, Robert W.; Culver, David H.; White, John W.; Morgan, W. Meade; & Carroll, Raymond J.; The Joint Associations of Multiple Risk Factors with the Occurrence of Nosocomial Infections, American Journal of Medicine, Vol. 70, (1981), pp. 960–970
- Brink, Susanne; & Van Schalkwyk, Dirk J.; Serum ferritin and mean corpuscular volume as predictors of bone marrow iron stores, South African Medical Journal, Vol. 61, (1982), pp. 432–434
- McKenzie, Dean P.; McGorry, Patrick D.; Wallace, Chris S.; Low, Lee H.; Copolov, David L.; & Singh, Bruce S.; Constructing a Minimal Diagnostic Decision Tree, Methods of Information in Medicine, Vol. 32 (1993), pp. 161–166
- Magidson, Jay; The CHAID approach to segmentation modeling: chi-squared automatic interaction detection, in Bagozzi, Richard P. (ed); Advanced Methods of Marketing Research, Blackwell, Oxford, GB, 1994, pp. 118–159
- Hawkins, Douglas M.; Young, S. S.; & Rosinko, A.; Analysis of a large structure-activity dataset using recursive partitioning, Quantitative Structure-Activity Relationships, Vol. 16, (1997), pp. 296–302
External links
- Luchman, J.N.; CHAID: Stata module to conduct chi-square automated interaction detection, Available for free download, or type within Stata: ssc install chaid.
- Luchman, J.N.; CHAIDFOREST: Stata module to conduct random forest ensemble classification based on chi-square automated interaction detection (CHAID) as base learner, Available for free download, or type within Stata: ssc install chaidforest.
- IBM SPSS Decision Trees grows exhaustive CHAID trees as well as a few other types of trees such as CART.
- An R package CHAID is available on R-Forge.