CCEA and DNFEA

Version 1.0.5 (58.3 KB) by John Hanley
The conjunctive clause evolutionary algorithm (CCEA) and the disjunctive normal form evolutionary algorithm (DNFEA), with examples.
Downloads: 93
Updated: 2020/5/4


The conjunctive clause evolutionary algorithm (CCEA) and the disjunctive normal form evolutionary algorithm (DNFEA) were created to find complex interactions in real-world data with nominal and possibly ordinal outputs. Both perform supervised learning to find complex, multivariate correlations with a specific target outcome (e.g., disease). The CCEA can find feature (epistatic) interactions in datasets that have noise, missing data, and/or multiple data types (i.e., continuous, ordinal, and nominal). It can also use a feature sensitivity function to help prevent the archiving of overfit feature interactions. The DNFEA is run after the CCEA to find heterogeneous combinations of conjunctive clauses that may have a stronger correlation with an output category than any single conjunctive clause. Both the CCEA and DNFEA use the hypergeometric probability mass function as a fitness function.
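
For reference, the standard hypergeometric probability mass function is written out below. The mapping of symbols to clause counts is an assumption made here for illustration and is not taken from the package: N is the number of observations usable for a clause, K the number of those in the target class, n the number that match the clause, and k the number that both match the clause and belong to the target class.

$$
P(X = k) = \frac{\binom{K}{k}\,\binom{N-K}{n-k}}{\binom{N}{n}}
$$

Under this reading, a smaller probability means the observed overlap between the clause and the target class is less likely to have arisen by chance.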

Cite As

Hanley, J.P., Rizzo, D.M., Buzas, J.S., and Eppstein, M.J. "A Tandem Evolutionary Algorithm for Identifying Causal Rules from Complex Data.", Evolutionary Computation, accepted subject to final editorial review, 2019. Abstract: We propose a new evolutionary approach for discovering causal rules in complex classification problems from batch data. Key aspects include (a) the use of a hypergeometric probability mass function as a principled statistic for assessing fitness that quantifies the probability that the observed association between a given clause and target class is due to chance, taking into account the size of the dataset, the amount of missing data, and the distribution of outcome categories, (b) tandem age-layered evolutionary algorithms for evolving parsimonious archives of conjunctive clauses, and disjunctions of these conjunctions, each of which have probabilistically significant associations with outcome classes, and (c) separate archive bins for clauses of different orders, with dynamically-adjusted order-specific thresholds. The method is validated on majority-on and multiplexer benchmark problems exhibiting various combinations of heterogeneity, epistasis, overlap, noise in class associations, missing data, extraneous features, and imbalanced classes. We also validate on a more realistic synthetic genome dataset with heterogeneity, epistasis, extraneous features, and noise. In all synthetic epistatic benchmarks, we consistently recover the true causal rule sets used to generate the data. Finally, we discuss an application to a complex real-world survey dataset designed to inform possible ecohealth interventions for Chagas disease.
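
The fitness statistic described in the abstract can be illustrated with a small sketch. This is not the package's MATLAB code; it is a minimal Python example under stated assumptions. The clause representation (a dict mapping a feature name to either a set of allowed nominal values or a `(low, high)` range), the function names, the toy data, and the rule that a row with a missing value in any clause feature is excluded from the counts are all hypothetical choices made for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k target rows among n matches, given K targets in N rows."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def clause_matches(clause, row):
    """True/False if the row satisfies every condition; None if a needed value is missing."""
    if any(row.get(feature) is None for feature in clause):
        return None  # assumption: rows with missing clause features are skipped
    for feature, cond in clause.items():
        value = row[feature]
        if isinstance(cond, tuple):      # (low, high) range for continuous/ordinal features
            low, high = cond
            if not (low <= value <= high):
                return False
        elif value not in cond:          # set of allowed values for nominal features
            return False
    return True

def dnf_matches(dnf, row):
    """A DNF (list of clauses) matches if any clause matches."""
    results = [clause_matches(clause, row) for clause in dnf]
    if any(r is True for r in results):
        return True
    if any(r is None for r in results):
        return None  # assumption: undecidable because of missing data
    return False

def fitness(match_fn, rows, labels, target):
    """Hypergeometric PMF of the observed overlap between matches and the target class."""
    N = K = n = k = 0
    for row, label in zip(rows, labels):
        matched = match_fn(row)
        if matched is None:
            continue                     # unusable row for this rule
        is_target = int(label == target)
        N += 1
        K += is_target
        if matched:
            n += 1
            k += is_target
    return hypergeom_pmf(k, N, K, n)

# Hypothetical toy data: None marks a missing value.
ROWS = [
    {"age": 30, "smoker": "yes"},
    {"age": 45, "smoker": "yes"},
    {"age": 50, "smoker": "no"},
    {"age": 25, "smoker": "no"},
    {"age": None, "smoker": "yes"},
    {"age": 60, "smoker": "no"},
]
LABELS = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]

CLAUSE = {"smoker": {"yes"}, "age": (20, 50)}
DNF = [{"smoker": {"yes"}}, {"age": (55, 100)}]

CLAUSE_P = fitness(lambda r: clause_matches(CLAUSE, r), ROWS, LABELS, "sick")
DNF_P = fitness(lambda r: dnf_matches(DNF, r), ROWS, LABELS, "sick")
print(CLAUSE_P, DNF_P)
```

On this toy data the conjunctive clause scores 0.1 and the disjunction scores 0.2. Because the value is a probability of chance association, lower is better in this sketch, so the single clause is the stronger rule here. The row with a missing `age` is dropped when scoring the clause but still counts for the DNF, since its `smoker` clause already matches.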

MATLAB Release Compatibility
Created with R2016a
Compatible with any release
Platform Compatibility
Windows macOS Linux
Categories
Find more on Statistics and Machine Learning Toolbox in Help Center and MATLAB Answers

Version and Release Notes
1.0.5

Fixed a couple of small bugs in the DNFreducepop function.

1.0.4

Small bugs were fixed in CCreducepop, FeatureSensitivity, and CCSensitivity functions.

1.0.3

Uploaded an image for the MathWorks website.

1.0.2

Added a Read_Me text file. Also expanded the example problems with more information, such as how to plot the results and how to convert interesting DNFs into a more readable format.

1.0.1

Updated example problems.

1.0.0