Proceedings of the 4th RapidMiner Community Meeting and Conference (RCOMM 2013)
Because minority-class examples are often costly or scarce to obtain, datasets are frequently highly imbalanced, with a large majority class and a far smaller minority class. Typical examples are healthy versus diseased tissue measurements, lawful versus criminal banking transactions, and correctly priced versus mispriced financial instruments. Constructing classifiers from imbalanced data presents significant theoretical and practical challenges. Imbalance also distorts validation: a trivial classifier that ignores its input and always predicts the majority class will appear highly accurate (on a dataset that is 99% majority class, it scores 99% accuracy) even though it never identifies a single minority-class example. This presentation surveys class imbalance from a conceptual perspective and empirically investigates several RapidMiner approaches to constructing classifiers from imbalanced data. Finally, it describes a set of broadly applicable RapidMiner processes that detect imbalance and construct and evaluate classifiers on imbalanced data.
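The validation pitfall described above can be made concrete with a small sketch. It is written in plain Python rather than as a RapidMiner process, and the dataset, the helper names (`imbalance_ratio`, `majority_baseline`, `balanced_accuracy`), and the choice of balanced accuracy (the mean of per-class recall) as the imbalance-aware metric are illustrative assumptions, not necessarily what the presentation itself uses. It computes the majority-to-minority imbalance ratio, builds the trivial majority-class predictor, and compares its plain accuracy with its balanced accuracy.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline(labels):
    """Trivial classifier: ignores its input and always predicts the most common label."""
    majority = Counter(labels).most_common(1)[0][0]
    return [majority] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; a majority-only predictor scores 1/k on k classes."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical dataset: 990 lawful and 10 criminal banking transactions.
y = ["lawful"] * 990 + ["criminal"] * 10
pred = majority_baseline(y)

print(imbalance_ratio(y))          # 99.0
print(accuracy(y, pred))           # 0.99
print(balanced_accuracy(y, pred))  # 0.5
```

Plain accuracy rewards the baseline with 0.99, while balanced accuracy drops to 0.5, the same score as random guessing on two classes, because the minority class has zero recall.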