A well-generalized machine learning model should retain high variability in its predictions after training [1]. Tree-based approaches [2] are popular because they are visually representable for decision consumption and offer robustness and short training times. However, tree-based approaches cannot generate fine-grained variation in regression problems: the number of distinct predictions any single tree can produce is bounded by the number of training observations, and that bound is reached only when every observation is its own terminal node, which is an overfit model. This paper discusses a hybrid of two intuitive and explainable algorithms, CART [2] and k-NN regression [3], to improve generalization and, in some cases, runtime for regression problems. The paper first proposes fitting a shallow CART (tree depth less than the optimal depth after pruning). A k-NN regression is then performed within the terminal node to which the observation being predicted belongs. This yields greater variation and more accurate predictions than either CART or k-NN regression alone, as well as another level of depth over an OLS regression [1].
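The two-stage method described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, and the class name `ShallowTreeKNNRegressor` and the depth/neighbor defaults are hypothetical choices for demonstration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

class ShallowTreeKNNRegressor:
    """Sketch of the hybrid: a shallow CART partitions the feature space,
    then a separate k-NN regressor is fit on the points in each leaf."""

    def __init__(self, max_depth=2, k=3):
        # max_depth is kept intentionally small (shallower than the
        # optimal pruned depth, per the paper's proposal).
        self.max_depth = max_depth
        self.k = k

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.tree_ = DecisionTreeRegressor(max_depth=self.max_depth).fit(X, y)
        leaves = self.tree_.apply(X)  # leaf index of each training point
        self.leaf_models_ = {}
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            # A leaf may contain fewer than k observations.
            k = min(self.k, int(mask.sum()))
            self.leaf_models_[leaf] = KNeighborsRegressor(
                n_neighbors=k).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        leaves = self.tree_.apply(X)
        preds = np.empty(len(X))
        for i, leaf in enumerate(leaves):
            # Route each query to the k-NN model of its terminal node.
            preds[i] = self.leaf_models_[leaf].predict(X[i:i + 1])[0]
        return preds
```

Because each leaf's k-NN model interpolates among its local neighbors, the hybrid can emit more distinct prediction values than the single constant per terminal node that a plain CART produces.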
Comments: 7 Pages.
[v1] 2020-01-07 02:09:22