What are the disadvantages of random forest?

Disadvantages of random forests

Prediction accuracy on complex problems is usually inferior to that of gradient-boosted trees. Moreover, a forest is less interpretable than a single decision tree, which can be visualized as a sequence of decisions.

Let’s look at the disadvantages of random forests:

1. There is a difficult tradeoff between training time (and space) and the number of trees. Increasing the number of trees can improve prediction accuracy, but a random forest with more trees takes more time and memory to train. Suppose the forest contains n weak learners (decision trees). If n is too small, the model tends to underfit; if n is too large, the computation grows while the gains shrink, since beyond a certain point adding trees improves the model very little. A moderate value of n must therefore be chosen by weighing accuracy against training efficiency, and finding that balance is a hard problem.
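The tradeoff above can be sketched numerically. This is an illustrative toy, not a production random forest: it bags decision stumps (via the standard bootstrap-resample-and-vote scheme) on a noisy 1-D threshold task, so that training work grows linearly in n while accuracy gains flatten out. All names here (`fit_stump`, `fit_forest`, etc.) are ours, not a library's.

```python
import random

random.seed(0)

# Synthetic task: label is 1 when x > 0.5, with 15% label noise.
def make_data(m):
    data = []
    for _ in range(m):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.15:  # inject label noise
            y = 1 - y
        data.append((x, y))
    return data

def fit_stump(sample):
    """Pick the threshold minimizing training error on one bootstrap sample."""
    best_t, best_err = 0.0, float("inf")
    for t in [i / 20 for i in range(21)]:
        err = sum((int(x > t) != y) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def fit_forest(train, n):
    """Bagging: each stump is fit on its own bootstrap resample,
    so training cost is proportional to n."""
    return [fit_stump(random.choices(train, k=len(train))) for _ in range(n)]

def accuracy(forest, data):
    correct = 0
    for x, y in data:
        votes = sum(int(x > t) for t in forest)  # majority vote
        correct += (int(votes * 2 > len(forest)) == y)
    return correct / len(data)

train, test = make_data(200), make_data(500)
for n in (1, 4, 16, 64):
    forest = fit_forest(train, n)
    # Work grows linearly in n (one stump fit per tree); accuracy plateaus.
    print(n, len(forest), round(accuracy(forest, test), 3))
```

Running this shows that the cost column grows in direct proportion to n, while the accuracy column stops improving once n is moderately large, which is exactly the tradeoff described above.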

2. Random forest can lose its effectiveness when conditions shift. For example, when selecting stocks from the CSI 300 Index constituents, the factor-importance scores show that market value and reversal factors dominate the model. A random forest can generally achieve good predictions as long as the environment does not change much, but it is sensitive to parameters, noise, environmental changes, and other factors. In this example, random forest stock selection performed well from 2011 to 2016 but has performed poorly since 2017.

In addition, compared with a linear regression model, random forest shows no clear advantage in maximum drawdown; its maximum drawdown can sometimes even be larger.

3. Random forest may not get good results on small datasets or low-dimensional data (data with few features), because the randomness the method relies on is greatly reduced. Handling high-dimensional data and data with missing features is where random forest is strong.
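One way to see why low dimensionality weakens the method: with d features and mtry features tried per split (mtry ≈ √d is a common default, used here as an assumption, not a rule), the number of distinct feature subsets C(d, mtry) is tiny for small d, so the trees end up highly correlated and the ensemble behaves like a few repeated trees.

```python
import math

# Count the distinct feature subsets available at each split for a few
# dimensionalities, assuming the common mtry = round(sqrt(d)) heuristic.
for d in (4, 9, 100, 1000):
    mtry = round(math.sqrt(d))
    subsets = math.comb(d, mtry)
    print(f"d={d:5d}  mtry={mtry:3d}  distinct feature subsets per split: {subsets}")
```

With d = 4 there are only 6 possible subsets per split, while d = 100 already offers over 17 trillion, which is why the feature-subsampling randomness only pays off in higher dimensions.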

4. Random forest may overfit on data with a lot of noise. Individual decision trees tend to overfit their predictions; the forest reduces the degree of overfitting through voting, but its predictions can still be overfit relative to a linear model. It fits the existing data well but is very conservative on unknown data, with a high probability of false-negative errors.
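The limit of what voting can fix has a simple form. For n trees whose errors have variance sigma² and pairwise correlation rho, the variance of the averaged prediction is rho·sigma² + (1 − rho)·sigma²/n: on noisy data the trees make correlated mistakes (rho > 0), so a noise floor of rho·sigma² remains no matter how many trees are added. A minimal numeric sketch (the values of sigma² and rho below are illustrative assumptions):

```python
def ensemble_variance(sigma2, rho, n):
    """Variance of the mean of n equally correlated predictors."""
    return rho * sigma2 + (1 - rho) * sigma2 / n

sigma2, rho = 1.0, 0.3
for n in (1, 10, 100, 1000):
    print(n, round(ensemble_variance(sigma2, rho, n), 4))
# As n grows, the variance approaches rho * sigma2 = 0.3, never zero.
```

This is why adding more trees cannot remove the overfitting caused by noise that every tree sees: voting averages away the independent part of the error but not the shared part.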

5. Random forest is like a black box over which we have little control, and its computations can be far more complex than those of other algorithms. It is not easily interpretable: it provides feature importance scores, but it does not offer the complete visibility into coefficients that linear regression does.
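To make the interpretability contrast concrete, here is a stdlib-only sketch of what linear regression gives you that a forest does not. On data generated from y = 2·x1 − 3·x2 (a synthetic example of ours), solving the 2×2 normal equations recovers the coefficients exactly, so each feature's direction and magnitude of effect can be read off; a forest trained on the same data would only report relative importance scores.

```python
import random

random.seed(1)
xs = [(random.random(), random.random()) for _ in range(200)]
ys = [2 * a - 3 * b for a, b in xs]  # noiseless, so coefficients are exact

# Normal equations (X^T X) beta = X^T y for two features, no intercept.
s11 = sum(a * a for a, _ in xs)
s12 = sum(a * b for a, b in xs)
s22 = sum(b * b for _, b in xs)
t1 = sum(a * y for (a, _), y in zip(xs, ys))
t2 = sum(b * y for (_, b), y in zip(xs, ys))
det = s11 * s22 - s12 * s12
beta1 = (s22 * t1 - s12 * t2) / det
beta2 = (s11 * t2 - s12 * t1) / det
print(round(beta1, 6), round(beta2, 6))  # recovers 2.0 and -3.0
```

A coefficient like −3 directly says "one unit of x2 lowers y by 3"; a forest's importance score for x2 would only say it matters, not how or in which direction.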
