In a previous article, I introduced a model that uses fundamental data to estimate Dividend Strength, P(“same or more dividends next year” | “predicted to be giving same or more dividends next year”). Numerous people have shown interest in this estimation model, so I decided to spend more time improving its performance.
Its performance has increased from an AUC (area under the ROC curve) of 0.65 to 0.7. To put it simply, random guessing would give us an AUC of 0.5, so the model has gone from being 30% better than random guessing (0.65 / 0.5 = 1.3) to 40% better (0.7 / 0.5 = 1.4).
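For readers who want to check the arithmetic, here is a minimal sketch in Python. It is not the code behind the model; the function names `auc` and `gain_over_random` and the toy labels and scores are made up for illustration. It computes AUC the textbook way, as the fraction of (positive, negative) pairs that the scores rank correctly, and then expresses an AUC as a gain over the 0.5 baseline.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gain_over_random(auc_value, baseline=0.5):
    """Relative improvement over random guessing (AUC 0.5), as a fraction."""
    return auc_value / baseline - 1.0


# Toy data (hypothetical): 1 = same or more dividends next year, 0 = fewer.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auc(toy_labels, toy_scores))

# The two figures quoted in the text.
print(round(gain_over_random(0.65), 2), round(gain_over_random(0.70), 2))
```

The pairwise loop is quadratic, which is fine for a sanity check but not for large data sets, where a rank-based formula would be the usual choice.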
I have also added a new page so that its results can be conveniently viewed.
This is more for my future reference. It is totally fine to skip this section.
The improved performance comes from additional features, a more powerful classifier and more training data.
Additional Features: Previously, features were simply generated by permuting whatever was available. This round, I added new features using my domain knowledge of what might be useful for predicting future dividends. In addition, I ensured that all metrics in iScreener are added as features whenever possible. I also added a new type of feature (SGXKeyStatsChangeRatioFeature). In total, the number of features has increased from 884 to 1305.
More Powerful Classifier: I changed the classifier from Random Forest (100) to Random Forest (1000). This increased the training time, but since I only rebuild this model once a week, it is still fine.
More Training Data: Previously, I built the model using ten-fold cross-validation. To increase the amount of training data available to each model, I increased it to twenty-fold, so each model now trains on 95% of the data instead of 90%. This increased both training and testing time (20 models instead of 10) but again, since I only update once a week, it is still acceptable.
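To make the effect of the fold count concrete, here is a small sketch in Python. The function name `kfold_sizes` and the sample count of 1000 are assumptions for illustration only, not figures from the actual data set. It shows how many rows each model trains on and is tested on for a given number of folds.

```python
def kfold_sizes(n_samples, k):
    """Return a list of (train_size, test_size) pairs, one per fold.

    Rows are split as evenly as possible: the first n_samples % k folds
    get one extra test row.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    test_sizes = [base + 1 if i < extra else base for i in range(k)]
    return [(n_samples - t, t) for t in test_sizes]


n = 1000  # hypothetical number of labelled examples
for k in (10, 20):
    train, test = kfold_sizes(n, k)[0]
    print(f"{k}-fold: {k} models, each trained on {train} rows "
          f"({train / n:.0%}) and tested on {test}")
```

Going from ten to twenty folds gives each model only about five percentage points more training data while doubling the number of models to build, which is why the trade-off only makes sense when the rebuild is infrequent.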