On Jun 10, 2020, at 11:22 AM, Bill Bane
<billfbane(a)gmail.com> wrote:
I have run into this need at times, and used a filtered classifier that:
a. adds higher-order attribute(s) using /AddExpression/, then
b. performs /LinearRegression/ (with the /EliminateColinearAttributes/ flag
turned off).
For example, using a cubic model approach:
weka.classifiers.meta.FilteredClassifier -F "weka.filters.MultiFilter -F
\"weka.filters.unsupervised.attribute.AddExpression -E a1^2 -N X-2\" -F
\"weka.filters.unsupervised.attribute.AddExpression -E a1^3 -N X-3\" -F
\"weka.filters.AllFilter \"" -W
weka.classifiers.functions.LinearRegression
-- -S 1 -C -R 1.0E-8 -additional-stats -num-decimal-places 4
In a couple of synthetic examples, this returns very similar accuracy to
SMOReg with a Poly Kernel of the same order (in this example, 3 -- with
UseLowerOrder = True). The advantage of the Linear Regression approach is
that the outputs are more interpretable, and the coefficients can easily be
used offline for scenario modeling.
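
As a rough illustration of why this works: a cubic in x is nonlinear in x but still linear
in its coefficients, so once x^2 and x^3 exist as extra attributes, ordinary least squares
can fit it. The sketch below is plain Java, not Weka code. The class and method names
(PolyFeatureFit, expand, fit) and the sample data are made up for illustration, and it
solves the normal equations directly rather than reproducing what LinearRegression does
internally.

```java
import java.util.Arrays;

/** Illustrative sketch: least-squares fit of a polynomial via expanded attributes. */
final class PolyFeatureFit {

    /** Expands one attribute x into the row [1, x, x^2, ..., x^degree]. */
    static double[] expand(double x, int degree) {
        double[] row = new double[degree + 1];
        row[0] = 1.0;
        for (int p = 1; p <= degree; p++) {
            row[p] = row[p - 1] * x;
        }
        return row;
    }

    /**
     * Solves the normal equations (A^T A) b = A^T y by Gaussian elimination
     * with partial pivoting. Assumes the system is non-singular.
     */
    static double[] fit(double[] xs, double[] ys, int degree) {
        int n = degree + 1;
        double[][] m = new double[n][n + 1]; // augmented matrix [A^T A | A^T y]
        for (int i = 0; i < xs.length; i++) {
            double[] row = expand(xs[i], degree);
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) {
                    m[r][c] += row[r] * row[c];
                }
                m[r][n] += row[r] * ys[i];
            }
        }
        // Forward elimination.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) {
                    m[r][c] -= f * m[col][c];
                }
            }
        }
        // Back substitution.
        double[] b = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = m[r][n];
            for (int c = r + 1; c < n; c++) {
                s -= m[r][c] * b[c];
            }
            b[r] = s / m[r][r];
        }
        return b;
    }

    public static void main(String[] args) {
        double[] xs = {-2, -1, 0, 1, 2, 3, 4};
        double[] ys = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            double x = xs[i];
            ys[i] = 2.0 - x + 0.5 * x * x + 0.25 * x * x * x; // noise-free cubic
        }
        // Prints the fitted [b0, b1, b2, b3]; these should be close to 2, -1, 0.5, 0.25.
        System.out.println(Arrays.toString(fit(xs, ys, 3)));
    }
}
```

The fitted coefficients can be read off directly, which is the interpretability advantage
mentioned above.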
I’m not quite following how this allows you to do LinearRegression on nonlinear,
higher-order data.
What I’m currently doing is copying the Instances and then taking the logs with Weka’s
MathExpression filter. This flattens the higher-order relationship into a linear one, and
then I run LinearRegression on the result.
The R^2 indicates decent results for that, so I am assuming the results are reasonable and
that this gives me a somewhat valid estimate of the degree of the nonlinear power
law/polynomial, which is all I really want. I see it being used on a comparison basis,
where a higher degree indicates more complexity and less scalability; in other words, as
an ordering metric. If it correctly indicates which cases scale better, then accuracy on
the exact degree doesn’t really matter.
I did see somewhere that R^2 should not be used with nonlinear models, but after the log
transform the fit is actually linear, and R^2 is a metric used in the Empirical Complexity
paper, which uses the same approach.
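
A minimal sketch of the log-transform idea, assuming a pure power law y = a * x^k with
positive x and y and logs taken of both: then ln y = ln a + k * ln x, so the slope of a
straight-line fit in log-log space estimates the degree k, and R^2 is computed on that
linear fit. This is plain Java rather than the Weka MathExpression/LinearRegression
pipeline, and the class name PowerLawDegree, the method fitLogLog, and the sample numbers
are all made up for illustration.

```java
/** Illustrative sketch: estimate the exponent of y = a * x^k from a log-log line fit. */
final class PowerLawDegree {

    /**
     * Fits ln(y) = intercept + slope * ln(x) by simple least squares.
     * Returns {slope, intercept, rSquared}. Requires all x and y to be positive
     * and at least two distinct x and y values.
     */
    static double[] fitLogLog(double[] xs, double[] ys) {
        int n = xs.length;
        double[] lx = new double[n];
        double[] ly = new double[n];
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < n; i++) {
            lx[i] = Math.log(xs[i]);
            ly[i] = Math.log(ys[i]);
            sumX += lx[i];
            sumY += ly[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double sxx = 0.0;
        double sxy = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = lx[i] - meanX;
            double dy = ly[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        double slope = sxy / sxx;                  // estimated degree k
        double intercept = meanY - slope * meanX;  // estimate of ln(a)
        double rSquared = (sxy * sxy) / (sxx * syy);
        return new double[] {slope, intercept, rSquared};
    }

    public static void main(String[] args) {
        // Hypothetical measurements following y = 3 * x^2 exactly.
        double[] sizes = {10, 20, 40, 80, 160};
        double[] costs = {300, 1200, 4800, 19200, 76800};
        double[] fit = fitLogLog(sizes, costs);
        System.out.printf("degree=%.4f  a=%.4f  R^2=%.4f%n",
                fit[0], Math.exp(fit[1]), fit[2]);
    }
}
```

For the ordering use, only the slope matters: a case with a larger fitted slope would be
ranked as scaling worse.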
Right now I’m doing some refactoring on the code to separate out a part that seems like it
could be reusable; I have another use in mind. I am also packaging it, since it is getting
beyond the simple command-line tool I originally intended. I am even considering a first
attempt at making it modular, basically just because of that.