[Wekalist] Re: how weka predicts in logistic regression?

ouyeyu panyu ouyeyu at gmail.com
Sat Jul 28 05:46:14 NZST 2012


Can anyone help me with this issue?


2012/7/26 ouyeyu panyu <ouyeyu at gmail.com>

> Hi there,
>
> Do you know how Weka computes predictions in logistic regression?
> The common formula for the predicted probability is:
>
>   \mathbb{E}[Y_i \mid \mathbf{X}_i] = p_i
>     = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i)
>     = \frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}
>
> However, when I put the Weka model coefficients and the test data into this
> formula, I do NOT get the same probability that Weka yields.
>
> An example is below.
>
> variable               model_coeffs       test_data
> ====================================================
> Intercept              -4.440248
> price                  0.00000271591      474700
> sqft                   -0.0003140287      1693
> ltv_new                0.6966173          2
> age_cat                -0.001434963       4
> current_hold_days      -0.0001069998      1947
> appr_since_last_sqr    0.002447026        0.0961
>
> BX = -4.440248 + 0.00000271591*474700 - 0.0003140287*1693 + 0.6966173*2
>      - 0.001434963*4 - 0.0001069998*1947 + 0.002447026*0.0961
>    = -3.779602634944
>
> probabilityOfTestData = 1/(1+e^3.779602634944) = 0.0223
>
> However, the probabilityOfTestData that weka's distributionForInstance
> method yielded is 0.07563123767893659.
>
> Do you know why the two values are so different?
> This issue is important to my work; any ideas would be appreciated.
> Thanks in advance.
>
>
> 2012/7/25 ouyeyu panyu <ouyeyu at gmail.com>
>
>> *Hi Mark and Michael,*
>>
>>
>> I'm trying to do logistic regression via WEKA.
>>
>> First I trained the model against trainData.
>>     val cf_train: Logistic = new Logistic();
>>     cf_train.buildClassifier(trainData);
>>
>> WEKA generated a model as below.
>> Variable        Coefficient
>> ==========================================
>> price                0
>> sqft                 0.0003
>> nod                 -2.8426
>> ltv                 -0.6966
>> age                  0.0014
>> chd                  0.0001
>> asl                 -0.0024
>> Intercept            4.4403
>>
>>
>> Then I ran predictions against testData.
>>     for (i <- 0 until testData.size()) {
>>       val predictedValue: Array[Double] =
>>         cf_train.distributionForInstance(testData.get(i));
>>     }
>> For a specific record, its predictedValue = 0.07563123767893659.
>>
>> I want to verify if this value is correct, so I did some manual
>> calculation by substituting the trained coefficients into the following
>> formula
>> manual_value = intercept + price*testPrice + sqft*testSqft + nod*testNod
>>              + ltv*testLtv + age*testAge + chd*testChd + asl*testAsl
>>              = 4.4403 + 0 + 1693*0.0003 + 0 - 0.6966*2 + 0.0014*4
>>              + 0.0001*1947 - 0.0024*0.0961
>>              = 3.75506936
>>
>> So my manual_value is totally different from weka's predictedValue.
>> Do you know why this happens?
>> Thanks in advance.
>>
>>
>
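
For anyone checking the arithmetic in the quoted 2012/7/26 message, here is a minimal Scala sketch that evaluates the same logistic formula outside of Weka. It assumes each term is simply coefficient times test value, with the signs exactly as listed in the model_coeffs table. The names LogisticCheck and sigmoid are made up for illustration and are not part of Weka.

```scala
object LogisticCheck {
  // Inverse logit: maps a linear predictor z to a probability in (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Intercept as quoted in the 2012/7/26 message.
  val intercept: Double = -4.440248

  // (name, coefficient, test value), copied from the quoted table.
  val terms: Seq[(String, Double, Double)] = Seq(
    ("price",               0.00000271591, 474700.0),
    ("sqft",                -0.0003140287, 1693.0),
    ("ltv_new",             0.6966173,     2.0),
    ("age_cat",             -0.001434963,  4.0),
    ("current_hold_days",   -0.0001069998, 1947.0),
    ("appr_since_last_sqr", 0.002447026,   0.0961)
  )

  // Linear predictor: intercept plus the sum of coefficient * value.
  val linearPredictor: Double =
    intercept + terms.map { case (_, beta, x) => beta * x }.sum

  // Probability according to the logistic formula.
  val probability: Double = sigmoid(linearPredictor)

  def main(args: Array[String]): Unit = {
    // Print each contribution so a hand calculation can be compared term by term.
    terms.foreach { case (name, beta, x) =>
      println(f"$name%-20s ${beta * x}%.9f")
    }
    println(f"linear predictor     $linearPredictor%.9f")
    println(f"probability          $probability%.9f")
  }
}
```

Evaluated this way, the listed values appear to give a linear predictor of about -2.5033 and a probability of about 0.0756, so it may be worth re-checking the hand-computed sum term by term before concluding that Weka uses a different formula.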