[BUG] 'DefaultFeatPipeline' object has no attribute 'keys_order3s' #2

@mglowacki100

The dataset comes from the following Kaggle challenge:
https://www.kaggle.com/c/acquire-valued-shoppers-challenge
Note that a single table is enough to reproduce the error:

import pandas as pd
import auto_smart
import os.path
import time
import datetime


PREPROC = True
NROWS = None
TARGET = 'repeater'
DATE = 'offerdate'

if PREPROC:
    #train & target
    df_tr = pd.read_csv(os.path.join('data', 'train', 'trainHistory.csv'), nrows=NROWS)
    
    df_tr_lbl = df_tr[[TARGET]].copy()  # copy so the assignment below does not trigger SettingWithCopyWarning
    df_tr_lbl[TARGET] = df_tr_lbl[TARGET].map({'f': 0, 't': 1})
    df_tr_lbl = df_tr_lbl.rename(columns={TARGET: 'label'})
    df_tr_lbl.to_csv(os.path.join('data', 'train', 'main_train.solution'), index=False)
    
    df_tr = df_tr[df_tr.columns.difference([TARGET])]
    df_tr = df_tr.drop(['repeattrips'], axis=1)
    df_tr[DATE] = df_tr[DATE].apply(lambda s: time.mktime(datetime.datetime.strptime(s, '%Y-%m-%d').timetuple()))
    df_tr.to_csv(os.path.join('data', 'train', 'main_train.data'), index=False, sep='\t')

    # #offer:
    # df_of = pd.read_csv(os.path.join('data', 'train', 'offers.csv'), nrows=NROWS)
    # df_of.to_csv(os.path.join('data', 'train', 'offers.data'), index=False, sep='\t')

    #transactions
    # df_txs = pd.read_csv(os.path.join('data', 'train', 'transactions.csv'), nrows=NROWS)
    # df_txs['date'] = df_txs['date'].apply(lambda s: time.mktime(datetime.datetime.strptime(s, '%Y-%m-%d').timetuple()))
    # df_txs.to_csv(os.path.join('data', 'train', 'transactions.data'), index=False, sep='\t')

    #test:
    df_te = pd.read_csv(os.path.join('data', 'test', 'testHistory.csv'), nrows=NROWS)
    df_te[DATE] = df_te[DATE].apply(lambda s: time.mktime(datetime.datetime.strptime(s, '%Y-%m-%d').timetuple()))
    df_te.to_csv(os.path.join('data', 'test', 'main_test.data'), index=False, sep='\t')


print('info...')
info = auto_smart.read_info('data')

print('train...')
train_data, train_label = auto_smart.read_train('data', info)

print('test...')
test_data = auto_smart.read_test('data', info)

print('model...')
prd = auto_smart.train_and_predict(train_data, train_label, info, test_data)
    
print('finalizing...')
prd_df = pd.read_csv('sampleSubmission.csv')
prd_df['repeatProbability'] = prd
prd_df.to_csv('predictions.csv', index=False)

with the following JSON configuration:

{
 "time_budget": 300,
 "time_col": "offerdate",
 "start_time": 1550654179,
 "tables": {
  "main": {
    "id": "cat",
    "chain": "cat",
    "offer": "cat",
    "market": "cat",
    "offerdate": "time"
  }
 },
 "relations": []
}
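As a sanity check on the configuration above, here is a minimal sketch that rebuilds the same dictionary in Python and verifies that every column written to main_train.data has a declared type. The helper name `build_info` is hypothetical and is not part of auto_smart; the values are copied from the JSON above.

```python
import json

# Column types copied from the configuration above.
MAIN_TABLE = {
    "id": "cat",
    "chain": "cat",
    "offer": "cat",
    "market": "cat",
    "offerdate": "time",
}


def build_info(columns, time_col, time_budget=300, start_time=1550654179):
    """Hypothetical helper: return a config dict shaped like the JSON above."""
    # Every column in the table must have a declared type.
    missing = [c for c in columns if c not in MAIN_TABLE]
    if missing:
        raise ValueError(f"columns without a declared type: {missing}")
    # The time column must be declared with type "time".
    if MAIN_TABLE.get(time_col) != "time":
        raise ValueError(f"{time_col!r} is not declared as a time column")
    return {
        "time_budget": time_budget,
        "time_col": time_col,
        "start_time": start_time,
        "tables": {"main": {c: MAIN_TABLE[c] for c in columns}},
        "relations": [],
    }


info = build_info(["id", "chain", "offer", "market", "offerdate"], "offerdate")
print(json.dumps(info, indent=1))
```

The check passes for the columns used here, so the configuration itself does not appear to be the cause of the error.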

I got the following error:

  'New categorical_feature is {}'.format(sorted(list(categorical_feature))))
--------------------total feat num:22, drop feat num:0
----------------End   [LGBFeatureSelectionWait.fit]. Time elapsed: 0.56 sec.
----------------End time: 2020-02-11 06:21:35

----------------Start [LGBFeatureSelectionWait.transform]:
----------------Start time: 2020-02-11 06:21:35
----------------End   [LGBFeatureSelectionWait.transform]. Time elapsed: 0.00 sec.
----------------End time: 2020-02-11 06:21:35
------------End   [LGBFeatureSelectionWait.fit_transform]. Time elapsed: 0.56 sec.
------------End time: 2020-02-11 06:21:35
--------End   [FeatEngine.fit_transform_keys_order2]. Time elapsed: 0.56 sec.
--------End time: 2020-02-11 06:21:35

--------Start [FeatEngine.fit_transform_keys_order3]:
--------Start time: 2020-02-11 06:21:35
Traceback (most recent call last):

  File "/home/mglowacki/Desktop/AVR_kaggle/autosmart_avr.py", line 61, in <module>
    prd = auto_smart.train_and_predict(train_data, train_label, info, test_data)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/__init__.py", line 71, in train_and_predict
    return cmodel.predict(test_data)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/util.py", line 38, in timed
    result = method(*args, **kw)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/model.py", line 358, in predict
    self.my_fit(self.Xs, self.y, X_test)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/util.py", line 38, in timed
    result = method(*args, **kw)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/model.py", line 156, in my_fit
    feat_engine.fit_transform_keys_order3(main_table,y)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/util.py", line 38, in timed
    result = method(*args, **kw)

  File "/home/mglowacki/anaconda3/envs/py37/lib/python3.7/site-packages/auto_smart/feat_engine.py", line 143, in fit_transform_keys_order3
    for feat_cls in self.feat_pipeline.keys_order3s:

AttributeError: 'DefaultFeatPipeline' object has no attribute 'keys_order3s'

This is an auto_smart issue: I've checked the file auto_smart/feat/feat_pipeline.py and there is no self.keys_order3s = ... assignment.
A stop-gap workaround for a single table is to set self.keys_order3s to self.keys_order2s, but a different error (about a signature mismatch) appears when you add the offers table, and the workaround doesn't look right to me anyway. The additional error could be related to this workaround or be completely independent.
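The workaround described above can be sketched as follows. This is only an illustration: `StandInFeatPipeline` is a hypothetical stand-in class, not auto_smart's real `DefaultFeatPipeline`, and the list entries are placeholders. It reproduces the same kind of AttributeError and then aliases the missing attribute to `keys_order2s`.

```python
class StandInFeatPipeline:
    """Hypothetical stand-in, NOT auto_smart's real DefaultFeatPipeline."""

    def __init__(self):
        # Placeholder entries; the real pipeline holds feature classes here.
        self.keys_order2s = ["order2_feat_a", "order2_feat_b"]
        # keys_order3s is deliberately never assigned, as in the report.


def patch_keys_order3s(pipeline):
    """Alias keys_order3s to keys_order2s when the attribute is missing."""
    if not hasattr(pipeline, "keys_order3s"):
        pipeline.keys_order3s = pipeline.keys_order2s
    return pipeline


pipeline = StandInFeatPipeline()

# Before the patch, accessing the attribute fails as in the traceback.
try:
    pipeline.keys_order3s
except AttributeError as exc:
    error_before = str(exc)

# After the patch, iterating over keys_order3s no longer raises.
patch_keys_order3s(pipeline)
patched = list(pipeline.keys_order3s)
```

Aliasing means the order-2 features would be reused as order-3 features, which is probably not what the library intends; defining the missing attribute in feat_pipeline.py is a decision for the maintainers.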
