
[Deep Learning] [Paper Code Implementation] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction (using deepctr-torch!)

by 후이 (hui) 2022. 3. 17.

 

0. DeepFM paper review

https://huidea.tistory.com/279

 

(Paper: https://arxiv.org/pdf/1703.04247.pdf)

 

This post implements DeepFM, one of the ad-recommendation CTR models covered in the review above.

In fact, models for the CTR task are really easy to build with the deepctr-torch API! (There is also a TensorFlow-based DeepCTR.) The core pattern is sketched right below.
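A minimal sketch of that pattern, boiled down to toy data (the columns cat_a/num_b are placeholders; the full Avazu pipeline follows in section 2): declare feature columns, build the model, compile, fit.

# Minimal deepctr-torch workflow sketch with placeholder toy data.
import pandas as pd
from deepctr_torch.inputs import SparseFeat, DenseFeat, get_feature_names
from deepctr_torch.models import DeepFM

toy = pd.DataFrame({"cat_a": [0, 1, 0, 2], "num_b": [0.1, 0.5, 0.9, 0.3], "click": [0, 1, 0, 1]})
feature_columns = [SparseFeat("cat_a", vocabulary_size=3, embedding_dim=4),  # categorical -> embedded
                   DenseFeat("num_b", 1)]                                    # numeric -> fed as-is
model_input = {name: toy[name] for name in get_feature_names(feature_columns)}

model = DeepFM(linear_feature_columns=feature_columns, dnn_feature_columns=feature_columns,
               task="binary", device="cpu")
model.compile("adam", "binary_crossentropy", metrics=["auc"])
model.fit(model_input, toy["click"].values, batch_size=4, epochs=1, verbose=0)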

 

 

1. Packages & data

Package used: DeepCTR-Torch

https://github.com/shenweichen/DeepCTR-Torch

 

GitHub - shenweichen/DeepCTR-Torch: 【PyTorch】Easy-to-use,Modular and Extendible package of deep-learning based CTR models.


 

 

 

 

https://deepctr-torch.readthedocs.io/en/v0.2.4/

 

Welcome to DeepCTR-Torch’s documentation! — DeepCTR-Torch 0.2.4 documentation


 

 

Data used: Avazu dataset

https://www.kaggle.com/c/avazu-ctr-prediction

 

Click-Through Rate Prediction | Kaggle

 


 

- The full dataset is far too large, so I sampled just 100,000 rows in Google BigQuery. (Thanks to which I wrote SQL for the first time in ages.)
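Roughly the kind of query I mean, as a sketch; the project and table names below are hypothetical placeholders, not my actual setup, and it assumes the BigQuery client library and credentials are configured.

# Sketch: sample 100k random rows from a BigQuery table holding the full Avazu train set.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id
query = """
    SELECT *
    FROM `my-project.avazu.train`  -- hypothetical table name
    ORDER BY RAND()
    LIMIT 100000
"""
client.query(query).to_dataframe().to_csv("avazu_sample_10.csv", index=False)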

 

 

 

2. Code

- The part below "Reverse Engineering - for studying" is where I dug directly into deepctr-torch's DeepFM source.

- It is there purely for study, so treat it as reference only; you don't need to run it, since the whole training process finishes in the cells above it.

- The same code is on GitHub: https://github.com/SeohuiPark/MLDLstudy/blob/main/Recommendation/deepfm_avazudata_10.ipynb

 

3. Interpreting the results + takeaways

- The performance is not that great.

- It looks like overfitting: I increased the batch size and raised the dropout rate, but test AUC never climbs above 0.72.

- The number of numeric features that skip embedding seems to affect training performance: treating date as a numeric value and scaling it, as in the code below, performs better than my earlier attempt where date was a categorical value (label encoding followed by embedding). The two encodings are sketched side by side right after this list.
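For reference, here are the two treatments of the Avazu hour field (format YYMMDDHH, e.g. 14102518) side by side; this is only an illustration of the difference, the actual pipeline is in feature_encoding in the notebook below.

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from deepctr_torch.inputs import SparseFeat, DenseFeat

hours = pd.DataFrame({"hour": [14102100, 14102101, 14102518]})  # Avazu hour: YYMMDDHH

# (a) earlier attempt: hour as categorical -> label encode, then embed as a sparse feature
cat = LabelEncoder().fit_transform(hours["hour"])
cat_feat = SparseFeat("hour", vocabulary_size=int(cat.max()) + 1, embedding_dim=5)

# (b) this notebook: hour as numeric -> min-max scale, feed as a dense feature
num = MinMaxScaler(feature_range=(0, 1)).fit_transform(hours[["hour"]])  # recent dates -> larger values
num_feat = DenseFeat("hour", 1)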

 

 

 

deepfm_avazudata_10


  • Full dataset: 40,428,967 rows (~40M) - cannot be loaded in Colab
  • Load after sampling 100,000 rows
In [2]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:150% !important; }</style>"))
In [2]:
! pip install deepctr-torch
Collecting deepctr-torch
  Downloading deepctr_torch-0.2.7-py3-none-any.whl (70 kB)
Requirement already satisfied: torch>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from deepctr-torch) (1.10.0+cu111)
Collecting tf-estimator-nightly==2.8.0.dev2021122109
  Downloading tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB)
Installing collected packages: tf-estimator-nightly, deepctr-torch
Successfully installed deepctr-torch-0.2.7 tf-estimator-nightly-2.8.0.dev2021122109
In [3]:
import os
import gzip
import shutil
import glob

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

import torch
from deepctr_torch.inputs import SparseFeat, DenseFeat, get_feature_names
from deepctr_torch.models import *
In [4]:
def data_load():
    print("\n\n1. data load ")
    data_path = "/content/drive/MyDrive/Colab Notebooks/2022_recom_study/ctr_sample_dataset/abazu_dataset/"
    data = pd.read_csv(data_path + "avazu_sample_10.csv")
    display(data.head(3))
    print(data.columns)
    print(data.shape) 
    return data
In [5]:
def feature_selection(data):
    print("\n\n2. feature selection ")

    sparse_features = data.columns.tolist()
    sparse_features.remove('click')
    sparse_features.remove('hour')
    dense_features = ['hour']

    print("sparse feature :", sparse_features)
    print("dense feature :", dense_features)
    print("target :", 'click')

    return data, sparse_features, dense_features
In [6]:
def feature_encoding(data, sparse_features, dense_features):

    print("\n\n3-1. feature encoding ")
    print("categorical value to numeric label")
    for feat in sparse_features:
        lbe = LabelEncoder()
        data[feat] = lbe.fit_transform(data[feat])

    print("numeric value Minmax scaling ")
    mms = MinMaxScaler(feature_range=(0, 1))  ### more recent dates map to larger values
    data[dense_features] = mms.fit_transform(data[dense_features])

    return data
In [7]:
def feature_format_deepfm(data, sparse_features, dense_features, embedding_dim):

    print(f"\n\n3-2. feature embedding - embedding size {embedding_dim}")
    spar_feat_list = [SparseFeat(feat, vocabulary_size=data[feat].max() + 1, embedding_dim=embedding_dim) for feat in sparse_features]
    dense_feat_list = [DenseFeat(feat, 1, ) for feat in dense_features]
    fixlen_feature_columns = spar_feat_list + dense_feat_list

    dnn_feature_columns = fixlen_feature_columns
    linear_feature_columns = fixlen_feature_columns
    feature_names = get_feature_names(linear_feature_columns + dnn_feature_columns)

    return dnn_feature_columns, linear_feature_columns, feature_names
In [8]:
def data_split(data, test_ratio, feature_names, random_seed):
    print(f"\n\n4. data split (test ratio - {test_ratio})")
    train, test = train_test_split(data, test_size=test_ratio, random_state=random_seed)
    train_model_input = {name: train[name] for name in feature_names}
    test_model_input = {name: test[name] for name in feature_names}

    return train, test, train_model_input, test_model_input 
In [9]:
def modeling(linear_feature_columns, dnn_feature_columns,
             batch_size, num_epoch, val_ratio, test_ratio, l2_decay_val, random_seed):
    
    print(f"\n\n5. Modeling")
    model = DeepFM(linear_feature_columns=linear_feature_columns,  
               dnn_feature_columns=dnn_feature_columns, 
               l2_reg_linear=l2_decay_val, l2_reg_embedding=l2_decay_val, l2_reg_dnn=l2_decay_val,
               dnn_dropout=0.5, 
               dnn_use_bn = True,
               dnn_hidden_units=(32, 16),
               task='binary',
               seed=random_seed, device=device)


    model.compile("adam", "binary_crossentropy", 
                metrics=["binary_crossentropy", "auc"], )


    return model 
In [10]:
def eval_test(model, test_model_input, batch_size):
    print(f"\n\n6. Evaluation testset")
    pred_ans = model.predict(test_model_input, batch_size)  # batch_size default: 256
    print("")
    # note: `test` and `target` are globals defined in the main block below
    print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
    print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))

4. modeling

In [11]:
if __name__ == "__main__":
    batch_size = 1000
    num_epoch = 20
    val_ratio = 0.1
    test_ratio = 0.1
    random_seed = 2022
    l2_decay_val = 1e-01
    embedding_dim = 5

    device = 'cpu'
    use_cuda = True
    if use_cuda and torch.cuda.is_available():
        print('cuda ready...')
        device = 'cuda:0'


    data = data_load()
    target = ['click']

    data, sparse_features, dense_features = feature_selection(data)
    data = feature_encoding(data, sparse_features, dense_features)

    dnn_feature_columns, linear_feature_columns, feature_names = feature_format_deepfm(data, sparse_features, dense_features, embedding_dim)

    train, test, train_model_input, test_model_input = data_split(data, test_ratio, 
                                                                  feature_names, random_seed)

    model = modeling(linear_feature_columns, dnn_feature_columns,
             batch_size, num_epoch, val_ratio, test_ratio, l2_decay_val, random_seed)
    
    model.fit(train_model_input, train[target].values,
            batch_size=batch_size, epochs=num_epoch, verbose=2, validation_split=val_ratio)
    
    eval_test(model, test_model_input, batch_size)
cuda ready...


1. data load 
id click hour C1 banner_pos site_id site_domain site_category app_id app_domain ... device_type device_conn_type C14 C15 C16 C17 C18 C19 C20 C21
0 3.572791e+18 0 14102518 1005 1 856e6d3f 58a89a43 f028772b ecad2386 7801e8d9 ... 1 0 18854 320 50 1882 3 35 -1 13
1 3.299518e+18 0 14102404 1005 0 d9750ee7 98572c79 f028772b ecad2386 7801e8d9 ... 1 0 21153 320 50 2420 2 35 -1 69
2 3.990806e+18 0 14102907 1005 0 517b8671 ac5abf20 f028772b ecad2386 7801e8d9 ... 1 0 23642 320 50 2709 3 35 -1 23

3 rows × 24 columns

Index(['id', 'click', 'hour', 'C1', 'banner_pos', 'site_id', 'site_domain',
       'site_category', 'app_id', 'app_domain', 'app_category', 'device_id',
       'device_ip', 'device_model', 'device_type', 'device_conn_type', 'C14',
       'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21'],
      dtype='object')
(100000, 24)


2. feature selection 
sparse feature : ['id', 'C1', 'banner_pos', 'site_id', 'site_domain', 'site_category', 'app_id', 'app_domain', 'app_category', 'device_id', 'device_ip', 'device_model', 'device_type', 'device_conn_type', 'C14', 'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21']
dense feature : ['hour']
target : click


3-1. feature encoding 
categorical value to numeric label
numeric value Minmax scaling 


3-2. feature embedding - embedding size 5


4. data split (test ratio - 0.1)


5. Modeling
cuda:0
Train on 81000 samples, validate on 9000 samples, 81 steps per epoch
Epoch 1/20
2s - loss:  0.4807 - binary_crossentropy:  0.4794 - auc:  0.6223 - val_binary_crossentropy:  0.4279 - val_auc:  0.7085
Epoch 2/20
1s - loss:  0.3631 - binary_crossentropy:  0.3571 - auc:  0.8065 - val_binary_crossentropy:  0.4116 - val_auc:  0.7217
Epoch 3/20
1s - loss:  0.0236 - binary_crossentropy:  0.0132 - auc:  0.9998 - val_binary_crossentropy:  0.4475 - val_auc:  0.6926
Epoch 4/20
1s - loss:  0.0089 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.5140 - val_auc:  0.6884
Epoch 5/20
1s - loss:  0.0073 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.5706 - val_auc:  0.6925
Epoch 6/20
1s - loss:  0.0061 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.6374 - val_auc:  0.6975
Epoch 7/20
1s - loss:  0.0053 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.6759 - val_auc:  0.7030
Epoch 8/20
1s - loss:  0.0046 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.7233 - val_auc:  0.7074
Epoch 9/20
1s - loss:  0.0041 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.7508 - val_auc:  0.7108
Epoch 10/20
1s - loss:  0.0037 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.8009 - val_auc:  0.7130
Epoch 11/20
1s - loss:  0.0033 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.8225 - val_auc:  0.7144
Epoch 12/20
1s - loss:  0.0030 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.8467 - val_auc:  0.7156
Epoch 13/20
1s - loss:  0.0028 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.9039 - val_auc:  0.7164
Epoch 14/20
1s - loss:  0.0025 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.8825 - val_auc:  0.7172
Epoch 15/20
1s - loss:  0.0024 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.9666 - val_auc:  0.7174
Epoch 16/20
1s - loss:  0.0022 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.9581 - val_auc:  0.7175
Epoch 17/20
1s - loss:  0.0020 - binary_crossentropy:  0.0000 - auc:  1.0000 - val_binary_crossentropy:  0.9483 - val_auc:  0.7178
Epoch 18/20
1s - loss:  0.0019 - binary_crossentropy:  0.0001 - auc:  1.0000 - val_binary_crossentropy:  0.9558 - val_auc:  0.7180
Epoch 19/20
1s - loss:  0.0018 - binary_crossentropy:  0.0000 - auc:  1.0000 - val_binary_crossentropy:  1.0066 - val_auc:  0.7183
Epoch 20/20
1s - loss:  0.0017 - binary_crossentropy:  0.0000 - auc:  1.0000 - val_binary_crossentropy:  1.0427 - val_auc:  0.7182


6. Evaluation testset

test LogLoss 1.0367
test AUC 0.7225
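The log above makes the overfitting visible: training AUC hits 1.0 by epoch 3 while validation loss keeps climbing. One thing worth trying is early stopping on the validation loss; a sketch, assuming deepctr-torch's Keras-style callbacks (see the deepctr-torch docs for the exact API):

# Sketch: stop training once val loss stops improving, instead of running all 20 epochs.
from deepctr_torch.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_binary_crossentropy', min_delta=0, patience=2, verbose=1, mode='min')
model.fit(train_model_input, train[target].values,
          batch_size=batch_size, epochs=num_epoch, verbose=2,
          validation_split=val_ratio, callbacks=[es])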

Reverse Engineering - for studying

In [ ]:
import torch
import torch.nn as nn

class FM(nn.Module):
    """Factorization Machine models pairwise (order-2) feature interactions
     without linear term and bias.
      Input shape
        - 3D tensor with shape: ``(batch_size,field_size,embedding_size)``.
      Output shape
        - 2D tensor with shape: ``(batch_size, 1)``.
      References
        - [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf)
    """

    def __init__(self):
        super(FM, self).__init__()

    def forward(self, inputs):
        fm_input = inputs

        square_of_sum = torch.pow(torch.sum(fm_input, dim=1, keepdim=True), 2)  # (sum over fields)^2
        sum_of_square = torch.sum(fm_input * fm_input, dim=1, keepdim=True)     # sum over fields of squares
        cross_term = square_of_sum - sum_of_square
        cross_term = 0.5 * torch.sum(cross_term, dim=2, keepdim=False)          # halve, reduce over embedding dim

        return cross_term
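The forward pass above is the standard FM trick that collapses the O(n²) pairwise interaction sum into linear time over the embedding dimension:

$$\sum_{i<j}\langle v_i, v_j\rangle \;=\; \frac{1}{2}\sum_{f=1}^{k}\Big[\Big(\sum_{i} v_{i,f}\Big)^{2}-\sum_{i} v_{i,f}^{2}\Big]$$

which is exactly square_of_sum - sum_of_square, halved and summed over dim=2. A quick shape check of the module on a toy tensor (just an illustration, using the imports above):

import torch
fm = FM()
x = torch.randn(4, 22, 5)   # (batch_size, field_size, embedding_size), matching the docstring
print(fm(x).shape)          # -> torch.Size([4, 1]): one scalar FM cross term per example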
In [ ]:
import torch
import torch.nn as nn
from deepctr_torch.models.basemodel import BaseModel
from deepctr_torch.inputs import combined_dnn_input
from deepctr_torch.layers import FM, DNN

class DeepFM(BaseModel):
    """Instantiates the DeepFM Network architecture.
    :param linear_feature_columns: An iterable containing all the features used by the linear part of the model. (-> features fed to the FM/linear part; here, all features)

    :param dnn_feature_columns: An iterable containing all the features used by the deep part of the model. (-> features fed to the DNN part; here, all features)
    :param use_fm: bool, use FM part or not (-> whether to use the FM part)

    :param dnn_hidden_units: list, list of positive integer or empty list,
       the layer number and units in each layer of DNN (-> number of DNN layers and units per layer - default (256, 128))
    :param dnn_dropout: float in [0,1), the probability we will drop out a given DNN coordinate. (-> DNN dropout)
    :param dnn_activation: Activation function to use in DNN (-> DNN activation function)
    :param dnn_use_bn: bool. Whether use BatchNormalization before activation or not in DNN (-> DNN batch norm)


    :param l2_reg_linear: float. L2 regularizer strength applied to linear part (-> L2 regularization of the linear/FM part, default 1e-5)
    :param l2_reg_embedding: float. L2 regularizer strength applied to embedding vector (-> L2 regularization of the embeddings, default 1e-5)
    :param l2_reg_dnn: float. L2 regularizer strength applied to DNN (-> L2 regularization of the DNN, default 0)

    :param init_std: float, to use as the initialize std of embedding vector (-> initial std of the embedding vectors)
    :param seed: integer, to use as random seed. (-> random seed)

    :param task: str, ``"binary"`` for binary logloss or ``"regression"`` for regression loss (-> task - binary classification / regression)
    :param device: str, ``"cpu"`` or ``"cuda:0"`` (-> run on CPU or GPU)
    :param gpus: list of int or torch.device for multiple gpus. If None, run on `device`. `gpus[0]` should be the same gpu with `device`.
    :return: A PyTorch model instance.
    """

    def __init__(self,
                 linear_feature_columns, dnn_feature_columns, use_fm=True,
                 dnn_hidden_units=(256, 128),
                 l2_reg_linear=0.00001, l2_reg_embedding=0.00001, l2_reg_dnn=0, init_std=0.0001, seed=1024,
                 dnn_dropout=0,
                 dnn_activation='relu', dnn_use_bn=False, task='binary', device='cpu', gpus=None):

        super(DeepFM, self).__init__(linear_feature_columns, dnn_feature_columns, l2_reg_linear=l2_reg_linear,
                                     l2_reg_embedding=l2_reg_embedding, init_std=init_std, seed=seed, task=task,
                                     device=device, gpus=gpus)

        self.use_fm = use_fm
        self.use_dnn = len(dnn_feature_columns) > 0 and len(
            dnn_hidden_units) > 0
        
        if use_fm: ### load the FM module
            self.fm = FM()

        if self.use_dnn: ### declare the modules used in the DNN part
            self.dnn = DNN(self.compute_input_dim(dnn_feature_columns), dnn_hidden_units,
                           activation=dnn_activation, l2_reg=l2_reg_dnn, dropout_rate=dnn_dropout, use_bn=dnn_use_bn,
                           init_std=init_std, device=device)
            self.dnn_linear = nn.Linear(
                dnn_hidden_units[-1], 1, bias=False).to(device)

            self.add_regularization_weight(
                filter(lambda x: 'weight' in x[0] and 'bn' not in x[0], self.dnn.named_parameters()), l2=l2_reg_dnn)
            self.add_regularization_weight(self.dnn_linear.weight, l2=l2_reg_dnn)
        self.to(device)

    def forward(self, X):  ### forward pass

        sparse_embedding_list, dense_value_list = self.input_from_feature_columns(X, self.dnn_feature_columns,
                                                                                  self.embedding_dict)
        ## 1) FM part
        ### 1.1) pass through the linear model
        logit = self.linear_model(X)  ## add to the logit

        ### 1.2) FM cross term (pairwise products between embedded fields)
        if self.use_fm and len(sparse_embedding_list) > 0:
            fm_input = torch.cat(sparse_embedding_list, dim=1)
            logit += self.fm(fm_input)  ## add to the logit

        ## 2) DNN part
        if self.use_dnn:
            dnn_input = combined_dnn_input(
                sparse_embedding_list, dense_value_list)
            dnn_output = self.dnn(dnn_input)
            dnn_logit = self.dnn_linear(dnn_output)
            logit += dnn_logit  ## add to the logit

        y_pred = self.out(logit)

        return y_pred