
Deep Learning using SAP Leonardo ML Foundation: Text Classification

This is the second article in a series on Deep Learning and how to use SAP Leonardo ML Foundation for it. The series covers the complete process of a Deep Learning project, from data preparation to prediction.

If you missed the first article, which was on Image Classification using SAP Leonardo ML foundation, find it here.

As the title says, this article covers another very popular application of Deep Learning: text classification. The volume of textual data is far too large to label by hand, so automatic text classification is necessary. Major industry examples include article tagging, news article classification, and sentiment analysis.

Problem statement: Build a deep learning model that classifies text into categories.

Dataset: Any text classification dataset will do. In this article, we use the well-known 20 Newsgroups dataset. It contains newsgroup posts (called articles below) and the category each one belongs to. There are 20 different categories, with 11314 samples for training and 7532 samples for testing.

Technological stack:

  1. Platform: Train Your Own Model functionality of SAP Leonardo ML foundation.
  2. Deep Learning library: Keras.
  3. Programming language: Python 2

Let’s get started! The major steps are:

  1. Building and training the model.
  2. Making predictions using the trained model.

 

Building and training the model.

scikit-learn ships a loader for this dataset that downloads and caches it on first use, so we need not download it manually. Let’s jump into coding.

Create a new file named training.py and add all of the following code to it:

import pandas as pd
from sklearn.datasets import fetch_20newsgroups

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')

train = pd.DataFrame()
train['article'] = newsgroups_train.data
train['category'] = newsgroups_train.target

test = pd.DataFrame()
test['article'] = newsgroups_test.data
test['category'] = newsgroups_test.target

The code above creates two data frames, train and test. In each, the first column contains the article text and the second column contains the category (0 to 19) to which the article belongs.

With the data ready, let’s start coding the model.

Do the necessary imports.

import numpy as np

from keras.models import Model
from keras.layers import Dense, Embedding, Input, LSTM, Bidirectional, GlobalMaxPool1D, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.preprocessing import text, sequence

 

Some constants.

max_features = 10000
maxlen = 100
embed_size = 512
batch_size = 64
epochs = 100

 

Text tokenization.

train_sentences = train['article'].values
test_sentences = test['article'].values
y = train['category'].values

tokenizer = text.Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(train_sentences))
train_tokenized = tokenizer.texts_to_sequences(train_sentences)
test_tokenized = tokenizer.texts_to_sequences(test_sentences)
train = sequence.pad_sequences(train_tokenized, maxlen=maxlen)
test = sequence.pad_sequences(test_tokenized, maxlen=maxlen)
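
To make the tokenization step more concrete, here is a minimal pure-Python sketch of what the two Keras calls do conceptually: words get indices by descending frequency (starting at 1, keeping only indices below num_words), and every sequence is then left-padded with zeros or left-truncated to maxlen. The helper names build_word_index, to_sequences and pad_pre are hypothetical, ties are broken alphabetically here for determinism, and the punctuation filtering done by the real Tokenizer is ignored, so treat this as an illustration rather than the library's implementation.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to a rank, 1 = most frequent (0 is reserved for padding)."""
    counts = Counter(word for t in texts for word in t.lower().split())
    # Descending count; alphabetical tie-break is an assumption of this sketch.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def to_sequences(texts, word_index, num_words):
    """Replace words by their index, dropping words whose index is >= num_words."""
    return [[word_index[w] for w in t.lower().split()
             if w in word_index and word_index[w] < num_words]
            for t in texts]

def pad_pre(seqs, maxlen):
    """Left-pad with zeros, or keep only the last maxlen tokens."""
    return [[0] * (maxlen - len(s)) + s[-maxlen:] for s in seqs]

texts = ["the cat sat", "the cat ran away fast", "the dog"]
word_index = build_word_index(texts)
seqs = to_sequences(texts, word_index, num_words=4)
padded = pad_pre(seqs, maxlen=2)
print(padded)
```

With max_features = 10000 and maxlen = 100 as in this article, rare words are dropped and only the last 100 kept tokens of each article reach the model.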

 

Define the model.

def get_model():
    inp = Input(shape=(maxlen, ))
    x = Embedding(max_features, embed_size)(inp)
    x = Bidirectional(LSTM(128, return_sequences=True))(x)
    x = GlobalMaxPool1D()(x)
    x = Dropout(0.2)(x)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(20, activation='softmax')(x)

    model = Model(inputs=inp, outputs=x)
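    # The labels in y are integer class ids (0 to 19), not one-hot vectors,
    # so the sparse variant of the cross-entropy loss is required.
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])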

    return model
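
The final layer emits 20 softmax probabilities per article, while each label in y is a single integer, so the loss has to pick out the probability of the true class. The plain-Python sketch below shows that cross-entropy computed from an integer label equals cross-entropy computed from the matching one-hot vector, which is the difference between Keras's sparse_categorical_crossentropy and categorical_crossentropy losses. The function names are hypothetical and the probabilities are made-up values for illustration only.

```python
import math

def sparse_cross_entropy(probs, label):
    """Cross-entropy when the target is an integer class id."""
    return -math.log(probs[label])

def one_hot(label, num_classes):
    """Turn an integer class id into a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def dense_cross_entropy(probs, target):
    """Cross-entropy when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

probs = [0.7, 0.2, 0.1]   # illustrative softmax output for 3 classes
label = 1                 # integer label, like the values in y

a = sparse_cross_entropy(probs, label)
b = dense_cross_entropy(probs, one_hot(label, 3))
print(abs(a - b) < 1e-12)
```

With integer labels such as y, the choice is therefore between one-hot encoding them first (for example with keras.utils.to_categorical) and using the sparse loss directly.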

 

Define some callbacks.

file_path = 'model.hdf5'
checkpoint = ModelCheckpoint(file_path, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
early = EarlyStopping(monitor='val_loss', mode='min', patience=10)

callbacks = [checkpoint, early]
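
To see how the two callbacks interact, the sketch below simulates the patience rule in plain Python: each time the validation loss improves, the counter resets (this is also when the checkpoint would be written), and training stops once the loss has failed to improve for patience consecutive epochs. The function name and the loss values are hypothetical, and the sketch assumes the default minimum improvement of zero.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]  # made-up validation losses
stop = early_stopping_epoch(losses, patience=3)
print(stop)
```

In this made-up run the best loss is reached at epoch 3, so with a patience of 3 training would stop at epoch 6; with patience=10 as configured above, the same run would continue to the end.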

 

All set to train the model. This will take some time.

model = get_model()
model.fit(train, y, batch_size=batch_size, epochs=epochs, validation_split=0.15, callbacks=callbacks)
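
As a rough guide to what this call does with the data, the arithmetic below works out how many samples end up in training and validation and how many batches make up one epoch. It assumes, as the Keras documentation describes, that validation_split holds out the last fraction of the samples before shuffling; the exact rounding (truncating the split index) is an assumption of this sketch.

```python
import math

n_samples = 11314          # size of the 20 Newsgroups training set
validation_split = 0.15
batch_size = 64

# Assumed rounding: the split index is truncated to an integer.
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at
steps_per_epoch = int(math.ceil(n_train / float(batch_size)))

print(n_train, n_val, steps_per_epoch)
```

Under these assumptions the model trains on 9616 articles and validates on the remaining 1698, taking 151 batches per epoch.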

 

Predictions

At this point, the model is trained and the best checkpoint is saved to model.hdf5. Now it’s time to see the trained model in action, i.e. to make predictions.

model.load_weights(file_path)

y_test = model.predict(test)
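
The array returned by model.predict holds one row of 20 probabilities per test article, not a category. A minimal sketch of turning such rows into class ids and scoring them against the true labels follows; the helper names and the tiny three-class example values are hypothetical.

```python
def predict_classes(prob_rows):
    """Index of the largest probability in each row."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in prob_rows]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / float(len(actual))

probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]  # made-up model output
labels = [1, 0, 1]                                           # made-up true labels

preds = predict_classes(probs)
acc = accuracy(preds, labels)
print(preds, acc)
```

On the real output, np.argmax(y_test, axis=1) yields the same class ids. Note that the test data frame was overwritten by the padded array during tokenization, so the true test labels should be read from newsgroups_test.target.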

 

Upload the job to SAP Leonardo ML foundation.

All the code is written; now we need to upload it to SAP Leonardo ML foundation. Put training.py in a folder named code. We also need to create a YAML file named newsgroup.yaml, which specifies the resources for running the job.

job:
  name: "newsgroups"
  execution:
    image: "tensorflow/tensorflow:1.5.0-gpu"
    command: "pip install keras --upgrade && python training.py"
    completionTime: "10"
    resources:
      cpus: 1
      memory: 10000
      gpus: 1

 

Upload the job. Open a command prompt in the directory that contains the code folder and the YAML file, and run the following command:

cf sapml job submit -f newsgroup.yaml code

You can also see logs of the job using appropriate commands.

 

In this article, we covered:

  1. Doing basic text classification using Keras and Python.
  2. Running programs as jobs on SAP Leonardo ML foundation using Train Your Own Model (TYOM).
3 Comments
  • Hi, I don’t see “job” as an available command of the sapml plugin. I am using version 1.0.0 and the commands I see are the following:

    config      Display or modify configuration
    fs          Interact with training file system
    help        Help about any command
    model       Manage models in model repository
    modelserver Manage model servers
    retraining  Retraining Service
    version     Print the client version information

      • Hi Michael:

        Yes, I was able to deploy my model but I followed a slightly different approach than the one described here. Instead of saving (and then loading) model weights I saved my model in TF’s “saved model format”:

        tf.saved_model.simple_save(
            keras.backend.get_session(),
            export_path,
            inputs={'entrada': model.input},
            outputs={'salida': model.output})

        Then this model can be served using TF’s Serving API. This way there’s no need to use the (apparently nonexistent) sapml “job” command. Follow the SAP HANA Academy videos on this subject and let me know if I can help.

        Best,

        Martín