former_member517463
Discoverer
This is the second article in a series on Deep Learning and how to use SAP Leonardo ML Foundation for it. The series covers the complete process of a Deep Learning project, from data preparation to prediction.

If you missed the first article, which was on Image Classification using SAP Leonardo ML foundation, find it here.

As the title says, this article covers another very popular application of Deep Learning: text classification. With so much textual data being produced, classifying it by hand does not scale, so automatic text classification is necessary. Major industry examples include article tagging, news article classification, and sentiment analysis.

Problem statement: Build a deep learning model that classifies text into one of several categories.

Dataset: Any text classification dataset will do. In this article, we use the well-known 20 Newsgroups dataset, which contains newsgroup articles and the category each one belongs to. There are 20 different categories, with 11314 samples for training and 7532 samples for testing.

Technological stack:

  1. Platform: Train Your Own Model functionality of SAP Leonardo ML foundation.

  2. Deep Learning library: Keras.

  3. Programming language: Python 2


Let’s start the work! Major steps are:

  1. Building and training the model.

  2. Making predictions using the trained model.


 

Building and training the model.


The sklearn library can fetch the dataset for us, so we need not download it manually. Let's jump into coding:

Create a new file named training.py and add all of the following code to it:
import pandas as pd
from sklearn.datasets import fetch_20newsgroups

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')

train = pd.DataFrame()
train['article'] = newsgroups_train.data
train['category'] = newsgroups_train.target

test = pd.DataFrame()
test['article'] = newsgroups_test.data
test['category'] = newsgroups_test.target

The code above creates two data frames: train and test. In both, the first column contains the text of the articles and the second column is the category (0 to 19) to which each article belongs.
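
The numeric ids are easier to read next to their newsgroup names. The sketch below assumes the names come from `newsgroups_train.target_names`, the list scikit-learn returns alongside the data. The helper `add_category_names` and the toy frame are hypothetical and serve only as an illustration.

```python
import pandas as pd


def add_category_names(frame, target_names):
    """Return a copy of `frame` with a readable `category_name` column."""
    out = frame.copy()
    # Each integer id indexes into the list of newsgroup names.
    out['category_name'] = [target_names[i] for i in out['category']]
    return out


# Toy stand-in for the real data; in training.py the call would be
# add_category_names(train, newsgroups_train.target_names).
toy = pd.DataFrame({'article': ['first text', 'second text'],
                    'category': [1, 0]})
toy_named = add_category_names(toy, ['alt.atheism', 'comp.graphics'])
print(toy_named)
```

The original frame is left untouched, so the extra column does not interfere with the training code that follows.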

With the data ready, let's start coding the model.

First, the necessary imports.
import numpy as np

from keras.models import Model
from keras.layers import Dense, Embedding, Input, LSTM, Bidirectional, GlobalMaxPool1D, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.preprocessing import text, sequence

 

Some constants: vocabulary size, sequence length, embedding size, batch size, and the maximum number of epochs.
max_features = 10000
maxlen = 100
embed_size = 512
batch_size = 64
epochs = 100

 

Text tokenization.
train_sentences = train['article'].values
test_sentences = test['article'].values
y = train['category'].values

tokenizer = text.Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(train_sentences))
train_tokenized = tokenizer.texts_to_sequences(train_sentences)
test_tokenized = tokenizer.texts_to_sequences(test_sentences)
train = sequence.pad_sequences(train_tokenized, maxlen=maxlen)
test = sequence.pad_sequences(test_tokenized, maxlen=maxlen)
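
To make the tokenization step concrete, here is a simplified pure-Python sketch of the idea: frequent words get small integer ids, words outside the vocabulary are dropped, and every sequence is padded or truncated at the front to a fixed length (which, as far as I know, matches the Keras defaults `padding='pre'` and `truncating='pre'`). The functions `fit_word_index` and `to_padded_ids` are hypothetical names, and this is not how Keras implements it; for example, Keras also strips punctuation and may order equally frequent words differently.

```python
from collections import Counter


def fit_word_index(texts, num_words):
    """Map the most frequent words to ids 1..num_words-1 (0 is kept for padding)."""
    counts = Counter(word for t in texts for word in t.lower().split())
    # Most frequent first; ties are broken alphabetically to stay deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ranked) if i + 1 < num_words}


def to_padded_ids(text, word_index, maxlen):
    """Convert text to ids, skip unknown words, then pad/truncate at the front."""
    ids = [word_index[w] for w in text.lower().split() if w in word_index]
    ids = ids[-maxlen:]                       # keep only the last maxlen tokens
    return [0] * (maxlen - len(ids)) + ids    # zeros are added in front


word_index = fit_word_index(['the cat sat', 'the dog sat down'], num_words=4)
padded = to_padded_ids('the cat sat down', word_index, maxlen=5)
print(padded)  # [0, 0, 2, 3, 1]
```

With `num_words=4` only three words receive an id, so "down" is ignored. The same happens in the article's setup to every word outside the most frequent ones allowed by `max_features`.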

 

Define the model.
def get_model():
    # Input: a sequence of maxlen word ids
    inp = Input(shape=(maxlen, ))
    x = Embedding(max_features, embed_size)(inp)
    x = Bidirectional(LSTM(128, return_sequences=True))(x)
    x = GlobalMaxPool1D()(x)
    x = Dropout(0.2)(x)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.3)(x)
    # One softmax probability per newsgroup category
    x = Dense(20, activation='softmax')(x)

    model = Model(inputs=inp, outputs=x)
    # y holds integer category ids (0 to 19), so the sparse variant of the loss is used
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model
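
A note on labels and loss: the final layer emits one softmax probability per category, while `y` holds integer ids. Keras offers two cross-entropy losses for this situation: `categorical_crossentropy` expects one-hot targets, and `sparse_categorical_crossentropy` takes the integer ids directly. The NumPy sketch below, with made-up probabilities and hypothetical helper names, illustrates that both formulations yield the same value once the targets are in the matching format.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def cross_entropy_one_hot(probs, targets):
    """Mean cross-entropy with one-hot targets."""
    return float(np.mean(-np.sum(targets * np.log(probs), axis=1)))


def cross_entropy_sparse(probs, labels):
    """Mean cross-entropy with integer class ids."""
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))


# Made-up softmax outputs for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])

loss_dense = cross_entropy_one_hot(probs, one_hot(labels, 3))
loss_sparse = cross_entropy_sparse(probs, labels)
print(loss_dense, loss_sparse)
```

The practical consequence is that the label format and the loss name have to agree: integer ids go with the sparse loss, one-hot rows with the non-sparse one.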

 

Define some callbacks.
file_path = 'model.hdf5'
checkpoint = ModelCheckpoint(file_path, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
early = EarlyStopping(monitor='val_loss', mode='min', patience=10)

callbacks = [checkpoint, early]
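
The two callbacks work together: the checkpoint keeps the file from the epoch with the lowest validation loss, and early stopping halts training once the loss has not improved for `patience` epochs. The following is a simplified sketch of that stopping rule, not the Keras implementation; the function name and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0      # improvement: remember it, reset the counter
        else:
            wait += 1                 # no improvement in this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1        # never triggered: all epochs ran


losses = [1.0, 0.8, 0.9, 0.85, 0.7]   # made-up validation losses
stop_epoch = early_stopping_epoch(losses, patience=2)
seen = losses[:stop_epoch + 1]
best_epoch = seen.index(min(seen))    # the epoch a best-only checkpoint would keep
print(stop_epoch, best_epoch)         # 3 1
```

In this toy run, training stops before the later improvement to 0.7 is ever seen. That is the trade-off behind a small patience value and one reason the article uses a more generous `patience=10`.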

 

All set to train the model. This will take some time.
model = get_model()
model.fit(train, y, batch_size=batch_size, epochs=epochs, validation_split=0.15, callbacks=callbacks)
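
To get a feel for what `validation_split=0.15` and `batch_size=64` mean for this dataset, the arithmetic can be worked out directly. The sketch assumes that Keras holds out the last 15% of the samples for validation and rounds the split point down; treat the exact numbers as an estimate under that assumption.

```python
import math

n_samples = 11314          # training articles in the dataset
validation_split = 0.15
batch_size = 64

# Assumption: the split point is the floor of n * (1 - validation_split),
# and the tail of the data becomes the validation set.
n_train = int(n_samples * (1.0 - validation_split))
n_val = n_samples - n_train
steps_per_epoch = int(math.ceil(n_train / float(batch_size)))
print(n_train, n_val, steps_per_epoch)  # 9616 1698 151
```

So each epoch runs roughly 151 weight updates on about 9600 articles, and `val_loss` is measured on the remaining 1700 or so.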

 

Predictions


At this point, the model is trained and the best weights are saved. Now it's time to see the trained model in action, i.e. to make predictions.
model.load_weights(file_path)

y_test = model.predict(test)
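
`model.predict` returns one row of 20 probabilities per article rather than a category id. A minimal sketch of turning those rows into ids and scoring them is shown below. The helper names and toy numbers are hypothetical; in training.py the true labels would come from `newsgroups_test.target`, since the `test` data frame has been replaced by the padded sequences.

```python
import numpy as np


def predicted_classes(probabilities):
    """Pick the most probable category id for each row."""
    return np.argmax(probabilities, axis=1)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true category ids."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))


# Made-up probabilities for 3 articles and 4 categories; the real
# y_test from model.predict(test) would have shape (7532, 20).
toy_probs = np.array([[0.1, 0.6, 0.2, 0.1],
                      [0.7, 0.1, 0.1, 0.1],
                      [0.2, 0.2, 0.1, 0.5]])
toy_truth = np.array([1, 0, 2])

toy_pred = predicted_classes(toy_probs)
toy_acc = accuracy(toy_pred, toy_truth)
print(toy_pred, toy_acc)
```

Applied to the real output, this would read `accuracy(predicted_classes(y_test), newsgroups_test.target)`.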

 

Upload the job to SAP Leonardo ML foundation.


All the code is written; now we need to upload it to SAP Leonardo ML foundation. Put training.py in a folder named code. We also need to create a YAML file named newsgroup.yaml, which specifies the resources for running the job.
job:
  name: "newsgroups"
  execution:
    image: "tensorflow/tensorflow:1.5.0-gpu"
    command: "pip install keras --upgrade && python training.py"
    completionTime: "10"
    resources:
      cpus: 1
      memory: 10000
      gpus: 1

 

Upload the job. Open a command prompt in the directory that contains the code folder and the YAML file, and run the following command:
cf sapml job submit -f newsgroup.yaml code

You can also see logs of the job using appropriate commands.

 

In this article, we covered:

  1. Basic text classification using Keras and Python.

  2. Running programs as jobs on SAP Leonardo ML foundation using Train Your Own Model (TYOM).
