LSTM with CoreML

January 31, 2020

LSTMs can be very effective at finding patterns in sequence-based data. In this tutorial, you'll take a shallow dive into how to work with LSTMs in CoreML.

By the end of this tutorial, you'll have learned how to:

Create an LSTM model using Keras
Use coremltools to convert the Keras model to CoreML
Shape the input data for your CoreML model, and see the results in Swift

This tutorial makes use of Google Colab; however, feel free to use whatever development environment you wish.

LSTM a.k.a. Long Short-Term Memory

This tutorial won’t go into detail on how LSTMs work. If you’re looking for a very good resource, check out this post by Andrej Karpathy. The line that sticks out for me is the following:

The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: Sequences in the input, the output, or in the most general case both.

This is different from a vanilla neural network, which generally expects a single vector as input.

And with that very brief introduction, you'll go over the dataset used in this tutorial.

The Data

This tutorial's focus is to look at LSTMs in a general way, so you'll synthetically generate data for the LSTM. In this section, you'll learn how the data is formatted. If you’ve never worked with LSTMs in the past, the most important thing to know is that the input must be three-dimensional, comprising samples, time steps, and features.

Samples. These are the rows in your data. One sample may be one sequence.
Time steps. These are the past observations for a feature, such as lag variables.
Features. These are columns in your data.

When working with Keras to create your model, this means that your data must be organized with the shape [Samples, Time steps, Features].

As a concrete example, take a look at the following five prices taken from a random stock:

0 => 1455
1 => 1399
2 => 1402
3 => 1403
4 => 1441
...

If you were to train your LSTM to predict the next price given the previous three prices, you'd format your data to have the shape [Samples, 3, 1]: one feature (the price), three time steps, and however many examples (samples) you pass into your model during training.

Using the above data, if you were to generate one sample, it would look as follows:

X =
[
    [[1455], [1399], [1402]],
]

y =
[
    [1403],
]
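To make the windowing concrete, here is a minimal sketch that builds X and y from the five prices above. The helper name make_windows and its n_steps parameter are made up for this illustration; they aren't used elsewhere in the tutorial.

```python
import numpy as np

def make_windows(series, n_steps):
    """Split a 1-D series into (X, y) pairs for next-value prediction."""
    X, y = [], []
    for start in range(len(series) - n_steps):
        X.append(series[start:start + n_steps])  # n_steps past observations
        y.append(series[start + n_steps])        # the value that follows them
    # Add a trailing axis of size 1 for the single feature (the price).
    X = np.array(X, dtype=float).reshape(-1, n_steps, 1)
    y = np.array(y, dtype=float).reshape(-1, 1)
    return X, y

prices = [1455, 1399, 1402, 1403, 1441]
X, y = make_windows(prices, n_steps=3)
print(X.shape)  # (2, 3, 1)
print(y.shape)  # (2, 1)
```

With five prices and three time steps you get two samples: the first is the one shown above, and the second uses the prices at positions 1 through 3 to predict 1441.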

In this tutorial, the data you'll work with is far less sophisticated. You'll develop a model that echoes one of the input values. As an example, if the input to the LSTM is [ [[5], [6], [3], [2], [7]] ], the output will be one of those values. Which value is selected is up to you. Say you decide that the model will output the value at the last index; the model will then output [7].

There are three small restrictions you'll add to this dataset:

* The number of time steps will be five.
* The values of the features will be limited to the integers 0 through 9.
* The features will be one-hot encoded.

This means that, given the example input [ [[5], [6], [3], [2], [7]] ], the input data will look as follows after one-hot encoding:

[
    [
        [0 0 0 0 0 1 0 0 0 0],
        [0 0 0 0 0 0 1 0 0 0],
        [0 0 0 1 0 0 0 0 0 0],
        [0 0 1 0 0 0 0 0 0 0],
        [0 0 0 0 0 0 0 1 0 0]
    ]
]

This also means that the input to the LSTM will have the shape [Samples, 5, 10].
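As a quick sanity check on that shape, here is a NumPy-only sketch that encodes the example sequence. It uses np.eye as a stand-in for the Keras helper the tutorial itself relies on, so treat it as an illustration rather than the tutorial's method.

```python
import numpy as np

n_features = 10
sequence = np.array([5, 6, 3, 2, 7])

# Row i of the identity matrix is the one-hot vector for the value i,
# so indexing with the sequence encodes every time step at once.
encoded = np.eye(n_features)[sequence]   # shape (5, 10)
batch = encoded[np.newaxis, ...]         # add the samples axis -> (1, 5, 10)

print(batch.shape)              # (1, 5, 10)
print(batch[0, 0].astype(int))  # [0 0 0 0 0 1 0 0 0 0]
```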

You might be wondering why the features are one-hot encoded. Why convert one feature into ten? Those are good questions. The quick answer is that it's a simple way of normalizing the input data. This tutorial won't go into why your data should be normalized; however, another benefit of one-hot encoding is that it turns the problem from a regression problem into a classification problem. More on that in the next section.

Here's the code that will generate the input array, and one-hot encode it. In an empty cell, enter the following:

import numpy as np
from keras.utils.np_utils import to_categorical

def generate_sequence(length, n_features):
    # Random integers in [0, n_features); the upper bound is exclusive.
    return np.random.randint(0, n_features, size=length)

def one_hot_encode(sequence, n_features):
    # Each integer becomes a vector of n_features zeros with a single 1.
    return to_categorical(sequence, num_classes=n_features)

def generate_example(length, n_features, out_index, samples=1):
    X = []
    Y = []
    for _ in range(samples):
        sequence = generate_sequence(length, n_features)
        encoded = one_hot_encode(sequence, n_features)
        X.append(encoded)
        # The target is the one-hot vector at position out_index.
        Y.append(encoded[out_index])

    return np.array(X), np.array(Y)

In this tutorial, length will be five and n_features will be ten; however, feel free to try other values. There is one other parameter, out_index, which defines which index of the input sequence the model should output. In this tutorial it will be index 2, the third value in the sequence.

An example usage would be as follows:

length = 5
n_features = 10
out_index = 2

X, y = generate_example(length, n_features, out_index, 1)

print(X.shape) # (1, 5, 10)
print(y.shape) # (1, 10)
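If you'd like to convince yourself that each target really is the value at out_index, the following sketch checks it on a small batch. It is a NumPy-only stand-in for generate_example; the name generate_batch and the seed parameter are hypothetical and exist only for this illustration.

```python
import numpy as np

def generate_batch(length, n_features, out_index, samples, seed=0):
    """NumPy-only stand-in for generate_example (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # Integers in [0, n_features), one row per sample.
    sequences = rng.randint(0, n_features, size=(samples, length))
    X = np.eye(n_features)[sequences]   # (samples, length, n_features)
    y = X[:, out_index, :]              # (samples, n_features)
    return sequences, X, y

sequences, X, y = generate_batch(length=5, n_features=10, out_index=2, samples=4)
print(X.shape, y.shape)  # (4, 5, 10) (4, 10)
# Decoding the target recovers the value at out_index of each sequence.
print(np.array_equal(y.argmax(axis=1), sequences[:, 2]))  # True
```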

You'll need one more function, which decodes the one-hot encoded output. In an empty cell, enter the following:

def one_hot_decode(encoded_seq):
    return np.array([np.argmax(vector) for vector in encoded_seq])

Ok, that should be enough on the data. When you're ready, move on to the next section to create the model.

The Keras Model

Now that you have the code to generate data for training, you'll use Keras with a TensorFlow backend to create a plain LSTM model. In an empty cell, enter the following:

from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

length = 5
n_features = 10
out_index = 2

model = Sequential()
model.add(LSTM(25, input_shape=(length, n_features)))
model.add(Dense(n_features, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

You should get a summary as follows:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 25)                3600
_________________________________________________________________
dense_1 (Dense)              (None, 10)                260
=================================================================
Total params: 3,860
Trainable params: 3,860
Non-trainable params: 0
_________________________________________________________________

Here, you're creating a model which contains a single LSTM layer and a fully-connected dense layer with a softmax activation. The number of hidden units in the LSTM (25) was arbitrarily chosen, and the output size of the dense layer is equal to the number of features.
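The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with an input kernel, a recurrent kernel, and a bias) and for a dense layer; the function names are made up for this example.

```python
def lstm_param_count(units, n_features):
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and a bias vector.
    return 4 * (units * n_features + units * units + units)

def dense_param_count(n_inputs, n_outputs):
    # One weight per input/output pair plus one bias per output.
    return n_inputs * n_outputs + n_outputs

units, n_features = 25, 10
print(lstm_param_count(units, n_features))   # 3600
print(dense_param_count(units, n_features))  # 260
```

Changing the number of hidden units changes both numbers, since the dense layer takes the LSTM's output as its input.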

Lastly, in the previous section, it was mentioned that one of the reasons you're one-hot encoding the data is that it turns the problem from a regression problem into a classification problem. Since the activation of the dense layer is a softmax, it should output an array of probabilities where the index with the biggest probability equals the expected output value.
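To illustrate how a softmax output is read as a class, here is a small NumPy sketch. The scores are invented for the example and don't come from the trained model.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up pre-activation scores for the ten classes (digits 0-9).
logits = np.array([0.1, -1.2, 0.3, 0.0, -0.5, 0.2, -2.0, 4.0, 1.0, -0.7])
probs = softmax(logits)

print(round(float(probs.sum()), 6))  # 1.0
print(int(np.argmax(probs)))         # 7
```

The probabilities sum to one, and taking the argmax turns them back into a single digit, which is the role one_hot_decode plays in this tutorial.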

One thing that wasn't mentioned earlier is the versions of Keras and TensorFlow used in this tutorial:

keras 2.2.5
tensorflow 1.15.0

Depending on when you're reading this tutorial, you may have to configure your environment to use these versions.

Finally, in order to train the Keras model, simply generate some data and call fit on the model. Here's the code:

X, y = generate_example(length, n_features, out_index, 10000)
history = model.fit(X, y, epochs=4, verbose=2, batch_size=5)

Why 10000 samples? Why 4 epochs? Why a batch size of 5? Totally arbitrary. Play around with the values; the goal is that by the last epoch, you have a very small loss and a very high accuracy. Here's an example of what was achieved (your numbers will vary):

Epoch 1/4
 - 31s - loss: 1.1426 - acc: 0.6042
Epoch 2/4
 - 31s - loss: 0.1598 - acc: 0.9712
Epoch 3/4
 - 31s - loss: 0.0127 - acc: 1.0000
Epoch 4/4
 - 31s - loss: 0.0024 - acc: 1.0000

With the model trained, in the next section, you'll convert your Keras model over to CoreML.

Converting the Keras Model to CoreML

The first step will be to install coremltools if it's not already installed in your environment. To use the latest version, run the following command from inside a cell:

!pip install -U git+https://github.com/apple/coremltools.git

As of this writing, the version that was installed was coremltools 3.2. In order to convert a Keras model, you can use one of the standard converters available with coremltools.

import coremltools

coreml_model = coremltools.converters.keras.convert(
    model,
    input_names=['input'],
    output_names=["output"],
)

You can see that you're passing in custom names for the input and output of the CoreML model. These names show up in Xcode, and are also the names of the properties generated by Xcode, so there's good reason to give them meaningful names. It's recommended that you have a look at the coremltools documentation to learn more about its options.

Before exporting the CoreML model, it's worth printing the input of the model from Python. Run the following code in an empty cell:

print(coreml_model._spec.description)

You should see an output that looks as follows:

input {
  name: "input"
  type {
    multiArrayType {
      shape: 10
      dataType: DOUBLE
    }
  }
}

What does this say about the input? The first thing you should notice is that the name of the input was correctly set. Second, you'll notice that the input is of type multiArrayType. This makes sense, and translates to the MLMultiArray class in Swift. More on that in the last section. Finally, notice the shape of the input array. It says that the input is of shape 10. You might be wondering, doesn't an LSTM expect a three-dimensional array that conforms to [Samples, Time steps, Features]? Here, it appears that CoreML is expecting an array of a single dimension, with the number of features. You'd be right to be skeptical. You'll explore this in the next section.

Performing a Prediction

One step that was skipped in the section where the Keras model was created was to run a prediction to see if your model worked as expected. Additionally, if you're running the code on a Mac, you'll test a prediction using the CoreML model directly.

To test a prediction using the Keras model, run the following code in an empty cell:

X_predict, y_predict = generate_example(length, n_features, out_index, 1)
yhat = model.predict(X_predict)

print(X_predict)
print('Sequence: %s' % [one_hot_decode(x) for x in X_predict])

print(yhat)
print('Expected: %s' % one_hot_decode(y_predict))
print('Predicted: %s' % one_hot_decode(yhat))

You should get an output that looks as follows:

[[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]]
Sequence: [array([8, 8, 7, 2, 2])]

[[3.3137629e-05 2.1478966e-04 5.4943381e-04 6.5908389e-05 2.0768159e-04
  9.6994845e-06 1.0115184e-06 9.9652904e-01 2.3886263e-03 6.0489873e-07]]
Expected: [7]
Predicted: [7]

The first two print statements print the one-hot encoded input and the original sequence. The next print statement prints the output of the Keras model. You should see that the value at index 7 (9.9652904e-01) is the biggest probability, which matches the expected output (seven), the value at index 2 of the input sequence. Looks like your Keras model is working as expected.

You can perform the same prediction on your CoreML model if you're on a Mac. However, before you try that, what's the expected input shape of the CoreML model? If you recall, in the previous section, the CoreML model appeared to expect an array with shape ten. What does this mean? Let's see what happens when you use the same input as the one passed into the Keras model. Run the following code in an empty cell:

output = coreml_model.predict({'input': X_predict})['output']
print(output, output.shape)

Before looking at the output, notice that you invoke predict by using a dictionary that contains the named input parameter you defined when converting the model. Moreover, notice that you can extract the output by name. Ok, looking at the output, you should see something like this:

[[9.50734988e-02 5.68468310e-02 1.25199839e-01 1.81365922e-01
  6.82530180e-02 6.19330369e-02 6.57987371e-02 1.49990991e-01
  1.17628723e-01 7.79093429e-02]
 [1.01013213e-01 9.05998349e-02 1.03784502e-01 9.46180597e-02
  1.03531428e-01 1.15829453e-01 1.12450920e-01 9.77989808e-02
  1.03252187e-01 7.71213844e-02]
 [1.01013213e-01 9.05998349e-02 1.03784502e-01 9.46180597e-02
  1.03531428e-01 1.15829453e-01 1.12450920e-01 9.77989808e-02
  1.03252187e-01 7.71213844e-02]
 [1.01013213e-01 9.05998349e-02 1.03784502e-01 9.46180597e-02
  1.03531428e-01 1.15829453e-01 1.12450920e-01 9.77989808e-02
  1.03252187e-01 7.71213844e-02]
 [9.10707284e-03 8.26905966e-02 2.09420774e-04 3.71262692e-02
  9.93105568e-05 1.35476614e-04 7.70646054e-03 7.76922286e-01
  8.49828646e-02 1.02028938e-03]] (5, 10)

You might be wondering why you got back an array of five by ten. The first thing to note is that the CoreML model didn't throw any exceptions when you passed in an array with dimensions [1, 5, 10]. So what's going on here? This is where the subtleties of CoreML come into play. The answer was found in the code for CoreML, namely in this comment:

/**
 * A unidirectional long short-term memory (LSTM) layer.
 *
 * .. code::
 *
 *      (y_t, c_t) = UniDirectionalLSTMLayer(x_t, y_{t-1}, c_{t-1})
 *
 * Input
 *    A blob of rank 5, with shape ``[Seq, Batch, inputVectorSize, 1, 1]``.
 *    This represents a sequence of vectors of size ``inputVectorSize``.
 * Output
 *    Same rank as the input.
 *    Represents a vector of size ``outputVectorSize``. It is either the final output or a sequence of outputs at all time steps.
 *
 * - Output Shape: ``[1, Batch, outputVectorSize, 1, 1]`` , if ``sequenceOutput == false``
 * - Output Shape: ``[Seq, Batch, outputVectorSize, 1, 1]`` , if ``sequenceOutput == true``
 *
 */

Do you see what went wrong? The expected input shape here isn't the usual [Samples, Time steps, Features] that you've come to understand. Instead, CoreML expects inputs to an LSTM to have the shape [Time steps, Samples, Features]. (The comment about the input being a rank 5 blob is also misleading: CoreML will also work with a rank 3 blob.)

This means that if you transpose the data, things should work. Replace the prediction code with the following:

output = coreml_model.predict({'input': np.transpose(X_predict, (1,0,2))})['output']
print(output, output.shape)

And you should get an output as follows:

[6.03173976e-04 1.23322578e-04 7.92621267e-06 3.03294946e-04
 5.13771556e-05 2.56988023e-05 3.43524476e-07 9.98805404e-01
 7.90999184e-05 3.29076329e-07] (10,)

Looks like things are working now, as the index with the highest probability is index 7 (9.98805404e-01). However, this doesn't explain why, in the previous section when you printed the inputs of the model, CoreML reported a multiArrayType of shape ten. Unfortunately, this is one of those things that the author of the tutorial couldn't explain, nor were they able to get a response from the developers. It's just something you will have to learn from experience when working with CoreML.
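Since this reordering is easy to forget, it may help to wrap it in a small helper. The sketch below is one way to do that; the function name keras_to_coreml_layout is made up for this example, and the input reuses the sequence from the prediction above.

```python
import numpy as np

def keras_to_coreml_layout(batch):
    """Reorder [Samples, Time steps, Features] to [Time steps, Samples, Features]."""
    return np.transpose(batch, (1, 0, 2))

# One sample of five one-hot encoded time steps, as Keras expects it.
keras_input = np.eye(10)[[8, 8, 7, 2, 2]][np.newaxis, ...]
coreml_input = keras_to_coreml_layout(keras_input)

print(keras_input.shape)   # (1, 5, 10)
print(coreml_input.shape)  # (5, 1, 10)
# The values are untouched; only the axis order changes.
print(np.array_equal(keras_input[0, 3], coreml_input[3, 0]))  # True
```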

Great, with that solved, it's time to try this model out in a Swift project. First, you'll have to save the model, so run the following code in an empty cell:

coreml_model.save("Echo.mlmodel")

Note that this will save your model in your current working directory. You could also give the method a path to where you'd like the model saved. Finally, the name you give this file will be the name of the class that Xcode will generate for you. In this tutorial, the file was arbitrarily named Echo.

Predictions in Swift

You've converted your model, and you're ready to use it in your Swift application. In this section you'll create an iOS app, but know that CoreML models can be used on all Apple devices that support CoreML. There won't be a UI element to this project; instead, you'll print everything to the console.

Create an iOS app in Xcode, and drag the CoreML model you created previously into the project.

Note: Be sure "Copy items if needed" is selected when adding the file to the project.

Xcode should have generated a class based on the name of the file. In this case, the name of the class should be Echo. First you'll have to create the input array, which is of type MLMultiArray.

let array = try? MLMultiArray(shape: [5, 1, 10], dataType: .double)

Notice that you've created an array with the same shape as the array in Python, with the number of time steps as the first dimension. Next, you'll need a way to populate the array. Here is some code to do that.

private func assign(_ array: MLMultiArray, timestep: NSNumber, value: NSNumber) {
    for index in 0..<array.shape[2].intValue {
        array[[timestep, 0, index as NSNumber]] = 0
    }

    array[[timestep, 0, value]] = 1
}

Next, you'll create a one-hot encoded array using the sample input of [7, 5, 2, 7, 0]. Because the model echoes the value at index 2, the expected output is 2.

for (index, value) in [7, 5, 2, 7, 0].enumerated() {
    assign(array!, timestep: index as NSNumber, value: value as NSNumber)
}
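If you still have the Python environment open, you can build the same array in NumPy to cross-check the layout. This is only a sketch mirroring the Swift code above; on a Mac, the resulting array could be passed to coreml_model.predict under the 'input' key to compare results.

```python
import numpy as np

values = [7, 5, 2, 7, 0]
n_features = 10

# Same layout as the MLMultiArray: [Time steps, Samples, Features].
array = np.zeros((len(values), 1, n_features))
for timestep, value in enumerate(values):
    array[timestep, 0, value] = 1.0  # mirrors array[[timestep, 0, value]] = 1

print(array.shape)                    # (5, 1, 10)
print(array[:, 0, :].argmax(axis=1))  # [7 5 2 7 0]
```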

Next, you'll create your model, wrap the MLMultiArray you created in an EchoInput, and invoke prediction on the model.

let echo = Echo()
let input = EchoInput(
    input: array!,
    lstm_1_h_in: nil,
    lstm_1_c_in: nil
)

let prediction = try? echo.prediction(input: input)

And finally you can dump the results of the output to the console.

var out = ""
for index in 0..<prediction!.output.shape[0].intValue {
    out += "[\(index)] => \(prediction!.output[index].stringValue)\n"
}
print("\(out)")

If all goes well, you should have an output that looks as follows:

[0] => 1.472839176130947e-05
[1] => 0.00010259824921377
[2] => 0.999420166015625
[3] => 9.922024764819071e-05
[4] => 1.190937655337621e-05
[5] => 2.223510136900586e-06
[6] => 3.21386287396308e-05
[7] => 0.0002895792422350496
[8] => 2.652751754794735e-05
[9] => 8.262906590061903e-07

The highest probability is at index 2, as expected. Congratulations, you've gone from training an LSTM model in Keras to using that model in a Swift project.

Takeaways

In this tutorial, you created and trained a Keras model which uses an LSTM, converted it to CoreML using coremltools, and performed a prediction in Swift.

There are two key points in this tutorial:

Keras LSTMs expect input data in the format [Samples, Time steps, Features].
CoreML LSTMs expect input data in the format [Time steps, Samples, Features].

If you enjoyed this tutorial on working with CoreML LSTMs, consider checking out mlfairy.com. MLFairy is a service that helps you create better CoreML models for all your Apple edge devices.