{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# Training Vision Models for microTVM on Arduino\n**Author**: [Gavin Uberti](https://github.com/guberti)\n\nThis tutorial shows how MobileNetV1 models can be trained\nto fit on embedded devices, and how those models can be\ndeployed to Arduino using TVM.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
<div class=\"alert alert-info\"><h4>Note</h4><p>This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally using the link at the bottom of this page, or open it online for free using Google Colab. Click the icon below to open in Google Colab.</p></div>
\n\n\n\n## Motivation\nWhen building IoT devices, we often want them to **see and understand** the world around them.\nThis can take many forms, but oftentimes a device will want to know if a certain **kind of\nobject** is in its field of vision.\n\nFor example, a security camera might look for **people**, so it can decide whether to save a video\nto memory. A traffic light might look for **cars**, so it can judge which lights should change\nfirst. Or a forest camera might look for a **kind of animal**, so it can estimate how large\nthe animal population is.\n\nTo make these devices affordable, we would like them to need only a low-cost processor like the\n[nRF52840](https://www.nordicsemi.com/Products/nRF52840) (costing five dollars each on Mouser) or the [RP2040](https://www.raspberrypi.com/products/rp2040/) (just $1.45 each!).\n\nThese devices have very little memory (~250 KB RAM), meaning that no conventional edge AI\nvision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will\nshow how these models can be modified to work around this requirement. Then, we will use TVM\nto compile and deploy the resulting model for an Arduino that uses one of these processors.\n\n### Installing the Prerequisites\n\nThis tutorial will use TensorFlow to train the model - a widely used machine learning library\ncreated by Google. TensorFlow is a very low-level library, however, so we will use the Keras\ninterface to talk to TensorFlow. We will also use TensorFlow Lite to perform quantization on\nour model, as TensorFlow by itself does not support this.\n\nOnce we have our generated model, we will use TVM to compile and test it. To avoid having to\nbuild from source, we'll install ``tlcpack`` - a community build of TVM. Lastly, we'll also\ninstall ``imagemagick`` and ``curl`` to preprocess data:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%bash\npip install -q tensorflow tflite\npip install -q tlcpack-nightly -f https://tlcpack.ai/wheels\napt-get -qq install imagemagick curl\n\n# Install Arduino CLI and library for Nano 33 BLE\ncurl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh\n/content/bin/arduino-cli core update-index\n/content/bin/arduino-cli core install arduino:mbed_nano" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the GPU\n\nThis tutorial demonstrates training a neural network, which requires a lot of computing power\nand will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you\ncan enable a GPU by going to **Runtime->Change runtime type** and selecting \"GPU\" as the hardware\naccelerator. If you are running locally, you can [follow TensorFlow's guide](https://www.tensorflow.org/guide/gpu) instead.\n\nWe can test our GPU installation with the following code:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import tensorflow as tf\n\nif not tf.test.gpu_device_name():\n    print(\"No GPU was detected!\")\n    print(\"Model training will take much longer (~30 minutes instead of ~5)\")\nelse:\n    print(\"GPU detected - you're good to go.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Choosing Our Work Dir\nWe need to pick a directory where our image datasets, trained model, and eventual Arduino sketch\nwill all live. 
If running on Google Colab, we'll save everything in ``/root`` (aka ``~``), but you'll\nprobably want to store it elsewhere if running locally. Note that this variable only affects the Python\ncode - you'll have to adjust the Bash commands to match.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\n\nFOLDER = \"/root\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading the Data\nConvolutional neural networks usually learn by looking at many images, along with labels telling\nthe network what those images are. To get these images, we'll need a publicly available dataset\nwith thousands of images of all sorts of objects and labels of what's in each image. We'll also\nneed a bunch of images that **aren't** of cars, as we're trying to distinguish these two classes.\n\nIn this tutorial, we'll create a model to detect if an image contains a **car**, but you can use\nwhatever category you like! Just change the source URL below to one containing images of another\ntype of object.\n\nTo get our car images, we'll be downloading the [Stanford Cars dataset](http://ai.stanford.edu/~jkrause/cars/car_dataset.html),\nwhich contains 16,185 full color images of cars (we'll only use its training split). We'll also need images of random things that\naren't cars, so we'll use the [COCO 2017](https://cocodataset.org/#home) validation set (it's\nsmaller, and thus faster to download than the full training set. Training on the full data set\nwould yield better results). Note that there are some cars in the COCO 2017 data set, but it's\na small enough fraction not to matter - just keep in mind that this will drive down our perceived\naccuracy slightly.\n\nWe could use the TensorFlow dataloader utilities, but we'll instead do it manually to make sure\nit's easy to change the datasets being used. We'll end up with the following file hierarchy:\n\n```\n/root\n\u251c\u2500\u2500 downloads\n\u2502   \u251c\u2500\u2500 target.zip\n\u2502   \u2514\u2500\u2500 random.zip\n\u2514\u2500\u2500 images\n    \u251c\u2500\u2500 target\n    \u2502   \u251c\u2500\u2500 00001.jpg\n    \u2502   \u2502   ...\n    \u2502   \u2514\u2500\u2500 08144.jpg\n    \u2514\u2500\u2500 random\n        \u251c\u2500\u2500 000000000139.jpg\n        \u2502   ...\n        \u2514\u2500\u2500 000000581781.jpg\n```\nWe should also note that the Stanford Cars training split has 8k images, while the COCO 2017 validation set is 5k\nimages - it is not a 50/50 split! If we wanted to, we could weight these classes differently\nduring training to correct for this (sketched below), but training will still work if we ignore it. 
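As a rough, hypothetical sketch of that correction (the weights use the standard \"balanced\" heuristic, the image counts are approximate, and Keras assigns class indices alphabetically by folder name - double-check with the dataset's ``class_names`` before relying on a particular order):\n\n```python\n# Hypothetical sketch: weight the rarer class more heavily so both classes\n# contribute roughly equally to the loss during training.\ncounts = {\"random\": 5000, \"target\": 8144}     # approximate image counts per folder\ntotal = sum(counts.values())\nclass_weight = {\n    i: total / (len(counts) * counts[name])\n    for i, name in enumerate(sorted(counts))  # alphabetical, matching Keras's ordering\n}\n# Later in the tutorial: model.fit(train_dataset, ..., class_weight=class_weight)\n```\n\n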
It should\ntake about **2 minutes** to download the Stanford Cars images, while the COCO 2017 validation set will take\n**1 minute**.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\nimport shutil\nimport urllib.request\n\n# Download datasets\nos.makedirs(f\"{FOLDER}/downloads\")\nos.makedirs(f\"{FOLDER}/images\")\nurllib.request.urlretrieve(\n    \"https://data.deepai.org/stanfordcars.zip\", f\"{FOLDER}/downloads/target.zip\"\n)\nurllib.request.urlretrieve(\n    \"http://images.cocodataset.org/zips/val2017.zip\", f\"{FOLDER}/downloads/random.zip\"\n)\n\n# Extract them and rename their folders\nshutil.unpack_archive(f\"{FOLDER}/downloads/target.zip\", f\"{FOLDER}/downloads\")\nshutil.unpack_archive(f\"{FOLDER}/downloads/random.zip\", f\"{FOLDER}/downloads\")\nshutil.move(f\"{FOLDER}/downloads/cars_train/cars_train\", f\"{FOLDER}/images/target\")\nshutil.move(f\"{FOLDER}/downloads/val2017\", f\"{FOLDER}/images/random\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the Data\nCurrently, our data is stored on-disk as JPG files of various sizes. To train with it, we'll have\nto load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed\ndata. Keras's ``image_dataset_from_directory`` will take care of most of this, though it loads\nimages such that each pixel value is a float from 0 to 255.\n\nWe'll also need to load labels, though Keras will help with this. From our subdirectory structure,\nit knows the images in ``/target`` are one class, and those in ``/random`` another. Setting\n``label_mode='categorical'`` tells Keras to convert these into **categorical labels** - a 2x1 one-hot vector\nthat's ``[1, 0]`` for one class and ``[0, 1]`` for the other, with the order determined alphabetically by folder name.\nWe'll also set ``shuffle=True`` to randomize the order of our examples.\n\nWe will also **batch** the data - grouping samples into clumps to make our training go faster.\nA ``batch_size`` of ``32`` is a decent choice.\n\nLastly, in machine learning we generally want our inputs to be small numbers. We'll thus use a\n``Rescaling`` layer to change our images such that each pixel is a float between ``0.0`` and ``1.0``,\ninstead of ``0`` to ``255``. We need to be careful not to rescale our categorical labels though, so\nwe'll use a ``lambda`` function.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "IMAGE_SIZE = (64, 64, 3)\nunscaled_dataset = tf.keras.utils.image_dataset_from_directory(\n    f\"{FOLDER}/images\",\n    batch_size=32,\n    shuffle=True,\n    label_mode=\"categorical\",\n    image_size=IMAGE_SIZE[0:2],\n)\nrescale = tf.keras.layers.Rescaling(scale=1.0 / 255)\nfull_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's Inside Our Dataset?\nBefore giving this data set to our neural network, we ought to give it a quick visual inspection.\nDoes the data look properly transformed? Do the labels seem appropriate? And what's our ratio of\nobjects to other stuff? 
We can display some examples from our datasets using ``matplotlib``:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nnum_target_class = len(os.listdir(f\"{FOLDER}/images/target/\"))\nnum_random_class = len(os.listdir(f\"{FOLDER}/images/random/\"))\nprint(f\"{FOLDER}/images/target contains {num_target_class} images\")\nprint(f\"{FOLDER}/images/random contains {num_random_class} images\")\n\n# Show some samples and their labels\nSAMPLES_TO_SHOW = 10\nplt.figure(figsize=(20, 10))\nfor i, (image, label) in enumerate(unscaled_dataset.unbatch()):\n    if i >= SAMPLES_TO_SHOW:\n        break\n    ax = plt.subplot(1, SAMPLES_TO_SHOW, i + 1)\n    plt.imshow(image.numpy().astype(\"uint8\"))\n    plt.title(list(label.numpy()))\n    plt.axis(\"off\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Validating our Accuracy\nWhile developing our model, we'll often want to check how accurate it is (e.g., to see if it\nimproves during training). How do we do this? We could just train it on *all* of the data, and\nthen ask it to classify that same data. However, our model could cheat by just memorizing all of\nthe samples, which would make it *appear* to have very high accuracy, but perform very badly in\nreality. In practice, this \"memorizing\" is called **overfitting**.\n\nTo prevent this, we will set aside some of the data (we'll use 20%) as a **validation set**. Our\nmodel will never be trained on validation data - we'll only use it to check our model's accuracy.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "num_batches = len(full_dataset)\ntrain_dataset = full_dataset.take(int(num_batches * 0.8))\nvalidation_dataset = full_dataset.skip(len(train_dataset))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choosing Our Model\nIn the past decade, [convolutional neural networks](https://en.wikipedia.org/wiki/Convolutional_neural_network) have been widely\nadopted for image classification tasks. State-of-the-art models like [EfficientNet V2](https://arxiv.org/abs/2104.00298) are able\nto perform image classification better than even humans! Unfortunately, these models have tens of\nmillions of parameters, and thus won't fit on cheap security camera computers.\n\nOur applications generally don't need perfect accuracy - 90% is good enough. We can thus use the\nolder and smaller MobileNet V1 architecture. But this *still* won't be small enough - by default,\nMobileNet V1 with 224x224 inputs and alpha 1.0 takes ~16 MB just to **store** its ``float32`` weights. To reduce the size\nof the model, there are three knobs we can turn. First, we can reduce the size of the input images\nfrom 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can also reduce the **alpha**\nof the model, from 1.0 to 0.25, which downscales the width of the network (and the number of\nfilters) by a factor of four. And if we were really strapped for space, we could reduce the\nnumber of **channels** by making our model take grayscale images instead of RGB ones.\n\nIn this tutorial, we will use an RGB 64x64 input image and alpha 0.25. 
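If you're curious how much these knobs buy us, here is a quick, optional check (it reuses the ``tf`` import from earlier and builds throwaway models with ``weights=None`` purely to count parameters - the exact numbers aren't important):\n\n```python\n# Optional sanity check: compare parameter counts of the stock MobileNet V1 classifier.\n# weights=None builds randomly initialized models - we only want count_params().\n# Note: alpha is what shrinks the parameter count; a smaller input mainly shrinks the\n# intermediate activations (i.e. RAM usage) rather than the stored weights.\nbig = tf.keras.applications.MobileNet(input_shape=(224, 224, 3), alpha=1.0, weights=None)\nsmall = tf.keras.applications.MobileNet(input_shape=(64, 64, 3), alpha=0.25, weights=None)\nprint(f\"alpha 1.0:  {big.count_params():,} parameters\")\nprint(f\"alpha 0.25: {small.count_params():,} parameters\")\n```\n\n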
This combination is not quite\nideal, but it allows the finished model to fit in 192 KB of RAM, while still letting us perform\ntransfer learning using the official TensorFlow source models (if we used alpha <0.25 or a\ngrayscale input, we wouldn't be able to do this).\n\n### What is Transfer Learning?\nDeep learning has [dominated image classification](https://paperswithcode.com/sota/image-classification-on-imagenet) for a long time,\nbut training neural networks takes a lot of time. When a neural network is trained \"from scratch\",\nits parameters start out randomly initialized, forcing it to learn very slowly how to tell images\napart.\n\nWith transfer learning, we instead start with a neural network that's **already** good at a\nspecific task. In this example, that task is classifying images from [the ImageNet database](https://www.image-net.org/). This\nmeans the network already has some object detection capabilities, and is likely closer to what you\nwant than a random model would be.\n\nThis works especially well with image processing neural networks like MobileNet. In practice, it\nturns out the convolutional layers of the model (i.e. the first 90% of the layers) are used for\nidentifying low-level features like lines and shapes - only the last few fully connected layers\nare used to determine how those shapes make up the objects the network is trying to detect.\n\nWe can take advantage of this by starting training with a MobileNet model that was trained on\nImageNet, and already knows how to identify those lines and shapes. We can then just remove the\nlast few layers from this pretrained model, and add our own final layers. We'll then train this\ncombined model for a few epochs on our cars vs non-cars dataset, to adjust the first layers\nand train the last layers from scratch. This process of training an already-partially-trained\nmodel is called *fine-tuning*.\n\nSource MobileNets for transfer learning have been [pretrained by the TensorFlow folks](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md), so we\ncan just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "os.makedirs(f\"{FOLDER}/models\")\nWEIGHTS_PATH = f\"{FOLDER}/models/mobilenet_2_5_128_tf.h5\"\nurllib.request.urlretrieve(\n    \"https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5\",\n    WEIGHTS_PATH,\n)\n\npretrained = tf.keras.applications.MobileNet(\n    input_shape=IMAGE_SIZE, weights=WEIGHTS_PATH, alpha=0.25\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modifying Our Network\nAs mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories,\nbut we want to convert it to classify cars. Since only the last few layers are task-specific,\nwe'll **cut off the last five layers** of our original model. 
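If you'd like to see exactly which layers those are, here is a quick, optional check you can run in a code cell (it only prints the types and names of the tail we're about to discard):\n\n```python\n# Optional: list the five layers we're about to cut off of the pretrained model.\nfor layer in pretrained.layers[-5:]:\n    print(f\"{layer.__class__.__name__:<20} {layer.name}\")\n```\n\n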
In their place, we'll build our own\n\"tail\" for the model by performing reshape, dropout, flatten, and softmax operations.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model = tf.keras.models.Sequential()\n\nmodel.add(tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE))\nmodel.add(tf.keras.Model(inputs=pretrained.inputs, outputs=pretrained.layers[-5].output))\n\nmodel.add(tf.keras.layers.Reshape((-1,)))\nmodel.add(tf.keras.layers.Dropout(0.1))\nmodel.add(tf.keras.layers.Flatten())\nmodel.add(tf.keras.layers.Dense(2, activation=\"softmax\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fine-Tuning Our Network\nWhen training neural networks, we must set a parameter called the **learning rate** that controls\nhow fast our network learns. It must be set carefully - too low, and our network will take\nforever to train; too high, and our network won't be able to learn some fine details. Generally\nfor Adam (the optimizer we're using), ``0.001`` is a pretty good learning rate (and is what's\nrecommended in the [original paper](https://arxiv.org/abs/1412.6980)). However, in this case\n``0.0005`` seems to work a little better.\n\nWe'll also pass the validation set from earlier to ``model.fit``. This will evaluate how good our\nmodel is each time we train it, and let us track how our model is improving. Once training is\nfinished, the model should have a validation accuracy around ``0.98`` (meaning it was right 98% of\nthe time on our validation set).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model.compile(\n    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),\n    loss=\"categorical_crossentropy\",\n    metrics=[\"accuracy\"],\n)\nmodel.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quantization\nWe've done a decent job of reducing our model's size so far - changing the input dimensions,\nalong with removing the last layers, reduced the model to just 219k parameters. However, each of\nthese parameters is a ``float32`` that takes four bytes, so our model will take up almost one MB!\n\nAdditionally, it might be the case that our hardware doesn't have built-in support for floating\npoint numbers. While most high-memory Arduinos (like the Nano 33 BLE) do have hardware support,\nsome others (like the Arduino Due) do not. On any boards *without* dedicated hardware support,\nfloating point multiplication will be extremely slow.\n\nTo address both issues, we will **quantize** the model - representing the weights as eight-bit\nintegers. It's more complex than just rounding, though - to get the best performance, TensorFlow\ntracks how each neuron in our model activates, so we can figure out how to most accurately simulate\nthe neuron's original activations with integer operations.\n\nWe will help TensorFlow do this by creating a representative dataset - a subset of the original\nthat is used for tracking how those neurons activate. We'll then pass this into a ``TFLiteConverter``\n(Keras itself does not have quantization support) with an ``Optimize`` flag to tell TFLite to perform\nthe conversion. 
By default, TFLite keeps the inputs and outputs of our model as floats, so we must\nexplicitly tell it to avoid this behavior.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def representative_dataset():\n    for image_batch, label_batch in full_dataset.take(10):\n        yield [image_batch]\n\n\nconverter = tf.lite.TFLiteConverter.from_keras_model(model)\nconverter.optimizations = [tf.lite.Optimize.DEFAULT]\nconverter.representative_dataset = representative_dataset\nconverter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\nconverter.inference_input_type = tf.uint8\nconverter.inference_output_type = tf.uint8\n\nquantized_model = converter.convert()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the Model if Desired\nWe've now got a finished model that you can use locally or in other tutorials (try autotuning\nthis model or viewing it on [https://netron.app/](https://netron.app/)). But before we do\nthose things, we'll have to write it to a file (``quantized.tflite``). If you're running this\ntutorial on Google Colab, you'll have to uncomment the last two lines to download the file\nafter writing it.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "QUANTIZED_MODEL_PATH = f\"{FOLDER}/models/quantized.tflite\"\nwith open(QUANTIZED_MODEL_PATH, \"wb\") as f:\n    f.write(quantized_model)\n# from google.colab import files\n# files.download(QUANTIZED_MODEL_PATH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compiling With TVM For Arduino\nTensorFlow has a built-in framework for deploying to microcontrollers - [TFLite Micro](https://www.tensorflow.org/lite/microcontrollers). However,\nit's poorly supported by development boards and does not support autotuning. We will use Apache\nTVM instead.\n\nTVM can be used either with its command line interface (``tvmc``) or with its Python interface. The\nPython interface is fully-featured and more stable, so we'll use it here.\n\nTVM is an optimizing compiler, and optimizations to our model are performed in stages via\n**intermediate representations**. The first of these is [Relay](https://arxiv.org/abs/1810.00952), a high-level intermediate\nrepresentation emphasizing portability. The conversion from ``.tflite`` to Relay is done without any\nknowledge of our \"end goal\" - the fact that we intend to run this model on an Arduino.\n\n### Choosing an Arduino Board\nNext, we'll have to decide exactly which Arduino board to use. The Arduino sketch that we\nultimately generate should be compatible with any board, but knowing which board we are using in\nadvance allows TVM to adjust its compilation strategy to get better performance.\n\nThere is one catch - we need enough **memory** (flash and RAM) to be able to run our model. We\nwon't ever be able to run a complex vision model like a MobileNet on an Arduino Uno - that board\nonly has 2 KB of RAM and 32 KB of flash! Our model has ~200,000 parameters, so there is just no\nway it could fit.\n\nFor this tutorial, we will use the Nano 33 BLE, which has 1 MB of flash memory and 256 KB of RAM.\nHowever, any other Arduino with those specs or better should also work.\n\n### Generating Our Project\nNext, we'll compile the model to TVM's MLF (model library format) intermediate representation,\nwhich consists of C/C++ code and is designed for autotuning. 
To improve performance, we'll tell\nTVM that we're compiling for the ``nrf52840`` microprocessor (the one the Nano 33 BLE uses). We'll\nalso tell it to use the C runtime (abbreviated ``crt``) and to use the ahead-of-time executor\n(abbreviated ``aot``, which helps reduce the model's memory footprint). Lastly, we will disable\nvectorization with ``\"tir.disable_vectorize\": True``, as C has no native vectorized types.\n\nOnce we have set these configuration parameters, we will call ``tvm.relay.build`` to compile our\nRelay model into the MLF intermediate representation. From here, we just need to call\n``tvm.micro.generate_project`` and pass in the Arduino template project to finish compilation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import shutil\nimport tflite\nimport tvm\n\n# Method to load model is different in TFLite 1 vs 2\ntry:  # TFLite 2.1 and above\n    tflite_model = tflite.Model.GetRootAsModel(quantized_model, 0)\nexcept AttributeError:  # Fall back to TFLite 1.14 method\n    tflite_model = tflite.Model.Model.GetRootAsModel(quantized_model, 0)\n\n# Convert to the Relay intermediate representation\nmod, params = tvm.relay.frontend.from_tflite(tflite_model)\n\n# Set configuration flags to improve performance\ntarget = tvm.target.target.micro(\"nrf52840\")\nruntime = tvm.relay.backend.Runtime(\"crt\")\nexecutor = tvm.relay.backend.Executor(\"aot\", {\"unpacked-api\": True})\n\n# Convert to the MLF intermediate representation\nwith tvm.transform.PassContext(opt_level=3, config={\"tir.disable_vectorize\": True}):\n    mod = tvm.relay.build(mod, target, runtime=runtime, executor=executor, params=params)\n\n# Generate an Arduino project from the MLF intermediate representation\nshutil.rmtree(f\"{FOLDER}/models/project\", ignore_errors=True)\narduino_project = tvm.micro.generate_project(\n    tvm.micro.get_microtvm_template_projects(\"arduino\"),\n    mod,\n    f\"{FOLDER}/models/project\",\n    {\n        \"arduino_board\": \"nano33ble\",\n        \"arduino_cli_cmd\": \"/content/bin/arduino-cli\",\n        \"project_type\": \"example_project\",\n    },\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing our Arduino Project\nConsider the following two 224x224 images from the author's camera roll - one of a car, one not.\nWe will test our Arduino project by loading both of these images and executing the compiled model\non them.\n\n\n\nCurrently, these are 224x224 PNG images we can download from Imgur. Before we can feed in these\nimages, we'll need to resize and convert them to raw data, which can be done with ``imagemagick``.\n\nIt's also challenging to load raw data onto an Arduino, as only C/C++ files (and similar) are\ncompiled. 
We can work around this by embedding our raw data in a hard-coded C array with the\n``bin2c`` utility, which will output a file like the one below:\n\n```c\nstatic const unsigned char CAR_IMAGE[] = {\n    0x22,0x23,0x14,0x22,\n    ...\n    0x07,0x0e,0x08,0x08\n};\n```\nWe can do both of these things with a few lines of Bash code:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%bash\nmkdir -p ~/tests\ncurl \"https://i.imgur.com/JBbEhxN.png\" -o ~/tests/car_224.png\nconvert ~/tests/car_224.png -resize 64 ~/tests/car_64.png\nstream ~/tests/car_64.png ~/tests/car.raw\nbin2c -c -st ~/tests/car.raw --name CAR_IMAGE > ~/models/project/car.c\n\ncurl \"https://i.imgur.com/wkh7Dx2.png\" -o ~/tests/catan_224.png\nconvert ~/tests/catan_224.png -resize 64 ~/tests/catan_64.png\nstream ~/tests/catan_64.png ~/tests/catan.raw\nbin2c -c -st ~/tests/catan.raw --name CATAN_IMAGE > ~/models/project/catan.c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Writing our Arduino Script\nWe now need a little bit of Arduino code to read the two binary arrays we just generated, run the\nmodel on them, and log the output to the serial monitor. This file will replace ``arduino_sketch.ino``\nas the main file of our sketch. The cell below writes it into place with the ``%%writefile`` magic; if\nyou're running locally, you can also copy the code in manually.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%writefile /root/models/project/project.ino\n#include \"src/model.h\"\n#include \"car.c\"\n#include \"catan.c\"\n\nvoid setup() {\n  Serial.begin(9600);\n  TVMInitialize();\n}\n\nvoid loop() {\n  uint8_t result_data[2];\n  Serial.println(\"Car results:\");\n  TVMExecute(const_cast<uint8_t*>(CAR_IMAGE), result_data);\n  Serial.print(result_data[0]); Serial.print(\", \");\n  Serial.print(result_data[1]); Serial.println();\n\n  Serial.println(\"Other object results:\");\n  TVMExecute(const_cast<uint8_t*>(CATAN_IMAGE), result_data);\n  Serial.print(result_data[0]); Serial.print(\", \");\n  Serial.print(result_data[1]); Serial.println();\n\n  delay(1000);\n}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compiling Our Code\nNow that our project has been generated, TVM's job is mostly done! We can still call\n``arduino_project.build()`` and ``arduino_project.upload()``, but these just use ``arduino-cli``'s\ncompile and flash commands underneath. We could also begin autotuning our model, but that's a\nsubject for a different tutorial. To finish up, we'll verify no compiler errors are thrown\nby our project:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "shutil.rmtree(f\"{FOLDER}/models/project/build\", ignore_errors=True)\narduino_project.build()\nprint(\"Compilation succeeded!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Uploading to Our Device\nThe very last step is uploading our sketch to an Arduino to make sure our code works properly.\nUnfortunately, we can't do that from Google Colab, so we'll have to download our sketch. 
This is\nsimple enough to do - we'll just turn our project into a ``.zip`` archive, and call ``files.download``.\nIf you're running on Google Colab, you'll have to uncomment the last two lines to download the file\nafter writing it.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ZIP_FOLDER = f\"{FOLDER}/models/project\"\nshutil.make_archive(ZIP_FOLDER, \"zip\", ZIP_FOLDER)\n# from google.colab import files\n# files.download(f\"{FOLDER}/models/project.zip\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From here, we'll need to open it in the Arduino IDE. You'll have to download the IDE as well as\nthe SDK for whichever board you are using. For certain boards like the Sony SPRESENSE, you may\nhave to change settings to control how much memory you want the board to use.\n\n### Expected Results\nIf all works as expected, you should see the following output on the serial monitor:\n\n```\nCar results:\n255, 0\nOther object results:\n0, 255\n```\nThe first number represents the model's confidence that the object **is** a car and ranges from\n0-255. The second number represents the model's confidence that the object **is not** a car and\nis also 0-255. These results mean the model is very sure that the first image is a car, and the\nsecond image is not (which is correct). Hence, our model is working!\n\n## Summary\nIn this tutorial, we used transfer learning to quickly train an image recognition model to\nidentify cars. We modified its input dimensions and last few layers to make it better at this,\nand to make it faster and smaller. We then quantized the model and compiled it using TVM to\ncreate an Arduino sketch. Lastly, we tested the model using two static images to prove it works\nas intended.\n\n### Next Steps\nFrom here, we could modify the model to read live images from the camera - we have another\nArduino tutorial for how to do that [on GitHub](https://github.com/guberti/tvm-arduino-demos/tree/master/examples/person_detection). Alternatively, we could also\n[use TVM's autotuning capabilities](https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_autotune.html) to dramatically improve the model's performance.\n\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 0 }