Cozmo, read to me
Do you know Cozmo? The friendly robot from Anki? Well…here he is…
Cozmo is a programmable robot that has many features…and one of those is a camera…so you can have Cozmo take a picture of something…and then do something with that picture…
To code for Cozmo you need to use Python…actually…Python 3 😉
For this blog, we’re going to need a couple of things…so let’s install them…
pip3 install 'cozmo[camera]'
This will install the Cozmo SDK…and you will need to install the Cozmo app in your phone as well…
If you have the SDK installed already, you may want to upgrade it because if you don’t have the latest version it might not work…
pip3 install --upgrade cozmo
Now, we need a couple of extra things…
pip3 install pygame
pip3 install pillow
pip3 install numpy
pygame is a games framework
pillow is a fork of the PIL library and it’s used to manage images.
numpy allows us to work with arrays and do fast numerical operations in Python (OpenCV images are numpy arrays).
That was the easy part…as now we need to install OpenCV…which allows us to manipulate images and video…
This one is a little bit tricky, so if you get stuck…search on Google or just drop me a message…
First, make sure that OpenCV is not installed by removing it…unless you are sure it’s working properly for you…
sudo apt-get remove 'libopencv*'
Then, install the following prerequisites…
sudo apt-get install build-essential cmake pkg-config yasm python-numpy
sudo apt-get install libjpeg-dev libjpeg8-dev libtiff5-dev libjasper-dev libpng12-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev libdc1394-22-dev
sudo apt-get install libxvidcore-dev libx264-dev libxine-dev libfaac-dev
sudo apt-get install libgtk-3-dev libtbb-dev libqt4-dev libmp3lame-dev
sudo apt-get install libatlas-base-dev gfortran
sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libxvidcore-dev x264 v4l-utils
If by any chance, something is not available on your system, simply remove it from the list and try again…unless you’re like me and want to spend hours trying to get everything…
Now, we need to download the OpenCV source code so we can build it…from the source…
wget -O opencv-3.4.0.zip https://github.com/opencv/opencv/archive/3.4.0.zip
unzip opencv-3.4.0.zip # This should produce the folder opencv-3.4.0
Then, we need to download the contributions because there are some things not bundled in OpenCV by default…and you might need them for any other project…
wget -O opencv_contrib-3.4.0.zip https://github.com/opencv/opencv_contrib/archive/3.4.0.zip
unzip opencv_contrib-3.4.0.zip # This should produce the folder opencv_contrib-3.4.0
As we have both folders, we can start compiling…
cd opencv-3.4.0
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D CMAKE_CXX_COMPILER=/usr/bin/g++ \
-D INSTALL_C_EXAMPLES=OFF \
-D OPENCV_EXTRA_MODULES_PATH=/YourPath/opencv_contrib-3.4.0/modules \
-D PYTHON_EXECUTABLE=/usr/bin/python3.6 \
-D WITH_FFMPEG=OFF \
-D BUILD_OPENCV_TS=OFF \
-D WITH_LIBV4L=OFF \
-D WITH_CUDA=OFF \
-D WITH_V4L=ON \
-D WITH_QT=ON \
-D WITH_LAPACK=OFF \
-D WITH_OPENCV_BIOINSPIRED=OFF \
-D WITH_XFEATURES2D=ON \
-D WITH_OPENCL=OFF \
-D WITH_FACE=ON \
-D ENABLE_PRECOMPILED_HEADERS=ON \
-D WITH_OPENCL_SVM=OFF \
-D WITH_OPENCLAMDFFT=OFF \
-D WITH_OPENCLAMDBLAS=OFF \
-D WITH_OPENCV_DNN=OFF \
-D BUILD_OPENCV_APPS=ON \
-D BUILD_EXAMPLES=OFF ..
Pay extra attention here…you need to pass the correct path to your opencv_contrib folder…so it’s better to pass the full path to avoid making errors…
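If you want to double check that path before starting a long build, here is a minimal sketch in Python. The helper name contrib_modules_flag is made up for this example and is not part of OpenCV or cmake; it only resolves the folder to a full path and confirms that a modules folder exists inside it.

```python
from pathlib import Path


def contrib_modules_flag(contrib_root):
    """Return the cmake flag for OPENCV_EXTRA_MODULES_PATH, or raise if the path looks wrong.

    Hypothetical helper for illustration only.
    """
    # Resolve to a full path so the flag does not depend on the current folder
    modules = Path(contrib_root).expanduser().resolve() / "modules"
    if not modules.is_dir():
        raise FileNotFoundError(f"No modules folder found at {modules}")
    return f"-DOPENCV_EXTRA_MODULES_PATH={modules}"
```

Calling contrib_modules_flag("opencv_contrib-3.4.0") from the folder where you unzipped everything should give you a flag you can paste into the cmake command…or an error before cmake even starts.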
And yes…that’s a pretty long command for a build…and it took me a long time to make it work…as you need to figure out all the parameters…
Once we’re done, we need to make it…as cmake will prepare the recipe…
make -j2
If there’s any mistake, simply do this…
make clean
make
Then, we can finally install OpenCV by doing this…
sudo make install
sudo ldconfig
To test that it’s working properly…simply do this…
python3
>>> import cv2
If you don’t have any errors…then we’re good to go -;)
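To check everything in one go, a small sketch like the following can help. It only uses the standard library, and the function name missing_modules is my own choice for this example. Keep in mind that it only looks for each module on the import path…it doesn’t import it, so a broken build could still fail later at import time.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level modules imported by the script in this post (pillow is imported as PIL)
    required = ["cozmo", "cv2", "numpy", "PIL", "pygame", "requests"]
    missing = missing_modules(required)
    print("Missing:", ", ".join(missing) if missing else "nothing, good to go")
```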
That was quite a lot of work…anyway…we need an extra tool to make sure our image gets nicely processed…
Download textcleaner and put it in the same folder as your Python script…
And…just in case you’re wondering…yes…we’re going to have Cozmo take a picture…we’re going to process it…use SAP Leonardo’s OCR API and then have Cozmo read it back to us…cool, huh?
SAP Leonardo’s OCR API is still on version 2Alpha1…but regardless of that…it works amazingly well -;)
Keep in mind, though, that the result is not always accurate…that’s because of the lighting, the position of the image, your handwriting and the fact that the OCR API is still in Alpha…
Ok…so first things first…we need a white board…
And yes…my handwriting is far from being good… -:(
Now, let’s jump into the source code…
import cozmo
from cozmo.util import degrees
import PIL
import cv2
import numpy as np
import os
import requests
import json
import re
import time
import pygame
import _thread


def input_thread(L):
    # Wait for Enter on the terminal, then signal the main loop through the shared list
    input()
    L.append(None)


def process_image(image_name):
    image = cv2.imread(image_name)
    # Scale, convert to grayscale, blur and denoise before thresholding
    img = cv2.resize(image, (600, 600))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(img, (5, 5), 0)
    denoise = cv2.fastNlMeansDenoising(blur)
    # Separate the white and black pixels
    thresh = cv2.adaptiveThreshold(denoise, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    blur1 = cv2.GaussianBlur(thresh, (5, 5), 0)
    dst = cv2.GaussianBlur(blur1, (5, 5), 0)
    cv2.imwrite('imggray.png', dst)
    # textcleaner reads imggray.png and writes out.png
    cmd = './textcleaner -g -e normalize -o 12 -t 5 -u imggray.png out.png'
    os.system(cmd)


def ocr():
    url = "https://sandbox.api.sap.com/ml/ocr/ocr"
    img_path = "out.png"
    headers = {
        'APIKey': "APIKey",  # replace with your own API Key
        'Accept': "application/json",
    }
    # Open the image in a with block so the file gets closed after the upload
    with open(img_path, 'rb') as image_file:
        files = {'files': image_file}
        response = requests.post(url, files=files, headers=headers)
    json_response = json.loads(response.text)
    json_text = json_response['predictions'][0]
    # Clean up characters that tend to be wrongly recognized
    json_text = re.sub('\n', ' ', json_text)
    json_text = re.sub('3', 'z', json_text)
    json_text = re.sub('0|O', 'o', json_text)
    return json_text


def cozmo_program(robot: cozmo.robot.Robot):
    robot.camera.color_image_enabled = False
    L = []
    _thread.start_new_thread(input_thread, (L,))
    robot.set_head_angle(degrees(20.0)).wait_for_completed()
    while True:
        if L:
            filename = "Message" + ".png"
            pic_filename = filename
            latest_image = robot.world.latest_image.raw_image
            latest_image.convert('L').save(pic_filename)
            robot.say_text("Picture taken!").wait_for_completed()
            process_image(filename)
            message = ocr()
            print(message)
            robot.say_text(message, use_cozmo_voice=True, duration_scalar=0.5).wait_for_completed()
            break
        # Short pause so the loop doesn't spin the CPU while waiting for Enter
        time.sleep(0.1)


pygame.init()
cozmo.run_program(cozmo_program, use_viewer=True, force_viewer_on_top=True)
Let’s analyze the code a little bit…
We’re going to use threads, as we need to have a window where we can see what Cozmo is looking at and another with Pygame where we can press “Enter” as the command to have Cozmo take a picture.
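The same wait-for-Enter idea can also be sketched with the standard threading module and an Event instead of a shared list. This is only an illustration, not the code the robot runs…the names wait_for_enter and run_until_enter are made up here, and read_line is a parameter so the sketch can be tried without a keyboard.

```python
import threading


def wait_for_enter(pressed, read_line=input):
    """Block until a line is read, then set the event."""
    read_line()
    pressed.set()


def run_until_enter(on_enter, read_line=input, poll_seconds=0.1):
    """Wait in the main thread until the worker reports Enter, then call on_enter()."""
    pressed = threading.Event()
    worker = threading.Thread(target=wait_for_enter, args=(pressed, read_line), daemon=True)
    worker.start()
    # Event.wait sleeps up to poll_seconds per check instead of spinning
    while not pressed.wait(poll_seconds):
        pass
    return on_enter()
```

In the Cozmo program, on_enter would be the part that saves the picture and calls the processing and OCR functions.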
Basically, when we run the application, Cozmo will move his head and get into picture mode…then, if we press “Enter” (On the terminal screen) it will take a picture and then send it to our OpenCV processing function.
This function will simply grab the image, scale it, make it grayscale, do a GaussianBlur to blur the image and remove the noise and reduce detail. Then we’re going to apply a denoising to get rid of dust and fireflies…apply a threshold to separate the white and black pixels, and apply a couple more blurs…
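To get a feeling for what the adaptive threshold step does, here is a simplified sketch in plain Python. It is not what OpenCV runs: the code in this post uses cv2.adaptiveThreshold with Gaussian weights, a block size of 11 and a constant of 2, while this version uses a plain mean over a small window, clips the window at the image border and works on a list of rows. The function name adaptive_threshold_mean is my own.

```python
def adaptive_threshold_mean(gray, block_size=3, c=2):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    Each pixel is compared with the mean of its neighborhood minus c,
    so the threshold follows the local lighting.
    """
    half = block_size // 2
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image border
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_threshold = sum(window) / len(window) - c
            row.append(255 if gray[y][x] > local_threshold else 0)
        result.append(row)
    return result


# A dark stroke (50) on a bright board (200)
board = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
print(adaptive_threshold_mean(board))
```

The stroke ends up black and the board ends up white. Because the comparison is local, a board that is brighter on one side than the other can still be separated, which is the reason for using an adaptive threshold instead of a single global value.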
Finally, we’re going to call textcleaner to further remove noise and make the image cleaner…
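The script calls textcleaner through os.system, which doesn’t tell you if the call failed. As a sketch of an alternative, the same command can be run with subprocess so that a missing script or a non-zero exit code raises an error. The flags are the ones used in this post…the function names are my own.

```python
import subprocess
from pathlib import Path


def textcleaner_command(src, dst, script="./textcleaner"):
    """Build the argument list for the textcleaner call used in this post."""
    return [script, "-g", "-e", "normalize", "-o", "12", "-t", "5", "-u", str(src), str(dst)]


def run_textcleaner(src, dst, script="./textcleaner"):
    """Run textcleaner and raise if the script is missing or exits with an error."""
    if not Path(script).is_file():
        raise FileNotFoundError(f"{script} not found")
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(textcleaner_command(src, dst, script), check=True)
```

Passing the arguments as a list also avoids any trouble with spaces in file names.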
So, here is the original picture taken by Cozmo…
This is the picture after our OpenCV post-processing…
And finally, this is our image after using textcleaner…
Finally, once we have the image the way we wanted, we can call the OCR API which is pretty straightforward…
To get the API Key, simply go to https://api.sap.com/api/ocr_api/overview and log in…
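Once you have the key, you may prefer not to paste it into the source code. Here is a minimal sketch that reads it from an environment variable and pulls the first prediction out of the response text. The variable name SAP_API_KEY and both function names are assumptions made for this example…the only thing taken from the code in this post is that the text comes back under 'predictions'.

```python
import json
import os


def ocr_headers(env_var="SAP_API_KEY"):
    """Build the request headers, reading the key from an environment variable.

    SAP_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"APIKey": api_key, "Accept": "application/json"}


def first_prediction(response_text):
    """Return the first entry of 'predictions', or an empty string if there is none."""
    data = json.loads(response_text)
    predictions = data.get("predictions") or []
    return predictions[0] if predictions else ""
```

With this, an empty or unexpected response gives Cozmo an empty string to read instead of crashing the program with a KeyError or IndexError.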
Once we have the response back from the API, we can do some Regular Expressions cleanup just to make sure some characters don’t get wrongly recognized…
Finally, we can have Cozmo read the message out loud -;) And just for demonstration purposes…
Here, I was lucky enough that the lighting and everything was perfectly set up…so it was a pretty clean response…further tests were pretty bad -:( But again…it’s important to have good lighting…
Of course…you want to see a video of the process in action, right? Well…funny enough…my first try was perfect! Even better than this one…but I didn’t shoot the video -:( Further tries were pretty crappy until I could get something acceptable…and this is what you’re going to watch now…the sun coming through the window didn’t help me…but it’s pretty good anyway…
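The cleanup can be tried on its own, without the robot or the API. This small sketch wraps the three substitutions from the ocr() function into a function (the name clean_ocr_text is made up here) and runs it on a made-up OCR result.

```python
import re


def clean_ocr_text(text):
    """Apply the same three substitutions used in ocr()."""
    text = re.sub("\n", " ", text)   # line breaks become spaces
    text = re.sub("3", "z", text)    # every 3 becomes z
    text = re.sub("0|O", "o", text)  # every zero and capital O becomes lowercase o
    return text


print(clean_ocr_text("Hell0\nCo3mo"))  # prints: Hello Cozmo
```

Notice that the replacements are applied everywhere, so a message that really contains a 3 or a 0 will be changed too…that’s fine for a whiteboard with words only, but not for numbers.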
Hope you liked this blog -:)
Very nice. Thank you
Glad you like it 🙂
Blag.
Thanks Alvaro Tejada Galindo for the great blog. I really like this kind of blog where we mix different kinds of things and come up with something cool like this. I remember using my phone camera and OCR API for the invoice processing example. The concept of APIs has actually made the integration of different things so easy.
Thanks
Nabheet
Thanks Nabheet 🙂 That's what I love to do 😉 Grab this from here and there and see what happens 😛
APIs are pretty easy and convenient to use, that's for sure 😀
Greetings,
Blag.
Wow!!! Something so very fun. And so many different things we could do with this. Sadly not just fun things, but work related things as Nabheet Madan said - APIs are something we use a lot. Everything must be assimilated to work as one mind. OK not quite that bad - but some days...
Thank you for the great blog - so nice to see you back!
Michelle
Thanks Michelle! I told you I was coming back 😉 Yes, API's are the way to go these days...
Greetings,
Blag.
Hello, I have trouble installing libxine-dev, it should now be libxine2-dev on Ubuntu 18.04 LTS.
I have also this problem
should be continuous: