Visuomotor robot arm coordination: Step 1 – Stereo vision object recognition

OK, so I want to build a system that is able to pick up an object it has seen. I called this project visuomotor coordination because I want to use the robot's webcams to estimate the movement the robot has to make to reach the object it sees. If you read this Wikipedia article, you'll get what I mean.

So the first step is to detect the object. I assume the system knows what object it has to pick up, i.e. that it has the reference data it needs to find that object in the scene. As a vision system, I'll use the two webcams mounted on my robot arm. The point of using both of them is to benefit from stereo vision and extract some depth information, namely how far the object is from the robot arm gripper.

Object detection – Algorithms

There are numerous ways to do object detection… there are a lot of detection/recognition algorithms out there! You can have a look at that OpenCV tutorial to get a better idea of what they are and what they offer.

I ran some tests, read some documentation and finally chose the SURF algorithm, because it shows pretty good detection performance and because it can be made faster by running it on the GPU instead of the CPU. On the cons side, one should note that SURF is patented: it is free for research and personal use, but commercial use requires a license (which is why its OpenCV implementation lives in the non-free module). So no worries for my application, but it's worth keeping in mind…

Stereo object detection: my idea

My idea was the following: use the SURF algorithm, starting from the tutorial sample demonstrating feature matching. This sample shows how to detect an object by matching it against a reference image of that same object. It gives me a way to detect the object in the scene using one of the two webcams (say the left one). Then my idea is to use that same algorithm to find the corresponding features in the second webcam image.
This way, I obtain two point clouds (one per webcam) that map to the object I want to detect. By calculating the mean of each point cloud, I get the center of the object as seen in each image. So, in the end, I have the object position in both images.
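
The averaging step is nothing fancy: the object center on one image is simply the centroid of the matched points. Here is a minimal sketch of that step (the class name is mine, and the list of PointF is assumed to come out of the feature matching step):

```csharp
using System.Collections.Generic;
using System.Drawing;

static class PointCloudTools
{
    // The object center on one image is the centroid of the matched feature points
    public static PointF Centroid(IList<PointF> cloud)
    {
        float sumX = 0f, sumY = 0f;
        foreach (PointF p in cloud)
        {
            sumX += p.X;
            sumY += p.Y;
        }
        return new PointF(sumX / cloud.Count, sumY / cloud.Count);
    }
}
```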

Let's summarize the stereo object detection algorithm I'll program:

  1. The inputs:
    1. Two webcam images (left webcam, right webcam)
    2. A reference image of the object to be found
  2. The function block:
    1. Extract SURF features from the reference image
    2. Extract SURF features from the left webcam image
    3. Match the features to find where the object is in the left webcam image
    4. Extract SURF features from the right webcam image
    5. Match the features found at step 3 to find where the object is in the right image
  3. The outputs:
    1. Point cloud of the object in the left image
    2. Point cloud of the object in the right image

After spending quite some time understanding the EmguCV functions and implementing a few of my own, I finally came up with a stable algorithm that detects the object on both webcams and outputs the two point clouds.
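
To give an idea of what the core looks like, here is a minimal sketch of the detect-and-match building block (not my exact code, and assuming EmguCV 3.x, where SURF lives in Emgu.CV.XFeatures2D; names may differ in other versions). It matches the reference image against one webcam image; in my application the same mechanism is reused with a mask on the reference image, and the matching is repeated to locate the object in the right image as well:

```csharp
using System.Collections.Generic;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Features2D;
using Emgu.CV.Structure;
using Emgu.CV.Util;
using Emgu.CV.XFeatures2D;

static class SurfObjectDetector
{
    // Returns the points of the webcam image that match the reference image features
    public static List<PointF> MatchObject(Mat reference, Mat webcamImage)
    {
        var surf = new SURF(400);   // Hessian threshold, to be tuned

        // Extract SURF keypoints and descriptors from both images
        var refKeyPoints = new VectorOfKeyPoint();
        var refDescriptors = new Mat();
        surf.DetectAndCompute(reference, null, refKeyPoints, refDescriptors, false);

        var obsKeyPoints = new VectorOfKeyPoint();
        var obsDescriptors = new Mat();
        surf.DetectAndCompute(webcamImage, null, obsKeyPoints, obsDescriptors, false);

        // Brute force matching with a ratio test to discard ambiguous matches
        var matcher = new BFMatcher(DistanceType.L2);
        matcher.Add(refDescriptors);
        var matches = new VectorOfVectorOfDMatch();
        matcher.KnnMatch(obsDescriptors, matches, 2, null);

        var cloud = new List<PointF>();
        MKeyPoint[] keyPoints = obsKeyPoints.ToArray();
        for (int i = 0; i < matches.Size; i++)
        {
            MDMatch[] pair = matches[i].ToArray();
            if (pair.Length == 2 && pair[0].Distance < 0.75f * pair[1].Distance)
                cloud.Add(keyPoints[pair[0].QueryIdx].Point);
        }
        return cloud;
    }
}
```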

The results:

To develop this idea, I created a Visual Studio project in which I experimented and put together all the blocks I needed to reach my goal. I ended up with an application that shows the two webcam images, with little circles and lines that clearly show what has been found.

Here is a screenshot of the application detecting the object.

Stereo vision feature extraction application. Red points are instantaneous detected points, the blue ones are the average points found from the instantaneous points.

What you can see here is an image formed by concatenating the left and right webcam views. The object is a small wooden cube on which I drew lines to make it easier to detect. The blue points are the averaged positions on each view, calculated from the instantaneous detected points. The red ones are the instantaneous detected features that are found in the reference image as well as in the left and right webcam images.

The buttons are:

  • Take snapshot: takes a snapshot of the object to be used as the reference image. I also manually create a mask to tell the algorithm what exactly my object is in the scene.
  • Reset object position: resets the average position so it restarts from the newly detected points
  • Remember object position: stores the object position as well as the robot arm position at which the object is being seen
  • Remember arm target position: stores in a MongoDB database the object position (X and Y for each image), the robot arm position at which the object has been detected, and the robot arm position when the gripper is on the object. For this step, once the object has been detected, I manually move the arm onto the object, ready to grip it. In a sense, I demonstrate what it should do.
  • Pause/resume object detection: as its name says, pauses or resumes the object detection loop

Using this application and process, I obtained a database composed of many records like this one:

This is a view of one record created by the application

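To give an idea of what such a record might contain, here is a sketch of how one could be written with the MongoDB C# driver (2.x assumed here). The field names and database/collection names are purely illustrative, not my exact schema:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

static class RecordWriter
{
    public static void SaveRecord(double leftX, double leftY, double rightX, double rightY,
                                  double[] armStartPosition, double[] armTargetPosition)
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var database = client.GetDatabase("visuomotor");
        var records = database.GetCollection<BsonDocument>("records");

        // One record: where the object was seen on each image, where the arm was when it saw it,
        // and where the arm ends up when the gripper is on the object
        var record = new BsonDocument
        {
            { "object_left",  new BsonDocument { { "x", leftX },  { "y", leftY } } },
            { "object_right", new BsonDocument { { "x", rightX }, { "y", rightY } } },
            { "arm_start_position",  new BsonArray(armStartPosition) },
            { "arm_target_position", new BsonArray(armTargetPosition) }
        };
        records.InsertOne(record);
    }
}
```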

Now that I have that database, I can start training a model that will compute the arm's final position from its starting position and the detected object position! This is where it gets really interesting, as I'll see what the model is able to do: reach the object or not?

In a nutshell

The idea is to make my robot arm able to reach an object it has seen with its webcams. I made an application that extracts features from a reference image and detects these same features in the left and right webcam images, which gives me the position of the object in each image. Then I built a database containing the object position in the images, the robot arm position when the object was detected, and the arm position when the gripper is on the object.

My next step will be to create a model that computes the arm position that puts the gripper on the object, using the current robot arm position and the detected object position in the webcam images.

This will be the topic of my next post, so stay tuned!


Some news on my DeMIMOI library

First of all, I wish everyone coming to this blog a happy new year! May this year be healthy and make your projects come true!

It's been a while now since I published the code of my DeMIMOI core library. I've had some time to make some improvements and add some more features, to make it even more attractive to me, and possibly to anyone willing to give it a try too.

I was right to hope that this platform would help me easily build systems and compose bigger ones by connecting smaller ones. I successfully managed to use the library with neural networks (special thanks to Accord.Net and AForge.Net). For example, I made DeMIMOI models of a NARX (Nonlinear AutoRegressive eXogenous) neural network and of an ARX (AutoRegressive eXogenous) regression model.
These models can mimic almost any dynamic system, mostly because they have a backward link that makes their outputs at time t-1 one of their inputs.
Using the DeMIMOI library, this backward link is easy to model and to code, since it's just a connection between an output and an input which, in code, translates to something like myModel.Outputs[0][0].ConnectTo(myModel.Inputs[0][0]). The data flow is then automatically managed by the DeMIMOI core.
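
To fix ideas, a plain ARX model is just a linear difference equation of the form

y(t) = a1·y(t-1) + … + an·y(t-n) + b1·u(t-1) + … + bm·u(t-m) + e(t)

where u is the model input, y its output and e the modeling error. The NARX version replaces this weighted sum with a neural network, which is where Accord.Net and AForge.Net come into play.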

I also started to wrap a database manager in a DeMIMOI model; it can currently connect to a Mongo database. So I can save and retrieve data to build models, or use it as some kind of memory… Well, it's still in development, but the main features are already up! I mean, at least reading data from a pre-existing database. One step at a time, right?

To give you a better idea of what I’m talking about and what I’m doing, I’ll show you a picture of the system I’m currently working on.

First of all, let me just explain the background.

While coding the DeMIMOI ARX and NARX models, I wanted to build models that could learn the behavior of the servomotors of my AL5C robotic arm.
In a previous attempt last year, I had the arm moving freely and randomly while an application recorded, at each time step, the angle values from the potentiometers of each servo (thank you so much Phidgets!).

The results were stored in a database that I can read using my DeMIMOI memory manager. This data can then be sent to the ARX and NARX models, and also used by the learner models to fit them to the servo behavior.

For this purpose, I coded the system described by the following diagram:

DeMIMOI NARX and ARX learning the Lynxmotion AL5C servo

By the way, this image was created from the GraphViz code that is automatically generated by the DeMIMOI collection holding all the blocks of the system. That's a feature I'm quite proud of, since it allows me to quickly check that what I coded is what I expected the system to be!

On the left of the diagram is the memory manager that reads the servo values from a Mongo database. The data it produces is fed directly to the ARX model and its teacher, which don't need any extra data processing, as opposed to the NARX model.

Indeed, the NARX model is built around a neural network that needs normalized data. That's why the data coming from the memory block is also fed to a normalizer block that converts the raw values to the [-1, 1] range before they are sent to the NARX model and its teacher. The data coming out of the NARX model is then denormalized to bring it back to its original data space.
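
The normalizer itself is nothing more than a linear min-max mapping. Here is a minimal sketch of the idea (not the actual DeMIMOI block, which also handles the data flow):

```csharp
static class Normalizer
{
    // Linearly maps a raw value from [min, max] to [-1, 1]
    public static double Normalize(double value, double min, double max)
    {
        return 2.0 * (value - min) / (max - min) - 1.0;
    }

    // Inverse mapping, from [-1, 1] back to the original [min, max] range
    public static double Denormalize(double normalized, double min, double max)
    {
        return (normalized + 1.0) / 2.0 * (max - min) + min;
    }
}
```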

On the ARX and NARX models, you can clearly see the backward link that connects the output to the input. This link is what makes the network seamlessly recurrent. And again, in terms of coding, it's nothing harder than creating that connection!

You may also have noticed the three probes that display some data values. They let me quickly compare the model outputs (simulated) to the real output (measured on the real system). On that run, the ARX did better than the NARX, but I haven't pushed the analysis further to explain this yet…

My next work will focus on analyzing the results more deeply, maybe working on so-called BPTT training (backpropagation through time, ouch! it hurts!), or maybe even trying to build some kind of automated learning shell that would be able to test multiple models and parameters and then select the best one…
I know! It seems like there's going to be a huuuuge mountain to climb… I fight this feeling by telling myself that I've already climbed quite a big part of it, so it would be even worse to stop now!

I'll let you know about my progress in a while. And do not hesitate to say hi or comment on this! I'd be curious to know how it all looks to someone from the outside!

Lynxmotion SSC32/AL5x Robotic Arm Library

It's been a long time since I last posted anything on my blog… I've been quite busy during the past few months!

I haven't done any major development so far, and I decided to slowly get back to my experiments by publishing some of my code. I've been thinking for a while that it would be cool to publish my code for the Lynxmotion AL5x and its SSC32 board.
I was quite surprised to see that there was no existing library for these devices.
I made some improvements to the code I wrote some time ago, mostly to support more SSC32 features and to have better, cleaner and commented code.

So there you go guys, the Lynxmotion library is on my GitHub account: https://github.com/remyzerems/Lynxmotion

As you can see from the GitHub page, the main features are servo driving, input/output access, SSC32 enumeration and, of course, AL5x joint driving.
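
For those who have never played with the SSC32, the library basically wraps its plain-text serial protocol. Here is a minimal sketch of what talking to the board looks like without the library (the COM port and baud rate are just examples, they depend on your setup and the board jumpers):

```csharp
using System.IO.Ports;

static class Ssc32Demo
{
    public static void Main()
    {
        // The SSC32 is driven through a plain serial link
        using (var port = new SerialPort("COM3", 115200))
        {
            port.NewLine = "\r";   // SSC32 commands are terminated by a carriage return
            port.Open();

            // Command format: "#<channel> P<pulse width in us> T<time in ms>"
            // Move the servo on channel 0 to its center position (1500 us) in 1 second
            port.WriteLine("#0 P1500 T1000");
        }
    }
}
```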

Hope this will be useful to somebody in some way!

Instructions on how to modify the webcams

So, the previous post was about adding stereo vision to my AL5C robot.

Now let's do it! I wrote an Instructable for easier reading. That's also a more common way of documenting this kind of thing!

So, follow this link to access the Instructable: http://www.instructables.com/id/Giving-sight-to-your-Lynxmotion-AL5C/

Finally, here is my AL5C with stereo vision enabled:

Modified Lynxmotion AL5C

Feel free to share and comment (whether here or on the Instructable!)

Adding Stereo Imaging to the Lynxmotion AL5C

My goal now is to add stereo imaging to my robot arm. To explain this, I'll split the work into parts corresponding to each step and make a post out of each of them, to ease comprehension and navigation on the blog…

But hey!! What's « stereo imaging »?? It's a technique to infer 3D data from two images of the same scene taken from two different known viewpoints. It produces a grayscale 2D image like this famous one:

Stereo Imaging - Disparity map

On top, you've got the left and right images, taken from two close viewpoints in a room.
On the bottom, the result: on the left, the noisy raw output of the algorithm; on the right, the filtered result. (Image credits: http://afshin.sepehri.info/projects/ImageProcessing/ColorCalibration/color_calibration.htm)

White stands for closer objects: the darker a pixel is, the further it is from the camera.
As you can see, it's a powerful technique that gives access to the third dimension, the depth of a scene. We can imagine combining the produced data with object recognition algorithms, using it to help a robot move through space while avoiding obstacles, or maybe even feeding it to SLAM algorithms (SLAM stands for Simultaneous Localization and Mapping).
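
The relation behind those grayscale values is simple: the disparity (how far a point shifts between the left and right images) is inversely proportional to its depth. A minimal sketch, assuming a rectified camera pair with known focal length and baseline:

```csharp
static class StereoDepth
{
    // Depth from disparity for a rectified stereo pair: Z = f * B / d
    // f: focal length in pixels, B: baseline (distance between the cameras, in meters),
    // d: disparity in pixels. Larger disparity = closer object = brighter pixel.
    public static double DepthFromDisparity(double focalLengthPixels, double baselineMeters, double disparityPixels)
    {
        if (disparityPixels <= 0.0)
            return double.PositiveInfinity;   // no measurable disparity: point at infinity (or invalid match)
        return focalLengthPixels * baselineMeters / disparityPixels;
    }
}
```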

So, I started searching for a good camera. It has to be small enough to fit two of them on the robot arm, provide good quality images, have an autofocus feature… and also be affordable! That's a lot of criteria, isn't it?

After spending some time searching, I finally found the Logitech C525, which has it all!
I also searched the web for information on what these cameras have inside, to see what the PCB and the sensor look like and get an idea of how to integrate the cameras on the robot arm (for example, locating the mounting holes, the sensor position…).
One video helped me a lot to see what's inside the webcam, it's here: www.youtube.com/watch?v=mBwH2hqGkck (thank you Peter!)
Oh, and a little detail about the webcam: it has an internal microphone, so later I could play with sound too, in stereo!

Logitech C525

So, the Logitech C525 is an HD camera featuring 720p resolution and autofocus, and it's quite affordable at around 40€ each… For full specifications, see the official Logitech website.

Finally, I bought two of these webcams. The next post will show you how to hack them so they can be installed on the robot arm! Interesting part, huh? See you next time!

Some news and 2014 project updates

It's been a while since I last published anything on my blog… I'm sorry for that! Poor robot arm, it's now covered in dust, having waited for me for ages… 😉
I've been quite busy lately, working on another, more web-oriented project that has no link with artificial intelligence!

So, I decided to get back to my AI experiments. Last time, I got stuck working with the camera I had installed on the Lynxmotion. The webcam I used was seriously old and the image was kind of reddish and blurry… It was also really slow and had no autofocus!! So there was no point in going any further…

I had already worked on projects involving cameras, and by using some libraries I discovered how powerful they are. I saw some functions I'd be curious to try, and the one that excited me the most was stereo imaging! I had already thought about this possibility before, and even tried to program my own implementation of the algorithm; it didn't really succeed, even though it was starting to work in some way, and it was quite slow…
There are some other functions that may be interesting, like the POSIT algorithm, SURF and so on… Plenty of stuff to investigate!

I'll make a dedicated post to explain in detail the steps I went through while exploring stereo imaging.

Apart from that, I also have some things to publish on this blog, such as my work on what follows the servo tweaking: getting the position of each joint of the robot. This is a good step toward experimenting with AI, since you then have data to feed the models with.

I also have in mind to buy myself a better computer; I'm starting to feel limited by my current setup. I dream of a good « gamer » computer with an Intel Core i-something, a beefy NVidia graphics card and 4 to 8 GB of RAM… Mmmh, I have to think about it and maybe save some money, since it's going to be expensive!!

So, the next post will be about how I chose the webcam, and about installing the webcams on the robot arm.

The project / Le projet

My goal for this project is to explore AI techniques through various experiments. AI is really fascinating, so in my spare time I like to try things on this theme…
To do so, I bought the following items:

I first bought the AL5C robot and ran some experiments with neural networks, but I quickly realized I needed some feedback information from the arm (servo positions, video…).

That's what the Phidgets hardware is for. I'll make a specific post on how to modify the servos in order to get the position feedback and wire it to the Phidget board.

For the programming side, I decided to use tools I'm familiar with:

  • Microsoft Windows XP (well, not the best choice ever, but it should stay compatible with newer versions)
  • Microsoft Visual Studio 2010
  • AForge.Net and Accord.Net, which are really good C# libraries for working with AI
  • A virtual machine to hold the whole configuration in a kind of sandbox (isolated from my everyday computer)

I started with a nice but somewhat mathematical book: Apprentissage statistique by Dreyfus. For French-speaking people, there's also a good PDF to read online here: http://www.math.univ-toulouse.fr/~besse/pub/Appren_stat.pdf.

The advantage of this reading is that it gave me the basics on topics like neural networks and data preprocessing.

So here is the current state of my customized robot arm:

The robot arm

To better understand what the system is, here is the block diagram of my current configuration:

Block diagram of the robot arm