Visuomotor robot arm coordination : Step 2 – Build the model

In the previous step, I collected data using my robot arm and the webcams I installed on it. The data contains the detected object coordinates on the left and right images, the robot arm joint angles at which the object was detected, and the same variables recorded when the arm is placed manually on the object.

So, for each detected object (i.e. each database record), I recorded the following values :

  • Position of object on the left image (xL, yL)
  • Position of object on the right image (xR, yR)
  • Arm joint angles where the object has been seen : (alphaI, betaI, thetaI, gammaI)
  • Arm joint angles at the object location : (alphaT, betaT, thetaT, gammaT)

Now the goal is to build a model that predicts the arm joint angles which will place the arm at the location of the object. To make this prediction, the model inputs will be the object position on each image and the current arm joint angles, as sketched below.
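
To make this concrete, here is a minimal sketch of what one record looks like and how it is split into model inputs and desired outputs. The class and method names are mine, chosen for illustration, not the actual database schema or DeMIMOI code :

```csharp
// Hypothetical record structure mirroring one database entry (names are illustrative)
public class ObjectRecord
{
    public double xL, yL;                        // object position on the left image
    public double xR, yR;                        // object position on the right image
    public double alphaI, betaI, thetaI, gammaI; // joint angles when the object was seen
    public double alphaT, betaT, thetaT, gammaT; // joint angles when the arm is on the object
}

public static class ModelData
{
    // Model inputs : image positions + current joint angles (8 values)
    public static double[] ToInputs(ObjectRecord r)
    {
        return new[] { r.xL, r.yL, r.xR, r.yR, r.alphaI, r.betaI, r.thetaI, r.gammaI };
    }

    // Desired outputs : joint angles that place the arm on the object (4 values)
    public static double[] ToOutputs(ObjectRecord r)
    {
        return new[] { r.alphaT, r.betaT, r.thetaT, r.gammaT };
    }
}
```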

I started by analyzing the data I gathered to determine its properties. I quickly found that the relationship is not linear, so I'll need a non-linear model to map the data. I'm used to neural networks, which are powerful non-linear models, so I decided to implement a framework that will help the model learn the data.

The learning process

To ease the learning process using the DeMIMOI models, I decided to proceed in 4 steps :

  1. Prepare and preprocess the data to be learnt
    The idea is to build a specific dataset out of the raw data, for example when I have to apply a mathematical operation to the raw data, say a subtraction or even a sine.
    The resulting data is stored in a temporary storage (the PC RAM in my case).
  2. Teach the model
    The learning algorithm uses the temporary storage to feed the model with the data to learn.
  3. Plot the results of the obtained model
    At this step, you want to see what the model learnt, so you just plot the output of the model and the desired output together to see the error.
  4. Save the model
    Once the model fulfills the needs, it can be saved for later use.

Learning process implementation

For the first three steps, I created three DeMIMOI_Collection objects, each holding the elements required for its step.
Now I’m going to show you each of them and describe what’s going on.

  1. Data preparation and preprocessing
    Visuo Motor Coordination - Data Preprocessing
    This step requires raw data. The data comes from the Long Term Memory block, which hides a MongoDB database. The raw data flowing out of this block is then directly fed to the STM (Short Term Memory) block. In this particular case, I could have sent the raw data directly to the model, but I needed to make some calculations beforehand, which were then removed…
  2. Teaching the model
    Visuo Motor Coordination - Learning Step
    This step is the learning step. The data created at the previous step is fed to the neural network learning algorithm. Before sending the data to this block, we must do some data scaling, since neural networks need data normalized to [0, 1] or [-1, 1] depending on the activation function (see the sketch after this list). This is why I put two normalization blocks, one on the inputs and the other one on the outputs.
    In this picture, we clearly see the inputs and the desired outputs we want the model to learn.
  3. Plotting the results
    Visuo Motor Coordination - Data Plotting
    Once the model is built and ready to operate, I want to see the results of the learning step. I reuse the STM data as input and the data normalizer to ensure data normalization; the neural network then receives normalized data, which is in turn denormalized back to output units. Finally, this data is plotted by the DeMIMOI_Chart0, which is mapped to the Windows Forms Chart control I placed on the form.
    This results in the following interface :
    Visuo Motor Coordination - Application Learning Step
    We can see that for each graph, the neural network outputs are really close to the desired outputs, which means that it converged to a « good » solution.
    The left textbox contains the Graphviz code used to draw the system structure I have shown throughout this article. The right one contains the Graphviz code of the neural network structure.
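
As mentioned in the learning step, the data must be normalized before it reaches the neural network. Here is a minimal sketch of the kind of min/max scaling I mean; it is an illustration only, not the actual DeMIMOI normalization block :

```csharp
using System;
using System.Linq;

// Simple min/max scaler to [-1, 1] and back (illustration only)
public class MinMaxScaler
{
    private readonly double min, max;

    public MinMaxScaler(double[] samples)
    {
        min = samples.Min();
        max = samples.Max();
    }

    // Map a raw value to [-1, 1] before feeding it to the neural network
    public double Normalize(double x) => 2.0 * (x - min) / (max - min) - 1.0;

    // Map a normalized network output back to the original units
    public double Denormalize(double xn) => (xn + 1.0) / 2.0 * (max - min) + min;
}
```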

Now, it seems that we managed to get a model that is able to predict the arm joint angles that place the end effector on the object that has been seen. So now comes the exciting part : let’s build the application that will demonstrate this ability in real time ! That will be the subject of my next post !

Visuomotor robot arm coordination : Step 1 – Stereovision object recognition

Ok, so I want to build a system that is able to pick up an object it has seen. I called that project Visuomotor coordination because I want to use the robot’s webcams to estimate the movement the robot has to make to reach the object it has seen. If you read this Wikipedia article, you’ll get what I mean.

So the first step is to detect the object. I assume that the system knows what the object to pick up is, i.e. that it has the required data on this object so that it will be able to find it in the scene. As a vision system, I’ll use the two webcams I have on my robot arm. The idea behind using both of them is to benefit from stereo vision to extract some depth information and determine how far the object is from the robot arm gripper.

Object detection – Algorithms

There are numerous ways to do object detection… There are a lot of detection/recognition algorithms out there ! You can take a look at that OpenCV tutorial to get a better idea of what they are and what their features are.

I made some tests, read some documents and finally chose the SURF algorithm because it shows pretty good detection performance and also because it’s possible to make it faster by running it on the GPU instead of the CPU. On the cons side, one should note that the algorithm is patented and not free for commercial use, only for research purposes and personal use. So no worries for my application, but I should keep that in mind though…

Stereo object detection : my idea

My idea was the following : use the SURF algorithm, starting from the tutorial sample demonstrating feature matching. This sample shows how to detect an object by matching it with a reference image of that same object. This gives me a way to detect the object in the scene using one of the two webcams (say the left one). Then my idea is to use that same algorithm to find the corresponding features in the second webcam image.
This way, I’ll obtain two point clouds (one for each webcam) that are mapped to the object I want to detect. Then, by calculating the mean of each point cloud, I’ll get the center of the object as seen in each image. So, in the end, I have the object position in both images.
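
Computing the object center on each image is then just the mean of the matched feature points. A minimal sketch, using a plain point structure rather than the EmguCV types :

```csharp
using System.Collections.Generic;
using System.Drawing;   // PointF

public static class PointCloudTools
{
    // Center of a point cloud = average of the matched feature coordinates
    public static PointF Centroid(IList<PointF> cloud)
    {
        float sumX = 0, sumY = 0;
        foreach (PointF p in cloud)
        {
            sumX += p.X;
            sumY += p.Y;
        }
        return new PointF(sumX / cloud.Count, sumY / cloud.Count);
    }
}
```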

Let’s summarize the stereo object detection algorithm I’ll program :

  1. The inputs :
    1. Two webcam images (left webcam, right webcam)
    2. A reference image of the object to be found
  2. The function block :
    1. Extract SURF features from the reference image
    2. Extract SURF features from the left webcam image
    3. Match the features to find where the object is on the left webcam image
    4. Extract SURF features from the right webcam image
    5. Match the features found at step 3 to find where the object is on the right image
  3. The outputs :
    1. Point cloud of the object on the left image
    2. Point cloud of the object on the right image

After spending some hard time understanding the EmguCV functions and implementing some of my own, I finally arrived at a stable algorithm that detects the object on both webcams and outputs the point clouds.

The results :

To develop this idea, I created a Visual Studio project in which I experimented and put together all the blocks I needed to reach my goal. I ended up with an application that shows the two webcam images with little circles and lines that clearly show what has been found.

Here is a screenshot of the application detecting the object.

Stereo vision feature extraction application. Red points are instantaneous detected points, the blue ones are the average points found from the instantaneous points.

What you can see here is an image formed by the concatenation of the left and right webcam images. The object is a small wooden cube on which I drew lines to make it easier to detect. The blue points are the averaged points in each view, calculated from the instantaneous detected points. The red ones are the instantaneously detected features that belong to the reference image as well as to the left and right webcam images.

The buttons are :

  • Take snapshot : takes a snapshot of the object to get the reference image. I also manually create a mask to tell the algorithm exactly what my object is in the scene.
  • Reset object position : this button resets the average position to restart from the newly detected points
  • Remember object position : this stores the object position as well as the robot arm position at which the object is being seen
  • Remember arm target position : this stores, in a MongoDB database, the object position (X and Y for each image), the robot arm position at which the object has been detected and the robot arm position when it’s on the object. For this step, once the object has been detected, I manually move the arm onto the object, ready to grip it. In a sense, I demonstrate to it what it should do.
  • Pause/resume object detection : as its name says, it pauses or resumes the object detection loop

Using this application and process, I obtained a database that is composed of many records like this one :

This is a view of one record created by the application

Now that I have that database, I can start to train a model that will calculate the arm’s final position from the starting position and the detected object position ! Now, this is really going to be interesting, as I’ll see what the model is able to do ! Reach the object or not ?

In a nutshell

The idea is to make my robot arm able to reach an object it has seen with its webcams. I made an application that extracts features from a reference image and detects these same features in the left and right webcam images. This allows me to get the position of the object in each image. Then I built a database containing the object position in the images, the robot arm position when the object was detected and the arm position when it’s on the object.

My next step will be to create a model that calculates the arm position that will put the arm on the object, using the current robot arm position and the detected object position in the webcam images.

This will be the object of my next post, so stay tuned !

Exploration status, things done and to come

Time has passed since my last post… I explored many subjects that are really fascinating to me !

Prolog, Golog and FSA planner project

For example, I integrated the SWI-Prolog engine in DeMIMOI models so that I can now run some Prolog scripts inside my models. This could be interesting for implementing logic-based reasoning.
And on top of this, I discovered the Golog framework, from which the FSAPlanner is derived. This higher-level Prolog extension allows for even crazier reasoning. This is called « planning with loops » in the literature.
I tried using the planner on my robot arm, but with very limited success… It’s not easy to get familiar with the language features and syntax, and even more so for me, since I don’t have much experience with Prolog.

Steering Behaviors

I also made a DeMIMOI model that integrates Steering Behaviors from Craig Reynolds. This is a very interesting concept that allows an entity to move through its environment based on strategies such as seeking, fleeing, wandering, arriving, etc. A set of parameters such as speed, weight and steering force allows customizing the resulting behavior.
While programming my own implementation, I was largely inspired by this source code from Rahul Sindhu, though I separated the graphics from the math underlying the steering behaviors and also generalized the math from 2D vectors to any vector dimension. The latter is a key point because I may need a behavior in 3D space, or even in 5D, for example on my robot arm with one dimension per arm joint.
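
To give an idea of what generalizing to any vector dimension means, here is a minimal sketch of a seek behavior working on plain double arrays. This is my own illustration, not Rahul Sindhu’s code nor the exact DeMIMOI model :

```csharp
using System;

// Minimal N-dimensional "seek" steering behavior (illustration only)
public class SeekBehavior
{
    public double MaxSpeed = 1.0;    // speed limit, in whatever unit the space uses
    public double MaxForce = 0.1;    // steering force limit

    // Returns the steering force so that an entity at 'position', moving at 'velocity',
    // heads toward 'target'. All vectors share the same dimension (2, 3, 5, ...).
    public double[] Steer(double[] position, double[] velocity, double[] target)
    {
        int n = position.Length;
        double[] desired = new double[n];
        for (int i = 0; i < n; i++) desired[i] = target[i] - position[i];

        Limit(desired, MaxSpeed);                 // desired velocity, clamped to MaxSpeed

        double[] steering = new double[n];
        for (int i = 0; i < n; i++) steering[i] = desired[i] - velocity[i];

        Limit(steering, MaxForce);                // steering force, clamped to MaxForce
        return steering;
    }

    private static void Limit(double[] v, double max)
    {
        double norm = 0;
        for (int i = 0; i < v.Length; i++) norm += v[i] * v[i];
        norm = Math.Sqrt(norm);
        if (norm > max && norm > 0)
            for (int i = 0; i < v.Length; i++) v[i] *= max / norm;
    }
}
```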

Control Theory models

Then I went back to DeMIMOI models and focused on implementing some math and signal processing functions such as filters, PID controllers, summer/subtractor blocks and so on.
I managed to get a model that is able to drive the AL5C arm by providing commands in sensor units. This allows driving the arm without having to deal with servomotor commands and their correspondence with the feedback (i.e. potentiometer) data range, so no calibration is needed.
In parallel, I also went back to the DeMIMOI memory model I talked about in a previous post to add what we would call « short term memory ». It simply stores data in the PC RAM, i.e. for a short period of time.
Then I used this model and the math models to build a sample application that makes the robotic arm repeat a movement I show it manually. I mean, I manually move the arm, and while I’m doing this, the system stores each servomotor position thanks to the potentiometer feedback. Once I have finished the desired movement, I ask the robot arm to replay the recorded movement.
The model I built for this purpose is as follows :

This diagram shows the DeMIMOI models structure to manage a record/play movement. It shows a structure based on PID’s, the same structure is used for each arm joint.

As usual, this picture is extracted from the model itself which is able to produce the Graphviz code of its own architecture.
The potentiometer positions are stored in the DeMIMOI_Memory while I move the arm. Then the memory pointer is reset so that it points to the first record. The stored positions are then fed to the control model based on PID controllers, which in turn feeds the robotic arm servomotors.
The reason for cascading two PIDs is to implement a kind of small-stepping process to reach the desired position. The first PIDs (PID1, 3, 5, 7 and 9) are purely proportional. The other PIDs are purely integral. The diagram shows the structure configured for playing. For recording, I only update the first two models on the left (AL5C arm sensors and DeMIMOI_Memory0).
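
For one joint, this cascade boils down to a pure proportional stage feeding a pure integral stage. Here is a minimal sketch of that idea; the gains and names are hypothetical, not the values used in the actual model :

```csharp
// Cascade of a pure proportional stage and a pure integral stage for one joint (sketch only)
public class JointController
{
    public double Kp = 0.5;          // gain of the proportional PID (hypothetical value)
    public double Ki = 0.05;         // gain of the integral PID (hypothetical value)
    private double integrator;       // integral stage state, starts at the current command

    public JointController(double initialCommand)
    {
        integrator = initialCommand;
    }

    // desired : recorded potentiometer value, measured : current potentiometer value.
    // Returns the servomotor command, stepping in small increments toward the target.
    public double Update(double desired, double measured)
    {
        double step = Kp * (desired - measured);   // proportional stage : small step toward the target
        integrator += Ki * step;                   // integral stage : accumulates the steps into a command
        return integrator;
    }
}
```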

The obtained behavior is pretty good and the movement precision is quite impressive ! Well, there are some offsets which I think come from this : the potentiometer data resolution is poorer than that of the servomotor commands, and this difference introduces offsets and errors.

Current project

So, now I have a robotic arm that is able to move quite precisely using the servomotor feedback units. This way, if I manage to build a behavior model that is able to move the arm to pick up an object, I’ll have a building block that will open up new possibilities !
My next goal is to build such a model, a model that is able to calculate the position of an object using the webcam information. This part will be the subject of my next post, as I prefer making a specific post for that (pretty big) subject.

Neural network analysis and selection

Last time, I spoke about exploring some algorithms that could analyze a set of neural networks with different structures and then select the best one.
This kind of algorithm is very interesting because it makes the best choice of neural network in terms of architecture (i.e. the number of hidden neurons or hidden layers…). It fits the neural network structure to the problem it has to solve, keeping the number of parameters (i.e. the complexity) of the model as low as possible.

I found these algorithms very interesting because they could allow me to automate the process of manually trying different values of neural network parameters, such as the number of hidden neurons. They also extract figures/scores that are comparable even when the networks have different structures. These scores allow me to numerically compare many neural networks, see clearly which one performs best, and finally use that one and throw away the others.

I then decided to give this big part a try. This is a big challenge because it includes lots of calculations that are not straightforward to me.
Fortunately, I bought a very complete book called Apprentissage Statistique by G. Dreyfus. Even if I don’t find it well structured, this book gives a lot of good tips and tricks and describes in quite good detail each step one should follow to make these algorithms come true.

So I followed the guidelines from that book and went through these steps :

  • Understand enough of the algorithms as a whole to determine the class architecture I should use. This step is something I always do, to start in conditions as good as possible and not be trapped later on because I didn’t think about a particular case or didn’t understand how something actually works. That’s why I think it is really important to put things into perspective before rushing headlong (because you can’t see the wall in front of you once you’re running) !
  • Code some calculation routines such as calculating the number of parameters of a given model, calculating its Jacobian matrix, the mean squared errors, the residuals, the leverages, the determination coefficients, the Predicted Residual Sum of Squares (PRESS) score, etc. (see the sketch after this list).
  • Put all these figures together to make a decision on which model is the best among a batch.
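
For example, once the residuals and leverages are available, the PRESS score is a short computation. This is a minimal sketch of the virtual leave-one-out formula described in the book, not the actual code of my analyzer :

```csharp
public static class ModelScores
{
    // Predicted Residual Sum of Squares from the residuals r_i and leverages h_ii
    // ("virtual leave-one-out" : each term estimates the error the model would make
    //  on example i if that example had been left out of the training set)
    public static double Press(double[] residuals, double[] leverages)
    {
        double press = 0;
        for (int i = 0; i < residuals.Length; i++)
        {
            double e = residuals[i] / (1.0 - leverages[i]);
            press += e * e;
        }
        return press;
    }

    // Mean squared error, for comparison between models
    public static double Mse(double[] residuals)
    {
        double sum = 0;
        foreach (double r in residuals) sum += r * r;
        return sum / residuals.Length;
    }
}
```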

While going through all these steps, I hit a complex problem I had already faced some time ago : how can I know when I should stop my model’s learning loop ?
This is a recurring problem when dealing with neural networks. The learning phase iteratively modifies the weights by very small amounts until the model finally converges or reaches an acceptable error threshold, for example 0.01%.
The problem with neural networks is that if you let your model learn for too few iterations it does not perform well, but on the opposite, if you let it learn for too long, it becomes overfitted. This means it has become too specific to the training data and thus does not generalize well either. So you have to find a middle point, which is hard because it depends on many parameters such as the number of parameters of the model and its structure (number of neurons, number of hidden layers and so on).
Until that day, I had always used the basic workaround, which consists of setting an empirical number of iterations that seems to give good results. But now I couldn’t use it, since the selection algorithm has to work on multiple different model structures…

I searched the web for ways to deal with that recurring problem. I finally found a marvelous algorithm from Lutz Prechelt : « Early Stopping – but when? », in Genevieve B. Orr and Klaus-Robert Müller (eds.): Neural Networks: Tricks of the Trade, volume 1524 of LNCS, Springer, 1998.

This paper looked so appealing to me because I found it quite easy to code and to integrate into my current workflow. Once again, I tried to put it into perspective and decided how I should structure it in terms of code.

After some tests and some hours of intense reflection, I finally came up with a working early stopping algorithm that I could integrate into the model analyzer algorithm so that it decides for me when learning should stop.
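
The criterion I retained is essentially the paper’s GL (generalization loss) measure : track the validation error during training and stop when it has risen too far above the best value seen so far. A minimal, simplified sketch of that idea (the class in my code is more involved) :

```csharp
// Early stopping based on Prechelt's GL (generalization loss) criterion (simplified sketch)
public class EarlyStopping
{
    private readonly double threshold;     // e.g. 5.0 means "stop when GL exceeds 5%"
    private double bestValidationError = double.MaxValue;

    public EarlyStopping(double glThreshold = 5.0)
    {
        threshold = glThreshold;
    }

    // Call after each training epoch with the current validation error.
    // Returns true when training should stop.
    public bool ShouldStop(double validationError)
    {
        if (validationError < bestValidationError)
            bestValidationError = validationError;

        // GL(t) = 100 * (E_va(t) / E_opt(t) - 1)
        double gl = 100.0 * (validationError / bestValidationError - 1.0);
        return gl > threshold;
    }
}
```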

In the end, I have a working analysis and selection algorithm that is able to benchmark a given set of models and extract the best one to be used further on ! This was really exciting, since I thought it would be quite out of my reach because it meant going through a lot of steps, each with its own issues and difficulties…

I made a quick snapshot of the final results it gives me.

Model analyze and selection - results
In the upper left corner there’s the GraphViz code of the architecture of the whole system (see my previous post). At the bottom, there are graphs that show the final results :

  • The blue plot on the bottom represents the input data of the models
  • The blue plot on the top represents the desired output data (the data I measured directly on the system)
  • The orange plot is the output of a linear model (ARX) that was trained at the same time as the neural networks. This is to compare this linear model with the non-linear models (i.e. the neural networks)
  • The red one is the output of the neural network that has been selected by the selection algorithm, so this one is the one that performs best
  • Finally, the orange bar plot represents the final scores of all the benchmarked models. On the X-axis, the number represents the number of neurons in the hidden layer.
  • The graph with Frequency written on it was for test purposes so there’s nothing on it…

Ok, this is it ! I’ll have to test this more thoroughly to make sure everything works correctly, but the results are pretty encouraging !

Some news on my DeMIMOI library

First of all, I wish everyone coming to this blog a happy new year ! May this year be healthy and make your projects come true !

It’s been a while now since I published the code of my DeMIMOI core library. I had some time to make some improvements and add some more features to make it even more attractive to me, and possibly to anyone willing to give it a try too.

I was right to hope this platform would help me easily build systems and build a bigger one by connecting smaller ones. I successfully managed to use the library with neural networks (special thanks to Accord.Net and AForge.Net). For example, I made DeMIMOI models of a NARX (Nonlinear AutoRegressive eXogenous) neural network and an ARX (AutoRegressive eXogenous) regression model.
These models can mimic almost any dynamic system, mostly because they have a backward link that makes their outputs at time t-1 part of their inputs.
Using the DeMIMOI library, this backward link is easy to model and to code, since it’s just a connection between an input and an output that, in code, translates to myModel.Outputs[0][0].ConnectTo(myModel.Inputs[0][0]) for example. The data flow is then automatically managed by the DeMIMOI core.
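
For reference, here is what an ARX model computes once that backward link is in place. This is a plain sketch of the difference equation, not the DeMIMOI block itself :

```csharp
// ARX difference equation : y(t) = a1*y(t-1) + ... + a_na*y(t-na) + b1*u(t-1) + ... + b_nb*u(t-nb)
// (plain illustration of what the ARX block computes through its backward link)
public class ArxModel
{
    public double[] A;   // coefficients on the past outputs (fed back through the backward link)
    public double[] B;   // coefficients on the past inputs

    public double Predict(double[] pastOutputs, double[] pastInputs)
    {
        double y = 0;
        for (int i = 0; i < A.Length; i++) y += A[i] * pastOutputs[i];   // pastOutputs[0] = y(t-1), etc.
        for (int i = 0; i < B.Length; i++) y += B[i] * pastInputs[i];    // pastInputs[0]  = u(t-1), etc.
        return y;
    }
}
```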

I also started to put a database manager in a DeMIMOI model, which can currently connect to a Mongo database. So I can save and retrieve data to build models, or use it as some kind of memory… Well, it’s still in development, but the main features are already up ! I mean, at least reading data from a pre-existing database. One step at a time, right ?

To give you a better idea of what I’m talking about and what I’m doing, I’ll show you a picture of the system I’m currently working on.

First of all, let me just explain the background.

While coding the DeMIMOI ARX and NARX models, I wanted to build models that can learn the behavior of the servomotors of my AL5C robotic arm.
On a previous attempt last year, I had the arm move freely and randomly while an application recorded, at each time step, the angle values from the potentiometers of each servo (thank you so much Phidgets !).

The results have been stored in a database that I can read using my DeMIMOI memory manager. That data can then be sent to the ARX and NARX models, and also be used by the learner models to fit them to the servo behavior.

For this purpose, I coded the system which is described by the following diagram :

DeMIMOI NARX and ARX learning Lynxmotion AL5C servo
By the way, this image has been created using the GraphViz code that is automatically generated by the DeMIMOI collection that holds all the blocks of the system.
That’s a feature I’m quite proud of, since it allows me to quickly check that what I coded is what I expected the system to be !

On the left of the diagram is the memory manager that reads the servo values from a Mongo database. The data produced is then fed to the ARX model and its teacher. They don’t need extra data processing, unlike the NARX model.

Indeed, the NARX model is made of a neural network that needs normalized data. That’s why you can see that the data coming from the memory block is also fed to a normalizer block that converts the raw data to a [-1, 1] range before being sent to the NARX and its teacher. Then the data coming out of the NARX model is denormalized to revert it back to its original data space.

On the ARX and NARX models, you can clearly see the backward link that connects the output to the input. This link makes the network seamlessly recurrent. And again, in terms of coding, it’s nothing harder than creating the link !

You may also have noticed the three probes that display some data values. They let me quickly check the model outputs (simulated) against the real output (measured on the real system). On that run, the ARX is better than the NARX, but I haven’t pushed the analysis any further to explain this…

My next work will now be focused on analyzing the results more deeply, maybe working on the so-called BPTT (backpropagation through time) neural network (ouch ! it hurts !), or maybe even trying to make some kind of automated learning shell that would be able to test multiple models and parameters and then select the best one…
I know ! It seems like there’s going to be a huuuuge mountain to climb… I fight this feeling by telling myself that I have already climbed quite a big part of it, and that it would be even worse to stop now !

I’ll let you know of my progress in a while. And do not hesitate to say hi or comment on this ! I’d be curious to know what it feels like for someone external !

Lynxmotion SSC32/AL5x Robotic Arm Library

It’s been a long time since I last posted anything on my blog… I’ve been quite busy during the past few months !

I haven’t done any major development so far, so I decided to slowly come back to my experiments by publishing some of my code. For a while now, I have been thinking it would be cool to publish my code for the Lynxmotion AL5x and its SSC32 board.
I was quite surprised to see that there’s no existing library for these devices.
I made some improvements on the code I wrote some time ago, mostly to support more SSC32 features and to have better, cleaner and commented code.

So there you go guys, the Lynxmotion library is on my Github account : https://github.com/remyzerems/Lynxmotion

As you can see from the Github page, the main key features are servo driving, input/output access, SSC32 enumeration and of course AL5x joint driving.

Hope this will be useful to somebody in some way !

Stereo vision color calibration

    Last time, I noticed that my webcams don’t have the same color response. One is clear and white, the other is red-tinted…
This difference in color is not desirable when trying to apply stereo vision algorithms such as disparity mapping, which require two images with the same appearance for the matching algorithm to work best.

    I read an article by Afshin Sepehri called Color Calibration of Stereo Camera that describes in detail a technique to deal with this problem.

About the article

    The article details a technique for correcting the stereo image pair by applying a mathematical function to each pixel to change its color to a calibrated one, which is the true color.

    He uses random test patterns composed of colored squares, which are then printed and viewed by the webcams to be calibrated.

Afshin’s example test pattern

    Then, he has the true color, the left viewed color and the right viewed color. He applies a minimization algorithm to find the mathematical function to apply to the left and right image pixels to calibrate each image and obtain true and identical colors.

For the minimization process, he has two approaches :

    1. Assuming that each true color component only depends on the same component : Rc = f1(Rf) , Gc = f2(Gf) , Bc = f3(Bf)
    2. Assuming that each true color component depends on all the components : Rc = f1(Rf, Gf, Bf) , Gc = f2(Rf, Gf, Bf) , Bc = f3(Rf, Gf, Bf)

    After testing both, he concludes that the second one is the best. He shows his results, and they seem quite interesting !
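
The second approach essentially amounts to fitting, by least squares, a linear transform that maps each observed (R, G, B) triplet to its target triplet. Here is a minimal sketch of how such a 3x3 transform could be estimated and applied; this is my own formulation using a normal-equation solve, not Afshin’s exact minimization :

```csharp
public static class ColorCalibration
{
    // Fit a 3x3 matrix M such that observed * M ≈ target (least squares via normal equations).
    // observed/target : N x 3 arrays of RGB triplets taken from the matched colored squares.
    public static double[,] FitTransform(double[][] observed, double[][] target)
    {
        var ata = new double[3, 3];          // observedᵀ * observed
        var atb = new double[3, 3];          // observedᵀ * target
        for (int n = 0; n < observed.Length; n++)
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                {
                    ata[i, j] += observed[n][i] * observed[n][j];
                    atb[i, j] += observed[n][i] * target[n][j];
                }

        // Solve ata * M = atb column by column with Gauss-Jordan elimination
        // (no pivoting refinements, good enough for a well-conditioned 3x3 system)
        var m = new double[3, 3];
        for (int col = 0; col < 3; col++)
        {
            var a = (double[,])ata.Clone();
            var b = new double[3] { atb[0, col], atb[1, col], atb[2, col] };
            for (int k = 0; k < 3; k++)
            {
                double pivot = a[k, k];
                for (int j = k; j < 3; j++) a[k, j] /= pivot;
                b[k] /= pivot;
                for (int r = 0; r < 3; r++)
                    if (r != k)
                    {
                        double f = a[r, k];
                        for (int j = k; j < 3; j++) a[r, j] -= f * a[k, j];
                        b[r] -= f * b[k];
                    }
            }
            for (int i = 0; i < 3; i++) m[i, col] = b[i];
        }
        return m;
    }

    // Apply the fitted transform to one pixel of the image to calibrate
    public static double[] Apply(double[,] m, double r, double g, double b)
    {
        return new[]
        {
            r * m[0, 0] + g * m[1, 0] + b * m[2, 0],
            r * m[0, 1] + g * m[1, 1] + b * m[2, 1],
            r * m[0, 2] + g * m[1, 2] + b * m[2, 2]
        };
    }
}
```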

    I was quite frustrated by the fact that the article does not show the effect of this technique on the disparity map output, to see whether it’s really relevant or not… That’s why I decided to give this a try and implement my own color calibration algorithm based on Afshin’s article.

My implementation idea

    My approach is mostly the same as Afshin’s, except that I decided to build a relative correcting algorithm, whereas Afshin’s algorithm is an absolute one.
He tries to obtain the true colors in his final images, which has some issues :

    • After printing the test patterns, there may be some errors in the colors due to the printer’s own color calibration
    • Scene lighting and light reflections may change the webcams’ perception of the colors and introduce errors
    • There are two models, one for each image, so it may require a lot of computing power to run online in real time

My idea is based on these points :

    • One of the two webcam images is considered the reference; the calibration algorithm has to correct the other image to be as close as possible to the reference
    • The algorithm does not give the true colors, but both images end up with consistent, calibrated colors

Implementation steps

On the programming part, I had the following points to code :

    • Random test chessboard pattern generator : generate one or more random test patterns as image files to be printed
    • Test pattern finder : algorithm to find and extract the chessboard from an image
    • Color pattern extraction : locate the colored squares and average the pixels each one contains. This gives a list of the colors of the squares.
    • Minimization process : algorithm to build the model by finding a function that transforms the old (uncalibrated) pixel colors into new calibrated pixel colors. It gives a transformation matrix that can then be saved to a file and loaded in another program, just as we do with the extrinsics and intrinsics (see previous article)
    • Image transformation algorithm : feed an image to the input and get the calibrated image at the output

Final result

Once all these points were programmed, the result is quite interesting ! Here is a quick preview :

StereoColorCalibrationScreenshot

    In the top left, you can see the uncalibrated raw images from the webcams; in the top right, the chessboards extracted from the stereo images.
In the bottom left, you can see the calibrated output images.

    In this sample application, I calibrated the right image using the left image as a reference. So, you can see that the right image before calibration is a little reddish, and once it has been processed, it’s clearer and corresponds really well to the one on the left !
Mission cleared !

    Oh, I must mention though that it runs quite slowly on my computer configuration, as all of this is running in a virtual machine. The VM runs on my 2007 Dell Inspiron 1520 laptop, which is beginning to lack processing capacity…
I’m currently considering buying a desktop computer with some « hardcore gamer » features… The problem is that it’s quite expensive… Hard choice !