OK, so I want to build a system that can pick up an object it has seen. I called this project Visuomotor coordination because I want to use the robot's webcams to estimate the movement the robot has to make to reach the object. If you read this Wikipedia article, you’ll get what I mean.
So the first step is to detect the object. I assume the system knows which object it has to pick up and has the data it needs to find that object in the scene. As a vision system, I’ll use the two webcams mounted on my robot arm. The idea behind using both is to benefit from stereo vision: it provides some depth information, which tells how far the object is from the robot arm gripper.
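For reference, the standard stereo relation behind this depth information is Z = f · B / (x_left − x_right): depth equals focal length (in pixels) times the baseline (distance between the two cameras) divided by the disparity. The minimal Python sketch below assumes the two webcams are calibrated and rectified (parallel optical axes, same focal length, aligned image rows), which is an assumption and not something established above; the function name and the numbers are hypothetical.

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         x_left: float, x_right: float) -> float:
    """Hypothetical helper: depth of a point seen at horizontal pixel
    coordinate x_left in the left image and x_right in the right image.

    Assumes rectified stereo images, so Z = f * B / (x_left - x_right).
    The result is in the same unit as baseline_m.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # For a rectified pair, a point in front of the cameras appears
        # further right in the left image than in the right image.
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical numbers: 700 px focal length, webcams 6 cm apart,
# object center at x=420 px (left image) and x=350 px (right image).
print(depth_from_disparity(700.0, 0.06, 420.0, 350.0))  # ≈ 0.6 (metres)
```

Note how depth is inversely proportional to disparity: the closer the object gets to the gripper, the larger the horizontal shift between the two views.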
Object detection – Algorithms
There are numerous ways to do object detection, and a lot of detection/recognition algorithms out there! You can have a look at that OpenCV tutorial to get a better idea of what they are and what features they offer.
After running some tests and reading some documents, I finally chose the SURF algorithm because it shows pretty good detection performance and because it can be made faster by running it on the GPU instead of the CPU. On the downside, note that SURF is not free for commercial use: it can only be used freely for research purposes and personal use. So no worries for my application, but it’s worth keeping in mind.
Stereo object detection: my idea
My idea was the following: use the SURF algorithm, starting from the tutorial sample that demonstrates feature matching. This sample shows how to detect an object by matching it against a reference image of that same object. This gives me a way to detect the object in the scene using one of the two webcams (say the left one). Then I use the same algorithm to find the corresponding features in the second webcam image.
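The two-stage matching idea can be sketched in Python with NumPy. This is not the EmguCV code used in the project: it swaps in a brute-force nearest-neighbour matcher with Lowe's ratio test, a common way to filter SURF/SIFT matches, and treats descriptors as plain float arrays (SURF descriptors are 64-dimensional by default). The function names and the 0.7 ratio are hypothetical choices. `match_stereo` chains the two stages: reference → left, then only the matched left features → right.

```python
import numpy as np


def ratio_test_match(query: np.ndarray, train: np.ndarray, ratio: float = 0.7):
    """Hypothetical matcher: return (query_idx, train_idx) pairs whose best
    match is clearly better than the second best (Lowe's ratio test).

    query: (N, D) descriptors, train: (M, D) descriptors, with M >= 2.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    # Indices of the two closest train descriptors for each query descriptor.
    order = np.argsort(dists, axis=1)[:, :2]
    matches = []
    for qi, (best, second) in enumerate(order):
        if dists[qi, best] < ratio * dists[qi, second]:
            matches.append((qi, int(best)))
    return matches


def match_stereo(ref_desc: np.ndarray, left_desc: np.ndarray,
                 right_desc: np.ndarray, ratio: float = 0.7):
    """Reference -> left, then the matched left features -> right.

    Returns two lists of feature indices (left image, right image) that
    are assumed to belong to the object.
    """
    ref_to_left = ratio_test_match(ref_desc, left_desc, ratio)
    left_idx = [li for _, li in ref_to_left]
    if not left_idx:
        return [], []
    # Only the object features found in the left image are searched for
    # in the right image.
    left_to_right = ratio_test_match(left_desc[left_idx], right_desc, ratio)
    left_obj = [left_idx[qi] for qi, _ in left_to_right]
    right_obj = [ri for _, ri in left_to_right]
    return left_obj, right_obj


# Synthetic check: 5 object descriptors hidden among random distractors.
rng = np.random.default_rng(0)
ref = rng.normal(size=(5, 64))
left = np.vstack([rng.normal(size=(10, 64)), ref + 0.01 * rng.normal(size=(5, 64))])
right = np.vstack([ref + 0.01 * rng.normal(size=(5, 64)), rng.normal(size=(8, 64))])
left_obj, right_obj = match_stereo(ref, left, right)
print(left_obj, right_obj)
```

In a real pipeline, the returned indices would be mapped back to the keypoints' pixel coordinates to form the two point clouds. Restricting the second stage to the left-image object features keeps the background out of the right-image cloud.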
This way, I obtain two point clouds (one per webcam) that belong to the object I want to detect. By computing the mean of each point cloud, I get the center of the object as seen in each image. So, in the end, I have the object position in both images.
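The center computation is just the mean of the pixel coordinates in each cloud. Here is a minimal NumPy sketch, assuming each point cloud is an (N, 2) array of (x, y) pixel positions; the function name is hypothetical.

```python
import numpy as np


def point_cloud_center(points) -> np.ndarray:
    """Hypothetical helper: mean (x, y) of an (N, 2) array of pixel coordinates."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 2 or len(pts) == 0:
        raise ValueError("expected a non-empty (N, 2) array of points")
    return pts.mean(axis=0)


left_cloud = [(100, 50), (110, 60), (120, 70)]
right_cloud = [(40, 52), (50, 62), (60, 72)]
print(point_cloud_center(left_cloud), point_cloud_center(right_cloud))
```

The mean is what the text describes. A single wrong match can pull it away from the object, so a median (`np.median(pts, axis=0)`) is a more robust alternative if outliers show up.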
Let’s summarize the stereo object detection algorithm I’ll program:
- The inputs:
  - Two webcam images (left webcam, right webcam)
  - A reference image of the object to be found
- The function block:
  - Extract SURF features from the reference image
  - Extract SURF features from the left webcam image
  - Match the features to find where the object is in the left webcam image
  - Extract SURF features from the right webcam image
  - Match the object features found in the left image against the right image features to find where the object is in the right image
- The outputs:
  - Point cloud of the object in the left image
  - Point cloud of the object in the right image
After spending a hard time understanding the EmguCV functions and implementing some of my own, I finally arrived at a stable algorithm that detects the object in both webcam images and outputs the point clouds.
The results:
To develop this idea, I created a Visual Studio project in which I experimented with and put together all the blocks I needed to reach my goal. I ended up with an application that shows the two webcam images with small circles and lines that clearly show what has been found.
Here is a screenshot of the application detecting the object.
What you can see here is the left and right webcam images placed side by side in a single image. The object is a small wooden cube on which I drew lines to make it easier to detect. The blue points are the averaged object positions in each view, computed from the instantaneously detected points. The red points are the features detected in the current frame that were matched between the reference image and the left and right webcam images.
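Averaging positions over frames does not require storing every past detection: an incremental mean does it in constant memory. The Python sketch below shows that technique; whether the application averages exactly this way is an assumption, and the class and method names are hypothetical. Its `reset()` method mirrors what the Reset object position button is described as doing.

```python
class RunningPosition:
    """Hypothetical helper: incremental mean of detected (x, y) object
    positions in one webcam view."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Forget past detections; the next update starts a new average."""
        self.count = 0
        self.x = 0.0
        self.y = 0.0

    def update(self, x: float, y: float):
        """Fold one instantaneous detection into the running mean."""
        self.count += 1
        # mean_n = mean_(n-1) + (value - mean_(n-1)) / n
        self.x += (x - self.x) / self.count
        self.y += (y - self.y) / self.count
        return self.x, self.y


left_avg = RunningPosition()
for x, y in [(100, 50), (104, 52), (102, 54)]:
    left_avg.update(x, y)
print(left_avg.x, left_avg.y)  # 102.0 52.0
```

One instance per view (left and right) gives the two averaged points.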
The buttons are:
- Take snapshot: takes a snapshot of the object, which is then used as the reference image. I also manually create a mask to tell the algorithm exactly which part of the scene is my object.
- Reset object position: resets the averaged position so that averaging restarts from the newly detected points
- Remember object position: stores the object position as well as the robot arm position from which the object is seen
- Remember arm target position: stores, in a MongoDB database, the object position (X and Y in each image), the robot arm position from which the object was detected, and the robot arm position when it is on the object. For this step, once the object has been detected, I manually move the arm onto the object, ready to grip it. In a sense, I demonstrate to the arm what it should do.
- Pause/resume object detection: as its name says, pauses or resumes the object detection loop
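The fields stored by Remember arm target position map naturally onto one document per demonstration. Below is a hypothetical Python sketch of such a record; the field names, and the assumption that an arm position is a list of joint values, are illustrative and not the application's actual schema. With the `pymongo` driver, the resulting dict could be stored with `collection.insert_one(doc)`.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DemonstrationRecord:
    # Object center in each webcam image, in pixels (hypothetical field names).
    left_x: float
    left_y: float
    right_x: float
    right_y: float
    # Arm position when the object was detected; assumed here to be one
    # value per joint.
    arm_at_detection: List[float]
    # Arm position after it was moved by hand onto the object, ready to grip.
    arm_at_target: List[float]

    def to_document(self) -> dict:
        """Plain dict, suitable for a MongoDB insert."""
        return asdict(self)


rec = DemonstrationRecord(102.0, 52.0, 48.0, 53.5,
                          arm_at_detection=[0.0, 45.0, 90.0, 0.0],
                          arm_at_target=[10.0, 70.0, 60.0, 15.0])
doc = rec.to_document()
print(doc["arm_at_target"])
```

Keeping detection-time and target arm positions in the same document means each record is directly one training example: inputs (object position plus starting arm position) and the expected output (target arm position).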
Using this application and process, I obtained a database made of many records like this one:
Now that I have this database, I can start training a model that calculates the final arm position from the starting arm position and the detected object position! This is going to be really interesting, as I’ll see what the model is able to do: will it reach the object or not?
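The model itself is the topic of the next post. Purely to show how the database records become a regression problem, here is a linear least-squares baseline in NumPy that maps [object position in both images, arm position at detection] to the arm position at the target. The linear model, the function names and the synthetic data are assumptions made for this sketch; the real relation between pixel positions, joint values and the target pose is very likely non-linear.

```python
import numpy as np


def fit_linear_model(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Hypothetical baseline: least-squares fit of targets ≈ [features, 1] @ W.

    features: (N, F), e.g. left_x, left_y, right_x, right_y + arm joints at detection.
    targets:  (N, T), e.g. arm joints at the target position.
    """
    X = np.hstack([features, np.ones((len(features), 1))])  # add a bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W


def predict(W: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply the fitted weights (including the bias row) to new features."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ W


# Synthetic stand-in data (not real records): 4 pixel coords + 4 joints -> 4 joints.
rng = np.random.default_rng(0)
features = rng.uniform(0, 100, size=(50, 8))
true_W = rng.normal(size=(9, 4))
targets = np.hstack([features, np.ones((50, 1))]) @ true_W
W = fit_linear_model(features, targets)
print(np.allclose(predict(W, features), targets))
```

A baseline like this is mostly useful as a yardstick: any non-linear model trained on the same records should beat its error.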
In a nutshell
The idea is to make my robot arm able to reach an object it has seen with its webcams. I made an application that extracts features from a reference image and detects these same features in the left and right webcam images. This gives me the position of the object in each image. Then I built a database containing the object position in the images, the robot arm position when the object was detected, and the arm position when it is on the object.
My next step will be to create a model that calculates the arm position that puts the arm on the object, from the current robot arm position and the detected object position in the webcam images.
This will be the subject of my next post, so stay tuned!