3D Object Detection for TIAGo Robot: using a Faster R-CNN Network

Introduction and Motivation

In this project, we designed a strategy for solving the 3D object detection problem for a robot named TIAGo [1], which is working in a retail store. The whole working scenario for TIAGo is to recognize, pick up and deliver the products to the customers, so this project serves as its perception module. It will be able to recognize the target products appearing in the image view, calculate its 3D pose and deliver this information to other modules. In our case, the retail store has 5 products that need to be recognized: ‘biscuits’, ‘cocacola’, ‘pringles’, ‘pringles2’, ‘sprite’, with encoded ids from 0 to 4.

Figure 1: Products in the retail store. The products are labeled from 0 to 4 from left to right.

Design: 2D object detection

The model

We first started looking around for an object detection algorithm with code. We came across several good choices like YOLOv4, SqueezeDet and Faster R-CNN. At first, we chose to work with YOLOv4, but due to it being too computationally expensive and not reproducible in the given timeframe, when implementing the algorithm in a simulation. So, we chose Faster R-CNN, this algorithm was easy to train and fast to work with the real-time detection in the simulation.

Figure 2: Faster R-CNN, the unified object detection network, Source:[7]
Figure 3: Anchor boxes at (320,320), Source:[8]

Prepare the data

For collecting images, we could run the TIAGo in ROS and subscribe to the images published by the camera in a lower frequency, and filter out the wanted images which are representative of the five objects. We picked 121 images and started making the custom dataset. We used labelme [2], which is a graphical image annotation tool. By manually drawing polygons on the image and labeling them with different category names, a JSON file is generated by labelme for each image, in which the coordinates of those polygons and the label information are stored. In figure 4, is shown how we labeled the data using the labelme tool.

Figure 4: Annotation process of the items

Training the model

With the data prepared and ready to go, we started training the Faster R-CNN. The training of the network took us approximately 15 minutes, using an Nvidia GTX1650. This model is then saved and loaded later on in the ROS detection module. The trained network is saved as a .pth file and can be found on Github.


With the model trained, we went ahead and validated the model on the test set. Where we got the following results:

Figure 5: cls and bbox loss plot
Figure 6: bbox_mAP plotted per epoch

Deploy the network in ROS

Now it’s time to bring the magic to ROS. With the Faster R-CNN model trained and able to detect all the products, we implemented this in a general ROS package, so we can run this in a real-time Gazebo simulation. The ROS package is designed for running a 2D object detector’s inference using mmdetection in ROS, which can be found at [4]. There the installation and usage are explained.

def generate_obj(self, result, id, msg):
obj = Detection2D()
obj.header = msg.header
obj.source_img = msg
result = result[0]
obj.bbox.center.x = (result[0] + result[2]) / 2
obj.bbox.center.y = (result[1] + result[3]) / 2
obj.bbox.size_x = result[2] — result[0]
obj.bbox.size_y = result[3] — result[1]
obj_hypothesis = ObjectHypothesisWithPose()
obj_hypothesis.id = str(id)
obj_hypothesis.score = result[4]
return obj
im = np.frombuffer(msg.data, dtype = np.uint8).reshape(msg.height, msg.width, -1)
Figure 7: Faster R-CNN deployed in ROS

Design: 3D object detection

Our strategy to expand the current ability to 3D object detection is quite simple. TIAGo can generate raw point clouds using an RGB-D camera, and we know that the image pixels can be seen as the 2D projection of the actual 3D environment points on the image plane. So the 2D bounding box gained from the last section can generate a 3D frustum that restricts the whole raw point clouds into a certain area of our interest. We simply assume that the points located in the area form the target 3D object. In this way, we could filter out the small cluster of object point clouds from all raw points.

Figure 8: F-PointNet strategy, Source: [9]

Filter the object point cloud out

To achieve this, we could hack the ROS pointcloud2 message by indexing the wanted points from that data stored in it and package them into our newly defined object point cloud message. Here shows a small snippet from the codes:

sensor_msgs::PointCloud2 pcs_filtered;
int POINT_STEP = latestPointClouds.point_step; // NOTE: 32, actually
size_t tc_u_min = int(tc_u — tc_size_u / 2) * POINT_STEP;
size_t tc_v_min = int(tc_v — tc_size_v / 2);
size_t tc_u_max = tc_u_min + tc_size_u * POINT_STEP;
size_t tc_v_max = tc_v_min + tc_size_v;
pcs_filtered.header.frame_id = latestPointClouds.header.frame_id;
pcs_filtered.header.stamp = ros::Time::now();
pcs_filtered.height = tc_size_v;
pcs_filtered.width = tc_size_u;
pcs_filtered.fields = latestPointClouds.fields;
pcs_filtered.is_bigendian = latestPointClouds.is_bigendian;
pcs_filtered.point_step = POINT_STEP; //32
pcs_filtered.row_step = POINT_STEP * tc_size_u;
pcs_filtered.is_dense = latestPointClouds.is_dense;
int raw_row_step = latestPointClouds.row_step;
decltype(latestPointClouds.data) filtered_data = {};
for(size_t row = tc_v_min; row < tc_v_max; row++)
for(size_t col = tc_u_min; col < tc_u_max; col++)
auto dat = latestPointClouds.data[row * raw_row_step + col];
pcs_filtered.data = filtered_data;

Calculate the 3D pose

Since the five objects are all shaped as cylinders, we could simply make use of the cylinder segmentation approach in the PCL library here [5]. Furthermore, PAL Robotics already provides a cylinder detector node using PCL [6]. With this benefit, we just need to define its communication with our codes. The codes for segmentation are:

pcl::SACSegmentationFromNormals<PointT, pcl::Normal> seg;
pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
seg.setNormalDistanceWeight (0.1);
seg.setRadiusLimits(minRadius, maxRadius);
seg.segment(*inliers, *cylinderCoefficients);

Communication overview

To integrate all parts together into the ROS system and make them work in the simulation, in this section we show the overall communication between these ROS nodes to give a clear data flowchart about how this strategy works. See the figure below.

Figure 9: Ros Nodes


Animation: Object Detection in ROS
Animation: Point Cloud of the sprite can


When looking back at the project we are very happy with the result. The object detection using Faster R-CNN was successful and performed very well. The implementation in ROS also went well. And with the use of the Faster R-CNN network, it was possible to detect the shopping items in real-time and calculate the 3D pose. So, we were able to implement everything we wanted to implement. Yet there are some improvements we would want to make, like adding more items or making an even less computationally expensive network. It would also be interesting to reproduce the Faster R-CNN from scratch and implement it again in ROS. And compare the results with each other and see where we could further improve the model. But due to time constraints, we used the Faster R-CNN network and implemented this in ROS. In the end, we are pretty happy with the results and implementation of the object detection for TIAGo.


All the code for this project can be found using the following links: The Faster r-CNN code used can be found at: https://github.com/open-mmlab/mmdetection The ROS package with our implementation of the Faster r-CNN can be found at: https://github.com/jianfengc11/TIAGo_find_object



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Aakaash Radhoe

Aakaash Radhoe

1 Follower

MSc Student Robotics at the Technical University of Delft.