### Experimenting with Deep Reinforcement Learning

07/14/2019 at 19:19
I’ve made an attempt at implementing collision avoidance using Deep Reinforcement Learning - with partial success.

# Problem and Setup

In order to apply a Reinforcement Learning algorithm, the goal has to be specified in terms of maximizing cumulative reward.

The robot's objective was to drive for as long as possible without hitting any obstacle. This objective was shaped by a reward function - the robot would receive a penalty when hitting an obstacle. Since the robot receives zero reward for just driving around and a negative reward (penalty) for hitting an obstacle, maximizing cumulative reward should give rise to collision-avoiding behavior.
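In code, this reward function is very small. The sketch below is illustrative only - the penalty constant is an assumption, not the value used on the robot:

```python
# Hypothetical penalty value; the post does not specify the actual constant.
COLLISION_PENALTY = -10.0

def reward(collision_detected):
    """Zero reward for driving around, a fixed penalty on collision."""
    return COLLISION_PENALTY if collision_detected else 0.0
```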

The collisions were detected automatically using an accelerometer. Any detected collision would also trigger a "back down and turn around" recovery action, so that the robot could explore its environment largely unattended. It would drive at a small fixed speed (around 40cm/s) and the Reinforcement Learning Agent would be in control of the steering angle.
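An accelerometer-based collision detector can be sketched as a threshold on the gravity-compensated acceleration magnitude. This is a minimal illustration - the threshold value and gravity compensation scheme are assumptions, not the robot's actual settings:

```python
import numpy as np

ACCEL_THRESHOLD = 3.0  # m/s^2 - hypothetical value, tuned per robot.

def detect_collision(accel_xyz, gravity=(0.0, 0.0, 9.81)):
    """Flag a collision when the gravity-compensated acceleration
    magnitude exceeds a fixed threshold."""
    a = np.asarray(accel_xyz, dtype=float) - np.asarray(gravity, dtype=float)
    return float(np.linalg.norm(a)) > ACCEL_THRESHOLD
```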

# Reward Shaping and the Environment

Reinforcement Learning Agents make decisions based on the current state. State usually consists of observations of the environment, sometimes extended by historical observations and some internal state of the Agent. For my setup, the only observations available to the Agent were a single (monocular) camera image and a history of past actions, added so that velocity and "decision consistency" could be encoded in the state. The Agent's steering control loop ran at 10Hz. There was also a much finer PID control loop for velocity control (maintaining a fixed speed in this case) running independently on the robot.
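The state construction described above - an encoded image concatenated with a fixed-length action history - can be sketched as follows. The `encode` function and history length are illustrative placeholders, not the actual implementation:

```python
from collections import deque

import numpy as np

class AgentState:
    """State = encoded image + a fixed-length history of past steering
    actions. ``encode`` stands in for whatever image encoder produces
    the observation vector."""

    def __init__(self, encode, history_len=4):
        self._encode = encode
        # Ring buffer of the most recent steering actions, zero-initialized.
        self._actions = deque([0.0] * history_len, maxlen=history_len)

    def observe(self, image, last_action):
        """Record the latest action and return the full state vector."""
        self._actions.append(last_action)
        return np.concatenate([self._encode(image), np.array(self._actions)])
```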

With the reward specified as in the previous section, it was not surprising that the first learned behavior was to drive around in tight circles - as long as there is enough space, the robot could drive around forever. Although correct given the problem statement, this was not the behavior I was looking for. Therefore, I added a small penalty that would keep increasing unless the robot was driving straight. I also added a constraint on how much the steering angle can change between frames to reduce jerk.
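Both additions can be sketched in a few lines - a penalty that accumulates while the robot turns and resets once it drives straight, plus a per-frame clamp on the steering change. The step, threshold, and limit values below are assumptions:

```python
import numpy as np

MAX_STEER_DELTA = 0.1  # Max steering change per frame - assumed value.

def limit_steering(previous, requested):
    """Clamp the change in steering angle between frames to reduce jerk."""
    return previous + np.clip(requested - previous,
                              -MAX_STEER_DELTA, MAX_STEER_DELTA)

class StraightnessPenalty:
    """Per-frame penalty that keeps growing while the robot turns and
    resets to zero once it drives straight again."""

    def __init__(self, step=0.01, straight_threshold=0.05):
        self._step = step            # Assumed per-frame increment.
        self._threshold = straight_threshold  # Assumed "straight" band.
        self._penalty = 0.0

    def update(self, steering_angle):
        if abs(steering_angle) > self._threshold:
            self._penalty += self._step
        else:
            self._penalty = 0.0
        return -self._penalty
```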

This definition of the reward function is sparse - the robot can drive around for a long time before receiving any feedback from hitting an obstacle, especially as the Agent gets better. I improved my results by "backfilling" the penalty to a few frames before the collision to increase the number of negative examples. This is not an entirely correct thing to do, but it worked well in my case, since the robot was driving at very low speeds and the Agent communicated with the robot via WiFi, so there was some lag involved anyway.
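The backfilling step can be sketched as a pass over an episode's reward sequence, copying the terminal collision penalty onto the preceding few transitions. The number of backfilled frames is an assumed value:

```python
def backfill_penalty(rewards, n_frames=3):
    """Copy the terminal collision penalty onto the last ``n_frames``
    transitions before the collision. ``rewards`` is one episode's
    reward sequence ending in the collision penalty."""
    rewards = list(rewards)
    penalty = rewards[-1]
    for i in range(max(0, len(rewards) - 1 - n_frames), len(rewards) - 1):
        rewards[i] = penalty
    return rewards
```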

# Deep Details

For a Deep Reinforcement Learning algorithm I chose Soft Actor-Critic (SAC), specifically the tf-agents implementation. I picked this algorithm because it promises to be sample-efficient (therefore decreasing data collection time - an important feature when running on a real robot rather than in simulation) and there were already some successful applications to simulated cars and real robots.

Following the method described in Learning to Drive Smoothly… and Learning to Drive in a Day, in order to speed up training I encoded the images provided to the Agent using a Variational Auto-Encoder (VAE). Data for the VAE model was collected using a random driving policy, and once pre-trained, the VAE model’s weights were fixed during SAC Agent training.

# The Good, the Bad and the Next Steps

The Agent has successfully learned to navigate some obstacles in my apartment. The steering was smooth and the robot would generally prefer to drive straight for longer periods of time.

I was not able to consistently measure improvement in episode length (a common metric for sparse-reward-function problems) - it largely depended on where in the apartment the robot was started.

Unfortunately, the learned policy was not robust - the robot would not avoid previously unseen obstacles and was sensitive to lighting changes throughout the day.

I suspect the encoded representation was not enough for this type of task. I was counting on the Agent to learn to recognize the obstacles from cues such as visible floor shape (sort of horizon detection) but perhaps a VAE trained as a general image encoder-decoder was not powerful enough to expose such features. In the future I’d like to try both to encode some more useful information (such as predicted depth) and train the vision network along with the Agent. Which brings me to the main takeaway…

For experimentation, a simulator is a necessity. Training a Deep Reinforcement Learning Agent in the real world is very time consuming, and experimentation is tricky. On the other hand, there is a lot to experiment with - from state representation to fine-tuning the hyperparameters of the actor and critic networks. And debugging a neural network running on a real robot only sounds cool… Therefore, the next step for me will be building a car dynamics simulation.

### Localization Part 1.

In this update we will present a position tracking algorithm for a car-like robot. The algorithm will fuse readings from multiple sensors to obtain a “global” and local position. We will show an implementation based on ROS.

We will be using the robot's position estimates for bootstrapping an autonomous driving algorithm. Even though it is possible to develop an end-to-end learning algorithm that does not rely on explicit localization data (e.g. [1]), having a robust and reasonably accurate method of estimating the robot's past and current position is going to be essential for automated data collection and evaluating performance.

## Part 1. “Global” Localization

First and foremost, we would like to know the position of the robot relative to a known landmark, such as the start of a racetrack, the center of the room, etc. In a full-sized car this information can be obtained from a GPS sensor. Even though we could collect GPS readings from a DeepRC robot, the accuracy of this method would pose a problem: GPS position can be up to a few meters off, and accuracy suffers even more indoors. In our case this is unacceptable, since the robot needs to navigate small obstacles and few-meter-long racetracks. Other methods (such as SLAM) exist, but they come with their own set of requirements and drawbacks.

Instead we implement a small-scale positioning system using AR tags and an external camera. Both the robot and the "known landmark" (or origin point) will have an AR tag placed on top of them, and we will use an AR tag detection algorithm to estimate their relative position. A point cloud obtained from an RGBD camera will be used to improve the accuracy of pose estimation.


We use AprilTag for detecting the AR tags in the external camera's image. We modify an existing ROS package to update the version of the AprilTag algorithm and to add the pixel coordinates of the corners of the detected tag - the source code is available on GitHub.

AR tag detection with AprilTag. Tags are overlaid with their ID and axes representing 2D position.

AprilTag gives us a 3D pose of the detected tag relative to the camera frame. We found the pose to be off by a couple of centimeters along the Z-axis (depth) for our initial setup. We decided to use a point cloud aligned to the external camera's image to refine the pose. We search the point cloud for the voxel closest to the detected tag's corner and use its position as the translation of the tag's pose. The orientation remains as given by the AprilTag algorithm. Of the few approaches we tried (including computing a centroid of all voxels within the tag's boundaries), this simple method yielded the best results. The code for pose refinement is presented below:

```python
import numpy as np


class ArTagPose:
    _P = ...  # Camera projection matrix.
    _point_cloud = ...
    _camera_frame = ...  # TF frame of the external camera.
    ...

    # detection is of type apriltags3_ros.msg.AprilTagDetection.
    def _fix_detection(self, detection):
        edges = np.array([(p.x, p.y) for p in detection.corner])

        cloud = _ros_to_array(self._point_cloud)
        # Drop empty (all-zero) points.
        cloud = cloud[
            (cloud[:, 0] != 0) | (cloud[:, 1] != 0) | (cloud[:, 2] != 0)]

        # Convert to [4xN] point matrix with added normalized dimension.
        X = np.concatenate([cloud[:, :3].T, np.ones((1, cloud.shape[0]))],
                           axis=0)

        # Project to 2D image points.
        proj = np.dot(self._P, X)

        # Normalize and drop the last dimension.
        proj[:2] /= proj[2]
        pts = proj.T[:, :2]

        # Find the cloud point whose projection lies closest to the tag's
        # bottom left corner (edges[0]) and use it as the tag's translation.
        origin_3d = _find_closest(edges[0, :], cloud, pts)

        transform = self._publish_tf(
            translation=origin_3d,
            rotation=detection.pose.pose.pose.orientation,
            frame='tag_%i_rgbd' % detection.id[0],
            parent=self._camera_frame,
            time=detection.pose.header.stamp)
        return transform

    def _publish_tf(self, translation, rotation, frame, parent, time):
        """Constructs, publishes and returns a TF frame message."""
        ...


# Helper functions.

def _ros_to_array(ros_cloud):
    # Point cloud format from RealSense is x, y, z as floats, 4 alignment
    # bytes and an int32-encoded color. The color is ignored here completely.
    # TODO: generate a generic dtype based on pc2._get_struct_fmt()
    t = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('v', 'V8')])
    arr = np.frombuffer(ros_cloud.data, dtype=t).view(
        dtype=np.float32).reshape((-1, 5))
    return arr[:, :3]


def _find_closest(point_2d, cloud, cloud_proj):
    # Index of the projected cloud point nearest to the given 2D corner.
    ix = np.argmin(np.linalg.norm(cloud_proj - point_2d, axis=1))
    return np.squeeze(cloud[ix, :3])
```


Once we have the refined poses of the robot's tag and the origin point's tag, we can compute the transformation between them:

T_origin,robot = (T_camera,origin)⁻¹ · T_camera,robot

where T_x,y is a 3D transformation matrix from coordinate frame x to y.
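With poses represented as 4x4 homogeneous transformation matrices, this composition is a single matrix inverse and multiply. The sketch below is a minimal illustration, not the robot's actual code:

```python
import numpy as np

def relative_pose(T_cam_origin, T_cam_robot):
    """Pose of the robot's tag expressed in the origin tag's frame,
    composed from the two camera-relative 4x4 homogeneous poses."""
    return np.linalg.inv(T_cam_origin) @ T_cam_robot
```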

Visualization of tag detection aligned with point cloud. Three frames are visible: the camera’s, the origin’s and the robot’s tag’s. Below is the camera frame for the same scene.

We publish the final transformation matrix as a nav_msgs.msg.Odometry message. Detection and pose refinement run at around 30 fps and, after tweaking the camera's exposure parameters, can track the robot moving at 60cm/s.

Localization: part 2 will describe obtaining relative pose from an IMU sensor and a motor encoder, and fusing all the measurements into one pose estimate.

1. Kendall, Alex, et al. "Learning to Drive in a Day." arXiv preprint arXiv:1807.00412 (2018).