https://arxiv.org/pdf/1902.08536.pdf
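Reproduce the paper "An LSTM Network for Real-Time Odometry Estimation". Deadline: November 17, 12:00 midnight.
Implement the network shown in the figure below. You can borrow directly from https://github.com/ChiWeiHsiao/DeepVO-pytorch, since the network is very similar.
Compared with DeepVO, the main work is the preprocessing that generates the 2×3601×1 vector, plus the two configurations (regression, and regression with classification).
Train with KITTI VO plus the 2D data at this link: https://strands.readthedocs.io/en/latest/datasets/mht_rgbd.html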
The paper text follows.
Abstract: The use of 2D laser scanners is attractive for the autonomous driving industry because of their accuracy, light weight and low cost. However, since only a 2D slice of the surrounding environment is detected at each scan, executing important tasks such as the localization of the vehicle is a challenge.
In this paper we present a novel framework that explores the use of deep Recurrent Convolutional Neural Networks (RCNN) for odometry estimation using only 2D laser scanners. The application of RCNNs provides the tools not only to extract the features of the laser scanner data using Convolutional Neural Networks (CNNs), but also to model the possible connections among consecutive scans using a Long Short-Term Memory (LSTM) Recurrent Neural Network.
Results on a real road dataset show that the method can run in real time without GPU acceleration and has competitive performance compared to other methods, making it an interesting approach that could complement traditional localization systems.
We propose a complete neural network method to localize a vehicle using only sequences of data acquired from 2D laser scanners as input.
Fig. 1: Overview of the proposed system. The complete Recurrent Convolutional Neural Network takes a sequence of 2D laser scanner measurements as input and learns its features with a sequence of CNNs; the RNN then uses these features to estimate the poses of the vehicle. The output is a 2D pose of the vehicle composed of two values, one for translation and one for rotation.
Data
We use the KITTI [4] odometry dataset and simulate a 2D laser scanner by extracting one of the Velodyne layers. Although this is not a realistic position for the sensor, we can still use it as a proof of concept for estimating localization with this type of method.
The proposed network takes the form of a Recurrent Convolutional Neural Network, as shown in Figure 1.
The main idea is to use a sequence of CNNs to extract the features between two sequential laser scans.
These features are in turn the input of a Recurrent Neural Network (RNN), more specifically a Long Short-Term Memory (LSTM) RNN, which learns how to estimate the pose of the vehicle.
The remainder of the paper is organized as follows.
First, we present the related work in Section II; the proposed method and the design of the network are presented in Section III; experimental results are presented in Section IV; finally, conclusions and perspectives are given in Section V.
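As a rough illustration of simulating a 2D scanner from 3D Velodyne data, the sketch below keeps only the points whose elevation angle is close to horizontal. The text does not say how the layer is selected, so the elevation threshold, the tolerance value, the function name `extract_layer` and the `(x, y, z)` tuple format are all assumptions made for this example.

```python
import math

def extract_layer(points, target_elev_deg=0.0, tol_deg=0.2):
    """Return the (x, y) coordinates of 3D points lying near one elevation angle.

    points: iterable of (x, y, z) tuples in the sensor frame (assumed format).
    target_elev_deg, tol_deg: hypothetical parameters, not taken from the paper.
    """
    layer = []
    for x, y, z in points:
        horizontal_range = math.hypot(x, y)
        if horizontal_range == 0.0:
            continue  # point directly above or below the sensor: no bearing
        elevation = math.degrees(math.atan2(z, horizontal_range))
        if abs(elevation - target_elev_deg) <= tol_deg:
            layer.append((x, y))
    return layer
```

The result is a flat list of 2D points that can be treated like a single laser scan.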
In our work we chose to use only a 2D laser scanner as the sensor, which could considerably reduce the price of future intelligent vehicles. At this moment, Li et al. [22] are the only authors to have used this kind of sensor to estimate odometry with a deep learning approach. They propose a network to perform scan matching and loop closure using CNNs. Although the loop closure network shows good accuracy, the scan matching results are still very inaccurate compared to classic methods.
Based on the idea presented in [22], we propose a new solution for 2D laser-based odometry estimation using deep learning networks. We explore the use of an RNN along with CNNs to learn temporal features in order to improve the odometry results. We also propose a new configuration for the CNNs with which we were able to achieve better results. Finally, we explore this solution in outdoor environments, training and testing it with the KITTI [4] dataset, which contains sequences of different types of scenarios.
The proposed approach consists in finding the vehicle displacement by estimating the transformation between a sequence of 2D laser scanner acquisitions. From two consecutive observations, where each observation is a 360° set of points measured during one laser rotation, the network predicts the transformation T = [Δd, Δθ], which represents the travelling distance Δd and the orientation change Δθ between two consecutive laser scans (s_{t−1}, s_t).
Fig. 2: Data encoding for the 2D laser scanner with a 360° rotation range. At each time step the raw data of the laser scanner is separated into 0.1° bins.
The average depth of all the points in a specific bin is calculated and stored into a vector. The result is a vector of size 3601 that stores the depth value for each bin.
We only consider the 2D displacement of the vehicle, since we are relying only on a 2D sensor. Therefore, the goal is to learn the optimal function g(.), which maps (s_{t−1}, s_t) to T at time t:
T_t = g(s_{t−1}, s_t)    (1)
By learning these parameters, the 2D pose (x_t, y_t, θ_t) at time t is obtained.
In this way, we can accumulate the local poses of the vehicle and estimate the global position of the vehicle at any time t. Since the algorithm does not perform any sort of loop closure, drift also accumulates, reducing the accuracy of the vehicle's localization.
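The accumulation of local poses described above can be sketched as simple dead reckoning. The paper's own pose-update equation is not reproduced in this copy, so the convention used here (apply the rotation Δθ first, then move Δd along the new heading) and the name `accumulate_poses` are assumptions for illustration only.

```python
import math

def accumulate_poses(transforms, start=(0.0, 0.0, 0.0)):
    """Integrate per-scan transforms T = (delta_d, delta_theta) into global 2D poses.

    transforms: iterable of (delta_d in meters, delta_theta in radians).
    Returns a list of (x, y, theta) poses, beginning with the start pose.
    Assumed convention: rotate by delta_theta, then translate along the new heading.
    """
    x, y, theta = start
    poses = [(x, y, theta)]
    for delta_d, delta_theta in transforms:
        theta += delta_theta
        x += delta_d * math.cos(theta)
        y += delta_d * math.sin(theta)
        poses.append((x, y, theta))
    return poses
```

Because every pose is built on the previous one, any error in a single (Δd, Δθ) estimate is carried into all later poses, which is the drift mentioned in the text.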
The following subsections present the proposed method in detail. First, we show how the raw data of the laser scanner is encoded. Then we present the configuration of the network and the specifics of the training process.
We base our data encoding on the previous work [22], where the 2D laser scanner point set is encoded into a 1D vector.
This is done by first binning the raw scans into bins of 0.1° resolution. Then, since a group of points can fall into the same bin, we calculate the average depth value of this group.
Finally, considering all the bins of a 360° rotation range, we store the depth values into a vector of size 3601, where each element represents the average depth at one possible bin angle. This process is presented in Figure 2.
Once we have processed two 1D vectors from sequential scans, we concatenate them to use as input for the network.
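The encoding steps above (0.1° binning, per-bin average depth, a 3601-element vector, and pairing two scans) can be sketched as follows. The function names, the `(x, y)` point format, rounding to the nearest bin, and filling empty bins with 0.0 are assumptions; the text does not specify how empty bins are handled.

```python
import math

NUM_BINS = 3601      # vector size stated in the text (0.0°, 0.1°, ..., 360.0°)
BIN_RES_DEG = 0.1    # bin resolution stated in the text

def encode_scan(points):
    """Encode (x, y) laser points as a 3601-element vector of mean depth per bin."""
    sums = [0.0] * NUM_BINS
    counts = [0] * NUM_BINS
    for x, y in points:
        angle = math.degrees(math.atan2(y, x)) % 360.0  # bearing in degrees
        depth = math.hypot(x, y)
        idx = int(round(angle / BIN_RES_DEG))           # nearest bin, 0..3600
        sums[idx] += depth
        counts[idx] += 1
    # Assumption: bins that received no points are filled with 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def make_network_input(prev_scan, curr_scan):
    """Stack two encoded scans into a 2 x 3601 nested list (the network input)."""
    if len(prev_scan) != NUM_BINS or len(curr_scan) != NUM_BINS:
        raise ValueError("each encoded scan must have 3601 elements")
    return [list(prev_scan), list(curr_scan)]
```

A typical use would be `make_network_input(encode_scan(scan_prev), encode_scan(scan_curr))`, which yields the 2×3601 structure mentioned in the assignment notes.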
In this way, we create a form of image of size 2×3601 that represents two acquisitions of the laser scanner.
This format allows the use of standard convolutional layers to extract the features detected by the sensor in the surrounding environment.
B. Network Architecture
Previous work on laser-based odometry estimation [21][22][23], for 2D and 3D sensors, only explored the use of CNNs without any deep temporal structure to estimate the local poses.
We propose an architecture based on previous Visual Odometry (VO) methods, such as DeepVO [15], that not only extracts the features of the input data but also estimates the possible connections among consecutive image inputs.
The use of RNNs is convenient for this purpose because of their ability to model sequential dependencies.
By adding this to our model, we aim to increase the accuracy of the pose estimation by implicitly using information that was detected by the laser scanner in previous frames.
This process can be compared to graph-based approaches in classical SLAM methods [24], which take a sequence of pose features and output a more precise estimation of these poses.
Figure 3 presents the architecture of the proposed network.
1. Two pre-processed laser scanner acquisitions, each represented as a one-dimensional vector of size 3601, are concatenated to create the input tensor of the network.
2. Next, the tensor is fed into the sequence of 1D convolutional and average pooling layers to learn the features between the two acquisitions. Then, at each new data acquisition, these features are received by the RNN to estimate new poses.
3. We represent the 2D motion by two variables: the translation (travelling distance Δd) and the rotation (Δθ).
The main goal of this process is to use the neural network to learn the features of the laser scanner data while simultaneously matching them using the proposed combination of CNN and RNN. Our network is inspired by the configuration used by DeepVO, which tries to achieve the same goal but uses camera images as input.
Since our data is considerably smaller, we adapted the configuration for this purpose. This new configuration is presented in Table I.
It has 6 1D convolutional layers, where each layer is followed by a rectified linear unit (ReLU) activation.
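To get a feel for how the sequence length shrinks through the convolutional stack, the sketch below applies the standard 1D convolution and pooling length formulas. Six convolutions, kernel size 3, and an average pool between each pair of convolutions come from the text; Table I is not reproduced in this copy, so stride 1, padding 1 and pool size 2 are assumptions, as are the function names.

```python
def conv1d_out_len(n, kernel=3, stride=1, padding=1):
    """Output length of a 1D convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

def avg_pool_out_len(n, pool=2):
    """Output length of 1D average pooling with stride equal to the pool size."""
    return n // pool

def feature_length(n=3601, num_convs=6, convs_per_pool=2):
    """Length along the scan axis after the assumed conv/pool stack.

    A pool is applied after every second convolution except the last one,
    i.e. between each pair of convolutional layers.
    """
    for i in range(1, num_convs + 1):
        n = conv1d_out_len(n)
        if i % convs_per_pool == 0 and i < num_convs:
            n = avg_pool_out_len(n)
    return n
```

Under these assumed settings the 3601 bins are halved twice before the features reach the LSTM; different strides or pool sizes in Table I would change this figure.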
Between each sequence of two convolutional layers, there is also one average pooling layer. We added the pooling layers to reduce the computational complexity by extracting the most important features. We tested both max and average pooling and obtained better results with the average pooling layer.
Also, considering the size of the input and the size of the features we can capture with this kind of sensor, we chose to use only kernels of size 3 after testing other configurations such as kernels of size 5 and 7.
After the features are learned, the output of Conv6 is passed to the RNN for sequential modelling. As our RNN we use Long Short-Term Memory (LSTM) units, which are able to learn long-term dependencies [25].
For this reason, we use a configuration of two LSTM layers, with the hidden states of the first LSTM being used as input for the second one. The two layers are defined with 1024 hidden states. Finally, the last LSTM layer outputs two values, the rotation and the translation, at each time step.
C. Training
The goal of using RNNs is to discover temporal correlations between the sequence of laser scans. While in principle the RNN is a simple and powerful model, in practice it can be hard to train it properly to converge to precise results [26]. For this reason, the training of the network was performed in two steps. First, we trained the sequence of CNNs separately from the RNN. The objective of the first training is to pre-train the convolutional layers using no temporal information, only considering the information obtained from the two sequential laser scans given as input. Once we obtained the pre-trained weights for the CNN part of the network, we trained the complete network as presented in Fig. 3.
1) CNN Pre-Training: To perform the pre-training, the output of the sequence of CNNs is fed to two different fully connected layers, one to estimate the rotation and another the translation. We then train this convolutional network to learn the optimal function g(.) presented in Eq. 1.
In [23] the authors suggested that designing the network to regress the relative translation and rotation worked well only for translation, while the predicted rotation was still inaccurate. They were able to obtain better results by reformulating the problem as a classification task for the rotation, and continuing as a regression for the translation. This is possible because the range of possible rotations between consecutive frames is reasonably small. Considering this, we tested two configurations to pre-train our network: as a regression-only task, and as a regression and classification task.
For the regression-only task, we performed the training based on the Euclidean loss between the ground truth and the estimated translation and rotation values, which defines the complete loss function of Eq. (3).
For the classification task, we instead used the cross-entropy loss function to classify the angle, which defines the new complete loss function of Eq. (4).
In (3) and (4), Δd and Δθ are the relative ground-truth translation and rotation values, and Δd̂ and Δθ̂ are their counterparts output by the network. We use the parameter β > 0 to balance the scale difference between the rotation and translation loss values. Considering all the possible variations of angle between two frames, we created classes for the interval ±5.6° with 0.1° resolution, resulting in 112 possible classes. As indicated in [23], the results for rotation as a classification task were better than with regression only, and for this reason we used the CNN network trained as a classification for angle estimation as input for the RNN.
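The two pre-training configurations can be sketched in plain Python. Equations (3) and (4) are not reproduced in this copy, so the loss forms below (squared translation error plus a β-weighted rotation term, with either a squared error or a softmax cross-entropy for the rotation) are an assumed reading of the description, and all function names are hypothetical. The class layout (±5.6° at 0.1° resolution, 112 classes) is from the text.

```python
import math

ANGLE_LIMIT_DEG = 5.6   # class interval is [-5.6°, +5.6°] (from the text)
ANGLE_RES_DEG = 0.1     # class resolution (from the text)
NUM_CLASSES = 112       # 11.2° / 0.1° (from the text)

def angle_to_class(delta_theta_deg):
    """Map a rotation in degrees to a class index 0..111 (out-of-range values are clipped)."""
    clipped = min(max(delta_theta_deg, -ANGLE_LIMIT_DEG), ANGLE_LIMIT_DEG)
    # The small epsilon guards against float error at exact bin edges.
    idx = int(math.floor((clipped + ANGLE_LIMIT_DEG) / ANGLE_RES_DEG + 1e-9))
    return min(idx, NUM_CLASSES - 1)

def class_to_angle(idx):
    """Return the center angle in degrees of a class index."""
    return -ANGLE_LIMIT_DEG + (idx + 0.5) * ANGLE_RES_DEG

def regression_loss(d_true, theta_true, d_pred, theta_pred, beta):
    """Assumed form of the regression-only loss: squared errors, rotation weighted by beta."""
    return (d_pred - d_true) ** 2 + beta * (theta_pred - theta_true) ** 2

def classification_loss(d_true, class_true, d_pred, logits, beta):
    """Assumed form of the mixed loss: squared translation error plus
    beta times the softmax cross-entropy of the rotation class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    cross_entropy = log_sum_exp - logits[class_true]
    return (d_pred - d_true) ** 2 + beta * cross_entropy
```

In both cases β only rescales the rotation term, so its value would need tuning so that neither term dominates the total loss.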
2) RCNN Training: After initializing the weights of the CNN layers at the pre-training stage, we train the complete RCNN network. We define the input of the RNN as the output of the CNN layers concatenated with the estimated rotation and translation obtained from the pre-trained network, so that this estimate can be used as a first estimation for the RCNN to refine the results.
For the RCNN we also tested the two different configurations (regression, and regression with classification). In this case, unlike in the pre-training, we obtained more precise results when treating the entire task (rotation and translation) as a regression problem, therefore using the loss function in (3).
We presume that this occurs because the estimation of the rotation as a regression was easier once we had a first estimation of the class from the pre-trained CNNs, together with the possible information obtained by the RNN from previous frames.
TABLE II: RMSE translation drift results for the two testing sequences, along with the computation time per frame without GPU acceleration. We show the difference between the use of only CNNs and the RCNN; the results for both RCNNs use the pre-trained CNN-Classification network. In addition, we present the error for the different training configurations presented in Subsection III-C. We also compare the proposed approach to two other Deep Learning odometry estimation methods, one using a mono-camera as the sensor [15] and the other using a 3D LiDAR [23]. However, the result presented for the method [23] is for a training dataset, since they chose different sequences for testing.
For validation we use the KITTI dataset [4], which provides several sequences in outdoor environments under different conditions. To obtain a 2D laser scanner dataset, we extracted one 360° layer from the Velodyne data. As mentioned before, the layer is extracted to simulate a low-cost 2D laser scanner, which may be an important factor for the autonomous driving industry in keeping the price of future intelligent vehicles reasonable.
We use 10 sequences from the KITTI odometry dataset for the proposed method. In this dataset there are 11 sequences, however we eliminate the sequence 01 that consists of a trajectory mainly on a highway. During this sequence the 2D laser scanner detects almost no obstacle ... impossible (omitted).
Among the 10 sequences, we separate 8 for training and 2 for testing. We use for training the sequences 00, 02, 03, 04, 06, 08 and 09, and for testing the sequences 05 and 07. We chose these two sequences because they are not very long, leaving more data for training, yet they are still challenging and show the potential of the proposed method. Additionally, during these two sequences we noticed that most of the time the simulated 2D laser scanner is able to detect obstacles, making it possible for the network to work as expected.
In order to validate our method and compare it to other solutions, we calculated the drift according to the KITTI VO [4] evaluation metrics, i.e., the averaged Root Mean Square Errors (RMSEs) of the translational error for all subsequences (100, 200, …, 800 meters). We adapt the proposed method to calculate the error only in 2D, since we only obtain 2D poses. The error score is then calculated as the mean of all subsequence errors.
Table II presents the error score for the testing sequences 05 and 07. The difference in scores between classification and regression shows why we chose to pre-train the convolutional layers as a classification task for the angle estimation, but to treat it as a regression task when we trained the entire RCNN. These results suggest that estimating the angle as a regression task was too hard for the network to learn; however, once we had a first estimation of the class in the training of the RCNN, we were able to refine the value and obtain a more precise angle.
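A 2D version of the subsequence-based drift score described above might look like the following. This is a simplified sketch rather than the official KITTI devkit code: the pose format `(x, y, theta)`, the start-frame spacing `step`, and the function names are assumptions, and the score is returned as a fraction of distance travelled (multiply by 100 for a percentage).

```python
import math

def relative_pose(a, b):
    """Express pose b in the frame of pose a; poses are (x, y, theta) tuples."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = math.cos(a[2]), math.sin(a[2])
    return (c * dx + s * dy, -s * dx + c * dy, b[2] - a[2])

def translational_drift(gt, est, lengths=tuple(range(100, 801, 100)), step=10):
    """Mean translational error per meter over all subsequences of the given lengths.

    gt, est: equally long lists of ground-truth and estimated (x, y, theta) poses.
    Returns None if the trajectory is too short for any subsequence.
    """
    # Cumulative distance travelled along the ground-truth path.
    dist = [0.0]
    for p, q in zip(gt, gt[1:]):
        dist.append(dist[-1] + math.hypot(q[0] - p[0], q[1] - p[1]))
    errors = []
    for first in range(0, len(gt), step):
        for length in lengths:
            # First frame at least `length` meters after the start frame.
            last = next((i for i in range(first, len(gt))
                         if dist[i] - dist[first] >= length), None)
            if last is None:
                continue
            rel_gt = relative_pose(gt[first], gt[last])
            rel_est = relative_pose(est[first], est[last])
            err = relative_pose(rel_gt, rel_est)
            errors.append(math.hypot(err[0], err[1]) / length)
    return sum(errors) / len(errors) if errors else None
```

Comparing relative motion over each subsequence, instead of absolute positions, keeps an early error from being counted again in every later subsequence.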
We also compare in Table II the proposed network to two other Deep Learning approaches for odometry estimation: one using a mono-camera and another a 3D laser scanner.
We chose these two approaches because they use the same dataset, allowing us to compare the drift results, and because their computational time is presented by the KITTI benchmark.
Our approach has better results than the network using only mono-camera images, but slightly worse results than the 3D laser scanner method. The better result in [23] is expected, since in our experiments we are extracting only one layer of the laser scanner, which provides a significantly smaller amount of information. In addition, as the sensor is on top of the vehicle and we always extract the most parallel layer in relation to the vehicle, there are some frames where nothing or not much is detected, causing possible wrong estimations of translation and rotation.
However, it is important to mention that the 3D laser scanner approach [23] takes 0.23 s per frame with GPU acceleration, while our approach takes only 0.015 s per frame without GPU acceleration (2.6 GHz Intel Core i5, Intel Iris 1536 MB), and it can be as fast as 0.001 s per frame with GPU acceleration (4.0 GHz Intel Core i7, GeForce GTX 1060); i.e., a 230-fold increase in speed (GPU configuration), while obtaining only a 0.49% difference in drift score and using a much cheaper sensor. The faster processing is expected since our input data is considerably smaller, and it allows odometry to be estimated in real time with simple computational resources.
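The timing comparison is simple arithmetic on the per-frame times quoted above, worked through here as a quick check. The helper names are introduced only for this example.

```python
T_3D_LIDAR_GPU = 0.23   # s per frame, 3D laser scanner method [23], with GPU
T_OURS_CPU = 0.015      # s per frame, proposed method, no GPU
T_OURS_GPU = 0.001      # s per frame, proposed method, with GPU

def speedup(baseline_s, ours_s):
    """How many times faster `ours_s` is than `baseline_s`."""
    return baseline_s / ours_s

def frames_per_second(seconds_per_frame):
    """Throughput implied by a per-frame processing time."""
    return 1.0 / seconds_per_frame

print(round(speedup(T_3D_LIDAR_GPU, T_OURS_GPU)))     # GPU vs GPU: 230
print(round(speedup(T_3D_LIDAR_GPU, T_OURS_CPU), 1))  # CPU-only vs GPU baseline: 15.3
print(round(frames_per_second(T_OURS_CPU), 1))        # CPU-only throughput: 66.7
```

Even without a GPU the method processes about 67 frames per second, which leaves a wide margin if the scans arrive at roughly 10 Hz (an assumed sensor rate, not stated in the text).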
We can observe in Figure 4 that even with the occasional errors that can occur, we still obtain a trajectory close to the ground truth. However, since the proposed approach does not perform any sort of loop closure, a single large error is carried forward over time, generating a large drift like the one at the end of sequence 07.
For this reason, a better way to understand the accuracy of the proposed approach is presented in Figure 5. It shows the odometry estimation (rotation and translation) together with the ground truth for each frame of the testing sequences.
Considering these two sequences, the average absolute rotation error is 0.05 degrees, while the average absolute translation error is 0.02 meters. However, we can encounter errors of up to 0.4 degrees in rotation and 0.2 meters in translation in frames where it is harder to estimate the odometry. These values show that the network estimates accurate odometry most of the time, although there are still some difficult cases that can result in inaccurate values.
The results show that the proposed method is promising and could be used as a complement to traditional localization methods for intelligent vehicles or any mobile robot, for example when there are no wheel encoders or no GPS signal. We can also expect that if the sensor were located in an ideal position, for example as a set of 2D laser scanners around an autonomous car at bumper level, we could obtain even better results. It is also important to mention that we trained the network with a relatively small dataset compared to other deep learning applications, so the result could be improved by using more sequences for training.
V. CONCLUSION AND FUTURE WORK
In this paper we presented a novel approach based on RCNNs to estimate odometry using only the data of a 2D laser scanner. The combination of CNNs and RNNs allows us to extract scan features in real time and learn their sequential model to obtain the localization of an intelligent vehicle. The proposed network shows that the use of 2D laser scanners not only provides good accuracy with a low-cost sensor, but also requires fewer computational resources to achieve real-time performance.
The results were evaluated using the KITTI odometry dataset, making it possible to compare them with other Deep Learning approaches. Although the results were competitive with this type of approach, we still do not expect deep learning methods to replace classic approaches at this moment, since the classic approaches still provide better accuracy and a better understanding of the quality of the results.
However, the proposed approach could be an interesting complement to classic localization estimation methods, since it can run in real time and could give relatively accurate values in systems where no wheel encoder data is provided or the GPS signal is absent.
Moreover, the proposed method shows a promising use of Neural Networks to understand the environment detected by a 2D laser scanner, since we can assume that the network learned the best features to match between different scans.
In future work, we expect that better results could be obtained after training the network with a dataset that has real 2D laser scanners located at a better position on the vehicle. Additionally, the use of more than one sensor could be explored to increase the accuracy of the results. For example, the use of a mono-camera alone was not able to give precise results, but these results could possibly be improved by creating a network that performs the fusion with a 2D laser scanner.
BONUS: train the network with the following 2D dataset: https://strands.readthedocs.io/en/latest/datasets/mht_rgbd.html