|dc.description.abstract||This dissertation introduces a measure of navigation complexity for complete coverage path planning (CCPP). It also introduces a novel approach to CCPP using reinforcement learning (RL) to enable a mobile robot to cover an area, subject to constraints on time and environmental complexity. Target applications are those with many repeated obstacles and hard time constraints, such as cleaning an airline cabin during the gate time between flights.
For navigation complexity measurement, the challenge is to consider various environmental factors together to measure the difficulty of navigating through the bounded space under consideration. For two-dimensional coverage problems, a low-complexity environment could be pictured as an area bounded by a rectangle or circle containing no obstacles, while a high complexity environment might be represented as a region bounded by a complex contour and filled with many randomly placed obstacles of random shapes and sizes.
The size of environments and the numbers of corners on maps are commonly taken into consideration. However, the size of robots to conduct CCPP navigation, the number of obstacles and the location of obstacles on maps are factors that cannot be ignored.
To address the aforementioned challenge, I propose a method consistent with the Shannon Entropy Formula. Four environmental factors are considered as inputs in our navigation complexity measurement. These four inputs include the spatial area to be covered, robot size, number of obstacles, and obstacle locations.
The experimental results show that our approach enables the computation of navigation complexity in a variety of environments, and that the results are intuitively consistent with human observation. This approach provides a comprehensive complexity measurement as a reference for CCPP performance analysis.
For map coverage using RL, the framework trains the robot in a simulated environment to move to uncovered areas and to avoid frequent collisions using rewards. Additionally, it encourages the robot to complete map coverage missions efficiently and quickly.
I select the Machine-Learning Agent provided by Unity3D to build a fragment (sample cell) of an airline cabin environment in which to train the robot. I implement Proximal Policy Optimization as the main training network, and added curiosity functions (i.e., intrinsic rewards) to encourage the robot to explore uncovered areas during training. I use Generative Adversarial Imitation Learning to guide the training policy’s convergence close to the expert data.
Experimental results show that the optimal policy enables complete map coverage in complicated environments. I provide demonstrations comparing random motion methods to reinforcement learning networks to show differences in map coverage, trajectory length, and time-cost.||en_US