Studying the Applications of Probability Metrics and Divergence Measures in Solving Classic Control Tasks

Aima, Kartik

Metadata Field	Value	Language
dc.contributor.advisor	Biaz, Saad
dc.contributor.author	Aima, Kartik
dc.date.accessioned	2020-11-16T19:49:38Z
dc.date.available	2020-11-16T19:49:38Z
dc.date.issued	2020-11-16
dc.identifier.uri	http://hdl.handle.net/10415/7479
dc.description.abstract	Choosing the correct statistical distance for a machine learning problem is vital when estimating the degree of dissimilarity between two discrete distributions. In the distributional reinforcement learning problem, the distribution of returns that an agent can obtain is approximated across the entire state space. To describe the expected behavior of the agent as it interacts with the environment in the distributional setting, the work introducing the C51 algorithm initially proposed the Wasserstein distance because of the convergence guarantees it offers for the policy evaluation problem. However, because the Wasserstein distance produces biased sample gradients, the KL divergence was ultimately used as the categorical loss function in the C51 algorithm. In this thesis, we studied two potential classes of statistical distances and empirically compared their performance as categorical loss functions in the C51 algorithm against the KL divergence. The first class comprised probability metrics, such as the Sinkhorn divergence and the Energy distance, which attempt to alleviate the poor sample and computational complexity of the exact Wasserstein distance. The second class comprised divergence measures that are instances of both the f-divergence and α-divergence families. We studied the training-time and testing-time performance of these variants on the Acrobot and CartPole environments. We demonstrated that the statistical distances most suitable for approximating value distributions in these environments were divergence measures that possess the zero-avoiding property, or a combination of the zero-avoiding and zero-forcing properties. Strictly zero-forcing divergence measures were unsuitable as categorical loss functions in these environments.
The Sinkhorn divergence was ill-suited to serve as a categorical loss function, whereas the Energy distance showed evidence of learning in these environments, although its training performance fell well short of that of the most successful divergence measures. This indicated that if an optimal-transport-based categorical loss function were to be used in the C51 algorithm, maximal entropic regularization would have to be applied, since stronger entropic regularization moves the Sinkhorn divergence away from the exact Wasserstein distance and toward an Energy-distance-type MMD.	en_US
dc.subject	Computer Science and Software Engineering	en_US
dc.title	Studying the Applications of Probability Metrics and Divergence Measures in Solving Classic Control Tasks	en_US
dc.type	Master's Thesis	en_US
dc.embargo.status	NOT_EMBARGOED	en_US
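The abstract notes that C51 ultimately used the KL divergence as its categorical loss. As a rough illustration of where such a loss sits, the following is a minimal NumPy sketch of the C51 categorical projection followed by a KL loss between the projected target and a predicted distribution. The support bounds (V_MIN, V_MAX), the 51-atom count, and the function names project_target and kl_categorical_loss are illustrative assumptions, not values or code taken from the thesis.

```python
import numpy as np

# Assumed C51-style support: illustrative values only; the thesis's actual
# bounds and atom count for Acrobot/CartPole are not stated in this record.
V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)


def project_target(next_probs, reward, gamma, done):
    """Project the distribution of r + gamma * Z(s', a*) onto the fixed ATOMS."""
    # Apply the Bellman update to every atom, then clip to the support.
    tz = np.clip(reward + (1.0 - done) * gamma * ATOMS, V_MIN, V_MAX)
    b = (tz - V_MIN) / DELTA_Z  # fractional index of each updated atom
    lower = np.clip(np.floor(b).astype(int), 0, N_ATOMS - 1)
    upper = np.clip(np.ceil(b).astype(int), 0, N_ATOMS - 1)
    target = np.zeros(N_ATOMS)
    for j in range(N_ATOMS):
        if lower[j] == upper[j]:
            # The updated atom lands exactly on a support point.
            target[lower[j]] += next_probs[j]
        else:
            # Split the mass linearly between the two neighbouring atoms.
            target[lower[j]] += next_probs[j] * (upper[j] - b[j])
            target[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return target


def kl_categorical_loss(target, pred, eps=1e-8):
    """KL(target || pred); eps guards against log(0) in this sketch."""
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))


# Usage: uniform next-state distribution, reward 1, gamma 0.9, non-terminal.
next_probs = np.full(N_ATOMS, 1.0 / N_ATOMS)
target = project_target(next_probs, reward=1.0, gamma=0.9, done=0.0)
print(target.sum(), target @ ATOMS)  # total mass and mean (both close to 1.0)
print(kl_categorical_loss(target, next_probs))
```

Because the projection spreads each updated atom's mass linearly over its two neighbours, it preserves total mass and, when no clipping occurs, the mean of the target. Replacing kl_categorical_loss with another statistical distance is the kind of substitution the thesis studies.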
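The contrast the abstract draws between the Sinkhorn divergence and the Energy distance can be sketched for two categorical distributions on a shared support. This is a hedged illustration, not the thesis's implementation: it assumes an absolute-difference ground cost |z_i - z_j|, uses the transport cost of the entropy-regularized plan (other formulations also add the entropy term), and the names entropic_transport_cost and sinkhorn_divergence are hypothetical.

```python
import numpy as np

# Assumed support shared by both categorical distributions (illustrative).
ATOMS = np.linspace(-10.0, 10.0, 51)
COST = np.abs(ATOMS[:, None] - ATOMS[None, :])  # ground cost |z_i - z_j|


def energy_distance(p, q, cost=COST):
    """2 E|X - Y| - E|X - X'| - E|Y - Y'| for categorical p, q on ATOMS."""
    return float(2 * p @ cost @ q - p @ cost @ p - q @ cost @ q)


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def entropic_transport_cost(a, b, eps, cost=COST, n_iter=500):
    """<pi, C> for the entropy-regularized plan, via log-domain Sinkhorn."""
    keep_a, keep_b = a > 0, b > 0  # drop zero-mass atoms so log(0) never occurs
    a, b, c = a[keep_a], b[keep_b], cost[np.ix_(keep_a, keep_b)]
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros(a.size), np.zeros(b.size)
    for _ in range(n_iter):
        # Alternate dual-potential updates so the row and column marginals match.
        f = -eps * _logsumexp((g[None, :] - c) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - c) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - c) / eps + log_a[:, None] + log_b[None, :]
    return float(np.sum(np.exp(log_plan) * c))


def sinkhorn_divergence(p, q, eps, cost=COST):
    """Debiased Sinkhorn divergence: subtracts the entropic self-transport terms."""
    return (entropic_transport_cost(p, q, eps, cost)
            - 0.5 * entropic_transport_cost(p, p, eps, cost)
            - 0.5 * entropic_transport_cost(q, q, eps, cost))


# Usage: two hypothetical return distributions with different means and widths.
p = np.exp(-0.5 * (ATOMS - 2.0) ** 2)
p /= p.sum()
q = np.exp(-0.5 * (ATOMS + 1.0) ** 2 / 4.0)
q /= q.sum()
print(energy_distance(p, q), sinkhorn_divergence(p, q, eps=1e5))
```

With this cost, the debiased Sinkhorn divergence moves toward the Wasserstein-1 distance as eps shrinks (at the price of more Sinkhorn iterations) and toward half the Energy distance as eps grows, which is one way to read the abstract's point about maximal entropic regularization.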
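The zero-avoiding and zero-forcing distinction in the abstract can be seen on a toy example. This sketch assumes the α-divergence parametrization D_α(p‖q) = (Σ p^α q^(1−α) − 1)/(α(α − 1)), in which α → 1 recovers the forward KL(p‖q) and α → 0 the reverse KL(q‖p); the thesis may use a different parametrization, and the five-atom target and the two candidate predictions are hypothetical.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) with 0 log 0 = 0, and +inf if p has mass where q has none."""
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def alpha_divergence(p, q, alpha):
    """D_alpha(p || q) = (sum p^alpha q^(1 - alpha) - 1) / (alpha (alpha - 1))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if alpha == 1.0:
        return kl(p, q)  # limit alpha -> 1: forward KL
    if alpha == 0.0:
        return kl(q, p)  # limit alpha -> 0: reverse KL
    active = (p > 0) | (q > 0)  # atoms where both are zero contribute nothing
    with np.errstate(divide="ignore"):  # 0 ** negative -> inf is intended here
        s = np.sum(p[active] ** alpha * q[active] ** (1.0 - alpha))
    return float((s - 1.0) / (alpha * (alpha - 1.0)))


# Hypothetical bimodal target p and two candidate predictions.
p = np.array([0.5, 0.0, 0.0, 0.0, 0.5])
q_cover = np.full(5, 0.2)                     # spreads mass over every atom
q_mode = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # commits to a single mode

for alpha in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(alpha, alpha_divergence(p, q_cover, alpha), alpha_divergence(p, q_mode, alpha))
```

With p as the target and q as the prediction, α ≥ 1 returns infinity for the prediction that drops one of the target's modes (zero-avoiding), α ≤ 0 returns infinity for the prediction that puts mass where the target has none (zero-forcing), and α = 0.5 (a scaled squared Hellinger distance) stays finite for both.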

Files in this item

Name: Kartik Aima, MS Thesis.pdf
Size: 2.619 MB
