This Is Auburn Electronic Theses and Dissertations

Improving Programming Productivity with Statistical Models




Nguyen, Tam

Type of Degree

PhD Dissertation


Computer Science and Software Engineering


Modern applications often need very short time-to-market and upgrade cycles and, thus, short development time. To meet this requirement, application developers often rely heavily on API frameworks and libraries. However, learning how to use API methods and objects is often challenging because API frameworks change quickly and API documentation and source code examples are insufficient. In our research, we focus on API usage patterns, i.e., patterns that occur frequently when API objects and methods are used. Specifically, we develop statistical models that capture API usage patterns and use those models to improve programming productivity.

This dissertation makes three contributions. The first is Salad, a novel approach to the problem of learning mobile API frameworks. Salad can learn complex API usages involving several API objects and methods. It learns these usages from the bytecode of Android mobile apps, of which millions are publicly available. The main components of Salad are HAPI, a statistical model of API usages, and three algorithms: one to extract method call sequences from apps’ bytecode, one to train HAPI on those sequences, and one to recommend method calls using the trained HAPIs. Salad can automatically generate recommendations for incomplete API usages; thus, it could reduce the chance of API usage errors and improve code quality.

The second contribution is FuzzyCatch, a code recommendation approach for exception handling. FuzzyCatch learns usage rules involving API methods, exceptions, and handling actions from a large collection of publicly available, high-quality apps. Based on fuzzy logic rules learned from thousands of those apps, FuzzyCatch can predict whether a runtime exception could occur in a code snippet.
Then, at the programmer's request, it can generate a try-catch statement whose catch block contains code to catch that exception and the exception handling actions to recover from it.

The final contribution is Persona, a novel code recommendation model that focuses on the personal coding patterns of programmers while also combining project-specific and common code patterns. As a personalized model, Persona is built and updated for each programmer. It is composed of three sub-models: a model that captures the personal code patterns of a programmer; a model that captures the code patterns of the project that the programmer is working on; and a general model that captures code patterns shared across multiple projects. Persona combines the code patterns learned by the three sub-models and uses them to recommend code elements, including variable names, class names, methods, and parameters.