Go with the flow: Data Flow Analysis for Binary Differencing
Date
2014-07-30
Type of Degree
dissertation
Department
Computer Science
Abstract
Differencing is widely used in computer science to quickly identify the differences between two files. While this works well for plain text files such as source code, applying differencing to binary executable files is more difficult. Compiled binary files contain lists of instructions that, when executed, perform operations using functions and data specified in a higher-level programming language such as C++. Syntactic changes to these instructions (changes in the form of an instruction) do not always reflect semantic changes (changes in the behavior of an instruction). Depending on the compiler's settings and optimizations, a series of instructions from one binary executable can perform the same function as a different series of instructions from another binary. These types of differences are difficult to detect using current binary differencing methods. This dissertation explores software reverse engineering, binary differencing, and software semantics versus syntax. We define a framework, which we call Data Flow Binary Differencing, that performs binary differencing by applying data flow analysis and comparing the semantics of the data flow within a pair of functions. We discuss three use cases that illustrate how to implement the Data Flow Binary Differencing framework and show how our technique stands up against challenges faced by other binary differencing techniques. The major contribution of this research is the use of data flow and assembly language semantics to define a method for comparing a pair of functions and testing them for similarities. We also find that testing for semantic rather than syntactic differences within a binary can expose semantic differences introduced by an optimizing compiler.
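
To make the distinction between syntactic and semantic differences concrete, the following is a minimal toy sketch and not the dissertation's framework. It assumes a hypothetical three-field instruction tuple `(opcode, destination, source)`, supports only `mov`, `add`, and `shl`, and ignores flags, memory, and fixed-width wraparound. Each register is tracked as a linear expression over the function's inputs, so two instruction sequences can be compared by the data that flows into the output register rather than by their text. The names `linear` and `summarize` are illustrative only.

```python
def linear(operand, regs):
    """Return the linear form (dict: term -> coefficient) of an operand."""
    if isinstance(operand, int):
        return {"const": operand}  # immediate value
    return dict(regs[operand])     # copy of the register's current expression


def summarize(instrs, inputs, output):
    """Symbolically run a toy instruction list and return the expression
    held by `output`, written in terms of the input registers."""
    regs = {name: {name: 1} for name in inputs}
    for op, dst, src in instrs:
        if op == "mov":
            regs[dst] = linear(src, regs)
        elif op == "add":
            total = linear(dst, regs)
            for term, coef in linear(src, regs).items():
                total[term] = total.get(term, 0) + coef
            regs[dst] = total
        elif op == "shl":
            # Shifting left by an immediate multiplies by a power of two.
            regs[dst] = {t: c * (1 << src) for t, c in regs[dst].items()}
        else:
            raise ValueError(f"unsupported opcode: {op}")
    # Drop zero terms so that equal expressions compare equal.
    return {t: c for t, c in regs[output].items() if c != 0}


# Two syntactically different ways to double the input register.
seq_a = [("mov", "eax", "edi"), ("add", "eax", "eax")]
seq_b = [("mov", "eax", "edi"), ("shl", "eax", 1)]
# A sequence that looks similar but computes something else.
seq_c = [("mov", "eax", "edi"), ("add", "eax", 1)]

syntactically_equal = seq_a == seq_b
semantically_equal = summarize(seq_a, ["edi"], "eax") == summarize(seq_b, ["edi"], "eax")
```

Here `seq_a` and `seq_b` differ as instruction lists yet yield the same expression for `eax`, while `seq_c` yields a different one. A real binary differ would also need to model control flow, memory, and machine-word arithmetic, which this sketch deliberately leaves out.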