Toward Understanding Compiler Bugs in GCC and LLVM

The artifact includes the source code and the dataset (i.e., bug reports, code revision logs and source code of GCC and LLVM) used in this paper. It is packaged as a single VirtualBox virtual machine, and all the dependencies have been set up in the virtual machine.

Download Link: issta16.ova
User/Password: issta16/issta16

Artifact Setup

Download: The virtual machine is available at issta16.ova, packaged in the Open Virtualization Format (OVF). Before downloading, please NOTE that this VM image is *very* large, as it contains over ten years of data for GCC and LLVM. The downloaded image is ~13 GB, and after setup the VM will take ~50 GB.
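Because of these sizes, it may be worth checking free disk space before starting the download. The following is only a sketch: the function name has_enough_space is hypothetical, and adding the two figures together assumes that the downloaded image and the imported VM sit on the same disk at the same time.

```python
import shutil

# Sizes quoted above: ~13 GB for the download, ~50 GB for the imported VM.
# Summing them is an assumption (both on the same disk during import).
REQUIRED_GB = 13 + 50


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the disk holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    if has_enough_space():
        print("Enough free space for the download and the imported VM.")
    else:
        print("Less than ~%d GB free; consider another disk." % REQUIRED_GB)
```

Pass the directory that will hold the image and the VirtualBox machine folder as `path` if it is not the current directory.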

Prerequisite: To set up the VM, you need to install a version of VirtualBox that supports at least the OVF 2.0 standard. The VM is configured with 4 GB of memory, so make sure that your host machine has enough free memory. The VM image was created on Ubuntu 14.04, and should be portable to other platforms.

Setup: Open VirtualBox, then choose the menu File->Import Appliance. Next select the downloaded VM image, and click "Next". Finally click the "Import" button. It will take ~20 minutes for VirtualBox to unpack the image.

Login: Start the VM (this might take several minutes), and select the user issta16, then enter the password issta16.

Artifact Organization

The important folders and files in "/home/issta16" are listed as follows:


Start a terminal, and execute the script "./". The script can take up to 60 minutes (it took ~15 minutes on my machine). When it finishes, all the analysis results will be in the folder "/home/issta16/result".


The result folder "/home/issta16/result" contains two sub-folders, "gcc" and "clang", which store the analysis results for the respective compilers. Below we demonstrate how to interpret the results for a single compiler, "gcc"; the results for "clang" can be interpreted in the same way.

Note that nearly all the plots in our paper are constructed automatically from the results in this folder, and some plots involve complex manipulation of the data, so not every file is easy to interpret intuitively. We therefore pick some important files that are also easy to understand, as follows.

Note that a folder containing a file named "distribution.txt" usually also contains a file named "accu-distrubiton.txt". The latter is the empirical cumulative distribution function of the data in "distribution.txt".
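To illustrate the relationship between the two files, here is a minimal sketch of how an empirical cumulative distribution can be derived from per-bucket counts. It does not read the artifact's files, whose exact layout is not described here; the function name cumulative_distribution and the example counts are hypothetical.

```python
from itertools import accumulate


def cumulative_distribution(counts):
    """Turn per-bucket counts into cumulative fractions between 0 and 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("distribution is empty")
    # Running sum of the counts, each divided by the overall total.
    return [running / total for running in accumulate(counts)]


if __name__ == "__main__":
    # Hypothetical distribution: bucket i holds the number of items with value i + 1.
    example = [5, 3, 2]
    print(cumulative_distribution(example))  # [0.5, 0.8, 1.0]
```

The last entry is always 1.0, since the final running sum equals the total.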

Last update on April 25, 2016.