Find answers to common questions that may arise while you prepare the Artifact Description (AD) and Artifact Evaluation (AE) appendices for your paper submission. If your question is not addressed here or on the Transparency & Reproducibility page, please contact the Transparency & Reproducibility Initiative Committee.
- Are AD and AE appendices required to submit a paper to SC?
- Do I need to make my software open source in order to complete the AD appendix?
- How should I format my AD appendix?
- What information do I need to provide in the AD/AE appendix online form?
- Who will review my appendices?
- How will review of appendices interact with the double-blind review process?
- What is the impact of an AD appendix on scientific reproducibility?
- What is the impact of an AE appendix on scientific reproducibility?
- Why do I need to provide an AE appendix if my paper text explains why I believe my results are correct and shows all my work?
- What are “author created” artifacts and why make the distinction?
- What about proprietary author-created artifacts?
- Are the numbers used to draw charts a data artifact?
- How do I make my data publicly available with a stable identifier if it is huge?
ACM Artifact Badges
- What is an ACM Artifacts Available badge?
- How do I apply for an ACM Artifacts badge?
- What is an ACM Artifacts Evaluated badge?
Are AD and AE appendices required to submit a paper to SC?
An AD appendix is required for all SC paper submissions. An AE appendix is optional but strongly encouraged.
Do I need to make my software open source in order to complete the AD appendix?
No. You are not asked to make any changes to your computing environment in order to complete the appendix. The AD appendix is meant to describe the computing environment in which you produced your results. Any author-created software DOES need to be open source, however, to be eligible for the ACM Artifacts Available badge.
How should I format my AD appendix?
You don’t need to worry about formatting the appendices. During paper submission, you will be presented with an online form whose questions you answer directly on the submission site. After you answer the questions, the system will automatically generate a PDF of the appendix for you.
What information do I need to provide in the AD/AE appendix online form?
Be sure to familiarize yourself with the questions in the AD/AE sample form before writing your paper, and ideally before or while you are producing your results.
Who will review my appendices?
The AD/AE appendices will be reviewed with your paper by the Technical Program Committee, but the artifact URLs will be removed from the version they review, as a precaution in support of double-blind review. In addition, the AD/AE Appendices Committee will review the unredacted appendices and will check that the artifacts are indeed available at the URLs provided. They will also help authors improve their appendices, in a double-open arrangement.
How will review of appendices interact with the double-blind review process?
The AD appendix should describe the data, software and hardware artifacts involved in producing the results. Reviewers could discover the author’s identity if they embark on an online search, but they will be asked not to, in support of double-blind review. Author-provided artifact URLs will be redacted from the appendices provided to the reviewers.
What is the impact of an AD appendix on scientific reproducibility?
As a first step, reproducibility depends on sharing the provenance of results transparently, and the AD appendix is an instrument of documentation and transparency. A good AD appendix helps researchers document their results and helps other researchers build on them.
What is the impact of an AE appendix on scientific reproducibility?
An AE appendix is particularly effective in the case of results obtained using specialized computing platforms that are not available to other researchers. Leadership computing platforms, novel testbeds, and experimental computing environments are of keen interest to the supercomputing community. Access to these systems is typically limited, however. Thus, most reviewers cannot independently check results, and the authors themselves may be unable to recompute their own results in the future, given the impact of irreversible changes in the environment (compilers, libraries, components, etc.). The various forms of Artifact Evaluation improve confidence that computational results from these special platforms are correct.
Why do I need to provide an AE appendix if my paper text explains why I believe my results are correct and shows all my work?
There are many good reasons for formalizing the artifact evaluation process. Standard practice varies across disciplines, and SC is an international, multi-disciplinary conference. Labeling the evaluation as such improves our ability to review the paper and improves reader confidence in the veracity of the results when approaching the work from a different background.
What are “author created” artifacts and why make the distinction?
Author-created artifacts are the hardware, software, or data created by the paper’s authors. Only these artifacts need to be made available to facilitate reproducibility. Proprietary, closed-source artifacts (e.g., commercial software and CPUs) will necessarily be part of many research studies. These proprietary artifacts should be described to the best of the authors’ ability but do not need to be provided.
What about proprietary author-created artifacts?
The ideal case for reproducibility is to have all author-created artifacts publicly available with a stable identifier. Papers involving proprietary, closed-source author-created artifacts should indicate the availability of the artifacts and describe them as much as possible. Note that results dependent on closed-source artifacts are not reproducible and are therefore ineligible for most of the ACM’s artifact review badges.
Are the numbers used to draw charts a data artifact?
Not necessarily. Data artifacts are the data (input or output) required to reproduce the results, not necessarily the results themselves. For example, if your paper presents a system that generates charts from datasets, then providing an input dataset would facilitate reproducibility. However, if the paper merely uses charts to elucidate results, then the input data to whatever tool you used to draw those charts isn’t required to reproduce the paper’s results. The tool that drew the chart isn’t part of the study, so the input data to that tool is not a data artifact of this work.
How do I make my data publicly available with a stable identifier if it is huge?
Use Zenodo. Contact them for information on how to upload extremely large datasets. You can easily upload datasets of 50 GB or less and can have multiple datasets; there is no size limit on communities.
ACM Artifact Badges
What is an ACM Artifacts Available badge?
This badge is applied to papers in which associated artifacts have been made permanently available for retrieval.
How do I apply for an ACM Artifacts badge?
The AD/AE Appendix form will automatically determine eligibility for an ACM Artifacts Available badge on the basis of the answers to questions about the availability of author-created software, hardware or data products. The conditions of eligibility are:
- All author-created software artifacts are maintained in a public repository under an OSI-approved license.
- All author-created hardware artifacts are available and comply with the Open Source Hardware Definition.
- All author-created data artifacts are maintained in a public repository with a stable identifier, such as a DOI.
What is an ACM Artifacts Evaluated badge?
This badge is applied to papers whose associated artifacts have successfully completed an independent audit.
AD/AE Sample Form
Artifact Description (AD) appendices are required for all paper submissions to SC. Artifact Evaluation (AE) appendices are optional but encouraged.
The AD/AE Appendix will be automatically generated at submission time, after the authors respond to an online form. All authors are encouraged to familiarize themselves with the questions in this form well before submission.
Example Script for Machine-Generated Environment Data
To help authors prepare the SC Artifact Description appendix, we provide a sample Bash shell script, collect_environment.sh, that queries the operating system for a variety of information. As written, the script is somewhat Linux-specific, but it should serve to provide ideas for similar scripts that run under other operating systems. The script should be run on one of the nodes that were used to gather the paper’s experimental data (i.e., a compute node, not a head node).
At the time of this writing, collect_environment.sh invokes the following commands:
- lsb_release: report the Linux distribution
- uname: report the kernel version and base architecture
- lscpu (or cat /proc/cpuinfo): report CPU characteristics
- cat /proc/meminfo: report virtual and physical memory capacity
- lspci: list all PCI devices
- lshw: list all hardware devices
- lsblk: list all block devices
- lsscsi: list all SCSI devices
- env: list all environment variables currently in use
- module list: list all loaded Environment Modules
- nvidia-smi: report characteristics and configuration of NVIDIA GPUs
There is some redundancy in the information provided by the preceding tools, but note that not every tool is available on every system. The goal is to acquire as much information as possible on as many types of systems as possible.
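As a rough illustration of the approach described above, the following Bash sketch runs each of the listed commands only if it is present on the system, so that missing tools are skipped rather than aborting the collection. This is not the official collect_environment.sh. The helper names run_if_available and collect_environment, the section-header format, and the specific flags (lsb_release -a, uname -a, lshw -short) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the official collect_environment.sh.
# Function names, output format, and command flags are illustrative assumptions.

# Print a section header, then run the command if it exists on this system.
# A missing or failing command is reported but never stops the collection.
run_if_available() {
    local cmd="$1"
    echo "=== $* ==="
    if command -v "$cmd" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "(not available: $cmd)"
    fi
    echo
}

# Query the system with the tools named in the list above.
collect_environment() {
    run_if_available lsb_release -a      # Linux distribution
    run_if_available uname -a            # kernel version and base architecture
    if command -v lscpu >/dev/null 2>&1; then
        run_if_available lscpu           # CPU characteristics
    else
        run_if_available cat /proc/cpuinfo
    fi
    run_if_available cat /proc/meminfo   # virtual and physical memory capacity
    run_if_available lspci               # PCI devices
    run_if_available lshw -short         # hardware devices (fuller output as root)
    run_if_available lsblk               # block devices
    run_if_available lsscsi              # SCSI devices
    run_if_available env                 # environment variables currently in use
    run_if_available module list         # loaded Environment Modules
    run_if_available nvidia-smi          # NVIDIA GPU characteristics
}

# Run the collection only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    collect_environment
fi
```

Saved under a hypothetical name such as collect_sketch.sh, the sketch could be run on a compute node with its output redirected to a file, for example `bash collect_sketch.sh > environment.txt`. Because env prints every environment variable, it is worth reviewing the output for credentials or tokens before sharing it.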