For general FITS issues, see the information provided by the
FITS Support Office
and the FITS archive at NRAO.
In the following we shall use two mechanisms for verifying the integrity
of a FITS file or the data portion of an individual HDU (Header-Data
Unit): FITS checksums and SHA digests.
FITS Checksum Convention
Rob Seaman and Bill Pence have proposed the establishment of a FITS Checksum convention. The algorithm used is that of a 32-bit one's complement checksum. The proposed convention requires that the checksum over each HDU in a FITS file be -0 (0xffffffff), through the use of two related keywords in the header of the HDU: the value of the keyword DATASUM is a character string containing the decimal representation of the 32-bit one's complement checksum over the data part of the HDU; the value of the keyword CHECKSUM is a 16-character string constructed in such a way that the 32-bit one's complement checksum over the entire HDU is -0. Note that any change to the header affects the value of CHECKSUM, but not the value of DATASUM.
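To make the arithmetic concrete, here is a minimal sketch in C, not the archive's actual code, of such a one's complement sum; it assumes the buffer length is a multiple of four bytes (FITS records always are) and interprets the bytes as big-endian 32-bit words:

    #include <stdint.h>
    #include <stddef.h>

    /* Minimal sketch: 32-bit one's complement sum over a buffer whose
     * length is a multiple of 4, treating the bytes as big-endian words.
     * A complete HDU carrying correct CHECKSUM/DATASUM keywords should
     * sum to 0xffffffff (-0). */
    uint32_t ones_complement_sum(const unsigned char *buf, size_t len)
    {
        uint64_t sum = 0;
        size_t i;
        for (i = 0; i + 3 < len; i += 4) {
            uint32_t word = ((uint32_t)buf[i]     << 24) |
                            ((uint32_t)buf[i + 1] << 16) |
                            ((uint32_t)buf[i + 2] <<  8) |
                             (uint32_t)buf[i + 3];
            sum += word;
            sum = (sum & 0xffffffffu) + (sum >> 32);  /* end-around carry */
        }
        return (uint32_t)sum;
    }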
The FITS checksum provides a fair verification mechanism for detecting I/O and transmission errors. It should be stressed, however, that it is not robust against malicious falsification, nor against a large class of data modifications that leave the checksum unchanged, such as cyclic rotations and byte swapping of file sections that are a multiple of four bytes in size.
All FITS files in the Chandra Data Archive will be properly checksummed using this convention. Though the convention is still only a proposal, it is likely to become, if not a standard, at least accepted practice; various archive collections, such as the RXTE archive, already apply this checksum throughout.
Secure Hash Algorithm Digest
The NIST has formulated a Federal Information Processing Standard for a Secure Hash Algorithm, FIPS 180-1. The SHA computes a 160-bit digest of a message or file; in combination with a secret key, such digests allow the authentication of messages or data.
The SHA digest provides a practically tamper-proof mechanism for data integrity verification and validation. For our purposes there is no reason to work with secret keys, assuming that the AXAF archive is itself secure. Instead, it is sufficient to distribute a tool that can calculate SHA digests over selected file sections and that has the keys built into it. Validation consists of comparing those digests with the ones generated by the archive itself. It is virtually impossible to tamper with a file's contents and/or the keys without affecting the SHA digest.
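For illustration only, a keyed digest may be formed by prepending a secret key to the data before hashing. The sketch below assumes OpenSSL's legacy SHA1() interface (link with -lcrypto); the key and data shown are hypothetical placeholders, not the archive's actual values:

    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    int main(void)
    {
        /* hypothetical built-in key and sample data */
        static const unsigned char key[]  = "hypothetical-built-in-key";
        static const unsigned char data[] = "contents of the HDU data part";
        unsigned char buf[sizeof key - 1 + sizeof data - 1];
        unsigned char md[SHA_DIGEST_LENGTH];
        int i;

        memcpy(buf, key, sizeof key - 1);                  /* key first */
        memcpy(buf + sizeof key - 1, data, sizeof data - 1);
        SHA1(buf, sizeof buf, md);                         /* 160-bit digest */

        for (i = 0; i < SHA_DIGEST_LENGTH; i++)
            printf("%02x", md[i]);
        putchar('\n');
        return 0;
    }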
In this section, the tools and their mechanics are described. The following section will provide directions on how to apply them.
fverify
HEASARC's Ftool fverify will verify the checksum of each HDU in a FITS file, as well as the compliance of each HDU's contents with the FITS standard and various HEASARC conventions.
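A typical invocation, with a hypothetical file name, might look like:

    fverify acis_evt2.fits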
verifyChksum
A tool that verifies the FITS checksum of any FITS file. The code may be obtained from the ASC.
Usage:
verifyChksum [fileName]

Calculates the checksum of fileName; stdin is used if the argument is missing. Returns 0 if the checksum is correct, 0xffffffff for a zero-length file, and otherwise the (non-zero) value of the checksum. This tool works on all architectures.
Compile with:
gcc verifyChksum.c -o verifyChksum
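For example, with a hypothetical file name:

    verifyChksum acis_evt2.fits

A returned value of 0 indicates that the file's checksums are correct and the file most likely arrived intact.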
shaHDU
Calculates the SHA digest (FIPS 180-1) for either the data part of a specified HDU or an entire FITS file. The result should be verified against a prerecorded value. This mechanism is far more robust than the checksum. The digest, as calculated, is not yet portable across architectures; it is guaranteed to work on big-endian machines. The code may be obtained from the CXC.
Usage:
shaHDU [-U] [-i FITSFile] [-e extension] [-f]

  -U            show usage
  -i FITSFile   path to file; default: stdin
  -e extension  HDU number; 0: primary array; -1: all HDUs;
                default: 0 (if primary array exists), 1 (otherwise)
  -f            return SHA digest over entire file; overrides -e option
Compile with:
gcc shaHDU.c -o shaHDU
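For example, to obtain the digest over the data part of the first extension of a hypothetical file:

    shaHDU -i acis_evt2.fits -e 1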
There are various alternatives for validating FITS files, depending on one's requirements and objectives. Below we suggest strategies to answer the three or four most common questions relating to the integrity and authenticity of data files.
Is this file undamaged?
verifyChksum - Will give a fast and simple answer. If the file's checksum is correct, it most likely emerged intact from its transfer.
fverify - Will provide a more thorough check, but it takes longer to run and produces considerably more output.
Are these two HDUs identical?
Assuming that one is really interested in the data part (remember: the headers contain the creation date and are likely to differ if the HDUs were created at different times):
DATASUM - One can simply compare the values of the DATASUM keyword in the two headers. However, one would be well advised to also ensure that the checksums of both HDUs are still valid: someone may have changed the data without updating the checksums. A sketch of this comparison follows after this list.
shaHDU - Comparing the SHA digests for the two HDU data parts will provide a definitive answer as to whether they are identical.
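As a sketch of the DATASUM comparison described above, assuming the CFITSIO library is available (the file names are hypothetical; compile with -lcfitsio):

    #include <stdio.h>
    #include <string.h>
    #include <fitsio.h>

    int main(void)
    {
        fitsfile *f1 = NULL, *f2 = NULL;
        char sum1[FLEN_VALUE], sum2[FLEN_VALUE];
        int dataok, hduok, status = 0;

        /* hypothetical file names; both open at the primary HDU */
        fits_open_file(&f1, "first.fits", READONLY, &status);
        fits_open_file(&f2, "second.fits", READONLY, &status);
        fits_read_key(f1, TSTRING, "DATASUM", sum1, NULL, &status);
        fits_read_key(f2, TSTRING, "DATASUM", sum2, NULL, &status);
        if (status) { fits_report_error(stderr, status); return 1; }

        /* dataok is 1 if DATASUM is correct, -1 if stale, 0 if absent */
        fits_verify_chksum(f1, &dataok, &hduok, &status);
        if (dataok != 1) puts("warning: DATASUM of first file not valid");
        fits_verify_chksum(f2, &dataok, &hduok, &status);
        if (dataok != 1) puts("warning: DATASUM of second file not valid");

        puts(strcmp(sum1, sum2) == 0 ? "data parts match"
                                     : "data parts differ");
        fits_close_file(f1, &status);
        fits_close_file(f2, &status);
        return 0;
    }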
Are these two FITS files identical?
shaHDU - Comparing the SHA digests for the two files will provide a definitive answer as to whether they are identical. One digest may be obtained from the file at hand, the other from the archive's own copy of the file, thus also answering the question: Is this FITS file authentic?
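For example, with a hypothetical file name, one might compare the output of

    shaHDU -f -i acis_evt2.fits

with the digest the archive reports for its own copy of the file.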