Validating the Integrity of Downloaded Files


We’ve been downloading a lot of files lately.  If you are at all security-minded, you may wonder if what you downloaded is correct.  There are many ways to check the integrity of downloaded files; we’ll talk about a few of them here.

Using Hashes to Determine Integrity

At their core, files are simply a sequence of bytes.  When you copy a file from one location to another, it’s possible that some of the bytes could be altered.  This could happen by accident, when the transfer mechanism makes a mistake copying the file.  However, it could also happen intentionally, if a third party sitting between the source and destination of the file changes some of the bytes.  We need a way to verify that a transfer occurred successfully without comparing each byte.

A hash function is an algorithm that takes an arbitrary-length string of bytes (e.g. a file) as input and produces a fixed-length byte string, the hash, as output.  A good hash algorithm has the following properties:

  • Given an output, it’s hard to find an input that produces it.
  • Two inputs that are very similar produce very different outputs.
  • The output is big enough that it is unlikely (but not impossible) for two inputs to have the same output.

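A commonly used hash algorithm is Message Digest 5 (MD5), which generates a 16-byte output hash from an arbitrary input.  MD5 is adequate for catching accidental corruption, but it is no longer considered safe against deliberate tampering because practical collision attacks exist.  The Secure Hash Algorithm 1 (SHA-1) produces a 20-byte output and is harder to attack than MD5, although collisions have been demonstrated for it as well.  Where a project publishes SHA-256 or SHA-512 hashes, prefer those.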
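To make the properties above concrete, here is a minimal sketch using Python’s standard hashlib module.  The helper name hex_digest and the two sample sentences are chosen purely for illustration; they are not part of any tool discussed in this post.

```python
import hashlib

def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the hexadecimal digest of data under the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

if __name__ == "__main__":
    # Two inputs that differ by a single letter.
    first = b"The quick brown fox jumps over the lazy dog"
    second = b"The quick brown fox jumps over the lazy cog"
    for name in ("md5", "sha1", "sha256"):
        digest = hex_digest(name, first)
        # Two hex digits encode one byte, so the byte length is half.
        print(name, "produces", len(digest) // 2, "bytes")
        print("  ", digest)
        print("  ", hex_digest(name, second))
```

Running it shows that the output length depends only on the algorithm, never on the input, and that a one-letter change yields a completely different hash.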

For example, the hashes for the copy of notepad.exe on my development machine are:

MD5:  959a 31d0 cd01 3cea 0c66 db7c 03bc bddf 
SHA1: 1edd cdee b30c 9d76 cb46 ec88 99b0 17d1 6a4f 768d

The hashes are shown in hexadecimal, so each pair of digits is one byte; remember, 0x95=10010101.  Your output may be (very) different if you have a different version of notepad.

Microsoft’s File Checksum Integrity Verifier

If you’re using a Unix-like platform, you already have programs like sha1sum(1) and md5sum(1) to calculate hashes of file contents.  Read the manual pages for more details.  For Windows platforms, you’ll have to download a separate tool.  Microsoft provides the File Checksum Integrity Verifier (FCIV).   Download it and run the installer.

You can run this program against any file you wish.  For example, for notepad.exe:

C:\Users\Lambert>c:\FCIV\fciv.exe -md5 c:\Windows\notepad.exe
//
// File Checksum Integrity Verifier version 2.05.
//
959a31d0cd013cea0c66db7c03bcbddf c:\windows\notepad.exe
C:\Users\Lambert>
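If you would rather not install a separate tool, the same calculation can be sketched in a few lines of Python.  This is an illustrative alternative, not how FCIV itself works; the function name file_digest and the chunk size are assumptions made for the example.

```python
import hashlib

def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Example use (the path is the one from the transcript above):
#   print(file_digest(r"c:\Windows\notepad.exe", "md5"))
#   print(file_digest(r"c:\Windows\notepad.exe", "sha1"))
```

Reading in chunks matters for downloads such as installers or disk images, which can be far larger than available memory.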

Now let’s try it on a downloaded file.  Consider the latest version of the Windows binary zip file of Maven, a project management tool we’ll discuss shortly.   In addition to the zip of the binary, you can also download a .md5 file that contains the hash of the binary.  Grab that file too and compare its contents with the hash of the zip.  (FCIV calculates MD5 by default, so the -md5 switch is omitted here.)

C:\Users\Lambert>c:\FCIV\fciv.exe Downloads\apache-maven-3.3.1-bin.zip
//
// File Checksum Integrity Verifier version 2.05.
//
067d9f8ecd6ff3981e2764e50846da5f downloads\apache-maven-3.3.1-bin.zip
C:\Users\Lambert>type Downloads\apache-maven-3.3.1-bin.zip.md5
067d9f8ecd6ff3981e2764e50846da5f
C:\Users\Lambert>

They match, so we can be reasonably assured we’ve downloaded a good copy.
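Comparing 32 hex digits by eye is error-prone, so the comparison can be automated.  The sketch below assumes the .md5 file holds the hash as its first whitespace-separated token, as in the Maven file shown above; the function name matches_published_md5 is made up for this example.

```python
import hashlib

def matches_published_md5(file_path: str, md5_path: str) -> bool:
    """Return True if file_path hashes to the MD5 value stored in md5_path."""
    with open(md5_path, "r", encoding="ascii") as handle:
        tokens = handle.read().split()
    if not tokens:
        return False  # an empty hash file cannot confirm anything
    published = tokens[0].lower()

    hasher = hashlib.md5()
    with open(file_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == published

# Example use (file names taken from the transcript above):
#   matches_published_md5(r"Downloads\apache-maven-3.3.1-bin.zip",
#                         r"Downloads\apache-maven-3.3.1-bin.zip.md5")
```

Taking only the first token also tolerates hash files written in the "hash  filename" layout that md5sum produces.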

GnuPG

While comparing hashes provides reasonable confidence that the file you retrieved is intact, it doesn’t say anything about the authenticity of the source.  For that we need more sophisticated cryptographic tools and information.  For our Maven example, we can download a .asc file for the binary zip, also available from the download page, that contains a cryptographic signature over the hash of the binary, created by one of the Maven developers using GnuPG.

Cryptography At Work

We’ll cover cryptography in much more detail in future posts, but there are a few basics we should discuss now.  Encryption systems can be divided into two broad categories.

  • Symmetric Key Encryption uses a single key shared between the sender and receiver to encrypt a message before transmission and to decrypt it after receipt.
  • Asymmetric or Public Key Encryption uses algorithms that require a pair of keys.  If you encrypt a message with one key, you can only decrypt it with the other.

Symmetric key systems are generally faster, but it’s difficult to manage and distribute keys among the communicants.  Public key systems can be used to establish secure communications with anyone without a lot of pre-coordination.  The process is:

  1. A key pair is generated for every communicant.  One key is labeled ‘public’ and can be shared openly with anyone.  The other key is labeled ‘private’, and must be kept secure by the holder.
  2. If Alice wants to send a secure message to Bob, she encrypts it with Bob’s public key.  She can ask Bob to send the key to her in the open, or look it up in a public directory.
  3. When Bob receives the message, he decrypts it with his private key and can be assured that he’s the only one who can read the original message.
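The three steps above can be sketched with a toy version of RSA, one well-known public key algorithm.  Everything here is an assumption made for illustration: the primes are tiny, there is no padding, and each byte is encrypted separately, none of which is acceptable in a real system.

```python
# Toy "textbook" RSA.  Illustration only: real systems use very large keys,
# padding schemes, and a vetted cryptographic library.

def make_key_pair(p: int, q: int, e: int = 17):
    """Return (public_key, private_key) built from two primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8+
    return (e, n), (d, n)

def encrypt(message: bytes, public_key) -> list:
    """Encrypt each byte separately with the public key (toy scheme)."""
    e, n = public_key
    return [pow(byte, e, n) for byte in message]

def decrypt(ciphertext: list, private_key) -> bytes:
    """Recover the original bytes with the matching private key."""
    d, n = private_key
    return bytes(pow(value, d, n) for value in ciphertext)

# Step 1: Bob generates a key pair and publishes only the public half.
bob_public, bob_private = make_key_pair(61, 53)

# Step 2: Alice encrypts her message with Bob's public key.
ciphertext = encrypt(b"hello Bob", bob_public)

# Step 3: Bob decrypts it with his private key.
plaintext = decrypt(ciphertext, bob_private)
print(plaintext)
```

The point to notice is that encrypt never sees the private key: Alice needs nothing secret from Bob to send him a message only he can read.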

Encryption

But assuring confidentiality is not the only goal.  Public key encryption can also be used to authenticate the sender of a message.  Here’s how:

  1. Alice encrypts the message, this time with her private key.
  2. When Bob receives it, he retrieves Alice’s public key and decrypts the encrypted message with it.
  3. If it works, Bob can be assured that the message could only have been encrypted by the holder of Alice’s private key, i.e. Alice.

Authentication

Note that, while Alice’s identity as the sender can be confirmed by Bob, the message’s confidentiality cannot.  Anyone who intercepts the message can decrypt it with Alice’s public key.  To get both authentication and confidentiality, as well as assuring the message’s integrity, Alice could calculate the hash of the message, encrypt it with her private key, attach it to the message itself, then encrypt the whole thing with Bob’s public key before sending it.  The message hash, encrypted with Alice’s private key, can be considered her digital signature of the message.  GnuPG performs this process for you.
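The signature part of that process (hash the message, then transform the hash with the private key) can be sketched as follows.  This is a toy model under stated assumptions, not what GnuPG does internally: the key is built from two small, well-known Mersenne primes, the SHA-256 hash is simply reduced to fit the modulus, and the final step of encrypting for Bob is left out.

```python
import hashlib

# Toy signing key.  Illustration only: far too small for real use.
P, Q = 2**61 - 1, 2**31 - 1          # two known Mersenne primes
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent; needs Python 3.8+

def digest_as_int(message: bytes) -> int:
    """Hash the message and squeeze the result below N (toy shortcut)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Alice: transform the message hash with her private key."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Bob: undo the transform with Alice's public key and compare hashes."""
    return pow(signature, E, N) == digest_as_int(message)

message = b"apache-maven-3.3.1-bin.zip contents would go here"
signature = sign(message)
print(verify(message, signature))         # the untouched message
print(verify(message + b"!", signature))  # the same message, altered
```

Because only the short hash is signed, the cost of signing does not grow with the size of the file, which is why signing a large download is practical.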

Installing GnuPG

GnuPG is an open-source implementation of the OpenPGP standard, which grew out of Phil Zimmermann’s Pretty Good Privacy (PGP) program.  It is generally distributed as source code only, but there are a number of third-party binary distributions.  For Windows environments, try Gpg4win.com.  Download the current version (2.2.4 at the time of writing, which includes v2.0.27 of GnuPG), and run the installer, accepting all the default options.

Using GnuPG to verify the binary zip’s integrity is simply a matter of running the main GnuPG tool against the downloaded zip and associated .asc files.  Before we do this, however, we must retrieve the public keys the Maven developers used to sign the message hash in the .asc file.  To do this, you can consult a key server like keys.gnupg.net using the Kleopatra tool that comes with Gpg4win.  But unless you know exactly whose public key you need, you’re more likely to consult the package’s homepage and retrieve a KEYS file with the required keys.  For Maven, try this.  To import these keys, download this file, run Kleopatra, and click on Import Certificates, pointing at the downloaded KEYS file.

Verifying The Package

Finally, we have all the files, tools, and knowledge we need to verify the integrity and source of our downloaded Maven package.  From the command line:

C:\Users\Lambert>"c:\Program Files (x86)\GNU\GnuPG\gpg2.exe" --verify c:\Users\Lambert\Downloads\apache-maven-3.3.1-bin.zip.asc c:\Users\Lambert\Downloads\apache-maven-3.3.1-bin.zip
gpg: Signature made 03/13/15 16:12:29 Eastern Daylight Time using DSA key ID BB617866
gpg: Good signature from "Sarel Jason van Zyl <jason@maven.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: FB11 D4BB 7B24 4678 337A AD8B C7BF 26D0 BB61 7866

C:\Users\Lambert>

When we run the gpg2.exe command with the --verify option, GnuPG reads the signature, looks up the signer (Sarel Jason van Zyl, 0xBB617866 in this case), decrypts the hash with Jason’s public key (which you should now have, from the KEYS file), and compares it to the hash it calculates from the binary zip.  They match, but there’s still one more problem, and it is what the WARNING lines in the output are about.  How do you know the public key you have for Jason is legitimate?  To establish that, you need your own GnuPG key pair and you need to be added to Apache’s Web of Trust.  We’ll cover how to do that later.  For now, though, you can be reasonably confident that the binary you have is genuine and accurate, provided the KEYS file itself came from the real Apache site.
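Until you have a Web of Trust in place, one partial check is to compare the fingerprint GnuPG prints against a copy obtained through a separate channel you trust.  The sketch below only handles the string comparison; the function names are invented for this example, and it assumes the 40-hex-digit fingerprint format shown in the output above.

```python
def normalize_fingerprint(text: str) -> str:
    """Strip all whitespace and upper-case the hex digits."""
    return "".join(text.split()).upper()

def short_key_id(fingerprint: str) -> str:
    """The 8-digit key ID GnuPG prints is the tail of the full fingerprint."""
    return normalize_fingerprint(fingerprint)[-8:]

def fingerprints_match(printed: str, trusted: str) -> bool:
    """Compare two fingerprints, ignoring spacing and letter case."""
    a = normalize_fingerprint(printed)
    b = normalize_fingerprint(trusted)
    return len(a) == 40 and a == b

# Fingerprint copied from the gpg output above.
printed = "FB11 D4BB 7B24 4678 337A AD8B C7BF 26D0 BB61 7866"
print(short_key_id(printed))
```

Always compare the whole fingerprint rather than the 8-digit key ID: the short ID covers only 32 bits, so a different key that shares it is feasible to construct.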

Apache provides an in-depth description of the tools and techniques required for signing releases.

What’s Next?

We’ll refer to these processes when we download new third-party packages in the future.  We’ll also cover this material in much more depth when we discuss Secure Sockets Layer (SSL) and encrypted email.

 
