The hash code of a file is a number (typically 16 or 20 bytes in length) that is, for all practical purposes, unique to that file. There are two standard routines for generating these codes, SHA-1 (20 bytes) and MD5 (16 bytes), which are universally accepted in the forensic and investigation world. This article discusses the primary areas where you will encounter these routines.
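A number that is unique to a file acts as a digital signature for that file. Once the hash has been generated, it can be shown at any later date that the file has not been changed, whether intentionally or accidentally (in transmission, for example): if the hash is recalculated and still matches the recorded value, the contents are the same. The whole question of whether a file has been tampered with can therefore be settled very easily.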
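As an illustration of that check, here is a minimal sketch in Python using the standard hashlib module. The function names file_digests and is_unchanged are made up for this example and are not part of any product mentioned in this article; the sketch simply assumes that the MD5 and SHA-1 values were recorded as hexadecimal strings when the file was first examined.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at path.

    The file is read in chunks so that large files do not have to fit
    in memory. Only the contents are hashed; the name and date are not.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def is_unchanged(path, recorded_md5, recorded_sha1):
    """Hypothetical helper: True if both recorded hashes still match."""
    md5_hex, sha1_hex = file_digests(path)
    # Recorded values may be in upper case, so compare case-insensitively.
    return md5_hex == recorded_md5.lower() and sha1_hex == recorded_sha1.lower()
```

Recording both values at acquisition time and calling is_unchanged again later is one simple way to demonstrate that a file is still in its original state.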
The other main area is in identifying files by their contents. A hash value is generated from the file contents alone; the file name and the date of the file are not relevant. This can help an investigation in two very different ways. When examining a tape or disk for information, it is often necessary to eliminate, by some means, a very large number of system files. One may decide to ignore, say, all .EXE files or all .DLL files, but in doing so it is impossible to tell whether any of these files do in fact store user information that could be relevant to an inquiry. What can safely be ignored are all system files that have not been changed since they were generated, for instance, by Microsoft. A new Windows XP system contains a gigabyte or so of data made up of a very large number of files. By having hash values of all of these original files, it is possible to eliminate them in the certainty that they have not been changed or added to in any way since they were released by Microsoft. A file in which data has been hidden, even if it has a standard operating system name and the same size and date as the original, will not have the same hash value. To make life a bit easier for users, there are lists of hash values for many standard applications and operating systems on the web. A useful website is the National Software Reference Library.
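The elimination step described above might be sketched as follows in Python. This is only an illustration under stated assumptions: known_hashes is assumed to be a collection of SHA-1 values in hexadecimal that the investigator has already obtained from a published reference list, and the function names are hypothetical rather than taken from any particular tool.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the contents of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def partition_by_known_hashes(root, known_hashes):
    """Split the files under root into (skipped, to_review).

    A file is skipped only if its SHA-1 value appears in known_hashes
    (an assumed, pre-loaded set of reference values). A file that merely
    has a standard system name but different contents stays in to_review.
    """
    known = {value.lower() for value in known_hashes}
    skipped, to_review = [], []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if sha1_of_file(path) in known:
                skipped.append(path)
            else:
                to_review.append(path)
    return sorted(skipped), sorted(to_review)
```

Because the decision is based purely on contents, a file that has been altered but still carries an operating system file name ends up in the list for review rather than in the skipped list.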
The second useful application for investigations is based on the requirement to detect whether certain files exist, typically in pornographic image investigations. If somebody is suspected of downloading files from a certain site, then the hash values of the files on their disk or backup tape can be compared with databases of known hash values, and matches can be made irrespective of file name or location.
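A comparison of this kind could look roughly like the Python sketch below. It assumes the known database has already been loaded into a dictionary that maps each MD5 value (in hexadecimal) to a descriptive label; both that dictionary layout and the function names are assumptions made for the example, not the format of any real database.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the contents of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(root, known_md5):
    """Return a sorted list of (path, label) for files whose MD5 is known.

    known_md5 is an assumed mapping of MD5 hex digest -> label. File
    names and folders play no part in the match, only file contents.
    """
    lookup = {digest.lower(): label for digest, label in known_md5.items()}
    matches = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            label = lookup.get(md5_of_file(path))
            if label is not None:
                matches.append((path, label))
    return sorted(matches)
```

A renamed copy buried several folders deep is reported just the same as a copy with its original name, which is the point of matching on hash values rather than on file names.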
MM/PC has been able to create hash values as part of the forensic log for over a year now. A new addition (in V4.05) is the ability to import hash tables in HashKeeper (MD5) format; these work with the de-duplication routine so that standard operating system files are skipped when restoring from tape. The log displays the files that have been skipped, along with all hash values, in both SHA-1 and MD5 format. The log can be exported so that searches for known hash values can be carried out by user applications. Contact us today to learn more about this new feature for MM/PC.
Product and company names mentioned on this web page may be trademarks or registered trademarks of their respective companies and are hereby acknowledged.
This article may be re-published as long as the following resource box is included at the end of the article and as long as you link to the email address and the URL mentioned in the resource box:
Article by eMag Solutions. For more articles on eDiscovery and Data Restoration, subscribe to our e-mail Newsletter by sending a blank email to newsletter@emaglink.com or by going to http://www.emaglink.com/.