Forensic Analysis
The subject of forensic analysis is huge, with many books published on it. eMag has long been a leader in providing software for this challenging field, and our main area of expertise is decoding tape and floppy disk formats. This article discusses the challenges of forensic tape analysis and how to extract the crucial data from a tape.
At the simplest level, to find out what is on a tape, you first have to be able to read the tape correctly. There are many different types of tape, and while some look the same, the drives used to read them differ, so understanding the generational differences between media and the associated hardware is essential. For instance, a DDS-4 tape cannot be read in an older-generation DDS-3 drive: the tape will fit in the drive, but the media is of a higher density than the drive supports. Matching the tape to the correct hardware is therefore usually the first obstacle, and once you have overcome it you are ready to find out more about what is inside.
Data on a tape is laid out in a "format". A format can be thought of as being similar to a language: it has a specific syntax, a set of rules and a definite logical layout. In computer terms we refer to bits, nibbles, bytes, words, long words, records, blocks, files, end-of-file markers, etc. These are all terms you need to become familiar with when working out how to extract data from a tape. A tape format expert will perform what is called a "tape dump", in which the contents of the tape are viewed in raw form on the screen so that the format can be verified. This is usually as "deep" as anyone needs to go into a tape. If need be, software is written to access the tape, the format is cracked and the data is extracted.
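As a rough illustration of what a tape dump looks like, here is a minimal Python sketch that prints raw bytes as offsets, hexadecimal values and printable text. It assumes the block has already been read from the drive into memory; the function name hex_dump and the sample block are hypothetical and are not taken from any eMag product.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / printable-text dump of raw bytes."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical block: a made-up label followed by a few binary bytes.
    sample = b"VOL1BACKUP01" + bytes([0, 1, 2, 255])
    print(hex_dump(sample))
```

Viewing the data this way is what lets repeating structures such as labels, headers and padding stand out.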
Cracking a format is an interesting challenge. The secret is recognizing the patterns within the data. A format is normally structured so that it carries set-up information telling the software reading it what to expect. The reading program then knows what to process and what to do with it. Some formats contain file names, dates, block sizes, etc., while others are more or less purely numerical. Text data within these formats can be encoded in ASCII (typically written by PCs and Unix machines) or EBCDIC (mainframes and older technology). Both are represented internally as numerical codes, and they are easily translated so that the data can be read back as text when viewing a tape dump.
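The ASCII versus EBCDIC question can be explored with a few lines of Python. The sketch below decodes the same bytes both ways and reports which result looks more like readable text. It assumes the EBCDIC data uses code page 037 (Python's built-in cp037 codec), which is only one of several EBCDIC variants, and the scoring rule and function names are illustrative rather than a real detection algorithm.

```python
import string

# Characters we treat as "readable" when scoring a decoded block (an assumption).
READABLE = set(string.ascii_letters + string.digits + " .,-_/:")


def readable_ratio(text: str) -> float:
    """Fraction of characters that look like ordinary text."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ch in READABLE) / len(text)


def guess_text_encoding(raw: bytes) -> str:
    """Return 'ebcdic' or 'ascii', whichever decoding reads more like text."""
    ascii_text = raw.decode("ascii", errors="replace")
    ebcdic_text = raw.decode("cp037", errors="replace")  # assumed code page
    if readable_ratio(ebcdic_text) > readable_ratio(ascii_text):
        return "ebcdic"
    return "ascii"


if __name__ == "__main__":
    mainframe_bytes = "HELLO WORLD".encode("cp037")
    print(guess_text_encoding(mainframe_bytes))
    print(guess_text_encoding(b"HELLO WORLD"))
```

A real tool would check more than one code page and would look at record structure as well as character counts.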
If we look at our business arena, which is commercial tape and disk formats, we are primarily concerned with electronic document storage. The media can be backup tapes, interchange formats and so on. Ultimately they contain files generated by a software package, whether word processing, databases, spreadsheets, graphical images, etc. The issue one faces when reading these tapes is whether or not you have access to the original software that created them. If you do not, then reading the tape becomes a real challenge.
The job of someone performing a forensic analysis is to take a tape and get the data off it in a fast, logical, cohesive and accurate fashion. At the simplest level, someone might want to know the names of the files on a tape and when they were created. The tape is placed in a drive and read, with the metadata being output to a file. This is known as tape logging, and it helps create a timeline for the tape contents as well as indicating the dates, names and types of files present. An investigation may involve hundreds of tapes, and the ability to log each tape and then drill down into those logs looking for certain details is very useful. The investigator can then restore just the files needed rather than the whole save set, which could be terabytes of data, saving many days and considerable expense.
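To make the idea of tape logging concrete, here is a small Python sketch that records metadata for every file in an archive without restoring any file contents. It uses the TAR format as a stand-in because Python's standard tarfile module can read it as a sequential stream; other backup formats need their own readers. The function names and the sample file are hypothetical.

```python
import csv
import io
import tarfile
from datetime import datetime, timezone


def log_archive(fileobj):
    """Return one metadata row per archive member, reading sequentially."""
    rows = []
    # "r|*" reads the archive as a forward-only stream, as a tape would be read.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            modified = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
            rows.append({
                "name": member.name,
                "size": member.size,
                "modified": modified.isoformat(),
                "type": "dir" if member.isdir() else "file",
            })
    return rows


def write_log(rows, out):
    """Write the metadata rows as CSV so they can be searched later."""
    writer = csv.DictWriter(out, fieldnames=["name", "size", "modified", "type"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Build a tiny in-memory archive purely for demonstration.
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo("reports/q1.txt")
        info.size = 5
        info.mtime = 0
        archive.addfile(info, io.BytesIO(b"hello"))
    buffer.seek(0)
    log = io.StringIO()
    write_log(log_archive(buffer), log)
    print(log.getvalue())
```

With one such log per tape, an investigator can search hundreds of logs for a file name or date range and then restore only the matching files.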
At a more complex level, one might want to go to a business and electronically capture every bit and byte on every disk on every computer at the office. This is typically done by making a series of disk images to tape. The tapes are then taken to a remote site where they can be restored to other disks, thus recreating the working environment of the original site. Analysis can then begin, digging down through the various layers of data looking for signs of fraud and incriminating evidence.
Digging down forensically into a hard disk is a lot easier than working with a tape. In most situations, when you are reading a tape and you hit end of data, that is it. There are no "old or earlier versions" stored on tape. The contents of a backup tape are a snapshot in time, so if you need an older version of a file you need to read an earlier tape. It is generally not possible to rewrite a section of data on a tape, so any file you read cannot be newer than the tape's creation date. Reading beyond end of data sometimes produces useful results, but again you need the proper software tools to help evaluate and extract the contents, and this is usually not possible without specialized hardware. With these limitations in place, tape is actually a great tamper-proof backup medium.
The problem with tapes is that there are no format layout standards, and as a result there are numerous formats and many variations within them. Developing your own code to read these tapes is therefore a major undertaking and is best left to specialists in the field.
So where does eMag fit into this picture? About fifteen years ago we started writing a restoration program to read tapes of unknown origin and format. We also did the same for floppy disk formats. Both programs have grown tremendously over the years. Today we offer our MediaMerge/PC and MediaMerge/UNIX forensic analysis software packages, which can read, automatically recognize and restore hundreds of different tape formats and variations. We also have our InterMedia for Windows package, which does the same for over 2000 floppy disk formats.
These packages give users access to data written on hardware that is usually obsolete or inaccessible. In most cases the user does not have to know anything about the tape: you just put it in the drive, tell the program to restore it, and it does the rest. The software is also used for data interchange between hardware platforms. For example, if you want to read a VAX/VMS backup tape on a PC, it can do this; likewise, if you want to read and restore an older Backup Exec dataset and convert it to TAR for use on a UNIX machine, it can do this as well. If you want to look at the raw data layout of a tape on a byte-by-byte basis and maybe investigate what is beyond end of data, all interactively, you can do that too. We have written an extremely comprehensive package that addresses the vast majority of standard tape processing requirements, and this software has rapidly become one of the "must-have" electronic discovery tools.
If you would like to learn more about these software programs or eMag's tape analysis and forensic capabilities, please do not hesitate to contact us. The software is under continuous development as new formats and features are requested and added, and if you are one of our many customers, then you already know how we support the program. And if you are not, then we look forward to helping you someday soon!
Long-term Data Archiving
We have all come across pictures from the last century, sometimes a bit faded and maybe slightly yellow, but normally it is possible to make out the image and possibly work out who or what is in the picture.
The last photo I took was with a digital camera and was then viewed on a computer screen. The question to be asked is: if this image file were discovered in a dusty loft in 100 years' time, would anybody be able to view it? The issues are in fact very closely related to those in the article on data interchange a few months ago, namely medium, file format and data structure.
In the digital camera example, the image starts on a Compact Flash (CF) card and is then loaded onto a PC or possibly a CD-ROM. The image is a compressed JPEG. In 100 years' time we would need to read a hard disk, assuming it would still spin, or a CD-ROM just to extract the raw file. Over the past 30 years or so we have seen many standard media come and go, such as 8" floppy disks, and 0.5" open reel tape is now being phased out. Reading the data structure may in theory be a bit easier, as it is just software, but do not assume anything: it is no longer easy to read a CP/M WordStar document.
Storing a photo image may not be too important (and it could always be printed out), but the example does highlight the problem of trying to keep information in a format that can be accessed over many years. It is generally accepted that the only way to tackle this problem is to update the whole storage scheme every few years. Thus one needs to look at the media, the file structure and the data structure, and update to something that has a long projected life span. As a result, any data on open reel tape should now be transcribed, possibly onto an LTO. It is worth trying to simplify the data structure at the same time and to make sure that it is not tied to a proprietary application. For structured records it is best to use fixed-length or delimited records in straight text, and to avoid packed or binary numbers. Indexing information is normally used only to speed up access and does not add to the value of the data. The file structure should be simple and should avoid data compression.
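As an illustration of moving away from packed and binary numbers, the Python sketch below converts one binary record into a delimited text line. The 15-byte record layout (an 8-byte ASCII name, a 4-byte big-endian count and a 3-byte packed-decimal amount) is invented for this example, as are the function names; real records would follow whatever layout the originating application used.

```python
import struct

# Hypothetical layout: 8-byte ASCII name, 4-byte big-endian integer,
# 3-byte packed-decimal amount. ">" means big-endian with no padding.
RECORD = struct.Struct(">8sI3s")


def unpack_packed_decimal(field: bytes) -> int:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # final nibble carries the sign
    value = 0
    for digit in nibbles:
        if digit > 9:
            raise ValueError("not a valid packed-decimal digit")
        value = value * 10 + digit
    # 0xD (and, less commonly, 0xB) marks a negative number.
    return -value if sign in (0x0B, 0x0D) else value


def record_to_text(record: bytes, delimiter: str = "|") -> str:
    """Turn one binary record into a plain delimited text line."""
    name, count, amount = RECORD.unpack(record)
    fields = [
        name.decode("ascii").rstrip(),
        str(count),
        str(unpack_packed_decimal(amount)),
    ]
    return delimiter.join(fields)


if __name__ == "__main__":
    sample = b"WIDGET  " + (42).to_bytes(4, "big") + bytes([0x12, 0x34, 0x5D])
    print(record_to_text(sample))
```

Once the values are plain text, a future reader needs nothing more than a text viewer to make sense of them.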
If the above route is taken, there are two significant advantages. The first is that the data will still be accessible in several years' time, and the second is that considerably less media will be required to store it. As an example, an open reel tape stores approximately 100MB and an LTO approximately 100GB (without compression). Thus 1000 open reel tapes could be stacked onto a single LTO, though preferably duplicated a few times for security reasons.
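The consolidation arithmetic is easy to check. The short Python sketch below uses the approximate capacities quoted above, with 100GB taken as 100,000MB; the function name and the choice of three copies are illustrative assumptions.

```python
def target_tapes_needed(source_count: int, source_mb: int,
                        target_mb: int, copies: int = 1) -> int:
    """Number of target tapes needed to hold all source tapes, per copy set."""
    total_mb = source_count * source_mb
    per_copy = -(-total_mb // target_mb)  # integer ceiling division
    return per_copy * copies


if __name__ == "__main__":
    # 1000 open reel tapes at about 100MB each onto LTO at about 100GB each.
    print(target_tapes_needed(1000, 100, 100_000))
    # The same data duplicated three times for safety.
    print(target_tapes_needed(1000, 100, 100_000, copies=3))
```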
eMag Solutions can help with transitions ranging from the 8" floppy through open reel tape, and often from the lower-capacity Exabyte, DAT and DC600 drives. eMag will also advise on possible data restructuring. Do not forget, though, that all data storage needs reviewing every few years. It is a waste of money to pay for storage of something that cannot be accessed, though if you are in that state, eMag may well be able to help.
This article may be re-published as long as the following resource box is included at the end of the article and as long as you link to the email address and the URL mentioned in the resource box:
Article by eMag Solutions. For more articles on eDiscovery and Data Restoration, subscribe to our e-mail Newsletter by sending a blank email to newsletter@emaglink.com or by going to http://www.emaglink.com.