
Why Do We Need Conversion Tools?

History of Data Interchange

All computers are compatible, aren't they? Well, not entirely, especially when large volumes of data must be moved from one system to another. This brief article outlines some of the issues relating to data exchange and tape conversion.

Many PCs are compatible with each other, and the 3.5" 1.44MB floppy disk is a very good interchange medium. However, this type of industry standard keeps changing. In the 80s the standard interchange medium was an 8" single-sided, single-density disk, but how many systems can read 8" disks today? Likewise, 1600bpi 0.5" (9-track) tape was a good interchange medium for large systems, but many systems have since moved to DLT, LTO, 3480/3490/3590, Exabyte, DAT etc. and no longer support 0.5" reel-to-reel tape.

There are two main problems with data interchange: the physical tape format and, perhaps less obviously, the way the tape has been written (its logical format).

But why not just use 1.44MB disks and zip the files onto them? The main reason is that this works for up to, say, 20MB, but beyond that point it becomes very difficult and error-prone. Don't even think about trying to zip a 12GB file onto floppy disks. The other reasons are that some systems (mainframes) cannot write 1.44MB floppy disks, and that many operators may not know how to zip files.

Many operators, though fortunately not all, know only how to back up their system or copy files onto a backup tape. So if you need to obtain files from elsewhere, the only way of receiving them may be on a backup tape. To read such a tape you need both a compatible tape drive and a backup/restore program that is compatible with the program that wrote the tape.

To handle tapes from many sources, it is therefore necessary to have many tape drives and either a very large number of software packages or a general-purpose tape and format handler. Our data interchange software, MediaMerge/PC (MM/PC), is the simple solution to the last of these problems: reading all types of backup tapes.

Physical Tapes

The only way to read a tape is in a compatible drive. Fortunately there are not too many families of tape drive, and there is a good degree of backward compatibility between recent drives and older ones. However, drive standards produced in the 80s are slowly being dropped from the newest drive models, and some drives support older standards only as read-only rather than read/write.

To be completely compatible with all possible tapes, it may be necessary to have several versions of some drive types. The best example of this is the DC600 range, which has grown from 60MB capacity to 100GB. Fortunately, many tape formats are still entirely backward compatible.

MM/PC is designed to read from most available SCSI tape drives. New drives are added as manufacturers release them, so the library of supported drives grows regularly. Sometimes all that is required is media conversion, in which a tape is effectively duplicated onto a different medium while retaining the same file structure and the same physical blocking.

Logical Tape Formats

The number of physical tape formats is very limited compared with the number of logical tape formats. Although only two basic elements can be written to a tape (a block and a tape mark), the ways of combining them are almost unlimited. With format conversion, all these variations can be handled.
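To make the block and tape mark model concrete, here is a minimal Python sketch (an illustration only, not how MM/PC works internally). It assumes the tape has already been read into a list of data blocks and tape marks, and it applies the common convention that two consecutive tape marks signal the logical end of the recorded data. The names `TapeMark` and `split_files` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TapeMark:
    """A tape mark: a recorded separator that carries no data."""


# Hypothetical in-memory model: a tape is a sequence of data blocks and tape marks.
Element = Union[bytes, TapeMark]


def split_files(tape: List[Element]) -> List[List[bytes]]:
    """Group data blocks into tape files, treating each tape mark as a separator.

    Assumes the common convention that two consecutive tape marks
    signal the logical end of the recorded data.
    """
    files: List[List[bytes]] = []
    current: List[bytes] = []
    previous_was_mark = False
    for element in tape:
        if isinstance(element, TapeMark):
            if previous_was_mark:
                break  # double tape mark: nothing useful follows
            files.append(current)
            current = []
            previous_was_mark = True
        else:
            current.append(element)
            previous_was_mark = False
    if current:  # tolerate a tape that ends without a closing tape mark
        files.append(current)
    return files
```

For example, with `TM = TapeMark()`, the sequence `[b"HDR", TM, b"a", b"b", TM, b"EOF", TM, TM, b"junk"]` splits into three tape files, and the block after the double tape mark is ignored.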

There are probably only a few 'standard' ways of writing tapes, several 'de facto' standards, and many proprietary ways. However, what is a good standard one year may not be relevant as volumes of data expand and tape technology changes. The best-known and most widely recognized standard is ANSI labeled tape. IBM standard labeled tape is very similar (its labels are recorded in EBCDIC rather than ASCII), and both can be described simply as each file having a header, data, and a trailer, separated by tape marks. The format is simple and well documented, with few variations. Many 0.5" reel-to-reel tapes and 3480/3490 tapes still use it when attached to a mainframe.
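The header and trailer of a labeled tape file are made up of 80-byte label records. The sketch below decodes a few fields of an HDR1 label, assuming the ANSI X3.27 column layout (file identifier in columns 5-21, file sequence number in 32-35, creation date in 42-47); please confirm those positions against the standard before relying on them. Because ANSI labels are ASCII and IBM labels are EBCDIC, both encodings are tried, with EBCDIC code page 037 as an assumption. `parse_hdr1` is a hypothetical name.

```python
def parse_hdr1(label: bytes) -> dict:
    """Decode selected fields of an 80-byte HDR1 label (hypothetical helper).

    Assumed 1-based column layout (ANSI X3.27):
      1-4 'HDR1', 5-21 file identifier,
      32-35 file sequence number, 42-47 creation date.
    """
    if len(label) != 80:
        raise ValueError("label records are 80 bytes long")
    # ANSI labels are ASCII; IBM standard labels are EBCDIC (code page 037 assumed).
    for encoding in ("ascii", "cp037"):
        try:
            text = label.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        if text.startswith("HDR1"):
            return {
                "encoding": encoding,
                "file_id": text[4:21].rstrip(),
                "file_seq": int(text[31:35]),
                "created": text[41:47],
            }
    raise ValueError("not an HDR1 label")
```

The same label text encoded in ASCII or in EBCDIC decodes to the same fields, with only the reported encoding differing.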

Unix is probably the next best 'de facto' standard. There are two basic variants, TAR and CPIO, and for each there are about five pages of options that may be used when writing. MM/PC recognizes these options automatically when reading a tape.
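Whatever options were used, each member of a modern TAR archive starts with a 512-byte header whose numeric fields are ASCII octal and which carries a simple checksum. The sketch below assumes the POSIX ustar header layout; it decodes the name, size, type and magic fields and verifies the checksum. `parse_tar_header` and `make_sample_header` are illustrative names, and Python's standard `tarfile` module is used only to build a genuine header for testing.

```python
import io
import tarfile


def parse_tar_header(header: bytes) -> dict:
    """Decode a few fields of a 512-byte ustar header and check its checksum."""
    if len(header) != 512:
        raise ValueError("tar headers are 512 bytes long")

    def text(offset: int, length: int) -> bytes:
        # Character fields are NUL-terminated or NUL-padded.
        return header[offset:offset + length].split(b"\0", 1)[0]

    def octal(offset: int, length: int) -> int:
        # Numeric fields hold ASCII octal digits, possibly space-padded.
        digits = text(offset, length).strip()
        return int(digits, 8) if digits else 0

    # The checksum is the sum of all header bytes, with the 8-byte
    # checksum field itself (offset 148) counted as ASCII spaces.
    computed = sum(header[:148]) + 8 * ord(" ") + sum(header[156:])
    if computed != octal(148, 8):
        raise ValueError("header checksum mismatch")

    name = text(0, 100).decode("utf-8")
    prefix = text(345, 155).decode("utf-8")  # ustar splits long paths here
    return {
        "name": f"{prefix}/{name}" if prefix else name,
        "size": octal(124, 12),
        "type": chr(header[156]) if header[156] else "0",  # NUL = regular file
        "magic": text(257, 6).decode("ascii"),
    }


def make_sample_header() -> bytes:
    """Build a genuine ustar header with Python's tarfile module, for testing."""
    buffer = io.BytesIO()
    data = b"hello, tape"
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        info = tarfile.TarInfo("docs/readme.txt")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()[:512]
```

Older V7 headers, GNU extensions, PAX extended headers and the various CPIO header formats each need their own handling, which this sketch does not attempt.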

The final format that is becoming a standard is NT Backup (Microsoft Tape Format), which is also the same as many versions of Seagate Backup.

MM/PC writes the three 'standard' formats above.
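Before a tape in one of these formats can be read, the reader has to work out which format it is. A rough heuristic is to inspect the first block for well-known magic values, as in the sketch below. The specific checks are assumptions: a VOL1 volume label for ANSI (ASCII) and IBM (EBCDIC, code page 037) labeled tapes, the `ustar` magic at offset 257 for TAR, the ASCII CPIO magic numbers, and a `TAPE` descriptor block, which is how Microsoft Tape Format media written by NT Backup are expected to begin. `identify_first_block` is a hypothetical name.

```python
def identify_first_block(block: bytes) -> str:
    """Guess which 'standard' format a tape starts with (heuristic sketch).

    Only magic values are checked; a real reader validates far more.
    """
    if block[:4] == b"VOL1":
        return "ANSI labeled"  # ASCII volume label
    if block[:4] == "VOL1".encode("cp037"):
        return "IBM labeled"  # the same label in EBCDIC
    if block[:4] == b"TAPE":
        return "NT Backup (MTF)"  # MTF tape descriptor block
    if len(block) >= 262 and block[257:262] == b"ustar":
        return "TAR"  # ustar or GNU header magic
    if block[:6] in (b"070707", b"070701", b"070702"):
        return "CPIO"  # portable ASCII and 'new' ASCII headers
    return "unknown"
```

Old V7 TAR archives and binary CPIO archives carry none of these ASCII magic values, so this sketch would report them as unknown.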

Why so many variations?

Why are there so many other formats? One main reason is that backup companies want to produce their own products and tie customers in, and a proprietary format achieves exactly that. The other reason is that what was good in 1980 is now probably inefficient and slow. Files have become much larger, and multi-gigabyte files are now fairly common. Compression routines help with both tape capacity and speed, and many current formats allow either software compression or the tape drive's hardware compression. Another major change over the past few years is the growth of long file names on PC operating systems, which older backup packages could not handle. Overall, we have a mixture of proprietary software and the general evolution of formats for performance and capacity.

File lengths have increased dramatically in recent years. Not long ago, PCs had a maximum disk size of 32MB; the limit is now very large and measured in GBs. One resulting problem is similar to the year 2000 problem. File sizes are stored as numbers, and a typical large number on a PC or Unix system is 32 bits long. This translates to a maximum file size of 4GB, or 2GB if the number is signed. When a file exceeds this limit, its size is often misreported on many systems such as DOS, because only the low-order bits survive; a file of exactly 4GB, for instance, appears to be zero bytes long. NT Backup does handle this problem, but most versions of Unix CPIO and TAR do not support such large files. As backup software changes to handle them, new versions of the software will be required. Whenever possible, MM/PC will support the new versions of these packages.
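The arithmetic behind this problem is easy to demonstrate. A 32-bit unsigned field holds at most 2^32 - 1 = 4,294,967,295 bytes (just under 4GB), and a signed one stops at 2^31 - 1 (just under 2GB). The sketch below models a program that keeps only the low 32 bits of the real size, which is what typically happens when a larger value is forced into a 32-bit field; `stored_size` is a hypothetical helper, and GB here means 2^30 bytes.

```python
GB = 1024 ** 3  # binary gigabyte, 2**30 bytes


def stored_size(true_size: int, bits: int = 32, signed: bool = False) -> int:
    """Return the value left in a fixed-width size field (hypothetical helper).

    Models a program that keeps only the low-order `bits` bits of the
    real size; with `signed=True` the result is read as two's complement.
    """
    value = true_size & ((1 << bits) - 1)  # drop the high-order bits
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits  # top bit set: reads as a negative number
    return value
```

Under this model a file of exactly 4GB is recorded as 0 bytes, a 5GB file as 1GB, and, with a signed field, a 3GB file as minus 1GB.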

Data Conversion

Although outside the scope of this article, the reader should be aware that media and format conversion are often only part of the story. In many cases it is essential to also convert the data that has been read. eMag also has many tools to help with all types of conversion, from simple ASCII/EBCDIC translation, including packed fields, to complex records and databases.
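As an illustration of the simplest case, the sketch below converts a fixed-length mainframe record containing an EBCDIC text field and a packed-decimal (COMP-3) field. The record layout, the code page (037, EBCDIC US/Canada) and the two implied decimal places are all assumptions made for the example; in practice they come from the record description, such as a COBOL copybook. The function names are hypothetical and do not describe eMag's tools.

```python
from decimal import Decimal


def ebcdic_to_text(field: bytes, codepage: str = "cp037") -> str:
    """Translate an EBCDIC character field to text (code page 037 assumed)."""
    return field.decode(codepage)


def unpack_packed_decimal(field: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two 4-bit decimal digits, except that the last
    half-byte is the sign: 0xB or 0xD means negative, 0xA, 0xC, 0xE or
    0xF means positive. `scale` is the number of implied decimal places,
    which comes from the record layout rather than from the data.
    """
    if not field:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in field:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()  # final half-byte is the sign, not a digit
    if sign < 0x0A or any(d > 9 for d in nibbles):
        raise ValueError("invalid packed-decimal data")
    value = int("".join(str(d) for d in nibbles))
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_record(record: bytes) -> tuple:
    """Decode one record of a hypothetical 13-byte layout:
    bytes 0-9 customer name (EBCDIC), bytes 10-12 amount (COMP-3, 2 decimals).
    """
    name = ebcdic_to_text(record[0:10]).rstrip()
    amount = unpack_packed_decimal(record[10:13], scale=2)
    return name, amount
```

Using `Decimal` rather than a binary float keeps monetary amounts exact, which matters when converted figures are later audited.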

How does one read a tape from anywhere?

There are two answers to this question. One is to have every tape drive and every backup package, plus several versions of DOS, UNIX, Macintosh etc. to run the old backup packages. The other is MM/PC.

MediaMerge/PC is the cheaper and more flexible option, and new formats and drives are continually being added to it.

With MM/PC, a tape may be read from any system and all or selected files loaded onto a PC for further processing. Typical applications include routine data interchange, data for processing such as micro-fiching, and electronic discovery applications such as auditing and fraud detection. With the addition of the Record Reformatter, many record-based files can be processed to produce easily manipulated ASCII files.


