Data Restoration, Recovery, Culling and Processing Products

Record Reformatter (RR32)

A tool to process EBCDIC or ASCII files and typically produce fixed length data

The Record Reformatter (RR32) is used primarily to restructure records for movement between different systems or to produce simple reports based on complex file structures. Restructuring is often used to expand packed numeric fields such as COMP-3 or binary/floating point numbers, to handle floating point numbers from a VAX system, and to read IBM-type records on your PC.

The RR32 can also be used to extract fields from a database for indexing or summary applications, or to perform an intelligent EBCDIC/ASCII conversion on data with a mixture of text and binary fields.

The RR32 has many built-in tools for analyzing unknown records and field structures, for calculating record lengths using data patterns, and for showing the data in several formats so that the user can select a sensible conversion.

Where Cobol data descriptions are available in a file form, the file can be read and the record structure determined automatically. Similarly, AS400 savelib tapes can be read with MediaMerge/PC to produce a Record Reformatter file description.

The data conversion routines and the general restructuring of records can assist in converting data files in applications requiring updates due to any left over year 2000 (Y2K) problem.

The RR32 can be used as a stand-alone tool or in conjunction with InterMedia for Windows or MM/PC, thus providing a very powerful method of handling many IBM type tapes and records.

About Records

Before details of how to use RR32 are explained, a few notes on record structure and definitions are required. As with all applications, there will be exceptions that cannot be handled. Records can be fixed length or variable length.

Fixed length records

A fixed length record always has the same length, but a file may be made up of many different types of record which may each have different lengths.

The RR32 analyze function attempts to determine the record length automatically even for records without carriage returns. The user can override any automatic record length determination.

A record is defined by unique codes in a fixed location irrespective of the record length. For example, the first two characters may be a two-digit number that determines the type of record. Up to 20 different types of record may be defined each with different record lengths and field structures. RR32 will operate with fixed-length records where some records are truncated (e.g., by a CRLF) indicating that the rest of the record is just spaces.
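As an illustration of identifying record types by a code in a fixed location, the following Python sketch splits a buffer of mixed fixed-length records. It is not RR32 code: the function name, the table of record lengths and the sample type codes are all hypothetical, and truncated records are not handled.

```python
# Hypothetical table: a two-character type code selects the record length.
RECORD_LENGTHS = {b"01": 20, b"02": 12}  # illustrative values only


def split_records(data: bytes, id_pos: int = 0, id_len: int = 2):
    """Yield (type_code, record) pairs from a buffer of mixed fixed-length records."""
    offset = 0
    while offset + id_pos + id_len <= len(data):
        code = data[offset + id_pos : offset + id_pos + id_len]
        length = RECORD_LENGTHS.get(code)
        if length is None:
            raise ValueError(f"unknown record type {code!r} at offset {offset}")
        yield code, data[offset : offset + length]
        offset += length


sample = b"01" + b"A" * 18 + b"02" + b"B" * 10
records = list(split_records(sample))
```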

Fields are fixed lengths and locations within a record. Different record types will have different fields and field positions.

Before any work may be performed on records, it is essential to define the following points:

Types of input records:

  • Record identifier code, position and length
  • Each record length
  • Field structure for each record type
  • Length of output record it translates to
  • Optional record terminator for truncated records

Output records are defined by:

  • Length
  • Field locations
  • Types of conversion between input and output fields
  • Value of filler for unmapped fields

All these values are entered via the RR32 Editor. In analysis mode, field types can be automatically determined. Due to the nature of text-based records, this initial analysis is not always correct, so full editing of the first approximation may be archived for further analysis. Alternatively, the structure may be entered manually, or read from a Cobol Data definition file or a DBase file.

Variable Length Records

With fixed length records, each field and record has a pre-determined length. With variable length records, fields and records are marked by end characters. Typically, fields are separated by a comma and records by a CR. Thus two records could look something like this:

field1,field2,A longer field,last field(CR)

fa,fb,fc,final field in this record(CR)

To define such a file structure, it is necessary to assign values for field and record delimiters. With RR32, both of these delimiters may be one or two characters long.

With variable length fields, only a single record structure can be handled.
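A minimal Python sketch of reading such a delimited file is shown here. It only illustrates the idea of field and record delimiters; the function name and defaults are assumptions, not part of RR32. Delimiters of one or two bytes both work with this approach.

```python
def parse_delimited(data: bytes, field_delim: bytes = b",", record_delim: bytes = b"\r"):
    """Split a buffer into records, then split each record into fields."""
    records = []
    for raw in data.split(record_delim):
        if raw:  # skip the empty piece after the final record delimiter
            records.append(raw.split(field_delim))
    return records


sample = b"field1,field2,A longer field,last field\rfa,fb,fc,final field in this record\r"
rows = parse_delimited(sample)
```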

Automatic Analysis Mode

A very powerful feature of RR32 is the ability to analyze records and create an outline input field definition automatically. Although this is not a complete substitute for documentation describing a record, it can be extremely useful in analyzing an otherwise unknown record.

RR32 analysis will try to determine both field breaks and field types. This includes text (ASCII or EBCDIC) and packed fields. The generated routine may then be edited as required. This can include adding fields, deleting, concatenating, and sorting field definitions.

Output creation and testing

Once an input record is defined, it typically is converted into a quote comma quote delimited record, for example. The output can be automatically generated based on the input field locations and descriptions. Several typical output types may be created such as comma delimited and Quote comma Quote. If an input field is described as a 'Strip field' it is not included in the output record.

The output may then be edited as required so that lengths can be controlled or modified. Fixed data fields may also be added.

Once created, the output is tested, and the first 128K of the input sample file is converted and displayed on a split screen of both input and output file using the InterMedia File Viewer.

Specification of Features

Maximum number of record types         20
Maximum number of field definitions    3000
Maximum file length                    Only limited by disk space
Record analysis                        First 50 records
Setup wizard                           Yes
Win95/98                               Yes
Win NT 4.0, 2000                       Yes
Win 3.x                                No

Field Conversions

Each field can be converted by the following commands. The position of each output field is entirely dependent on the user, and fields can be omitted or included multiple times as required.

Copy

Copy is probably the most commonly used conversion rule. It very simply copies the input field to the output field. If the input and output fields are different lengths, either end of the field will be truncated.

Copy Reverse

Copy Reverse copies the field as described above, and then reverses it. Thus a field such as "Hello 1234" will become "4321 olleH".

The reversing works on the final length of the output field, so if padding was required, the padding would end up at the start of the field.
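The effect of padding on a reversed field can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and it assumes space padding and truncation on the right, which the description above does not specify.

```python
def copy_reverse(field: bytes, out_len: int, pad: bytes = b" ") -> bytes:
    """Copy a field to a fixed output length, then reverse the whole output field."""
    # Assumption: too-long input is cut on the right, too-short input is padded on the right.
    fitted = field[:out_len].ljust(out_len, pad)
    return fitted[::-1]
```

With an output length equal to the input length, "Hello 1234" becomes "4321 olleH". With a longer output field, the padding added during the copy appears at the start of the reversed result.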

For fields of 1, 2 or 4 bytes in length, this operation is identical to Swap 8, Swap 16 and Swap 32.

Copy Pascal

Copy Pascal copies a Pascal text string. A Pascal string starts with a byte giving the length, followed by the string. The length byte is stripped when copying.
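A short Python sketch of reading a length-prefixed (Pascal) string is given here for illustration. The function name is hypothetical and this is not RR32's own code.

```python
def copy_pascal(field: bytes) -> bytes:
    """Return the text of a length-prefixed string, dropping the length byte."""
    if not field:
        return b""
    length = field[0]  # first byte holds the string length
    return field[1 : 1 + length]
```

Any bytes in the field beyond the stated length are ignored.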

ASCII to EBCDIC

As in Copy, but converts an ASCII input field to EBCDIC.

EBCDIC to ASCII

As in Copy, but converts an EBCDIC input field to ASCII.
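For comparison, both text conversions can be sketched in Python using the standard codecs. The choice of code page 037 is an assumption made for this example; several EBCDIC variants exist, and the table that RR32 uses is not stated here. The function names are hypothetical.

```python
EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC; other code pages differ


def ebcdic_to_ascii(field: bytes) -> bytes:
    """Translate an EBCDIC text field to ASCII, replacing unmappable characters with '?'."""
    return field.decode(EBCDIC_CODEC).encode("ascii", errors="replace")


def ascii_to_ebcdic(field: bytes) -> bytes:
    """Translate an ASCII text field to EBCDIC."""
    return field.decode("ascii").encode(EBCDIC_CODEC)
```

These conversions are only suitable for text fields. Applying them to packed or binary fields corrupts the data, which is why each field is given its own conversion rule.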

FILL ASCII

This fills the output field with the ASCII string defined in the parameters. Any codes in <> are treated as hex values. An example string is 12<09>Hello. It may typically be used for inserting tags between fields, or even a ',' to make a record comma delimited.
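The following Python sketch shows one way a fill string with <> hex codes could be expanded into bytes. It is an illustration of the notation only; the function name is hypothetical and the handling of malformed strings is an assumption.

```python
import re


def expand_fill(spec: str) -> bytes:
    """Expand a fill string, treating text inside <> as hex byte values."""
    out = bytearray()
    # Each match is either a <hex> group or a run of literal characters.
    for hex_part, literal in re.findall(r"<([0-9A-Fa-f]+)>|([^<]+)", spec):
        if hex_part:
            out += bytes.fromhex(hex_part)  # raises ValueError on an odd digit count
        else:
            out += literal.encode("ascii")
    return bytes(out)
```

For the example string 12<09>Hello, the result is the characters 1 and 2, a tab (hex 09), then Hello.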

FILL HEX

FILL HEX is the same as FILL ASCII but allows the user to insert non-printing characters. For example, to insert a CRLF, the output string 0D0A would be used.

FILL HEX can also be used to insert EBCDIC characters.

ASCII-PACKED

This converts an ASCII number to a packed field by using nibbles (4 bits) to represent each digit. Thus, the number '1234' would be 31 32 33 34 in hex as ASCII, and would be converted to 01 23 4C, where C represents +. A D would represent -, and F unsigned. This method of storing numbers reduces the space required by a factor of about 2, and is common within many IBM based record structures. It is also known as IBM COMP-3.
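The packing described above can be sketched in Python as follows. The function name is hypothetical and this is an illustration of the COMP-3 layout, not RR32's implementation.

```python
def ascii_to_packed(text: str, signed: bool = True) -> bytes:
    """Pack a decimal string into COMP-3 style nibbles with a trailing sign nibble."""
    text = text.strip()
    negative = text.startswith("-")
    digits = text.lstrip("+-")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a decimal number: {text!r}")
    if signed:
        sign = 0xD if negative else 0xC  # D is minus, C is plus
    else:
        sign = 0xF  # F marks an unsigned value
    nibbles = [int(d) for d in digits] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad with a leading zero nibble to fill whole bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

For '1234' this produces the three bytes 01 23 4C, matching the example above.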

EBCDIC-PACKED

This converts an EBCDIC number to a packed field. (See ASCII-PACKED).

Packed to ASCII and Packed to EBCDIC

Packed fields are a very common occurrence in many IBM records. The numbers may be signed as above or unsigned, in which case a series of hex characters 12 34 56 would represent the decimal number 123456. There is a feature where packed numbers may be decoded even when placed on 4 bit (nibble) boundaries, rather than byte boundaries.
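Decoding a byte-aligned packed field can be sketched in Python as shown here. The function name is hypothetical, and the sketch does not cover fields that start on a nibble boundary.

```python
def packed_to_ascii(field: bytes, signed: bool = True) -> str:
    """Unpack a byte-aligned packed decimal field into a decimal string."""
    nibbles = []
    for b in field:
        nibbles += [b >> 4, b & 0x0F]
    sign = ""
    if signed:
        if not nibbles:
            raise ValueError("empty packed field")
        sign_nibble = nibbles.pop()  # the last nibble carries the sign
        if sign_nibble == 0xD:
            sign = "-"
        elif sign_nibble not in (0xC, 0xF):
            raise ValueError(f"unexpected sign nibble {sign_nibble:X}")
    if any(n > 9 for n in nibbles):
        raise ValueError("non-decimal nibble in packed field")
    digits = "".join(str(n) for n in nibbles).lstrip("0") or "0"
    return sign + digits
```

With signed=False, the bytes 12 34 56 decode to 123456, as in the unsigned case described above.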

Convert Date

The convert date operator converts a date field to a DDMMYYYY date format. The actual output date format is selected on the configuration screen of the routine. The type of date conversion is dependent on the combo box at the right of the line.

Conversion options are:

  • Date 7-4-5. This relates to the bits of a 16-bit number, where the 7 most significant bits represent the year, from 1900 to 2027. The next 4 bits are the month, 1-12, and the final 5 bits the day, 1-31.
  • DATE YYMMDD  This inserts system date as YYYY/MM/DD
  • DATE MMDDYY  This inserts system date as MM/DD/YYYY
  • DATE DDMMYY This inserts system date as DD/MM/YYYY
  • Julian Date, IBM date from about 4700BC!
  • Many more date formats are added as found
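The Date 7-4-5 layout can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes the two input bytes have already been combined into a 16-bit integer in the correct byte order.

```python
def decode_date_745(value: int):
    """Split a 16-bit value into (year, month, day) using a 7-4-5 bit layout."""
    year = 1900 + ((value >> 9) & 0x7F)  # top 7 bits: years since 1900
    month = (value >> 5) & 0x0F          # next 4 bits: month 1-12
    day = value & 0x1F                   # low 5 bits: day 1-31
    return year, month, day


def format_ddmmyyyy(value: int) -> str:
    """Render a 7-4-5 date value as DDMMYYYY."""
    year, month, day = decode_date_745(value)
    return f"{day:02d}{month:02d}{year:04d}"
```

For example, 31 December 1999 is stored as (99 << 9) | (12 << 5) | 31, which is 51103.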

TIME

Inserts system time in output string, as HH:MM:SS

SWAP 32

Swaps the byte order of 4-byte arrays. This can be useful to convert numbers from little-endian to big-endian.

SWAP 16

Swaps two characters from the input string. For example:

Input = InterMedia
output = nIetMrdeai

SWAP 8

Swaps two nibbles from an input byte. For example, 0D(hex) would be converted to D0(hex).
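The three swap operations can be sketched in Python as follows. The function names are hypothetical, and the treatment of a trailing partial group (reversed as it stands) is an assumption rather than documented RR32 behavior.

```python
def swap_groups(data: bytes, size: int) -> bytes:
    """Reverse the byte order inside each consecutive group of `size` bytes."""
    out = bytearray()
    for i in range(0, len(data), size):
        out += data[i : i + size][::-1]
    return bytes(out)


def swap_nibbles(data: bytes) -> bytes:
    """Exchange the high and low 4 bits of every byte."""
    return bytes(((b << 4) & 0xF0) | (b >> 4) for b in data)
```

Here swap_groups(data, 4) corresponds to SWAP 32, swap_groups(data, 2) to SWAP 16, and swap_nibbles to SWAP 8.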

Record Count

This inserts the current record count.

+/- Number HiLo

+/- Number LoHi

VAX Float

This converts the input binary number to an ASCII string. The output buffer is right justified, and if not large enough the most significant digits will be truncated. If the value is negative, a '-' sign will be added. This conversion feature can be extremely important when trying to import binary files into a text file format.

The range of numbers is:

Input size (bytes)   Output buffer size
1                    4
2                    6
3                    8
4                    11
8                    38 max
10                   200 (not yet implemented)

For VAX floating point numbers there are 4 defined lengths:

  • F-Float 4 bytes
  • D-Float 8 Bytes
  • G-Float 8 Bytes (Not implemented yet)
  • H-Float 16 Bytes (Not implemented yet)

All VAX numbers are signed, and the ordering is fixed.

The size of the input number will be taken from the input field definition and may be 1-4 bytes in length. The ordering of the number will be high byte first for HiLo and low byte first for LoHi.
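The HiLo and LoHi integer conversions can be sketched in Python as shown here. The function name and parameters are hypothetical. The sketch right-justifies the result and drops the most significant characters when the output field is too small, as described above.

```python
def number_to_ascii(field: bytes, width: int, hilo: bool = True, signed: bool = True) -> str:
    """Convert a 1-4 byte binary integer to a right-justified decimal string."""
    order = "big" if hilo else "little"  # HiLo = high byte first, LoHi = low byte first
    value = int.from_bytes(field, order, signed=signed)
    text = str(value).rjust(width)
    return text[len(text) - width :]  # keep only the rightmost `width` characters
```

The largest signed 4-byte value in magnitude, -2147483648, needs 11 characters, and a signed single byte (-128) needs 4.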

For floating point numbers (8 characters in length) the output is almost unlimited. If the output is longer than the field allowed for, the number will be displayed in scientific notation (e.g. 2.63E5). If the output from a floating point number contains invalid characters, this is most likely for one of two reasons:

  • It is not a floating point number
  • The order should be swapped, i.e. HiLo or LoHi

The 6-byte number is a special floating point implementation. It is not known how standard this is. The 6-byte array is: Byte 1 mantissa, Bytes 2-6 exponent in Lo-Hi ordering.

  • The exponent is in the range of 0.5 - 1.5
  • The mantissa is a multiple of 2

Number HiLo

Number LoHi

This is as above, but the number is not signed, and so the output buffer can be one character shorter.

Cobol Num

This rule will convert signed strings from Cobol systems. The string includes its sign as part of the last digit. The output may also be formatted with the same commands as described below in Formatting Numeric fields.
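As an illustration of a sign carried in the last digit, the Python sketch below decodes the overpunch convention commonly seen when Cobol zoned decimal data has been translated to ASCII. The character tables are an assumption based on that common convention, not a statement of what RR32 accepts, and the function name is hypothetical.

```python
# Assumed overpunch tables: position in the string gives the digit value.
POSITIVE = "{ABCDEFGHI"  # digits 0-9 with a positive sign
NEGATIVE = "}JKLMNOPQR"  # digits 0-9 with a negative sign


def cobol_num_to_int(text: str) -> int:
    """Decode a signed Cobol display number whose last character carries the sign."""
    if not text:
        raise ValueError("empty field")
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        sign, digit = 1, POSITIVE.index(last)
    elif last in NEGATIVE:
        sign, digit = -1, NEGATIVE.index(last)
    elif last.isdigit():
        sign, digit = 1, int(last)  # plain digit: treated as unsigned/positive
    else:
        raise ValueError(f"unexpected sign character {last!r}")
    return sign * int(body + str(digit))
```

Under this assumed convention, '1234E' decodes to 12345 and '1234N' to -12345.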

Formatting Numeric fields

Numeric fields have an extra edit field at the right of the screen. This is to allow for formatting of the output. By using this the number of decimal places may be determined and leading zeros displayed or suppressed.

The options are extremely variable and the command line is in the structure below

  • #,4.2

where the symbols are as below

  • If a currency sign such as $ is shown, that money sign is added in front of the number.
  • If # is set then leading zeros are displayed
  • If a , is in the line, then significant numbers are broken down into groups of 3, separated by a comma, such as 1,234,567.12
  • The final number is the number of significant and decimal places. Thus 3.2 would be 3 leading digits and 2 decimal places.
  • If the field is left blank, then no numeric formatting will take place.
  • If the output field is too short, then significant digits are truncated.

Some examples:

  • 3.4 100.1234
  • #6.2 0000012.99
  • #,6.2 123,456.55
  • ,8.2 1,435.55
  • 5.0 43241

TransTab (InterMedia for Windows only)

The TransTab option is the same as 'Copy Field', but an IMW translation table may be applied. The translation table is any IMW table and it performs a complete string and byte translation. Thus the output string may be longer than the input string. If too long for the output field, it will be truncated (from the right).

Typical applications could be case mapping (make all lower case, or all upper case) or handling accented characters, or different EBCDIC conversions.

A maximum of 8 different translation tables can be defined within a single Record Reformatter table. There is no limit on the number of fields that may be converted.

IB Field

This will insert an IntelliBase / IntelliBase 95 field marker. The parameter should be a number between 1 and 9999. The output data is always a 4-digit number, preceded by a 0EH, and followed by a 0FH. The length should always be 6.
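The 6-byte marker layout described above can be sketched in Python as follows. The function name is hypothetical, and zero-padding of the number to 4 digits is assumed from the statement that the output is always a 4-digit number.

```python
def ib_field_marker(number: int) -> bytes:
    """Build a 6-byte field marker: 0EH, four ASCII digits, 0FH."""
    if not 1 <= number <= 9999:
        raise ValueError("field number must be between 1 and 9999")
    return b"\x0e" + f"{number:04d}".encode("ascii") + b"\x0f"
```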