Hyperstage Binary Format

With Hyperstage binary format load, individual rows are not separated by any special characters. There are also no value delimiters or qualifiers.

The structure of binary data files is as follows:

Data is stored contiguously: <row_size><nulls><data_col_1>...<data_col_n> and then the next data rows, without any line separator.

<row_size>

2-byte short integer indicating total number of bytes in this row (including all header bytes),

nulls> -

Binary map of null values, every byte reflecting to eight consecutive columns. Bit 0 means a normal value, bit 1 means null value. The length of <nulls> section is floor((number_of_columns+7)/8). For example, minimal number of bytes to cover the number of columns (one bit per column).

<data_col_1>

Data itself, depending on column type.

Floating point values are stored here as 8-byte values.

Most numerical values (integers, dates) are stored as 4-byte integers.

Fixed size texts (for example, CHAR(n)) are stored on the fixed number of n bytes.

Other text types (for example, VARCHAR(n)) have their length stored on the first two bytes, followed by the text.

For example, we have two floating point columns. In this case, the binary file will look like the following:

11, 0, 0, a1, a2, a3, a4, a5, a6, a7, a8, b1, b2, b3, b4, b5, b6, b7, b8

where (11, 0) is the 2-byte (HEX) representation of the record length after the first 0, the second 0 is null map (no nulls in this case), (a1a2a3a4a5a6a7a8) is an 8-byte representation of the first double and (b1b2b3b4b5b6b7b8) is an 8-byte representation of the second double. If the file contained 1000 rows, it will have a length of 19000 bytes.

The following schema illustrates the format of one row in the BINARY format.

Every row starts with L (2-byte integer) that specifies number of following bytes of data. Null indicators are an array of bits (one bit per each column). 1 on m-th bit means that the m-th value in the row is NULL.

The number of columns in a record determines the numbers of bytes in NULL indicators. For example, for a record that contains from one to eight column indicator bits are stored on one byte. If a record contains from nine to 16 columns, two bytes are used, and so on.

NULL indicators array is followed by N values where N is a number of columns in a row.

Formats and Lengths in Bytes for Particular Data Types

Data Type

Format

Length in Bytes

TINYINT

 

1

SMALLINT

 

2

MEDIUMINT

 

3

INTEGER

 

4

BIGINT

 

8

FLOAT

IEEE 4-byte Float

4

DOUBLE

IEEE 8-byte Double

8

DECIMAL (N,M)

(Actual value) * 10^M

N Length in Bytes

[1,2]      1
[3,4]      2
[5,9]      4
[10,18]    8

TIME

[sign] [h] hh:mm:ss

8-10

YEAR

2-byte integer

2

DATE

4-byte integer

yyyymmdd

where:

yyyy

Is the year (1900).

4

TIMESTAMP / DATETIME

yyyy-mm-dd hh:mm:ss

19

CHAR (N)

N characters

N

VARCHAR (N)

2-byte integer of value L followed by L characters.

2+L

BINARY (N)

N bytes

N

VARBINARY (N)

2-byte integer of value L followed by L bytes.

2+L

Note that CHAR is constant sized, whereas VARCHAR occupies only the size needed for the actual value. Integer and floating-point data are stored as a natural binary representation of these values (little endian).


WebFOCUS