Selecting, Reformatting, and Manipulating Characters


In character semantics mode, selection tests against a mask are automatically adjusted to work with characters rather than bytes. Formats assigned by reformatting a field in a request or by defining a temporary field are interpreted in terms of characters. Character functions interpret all lengths in terms of characters.



Example: Defining a Virtual Field

Consider the following DEFINE in the Master File for the EMPLOYEE data source:

DEFINE FIRST_ABBREV/A5 WITH FIRST_NAME = EDIT(FIRST_NAME, '99999$$$$$');$

In character semantics mode, format A5 is interpreted as five characters (up to 15 bytes on ASCII platforms, up to 20 bytes on EBCDIC platforms), and the comparison is performed based on this number of bytes. In byte semantics mode, format A5 is interpreted as five bytes, and the comparison is performed based on five bytes. In either case, the correct characters are compared and extracted.



Example: Reformatting a Field

Consider the following PRINT command:

PRINT FIELD1/A10

In character semantics mode, format A10 is interpreted as 10 characters (up to 30 bytes), meaning that up to 30 bytes must be retrieved when this field is referenced. In byte semantics mode, format A10 means that 10 bytes will be retrieved. In either case, the field displays as 10 characters that take up 10 spaces on the report output.
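The relationship between character counts and byte counts described in the two examples above can be checked outside FOCUS. The following Python sketch is only an illustration: the function names are hypothetical, and the default of three bytes per character is an assumption taken from the figures quoted in the text (15 bytes for A5 and 30 bytes for A10 in UTF-8, with four bytes per character for the EBCDIC case).

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_bytes_for_format(char_count: int, bytes_per_char: int = 3) -> int:
    """Upper bound on storage for an An field in character semantics mode.

    bytes_per_char defaults to 3, matching the UTF-8 figures in the text;
    pass 4 for the EBCDIC figure (an assumption based on the same text).
    """
    return char_count * bytes_per_char


# Three 10-character strings that need different numbers of bytes.
samples = {
    "ascii": "A" * 10,          # 1 byte per character
    "accented": "\u00e9" * 10,  # e-acute, 2 bytes per character
    "euro": "\u20ac" * 10,      # euro sign, 3 bytes per character
}

for name, value in samples.items():
    print(name, len(value), "characters,", utf8_byte_length(value), "bytes")

print("A10 upper bound:", max_bytes_for_format(10), "bytes")
```

Every sample is 10 characters long, yet the byte counts differ, which is why a fixed byte length cannot stand in for a character length once multibyte data is involved.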



Reference: Character Functions That Support Character Semantics

In character semantics mode, all character manipulation functions interpret lengths in terms of characters. The following functions operate on alphanumeric strings in character semantics mode when Unicode is configured:

Note: The HEXBYT, BYTVAL, and CTRAN functions have been extended to handle multibyte characters in Unicode configurations. These functions use or produce numeric values to represent characters. In Unicode configurations, they use or produce values in the range:

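To find the numeric value corresponding to a given character, find its hexadecimal code and convert it to decimal with a hex calculator such as the Windows XP Calculator program. (Make sure to use the UTF-8 or UTF-EBCDIC encoded value, not the Unicode code point. For most common characters the code point is the same as the UTF-16 value, and it differs from the UTF-8 bytes: the euro sign, for instance, has code point 20AC but UTF-8 encoding E282AC.)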

For example, assume you would like to create a variable of format A1 containing the euro sign. The euro sign in UTF-8 is, in hex, E282AC. Converting this to decimal gives 14844588 (226 x 65536 + 130 x 256 + 172). Thus, the proper DEFINE or COMPUTE would be:

EUROSIGN/A1 = HEXBYT(14844588, 'A1');

If you are creating a FOCEXEC with a UTF-8 compliant editor, you can also get the value of the euro sign in this way:

EUROVAL/I8 = BYTVAL('€', 'I8');
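As an alternative to a hex calculator, the conversion between a character and the decimal value of its encoded bytes can be sketched in a few lines of Python. This is an illustration only, not part of FOCUS: the function names are hypothetical, and the sketch covers UTF-8 only because Python has no built-in UTF-EBCDIC codec.

```python
def char_to_decimal(ch: str) -> int:
    """Decimal value of the character's UTF-8 bytes, read as one big-endian integer."""
    return int.from_bytes(ch.encode("utf-8"), byteorder="big")


def decimal_to_char(value: int) -> str:
    """Inverse of char_to_decimal: rebuild the character from the decimal value."""
    length = max(1, (value.bit_length() + 7) // 8)  # bytes needed to hold the value
    return value.to_bytes(length, byteorder="big").decode("utf-8")


euro = "\u20ac"  # euro sign, written as an escape so the source file stays ASCII
print(euro.encode("utf-8").hex().upper())  # E282AC
print(char_to_decimal(euro))               # 14844588
print(char_to_decimal("\u00a3"))           # 49827 (pound sterling sign, C2A3)
```

Here char_to_decimal plays roughly the role that BYTVAL plays in the text, and decimal_to_char the role of HEXBYT, under the assumption that the field holds UTF-8 data.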

The CTRAN function replaces all occurrences of a character in a string with another character, given the decimal values that represent the hexadecimal codes for the two characters. Traditionally, this technique was used to replace characters that were difficult to input directly. Decimal values of characters can be complicated to determine. Therefore, if you want to replace characters or character strings that you can input directly using a UTF-compliant text editor, Information Builders recommends that you use the STRREP string replacement function.

The following translates all of the euro signs in a 40-character UTF-8 field to pound sterling signs (£ = C2A3 or 49827):

NEWFLD/A40 = CTRAN(40, OLDFLD, EUROVAL, 49827, 'A40');
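The effect of this translation can be previewed outside FOCUS with a small Python sketch. It is a rough analogue rather than a reimplementation of CTRAN: the function names and the sample string are hypothetical, and the sketch assumes UTF-8 data and takes the same kind of decimal values that the text describes.

```python
def decimal_to_char(value: int) -> str:
    """Turn the decimal value of a UTF-8 byte sequence back into a character."""
    length = max(1, (value.bit_length() + 7) // 8)  # bytes needed to hold the value
    return value.to_bytes(length, byteorder="big").decode("utf-8")


def translate_by_decimal(text: str, from_value: int, to_value: int) -> str:
    """Replace every occurrence of one character with another.

    Both characters are identified by the decimal value of their UTF-8 bytes,
    mirroring the third and fourth arguments of the CTRAN call in the text.
    """
    return text.replace(decimal_to_char(from_value), decimal_to_char(to_value))


old_value = "Price: \u20ac100 or \u20ac200"  # hypothetical field content with euro signs
new_value = translate_by_decimal(old_value, 14844588, 49827)
print(new_value)
```

Because whole characters are replaced, the result has the same number of characters as the input even though the euro sign occupies three bytes and the pound sign two, which is consistent with lengths being counted in characters.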

iWay Software