Selecting, Reformatting, and Manipulating Characters

In character semantics mode, selection tests against a mask are automatically adjusted to work with characters rather than bytes. Formats assigned by reformatting a field in a request or by defining a temporary field are interpreted in terms of characters. Character functions interpret all lengths in terms of characters.

In character semantics mode, format A5 is interpreted as five characters (up to 15 bytes on ASCII platforms, up to 20 bytes on EBCDIC platforms), and the comparison is performed based on this number of bytes. In byte semantics mode, format A5 is interpreted as five bytes, and the comparison is performed based on five bytes. In either case, the correct characters are compared and extracted.

In character semantics mode, format A10 is interpreted as 10 characters (up to 30 bytes), meaning that up to 30 bytes must be retrieved when this field is referenced. In byte semantics mode, format A10 means that 10 bytes will be retrieved. In either case, the field displays as 10 characters that take up 10 spaces on the report output.
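
The difference between counting characters and counting bytes can be illustrated outside of FOCUS. The following Python sketch is an illustration only, not part of the product. It assumes UTF-8 encoding, and the helper name utf8_byte_length is hypothetical. It shows that a five-character value can occupy anywhere from 5 to 15 bytes:

```python
# Illustration only: character count versus UTF-8 byte count.
# utf8_byte_length is a hypothetical helper, not a FOCUS function.

def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_value = "ABCDE"   # five characters, each 1 byte in UTF-8
euro_value = "€€€€€"    # five characters, each 3 bytes in UTF-8

print(len(ascii_value), utf8_byte_length(ascii_value))  # 5 5
print(len(euro_value), utf8_byte_length(euro_value))    # 5 15
```

Both values are five characters long, which is what a format such as A5 measures in character semantics mode. Only the number of bytes needed to store them differs.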

Reference: Character Functions That Support Character Semantics

In character semantics mode, all character manipulation functions interpret lengths in terms of characters. The following functions operate on alphanumeric strings in character semantics mode when Unicode is configured:

String manipulation and extraction functions.
GETTOK, OVRLAY, PARAG, REVERSE, SQUEEZ, STRIP, SUBSTR, SUBSTV, TRIM, TRIMV
Justification functions.
CTRFLD, LJUST, RJUST
Length and position functions.
ARGLEN, LENV, POSIT, POSITV
Format conversion functions.
EDIT
Decoding, comparison, and editing functions.
CHKFMT, EDIT, DECODE, SOUNDEX
String replacement functions.
CTRAN, HEXBYT, BYTVAL (see the note below), STRREP
Case translation functions.
LCWORD, LOCASE, LOCASV, UPCASE, UPCASV

Note: The HEXBYT, BYTVAL, and CTRAN functions have been extended to handle multibyte characters in Unicode configurations. These functions use or produce numeric values to represent characters. In Unicode configurations, they use or produce values in the range:

0 to 255 for 1-byte characters
256 to 65535 for 2-byte characters
65536 to 16777215 for 3-byte characters
16777216 to 4294967295 for 4-byte characters (primarily for EBCDIC)
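
The ranges above can be expressed as a simple lookup. The following Python sketch is an illustration only. The function name byte_count_for_value is hypothetical, and the boundaries are taken directly from the list above:

```python
# Illustration only: map a numeric character value to the number of bytes
# it represents, using the ranges listed above.
# byte_count_for_value is a hypothetical helper, not a FOCUS function.

def byte_count_for_value(value: int) -> int:
    """Return 1, 2, 3, or 4 depending on which range the value falls in."""
    if not 0 <= value <= 4294967295:
        raise ValueError("value is outside the supported range")
    if value <= 255:
        return 1
    if value <= 65535:
        return 2
    if value <= 16777215:
        return 3
    return 4


print(byte_count_for_value(65))        # 1  (hex 41)
print(byte_count_for_value(0xC2A3))    # 2
print(byte_count_for_value(0xE282AC))  # 3
```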

To find the numeric value corresponding to a given character, find its hexadecimal code and convert it to decimal with a calculator that accepts hexadecimal input, such as the Windows Calculator program. (Make sure to use the UTF-8 or UTF-EBCDIC code, not the Unicode code point, which for most characters matches the UTF-16 value.)

For example, assume you would like to create a variable of format A1 containing the euro sign. The euro sign in UTF-8 is, in hex, E282AC. Converting this to decimal gives 14844588. Thus, the proper DEFINE or COMPUTE would be:

EUROSIGN/A1 = HEXBYT(14844588, 'A1');

If you are creating a FOCEXEC with a UTF-8 compliant editor, you can also get the value of the euro sign in this way:

EUROVAL/I8 = BYTVAL('€', 'I8');
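
The hex-to-decimal conversion described above can also be checked with a short script instead of a calculator. The following Python sketch is an illustration only. It assumes UTF-8 (Python has no built-in UTF-EBCDIC codec), and the function names utf8_numeric_value and char_from_numeric_value are hypothetical. They loosely mirror what BYTVAL and HEXBYT are described as doing, but they are not those functions:

```python
# Illustration only: convert between a character and the numeric value of
# its UTF-8 encoding. Both helpers are hypothetical, not FOCUS functions.

def utf8_numeric_value(char: str) -> int:
    """Read the UTF-8 bytes of one character as a single big-endian integer."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return int.from_bytes(char.encode("utf-8"), byteorder="big")


def char_from_numeric_value(value: int) -> str:
    """Turn the integer back into bytes and decode them as UTF-8."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder="big").decode("utf-8")


print(hex(utf8_numeric_value("€")))       # 0xe282ac
print(utf8_numeric_value("£"))            # 49827
print(char_from_numeric_value(0xE282AC))  # €
```

Reading the bytes as one big-endian integer is the same operation as typing the hex digits into a calculator and switching the display to decimal.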

The CTRAN function replaces all occurrences of a character in a string with another character, given the decimal values that represent the hexadecimal codes for the two characters. Traditionally, this technique was used to replace characters that were difficult to input directly. Decimal values of characters can be complicated to determine. Therefore, if you want to replace characters or character strings that you can input directly using a UTF-compliant text editor, Information Builders recommends that you use the STRREP string replacement function.

The following translates all of the euro signs in a 40-character UTF-8 field to pound sterling signs (£ is C2A3 in hex, or 49827 in decimal). EUROVAL is the value returned by BYTVAL in the previous example:

NEWFLD/A40 = CTRAN(40, OLDFLD, EUROVAL, 49827, 'A40');
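
To see what a translation of this kind does to the data, the same replacement can be modeled outside of FOCUS. The following Python sketch is an illustration only. It assumes UTF-8, the function names value_to_char and translate_char are hypothetical, and it is an analogy for the behavior described above rather than an implementation of CTRAN:

```python
# Illustration only: replace every occurrence of one character with another,
# where both characters are given as the decimal value of their UTF-8 bytes.
# value_to_char and translate_char are hypothetical helpers.

def value_to_char(value: int) -> str:
    """Decode a numeric value (UTF-8 bytes read as an integer) to a character."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder="big").decode("utf-8")


def translate_char(text: str, from_value: int, to_value: int) -> str:
    """Replace all occurrences of the first character with the second."""
    return text.replace(value_to_char(from_value), value_to_char(to_value))


old_field = "Price: €10, €20"
new_field = translate_char(old_field, 0xE282AC, 49827)
print(new_field)  # Price: £10, £20
```

When both characters can be typed directly, a plain string replacement is simpler, which is the same reasoning behind the recommendation to use STRREP.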