A Proposed Macintosh VISCII Vietnamese Character Set

by

Hoc D. Ngo
The Vietnamese Standardization Working Group
Email: Viet-Std@Haydn.Stanford.EDU
Date: January 1996



Our development of MacVNkey, a Mac Vietnamese keyboard driver, necessitates a change of the VISCII 1.1 character set. This article briefly summarizes VISCII 1.1 in Section A. Section B points out its weakness in the Macintosh environment and the need for a new version called MacVISCII, which will work on all major platforms. Section C outlines the strategy for multilingual support within the framework of 8-bit VISCII. Section D summarizes recommendations for font vendors to upgrade existing VISCII fonts to support both VISCII 1.1 and MacVISCII. Section E concludes the article.

A) VISCII 1.1

In the fall of 1992 the Vietnamese Standardization Group (Viet-Std) published a report proposing an 8-bit Vietnamese character encoding commonly known as VISCII (VIetnamese Standard Code for Information Interchange). The full encoding, also known as the VISCII 1.1 standard, is given in Table 1. The table contains 134 Vietnamese-specific characters in addition to those already available in the 7-bit ASCII table. Since only 128 characters fit in the upper half of the 8-bit code space (0x80 to 0xFF), the Viet-Std Group decided to encode the remaining six characters in the control region (0x00 to 0x1F). The six were chosen to be the upper-case Vietnamese characters that are used "least" frequently.

Table 1: The VISCII 1.1 character set
 

    +======================================================================+
    |    ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
    +======================================================================+
    | 0x || NUL:SOH:A(?:ETX:EOT:A(~:A^~:BEL:BS :HT :LF :VT :FF :CR :SO :SI |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 1x || DLE:DC1:DC2:DC3:Y? :NAK:SYN:ETB:CAN:Y~ :SUB:ESC:FS :GS :Y. :US |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 2x || SP : ! : " : # : $ : % : & : ' : ( : ) : * : + : , : - : . : / |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 3x ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : : : ; : < : = : > : ? |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 4x ||  @ : A : B : C : D : E : F : G : H : I : J : K : L : M : N : O |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 5x ||  P : Q : R : S : T : U : V : W : X : Y : Z : [ : \ : ] : ^ : _ |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 6x ||  ` : a : b : c : d : e : f : g : h : i : j : k : l : m : n : o |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 7x ||  p : q : r : s : t : u : v : w : x : y : z : { : | : } : ~ :DEL|
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 8x || A. :A(':A(`:A(.:A^':A^`:A^?:A^.:E~ :E. :E^':E^`:E^?:E^~:E^.:O^'|
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 9x || O^`:O^?:O^~:O^.:O+.:O+':O+`:O+?:I. :O? :O. :I? :U? :U~ :U. :Y` |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Ax || O~ :a(':a(`:a(.:a^':a^`:a^?:a^.:e~ :e. :e^':e^`:e^?:e^~:e^.:o^'|
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Bx || o^`:o^?:o^~:O+~:O+ :o^.:o+`:o+?:i. :U+.:U+':U+`:U+?:o+ :o+':U+ |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Cx || A` :A' :A^ :A~ :A? :A( :a(?:a(~:E` :E' :E^ :E? :I` :I' :I~ :y` |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Dx || DD :u+':O` :O' :O^ :a. :y? :u+`:u+?:U` :U' :y~ :y. :Y' :o+~:u+ |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Ex || a` :a' :a^ :a~ :a? :a( :u+~:a^~:e` :e' :e^ :e? :i` :i' :i~ :i? |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Fx || dd :u+.:o` :o' :o^ :o~ :o? :o. :u. :u` :u' :u~ :u? :y' :o+.:U+~|
    +======================================================================+
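The count of 134 can be checked against Table 1 with a short Python sketch. The dictionary below is read off rows 0x and 1x of the table; the names CONTROL_REGION_LETTERS and vietnamese_letter_count are illustrative only and are not part of any VISCII specification.

```python
# Vietnamese letters that Table 1 places in the control region (0x00-0x1F).
# Keys are byte values; values use the article's ASCII notation.
CONTROL_REGION_LETTERS = {
    0x02: "A(?",
    0x05: "A(~",
    0x06: "A^~",
    0x14: "Y?",
    0x19: "Y~",
    0x1E: "Y.",
}

# Number of code points in the upper half of the 8-bit space (0x80-0xFF).
UPPER_HALF_SLOTS = 0x100 - 0x80


def vietnamese_letter_count() -> int:
    """Upper-half slots plus the letters spilled into the control region."""
    return UPPER_HALF_SLOTS + len(CONTROL_REGION_LETTERS)


print(vietnamese_letter_count())
```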

This encoding has been implemented successfully on Unix and DOS. In Windows 3.1, however, not all Vietnamese characters are renderable because the operating system deliberately excludes them. Among those that are not renderable are all control characters and the non-breaking space character. Going by Table 1, this affects at least the six letters in the control region (A(?, A(~, A^~, Y?, Y~, and Y.) and the letter O~, which occupies the non-breaking space position 0xA0.

The Viet-Std Group resolved this problem by specifying that each Vietnamese font must be provided as a pair: one normal font that complies with Table 1 and one capital font ("Hoa") that is identical to its normal counterpart except that all lower-case characters are replaced with the corresponding upper-case characters. Whenever an upper-case letter, say Y~, is not rendered, the user has to switch to the corresponding capital font and type the corresponding lower-case letter, which is y~ in this example. This method is called "font switching."
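Font switching can be pictured as splitting a VISCII 1.1 byte stream into runs, each tagged with the font that should draw it. The Python sketch below is only an illustration: the set of unsafe letters is an assumption read off Table 1 (the control region plus position 0xA0), and the names UNSAFE_UPPER_TO_LOWER and font_switch_runs are hypothetical.

```python
# Upper-case letters assumed unrenderable in Windows 3.1, mapped to the byte
# of the matching lower-case letter in Table 1. In the capital ("Hoa") font
# that lower-case position carries the upper-case glyph.
UNSAFE_UPPER_TO_LOWER = {
    0x02: 0xC6,  # A(? -> a(?
    0x05: 0xC7,  # A(~ -> a(~
    0x06: 0xE7,  # A^~ -> a^~
    0x14: 0xD6,  # Y?  -> y?
    0x19: 0xDB,  # Y~  -> y~
    0x1E: 0xDC,  # Y.  -> y.
    0xA0: 0xF5,  # O~  -> o~ (0xA0 is the non-breaking space position)
}


def font_switch_runs(data: bytes):
    """Split VISCII 1.1 bytes into (font, bytes) runs.

    Unsafe upper-case letters are emitted as their lower-case byte inside
    a run tagged "hoa"; everything else stays in a "normal" run.
    """
    runs = []  # each entry is [font_name, bytearray]
    for b in data:
        if b in UNSAFE_UPPER_TO_LOWER:
            font, out = "hoa", UNSAFE_UPPER_TO_LOWER[b]
        else:
            font, out = "normal", b
        if runs and runs[-1][0] == font:
            runs[-1][1].append(out)
        else:
            runs.append([font, bytearray([out])])
    return [(font, bytes(buf)) for font, buf in runs]
```

For example, a stream that begins with Y~ (0x19) yields a one-byte "hoa" run containing 0xDB (y~), followed by "normal" runs for the rest of the text.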

B) MacVISCII

The need to change VISCII 1.1 arose during the development of MacVNkey. On the Macintosh platform the operating system and the Mac Toolbox can display almost every 8-bit character except for a few that are reserved for cursor movement. Thus if a character is unrenderable, it is most likely because the application chooses not to render it. For example, PageMaker filters out all characters in the control region, and MacWrite does not display the Macintosh non-breaking space.

Because VISCII fonts are always supplied in pairs of normal and capital fonts, the user can enter all Vietnamese characters by switching to the appropriate font. However, we cannot ignore the large number of mono-font applications that are widely used in the Macintosh environment, such as TeachText and popular editors. After all, it is irrational to abandon them while the Macintosh operating system has the potential to support all Vietnamese-specific characters.

Most Mac applications simply filter out or do not display several VISCII characters. Unfortunately, one of them is the frequently used letter "E^", which is safe in Windows but not in Mac applications. If "E^" cannot be displayed on screen, the user will be surprised when typing a tone-marked E^ such as E^' or E^` because the intermediate character E^ is not echoed to the screen. MacVNkey provides an option that converts, on the fly, individual unsafe upper-case letters to the corresponding lower case while remembering them as upper case for combination purposes. For instance, a user who wants "E^'" will see "e^" after typing "E^" and will then see the correct "E^'" after typing the acute accent.

To avoid rewriting the large installed base of Windows applications, we will remap only "Y." and "Y?", because control characters are never used as text in Windows-based applications. Specifically, "Y?" moves from 0x14 to 0x17 and "Y." moves from 0x1E to 0x18. The resulting character set will hereafter be called MacVISCII and is shown in Table 2.

It is our hope that the current VISCII 1.1 standard will ultimately be superseded by MacVISCII, which will then be renamed VISCII 2.0. Such a move has the enormous advantage of giving us a single VISCII standard for information interchange on all popular computing platforms. There is a tradeoff. No Windows users or developers suffer, because the affected letters "Y?" and "Y." cannot be used as text in Windows anyway. However, all DOS- and Unix-based VISCII applications will have to be upgraded to the new VISCII. Because they are fewer in number than Windows applications, the existing VISCII market is expected to suffer only slightly.

Table 2: The MacVISCII character set
 

    +======================================================================+
    |    ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
    +======================================================================+
    | 0x || NUL:SOH:A(?:ETX:EOT:A(~:A^~:BEL:BS :HT :LF :VT :FF :CR :SO :SI |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 1x || DLE:DC1:DC2:DC3:DC4:NAK:SYN:Y? :Y. :Y~ :SUB:ESC:FS :GS :RS :US |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 2x || SP : ! : " : # : $ : % : & : ' : ( : ) : * : + : , : - : . : / |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 3x ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : : : ; : < : = : > : ? |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 4x ||  @ : A : B : C : D : E : F : G : H : I : J : K : L : M : N : O |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 5x ||  P : Q : R : S : T : U : V : W : X : Y : Z : [ : \ : ] : ^ : _ |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 6x ||  ` : a : b : c : d : e : f : g : h : i : j : k : l : m : n : o |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 7x ||  p : q : r : s : t : u : v : w : x : y : z : { : | : } : ~ :DEL|
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 8x || A. :A(':A(`:A(.:A^':A^`:A^?:A^.:E~ :E. :E^':E^`:E^?:E^~:E^.:O^'|
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 9x || O^`:O^?:O^~:O^.:O+.:O+':O+`:O+?:I. :O? :O. :I? :U? :U~ :U. :Y` |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Ax || O~ :a(':a(`:a(.:a^':a^`:a^?:a^.:e~ :e. :e^':e^`:e^?:e^~:e^.:o^'|
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Bx || o^`:o^?:o^~:O+~:O+ :o^.:o+`:o+?:i. :U+.:U+':U+`:U+?:o+ :o+':U+ |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Cx || A` :A' :A^ :A~ :A? :A( :a(?:a(~:E` :E' :E^ :E? :I` :I' :I~ :y` |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Dx || DD :u+':O` :O' :O^ :a. :y? :u+`:u+?:U` :U' :y~ :y. :Y' :o+~:u+ |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Ex || a` :a' :a^ :a~ :a? :a( :u+~:a^~:e` :e' :e^ :e? :i` :i' :i~ :i? |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | Fx || dd :u+.:o` :o' :o^ :o~ :o? :o. :u. :u` :u' :u~ :u? :y' :o+.:U+~|
    +======================================================================+
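Because Table 1 and Table 2 differ in only two letter positions, converting between them reduces to a byte translation. The Python sketch below is a minimal illustration; the function names are hypothetical, and it assumes the input text contains no raw ETB (0x17), CAN (0x18), DC4 (0x14), or RS (0x1E) control codes, since these would be indistinguishable from the relocated letters.

```python
# Byte positions that differ between Table 1 (VISCII 1.1) and Table 2 (MacVISCII):
# Y? is 0x14 in VISCII 1.1 and 0x17 in MacVISCII,
# Y. is 0x1E in VISCII 1.1 and 0x18 in MacVISCII.
_TO_MACVISCII = bytes.maketrans(b"\x14\x1e", b"\x17\x18")
_TO_VISCII11 = bytes.maketrans(b"\x17\x18", b"\x14\x1e")


def viscii11_to_macviscii(data: bytes) -> bytes:
    """Move Y? and Y. from their VISCII 1.1 slots to their MacVISCII slots."""
    return data.translate(_TO_MACVISCII)


def macviscii_to_viscii11(data: bytes) -> bytes:
    """Move Y? and Y. from their MacVISCII slots back to their VISCII 1.1 slots."""
    return data.translate(_TO_VISCII11)
```

All other bytes, including Y~ at 0x19, pass through unchanged in both directions.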

C) Plan for 8-bit Multilingual Support

Roughly one million overseas Vietnamese live in English-speaking countries, and another million live in countries that use other Latin-based alphabets. The current VISCII standards aim to support Vietnamese letters only, and hence serve only domestic users and those living in English-speaking countries. VISCII characters are defined as precomposed 8-bit characters so that they can be used directly in existing 8-bit applications designed for English or Western European languages. Because the VISCII table has no empty slots for new characters, it is very difficult to support foreign characters without introducing further incompatibilities.

The obvious solutions are 16-bit Unicode or font switching. But since we have already written keyboard drivers for all popular platforms (e.g., DOS, Windows, Mac, X-Windows), we might as well provide some limited support for 8-bit multilingual plain text. Currently there is practically no limit on the number of characters that can be rendered on the screen in Windows, Mac, and X-Windows. On DOS the limit is 512 characters in text mode for VGA and SuperVGA cards, and 256 for older cards. Thus one multilingual solution is to provide a variable-length character code.

Our initial goal is to support only two fonts for a maximum of 512 characters. The first font is VISCII. The second font should not contain any graphic characters in the control region; it is hereafter called VMSCII, an acronym for Vietnamese Multilingual Standard Code for Information Interchange (chu+~ DDa ngu+~ or chu+~ Quo^'c te^'). Our preliminary proposal is to reserve CTRL-U (code 21, or 0x15) as the leading byte for characters in the second font. In other words, a plain text stream generally contains both one-byte and two-byte characters, and the leading byte of a two-byte character is always CTRL-U. Font rendering engines must render one-byte characters using a VISCII font and two-byte characters using the corresponding VMSCII font.

This leads to a need for a convention in naming Vietnamese fonts. We suggest that a Vietnamese font name carry a suffix that is empty for a normal VISCII font, "H" for a "Hoa" or capital VISCII font, and "Q" for a "Quo^'c te^'" or VMSCII font. For instance, the Vietnamese "University" family would comprise three fonts, one for each suffix.

Support for multilingual characters requires more research and development. Our future tasks will be as follows:
  1. Define multilingual characters in VMSCII.
  2. Design VMSCII fonts.
  3. Upgrade keyboard drivers for DOS, Windows, Mac, Unix, X-Windows.
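The CTRL-U lead-byte scheme proposed in this section can be sketched as a small decoder that tags each character with the font that should render it. This Python sketch is only an illustration of the preliminary proposal: the function name split_by_font is hypothetical, and raising an error on a dangling lead byte is our own assumption, since the proposal does not specify that case.

```python
LEAD_BYTE = 0x15  # CTRL-U (decimal 21), reserved as the VMSCII lead byte


def split_by_font(data: bytes):
    """Return (font, byte) pairs for a mixed VISCII/VMSCII plain text stream.

    A byte following CTRL-U belongs to the VMSCII font; any other byte is
    a one-byte character rendered with the VISCII font.
    """
    pairs = []
    i = 0
    while i < len(data):
        if data[i] == LEAD_BYTE:
            if i + 1 >= len(data):
                # Assumption: a lead byte with nothing after it is an error.
                raise ValueError("CTRL-U lead byte at end of stream")
            pairs.append(("VMSCII", data[i + 1]))
            i += 2
        else:
            pairs.append(("VISCII", data[i]))
            i += 1
    return pairs
```

For example, the four bytes 0x61 0x15 0xE9 0x62 decode to a VISCII character 0x61, a VMSCII character 0xE9, and a VISCII character 0x62.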

D) New Versions of VISCII Fonts

In spite of their slight difference, VISCII 1.1 and MacVISCII characters can coexist in a single font. Fonts of this type should contain double representations of "Y?" and "Y." to support both VISCII 1.1 and MacVISCII.

In addition, in light of the proposed support for multilingual characters, it is useful to have a graphic character at CTRL-U, say the "unequal" sign ("=/"). When a user reads or prints a multilingual plain text file using a tool that does not support VMSCII, the appearance of an unequal sign indicates that the next character is a multilingual character that should appear in a different font. (The unequal sign is chosen for this reason.) The user then has to use an appropriate VMSCII-compliant tool to print or read the plain text file. A utility to convert multilingual plain text files to rich-text format (RTF) should be provided, because the resulting document can be read by word processors.

In summary, all new versions of VISCII fonts should have the letter "Y?" at 0x14 (to accommodate VISCII 1.1) and 0x17 (to accommodate MacVISCII), the letter "Y." at 0x18 (MacVISCII) and 0x1E (VISCII 1.1), and the unequal sign "=/" at 0x15 for future support of multilingual fonts. This means that only the first two rows of old VISCII fonts need upgrading, as shown in Table 3.

Table 3: First two rows of an upgraded font supporting both VISCII 1.1 and MacVISCII

    +======================================================================+
    |    ||  0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
    +======================================================================+
    | 0x || NUL:SOH:A(?:ETX:EOT:A(~:A^~:BEL:BS :HT :LF :VT :FF :CR :SO :SI |
    |----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
    | 1x || DLE:DC1:DC2:DC3:Y? :=/ :SYN:Y? :Y. :Y~ :SUB:ESC:FS :GS :Y. :US |
    +======================================================================+
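The claim that a Table 3 font serves both encodings can be checked mechanically. In the Python sketch below, the glyph map is read off Table 3 and the two small dictionaries hold the positions where VISCII 1.1 and MacVISCII place "Y?" and "Y."; all names are illustrative only.

```python
# Graphic characters in rows 0x and 1x of a combined font, per Table 3
# (the article's ASCII notation).
TABLE3_GLYPHS = {
    0x02: "A(?",
    0x05: "A(~",
    0x06: "A^~",
    0x14: "Y?",  # VISCII 1.1 position
    0x15: "=/",  # unequal sign shown for the CTRL-U lead byte
    0x17: "Y?",  # MacVISCII position
    0x18: "Y.",  # MacVISCII position
    0x19: "Y~",
    0x1E: "Y.",  # VISCII 1.1 position
}

# Where each encoding expects the two relocated letters.
VISCII11_POSITIONS = {0x14: "Y?", 0x1E: "Y."}
MACVISCII_POSITIONS = {0x17: "Y?", 0x18: "Y."}


def renders_correctly(font: dict, positions: dict) -> bool:
    """True if the font shows the expected glyph at every listed position."""
    return all(font.get(code) == glyph for code, glyph in positions.items())


print(renders_correctly(TABLE3_GLYPHS, VISCII11_POSITIONS))
print(renders_correctly(TABLE3_GLYPHS, MACVISCII_POSITIONS))
```

Both checks succeed for the Table 3 layout, whereas a font following Table 1 alone would fail the MacVISCII check because it has no letters at 0x17 and 0x18.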

MacVNkey is compliant with MacVISCII by default; however, the user has the option to switch back to VISCII 1.1. In all MacVISCII fonts released with MacVNkey since December 1995, the first two rows conform to Table 3. Upgrading the X-Windows-based keyboard driver (vnterm) and fonts is in progress. Windows users can continue to use Windows-based fonts. Although none of the Vietnamese characters in the control region are used in Windows, it is still desirable that new releases of Windows-based fonts conform to Table 3 to facilitate their conversion to other platforms.

E) Conclusion

Changing a popular standard with a large installed base of users is inevitably painful. To minimize disruption as much as possible we have decided to change only two of the least frequently used upper-case Vietnamese letters in the control region, thus completely sparing Windows users. The resulting character set, MacVISCII, can accommodate popular computing environments such as Mac, Unix, X-Windows, DOS, and Windows. It is expected to be named VISCII version 2.0 and supersede VISCII 1.1. We appeal to all Unix and DOS users to support the new version of VISCII for the sake of portability once and for all.


Updated: Dec 18, 2014 -- Viet-Std Group
Updated: Oct 23, 1996 -- Viet-Std Group