A Proposed Macintosh VISCII Vietnamese Character Set
by
Hoc D. Ngo
The Vietnamese Standardization Working Group
Email: Viet-Std@Haydn.Stanford.EDU
Date: January 1996
Our development of MacVNkey, a Mac Vietnamese keyboard driver, necessitates
a change of the VISCII 1.1 character set. This article briefly summarizes
VISCII 1.1 in Section A. Section B points out its weakness in the Macintosh
environment and the need for a new version called MacVISCII, which will work
on all major platforms. Section C outlines the strategy for multilingual
support within the framework of 8-bit VISCII. Section D summarizes
recommendations for font vendors to upgrade existing VISCII fonts to support
both VISCII 1.1 and MacVISCII. Section E concludes the article.
A) VISCII 1.1
In the fall of 1992 the Vietnamese Standardization Group (Viet-Std) published
a report proposing an 8-bit Vietnamese character encoding commonly known
as VISCII (VIetnamese Standard Code for Information Interchange). The full
encoding, also known as the VISCII 1.1 standard, is given in Table 1.
The table shows 134 Vietnamese-specific characters in addition to those
already available in the 7-bit ASCII table. Since only 128 characters can
be placed in the upper half of the 8-bit code space, the Viet-Std Group
decided to encode the remaining six characters in the control region. They
were chosen to be the upper-case Vietnamese characters that are used
least frequently.
Table 1: Proposed 8-bit Encoding Standard VISCII 1.1
(8-bit characters are shown in VIQR format)
+======================================================================+
| || 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+======================================================================+
| 0x || NUL:SOH:A(?:ETX:EOT:A(~:A^~:BEL:BS :HT :LF :VT :FF :CR :SO :SI |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 1x || DLE:DC1:DC2:DC3:Y? :NAK:SYN:ETB:CAN:Y~ :SUB:ESC:FS :GS :Y. :US |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 2x || SP : ! : " : # : $ : % : & : ' : ( : ) : * : + : , : - : . : / |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 3x || 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : : : ; : < : = : > : ? |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 4x || @ : A : B : C : D : E : F : G : H : I : J : K : L : M : N : O |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 5x || P : Q : R : S : T : U : V : W : X : Y : Z : [ : \ : ] : ^ : _ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 6x || ` : a : b : c : d : e : f : g : h : i : j : k : l : m : n : o |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 7x || p : q : r : s : t : u : v : w : x : y : z : { : | : } : ~ :DEL|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 8x || A. :A(':A(`:A(.:A^':A^`:A^?:A^.:E~ :E. :E^':E^`:E^?:E^~:E^.:O^'|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 9x || O^`:O^?:O^~:O^.:O+.:O+':O+`:O+?:I. :O? :O. :I? :U? :U~ :U. :Y` |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ax || O~ :a(':a(`:a(.:a^':a^`:a^?:a^.:e~ :e. :e^':e^`:e^?:e^~:e^.:o^'|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Bx || o^`:o^?:o^~:O+~:O+ :o^.:o+`:o+?:i. :U+.:U+':U+`:U+?:o+ :o+':U+ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Cx || A` :A' :A^ :A~ :A? :A( :a(?:a(~:E` :E' :E^ :E? :I` :I' :I~ :y` |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Dx || DD :u+':O` :O' :O^ :a. :y? :u+`:u+?:U` :U' :y~ :y. :Y' :o+~:u+ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ex || a` :a' :a^ :a~ :a? :a( :u+~:a^~:e` :e' :e^ :e? :i` :i' :i~ :i? |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Fx || dd :u+.:o` :o' :o^ :o~ :o? :o. :u. :u` :u' :u~ :u? :y' :o+.:U+~|
+======================================================================+
This encoding has been implemented successfully on Unix and DOS. In Windows
3.1, however, not all Vietnamese characters can be rendered because the
operating system deliberately excludes certain code positions: all control
characters and the non-breaking space character. Specifically, the following
Vietnamese characters cannot be rendered in Windows 3.1:
- O~ (code 0xA0 or 160, Windows non-breaking space)
- A(? (code 0x02 or 2, CTRL-B)
- A(~ (code 0x05 or 5, CTRL-E)
- A^~ (code 0x06 or 6, CTRL-F)
- Y? (code 0x14 or 20, CTRL-T)
- Y~ (code 0x19 or 25, CTRL-Y)
- Y. (code 0x1E or 30, CTRL-^)
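As an illustration only, the list above can be turned into a small check
that reports which bytes of a VISCII 1.1 text Windows 3.1 would fail to
display. The function name and the dictionary are hypothetical and are not
part of any Viet-Std software; the codes are those listed above.

```python
# VISCII 1.1 codes that Windows 3.1 does not render (from the list above).
WINDOWS_UNRENDERABLE = {
    0xA0: "O~",   # Windows non-breaking space
    0x02: "A(?",  # CTRL-B
    0x05: "A(~",  # CTRL-E
    0x06: "A^~",  # CTRL-F
    0x14: "Y?",   # CTRL-T
    0x19: "Y~",   # CTRL-Y
    0x1E: "Y.",   # CTRL-^
}


def unrenderable_in_windows(data: bytes) -> list:
    """Return (offset, VIQR name) for each byte Windows 3.1 would not show."""
    return [
        (offset, WINDOWS_UNRENDERABLE[code])
        for offset, code in enumerate(data)
        if code in WINDOWS_UNRENDERABLE
    ]
```

A text for which this function returns an empty list needs no special
handling in Windows 3.1.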
The Viet-Std Group resolved this problem by specifying that each Vietnamese
font must be provided as a pair: a normal font that complies with Table 1
and a capital font ("Hoa") that is identical to its normal counterpart
except that all lower-case characters are replaced with the corresponding
upper-case ones. Whenever an upper-case letter, say Y~, cannot be rendered,
the user has to switch to the capital font and type the corresponding
lower-case letter, which is y~ in this example. This method is called
"font switching."
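The following is a minimal sketch of font switching, assuming the text is a
VISCII 1.1 byte string and that a renderer accepts runs tagged with a font.
The function name and the run format are hypothetical; the code pairs are
read from Table 1.

```python
# Unrenderable upper-case code -> its lower-case code, both from Table 1.
LOWER_OF_UNSAFE = {
    0x02: 0xC6,  # A(? -> a(?
    0x05: 0xC7,  # A(~ -> a(~
    0x06: 0xE7,  # A^~ -> a^~
    0x14: 0xD6,  # Y?  -> y?
    0x19: 0xDB,  # Y~  -> y~
    0x1E: 0xDC,  # Y.  -> y.
    0xA0: 0xF5,  # O~  -> o~
}


def font_switch(text: bytes) -> list:
    """Split text into (font, bytes) runs, using 'Hoa' for unsafe capitals."""
    runs = []
    for code in text:
        if code in LOWER_OF_UNSAFE:
            # The capital font shows the upper-case glyph at the lower-case code.
            font, out = "Hoa", LOWER_OF_UNSAFE[code]
        else:
            font, out = "normal", code
        if runs and runs[-1][0] == font:
            runs[-1][1].append(out)
        else:
            runs.append((font, bytearray([out])))
    return [(font, bytes(buf)) for font, buf in runs]
```

For example, a word starting with Y~ (0x19) becomes a one-byte "Hoa" run
holding y~ (0xDB), followed by a run in the normal font.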
B) MacVISCII
The need to change VISCII 1.1 arose during the development of MacVNkey.
On the Macintosh platform the operating system and the Mac Toolbox can
display almost every 8-bit character except for a few that are reserved for
cursor movement. Thus if a character is not rendered, it is most likely
because the application chooses not to render it. For example, PageMaker
filters out all characters in the control region, and MacWrite does not
display the Macintosh non-breaking space. Because VISCII fonts are always
supplied in pairs of normal and capital fonts, the user can enter all
Vietnamese characters by switching to the appropriate font. However, we
cannot ignore the many mono-font applications that are widely used in the
Macintosh environment, such as TeachText and popular editors. It would be
irrational to abandon them when the Macintosh operating system has the
potential to support all Vietnamese-specific characters. Most Mac
applications simply filter out or do not display the following VISCII
characters:
- "Y." (code 0x1E) is reserved for upward cursor movement and is used by
the Macintosh Toolbox and most editors.
- "Y?" (code 0x14) is reserved as a system character.
- "E^" (code 0xCA) is the Macintosh non-breaking space, which cannot be
rendered in certain applications.
It is unfortunate that the frequently used letter "E^" is safe in Windows
but not in Mac applications. If "E^" cannot be displayed on screen,
the user will be surprised when typing a tone-marked E^ such as E^' or E^`
because the intermediate character E^ is not echoed to the screen. MacVNkey
provides an option that converts, on the fly, individual unsafe upper-case
letters to the corresponding lower case while remembering them as upper case
for combination purposes. For instance, a user who wants "E^'" will
see "e^" after typing "E^" and will then see the correct "E^'" after typing
the acute accent.
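The echo behavior described above can be sketched as a tiny state machine.
This is not MacVNkey's actual code, whose internals are not described here;
the class and method names are hypothetical, and only the letter E^ is
handled. The codes come from Table 1.

```python
E_CIRC_LOWER = 0xEA  # e^ in Table 1 (0xCA, upper-case E^, is unsafe on the Mac)
# Tone-marked upper-case E^ forms from row 8x of Table 1, keyed by VIQR tone.
UPPER_TONED = {"'": 0x8A, "`": 0x8B, "?": 0x8C, "~": 0x8D, ".": 0x8E}


class EchoSketch:
    """Echo unsafe E^ as e^ while remembering that it was upper case."""

    def __init__(self):
        self.screen = bytearray()
        self.pending_upper = False

    def type_upper_e_circumflex(self):
        # Show the safe lower-case form, but remember the intended case.
        self.screen.append(E_CIRC_LOWER)
        self.pending_upper = True

    def type_tone(self, key):
        # Combine with the remembered upper case, replacing the echoed e^.
        if self.pending_upper and key in UPPER_TONED:
            self.screen[-1] = UPPER_TONED[key]
        self.pending_upper = False
```

Typing E^ leaves e^ (0xEA) on screen; typing the acute accent next replaces
it with E^' (0x8A).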
To avoid rewriting the large installed base of Windows applications, we will
remap only "Y." and "Y?" because control characters are never used as text
in Windows-based applications. Specifically,
- Y? will be moved to 0x17 (CTRL-W)
- Y. will be moved to 0x18 (CTRL-X)
The resulting character set is hereafter called MacVISCII and is shown
in Table 2. It is our hope that the current VISCII 1.1 standard will
ultimately be superseded by MacVISCII, which will then be renamed VISCII 2.0.
Such a move would bring an enormous advantage: a single
VISCII standard for information interchange on all popular computing
platforms. There is a tradeoff, but no Windows users or developers
suffer, because the affected letters "Y?" and "Y." cannot be used as text
in Windows anyway. However, all DOS- and Unix-based VISCII applications will
have to be upgraded to the new VISCII. Because they are fewer in number than
Windows applications, the existing VISCII market is expected to suffer only
slightly.
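Converting existing text between the two character sets amounts to swapping
two byte values (Y? between 0x14 and 0x17, Y. between 0x1E and 0x18). A
minimal sketch follows; the function names are hypothetical, and it assumes
the input does not use the target codes as real control characters (ETB and
CAN in VISCII 1.1 text, DC4 and RS in MacVISCII text).

```python
# Y? moves from 0x14 to 0x17 and Y. moves from 0x1E to 0x18.
TO_MACVISCII = bytes.maketrans(b"\x14\x1e", b"\x17\x18")
TO_VISCII11 = bytes.maketrans(b"\x17\x18", b"\x14\x1e")


def viscii11_to_macviscii(data: bytes) -> bytes:
    """Remap Y? and Y. from their VISCII 1.1 codes to their MacVISCII codes."""
    return data.translate(TO_MACVISCII)


def macviscii_to_viscii11(data: bytes) -> bytes:
    """Inverse remapping, for tools that still expect VISCII 1.1."""
    return data.translate(TO_VISCII11)
```

All other bytes, including the four remaining Vietnamese letters in the
control region, pass through unchanged.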
Table 2: Proposed 8-bit Encoding Standard for MacVISCII
(8-bit characters are shown in VIQR format)
+======================================================================+
| || 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+======================================================================+
| 0x || NUL:SOH:A(?:ETX:EOT:A(~:A^~:BEL:BS :HT :LF :VT :FF :CR :SO :SI |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 1x || DLE:DC1:DC2:DC3:DC4:NAK:SYN:Y? :Y. :Y~ :SUB:ESC:FS :GS :RS :US |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 2x || SP : ! : " : # : $ : % : & : ' : ( : ) : * : + : , : - : . : / |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 3x || 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : : : ; : < : = : > : ? |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 4x || @ : A : B : C : D : E : F : G : H : I : J : K : L : M : N : O |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 5x || P : Q : R : S : T : U : V : W : X : Y : Z : [ : \ : ] : ^ : _ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 6x || ` : a : b : c : d : e : f : g : h : i : j : k : l : m : n : o |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 7x || p : q : r : s : t : u : v : w : x : y : z : { : | : } : ~ :DEL|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 8x || A. :A(':A(`:A(.:A^':A^`:A^?:A^.:E~ :E. :E^':E^`:E^?:E^~:E^.:O^'|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 9x || O^`:O^?:O^~:O^.:O+.:O+':O+`:O+?:I. :O? :O. :I? :U? :U~ :U. :Y` |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ax || O~ :a(':a(`:a(.:a^':a^`:a^?:a^.:e~ :e. :e^':e^`:e^?:e^~:e^.:o^'|
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Bx || o^`:o^?:o^~:O+~:O+ :o^.:o+`:o+?:i. :U+.:U+':U+`:U+?:o+ :o+':U+ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Cx || A` :A' :A^ :A~ :A? :A( :a(?:a(~:E` :E' :E^ :E? :I` :I' :I~ :y` |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Dx || DD :u+':O` :O' :O^ :a. :y? :u+`:u+?:U` :U' :y~ :y. :Y' :o+~:u+ |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Ex || a` :a' :a^ :a~ :a? :a( :u+~:a^~:e` :e' :e^ :e? :i` :i' :i~ :i? |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| Fx || dd :u+.:o` :o' :o^ :o~ :o? :o. :u. :u` :u' :u~ :u? :y' :o+.:U+~|
+======================================================================+
C) Plan for 8-bit Multilingual Support
Roughly one million overseas Vietnamese live in English-speaking
countries and another million in other countries using
Latin-based alphabets. Current VISCII standards aim at supporting
Vietnamese letters only, and hence serve the needs of domestic users and those
living in English-speaking countries only. VISCII characters are defined
as precomposed 8-bit characters so that they can be used directly in
existing 8-bit applications designed for English or Western Europe. Because
the VISCII table has no empty slots for new characters, it is very difficult
to support foreign characters without introducing further incompatibilities.
The obvious solution is to use 16-bit Unicode or font switching. But
since we have already written keyboard drivers for all popular platforms
(e.g., DOS, Windows, Mac, X-Windows), we might as well provide some limited
support for 8-bit multilingual plain text.
Currently there is practically no limit on the number of characters that can
be rendered on the screen in Windows, Mac, and X-Windows. On DOS the limit
is 512 characters in text mode for VGA and SuperVGA cards, and 256 for older
cards. Thus one multilingual solution is to provide a variable-length
character code. Our initial goal is to support only two fonts for a
maximum of 512 characters. The first font is VISCII. The second font
should not contain any graphic characters in the control region; it is
hereafter called VMSCII, an acronym for Vietnamese Multilingual Standard
Code for Information Interchange (chu+~ DDa ngu+~ or chu+~ Quo^'c te^').
Our preliminary proposal is to reserve CTRL-U (code 21) as the leading byte
for characters in the second font. In other words, the plain text stream
generally contains one-byte characters and two-byte characters. The leading
byte of a two-byte character is always CTRL-U. Font rendering engines must
render one-byte characters using a VISCII font and two-byte characters using
a corresponding VMSCII font. This leads to a need for a convention in
naming Vietnamese fonts. We suggest that a Vietnamese font be named as
follows:
VI <font name> <suffix>
where the suffix is empty for a normal VISCII font, "H" for a "Hoa" or capital
VISCII font, and "Q" for a "Quo^'c te^'" or VMSCII font. For instance, the
three Vietnamese "University" fonts are
VI University (thu+o+`ng, or normal VISCII font)
VI University H (Hoa, or capital VISCII font)
VI University Q (Quo^'c te^', or multilingual font)
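The CTRL-U lead-byte convention and the font naming scheme can be sketched
together as follows. This is only an illustration of the preliminary
proposal: the function names are hypothetical, the name format is inferred
from the three "University" examples, and rejecting a CTRL-U at the end of
the stream is our assumption, since the proposal does not say how to treat
it.

```python
LEAD = 0x15  # CTRL-U (code 21), the proposed lead byte for VMSCII characters


def font_name(family: str, suffix: str = "") -> str:
    """Build a font name such as 'VI University' or 'VI University Q'."""
    return f"VI {family} {suffix}".rstrip()


def split_stream(data: bytes, family: str) -> list:
    """Return (font name, code) pairs for a mixed one-byte/two-byte stream."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == LEAD:
            if i + 1 >= len(data):
                # Assumption: a dangling lead byte is treated as an error.
                raise ValueError("CTRL-U lead byte at end of stream")
            out.append((font_name(family, "Q"), data[i + 1]))
            i += 2
        else:
            out.append((font_name(family), data[i]))
            i += 1
    return out
```

Each pair tells a rendering engine which font to select and which code to
draw from it.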
Support for multilingual characters requires more research and development.
Our future tasks will be as follows:
- Define multilingual characters in VMSCII.
- Design VMSCII fonts.
- Upgrade keyboard drivers for DOS, Windows, Mac, Unix, X-Windows.
D) New Versions of VISCII Fonts
In spite of their slight difference, VISCII 1.1 and MacVISCII characters
can coexist in a single font. This implies that fonts of this type should
contain double representations for "Y?" and "Y." to support both VISCII 1.1
and MacVISCII.
In addition, in light of the proposed support for multilingual characters,
it is useful to have a graphic character at CTRL-U, say the "unequal" sign
("=/"). Thus when a user reads or prints a multilingual plain text file
using a tool that does not support VMSCII, the appearance of an
unequal sign indicates that the next character is a multilingual character
that should appear in a different font. (The unequal sign is chosen for
this reason.) The user then has to use an appropriate VMSCII-compliant tool to
print or read the plain text file. A utility that converts multilingual plain
text files to Rich Text Format (RTF) should also be provided, because the
resulting document can be read by word processors.
In summary, all new versions of VISCII fonts should have the letter "Y?" at
0x14 (to accommodate VISCII 1.1) and 0x17 (to accommodate MacVISCII), the
letter "Y." at 0x18 (MacVISCII) and 0x1E (VISCII 1.1), and the unequal sign
"=/" at 0x15 for future support of multilingual fonts. This means that only
the first two rows of old VISCII fonts need upgrading as shown in Table 3.
Table 3: Recommended control characters in new VISCII fonts
(8-bit characters are shown in VIQR format)
+======================================================================+
| || 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : A : B : C : D : E : F |
+======================================================================+
| 0x || NUL:SOH:A(?:ETX:EOT:A(~:A^~:BEL:BS :HT :LF :VT :FF :CR :SO :SI |
|----||----:---:---:---:---:---:---:---:---:---:---:---:---:---:---:---|
| 1x || DLE:DC1:DC2:DC3:Y? :=/ :SYN:Y? :Y. :Y~ :SUB:ESC:FS :GS :Y. :US |
+======================================================================+
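To illustrate why a font laid out as in Table 3 serves both character sets,
the sketch below maps the Vietnamese and "=/" positions of the first two
rows to VIQR names and shows that the VISCII 1.1 and MacVISCII codes for
"Y?" and "Y." yield the same glyphs. The function name and the "<XX>"
placeholder for other bytes are hypothetical.

```python
# Graphic characters in the control region of a Table 3 font.
TABLE3_GLYPHS = {
    0x02: "A(?", 0x05: "A(~", 0x06: "A^~",
    0x14: "Y?",  # VISCII 1.1 position
    0x15: "=/",  # unequal sign marking a following VMSCII character
    0x17: "Y?",  # MacVISCII position
    0x18: "Y.",  # MacVISCII position
    0x19: "Y~",
    0x1E: "Y.",  # VISCII 1.1 position
}


def render(data: bytes) -> list:
    """Return the glyph names a Table 3 font would show for each byte."""
    out = []
    for code in data:
        if code in TABLE3_GLYPHS:
            out.append(TABLE3_GLYPHS[code])
        elif 0x20 <= code < 0x7F:
            out.append(chr(code))
        else:
            out.append(f"<{code:02X}>")  # placeholder for anything else
    return out
```

With such a font, text stored in either character set displays the same
letters, and a tool without VMSCII support shows "=/" before each
multilingual character.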
MacVNkey is compliant with MacVISCII by default; however, the user has the
option to switch back to VISCII 1.1. In all MacVISCII fonts released with
MacVNkey since December 1995, the first two rows conform to Table 3.
Upgrading of X-Windows-based keyboard drivers (vnterm) and fonts is in
progress. Windows users can continue to use Windows-based fonts. Although
none of the Vietnamese characters in the control region are used in Windows,
it is still desirable that new releases of Windows-based fonts conform to
Table 3 to facilitate their conversion to other platforms.
E) Conclusion
Changing a popular standard with a large installed base of users is
inevitably painful. To minimize disruption as much as possible we have
decided to change only two of the least frequently used upper-case
Vietnamese letters in the control region, thus completely sparing Windows
users. The resulting character set, MacVISCII, can accommodate popular
computing environments such as Mac, Unix, X-Windows, DOS, and Windows. It
is expected to be named VISCII version 2.0 and supersede VISCII 1.1. We
appeal to all Unix and DOS users to support the new version of VISCII for
the sake of portability once and for all.
Updated: Dec 18, 2014 -- Viet-Std Group
Updated: Oct 23, 1996 -- Viet-Std Group