Network Working Group               Vietnamese Standardization Working Group
Request for Comments: 1456                                          May 1993
Conventions for Encoding the Vietnamese Language
VISCII: VIetnamese Standard Code for Information Interchange
VIQR: VIetnamese Quoted-Readable Specification
Revision 1.1
Status of this Memo
This memo provides information for the Internet community. It does not specify an Internet standard. Distribution of this memo is unlimited.
Abstract
This document provides information to the Internet community on the currently used conventions for encoding Vietnamese characters into 7-bit US ASCII and in an 8-bit form. These conventions are widely used by the overseas Vietnamese who are on the Internet and are active in USENET. This document only provides information and specifies no level of standard.
1. INTRODUCTION
In this paper we describe two conventions for representing Vietnamese
characters. VISCII (pronounced "visky") is an 8-bit character
encoding that is similar to that used with ISO-8859. VIQR
(pronounced "vicker") is a mnemonic encoding of Vietnamese characters
into US ASCII for use on 7-bit systems. A substantial body of
freely distributable software implementing these conventions is
available online for UNIX and personal computers. These encodings enable
Vietnamese-language users to take full advantage of powerful tools
already developed for the English-speaking world, eliminating
unnecessary reinvention. This paper describes these conventions in
part so that MIME-compliant software might also support the
Vietnamese language.
NOTE: The accented Vietnamese letters are herein represented by their
VIQR equivalents, offset by enclosing angle brackets. For example,
the single letter "a acute" is written as <a'>, where the apostrophe
is the mnemonic symbol for the acute.
2. LINGUISTIC OVERVIEW
Note that one can resort to a composite encoding scheme to reduce
this requirement, but that would mean giving up on integration into
today's computing platforms which for the most part do not support
such schemes. In addition, the heavy use of diacritical marks in
Vietnamese text calls for a keyboard input scheme that does not
require extra keystrokes such as a special "compose" key to generate
accented letters. Because of the large number of possible
combinations, the scheme should also be easily learned and memorized.
Finally, to integrate Vietnamese into current electronic mail systems
which are still limited to 7 bits, there should be a representation
for Vietnamese text that is readily readable in its 7-bit form.
The Viet-Std group, an electronic standardization roundtable, has
worked over the past few years to draft proposals addressing these
issues. This has culminated in the conventions to be described
briefly in the next two sections. The detailed technical
considerations have been reported elsewhere [2]. In this memo we
give a brief outline of the working standards and describe supporting
software availability.
3. SPECIFICATION OF VISCII
The 8-bit VISCII encoding is shown below. Because of the limitations
of the 7-bit US ASCII character set, here we use the mnemonic form to
represent Vietnamese glyphs. See the VIQR specification below for
clarification of how diacritical marks are applied. The online
PostScript version of reference [2] may also be useful as it does
display each character correctly.
Table 1. VISCII 8-bit Encoding Table (v1.1)
*========================================================================*
|    | 0x  1x  2x  3x  4x  5x  6x  7x  | 8x  9x  Ax  Bx  Cx  Dx  Ex  Fx  |
|====|===================================================================|
| x0 | nul dle sp  0   @   P   `   p   | A.  O^` O~  o^` A`  DD  a`  dd  |
| x1 | soh dc1 !   1   A   Q   a   q   | A(' O^? a(' o^? A'  u+' a'  u+. |
| x2 | A(? dc2 "   2   B   R   b   r   | A(` O^~ a(` o^~ A^  O`  a^  o`  |
| x3 | etx dc3 #   3   C   S   c   s   | A(. O^. a(. O+~ A~  O'  a~  o'  |
| x4 | eot Y?  $   4   D   T   d   t   | A^' O+. a^' O+  A?  O^  a?  o^  |
| x5 | A(~ nak %   5   E   U   e   u   | A^` O+' a^` o^. A(  a.  a(  o~  |
| x6 | A^~ syn &   6   F   V   f   v   | A^? O+` a^? o+` a(? y?  u+~ o?  |
| x7 | bel etb '   7   G   W   g   w   | A^. O+? a^. o+? a(~ u+` a^~ o.  |
| x8 | bs  can (   8   H   X   h   x   | E~  I.  e~  i.  E`  u+? e`  u.  |
| x9 | ht  Y~  )   9   I   Y   i   y   | E.  O?  e.  U+. E'  U`  e'  u`  |
| xA | lf  sub *   :   J   Z   j   z   | E^' O.  e^' U+' E^  U'  e^  u'  |
| xB | vt  esc +   ;   K   [   k   {   | E^` I?  e^` U+` E?  y~  e?  u~  |
| xC | ff  fs  ,   <   L   \   l   |   | E^? U?  e^? U+? I`  y.  i`  u?  |
| xD | cr  gs  -   =   M   ]   m   }   | E^~ U~  e^~ o+  I'  Y'  i'  y'  |
| xE | so  Y.  .   >   N   ^   n   ~   | E^. U.  e^. o+' I~  o+~ i~  o+. |
| xF | si  us  /   ?   O   _   o   DEL | O^' Y`  o^' U+  y`  u+  i?  U+~ |
*========================================================================*
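As a reading aid, the following Python sketch shows how a cell of
Table 1 corresponds to a byte value: the column header supplies the
high nibble and the row label supplies the low nibble. Only three rows
of the upper half are copied in, so this is an excerpt rather than a
full decoder. The names UPPER_ROWS, build_excerpt, EXCERPT, and
viscii_to_viqr are hypothetical and are introduced here only for
illustration. The output is naive mnemonic text; no escaping of
literal punctuation is attempted.

```python
# Upper-half cells (columns 8x..Fx) of three rows copied from Table 1.
# The dict key is the row label, i.e. the low nibble of the byte.
UPPER_ROWS = {
    0x0: "A. O^` O~ o^` A` DD a` dd",
    0x1: "A(' O^? a(' o^? A' u+' a' u+.",
    0xE: "E^. U. e^. o+' I~ o+~ i~ o+.",
}


def build_excerpt(rows):
    """Map VISCII byte values to the mnemonics shown in Table 1."""
    table = {}
    for low, cells in rows.items():
        for col, mnemonic in enumerate(cells.split()):
            # Column headers 8x..Fx give the high nibble, the row the low one.
            table[((0x8 + col) << 4) | low] = mnemonic
    return table


EXCERPT = build_excerpt(UPPER_ROWS)


def viscii_to_viqr(data, table=EXCERPT):
    """Rewrite VISCII bytes as mnemonics; handles only the excerpted cells."""
    out = []
    for byte in data:
        if byte in table:
            out.append(table[byte])
        elif 0x20 <= byte <= 0x7E or byte in (0x09, 0x0A, 0x0D):
            out.append(chr(byte))  # same position as in US ASCII
        else:
            raise ValueError("byte 0x%02X is not in this excerpt" % byte)
    return "".join(out)


# 0xAE is the cell in column Ax, row xE of Table 1.
print(viscii_to_viqr(b"Vi\xaet Nam"))  # prints Vie^.t Nam
```

A complete converter would need all 256 cells, including the six
capital letters that Table 1 places among the control positions in
columns 0x and 1x.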
4. SPECIFICATION OF VIQR MNEMONICS
Table 2. VIQR Mnemonics for Vietnamese Diacritics
*=====================================================*
|  Diacritic  | Char |  ASCII Code        |  D<a^'>u  |
|=====================================================|
|  breve      |  (   | 0x28, left paren   | tr<a(>ng  |
|  circumflex |  ^   | 0x5E, caret        |  m<u~>    |
|  horn       |  +   | 0x2B, plus sign    |  m<o'>c   |
|-------------+------+--------------------+-----------|
|  acute      |  '   | 0x27, apostrophe   |  s<a('>c  |
|  grave      |  `   | 0x60, backquote    | huy<e^`>n |
|  hook above |  ?   | 0x3F, question     |  h<o?>i   |
|  tilde      |  ~   | 0x7E, tilde        |  ng<a~>   |
|  dot below  |  .   | 0x2E, period       | n<a(.>ng  |
|-------------+------+--------------------+-----------|
|  d bar      |  dd  | (repeated d)       |  <dd>     |
|  D bar      |  DD  | (repeated D)       |  <DD>     |
*=====================================================*
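The mnemonics of Table 2 can be turned back into accented letters
mechanically. The Python sketch below is one possible approach, not
part of the conventions themselves: it assumes Unicode as the target
(this memo defines only VISCII and VIQR), maps each mnemonic to a
Unicode combining character, and lets NFC normalization compose the
result. The function name viqr_to_unicode and the helper tables are
hypothetical. The sketch deliberately ignores any escaping rules, so
a period or question mark that directly follows a vowel is always
read as a tone mark, and "dd" is always read as d bar.

```python
import unicodedata

# Combining characters for the Table 2 mnemonics (Unicode is an assumption
# of this sketch, not something the memo specifies).
MODIFIERS = {"(": "\u0306", "^": "\u0302", "+": "\u031b"}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309",
         "~": "\u0303", ".": "\u0323"}
# Base letters that each modifier may follow in Vietnamese.
MODIFIER_BASES = {"(": "aA", "^": "aeoAEO", "+": "ouOU"}
VOWELS = "aeiouyAEIOUY"


def viqr_to_unicode(text):
    """Naive VIQR reader: vowel, optional modifier, optional tone mark."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "dd" or pair == "DD":
            out.append("\u0111" if pair == "dd" else "\u0110")
            i += 2
            continue
        ch = text[i]
        i += 1
        out.append(ch)
        if ch not in VOWELS:
            continue
        # Breve, circumflex, or horn, only after a letter that can carry it.
        if i < len(text) and ch in MODIFIER_BASES.get(text[i], ""):
            out.append(MODIFIERS[text[i]])
            i += 1
        # One of the five tone marks.
        if i < len(text) and text[i] in TONES:
            out.append(TONES[text[i]])
            i += 1
    # NFC reorders the combining marks and composes single code points.
    return unicodedata.normalize("NFC", "".join(out))


print(ascii(viqr_to_unicode("Vie^.t Nam")))  # prints 'Vi\u1ec7t Nam'
```

Applied to the entries in the last column of Table 2 (without the
angle brackets), the same function yields one precomposed letter per
mnemonic group, for example "huye^`n" becomes "huy" + U+1EC1 + "n".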
As a romanized language, Vietnamese appears to lend itself readily to
integration into existing English-based systems. To cite a simple
example, consider implementing support for French in such systems.
One can allocate code positions in the 8-bit space necessary for
accented letters such as

VISCII stands for VIetnamese Standard Code for Information
Interchange, an 8-bit encoding specification. Its salient features
are:

VIQR, VIetnamese Quoted-Readable specification, is not an encoding
convention but is rather a convention for typing, reading, and
transferring Vietnamese data using only the 7-bit ASCII character
set. With VIQR, accented Vietnamese letters are represented by the
vowel followed by ASCII characters whose appearances resemble those
of the corresponding Vietnamese diacritical marks. For example, the
phrase "N

Because of its mnemonic nature, the VIQR typing method is easy to
learn and remember. In pure 8-bit environments, special-purpose
software developers may wish to devise more efficient input schemes,
but the intent is for all Vietnamese keyboard software to support the
basic VIQR method to minimize learning time for Vietnamese who will
already be familiar with the mnemonic method described here.

5. SUPPORTING SOFTWARE

VISCII & VIQR have been successfully implemented on various
platforms. The work has been carried out primarily by the TriChlor
software group, a non-profit spin-off from Viet-Std. Software by
other individuals and groups has also been developed. In addition,
commercial software entities have indicated that they would support
the standards in the form of VISCII-compliant keyboards and fonts.

The current software selection from the TriChlor group enables users
to use Vietnamese on existing Unix, MS-DOS, and Windows systems.
Supported operations include Vietnamese file naming, Vietnamese
keyboarding within any application, electronic mail and news filters
for Unix, printing to various printer languages, incorporating
Vietnamese in such document preparation systems as TeX, Word for
Windows, and WordPerfect, and using Vietnamese in databases (e.g.,
Paradox) and spreadsheets (e.g., SC on Unix or Excel in Windows).
Vietnamese-specific applications are also available and include a
large song lyric database, several poetry collections in hypertext
format, a Windows-based fortune teller, a text-based multiple-choice
test program in Vietnamese, etc. In short, software exists that
supports thorough integration of Vietnamese into existing platforms,
allowing Vietnamese users to take advantage of all the powerful tools
already available in English-only environments.

Translation between 8-bit VISCII 1.1 and other character sets,
particularly ISO-10646/Unicode 1.1, has been included in the tcs
utility of the Plan 9 operating system, which has been made available
by Andrew Hume of AT&T Bell Laboratories.

6. MIME CONSIDERATIONS

For use with MIME-compliant software, the value "VISCII" has been
registered as a charset with the Internet Assigned Numbers Authority
for the VISCII encoding convention described above, and the value
"VIQR" has been registered with the Internet Assigned Numbers
Authority as a charset for the VIQR mnemonic encoding convention
described above. Implementation of support for these two MIME
character set types is not mandatory to comply with RFC-1341. If the
encoding conventions described above are used in MIME email or news,
the appropriate MIME character set type value should be used to label
the body-part containing such text.

7. SECURITY CONSIDERATIONS

Security issues are not discussed in this memo.

REFERENCES

[1] International Organization for Standardization. ISO 8859/x:
8-bit International Code Sets. ISO, 1977.

[2] Viet-Std, "A Unified Framework for Vietnamese Information
Processing-v1.1," published on the Internet, available for FTP
from Sonygate.Sony.COM:tin/viet-std, September 1992.

AUTHORS' ADDRESSES

Cuong T. Nguyen
Center for Integrated Systems
CIS 062--MC 4070
Stanford, CA 94305-4070
Phone: (415) 725-3721
Email: cuong@haydn.Stanford.EDU

Hoc D. Ngo
Vista Research, Inc.
100 View St, Suite 200
P.O. Box 998
Mountain View, CA 94042
Phone: (415) 966-1194 x311
Email: ngo@nas.nasa.gov

Cuong M. Bui
National Semiconductor Corp.
3388 Burgundy Dr.
San Jose, CA 95132
Phone: (408) 721-6873
Email: bui@berlioz.nsc.com

Thanh van Nguyen
Roche Image Analysis Systems
95 First Str Suite 110
Los Altos, CA 94022
Phone: (415) 917-2022
Fax: (415) 917-2025
Email: thanh@rias.com

For more information, please contact the authors at:
viet-std@haydn.stanford.edu

Updated as of November 1996

REFERENCES

[2] Viet-Std, "A Unified Framework for Vietnamese Information
Processing-v1.1," published on the Internet, September 1992.
Available for FTP from ftp.mit.media.edu:/pub/Vietnet/Viet-std
or haydn.stanford.edu:/VN/viet-std.

AUTHORS' ADDRESSES

Cuong T. Nguyen
Electrical & Electronic Engineering
University of Science & Technology
Clear Water Bay, Kowloon, Hong Kong
Phone: +852 2358-7066
Fax: +852 2335-0194
Email: eenguyen@ee.ust.hk