shall be defined by the use of the appropriate locking-shift functions."
Kermit programs should "agree otherwise" that the default G0 character set is
the US-ASCII/ISO-646-IRV (International Reference Version) 7-bit character
set; thus international transfer syntax can be identical to Normal Kermit
transfer syntax when transferring 7-bit text files.  There are no defaults for
G1, G2, or G3, in the interest of fairness to all countries and peoples.

When the text contains characters outside the ASCII range, an escape sequence
from Table 5 must be issued, designating the alphabet to which they belong
(using the identification letters shown in Table 5) to the desired
intermediate character set G0, G1, G2, or G3.  This sequence must be given
before the first occurrence of a character in that alphabet.  If no such
sequence is given, then all characters are treated as ASCII data, including
<ESC>, the shift characters, and bytes with their 8th bits set to one.  In
other words, the file transfer behaves in the normal Kermit fashion for text
files.

Since ISO 8859 character sets are subject to revision from time to time, an
alphabet selector may be preceded by <ESC>&F, where F is the revision number
(@ = 1, A = 2, B = 3, etc).  For example, <ESC>&@<ESC>-A means Latin Alphabet
Number One, Revision One.  (This information is from ISO 2022 6.3.13.)
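
The revision prefix can be built mechanically from the revision number.  The
following is a minimal sketch in Python; the function name and the upper limit
of 63 (an assumption based on final characters running from "@" through "~")
are illustrative and not part of this proposal.

```python
ESC = "\x1b"

def revision_prefix(revision: int) -> str:
    """Return <ESC>&F for a 1-based revision number (1 -> '@', 2 -> 'A', ...)."""
    # Assumption: final characters run from '@' (4/0) through '~' (7/14),
    # which allows revisions 1 through 63.
    if not 1 <= revision <= 63:
        raise ValueError("revision out of range")
    return ESC + "&" + chr(ord("@") + revision - 1)

# Latin Alphabet Number One, Revision One, designated to G1:
latin1_rev1 = revision_prefix(1) + ESC + "-A"
```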

ISO 2022 escape sequences are inserted into the data, and are
indistinguishable by the Kermit packet encoder/decoder from the data itself.
Therefore these escape sequences may be broken across packets, just as any
other data may be.

UNKNOWN ALPHABETS

It is not required that the sender preannounce all of a file's character sets
prior to transfer.  Suppose a file contains a mixture of alphabets, some known
to the receiver, others not.  At some point, an alphabet designator arrives
which the receiving Kermit does not recognize.  Should the receiving Kermit
cancel the file transfer, or accept the unknown code?  A new command is
provided to let the user control what happens in this situation:

  SET UNKNOWN-ALPHABET {KEEP, CANCEL}.

If the user elects CANCEL, then the receiver will behave as if the user
had manually cancelled the file, i.e. it will put the character "X" in the
data field of its next acknowledgement, and the sender (assuming it supports
this feature) will stop sending the file.

If the user elects KEEP, the file will be accepted in its entirety.  But the
unknown code should be marked in case the user wants to fix it afterwards.  To
do this, the receiving program accepts the designator for the unknown alphabet
and stores it in the file as data, with subsequent characters stored
untranslated.
When the unknown character set is shifted out of (or the end of file arrives),
the receiving Kermit stores the ISO-2022 Coding Method Delimiter, <ESC>d, and
resumes translation.  If the unknown alphabet is shifted back into, the
designating escape sequence is stored again, and the process resumes.  Unknown
alphabets may be nested in this manner.

The default behavior should be "KEEP".  This command should also be effective
at Level 1, where it would simply prevent the receiving Kermit from refusing
a file on the basis of the character set used to transfer it.

LOCAL FILE REPRESENTATION

This proposal assumes nothing about the representation of the file on the
local storage medium.  It may be ASCII, EBCDIC, a proprietary word processor
format, IBM code page, or anything else.  It is an implementation "detail" for
the Kermit programmer to convert between the local file representation of
multi-alphabet text files and Kermit's file transfer syntax.

In some cases, the file itself (or its directory entry) might contain the
necessary identifying information, in which case the sending Kermit program
can automatically emit the appropriate escape sequences during file transfer.
In others, the user will have to tell the sending program how the file is
encoded.  The suggested command is:

  SET FILE TYPE <xxx>

where <xxx> specifies how the file is (or when receiving, is to be) encoded on
disk.  This will necessarily be highly dependent on the system's conventions,
or the conventions of the applications to be used with the file (e.g. a
multi-language word processing program).  Possibilities for <xxx> might
include application names like WORDPERFECT, XYWRITE, NOTA-BENE, MACWRITE,
ALEPH-BET, PC-HANGUL.

BREAKING THE RULES

If the local file is not encoded according to ISO 2022 rules, it may contain
<ESC>, <SO>, and <SI> characters.  It is up to the Kermit program to know
what these characters mean in the context of the file's format, and to either
strip them from the file or translate them to something else.  The ISO 2022
rules forbid the use of these characters as data to be transferred.

If a file is to be transferred using international syntax, and it contains
any of the characters significant to this syntax, namely <ESC>, <SI>, <SO>,
<SS2>, or <SS3>, then such characters should be prefixed during transmission
with Datalink Escape, <DLE>, C0 character 01/00 (Control-P).  Furthermore,
if <DLE> itself occurs in the data, it should also be prefixed with <DLE>.
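
The prefixing rule above might be sketched as follows.  This is an
illustration only: the function names are hypothetical, and the 8-bit C1 codes
used for <SS2> and <SS3> (8/14 and 8/15) are an assumption about the 8-bit
environment.

```python
ESC, SO, SI, DLE = 0x1B, 0x0E, 0x0F, 0x10
SS2, SS3 = 0x8E, 0x8F  # assumption: single-byte C1 forms (8-bit environment)
SPECIAL = {ESC, SO, SI, SS2, SS3, DLE}

def dle_quote(data: bytes) -> bytes:
    """Prefix each syntax-significant byte, and DLE itself, with DLE."""
    out = bytearray()
    for b in data:
        if b in SPECIAL:
            out.append(DLE)
        out.append(b)
    return bytes(out)

def dle_unquote(data: bytes) -> bytes:
    """Undo dle_quote: the byte following a DLE is taken literally."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == DLE:
            b = next(it, None)
            if b is None:
                raise ValueError("dangling DLE at end of data")
        out.append(b)
    return bytes(out)
```

A receiver would apply the inverse step before interpreting any remaining
<ESC>, <SO>, or <SI> as ISO 2022 controls.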

LEVEL-2 PERFORMANCE

Kermit programs may use the full range of ISO 2022 code extension techniques,
including use of G0, G1, G2, and G3 in both the 7-bit and 8-bit environments,
with both single-byte and multibyte character sets.  In the general case, G0
will be used for ASCII and English, G1 for the "native language" of the local
country or region, G2 for a third language, and G3 for a fourth.  Additional
character sets may be swapped in and out of G2 and G3 as required.

Transmission of 8-bit data in the 7-bit environment is accomplished by Kermit
using 8th-bit prefixing, which is an optional feature of the Kermit protocol.
However, most popular implementations of Kermit do include this feature.  If a
Kermit program cannot do 8th-bit prefixing, then it must operate in the ISO
2022 7-bit environment, shifting GL among the intermediate graphics sets
G0-G3.

If the Kermit program can do 8th-bit prefixing, the choice of the ISO 2022
7-bit or 8-bit environment is entirely independent of the communication
channel.  Selection of the ISO 2022 7-bit or 8-bit environment should be made
on other grounds, such as transmission efficiency or program simplicity.  For
example, if the ISO 2022 8-bit environment is used on a 7-bit channel, then
Kermit will have to do 8th-bit prefixing.

On a 7-bit communication channel, the best choice of ISO 7-bit or 8-bit
environment depends on the nature of the data to be transferred.  If there is
little or no 8-bit data (as in English text), it doesn't matter.  If there is
frequent shifting between 7-bit and 8-bit characters (as in French or
Portuguese), then single shifts would tend to be more efficient than locking
shifts, and Kermit's 8th-bit prefixing is equivalent to a single shift.
Therefore, use the ISO 8-bit environment and let Kermit do the prefixing.  If
there are long strings of 8-bit characters, as in "right-sided" languages
like Russian, Greek, Arabic, and Hebrew, then locking shifts are more
efficient -- use the ISO 7-bit environment.
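
The tradeoff can be estimated by counting overhead bytes.  The sketch below is
a simplified model, not part of this proposal: it charges one prefix character
per 8-bit byte for Kermit's 8th-bit prefixing, and one <SO> or <SI> per change
between 7-bit and 8-bit runs (plus a final return to G0) for locking shifts.
Designating escape sequences and quoting of the prefix character itself are
ignored, and the function names are hypothetical.

```python
def prefix_overhead(data: bytes) -> int:
    """Extra bytes if every 8-bit byte costs one 8th-bit prefix character."""
    return sum(1 for b in data if b & 0x80)

def locking_shift_overhead(data: bytes) -> int:
    """Extra bytes if each 7-bit/8-bit transition costs one SO or SI."""
    overhead = 0
    in_g1 = False
    for b in data:
        high = bool(b & 0x80)
        if high != in_g1:
            overhead += 1          # one SO (into G1) or SI (back to G0)
            in_g1 = high
    if in_g1:
        overhead += 1              # assumption: shift back to G0 at the end
    return overhead

# A German word with one accented letter, encoded in Latin-1.
german = "gef\u00e4hrlich".encode("latin_1")
# A 14-letter Russian word in ISO 8859-5: every byte has its 8th bit set.
russian = bytes(c | 0x80 for c in b"`PW^gP`^RP]]kY")

print(prefix_overhead(german), locking_shift_overhead(german))    # 1 2
print(prefix_overhead(russian), locking_shift_overhead(russian))  # 14 2
```

Under this model prefixing wins for the isolated accented letter, and locking
shifts win for the long run of 8-bit characters.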

In Japan, many computer systems use at least three character sets, Roman
(close to ASCII), Katakana (a 1-byte code), and Kanji (a 2-byte code).  Kanji
is specified in JIS X 0208, which also includes Roman, Hiragana, Katakana, and
some other character sets, but these are double width and not normally used.
Roman characters are usually taken from the left half of JIS X 0201, and
Katakana from the right half.  Japanese text frequently shifts between Roman,
Kana, and Kanji, and therefore requires three active character sets, for
example G0 (Roman), G1 (Kana), and G2 or G3 (Kanji).  In the 8-bit
environment, data transfer can be quite efficient: locking shifts are used to
shift GL between Roman and Kana, and any bytes with the 8th bit set to one
automatically invoke Kanji in GR as a multi-byte character set.  In the 7-bit
environment, locking shifts would also be used to select Kanji.  Note that
locking shifts are more efficient in this case than Kermit 8th-bit prefixing
because Kanji characters consist of more than one byte, and tend to occur in
runs.  For Japanese, therefore, it is better to use the ISO 7-bit environment
on a 7-bit communication channel.

The situation is summarized in Table 4.

_____________________________________________________________________________

                            ISO 2022 Environment
                     7-bit                       8-bit
       +------------------------------+-----------------------------+
       | Recommended for right-       | Recommended for 2-sided     |
 7-bit | sided languages like Greek,  | languages like French,      |
  data | Russian, Arabic, Hebrew.     | German, etc.  Use Kermit's  |
  path | Use ISO 2022 locking shifts. | 8th-bit prefix for special  |
       | Also for Japanese.           | characters.                 |
       +------------------------------+-----------------------------+
       | No reason to use ISO 7-bit   | Clear transmission of 8-bit |
 8-bit | environment on a clear 8-bit | characters.  Use for both   |
  data | communication channel.       | left- and right-sided       |
  path | OK for 7-bit ASCII, though.  | languages.                  |
       |                              |                             |
       +------------------------------+-----------------------------+

          Table 4: Selecting ISO 7- vs 8-Bit Environment
_____________________________________________________________________________

The user should have control over whether the ISO-2022 7-bit or 8-bit
environment is used.  To allow this, the command SET TRANSFER-SYNTAX
INTERNATIONAL may be extended as follows:

  SET TRANSFER-SYNTAX INTERNATIONAL [ {7, 8} ]

which means that an optional final field may be included to specify the
7- or 8-bit ISO-2022 environment.  The default should be 8, since it is the
most efficient method in most cases.

If Kermit -- at all levels -- offered locking shifts in addition to single
shifts, then international syntax could always proceed in the 8-bit
environment, and this would simplify implementation considerably.  A proposal
on locking shifts for Kermit is forthcoming.

FILE TRANSFER SYNTAX EXAMPLES

A simple 7-bit ASCII text file can be transmitted in the normal Kermit manner
for text files, without any escapes or shifts, even in ISO 2022 mode.  The
"encoding" file attribute, if used with international transfer syntax, could
be "*#IAJ2"I2" (encoding = international with GL = G0, ISO 2022 7-bit
environment, character set = ASCII).  Or it could be simply "*!A" (ASCII).

A text file containing characters from a language or languages covered by a
single alphabet other than ASCII can be transferred exactly like an ASCII text
file, except that the attribute, if used, would denote the character set, e.g.
"*!C2$I100" for Latin-1.  In the 7-bit environment, international syntax can
be used to cut down on Kermit's 8th-bit prefixing overhead, in which case the
attributes might look like "*#IBJ2$144", and any strings of GR characters would
be preceded by LS1 and transmitted with their high-order bits set to zero.

A multi-character-set text file will require an escape sequence to identify
each alphabet.  The attribute packet would show international encoding,
optionally including the ISO 2022 facilities announcers, and the character
sets, as in "*#ICK2)I100,I144".

In the 7-bit environment, <SO> and <SI> are used to shift between the G0 and
G1 sets.  In the absence of any specific designators, the G0 set is presumed
to be ASCII.  Example:

  A dangerous German word is "gef<ESC>-A<SO>d<SI>hrlich".

In this case, the only extended character is the umlaut-a in "gefaehrlich"
(where ae is a way of writing umlaut-a without an umlaut).  <ESC>-A designates
Latin-1 into G1, <SO> shifts GL out to G1, "d" is the left-half equivalent
of umlaut-a, and <SI> shifts GL back in to G0.

For clarity and consistency with the ISO-2022 recommendations, it is
recommended that the text begin with explicit character set designations, and
then explicitly shift into the G0 set, rather than defaulting to it:

  <ESC>(B<ESC>-A<SI>A dangerous German word is "gef<SO>d<SI>hrlich".
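
A sender might produce this form as sketched below.  This is a minimal
illustration assuming the local text is available as Latin-1; the function
name is hypothetical, and the sketch does not handle <ESC>, <SO>, or <SI>
occurring as data.

```python
ESC, SO, SI = "\x1b", "\x0e", "\x0f"

def encode_latin1_7bit(text: str) -> str:
    """Encode Latin-1 text for the 7-bit environment using SO/SI."""
    # Designate ASCII to G0 and Latin-1 to G1, then shift in to G0 explicitly.
    out = [ESC + "(B", ESC + "-A", SI]
    shifted = False                      # True while GL is shifted out to G1
    for b in text.encode("latin_1"):
        if 0x80 <= b < 0xA0:
            raise ValueError("C1 control characters are not handled here")
        high = b >= 0xA0
        if high != shifted:
            out.append(SO if high else SI)
            shifted = high
        out.append(chr(b & 0x7F))        # right-half characters lose the 8th bit
    if shifted:
        out.append(SI)                   # leave the stream shifted in to G0
    return "".join(out)

encoded = encode_latin1_7bit("gef\u00e4hrlich")
```

For the word in the example, the result is the two designators and <SI>
followed by gef<SO>d<SI>hrlich.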

A text file containing characters from multiple ISO 8859 alphabets requires a
designation sequence for each alphabet.  In the 7-bit environment, SO and SI
can be used to shift between G0 and G1 of the current alphabet, and <ESC>(B
can be used to select G0 of any of the alphabets, since these are all the
same.  For example, the following text contains the same word in English,
French, and Russian:

  <ESC>-A<SI>Disappointed, d<SO>ig<SI>u, <ESC>-L<SO>`PW^gP`^RP]]kY<SI>.

The first escape sequence assigns Latin Alphabet No. 1 to G1, and the
subsequent <SO> and <SI> shifts apply to its G0 and G1 sets, which are used to
form the English and French words.  The second escape sequence assigns the
Latin/Cyrillic 96-character set to G1, and the subsequent shifts apply to this
new set.

Another 7-bit example, in which the same word is repeated in English,
Russian, and German, shows how a locking shift remains in effect when the
alphabet is changed.  We begin in Latin/Cyrillic, start with an English word
from G0, shift to G1 for the Russian word, and while still in G1 switch to
Latin Alphabet No. 1 for German to get the umlaut-A at the beginning of
Aenderung (where Ae = umlaut-uppercase-A), and shift back to G0 for the rest
of the word:

  <ESC>-LAlteration <SO>_U`UTU[ZP <ESC>-AD<SI>nderung.
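
The way a locking shift survives a change of designation can be seen in a
small decoder.  The sketch below is illustrative only: it recognizes just
<ESC>(B and the G1 designators for Latin-1 (A) and Latin/Cyrillic (L), the
names are hypothetical, and it passes SPACE through unchanged while shifted
(an assumption matching the apparent intent of the example; a strict reading
of a 96-character set in GL would map it to position 10/00 instead).

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
# Final characters of the G1 designators handled here, mapped to Python codecs.
G1_CODECS = {ord("A"): "latin_1", ord("L"): "iso8859_5"}

def decode_7bit(data: bytes) -> str:
    """Decode a 7-bit stream using SO/SI and <ESC>-F designations of G1."""
    g1 = None          # codec currently designated to G1
    shifted = False    # True while GL is shifted out to G1
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            if data[i + 1:i + 3] == b"(B":            # ASCII to G0
                i += 3
                continue
            if data[i + 1:i + 2] == b"-" and i + 2 < len(data) \
                    and data[i + 2] in G1_CODECS:     # 96-character set to G1
                g1 = G1_CODECS[data[i + 2]]           # 'shifted' is untouched
                i += 3
                continue
            raise ValueError("unsupported escape sequence")
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x20 < b < 0x7F:
            if g1 is None:
                raise ValueError("shifted out with nothing designated to G1")
            out.append(bytes([b | 0x80]).decode(g1))  # restore the 8th bit
        else:
            out.append(chr(b))
        i += 1
    return "".join(out)

text = decode_7bit(b"\x1b-LAlteration \x0e_U`UTU[ZP \x1b-AD\x0fnderung.")
```

Because the second designation does not touch the shift state, the "D" that
follows it is still read from G1, now Latin-1, and yields umlaut-A.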

Some rules and hints to remember:

1. In the 8-bit communication environment, always use 8-bit character
   transmission -- it's more efficient.

2. There can be no more than four character sets designated at one time.
   Generally designate ASCII to G0, the most frequently-used non-ASCII set
   to G2, less frequently used sets to G3 and G1.  If a file has more than 
   four sets, swap the least frequently used sets in and out of G3 and G1.

3. Single shifts can only be used with G2 and G3.  This is why G2 and G3
   are preferred to G1.

4. Only two character sets can be invoked at once in the 8-bit communication
   environment, and only one in the 7-bit environment.

TERMINAL EMULATION

While not part of the Kermit file transfer protocol, terminal emulation is a
feature of many Kermit programs.  It is hoped that these terminal emulators
will evolve along the lines of the ISO standards mentioned above.  In some
cases, this is already a fact, insofar as DEC VT300 series terminals already
follow these standards and Kermit programs are beginning to emulate these
terminals.

In this regard, it is important to note that not all languages are written
from left to right, top to bottom.  Hebrew and Arabic are two examples of
right-to-left languages, and Japanese and Chinese may be written top to
bottom.  The order of the text characters on disk or on the transmission line
does not necessarily reflect their order on the screen or the printed page.

Kermit should be as easy to use as possible, but should still give the user
the ability to specify exactly what character codes are in use for both
terminal emulation and file transfer.  There should also be a consistent set
of commands for all Kermit programs.

SPECIAL EFFECTS

Today, most multi-alphabet files are produced by proprietary text processing
programs.  These programs have many functions besides switching among
alphabets.  They may also endow text with special attributes such as boldface,
italic, underline, super- or subscript, color, etc, and render characters in a
variety of type styles and sizes.  Each text processing program may have its
own unique formats and conventions.

These special effects are not addressed by this proposal.  Nevertheless, it is
likely that a multi-alphabet file produced by a text processing program also
contains special effects.  In order for a Kermit program to send a
multi-alphabet file, it must have detailed knowledge of the file's format and
coding conventions.  Therefore, the Kermit program should be able to strip out
the special effects, and send only the text.  Otherwise the result would be
meaningless when received on an unlike system or for use with a different
application.  (When transferring such files between like systems or compatible
applications, Kermit binary mode transfers will suffice.)

At some future time, it might be possible to adapt one of the popular document
description languages to the Kermit protocol, so that Kermit will be able to
transfer formatted documents between unlike systems and applications.
Presently, there are many competing would-be standards including IBM DCA and
DIA, DEC DDIF, US Navy DIF, and PostScript.  There are also two ISO standards
emerging in this area: Standard Generalized Markup Language (ISO 8879, 9069,
and 9573), and Office Document Architecture (ISO 8613).  This is an area for
further study.


APPENDIX A: STANDARDS

ANSI X3.4 (1986), "Coded Character Sets - 7-bit American Standard Code for
  Information Interchange" (US ASCII), is the 7-bit code currently used by
  Kermit for transferring text files. 

ISO 646 (1983) (= ECMA-6), "Information Processing - ISO 7-bit Coded Character 
  Sets for Information Interchange", gives us a 7-bit character set equivalent
  to ASCII with provision for substituting "national characters" in selected
  positions.

ISO 4873 (1986) (= ECMA-43), "Information Processing - ISO 8-bit Code for
  Information Interchange - Structure and Rules for Implementation", defines
  8-bit character sets, their graphic and control regions, and how to extend
  an 8-bit character set by using multiple intermediate graphics sets.

ISO 2022 (1986) (= ECMA-35), "Information Processing - ISO 7-bit and 8-bit
  Coded Character Sets - Code Extension Techniques", describes how to use
  8-bit character sets in both 7-bit and 8-bit environments, and how to switch
  among different character sets and alphabets.

ISO International Register of Coded Character Sets to be Used with Escape
  Sequences.  This is the source of the ISO registration numbers.

ISO 2375 (1985) "Data Processing - Procedure for Registration of Escape
  Sequences".  The procedure by which a character set gets into the above
  register and has a registration number and designating escape sequence
  assigned to it.

JIS X 0202, "Code Extension Techniques for Use with the Code for Information
  Interchange", the Japanese counterpart of ISO 2022.

ANSI X3.41-1974, "Code Extension Techniques for Use with the 7-Bit Coded
  Character Set of the American National Standard Code for Information
  Interchange", describes 7- and 8-bit codes and extension techniques in
  approximately the same manner as ISO 4873 and ISO 2022.

ISO 8859 (1987-present) (see Table 5 for ECMA equivalents), "Information
  Processing - 8-Bit Single-Byte Coded Graphic Character Sets", defines the
  actual 8-bit character sets to be used for many of the world's languages.
  The left half of each of these is the same as ASCII and ISO 646.  Each
  character, including those with diacritics, is represented by a single byte.

ISO is the International Organization for Standardization, ANSI is the American
National Standards Institute, ECMA is the European Computer Manufacturers
Association.  JIS means Japanese Industrial Standard.

The ISO/ECMA standards discussed in this proposal may be obtained free of
charge in their ECMA form by writing to:

  ECMA Headquarters
  Rue du Rhone 114
  CH-1204 Geneva
  SWITZERLAND

Be sure to specify the title and the ECMA number of each standard requested.
ISO standards can also be ordered from the UN bookstore, but not for free:

  CCITT
  United Nations Bookstore
  United Nations Building
  New York, NY  10017

ANSI standards may be ordered, for a fee, from: