UTF-8 as told by Rob Pike

Rob Pike explains how Ken Thompson invented UTF-8 in a single evening
and how they together built the first system-wide implementation in
less than a week.

Subject: UTF-8 history
From: "Rob 'Commander' Pike"
Date: Wed, 30 Apr 2003 22:32:32 -0700 (Thu 06:32 BST)
To: mkuhn (at) acm.org, henry (at) spsystems.net
Cc: ken (at) entrisphere.com

Looking around at some UTF-8 background, I see the same incorrect
story being repeated over and over.  The incorrect version is:
    1. IBM designed UTF-8.
    2. Plan 9 implemented it.
That's not true.  UTF-8 was designed, in front of my eyes, on a
placemat in a New Jersey diner one night in September or so 1992.

What happened was this.  We had used the original UTF from ISO 10646
to make Plan 9 support 16-bit characters, but we hated it.  We were
close to shipping the system when, late one afternoon, I received a
call from some folks, I think at IBM - I remember them being in Austin
- who were in an X/Open committee meeting.  They wanted Ken and me to
vet their FSS/UTF design.  We understood why they were introducing a
new design, and Ken and I suddenly realized there was an opportunity
to use our experience to design a really good standard and get the
X/Open guys to push it out.  We suggested this and the deal was, if we
could do it fast, OK.  So we went to dinner, Ken figured out the
bit-packing, and when we came back to the lab after dinner we called
the X/Open guys and explained our scheme.  We mailed them an outline
of our spec, and they replied saying that it was better than theirs (I
don't believe I ever actually saw their proposal; I know I don't
remember it) and how fast could we implement it?  I think this was a
Wednesday night and we promised a complete running system by Monday,
which I think was when their big vote was.

So that night Ken wrote packing and unpacking code and I started
tearing into the C and graphics libraries.  The next day all the code
was done and we started converting the text files on the system
itself.  By Friday some time Plan 9 was running, and only running,
what would be called UTF-8.  We called X/Open and the rest, as they
say, is slightly rewritten history.

Why didn't we just use their FSS/UTF?  As I remember, it was because
in that first phone call I sang out a list of desiderata for any such
encoding, and FSS/UTF was lacking at least one - the ability to
synchronize a byte stream picked up mid-run, with less than one
character being consumed before synchronization.  Because that was
lacking, we felt free - and were given freedom - to roll our own.

I think the "IBM designed it, Plan 9 implemented it" story originates
in RFC2279.  At the time, we were so happy UTF-8 was catching on we
didn't say anything about the bungled history.  Neither of us is at
the Labs any more, but I bet there's an e-mail thread in the archive
there that would support our story and I might be able to get someone
to dig it out.

So, full kudos to the X/Open and IBM folks for making the opportunity
happen and for pushing it forward, but Ken designed it with me
cheering him on, whatever the history books say.

-rob

Date: Sat, 07 Jun 2003 18:44:05 -0700
From: "Rob `Commander' Pike"
To: Markus Kuhn
cc: [email protected], [email protected],
   Greger Leijonhufvud
Subject: Re: UTF-8 history

I asked Russ Cox to dig through the archives. I have attached his message.
I think you'll agree it supports the story I sent earlier. The mail we
sent to X/Open (I believe Ken did the editing and mailing of that document)
includes a new desideratum #6 about discovering character boundaries.
We'll never know how much the original X/Open proposal influenced us;
the two proposals are very different but do share some characteristics.
I don't remember looking at it in detail, but it was a long time ago.
I very clearly remember Ken writing on the placemat and wished we had
kept it!

-rob

From: Russ Cox 
To: [email protected]
Subject: utf digging
Date-Sent: Saturday, June 07, 2003 7:46 PM -0400


bootes's /sys/src/libc/port/rune.c changed from the
division-heavy old utf on sep 4 1992.
the version that made it into the dump
is dated 19:51:55.  it was commented
the next day but otherwise remained unchanged
until nov 14 1996, when runelen was sped up by
inspecting the rune explicitly instead of
using runetochar's return value.  may 26 2001
was the next and last change, to add runenlen.

here are some mails from your mail boxes
that turn up by grepping for utf.  the first
refers to utf.c, which is a copy of a wctomb and mbtowc
that handle the full 6-byte utf-8 encoding of 32-bit runes.
it is quite ugly, with all the logic in control flow.
i assume it turned into the code in the proposal
as a result of that first mail.

in /usr/ken/utf/xutf i found a copy of what
appears to be the original not-self-synchronizing
encoding proposal, with the utf-8 scheme tacked
onto the end (beginning at "We define 7 byte types").
that's also below.  the version below is the first,
dated sep 2 23:44:10.  it went through a number of
edits to become the second mail below by the
morning of Sep 8.

the mail log shows that second mail going out
as well as taking a while to get back to ken.

helix: Sep  8 03:22:13: ken: upas/sendmail: remote inet!xopen.co.uk!xojig
>From ken Tue Sep  8 03:22:07 EDT 1992 ([email protected]) 6833
helix: Sep  8 03:22:13: ken: upas/sendmail: delivered rob From ken Tue Sep
8 03:22:07 EDT 1992 6833
helix: Sep  8 03:22:16: ken: upas/sendmail: remote pyxis!andrew From ken
Tue Sep  8 03:22:07 EDT 1992 (andrew) 6833
helix: Sep  8 03:22:19: ken: upas/sendmail: remote coma!dmr From ken Tue
Sep  8 03:22:07 EDT 1992 (dmr) 6833
helix: Sep  8 03:25:52: ken: upas/sendmail: delivered rob From ken Tue Sep
8 03:24:58 EDT 1992 141
helix: Sep  8 03:36:13: ken: upas/sendmail: delivered ken From ken Tue Sep
8 03:36:12 EDT 1992 6833

enjoy.



>From ken Fri Sep  4 03:37:39 EDT 1992
you might want to look at
    /usr/ken/utf/utf.c
and see if you can make it prettier.

>From ken Tue Sep  8 03:22:07 EDT 1992
Here is our modified FSS-UTF proposal.  The words are the same as on
the previous proposal.  My apologies to the author.  The code has been
tested to some degree and should be in pretty good shape.  We have
converted Plan 9 to use this encoding and are about to issue a
distribution to an initial set of university users.

File System Safe Universal Character Set Transformation Format (FSS-UTF)
--------------------------------------------------------------------------

With the approval of ISO/IEC 10646 (Unicode) as an international
standard and the anticipated widespread use of this universal coded
character set (UCS), it is necessary for historically ASCII based
operating systems to devise ways to cope with representation and
handling of the large number of characters that are possible to be
encoded by this new standard.

There are several challenges presented by UCS which must be dealt with
by historical operating systems and the C-language programming
environment.  The most significant of these challenges is the encoding
scheme used by UCS. More precisely, the challenge is the marrying of
the UCS standard with existing programming languages and existing
operating systems and utilities.

The challenges of the programming languages and the UCS standard are
being dealt with by other activities in the industry.  However, we are
still faced with the handling of UCS by historical operating systems
and utilities.  Prominent among the operating system UCS handling
concerns is the representation of the data within the file system.  An
underlying assumption is that there is an absolute requirement to
maintain the existing operating system software investment while at
the same time taking advantage of the use of the large number of
characters provided by the UCS.

UCS provides the capability to encode multi-lingual text within a
single coded character set.  However, UCS and its UTF variant do not
protect null bytes and/or the ASCII slash ("/") making these character
encodings incompatible with existing Unix implementations.  The
following proposal provides a Unix compatible transformation format of
UCS such that Unix systems can support multi-lingual text in a single
encoding.  This transformation format encoding is intended to be used
as a file code.  This transformation format encoding of UCS is
intended as an intermediate step towards full UCS support.  However,
since nearly all Unix implementations face the same obstacles in
supporting UCS, this proposal is intended to provide a common and
compatible encoding during this transition stage.


Goal/Objective
--------------

With the assumption that most, if not all, of the issues surrounding
the handling and storing of UCS in historical operating system file
systems are understood, the objective is to define a UCS
transformation format which also meets the requirement of being usable
on a historical operating system file system in a non-disruptive
manner.  The intent is that UCS will be the process code for the
transformation format, which is usable as a file code.

Criteria for the Transformation Format
--------------------------------------

Below are the guidelines that were used in defining the UCS
transformation format:

    1) Compatibility with historical file systems:

    Historical file systems disallow the null byte and the ASCII
    slash character as a part of the file name.

    2) Compatibility with existing programs:

    The existing model for multibyte processing is that ASCII does
    not occur anywhere in a multibyte encoding.  There should be
    no ASCII code values for any part of a transformation format
    representation of a character that was not in the ASCII
    character set in the UCS representation of the character.

    3) Ease of conversion from/to UCS.

    4) The first byte should indicate the number of bytes to
    follow in a multibyte sequence.

    5) The transformation format should not be extravagant in
    terms of number of bytes used for encoding.

    6) It should be possible to find the start of a character
    efficiently starting from an arbitrary location in a byte
    stream.


Proposed FSS-UTF
----------------

The proposed UCS transformation format encodes UCS values in the range
[0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, 5,
and 6 bytes.  For all encodings of more than one byte, the initial
byte determines the number of bytes used and the high-order bit in
each byte is set.  Every byte that does not start 10xxxxxx is the
start of a UCS character sequence.
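Editorial illustration, not part of the 1992 mail: the last sentence
above is what makes the encoding self-synchronizing.  A minimal sketch
in C, assuming a buffer of well-formed sequences; the helper names
is_continuation, sync_back and sync_forward are hypothetical and do not
appear in the proposal:

```c
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: back up from index pos to the first byte of
 * the character that contains it.  Stops at index 0 at the latest.
 */
size_t sync_back(const unsigned char *buf, size_t pos)
{
    while (pos > 0 && is_continuation(buf[pos]))
        pos--;
    return pos;
}

/*
 * Hypothetical helper: skip forward from index pos to the start of
 * the next character, or to len if the buffer ends first.
 */
size_t sync_forward(const unsigned char *buf, size_t pos, size_t len)
{
    while (pos < len && is_continuation(buf[pos]))
        pos++;
    return pos;
}
```

Picking up a stream in the middle of a character therefore costs at
most the remaining continuation bytes of that one character, which is
the property asked for in criterion 6.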

An easy way to remember this transformation format is to note that the
number of high-order 1's in the first byte signifies the number of
bytes in the multibyte character:

   Bits  Hex Min  Hex Max  Byte Sequence in Binary
1    7  00000000 0000007f 0vvvvvvv
2   11  00000080 000007FF 110vvvvv 10vvvvvv
3   16  00000800 0000FFFF 1110vvvv 10vvvvvv 10vvvvvv
4   21  00010000 001FFFFF 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
5   26  00200000 03FFFFFF 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
6   31  04000000 7FFFFFFF 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

The UCS value is just the concatenation of the v bits in the multibyte
encoding.  When there are multiple ways to encode a value, for example
UCS 0, only the shortest encoding is legal.
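Editorial illustration, not part of the 1992 mail: the counting rule
from the table can be sketched as a small C function.  The name seq_len
is hypothetical, and returning 0 for bytes that cannot start a sequence
is a choice made for this sketch only:

```c
/*
 * Hypothetical helper: number of bytes in the sequence introduced by
 * first byte b, following the table above.  Returns 0 for a
 * continuation byte (10xxxxxx) and for 0xFE/0xFF, which start no
 * sequence in that table.
 */
int seq_len(unsigned char b)
{
    int ones = 0;

    if ((b & 0x80) == 0)
        return 1;               /* 0vvvvvvv: plain ASCII */
    while (b & 0x80) {          /* count the high-order 1 bits */
        ones++;
        b <<= 1;
    }
    return (ones >= 2 && ones <= 6) ? ones : 0;
}
```

As a worked case, UCS 0x20AC needs 16 bits and so takes the 3-byte row:
its bits split as 0010 | 000010 | 101100, giving the bytes E2 82 AC,
and E2 (11100010) has three high-order 1's.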

Below are sample implementations of the C standard wctomb() and
mbtowc() functions which demonstrate the algorithms for converting
from UCS to the transformation format and converting from the
transformation format to UCS. The sample implementations include error
checks, some of which may not be necessary for conformance:

typedef
struct
{
    int   cmask;
    int   cval;
    int   shift;
    long  lmask;
    long  lval;
} Tab;

static
Tab       tab[] =
{
    0x80, 0x00,  0*6,   0x7F,         0,            /* 1 byte sequence */
    0xE0, 0xC0,  1*6,   0x7FF,        0x80,         /* 2 byte sequence */
    0xF0, 0xE0,  2*6,   0xFFFF,       0x800,        /* 3 byte sequence */
    0xF8, 0xF0,  3*6,   0x1FFFFF,     0x10000,      /* 4 byte sequence */
    0xFC, 0xF8,  4*6,   0x3FFFFFF,    0x200000,     /* 5 byte sequence */
    0xFE, 0xFC,  5*6,   0x7FFFFFFF,   0x4000000,    /* 6 byte sequence */
    0,                                              /* end of table */
};

int
mbtowc(wchar_t *p, char *s, size_t n)
{
    long l;
    int c0, c, nc;
    Tab *t;

    if(s == 0)
        return 0;

    nc = 0;
    if(n <= nc)
        return -1;
    c0 = *s & 0xff;
    l = c0;
    for(t=tab; t->cmask; t++) {
        nc++;
        if((c0 & t->cmask) == t->cval) {
            l &= t->lmask;
            if(l < t->lval)
                return -1;
            *p = l;
            return nc;
        }
        if(n <= nc)
            return -1;
        s++;
        c = (*s ^ 0x80) & 0xFF;
        if(c & 0xC0)
            return -1;
        l = (l<<6) | c;
    }
    return -1;
}

int
wctomb(char *s, wchar_t wc)
{
    long l;
    int c, nc;
    Tab *t;

    if(s == 0)
        return 0;

    l = wc;
    nc = 0;
    for(t=tab; t->cmask; t++) {
        nc++;
        if(l <= t->lmask) {
            c = t->shift;
            *s = t->cval | (l>>c);
            while(c > 0) {
                c -= 6;
                s++;
                *s = 0x80 | ((l>>c) & 0x3F);
            }
            return nc;
        }
    }
    return -1;
}

>From ken Tue Sep  8 03:24:58 EDT 1992
i mailed it out, but it went into a black hole.
i didnt get my copy. it must be hung up on the
internat address with coma down or something.

>From ken Tue Sep  8 03:42:43 EDT 1992
i finally got my copy.



--- /usr/ken/utf/xutf from dump of Sep 2 1992 ---

 File System Safe Universal Character Set Transformation Format (FSS-UTF)
 --------------------------------------------------------------------------

 With the approval of ISO/IEC 10646 (Unicode) as an international
 standard and the anticipated widespread use of this universal coded
 character set (UCS), it is necessary for historically ASCII based
 operating systems to devise ways to cope with representation and
 handling of the large number of characters that are possible to be
 encoded by this new standard.

 There are several challenges presented by UCS which must be dealt with
 by historical operating systems and the C-language programming
 environment. The most significant of these challenges is the encoding
 scheme used by UCS.  More precisely, the challenge is the marrying of
 the UCS standard with existing programming languages and existing
 operating systems and utilities.

 The challenges of the programming languages and the UCS standard are
 being dealt with by other activities in the industry.  However, we are
 still faced with the handling of UCS by historical operating systems and
 utilities. Prominent among the operating system UCS handling concerns is
 the representation of the data within the file system. An underlying
 assumption is that there is an absolute requirement to maintain the
 existing operating system software investment while at the same time
 taking advantage of the use of the large number of characters provided by
 the UCS.

 UCS provides the capability to encode multi-lingual text within a single
 coded character set.  However, UCS and its UTF variant do not protect
 null bytes and/or the ASCII slash ("/") making these character encodings
 incompatible with existing Unix implementations.  The following proposal
 provides a Unix compatible transformation format of UCS such that Unix
 systems can support multi-lingual text in a single encoding.  This
 transformation format encoding is intended to be used as a file code.
 This transformation format encoding of UCS is intended as an
 intermediate step towards full UCS support.  However, since nearly all
 Unix implementations face the same obstacles in supporting UCS, this
 proposal is intended to provide a common and compatible encoding during
 this transition stage.


 Goal/Objective
 --------------

 With the assumption that most, if not all, of the issues surrounding the
 handling and storing of UCS in historical operating system file systems
 are understood, the objective is to define a UCS transformation format
 which also meets the requirement of being usable on a historical
 operating system file system in a non-disruptive manner. The intent is
 that UCS will be the process code for the transformation format, which
 is usable as a file code.

 Criteria for the Transformation Format
 --------------------------------------

 Below are the guidelines that were used in defining the UCS
 transformation format:

     1) Compatibility with historical file systems:

    Historical file systems disallow the null byte and the ASCII
    slash character as a part of the file name.

     2) Compatibility with existing programs:

    The existing model for multibyte processing is that ASCII does
    not occur anywhere in a multibyte encoding.  There should be no
    ASCII code values for any part of a transformation format
    representation of a character that was not in the ASCII character
    set in the UCS representation of the character.

     3) Ease of conversion from/to UCS.

     4) The first byte should indicate the number of bytes to follow in a
    multibyte sequence.

     5) The transformation format should not be extravagant in terms of
    number of bytes used for encoding.


 Proposed FSS-UTF
 ----------------

 The proposed UCS transformation format encodes UCS values in the range
 [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
 bytes.  For all encodings of more than one byte, the initial byte
 determines the number of bytes used and the high-order bit in each byte
 is set.

 An easy way to remember this transformation format is to note that the
 number of high-order 1's in the first byte is the same as the number of
 subsequent bytes in the multibyte character:

    Bits  Hex Min  Hex Max         Byte Sequence in Binary
 1    7  00000000 0000007f 0zzzzzzz
 2   13  00000080 0000207f 10zzzzzz 1yyyyyyy
 3   19  00002080 0008207f 110zzzzz 1yyyyyyy 1xxxxxxx
 4   25  00082080 0208207f 1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
 5   31  02082080 7fffffff 11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv

 The bits included in the byte sequence are biased by the minimum value
 so that if all the z's, y's, x's, w's, and v's are zero, the minimum
 value is represented.  In the byte sequences, the lowest-order encoded
 bits are in the last byte; the high-order bits (the z's) are in the
 first byte.
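Editorial illustration, not part of the archived file: the bias
described above can be checked on the 2-byte row of the table.  This is
a minimal sketch covering that row only; the function name old_encode2
is hypothetical, while the constant 0x80 is the row's minimum value
from the table:

```c
#define OFF1 0x0000080UL   /* minimum value of the 2-byte row */

/*
 * Hypothetical helper for the 2-byte row only (10zzzzzz 1yyyyyyy).
 * Subtracts the row minimum, then splits the 13 remaining bits 6 + 7.
 * Returns 0 on success, -1 if wc lies outside 0x80..0x207f.
 */
int old_encode2(unsigned long wc, unsigned char out[2])
{
    if (wc < OFF1 || wc > 0x1fffUL + OFF1)
        return -1;
    wc -= OFF1;                                   /* apply the bias */
    out[0] = (unsigned char)(0x80 | (wc >> 7));   /* 10zzzzzz */
    out[1] = (unsigned char)(0x80 | (wc & 0x7f)); /* 1yyyyyyy */
    return 0;
}
```

The minimum value 0x80 comes out as the bytes 80 80, and the maximum
0x207f as BF FF.  Note that 0x80 is then both a legal first byte and a
legal trailing byte, so a reader dropped into the middle of a stream
cannot tell which one it is looking at.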

 This transformation format uses the byte values in the entire range of
 0x80 to 0xff, inclusive, as part of multibyte sequences.  Given the
 assumption that at most there are seven (7) useful bits per byte, this
 transformation format is close to minimal in its number of bytes used.

 Below are sample implementations of the C standard wctomb() and
 mbtowc() functions which demonstrate the algorithms for converting from
 UCS to the transformation format and converting from the transformation
 format to UCS.  The sample implementations include error checks, some
 of which may not be necessary for conformance:

#define OFF1   0x0000080
#define OFF2   0x0002080
#define OFF3   0x0082080
#define OFF4   0x2082080

int wctomb(char *s, wchar_t wc)
{
       if (s == 0)
           return 0;       /* no shift states */
#ifdef wchar_t_is_signed
       if (wc < 0)
           goto bad;
#endif
       if (wc <= 0x7f)         /* fits in 7 bits */
       {
           s[0] = wc;
           return 1;
       }
       if (wc <= 0x1fff + OFF1)        /* fits in 13 bits */
       {
           wc -= OFF1;
           s[0] = 0x80 | (wc >> 7);
           s[1] = 0x80 | (wc & 0x7f);
           return 2;
       }
       if (wc <= 0x7ffff + OFF2)       /* fits in 19 bits */
       {
           wc -= OFF2;
           s[0] = 0xc0 | (wc >> 14);
           s[1] = 0x80 | ((wc >> 7) & 0x7f);
           s[2] = 0x80 | (wc & 0x7f);
           return 3;
       }
       if (wc <= 0x1ffffff + OFF3)     /* fits in 25 bits */
       {
           wc -= OFF3;
           s[0] = 0xe0 | (wc >> 21);
           s[1] = 0x80 | ((wc >> 14) & 0x7f);
           s[2] = 0x80 | ((wc >> 7) & 0x7f);
           s[3] = 0x80 | (wc & 0x7f);
           return 4;
       }
#if !defined(wchar_t_is_signed) || defined(wchar_t_is_more_than_32_bits)
       if (wc > 0x7fffffff)
           goto bad;
#endif
       wc -= OFF4;
       s[0] = 0xf0 | (wc >> 28);
       s[1] = 0x80 | ((wc >> 21) & 0x7f);
       s[2] = 0x80 | ((wc >> 14) & 0x7f);
       s[3] = 0x80 | ((wc >> 7) & 0x7f);
       s[4] = 0x80 | (wc & 0x7f);
       return 5;
bad:;
       errno = EILSEQ;
       return -1;
}


int mbtowc(wchar_t *p, const char *s, size_t n)
{
       unsigned char *uc;      /* so that all bytes are nonnegative */

       if ((uc = (unsigned char *)s) == 0)
           return 0;               /* no shift states */
       if (n == 0)
           return -1;
       if ((*p = uc[0]) < 0x80)
           return uc[0] != '\0';   /* return 0 for '\0', else 1 */
       if (uc[0] < 0xc0)
       {
           if (n < 2)
               return -1;
           if (uc[1] < 0x80)
               goto bad;
           *p &= 0x3f;
           *p <<= 7;
           *p |= uc[1] & 0x7f;
           *p += OFF1;
           return 2;
       }
       if (uc[0] < 0xe0)
       {
           if (n < 3)
               return -1;
           if (uc[1] < 0x80 || uc[2] < 0x80)
               goto bad;
           *p &= 0x1f;
           *p <<= 14;
           *p |= (uc[1] & 0x7f) << 7;
           *p |= uc[2] & 0x7f;
           *p += OFF2;
           return 3;
       }
       if (uc[0] < 0xf0)
       {
           if (n < 4)
               return -1;
           if (uc[1] < 0x80 || uc[2] < 0x80 || uc[3] < 0x80)
               goto bad;
           *p &= 0x0f;
           *p <<= 21;
           *p |= (uc[1] & 0x7f) << 14;
           *p |= (uc[2] & 0x7f) << 7;
           *p |= uc[3] & 0x7f;
           *p += OFF3;
           return 4;
       }
       if (uc[0] < 0xf8)
       {
           if (n < 5)
               return -1;
           if (uc[1] < 0x80 || uc[2] < 0x80 || uc[3] < 0x80 || uc[4] < 0x80)
               goto bad;
           *p &= 0x07;
           *p <<= 28;
           *p |= (uc[1] & 0x7f) << 21;
           *p |= (uc[2] & 0x7f) << 14;
           *p |= (uc[3] & 0x7f) << 7;
           *p |= uc[4] & 0x7f;
           if (((*p += OFF4) & ~(wchar_t)0x7fffffff) == 0)
               return 5;
       }
bad:;
       errno = EILSEQ;
       return -1;
}

We define 7 byte types:
T0 0xxxxxxx      7 free bits
Tx 10xxxxxx      6 free bits
T1 110xxxxx      5 free bits
T2 1110xxxx      4 free bits
T3 11110xxx      3 free bits
T4 111110xx      2 free bits
T5 111111xx      2 free bits

Encoding is as follows.
From hex  Thru hex      Sequence             Bits
00000000  0000007f      T0                   7
00000080  000007FF      T1 Tx                11
00000800  0000FFFF      T2 Tx Tx             16
00010000  001FFFFF      T3 Tx Tx Tx          21
00200000  03FFFFFF      T4 Tx Tx Tx Tx       26
04000000  FFFFFFFF      T5 Tx Tx Tx Tx Tx    32

Some notes:

1. The 2 byte sequence has 2^11 codes, yet only 2^11-2^7
are allowed. The codes in the range 0-7f are illegal.
I think this is preferable to a pile of magic additive
constants for no real benefit. A similar comment applies
to all of the longer sequences.

2. The 4, 5, and 6 byte sequences are only there for
political reasons. I would prefer to delete these.

3. The 6 byte sequence covers 32 bits, the FSS-UTF
proposal only covers 31.

4. All of the sequences synchronize on any byte that is
not a Tx byte.
