Re: [MiNT] UTF-8 support



On 2000-4-26, Guido Flohr <gufl0000@stud.uni-sb.de> wrote:

> Some people may know that I am very fond of i18n and l10n, so
> here is my opinion on the topic:
> 
> The most annoying thing (after backslashes in path names and
> the colon/drive letter stuff)

I have said _many_ times before that GEM applications only need
to see DOS pathnames.  This means that MiNT should _not_ mount
all drives to U: since UNIX apps do not need to know about drive
letters; they all work in paths relative to / (GEM apps can
still access the info by directly accessing the partition).

> in MiNT to me is the MS-DOS codeset that we still use.  Even
> Microsoft have migrated to ISO 8859-1 when they introduced
> Windows, we never did something alike.

ISO-8859-1 was obsolete before it came out.  ISO-8859-15 is what
should be used, if we insist on keeping things 8-bit.

> Next problem is the GUI elements in the codeset (closer,
> fuller, scroll arrows).  I think that future AES versions
> should overcome this nuisance and automagically translate
> control characters < ASCII SPACE in BOXCHAR AES objects
> (maybe in all strings) into the corresponding window elements
> instead of taking them verbatim from the system font.  This
> would allow to use arbitrary fonts as system fonts.

The Bitstream character table already maps entities to fontset
positions this way.  You call the entity number, not its
arbitrary position in any given font.

> The VDI is not so bad.  Since the data type wchar_t is
> equivalent to short int in the MiNTLib it would be no problem to
> offer an advanced string API in the VDI library bindings (as
> far as I can see it all implementations of wchar_t silently
> assume that wide character strings are UTF-8).  

You are aware that NVDI 5 has a built-in UTF conversion map?

> I have hand-made Monaco fonts (default size only) for
> ISO-8859-1, ISO-8859-2, ISO-8859-3 and ISO-8859-5 (plus
> KOI8-R because Russians seem to prefer that encoding to
> ISO-8859-5).  

KOI8-R makes a lot more sense than ISO-8859-5, because if you
chop off the 8th bit, you end up with the phonetically
corresponding letters of the Roman alphabet (with the case
swapped), as they would be pronounced by 7-bit users, e.g.
Americans and Brits.
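
To make that concrete, here is a minimal sketch in Python, assuming its
standard koi8_r codec; the helper name strip_high_bit is made up for this
example. It encodes a Cyrillic string as KOI8-R, clears bit 7 of every
byte and reads what is left as plain ASCII.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of each byte, decode as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    # Masking with 0x7F is what a 7-bit-only mail path or terminal would do.
    seven_bit = bytes(b & 0x7F for b in koi8_bytes)
    return seven_bit.decode("ascii")


if __name__ == "__main__":
    # Lowercase Cyrillic comes out as uppercase Latin, and vice versa.
    print(strip_high_bit("привет мир"))
```

Run as a script, this prints PRIWET MIR: still readable as a rough
transliteration, which ISO-8859-5 cannot offer.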

> The alternative to GDOS bitmap fonts are the scalable formats
> supported by recent (N)VDI implementations.  But IMHO the
> appearance - at least on the screen - of scalable fonts is
> currently very poor.

The font engine on Linux is about as bad.  IMHO, Mac is the one
with the best scaling.

> It's a pity that so few people are concerned about
> internationalization for MiNT.

For me, internationalization is probably the biggest obstacle to
staying on Atari.  Already, I need access to the Latin-9 set on a
daily basis and, since I started using Russian, Windows-1251 and
KOI8-R are needed too.

The problem is, most UNIX applications assume that Latin-1 is
used if one is running in 8-bit.  Even worse, Netscape (except
the Linux version), Perl and just about everything else skipped
Latin-9 completely and went straight to UTF-8.
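
For reference, the gap between the two sets is small but it matters. The
following is a minimal sketch in Python, assuming its standard latin_1 and
iso8859_15 codecs; the function name latin1_latin9_differences is made up
for this example. It lists every byte value that the two sets decode to a
different character.

```python
def latin1_latin9_differences():
    """Return (byte, latin1_char, latin9_char) for each byte that differs."""
    diffs = []
    # The two sets are identical below 0xA0, so only the upper range is checked.
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        as_latin1 = raw.decode("latin_1")
        as_latin9 = raw.decode("iso8859_15")
        if as_latin1 != as_latin9:
            diffs.append((value, as_latin1, as_latin9))
    return diffs


if __name__ == "__main__":
    for value, old, new in latin1_latin9_differences():
        print(f"0x{value:02X}: {old} -> {new}")
```

Only eight positions differ, among them the euro sign at 0xA4 and the
S and Z with caron that Finnish needs, so an application that silently
assumes Latin-1 displays the wrong glyph for exactly those characters.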

As for Cyrillic, two problems arise:

1) Few apps recognize either character set, and none on MiNT do.

2) Applications assume that a Russian keyboard's alternate layout
is US-QWERTY, not Finnish-QWERTY (the problem exists on every
other OS too), which means that when switching character tables,
the character set reverts to US-ASCII (if you're lucky, to Latin-1).

Already, Atari Corp was dumb enough to hard-code keyboard layouts
with the TOS language (the Falcon's attempt at this was a joke:
integrate the 6 best-selling languages into one TOS - no real
provision for additional languages) and they had their own custom
keyboard layout, which means that keys are not quite where you
expect them; you have to modify your typing.  TOS also never
really dealt with deadkeys properly.

-- 
Martin-Éric Racine  http://funkyware.atari.org/  Atari TT030 FAQ
Lappeenranta, Finland.  Surfing on an Intel/Microsoft-free GEM OS