Mailing List Archived Message #210

From: "Doug Bissett" <>
Subject: Re: [lswitcher-dev] lSwitcher-2-93-0-RC_6.wpi
Date: Sun, 15 Aug 2021 15:27:59 -0600 (MDT)
To: "lSwitcher Developers Mailing List" <>

On 2021-08-15, at 14:52:27, Gregg Young wrote:
> On Sun, 15 Aug 2021 15:36:16 -0400 Lewis wrote:
> >
> >Hi...
> >
> >On 08/15/21 08:57 am, Alfredo Fernández Díaz wrote:
> >>"Morning,"
> >>
> >>On 2021/08/15 04:04, Lewis wrote:
> >>>Hi...
> >>>
> >>>Changing the codepage isn't enough. The content needs to be converted to UTF-8
> >>>(it was still CP850).
> >>
> >>As I tried to explain (albeit maybe too briefly, sorry) this still breaks (more?) things...
> >>
> >>I am perfectly aware that the WIS contains non-English characters, so specifying CODEPAGE is not enough -- what you state there must be the one in use as well, so if the original used CP850 characters, a proper conversion is in order, sure.
> >>
> >>Still, WarpIN is not handling this correctly...
> >>
> >><snip>
> >>>This gave me a UTF-8 script, which properly renders Ulrich's name and which
> >>>then matches what's in the WarpIN db (no error report of missing XWP).
> >>
> >>Lewis, did you notice I reported this was a problem that showed up /on a Russian system/, and nowhere else? -- Ulrich's name was always properly processed and rendered on my main system (main CP always 850).
> >>
> >
> >You got me there. I was only testing in English. My first go-round told me that XWP was not installed, as it couldn't match Ulrich's name in the db. Once I converted the script to UTF-8, all was right with the world.
> >
> >>I am attaching two screenshots to illustrate that something (which may or may not be new, and/or related to the problem with not finding XWP in the database) breaks when you convert the WIS:
> >>
> >>lsw@ru_CP850.png shows how the readme (CP 850) is rendered on this Russian system under CP 866 when the wis is CP850-encoded: see the "?" on my name? That is possibly a rendering-only, cosmetic problem.
> >>
> >>Now, let's convert the WIS to UTF (and change its CODEPAGE attribute accordingly), and fire up WarpIN on that again: see lsw@ru_CP850.png, look at my name again.
> >>
> >>That is a UTF conversion problem, which may or may not be related to the one I reported initially, but we definitely brought it up converting the WIS to CP 1208 aka UTF8.
> >>
> >
> >The WarpIN source says that we handle extracted files (EXTRACTFROMPCK) like so:
> >
> >if (!G_pCurrentPageInfo->_ulExtractFromPck)
> >    str2Insert.assignUtf8(pLocals->_pCodecGui,
> >                          G_pCurrentPageInfo->_ustrReadmeSrc);
> >else
> >{
> >    // use _strReadmeSrc as a file name:
> >    // V1.0.11 (2006-08-31) [pr]: was using Unicode filename for Readme @@fixes 812
> >    ULONG cpSrc = Engine._pCurrentArchive->_pScript->_ulCodepage;
> >    BSUniCodec codecSrc(cpSrc);
> >    BSString strReadmeSrc(&codecSrc, G_pCurrentPageInfo->_ustrReadmeSrc);
> >    BSString strTempFileName;
> >    APIRET arc;
> >
> >    if (!(arc = Engine.ExtractTempFile(G_pCurrentPageInfo->_ulExtractFromPck,
> >                                       strReadmeSrc.c_str(),  // V1.0.11 (2006-08-11)
> >                                       &strTempFileName)))
> >    {
> >        // successfully extracted:
> >        PSZ pszContent = NULL;
> >        if (!(arc = doshLoadTextFile(strTempFileName.c_str(),
> >                                     &pszContent,
> >                                     NULL)))
> >        {
> >            // check what codepage the script was created in...
> >            // we assume that the "readme" file was written in
> >            // the same codepage. If the codepage is different
> >            // from our current one, we'll need to convert:
> >            if (cpSrc == pLocals->_pCodecGui->QueryCodepage())
> >                // easy
> >                str2Insert = pszContent;
> >            else
> >            {
> >                // alright, different:
> >                // convert file contents to Unicode
> >                ustring ustr(&codecSrc, pszContent);
> >                // convert Unicode to display codepage
> >                str2Insert.assignUtf8(pLocals->_pCodecGui, ustr);
> >            }
> >            free(pszContent);
> >        }
> >        else
> >            str2Insert._printf(nlsGetString(WPSI_ERRORREADINGPCKFILE),
> >                               arc,
> >                               strReadmeSrc.c_str(), // V1.0.11 (2006-08-31) [pr]
> >                               G_pCurrentPageInfo->_ulExtractFromPck);
> >    }
> >    else
> >        str2Insert._printf(nlsGetString(WPSI_ERROREXTRACTINGPCKFILE),
> >                           arc,
> >                           strReadmeSrc.c_str(), // V1.0.11 (2006-08-31) [pr]
> >                           G_pCurrentPageInfo->_ulExtractFromPck);
> >}
> >
> >So, we convert the CP850 Readme to UTF-8. So far, so good. However, when we then need to convert to the display codepage (CP866, in this case), we run into a slight problem (note that Readme.UTF8 is the original readme which I converted via iconv):
> >
> >[j:\] iconv -f UTF-8 -t 866 Readme.UTF8 > Readme.866
> >iconv.exe: Readme.UTF8:108:12: cannot convert
> >
> >Line 108, char 12 is "á" in your name. Hmmm... I'm not sure what to do here. There is a WarpIN preference for display codepage, which defaults to process codepage. However, on a Russian system, it would seem highly illogical to change this merely to read a few characters which can't be rendered in 866.
> >
> >Also, this is not a font thing. I have dropped myriad fonts onto the dialog, all with the same result: "?" for the characters in your name.
> >
> >I fall back on my contention that this is not a WarpIN bug. WarpIN accepts the content of an external file as the same codepage as specified for the WIS, and then converts to UTF-8, and finally to the display codepage. It's a conundrum, I grant you. I just haven't figured an adequate workaround as yet.
>
> Hi Lewis
> This is a WarpIN bug. REQUIRES="Ulrich Möller\XWorkplace\Kernel\1\0\1" isn't meant to be user readable; it is an internal check.
> I think you will also see problems on a Russian system with PACKAGEID="Ulrich Möller\XWorkplace\Kernel\1\0\1" installed post 1.0.24. If this is done with codepage 1208 or no codepage, the database will contain "Ulrich M?ller\XWorkplace\Kernel\1\0\1". If you have a REQUIRES="Ulrich Möller\XWorkplace\Kernel\1\0\1" with codepage 1208, it will probably work, but if you have a WIS with this that is codepage 850, it will fail since the ö will be present. The ö isn't present in codepage 866. Only ASCII characters (0-127) are (probably) guaranteed to be the same between codepages.
> What is needed is for WarpIN to convert these "internal use" strings to codepage 850, use them, and then convert the rest to codepage 866.
> The other problem is with WIS files with no codepage (most if not all of which are codepage 850). These fail for REQUIRES="Ulrich Möller\XWorkplace\Kernel\1\0\1" on Russian systems because they are now read as codepage 866 (process default). This case requires that the "internal use" strings be read first in codepage 850 and used before the codepage 866 (default) read. This can also be fixed by assuming they are codepage 850, not the process codepage.
> OK, some questions about this. What version of iconv are you using? I have found several. What are the exact steps to build the WIS? I assume you need to reconvert any time you edit the file unless you use a UTF-8 enabled editor (are there any?).
> Thanks
> Gregg

I think the main "problem" is that the WarpIN database is not a real database; it simply stores whatever it is told to store. That data comes from the WIS (hopefully unchanged). From what I have seen, it is stored in whatever the system code page is at the time the package is installed. Change the code page, and some characters may change (or not exist at all), so the strings no longer match. This would actually be pretty rare. In this case, it is the special character in the vendor name that is messing things up (a quick scan shows no other similar cases, but that covers only what I have installed). WarpIN could probably be changed to store special characters in escaped form, but that means old databases would need to be either converted or somehow accepted as they are.
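
To make the escaping idea concrete, here is a minimal sketch. It assumes the package ID is already available as UTF-8, and it uses percent-style escapes because the backslash is already taken as the ID separator. The name escapeNonAscii is made up for illustration; nothing like it is known to exist in the WarpIN source.

```cpp
#include <string>

// Hypothetical helper (not part of WarpIN): make a package ID pure ASCII
// by writing every byte >= 0x80, and '%' itself, as %HH.
// Assumes the input is UTF-8. The result is plain ASCII, so it reads the
// same under any ASCII-compatible codepage (850, 866, 1208, ...).
std::string escapeNonAscii(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : utf8)
    {
        if (b >= 0x80 || b == '%')
        {
            // escape the byte as a percent sign plus two hex digits
            out += '%';
            out += hex[b >> 4];
            out += hex[b & 0x0F];
        }
        else
            out += static_cast<char>(b);   // plain ASCII passes through
    }
    return out;
}
```

Under this scheme the vendor part of the ID would be stored as "Ulrich M%C3%B6ller" regardless of the codepage in effect at install time. Existing databases would still hold the unescaped form, so the conversion question raised above remains open.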

There seems to be a separate problem, where displayed text is not being handled properly. Display should probably always use the system code page, and any character that cannot be displayed in it should be replaced by the standard substitution character. I would assume that Russian text always displays properly in CP 866. What is in the database should be irrelevant to display, but it should always match the WIS contents so other programs can check it. It seems to me that a translation is being applied to what is stored in the database, and that should not be the case. How that can be fixed is a whole different question.
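
The substitution behaviour described above can be sketched roughly as follows. This is not WarpIN code: toCp866Lossy is a hypothetical name, and the table is deliberately partial (ASCII plus the Russian alphabet only), which is enough to show the effect on a Latin-1 letter.

```cpp
#include <string>

// Hypothetical sketch (not WarpIN code): convert Unicode code points to
// CP866 bytes, replacing anything this table cannot map with a substitute.
// Only ASCII and the Russian alphabet are covered; CP866 also contains
// box-drawing characters and a few other symbols, omitted here.
std::string toCp866Lossy(const std::u32string &text, char substitute = '?')
{
    std::string out;
    for (char32_t cp : text)
    {
        unsigned char byte;
        if (cp < 0x80)
            byte = static_cast<unsigned char>(cp);                 // ASCII unchanged
        else if (cp >= 0x0410 && cp <= 0x043F)
            byte = static_cast<unsigned char>(cp - 0x0410 + 0x80); // U+0410..U+043F
        else if (cp >= 0x0440 && cp <= 0x044F)
            byte = static_cast<unsigned char>(cp - 0x0440 + 0xE0); // U+0440..U+044F
        else if (cp == 0x0401)
            byte = 0xF0;                                           // capital IO
        else if (cp == 0x0451)
            byte = 0xF1;                                           // small io
        else
            byte = static_cast<unsigned char>(substitute);         // not representable
        out += static_cast<char>(byte);
    }
    return out;
}
```

Fed the code points of "Fernández", this returns "Fern?ndez", which matches the "?" reported in the screenshots, while Russian text passes through intact.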

From Doug Bissett's ArcaOS system
dougb007 AT
... There is no job so simple it cannot be done wrong.
