Mailing List lswitcher-dev@2rosenthals.com Message #207
From: "Lewis" <lswitcher-dev@2rosenthals.com>
Subject: Re: [lswitcher-dev] lSwitcher-2-93-0-RC_6.wpi
Date: Sun, 15 Aug 2021 15:36:16 -0400
To: lSwitcher Developers Mailing List <lswitcher-dev@2rosenthals.com>

Hi...

On 08/15/21 08:57 am, Alfredo Fernández Díaz wrote:
> Morning,
>
> On 2021/08/15 04:04, Lewis wrote:
>> Hi...
>>
>> Changing the codepage isn't enough. The content needs to be converted to UTF-8
>> (it was still CP850).
>
> As I tried to explain (albeit maybe too briefly, sorry), this still breaks (more?) things...
>
> I am perfectly aware that the WIS contains non-English characters, so specifying CODEPAGE is not enough: the codepage you declare there must also be the one the file actually uses, so if the original used CP850 characters, a proper conversion is in order, sure.
>
> Still, WarpIN is not handling this correctly...
>
> <snip>
>> This gave me a UTF-8 script, which properly renders Ulrich's name and which
>> then matches what's in the WarpIN db (no error report of missing XWP).
>
> Lewis, did you notice I reported that this problem showed up /on a Russian system/, and nowhere else? Ulrich's name was always properly processed and rendered on my main system (main CP always 850).


You got me there. I was only testing in English. My first go-round told me that XWP was not installed, as it couldn't match Ulrich's name in the db. Once I converted the script to UTF-8, all was right with the world.
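The fix described above is a re-encoding of the script's bytes, not merely a change to its CODEPAGE attribute. A minimal C++ sketch of that CP850 to UTF-8 step follows; it is an illustration, not WarpIN or iconv code. `cp850ToUtf8` is a hypothetical helper, and its table covers only a few CP850 letters (such as the "ö" in Möller and the "á"/"í" in Fernández Díaz); a real conversion needs the full code-page table, e.g. via iconv.

```cpp
#include <string>
#include <unordered_map>

// Appends the UTF-8 encoding of a BMP code point to out.
static void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts CP850 bytes to UTF-8. Only an illustrative subset of the CP850
// upper half is mapped; any other high byte becomes U+FFFD.
std::string cp850ToUtf8(const std::string& in)
{
    static const std::unordered_map<unsigned char, char32_t> kHigh = {
        {0x81, 0x00FC},  // u with diaeresis
        {0x94, 0x00F6},  // o with diaeresis
        {0xA0, 0x00E1},  // a with acute
        {0xA1, 0x00ED},  // i with acute
        {0xA2, 0x00F3},  // o with acute
    };
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            out += static_cast<char>(b);  // ASCII is identical in both
        } else {
            auto it = kHigh.find(b);
            appendUtf8(out, it != kHigh.end() ? it->second : char32_t{0xFFFD});
        }
    }
    return out;
}
```

The ASCII range passes through unchanged, which is why a mislabelled file can look fine until a name like Möller or Fernández appears.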

> I am attaching two screenshots to illustrate that something (which may or may not be new, and/or related to the problem with not finding XWP in the database) breaks when you convert the WIS:
>
> lsw@ru_CP850.png shows how the readme (CP 850) is rendered on this Russian system under CP 866 when the WIS is CP850-encoded: see the "?" in my name? That is possibly a rendering-only, cosmetic problem.
>
> Now, let's convert the WIS to UTF-8 (and change its CODEPAGE attribute accordingly), and fire up WarpIN on that again: see lsw@ru_CP850.png, look at my name again.
>
> That is a UTF-8 conversion problem, which may or may not be related to the one I reported initially, but we definitely brought it up by converting the WIS to CP 1208, a.k.a. UTF-8.


The WarpIN source says that we handle extracted files (EXTRACTFROMPCK) like so:

if (!G_pCurrentPageInfo->_ulExtractFromPck)
    str2Insert.assignUtf8(pLocals->_pCodecGui,
                          G_pCurrentPageInfo->_ustrReadmeSrc);
else
{
    // use _strReadmeSrc as a file name:
    // V1.0.11 (2006-08-31) [pr]: was using Unicode filename for Readme @@fixes 812
    ULONG cpSrc = Engine._pCurrentArchive->_pScript->_ulCodepage;
    BSUniCodec codecSrc(cpSrc);
    BSString strReadmeSrc(&codecSrc, G_pCurrentPageInfo->_ustrReadmeSrc);
    BSString strTempFileName;
    APIRET arc;

    if (!(arc = Engine.ExtractTempFile(G_pCurrentPageInfo->_ulExtractFromPck,
                                       strReadmeSrc.c_str(),  // V1.0.11 (2006-08-11)
                                       &strTempFileName)))
    {
        // successfully extracted:
        PSZ pszContent = NULL;
        if (!(arc = doshLoadTextFile(strTempFileName.c_str(),
                                     &pszContent,
                                     NULL)))
        {
            // check what codepage the script was created in...
            // we assume that the "readme" file was written in
            // the same codepage. If the codepage is different
            // from our current one, we'll need to convert:
            if (cpSrc == pLocals->_pCodecGui->QueryCodepage())
                // easy
                str2Insert = pszContent;
            else
            {
                // alright, different:
                // convert file contents to Unicode
                ustring ustr(&codecSrc, pszContent);
                // convert Unicode to display codepage
                str2Insert.assignUtf8(pLocals->_pCodecGui, ustr);
            }
            free(pszContent);
        }
        else
            str2Insert._printf(nlsGetString(WPSI_ERRORREADINGPCKFILE),
                               arc,
                               strReadmeSrc.c_str(), // V1.0.11 (2006-08-31) [pr]
                               G_pCurrentPageInfo->_ulExtractFromPck);
    }
    else
        str2Insert._printf(nlsGetString(WPSI_ERROREXTRACTINGPCKFILE),
                           arc,
                           strReadmeSrc.c_str(), // V1.0.11 (2006-08-31) [pr]
                           G_pCurrentPageInfo->_ulExtractFromPck);
}
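The readme branch in this excerpt reduces to two cases: if the script codepage equals the GUI codepage, the bytes are copied as-is; otherwise they are decoded with the script's codepage and re-encoded into the GUI codepage. The toy model below sketches that flow under stated assumptions: `ToyCodepage` and `readmeForDisplay` are hypothetical names, the tables hold only a few entries, and the '?' substitution for unrepresentable characters is assumed to mirror what the dialog shows (the excerpt does not show how BSUniCodec actually substitutes).

```cpp
#include <string>
#include <unordered_map>

// Toy single-byte code page: ASCII plus a few high-byte mappings.
struct ToyCodepage {
    unsigned long id;                                  // e.g. 850, 866
    std::unordered_map<unsigned char, char32_t> high;  // byte -> code point

    char32_t decode(unsigned char b) const {
        if (b < 0x80) return b;
        auto it = high.find(b);
        return it != high.end() ? it->second : char32_t{0xFFFD};
    }
    char encode(char32_t cp) const {
        if (cp < 0x80) return static_cast<char>(cp);
        for (const auto& [b, mapped] : high)
            if (mapped == cp) return static_cast<char>(b);
        return '?';  // no slot for this character in the target code page
    }
};

// Mirrors the branch above: copy when the script and GUI code pages match,
// otherwise go script code page -> Unicode -> display code page.
std::string readmeForDisplay(const std::string& raw,
                             const ToyCodepage& src,
                             const ToyCodepage& gui)
{
    if (src.id == gui.id)
        return raw;                      // the "easy" case
    std::u32string uni;
    for (unsigned char b : raw)
        uni += src.decode(b);            // file contents to Unicode
    std::string out;
    for (char32_t cp : uni)
        out += gui.encode(cp);           // Unicode to display code page
    return out;
}
```

With a CP850 source and a CP866 GUI, the CP850 bytes for "Fernández" come out as "Fern?ndez", consistent with the "?" reported in the dialog; the decode step succeeds, and only the final encode has nowhere to put "á".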

So, we convert the CP850 Readme to UTF-8. So far, so good. However, when we then need to convert to the display codepage (CP866, in this case), we run into a slight problem (note that Readme.UTF8 is the original readme which I converted via iconv):

[j:\] iconv -f UTF-8 -t 866 Readme.UTF8 > Readme.866
iconv.exe: Readme.UTF8:108:12: cannot convert

Line 108, char 12 is "á" in your name. Hmmm... I'm not sure what to do here. There is a WarpIN preference for the display codepage, which defaults to the process codepage. However, on a Russian system, it would seem highly illogical to change this merely to read a few characters which can't be rendered in 866.
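iconv stops at the first code point that has no slot in CP866, and the dialog shows "?" for the same characters. A minimal sketch of that check follows, assuming well-formed UTF-8 input and a simplified CP866 repertoire (ASCII, the basic Cyrillic letters, and Ё/ё; CP866's box-drawing and other symbols are deliberately left out). The function names are hypothetical, and the column here counts characters, which may not match iconv's own column convention.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Decodes one UTF-8 sequence starting at s[i] and advances i.
// Assumes well-formed input (continuation bytes are not validated).
static char32_t nextCodePoint(const std::string& s, std::size_t& i)
{
    unsigned char b = s[i++];
    if (b < 0x80) return b;
    int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
    char32_t cp = b & (0x3F >> extra);  // payload bits of the lead byte
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Simplified CP866 repertoire: ASCII, U+0410..U+044F, Ё (U+0401), ё (U+0451).
static bool inCp866Subset(char32_t cp)
{
    return cp < 0x80 || (cp >= 0x0410 && cp <= 0x044F) ||
           cp == 0x0401 || cp == 0x0451;
}

// Returns the 1-based (line, column) of the first character outside the
// subset, or std::nullopt if everything converts.
std::optional<std::pair<int, int>> firstUnconvertible866(const std::string& utf8)
{
    int line = 1, col = 1;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = nextCodePoint(utf8, i);
        if (!inCp866Subset(cp)) return std::make_pair(line, col);
        if (cp == U'\n') { ++line; col = 1; } else { ++col; }
    }
    return std::nullopt;
}
```

Cyrillic text passes, while the "á" in "Fernández" is flagged at its position, the same kind of report iconv gave for line 108.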

Also, this is not a font thing. I have dropped myriad fonts onto the dialog, all with the same result: "?" for the characters in your name.

I fall back on my contention that this is not a WarpIN bug. WarpIN assumes that the content of an external file is in the same codepage as the one specified for the WIS, converts it to Unicode (UTF-8), and finally converts that to the display codepage. It's a conundrum, I grant you. I just haven't figured out an adequate workaround as yet.

--
Lewis
