On Sun, 15 Aug 2021 15:36:16 -0400 Lewis wrote:
>On 08/15/21 08:57 am, Alfredo Fernández Díaz wrote:
>>On 2021/08/15 04:04, Lewis wrote:
>>>Changing the codepage isn't enough. The content needs to be converted to UTF-8
>>>(it was still CP850).
>>As I tried to explain (albeit maybe too briefly, sorry) this still breaks (more?) things...
>>I am perfectly aware that the WIS contains non-English characters, so specifying CODEPAGE is not enough -- what you state there must be the one in use as well, so if the original used CP850 characters, a proper conversion is in order, sure.
>>Still, WarpIN is not handling this correctly...
>>>This gave me a UTF-8 script, which properly renders Ulrich's name and which
>>>then matches what's in the WarpIN db (no error report of missing XWP).
>>Lewis, did you notice I reported this was a problem that showed up /on a Russian system/, and nowhere else? -- Ulrich's name was always properly processed and rendered on my main system (main CP always 850).
>You got me there. I was only testing in English. My first go-round told me that XWP was not installed, as it couldn't match Ulrich's name in the db. Once I converted the script to UTF-8, all was right with the world.
>>I am attaching two screenshots to illustrate that something (which may or may not be new, and/or related to the problem with not finding XWP in the database) breaks when you convert the WIS:
>>lsw@ru_CP850.png shows how the readme (CP 850) is rendered on this Russian system under CP 866 when the wis is CP850-encoded: see the "?" on my name? That is possibly a rendering-only, cosmetic problem.
>>Now, let's convert the WIS to UTF (and change its CODEPAGE attribute accordingly), and fire up WarpIN on that again: see lsw@ru_CP850.png, look at my name again.
>>That is a UTF conversion problem, which may or may not be related to the one I reported initially, but we definitely brought it up converting the WIS to CP 1208 aka UTF8.
>The WarpIN source says that we handle extracted files (EXTRACTFROMPCK) like so:
> // use _strReadmeSrc as a file name:
> // V1.0.11 (2006-08-31) [pr]: was using Unicode filename for Readme @@fixes 812
> ULONG cpSrc = Engine._pCurrentArchive->_pScript->_ulCodepage;
> BSUniCodec codecSrc(cpSrc);
> BSString strReadmeSrc(&codecSrc, G_pCurrentPageInfo->_ustrReadmeSrc);
> BSString strTempFileName;
> APIRET arc;
> if (!(arc = Engine.ExtractTempFile(G_pCurrentPageInfo->_ulExtractFromPck,
> strReadmeSrc.c_str(), // V1.0.11 (2006-08-11)
> // successfully extracted:
> PSZ pszContent = NULL;
> if (!(arc = doshLoadTextFile(strTempFileName.c_str(),
> // check what codepage the script was created in...
> // we assume that the "readme" file was written in
> // the same codepage. If the codepage is different
> // from our current one, we'll need to convert:
> if (cpSrc == pLocals->_pCodecGui->QueryCodepage())
> // easy
> str2Insert = pszContent;
> // alright, different:
> // convert file contents to Unicode
> ustring ustr(&codecSrc, pszContent);
> // convert Unicode to display codepage
> str2Insert.assignUtf8(pLocals->_pCodecGui, ustr);
> strReadmeSrc.c_str(), // V1.0.11 (2006-08-31) [pr]
>So, we convert the CP850 Readme to UTF-8. So far, so good. However, when we then need to convert to the display codepage (CP866, in this case), we run into a slight problem (note that Readme.UTF8 is the original readme which I converted via iconv):
>[j:\] iconv -f UTF-8 -t 866 Readme.UTF8 > Readme.866
>iconv.exe: Readme.UTF8:108:12: cannot convert
>Line 108, char 12 is "á" in your name. Hmmm... I'm not sure what to do here. There is a WarpIN preference for display codepage, which defaults to process codepage. However, on a Russian system, it would seem highly illogical to change this merely to read a few characters which can't be rendered in 866.
>Also, this is not a font thing. I have dropped myriad fonts onto the dialog, all with the same result: "?" for the characters in your name.
>I fall back on my contention that this is not a WarpIN bug. WarpIN accepts the content of an external file as the same codepage as specified for the WIS, and then converts to UTF-8, and finally to the display codepage. It's a conundrum, I grant you. I just haven't figured an adequate workaround as yet.
This is a WarpIN bug. REQUIRES="Ulrich Möller\XWorkplace\Kernel\1\0\1" isn't meant to be user-readable; it is an internal check.
I think you will also see problems on a Russian system with PACKAGEID="Ulrich Möller\XWorkplace\Kernel\1\0\1" installed post 1.0.24. If this is done with codepage 1208 or no codepage, the database will contain "Ulrich M?ller\XWorkplace\Kernel\1\0\1". If you have a REQUIRES="Ulrich Möller\XWorkplace\Kernel\1\0\1" with codepage 1208 it will probably work, but if you have a WIS with this that is codepage 850 it will fail, since the ö will be present. The ö isn't present in codepage 866. Only ASCII characters (0-127) are (probably) guaranteed to be the same between codepages.
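To illustrate the "M?ller" effect, here is a minimal Python sketch. It only uses Python's own cp866 codec as a stand-in for whatever conversion WarpIN performs, so treat it as an assumption about the mechanism, not a trace of the WarpIN code; the variable names are made up for the example.

```python
# The package ID as it is meant to read (Unicode).
package_id = "Ulrich Möller\\XWorkplace\\Kernel\\1\\0\\1"

# Assumption: converting to CP866 substitutes "?" for characters that
# have no CP866 equivalent, the way errors="replace" does here.
stored = package_id.encode("cp866", errors="replace").decode("cp866")

# The pure-ASCII part of the ID survives the round trip unchanged.
ascii_tail = "\\XWorkplace\\Kernel\\1\\0\\1"
tail_round_trip = ascii_tail.encode("cp866").decode("cp866")

print(stored)                # the ö has been replaced by "?"
print(stored == package_id)  # False: a later REQUIRES lookup cannot match
```

Once the "?" is in the database, nothing can recover the original character, which is why the check has to happen before any conversion to 866.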
What is needed is for WarpIN to convert these "internal use" strings to codepage 850, use them, and then convert the rest to codepage 866.
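A rough sketch of that idea in Python, purely for illustration. The function names and the CODECS table are hypothetical and do not correspond to anything in the WarpIN source; the point is only that the internal key is normalised to codepage 850 whatever the script's encoding, while display text is converted separately.

```python
# Hypothetical mapping from WIS CODEPAGE values to Python codec names.
CODECS = {850: "cp850", 866: "cp866", 1208: "utf-8"}

def internal_key(raw, declared_cp=None):
    """Decode an ID with the script's codepage (850 if none is declared),
    then normalise it to CP850 bytes so every script yields the same key."""
    text = raw.decode(CODECS[declared_cp or 850])
    return text.encode("cp850")

def display_text(raw, declared_cp, display_cp):
    """Convert user-visible text to the display codepage; characters the
    display codepage lacks become '?'."""
    text = raw.decode(CODECS[declared_cp or 850])
    return text.encode(CODECS[display_cp], errors="replace")

name = "Ulrich Möller\\XWorkplace\\Kernel\\1\\0\\1"
key_from_utf8 = internal_key(name.encode("utf-8"), 1208)
key_from_850 = internal_key(name.encode("cp850"), 850)
key_no_cp = internal_key(name.encode("cp850"))  # no CODEPAGE attribute
shown = display_text(name.encode("utf-8"), 1208, 866)

print(key_from_utf8 == key_from_850 == key_no_cp)  # keys agree
print(shown)  # lossy, but only for what the user sees
```

With this split, a lossy display conversion can no longer affect the REQUIRES/PACKAGEID comparison.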
The other problem is with WIS files that declare no codepage (most if not all of which are codepage 850). These fail for REQUIRES="Ulrich Möller\XWorkplace\Kernel\1\0\1" on Russian systems because they are now read as codepage 866 (the process default). This case requires that the "internal use" strings be read first in codepage 850 and used before the codepage 866 (default) read. This can also be fixed by assuming they are codepage 850, not the process codepage.
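The no-codepage case can be reproduced outside WarpIN. The following Python sketch takes the bytes a CP850-encoded script would contain and decodes them once as 866 (what I believe a Russian system now does by default) and once as 850; it says nothing about how WarpIN itself reads the file.

```python
# Bytes as they sit in a WIS saved in CP850 with no CODEPAGE attribute.
raw = "Ulrich Möller".encode("cp850")

# Read with the Russian process default: byte 0x94 is a Cyrillic letter
# in CP866, so the name silently turns into something else.
misread = raw.decode("cp866")

# Read with the assumed default of CP850: the name comes back intact.
correct = raw.decode("cp850")

print(misread)
print(correct)
```

Note that the misread produces no error at all, just a different string, so the REQUIRES check fails without any hint as to why.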
OK, some questions about this. What version of iconv are you using? I have found several. What are the exact steps to build the WIS? I assume you need to reconvert any time you edit the file unless you use a UTF-8 enabled editor (are there any?).