From: "Lewis"
Received: from [72.86.41.184] (account lgrosenthal@2rosenthals.com HELO [192.168.201.140])
    by 2rosenthals.com (CommuniGate Pro SMTP 5.4.10) with ESMTPSA id 2280227
    for lswitcher-dev@2rosenthals.com; Sun, 15 Aug 2021 15:36:17 -0400
Subject: Re: [lswitcher-dev] lSwitcher-2-93-0-RC_6.wpi
To: lSwitcher Developers Mailing List
References:
Message-ID: <61196CB0.5020600@2rosenthals.com>
Date: Sun, 15 Aug 2021 15:36:16 -0400
User-Agent: Mozilla/5.0 (OS/2; Warp 4.5; rv:38.0) Gecko/20100101 Firefox/38.0 SeaMonkey/2.35
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

Hi...

On 08/15/21 08:57 am, Alfredo Fernández Díaz wrote:
> "Morning,"
>
> On 2021/08/15 04:04, Lewis wrote:
>> Hi...
>>
>> Changing the codepage isn't enough. The content needs to be converted to
>> UTF-8 (it was still CP850).
>
> As I tried to explain (albeit maybe too briefly, sorry) this still breaks
> (more?) things...
>
> I am perfectly aware that the WIS contains non-English characters, so
> specifying CODEPAGE is not enough -- what you state there must be the one
> in use as well, so if the original used CP850 characters, a proper
> conversion is in order, sure.
>
> Still, WarpIN is not handling this correctly...
>
>
>> This gave me a UTF-8 script, which properly renders Ulrich's name and which
>> then matches what's in the WarpIN db (no error report of missing XWP).
>
> Lewis, did you notice I reported this was a problem that showed up /on a
> Russian system/, and nowhere else? -- Ulrich's name was always properly
> processed and rendered on my main system (main CP always 850).
>

You got me there. I was only testing in English. My first go-round told me
that XWP was not installed, as it couldn't match Ulrich's name in the db.
Once I converted the script to UTF-8, all was right with the world.

> I am attaching two screenshots to illustrate that something (which may or
> may not be new, and/or related to the problem with not finding XWP in the
> database) breaks when you convert the WIS:
>
> lsw@ru_CP850.png shows how the readme (CP 850) is rendered on this Russian
> system under CP 866 when the wis is CP850-encoded: see the "?" on my name?
> That is possibly a rendering-only, cosmetic problem.
>
> Now, let's convert the WIS to UTF (and change its CODEPAGE attribute
> accordingly), and fire up WarpIN on that again: see lsw@ru_CP850.png, look
> at my name again.
>
> That is a UTF conversion problem, which may or may not be related to the
> one I reported initially, but we definitely brought it up converting the
> WIS to CP 1208 aka UTF8.
>

The WarpIN source says that we handle extracted files (EXTRACTFROMPCK) like
so:

    if (!G_pCurrentPageInfo->_ulExtractFromPck)
        str2Insert.assignUtf8(pLocals->_pCodecGui, G_pCurrentPageInfo->_ustrReadmeSrc);
    else
    {
        // use _strReadmeSrc as a file name:
        // V1.0.11 (2006-08-31) [pr]: was using Unicode filename for Readme @@fixes 812
        ULONG cpSrc = Engine._pCurrentArchive->_pScript->_ulCodepage;
        BSUniCodec codecSrc(cpSrc);
        BSString strReadmeSrc(&codecSrc, G_pCurrentPageInfo->_ustrReadmeSrc);
        BSString strTempFileName;
        APIRET arc;
        if (!(arc = Engine.ExtractTempFile(G_pCurrentPageInfo->_ulExtractFromPck,
                                           strReadmeSrc.c_str(),  // V1.0.11 (2006-08-11)
                                           &strTempFileName)))
        {
            // successfully extracted:
            PSZ pszContent = NULL;
            if (!(arc = doshLoadTextFile(strTempFileName.c_str(),
                                         &pszContent,
                                         NULL)))
            {
                // check what codepage the script was created in...
                // we assume that the "readme" file was written in
                // the same codepage.
                // If the codepage is different
                // from our current one, we'll need to convert:
                if (cpSrc == pLocals->_pCodecGui->QueryCodepage())
                    // easy
                    str2Insert = pszContent;
                else
                {
                    // alright, different:
                    // convert file contents to Unicode
                    ustring ustr(&codecSrc, pszContent);
                    // convert Unicode to display codepage
                    str2Insert.assignUtf8(pLocals->_pCodecGui, ustr);
                }
                free(pszContent);
            }
            else
                str2Insert._printf(nlsGetString(WPSI_ERRORREADINGPCKFILE),
                                   arc,
                                   strReadmeSrc.c_str(),  // V1.0.11 (2006-08-31) [pr]
                                   G_pCurrentPageInfo->_ulExtractFromPck);
        }
        else
            str2Insert._printf(nlsGetString(WPSI_ERROREXTRACTINGPCKFILE),
                               arc,
                               strReadmeSrc.c_str(),  // V1.0.11 (2006-08-31) [pr]
                               G_pCurrentPageInfo->_ulExtractFromPck);

So, we convert the CP850 Readme to UTF-8. So far, so good. However, when we
then need to convert to the display codepage (CP866, in this case), we run
into a slight problem (note that Readme.UTF8 is the original readme which I
converted via iconv):

    [j:\] iconv -f UTF-8 -t 866 Readme.UTF8 > Readme.866
    iconv.exe: Readme.UTF8:108:12: cannot convert

Line 108, char 12 is "á" in your name.

Hmmm... I'm not sure what to do here. There is a WarpIN preference for
display codepage, which defaults to process codepage. However, on a Russian
system, it would seem highly illogical to change this merely to read a few
characters which can't be rendered in 866.

Also, this is not a font thing. I have dropped myriad fonts onto the dialog,
all with the same result: "?" for the characters in your name.

I fall back on my contention that this is not a WarpIN bug. WarpIN accepts
the content of an external file as the same codepage as specified for the
WIS, and then converts to UTF-8, and finally to the display codepage.

It's a conundrum, I grant you. I just haven't figured an adequate workaround
as yet.

-- 
Lewis
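
The failing step (Unicode to display codepage) can be reproduced outside WarpIN. What follows is a minimal sketch in Python, not WarpIN's code: the helper name `unencodable` is hypothetical, the sample text is just the name from this thread, and Python's `errors="replace"` handler only stands in for whatever substitution WarpIN's codec actually performs. It shows that the CP850 to Unicode step is lossless for this text, while CP866 has no slot for "á" or "í".

```python
def unencodable(text, codepage):
    """Return the distinct characters of `text` that `codepage` cannot
    represent, in order of first appearance (hypothetical helper)."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


name = "Alfredo Fernández Díaz"

# Stand-in for the bytes as they would sit in a CP850-encoded readme.
raw_cp850 = name.encode("cp850")

# Step 1: source codepage -> Unicode. Nothing is lost here.
text = raw_cp850.decode("cp850")

# Step 2: Unicode -> display codepage. CP850 round-trips; CP866 does not.
print(unencodable(text, "cp850"))   # []
print(unencodable(text, "cp866"))   # ['á', 'í']

# With a substitution policy, the unmappable characters become "?".
print(text.encode("cp866", errors="replace"))   # b'Alfredo Fern?ndez D?az'
```

Whatever the installer does internally, a check of this kind run over the readme before packaging would at least list which characters a given target codepage will lose.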