This bug is still reproducible in TeXmacs 1.99.20. The tm file is attached at the end.
If the conversion is handled by TeXmacs itself, I propose using the C++ standard library to perform it: std::codecvt_utf16 and std::wstring_convert.
I don’t know the outcome of the discussion between @darcy and Joris about modernizing the C++ code. I don’t think it is worth converting everything, but where the existing code is buggy, it might be good to replace it with correct, modern C++11 code.
Here is a modified version of the sample code, which produces the correct UTF-16BE encoding (and also UTF-8 and UTF-16LE):
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>
#include <iomanip>

// utility function for output
void hex_print(const std::string& s)
{
    std::cout << std::hex << std::setfill('0');
    for (unsigned char c : s)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}

int main()
{
    // wide character data
    // std::wstring wstr = L"z\u00df\u6c34\U0001f34c"; // or L"zß水🍌"
    std::wstring wstr = L"𝔸";

    // wide to UTF-8
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(wstr);
    std::cout << "UTF-8 conversion produced " << u8str.size() << " bytes:\n";
    hex_print(u8str);

    // wide to UTF-16le
    std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>> conv2;
    std::string u16str = conv2.to_bytes(wstr);
    std::cout << "UTF-16le conversion produced " << u16str.size() << " bytes:\n";
    hex_print(u16str);

    // wide to UTF-16be
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv3;
    std::string u16bestr = conv3.to_bytes(wstr);
    std::cout << "UTF-16be conversion produced " << u16bestr.size() << " bytes:\n";
    hex_print(u16bestr);
}
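For comparison, the same UTF-16BE conversion can also be written by hand, which might be worth considering because the <codecvt> header has been deprecated since C++17. The following is only a minimal sketch: the names append_utf16be and to_utf16be are hypothetical helpers of mine, not existing TeXmacs functions, and I assume the input is already available as a sequence of Unicode code points.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of TeXmacs): append one Unicode code point
// to `out` as big-endian UTF-16 bytes.
void append_utf16be(std::string& out, std::uint32_t cp)
{
    // Assumption: invalid input (out of range or a lone surrogate)
    // is replaced by U+FFFD rather than reported as an error.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;

    auto put16 = [&out](std::uint32_t unit) {
        out += static_cast<char>(static_cast<unsigned char>(unit >> 8));   // high byte first
        out += static_cast<char>(static_cast<unsigned char>(unit & 0xFF)); // then low byte
    };

    if (cp < 0x10000) {
        // Code points in the BMP fit in a single 16-bit unit.
        put16(cp);
    } else {
        // Code points above the BMP need a surrogate pair.
        std::uint32_t v = cp - 0x10000;
        put16(0xD800 + (v >> 10));   // high (leading) surrogate
        put16(0xDC00 + (v & 0x3FF)); // low (trailing) surrogate
    }
}

// Hypothetical helper: convert a whole UTF-32 string to UTF-16BE bytes (no BOM).
std::string to_utf16be(const std::u32string& s)
{
    std::string out;
    for (char32_t c : s)
        append_utf16be(out, static_cast<std::uint32_t>(c));
    return out;
}
```

For 𝔸 (U+1D538), to_utf16be(U"\U0001D538") yields the four bytes d8 35 dd 38, i.e. the surrogate pair D835 DD38. Taking char32_t input also avoids depending on the width of wchar_t, which differs between platforms.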
The tm file:
<TeXmacs|1.99.20>
<style|generic>
<\body>
<section|<math|\<bbb-A\>>>
<subsection|<math|\<bbb-B\>>>
<subsubsection|<math|\<bbb-A\>>>
<section|<math|\<bbb-C\>>>
<section|<math|\<bbb-U\>>>
</body>
<\initial>
<\collection>
<associate|page-height|auto>
<associate|page-medium|paper>
<associate|page-type|letter>
<associate|page-width|auto>
</collection>
</initial>
<\references>
<\collection>
<associate|auto-1|<tuple|1|1>>
<associate|auto-2|<tuple|1.1|1>>
<associate|auto-3|<tuple|1.1.1|1>>
<associate|auto-4|<tuple|2|1>>
<associate|auto-5|<tuple|3|1>>
</collection>
</references>
<\auxiliary>
<\collection>
<\associate|toc>
<vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc><with|mode|<quote|math>|\<bbb-A\>>>
<datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
<no-break><pageref|auto-1><vspace|0.5fn>
<with|par-left|<quote|1tab>|1.1<space|2spc><with|mode|<quote|math>|\<bbb-B\>>
<datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
<no-break><pageref|auto-2>>
<with|par-left|<quote|2tab>|1.1.1<space|2spc><with|mode|<quote|math>|\<bbb-A\>>
<datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
<no-break><pageref|auto-3>>
<vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc><with|mode|<quote|math>|\<bbb-C\>>>
<datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
<no-break><pageref|auto-4><vspace|0.5fn>
<vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc><with|mode|<quote|math>|\<bbb-U\>>>
<datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
<no-break><pageref|auto-5><vspace|0.5fn>
</associate>
</collection>
</auxiliary>