This bug is still reproducible in TeXmacs 1.99.20. The tm file is attached at the end.
If the conversion is handled by TeXmacs itself, I propose using the C++ standard library to perform it: std::codecvt_utf16 and std::wstring_convert.
I don’t know the outcome of the discussion between @darcy and Joris about modernizing the C++ code. I don’t think it is worth converting everything, but where the existing code is buggy, it might be good to replace it with correct, modern C++11 code.
I modified the sample code so that it produces the correct UTF-16BE output (and also UTF-8 and UTF-16LE):
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>
#include <iomanip>

// utility function for output: print each byte of s as two hex digits
void hex_print(const std::string& s)
{
    std::cout << std::hex << std::setfill('0');
    for (unsigned char c : s)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}

int main()
{
    // wide character data
    // (this relies on a 32-bit wchar_t, e.g. Linux or macOS; with a 16-bit
    // wchar_t the facets below treat the input as UCS-2)
    // std::wstring wstr = L"z\u00df\u6c34\U0001f34c"; // or L"zß水🍌"
    std::wstring wstr = L"𝔸"; // U+1D538, outside the BMP

    // wide to UTF-8
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv1;
    std::string u8str = conv1.to_bytes(wstr);
    std::cout << "UTF-8 conversion produced " << u8str.size() << " bytes:\n";
    hex_print(u8str);

    // wide to UTF-16le
    std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>> conv2;
    std::string u16str = conv2.to_bytes(wstr);
    std::cout << "UTF-16le conversion produced " << u16str.size() << " bytes:\n";
    hex_print(u16str);

    // wide to UTF-16be (big-endian is the default mode of codecvt_utf16)
    std::wstring_convert<std::codecvt_utf16<wchar_t>> conv3;
    std::string u16bestr = conv3.to_bytes(wstr);
    std::cout << "UTF-16be conversion produced " << u16bestr.size() << " bytes:\n";
    hex_print(u16bestr);
}
The tm