Could TeXmacs store source files in a readable format?

Could TeXmacs store source files in a format that keeps non-ASCII characters readable? For example, it could save .tm files as UTF-8 text with non-ASCII characters written as is, instead of as unreadable escape codes.

In Mogan, there is an experimental feature: Use UTF-8 for CJK in TM format

It is a workaround. XmacsLabs is planning to design a new format called tmu (TeXmacs format with Unicode support).


I tried, but the saved .tm file cannot be reopened by TeXmacs. Could this feature be added to TeXmacs? Or could the UTF-8 .tm format be made interchangeable between the two?

Could this feature be added to TeXmacs?

It was added to GNU TeXmacs, but it was later reverted.

This experimental feature is a workaround. To be honest, Joris was absolutely right to revert it, because it may cause other problems.

OK, I see. But in my opinion it's still a convenient feature: for example, it would make it easier to check, or even occasionally edit, the source file without opening TeXmacs.

Then I’ll give Mogan another try. Thanks for your reply.

Darcy: do you remember what kind of problems it caused? I think we should aim to have, sooner or later, a Unicode-compatible format for TeXmacs as well.

BTW, this was mentioned a decade ago: https://lists.gnu.org/archive/html/texmacs-dev/2012-09/msg00014.html

Yes, I'm aware of this: changing the internal workings of TeXmacs is an extensive endeavour which is not likely to happen soon. But using Unicode for storing TeXmacs files is a different matter. Upon loading, the file could be converted to the TeXmacs encoding, and upon saving, it could be converted back to UTF-8. This could be a first step towards a gradual adoption of Unicode. Since Darcy already tried, I was wondering what went wrong.
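To illustrate the load/save conversion idea, here is a minimal Python sketch. It assumes, as a simplification, that every non-ASCII character in a .tm file is stored as a `<#HEX>` escape of its Unicode code point (the way CJK text typically shows up in .tm files); the real TeXmacs encoding also uses Cork encoding for some accented Latin characters, which this sketch ignores. The function names are hypothetical and not part of TeXmacs.

```python
import re

# Hypothetical, simplified model of the TeXmacs encoding: every non-ASCII
# character is assumed to be written as <#HEX> (hex Unicode code point).
# Cork-encoded Latin characters used by real TeXmacs are NOT handled here.
_ESCAPE = re.compile(r"<#([0-9A-Fa-f]+)>")


def utf8_to_tm(text: str) -> str:
    """On load: escape each non-ASCII character as <#HEX>."""
    return "".join(c if ord(c) < 128 else f"<#{ord(c):X}>" for c in text)


def tm_to_utf8(tm_text: str) -> str:
    """On save: turn each <#HEX> escape back into the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), tm_text)


def load_utf8_file(path: str) -> str:
    """Read a UTF-8 file and return text in the (simplified) TeXmacs encoding."""
    with open(path, encoding="utf-8") as f:
        return utf8_to_tm(f.read())


def save_utf8_file(path: str, internal: str) -> None:
    """Write text from the (simplified) TeXmacs encoding as readable UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(tm_to_utf8(internal))
```

For example, `utf8_to_tm("中文")` gives `"<#4E2D><#6587>"`, and `tm_to_utf8` turns it back into `"中文"`. Markup such as `<TeXmacs|2.1.2>` passes through unchanged, since only tags starting with `#` are treated as escapes.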

Joris said it was “dangerous” in commit 13916. One reason I see is that saving in a new (internally UTF-8) format while keeping the same file extension means that TeXmacs versions not upgraded to handle it cannot load these files, and their users get no explanation, as @hffqyd reports in his comment above.
One would also need to handle the new format when copying to and pasting from TeXmacs, in converters, etc.

Yes, I see. Then maybe we could have an additional extension, .tmu (as @darcy suggested) or .tmx, which would allow us to use a new format for the data.

The tm format is not UTF-8. That's the major problem.

If I understand correctly, Unix-like systems do not distinguish files by extension. It seems better to record the difference in, say, the file header.

Yeah, but TeXmacs itself can still tell .tm files from .tmu files (to use the name @mgubi suggested) by their extension.

But wouldn't keeping the same extension break backward compatibility, as @pjoyez mentioned?

I suppose a new format/file extension might be better.

If you look at tm files, you can see that the first line looks like

<TeXmacs|2.1.2>

I am not sure exactly how TeXmacs handles this, but it seems to me that TeXmacs uses this line to distinguish file types. For example, if you save a file as “TeXmacs Scheme”, you get a .stm file that begins with

(document (TeXmacs "2.1.2") ...

and even if you rename it to a .tm file, TeXmacs can still recognize it.
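Detecting the format from the file's first characters, rather than from its extension, could look roughly like the Python sketch below. It only knows the two header shapes quoted above; the function name and its return values are hypothetical and are not how TeXmacs itself is implemented.

```python
import re
from typing import Optional, Tuple

# The two header shapes quoted above: "<TeXmacs|2.1.2>" for .tm files and
# '(document (TeXmacs "2.1.2") ...' for .stm (TeXmacs Scheme) files.
_TM_HEADER = re.compile(r"<TeXmacs\|([^>]*)>")
_STM_HEADER = re.compile(r'\(document\s+\(TeXmacs\s+"([^"]*)"\)')


def sniff_texmacs_format(text: str) -> Tuple[str, Optional[str]]:
    """Return ("tm" | "stm" | "unknown", version or None), ignoring the extension."""
    head = text.lstrip()  # re.match() only looks at the start of the string
    m = _TM_HEADER.match(head)
    if m:
        return "tm", m.group(1)
    m = _STM_HEADER.match(head)
    if m:
        return "stm", m.group(1)
    return "unknown", None
```

Because the check depends only on the contents, a renamed .stm file is still reported as `"stm"`. In principle, a UTF-8 variant could be flagged the same way, through a distinct header, as suggested earlier in the thread.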


The reason I don't agree with distinguishing files by extension is that extensions are mainly a hint for users. Programs should not depend on them; they should distinguish files by their contents.