Reddit discussion on simplifying TeXmacs document trees in real-time

Check it out:

https://www.reddit.com/r/compsci/comments/m4aww1/is_there_an_efficient_algorithm_for_simplifying/

Can you clarify what problem you are trying to solve? I have been using TeXmacs for a long time and I have a hard time recalling a situation where I wished for some form of “simplification”. What has often happened to me is that, while editing, inexperienced users leave a lot of garbage around, like empty superscripts/subscripts, which then cause errors in LaTeX conversion. These would be easy to find and remove, but I do not think this needs to run continuously in the background. Also, TeXmacs already does some simplifications (I think), like merging nearby similar environments, but again this is needed only very locally at the insertion point.
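To make the "garbage removal" idea concrete, here is a minimal sketch of a one-off cleanup pass that drops empty scripts. The tree model (a node is either a string or a `(tag, children)` tuple) and the function names are hypothetical and are not the TeXmacs API; the tag names `rsub` and `rsup` are assumed for illustration.

```python
# Hypothetical tree model: a node is either a str (text) or a
# (tag, children) tuple. This is NOT the real TeXmacs tree API.

EMPTY_PRUNABLE = {"rsub", "rsup"}  # assumed tag names for sub/superscripts


def is_empty(node):
    """A node is empty if it is blank text or a tag with only empty children."""
    if isinstance(node, str):
        return node == ""
    _tag, children = node
    return all(is_empty(c) for c in children)


def prune_empty_scripts(node):
    """Return a copy of the tree without empty sub/superscript nodes."""
    if isinstance(node, str):
        return node
    tag, children = node
    kept = []
    for child in children:
        child = prune_empty_scripts(child)
        is_script = not isinstance(child, str) and child[0] in EMPTY_PRUNABLE
        if is_script and is_empty(child):
            continue  # drop the leftover empty script
        kept.append(child)
    return (tag, kept)


doc = ("math", ["x", ("rsup", [""]), "+y", ("rsub", ["i"]), ("rsub", [])])
cleaned = prune_empty_scripts(doc)
print(cleaned)
```

A pass like this could plausibly run on demand or at export time rather than continuously, which matches the point above that it does not need to run in the background.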

When you edit a document in TeXmacs, you are constantly creating underlying structure that has no visual effect but that does affect future editing.

To help you out, TeXmacs uses the focus box and status bar to reveal some of this underlying structure. But it would be better to not worry about it.

A simplification algorithm would make the underlying structure the same regardless of how you edit the document. This would make future editing more predictable and reduce the need for the focus box and status bar.

As an example, suppose you have a list of items:

  • first item
  • second item
  • third item
  • fourth item

Then you decide to gray out the first two items. Later, you decide to gray out the third item. This results in two adjacent gray regions even though only one is needed, and this affects future editing in an annoying way.
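The list example can be sketched as a small merge pass over adjacent siblings. This is only an illustration under assumptions: the tuple-based tree model, the `gray` and `item` tag names, and `merge_adjacent` are all hypothetical and not how TeXmacs stores or edits documents.

```python
# Hypothetical tree model: a node is a str or a (tag, children) tuple.


def merge_adjacent(node, mergeable=("gray",)):
    """Merge neighbouring siblings that carry the same mergeable tag."""
    if isinstance(node, str):
        return node
    tag, children = node
    merged = []
    for child in children:
        child = merge_adjacent(child, mergeable)
        prev = merged[-1] if merged else None
        same_wrapper = (
            prev is not None
            and not isinstance(prev, str)
            and not isinstance(child, str)
            and prev[0] == child[0]
            and child[0] in mergeable
        )
        if same_wrapper:
            # Concatenate the two regions, then re-check the new boundary.
            merged[-1] = merge_adjacent((prev[0], prev[1] + child[1]), mergeable)
        else:
            merged.append(child)
    return (tag, merged)


items = [("item", [name + " item"]) for name in ("first", "second", "third", "fourth")]
doc = ("itemize", [("gray", items[:2]), ("gray", items[2:3]), items[3]])
result = merge_adjacent(doc)
print(result)
```

After the pass, the first three items sit in a single gray region and the fourth item is untouched, whichever order the graying was applied in.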

As another example, the order of nesting various things (e.g., boldface and italics) matters even when the visual output is the same. Again, this affects future editing in an annoying way.
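For the nesting-order case, one possible normalisation is to sort chains of wrappers into a fixed order. The sketch below assumes a hypothetical tuple-based tree, assumes that `bold` and `italic` commute and are idempotent, and picks an arbitrary canonical order; none of this is taken from TeXmacs itself.

```python
# Hypothetical tree model: a node is a str or a (tag, children) tuple.

COMMUTING = ("bold", "italic")  # assumed canonical order, outermost first


def canonical_nesting(node):
    """Rewrite chains of single-child commuting wrappers in a fixed order."""
    if isinstance(node, str):
        return node
    # Collect a maximal chain of single-child commuting wrappers.
    chain = []
    while not isinstance(node, str) and node[0] in COMMUTING and len(node[1]) == 1:
        chain.append(node[0])
        node = node[1][0]
    if isinstance(node, str):
        inner = node
    else:
        tag, children = node
        inner = (tag, [canonical_nesting(c) for c in children])
    # Re-wrap from the innermost tag outwards; duplicates are dropped.
    for tag in sorted(set(chain), key=COMMUTING.index, reverse=True):
        inner = (tag, [inner])
    return inner


a = ("bold", [("italic", ["x"])])
b = ("italic", [("bold", ["x"])])
print(canonical_nesting(a) == canonical_nesting(b))
```

With this pass both nestings end up as the same tree, so later edits would behave the same regardless of which style was applied first.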

Maybe I can translate what you are saying in another way: you would like there to be a one-to-one correspondence between the structure and the appearance of the document. Right now it is possible to deduce the appearance from the structure, but not the structure from the appearance.

An alternative approach to the one-to-one correspondence would be “dual-view” editing, where one has the source and the typeset document side by side and can edit either one, with each edit changing both.

A program written in the '80s as part of a PhD thesis, Lilac, did that. Here is a link to a pdf of the thesis (both links provide the same document):


http://www.bitsavers.org/pdf/dec/tech_reports/SRC-RR-33.pdf

I think it would be more ambitious to have one-to-one correspondence (or close to it) to make editing in WYSIWYG nicer and more predictable.

So why not try real-time document tree simplification?

I would need to think about it, but I can see a downside: if the appearance determines the structure, the document becomes less flexible, since expressing the same thing with a different structure may make it easier to modify the document in a different way.

Apparently, a solution to this editing problem is discussed here:

https://medium.engineering/why-contenteditable-is-terrible-122d8a40e480

Dear Amir,
I do not see how that post applies to TeXmacs. Indeed as far as I can see TeXmacs behaves very differently to any other editor I know in that at every moment the user has a visual feedback on what is the structure of the document, and also that you have “infinitesimal” positions, just before or just after a tag. I do not know of any other editor which has this features and these eliminate many of the problems I saw discussed in that post, or elsewhere for that matter. This is a basic decision and to me is a good feature which you seems not to like, indeed you say: “To help you out, TeXmacs uses the focus box and status bar to reveal some of this underlying structure. But it would be better to not worry about it.”. But you do not give arguments to support this. Why would be better not to worry? One has to worry, because it affects subsequent editing operations and because by design TeXmacs is a structured editor, and not just a WYSIWYG editor, indeed I would maintain that TeXmacs is not WYSIWYG more than WYSIWYM or other paradigms which do not really fit what editing experience is about in TeXmacs. It would be useful to me to understand exactly what is the proposal you want to put forward and why. It seems to require big changes in TeXmacs so one has better to be sure that it is worth. Simplification is a vague concept. What you want to simplify? Simplification or normalisation? Use cases (beyond those you already listed which do not seem convincing to me, like the one on lists)? Note that TeXmacs do already some simplification: if you start a math mode just after another math mode then they will be merged. Actually in my personal experience this is very irritating because most of the time I mean to start a new math mode. Any automatic decision, out of the control of the user, is to be evaluated carefully because remove the control to the final user. This is the problem discussed in the post you mention, as far as I can see.

Giovanni, this is not completely correct: TeXmacs reflects precisely the structure of the document in the visual appearance; this is why we call it a structured editor and put an emphasis on this (see the main site, where it is written: “It provides a unified and user friendly framework for editing structured documents with different types of content: text, mathematics, graphics, interactive content, slides, etc.”). See the paper: https://www.texmacs.org/joris/gut/gut-abs.html In that paper the flaws of standard editors are discussed together with the solutions offered by TeXmacs. For this discussion to be fruitful, it would be nice to try to identify current flaws of TeXmacs and then propose an improved editing mechanism.

See for example Section 3.4, “Transparency and controllability”, in the paper by Joris that I cited before, where he discusses the tradeoff and says explicitly: “In a wysiwyg editor, it is not clear whether the text “bold blue” is bold and blue or blue and bold. Furthermore, how to put the cursor in a position where it is possible to type bold but not blue text?”

If you can position the cursor where it is possible to type bold but not blue, how would you position the cursor to type blue but not bold?

In the current implementation this is of course not possible: there is no cursor position corresponding to this editing operation, short of creating a new inner environment. This is a tradeoff of structured documents: for example, it prevents overlapping environments, where you start environment A, start B, then end A and then end B. This is another limitation, and quite a serious one, since you cannot, for example, create an environment which highlights text that spans several other environments (but does not contain them directly). So the approach followed by TeXmacs certainly has limitations. However, any choice would. In the case you suggest (bold/blue vs. blue/bold) one can imagine introducing an editing operation which swaps the two innermost environments, but only under certain conditions. Would it be useful? Maybe, but it adds complexity to the user interface. Is it worth it? I do not know. What would you propose in the bold/blue case you mention? How would a “simplification algorithm” handle the situation?

I had a student who did not fully grasp the concepts underlying TeXmacs and who, in a paper we were writing, created a series of nested environments black/blue/black/blue, i.e. instead of removing the blue environment (which he did not need anymore) he created an inner black environment just inside it. This is of course an error and could easily be simplified, e.g. when TeXmacs writes the file to disk, or at some other moment when the user asks for “normalisation” of a document. Other situations where users leave garbage also come to mind. On the one hand, it would be better to educate users to use the tool properly; on the other, one could foresee a normalisation operation happening at writing (or reading) time. I am more skeptical about any automation which works outside the user's control.
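A normalisation step for the black/blue/black/blue case could look roughly like the sketch below. It assumes a hypothetical tuple-based tree in which a colour environment is a node tagged with the colour name; the rule applied is only that a colour wrapper whose sole child is another colour wrapper has no visible effect. The names `COLORS` and `collapse_colors` are made up for illustration.

```python
# Hypothetical tree model: a node is a str or a (tag, children) tuple,
# where a colour environment is a node tagged with the colour name.

COLORS = {"black", "blue"}  # assumed set of colour tags


def collapse_colors(node):
    """Drop colour wrappers that are fully overridden by an inner colour."""
    if isinstance(node, str):
        return node
    tag, children = node
    children = [collapse_colors(c) for c in children]
    only_child = children[0] if len(children) == 1 else None
    if (
        tag in COLORS
        and only_child is not None
        and not isinstance(only_child, str)
        and only_child[0] in COLORS
    ):
        return only_child  # the outer colour never reaches any text
    return (tag, children)


nested = ("black", [("blue", [("black", [("blue", ["some text"])])])])
print(collapse_colors(nested))
```

Since this is a pure function on the tree, it could be called at save time or from an explicit "normalise" command, which keeps the decision under the user's control.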

I know that paper, and I have quoted it in other discussions in support of TeXmacs :)
That said, as far as I have experienced, TeXmacs reflects precisely the structure of the document in the visual appearance when you include the behaviour under cursor movement; or, perhaps more precisely, when you see the document with all of the possible cursor positions (but maybe this is wrong and the previous formulation is the correct one).

If you do not move the cursor, and keep it away from the place where a given structural unit is placed, an empty environment can (it depends on the environment) become invisible, for example, and different environments can look the same (e.g. two macros which expand in the same way, because maybe you haven’t yet decided how to distinguish them). By the way in this case the “Next similar” command is helpful to find all of the macros of a given kind. IMHO dual-view editing (maybe available as an option) would give additional help there.

I agree that the simplifications (proposal of Amir) would need to remain under control of the user.

I need some time to go through it, although, at a very quick first read, the sentence “A good WYSIWYG editor is axiomatically inconsistent with a good general-purpose HTML editor” doesn’t thrill me very much (assuming I understand the issue, I do think that TeXmacs frames are at least a partial solution for that).