Discussion:
bug#23501: Non-regex-based syntax highlighting
Nir Friedman
2016-05-10 03:12:47 UTC
I'm considering using Emacs as a platform for C++ development. One thing
that seems to lag behind in Emacs at the moment is that all of the syntax
highlighting for C++ is (as far as I can tell) regex-based. This severely
limits the accuracy and discrimination that the syntax highlighter can
achieve. There are now some packages for Emacs that use clang-based
backends to get actual AST information. Perhaps it would be possible to
write some kind of hooks or template for major modes that would make it
easier for package authors to change how syntax highlighting is performed
in major modes?
Richard Stallman
2016-05-10 15:57:25 UTC
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

We develop GCC as well as Emacs. To adopt a competitor to GCC
as a "solution" would be self-defeating.

A proper solution is to extend GCC so that it does the necessary job.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
Eli Zaretskii
2016-05-10 15:59:22 UTC
Sorry, I don't think I really understand what complaint/issue you are
raising here, or what solution you would like to suggest for it. Could
you perhaps elaborate? A specific example where the current code
doesn't work would be a good starting point.

Thanks.
Nir Friedman
2016-05-10 18:55:41 UTC
For instance, suppose I write some C++ that looks like this:

using MyType = Something::OtherType;

There's no way to determine locally whether Something is a namespace or
itself a type, so a regex-based syntax highlighter cannot consistently
color namespaces and classes differently. To take one example, Eclipse will
perform this determination and will consistently color namespaces and
classes any color you like. It can do this because it parses the code and
uses the AST. It makes many more useful distinctions which cannot be made
locally; for example when calling a function foo from a member function bar
of an object, there is no way to easily tell whether foo is also a member
of the same object as bar, or whether foo is just a free function in the
same namespace. One has privileged access and the other probably doesn't,
so it's a genuinely useful distinction.
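
To make both ambiguities concrete, here is a minimal sketch. Apart from Something, OtherType, MyType, foo and bar, every name in it (case_a, case_b, member_case, free_case, Widget, secret_) is hypothetical and chosen only for illustration. The `using` line and the body of bar are textually identical in each pair, yet they refer to different kinds of entities depending on declarations found elsewhere:

```cpp
#include <string>
#include <type_traits>

// Case A: `Something` is a namespace.
namespace case_a {
namespace Something { using OtherType = int; }
using MyType = Something::OtherType;   // same tokens as in case B
}

// Case B: `Something` is a class with a nested type.
namespace case_b {
struct Something { using OtherType = std::string; };
using MyType = Something::OtherType;   // same tokens as in case A
}

// The two aliases resolve to different types even though the lines match.
static_assert(!std::is_same<case_a::MyType, case_b::MyType>::value,
              "identical text, different meaning");

// foo is a member of the same class as bar: it has privileged access.
namespace member_case {
struct Widget {
    int foo() const { return secret_; }
    int bar() const { return foo(); }   // calls the member
private:
    int secret_ = 42;
};
}

// foo is a free function in the same namespace: no privileged access.
namespace free_case {
int foo() { return 0; }
struct Widget {
    int bar() const { return foo(); }   // same text, calls the free function
};
}
```

A highlighter that only looks at the line containing `Something::OtherType` or `foo()` has no way to tell these cases apart; it needs the declarations, which is what an AST-based tool provides.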

I guess I'm a bit less clear on the solution, because I don't have a good
sense of who the owner of the C++ major mode is, and how the code is
structured. My thinking was that perhaps hooks could be added to make it
easier for plugin writers to modify the syntax coloring of the major mode,
so that they would not need to rewrite the C++ major mode from scratch
just to change the syntax coloring.
Eli Zaretskii
2016-05-10 19:21:35 UTC
Colors are added at display time, so hooks will not help here. Or at
least it isn't immediately clear to me how they could help.

I suggest studying how syntax highlighting works in Emacs, including
the JIT font-lock feature and its relation to the display engine.
Until you have a good understanding of how this stuff works, I don't
think you will be able to come up with a design for hooks which
external tools could use for this purpose.
Nir Friedman
2016-05-10 20:16:03 UTC
My idea for a hook was basically to make it possible to provide a callback
function to the major mode. If this callback function is provided, then
when a new file is loaded or an existing one saved with modifications, the
callback function is called with the full path to the file. The callback
function must return something that basically tells the major mode how to
color everything. A simple way would just be to return a list of the colors
for every single non-whitespace character taken sequentially. A single very
fast pass through this list would then be able to color every character.

Is there a reason why that would not be workable? Also, can you point me to
where exactly (e.g. via a link to the Emacs GitHub mirror) the major modes
are stored?
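
A minimal sketch of the single-pass idea described above, written in C++ purely as a model. Nothing here is an Emacs API: `Tag`, `apply_tags`, and the tag strings are hypothetical, and the sketch assumes an external analyzer has already produced exactly one tag per non-whitespace character, in buffer order.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one highlighting tag per non-whitespace character.
using Tag = std::string;

// Walk the text once and attach the next tag to each non-whitespace
// character. Whitespace positions keep an empty tag.
std::vector<Tag> apply_tags(const std::string& text,
                            const std::vector<Tag>& tags) {
    std::vector<Tag> per_char(text.size());
    std::size_t next = 0;  // index of the next unused tag
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(text[i]))) continue;
        assert(next < tags.size() && "analyzer returned too few tags");
        per_char[i] = tags[next++];
    }
    assert(next == tags.size() && "analyzer returned too many tags");
    return per_char;
}
```

For the text `"a b"` and the tags `{"type", "namespace"}`, the result has three entries: `"type"`, an empty tag for the space, and `"namespace"`. The pass is linear in the buffer size, but the tag list must be recomputed whenever the text changes, since any insertion shifts every later tag.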
Eli Zaretskii
2016-05-11 07:49:34 UTC
> My idea for a hook was basically to make it possible to provide a
> callback function to the Major mode. If this callback function is
> provided, then when a new file is loaded or an existing one saved with
> modifications, the callback function is called with the full path to
> the file.

The syntax highlighting should also change when you modify the buffer,
not only when you save it. How will that work with your proposed hook?

> The callback function must return something that basically tells the
> major mode how to color everything. A simple way would just be to
> return a list of the colors for every single non-whitespace character
> taken sequentially. A single very fast pass through this list would
> then be able to color every character.

The hook cannot return a color, because the colors are defined via
faces. It should return faces instead.

> Is there a reason why that would not be workable?

Maybe it is workable, but you are missing too many details of how
syntax highlighting works in Emacs. As I wrote previously, I encourage
you to study how that works, in order for the proposal to be workable
and practical.

> Also, can you point me to where exactly (e.g. via link to the
> emacs github mirror) the major modes are stored?

It's not the major mode that you need to look at, it's the font-lock
machinery. Major modes just use the font-lock features by setting the
font-lock faces on portions of the buffer. Then at display time, the
visible portions of the buffer are displayed as specified by those
faces. You will see that each major mode simply sets the font-lock
faces, and leaves the rest to the core features.

See font-lock.el and font-core.el for the font-lock features, and
jit-lock.el for the JIT coloring of the visible portions of the
buffer.
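
The division of labor described above can be modeled with a small C++ sketch. This is a toy model under stated assumptions, not how Emacs is implemented: `FaceSpan`, `Theme`, `Painted`, and `render_visible` are hypothetical names, the color strings are made up, and only the face names are borrowed from font-lock. The mode side records face names on ranges of text; the display side resolves faces to colors, and only for the visible window.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// What a mode records: a face name on a half-open range [begin, end).
struct FaceSpan { std::size_t begin, end; std::string face; };

// What the user configures: face name -> color (hypothetical values).
using Theme = std::map<std::string, std::string>;

// What the display produces for the part of the buffer on screen.
struct Painted { std::size_t begin, end; std::string color; };

// Resolve faces to colors at "display time", clipped to the visible window.
std::vector<Painted> render_visible(const std::vector<FaceSpan>& spans,
                                    const Theme& theme,
                                    std::size_t win_begin,
                                    std::size_t win_end) {
    assert(win_begin <= win_end);
    std::vector<Painted> out;
    for (const FaceSpan& s : spans) {
        const std::size_t b = std::max(s.begin, win_begin);
        const std::size_t e = std::min(s.end, win_end);
        if (b >= e) continue;  // span is entirely outside the window
        const auto it = theme.find(s.face);
        out.push_back({b, e,
                       it != theme.end() ? it->second : std::string("default")});
    }
    return out;
}
```

In this model a tool that wants to influence highlighting returns `FaceSpan` values, never colors: the same spans render differently under a different `Theme`, and spans outside the window cost nothing at display time.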
John Mastro
2016-05-11 17:56:32 UTC
Post by Nir Friedman
> Is there a reason why that would not be workable? Also, can you point me to
> where exactly (e.g. via link to the emacs github mirror) the major modes are
> stored?
To find a particular major mode (or other library), you can use
`find-library'. For instance, try `M-x find-library RET cc-mode RET' and
`M-x find-library RET font-lock RET'.
--
john