Debugging Emacs Issues
When working with Emacs, you can use M-x report-emacs-bug
RET to report any issues you notice.
The Emacs community is very responsive and corrects some of the
issues that are filed. Therefore, reporting issues may
enhance your productivity and also that of others.
The file etc/DEBUG that is contained in the Emacs source explains
how to build Emacs with support for debugging in the sense of
using the GNU Debugger (GDB). This is a low-level tool that can be
quite useful for developers, but it is not needed
to apply the methods I explain in this text.
Instead of wrestling with low-level tools, I explain a few general
techniques that only require Emacs itself to find and
demonstrate issues, and to report them in such a way that
they can be easily corrected.
General advice
Here are a few general tips for detecting and debugging
Emacs issues:
- Be vigilant. Is there any issue in Emacs that
annoys you, even barely noticeably? Do not tolerate
such issues! If you are a tolerant person, this may
be tough for you at first. However, there is no need
to accept wrong programs! By improving a program, you
will improve your own environment and also that of
many others.
- Try searching with brute force. Many issues can be
triggered by simply repeatedly performing one or two
simple steps in an automated way. This is well worth
trying and can expose memory leaks, rare timing issues
and several other problems.
- Use Emacs's built-in features to verify internal
properties. For example, see Ami Fischman's ingenious
approach for tracking down a memory leak. You
can set a timer to continuously perform such tasks
in the background.
- Be diligent. Very often, you will need to slow
down, because you will run into too many issues at once
to keep track of all of them if you are going too fast.
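A minimal sketch of such a timer-based background check could look as follows. The names my-invariant-timer, my-invariant-check, my-invariant-watch and my-invariant-unwatch are hypothetical and chosen only for illustration; run-with-timer, cancel-timer, timerp and display-warning are standard Emacs functions.

```elisp
;; Hypothetical helpers (my-invariant-*): illustration only.
(defvar my-invariant-timer nil
  "Timer object for the running invariant check, or nil.")

(defun my-invariant-check (predicate description)
  "Show a warning with DESCRIPTION when PREDICATE returns nil."
  (unless (funcall predicate)
    (display-warning 'my-invariant description :warning)))

(defun my-invariant-watch (predicate description &optional seconds)
  "Call PREDICATE every SECONDS (default 5) and warn when it fails.
Any check started earlier by this function is cancelled first."
  (when (timerp my-invariant-timer)
    (cancel-timer my-invariant-timer))
  (setq my-invariant-timer
        (run-with-timer 0 (or seconds 5)
                        #'my-invariant-check predicate description)))

(defun my-invariant-unwatch ()
  "Stop the background check started by `my-invariant-watch'."
  (when (timerp my-invariant-timer)
    (cancel-timer my-invariant-timer)
    (setq my-invariant-timer nil)))
```

For example, (my-invariant-watch (lambda () (< (length (buffer-list)) 200)) "More than 200 buffers are open") would warn whenever the number of buffers reaches 200. The threshold is arbitrary; any property you can express as a predicate can be watched in the same way.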
These methods also carry over to other projects. For example, I
found crashes, memory leaks, starvation issues, rare mistakes and
race conditions by applying the same ideas in other contexts.
Case studies
Here is a short account describing how I found a series of issues
in Emacs, starting from 3 initial issues. I am telling
you this to give you a few ideas for locating and reproducing
issues. I would also like to encourage you to report any
issues you find, and to never lose hope even if you find many
additional problems along the way. In fact, you will come to
regard that as the typical case.
In 2007, I was working on a complex document, and I
regularly encountered the following 3 issues:
- Hang: Sometimes, particularly after a network
connection, Emacs would simply hang, and could no
longer be interrupted with C-g.
- Crash: Sometimes, Emacs would simply crash, especially when
using M-<.
- No flyspell: Sometimes, flyspell was
unexpectedly disabled, even though I had enabled
it and relied on it for spellchecking.
Each of these issues was extremely annoying to me. However, I was
so busy with writing the document and related papers that I did
not take the time to look into any of these issues. I briefly
filed the first issue as a GNUS problem, but never heard back
and did not have the time to follow up on it. In retrospect,
I would likely even have saved time if I had taken a
day off to reproduce and file at least one of
these issues.
In March 2008, the document was finished, and I finally had
the time to look into all these issues for real.
I first filed the issue that had annoyed me the most: Emacs
simply hanging.
#84: 23.0.60; Occasional hangs in flyspell-mode and ispell-word
A few months later, I filed another, slightly different case that
seemed at least related:
#425: 23.0.60; Hang in wait_reading_process_output
After this, I finally started to construct systematic
test cases. The idea was to simply repeatedly invoke
something via a program, and see what happens. The design of Emacs
makes such tests very simple and pleasant: Most Emacs features can
be triggered by very simple Elisp programs.
The first test case I constructed repeatedly triggers one of
the situations in which I had noticed that Emacs would sometimes
hang:
spellchecking a word in the current buffer. The idea is
easy to implement in Emacs Lisp,
using ispell-word:
(let ((n 0))
  (with-temp-buffer
    (insert "test")
    (while t
      (setq n (1+ n))
      (when (= (mod n 100) 0)
        (message "n: %s -- %s" n (emacs-uptime)))
      (ispell-word nil t))))
With this test case, I could not reproduce the hang, but I
found an issue in the underlying spellchecking program
with it!
#496: 23.0.60; ispell-word becomes increasingly slower
With Aspell up to and including 0.60.0, the following invocation
uses increasingly more memory:
while true; do echo "-"; done | aspell -a
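A gradual slowdown like this is easier to see when each batch of iterations is timed separately. The following is only a sketch under that assumption: my-time-batches is a hypothetical helper name, and FUNCTION stands for whatever step is being repeated; float-time and dotimes are standard Emacs Lisp.

```elisp
;; Hypothetical helper: illustration only.
(defun my-time-batches (function batches batch-size)
  "Call FUNCTION BATCH-SIZE times per batch, for BATCHES batches.
Return a list with the elapsed seconds of each batch, in order."
  (let ((timings '()))
    (dotimes (_batch batches)
      (let ((start (float-time)))
        (dotimes (_i batch-size)
          (funcall function))
        ;; Record how long this batch took.
        (push (- (float-time) start) timings)))
    (nreverse timings)))

;; Example use (needs a working spellchecker, so it is left commented out):
;; (with-temp-buffer
;;   (insert "test")
;;   (my-time-batches (lambda () (ispell-word nil t)) 10 100))
```

If the numbers in the returned list keep growing from batch to batch, the repeated step is getting slower over time. Unlike the endless loop above, this variant terminates, which makes it easier to compare runs.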
At this point, I began to doubt some of the beliefs I had
hitherto held: If even the spellchecker contains such mistakes,
maybe Emacs is also not as robust as I had thought.
For the time being, the slowdown in the spellchecker prevented me
from running more complex cases around the clock. So I decided to
find the cause of the crash mentioned
above. I remembered that the crash had once happened when
working with SVG files, so I concentrated on SVG-related
workflows. One of the first issues I found was:
#501: 23.0.60; Viewing SVG files: Error when pressing C-v, M-v
Very soon after that, I encountered the crash again, but had not yet
found a reliable test case:
#502: 23.0.60; Occasional crash when viewing SVG files
While working on all this, the hang that was
the most annoying issue also kept reappearing. I filed an
additional issue since it looked different from the case I had
already reported:
#532: 23.0.60; hang, then crash
For the time being, I could not do more about this.
Now, regarding the flyspell issue: Picture
yourself working on a long and complex text,
with flyspell-mode enabled. Then, after several
hours of working on the text, you realize that
flyspell no longer underlines
spelling mistakes because it was silently
disabled. If you rely on the spellchecker and work under the
assumption that it is running, this can cause a lot of
additional work because you have to re-check parts you have
already written. In addition, having to worry about the
spellchecker is a major distraction from your actual work. This is
completely unacceptable, and so I wanted to find the cause of
this problem.
When the spellchecker was silently disabled, I knew (from
running ps on a terminal) that the
underlying aspell process was also no longer
running. Thus, I decided to pinpoint the exact moment
the aspell process stopped running, and make
Emacs alert me if that happened.
A simple way to do this is:
- invoke ps from within Emacs
- write its output into a temporary buffer
- use automated text search to see whether
"aspell" appears in that buffer. If
it doesn't, then aspell has stopped running.
In Emacs Lisp, I wrote this as follows:
(defun aspell-alive-p ()
  (with-temp-buffer
    (let ((p (start-process "ps" (current-buffer) "ps" "-A")))
      (while (eq (process-status p) 'run)
        (accept-process-output p nil nil t))
      (goto-char (point-min))
      (search-forward-regexp
       (format "%s.*aspell" (process-id ispell-process)) nil t))))
This sounds like one of the simplest approaches. Yet, it exposed
an additional underlying issue in Emacs: When I first did it like
this, the aspell process unexpectedly stopped
running just by applying this simple recipe! In fact, I found
out that just invoking the following form destroys
the aspell process that is used
by flyspell-mode:
(with-temp-buffer (start-process "ps" (current-buffer) "ps"))
Thus I filed the following issue, trimmed down to the essence, and
using bc as an example process:
#554: OSX: with-temp-buffer kills unrelated processes
Meanwhile, I continued with the following definition, which checks
twice per second
whether flyspell-post-command-hook is still enabled:
(defun my-flyspell-check ()
  (unless (memq 'flyspell-post-command-hook post-command-hook)
    (when flyspell-mode
      (with-current-buffer (get-buffer-create "flywarn")
        (insert "Flyspell no longer active!\n"))
      (display-buffer "flywarn"))))
(setq flycheck-timer (run-with-timer 0 0.5 'my-flyspell-check))
For safety, I am still, to this day, running Emacs with this
background check constantly enabled! It's simply awesome that you
can write Emacs definitions that check integrity constraints of
Emacs itself.
It was time to look again at possible causes of
the hang. By then, I had already
found a combination of actions that always produced the
hang. It involved three invocations of GNUS, and also a
spellchecker. The recipe is described
in f.el, comprising the following
definitions and instructions:
;; 1) emacs -Q f.el -f eval-buffer
;; 2) M-x gnus RET q y
;;    M-x gnus RET q y
;; 3) M-! killall -9 aspell RET
;; 4) M-x gnus RET q y
(defun reactivate-flyspell ()
  (unless (memq 'flyspell-post-command-hook post-command-hook)
    (flyspell-mode 1)))
(setq my-idle (run-with-idle-timer 0.1 t 'reactivate-flyspell))
Before reporting such a complex recipe, I plastered the
networking code with debugging information and tracked the actual
sequence of low-level events within Emacs. I then wrote a
simple self-contained test case that clearly exhibited the
same issue, using three file descriptors:
#562: 23.0.60; OSX: make-network-process reuses existing file descriptors
Two days later, I applied textual bisection and a global variable
to pinpoint the code that caused the problematic behaviour only on
the second invocation of GNUS or a network connection.
The solution of this problem, which had accompanied me for months,
was to apply the following single-line patch:
diff --git a/src/process.c b/src/process.c
index b0bebeb..b5aebdc 100644
--- a/src/process.c
+++ b/src/process.c
@@ -3366,7 +3374,7 @@ usage: (make-network-process &rest ARGS) */)
hints.ai_protocol = 0;
#ifdef HAVE_RES_INIT
- res_init ();
+ /* res_init (); */
#endif
ret = getaddrinfo (SDATA (host), portstring, &hints, &res);
It turned out that this was correctly diagnosed by
YAMAMOTO Mituharu months before I reported this issue, but
the solution suggested by Chong Yidong had unfortunately not
been applied. It also turned out that upgrading my OS to the
then latest version would have solved the issue as well.
This still left two of my main issues unresolved. I wanted to
track down the cause of the crash next. Having successfully
solved the previous issue, I became increasingly relentless in
the ways I tested Emacs. To look for causes of the crash, I again
applied search by brute force: I knew that the crash had once
happened when viewing an SVG file, and so repeatedly
displaying such a file might trigger it again.
Therefore, I wrote an Emacs Lisp program that
simply displays an SVG file over and over:
(progn
  (find-file "~/emacs/etc/images/splash.svg")
  (while t
    (image-toggle-display)
    (redisplay)))
Again, I could not elicit a crash in this way, but I found yet
another issue in Emacs:
#576: 23.0.60; displaying SVG leaks memory
Then, I also encountered the crash when working with other files,
and I eventually constructed the following test case, which
sufficed to correct the issue:
#580: 23.0.60; OSX: Crash in show-paren-mode
Thus, two of the three primary issues were now fixed.
In the following weeks, I celebrated and recapitulated what I had
found out. In doing so, I remembered an additional issue I
had encountered, and had hurriedly passed over, when
constructing test cases for the hang:
Sometimes, when exiting Emacs, it would ask me to
quit fewer processes than I knew I had started.
Thus, I filed one more issue:
#723: 23.0.60; query-on-exit-flag sometimes unexpectedly nil
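To observe this kind of discrepancy directly, the flag can be inspected for every process Emacs currently knows about. The following is a sketch, not the test case from the report: my-process-exit-flags is a hypothetical name, whereas process-list, process-name and process-query-on-exit-flag are standard Emacs functions.

```elisp
;; Hypothetical helper: illustration only.
(defun my-process-exit-flags ()
  "Return an alist of (NAME . FLAG) for every process Emacs knows about.
FLAG is the value of `process-query-on-exit-flag' for that process."
  (mapcar (lambda (proc)
            (cons (process-name proc)
                  (process-query-on-exit-flag proc)))
          (process-list)))
```

Evaluating (my-process-exit-flags) after starting a few processes shows at a glance which of them Emacs would ask about when exiting. A process you started yourself that is listed with a nil flag, without having cleared the flag, would be worth a closer look.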
It is understandable if you do not take the time to pursue
additional issues that are seemingly unrelated to what you
actually want to accomplish. Still, my recommendation is to
work very diligently, and if necessary, quite slowly and
carefully when debugging programs. In very many cases, you will
find several additional issues in this process. These issues
may also help to improve robustness and even allow
additional methods of stressing the core features you "actually"
care about.
Once more, a good approach is to simply let a program perform
the work for you. In the above case, I eventually
triggered a hard crash of the entire
operating system by repeatedly starting and stopping a
process:
(let ((n 0))
  (while t
    (setq n (1+ n))
    (message "iteration: %s" n)
    (delete-process (start-process "bc" nil "bc"))))
I thus reported yet another issue:
#726: 23.0.60; OSX: Complete OS crash
In this case, too, it turned out that upgrading the OS to
the then latest version would have solved the issue.
At this point, I was already running Emacs with the background
check shown above, and I had already seen the warning being
triggered that flyspell was no longer active. I thus
filed the following issue:
#728: 23.0.60; flyspell checking is sometimes silently disabled
If you have read through the above, you can appreciate how good it
feels to receive the response "You need to try and track down the
origin of this message." when filing this issue.
One month later, I found a reliable test case and submitted a
patch that corrected this issue. Thus, at last, all 3 issues
I initially mentioned were fixed!
Opening words
Ulrich Neumerkel told me two powerful metaphors about programs:
First, in all programs there is a path that is well-trodden
and unlikely to contain mistakes. Once you leave this path, you
will immediately run into mistake after mistake. Of the cases
above, a few arose only because I was using a different
operating system than most other Emacs users had at
that time. Most of the issues arose because I was using the
available functionality in different ways than it had been
used in the past, or because I was the first to notice and
report them.
Second, the functionality of a program is in a way like an
organic muscle: If you stress it in a
systematic way, it tends to get stronger
over time.
With these opening words,
happy M-x report-emacs-bug RETting!