Friday, August 31, 2007

Pretty printing comments

(Note: there's a new demo implementing what's described below. Sadly, snapdrive.net seems to have a lot of downtime; if the link doesn't work, please try again later.)

Pretty printing is simple, right? Just render the syntax tree as text in a pretty way and that's it.

Hmmm. What about comments? Some pretty printers (earlier versions of Synpl included) just throw comments away, since they don't appear in the syntax tree. This is very wrong, as it loses precious information.

If we pretty print comments, where do we place them? Reformatting may have changed the shape of the program drastically.

The solution I adopted is pretty naive, but it works rather well. This is what I do:
  • when tokenizing the source, collect comments in a separate list; record the content and the starting and ending positions for each comment;
  • allow some syntax tree nodes to carry "before" and "after" lists of comments; good candidates are: statements, item declarations in "DECLARE" sections of blocks, functions, procedures, triggers, packages (both spec and body) and types (also both spec and body);
  • associate each comment with the closest node that accepts comments and doesn't overlap the comment;
  • when pretty printing the node, print the comments at the same level of indentation as the node (for both the "before" and the "after" lists of comments).
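To make the steps above concrete, here is a minimal Python sketch. It is not Synpl's actual code: the Comment and Node classes, the attach_comments and render functions, and the use of character offsets as positions are all assumptions made for illustration. "Closest" is taken to mean the smallest gap between the comment and the node.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    text: str
    start: int  # offset of the first character (hypothetical representation)
    end: int    # offset just past the last character


@dataclass
class Node:
    text: str   # stand-in for the pretty-printed form of the node
    start: int
    end: int
    before: list = field(default_factory=list)
    after: list = field(default_factory=list)


def attach_comments(nodes, comments):
    """Put each comment in the before/after list of the nearest node
    that accepts comments and does not overlap the comment."""
    for c in comments:
        best = None  # (distance, node, side)
        for n in nodes:
            if c.end <= n.start:
                candidate = (n.start - c.end, n, "before")
            elif c.start >= n.end:
                candidate = (c.start - n.end, n, "after")
            else:
                continue  # node overlaps the comment, e.g. an enclosing IF
            if best is None or candidate[0] < best[0]:
                best = candidate
        if best is not None:
            _, node, side = best
            getattr(node, side).append(c)


def render(node, indent):
    """Print the comments at the same indentation level as the node."""
    pad = " " * indent
    lines = [pad + c.text for c in node.before]
    lines.append(pad + node.text)
    lines += [pad + c.text for c in node.after]
    return lines


nodes = [Node("a := 1;", 0, 7), Node("b := 2;", 40, 47)]
comments = [Comment("-- about a", 9, 19), Comment("-- about b", 28, 38)]
attach_comments(nodes, comments)
```

In this toy input the first comment sits right after the first statement and lands in its "after" list, while the second sits just above the second statement and lands in its "before" list.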
This works well, but... what about cases like:

begin
    if 1 < a then
        c := 1;
    else
        null;
        -- uncomment next PL/SQL line to remove warnings about variable c
        -- sometimes being used before initialization

        --c := 2;
    end if;
    null;
end;
There are three comments here. They should all be associated with the null statement after the else. This is what happens for the first two comments. But the third is "closer" to the second null statement, so it is inserted in that statement's "before" list and shows up between end if and null in the pretty-printed output. This is clearly wrong. Also, the blank line between the comments is lost in translation.

The solution? Group consecutive comments together into a larger comment, and preserve the blank lines in between.
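A rough Python sketch of the grouping idea follows. Again, this is an illustration under stated assumptions rather than Synpl's implementation: the function names are made up, comments are plain (start, end) offset pairs, and the regex-based comment finder only handles "--" line comments and ignores string literals. Two comments are treated as consecutive when nothing but whitespace lies between them in the source.

```python
import re


def find_line_comments(source):
    """Toy comment finder: every '--' up to the end of the line.
    (Assumption: no '--' inside string literals.)"""
    return [(m.start(), m.end()) for m in re.finditer(r"--[^\n]*", source)]


def group_comments(source, comments):
    """Merge comments separated only by whitespace into larger spans."""
    groups = []
    for start, end in sorted(comments):
        if groups and source[groups[-1][1]:start].strip() == "":
            # Only whitespace since the previous comment: extend that group.
            groups[-1] = (groups[-1][0], end)
        else:
            groups.append((start, end))
    return groups


def group_lines(source, span):
    """Lines of a grouped comment; blank lines survive as empty strings."""
    start, end = span
    return [line.strip() for line in source[start:end].splitlines()]


source = (
    "null;\n"
    "-- first\n"
    "-- second\n"
    "\n"
    "--c := 2;\n"
    "end if;\n"
    "null; -- tail\n"
)
groups = group_comments(source, find_line_comments(source))
first = group_lines(source, groups[0])
```

Because the group spans from the start of the first comment to the end of the last one, it is attached as a single unit, and the blank line is kept as an empty entry in the group's lines.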

After implementing comment grouping, things work as expected. There are still plenty of things left to chance, such as:
  • what to do with code inside comments?
  • what about broken code inside comments?
  • what about formatting paragraphs in comments so they fit in the required number of columns?
Turns out pretty printing is an art in itself :)
