Friday, August 31, 2007

Pretty printing comments

(Note: there's a new demo implementing what's described below; sadly, snapdrive.net seems to have a lot of downtime, so if the link doesn't work, please try again later.)

Pretty printing is simple, right? Just render the syntax tree as text in a pretty way and that's it.

Hmmm. What about comments? Some pretty printers (earlier versions of Synpl included) just throw away comments, since they don't appear in the syntax tree. This is very wrong, as it loses precious information.

If we pretty print comments, where do we place them? The shape of the program may have changed drastically by reformatting it.

The solution I adopted is pretty naive, but it works rather well. This is what I do:
  • when tokenizing the source, collect comments in a separate list; record the content and the starting and ending positions for each comment;
  • allow some syntax tree nodes to have "after" and "before" comments; a good list of nodes to allow comments is: statements, item declarations in "DECLARE" sections of blocks, functions, procedures, triggers, packages (both spec and body) and types (also both spec and body);
  • associate each comment with the closest node that accepts comments and doesn't overlap the comment;
  • when pretty printing the node, print the comments with the same level of indentation as the node (for both the "before" and "after" lists of comments).
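The association step above can be sketched roughly as follows. This is a minimal illustration in Python, not Synpl's actual Scala code: the names `Comment`, `Node` and `attach` are hypothetical, and measuring "closest" in source lines (rather than character offsets) is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    text: str
    start: int  # first source line of the comment (1-based)
    end: int    # last source line of the comment

@dataclass
class Node:
    name: str
    start: int  # first source line covered by the node
    end: int    # last source line covered by the node
    before: list = field(default_factory=list)
    after: list = field(default_factory=list)

def attach(comments, nodes):
    """Attach each comment to the closest node that does not overlap it.

    Assumption: distance is counted in lines; on a tie, the node listed
    first in `nodes` wins. A node sharing a line with the comment counts
    as overlapping and is skipped.
    """
    for c in comments:
        best = None  # (distance, node, side)
        for n in nodes:
            if c.end < n.start:        # comment lies entirely before the node
                cand = (n.start - c.end, n, "before")
            elif c.start > n.end:      # comment lies entirely after the node
                cand = (c.start - n.end, n, "after")
            else:                      # node overlaps the comment: skip it
                continue
            if best is None or cand[0] < best[0]:
                best = cand
        if best is not None:
            _, node, side = best
            getattr(node, side).append(c)
```

Here `nodes` is meant to hold only the nodes that accept comments; an enclosing statement that contains the comment is skipped by the overlap test.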
This works well, but... what about cases like:

begin
  if 1 < a then
    c := 1;
  else
    null;
    -- uncomment next PL/SQL line to remove warnings about variable c
    -- sometimes being used before initialization

    --c := 2;
  end if;
  null;
end;
There are three comments here. They should all be associated with the null statement after the else. This is what happens for the first two comments. But the third is "closer" to the second null statement, so it is inserted in that statement's "before" list and shows up between end if and null in the pretty print. This is clearly wrong. Also, the blank line between comments is lost in translation.

The solution? Group consecutive comments together into a larger comment, and preserve the blank lines in between.
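A rough sketch of such grouping, in Python rather than Synpl's Scala. The names `Comment` and `group_comments` are hypothetical, and comment positions are assumed to be 1-based line numbers:

```python
from dataclasses import dataclass

@dataclass
class Comment:
    text: str
    start: int  # first source line (1-based)
    end: int    # last source line (1-based)

def group_comments(comments, source_lines):
    """Merge comments that are separated only by blank lines.

    The blank lines between merged comments are kept inside the
    merged text, so the pretty printer can reproduce them.
    """
    groups = []
    for c in sorted(comments, key=lambda c: c.start):
        if groups:
            last = groups[-1]
            # Source lines strictly between the previous group and this comment.
            gap = source_lines[last.end:c.start - 1]
            if all(line.strip() == "" for line in gap):
                blanks = "\n" * len(gap)
                merged = last.text + "\n" + blanks + c.text
                groups[-1] = Comment(merged, last.start, c.end)
                continue
        groups.append(c)
    return groups
```

With the example above, the three comments collapse into a single group that starts right after the first null statement, so a closest-node rule attaches the whole group to that statement.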

After implementing comment grouping, things work as expected. There are still plenty of things left to chance, such as:
  • what to do with code inside comments
  • what about broken code inside comments
  • what about formatting paragraphs in comments so they fit in the required number of columns?
Turns out pretty printing is an art in itself :)

Friday, August 17, 2007

New Synpl demo

Here's the first demo since switching languages to Scala.

What's in there:
  • a parser for many SQL and PL/SQL language elements (blocks, basic variable definitions, most types of statements, most DML stuff, procedures, functions, packages)
  • a basic analyzer (can look at a PL/SQL block and report variables that are used before being initialized, shadowed variables (same names used in inner blocks), and variables that are defined and/or initialized but never used; usage is not detected in all cases, and initialization by INTO clauses is not recognized)
  • a pretty printer (adds missing elements - for instance, if the user forgot a 'THEN', the parser will report an error but recover and the pretty printer will show the source with all such errors corrected)
  • a small GUI to help with testing
What's missing:
  • many SQL and PL/SQL language elements are not identified by the parser (explicit join syntax for SQL, variable initialization at definition, PL/SQL objects and arrays etc.)
  • most of the analyzer features
  • advanced pretty printing (such as: detect total length of a SELECT query, and if it doesn't overflow the current line, print it on a single line)
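The last missing item, fitting a short query on one line, could look roughly like this. It is only a sketch in Python, not part of the demo; `render_select` and its parameters are hypothetical, and the query is assumed to be already split into its clauses:

```python
def render_select(columns, table, where=None, indent=0, width=80):
    """Print a SELECT on one line if it fits in `width`, else one clause per line."""
    clauses = ["SELECT " + ", ".join(columns), "FROM " + table]
    if where:
        clauses.append("WHERE " + where)
    pad = " " * indent
    flat = " ".join(clauses)
    if indent + len(flat) <= width:
        return pad + flat                          # fits: keep it on a single line
    return "\n".join(pad + c for c in clauses)     # overflows: break at each clause
```

For instance, `render_select(["a", "b"], "t", "a = 1")` stays on one line at the default width, while the same call with `width=20` breaks before FROM and WHERE.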
Use 'run.bat' to launch the GUI. Use "File|Open" within the GUI to reach a sample PL/SQL source that demonstrates the error-recovery features of the parser, the current analyzer features and the pretty printer.


The GUI was tested with Java 1.6.

Thursday, August 16, 2007

Synpl in Scala

I've decided to rewrite synpl in Scala. It's much better than C# for this particular task for several reasons:
  • it compiles to Java bytecode, which means I can build a JAR file and then use it to build JDeveloper or SqlDeveloper extensions
  • Scala has pattern matching and "case classes", which make it very easy to build ASTs and to walk them to do static analysis
So far, so good. These blog entries (1) (2) proved very useful as a Scala cheat sheet and saved me a lot of time.

Right now a significant chunk of the PL/SQL grammar is implemented and some static analysis also works (detecting variable use before initialization). There's plenty of work to be done, but it looks like Scala is a great language choice.
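To give an idea of what detecting use before initialization involves, here is a small sketch. It is written in Python, with dataclasses and isinstance checks standing in for Scala case classes and pattern matching, and it is not Synpl's code: `Assign`, `If` and `check` are hypothetical names, and the tiny statement model is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Assign:
    target: str                                 # variable being assigned
    reads: list = field(default_factory=list)   # variables read by the right-hand side

@dataclass
class If:
    cond_reads: list   # variables read by the condition
    then_body: list    # statements of the THEN branch
    else_body: list    # statements of the ELSE branch (may be empty)

def check(stmts, initialized, warnings):
    """Walk statements in order, appending to `warnings` every variable
    read while not definitely initialized. Returns the set of variables
    that are definitely initialized after `stmts`."""
    init = set(initialized)
    for s in stmts:
        if isinstance(s, Assign):
            warnings.extend(v for v in s.reads if v not in init)
            init.add(s.target)
        elif isinstance(s, If):
            warnings.extend(v for v in s.cond_reads if v not in init)
            then_init = check(s.then_body, init, warnings)
            else_init = check(s.else_body, init, warnings)
            # Only variables set on both branches are surely initialized.
            init = then_init & else_init
    return init
```

A variable assigned in only one branch of an IF is reported when it is read afterwards, because the two branch results are intersected.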

Probably tomorrow or the day after I'll put together a new demo.

Wednesday, June 27, 2007

New Ideas

I've been thinking about synpl and applications of a SQL and PL/SQL source code analyzer for some time now.

Here are some of the ideas I had (some of them don't require a parser):
  • build a visual representation of the database; have the triggers/constraints listed next to each table - I understand Visual Studio offers something like this, even for Oracle, I need to look into this;
  • look into the code and find out what indexes should exist (analyze WHERE clauses and INSERTs, try to find a balance between them)
  • analyze where clauses and discover undeclared foreign keys
  • build a measure of the connection between tables (strongly connected vs. weakly connected) and then build sets of closely related tables - could help people trying to understand vast databases with lots of tables by pointing out which tables are most closely related; trying to understand a group of 10 tables that are mostly connected to one another and have a few connections to other tables is much easier than trying to understand a group of 50 tables;
  • build a Django admin-like app with Java hooks (load classes for validation, preprocessing, postprocessing, custom actions, reporting); Application Express does this already? need to look into it;
  • analyze INSERTs and UPDATEs and try to infer if some fields only have a limited number of values (missing restrictions?)
Parsing the SQL queries to discover connections between tables should go beyond the SQL queries found in PL/SQL (i.e. it should also cover SQL queries embedded in other languages), but that source code isn't always available. It's very rarely available, in fact.
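One naive way to build the table-connection measure mentioned in the list above: count how often two tables show up in the same query, then group tables whose count reaches a threshold. This is only a sketch in Python; `connection_strength`, `clusters` and the threshold are hypothetical, and it assumes a parser has already reduced each query to the list of tables it references.

```python
from collections import Counter
from itertools import combinations

def connection_strength(queries):
    """Count how often each pair of tables appears in the same query."""
    weights = Counter()
    for tables in queries:
        for a, b in combinations(sorted(set(tables)), 2):
            weights[(a, b)] += 1
    return weights

def clusters(weights, threshold):
    """Group tables linked by pairs seen together at least `threshold` times."""
    parent = {}

    def find(t):
        # Union-find lookup with path halving.
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), w in weights.items():
        ra, rb = find(a), find(b)
        if w >= threshold and ra != rb:
            parent[ra] = rb          # strongly connected: merge the two groups

    groups = {}
    for t in list(parent):
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())
```

Weakly connected tables end up in groups of their own, which is the kind of split that could help when exploring a large schema.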

Sunday, June 24, 2007

New synpl demo

I've prepared a new demo of synpl.

It's probably broken in many places as I didn't get a chance to test it properly.

What's new:
  • the ability to 'correct' slightly incorrect syntax (such as missing a ';' or an 'if' in 'end if')
  • it can show the corrected version in a separate window; the corrections are highlighted
  • it can load/save samples

Tuesday, June 19, 2007

Plans

I'm thinking of switching Synpl to Java, so that I can write a JDeveloper extension - and later a SqlDeveloper one. The problem is that the JDeveloper extension APIs aren't that well documented. There are even two competing APIs, one that is specific to Oracle and the other which is a JSR (198, to be more precise).

There are illuminating code samples for the Oracle API (which I hope can be used to develop an extension if combined with the API's JavaDoc). On the other hand, JSR 198 seems more restrictive. In one forum discussion, someone was complaining that it's impossible to write a code formatter with the JSR 198 API. I'm not sure that's true, but from what I've seen in the JavaDoc it's certainly a difficult task.

Hmmm. Back to the GIS application and paper ;)

Thursday, June 14, 2007

Why?

What's synpl? A simple PL/SQL parser (for now) and analyzer (in the near future) written in an attempt to provide friendly compile/parse error messages and warnings about problems that the compiler is silent about.

Since I need to show synpl to people and email seems to be a tedious way to do it, I've started this blog to help me communicate with users willing to test synpl and provide feedback.

The first available demo is here. You need Windows and .NET Framework 2.0 to run it (the GUI seems to run fine under Mono 1.2.4 on Debian, but it may break in the future).