Wednesday, May 20, 2009

Gedit API Text Coordinates

I need to be able to convert between positions expressed as (line, column) pairs and positions expressed as an offset in characters from the beginning of the file. I also need to be able to get and set the cursor position, get and set the current selection, and get the bounds of the document (its first and last positions).

The content of this post (and its "sister posts" about the Gedit API) is in an "exploratory notes" format. It doesn't provide a full API description; it is just meant to help me quickly rediscover how to perform certain tasks. I expected to find this information on some blog somewhere, but that doesn't seem to be the case. I'm making these notes public because I suspect someone may find them useful at some point. If you're that someone, don't forget to use dir(object) in the Python console to explore further, and to browse the C APIs for Gtk/Gdk and Gedit if you need to dig deeper. I used to have some experience with Gtk, but that was a long time ago and I used it from C, not Python, so I'm something of a newbie to Gtk/Gedit programming. I never used the Gnome APIs ;-)

Gedit uses the concepts of iterators and ranges to define positions in text and regions of text. An iterator can be used to inspect the text it points to, it can be moved forward and backward, and it can be queried to see whether it sits at the end of a line or word (this last feature is not that useful to me, as I'll have my own definitions for word/paragraph/block).

How do we get an iterator? Easy:

>>> v = window.get_active_view()
>>> b = v.get_buffer()
>>> b.insert_at_cursor("First line.\nSecond line.\n")
>>> start, end = b.get_bounds()
>>> end.get_line()
2
>>> end.get_line_offset()
0
>>> end.get_offset()
25


So, get_bounds() returns a pair of iterators: one pointing to the beginning of the document and one pointing to its end. (The first two lines are pretty self-explanatory: a "buffer" is a "document", the model displayed in the view.)

"end" has a more interesting position ("start" is on offset 0, line 0, column 0, obviously). get_line() returns the line (counting from 0 as the first line), get_line_offset() returns the column (also counting from 0 as the first column) and get_offset() returns the number of characters since the beginning of the file (I suspect this will be the information I will use, as it is easily converted to line/column formats).

Next: look at the text pointed to by an iterator, then move the iterator.

>>> start.get_chars_in_line()
12
>>> start.get_char()
u'F'
>>> start.forward_char()
True
>>> start.get_char()
u'i'
>>> start.forward_to_line_end()
True
>>> start.get_char()
u'\n'
>>> start.get_offset()
11
>>> start.get_line()
0
>>> start.get_line_offset()
11
>>> start.forward_char()
True
>>> start.get_offset()
12
>>> start.get_line()
1
>>> start.get_line_offset()
0


So I've looked at the char pointed to by the iterator, moved the iterator and played with its position to make sure my assumptions about get_line(), get_line_offset() and get_offset() are correct.
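
The relationship between the three getters can be modelled in plain Python. This is only a sketch of the arithmetic, not the Gtk implementation: offset_to_line_col is a hypothetical helper, and it assumes "\n" is the only line separator, which may not hold for every document.

```python
def offset_to_line_col(text, offset):
    """Map a character offset to a 0-based (line, column) pair.

    Plain-Python model of get_line()/get_line_offset(); assumes that
    "\n" is the only line separator.
    """
    if not 0 <= offset <= len(text):
        raise ValueError("offset outside the text")
    before = text[:offset]
    line = before.count("\n")                # newlines passed so far
    col = offset - (before.rfind("\n") + 1)  # distance from the line start
    return line, col


text = "First line.\nSecond line.\n"
print(offset_to_line_col(text, 11))  # the newline ending the first line
print(offset_to_line_col(text, 12))  # first character of the second line
```

The two printed pairs should match what the console session reported for offsets 11 and 12.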

It's a bit weird to use an iterator named "start" to do all this, but since iterators are mutable objects, it worked. BTW, we didn't affect the beginning of the file in any way. "start" just happened to begin its life by pointing at the beginning of the file, that's it.
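
Because iterators are mutable, a helper that moves one will also move it for the caller. A possible way around this is to work on a copy. The sketch below assumes the iterator offers copy() and ends_line(), which are not used anywhere in these notes (check with dir() before relying on them); offset_of_line_end is a hypothetical name.

```python
def offset_of_line_end(it):
    """Return the offset of the end of the line `it` is on, without moving `it`.

    Assumes a gtk.TextIter-like object with copy(), ends_line(),
    forward_to_line_end() and get_offset().
    """
    probe = it.copy()  # move the copy, leave the caller's iterator alone
    # forward_to_line_end() on an iterator that is already at a line end
    # would jump to the end of the next line, so guard against that case.
    if not probe.ends_line():
        probe.forward_to_line_end()
    return probe.get_offset()
```

In the console, offset_of_line_end(start) would then leave "start" where it was.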

How do we set the position of an iterator if we know the "offset", for instance?

>>> start.get_offset()
12
>>> end.set_offset(start.get_offset())
>>> end.get_offset()
12
>>> end.get_char()
u'S'
>>> start.get_char()
u'S'
>>> end.get_line()
1
>>> end.get_line_offset()
0


So end and start point to the same position, and I can use an iterator to convert between the offset-from-BOF and (line, column) representations of a position. I can also set the line and column via set_line() and set_line_offset() then get the offset-from-BOF using get_offset(), so the conversion works both ways.
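
The two conversions could be wrapped in a pair of small helpers. This is a minimal sketch using only the calls explored above; offset_to_position and position_to_offset are hypothetical names, "buf" is assumed to behave like the buffer "b" from the console, and no range checking is done on the arguments.

```python
def offset_to_position(buf, offset):
    """Convert an offset from the beginning of the file to (line, column)."""
    it, _end = buf.get_bounds()  # any iterator will do; it is repositioned below
    it.set_offset(offset)
    return it.get_line(), it.get_line_offset()


def position_to_offset(buf, line, column):
    """Convert a (line, column) pair to an offset from the beginning of the file."""
    it, _end = buf.get_bounds()
    it.set_line(line)            # jump to the start of the requested line
    it.set_line_offset(column)   # then move to the column within that line
    return it.get_offset()
```

With the sample text, offset_to_position(b, 12) and position_to_offset(b, 1, 0) should be inverses of each other.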

How do I get the iterator at the cursor? First, use the mouse to place the cursor at the end of the word "Second" on the second line of text.

>>> b.get_insert()
<gtk.TextMark object at 0x...>
>>> b.get_iter_at_mark(b.get_insert())
<GtkTextIter at 0x...>
>>> at_cursor = b.get_iter_at_mark(b.get_insert())
>>> at_cursor.get_line()
1
>>> at_cursor.get_line_offset()
6
>>> at_cursor.get_char()
u' '


I don't know what a mark is yet, and I don't know whether these iterators and marks need to be released somehow to avoid memory leaks. However, I'm not too worried about that now. If it becomes a problem, I'll fix it.

The "at_cursor" iterator seems to be positioned properly, though.
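
The lookup above could be folded into one helper that reports the cursor in all three coordinate forms. A minimal sketch, assuming "buf" behaves like the buffer "b" from the console; cursor_position is a hypothetical name.

```python
def cursor_position(buf):
    """Return the cursor position as (line, column, offset), all counted from 0."""
    # get_insert() gives the mark that tracks the cursor;
    # get_iter_at_mark() turns that mark into an iterator.
    at_cursor = buf.get_iter_at_mark(buf.get_insert())
    return (at_cursor.get_line(),
            at_cursor.get_line_offset(),
            at_cursor.get_offset())
```

With the cursor placed after "Second", cursor_position(b) should report line 1, column 6.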

How do I position the cursor using an iterator (the reverse of getting the position of the cursor as an iterator)? I'll move the iterator over the space between "Second" and "line", place the cursor there, then get the position of the cursor again and check that it moved.


>>> at_cursor.forward_char()
True
>>> b.place_cursor(at_cursor)
>>> at_cursor_new = b.get_iter_at_mark(b.get_insert())
>>> at_cursor_new.get_offset()
19
>>> at_cursor.get_offset()
19


So the relevant call is "b.place_cursor()".
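
Combined with set_offset(), this gives a way to move the cursor to a known offset. A minimal sketch under the same assumptions as the console session; move_cursor_to_offset is a hypothetical name, and the offset is not validated.

```python
def move_cursor_to_offset(buf, offset):
    """Place the cursor at `offset` characters from the beginning of the file."""
    it, _end = buf.get_bounds()  # borrow an iterator, then reposition it
    it.set_offset(offset)
    buf.place_cursor(it)
```

For example, move_cursor_to_offset(b, 19) should put the cursor just before "line" on the second line.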

What if I want to find out the coordinates of the current selection? Make sure no text is selected.

>>> b.get_selection_bounds()
()


Now select "ond" in "Second" on the second line.

>>> b.get_selection_bounds()
(<GtkTextIter at 0x...>, <GtkTextIter at 0x...>)
>>> sel0, sel1 = b.get_selection_bounds()
>>> sel0.get_offset()
15
>>> sel1.get_offset()
18
>>> sel1.get_char()
u' '


So "get_selection_bounds()" returns two iterators, one for the beginning and the other for the end of the selection. "sel1" points to the first character after the selection.
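
Since the call returns an empty tuple when nothing is selected, unpacking it directly would fail in that case. A small wrapper can handle both cases; this is a sketch, with selection_offsets as a hypothetical name and "buf" assumed to behave like "b" above.

```python
def selection_offsets(buf):
    """Return (start_offset, end_offset) of the selection, or None if there is none."""
    bounds = buf.get_selection_bounds()
    if not bounds:  # empty tuple: nothing is selected
        return None
    sel_start, sel_end = bounds
    # The end offset is exclusive: it is the first character after the selection.
    return sel_start.get_offset(), sel_end.get_offset()
```

With "ond" selected, selection_offsets(b) should return the pair 15 and 18 seen in the session.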

How do I set the selection? Deselect the text.

>>> b.select_range(sel0, sel1)


The text is selected again.
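
The same call can select a region given as two offsets. A minimal sketch built from calls used earlier in these notes; select_offsets is a hypothetical name and the offsets are not validated.

```python
def select_offsets(buf, start_offset, end_offset):
    """Select the text from start_offset up to, but not including, end_offset."""
    first, last = buf.get_bounds()  # two separate iterators to reposition
    first.set_offset(start_offset)
    last.set_offset(end_offset)
    buf.select_range(first, last)
```

As far as I can tell from the C API, the first argument of select_range() is where the cursor ends up and the second is the other end of the selection, so swapping the two arguments should put the cursor at the other side.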

How do I get the text between two iterators (using a loop and get_char() seems wasteful, especially since Python strings are immutable)?

>>> b.get_text(sel0, sel1)
'ond'
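
Putting get_selection_bounds() and get_text() together gives the selected text in one step. A sketch only: selected_text is a hypothetical name, and it assumes get_text() accepts just the two iterators as in the session above (other versions of the bindings may want an extra argument).

```python
def selected_text(buf):
    """Return the currently selected text, or an empty string if nothing is selected."""
    bounds = buf.get_selection_bounds()
    if not bounds:  # empty tuple: no selection
        return ""
    sel_start, sel_end = bounds
    return buf.get_text(sel_start, sel_end)
```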


That concludes the things I wanted to explore in this post. Again, these are just a newbie's personal notes and I will use them to draft a plugin. There are probably better ways to accomplish the tasks described in this post. I was just happy to find one way to perform them, given the lack of documentation (this perception of the documentation is very subjective, some APIs have far worse documentation, some far better; however, since I use the Python "dir()" function and a lot of trial and error, I dare say "lack of documentation").
