Which data structure(s) are used in the implementation of editors like Notepad? This data structure should be extensible, and should support various features like editing, deletion, s
We wrote an editor for an old machine (keep in mind that this was a while ago, about 1986, so this is from memory, and the state of the art may have advanced somewhat since then) which we managed to get to scream along, performance-wise, by using fixed-size memory blocks from self-managed pools.
It had two pools, each containing a fixed number of specific-sized blocks (one pool was for line structures, the other for line-segment structures). It was basically a linked list of linked lists.
Memory was pre-allocated (for each region) from a 'malloc()'-like call, and we used 65,535 blocks (0 through 65,534 inclusive; block number 65,535 was considered the null block, an end-of-list indicator).
This allowed for 65,535 lines (384K, or 512K for the padded version) and about 1.6M of file size (taking 2M of allocated space, since 65,535 segments of 32 bytes each hold 26 characters apiece), which was pretty big back then. That was the theoretical file size limit. I don't think we ever approached that in reality, since we never allocated the full set of line-segment structures.
Not having to call malloc() for every little block of memory gave us a huge speed increase, especially as we could optimise our own memory allocation routines for fixed-size blocks (including inlining the calls in the final optimised version).
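A minimal sketch of what such a fixed-size block pool might look like in C. This is not the original code: the names `Pool`, `pool_init`, `pool_alloc` and `pool_free` are made up for illustration, and it assumes the free list is threaded through the first two bytes of each free block, with 65,535 as the null block number.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NULL_BLOCK 0xFFFFu   /* block 65,535: end-of-list marker */

/* Hypothetical pool descriptor; the names are illustrative only. */
typedef struct {
    uint8_t *mem;        /* one region obtained up front */
    size_t   block_size; /* fixed size of every block in this pool */
    uint16_t free_head;  /* first free block, or NULL_BLOCK */
} Pool;

/* Address of block n inside the pool. */
static uint8_t *pool_addr(const Pool *p, uint16_t n) {
    return p->mem + (size_t)n * p->block_size;
}

/* Allocate the region once and chain every block into the free list,
   storing the "next free" block number in the first two bytes.
   count is at most 65,535 by type, so blocks are numbered 0..65,534. */
static int pool_init(Pool *p, size_t block_size, uint16_t count) {
    if (block_size < sizeof(uint16_t) || count == 0) return -1;
    p->mem = malloc((size_t)count * block_size);
    if (p->mem == NULL) return -1;
    p->block_size = block_size;
    for (uint16_t i = 0; i < count; i++) {
        uint16_t next = (uint16_t)(i + 1 < count ? i + 1 : NULL_BLOCK);
        memcpy(pool_addr(p, i), &next, sizeof next);
    }
    p->free_head = 0;
    return 0;
}

/* Take a block from the front of the free list (NULL_BLOCK if empty). */
static uint16_t pool_alloc(Pool *p) {
    uint16_t n = p->free_head;
    if (n != NULL_BLOCK)
        memcpy(&p->free_head, pool_addr(p, n), sizeof p->free_head);
    return n;
}

/* Put a block back at the front of the free list. */
static void pool_free(Pool *p, uint16_t n) {
    memcpy(pool_addr(p, n), &p->free_head, sizeof p->free_head);
    p->free_head = n;
}
```

Both allocation and release are a couple of loads and stores with no searching, which is the kind of thing that inlines well.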
The structures in the two pools were as follows (with each row of the diagram being a single byte):
    Line structure (6/8 bytes)     Line-segment structure (32 bytes)
    +--------+                     +--------+
    |NNNNNNNN|                     |nnnnnnnn|
    |NNNNNNNN|                     |nnnnnnnn|
    |PPPPPPPP|                     |pppppppp|
    |PPPPPPPP|                     |pppppppp|
    |bbbbbbbb|                     |LLLLLLLL|
    |bbbbbbbb|                     |LLLLLLLL|
    |........|                     |xxxxxxxx|
    |........|                     :25 more :
    +--------+                     : x lines:
                                   +--------+
where:
Lower-case letters other than x point to the line segment pool, and upper-case letters point to the line pool.
N was the block number for the next line (null meaning this was the last line in the file).
P was the block number for the previous line (null meaning this was the first line in the file).
b was the block number for the first line segment in that line (null meaning the line was empty).
. was reserved padding (to bump the structure out to 8 bytes).
n was the block number for the next line segment (null meaning this was the last segment in the line).
p was the block number for the previous line segment (null meaning this was the first segment in the line).
L was the block number for the segment's line block.
x was the 26 characters in that line segment.
The reason the line structure was padded was to speed up the conversion of block numbers into actual memory locations (shifting left by 3 bits was much faster than multiplying by 6 on that particular architecture, and the extra memory used was only 128K, minimal compared to the total storage used), although we did provide the slower version for those who cared more about memory.
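For illustration, the two layouts and the block-number-to-offset conversion could be written in C roughly as follows. The type and field names (`Line`, `Segment`, `first_seg` and so on) are my own labels rather than anything from the original editor, and the sketch assumes a compiler that lays out 16-bit fields without extra padding, which the static assertions check.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_CHARS 26

/* Padded line structure: 8 bytes. */
typedef struct {
    uint16_t next;       /* N: next line block */
    uint16_t prev;       /* P: previous line block */
    uint16_t first_seg;  /* b: first segment block of this line */
    uint16_t pad;        /* .: padding up to 8 bytes */
} Line;

/* Line-segment structure: 32 bytes. */
typedef struct {
    uint16_t next;             /* n: next segment block */
    uint16_t prev;             /* p: previous segment block */
    uint16_t line;             /* L: owning line block */
    char     text[SEG_CHARS];  /* x: the 26 characters */
} Segment;

_Static_assert(sizeof(Line) == 8, "padded line structure should be 8 bytes");
_Static_assert(sizeof(Segment) == 32, "segment structure should be 32 bytes");

/* Padded layout: block number to byte offset is a shift by 3. */
static size_t line_offset_padded(uint16_t n) { return (size_t)n << 3; }

/* Unpadded 6-byte layout: needs a multiply instead. */
static size_t line_offset_packed(uint16_t n) { return (size_t)n * 6; }

/* 32-byte segments: a shift by 5. */
static size_t segment_offset(uint16_t n) { return (size_t)n << 5; }
```

On a modern compiler the multiply by 6 is cheap, so the padding trick matters far less than it did on that machine.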
We also had an array of 100 16-bit values which contained the line segment (and line number, so we could quickly go to specific lines) at roughly that percentage (so that array[7] was the line that was roughly 7% into the file). There were also two free pointers, one per pool, to maintain the free lists. Each free list was a very simple one-way list where N or n in the structure indicated the next free block, and free blocks were allocated from, and put back to, the front of these lists.
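The percentage array can be sketched along these lines. Treat it as a guess at the idea rather than the original design: I am assuming the index stores a line block plus its zero-based line number per percentage slot, and the `next` array stands in for the N field of each line block. `PercentIndex`, `index_build` and `index_goto` are hypothetical names.

```c
#include <stdint.h>

#define NULL_BLOCK 0xFFFFu
#define PCT_SLOTS  100

/* Hypothetical index: for each percentage, the line block found there
   and its zero-based line number. */
typedef struct {
    uint16_t block[PCT_SLOTS];
    uint16_t line_no[PCT_SLOTS];
} PercentIndex;

/* Rebuild the index by walking the line list once from head. */
static void index_build(PercentIndex *ix, const uint16_t *next,
                        uint16_t head, uint16_t total_lines) {
    uint16_t blk = head, ln = 0;
    for (int pct = 0; pct < PCT_SLOTS; pct++) {
        uint16_t target = (uint16_t)((uint32_t)total_lines * pct / PCT_SLOTS);
        while (ln < target && next[blk] != NULL_BLOCK) {
            blk = next[blk];
            ln++;
        }
        ix->block[pct] = blk;
        ix->line_no[pct] = ln;
    }
}

/* Find the block for line want (zero-based): jump to the nearest index
   entry at or before it, then follow at most ~1% of the next links. */
static uint16_t index_goto(const PercentIndex *ix, const uint16_t *next,
                           uint16_t total_lines, uint16_t want) {
    if (want >= total_lines) return NULL_BLOCK;
    int pct = (int)((uint32_t)want * PCT_SLOTS / total_lines);
    while (ix->line_no[pct] > want) pct--;  /* defensive; slot 0 is line 0 */
    uint16_t blk = ix->block[pct];
    for (uint16_t ln = ix->line_no[pct]; ln < want; ln++)
        blk = next[blk];
    return blk;
}
```

A real editor would have to keep the index roughly up to date as lines are inserted and deleted; this sketch simply rebuilds it.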
There was no need to keep a count of the characters in each line segment since 0-bytes were not valid in files. Each line segment was allowed to have 0-bytes at the end that were totally ignored. Lines were compressed (i.e., line segments were combined) whenever they were modified. This kept block usage low (without infrequent and lengthy garbage collection) and also greatly sped up search-and-replace operations.
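As a rough sketch of how the 0-byte padding and the compression step might work, the code below reads a line by walking its segments and then packs the characters forward so that only the last segment carries padding. The function names are hypothetical, segments are held in a plain array indexed by block number, and the compaction hands the now-unused tail of the chain back to the caller (to be returned to the free list) rather than freeing it itself.

```c
#include <stdint.h>
#include <string.h>

#define NULL_BLOCK 0xFFFFu
#define SEG_CHARS  26

typedef struct {
    uint16_t next, prev, line;
    char     text[SEG_CHARS];
} Segment;

/* Characters in use in one segment: everything before the first
   0-byte, or all 26 if there is none. */
static size_t seg_len(const Segment *s) {
    const char *z = memchr(s->text, 0, SEG_CHARS);
    return z ? (size_t)(z - s->text) : SEG_CHARS;
}

/* Copy a whole line into out (capacity cap, NUL-terminated). */
static size_t line_text(const Segment *pool, uint16_t first,
                        char *out, size_t cap) {
    size_t len = 0;
    if (cap == 0) return 0;
    for (uint16_t s = first; s != NULL_BLOCK; s = pool[s].next) {
        size_t n = seg_len(&pool[s]);
        if (n > cap - 1 - len) n = cap - 1 - len;  /* truncate if needed */
        memcpy(out + len, pool[s].text, n);
        len += n;
    }
    out[len] = '\0';
    return len;
}

/* Pack characters toward the front of the chain. Returns the head of
   the leftover segments (or NULL_BLOCK) for the caller to free. */
static uint16_t line_compact(Segment *pool, uint16_t first) {
    if (first == NULL_BLOCK) return NULL_BLOCK;
    uint16_t dst = first;
    size_t dpos = 0;
    for (uint16_t src = first; src != NULL_BLOCK; src = pool[src].next) {
        size_t n = seg_len(&pool[src]);
        for (size_t i = 0; i < n; i++) {
            if (dpos == SEG_CHARS) { dst = pool[dst].next; dpos = 0; }
            pool[dst].text[dpos++] = pool[src].text[i];  /* never overtakes src */
        }
    }
    memset(pool[dst].text + dpos, 0, SEG_CHARS - dpos);  /* pad the tail */
    uint16_t spare = pool[dst].next;
    pool[dst].next = NULL_BLOCK;
    if (spare != NULL_BLOCK) pool[spare].prev = NULL_BLOCK;
    return spare;
}
```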
The use of these structures allowed very fast editing, insertion, deletion, searching and navigation around the text, which is where you're likely to get most of your performance problems in a simple text editor.
The use of selections (we didn't implement this, as it was a text-mode editor that used vi-like commands such as 3d to delete 3 lines or 6x to delete 6 characters) could be implemented by having a {line#/block, char-pos} tuple to mark positions in the text, and using two of those tuples for a selection range.
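Since we never built this, the following is only a sketch of how such position tuples might look. It uses a line number rather than a block number in the tuple, because block numbers say nothing about ordering, and all of the names (`TextPos`, `Selection`, `sel_bounds`, `sel_contains`) are invented for the example.

```c
#include <stdint.h>

/* Hypothetical position marker: a line number plus a character
   offset within that line. */
typedef struct {
    uint16_t line;
    uint16_t col;
} TextPos;

/* A selection is just two positions; they may be in either order. */
typedef struct {
    TextPos anchor;  /* where the selection started */
    TextPos cursor;  /* where it currently ends */
} Selection;

/* Order two positions: negative, zero or positive like strcmp. */
static int pos_cmp(TextPos a, TextPos b) {
    if (a.line != b.line) return a.line < b.line ? -1 : 1;
    if (a.col != b.col)   return a.col < b.col ? -1 : 1;
    return 0;
}

/* Normalise so that *start <= *end regardless of selection direction. */
static void sel_bounds(const Selection *s, TextPos *start, TextPos *end) {
    if (pos_cmp(s->anchor, s->cursor) <= 0) {
        *start = s->anchor;
        *end = s->cursor;
    } else {
        *start = s->cursor;
        *end = s->anchor;
    }
}

/* True if p lies inside the half-open range [start, end). */
static int sel_contains(const Selection *s, TextPos p) {
    TextPos start, end;
    sel_bounds(s, &start, &end);
    return pos_cmp(start, p) <= 0 && pos_cmp(p, end) < 0;
}
```

Storing a line number means markers after an inserted or deleted line need adjusting; storing a block number avoids that but makes ordering two markers more work.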