Data structure for text editor

后端 未结 5 1560
误落风尘
误落风尘 2021-01-29 20:41

This is an interview question. What data structure would you use to store the text in a text editor?

相关标签:
5条回答
  • 2021-01-29 20:52

    On good old ZX-Spectrum one (or more, I do not know) text exditor used very simple structure.

    There was one big buffer, which occupied all free RAM. Text was split in two parts at the cursor. Part before the cursor, was placed at the beginning of the buffer, and the rest at the end of the buffer. As text typed, data simply added to the end of first part, and when cursor is moved, text is copied forth and back.

    Buffer layout:

    Hello, World!
            ^------Cursor here
    
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |H|e|l|l|o|,| |W| <free>  |o|r|l|d|!|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                ^         ^        |
    begin           cur1      cur2    end
    

    That's, how some edit operations was made:

    Type a char:    buffer[cur1++] = character
    
    Backspace:      cur1--
    
    Cursor left:    buffer[--cur2] = buffer[--cur1]
    
    Cursor right:   buffer[cur1++] = buffer[cur2++]
    

    Buffer in action:

                 Hello, W..............orld!
    Press right          ^             ^
                 Hello, Wo..............rld!
    Press backspace       ^             ^
                 Hello, W...............rld!
    Press 0              ^              ^
                 Hello, W0..............rld!
                          ^             ^
    
    0 讨论(0)
  • 2021-01-29 20:53

    As @Vovanium already mentioned the basic theory of how gap buffer can be used, I have implemented a version of C/C++.

    Code:

    #include <stdio.h>
    #define SZ 1000
    
    char leftArray[SZ], rightArray[SZ];
    int leftCount, rightCount;
    int cursorPos;
    
    /*
     * Private APIs
     */
    
    void printArray(){
    
        for(register int i = 0; i < leftCount; i++){
            printf("%c", leftArray[i]);
        }
    
        for(register int i = rightCount - 1; i >= 0; i--){
            printf("%c", rightArray[i]);
        }
        printf("\n");
    }
    
    void adjust(int pos){
    
        while(leftCount > pos){
            rightArray[rightCount++] = leftArray[--leftCount];
        }
    
        while(leftCount < pos){
            leftArray[leftCount++] = rightArray[--rightCount];
        }
    }
    
    
    /*
     * Public APIs for Text Editor
     */
    
    void init(){
    
        cursorPos = leftCount = rightCount = 0;
    }
    
    void addWord(char word[], int len){
    
        adjust(cursorPos);
    
        for(register int i = 0; i < len; i++){
            leftArray[leftCount++] = word[i];
        }
        leftArray[leftCount] = 0;
    }
    
    void addBackSpace(){
    
        adjust(cursorPos);
        leftCount--;
    }
    
    void moveCurson(int newPosition){
    
        cursorPos = newPosition;
    }
    
    void subString(int pos, int length, char result[]){
    
            adjust(cursorPos);
    
        int k = 0;
            int l = 0;
        while(k + pos < leftCount && k < length){
            result[k] = leftArray[pos + k];
            k++;
        }
    
        length -= k;
        while( l < length){
            result[k++] = rightArray[rightCount - 1 - l];
            l++;
        }
    }
    
    0 讨论(0)
  • 2021-01-29 20:54

    Rope

    A rope is essentially a binary tree whose leaves are arrays of characters. A node in the tree has a left child and a right child - the left child is the first part of the string, while the right child is the final part of the string. Concatenation of two ropes simply involves the creation of a new tree node with both ropes as children. To ensure logarithmic time indexing and sub-string operations the resulting rope may need to be balanced. Various balancing strategies are possible.

    The main advantages of ropes as compared to storing strings as character arrays is that they enable much faster concatenation than ordinary strings, and don't require a large contiguous memory space to store a large string. The main disadvantages are greater overall space usage and slower indexing, both of which become more severe as the tree structure becomes larger and deeper. However, many practical applications of indexing involve only iteration over the string, which remains fast as long as the leaf nodes are large enough to benefit from cache effects.

    0 讨论(0)
  • 2021-01-29 20:57

    I know it's too late for an answer, but I found The Craft of Text Editing book really useful. It contains description of several buffer models with their pros and cons. Unfortunately, it doesn't mention Ropes data structure.

    0 讨论(0)
  • 2021-01-29 21:06

    You might find this interesting, even if it does not exactly answer your question:

    Most efficient data structure to add styles to text

    I am hoping that the discussion will go to fascinating places :-)

    0 讨论(0)
提交回复
热议问题