docx4j find and replace

梦如初夏 2021-01-06 17:42

I have a docx document with some placeholders. I need to replace them with other content and save the result as a new docx document. I started with docx4j and found this method:

         


        
4 answers
  •  一向 (original poster)
    2021-01-06 18:09

    I created a library to publish my solution because it's quite a lot of code: https://github.com/phip1611/docx4j-search-and-replace-util

    The workflow is the following:

    First step:

    // (this method was part of your question)
    List<Text> texts = getAllElementFromObject(docxDocument.getMainDocumentPart(), Text.class);
    

    This way we get all actual Text content in the correct order, but without the style markup in between. We can edit the Text-objects (via setValue) and keep the styles.

    Resulting problem: search text/placeholders can be split across multiple Text-instances (because the original document can contain invisible style markup in between), e.g. ${FOOBAR}, ${ + FOOBAR}, or $ + {FOOB + AR}

    Second step:

    Concatenate the values of all Text-objects into one full string (the "complete string"):

    Optional<String> completeStringOpt = texts.stream().map(Text::getValue).reduce(String::concat);
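
To illustrate why the concatenation matters, here is a minimal sketch that uses plain strings as stand-ins for the values of the Text-objects. The fragment values and the class name SplitPlaceholderDemo are made up for this example and are not part of docx4j or the linked library.

```java
import java.util.List;
import java.util.Optional;

class SplitPlaceholderDemo {

    /** Joins the fragment values in document order, as in the second step. */
    static Optional<String> completeString(List<String> fragments) {
        return fragments.stream().reduce(String::concat);
    }

    public static void main(String[] args) {
        // Hypothetical values of three consecutive Text-objects; the
        // placeholder ${FOOBAR} is split by invisible style markup.
        List<String> fragments = List.of("Hello $", "{FOOB", "AR}!");

        // No single fragment contains the whole placeholder ...
        boolean inAnyFragment = fragments.stream().anyMatch(f -> f.contains("${FOOBAR}"));
        // ... but the complete string does.
        String complete = completeString(fragments).orElse("");

        System.out.println(inAnyFragment);                 // prints false
        System.out.println(complete.indexOf("${FOOBAR}")); // prints 6
    }
}
```

Searching each Text-object on its own would miss this placeholder, which is why all later steps work on indices of the complete string.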
    

    Third step:

    Create a class TextMetaItem. Each TextMetaItem knows, for its Text-object, where that object's content begins and ends in the complete string. E.g. if the Text-objects for "foo" and "bar" result in the complete string "foobar", then indices 0-2 belong to the "foo" Text-object and 3-5 to the "bar" Text-object. Build a List<TextMetaItem> from the List<Text>:

    static List<TextMetaItem> buildMetaItemList(List<Text> texts) {
        final int[] index = {0};
        final int[] iteration = {0};
        List<TextMetaItem> list = new ArrayList<>();
        texts.forEach(text -> {
            int length = text.getValue().length();
            list.add(new TextMetaItem(index[0], index[0] + length - 1, text, iteration[0]));
            index[0] += length;
            iteration[0]++;
        });
        return list;
    }
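
The TextMetaItem class itself is not shown in this answer. Judging only from how it is used in the snippets here (the four-argument constructor, getEnd(), getPosition(), getText(), getPositionInsideTextObject()), it could look roughly like the sketch below. The field names and the small Text class are assumptions: Text is only a stand-in for org.docx4j.wml.Text so that the example compiles without docx4j.

```java
class TextMetaItemSketch {

    /** Stand-in for org.docx4j.wml.Text (assumption, for illustration only). */
    static class Text {
        private String value;
        Text(String value) { this.value = value; }
        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
    }

    static class TextMetaItem {
        private final int start;    // first index in the complete string (inclusive)
        private final int end;      // last index in the complete string (inclusive)
        private final Text text;    // the Text-object this item describes
        private final int position; // index of the Text-object in the List<Text>

        TextMetaItem(int start, int end, Text text, int position) {
            this.start = start;
            this.end = end;
            this.text = text;
            this.position = position;
        }

        int getStart() { return start; }
        int getEnd() { return end; }
        Text getText() { return text; }
        int getPosition() { return position; }

        /** Translates an index of the complete string into an index inside this Text-object. */
        int getPositionInsideTextObject(int completeStringIndex) {
            return completeStringIndex - start;
        }
    }

    public static void main(String[] args) {
        // "foo" + "bar" -> "foobar": indices 0-2 belong to "foo", 3-5 to "bar"
        TextMetaItem bar = new TextMetaItem(3, 5, new Text("bar"), 1);
        System.out.println(bar.getPositionInsideTextObject(4)); // prints 1
    }
}
```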
    

    Fourth step:

    Build a Map<Integer, TextMetaItem> where the key is the index of a char in the complete string. This means the map's size equals completeString.length().

    static Map<Integer, TextMetaItem> buildStringIndicesToTextMetaItemMap(List<Text> texts) {
        List<TextMetaItem> metaItemList = buildMetaItemList(texts);
        Map<Integer, TextMetaItem> map = new TreeMap<>();
        if (metaItemList.isEmpty()) {
            return map;
        }
        int currentStringIndicesToTextIndex = 0;
        // + 1 important here! getEnd() is an inclusive index
        int max = metaItemList.get(metaItemList.size() - 1).getEnd() + 1;
        for (int i = 0; i < max; i++) {
            // advance to the item that owns index i; this also skips
            // Text-objects with an empty value, which own no index
            while (metaItemList.get(currentStringIndicesToTextIndex).getEnd() < i) {
                currentStringIndicesToTextIndex++;
            }
            map.put(i, metaItemList.get(currentStringIndicesToTextIndex));
        }
        return map;
    }
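
As a small worked example of the lookup map, the sketch below maps every index of the complete string to the number of the fragment that owns it. It uses plain strings and fragment numbers instead of Text-objects and TextMetaItems; the class and method names are made up for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexLookupDemo {

    /** Maps each index of the concatenated string to the number of its fragment. */
    static Map<Integer, Integer> buildIndexToFragmentMap(List<String> fragments) {
        Map<Integer, Integer> map = new TreeMap<>();
        int index = 0; // running index in the complete string
        for (int fragment = 0; fragment < fragments.size(); fragment++) {
            for (int i = 0; i < fragments.get(fragment).length(); i++) {
                map.put(index++, fragment);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(buildIndexToFragmentMap(List.of("foo", "bar")));
        // prints {0=0, 1=0, 2=0, 3=1, 4=1, 5=1}
    }
}
```

With such a map, a match that starts at index 2 and ends at index 4 of "foobar" is known to begin in fragment 0 and end in fragment 1.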
    

    Interim result:

    Now you have enough metadata to delegate every action you want to perform on the complete string to the corresponding Text-object. To change the content of a Text-object you only need to call setValue(); that is all docx4j needs to edit text, and all style information is preserved.

    Last step: search and replace

    1. Build a method that finds all occurrences of your placeholders. You should create a class like FoundResult(int start, int end) that stores the begin and end indices of a found value (placeholder) in the complete string.

      public static List<FoundResult> findAllOccurrencesInString(String data, String search) {
          List<FoundResult> list = new ArrayList<>();
          String remaining = data;
          int totalIndex = 0;
          while (true) {
              int index = remaining.indexOf(search);
              if (index == -1) {
                  break;
              }
      
              int throwAwayCharCount = index + search.length();
              remaining = remaining.substring(throwAwayCharCount);
      
              list.add(new FoundResult(totalIndex + index, search));
      
              totalIndex += throwAwayCharCount;
          }
          return list;
      } 
      

      Using this, I build a new list of ReplaceCommands. A ReplaceCommand is a simple class that stores a FoundResult and the new value.

    2. Next, you must order this list from the last item to the first (ordered by position in the complete string).

    3. Now you can write a replace-all algorithm, because you know which action needs to be done on which Text-object. We did (2) so that replace operations won't invalidate the indices of other FoundResults.

      3.1. Find the Text-object(s) that need to be changed.
      3.2. Call getValue() on them.
      3.3. Edit the string to the new value.
      3.4. Call setValue() on the Text-objects.
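
FoundResult and ReplaceCommand are not shown in this answer, so the following is only a sketch of how they and the ordering from step 2 might look. It assumes a FoundResult constructor that takes the start index and the matched value (as in the call new FoundResult(totalIndex + index, search) above) and derives an inclusive end index from it; sortLastToFirst is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ReplaceCommandSketch {

    /** Inclusive start/end indices of a match in the complete string. */
    static class FoundResult {
        private final int start;
        private final int end;

        // Assumption: the end index is derived from the matched value.
        FoundResult(int start, String value) {
            this.start = start;
            this.end = start + value.length() - 1;
        }
        int getStart() { return start; }
        int getEnd() { return end; }
    }

    /** Pairs a match with the value that should replace it. */
    static class ReplaceCommand {
        private final FoundResult foundResult;
        private final String newValue;

        ReplaceCommand(FoundResult foundResult, String newValue) {
            this.foundResult = foundResult;
            this.newValue = newValue;
        }
        FoundResult getFoundResult() { return foundResult; }
        String getNewValue() { return newValue; }
    }

    /** Returns a copy ordered from the last match to the first (step 2). */
    static List<ReplaceCommand> sortLastToFirst(List<ReplaceCommand> commands) {
        List<ReplaceCommand> sorted = new ArrayList<>(commands);
        sorted.sort(Comparator
                .comparingInt((ReplaceCommand c) -> c.getFoundResult().getStart())
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<ReplaceCommand> commands = List.of(
                new ReplaceCommand(new FoundResult(0, "${A}"), "first"),
                new ReplaceCommand(new FoundResult(10, "${B}"), "second"));
        for (ReplaceCommand c : sortLastToFirst(commands)) {
            System.out.println(c.getFoundResult().getStart() + " -> " + c.getNewValue());
        }
        // prints "10 -> second" and then "0 -> first"
    }
}
```

Working from the back means that a replacement which changes the length of the text only shifts characters behind matches that have already been handled.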

    This is the code that does all the magic. It executes a single ReplaceCommand.

        /**
         * @param texts All Text-objects
         * @param replaceCommand Command
         * @param map Lookup-Map from index in complete string to TextMetaItem
         */
        public static void executeReplaceCommand(List<Text> texts, ReplaceCommand replaceCommand, Map<Integer, TextMetaItem> map) {
            TextMetaItem tmi1 = map.get(replaceCommand.getFoundResult().getStart());
            TextMetaItem tmi2 = map.get(replaceCommand.getFoundResult().getEnd());
            if (tmi2.getPosition() - tmi1.getPosition() > 0) {
                // it can happen that text objects are in-between
                // we can remove them (set to null)
                int upperBorder = tmi2.getPosition();
                int lowerBorder = tmi1.getPosition() + 1;
                for (int i = lowerBorder; i < upperBorder; i++) {
                    texts.get(i).setValue(null);
                }
            }
    
           if (tmi1.getPosition() == tmi2.getPosition()) {
                // do replacement inside a single Text-object
    
                String t1 = tmi1.getText().getValue();
                int beginIndex = tmi1.getPositionInsideTextObject(replaceCommand.getFoundResult().getStart());
                int endIndex = tmi2.getPositionInsideTextObject(replaceCommand.getFoundResult().getEnd());
    
                String keepBefore = t1.substring(0, beginIndex);
                String keepAfter = t1.substring(endIndex + 1);
    
                tmi1.getText().setValue(keepBefore + replaceCommand.getNewValue() + keepAfter);
            } else {
                // do replacement across two Text-objects
    
                // check where to start and replace 
                // the Text-objects value inside both Text-objects
                String t1 = tmi1.getText().getValue();
                String t2 = tmi2.getText().getValue();
    
                int beginIndex = tmi1.getPositionInsideTextObject(replaceCommand.getFoundResult().getStart());
                int endIndex = tmi2.getPositionInsideTextObject(replaceCommand.getFoundResult().getEnd());
    
                t1 = t1.substring(0, beginIndex);
                t1 = t1.concat(replaceCommand.getNewValue());
                t2 = t2.substring(endIndex + 1);
    
                tmi1.getText().setValue(t1);
                tmi2.getText().setValue(t2);
            }
        }
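
To make the effect of such a replacement easier to follow, here is a simplified, self-contained variant of the same idea that works on plain strings instead of Text-objects. It is not the code from the library: replaceRange is a hypothetical helper, and fragments that lie completely inside the match are set to an empty string rather than null.

```java
import java.util.ArrayList;
import java.util.List;

class CrossFragmentReplaceDemo {

    /**
     * Replaces the inclusive range [start, end] of the concatenated fragments
     * with newValue and changes only the fragments that the range overlaps.
     */
    static List<String> replaceRange(List<String> fragments, int start, int end, String newValue) {
        List<String> result = new ArrayList<>(fragments);
        int offset = 0;           // index of the current fragment's first char
        boolean inserted = false; // true once newValue has been written
        for (int i = 0; i < fragments.size(); i++) {
            String f = fragments.get(i);
            int fStart = offset;
            int fEnd = offset + f.length() - 1;
            offset += f.length();
            if (f.isEmpty() || fEnd < start || fStart > end) {
                continue; // fragment lies completely outside the match
            }
            // Parts of this fragment that survive the replacement.
            String keepBefore = fStart < start ? f.substring(0, start - fStart) : "";
            String keepAfter = fEnd > end ? f.substring(end - fStart + 1) : "";
            // The new value goes into the first overlapping fragment only.
            String middle = inserted ? "" : newValue;
            inserted = true;
            result.set(i, keepBefore + middle + keepAfter);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fragments = List.of("Hello $", "{FOOB", "AR}!");
        // "${FOOBAR}" occupies indices 6..14 of "Hello ${FOOBAR}!"
        System.out.println(replaceRange(fragments, 6, 14, "World"));
        // prints [Hello World, , !]
    }
}
```

The new value ends up in the first fragment, so it inherits the style of the run in which the placeholder started.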
    
