Question
I have a .txt file in which each line consists of one word followed by a date.
How can Notepad++ recognize the same word on different lines and delete the duplicate lines?
Answer 1:
Assuming the dates can differ between repeated occurrences of the same word, and that you want to keep the occurrence that appears first in the file, this should work (make sure your file ends with a newline first):
- Open the "Replace" dialog (press Ctrl+F and switch to the Replace tab).
- Under "Search Mode" at the bottom, select "Regular expression" (make sure ". matches newline" is not checked).
- In the "Find what:" field type
(\s*\w+ )(.*\r\n)((.*\r\n)*)\1.*\r\n
- In the "Replace with:" field type
\1\2\3
- Click "Replace" repeatedly until there are no more matches ("Replace All" does not seem to work for this; there may be a better regex for which it does, but I have not found one).
I've tested this on the file:
testing330 05:09-24/08
whatever 10:55-25/08
testing 15:57-26/08
testing667 19:22-30/08
linux 00:29-31/08
testing330 00:29-31/08
windows 12:25-31/08
And the result was:
testing330 05:09-24/08
whatever 10:55-25/08
testing 15:57-26/08
testing667 19:22-30/08
linux 00:29-31/08
windows 12:25-31/08
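To see what the regex does outside Notepad++, here is a minimal Python sketch (an illustration, not part of the original answer) that applies the same pattern with Python's `re` module. The function name `remove_duplicate_words` is hypothetical. Clicking "Replace" repeatedly is emulated by replacing one match at a time and restarting the search from the top, and the sample assumes Windows (`\r\n`) line endings, which the pattern requires.

```python
import re

# The regex from the answer, unchanged. Group 1 is optional leading whitespace,
# a word and the space after it; group 2 is the rest of that line; group 3 is
# every line in between. Because group 3 ends at a line break, \1 must match at
# the start of a later line, i.e. a later line beginning with the same word.
DUPLICATE = re.compile(r"(\s*\w+ )(.*\r\n)((.*\r\n)*)\1.*\r\n")


def remove_duplicate_words(text: str) -> str:
    """Emulate clicking "Replace" until nothing matches (hypothetical helper).

    Each pass replaces only the first match (count=1) and then searches again
    from the top, so the first occurrence of a word is kept and the later
    duplicate line is dropped.
    """
    while True:
        text, replaced = DUPLICATE.subn(r"\1\2\3", text, count=1)
        if replaced == 0:
            return text


if __name__ == "__main__":
    lines = [
        "testing330 05:09-24/08",
        "whatever 10:55-25/08",
        "testing 15:57-26/08",
        "testing667 19:22-30/08",
        "linux 00:29-31/08",
        "testing330 00:29-31/08",
        "windows 12:25-31/08",
    ]
    # Windows line endings, with the trailing newline the answer asks for.
    sample = "".join(line + "\r\n" for line in lines)
    print(remove_duplicate_words(sample).replace("\r\n", "\n"), end="")
```

In this emulation, a single `re.sub` pass (the closest analogue of "Replace All") fails on input such as `a 1`, `b 1`, `a 2`, `b 2`: the first match consumes the line `b 1`, the search resumes after it, and `b 2` survives. That is a plausible reason "Replace All" falls short in Notepad++ as well.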
Answer 2:
Not a direct answer to your question, but I found this question because of its title: I was just looking to delete duplicate lines, and I found an easy way to do that here:
- Select all the text (Ctrl+A). Click TextFX → TextFX Tools → check "+Sort outputs only UNIQUE (at column) lines" (if not already checked).
- Click TextFX → TextFX Tools → Sort lines case insensitive (at column).
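The effect of the two TextFX steps can be approximated in Python as below. This is a sketch under assumptions, not TextFX's actual behavior: it treats only identical lines as duplicates, sorts case-insensitively, and ignores the "(at column)" option by always comparing from the first column. The function name `sort_unique_lines` is hypothetical.

```python
def sort_unique_lines(text: str) -> str:
    """Sort lines case-insensitively, keeping one copy of each exact line.

    Assumption: lines count as duplicates only when they are identical;
    whether TextFX also merges lines that differ only in case is not modelled.
    """
    lines = text.splitlines()
    # set() drops exact duplicates; the key sorts case-insensitively and falls
    # back to the raw line so "Pear" and "pear" come out in a stable order.
    unique = sorted(set(lines), key=lambda line: (line.casefold(), line))
    return "\n".join(unique) + ("\n" if unique else "")


if __name__ == "__main__":
    print(sort_unique_lines("pear\napple\nPear\napple\n"), end="")
```

Because only whole lines are compared, both `testing330` lines from the question's sample would survive, since their dates differ; this is why the approach does not directly answer the question.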
Answer 3:
You can use EditPlus on Windows or TextWrangler on Mac to sort lines and remove duplicates easily.
As of Notepad++ 6.5.2 (free), you can sort lines, or you can install the "TextFX Characters" plugin using the Plugin Manager.
TextFX includes numerous features to transform selected text, including:
- Interactive Brace Matching
- Quote handling
- Character case alternation
- Text rewrap
- Column Lineup
- Fill Text Down
- Insert counter text down
- Text to code conversion
- Numeric Conversion
- URI & HTML encoding
- HTML to text conversion
- Submit text to W3C
- Text sorting
- Ascii Chart
- Leading whitespace repair
- Autoclose HTML & braces
Homepage: http://textfx.no-ip.com/textfx/
Answer 4:
Personally, these are the steps I follow. Let's assume you have only one column of data, in column A.
- Import the data into Excel.
- Sort the data.
- Insert a formula to check for duplicates: cell B2 would be =IF(A2=A1,"Duplicate",""), filled down to the last row of data.
- Select all of column B.
- Copy.
- Use Paste Special to paste back only the values (so the flags stay fixed when you re-sort).
- Sort the data by column B.
- Delete all the rows marked "Duplicate".
- Copy the data back to Notepad++.
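The spreadsheet procedure boils down to "sort, then compare each row with the row above". A minimal Python sketch of that idea follows; the function name `drop_sorted_duplicates` and its `key` parameter are hypothetical. The default key ignores case because Excel's `=` operator compares text case-insensitively (`EXACT` is the case-sensitive alternative). Passing a key that returns only the first word applies the same idea to the question's word-plus-date lines.

```python
from typing import Callable, List, Optional


def drop_sorted_duplicates(lines: List[str],
                           key: Callable[[str], str] = str.casefold) -> List[str]:
    """Sort, flag rows equal to the row above, and drop the flagged rows.

    The default key ignores case, like Excel's "=" comparison.
    """
    # Step 1: sort. Python's sort is stable, so equal keys keep file order
    # and the first occurrence in the file is the one that survives.
    ordered = sorted(lines, key=key)
    kept: List[str] = []
    previous: Optional[str] = None
    for line in ordered:
        current = key(line)
        # Step 2: the =IF(A2=A1,"Duplicate","") check against the row above;
        # only rows that differ from their predecessor are kept.
        if current != previous:
            kept.append(line)
        previous = current
    return kept


def first_word(line: str) -> str:
    """Hypothetical key: compare only the word before the date, ignoring case."""
    return line.split()[0].casefold()


if __name__ == "__main__":
    data = [
        "testing330 05:09-24/08",
        "whatever 10:55-25/08",
        "testing 15:57-26/08",
        "testing667 19:22-30/08",
        "linux 00:29-31/08",
        "testing330 00:29-31/08",
        "windows 12:25-31/08",
    ]
    for line in drop_sorted_duplicates(data, key=first_word):
        print(line)
```

Unlike the regex approach in Answer 1, the result stays sorted rather than in the original file order.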
I thought there was a plugin that does this, but I can't find it now. Otherwise, this link may help you.
Source: https://stackoverflow.com/questions/18768727/notepad-deleting-lines-containing-duplicate-words