Quickly Convert (.rtf|.doc) Files to Markdown Syntax with PHP

轮回少年 2021-01-29 18:50

I've been manually converting articles into Markdown syntax for a few days now, and it's getting rather tedious. Some of these are 3 or 4 pages, italics and other emphasized t

7 answers
  • 2021-01-29 19:38

    ProgTips has a possible solution with a Word macro (source download):

    A simple macro (source download) for converting the most trivial things automatically. This macro does:

    • Replace bold and italics
    • Replace headings (marked heading 1-6)
    • Replace numbered and bulleted lists

    It's very buggy, I believe it hangs on larger documents, however I'm NOT stating it's a stable release anyway! :-) Experimental use only, recode and reuse it as you like, post a comment if you've found a better solution.

    Source: ProgTips

    Macro source

    Installation

    • open WinWord,
    • press Alt+F11 to open the VBA editor,
    • right click the first project in the project browser
    • choose insert->module
    • paste the code from the file
    • close macro editor
    • go to Tools > Macro > Macros and run the macro named MarkDown

    Source: ProgTips

    Source

    Macro source for safe keeping if ProgTips deletes the post or the site gets wiped out:

    '*** A simple MsWord->Markdown replacement macro by Kriss Rauhvargers, 2006.02.02.
    '*** This tool does NOT implement all the markup specified in MarkDown definition by John Gruber, only
    '*** the most simple things. These are:
    '*** 1) Inserts a paragraph break before every non-list paragraph so MarkDown treats it as a stand-alone paragraph
    '*** 2) Converts tables to text. In fact, tables get lost.
    '*** 3) Prefixes every indented non-list paragraph with ">" (a MarkDown blockquote)
    '*** 4) Replaces all the text in italics to _text_
    '*** 5) Replaces all the text in bold to **text**
    '*** 6) Replaces Heading1-6 to #..#Heading (Heading numbering gets lost)
    '*** 7) Replaces bulleted lists with ^p *  listitem ^p*  listitem2...
    '*** 8) Replaces numbered lists with ^p 1. listitem ^p2.  listitem2...
    '*** Feel free to use and redistribute this code
    Sub MarkDown()
        Dim bReplace As Boolean
        Dim i As Integer
        Dim oPara As Paragraph
        
            
        'remove formatting from the paragraph mark so that we don't get **blablabla^p** but rather **blablabla**^p
        Call RemoveBoldEnters
        
        
        For i = Selection.Document.Tables.Count To 1 Step -1
                Call Selection.Document.Tables(i).ConvertToText
        Next
        
        'simple text indent + extra paragraphs for non-numbered paragraphs
        For i = Selection.Document.Paragraphs.Count To 1 Step -1
            Set oPara = Selection.Document.Paragraphs(i)
            If oPara.Range.ListFormat.ListType = wdListNoNumbering Then
                If oPara.LeftIndent > 0 Then
                    oPara.Range.InsertBefore (">")
                End If
                oPara.Range.InsertBefore (vbCrLf)
            End If
            
            
        Next
        
        'italic -> _italic_
        Selection.HomeKey Unit:=wdStory
        bReplace = ReplaceOneItalic  'first replacement
        While bReplace 'other replacements
            bReplace = ReplaceOneItalic
        Wend
    
        'bold-> **bold**
        Selection.HomeKey Unit:=wdStory
        bReplace = ReplaceOneBold 'first replacement
        While bReplace
            bReplace = ReplaceOneBold 'other replacements
        Wend
        
       
        
        'Heading -> ##heading
        For i = 1 To 6 'heading1 to heading6
            Selection.HomeKey Unit:=wdStory
            bReplace = ReplaceH(i) 'first replacement
            While bReplace
                bReplace = ReplaceH(i) 'other replacements
            Wend
        Next
        
        Call ReplaceLists
        
        
        Selection.HomeKey Unit:=wdStory
    End Sub
    
    
    '***************************************************************
    ' Function to replace bold with **bold**; the caller repeats it until no match remains
    ' Returns true if any occurrence found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '***************************************************************
    Function ReplaceOneBold() As Boolean
        Dim bReturn As Boolean
    
        Selection.Find.ClearFormatting
        With Selection.Find
            .Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Font.Bold = True
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        
        bReturn = False
        While Selection.Find.Execute = True
            bReturn = True
            Selection.Text = "**" & Selection.Text & "**"
            Selection.Font.Bold = False
            Selection.Find.Execute
        Wend
        
        ReplaceOneBold = bReturn
    End Function
    
    '*******************************************************************
    ' Function to replace italic with _italic_; the caller repeats it until no match remains
    ' Returns true if any occurrence found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '********************************************************************
    Function ReplaceOneItalic() As Boolean
        Dim bReturn As Boolean
    
            Selection.Find.ClearFormatting
        
        With Selection.Find
            .Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Font.Italic = True
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        
        bReturn = False
        While Selection.Find.Execute = True
            bReturn = True
            Selection.Text = "_" & Selection.Text & "_"
            Selection.Font.Italic = False
            Selection.Find.Execute
        Wend
        ReplaceOneItalic = bReturn
    End Function
    
    '*********************************************************************
    ' Function to replace headingX with #heading; the caller repeats it until no match remains
    ' Returns true if any occurrence found, false otherwise
    ' Originally recorded by WinWord macro recorder, probably contains
    ' quite a lot of useless code
    '*********************************************************************
    Function ReplaceH(ByVal ipNumber As Integer) As Boolean
        Dim sReplacement As String
        Dim bReturn As Boolean 'was undeclared; fails to compile under Option Explicit
        
        Select Case ipNumber
        Case 1: sReplacement = "#"
        Case 2: sReplacement = "##"
        Case 3: sReplacement = "###"
        Case 4: sReplacement = "####"
        Case 5: sReplacement = "#####"
        Case 6: sReplacement = "######"
        End Select
        
        Selection.Find.ClearFormatting
        Selection.Find.Style = ActiveDocument.Styles("Heading " & ipNumber)
        With Selection.Find
            .Text = ""
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
        End With
        
       
        bReturn = False
        While Selection.Find.Execute = True
            bReturn = True
            Selection.Range.InsertBefore (vbCrLf & sReplacement & " ")
            Selection.Style = ActiveDocument.Styles("Normal")
            Selection.Find.Execute
        Wend
        
        ReplaceH = bReturn
    End Function
    
    
    
    '***************************************************************
    ' A fix-up for paragraph marks that are bold or italic
    '***************************************************************
    Sub RemoveBoldEnters()
        Selection.HomeKey Unit:=wdStory
        Selection.Find.ClearFormatting
        Selection.Find.Font.Italic = True
        Selection.Find.Replacement.ClearFormatting
        Selection.Find.Replacement.Font.Bold = False
        Selection.Find.Replacement.Font.Italic = False
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
        
        Selection.HomeKey Unit:=wdStory
        Selection.Find.ClearFormatting
        Selection.Find.Font.Bold = True
        Selection.Find.Replacement.ClearFormatting
        Selection.Find.Replacement.Font.Bold = False
        Selection.Find.Replacement.Font.Italic = False
        With Selection.Find
            .Text = "^p"
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
        End With
        Selection.Find.Execute Replace:=wdReplaceAll
    End Sub
    
    '***************************************************************
    ' Replaces Word lists with MarkDown list markers: "*" for bulleted
    ' items (indented by list level) and "N." for numbered items,
    ' then removes the Word list formatting
    '***************************************************************
    Sub ReplaceLists()
        Dim i As Integer
        Dim j As Integer
        Dim Para As Paragraph
            
        Selection.HomeKey Unit:=wdStory
        
        'iterate through all the lists in the document
        For i = Selection.Document.Lists.Count To 1 Step -1
            'check each paragraph in the list
            For j = Selection.Document.Lists(i).ListParagraphs.Count To 1 Step -1
                Set Para = Selection.Document.Lists(i).ListParagraphs(j)
                'if it's a bulleted list
                If Para.Range.ListFormat.ListType = wdListBullet Then
                            Para.Range.InsertBefore (ListIndent(Para.Range.ListFormat.ListLevelNumber, "*"))
                'if it's a numbered list
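                'compare ListType against each constant; "x = A Or B Or C" is always true in VBA
                ElseIf Para.Range.ListFormat.ListType = wdListSimpleNumbering Or _
                       Para.Range.ListFormat.ListType = wdListMixedNumbering Or _
                       Para.Range.ListFormat.ListType = wdListListNumOnly Then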
                    Para.Range.InsertBefore (Para.Range.ListFormat.ListValue & ".  ")
                End If
            Next j
            'inserts paragraph marks before and after, removes the list itself
            Selection.Document.Lists(i).Range.InsertParagraphBefore
            Selection.Document.Lists(i).Range.InsertParagraphAfter
            Selection.Document.Lists(i).RemoveNumbers
        Next i
    End Sub
    
    '***********************************************************
    ' Returns the MarkDown indent text
    '***********************************************************
    Function ListIndent(ByVal ipNumber As Integer, ByVal spChar As String) As String
        Dim i  As Integer
        For i = 1 To ipNumber - 1
            ListIndent = ListIndent & "    "
        Next
        ListIndent = ListIndent & spChar & "    "
    End Function
    

    Source: ProgTips
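
For readers who want to see the text mapping without opening Word, here is a minimal Python sketch of the string rules the macro applies. It is an illustration only, not part of the ProgTips macro: the function names are hypothetical, it works on plain strings rather than on a Word document, and it leaves out the extra paragraph breaks the macro inserts around headings and lists.

```python
def heading(level: int, text: str) -> str:
    """Like ReplaceH: prefix a 'Heading N' paragraph with N '#' characters."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    return "#" * level + " " + text


def bold(text: str) -> str:
    """Like ReplaceOneBold: wrap a bold run in double asterisks."""
    return "**" + text + "**"


def italic(text: str) -> str:
    """Like ReplaceOneItalic: wrap an italic run in underscores."""
    return "_" + text + "_"


def list_indent(level: int, marker: str) -> str:
    """Like ListIndent: four spaces per nesting level above the first,
    then the marker followed by four spaces."""
    return "    " * (level - 1) + marker + "    "


if __name__ == "__main__":
    print(heading(2, "Installation"))
    print(list_indent(1, "*") + "first item")
    print(list_indent(2, "*") + "nested item with " + bold("bold") + " and " + italic("italic"))
```

Reading the formatting runs out of a real .doc or .rtf file is the hard part and is deliberately not covered here; the sketch only shows what each piece of formatting should turn into.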
