Removing non-alphanumeric characters in a batch variable

慢半拍i 2021-01-03 08:07

In batch, how would I remove all non-alphanumeric characters (everything except a-z, A-Z, 0-9 and _) from a variable?

I'm pretty sure I need to use findstr and a regex.

3 Answers
  • 2021-01-03 08:21

    The solution of MC ND works, but it's really slow (it needs ~1 second for the small test sample).

    This is caused by the echo "!_buf!"|findstr ... construct: for each character, the pipe creates two instances of cmd.exe and starts findstr.

    But this can also be solved with pure batch.
    Each character is tested to see whether it occurs in the map variable:

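    @echo off
    setlocal EnableDelayedExpansion

    set "_input=Th""i\s&& is not good _maybe_???"
    set "_output="
    rem Allowed characters. Substitution ignores case, so lowercase letters also match A-Z.
    rem Add _ to the map to keep underscores as the question asks.
    set "map=abcdefghijklmnopqrstuvwxyz 1234567890"

    :loop
    if not defined _input goto endLoop
    rem delims and eol skip * and ~ because they would break the substitution test below.
    for /F "delims=*~ eol=*" %%C in ("!_input:~0,1!") do (
        if "!map:%%C=!" NEQ "!map!" set "_output=!_output!%%C"
    )
    set "_input=!_input:~1!"
    goto loop

    :endLoop
    echo(!_output!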
    
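
    The membership test in the loop relies on string substitution: removing a character from map changes the string only if that character occurs in it. A minimal sketch of just that test, using a few sample characters chosen here purely for illustration:

    ```batch
    @echo off
    setlocal EnableDelayedExpansion
    set "map=abcdefghijklmnopqrstuvwxyz 1234567890"

    rem The sample characters k K 7 # _ are illustrative, not from the answer.
    for %%C in (k K 7 # _) do (
        rem If deleting the character changes map, the character is in map.
        if "!map:%%C=!" NEQ "!map!" (echo %%C is allowed) else echo %%C is rejected
    )
    ```

    Because substitution ignores case, K should be accepted even though map holds only lowercase letters, while # and _ should be rejected because neither is in map.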

    The :loop version above can be sped up by removing the goto loop.
    You then need to calculate the string length first and iterate over each character with a FOR /L loop.
    This solution is ~6 times faster than the method above and ~40 times faster than the solution of MC ND.

    set "_input=Th""i\s&& is not good _maybe_!~*???"
    set "_output="
    set "map=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890"
    rem Store the length of _input in the variable len.
    %$strLen% len _input

    for /L %%n in (0 1 %len%) DO (
        for /F "delims=*~ eol=*" %%C in ("!_input:~%%n,1!") do (
            if "!map:%%C=!" NEQ "!map!" set "_output=!_output!%%C"
        )
    )
    echo(!_output!
    exit /b
    

    The macro $strlen can be defined with

    set LF=^
    
    
    ::Above 2 blank lines are required - do not remove
    @set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"
    :::: StrLen pResult pString
    set $strLen=for /L %%n in (1 1 2) do if %%n==2 (%\n%
            for /F "tokens=1,2 delims=, " %%1 in ("!argv!") do (%\n%
                set "str=A!%%~2!"%\n%
                  set "len=0"%\n%
                  for /l %%A in (12,-1,0) do (%\n%
                    set /a "len|=1<<%%A"%\n%
                    for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"%\n%
                  )%\n%
                  for %%v in (!len!) do endlocal^&if "%%~b" neq "" (set "%%~1=%%v") else echo %%v%\n%
            ) %\n%
    ) ELSE setlocal enableDelayedExpansion ^& set argv=,
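
    The core of the macro is a binary search for the highest index that still holds a character. The sketch below shows the same idea without the macro wrapper; the variable names str and s and the sample text are assumptions made for illustration only.

    ```batch
    @echo off
    setlocal EnableDelayedExpansion
    set "str=Hello, world"

    rem Prefix one character so the highest valid index equals the length of str.
    set "s=A!str!"
    set "len=0"

    rem Try each bit from high to low and keep it only if that index holds a character.
    for /L %%A in (12,-1,0) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!s:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
    echo !len!
    ```

    Testing bits 12 down to 0 covers lengths up to 8191, which is roughly the longest value a batch variable can hold.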
    
  • 2021-01-03 08:38

    EDITED - @jeb is right. This works but is really, really slow.

    @echo off
        setlocal enableextensions enabledelayedexpansion
        set "_input=Th""i\s&& is not good _maybe_???"
        set "_output="
    :loop
        if not defined _input goto endLoop
        set "_buf=!_input:~0,1!"
        set "_input=!_input:~1!"
        echo "!_buf!"|findstr /i /r /c:"[a-z 0-9_]" > nul && set "_output=!_output!!_buf!"
        goto loop
    :endLoop
        echo !_output!
        endlocal
    

    So, back to the drawing board. How can we make it faster? Let's do as few operations as possible and work on substrings that are as long as possible. That means doing it in two steps:

    1. Remove the bad characters that can cause problems. To do this we use the ability of the for /f command to treat these characters as delimiters, and then join the remaining sections of good characters back together.

    2. Remove the rest of the bad characters. We locate them by using the valid characters as delimiters, which leaves the runs of bad characters as tokens, and then remove each run from the string.

    So, we end up with the following (syntax adapted to what has already been answered here):

    @echo off
    
        setlocal enableextensions enabledelayedexpansion
    
        rem Test empty string
        call :doClean "" output
        echo "%output%"
    
        rem Test mixed strings
        call :doClean "~~asd123#()%%%^"^!^"~~~^"""":^!!!!=asd^>^<bm_1" output
        echo %output%
        call :doClean "Thi\s&& is ;;;;not ^^good _maybe_!~*???" output
        echo %output%
    
        rem Test clean string
        call :doClean "This is already clean" output
        echo %output%
    
        rem Test all bad string
        call :doClean "*******//////\\\\\\\()()()()" output
        echo "%output%"
    
        rem Test long string
        set "zz=Thi\s&& is not ^^good _maybe_!~*??? "
        set "zz=TEST: %zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%%zz%"
        call :doClean "%zz% TEST" output
        echo %output%
    
        rem Time long string
        echo %time%
        for /l %%# in (1 1 100) do call :doClean "%zz%" output
        echo %time%
    
        exit /b
    
    rem ---------------------------------------------------------------------------
    :doClean input output
        setlocal enableextensions enabledelayedexpansion
        set "map=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 "
        set "input=%~1"
        set "output="
    
    rem Step 1 - Remove critical delimiters
    (
    :purgeCritical
        for /L %%z in (1 1 10) do (
            for /f tokens^=1^-9^,^*^ delims^=^=^"^"^~^;^,^&^*^%%^:^!^(^)^<^>^^ %%a in ("!input!") do ( 
                set "output=!output!%%a%%b%%c%%d%%e%%f%%g%%h%%i"
                set "input=%%j" 
            )
            if not defined input goto outPurgeCritical
        )
        goto purgeCritical
    )
    :outPurgeCritical
    
    rem Step 2 - remove any remaining special character
    (
    :purgeNormal
        for /L %%z in (1 1 10) do (
            set "pending="
            for /f "tokens=1,* delims=%map%" %%a in ("!output!") do (
                set "output=!output:%%a=!"
                set "pending=%%b"
            )
            if not defined pending goto outPurgeNormal
        )
        goto purgeNormal
    )
    :outPurgeNormal
    
        endlocal & set "%~2=%output%"
        goto :EOF
    

    Maybe not the fastest, but at least a "decent" solution
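
    Step 2 can be seen in isolation with a short sketch. It assumes a sample value abc#def (not from the answer) and the same map of valid characters; with the valid characters as delimiters, the first token that for /f returns is the first run of bad characters.

    ```batch
    @echo off
    setlocal EnableDelayedExpansion
    rem delims is case-sensitive, so both cases are listed. The space must come last.
    set "map=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 "
    set "text=abc#def"

    for /f "tokens=1,* delims=%map%" %%a in ("!text!") do (
        rem %%a holds the first run of characters that are not in map.
        echo first bad run: %%a
        set "text=!text:%%a=!"
    )
    echo !text!
    ```

    For this input the loop body should run once, report # as the bad run, and leave abcdef in text. The full routine repeats this until no token is left.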

  • 2021-01-03 08:38
    @echo off
    
    call :purge "~~asd123#()%%%^"^!^"~~~^:^=asd^>^<bm_1" var
    echo (%var%)
    goto :eof
    
    
    :purge StrVar  [RtnVar]
    setlocal disableDelayedExpansion
    set "str1=%~1"
    setlocal enableDelayedExpansion
    
    for %%a in ( -  ! @ # $ % ^^ ^&  + \ / ^< ^>  . '  [ ] { }  ` ^| ^"  ) do (
       set "str1=!str1:%%a=!"
     )
    
     rem dealing with some delimiters
    
    
     set "str1=!str1:(=!"
     set "str1=!str1:)=!"
     set "str1=!str1:;=!"
     set "str1=!str1:,=!"
     set "str1=!str1:^^=!"
     set "str1=!str1:^~=!"
    
     set "temp_str=" 
     for %%e in (%str1%) do (
      set "temp_str=!temp_str!%%e"
     )
    
    endlocal & set "str1=%temp_str%"
    
    
    
    setlocal disableDelayedExpansion
    set "str1=%str1:!=%"
    set "str1=%str1::=%"
    set "str1=%str1:^^~=%"
    
    for /f "tokens=* delims=~" %%w in ("%str1%") do set "str1=%%w"
    
    endlocal & set "str1=%str1%"
    
    
    
    endlocal &  if "%~2" neq "" (set %~2=%str1%) else echo %str1%
    
    goto :eof
    

    Still cannot deal with ~ and =, but I'm working on it.

    EDIT: = is now cleared.
    EDIT: ~ is now cleared.
