How to modify this sed awk command so that the output goes to a file of choice?

小鲜肉 2021-01-23 15:48

I am using the last command from this SO answer https://stackoverflow.com/a/54818581/80353

cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub          


        
3 answers
  • 2021-01-23 16:19

    Thank you @KimStacks, @RavinderSingh13 and @Oguz-Ismail for posting the solutions above and in the previous post.

    I managed to get results in the .vtt file with youtube-dl --skip-download --write-auto-sub $youtube_url

    However, the output format is not ideal for my purpose: I have to edit it line by line to remove the timestamps and the \n newlines. So I would like to customize the command to fit my requirements.

    NOTE: I am not sure whether this counts as a new question, so I will post it here for now:

    1. I have tried all the steps suggested in the previous post and here, but I still cannot work out:
    • How do I insert "$youtube_url" into the code below?

      cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
      sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
      |tee -a "$2")
      
    2. I tried editing the numbers in 'NR%8==1{printf"%s ",$1}NR%8==3' on both ends (values from 0 to 3, and -1), but did not get the right format inside the .vtt file. So, is it possible to:
    • print the transcribed text continuously as sentences, rather than each subtitle on a new line?

    • remove the start time from the printout?
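
One way to read the awk stage: after the two sed commands, the filter treats the file as repeating blocks of eight lines, printing the part of line 1 before the first dot (the start time without milliseconds) followed by line 3 (the caption text). The sketch below assumes that eight-line layout and uses a made-up sample rather than a real download; real youtube-dl output may be laid out differently. `strip_captions` and the sample text are hypothetical names for illustration. It keeps only line 3 of each block and joins the captions with spaces, which drops both the start time and the line breaks.

```shell
#!/bin/bash
# Hypothetical helper: read cleaned caption lines on stdin and print the
# caption text as one space-separated line, without timestamps.
# Assumption: captions sit on line 3 of every 8-line block, as NR%8==3 implies.
strip_captions() {
  awk 'NR % 8 == 3 { printf "%s%s", sep, $0; sep = " " } END { print "" }'
}

# Made-up sample in that 8-line layout (not real youtube-dl output).
sample=$(printf '%s\n' \
  '00:00:01.000 --> 00:00:03.000' '' 'hello world' '' \
  '00:00:03.000 --> 00:00:03.010' 'hello world' '' '' \
  '00:00:03.010 --> 00:00:05.000' '' 'this is a test' '' \
  '00:00:05.000 --> 00:00:05.010' 'this is a test' '' '')

result=$(printf '%s\n' "$sample" | strip_captions)
printf '%s\n' "$result"
```

As for the URL: the function in the question already reads it from its first argument ("$1") and the output file from its second ("$2"), so a call would look like cap "$youtube_url" /full/path/output_file.txt.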

  • 2021-01-23 16:30

    If you want to see the output on screen and also save it to an output file, please try the following.

    cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee -a "$2")
    

    Or, in non-one-liner form:

    cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
    sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
    |tee -a "$2")
    

    Please make sure you provide the complete path in your variable, for example relative_or_absolute_path_of_text_or_markdown_file="/full/path/output_file.txt". I couldn't test this, since I have no way of producing .vtt files on my machine.
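
The reason a complete path matters: cap runs cd /tmp inside its subshell, so a relative "$2" would be resolved against /tmp rather than the directory the function was called from. A minimal sketch of one way around that is to make the path absolute before changing directory. `abs_path` is a hypothetical helper, not part of the answer above, and it does not resolve symlinks or `..` components.

```shell
#!/bin/bash
# Hypothetical helper: print an absolute version of the path in $1,
# resolved against the current directory.
abs_path() {
  case $1 in
    /*) printf '%s\n' "$1" ;;        # already absolute: keep as-is
    *)  printf '%s\n' "$PWD/$1" ;;   # relative: prefix the caller's directory
  esac
}

out=$(abs_path "notes/subs.txt")     # resolved before any cd happens
(
  cd /tmp || exit 1
  printf '%s\n' "$out"               # still points under the caller's directory
)
```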

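    Note that tee always prints to the screen as well as writing the file; the -a option only makes it append instead of overwrite. If you want the file overwritten on each run, use tee "$2" rather than tee -a "$2" (see @oguz ismail's comment). If you don't want anything printed on screen and simply want to save the output, replace the tee stage with a redirect: > "$2".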
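
The difference between the three ways of writing the file can be checked with a small experiment. This is only a sketch with throwaway data in a temporary directory; the variable names are made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
out="$tmpdir/out.txt"

shown=$(echo "first" | tee "$out")          # tee copies stdin to stdout AND to the file
echo "second" | tee -a "$out" > /dev/null   # -a appends to the existing file
appended=$(cat "$out")                      # file now holds both lines

echo "third" | tee "$out" > /dev/null       # without -a the file is overwritten
overwritten=$(cat "$out")

silent=$(echo "fourth" > "$out")            # plain redirect: nothing reaches stdout
redirected=$(cat "$out")

rm -rf "$tmpdir"
```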

  • 2021-01-23 16:33

    Here's a detailed bash script for those who want to save the subs file with a relative path.

    The result is saved as plaintext, removing time, new lines and other markup.

    #!/bin/bash
    # video-cap.sh videoUrl sub.txt
    
    # Download captions only and save in a .vtt file
    youtube-dl --skip-download --write-auto-sub "$1";
    
    # Find .vtt files in current directory created within last 3 seconds, limit to 1
    vtt=$(find . -cmin -0.05 -name "*.vtt" | head -1)
    
    # Extract the subs and save as plaintext, removing time, new lines and other markup
    sed '1,/^$/d' "$vtt" \
      | sed 's/<[^>]*>//g' \
      | awk 'NR%8==3' \
      | tr '\n' ' ' > "$2"
    
    # Remove the original .vtt subs file
    rm -f "$vtt"
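
Picking the subtitle file by creation time depends on timing and on nothing else writing a .vtt file to the same directory. One alternative, sketched here as an assumption rather than as part of the script above, is to run the download in a fresh temporary directory so that any .vtt file found there belongs to this run. `first_vtt` is a hypothetical helper, and the download is replaced by a stand-in file so the sketch runs offline.

```shell
#!/bin/bash
# Hypothetical helper: print the first .vtt file found in directory $1,
# or return non-zero if there is none.
first_vtt() {
  local f
  for f in "$1"/*.vtt; do
    [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
  done
  return 1
}

workdir=$(mktemp -d)   # fresh directory: any .vtt in it comes from this run
# In the real script the download would run here, for example:
#   (cd "$workdir" && youtube-dl --skip-download --write-auto-sub "$1")
# Stand-in for the download so the sketch runs without network access:
printf 'WEBVTT\n\n' > "$workdir/example.en.vtt"

if vtt=$(first_vtt "$workdir"); then
  found=$(basename "$vtt")
  echo "captions file: $found"
else
  echo "no .vtt file was downloaded" >&2
fi
rm -rf "$workdir"      # removes the .vtt along with the directory
```

Because the download happens in a subshell, the caller's working directory is unchanged and a relative output path in "$2" still resolves where the user expects.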
    