I am using the last command from this SO answer https://stackoverflow.com/a/54818581/80353
Thank you @KimStacks, @RavinderSingh13 and @Oguz-Ismail for posting these solutions above and in the previous post.
I managed to get results in the .vtt file with youtube-dl --skip-download --write-auto-sub $youtube_url
However, the output format is not ideal for my purpose: I have to go through it line by line to remove the timestamps as well as the \n newline characters. So I would like to customize the code to fit my requirements.
NOTE: I'm not sure whether this counts as a new question, so I will post it here for now:
How do I pass "$youtube_url" into the code below?
cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
|tee -a "$2")
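To see why the output comes out as one timestamped line per subtitle, here is a minimal sketch that runs the same filters on a small hand-written sample instead of a real download. The sample layout is an assumption modeled on YouTube auto-captions (each cue repeats the previous cue's text, so the new text recurs every 8 lines), and `sample`/`result` are hypothetical names. With -F. the awk step prints everything before the first "." on each 8th line (the start time, e.g. 00:00:00) and then the caption text two lines further down:

```shell
#!/usr/bin/env bash
# Hand-written sample modeled on YouTube auto-caption .vtt files (assumption):
# header, blank line, then cues whose new text sits on their third line
sample='WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:02.360 align:start position:0%
 
hello<00:00:00.480><c> world</c>

00:00:02.360 --> 00:00:02.370 align:start position:0%
hello world
 

00:00:02.370 --> 00:00:05.000 align:start position:0%
hello world
this<00:00:03.000><c> is</c><00:00:03.500><c> a</c><00:00:04.000><c> test</c>'

# Same filters as cap(): drop the header, strip <...> tags,
# then print "start-time text" for every 8-line group
result=$(printf '%s\n' "$sample" \
  | sed '1,/^$/d' \
  | sed 's/<[^>]*>//g' \
  | awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3')
printf '%s\n' "$result"
# 00:00:00 hello world
# 00:00:02 this is a test
```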
I have tried editing the awk program 'NR%8==1{printf"%s ",$1}NR%8==3' on both ends, but I have not managed to get the right format out of the .vtt file. So, is it possible to:
- print the transcribed text continuously as sentences, rather than each subtitle on a new line?
- remove the printout of the start time?
Assuming you want to see the output on screen and also save it to an output file, could you please try the following.
cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'|tee -a "$2")
Or, in non-one-liner form:
cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
|tee -a "$2")
Please make sure that you provide the complete path in your variable, e.g. relative_or_absolute_path_of_text_or_markdown_file="/full/path/output_file.txt" (just an example). Because the function first runs cd /tmp, a relative path passed as "$2" would be resolved relative to /tmp. I couldn't test this since I don't have a setup for .vtt files on my box.
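A call then looks like the sketch below: the URL goes in as "$1" and the output file, given as an absolute path, as "$2". So that the sketch runs without network access, it puts a hypothetical stand-in named youtube-dl on the PATH that just writes a one-cue caption file; the placeholder URL and the stub are assumptions for illustration, not part of the answer. Drop the stub lines to use the real youtube-dl.

```shell
#!/usr/bin/env bash
# Hypothetical stub: a fake "youtube-dl" that ignores its arguments and writes a
# one-cue caption file into the current directory (/tmp, because cap() cd's there)
stub_dir=$(mktemp -d)
cat > "$stub_dir/youtube-dl" <<'EOF'
#!/bin/sh
printf 'WEBVTT\n\n00:00:01.000 --> 00:00:02.000\n \nhello world\n' > sample.en.vtt
EOF
chmod +x "$stub_dir/youtube-dl"
PATH="$stub_dir:$PATH"

# cap() as in the answer: $1 = video URL, $2 = output file
cap()(cd /tmp;rm -f *.vtt;youtube-dl --skip-download --write-auto-sub "$1";\
sed '1,/^$/d' *.vtt|sed 's/<[^>]*>//g'|awk -F. 'NR%8==1{printf"%s ",$1}NR%8==3'\
|tee -a "$2")

youtube_url='https://www.youtube.com/watch?v=VIDEO_ID'  # hypothetical placeholder
out=$(mktemp)                  # absolute path, so the cd /tmp inside cap() does not matter
cap "$youtube_url" "$out"      # prints "00:00:01 hello world" and appends it to $out
```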
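In case you don't want to print the output on screen and simply want to save it to the output file: @oguz ismail's comment suggests plain tee "$2" instead of tee -a "$2" as shown above, which overwrites the file rather than appending to it. Note, however, that tee always copies its input to the screen as well; to write only to the file, replace |tee -a "$2" with > "$2".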
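The difference between the three options can be checked with a scratch file; `out` below is a hypothetical stand-in for "$2". tee -a appends, plain tee truncates the file first, and both still print to the screen, while > writes to the file only:

```shell
#!/usr/bin/env bash
out=$(mktemp)   # scratch file standing in for "$2"

# tee -a: copies input to the screen AND appends it to the file
printf 'first\n'  | tee -a "$out"
printf 'second\n' | tee -a "$out"
appended=$(cat "$out")        # "first" and "second" on two lines

# tee without -a: still copies to the screen, but truncates the file first
printf 'third\n' | tee "$out"
overwritten=$(cat "$out")     # only "third"

# Plain redirection: nothing on the screen, file overwritten
printf 'fourth\n' > "$out"
silent=$(cat "$out")          # only "fourth"

rm -f "$out"
```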
Here's a detailed bash script for those who want to save the subtitles file to a relative path.
The result is saved as plain text, with timestamps, newlines and other markup removed.
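#!/bin/bash
# Usage: video-cap.sh videoUrl sub.txt
# Download captions only and save them in a .vtt file
youtube-dl --skip-download --write-auto-sub "$1"
# Find .vtt files in the current directory only (no subdirectories) written within the last 3 seconds; keep the first
vtt=$(find . -maxdepth 1 -cmin -0.05 -name "*.vtt" | head -1)
# Stop if no caption file was written (e.g. the video has no auto-generated captions)
if [ -z "$vtt" ]; then
    echo "No .vtt file found" >&2
    exit 1
fi
# Extract the subs and save as plain text: drop the header, strip inline <...> markup,
# keep only the caption-text line of each 8-line group (no timestamps), join lines with spaces
sed '1,/^$/d' "$vtt" \
  | sed 's/<[^>]*>//g' \
  | awk 'NR%8==3' \
  | tr '\n' ' ' > "$2"
# Remove the original .vtt subs file
rm -f "$vtt"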
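To check the extraction steps without downloading anything, the sketch below runs the same sed/awk/tr pipeline on a hand-written sample saved as a .vtt file in a temporary directory. The sample layout is an assumption modeled on YouTube auto-captions, where each cue repeats the previous cue's text and the new text lands on lines 3, 11, 19, … of the header-less file; if your caption files are laid out differently, the NR%8==3 stride will pick the wrong lines. `work`, `vtt` and `text` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hand-written sample modeled on YouTube auto-caption .vtt files (assumption)
work=$(mktemp -d)
vtt="$work/sample.en.vtt"
cat > "$vtt" <<'EOF'
WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:02.360 align:start position:0%
 
hello<00:00:00.480><c> world</c>

00:00:02.360 --> 00:00:02.370 align:start position:0%
hello world
 

00:00:02.370 --> 00:00:05.000 align:start position:0%
hello world
this<00:00:03.000><c> is</c><00:00:03.500><c> a</c><00:00:04.000><c> test</c>

00:00:05.000 --> 00:00:05.010 align:start position:0%
this is a test
EOF

# Drop the header, strip <...> tags, keep the caption-text line
# of each 8-line group, then join the lines with spaces
text=$(sed '1,/^$/d' "$vtt" \
  | sed 's/<[^>]*>//g' \
  | awk 'NR%8==3' \
  | tr '\n' ' ')
printf '%s\n' "$text"   # hello world this is a test (with a trailing space)

rm -rf "$work"
```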