Question
I'm currently searching for an application or a script that does a correct word count for a LaTeX document.
So far, I have only found scripts that work on a single file, but what I want is a script that can safely ignore LaTeX keywords and also traverse linked files, i.e. follow \include and \input links, to produce a correct word count for the whole document.
With Vim, I currently use ggVGg CTRL+G, but that obviously shows the count for the current file only and does not ignore LaTeX keywords.
Does anyone know of any script (or application) that can do this job?
Answer 1:
I use texcount. The webpage has a Perl script to download (and a manual).
It will include .tex files that are pulled into the document with \input or \include (see -inc), supports macros, and has many other nice features.
When following included files, you get details for each file as well as a total. For example, here is the total output for a 12-page document of mine:
TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19
If you're only interested in the total, use the -total argument.
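If you want to reuse these totals in another script (say, to check a word limit), the summary is just a series of "Label: number" lines. A minimal Python sketch, assuming exactly the line format of the sample output above (other texcount versions or options may print different labels); parse_texcount_total is a hypothetical helper name, and the sample text is the output quoted in this answer.

```python
import re


def parse_texcount_total(output: str) -> dict:
    """Map each 'Label: number' line of texcount's summary to an int.

    Assumes the line format of the sample below; lines without a
    trailing ': <digits>' (such as the TOTAL COUNT header) are skipped.
    """
    counts = {}
    for line in output.splitlines():
        match = re.match(r"\s*([^:]+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


sample = """TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19
"""

counts = parse_texcount_total(sample)
# texcount reports body text, headers and captions separately; add them up.
total_words = (counts["Words in text"] + counts["Words in headers"]
               + counts["Words in float captions"])
print(total_words)  # 4618
```

Whether headers and float captions should count toward a limit depends on whoever sets it, which is why the sketch keeps the categories separate until the final sum.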
Answer 2:
I went with icio's comment and did a word count on the PDF itself by piping the output of pdftotext to wc:
pdftotext file.pdf - | wc -w
Answer 3:
latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w
This should give you a fairly accurate word count.
Answer 4:
In the Texmaker interface, you can get the word count by right-clicking in the PDF preview.
Answer 5:
To add to @aioobe's answer: if you use pdflatex, just do
pdftops file.pdf
ps2ascii file.ps | wc -w
I compared these counts with Microsoft Word's on a document that Word counts as 1599 words. pdftotext produced text with 1700+ words; texcount did not include the references and reported 1088 words; ps2ascii returned 1603 words, 4 more than Word.
I'd say that's a pretty good count. I'm not sure where the 4-word difference comes from, though. :)
Answer 6:
I use the following VIM script:
function! WC()
let filename = expand("%")
let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
let result = system(cmd)
echo result . " words"
endfunction
… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?
The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count, so it’s potentially (depending on usage) much more efficient.
Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.
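Following links does mean reading the TeX source to find the included files. A minimal Python sketch of that idea, under simplifying assumptions: include targets are written literally as \input{...} or \include{...} (no macros, no brace-less \input), paths are relative to the main file's directory, a missing .tex extension is appended, and an unescaped % starts a comment. The helper names expand_inputs and rough_word_count are hypothetical, and the command stripping is much cruder than detex, so treat the number as an estimate. The demo builds a throwaway two-file project in a temporary directory.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# \input{name} or \include{name}; the argument is taken literally (no macros).
INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
# A % not preceded by a backslash starts a comment that runs to end of line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def expand_inputs(path: Path, root: Optional[Path] = None,
                  seen: frozenset = frozenset()) -> str:
    """Return the text of path with \\input/\\include targets inlined recursively.

    Assumption: targets are resolved relative to root (the main file's
    directory), as for a LaTeX run started in that directory.
    """
    path = path.resolve()
    root = root or path.parent
    if path in seen:  # skip circular includes instead of recursing forever
        return ""
    text = COMMENT_RE.sub("", path.read_text(encoding="utf-8"))

    def inline(match: re.Match) -> str:
        target = root / match.group(1).strip()
        if target.suffix != ".tex":  # simplified: append .tex when missing
            target = target.with_name(target.name + ".tex")
        return expand_inputs(target, root, seen | {path})

    return INCLUDE_RE.sub(inline, text)


def rough_word_count(tex: str) -> int:
    """Crude estimate: drop math, \\begin/\\end markers and command names, count the rest."""
    tex = re.sub(r"\$\$.*?\$\$|\$.*?\$", " ", tex, flags=re.DOTALL)
    tex = re.sub(r"\\(?:begin|end)\{[^}]*\}", " ", tex)
    tex = re.sub(r"\\[A-Za-z@]+\*?", " ", tex)  # command names; arguments are kept
    return len(re.findall(r"[A-Za-z0-9][\w'-]*", tex))


# Demo on a throwaway two-file project (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "chapters").mkdir()
    (project / "main.tex").write_text(
        "\\section{Intro}\nHello world. % a comment\n\\input{chapters/one}\n",
        encoding="utf-8")
    (project / "chapters" / "one.tex").write_text(
        "Two more words, plus $x^2$ math.\n", encoding="utf-8")
    expanded = expand_inputs(project / "main.tex")

print(rough_word_count(expanded))  # 8
```

Instead of rough_word_count, you could also write the expanded text to a file and run it through the detex | wc -w pipeline from the Vim function above.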
Answer 7:
For a very basic article-class document, I just look at the number of matches for a regex that finds words. I use Sublime Text, so this method may not work for you in a different editor, but I just hit Ctrl+F (Command+F on Mac) and then, with regex enabled, search for
(^|\s+|"|((h|f|te){)|\()\w+
which should ignore text declaring a floating environment and figure captions, as well as most kinds of basic equations and \usepackage declarations, while still counting quotations and parentheticals. It also counts footnotes and text inside \emph, and counts a \hyperref link as one word. It's not perfect, but it's typically accurate to within a few dozen words. You could refine it to suit your documents, but a script is probably a better solution, since LaTeX source code isn't a regular language. Just thought I'd throw this up here.
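The same count can be reproduced outside Sublime Text. A minimal Python sketch, assuming Python's re module treats the pattern above the same way Sublime's engine does here: the { is escaped for safety and re.MULTILINE is enabled so that ^ matches at every line start, as in an editor search. regex_word_count is a hypothetical helper name. On the small sample, the \documentclass, \usepackage and \begin/\end lines produce no matches, while the quoted, parenthetical and \emph words are counted.

```python
import re

# The regex from the answer above, with '{' escaped for Python's re module.
# re.MULTILINE makes '^' match at every line start, as in an editor search.
WORD_RE = re.compile(r'(^|\s+|"|((h|f|te)\{)|\()\w+', re.MULTILINE)


def regex_word_count(tex: str) -> int:
    """Count regex matches, each taken to be one word of running text."""
    return sum(1 for _ in WORD_RE.finditer(tex))


sample = r"""\documentclass{article}
\usepackage{amsmath}
\begin{document}
Hello "quoted" world (aside) with \emph{stress}.
\end{document}
"""

print(regex_word_count(sample))  # 6
```

finditer is used rather than findall because the pattern contains capture groups, and only the number of matches matters here.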
Answer 8:
Overleaf has a built-in word count feature, in both Overleaf v1 and Overleaf v2.
Source: https://stackoverflow.com/questions/2974954/correct-word-count-of-a-latex-document