Rust vs python program performance results question

问题

I wrote a program that count words.

Here is the program

use std::collections::HashMap;
use std::io;
use std::io::prelude::*;

#[derive(Debug)]
struct Entry {
    word: String,
    count: u32,
}

static SEPARATORS: &'static [char] = &[
    ' ', ',', '.', '!', '?', '\'', '"', '\n', '(', ')', '#', '{', '}', '[', ']', '-', ';', ':',
];

fn main() {
    if let Err(err) = try_main() {
        if err.kind() == std::io::ErrorKind::BrokenPipe {
            return;
        }
        // Ignore any error that may occur while writing to stderr.
        let _ = writeln!(std::io::stderr(), "{}", err);
    }
}

fn try_main() -> Result<(), std::io::Error> {
    let mut words: HashMap<String, u32> = HashMap::new();
    let stdin = io::stdin();
    for result in stdin.lock().lines() {
        let line = result?;
        line_processor(line, &mut words)
    }
    output(&mut words)?;
    Ok(())
}

fn line_processor(line: String, words: &mut HashMap<String, u32>) {
    let mut word = String::new();

    for c in line.chars() {
        if SEPARATORS.contains(&c) {
            add_word(word, words);
            word = String::new();
        } else {
            word.push_str(&c.to_string());
        }
    }
}

fn add_word(word: String, words: &mut HashMap<String, u32>) {
    if word.len() > 0 {
        if words.contains_key::<str>(&word) {
            words.insert(word.to_string(), words.get(&word).unwrap() + 1);
        } else {
            words.insert(word.to_string(), 1);
        }
        // println!("word >{}<", word.to_string())
    }
}

fn output(words: &mut HashMap<String, u32>) -> Result<(), std::io::Error> {
    let mut stack = Vec::<Entry>::new();

    for (k, v) in words {
        stack.push(Entry {
            word: k.to_string(),
            count: *v,
        });
    }

    stack.sort_by(|a, b| b.count.cmp(&a.count));
    stack.reverse();

    let stdout = io::stdout();
    let mut stdout = stdout.lock();
    while let Some(entry) = stack.pop() {
        writeln!(stdout, "{}\t{}", entry.count, entry.word)?;
    }
    Ok(())
}

It some arbitrary text file as input and counts words to produce some output like :

15  the
14  in
11  are
10  and
10  of
9   species
9   bats
8   horseshoe
8   is
6   or
6   as
5   which
5   their

I compile it like this :

cargo build --release

I run it like that:

cat wiki-sample.txt | ./target/release/wordstats  | head -n 50

wiki-sample.txt file I use is here

I compared execution time with the python (3.8) version which is:

import sys
from collections import defaultdict

# import unidecode

seps = set(
    [
        " ",
        ",",
        ".",
        "!",
        "?",
        "'",
        '"',
        "\n",
        "(",
        ")",
        "#",
        "{",
        "}",
        "[",
        "]",
        "-",
        ";",
        ":",
    ]
)


def out(result):
    for i in result:
        print(f"{i[1]}\t{i[0]}")


if __name__ == "__main__":
    c = defaultdict(int)

    for line in sys.stdin:
        words = line.split(" ")
        for word in words:
            clean_word = []
            for char in word:
                if char not in seps and char:
                    clean_word.append(char)
            r = "".join(clean_word)
            # r = unidecode.unidecode(r)
            if r:
                c[r] += 1

    r = sorted(list(c.items()), key=lambda x: -x[1])
    try:
        out(r)
    except BrokenPipeError as e:
        pass

I run it like this :

cat /tmp/t.txt | ./venv/bin/python3 src/main.py | head -n 100

Average computation times are : rust -> 5', python3.8 -> 19'
python version is (i think) less optimized (a split on the whole line requires an extra O(n))
This is single threaded process, and a quite simple program
Most of computing time is in the word loop processing, output is almost instant.
I also removed library code that remove accents to be more close to standard libraries of both languages.

Question : Is it normal that rust performs "only" ~3-4 times better ?

I am also wondering if I am not missing something here because I find computation time is quite long for "only" 100Mb data. I don't think (naively) there is some processing with a lower big O for this, I might be wrong.

I am used to compare some python code to some equivalent in go, java or vlang and I often have something like 20x to 100x factor speed for these benches.

Maybe cpython is good at this kind of processing, maybe I miss something in rust program (I am very new to rust) to make it more efficient.

I am frighten to miss something big in my tests, but any thought about this ?

Edit: following folks advices, I have now version below:

use std::collections::HashMap;
use std::io;
use std::io::prelude::*;

#[derive(Debug)]
struct Entry<'a> {
    word: &'a str, // word: String,
    count: u32,
}

static SEPARATORS: &'static [char] = &[
    ' ', ',', '.', '!', '?', '\'', '"', '\n', '(', ')', '#', '{', '}', '[', ']', '-', ';', ':',
];

fn main() {
    if let Err(err) = try_main() {
        if err.kind() == std::io::ErrorKind::BrokenPipe {
            return;
        }
        // Ignore any error that may occur while writing to stderr.
        let _ = writeln!(std::io::stderr(), "{}", err);
    }
}

fn try_main() -> Result<(), std::io::Error> {
    let mut words: HashMap<String, u32> = HashMap::new();
    let stdin = io::stdin();
    for result in stdin.lock().lines() {
        let line = result?;
        line_processor(line, &mut words)
    }
    output(&mut words)?;
    Ok(())
}

fn line_processor(line: String, words: &mut HashMap<String, u32>) {
    let mut l = line.as_str();
    loop {
        if let Some(pos) = l.find(|c: char| SEPARATORS.contains(&c)) {
            let (head, tail) = l.split_at(pos);
            add_word(head.to_owned(), words);
            l = &tail[1..];
        } else {
            break;
        }
    }
}

fn add_word(word: String, words: &mut HashMap<String, u32>) {
    if word.len() > 0 {
        let count = words.entry(word).or_insert(0);
        *count += 1;
    }
}

fn output(words: &mut HashMap<String, u32>) -> Result<(), std::io::Error> {
    let mut stack = Vec::<Entry>::new();

    for (k, v) in words {
        stack.push(Entry {
            word: k.as_str(), // word: k.to_string(),
            count: *v,
        });
    }

    stack.sort_by(|a, b| a.count.cmp(&b.count));

    let stdout = io::stdout();
    let mut stdout = stdout.lock();
    while let Some(entry) = stack.pop() {
        writeln!(stdout, "{}\t{}", entry.count, entry.word)?;
    }
    Ok(())
}

Which takes arount 2.6' on my computer now. This is way better and almost 10 times faster than python version which is very better but still not what I expected (that is not a real problem). There might be some other optimisations that I does not have in mind for now.

回答1:

In add_word() you circumvent the borrowing problems with new copies of word (.to_string()).

You could just access once for all the counter you want to increment.

let count = words.entry(word).or_insert(0);
*count += 1;

You could also avoid many string reallocations in line_processor() by working directly on the line as a &str.

let mut l = line.as_str();
loop {
    if let Some(pos) = l.find(|c: char| SEPARATORS.contains(&c)) {
        let (head, tail) = l.split_at(pos);
        add_word(head.to_owned(), words);
        l = &tail[1..];
    } else {
        break;
    }
}

When it comes to the output() function new copies of the strings are performed in order to initialise the Entry struct. We could change Entry to

#[derive(Debug)]
struct Entry<'a> {
    word: &'a str,  // word: String,
    count: u32,
}

and then only work on the &str inside the original strings (in words).

stack.push(Entry {
    word: k.as_str(), // word: k.to_string(),
    count: *v,
});

Moreover, the inplace reverse of the sorted vector can be avoided if we invert the sorting criterion.

stack.sort_by(|a, b| a.count.cmp(&b.count));
// stack.reverse();

I guess these are the main bottlenecks in this example.

On my computer, timing with <wiki-sample.txt >/dev/null gives these speedups:

original -->  × 1 (reference)
using l.find()+l.split_at() --> × 1.48
using words.entry() --> × 1.25
using both l.find()+l.split_at() and words.entry() --> × 1.73
using all the preceding and &str in Entry and avoiding reverse --> x 2.05

来源：https://stackoverflow.com/questions/65782157/rust-vs-python-program-performance-results-question

标签

python

performance

rust

text-processing