Okay, so I've hit a brick wall.
Edit: Using bytes.IndexByte() in my count() function makes it run almost twice as fast. bytes.IndexByte() is written in assembly instead of Go. Still not C speed, but closer.
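
A minimal sketch of how count() could be rewritten around bytes.IndexByte(). The function name countIndexByte and the sample input are illustrative assumptions, not the exact code that was timed:

```go
package main

import (
	"bytes"
	"fmt"
)

// countIndexByte counts '\n' bytes in s by jumping from one newline to the
// next with bytes.IndexByte instead of testing every byte in a Go loop.
// The name is hypothetical; it plays the same role as count() in the program.
func countIndexByte(s []byte) int64 {
	var n int64
	for {
		i := bytes.IndexByte(s, '\n')
		if i < 0 {
			return n // no newline left in the remaining slice
		}
		n++
		s = s[i+1:] // resume the search just past the newline found
	}
}

func main() {
	fmt.Println(countIndexByte([]byte("one\ntwo\nthree\n"))) // prints 3
}
```

In the read loop this would be called as lines += countIndexByte(Buffer[:n]), the same way count() is called.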
I have two programs, one in C and one in Go, that both count newlines in a file. Super simple. On a 2.4GB file, the C program runs in ~1.5 seconds and the Go program in ~4.25 seconds.
Am I hitting Go's speed limit? If so, what, exactly, is causing this? I can read C, but I can't read assembly, so comparing the C's asm and the Go's asm doesn't do much for me except show that the Go has ~400 more lines (ignoring the .ascii section).
While I know Go can't match C step-for-step, I wouldn't expect a roughly 3x slowdown.
Ideas?
Here's the cpuprofile of the Go:
Here's the C (compiled w/ gcc -Wall -pedantic -O9):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <unistd.h>

    #define BUFFER_SIZE (16 * 1024)

    int main(void)
    {
        const char *file = "big.txt";
        int fd = open(file, O_RDONLY);
        char buf[BUFFER_SIZE + 1];
        uintmax_t bytes = 0;   /* total bytes read */
        ssize_t bytes_read;    /* signed so a read error (-1) ends the loop */
        size_t lines = 0;      /* newline count */

        if (fd < 0) {
            perror(file);
            return 1;
        }

        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        /* read(2) stands in for safe_read(), which this snippet never defines */
        while ((bytes_read = read(fd, buf, BUFFER_SIZE)) > 0) {
            char *p = buf;

            // error checking

            /* hop from newline to newline within the bytes just read */
            while ((p = memchr(p, '\n', (buf + bytes_read) - p))) {
                ++p;
                ++lines;
            }
            bytes += bytes_read;
        }

        printf("%ju\n", bytes);
        printf("%zu\n", lines);
        return 0;
    }

And the Go:

    package main

    import (
        "fmt"
        "io"
        "os"
        "syscall"
    )

    const (
        POSIX_FADV_SEQUENTIAL = 2
        NewLineByte           = '\n' // or 10
        BufferSize            = (16 * 1024) + 1
    )

    var Buffer = make([]byte, BufferSize)

    // fadvise passes an access-pattern hint for file to the kernel (Linux only).
    func fadvise(file *os.File, off, length int, advice uint32) error {
        _, _, errno := syscall.Syscall6(syscall.SYS_FADVISE64, file.Fd(), uintptr(off), uintptr(length), uintptr(advice), 0, 0)
        if errno != 0 {
            return errno
        }
        return nil
    }

    // count returns the number of newline bytes in s.
    func count(s []byte) int64 {
        count := int64(0)
        for i := 0; i < len(s); i++ {
            if s[i] == NewLineByte {
                count++
            }
        }
        return count
    }

    func main() {
        file, err := os.Open("big.txt")
        if err != nil {
            panic(err)
        }

        var lines int64
        var bytes int64

        fadvise(file, 0, 0, POSIX_FADV_SEQUENTIAL)

        for {
            n, err := file.Read(Buffer)
            if err != nil && err != io.EOF {
                panic(err)
            }

            lines += count(Buffer[:n])
            bytes += int64(n)

            if err == io.EOF {
                break
            }
        }

        fmt.Printf("%d\n", bytes)
        fmt.Printf("%d\n", lines)
    }