I am looking to convert a string array to a byte array in GO so I can write it down to a disk. What is an optimal solution to encode and decode a string array ([]string
The gob package will do this for you http://godoc.org/encoding/gob
Example to play with http://play.golang.org/p/e0FEZm-qiS
same source code is below.
package main
import (
"bytes"
"encoding/gob"
"fmt"
)
func main() {
// store to byte array
strs := []string{"foo", "bar"}
buf := &bytes.Buffer{}
gob.NewEncoder(buf).Encode(strs)
bs := buf.Bytes()
fmt.Printf("%q", bs)
// Decode it back
strs2 := []string{}
gob.NewDecoder(buf).Decode(&strs2)
fmt.Printf("%v", strs2)
}
To illustrate the problem, convert []string
to []byte
and then convert []byte
back to []string
, here's a simple solution:
package main
import (
"encoding/binary"
"fmt"
)
const maxInt32 = 1<<(32-1) - 1
func writeLen(b []byte, l int) []byte {
if 0 > l || l > maxInt32 {
panic("writeLen: invalid length")
}
var lb [4]byte
binary.BigEndian.PutUint32(lb[:], uint32(l))
return append(b, lb[:]...)
}
func readLen(b []byte) ([]byte, int) {
if len(b) < 4 {
panic("readLen: invalid length")
}
l := binary.BigEndian.Uint32(b)
if l > maxInt32 {
panic("readLen: invalid length")
}
return b[4:], int(l)
}
func Decode(b []byte) []string {
b, ls := readLen(b)
s := make([]string, ls)
for i := range s {
b, ls = readLen(b)
s[i] = string(b[:ls])
b = b[ls:]
}
return s
}
func Encode(s []string) []byte {
var b []byte
b = writeLen(b, len(s))
for _, ss := range s {
b = writeLen(b, len(ss))
b = append(b, ss...)
}
return b
}
func codecEqual(s []string) bool {
return fmt.Sprint(s) == fmt.Sprint(Decode(Encode(s)))
}
func main() {
var s []string
fmt.Println("equal", codecEqual(s))
s = []string{"", "a", "bc"}
e := Encode(s)
d := Decode(e)
fmt.Println("s", len(s), s)
fmt.Println("e", len(e), e)
fmt.Println("d", len(d), d)
fmt.Println("equal", codecEqual(s))
}
Output:
equal true
s 3 [ a bc]
e 19 [0 0 0 3 0 0 0 0 0 0 0 1 97 0 0 0 2 98 99]
d 3 [ a bc]
equal true
Lets ignore the fact that this is Go for a second. The first thing you need is a serialization format to marshal the []string
into.
There are many option here. You could build your own or use a library. I am going to assume you don't want to build your own and jump to serialization formats go supports.
In all examples, data is the []string
and fp is the file you are reading/writing to. Errors are being ignored, check the returns of functions to handle errors.
Gob is a go only binary format. It should be relatively space efficient as the number of strings increases.
enc := gob.NewEncoder(fp)
enc.Encode(data)
Reading is also simple
var data []string
dec := gob.NewDecoder(fp)
dec.Decode(&data)
Gob is simple and to the point. However, the format is only readable with other Go code.
Next is json. Json is a format used just about everywhere. This format is just as easy to use.
enc := json.NewEncoder(fp)
enc.Encode(data)
And for reading:
var data []string
dec := json.NewDecoder(fp)
dec.Decode(&data)
XML is another common format. However, it has pretty high overhead and not as easy to use. While you could just do the same you did for gob and json, proper xml requires a root tag. In this case, we are using the root tag "Strings" and each string is wrapped in an "S" tag.
type Strings struct {
S []string
}
enc := xml.NewEncoder(fp)
enc.Encode(Strings{data})
var x Strings
dec := xml.NewDecoder(fp)
dec.Decode(&x)
data := x.S
CSV is different from the others. You have two options, use one record with n rows or n records with 1 row. The following example uses n records. It would be boring if I used one record. It would look too much like the others. CSV can ONLY hold strings.
enc := csv.NewWriter(fp)
for _, v := range data {
enc.Write([]string{v})
}
enc.Flush()
To read:
var err error
var data string
dec := csv.NewReader(fp)
for err == nil { // reading ends when an error is reached (perhaps io.EOF)
var s []string
s, err = dec.Read()
if len(s) > 0 {
data = append(data, s[0])
}
}
Which format you use is a matter of preference. There are many other possible encodings that I have not mentioned. For example, there is an external library called bencode. I don't personally like bencode, but it works. It is the same encoding used by bittorrent metadata files.
If you want to make your own encoding, encoding/binary is a good place to start. That would allow you to make the most compact file possible, but I hardly thing it is worth the effort.
You can do something like this:
var lines = []string
var ctx = []byte{}
for _, s := range lines {
ctx = append(ctx, []byte(s)...)
}
I would suggest to use PutUvarint and Uvarint for storing/retrieving len(s)
and using []byte(str)
to pass str
to some io.Writer
. With a string length known from Uvarint
, one can buf := make([]byte, n)
and pass the buf
to some io.Reader
.
Prepend the whole thing with length of the string array and repeat the above for all of its items. Reading the whole thing back is again reading first the outer length and repeating n-times the item read.
to convert []string
to []byte
var str = []string{"str1","str2"}
var x = []byte{}
for i:=0; i<len(str); i++{
b := []byte(str[i])
for j:=0; j<len(b); j++{
x = append(x,b[j])
}
}
to convert []byte
to string
str := ""
var x = []byte{'c','a','t'}
for i := 0; i < len(x); i++ {
str += string(x[i])
}