Storing a string as UTF8 in C#

忘了有多久 2021-02-01 14:30

I'm doing a lot of string manipulation in C#, and really need the strings to be stored one byte per character. This is because I need gigabytes of text simultaneously in memory.

4 answers
  •  鱼传尺愫
    2021-02-01 15:05

    Not really. System.String is designed for storing strings. Your requirement is for a very particular subset of strings with particular memory benefits.

    Now, a "very particular subset of strings with particular memory benefits" comes up a lot, but it is not always the same subset. Text that is ASCII-only usually isn't meant for reading by human beings, so it tends to be short codes, or something that can be handled in a stream-processing manner, or chunks of text mixed in with bytes doing other jobs (e.g. quite a few binary formats have small pieces that translate directly to ASCII).

    As such, you've a pretty strange requirement.

    All the more so when you come to the gigabytes part. If I'm dealing with gigs, I'm immediately thinking about how I can stop having to deal with gigs, and/or get much more serious savings than just 50%. I'd be thinking about mapping chunks I'm not currently interested in to a file, or about ropes, or about a bunch of other things. Of course, those are going to work for some cases and not for all, so yet again, we're not talking about something where .NET should stick in something as a one-size-fits-all, because one size will not fit all.
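
    As a rough illustration of the "map chunks to a file" idea, here is a minimal sketch using the framework's MemoryMappedFile class. The MappedText class and ReadChunk method are hypothetical names made up for this example, and the sketch assumes the text sits in a non-empty ASCII file on disk. Only the requested slice is copied into managed memory.

    ```csharp
    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Text;

    // Hypothetical helper, not part of .NET: pulls one slice of a large
    // ASCII file into memory on demand instead of holding the whole file.
    public static class MappedText
    {
        public static string ReadChunk(string path, long offset, int count)
        {
            // Capacity 0 means "use the current file size"; the map is read-only.
            using (var mmf = MemoryMappedFile.CreateFromFile(
                       path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
            using (var view = mmf.CreateViewAccessor(
                       offset, count, MemoryMappedFileAccess.Read))
            {
                var buffer = new byte[count];
                view.ReadArray(0, buffer, 0, count);

                // Assumes the bytes really are ASCII; other bytes decode as '?'.
                return Encoding.ASCII.GetString(buffer);
            }
        }
    }
    ```

    Whether this helps depends on the access pattern: it suits cases where only a small part of the text is needed at any moment, which is one of the cases described above and not all of them.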

    Beyond that, just the UTF-8 bit isn't that hard. It's all the other methods that become work. Again, what you need there won't be the same as what someone else needs.
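
    To show how small the storage part is, here is a minimal sketch of a wrapper that keeps text as UTF-8 bytes. Utf8Text is a hypothetical type invented for this example, not a framework class, and it covers only construction, round-tripping and equality. Everything else a real string offers (searching, slicing, comparison rules, and so on) is the part that becomes work.

    ```csharp
    using System;
    using System.Text;

    // Hypothetical wrapper, not a .NET type: stores text as UTF-8 bytes,
    // which is one byte per character for ASCII-only input.
    public sealed class Utf8Text : IEquatable<Utf8Text>
    {
        private readonly byte[] _bytes;

        public Utf8Text(string s)
        {
            _bytes = Encoding.UTF8.GetBytes(s);
        }

        // Payload size in bytes; equals the character count only for pure ASCII.
        public int ByteLength => _bytes.Length;

        // Decodes back to a regular System.String when one is needed.
        public override string ToString() => Encoding.UTF8.GetString(_bytes);

        // Ordinal equality: two values are equal when their bytes match.
        public bool Equals(Utf8Text other)
        {
            if (other == null || other._bytes.Length != _bytes.Length) return false;
            for (int i = 0; i < _bytes.Length; i++)
            {
                if (_bytes[i] != other._bytes[i]) return false;
            }
            return true;
        }

        public override bool Equals(object obj) => Equals(obj as Utf8Text);

        public override int GetHashCode()
        {
            // FNV-1a over the raw bytes, so equal byte sequences hash equally.
            unchecked
            {
                uint hash = 2166136261;
                foreach (byte b in _bytes)
                {
                    hash = (hash ^ b) * 16777619;
                }
                return (int)hash;
            }
        }
    }
    ```

    For example, new Utf8Text("hello").ByteLength is 5, while the same text held in a System.String takes 10 bytes of character data plus object overhead.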
