Question
This is being done in TypeScript, but I think the algorithm is applicable to any language.
I am sending log data (parsed AWS ALB logs) to New Relic, whose maximum payload size is 10^6 bytes. What I'm doing right now is encoding the entire ALB log I pull from S3 as JSON, gzipping it, and checking the size via Buffer.byteLength. If it exceeds 900,000 bytes (I want to leave some headroom, because the gzipped size doesn't scale exactly linearly with the number of log entries), I compute a multiplier as 900,000 / byte length and use it to break the log entries into correspondingly smaller chunks, as shown below.
This works, but I'm concerned the algorithm won't hold up as well when the data are more heterogeneous. That 900,000 number is fairly arbitrary, after all. Is there a better way to break these records up? I suppose I could try to dynamically determine the optimal chunk size (a rough sketch of what I mean follows the code below), but I feel like that would needlessly burn a lot of CPU.
import { chunk } from 'lodash'
import { promisify } from 'util'
import { gzip as gzipCallback } from 'zlib'

// These are defined elsewhere in my code; shown here so the snippet stands alone.
// gzip is a promisified zlib.gzip, and MaxPayloadSize is the 900,000-byte threshold
// (headroom under New Relic's 10^6-byte limit).
const gzip = promisify(gzipCallback)
const MaxPayloadSize = 900_000

async function chunkify(messages: Array<unknown>): Promise<Array<Buffer>> {
  const postdata = [{ logs: messages }]
  const postdataGzipped: Buffer = (await gzip(
    JSON.stringify(postdata)
  )) as Buffer
  if (postdataGzipped.byteLength < MaxPayloadSize) {
    // Everything fits in a single payload
    return [postdataGzipped]
  } else {
    // Scale the chunk size down in proportion to how far over the limit we are
    const multiplier = MaxPayloadSize / postdataGzipped.byteLength
    const chunkSize = Math.floor(messages.length * multiplier)
    console.info(
      `Break ${messages.length} messages into chunks of (up to) ${chunkSize} elements each`
    )
    const chunks: Buffer[] = await Promise.all(
      chunk(messages, chunkSize).map(
        (messageChunk) =>
          gzip(JSON.stringify([{ logs: messageChunk }])) as Promise<Buffer>
      )
    )
    return chunks
  }
}
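For reference, here is a rough sketch of the dynamic approach I alluded to above. The splitAndGzip name is just illustrative (it's not code I'm running), and it reuses the gzip helper and MaxPayloadSize constant from the snippet above: gzip a slice of messages and, if the result is still over the limit, split the slice in half and recurse. My worry is that re-gzipping on every split is what would burn the extra CPU.

// Hypothetical sketch only: recursively split any slice whose gzipped
// output is still over the limit, re-gzipping each half as we go.
async function splitAndGzip(messages: Array<unknown>): Promise<Array<Buffer>> {
  const gzipped = (await gzip(JSON.stringify([{ logs: messages }]))) as Buffer
  if (gzipped.byteLength < MaxPayloadSize || messages.length <= 1) {
    // Small enough (or indivisible): ship it as a single payload
    return [gzipped]
  }
  // Still too big: split the slice in half and handle each half recursively
  const middle = Math.ceil(messages.length / 2)
  const halves = await Promise.all([
    splitAndGzip(messages.slice(0, middle)),
    splitAndGzip(messages.slice(middle)),
  ])
  return halves.flat()
}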
Source: https://stackoverflow.com/questions/64124467/breaking-gzipped-json-into-chunks-of-arbitrary-size