Lisp - write to file using low memory footprint


Question


I have large hash tables that I am writing to disk as an occasional backup. I am finding that as I map the hash tables and write to a file, the RAM usage skyrockets compared to the size of the hash.

I am running Lisp in Emacs with SLIME and SBCL 2.0.3.176. The system is Ubuntu 19.10 on a Dell server.

Data is multiple levels of hash tables. The basic structure of it is:

customer-ht - a hash table of customer structs, keyed on lists of integers like (1 2) and (1 3)

(defstruct customer
  (var1 0)
  (var2 (make-hash-table))
  (var3 (make-hash-table)))

The var2 hash table is a simple key/value table: keys are integers (1, 2, etc.) and the value is always 'T.

The var3 hash table has integer keys, and each value is another hash table whose keys are lists of integers, like (1 2 3) or (1 5 7), and whose value is always 'T.

So, customer (1 2) has

  • var1 = 5,

  • var2 = hash table of key 3, value 'T

  • var3 = hash table of key 9, value = hash table of key (5 6 7), value 'T
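
For concreteness, here is a rough sketch of how that example structure might be built in code. The keys and values are just the hypothetical ones from the description above; note that customer-ht and the inner table keyed on lists need an EQUAL test, since their keys are lists:

;; Sketch only: build the tiny example customer-ht described above.
;; Tables keyed on lists of integers need :test #'equal.
(defparameter customer-ht (make-hash-table :test #'equal))

(let ((cust (make-customer :var1 5))
      (inner (make-hash-table :test #'equal)))
  ;; var2: integer key -> T
  (setf (gethash 3 (customer-var2 cust)) t)
  ;; var3: integer key -> hash table keyed on a list of integers -> T
  (setf (gethash '(5 6 7) inner) t)
  (setf (gethash 9 (customer-var3 cust)) inner)
  ;; top level: customer key (1 2) -> the struct
  (setf (gethash '(1 2) customer-ht) cust))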

I'm using this to map and write to file:

(defun write-cust-to-file (filename)
  (with-open-file (s filename
                    :direction :output
                    :if-exists :supersede)
    (maphash
      #'(lambda (cust-key cust-data)
          (format s "~A ~A~%" cust-key (customer-var1 cust-data))
          (maphash
           #'(lambda (k1 v1)
               (declare (ignore v1))
               (format s "~A ~A~%" cust-key k1))
           (customer-var2 cust-data))
          (maphash
           #'(lambda (k1 v1)
               (maphash
                #'(lambda (k2 v2)
                    (declare (ignore v2))
                    (format s "~A ~A~%" (list cust-key "X" k1) k2))
                v1))
           (customer-var3 cust-data)))
      customer-ht))
  nil)

There are more vars in the struct, similar to these, that are all written using the same maphash/write code, so each customer struct is quite large.

When I run this, my RAM usage explodes. All my data in RAM is around 20GB, and during the write it goes to 40GB+. I'm starting to think that the maphashes are duplicating data from the structs as they run. If I run a similar write function (two nested maphashes over k1 and k2, like the section above) on a hash table that doesn't contain structs, no memory increase occurs.

Is there a way to write to a file in Lisp that doesn't use any extra RAM (or at least very little)? I'll take a performance hit to save RAM.

Additional info: I ran dstat while running this and found that writing to disk is not continuous. It writes a large block (20MB-120MB) about every 30 seconds, with small 12K writes every 5 seconds or so. Also, RAM usage tops out before the function completes writing. So, is the data being stored somewhere while waiting to be written to disk? Or is it just allocating some memory? Running (gc :full 'T) afterward recovers all the extra RAM.
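
For what it's worth, a quick check along these lines is how one can confirm the extra RAM is reclaimable garbage rather than a leak (a sketch only; ROOM's output format varies between SBCL versions):

;; Sketch: compare dynamic-space usage before and after a full GC.
(room nil)          ; brief summary of current memory usage
(sb-ext:gc :full t) ; force a full collection, as mentioned above
(room nil)          ; usage should drop back toward the pre-write level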


Answer 1:


This isn't a complete answer. I think that whatever is causing the leakage is SBCL-specific, so probably your best bet is to find out where the SBCL people hang out (assuming it's not here) and ask them.

However, one thing to do would be to instrument the GC to see if you can work out what's going on. You can do this by, for instance:

(defun dribble-gc-info ()
  (format *debug-io* "~&GC: ~D bytes consed~%"
          (sb-ext:get-bytes-consed)))

(defun hook-gc (&optional (log-file nil))
  (pushnew 'dribble-gc-info sb-ext:*after-gc-hooks*)
  (when log-file
    (setf (sb-ext:gc-logfile) log-file)))

(defun unhook-gc ()
  (setf sb-ext:*after-gc-hooks*
        (delete 'dribble-gc-info sb-ext:*after-gc-hooks*))
  (if (sb-ext:gc-logfile)
      (prog1 (sb-ext:gc-logfile)
        (setf (sb-ext:gc-logfile) nil))
      nil))

Then (hook-gc "/tmp/x.out") will both tell you when GCs run (and how much memory has been consumed in total) and write copious information to /tmp/x.out. That may at least give you a start in working out what's happening.
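
For example, you might wrap the backup write like this (the file names are just placeholders):

;; Sketch: log GC activity only for the duration of the backup write.
(hook-gc "/tmp/x.out")
(unwind-protect
    (write-cust-to-file "/tmp/customers.backup")
  (unhook-gc))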

Another thing which just conceivably might help would be to insert occasional calls to force-output on the stream you're writing to: it's possible (but I think unlikely) that some weird buffering is going on which is causing it to make bad decisions about how big the Lisp-side buffer for the file should be.
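
As a sketch, the outer maphash could flush every so often. Here write-one-customer is a hypothetical helper standing in for the per-customer body from the question, and the flush interval is arbitrary:

;; Sketch: flush the stream every 1000 customers so the Lisp-side
;; buffer cannot grow unboundedly between writes.
(let ((count 0))
  (maphash
   #'(lambda (cust-key cust-data)
       (write-one-customer s cust-key cust-data) ; hypothetical helper
       (when (zerop (mod (incf count) 1000))
         (force-output s)))
   customer-ht))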




Answer 2:


Try using loop to iterate over the hash tables.

Something like:

(loop for k1 being the hash-key
        of (customer-var3 cust-data)
      using (hash-value v1)
      do (format s "~A ~A~%" k1 v1)) ; v1 is the inner hash table here

Or if you don't need the values:

(loop for k being the hash-key of (customer-var2 cust-data)
      do (format <whatever you need...>))
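
Putting that together, here is a sketch of the whole write function using loop's hash-table iteration instead of maphash (untested, assuming the same struct accessors and the customer-ht variable from the question):

(defun write-cust-to-file-loop (filename)
  ;; Sketch: same output as write-cust-to-file, but with loop's
  ;; hash-table iteration instead of nested maphash closures.
  (with-open-file (s filename
                    :direction :output
                    :if-exists :supersede)
    (loop for cust-key being the hash-key of customer-ht
            using (hash-value cust-data)
          do (format s "~A ~A~%" cust-key (customer-var1 cust-data))
             (loop for k1 being the hash-key of (customer-var2 cust-data)
                   do (format s "~A ~A~%" cust-key k1))
             (loop for k1 being the hash-key of (customer-var3 cust-data)
                     using (hash-value v1)
                   do (loop for k2 being the hash-key of v1
                            do (format s "~A ~A~%" (list cust-key "X" k1) k2)))))
  nil)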

Originally I thought maphash would collect values, but it does not, as @tfb pointed out. In that case I'm not sure what else is causing the extra memory use.



Source: https://stackoverflow.com/questions/62317706/lisp-write-to-file-using-low-memory-footprint
