I'm working on writing a function in Clojure that will process a file character by character. I know that Java's BufferedReader class has the read() method that reads one character.
(with-open [reader (clojure.java.io/reader "path/to/file")] ...
I prefer this way to get a reader in Clojure. And by "character by character", do you mean at the file-access level, like read, which allows you to control how many bytes to read?
As @deterb pointed out, let's check the source code of line-seq:
(defn line-seq
  "Returns the lines of text from rdr as a lazy sequence of strings.
  rdr must implement java.io.BufferedReader."
  {:added "1.0"
   :static true}
  [^java.io.BufferedReader rdr]
  (when-let [line (.readLine rdr)]
    (cons line (lazy-seq (line-seq rdr)))))
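I faked a char-seq along the same lines:
(defn char-seq
  [^java.io.Reader rdr]
  ;; .read returns the next character as an int, or -1 at end of stream
  (let [chr (.read rdr)]
    (when (>= chr 0)
      ;; wrap chr in (char chr) if you want characters instead of ints
      (cons chr (lazy-seq (char-seq rdr))))))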
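To try this out without a file, here is a minimal sketch that feeds char-seq a java.io.StringReader. The StringReader and the name first-three are assumptions for illustration only; any java.io.Reader should behave the same way. Since .read returns ints, the example maps char over the result to get characters back.

```clojure
(defn char-seq
  "Lazy sequence of the ints returned by .read, ending at end of stream (-1)."
  [^java.io.Reader rdr]
  (let [chr (.read rdr)]
    (when (>= chr 0)
      (cons chr (lazy-seq (char-seq rdr))))))

;; StringReader stands in for a real file reader here (illustrative assumption).
(def first-three
  (with-open [rdr (java.io.StringReader. "hello")]
    ;; doall forces the three chars before with-open closes the reader
    (doall (map char (take 3 (char-seq rdr))))))
```

Only as many characters as are consumed get read, so the rest of the input is never pulled into memory. The doall matters: a lazy seq that escapes with-open unrealized would try to read from a closed reader.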
I know this char-seq reads all chars into memory [1], but I think it shows that you can directly call .read on BufferedReader. So, you can write your code like this:
(let [chr (.read rdr)]
  (when (>= chr 0)   ; .read returns -1 at end of stream
    ;; do your work here
    ))
What do you think?
[1] According to @dimagog's comment, char-seq does not read all chars into memory, thanks to lazy-seq.
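If you would rather call .read directly than build a sequence, the single read shown earlier can be repeated with loop/recur until end of stream. This is only a sketch: count-chars and tally are hypothetical names, the per-character "work" is just a frequency count, and a StringReader stands in for a real file.

```clojure
(defn count-chars
  "Hypothetical helper: counts how often each character occurs in rdr."
  [^java.io.Reader rdr]
  (loop [counts {}]
    (let [chr (.read rdr)]
      (if (>= chr 0)
        ;; do your work here; this sketch just tallies the character
        (recur (update counts (char chr) (fnil inc 0)))
        ;; -1 means end of stream, so return what we accumulated
        counts))))

;; StringReader is an illustrative stand-in for a file reader.
(def tally
  (with-open [rdr (java.io.StringReader. "abca")]
    (count-chars rdr)))
```

Because the loop finishes inside with-open, nothing lazy escapes the reader's lifetime.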
You're pretty close. Keep in mind that strings can be treated as sequences of characters: (concat "abc" "def") results in the sequence (\a \b \c \d \e \f).
mapcat is another really useful function for this: it lazily concatenates the results of applying the mapping fn to each element of the sequence. This means that mapcatting seq over the line strings gives the lazy sequence of characters you're after.
I did this as (mapcat seq (line-seq reader)).
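A small sketch of the mapcat approach, using an in-memory reader so it can be run at the REPL. The BufferedReader over a StringReader and the name chars-from-lines are assumptions for illustration; with a real file you would obtain the reader from clojure.java.io/reader instead.

```clojure
;; An in-memory reader stands in for a file here (illustrative assumption).
(def chars-from-lines
  (with-open [reader (java.io.BufferedReader.
                       (java.io.StringReader. "ab\ncd"))]
    ;; doall realizes the lazy seq while the reader is still open
    (doall (mapcat seq (line-seq reader)))))
```

One thing to be aware of: line-seq is built on .readLine, which strips line terminators, so the newline between "ab" and "cd" does not appear in the result. If line breaks matter for your processing, reading with .read directly may be the better fit.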
For other advice:
Use the clojure.java.io/reader function instead of directly creating the classes.
Even though the file reading has to stay inside the with-open clause, being able to test the actual processing code outside of the file reading code is quite useful.
When navigating multiple (potentially nested) sequences, consider using for. for does a nice job handling nested for loop type cases.
(take 100 (for [line (repeat "abc") char (seq line)] (prn char)))
Use prn for debugging output. It prints values in their readable form (strings in quotes, characters as \a), as compared to user output such as println produces, which hides certain details that users don't normally care about.
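To make the difference concrete, this sketch captures both kinds of output as strings using prn-str and println-str, the string-returning counterparts of prn and println. The names user-facing and readable are just illustrative; trim-newline removes the trailing line break so the results are easy to compare.

```clojure
(require '[clojure.string :as str])

;; println-style output: quotes and the backslash on the character are dropped
(def user-facing (str/trim-newline (println-str "abc" \d)))

;; prn-style output: the string keeps its quotes, the character keeps its backslash
(def readable (str/trim-newline (prn-str "abc" \d)))
```

With println you cannot tell the string "abc" from the symbol abc, or the character \d from the string "d"; with prn you can, which is why it tends to be more useful when debugging.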
I'm not familiar with Java or the read() method, so I won't be able to help you out with implementing it.
One first thought is to simplify by using slurp, which returns the text of the entire file as a string with just (slurp filename). However, this loads the whole file into memory at once, which may not be what you want.
Once you have a string of the entire file text, you can process any string character by character by simply treating it as though it were a sequence of characters. For example:
=> (doseq [c "abcd"]
     (println c))
a
b
c
d
=> nil
Or:
=> (remove #{\c} "abcd")
=> (\a \b \d)
You could use map or reduce or any sort of sequence-manipulating function. Note that after manipulating the string like a sequence, the result comes back as a sequence, but you can easily wrap the outer expression in (reduce str ...) to turn it back into a string at the end. Explicitly:
=> (reduce str (remove #{\c} "abcd"))
=> "abd"
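An alternative to reduce str that you will often see is apply str, which passes all the characters to str in a single call instead of building up intermediate strings. A minimal sketch (the name without-c is only for illustration):

```clojure
;; remove returns a lazy seq of characters; apply str joins them back into a string
(def without-c (apply str (remove #{\c} "abcd")))
```

Both forms give the same result for this input; which one to use is largely a matter of taste for short strings.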
As for the problem with your specific code, I think it lies with what words is: a vector of strings. When you print words, you are printing the whole vector. If you replaced the line (println words) at the end with (doseq [w words] (println w)), then it should work great.
Also, based on what you say you want your output to look like (a vector of all the different words in the file), you wouldn't want to only do (println w) at the base of your expression, because this will print values and return nil. You would simply want w. Also, you would want to replace your doseqs with fors, again to avoid returning nil.
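As a rough sketch of that suggestion, here is what returning the words (rather than printing them) could look like with for. The helper name all-words and the sample input are hypothetical, and the split regex #"\s+" is an assumption about how you want to break up lines.

```clojure
(require '[clojure.string :as str])

(defn all-words
  "Hypothetical helper: returns a vector of the whitespace-separated words in lines."
  [lines]
  ;; for returns a lazy seq of values instead of nil; vec turns it into a vector
  (vec (for [line lines
             w (str/split line #"\s+")]
         w)))

(def words (all-words ["the quick" "brown fox"]))
```

If "different" means you want each word only once, wrapping the for in distinct before vec would be one way to do it.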
Also, on improving your code: it looks generally great to me, but you could shorten it with a fun little trick. This builds on the first change I suggested above (but not the others, because I don't want to draw it all out explicitly):
;; split is assumed to be clojure.string/split, and seq a local binding holding the lines
(doseq [item seq]
  (let [words (split item #"\s")]
    (doseq [w words]
      (println w))))
;; Could be rewritten as...
(doseq [item seq
        :let [words (split item #"\s")]
        w words]
  (println w))