I am trying to generate random reals, integers, alphanumeric strings, and alphabetic strings, and write them to a file until the file size reaches 10 MB.
The code is a
You literally create billions of objects which you then quickly throw away. In this case, it's probably better to write the strings directly into the file instead of concatenating them with ''.join().
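A minimal sketch of the "write directly" idea, assuming Python 3 and a simplified workload of random lowercase letters only; the function name write_random_letters and its parameters are made up for illustration and are not from the question's code:

```python
import random
import string

def write_random_letters(path, target_bytes):
    """Write random lowercase letters to `path` until `target_bytes` is reached."""
    written = 0
    with open(path, "w", encoding="ascii") as f:
        while written < target_bytes:
            # Each piece goes straight to the file object;
            # no intermediate list or joined string is built.
            # In text mode, write() returns the number of characters written,
            # which equals the number of bytes for ASCII data.
            written += f.write(random.choice(string.ascii_lowercase))
    return written
```

Calling write_random_letters("bla.txt", 10 * 1024 * 1024) would aim for a 10 MB file; whether this beats building one big string first depends on the workload, so it is worth timing both.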
Two major reasons for observed "slowness":
You are calling write() about one million times. Create your data in a Python data structure first and call write() only once.
This is faster:
import random, string, time
t0 = time.time()
open("bla.txt", "wb").write(''.join(random.choice(string.ascii_lowercase) for i in xrange(10**7)))
d = time.time() - t0
print "duration: %.2f s." % d
Output: duration: 7.30 s.
Now the program spends most of its time generating the data, i.e. in random stuff. You can easily see that by replacing random.choice(string.ascii_lowercase) with e.g. "a". Then the measured time drops to below one second on my machine.
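One way to check this yourself is a small timeit comparison. This is only a sketch, assuming Python 3; the helper names random_letters and constant_letters are invented here, and N is kept well below 10**7 so it finishes quickly. Absolute numbers will differ from machine to machine.

```python
import random
import string
import timeit

N = 10**5  # much smaller than 10**7 so the comparison runs quickly

def random_letters():
    # Data generation dominated by random.choice()
    return "".join(random.choice(string.ascii_lowercase) for _ in range(N))

def constant_letters():
    # Same loop structure, but without any call into the random module
    return "".join("a" for _ in range(N))

t_random = timeit.timeit(random_letters, number=3)
t_constant = timeit.timeit(constant_letters, number=3)
print("random.choice: %.3f s, constant 'a': %.3f s" % (t_random, t_constant))
```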
And if you want to get even closer to seeing how fast your machine really is when writing to disk, use Python's fastest (?) way to generate largish data before writing it to disk:
>>> t0=time.time(); chunk="a"*10**7; open("bla.txt", "wb").write(chunk); d=time.time()-t0; print "duration: %.2f s." % d
duration: 0.02 s.
The while loop under main calls generate_alphanumeric, which chooses several characters out of (freshly generated random) strings composed of twelve ASCII letters and twelve digits. That's basically the same as choosing randomly either a random letter or a random digit twelve times. That's your main bottleneck. This version will make your code one order of magnitude faster:
def generate_alphanumeric(self):
    res = ''
    for i in range(12):
        if random.randrange(2):
            res += random.choice(string.ascii_lowercase)
        else:
            res += random.choice(string.digits)
    return res
I'm sure it can be improved upon. I suggest you take your profiler for a spin.
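For the profiling step, a rough sketch with the standard library's cProfile might look like the following. It assumes Python 3 and uses a module-level copy of generate_alphanumeric (without self) plus a made-up build_data helper as a stand-in for the real main loop, which is not shown here.

```python
import cProfile
import io
import pstats
import random
import string

def generate_alphanumeric():
    # Module-level stand-in for the method above (no self needed here).
    res = ''
    for i in range(12):
        if random.randrange(2):
            res += random.choice(string.ascii_lowercase)
        else:
            res += random.choice(string.digits)
    return res

def build_data(n=10000):
    # Hypothetical workload: n space-separated 12-character tokens.
    return ' '.join(generate_alphanumeric() for _ in range(n))

profiler = cProfile.Profile()
profiler.enable()
data = build_data()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The functions near the top of the cumulative-time listing are the ones worth optimizing first.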