How to make a generator callable?

后端未结

关注

 3  625

I\'m trying to create a dataset from a CSV file with 784-bit long rows. Here\'s my code:

import tensorflow as tf

f = open(\"test.csv\", \"r\")
csvreader = c


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  有刺的猬        
                
              
                            
                2021-01-04 10:21
              
            
            
                                                                       
From the docs, which you linked:


  The generator argument must be a callable object that returns an
  object that support the iter() protocol (e.g. a generator function)


This means you should be able to do something like this:

import tensorflow as tf
import csv

with open("test.csv", "r") as f:
    csvreader = csv.reader(f)
    gen = lambda: (row for row in csvreader)
    ds = tf.data.Dataset()
    ds.from_generator(gen, [tf.uint8]*28**2)


In other words, the function you pass must produce a generator when called. This is easy to achieve when making it an anonymous function (a lambda).

Alternatively try this, which is closer to how it is done in the docs:

import tensorflow as tf
import csv


def read_csv(file_name="test.csv"):
    with open(file_name) as f:
        reader = csv.reader(f)
        for row in reader:
            yield row

ds = tf.data.Dataset.from_generator(read_csv, [tf.uint8]*28**2)


(If you need a different file name than whatever default you set, you can use functools.partial(read_csv, file_name="whatever.csv").)

The difference is that the read_csv function returns the generator object when called, whereas what you constructed is already the generator object and equivalent to doing:

gen = read_csv()
ds = tf.data.Dataset.from_generator(gen, [tf.uint8]*28**2)  # does not work

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  被撕碎了的回忆        
                
              
                            
                2021-01-04 10:27
              
            
            
                                                                       
Yuck, two years later... But hey! Another solution! :D 

This might not be the cleanest answer but for generators that are more complicated, you can use a decorator. I made a generator that yields two dictionaries, for example:

>>> train,val = dataloader("path/to/dataset")
>>> x,y = next(train)
>>> print(x)
{"data": [...], "filename": "image.png"}

>>> print(y)
{"category": "Dog", "category_id": 1, "background": "park"}


When I tried using the from_generator, it gave me the error:

>>> ds_tf = tf.data.Dataset.from_generator(
    iter(mm),
    ({"data":tf.float32, "filename":tf.string},
    {"category":tf.string, "category_id":tf.int32, "background":tf.string})
    )
TypeError: `generator` must be callable.


But then I wrote a decorating function

>>> def make_gen_callable(_gen):
        def gen():
            for x,y in _gen:
                 yield x,y
        return gen
>>> train_ = make_gen_callable(train)


>>> train_ds = tf.data.Dataset.from_generator(
    train_,
    ({"data":tf.float32, "filename":tf.string},
    {"category":tf.string, "category_id":tf.int32, "background":tf.string})
    )

>>> for x,y in train_ds:
        break

>>> print(x)
{'data': <tf.Tensor: shape=(320, 480), dtype=float32, ... >,
 'filename': <tf.Tensor: shape=(), dtype=string, ...> 
}

>>> print(y)
{'category': <tf.Tensor: shape=(), dtype=string, numpy=b'Dog'>,
 'category_id': <tf.Tensor: shape=(), dtype=int32, numpy=1>,
 'background': <tf.Tensor: shape=(), dtype=string, numpy=b'Living Room'>
}


But now, note that in order to iterate train_, one has to call it 

>>> for x,y in train_():
        do_stuff(x,y)
        ...

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  傲寒        
                
              
                            
                2021-01-04 10:28
              
            
            
                                                                       
The generator argument (perhaps confusingly) should not actually be a generator, but a callable returning an iterable (for example, a generator function). Probably the easiest option here is to use a lambda. Also, a couple of errors: 1) tf.data.Dataset.from_generator is meant to be called as a class factory method, not from an instance 2) the function (like a few other in TensorFlow) is weirdly picky about parameters, and it wants you to give the sequence of dtypes and each data row as tuples (instead of the lists returned by the CSV reader), you can use for example map for that:

import csv
import tensorflow as tf

with open("test.csv", "r") as f:
    csvreader = csv.reader(f)
    ds = tf.data.Dataset.from_generator(lambda: map(tuple, csvreader),
                                        (tf.uint8,) * (28 ** 2))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复