Using Keras ImageDataGenerator in a regression model

别跟我提以往 2020-12-30 02:29

I want to use the flow_from_directory method of the ImageDataGenerator to generate training data for a ...

4 answers
  • 2020-12-30 02:54

    I think that organizing your data differently, using a DataFrame (without necessarily moving your images to new locations), will allow you to run a regression model. In short, create columns in your DataFrame containing the file path of each image and its target value. This lets your generator keep regression values and images properly synced even when you shuffle your data at each epoch.

    Here is an example showing how to link images with binomial targets, multinomial targets and regression targets just to show that "a target is a target is a target" and only the model might change:

    df['path'] = df.object_id.apply(file_path_from_db_id)
    df
    
           object_id   bi  multi                                    path     target
    index                                                               
    0         461756  dog  white    /path/to/imgs/756/61/blah_461756.png   0.166831
    1        1161756  cat  black   /path/to/imgs/756/61/blah_1161756.png   0.058793
    2        3303651  dog  white   /path/to/imgs/651/03/blah_3303651.png   0.582970
    3        3367756  dog   grey   /path/to/imgs/756/67/blah_3367756.png  -0.421429
    4        3767756  dog   grey   /path/to/imgs/756/67/blah_3767756.png  -0.706608
    5        5467756  cat  black   /path/to/imgs/756/67/blah_5467756.png  -0.415115
    6        5561756  dog  white   /path/to/imgs/756/61/blah_5561756.png  -0.631041
    7       31255756  cat   grey  /path/to/imgs/756/55/blah_31255756.png  -0.148226
    8       35903651  cat  black  /path/to/imgs/651/03/blah_35903651.png  -0.785671
    9       44603651  dog  black  /path/to/imgs/651/03/blah_44603651.png  -0.538359
    10      49557622  cat  black  /path/to/imgs/622/57/blah_49557622.png  -0.295279
    11      58164756  dog   grey  /path/to/imgs/756/64/blah_58164756.png   0.407096
    12      95403651  cat  white  /path/to/imgs/651/03/blah_95403651.png   0.790274
    13      95555756  dog   grey  /path/to/imgs/756/55/blah_95555756.png   0.060669
    

    I describe how to do this in great detail with examples here:

    https://techblog.appnexus.com/a-keras-multithreaded-dataframe-generator-for-millions-of-image-files-84d3027f6f43
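The idea of keeping paths and targets in the same row can be sketched without any Keras machinery. The following is a minimal illustration, not the generator from the linked post: `paired_batches`, `rows` and `load_image` are hypothetical names, and `load_image` stands in for whatever function turns a path into an image array. Shuffling row indices, rather than paths and targets separately, is what keeps each image tied to its target.

```python
import random


def paired_batches(rows, batch_size, load_image, seed=0):
    """Yield (images, targets) batches forever, reshuffling at every epoch.

    rows: list of (path, target) pairs, e.g. list(zip(df['path'], df['target'])).
    load_image: hypothetical callable that turns a path into an image array.
    """
    rng = random.Random(seed)
    order = list(range(len(rows)))
    while True:
        # Shuffle row indices, so every path stays attached to its own target.
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = [rows[i] for i in order[start:start + batch_size]]
            images = [load_image(path) for path, _ in batch]
            targets = [target for _, target in batch]
            yield images, targets
```

For training with Keras, the two lists would typically be converted to NumPy arrays before being yielded.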

  • 2020-12-30 03:01

    At the moment (the newest version of Keras as of January 21st, 2017), flow_from_directory can only work in the following manner:

    1. You need to have your directories structured in the following manner:

      directory with images\
          1st label\
              1st picture from 1st label
              2nd picture from 1st label
              3rd picture from 1st label
              ...
          2nd label\
              1st picture from 2nd label
              2nd picture from 2nd label
              3rd picture from 2nd label
              ...
          ...
      
    2. flow_from_directory returns batches of a fixed size in the format (picture, label).

    So, as you can see, it can only be used for classification, and all the options described in the documentation only specify how the class is provided to your classifier. But there is a neat hack that makes flow_from_directory useful for a regression task:

    1. You need to structure your directory in the following manner:

      directory with images\
          1st value (e.g. -0.95423)\
              1st picture from 1st value
              2nd picture from 1st value
              3rd picture from 1st value
              ...
          2nd value (e.g. -0.9143242)\
              1st picture from 2nd value
              2nd picture from 2nd value
              3rd picture from 2nd value
              ...
         ...
      
    2. You also need a list list_of_values = [1st value, 2nd value, ...]. Then your generator is defined in the following manner:

      def regression_flow_from_directory(flow_from_directory_gen, list_of_values):
          for x, y in flow_from_directory_gen:
              yield x, list_of_values[y]
      

    It's crucial for flow_from_directory_gen to have class_mode='sparse' to make this work. Of course, this is a little cumbersome, but it works (I used this solution :) )
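One detail worth checking with this hack is the order of list_of_values: position i has to hold the value of the directory that the generator labels as class i. As far as I know, Keras assigns class indices by sorting the sub-directory names as strings, which is not the same as numeric order for negative numbers. A minimal sketch, assuming each sub-directory is named exactly after its value (the helper name `values_from_class_indices` is hypothetical), builds the list from the mapping that the generator exposes as `class_indices`:

```python
def values_from_class_indices(class_indices):
    """Return list_of_values such that list_of_values[index] == float(dir name).

    class_indices: dict mapping sub-directory name -> integer class index,
    in the form exposed by the generator's `class_indices` attribute.
    Assumes every directory name parses as a float.
    """
    list_of_values = [0.0] * len(class_indices)
    for dir_name, index in class_indices.items():
        list_of_values[index] = float(dir_name)
    return list_of_values
```

It would be used as `list_of_values = values_from_class_indices(flow_from_directory_gen.class_indices)`, so the list follows whatever ordering the generator actually chose.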

  • 2020-12-30 03:01

    With Keras 2.2.4 you can use ".flow_from_dataframe", which solves what you want to do by letting you flow images from a directory for regression problems. Store all your images in one folder, load a dataframe containing the image IDs in one column and the regression scores (labels) in the other, and set "class_mode='other'" in ".flow_from_dataframe".

    Here is an example where the images are in "image_dir" and the dataframe with the image IDs and the regression scores is loaded with pandas from "train_file":

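    import pandas as pd
    from keras.preprocessing.image import ImageDataGenerator

    # Space-delimited file without a header row: image file name, then regression score
    train_label_df = pd.read_csv(train_file, delimiter=' ', header=None, names=['id', 'score'])

    # Augmentation settings applied on the fly to each training image
    train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                       fill_mode="nearest", zoom_range=0.2,
                                       width_shift_range=0.2, height_shift_range=0.2,
                                       rotation_range=30)

    # class_mode="other" passes the y_col values through unchanged as regression targets;
    # Keras expects target_size as (height, width)
    train_generator = train_datagen.flow_from_dataframe(dataframe=train_label_df, directory=image_dir,
                                                        x_col="id", y_col="score", has_ext=True,
                                                        class_mode="other", target_size=(img_height, img_width),
                                                        batch_size=bs)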
    
  • 2020-12-30 03:08

    There's just one glitch in the accepted answer that I would like to point out. The above code fails with an error message like:

    TypeError: only integer scalar arrays can be converted to a scalar index
    

    This is because y is an array. The fix is simple:

    def regression_flow_from_directory(flow_from_directory_gen,
                list_of_values):
        for x, y in flow_from_directory_gen:
            values = [list_of_values[y[i]] for i in range(len(y))]
            yield x, values
    

    The method to generate the list_of_values can be found in https://stackoverflow.com/a/47944082/4082092
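The same wrapper can also be written with NumPy indexing instead of a Python loop. This is a sketch under two assumptions that are not stated in the thread: that some Keras versions return the sparse labels as floats (hence the explicit cast to integers), and that the training loop prefers the targets as a NumPy array rather than a list.

```python
import numpy as np


def regression_flow_from_directory(flow_from_directory_gen, list_of_values):
    """Wrap a class_mode='sparse' generator so it yields regression targets."""
    # Lookup table: position i holds the regression value of class index i.
    lookup = np.asarray(list_of_values, dtype="float32")
    for x, y in flow_from_directory_gen:
        # Cast labels to integers so they are valid indices, then map the whole batch.
        yield x, lookup[np.asarray(y).astype("int64")]
```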
