How to provide a reproducible copy of your DataFrame with to_clipboard()

余生分开走 2020-11-21 06:05

2018-09-18_reproducible_dataframe.ipynb

  • This question was previously marked as a duplicate of How to make good reproducible pandas examples.
    • Go to th
2 Answers
  • 2020-11-21 06:20

    First: Do not add your data as an answer to the question

    How to quickly provide sample data from a pandas DataFrame

    • There is more than one way to answer this question. However, this answer isn't meant as an exhaustive solution. It provides the simplest method possible.
    • For the curious, there are other more verbose solutions provided on Stack Overflow.
    1. Provide a link to a shareable dataset (maybe on GitHub or a shared file on Google). This is particularly useful if it's a large dataset and the objective is to optimize some method. The drawback is that the data may no longer be available in the future, which reduces the benefit of the post.
      • Data must be provided in the question, but can be accompanied by a link to a more extensive dataset.
      • Do not post only a link or an image of the data.
    2. Provide the output of df.head(10).to_clipboard(sep=',', index=True)

    Code:

    Provide the output of pandas.DataFrame.to_clipboard

    df.head(10).to_clipboard(sep=',', index=True)
    
    • If you have a multi-index DataFrame add a note, telling which columns are the indices.
    • Note: when the previous line of code is executed, no output will appear.
      • The result of the code is now on the clipboard.
    • Paste the clipboard into a code block in your Stack Overflow question
    ,a,b
    2020-07-30,2,4
    2020-07-31,1,5
    2020-08-01,2,2
    2020-08-02,9,8
    2020-08-03,4,0
    2020-08-04,3,3
    2020-08-05,7,7
    2020-08-06,7,0
    2020-08-07,8,4
    2020-08-08,3,2
    
    • Someone trying to answer your question can copy this text to their clipboard and then load it with:
    df = pd.read_clipboard(sep=',')
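
The round trip can be sketched without a clipboard, which is handy in environments where no clipboard backend is available. This is a minimal sketch under stated assumptions: the sample frame is rebuilt by hand from the pasted text above, and `to_csv` / `read_csv` on a string stand in for `to_clipboard` / `read_clipboard`, on the understanding that `to_clipboard(sep=',')` copies the same comma-separated text that `to_csv` writes. The `index_col=0` and `parse_dates=True` arguments are an addition not in the original answer; they should restore the first column as a datetime index rather than leaving it as an unnamed column.

```python
import io

import pandas as pd

# Hypothetical sample frame, rebuilt from the pasted text above
df = pd.DataFrame(
    {"a": [2, 1, 2, 9, 4, 3, 7, 7, 8, 3], "b": [4, 5, 2, 8, 0, 3, 7, 0, 4, 2]},
    index=pd.date_range("2020-07-30", periods=10),
)

# Stand-in for df.head(10).to_clipboard(sep=',', index=True)
text = df.head(10).to_csv(sep=",", index=True)

# Stand-in for pd.read_clipboard(sep=',', index_col=0, parse_dates=True)
restored = pd.read_csv(io.StringIO(text), sep=",", index_col=0, parse_dates=True)
```

With a working clipboard, the same keyword arguments can be passed to `pd.read_clipboard`, which forwards them to the CSV reader.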
    

    Locations of the dataframe other than .head(10)

    • Specify a section of the dataframe with the .iloc property
    • The following example selects the rows at positions 3 through 11 (the stop value 12 is excluded) and all the columns
    df.iloc[3:12, :].to_clipboard(sep=',')
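
To see which rows such a slice covers, here is a small sketch on a hypothetical 20-row frame (the frame and the name `subset` are illustrative, not from the original answer). It prints the text with `to_csv` instead of copying it, so it also runs where no clipboard is available.

```python
import pandas as pd

# Hypothetical 20-row frame with a default integer index
df = pd.DataFrame({"a": range(20), "b": range(20, 40)})

# Positional slice: starts at position 3, stops before position 12
subset = df.iloc[3:12, :]

# Stand-in for subset.to_clipboard(sep=','): show the text instead of copying it
print(subset.to_csv(sep=","))
```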
    

    Additional References for pd.read_clipboard

    • Specify Multi-Level columns using pd.read_clipboard?
    • How do you handle column names having spaces in them when using pd.read_clipboard?
    • How to handle custom named index when copying a dataframe using pd.read_clipboard?

    Google Colab Users

    • .to_clipboard() won't work
    • Use .to_dict() to copy your dataframe
    # if you have a datetime column, convert it to a str
    df['date'] = df['date'].astype('str')
    
    # if you have a datetime index, convert it to a str
    df.index = df.index.astype('str')
    
    # output to a dict
    df.head(10).to_dict(orient='index')
    
    # which will look like
    {'2020-07-30': {'a': 2, 'b': 4},
     '2020-07-31': {'a': 1, 'b': 5},
     '2020-08-01': {'a': 2, 'b': 2},
     '2020-08-02': {'a': 9, 'b': 8},
     '2020-08-03': {'a': 4, 'b': 0},
     '2020-08-04': {'a': 3, 'b': 3},
     '2020-08-05': {'a': 7, 'b': 7},
     '2020-08-06': {'a': 7, 'b': 0},
     '2020-08-07': {'a': 8, 'b': 4},
     '2020-08-08': {'a': 3, 'b': 2}}
    
    # copy the previous dict and paste into a code block on SO
    # the dict can be converted to a dataframe with 
    # df = pd.DataFrame.from_dict(d, orient='index')  # d is the name of the dict
    # convert the datetime column or index back to datetime
    
    • For a more thorough answer using .to_dict()
      • How to efficiently build and share a sample dataframe?
      • How to make good reproducible pandas examples
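
The full `.to_dict()` round trip, including the conversion back to datetime, might look like the following sketch. The three-row frame and the names `shared`, `d`, and `restored` are assumptions for illustration; the conversion works on a copy so the original frame keeps its datetime index.

```python
import pandas as pd

# Hypothetical frame with a datetime index
df = pd.DataFrame(
    {"a": [2, 1, 2], "b": [4, 5, 2]},
    index=pd.date_range("2020-07-30", periods=3),
)

# Asker's side: convert the datetime index to str on a copy, then export
shared = df.copy()
shared.index = shared.index.astype("str")
d = shared.to_dict(orient="index")  # this dict is what gets pasted into the question

# Answerer's side: rebuild the frame and restore the datetime index
restored = pd.DataFrame.from_dict(d, orient="index")
restored.index = pd.to_datetime(restored.index)
```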
  • 2020-11-21 06:35

    If you do something like print(df.head(20)) and paste the output as a code block, then others can use pd.read_clipboard() to load the data into a dataframe. This approach works for the vast majority of questions posted under the pandas tag, but it fails for questions involving a MultiIndex.
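
A rough sketch of why this works: `pd.read_clipboard()` splits on whitespace by default (`sep=r'\s+'`), which matches the layout of a printed frame. The example below uses a hypothetical three-row frame, takes `to_string()` as a stand-in for the printed output (for a small frame the two should match), and reads it back with `read_csv` on a string instead of the clipboard.

```python
import io

import pandas as pd

# Hypothetical small frame with a default integer index
df = pd.DataFrame({"a": [2, 1, 2], "b": [4, 5, 2]})

# Stand-in for the text that print(df.head(20)) would show
printed = df.head(20).to_string()

# Stand-in for pd.read_clipboard(), which uses a whitespace separator by default
restored = pd.read_csv(io.StringIO(printed), sep=r"\s+")
```

Because the header row has one field fewer than the data rows, the parser treats the first column as the index. Column names or values that contain spaces break this whitespace splitting, which is one reason the comma-separated approach is more robust.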
