How to make a progress bar on a web page for pandas operation

Posted by 人走茶凉 on 2020-01-22 13:57:21

Question


I have been googling for a while and couldn't figure out a way to do this. I have a simple Flask app which takes a CSV file, reads it into a Pandas dataframe, converts it and outputs it as a new CSV file. I have managed to upload and convert it successfully with this HTML:

<div class="container">
  <form method="POST" action="/convert" enctype="multipart/form-data">
    <div class="form-group">
      <br />
      <input type="file" name="file">
      <input type="submit" name="upload"/>
    </div>
  </form>
</div>

After I click submit, the conversion runs in the background for a while and automatically triggers a download once it's done. The code that takes result_df and triggers the download looks like this:

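from flask import Flask, request, make_response
import pandas as pd

app = Flask(__name__)

@app.route('/convert', methods=["POST"])
def convert():
  if request.method == 'POST':
    # Read uploaded file to df
    input_csv_f = request.files['file']
    input_df = pd.read_csv(input_csv_f)
    # TODO: Add progress bar for pd_convert
    # pd_convert is my own conversion function, defined elsewhere
    result_df = pd_convert(input_df)
    if result_df is not None:
      resp = make_response(result_df.to_csv())
      resp.headers["Content-Disposition"] = "attachment; filename=export.csv"
      resp.headers["Content-Type"] = "text/csv"
      return resp
  # A Flask view must not return None, so report the failure explicitly
  return "Conversion failed", 500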

I'd like to add a progress bar to pd_convert, which is essentially a pandas apply operation. I found that tqdm now works with pandas and provides a progress_apply method to use instead of apply, but I'm not sure whether it is relevant for a progress bar on a web page. I guess it should be, since it works in Jupyter notebooks. How do I add a progress bar for pd_convert() here?

The ultimate result I want is:

  1. User clicks upload and selects the CSV file from their filesystem
  2. User clicks submit
  3. The progress bar starts to run
  4. Once the progress bar reaches 100%, a download is triggered

1 and 2 are done now. Then the next question is how to trigger the download. For now, my convert function triggers the download with no problem because the response is formed with a file. If I want to render the page I form a response with return render_template(...). Since I can only have one response, is it possible to have 3 and 4 with only one call to /convert?

Not a web developer, still learning about the basics. Thanks in advance!


====EDIT====

I tried the example here with some modifications. I get the progress from the row index in a for loop over the dataframe and put it in Redis. The client reads the progress as an event stream from a new endpoint, /progress, which in turn reads it from Redis. Something like this:

@app.route('/progress')
def progress():
  """Get percentage progress for the dataframe process"""
  r = redis.StrictRedis(
    host=redis_host, port=redis_port, password=redis_password, decode_responses=True)
  r.set("progress", str(0))
  # TODO: Problem, 2nd submit doesn't clear progress to 0%. How to make independent progress for each client and clear to 0% on each submit
  def get_progress():

    p = int(r.get("progress"))
    while p <= 100:
      p = int(r.get("progress"))
      p_msg = "data:" + str(p) + "\n\n"
      yield p_msg
      logging.info(p_msg)
      if p == 100:
        r.set("progress", str(0))
      time.sleep(1)

  return Response(get_progress(), mimetype='text/event-stream')

It is currently working but with some issues. The reason is definitely my lack of understanding in this solution.

Issues:

  • I need the progress to be reset to 0 every time the submit button is pressed. I tried resetting it in several places but haven't found a version that works. It's definitely related to my lack of understanding of how the stream works. Right now it only resets when I refresh the page.
  • How do I handle concurrent requests, i.e. the Redis race condition? If multiple users make requests at the same time, the progress should be independent for each of them. I'm thinking about generating a random job_id for each submit event and making it the key in Redis. Since I don't need the entry once a job is done, I would just delete it afterwards.
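
The job_id idea from the second issue could look roughly like the sketch below. It is only an illustration under stated assumptions: InMemoryStore is a hypothetical stand-in that mimics the get/set/delete calls of a Redis client created with decode_responses=True, and the names start_job, read_progress, finish_job and the "progress:" key prefix are made up for this example.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a Redis client (string keys and values)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        # Like a Redis client, return None for a missing key.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def progress_key(job_id):
    """Namespace the key so every job has its own progress entry."""
    return "progress:" + job_id


def start_job(store):
    """Create a random job id for one submit event and set its progress to 0."""
    job_id = uuid.uuid4().hex
    store.set(progress_key(job_id), 0)
    return job_id


def read_progress(store, job_id):
    """Return the job's progress as an int, or None if the entry does not exist."""
    value = store.get(progress_key(job_id))
    return None if value is None else int(value)


def finish_job(store, job_id):
    """Remove the entry once the job is done, since it is no longer needed."""
    store.delete(progress_key(job_id))
```

Because every submit gets a fresh key that starts at 0, this would also take care of the reset problem in the first issue, provided the client is told its job_id and asks for that job's progress specifically.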

I feel my missing part is the understanding of text/event-stream. Feeling I'm close to a working solution. Please share your opinion on what is the "proper" way to do this. I'm just guessing and trying to put together something that works with my very limited understanding.


Answer 1:


OK, I narrowed down the problems I was missing and figured it out. The concepts I needed include

Backend

  • Redis as a key-value database to store the progress which can be queried by endpoint /progress for an event stream (HTML5)
  • Server-Sent Event (SSE) for streaming the progress: text/event-stream MIME type response
  • Python generator in Flask app for SSE
  • Write progress (row index being processed) of a for loop on the Pandas dataframe to Redis
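
To make the SSE and generator points above concrete, here is a minimal sketch that does not depend on Flask or Redis. Each SSE message is a "data:" line followed by a blank line, and a generator yields one message per poll. The names format_sse and progress_events, and the read_progress callable, are assumptions for illustration only; in the real app read_progress would be whatever reads the value from Redis.

```python
import time


def format_sse(data):
    """Frame one payload as a Server-Sent Events message."""
    return "data: {}\n\n".format(data)


def progress_events(read_progress, poll_seconds=1.0, sleep=time.sleep):
    """Yield SSE messages until the reported progress reaches 100.

    read_progress is a hypothetical zero-argument callable that returns the
    current percentage (for example, a function reading a Redis key).
    sleep is injectable so the loop can be exercised without waiting.
    """
    while True:
        # Clamp to 0..100 so the progress bar never receives a bad width.
        percent = max(0, min(100, int(read_progress())))
        yield format_sse(percent)
        if percent >= 100:
            break
        sleep(poll_seconds)
```

A generator like this can be handed to a Flask Response with the text/event-stream MIME type. It stops by itself after sending 100, which is the value the client waits for before closing the stream.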

Frontend

  • Open the event stream: trigger SSE from the client side by an HTML button
  • Close the event stream: once the event data reaches 100%
  • Update the progress bar with the event stream dynamically using jQuery

The sample HTML

  <script>
  function getProgress() {
    var source = new EventSource("/progress");
    source.onmessage = function(event) {
      $('.progress-bar').css('width', event.data+'%').attr('aria-valuenow', event.data);
      $('.progress-bar-label').text(event.data+'%');

      // Event source closed after hitting 100%
      if(event.data == 100){
        source.close()
      }
    }
  }
  </script>

  <body>
    <div class="container">
      ...
      <form method="POST" action="/autoattr" enctype="multipart/form-data">
        <div class="form-group">
        ...
          <input type="file" name="file">
          <input type="submit" name="upload" onclick="getProgress()" />
        </div>
      </form>

      <div class="progress" style="width: 80%; margin: 50px;">
        <div class="progress-bar progress-bar-striped active"
          role="progressbar" aria-valuenow="0" aria-valuemin="0" aria-valuemax="100" style="width: 0%">
          <span class="progress-bar-label">0%</span>
        </div>
      </div>
    </div>
  </body>

Sample backend Flask code

import time

import redis
from flask import Flask, Response

app = Flask(__name__)

redis_host = "localhost"
redis_port = 6379
redis_password = ""
r = redis.StrictRedis(
  host=redis_host, port=redis_port, password=redis_password, decode_responses=True)

@app.route('/progress')
def progress():
  """Get percentage progress for auto attribute process"""
  r.set("progress", str(0))
  def progress_stream():
    p = int(r.get("progress"))
    while p < 100:
      p = int(r.get("progress"))
      p_msg = "data:" + str(p) + "\n\n"
      yield p_msg
      # Client closes EventSource on 100%, gets reopened when `submit` is pressed
      if p == 100:
        r.set("progress", str(0))
      time.sleep(1)

  return Response(progress_stream(), mimetype='text/event-stream')

The rest is the Pandas for loop, which writes its progress to Redis.
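
Since that loop is not shown, here is a rough sketch of what it could look like. This is not my exact code: convert_with_progress and RecordingStore are hypothetical names, the store only needs a set(key, value) method (which a Redis client has), and rows can be any iterable, for example df.itertuples(index=False) for a dataframe.

```python
class RecordingStore:
    """Hypothetical stand-in for a Redis client; remembers every value written."""

    def __init__(self):
        self.writes = []

    def set(self, key, value):
        self.writes.append((key, value))


def convert_with_progress(rows, convert_row, store, key="progress"):
    """Apply convert_row to each row and record the integer percentage in store."""
    rows = list(rows)
    total = len(rows)
    results = []
    store.set(key, 0)
    last_written = 0
    for index, row in enumerate(rows, start=1):
        results.append(convert_row(row))
        percent = index * 100 // total
        # Write only when the integer percentage changes, to limit store traffic.
        if percent != last_written:
            store.set(key, percent)
            last_written = percent
    if total == 0:
        # Nothing to process: report completion so the client can stop waiting.
        store.set(key, 100)
    return results
```

With four rows, the store would receive 0, 25, 50, 75 and 100 in that order. Writing only on a change of the integer percentage keeps the number of Redis writes at about a hundred per job, however many rows the dataframe has.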

I pieced together a lot of the results from hours of Googling so I feel it's best to document here for people who also need this basic feature: add a progress bar in a Flask web app for Pandas dataframe processing.

Some useful references

• https://medium.com/code-zen/python-generator-and-html-server-sent-events-3cdf14140e56

• https://codeburst.io/polling-vs-sse-vs-websocket-how-to-choose-the-right-one-1859e4e13bd9

• What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet?



Source: https://stackoverflow.com/questions/55658488/how-to-make-a-progress-bar-on-a-web-page-for-pandas-operation
