File conversion of an HTML-published Jupyter Notebook to an executable Jupyter .ipynb file

Submitted by 孤人 on 2020-01-06 09:57:39

Question


I have HTML-published renditions of Jupyter Notebooks that I need to convert in bulk back to executable Jupyter .ipynb files. I have found many discussions and approaches for going the other way, publishing from a Jupyter .ipynb file to an HTML file. The "File..." menu in every Jupyter Notebook web client includes a function to publish to HTML, or "Download As..." with HTML as one of many options. However, there is no "Import into Jupyter" or "Import from HTML" function. Am I missing something in this scenario? This is not that uncommon a need.

Short of writing my own webscraper to scrape the HTML-published version of the Jupyter NB, and then programmatically creating the JSON NB structure of an IPython NB file format, is there an easier way to do this?

I've tried the following code from the question "IPython notebook: Convert an HTML notebook to ipynb" with decent results, but it only captures and converts code cells and markdown cells.

from bs4 import BeautifulSoup
import json
import urllib.request
url = 'http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'
response = urllib.request.urlopen(url)
#  for local html file
# response = open("/Users/note/jupyter/notebook.html")
text = response.read()

soup = BeautifulSoup(text, 'lxml')
# see some of the html
print(soup.div)
dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}
for d in soup.find_all("div"):
    if 'class' in d.attrs.keys():
        for clas in d.attrs["class"]:
            if clas in ["text_cell_render", "input_area"]:
                # code cell
                if clas == "input_area":
                    cell = {}
                    cell['metadata'] = {}
                    cell['outputs'] = []
                    cell['source'] = [d.get_text()]
                    cell['execution_count'] = None
                    cell['cell_type'] = 'code'
                    dictionary['cells'].append(cell)

                else:
                    cell = {}
                    cell['metadata'] = {}

                    cell['source'] = [d.decode_contents()]
                    cell['cell_type'] = 'markdown'
                    dictionary['cells'].append(cell)
# write the notebook and close the file when done
with open('notebook.ipynb', 'w') as f:
    json.dump(dictionary, f)

It doesn't convert the entire notebook, nor does it do it in batch mode.
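
For the batch-mode part, one possible direction is to wrap the same extraction in a function and loop over a folder of HTML files. The following is only a rough sketch under several assumptions: the HTML uses the same "input_area" and "text_cell_render" div classes as the code above, the names CellExtractor, html_to_notebook and convert_directory are made up for illustration, and it uses the standard library's html.parser instead of BeautifulSoup so it has no third-party dependency. It still recovers only code and markdown cells (no outputs), and markdown cells are reduced to their plain text rather than their rendered HTML.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class CellExtractor(HTMLParser):
    """Collect the text of input_area / text_cell_render divs, in document order."""

    # Assumed class names, taken from the snippet in the question.
    TARGETS = {"input_area": "code", "text_cell_render": "markdown"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []      # finished (cell_type, text) pairs
        self._kind = None    # type of the cell currently being captured
        self._depth = 0      # div nesting depth inside that cell
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._kind is not None:
            self._depth += 1  # nested div inside a captured cell
            return
        for name in (dict(attrs).get("class") or "").split():
            if name in self.TARGETS:
                self._kind = self.TARGETS[name]
                self._depth = 1
                self._buffer = []
                break

    def handle_endtag(self, tag):
        if tag != "div" or self._kind is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the captured cell's own div just closed
            self.cells.append((self._kind, "".join(self._buffer).strip("\n")))
            self._kind = None

    def handle_data(self, data):
        if self._kind is not None:
            self._buffer.append(data)


def html_to_notebook(html_text):
    """Build an nbformat-4 style dict from one HTML-published notebook."""
    parser = CellExtractor()
    parser.feed(html_text)
    parser.close()
    cells = []
    for kind, text in parser.cells:
        cell = {"cell_type": kind, "metadata": {},
                "source": text.splitlines(keepends=True)}
        if kind == "code":
            cell["execution_count"] = None
            cell["outputs"] = []  # outputs are not recovered in this sketch
        cells.append(cell)
    return {"nbformat": 4, "nbformat_minor": 1, "metadata": {}, "cells": cells}


def convert_directory(src_dir, dst_dir):
    """Convert every *.html file in src_dir to a .ipynb file in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for html_path in sorted(Path(src_dir).glob("*.html")):
        notebook = html_to_notebook(html_path.read_text(encoding="utf-8"))
        out_path = dst / (html_path.stem + ".ipynb")
        out_path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling convert_directory("html_notebooks", "converted") (both folder names are placeholders) would write one .ipynb per HTML file. Whether the result opens cleanly depends on how the HTML was originally exported, so it is worth checking a few converted notebooks by hand.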

Source: https://stackoverflow.com/questions/54194442/file-conversion-of-html-published-jupyter-notebook-to-executable-jupyter-ipynb-f
