问题
Background
I want to show part of my bookmarks on my Hugo website. The bookmarks from Firefox can be saved in JSON format, this is the source. The result should represent the nested structure somehow, in a format of a nested list, treeview or accordion. The source files of contents on the website are written in markdown. I want to generate a markdown file from the JSON input.
As I searched for possible solutions:
- treeview or accordion: HTML, CSS and Javascript needed. I could not nest accordions with the
<details>
tag. Also, seems like overkill at the moment. - unordered list: can be done with bare markdown.
I chose to generate an unordered nested list from JSON. I would like to do this with R.
Input/output
Input sample: https://gist.github.com/hermanp/c01365b8f4931ea7ff9d1aee1cbbc391
Preferred output (indentation with two spaces):
- Info
- Python
- [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- [CodePen](https://codepen.io/)
- [JavaScript](https://www.javascript.com/)
- [CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- [Client-Side Web Development](https://info340.github.io/)
- [Stack Overflow](https://stackoverflow.com/)
- [HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
Bonus preferred output: show favicons before links, like below (other suggestion welcomed, like loading them from the website's server instead of linking):
- ![https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a][Stack Overflow](https://stackoverflow.com/)
Attempt
generate_md <- function (file) {
# Encoding problem with tidyjson::read_json
bmarks_json_lite <- jsonlite::fromJSON(
txt = paste0("https://gist.githubusercontent.com/hermanp/",
"c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
"33c21c88dad35145e2792b6258ede9c882c580ec/",
"bookmarks-example.json"))
# This is the start point, a data frame
level1 <- bmarks_json_lite$children$children[[2]]
# Get the name of the variable to modify it.
# Just felt that some abstraction needed.
varname <- deparse(substitute(level1))
varlevel <- as.integer(substr(varname, nchar(varname), nchar(varname)))
# Get through the data frame by its rows.
for (i in seq_len(nrow(get(varname)))) {
# If the type of the element in the row is "text/x-moz-place",
# then get its title and create a markdown list element from it.
if (get(varname)["type"][i] == "text/x-moz-place"){
# The two space indentation shall be multiplied as many times
# as deeply nested in the lists (minus one).
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
# Otherwise do this and also get inside the next level.
} else if (get(varname)["type"][i] == "text/x-moz-place-container") {
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
# I know this is not good, just want to express my thought.
# Create the next, deeper level's variable, whoose name shall
# represent the depth in the nest.
# Otherwise how can I multiply the indentation for the markdown
# list elements? It depends on the name of this variable.
varname <- paste0(regmatches(varname, regexpr("[[:alpha:]]+", varname)),
varlevel + 1L)
varlevel <- varlevel + 1L
assign(varname, get(varname)["children"][[i]])
# The same goes on as seen at the higher level.
for (j in seq_len(nrow(get(varname)))){
if (get(varname)["type"][i] == "text/x-moz-place"){
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
} else if (get(varname)["type"][i] == "text/x-moz-place-container") {
md_title <- paste0(strrep(" ", varlevel - 1),
"- ",
get(varname)["title"][i],
"\n")
varname <- paste0(regmatches(varname, regexpr("[[:alpha:]]+", varname)),
varlevel + 1L)
varlevel <- varlevel + 1L
assign(varname, get(varname)["children"][[i]])
for (k in seq_len(nrow(get(varname)))){
# I don't know where this goes...
# Also I need to paste somewhere the md_title strings to get the
# final markdown output...
}
}
}
}
}
}
Question
How can I recursively grab and paste strings from this JSON file? I tried to search for tips in recursion, but it's quite a hard topic. Any suggestion, package, function, link will be welcomed!
回答1:
After I watched a few videos on recursion and saw a few code examples, I tried, manually stepped through the code and somehow managed to do it with recursion. This solution is independent on the nestedness of the bookmarks, therefore a generalized solution for everyone.
Note: all the bookmarks were in the Bookmarks Toolbar in Firefox. This is highlighted in the generate_md
function. You can tackle with it there. If I improve the answer later, I will make it more general.
library(jsonlite)
# This function recursively converts the bookmark titles to unordered
# list items.
recursive_func <- function (level) {
md_result <- character()
# Iterate through the current data frame, which may have a children
# column nested with other data frames.
for (i in seq_len(nrow(level))) {
# If this element is a bookmark and not a folder, then grab
# the title and construct a list item from it.
if (level[i, "type"] == "text/x-moz-place"){
md_title <- level[i, "title"]
md_uri <- level[i, "uri"]
md_iconuri <- level[i, "iconuri"]
# Condition: the URLs all have schema (http or https) part.
# If not, filname will be a zero length character vector.
host_url <- regmatches(x = md_uri,
m = regexpr(pattern = "(?<=://)[[:alnum:].-]+",
text = md_uri,
perl = T))
md_link <- paste0("[", md_title, "]", "(", md_uri, ")")
md_listitem <- paste0("- ", md_link, "\n")
# If this element is a folder, then get into it, call this
# function over it. Insert two space (for indentation) in
# the generated sting before every list item. Paste this
# list of items to the folder list item.
} else if (level[i, "type"] == "text/x-moz-place-container") {
md_title <- level[i, "title"]
md_listitem <- paste0("- ", md_title, "\n")
md_recurs <- recursive_func(level = level[i, "children"][[1]])
md_recurs <- gsub("(?<!(\\w ))-(?= )", " -", md_recurs, perl = T)
md_listitem <- paste0(md_listitem, md_recurs)
}
# Collect and paste the list items of the current data frame.
md_result <- paste0(md_result, md_listitem)
}
# Return the (sub)list of the data frame.
return(md_result)
}
generate_md <- function (jsonfile) {
# Encoding problem with tidyjson::read_json
bmarks_json_lite <- fromJSON(txt = jsonfile)
# This is the start point, a data frame. It represents the
# elements inside the Bookmarks Toolbar in Firefox.
level1 <- bmarks_json_lite$children$children[[2]]
# Do not know how to make it prettier, but it works.
markdown_result <- recursive_func(level = level1)
return(markdown_result)
}
You can run the generate_md
function with the example.
generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
"33c21c88dad35145e2792b6258ede9c882c580ec/",
"bookmarks-example.json"))
# Output
[1] "- Info\n - Python\n - [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)\n - [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)\n - [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)\n - [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)\n - Frontend\n - [CodePen](https://codepen.io/)\n - [JavaScript](https://www.javascript.com/)\n - [CSS-Tricks](https://css-tricks.com/)\n - [Butterick’s Practical Typography](https://practicaltypography.com/)\n - [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)\n - [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)\n - [Client-Side Web Development](https://info340.github.io/)\n - [Stack Overflow](https://stackoverflow.com/)\n - [HUP](https://hup.hu/)\n - [Hope in Source](https://hopeinsource.com/)\n"
You can cat
it and write it to a file also with writeLines
. But bevare! In Windows environments, you probably need to turn useBytes = TRUE
to get the correct characters in the file. Reference: UTF-8 file output in R
cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"c01365b8f4931ea7ff9d1aee1cbbc391/raw/",
"33c21c88dad35145e2792b6258ede9c882c580ec/",
"bookmarks-example.json")))
# Output
- Info
- Python
- [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- [CodePen](https://codepen.io/)
- [JavaScript](https://www.javascript.com/)
- [CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- [Client-Side Web Development](https://info340.github.io/)
- [Stack Overflow](https://stackoverflow.com/)
- [HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
There was a problem with the regex part. If there are bookmarks with some - title
(space, hyphen, space) characters in their titles, these hyphens will also be "indented" as the list items.
# Input JSON
https://gist.github.com/hermanp/381eaf9f2bf5f2b9cdf22f5295e73eb5
cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"381eaf9f2bf5f2b9cdf22f5295e73eb5/raw/",
"76b74b2c3b5e34c2410e99a3f1b6ef06977b2ec7/",
"bookmarks-example-hyphen.json")))
# Output (two space indentation) markdown:
- Info
- Python
- [The Ultimate Python Beginner's Handbook](https://www.freecodecamp.org/news/the-python-guide-for-beginners/)
- [Python Like You Mean It](https://www.pythonlikeyoumeanit.com/index.html)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/)
- [Data science Python notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- Frontend
- [CodePen](https://codepen.io/)
- [JavaScript - Wikipedia](https://en.wikipedia.org/wiki/JavaScript) # correct
- [CSS-Tricks](https://css-tricks.com/)
- [Butterick’s Practical Typography](https://practicaltypography.com/)
- [Front-end Developer Handbook 2019](https://frontendmasters.com/books/front-end-handbook/2019/)
- [Using Ethics In Web Design](https://www.smashingmagazine.com/2018/03/using-ethics-in-web-design/)
- [Client-Side Web Development](https://info340.github.io/)
- [Stack Overflow](https://stackoverflow.com/)
- [HUP](https://hup.hu/)
- [Hope in Source](https://hopeinsource.com/)
I posted another question about this problem. After some hint and try I answered my own question.
来源:https://stackoverflow.com/questions/65111588/convert-firefox-bookmarks-json-file-to-markdown