Python Pandas equivalent in JavaScript

时光取名叫无心 2021-01-29 17:24

With this CSV example:

   Source,col1,col2,col3
   foo,1,2,3
   bar,3,4,5

The standard method I use with Pandas is this:

  1. Parse CSV
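For reference, step 1 (parsing the CSV) needs no library for a file this simple. The sketch below is plain TypeScript and assumes no quoted fields, embedded commas or escaped newlines. `parseSimpleCsv` and the `Frame` shape are hypothetical names, loosely mirroring the column-oriented result of pandas' `read_csv`.

```typescript
// Minimal sketch: parse the sample CSV into one array per column.
// Assumption: no quoted fields, embedded commas, or escaped newlines.

type Column = (string | number)[];
type Frame = { columns: string[]; data: Record<string, Column> };

function parseSimpleCsv(text: string): Frame {
  const lines = text.trim().split(/\r?\n/).map((l) => l.trim());
  const columns = lines[0].split(",");
  const data: Record<string, Column> = {};
  for (const name of columns) data[name] = [];

  for (const line of lines.slice(1)) {
    const cells = line.split(",");
    columns.forEach((name, i) => {
      const raw = cells[i] ?? "";
      const num = Number(raw);
      // Keep numeric-looking cells as numbers, everything else as strings.
      data[name].push(raw !== "" && !Number.isNaN(num) ? num : raw);
    });
  }
  return { columns, data };
}

const csv = `Source,col1,col2,col3
foo,1,2,3
bar,3,4,5`;

const frame = parseSimpleCsv(csv);
console.log(frame.columns); // [ 'Source', 'col1', 'col2', 'col3' ]
console.log(frame.data.col1); // [ 1, 3 ]
```

A real-world file with quoting would need a proper CSV parser; this only covers the sample above.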

9 Answers
  •  时光取名叫无心
    2021-01-29 18:13

    All answers are good. I hope my answer is comprehensive (i.e. it tries to list all the options). I hope to return and revise this answer with criteria to help make a choice.

    I hope anyone coming here is familiar with d3. d3 is a very useful "Swiss Army knife" for handling data in JavaScript, much as pandas is for Python. You may see d3 used as frequently as pandas, even though d3 is not exactly a DataFrame/Pandas replacement (i.e. d3 doesn't have the same API, and it doesn't have a Series or DataFrame that behaves like pandas').

    Ahmed's answer explains how d3 can be used to achieve some DataFrame functionality, and some of the libraries below were inspired by projects like LearnJsData, which uses d3 and lodash.
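    To make the d3 point concrete: d3 (e.g. `d3.csvParse`) and lodash typically work on an array of plain row objects rather than on a DataFrame. The sketch below is plain TypeScript with no library; `groupSum` is a hypothetical helper, similar in spirit to d3's `rollup` or lodash's `groupBy` followed by a sum, and the third row is made-up data so the grouping has something to combine.

```typescript
// Row-oriented "array of objects" data, the shape d3 and lodash work with.
// The third row is made-up data for illustration.
type Row = { Source: string; col1: number; col2: number; col3: number };

const rows: Row[] = [
  { Source: "foo", col1: 1, col2: 2, col3: 3 },
  { Source: "bar", col1: 3, col2: 4, col3: 5 },
  { Source: "foo", col1: 10, col2: 20, col3: 30 },
];

// Hypothetical helper: group rows by a key and sum a numeric value per
// group, roughly like df.groupby("Source")["col1"].sum() in pandas.
function groupSum<T>(
  data: T[],
  key: (row: T) => string,
  value: (row: T) => number,
): Map<string, number> {
  const out = new Map<string, number>();
  for (const row of data) {
    const k = key(row);
    out.set(k, (out.get(k) ?? 0) + value(row));
  }
  return out;
}

const totals = groupSum(rows, (r) => r.Source, (r) => r.col1);
console.log(totals); // Map(2) { 'foo' => 11, 'bar' => 3 }
```

    Passing accessor functions instead of column names keeps the helper generic over any row type, which is also the style d3's grouping functions use.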

    As for DataFrame-focused features, I was overwhelmed by the number of JS libraries that help. Here's a quick list of some of the options you might have encountered. I haven't checked any of them in detail yet (I found most of them through a combination of Google and npm searches).

    Be careful to pick one that fits your environment: some are for Node.js (server-side JavaScript), some are browser-compatible (client-side JavaScript), and some are written in TypeScript.

    • pandas-js
      • From STEEL and Feras' answers
      • "pandas.js is an open source (experimental) library mimicking the Python pandas library. It relies on Immutable.js as the NumPy logical equivalent. The main data objects in pandas.js are, like in Python pandas, the Series and the DataFrame."
    • dataframe-js
      • "DataFrame-js provides an immutable data structure for javascript and datascience, the DataFrame, which allows to work on rows and columns with a sql and functional programming inspired api."
    • data-forge
      • Seen in Ashley Davis' answer
      • "JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ."
      • Note that the old data-forge JS repository is no longer maintained; a new repository now uses TypeScript
    • jsdataframe
      • "Jsdataframe is a JavaScript data wrangling library inspired by data frame functionality in R and Python Pandas."
    • dataframe
      • "explore data by grouping and reducing."
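    Several of the libraries above share one design idea: an immutable frame whose operations (filter, select, add a column) each return a new frame, chained in a SQL/functional style. The hypothetical `TinyFrame` class below only illustrates that idea in plain TypeScript; it is not the API of pandas-js, dataframe-js, data-forge, jsdataframe or dataframe.

```typescript
// Hypothetical minimal immutable DataFrame, for illustration only.
// Not the API of any library listed above.

type Cell = string | number;
type RowObj = Readonly<Record<string, Cell>>;

class TinyFrame {
  private readonly rows: readonly RowObj[];

  constructor(rows: readonly RowObj[]) {
    // Store frozen shallow copies so later changes to the input objects,
    // or attempts to mutate this frame's rows, cannot alter the frame.
    this.rows = Object.freeze(rows.map((r) => Object.freeze({ ...r })));
  }

  // Keep rows matching a predicate (like SQL WHERE or pandas boolean indexing).
  filter(pred: (row: RowObj) => boolean): TinyFrame {
    return new TinyFrame(this.rows.filter((r) => pred(r)));
  }

  // Keep only the named columns (like SQL SELECT or pandas df[[...]]).
  select(...cols: string[]): TinyFrame {
    return new TinyFrame(
      this.rows.map((r) => {
        const out: Record<string, Cell> = {};
        for (const c of cols) out[c] = r[c];
        return out;
      }),
    );
  }

  // Add a derived column (loosely like pandas df.assign).
  withColumn(name: string, fn: (row: RowObj) => Cell): TinyFrame {
    return new TinyFrame(this.rows.map((r) => ({ ...r, [name]: fn(r) })));
  }

  // Return mutable copies of the rows.
  toArray(): Record<string, Cell>[] {
    return this.rows.map((r) => ({ ...r }));
  }
}

const df = new TinyFrame([
  { Source: "foo", col1: 1, col2: 2, col3: 3 },
  { Source: "bar", col1: 3, col2: 4, col3: 5 },
]);

const result = df
  .withColumn("total", (r) => Number(r.col1) + Number(r.col2) + Number(r.col3))
  .filter((r) => Number(r.total) > 6)
  .select("Source", "total");

console.log(result.toArray()); // [ { Source: 'bar', total: 12 } ]
```

    Because each method returns a new `TinyFrame`, `df` itself still has only its four original columns after the chain runs.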

    Then, after coming to this question, checking the other answers here, and doing more searching, I found options like:

    • Apache Arrow in JS
      • Thanks to user Back2Basics' suggestion:
      • "Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, ...)"
    • Observable
      • At first glance, it seems like a JS alternative to IPython/Jupyter "notebooks"
      • Observable's page promises "Reactive programming", a "Community", on a "Web Platform"
      • See the 5-minute intro
    • recline (from Rufus' answer)
      • I expected an emphasis on a DataFrame API, which Pandas itself tries to preserve from R, documenting its replacement/improvement/correspondence for every R function.
      • Instead, recline's examples emphasize the jQuery way of getting data into the DOM, and its (awesome) Multiview (the UI), which doesn't require jQuery but does require a browser! More examples are available.
      • ...or an emphasis on its MVC-ish architecture, including back-end stuff (i.e. database connections)
      • I am probably being too harsh; after all, one of the nice things about pandas is how easily it creates visualizations, out of the box.
    • js-data
      • Really more of an ORM! Most of its modules correspond to different data storage backends (js-data-mongodb, js-data-redis, js-data-cloud-datastore), plus sorting, filtering, etc.
      • On the plus side, it treats Node.js as a first priority: "Works in Node.js and in the Browser."
    • miso (another suggestion from Rufus)
      • Impressive backers like the Guardian and Bocoup.
    • AlaSQL
      • "AlaSQL is an open source SQL database for Javascript with a strong focus on query speed and data source flexibility for both relational data and schemaless data. It works in your browser, Node.js, and Cordova."
    • Some thought experiments:
      • "Scaling a DataFrame in Javascript" - Gary Sieling
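    The "columnar memory layout" in the Apache Arrow description means storing each column as one contiguous buffer instead of one object per row. The sketch below shows that idea with plain typed arrays; it is not the apache-arrow package's API, and `toColumnar` / `columnMean` are hypothetical helpers.

```typescript
// Illustration only: a columnar layout with plain typed arrays.
// NOT the apache-arrow package API.

// Row-oriented input: one object per record.
const rowData = [
  { Source: "foo", col1: 1, col2: 2, col3: 3 },
  { Source: "bar", col1: 3, col2: 4, col3: 5 },
];

// Column-oriented table: one contiguous array per column.
type ColumnarTable = {
  length: number;
  strings: Record<string, string[]>;
  numbers: Record<string, Float64Array>;
};

function toColumnar(
  rows: Record<string, string | number>[],
  numericCols: string[],
  stringCols: string[],
): ColumnarTable {
  const numbers: Record<string, Float64Array> = {};
  for (const c of numericCols) {
    // Copy the column's values into a single Float64Array buffer.
    numbers[c] = Float64Array.from(rows, (r) => Number(r[c]));
  }
  const strings: Record<string, string[]> = {};
  for (const c of stringCols) {
    strings[c] = rows.map((r) => String(r[c]));
  }
  return { length: rows.length, strings, numbers };
}

// Aggregating one column only reads that column's buffer.
function columnMean(table: ColumnarTable, col: string): number {
  const values = table.numbers[col];
  let sum = 0;
  for (let i = 0; i < values.length; i++) sum += values[i];
  return sum / values.length;
}

const table = toColumnar(rowData, ["col1", "col2", "col3"], ["Source"]);
console.log(table.numbers.col2); // Float64Array(2) [ 2, 4 ]
console.log(columnMean(table, "col3")); // 4
```

    The trade-off is the usual one: appending a single record is awkward in a columnar table, while scanning or aggregating one column is cheap.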

    I hope this post can become a community wiki that evaluates the options above (i.e. compares them) against different criteria like:

    • Pandas' criteria in its comparison with R
      • Performance
      • Functionality/flexibility
      • Ease-of-use
    • My own suggestions
      • Similarity to Pandas / DataFrame APIs
      • Specifically, coverage of their main features
      • Data-science emphasis > UI emphasis
      • Demonstrated integration with other tools like Jupyter (interactive notebooks), etc.

    Some things a JS library may never do (but could it?):

    • Use an underlying best-in-class JavaScript numbers/math library (i.e. an equivalent of NumPy)?
    • Use optimizing compilers that might produce faster code (i.e. an equivalent of Pandas' use of Cython)?
      • On the relationship between NumPy and Cython, see here and here
    • Be sponsored by a data-science-focused consortium, à la Pandas and NumFOCUS?
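    As a rough idea of what a NumPy-like foundation provides, here is element-wise arithmetic over `Float64Array` columns with scalar broadcasting, in plain TypeScript. The `elementwise`, `add` and `mul` helpers are hypothetical; they support only scalar and same-length operands (not NumPy's general broadcasting rules), and no performance claim is implied.

```typescript
// Hypothetical NumPy-style helpers over typed arrays (illustration only).

type Operand = Float64Array | number;

// Apply `op` element by element. `b` may be a same-length array or a
// scalar broadcast to every element; other lengths are rejected.
function elementwise(
  a: Float64Array,
  b: Operand,
  op: (x: number, y: number) => number,
): Float64Array {
  if (typeof b !== "number" && b.length !== a.length) {
    throw new RangeError(`length mismatch: ${a.length} vs ${b.length}`);
  }
  const out = new Float64Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = op(a[i], typeof b === "number" ? b : b[i]);
  }
  return out;
}

const add = (a: Float64Array, b: Operand) => elementwise(a, b, (x, y) => x + y);
const mul = (a: Float64Array, b: Operand) => elementwise(a, b, (x, y) => x * y);

// col1 and col2 from the question's sample CSV, as numeric columns.
const col1 = Float64Array.of(1, 3);
const col2 = Float64Array.of(2, 4);

console.log(add(col1, col2)); // Float64Array(2) [ 3, 7 ]
console.log(mul(col1, 10)); // Float64Array(2) [ 10, 30 ]
```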
