I have a CSV file that's 300 MB and has over half a million entries. I want to run reports on this data and make those reports available for download.
Here's the data structure:
I'd recommend using Solr to process your data; it supports indexing CSV data directly.
Solr solves the problems associated with uploading, downloading and searching the data, and its performance is such that the index can easily be updated or rebuilt from scratch.
Read the documentation for how to install Solr. The following is a "kick-start" to make this demo work on Linux:
wget http://www.apache.org/dist/lucene/solr/3.5.0/apache-solr-3.5.0.tgz
tar zxvf apache-solr-3.5.0.tgz
cd apache-solr-3.5.0/example
java -jar start.jar
The Solr admin screen is available at the following URL:
http://localhost:8983/solr/admin/
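To verify from the command line that Solr is up, you can hit the ping handler that ships with the example configuration (assuming the defaults are unchanged):

curl 'http://localhost:8983/solr/admin/ping'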
I wrote a Groovy script to generate a sample CSV file:
new File("data.csv").withWriter { writer ->
writer.println "id;A_s;B_i;C_s;D_s;E_s;F_s"
for (i in 1..500000) {
writer.println "${i};${i*10};${i*20};${i*30};${i*40};${i*50};${i*60}"
}
}
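Assuming the script is saved as gencsv.groovy (the file name here is arbitrary), run it with:

groovy gencsv.groovy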
Example:
id;A_s;B_i;C_s;D_s;E_s;F_s
1;10;20;30;40;50;60
2;20;40;60;80;100;120
3;30;60;90;120;150;180
4;40;80;120;160;200;240
5;50;100;150;200;250;300
6;60;120;180;240;300;360
7;70;140;210;280;350;420
8;80;160;240;320;400;480
9;90;180;270;360;450;540
..
Note:
I performed no customisation to the "out of the box" Solr settings. This means the column types are inferred from the field-name suffix by the example schema's dynamic fields: "_s" maps to a string field and "_i" to an integer field. See the Solr Wiki for details on how to write a custom schema.
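For reference, the typing comes from dynamic field declarations along these lines (quoted from memory of the example schema.xml, so verify against your copy):

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>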
The Linux curl command is used to load the data.csv file by performing an HTTP POST:
$ curl 'http://localhost:8983/solr/update/csv?separator=;&commit=true' -H 'Content-type:text/plain; charset=utf-8' --data-binary @data.csv
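Because id is the schema's unique key, re-posting the file updates existing documents in place. To rebuild the index from scratch, you can first wipe it via the XML update handler (a sketch, again assuming default settings):

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-type:text/xml' --data-binary '<delete><query>*:*</query></delete>'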
The following query returns the top 50 rows, sorted on column B_i in descending order:
http://localhost:8983/solr/select/?q=*:*&rows=50&sort=B_i+desc&fl=id,A_s,B_i,C_s,D_s,E_s,F_s&wt=csv
Output is CSV formatted:
id,A_s,B_i,C_s,D_s,E_s,F_s
500000,5000000,10000000,15000000,20000000,25000000,30000000
499999,4999990,9999980,14999970,19999960,24999950,29999940
499998,4999980,9999960,14999940,19999920,24999900,29999880
499997,4999970,9999940,14999910,19999880,24999850,29999820
499996,4999960,9999920,14999880,19999840,24999800,29999760
..
REST parameters:

| REST parameter                | Description                  |
|-------------------------------|------------------------------|
| q=*:*                         | Match all documents          |
| rows=50                       | Number of rows to return     |
| sort=B_i+desc                 | Sort on B_i, descending      |
| fl=id,A_s,B_i,C_s,D_s,E_s,F_s | Columns to include in output |
| wt=csv                        | CSV output format            |
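To make the report available for download, as the question asks, the same URL can simply be fetched and saved to a file; for example (report.csv is an arbitrary name):

curl -o report.csv 'http://localhost:8983/solr/select/?q=*:*&rows=50&sort=B_i+desc&fl=id,A_s,B_i,C_s,D_s,E_s,F_s&wt=csv'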
I'm making one-liners today, and Solr seems like overkill. Here's one that solves your problem (assuming no header row):
sort -t\; -k2,2 < test.csv | head -50
Add -n if you need the second field sorted numerically instead of lexicographically, and -r for descending order.
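Combined, a sketch that matches the Solr example's top-50-by-B_i report (still assuming no header row):

sort -t\; -k2,2nr test.csv | head -50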