1. Load the data

```sql
load data local inpath '/home/badou/Documents/data/order_data/orders.csv' overwrite into table orders;
```
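The `load data` statement above assumes the `orders` table already exists. A minimal DDL sketch for reference — only `order_id`, `user_id`, and `order_dow` are confirmed by the queries in this post; the remaining columns are assumptions based on the usual Instacart `orders.csv` layout, so adjust them to your file:

```sql
-- Hypothetical DDL sketch; columns beyond order_id / user_id / order_dow
-- are assumed from the typical Instacart orders.csv layout.
create table if not exists orders (
    order_id               string,
    user_id                string,
    eval_set               string,
    order_number           string,
    order_dow              string,   -- day of week, '0'..'6', matching the queries below
    order_hour_of_day      string,
    days_since_prior_order string
)
row format delimited
fields terminated by ','
lines terminated by '\n';
```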
2. How many orders does each user have?
```sql
select user_id, count(1) as order_cnt
from orders
group by user_id
order by order_cnt desc
limit 10;
```

```
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_202003192037_0003, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_202003192037_0003
Kill Command = /usr/local/src/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_202003192037_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-19 21:09:32,228 Stage-1 map = 0%, reduce = 0%
2020-03-19 21:09:44,551 Stage-1 map = 62%, reduce = 0%
2020-03-19 21:09:45,568 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.29 sec
2020-03-19 21:09:54,697 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 9.29 sec
2020-03-19 21:09:57,727 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 9.29 sec
2020-03-19 21:10:00,763 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.25 sec
MapReduce Total cumulative CPU time: 15 seconds 250 msec
Ended Job = job_202003192037_0003
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
Starting Job = job_202003192037_0004, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_202003192037_0004
Kill Command = /usr/local/src/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_202003192037_0004
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-03-19 21:10:13,220 Stage-2 map = 0%, reduce = 0%
2020-03-19 21:10:23,341 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 5.42 sec
2020-03-19 21:10:32,465 Stage-2 map = 100%, reduce = 33%, Cumulative CPU 5.42 sec
2020-03-19 21:10:35,559 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 8.74 sec
MapReduce Total cumulative CPU time: 8 seconds 740 msec
Ended Job = job_202003192037_0004
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 15.25 sec  HDFS Read: 108973054  HDFS Write: 5094362  SUCCESS
Job 1: Map: 1  Reduce: 1  Cumulative CPU: 8.74 sec   HDFS Read: 5094820    HDFS Write: 104      SUCCESS
Total MapReduce CPU Time Spent: 23 seconds 990 msec
OK
user_id  order_cnt
106879   100
3377     100
183036   100
96577    100
194931   100
66482    100
109020   100
12166    100
139897   100
99805    100
Time taken: 74.499 seconds, Fetched: 10 row(s)
```
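The log above prints hints for tuning the reducer count. Note that in Hive the final `order by` stage always runs on a single reducer (which is why Job 2 has exactly one), so only the `group by` stage benefits from more reducers. A sketch of applying those hints before re-running the query — the numeric values here are illustrative, not recommendations:

```sql
-- Settings quoted from the job log's own hints; values are illustrative.
set hive.exec.reducers.bytes.per.reducer=64000000;  -- average input bytes per reducer
set hive.exec.reducers.max=8;                       -- upper bound on reducers
-- or pin an exact count:
set mapred.reduce.tasks=2;

select user_id, count(1) as order_cnt
from orders
group by user_id
order by order_cnt desc   -- this stage still uses a single reducer
limit 10;
```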
3. Average number of products per order for each user

The orders table only contains user and order data, so it has to be joined with the priors or trains table to get product-level data. The trains table is smaller, but because it serves as the label data it contains only one order per user. For debugging, you can run the computation on a subset of priors by adding `limit`:

```sql
select ord.user_id, avg(pri.products_cnt) as avg_prod
from
  (select order_id, user_id from orders) ord
join
  (select order_id, count(1) as products_cnt
   from priors
   group by order_id) pri
on ord.order_id = pri.order_id
group by ord.user_id
limit 10;
```
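One way to realize the debugging idea above is to materialize a small slice of priors once and iterate against that instead of the full table. The table name `priors_sample` and the row count are illustrative choices, not part of the original post:

```sql
-- Illustrative: snapshot a small slice of priors for faster iteration.
create table priors_sample as
select * from priors limit 100000;

-- Same join as above, pointed at the sample table.
select ord.user_id, avg(pri.products_cnt) as avg_prod
from
  (select order_id, user_id from orders) ord
join
  (select order_id, count(1) as products_cnt
   from priors_sample
   group by order_id) pri
on ord.order_id = pri.order_id
group by ord.user_id
limit 10;
```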
4. Distribution of each user's orders across the days of the week
```sql
select
  user_id,
  sum(case order_dow when '0' then 1 else 0 end) as dow_0,
  sum(case order_dow when '1' then 1 else 0 end) as dow_1,
  sum(case order_dow when '2' then 1 else 0 end) as dow_2,
  sum(case order_dow when '3' then 1 else 0 end) as dow_3,
  sum(case order_dow when '4' then 1 else 0 end) as dow_4,
  sum(case order_dow when '5' then 1 else 0 end) as dow_5,
  sum(case order_dow when '6' then 1 else 0 end) as dow_6
from orders
group by user_id
limit 20;
```

```
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1584680108277_0002, Tracking URL = http://master:8088/proxy/application_1584680108277_0002/
Kill Command = /usr/local/src/hadoop-2.6.1/bin/hadoop job -kill job_1584680108277_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-19 22:28:14,095 Stage-1 map = 0%, reduce = 0%
2020-03-19 22:28:44,411 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 19.47 sec
2020-03-19 22:28:59,770 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 22.56 sec
MapReduce Total cumulative CPU time: 22 seconds 560 msec
Ended Job = job_1584680108277_0002
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 22.56 sec  HDFS Read: 108968864  HDFS Write: 414  SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 560 msec
OK
user_id  dow_0  dow_1  dow_2  dow_3  dow_4  dow_5  dow_6
1        0      3      2      2      4      0      0
10       1      0      1      2      0      2      0
100      1      1      0      2      0      2      0
1000     4      0      1      1      0      0      2
10000    15     12     10     7      9      9      11
100000   2      1      0      4      1      0      2
100001   4      15     17     13     6      9      3
100002   0      3      0      0      3      5      2
100003   0      0      0      0      0      3      1
100004   1      2      2      2      0      2      0
100005   3      5      1      2      6      1      1
100006   5      2      1      1      3      2      0
100007   0      0      1      1      2      3      0
100008   2      5      8      4      3      2      5
100009   4      3      1      0      0      1      0
10001    12     7      2      0      0      1      1
100010   3      2      0      1      1      2      3
100011   3      4      3      4      4      0      1
100012   0      23     2      1      0      0      0
100013   10     3      6      2      7      4      6
Time taken: 59.967 seconds, Fetched: 20 row(s)
```
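The `case ... when` pivot above can also be written with Hive's `if()` conditional as a shorthand; a sketch of the equivalent query:

```sql
-- Equivalent pivot using if(cond, then, else); result should match
-- the case-when version above.
select
  user_id,
  sum(if(order_dow = '0', 1, 0)) as dow_0,
  sum(if(order_dow = '1', 1, 0)) as dow_1,
  sum(if(order_dow = '2', 1, 0)) as dow_2,
  sum(if(order_dow = '3', 1, 0)) as dow_3,
  sum(if(order_dow = '4', 1, 0)) as dow_4,
  sum(if(order_dow = '5', 1, 0)) as dow_5,
  sum(if(order_dow = '6', 1, 0)) as dow_6
from orders
group by user_id
limit 20;
```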
Source: https://www.cnblogs.com/hackerer/p/12531145.html