Question
I have a raw, low-level text file that contains online log data. I need to arrange this raw data and export the arranged data to a .csv file.
The sample raw data looks like the example below. In the data, eventid is a column name and 0f3f98c7-1cee-4c1a-bad9-c5d772c9219b is its field value. In the same way, visitorid is also a column name and "015469816482e00095002e08d007f0" is its field value. So a column name and its field value are separated by : (colon), and columns are separated by , (comma). Each record starts with an opening curly bracket { and ends with a closing curly bracket }. In addition, each row / record contains 120 to 167 column values, but some columns may be empty. I would like to write a program that arranges / cleans this data from a .txt file and writes it to a .csv file. Any ideas and support would be highly appreciated.
{ "eventid" : "0f3f98c7-1cee-4c1a-bad9-c5d772c9219b" , "visitorid" : "015469816482e00095002e08d007f0" , "eventtime" : 1462059242000 , "useragent" : "Mozilla/5.0 (Linux; U; Android 4.2.2; ca-ca; SonySO-04E Build/10.3.1.B.0.224) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30" , "pageurl_full_url" : "https://www.abcdefg.com/auto/v22/renewal/calculatePremium.html" , "pageurl_scheme" : "https" , "pageurl_domain" : "www.abcdefg.com" , "pageurl_path" : "/auto/abc22/renewal/calculatePremium.html" , "referrerurl_full_url" : "https://www.abcdefg.com.jp/auto/v22/renewal/calculatePremium.html" , "referrerurl_scheme" : "https" , "abcdefg_domain" : "www.abcdefg.com" , "referrerurl_path" : "/auto/abc22/renewal/calculatePremium.html" , "tags_main_4_executed" : true , "tags_main_16_executed" : true , "dom_title" : "super Car Ins [quote results For Renew]" , "dom_referrer" : "" , "dom_hash" : "" , "dom_domain" : "www.abcdefg.com" , "dom_viewport_width" : 720 , "dom_viewport_height" : 1030 , "dom_pathname" : "/auto/abc22/renewal/calculatePremium.html" , "dom_query_string" : "" , "dom_url" : "https://www.abcdefg.com/auto/abc22/renewal/calculatePremium.html" , "udo_quote_date" : "2016.5.1" , "udo_page_url" : "https://www.abcdefg.com/auto/abc22/renewal/calculatePremium.html" , "udo_ut_version" : "ut4.38.201604270626" , "udo_prod_id" : "ACD" , "udo_quote_expiry_date" : "2017.05.14" , "udo_quote_prev_expiry_date" : "2017.05.14" , "udo_ut_account" : "abcdefg-india" , "udo_page_cat" : "Product" , "udo_contract_paytype" : "" , "udo_quote_amt" : "71290,71690,72080" , "udo_quote_id" : "175545859609000,175545859609000,175545859609000" , "udo_client_id" : "911324977090000" , "udo_renewal_times" : "4" , "udo_prod_name" : "Renewal" , "udo_ut_profile" : "main" , "udo_device_id" : "Mobile" , "udo_contract_id" : "26771063" , "udo_ut_event" : "view" , "udo_ut_env" : "prod" , "udo__t_session_id" : "1462058968141" , "udo__t_visitor_id" : 
"01546981644d001e0f99d341182e00095002e08d007f0" , "udo_page_type" : "Quote" , "udo_ut_domain" : "abcdefg.com" , "udo_contract_amt" : "" , "udo_page_name" : "super car insurance quote [quote results For Renew]" , "udo_quote_pre_ins_company" : "abcInsCompany" , "js_timestamp" : "2016-04-30T23:34:02.612Z" , "firstpartycookies_utag_main_dc_event" : "15" , "firstpartycookies_utag_main__ss" : "0" , "firstpartycookies_sc_status" : "8" , "firstpartycookies_utag_quote_date" : "2016.5.1" , "firstpartycookies_utag_contract_id" : "26771063" , "firstpartycookies_utag_main__st" : "1462061042574" , "firstpartycookies_utag_main_dc_visit" : "1" , "firstpartycookies_utag_page_cat" : "Product" , "firstpartycookies_utag_prod_id" : "ACD" , "firstpartycookies_utag_renewal_times" : "4" , "firstpartycookies_utag_quote_expiry_date" : "2017.05.14" , "firstpartycookies_utag_page_type" : "Quote" , "firstpartycookies_uniqueid" : "954704708970000" , "firstpartycookies_sc_asp_net_sessionid" : "ih3l00llymb2ml4dwzzywdrv" , "firstpartycookies__gat_tealium_0" : "1" , "firstpartycookies_utag_main_v_id" : "01546981644d001e0f99d341182e00095002e08d007f0" , "firstpartycookies_session" : "MTAuNjguMS4zMg==" , "firstpartycookies_utag_prod_name" : "Renewal" , "firstpartycookies_utag_quote_id" : "175545859609000,175545859609000,175545859609000" , "firstpartycookies_utag_main_ses_id" : "1462058968141" , "firstpartycookies__ga" : "GA1.3.987054064.1462058970" , "firstpartycookies_utag_main__sn" : "1" , "firstpartycookies___gyr_casted_frames" : "L_E_2170_A39,L_E_1985_A39,L_E_1881_A39" , "firstpartycookies_jsessionid" : "0001oiKCaYAKZYSnX6JqmhIrOo0:15fkukn73" , "firstpartycookies_utag_main__pn" : "8" , "firstpartycookies_utag_quote_amt" : "71290,71690,72080" , "firstpartycookies___gyr_uuid" : "7e4f3633-0eff-4e0e-98b6-c8c544cbb375" , "firstpartycookies___gyr_sid" : "2ce7772f-0c95-455f-afc4-241a5541c47c" , "firstpartycookies_utag_quote_prev_expiry_date" : "2017.05.14" , "firstpartycookies_utag_client_id" : 
"911324977090000" , "firstpartycookies_utag_quote_pre_ins_company" : "abcInsuranceCompany" , "firstpartycookies_utag_device_id" : "Mobile" , "firstpartycookies___gyr_cmpcnts" : "L_E_2170_A39:[825:1],L_E_1985_A39:[1011:1],L_E_1881_A39:[1022:1]" , "firstpartycookies___gyr_depid" : "14326,17172,17292" , "firstpartycookies___gyr_rule_id_myabcdefg" : "1079"}
{ "eventid" : "f8c8beac-d8ce-4930-956e-79c6120aea65" , "visitorid" : "0154698511eb0019161e632df605020a9007a0a100bd0" , "eventtime" : 1462059246000 , "useragent" : "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko" , "pageurl_full_url" : "https://www.abcdefg.com/" , "pageurl_scheme" : "https" , "pageurl_domain" : "www.abcdefg.com" , "pageurl_path" : "/" , "referrerurl_full_url" : "https://www.abcdefg.com/" , "referrerurl_scheme" : "https" , "referrerurl_domain" : "www.abcdefg.com" , "referrerurl_path" : "/" , "tags_main_4_executed" : true , "tags_main_15_executed" : true , "tags_main_16_executed" : true , "tags_main_61_executed" : true , "dom_title" : "【Hoken】Adv Site|Auto Insurance・care of" , "dom_referrer" : "http://search.abcdefg.com/search;_ylt=A3xTqFmsQCVXu08AhwiJBtF7?p=%E3%83%81%E3%83%A5%E3%83%BC%E3%83%AA%E3%83%83%E3%83%92&search.x=1&fr=top_ga1_sa&tid=top_ga1_sa&ei=UTF-8&aq=0&oq=%E3%81%A1%E3%82%85%E3%81%86%E3%82%8A%E3%81%A3%E3%81%B2&afs=" , "dom_hash" : "" , "dom_domain" : "www.abcdefg.com" , "dom_viewport_width" : 1912 , "dom_viewport_height" : 955 ,
Answer 1:
The ndjson package can handle this. I made a file out of three of these records. If some rows are missing columns that other rows have, the missing values are filled with NA.
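library(dplyr)   # provides glimpse()
library(ndjson)  # provides stream_in() for newline-delimited JSON
# Read one JSON record per line into a flat table and preview its columns
glimpse(stream_in("so.json"))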
I can't show the output since StackOverflow isn't bright enough to not recognize it as spam.
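The question asks for a .csv file, while the snippet above only inspects the result. A minimal sketch of the remaining step could look like the following. It assumes the log holds one JSON record per line; the two tiny records and the temporary file paths are made up for illustration and are not taken from the real log.

```r
library(ndjson)

# Hypothetical input: two records, one per line; the second lacks "dom_title".
path <- tempfile(fileext = ".json")
writeLines(c(
  '{ "eventid" : "a1" , "eventtime" : 1462059242000 , "dom_title" : "quote" }',
  '{ "eventid" : "b2" , "eventtime" : 1462059246000 }'
), path)

# Parse every line into one table; keys absent from a record become NA.
records <- ndjson::stream_in(path)

# Write the table out, leaving missing values as empty cells.
out <- tempfile(fileext = ".csv")
write.csv(as.data.frame(records), out, row.names = FALSE, na = "")
```

For a real file, path would point at the .txt log and out at the desired .csv location.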
You can also use the slower jsonlite::stream_in(), but you'll have to flatten the records yourself.
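As a rough sketch of what that flattening might look like, the example below reads line-delimited JSON with jsonlite::stream_in(), flattens nested objects with jsonlite::flatten(), and writes a CSV. The nested "dom" object, the sample values, and the temporary file paths are hypothetical and only serve to show the flattening step; records that are already flat pass through flatten() unchanged.

```r
library(jsonlite)

# Hypothetical input: one JSON record per line, with a nested "dom" object.
path <- tempfile(fileext = ".json")
writeLines(c(
  '{"eventid": "a1", "dom": {"title": "quote", "viewport_width": 720}}',
  '{"eventid": "b2", "dom": {"title": "home"}}'
), path)

# stream_in() expects a connection; nested objects come back as nested
# data frame columns, and keys missing from a record are filled with NA.
nested <- jsonlite::stream_in(file(path), verbose = FALSE)

# flatten() turns nested columns into top-level ones such as "dom.title".
flat <- jsonlite::flatten(nested)

# Write the flat table out, leaving missing values as empty cells.
out <- tempfile(fileext = ".csv")
write.csv(flat, out, row.names = FALSE, na = "")
```

The explicit jsonlite:: prefix avoids clashes with the stream_in() from ndjson and the flatten() functions that other packages export.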
Source: https://stackoverflow.com/questions/39380668/raw-data-cleaning-by-r