Question
For a current research project, I am trying to slice a JSON file into different time intervals. Based on the "Date" field, I want to analyse the content of the JSON file by quarter, i.e. 01 January - 31 March, 01 April - 30 June, etc.
Ideally, the code would pick the oldest date in the file and add quarterly time intervals on top of that. I have researched this point but have not found any helpful methods yet.
Is there any smart way to include this in the code? The JSON file has the following structure:
[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]
And the existing relevant code excerpt looks like this:
import pandas as pd
file = pd.read_json (r'Glassdoor_A.json')
data = json.load(file)
# Create an empty dictionary
d = dict()
# processing:
for row in data:
    line = row['Text Main']
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))
    # Split the line into time intervals
    line.sort_values(by=['Date'])
    line.tshift(d, int = 90, freq=timedelta, axis='Date')
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
# Count the total number of words
total = sum(d.values())
print(d[key], total)
Answer 1:
The solution is given below. The data can be sliced with pandas by setting a start and an end date and comparing the "Date" field of each JSON record with these dates.
Important note: the data must be normalised, and the dates have to be converted into the pandas datetime format before processing.
import json
import pandas as pd

# Load the JSON file and flatten it into a DataFrame
with open("Glassdoor_A.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data)

# Convert the date strings into pandas datetimes.
# Assumption: dates are day-first (dd/mm/yyyy), as the original end date "31/03/2018" suggests.
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")

# Filter by date: keep rows from 01 January to 31 March 2018 (both bounds inclusive)
start_date = pd.Timestamp("2018-01-01")
end_date = pd.Timestamp("2018-03-31")
after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]
print(filtered_dates)
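
The filter above handles one fixed quarter. The question also asks for intervals that start at the oldest date in the file, so the following is a minimal sketch of one way to do that. It is not part of the original answer. The sample records, the function names `slice_by_window` and `count_words`, the day-first date format, and the choice of half-open windows (start inclusive, end exclusive) are all assumptions made for illustration.

```python
import string
from collections import Counter

import pandas as pd

# Hypothetical sample records in the same shape as the question's JSON file
records = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"},
    {"No": "122", "Stock Symbol": "A", "Date": "20/01/2018", "Text Main": "More sample text!"},
    {"No": "123", "Stock Symbol": "A", "Date": "10/02/2018", "Text Main": "Another Sample"},
]


def slice_by_window(df, months=3):
    """Split df into consecutive windows of `months` months, starting at the oldest date.

    Returns a list of (start, end, rows) tuples; each window covers start <= Date < end.
    """
    oldest = df["Date"].min()
    newest = df["Date"].max()
    slices = []
    start, k = oldest, 1
    while start <= newest:
        # Always offset from the oldest date so month-end anchors do not drift
        end = oldest + pd.DateOffset(months=months * k)
        mask = (df["Date"] >= start) & (df["Date"] < end)
        slices.append((start, end, df.loc[mask]))
        start, k = end, k + 1
    return slices


def count_words(texts):
    """Count lower-cased words across texts after stripping punctuation."""
    table = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        counts.update(text.strip().lower().translate(table).split())
    return counts


df = pd.json_normalize(records)
# Assumption: dates are day-first (dd/mm/yyyy)
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

slices = slice_by_window(df, months=3)
for start, end, rows in slices:
    counts = count_words(rows["Text Main"])
    print(start.date(), "to", end.date(), dict(counts))
```

Windows with no records are still returned, with an empty DataFrame, so gaps in the data remain visible. If calendar quarters are wanted instead of windows anchored at the oldest date, grouping on `df["Date"].dt.to_period("Q")` would be an alternative.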
Source: https://stackoverflow.com/questions/61620061/slice-json-file-into-different-time-intercepts-with-python