Slice JSON File into Different Time Intercepts with Python

Submitted by 心不动则不痛 on 2021-01-29 15:53:09

Question


For a current research project, I am trying to slice a JSON file into different time intervals. Based on the "Date" field, I want to analyse the content of the JSON file by quarter, i.e. 01 January - 31 March, 01 April - 30 June, etc.

Ideally, the code would pick the oldest date in the file and add quarterly time intervals on top of that. I have researched this point but have not found any helpful methods yet.

Is there any smart way to include this in the code? The JSON file has the following structure:

[
{"No":"121","Stock Symbol":"A","Date":"05/11/2017","Text Main":"Sample text"}
]

And the existing relevant code excerpt looks like this:

import pandas as pd

file = pd.read_json (r'Glassdoor_A.json')
data = json.load(file)

# Create an empty dictionary
d = dict()

# processing:
for row in data:
    line = row['Text Main']
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Remove the punctuation marks from the line
    line = line.translate(line.maketrans("", "", string.punctuation))

    # Split the line into time intervals
    line.sort_values(by=['Date'])
    line.tshift(d, int = 90, freq=timedelta, axis='Date')

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

    # Count the total number of words
    total = sum(d.values())
    print(d[key], total)

Answer 1:


Please find below the solution to the question. The data can be sliced with pandas by defining a start date and an end date and comparing the "Date" column against them.

Important note: the data must be normalised and the dates converted to pandas datetimes before the records are filtered.
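To illustrate why the conversion step matters: a string such as "05/11/2017" can be read as 5 November or as 11 May. The short sketch below, which is not part of the original answer, parses the sample value both ways with an explicit format. Which reading is correct for the Glassdoor file is an assumption that has to be checked against the data.

```python
import pandas as pd

# Sample value taken from the question's JSON structure
raw = pd.Series(["05/11/2017"])

# An explicit format removes the ambiguity between day-first and month-first
as_day_first = pd.to_datetime(raw, format="%d/%m/%Y")    # read as 5 November 2017
as_month_first = pd.to_datetime(raw, format="%m/%d/%Y")  # read as 11 May 2017

print(as_day_first[0].date(), as_month_first[0].date())
```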

import json

import pandas as pd


# Load the JSON records and flatten them into a DataFrame
with open("Glassdoor_A.json", "r") as file:
    data = json.load(file)
df = pd.json_normalize(data)

# Convert the "Date" strings to pandas datetimes.
# Assumption: dates are written day-first (DD/MM/YYYY), as the
# end date of 31 March suggests; adjust the format if they are not.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")


# Filter by date: first quarter of 2018
start_date = pd.Timestamp("2018-01-01")
end_date = pd.Timestamp("2018-03-31")

after_start_date = df["Date"] >= start_date
before_end_date = df["Date"] <= end_date

# Keep only the rows that fall between the two dates (inclusive)
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

print(filtered_dates)
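The filter above covers a single, hard-coded quarter. To get closer to the original request (start at the oldest date in the file and step forward in quarterly intervals), one possible extension is sketched below. It is not part of the original answer: the helper names `slice_by_interval` and `count_words` and the inline sample records are hypothetical, and the dates are assumed to be day-first (DD/MM/YYYY). Each interval spans three months counted from the oldest date, so the intervals do not necessarily line up with calendar quarters, and the printed end date is exclusive.

```python
import string
from collections import Counter

import pandas as pd

# Hypothetical sample records in the same shape as the JSON file
records = [
    {"No": "121", "Stock Symbol": "A", "Date": "05/11/2017", "Text Main": "Sample text"},
    {"No": "122", "Stock Symbol": "A", "Date": "20/01/2018", "Text Main": "More sample text!"},
    {"No": "123", "Stock Symbol": "A", "Date": "10/02/2018", "Text Main": "Later text"},
]


def slice_by_interval(df, months=3):
    """Yield (start, end, rows) for consecutive intervals from the oldest date."""
    oldest = df["Date"].min()
    newest = df["Date"].max()
    k = 0
    start = oldest
    while start <= newest:
        # Offsets are always taken from the oldest date to avoid month-end drift
        end = oldest + pd.DateOffset(months=months * (k + 1))
        mask = (df["Date"] >= start) & (df["Date"] < end)  # interval is [start, end)
        yield start, end, df.loc[mask]
        start = end
        k += 1


def count_words(texts):
    """Count lowercase words across the texts, ignoring punctuation."""
    table = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        counts.update(text.strip().lower().translate(table).split())
    return counts


df = pd.DataFrame(records)
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")  # assumes DD/MM/YYYY

results = []
for start, end, rows in slice_by_interval(df):
    counts = count_words(rows["Text Main"])
    results.append((start, end, counts))
    print(start.date(), "to", end.date(), dict(counts))
```

With the real file, `records` would be replaced by the list loaded with `json.load`. If calendar quarters are preferred over intervals anchored at the oldest date, grouping on `df["Date"].dt.to_period("Q")` is an alternative worth considering.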


Source: https://stackoverflow.com/questions/61620061/slice-json-file-into-different-time-intercepts-with-python
