Get date from string by splitting

Submitted by 混江龙づ霸主 on 2019-12-12 01:36:24

Question


I have a batch of raw text files. Each file begins with Date>>month.day year News garbage.

garbage is a whole lot of text I don't need, and varies in length. The words Date>> and News always appear in the same place and do not change.

I want to copy month day year and insert this data into a CSV file, with a new line for every file in the format day month year.

How do I copy month day year into separate variables?

I tried to split a string after a known word and before a known word. I'm familiar with string[x:y], but I basically want to change x and y from numbers into actual words (i.e. string[Date>>:News]).

import re, os, sys, fnmatch, csv
folder = raw_input('Drag and drop the folder > ')
for filename in os.listdir(folder):
    # First, avoid system files
    if filename.startswith("."):
        pass
    else:
        # Tell the script the file is in this directory and can be written
        file = open(folder+'/'+filename, "r+")
        filecontents = file.read()
        thestring = str(filecontents)
        print thestring[9:20]

An example text file:

Date>>January 2. 2012 News 122

5 different news agencies have reported the story of a man washing his dog.

Answer 1:


Here's a solution using the re module:

import re

s = "Date>>January 2. 2012 News 122"
m = re.match(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)", s)
if m:
    month, day, year = m.groups()
    # Call format() on the string, not on the result of print()
    print("{} {} {}".format(month, day, year))

Outputs:

January 2 2012

Edit:

Actually, there's another nicer (imo) solution using re.split described in the link Robin posted. Using that approach you can just do:

month, day, year = re.split(r">>| |\. ", s)[1:4]
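
To cover the CSV part of the question, the pattern above can be combined with the csv module. This is a minimal sketch, not a definitive implementation: the helper names extract_date and write_dates_csv are hypothetical, and it assumes every file starts with a header shaped like the sample one (Date>>January 2. 2012 News ...).

```python
import csv
import io
import re

# Pattern from the answer above: month name, day followed by a dot, year.
DATE_RE = re.compile(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)")

def extract_date(text):
    """Return (day, month, year) from the header, or None if it does not match."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    month, day, year = m.groups()
    return day, month, year

def write_dates_csv(texts, out):
    """Write one day,month,year row per input text to the file-like object out."""
    writer = csv.writer(out, lineterminator="\n")
    for text in texts:
        date = extract_date(text)
        if date is not None:
            writer.writerow(date)

# Usage with in-memory data; real code would read each file's contents
# and pass an opened CSV file instead of a StringIO buffer.
samples = ["Date>>January 2. 2012 News 122\n\n5 different news agencies ..."]
buffer = io.StringIO()
write_dates_csv(samples, buffer)
print(buffer.getvalue())
```

Texts whose header does not match are skipped silently here; whether to skip, log, or fail is a design choice left to the caller.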



Answer 2:


You can use the string method .split(" ") to break the line into a list of pieces at each space character. Because year and month.day are always in the same place, you can access them by their position in that list. To separate month and day, call .split again, this time on the "." character.

Example:

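# Assumes the layout "Date>>month.day year News ..." described in the question
parts = theString.split(" ")         # named parts to avoid shadowing the built-in list
year = parts[1]
month_day = parts[0].split(">>")[1]  # drop the "Date>>" prefix
month, day = month_day.split(".")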
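
The question's string[Date>>:News] idea can also be written with plain string methods. A minimal sketch, assuming the header looks like the sample file (Date>>January 2. 2012 News 122) rather than the month.day layout; the helper name between is hypothetical:

```python
def between(text, start, end):
    """Return the text between the first start marker and the next end marker."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        raise ValueError("start marker not found: %r" % start)
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        raise ValueError("end marker not found: %r" % end)
    return middle.strip()

header = "Date>>January 2. 2012 News 122"
date_text = between(header, "Date>>", "News")  # "January 2. 2012"
month, day, year = date_text.split()           # split on any whitespace
day = day.rstrip(".")                          # "2." becomes "2"
print(day, month, year)
```

Using str.partition avoids computing slice indices by hand, and the explicit checks make a missing marker fail loudly instead of returning a misleading slice.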



Answer 3:


You could use string.split:

x = "A b c"
x.split(" ")

Or you could use regular expressions (which I see you import but don't use) with groups. I don't remember the exact syntax offhand, but the pattern is something like r'(.*)(Date>>)(.*)'. It matches the string "Date>>" with any text on either side of it. The parentheses capture each part into a numbered group.
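
A minimal sketch of what those groups hold, using the sample header from the question. The second pattern, which captures only the text between Date>> and News, is an illustrative variant and not part of the original answer:

```python
import re

s = "Date>>January 2. 2012 News 122"

# The pattern from the answer: text before, the marker itself, text after.
m = re.match(r"(.*)(Date>>)(.*)", s)
before, marker, after = m.groups()
# Here before is empty, marker is "Date>>", and after is the rest of the line.

# Variant: capture only what lies between the two known words.
m2 = re.search(r"Date>>(.*?)\s*News", s)
date_text = m2.group(1)
print(date_text)
```

The non-greedy (.*?) stops at the first News that follows, so the trailing text is never captured.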



Source: https://stackoverflow.com/questions/23880285/get-date-from-string-by-splitting
