Question
I have a text file which looks like:
name1
1 0 1 0 1
0 1 1 1 0
0 0 0 0 0
name2
1 0 1 0 1
0 0 1 1 0
0 0 0 0 1
i.e., a plaintext label followed by a few rows of 1/0 values separated by spaces. The number of rows is variable, but every row between any two particular labels should have the same number of 1/0s (though in practice it might not).
How do I grab each name+rows chunk with a Scanner? Is there an elegant way to enforce that the rows in a chunk all have the same length (and provide some sort of feedback if they don't)?
I'm thinking there might be a convenient way with clever delimiter specification, but I can't seem to get that working.
Answer 1:
I would do it the simple way. Grab each line as a String and feed it through, say, a regular expression that matches the 1-or-0-followed-by-space pattern. If it matches, treat it as a row; if not, treat it as a plaintext label. Check row-size consistency after the fact by verifying that every label's array of data matches the size of the first label's array of data.
EDIT: I wasn't aware of the Scanner class, although it sounds handy. I think the essential idea should still be roughly the same: use the Scanner to parse your input, and handle the question of the sizes yourself.
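A minimal sketch of that line-by-line idea, in case it helps. The class and method names (LabeledRows, parse, checkWidths) are made up for illustration, and it assumes a row is a run of 0/1 values separated by single spaces while anything else is a label. It compares each row to the first row of its own chunk; comparing against the first label's rows instead would be a small change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

class LabeledRows {
    // Assumption: a row is one or more 0/1 values separated by single spaces.
    private static final Pattern ROW = Pattern.compile("[01]( [01])*");

    // Groups each row (with spaces removed) under the most recent label.
    // A row-like line that appears before any label is treated as a label.
    static Map<String, List<String>> parse(Scanner in) {
        Map<String, List<String>> chunks = new LinkedHashMap<>();
        List<String> current = null;
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (current != null && ROW.matcher(line).matches()) {
                current.add(line.replace(" ", ""));
            } else {
                current = new ArrayList<>();
                chunks.put(line, current);
            }
        }
        return chunks;
    }

    // Returns one message per row whose width differs from the first row of its chunk.
    static List<String> checkWidths(Map<String, List<String>> chunks) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : chunks.entrySet()) {
            List<String> rows = e.getValue();
            for (int i = 1; i < rows.size(); i++) {
                int expected = rows.get(0).length();
                int actual = rows.get(i).length();
                if (actual != expected) {
                    problems.add(e.getKey() + ": row " + (i + 1) + " has "
                            + actual + " values, expected " + expected);
                }
            }
        }
        return problems;
    }
}
```

Usage would be something like LabeledRows.checkWidths(LabeledRows.parse(new Scanner(file))), printing whatever messages come back.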
Also, in theory, you could produce a regular expression that would match the label and the entire array, although I don't know if you can produce one that will guarantee that it only matches sets of lines with the same number of values in each row. But then, to set up more automated checking, you'd probably need to construct a second regular expression that exactly matches the array size of the first entry, and use it for all the others. I think this is a case where the cure is worse than the disease.
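For what it's worth, the "second regular expression" idea could look roughly like the sketch below: build a pattern from the width of the first row and test every other row against it. WidthPattern and forWidthOf are hypothetical names, and it assumes values are single 0/1 characters separated by single spaces.

```java
import java.util.regex.Pattern;

class WidthPattern {
    // Builds a pattern that accepts only rows with exactly as many values as firstRow.
    static Pattern forWidthOf(String firstRow) {
        int width = firstRow.trim().split(" +").length;
        // (width - 1) "value plus space" pairs, followed by one final value.
        return Pattern.compile("([01] ){" + (width - 1) + "}[01]");
    }
}
```

It works, but simply counting the values in each row gives the same answer with less machinery.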
Answer 2:
Even better, after a helpful answer to another question (thanks Bart):
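import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// GraphState and rowsToList(String[]) are not shown here.

// A label is a single word on a line of its own.
static final String labelRegex = "^\\s*\\w+$";
static final Pattern labelPattern = Pattern.compile(labelRegex, Pattern.MULTILINE);
Matcher labelMatcher = labelPattern.matcher("");

// One row of space-separated 1/0 values. The trailing \\s* (rather than \\s+)
// lets the last row of a trimmed chunk match in full.
static final String stateRegex = "([10] )+[10]\\s*";
static final String statesRegex = "(" + stateRegex + ")+";
static final Pattern statesPattern = Pattern.compile(statesRegex, Pattern.MULTILINE);
Matcher stateMatcher = statesPattern.matcher("");

// Zero-width lookahead: split just before each label, so the label stays in its chunk.
static final String chunkRegex = "(?=" + labelRegex + ")";
static final Pattern chunkPattern = Pattern.compile(chunkRegex, Pattern.MULTILINE);
Scanner chunkScan;

public void setSource(File source) {
    if (source != null && source.canRead()) {
        try {
            chunkScan = new Scanner(new BufferedReader(new FileReader(source)));
            chunkScan.useDelimiter(chunkPattern);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

// Reads up to n chunks and maps each label to its parsed rows.
public Map<String, List<GraphState>> next(int n) {
    Map<String, List<GraphState>> result = new LinkedHashMap<String, List<GraphState>>(n);
    String chunk, rows;
    int i = 0;
    while (chunkScan.hasNext() && i++ < n) {
        chunk = chunkScan.next().trim();
        labelMatcher.reset(chunk);
        stateMatcher.reset(chunk);
        if (labelMatcher.find() && stateMatcher.find()) {
            rows = stateMatcher.group().replace(" ", ""); // "1 0 1" becomes "101"
            result.put(labelMatcher.group(), rowsToList(rows.split("\\n")));
        }
    }
    return result;
}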
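Since the code above depends on pieces that aren't shown, here is a stripped-down sketch of just the chunking trick: a zero-width lookahead delimiter that makes the Scanner hand back one "label + rows" block per call to next(). ChunkSplitter and chunks are made-up names, and it reads from a String rather than a file to stay self-contained. Note that with this label pattern a row containing a single value would also be taken for a label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class ChunkSplitter {
    // Zero-width delimiter: matches just before any line that consists of a single word.
    private static final Pattern BEFORE_LABEL =
            Pattern.compile("(?=^\\s*\\w+$)", Pattern.MULTILINE);

    // Splits the input into trimmed "label + rows" chunks, in order of appearance.
    static List<String> chunks(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(input)) {
            scan.useDelimiter(BEFORE_LABEL);
            while (scan.hasNext()) {
                String chunk = scan.next().trim();
                if (!chunk.isEmpty()) {
                    result.add(chunk);
                }
            }
        }
        return result;
    }
}
```

Each returned chunk starts with its label line, so splitting it on line breaks gives the label first and the rows after it.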
Answer 3:
You would need to open the file and loop through every line with readLine() until you hit the end of the file.
I'm assuming you want to do the consistency checking as you traverse the file. If you want to store the information and use it later, I would consider using some type of data structure.
As you traverse the file, you can test each line with a simple regex to see whether it is a label name. If it is not, split the line on the space character (' '), which gives you an array of values, then compare the array's length against the expected size.
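Basic sketch in Java:
import java.io.BufferedReader;
import java.io.IOException;

// Assumes reader is already open on the file.
static void checkRows(BufferedReader reader) throws IOException {
    int consistentSize = 5; // assume you have a size in mind
    int rowNumber = 0;      // 1-based line number, used in the messages
    String line;
    while ((line = reader.readLine()) != null) { // readLine() returns null at end of file
        rowNumber++;
        line = line.trim();
        if (line.isEmpty()) {
            continue; // skip blank lines
        }
        // Assumption: a label is any line that does not start with 0 or 1;
        // for a simple name you won't really need a regex.
        if (line.charAt(0) != '0' && line.charAt(0) != '1') {
            // not sure if you want to do any consistency checking in here
        } else {
            String[] currLine = line.split(" +");
            boolean consist = true;
            // check that each value is exactly "0" or "1"
            for (String value : currLine) {
                if (!value.equals("0") && !value.equals("1")) {
                    consist = false;
                    break;
                }
            }
            // could easily keep a list of the rows that had invalid values, and such
            if (!consist) {
                System.out.println("row " + rowNumber + " contains an invalid value");
            }
            if (currLine.length != consistentSize) {
                System.out.println("row " + rowNumber + " is inconsistent");
            }
        }
    }
}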
You could also add another loop if you don't know the size you expect for each row and put some logic in to find the most common size and then figure out what doesn't match. I am unsure of how complicated your consistency checking needs to be.
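A rough sketch of that "most common size" idea, assuming the rows of one chunk have already been collected as non-empty strings of space-separated values. CommonWidth, mostCommonWidth and oddRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommonWidth {
    // Number of space-separated values in a row.
    private static int width(String row) {
        return row.trim().split(" +").length;
    }

    // Returns the most frequent row width (ties go to the smaller width), or -1 if there are no rows.
    static int mostCommonWidth(List<String> rows) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(width(row), 1, Integer::sum);
        }
        int best = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int w = e.getKey();
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && w < best)) {
                best = w;
                bestCount = c;
            }
        }
        return best;
    }

    // Returns the 1-based positions of rows whose width differs from the most common one.
    static List<Integer> oddRows(List<String> rows) {
        int expected = mostCommonWidth(rows);
        List<Integer> odd = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (width(rows.get(i)) != expected) {
                odd.add(i + 1);
            }
        }
        return odd;
    }
}
```

This takes two passes over the rows: one to find the expected width, one to flag the rows that don't match it.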
Source: https://stackoverflow.com/questions/1545022/java-scanner-headache