Error whilst using StringTokenizer on text file with multiple lines

我在风中等你 2021-01-07 11:11

I'm trying to read a text file and split it into individual words using the StringTokenizer utility in Java.

The text file looks like this:

a 2000

4  
b         


        
4 Answers
  • 2021-01-07 11:41

    You need to use the hasMoreTokens() method. I've also addressed the various coding-standard issues in your code that JB Nizet pointed out:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.StringTokenizer;

    public class TestStringTokenizer {

        /**
         * @param args the file name (without extension) to read
         * @throws IOException if the file cannot be opened
         */
        public static void main(String[] args) throws IOException {
            String fileSpecified = args[0];

            fileSpecified = fileSpecified.concat(".txt");
            String line;
            System.out.println("file Specified = " + fileSpecified);

            ArrayList<String> words = new ArrayList<String>();

            BufferedReader br = new BufferedReader(new FileReader(fileSpecified));
            try {
                // Tokenize every line; a blank line simply yields no tokens.
                while ((line = br.readLine()) != null) {
                    StringTokenizer token = new StringTokenizer(line);
                    while (token.hasMoreTokens()) {
                        words.add(token.nextToken());
                    }
                }
            } catch (IOException e) {
                System.out.println(e.getMessage());
                e.printStackTrace();
            } finally {
                br.close();
            }

            for (int i = 0; i < words.size(); i++) {
                System.out.println("words = " + words.get(i));
            }
        }
    }
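
    If the sample file from the question is saved as, say, input.txt, running java TestStringTokenizer input should print file Specified = input.txt followed by one words = ... line per token (a, 2000, 4 and b); the blank lines are simply skipped.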
    
  • 2021-01-07 11:52

    a) You always have to check StringTokenizer.hasMoreTokens() first. Throwing NoSuchElementException is the documented behaviour if no more tokens are available:

    token = new StringTokenizer (line);
    while(token.hasMoreTokens())
        words.add(token.nextToken());
    

    b) Don't create a new StringTokenizer for every line unless your file is too large to fit into memory. Read the entire file into a String and let a single tokenizer work on that, for example:
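
    A minimal sketch of that approach (variable names like fileSpecified and words are illustrative, following the code in the other answer):

    // Read the whole file into one String, then tokenize it once.
    StringBuilder contents = new StringBuilder();
    BufferedReader reader = new BufferedReader(new FileReader(fileSpecified));
    try {
        String line;
        while ((line = reader.readLine()) != null) {
            contents.append(line).append(' ');  // the space keeps words from adjacent lines apart
        }
    } finally {
        reader.close();
    }

    // One tokenizer for the whole file; blank lines simply contribute no tokens.
    StringTokenizer tokenizer = new StringTokenizer(contents.toString());
    while (tokenizer.hasMoreTokens()) {
        words.add(tokenizer.nextToken());
    }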

  • 2021-01-07 11:56

    This problem is due to the fact that you don't check whether there is a next token before trying to get the next token. You should always check that hasMoreTokens() returns true before calling nextToken().

    But you have other bugs:

    • The first line is read, but not tokenized
    • You only add the first word of each line to your list of words
    • Bad practice: the token variable should be declared inside the loop, not outside
    • You don't close your reader in a finally block (see the sketch after this list)
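
    A minimal sketch of the reading loop with those points applied (names such as br, words and fileSpecified are borrowed from the other answer; it uses try-with-resources, which closes the reader automatically):

    try (BufferedReader br = new BufferedReader(new FileReader(fileSpecified))) {
        String line;
        while ((line = br.readLine()) != null) {
            StringTokenizer token = new StringTokenizer(line);  // declared inside the loop
            while (token.hasMoreTokens()) {
                words.add(token.nextToken());                   // add every token, not just the first
            }
        }
    }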
  • 2021-01-07 12:03

    Your general approach seems sound, but you have a basic problem in your code.

    Your parser is most likely failing on the second line of your input file. That line is blank, so when you call words.add(token.nextToken()) you get an error because there are no tokens. Your current code also means you'll only ever get the first token of each line.

    You should iterate over the tokens like this:

    while(token.hasMoreTokens())
    {
        words.add(token.nextToken());
    }
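
    With that check in place, the blank second line simply contributes no tokens instead of throwing a NoSuchElementException, and every word on each line ends up in words.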
    

    You can find a more general example in the javadocs here:

    http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
