I need my Java program to take a string like:
"This is a sample sentence."
and turn it into a string array like:
{"this","is","a","sample","sentence"}
String.split() will do most of what you want. You may then need to loop over the words to pull out any punctuation.
For example:
String s = "This is a sample sentence.";
String[] words = s.split("\\s+");
for (int i = 0; i < words.length; i++) {
    // You may want to check for a non-word character before blindly
    // performing a replacement.
    // It may also be necessary to adjust the character class.
    words[i] = words[i].replaceAll("[^\\w]", "");
}
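The question also asks for lowercase words, which the snippet above does not handle. A minimal sketch that combines lowercasing, splitting, and punctuation stripping follows; the class name SentenceToWords and the helper toWords are made up for illustration, and Locale.ROOT is an assumption that locale-independent lowercasing is what you want.

```java
import java.util.Arrays;
import java.util.Locale;

public class SentenceToWords {
    // Hypothetical helper: lowercase the sentence, split on whitespace,
    // then strip non-word characters from each token.
    static String[] toWords(String sentence) {
        String[] words = sentence.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            words[i] = words[i].replaceAll("\\W", "");
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [this, is, a, sample, sentence]
        System.out.println(Arrays.toString(toWords("This is a sample sentence.")));
    }
}
```

Note that a token made up only of punctuation (for example a lone "-") would become an empty string here, so you may want to filter those out.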
Most of the answers here convert the String to a String array, as the question asked. But generally we work with a List, so this may be more useful:
import java.util.Arrays;
import java.util.List;

String dummy = "This is a sample sentence.";
List<String> wordList = Arrays.asList(dummy.split(" "));
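One thing to be aware of: Arrays.asList returns a fixed-size list backed by the array, so add and remove throw UnsupportedOperationException. If you need a resizable list, copying into an ArrayList is one option. The sketch below assumes that is what you want; the class name WordListExample and the helper toWordList are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordListExample {
    // Hypothetical helper: split on whitespace and copy into a resizable list.
    static List<String> toWordList(String sentence) {
        return new ArrayList<>(Arrays.asList(sentence.split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> words = toWordList("This is a sample sentence.");
        // Works on the ArrayList copy; the bare Arrays.asList result would throw here.
        words.add("extra");
        // prints [This, is, a, sample, sentence., extra]
        System.out.println(words);
    }
}
```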
Try this:
import java.util.regex.Pattern;

String[] stringArray = Pattern.compile("\\s+").split(
        "This is a sample sentence."
                // remove everything except letters, digits, and whitespace
                .replaceAll("[^\\p{Alnum}\\s]+", ""));
for (int j = 0; j < stringArray.length; j++) {
    System.out.println(j + " \"" + stringArray[j] + "\"");
}
The following code snippet splits a sentence into words and also counts how often each word occurs.
import java.util.HashMap;
import java.util.Map;

public class StringToword {
    public static void main(String[] args) {
        String s = "a a a A A";
        String[] splitedString = s.split(" ");
        // Map each distinct word to the number of times it occurs
        Map<String, Integer> m = new HashMap<>();
        for (String s1 : splitedString) {
            // Start at 1 for a new word, otherwise add 1 to that word's own count
            m.merge(s1, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> entry : m.entrySet()) {
            System.out.println(entry);
        }
    }
}
This can be accomplished with split alone, since it takes a regex:
String s = "This is a sample sentence with []s.";
String[] words = s.split("\\W+");
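This gives words as: {"This","is","a","sample","sentence","with","s"}
The \\W+ pattern matches one or more non-word characters, that is, anything other than letters, digits, and underscore. So there is no need for a separate replace step. Note that the case is preserved; call s.toLowerCase() before splitting if you want lowercase words. You can try other patterns as well.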
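Keep in mind that \W is ASCII-only by default in Java, so accented or non-Latin letters are treated as separators. If your text may contain such letters, splitting on anything that is not a Unicode letter or digit is one alternative. This is a sketch under that assumption; the class name UnicodeSplitExample and the helper splitWords are hypothetical.

```java
import java.util.Arrays;

public class UnicodeSplitExample {
    // Hypothetical helper: split on runs of characters that are neither
    // Unicode letters (\p{L}) nor Unicode digits (\p{N}).
    static String[] splitWords(String text) {
        return text.split("[^\\p{L}\\p{N}]+");
    }

    public static void main(String[] args) {
        String text = "Caf\u00e9 au lait."; // \u00e9 is e with an acute accent

        // Default \W treats the accented letter as a separator: [Caf, au, lait]
        System.out.println(Arrays.toString(text.split("\\W+")));

        // The Unicode-aware pattern keeps the first word whole (3 words, first has 4 chars)
        System.out.println(Arrays.toString(splitWords(text)));
    }
}
```

As with any split, a string that starts with a separator produces an empty first element, so trim or filter if that matters.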
string.replaceAll() doesn't work correctly with a locale different from the predefined one, at least in jdk7u10.
This example creates a word dictionary from a text file in the Windows Cyrillic charset CP1251:
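import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public static void main(String[] args) {
    String fileName = "Tolstoy_VoinaMir.txt";
    try {
        // Read the whole file, decoding it as CP1251
        List<String> lines = Files.readAllLines(Paths.get(fileName),
                Charset.forName("CP1251"));
        // A TreeSet keeps the words unique and sorted
        Set<String> words = new TreeSet<>();
        for (String s : lines) {
            for (String w : s.split("\\s+")) {
                // Strip punctuation from each token
                w = w.replaceAll("\\p{Punct}", "");
                words.add(w);
            }
        }
        for (String w : words) {
            System.out.println(w);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }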