Code Review


Revision 2 of 2 (edited by Jamal): added 434 characters in body; edited title

Build a sentence from tokens/words in a String array

I'm facing an interesting issue at the moment:

My Situation:

In Java, I have String arrays like the following (real ones are more complicated, of course). Each array represents one sentence, and I can't change this representation:

String[] tokens = {"This", "is", "just", "an", "example", "."};

My Problem:

I want to rebuild the original sentences from these arrays. This doesn't sound hard at first, but it quickly becomes complex because sentence structure has many cases: sometimes a token needs a space in front of it and sometimes it doesn't (for example, the final "." must follow "example" directly).
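For illustration, the naive approach of joining the tokens with single spaces puts a space before every punctuation mark. This is a minimal sketch using the standard `String.join`; the class name `NaiveJoin` and the helper `naiveJoin` are hypothetical, chosen only for the example.

```java
public class NaiveJoin {

    // Joins every token with a single space, which is only correct for plain words.
    static String naiveJoin(String[] tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String[] tokens = {"This", "is", "just", "an", "example", "."};
        // Prints "This is just an example ." (note the space before the period).
        System.out.println(naiveJoin(tokens));
    }
}
```

Every rule in a detokenizer is essentially a decision about where to suppress that space.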

My Approach:

I've implemented a method that handles most cases of rebuilding a sentence from the original array. As you can see, it is already quite complex, but it works "okay" for now; I just don't know how to improve it.

import java.util.regex.Pattern;

public class Detokenizer {

    // Matches any digit; used to detect times/dates such as "12:30" (tokens "12", ":", "30").
    private static final Pattern DIGIT = Pattern.compile("\\d");

    public static String detokenize(String[] tokens) {
        StringBuilder sentence = new StringBuilder();
        boolean sentenceInQuotation = false;
        boolean firstWordInQuotationSentence = false;
        boolean firstWordInParenthesis = false;
        boolean date = false;

        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];

            if (token.equals(".") || token.equals(";") || token.equals(",")
                    || token.equals("?") || token.equals("!")) {
                // Punctuation attaches to the previous token.
                sentence.append(token);
            } else if (token.equals(":")) {
                // A colon after a token containing a digit is treated as part of a time/date,
                // so the next word is appended without a space. The i > 0 check avoids an
                // ArrayIndexOutOfBoundsException when ":" is the first token.
                if (i > 0 && DIGIT.matcher(tokens[i - 1]).find()) {
                    date = true;
                }
                sentence.append(token);
            } else if (token.equals("(")) {
                sentence.append(' ').append(token);
                firstWordInParenthesis = true;
            } else if (token.equals(")")) {
                sentence.append(token);
                firstWordInParenthesis = false;
            } else if (token.equals("\"")) {
                if (!sentenceInQuotation) {
                    // Opening quote: space before it, none after it.
                    sentence.append(' ').append(token);
                    sentenceInQuotation = true;
                    firstWordInQuotationSentence = true;
                } else {
                    // Closing quote: attaches to the previous token.
                    sentence.append(token);
                    sentenceInQuotation = false;
                }
            } else if (token.equals("&") || token.equals("+") || token.equals("=")) {
                sentence.append(' ').append(token);
            } else {
                // Words: no space directly after an opening quote, an opening parenthesis,
                // or (outside quotations) a time/date colon; otherwise a single space.
                if (sentenceInQuotation && firstWordInQuotationSentence) {
                    sentence.append(token);
                    firstWordInQuotationSentence = false;
                } else if (firstWordInParenthesis) {
                    sentence.append(token);
                    firstWordInParenthesis = false;
                } else if (!sentenceInQuotation && date) {
                    sentence.append(token);
                    date = false;
                } else {
                    sentence.append(' ').append(token);
                }
            }
        }

        // Drop the space that was prepended to the first token, if there is one.
        if (sentence.length() > 0 && sentence.charAt(0) == ' ') {
            sentence.deleteCharAt(0);
        }
        return sentence.toString();
    }
}

As I said, this works quite well, but not perfectly. I suggest copying the method and trying it out yourself.

Do you have any ideas or a better solution for this problem?

Examples:

For example, while trying out some texts I noticed that I don't yet handle tokens like "[" and "]", or the different types of quotation marks, such as " or ". I've also heard that it can make a difference whether you use ... (three periods) or the single Unicode ellipsis character … (U+2026). So it keeps getting more complex.
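One way to make new token types like these cheap to add might be to move the spacing rules out of the if/else chain into two lookup sets: tokens that attach to the previous token and tokens that attach to the next one. This is a hypothetical sketch, not a complete detokenizer; the class name `RuleBasedDetokenizer` and the exact set contents are assumptions, and `Set.of` requires Java 9 or later. Only the straight quote `"` needs state, because the same character both opens and closes.

```java
import java.util.Set;

public class RuleBasedDetokenizer {

    // Hypothetical, non-exhaustive rule set: tokens that attach to the token before them.
    // "..." and the single ellipsis character U+2026 are treated the same way.
    private static final Set<String> ATTACH_LEFT = Set.of(
            ".", ",", ";", ":", "?", "!", ")", "]", "}", "\u201D", "...", "\u2026");

    // Hypothetical rule set: tokens that attach to the token after them.
    // U+201C is the typographic opening quote; U+201D (above) is the closing one.
    private static final Set<String> ATTACH_RIGHT = Set.of("(", "[", "{", "\u201C");

    public static String detokenize(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        boolean noSpaceBeforeNext = true; // never put a space before the first token
        boolean insideStraightQuotes = false;

        for (String token : tokens) {
            boolean spaceBefore;
            boolean attachRight;
            if (token.equals("\"")) {
                // A straight quote is ambiguous, so alternate between opening and closing:
                // an opening quote gets a space in front and none behind, a closing one the reverse.
                spaceBefore = !insideStraightQuotes;
                attachRight = !insideStraightQuotes;
                insideStraightQuotes = !insideStraightQuotes;
            } else {
                spaceBefore = !ATTACH_LEFT.contains(token);
                attachRight = ATTACH_RIGHT.contains(token);
            }
            if (spaceBefore && !noSpaceBeforeNext) {
                sb.append(' ');
            }
            sb.append(token);
            noSpaceBeforeNext = attachRight;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: He said "hi" (twice).
        System.out.println(detokenize(new String[] {"He", "said", "\"", "hi", "\"", "(", "twice", ")", "."}));
    }
}
```

With this layout, brackets, typographic quotes and both ellipsis forms are just set entries. The digit-before-colon rule (the `date` flag in the method above) is not part of this sketch, so the tokens "12", ":", "30" would come out as "12: 30" and would need an extra check.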

user1293755
