I have several ArrayList<String> windows, each with a keyword inside.
A window is an ArrayList<String> with the keyword shown in bold. Structure of a window: 9 words before + keyword + 9 words after.
As you can see, some windows overlap: (image: alt text)
How do I combine these ArrayLists to get a result like this:
(image: alt text)
Thanks
-
Please clarify your question and accept answers to some of your previous questions if you would like answers to this question. – Michael Aaron Safyan, May 25, 2010 at 2:39
-
Thanks! I have done that. Sorry, I didn't know about it! – tiendv, May 25, 2010 at 2:53
2 Answers
If you're not too worried about performance, a simple subList/equals
matching is very easy to write:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

String[] texts = {
    "sunset lake michigan michigan alaska water florida "
    + "peninsula third largest water seventh largest water "
    + "percentage edit list largest country",
    "michigan alaska water florida peninsula third largest water "
    + "seventh largest water percentage edit list largest country "
    + "subdivision list political",
    "third largest water seventh largest water percentage edit list "
    + "largest country subdivision list political geographic "
    + "subdivisions total edit references"
};
List<String> joined = new ArrayList<String>();
for (String text : texts) {
    List<String> textAsList = Arrays.asList(text.split(" "));
    final int N = joined.size();
    final int M = textAsList.size();
    // Try the longest possible overlap first, then shrink.
    // k == 0 always matches, so a text with no overlap is appended whole.
    for (int k = Math.min(N, M); k >= 0; k--) {
        if (joined.subList(N - k, N).equals(textAsList.subList(0, k))) {
            // Append only the words that follow the overlapping part.
            joined.addAll(textAsList.subList(k, M));
            break;
        }
    }
}
System.out.println(joined);
This prints:
[sunset, lake, michigan, michigan, alaska, water, florida,
peninsula, third, largest, water, seventh, largest, water,
percentage, edit, list, largest, country, subdivision, list,
political, geographic, subdivisions, total, edit, references]
The algorithm works as it says: to build List<String> joined, given a List<String> textAsList, we find the longest subList matching between the "tail" of joined and the "head" of textAsList.
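As a minimal sketch, the same tail/head matching can be pulled into a helper method. The names WindowMerger, overlapLength and mergeWindows are hypothetical, and one behaviour here is an assumption rather than part of the code above: a window that shares no words with the previous chunk starts a new chunk instead of being appended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowMerger {
    // Length of the longest suffix of `tail` that equals a prefix of `head`.
    static int overlapLength(List<String> tail, List<String> head) {
        int n = tail.size();
        int m = head.size();
        for (int k = Math.min(n, m); k > 0; k--) {
            if (tail.subList(n - k, n).equals(head.subList(0, k))) {
                return k;
            }
        }
        return 0;
    }

    // Assumption: windows arrive in document order. A window that overlaps
    // the last chunk extends it; a window with no overlap starts a new chunk.
    static List<List<String>> mergeWindows(List<List<String>> windows) {
        List<List<String>> chunks = new ArrayList<>();
        for (List<String> window : windows) {
            if (!chunks.isEmpty()) {
                List<String> last = chunks.get(chunks.size() - 1);
                int k = overlapLength(last, window);
                if (k > 0) {
                    last.addAll(window.subList(k, window.size()));
                    continue;
                }
            }
            chunks.add(new ArrayList<>(window));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<String>> windows = Arrays.asList(
            Arrays.asList("a b c d".split(" ")),
            Arrays.asList("c d e".split(" ")),
            Arrays.asList("x y".split(" ")));
        System.out.println(mergeWindows(windows)); // [[a, b, c, d, e], [x, y]]
    }
}
```

Note that a coincidental one-word match between the end of one window and the start of the next would still be treated as an overlap; requiring a minimum overlap length is one possible refinement.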
-
I have a document with some keywords. My goal is to find the text chunks that contain a keyword. I do this in steps. First, for each keyword I build a window, an ArrayList<String> containing 9 words before the keyword + the keyword + 9 words after the keyword. After this step we have one window (ArrayList<String>) per keyword, and the windows may overlap. Second, I must combine the overlapping windows; after this step I will have the text chunks that contain the keywords. The problem here is that if windows don't overlap, they are still added to the List<String> joined. I have several keywords; after the join I must receive ... Thanks in advance. – tiendv, May 25, 2010 at 8:09
-
@tiendv: I have no idea what you're trying to say. Edit the question with more information for the benefit of everyone trying to help. Give examples to illustrate the different cases. The more the better. Also give bounds on operating parameters, because the best algorithm for this would be quite complicated, but an easy but practical solution seems to exist. – polygenelubricants, May 25, 2010 at 8:18
See How to Use Editor Panes and Text Panes and these examples using DefaultHighlighter.
Addendum: Ah, I thought you just needed the view. For the model, consider the Knuth–Morris–Pratt algorithm, discussed in this answer.
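As a rough sketch of how the Knuth–Morris–Pratt idea could apply here: the KMP failure (prefix) function, computed over head + separator + tail, gives the longest prefix of one window that is also a suffix of the previous one in linear time. The class and method names are hypothetical, and the code assumes no window contains a null word, since null is used as the separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class KmpOverlap {
    // Returns the length of the longest prefix of `head` that is also a
    // suffix of `tail`, using the KMP failure function.
    // Assumption: neither list contains null, which serves as a separator.
    static int overlapLength(List<String> tail, List<String> head) {
        List<String> combined = new ArrayList<>(head);
        combined.add(null); // separator that cannot equal any real word
        combined.addAll(tail);

        // fail[i] = length of the longest proper prefix of combined[0..i]
        // that is also a suffix of combined[0..i].
        int[] fail = new int[combined.size()];
        for (int i = 1; i < combined.size(); i++) {
            int k = fail[i - 1];
            while (k > 0 && !Objects.equals(combined.get(i), combined.get(k))) {
                k = fail[k - 1];
            }
            if (Objects.equals(combined.get(i), combined.get(k))) {
                k++;
            }
            fail[i] = k;
        }
        // The separator keeps this value at or below head.size().
        return fail[combined.size() - 1];
    }

    public static void main(String[] args) {
        List<String> tail = Arrays.asList("a", "b", "c", "d");
        List<String> head = Arrays.asList("c", "d", "e");
        System.out.println(overlapLength(tail, head)); // 2
    }
}
```

For windows of about 19 words the quadratic subList/equals scan is likely fast enough; the linear-time version mainly matters for much longer sequences.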
-
I'm not asking about how to show it or combine it on screen. I mean I have some ArrayList<String> windows like win1, win2, win3. How can I combine the windows if they overlap? Thanks – tiendv, May 25, 2010 at 3:18
-
@tiendv: Amended, but it looks like @polygenelubricants may have a good idea; this problem reminds me of matching overlapping gene sequences. – trashgod, May 25, 2010 at 3:41
-
@trashgod: yes, it does remind me of that too, and I was about to suggest that the most state-of-the-art algorithm probably involves suffix trees/suffix arrays. – polygenelubricants, May 25, 2010 at 3:51