1

I am looking for a Java 5 library that lets me compare text so that the following comparisons return true:

  • " foo bar " == "foo bar"
  • "foo\nbar" == "foo bar"
  • "foo\tbar" == "foo bar"
  • "féé bar" == "fee bar"
  • and so on...

Any suggestions?

asked Jan 6, 2010 at 14:26
9
  • Yes, like a regex. But all-in-one. Commented Jan 6, 2010 at 14:29
  • Does Java's String.matches() not suffice? Commented Jan 6, 2010 at 14:34
  • 2
    Just as a brief note, when comparing objects (such as String) use the equals method rather than the == operator. The == operator will compare object references. Commented Jan 6, 2010 at 14:37
  • 1
    Why in the world would "féé bar" and "fee bar" ever be equal? Commented Jan 6, 2010 at 14:37
  • @coobird: yes, of course. This is not meant as Java syntax, just a notation to show which strings should be treated as equal. Commented Jan 6, 2010 at 14:38

5 Answers

1

I don't think you'll find a library with these specific rules; you'll have to code them yourself. For some of the rules, regular expressions or even the String class's own methods can be useful, but for the last rule I think you'll have to keep a Map of equivalent characters and iterate through each char in the string, comparing them using this Map. And since you're already iterating through the string, you could apply all the rules in that one pass and avoid regular expressions.
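
A minimal sketch of that single-pass idea, not a definitive implementation. The class name `LenientComparer`, the `EQUIVALENTS` table, and its two entries are hypothetical and only illustrate the approach; a real table would need every character you care about. Accented letters are written as Unicode escapes so the source compiles under any file encoding.

```java
import java.util.HashMap;
import java.util.Map;

final class LenientComparer {
    // Hypothetical equivalence table: accented char -> plain char.
    private static final Map<Character, Character> EQUIVALENTS =
            new HashMap<Character, Character>();
    static {
        EQUIVALENTS.put('\u00e9', 'e'); // e with acute accent
        EQUIVALENTS.put('\u00e8', 'e'); // e with grave accent
    }

    // One pass: drops leading/trailing whitespace, turns each inner run of
    // whitespace into a single space, and maps chars through EQUIVALENTS.
    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                // Only remember the gap if some text was already emitted.
                pendingSpace = out.length() > 0;
                continue;
            }
            if (pendingSpace) {
                out.append(' ');
                pendingSpace = false;
            }
            Character mapped = EQUIVALENTS.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    static boolean lenientEquals(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

With this sketch, `LenientComparer.lenientEquals("foo\tbar", "foo bar")` compares the normalized forms rather than the raw strings.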

answered Jan 6, 2010 at 14:46

1 Comment

There is no perfect answer, but yours is exhaustive. Instead of a map, though, I use a character class in a regex (for instance [êéèë]) with replaceAll.
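
For illustration, the character-class approach from this comment might look like the sketch below. The class and method names are hypothetical, and the second character class (for "a") is an assumption added to show how further letters would be handled; only the "e" class comes from the comment.

```java
final class AccentFolder {
    // Replaces each listed accented letter with its plain counterpart.
    // Unicode escapes are used so the file compiles under any encoding.
    static String foldAccents(String s) {
        return s
                // e-grave, e-acute, e-circumflex, e-diaeresis
                .replaceAll("[\u00e8\u00e9\u00ea\u00eb]", "e")
                // a-grave, a-acute, a-circumflex, a-diaeresis (illustrative addition)
                .replaceAll("[\u00e0\u00e1\u00e2\u00e4]", "a");
    }
}
```

Each extra letter needs its own replaceAll call, so the string is scanned once per class.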
1
answered Jan 6, 2010 at 14:31

Comments

1

Sounds like you want to write a method to "normalize" your strings according to your rules before comparing them. Use trim for the first rule, and a number of replace calls, or maybe StringUtils.replaceChars(), for the others.
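
A rough sketch of such a method using only JDK String methods (the class name `SimpleNormalizer` is hypothetical). It covers only the whitespace rules from the question, and it does not collapse runs of whitespace into a single space.

```java
final class SimpleNormalizer {
    static String normalize(String s) {
        return s.trim()               // rule 1: drop leading/trailing whitespace
                .replace('\n', ' ')   // rule 2: newline counts as a space
                .replace('\t', ' ');  // rule 3: tab counts as a space
    }

    static boolean sameText(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```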

answered Jan 6, 2010 at 15:16

Comments

1

It doesn't have your specified functionality directly, but you may also be able to use the CharMatcher functions found in the google-guava library: http://code.google.com/p/guava-libraries/

answered Jan 6, 2010 at 15:56

Comments

1

There appear to be functions in the ICU library to remove diacritical marks:

http://site.icu-project.org/

The rest you can probably do with one or more regular expressions.

answered Jan 6, 2010 at 16:44

1 Comment

Shoot, in JDK 1.6, you can use java.text.Normalizer to remove the diacriticals! Previously this was a Sun internal class.
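
A small sketch of the java.text.Normalizer route, which requires Java 6 or later and so would not run on the Java 5 mentioned in the question. The class and method names are hypothetical. The idea is to decompose each accented letter into a base letter plus combining marks (NFD) and then delete the marks with a regex.

```java
import java.text.Normalizer;

final class DiacriticStripper {
    static String stripDiacritics(String s) {
        // NFD splits e.g. e-acute into 'e' followed by a combining acute accent.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove the combining marks, leaving only the base letters.
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
```

Letters that have no decomposition, such as "ø", pass through unchanged, so this is not a complete transliteration.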
