I have been trying to extract only the words that contain accented characters from multiple text files in a folder, in Java. I don't want to remove the accented characters or convert them to plain characters. I only want to print the words that contain accents. Some of the files contain only accented words, and others mix accented words with plain ones.
**I only want to extract the words that contain accents.** After searching for a while, I found the linked question below. It offers one kind of solution with a similar regex, but it doesn't work as required: it also selects null values and plain characters. Regex accented Characters for special field
Another solution I found is ([a-zA-Z]|[à-ü]|[À-Ü]), but it selects each letter separately. That isn't feasible because it doesn't match whole words, and it selects both plain and accented letters.
1 Answer
If you want to match words that contain an accented letter, you need something like:
[a-zA-Zà-üÀ-Ü]*[à-üÀ-Ü][a-zA-Zà-üÀ-Ü]*
Explanation:
- [a-zA-Zà-üÀ-Ü]* matches any accented or unaccented letters, so the word can contain other letters around the required accented one. The * modifier matches zero or more letters.
- [à-üÀ-Ü] matches exactly one accented letter. This forces a match only on words that contain an accent.
- The trailing [a-zA-Zà-üÀ-Ü]* matches the rest of the word in the same way as the first part.
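As a rough illustration of how this regex could be applied to the files in a folder, here is a minimal sketch. The class name `AccentWordExtractor` and both method names are made up for this example, and it assumes Java 11+ and UTF-8 encoded files. The character ranges are the same as above, written as Unicode escapes (à-ü is U+00E0 to U+00FC, À-Ü is U+00C0 to U+00DC) so the source file's encoding doesn't matter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of any library.
public class AccentWordExtractor {
    // Same ranges as [a-zA-Zà-üÀ-Ü] and [à-üÀ-Ü], written as regex \u escapes.
    private static final String LETTER = "[a-zA-Z\\u00E0-\\u00FC\\u00C0-\\u00DC]";
    private static final String ACCENT = "[\\u00E0-\\u00FC\\u00C0-\\u00DC]";
    private static final Pattern ACCENT_WORD =
            Pattern.compile(LETTER + "*" + ACCENT + LETTER + "*");

    // Returns every word in the text that contains at least one accented letter.
    public static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = ACCENT_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // Applies extract() to every regular file directly inside the folder.
    // Assumption: the files are UTF-8 text (Files.readString decodes as UTF-8).
    public static List<String> extractFromFolder(Path folder) throws IOException {
        List<String> words = new ArrayList<>();
        try (Stream<Path> files = Files.list(folder)) {
            List<Path> regular = files.filter(Files::isRegularFile)
                                      .collect(Collectors.toList());
            for (Path file : regular) {
                words.addAll(extract(Files.readString(file)));
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Usage: java AccentWordExtractor.java <folder>
            extractFromFolder(Path.of(args[0])).forEach(System.out::println);
        } else {
            // No folder given: run on a small sample string instead.
            extract("The caf\u00E9 is na\u00EFve").forEach(System.out::println);
        }
    }
}
```

Note that these ranges only cover part of the Latin-1 block, so accented letters outside it (for example ő or š) are not matched, and the ranges also happen to include the × and ÷ signs.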
You can also use java.text.Normalizer to deal with this.
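One possible way to use `java.text.Normalizer` here, sketched under my own assumptions (the answer doesn't spell out the approach): decompose each word with NFD, which splits a letter like é into a base letter plus a combining mark, and keep the word if any combining mark shows up. The class and method names are invented for this example.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class NormalizerAccentWords {
    // A word is a run of letters and combining marks.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{M}]+");
    // Any Unicode combining mark (general category M).
    private static final Pattern MARK = Pattern.compile("\\p{M}");

    // True if the word contains a combining mark after canonical decomposition.
    public static boolean hasAccent(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return MARK.matcher(decomposed).find();
    }

    // Returns the original (not normalized) words that contain an accent.
    public static List<String> accentedWords(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            if (hasAccent(m.group())) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        accentedWords("na\u00EFve test se\u00F1or").forEach(System.out::println);
    }
}
```

Unlike the fixed character ranges, this isn't limited to the à-ü block, but it only detects letters that have a canonical decomposition. A letter such as ø does not decompose, so it would not be treated as accented.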