UTF-8 encoding in Java: retrieving data from a website

I'm trying to fetch data from a website that is encoded in UTF-8 and insert it into a MySQL database. The database is also encoded in UTF-8.

This is the method I use to download the data from a given site:

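// Downloads the resource at the given URL and decodes its bytes as UTF-8.
public String download(String url) throws java.io.IOException {
    java.io.InputStream s = null;
    java.io.InputStreamReader r = null;
    StringBuilder content = new StringBuilder();
    try {
        // openStream() always returns an InputStream; getContent() may return other types.
        s = new java.net.URL(url).openStream();
        r = new java.io.InputStreamReader(s, "UTF-8");
        char[] buffer = new char[4 * 1024];
        int n = 0;
        // read() returns -1 at end of stream, which ends the loop.
        while (n >= 0) {
            n = r.read(buffer, 0, buffer.length);
            if (n > 0) {
                content.append(buffer, 0, n);
            }
        }
    } finally {
        // Close the reader first, then the stream; closing a stream twice is harmless.
        if (r != null) r.close();
        if (s != null) s.close();
    }
    return content.toString();
}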
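
The method above hardcodes "UTF-8" as the decoding charset. As a minimal sketch of an alternative, the charset could be taken from the response's Content-Type header (available through `java.net.URLConnection.getContentType()`). The class name `ContentTypeCharset` and the method `fromContentType` are hypothetical names introduced here for illustration; they are not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the original question.
public class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8". Returns the fallback when the parameter
    // is missing or names a charset this JVM does not know.
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // prints "UTF-8"
        System.out.println(fromContentType("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        // prints "ISO-8859-1" (no charset parameter, so the fallback is used)
        System.out.println(fromContentType("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The resulting `Charset` could then be passed to `new java.io.InputStreamReader(s, charset)`. This only helps if the header matches the bytes the server actually sends.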

If the encoding is set to 'UTF-8' (r = new java.io.InputStreamReader(s, "UTF-8");), the data inserted into the database looks OK, but when I try to display it I get something like C�te d'Ivoire instead of Côte d'Ivoire.

All my websites are encoded in UTF-8.

Please help.

If the encoding is set to 'windows-1252' (r = new java.io.InputStreamReader(s, "windows-1252");), everything works fine and I get Côte d'Ivoire on my website, but in Java the title looks like 'CÃ´te d'Ivoire', which breaks other things, such as links. What does this mean?
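
Both kinds of garbling described here can be reproduced in isolation, without a website or a database. The following is a minimal sketch, assuming only that one side encodes text with one charset and the other side decodes it with a different one. The class name `MojibakeDemo` and the method `reinterpret` are hypothetical names used for illustration. It does not establish where the mismatch occurs in the actual setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
public class MojibakeDemo {

    // Encodes text with one charset and decodes the resulting bytes with another,
    // imitating a writer and a reader that disagree about the encoding.
    public static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");
        String title = "C\u00f4te d'Ivoire"; // the correctly decoded title

        // UTF-8 bytes read as windows-1252: the two bytes of U+00F4 (0xC3 0xB4)
        // become the two characters U+00C3 and U+00B4.
        String wrong = reinterpret(title, utf8, cp1252);
        System.out.println(wrong);

        // Writing that wrong string back as windows-1252 restores the original
        // UTF-8 bytes, so a UTF-8 reader sees the correct title again.
        System.out.println(reinterpret(wrong, cp1252, utf8));

        // Single-byte ISO-8859-1 output read as UTF-8: the lone byte 0xF4 is
        // malformed UTF-8 and is replaced with U+FFFD.
        System.out.println(reinterpret(title, latin1, utf8));
    }
}
```

What the console shows for these lines depends on the console's own encoding, so comparing the strings by code point is more reliable than reading the printed output.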

  • I am using PHP/Apache, and YES, I set the encoding to UTF-8: header('Content-Type:text/html; charset=UTF-8'); Commented Jan 5, 2010 at 10:08
  • Be careful: setting the header does not mean setting the encoding. You should specify in your question that you are using PHP/Apache, because your Java code makes this ambiguous. Commented Jan 5, 2010 at 10:10
  • 2
    You need to define the encoding when you write the content as well. I don't know how this works in PHP, but what you're setting in the comment is just an instruction on how the client should interpret the content stream. Commented Jan 5, 2010 at 10:10
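
The point made in the comments, that a declared encoding and the encoding actually used for writing are separate things, can be illustrated in Java (the comments concern PHP, so this is only an analogy). A minimal sketch, assuming a body written through an `OutputStreamWriter`; the class name `ExplicitWriteEncoding` and the method `write` are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
public class ExplicitWriteEncoding {

    // Encodes text with an explicitly chosen charset, the way a response body
    // would be written. The charset passed here decides the bytes on the wire,
    // regardless of what any header declares.
    public static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String title = "C\u00f4te d'Ivoire";
        byte[] asUtf8 = write(title, StandardCharsets.UTF_8);
        byte[] asLatin1 = write(title, StandardCharsets.ISO_8859_1);
        // prints 14: U+00F4 takes two bytes (0xC3 0xB4) in UTF-8
        System.out.println(asUtf8.length);
        // prints 13: U+00F4 takes one byte (0xF4) in ISO-8859-1
        System.out.println(asLatin1.length);
    }
}
```

A client that trusts a `charset=UTF-8` header decodes only the first byte sequence correctly; the second one was written in a different encoding than the one declared.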
