Post Timeline

edited tags; edited tags; edited tags
BalusC
added 291 characters in body
Martin

I'm trying to get data from a website that is encoded in UTF-8 and insert it into a MySQL database. The database is also encoded in UTF-8.

This is the method I use to download data from a specific site.

public String download(String url) throws java.io.IOException {
    java.io.InputStream s = null;
    java.io.InputStreamReader r = null;
    StringBuilder content = new StringBuilder();
    try {
        // getContent() is expected to return an InputStream for this URL
        s = (java.io.InputStream) new java.net.URL(url).getContent();
        // Decode the response bytes with the charset under test
        r = new java.io.InputStreamReader(s, "UTF-8");
        char[] buffer = new char[4 * 1024];
        int n = 0;
        // read() returns -1 once the end of the stream is reached
        while (n >= 0) {
            n = r.read(buffer, 0, buffer.length);
            if (n > 0) {
                content.append(buffer, 0, n);
            }
        }
    } finally {
        if (r != null) r.close();
        if (s != null) s.close();
    }
    return content.toString();
}

If the encoding is set to UTF-8 (r = new java.io.InputStreamReader(s, "UTF-8");), the data inserted into the database seems to look OK, but when I try to display it, I get something like this: C�te d'Ivoire instead of Côte d'Ivoire.
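A minimal sketch of one way this symptom can be reproduced in isolation. It assumes, without any evidence from my setup, that a correctly decoded string gets encoded as ISO-8859-1 somewhere between Java and the page, and that the page then reads those bytes as UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Hypothetical helper: encode the text as ISO-8859-1 (assumed to happen
    // somewhere on the way to the page), then decode those bytes as UTF-8.
    public static String latin1BytesReadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // The single byte 0xF4 for 'ô' is not a valid UTF-8 sequence here,
        // so the decoder substitutes U+FFFD (the replacement character).
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "C\u00f4te d'Ivoire"; // "Côte d'Ivoire"
        String shown = latin1BytesReadAsUtf8(title);
        // Position of the replacement character in the displayed text
        System.out.println(shown.indexOf('\uFFFD')); // prints 1
    }
}
```

If this is what happens, the string inside Java is fine and the damage occurs later, which would match the data looking OK until it is displayed.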

All my websites are encoded in UTF-8.

Please help.

If the encoding is set to windows-1252 (r = new java.io.InputStreamReader(s, "windows-1252");), everything works fine and I get Côte d'Ivoire on my website, but in Java this title looks like 'C? ́te d'Ivoire', which breaks other things, such as links. What does this mean?
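For reference, here is a small sketch of what decoding UTF-8 bytes as windows-1252 does to this title. It only illustrates the charset mismatch; it does not show where in my pipeline the bytes get converted back. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode the UTF-8 bytes of the text as if they were windows-1252.
    public static String utf8BytesReadAsCp1252(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Check whether re-encoding the misread string as windows-1252
    // gives back the original UTF-8 bytes.
    public static boolean roundTripsToSameBytes(String text) {
        byte[] original = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(original, CP1252);
        return Arrays.equals(original, misread.getBytes(CP1252));
    }

    public static void main(String[] args) {
        String title = "C\u00f4te d'Ivoire"; // "Côte d'Ivoire"
        String misread = utf8BytesReadAsCp1252(title);
        // 'ô' is two bytes in UTF-8 (0xC3 0xB4), so it becomes two chars
        System.out.println(title.length() + " -> " + misread.length()); // prints 13 -> 14
        System.out.println(roundTripsToSameBytes(title)); // prints true
    }
}
```

For this title the misread string holds two characters (U+00C3 and U+00B4) where 'ô' should be, yet writing it back out as windows-1252 restores the original UTF-8 bytes. That would be consistent with the page looking right while the Java string is wrong.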

UTF-8 Encoding in java, retrieving data from website
