I have the code below that takes SQL data and exports it to XML or CSV. The XML is processed at a good speed, but the CSV sometimes takes 1 hour depending on how many data rows it exports. My question: is my code writing the CSV effectively, or can it be improved for speed? The table has around 40 columns and a variable number of rows, which can be over 10k.
public static void SqlExtract(this string queryStatement, string xFilePath, string fileName)
{
    string connectionString = @"Data Source=ipaddress; Initial Catalog=name; User ID=username; Password=password";
    using (SqlConnection _con = new SqlConnection(connectionString))
    {
        using (SqlCommand _cmd = new SqlCommand(queryStatement, _con))
        {
            using (SqlDataAdapter _dap = new SqlDataAdapter(_cmd))
            {
                if (fileName == "item1" || fileName == "item2" || fileName == "item3")
                {
                    DataSet ds = new DataSet("FItem");
                    _con.Open();
                    _dap.Fill(ds);
                    _con.Close();

                    FileStream fs = new FileStream(xFilePath, FileMode.Create, FileAccess.Write, FileShare.None);
                    StreamWriter writer = new StreamWriter(fs, Encoding.UTF8);
                    ds.WriteXml(writer, XmlWriteMode.IgnoreSchema);
                    fs.Close();

                    StringWriter sw = new StringWriter();
                    ds.WriteXml(sw, XmlWriteMode.IgnoreSchema);
                    string OutputXML = sw.ToString();
                    OutputXML = OutputXML.Replace("Table", "Item");
                    System.IO.File.WriteAllText(xFilePath, OutputXML);
                }
                else
                {
                    DataTable table1 = new DataTable("Table1");
                    _con.Open();
                    _dap.Fill(table1);
                    _con.Close();

                    string exportCSV = string.Empty;
                    foreach (DataRow row in table1.Rows)
                    {
                        int i = 1;
                        foreach (DataColumn column in table1.Columns)
                        {
                            if (row[1].ToString() == "002" && i > 41 || row[1].ToString() == "END" && i > 4)
                            {
                                //do nothing
                            }
                            else
                            {
                                if (i > 1)
                                {
                                    exportCSV += ";";
                                }
                                exportCSV += row[column.ColumnName].ToString();
                            }
                            i++;
                        }
                        exportCSV += "\r\n";
                    }
                    //Write CSV
                    File.WriteAllText(xFilePath, exportCSV.ToString());
                }
            }
        }
    }
}
- also, you may look into turning this into an async Task, utilizing `await _con.OpenAsync().ConfigureAwait(false);` and `_cmd.ExecuteReaderAsync` – yob, Feb 25, 2020
3 Answers
Your biggest performance problem is the use of string concatenation via `exportCSV +=`. Each time such a line is executed, a new string object is created, which takes time. By using a `StringBuilder` like so:
StringBuilder exportCSV = new StringBuilder(1024);
foreach (DataRow row in table1.Rows)
{
    int i = 1;
    foreach (DataColumn column in table1.Columns)
    {
        if (row[1].ToString() == "002" && i > 41 || row[1].ToString() == "END" && i > 4)
        {
            //do nothing
        }
        else
        {
            if (i > 1)
            {
                exportCSV.Append(";");
            }
            exportCSV.Append(row[column.ColumnName].ToString());
        }
        i++;
    }
    exportCSV.AppendLine();
}
the performance will get a lot better.
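To make the difference concrete, here's a minimal, hypothetical micro-benchmark using `System.Diagnostics.Stopwatch` (the row/column counts are made up to roughly match the question; even at 1,000 rows the gap is dramatic, and at 10k+ rows the `+=` version gets far worse):

using System;
using System.Diagnostics;
using System.Text;

static class ConcatBenchmark
{
    static void Main()
    {
        const int rows = 1000, cols = 40;

        var sw = Stopwatch.StartNew();
        string concat = string.Empty;
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                concat += "value;"; // allocates and copies a brand-new string every time
            }
            concat += "\r\n";
        }
        sw.Stop();
        Console.WriteLine($"string +=     : {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var sb = new StringBuilder();
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                sb.Append("value;"); // appends into a growing internal buffer
            }
            sb.AppendLine();
        }
        string built = sb.ToString();
        sw.Stop();
        Console.WriteLine($"StringBuilder : {sw.ElapsedMilliseconds} ms");
    }
}

Concatenation is quadratic: every `+=` copies the entire string built so far, while `StringBuilder` appends into a buffer and only materializes the string once at the end.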
The XML part should be rewritten like so:

DataSet ds = new DataSet("FItem");
_con.Open();
_dap.Fill(ds);
_con.Close();

StringWriter sw = new StringWriter();
ds.WriteXml(sw, XmlWriteMode.IgnoreSchema);
string outputXML = sw.ToString().Replace("Table", "Item");
System.IO.File.WriteAllText(xFilePath, outputXML);
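As an aside, the `Replace` call can be avoided entirely: `SqlDataAdapter.Fill(DataSet)` names the filled table `Table` by default, and `WriteXml` uses `DataTable.TableName` for the row elements, so renaming the table up front produces `<Item>` directly. A minimal sketch, assuming (as the original `Replace("Table", "Item")` implies) that renaming those elements is all that's needed:

DataSet ds = new DataSet("FItem");
_dap.Fill(ds); // Fill opens and closes the connection itself if it is closed
ds.Tables[0].TableName = "Item"; // row elements now serialize as <Item> instead of <Table>
using (var writer = new StreamWriter(xFilePath, false, Encoding.UTF8))
{
    ds.WriteXml(writer, XmlWriteMode.IgnoreSchema);
}

This also avoids accidentally replacing the text "Table" anywhere it might occur inside the data itself.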
- thanks so much for the info, it greatly improved performance; now it's under a minute. Learned something new today, thanks a lot. – QuickSilver, Feb 24, 2020
- Do you have any tips for improving speed on the XML part? – QuickSilver, Feb 24, 2020
- sorry, what do you mean I'm writing it twice? If I leave out one or the other, I will not be able to access the XML; I'll get a "failed to parse document" error. – QuickSilver, Feb 24, 2020
- You are using a `StreamWriter` to write to a `FileStream`, and later on you use a `StringWriter` and use its content to call `File.WriteAllText()`, which overwrites the formerly written file. See: docs.microsoft.com/en-us/dotnet/api/… – Heslacher, Feb 24, 2020
- I need to do that to replace the tag name: when I write the first time, the tag is "Table" by default (I think), but I need it to be "Item". – QuickSilver, Feb 24, 2020
Some quick remarks:
- Don't start variable names with underscores unless they're class-wide `private` ones.
- Local variables should be camelCased. `OutputXML` doesn't follow that rule.
- Split your code into smaller methods that don't intermix data retrieval and file writing. Your method does multiple things, depending on some (odd) logic. Even its name doesn't make sense: it's called `SqlExtract`, yet it generates CSV or XML files depending on seemingly arbitrary conditions.
- Don't hardcode your connection string. Put it in a .config file and access it via the `ConfigurationManager` (a sketch follows below).
- Why `_con.Open();` and `_con.Close();`? The `using` block takes care of disposal, and `SqlDataAdapter.Fill` opens and closes the connection itself if it isn't already open.
- Don't loop through `table1.Rows`. Instead, have a method that transforms a `DataRow` into a CSV line and use that to construct an `IEnumerable<string>`, which you can then use in combination with `string.Join()`. Something like `var csvContents = string.Join(Environment.NewLine, table1.Rows.Cast<DataRow>().Select(x => DataRowToCsvLine(x)));`. That's one line (instead of 20) which expresses far better what your code does (and yes, the `DataRowToCsvLine` method is also a couple of lines, but it doesn't pollute the main logic; a sketch follows this list).
- `row[1].ToString() == "END"`: are you sure it is always going to be `"END"`? Or is it possible it can be `"end"`? Consider using `string.Equals()` instead, which allows for case-insensitive comparisons. Same for `fileName == "item1"` etc.
- Properly dispose of the `StringWriter` with a `using` block. Example: https://social.msdn.microsoft.com/Forums/vstudio/en-US/7d949d5c-a41c-4dc7-bbcb-429761f851d1/is-calling-flush-and-close-necesary-for-stringwriter-and-xmltextwriter?forum=netfxbcl
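A minimal sketch of the `DataRowToCsvLine` helper mentioned above (the semicolon delimiter is carried over from the original code; CSV quoting/escaping is omitted, just as in the original):

using System.Data;
using System.Linq;

internal static class CsvHelpers
{
    // Joins every field in the row with the delimiter used by the original code.
    public static string DataRowToCsvLine(DataRow row)
    {
        return string.Join(";", row.ItemArray.Select(field => field?.ToString()));
    }
}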
None of the above might improve the speed of your code, but it will make it much more maintainable and consistent.
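One more sketch, for the connection-string remark above; the entry name `ExportDb` is made up for illustration:

// Requires a reference to System.Configuration, plus an App.config entry like:
//   <connectionStrings>
//     <add name="ExportDb" connectionString="Data Source=...; Initial Catalog=...; ..." />
//   </connectionStrings>
using System.Configuration;

string connectionString =
    ConfigurationManager.ConnectionStrings["ExportDb"].ConnectionString;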
- Thanks for the tips, I will amend these mistakes. As for the last part of the comments: I can guarantee that END will be uppercase. Same with the item name. – QuickSilver, Feb 24, 2020
Something along these lines, where your "entry point" is the extension method `DataTable.SqlToFile()` (see `DoWork` below for a usage example). I'd suggest breaking your main logic of "serializing" a DataTable as XML/CSV into separate methods: `SqlToFile` (saves the "serialized" DataTable into a file), `SqlExtract` (extracts the DataTable rows as XML or CSV), and two separate methods, `SqlAsXML()` and `SqlAsCSV()`, to get the XML and CSV strings respectively. To serialize into XML I'm using `GetXml()` (or you can use `WriteXml()` as you did). To serialize into CSV, I use a `StringBuilder`, iterating over `Rows` and "saving" each row as a delimiter-separated string.

In addition, data access logic is implemented as its own class.
public static class SQLExtensions
{
    public static bool SqlToFile(this System.Data.DataTable dt, string xFilePath, string fileName)
    {
        try
        {
            string result = dt.SqlExtract(fileName);
            System.IO.File.WriteAllText(xFilePath, result);
            return true;
        }
        catch (System.IO.IOException)
        {
            return false;
        }
        catch (Exception)
        {
            return false;
        }
    }

    public static string SqlExtract(this System.Data.DataTable dt, string fileName)
    {
        string result = String.Empty;
        var xmlfiles = new[] { "item1", "item2", "item3" };
        if (xmlfiles.Contains(fileName))
        {
            result = dt.SqlAsXML(fileName);
        }
        else
        {
            string delimiter = ";";
            result = dt.FilterData().SqlAsCSV(delimiter);
        }
        return result;
    }

    public static string SqlAsXML(this System.Data.DataTable dt, string fileName)
    {
        var ds = new System.Data.DataSet();
        ds.Tables.Add(dt);
        string xml = ds.GetXml();
        return xml;
    }

    #region CSV related

    // Note: this filters out whole rows by row index; the original code skipped
    // trailing *columns* within a row instead.
    public static IEnumerable<System.Data.DataRow> FilterData(this System.Data.DataTable dtIns)
    {
        var list = dtIns
            .Rows.Cast<System.Data.DataRow>()
            .Where(r => !((r[1].ToString() == "002" && dtIns.Rows.IndexOf(r) > 41)
                       || (r[1].ToString() == "END" && dtIns.Rows.IndexOf(r) > 4)));
        return list;
    }

    public static string SqlAsCSV(this System.Data.DataTable dt, string delimiter = ",")
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dt.Rows.Count; i++) // < Count, not Count - 1, so the last row isn't dropped
        {
            System.Data.DataRow dr = dt.Rows[i];
            sb.AppendLine(dr.RowDataAsCSV(delimiter));
        }
        return sb.ToString();
    }

    public static string SqlAsCSV(this IEnumerable<System.Data.DataRow> rows, string delimiter = ",")
    {
        StringBuilder sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.AppendLine(row.RowDataAsCSV(delimiter));
        }
        return sb.ToString();
    }

    public static string RowDataAsCSV(this System.Data.DataRow row, string delimiter = ",")
    {
        string rowdata = String.Join(delimiter, row.ItemArray.Select(f => f.ToString()));
        return rowdata;
    }

    #endregion
}
public class DataAccess
{
    public async Task<System.Data.DataTable> DataAsync(string connectionInfo, string commandSQL)
    {
        System.Data.DataTable dt = new System.Data.DataTable("data");
        using (System.Data.SqlClient.SqlConnection connection = new System.Data.SqlClient.SqlConnection(connectionInfo))
        {
            try
            {
                await connection.OpenAsync().ConfigureAwait(false);
            }
            catch (InvalidOperationException)
            {
                return null;
            }
            catch (System.Data.SqlClient.SqlException)
            {
                return null;
            }

            using (System.Data.SqlClient.SqlCommand command = new System.Data.SqlClient.SqlCommand(commandSQL, connection))
            {
                using (var r = await command.ExecuteReaderAsync(System.Data.CommandBehavior.SequentialAccess).ConfigureAwait(false))
                {
                    dt.Load(r);
                }
            }
            connection.Close();
        }
        return dt;
    }
}
public class MainMainMain
{
    public async Task DoWork(string commandSQL, string xFilePath, string fileName)
    {
        string connectionString = @"Data Source=ipaddress; Initial Catalog=name; User ID=username; Password=password";
        var da = new DataAccess();
        var dt = await da.DataAsync(connectionString, commandSQL).ConfigureAwait(false);
        if (dt != null)
        {
            bool converted = dt.SqlToFile(xFilePath, fileName);
        }
        else
        {
            //oh-ho
        }
    }
}
Furthermore, you could add a common interface and have the XML and CSV serializers as implementations; a sketch follows.
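A minimal sketch of that idea (the interface and class names are made up for illustration):

using System.Data;
using System.Linq;
using System.Text;

// Hypothetical interface; the names are illustrative only.
public interface ITableSerializer
{
    string Serialize(DataTable dt);
}

public class XmlTableSerializer : ITableSerializer
{
    public string Serialize(DataTable dt)
    {
        // Assumes dt is not already attached to another DataSet.
        var ds = new DataSet();
        ds.Tables.Add(dt);
        return ds.GetXml();
    }
}

public class CsvTableSerializer : ITableSerializer
{
    private readonly string delimiter;

    public CsvTableSerializer(string delimiter = ",")
    {
        this.delimiter = delimiter;
    }

    public string Serialize(DataTable dt)
    {
        var sb = new StringBuilder();
        foreach (DataRow row in dt.Rows)
        {
            sb.AppendLine(string.Join(delimiter, row.ItemArray.Select(f => f.ToString())));
        }
        return sb.ToString();
    }
}

The caller then picks a serializer based on `fileName` and simply writes `serializer.Serialize(dt)` to the target file.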
- Could you add why you made the changes you did? It's not much help to the OP if you just modify their code without an explanation of why they should change it. – user211881, Feb 25, 2020
- good point, will do – yob, Feb 25, 2020
- @yob Can you let me know whether making these changes will increase performance? At the moment, with all the changes made, it takes around 2 minutes for 10k rows, which is not bad considering it originally took around 15 minutes for CSV. Are you able to provide more info on why this is a better approach? Thanks. – QuickSilver, Feb 25, 2020
- @QuickSilver – I tried an export to CSV selecting 8 columns and 2 columns from a table containing 49,700 rows; the timings were ~23 seconds and ~13 seconds. – yob, Feb 25, 2020
- @QuickSilver – two points: 1) check whether data "extraction" is the bottleneck, and 2) remove the `if (row[1].ToString() == "002" && i > 41 || row[1].ToString() == "END" && i > 4)` check from within the loop, and instead use a `DataView` with `RowFilter` (sketch below). – yob, Feb 25, 2020
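A minimal sketch of that last suggestion. Note that `RowFilter` works on column *names*, so the name `RecordType` for the second column is made up, and that `RowFilter` drops whole rows, which only matches the original intent if entire rows (rather than trailing columns) were meant to be skipped:

// Hypothetical: assumes the second column is named "RecordType".
var view = new System.Data.DataView(table1)
{
    RowFilter = "RecordType <> '002' AND RecordType <> 'END'"
};
System.Data.DataTable filtered = view.ToTable(); // filter applied once, outside the write loop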