I am joining datatables to create a new datatable:
var row = from r0w1 in dt_vi.AsEnumerable()
          join r0w2 in dt_w.AsEnumerable()
              on r0w1.Field<int>("ID") equals r0w2.Field<int>("iD")
          join r0w3 in dt_re.AsEnumerable()
              on r0w1.Field<int?>("ID") equals r0w3.Field<int?>("id")
          join r0w4 in dt_def.AsEnumerable()
              on r0w1.Field<int?>("ID") equals r0w4.Field<int?>("id") into ps
          from r0w4 in ps.DefaultIfEmpty()
          select r0w1.ItemArray
              .Concat(r0w2.ItemArray)
              .Concat(r0w3.ItemArray)
              .Concat(r0w4 != null ? r0w4.ItemArray : new object[] { })
              .ToArray();

foreach (object[] values in row)
    dt.Rows.Add(values);
In the above code, the final loop

foreach (object[] values in row)
    dt.Rows.Add(values);

is slow for lakhs (hundreds of thousands) of rows. I want to put the data from row into the dt DataTable. Is there a faster way of doing this?
- What is "lakhs"? Do you mean "lots"? – BCdotWEB, Dec 18, 2015 at 12:03
- 100,000 = 1 lakh. Indians use this unit; I don't know if it is used elsewhere in the world. – Harshit, Dec 18, 2015 at 13:56
- Could you show your domain models? – Randy, Dec 18, 2015 at 19:47
- More joins in a query always affect performance; it would be better to create a master table to get the data, or to use a weak entity set. Could you show your model? It would give a clearer picture. – Pankaj Gupta, Dec 20, 2015 at 6:49
- I have rolled back the last edit. Please see what you may and may not do after receiving answers. – Quill, Dec 22, 2015 at 6:35
1 Answer
I managed to run some tests (I was very curious about LINQ-to-Objects vs. SQL processing performance):
1) Setup
-- drop table t1
create table t1 (id INT PRIMARY KEY CLUSTERED, name NVARCHAR(1000))
go
-- drop table t2
create table t2 (id INT PRIMARY KEY CLUSTERED, name NVARCHAR(1000))
go
-- drop table t3
create table t3 (id INT PRIMARY KEY CLUSTERED, name NVARCHAR(1000))
go
-- drop table t4
create table t4 (id INT PRIMARY KEY CLUSTERED, name NVARCHAR(1000))
go
insert into t1 select top 200000 ROW_NUMBER() OVER (ORDER BY (SELECT 1)), CAST(text AS NVARCHAR(1000)) from sys.messages
insert into t2 select top 200000 ROW_NUMBER() OVER (ORDER BY (SELECT 1)), CAST(text AS NVARCHAR(1000)) from sys.messages where message_id > 1000
insert into t3 select top 200000 ROW_NUMBER() OVER (ORDER BY (SELECT 1)), CAST(text AS NVARCHAR(1000)) from sys.messages where message_id > 2000
insert into t4 select top 200000 ROW_NUMBER() OVER (ORDER BY (SELECT 1)), CAST(text AS NVARCHAR(1000)) from sys.messages where message_id > 3000
All tables have 2 lakh (200K) rows each.
The application and SQL Server run on the same machine (a PC).
2) C# code
public static DataTable fillDataTable(SqlConnection sqlConn, string table)
{
    return fillDataTableByQuery(sqlConn, "SELECT * FROM " + table);
}

public static DataTable fillDataTableByQuery(SqlConnection sqlConn, String query)
{
    DataTable dt = new DataTable();
    using (SqlCommand cmd = new SqlCommand(query, sqlConn))
    using (SqlDataAdapter da = new SqlDataAdapter(cmd))   // adapter is disposable too
    {
        da.Fill(dt);
    }
    return dt;
}
var dt = new DataTable();
for (int i = 0; i < 4; i++)
{
    dt.Columns.Add(String.Format("Id{0}", i), typeof(int));
    dt.Columns.Add(String.Format("Name{0}", i), typeof(String));
}

using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();

    var sw = new Stopwatch();
    sw.Start();

    var dt1 = fillDataTable(con, "t1");
    var dt2 = fillDataTable(con, "t2");
    var dt3 = fillDataTable(con, "t3");
    var dt4 = fillDataTable(con, "t4");

    Console.WriteLine("Loaded in " + sw.ElapsedMilliseconds);
    sw.Reset(); sw.Start();

    var row = from r0w1 in dt1.AsEnumerable().AsParallel()
              join r0w2 in dt2.AsEnumerable().AsParallel()
                  on r0w1.Field<int>("ID") equals r0w2.Field<int>("iD")
              join r0w3 in dt3.AsEnumerable().AsParallel()
                  on r0w1.Field<int?>("ID") equals r0w3.Field<int?>("id")
              join r0w4 in dt4.AsEnumerable().AsParallel()
                  on r0w1.Field<int?>("ID") equals r0w4.Field<int?>("id") into ps
              from r0w4 in ps.DefaultIfEmpty()
              select r0w1.ItemArray
                  .Concat(r0w2.ItemArray)
                  .Concat(r0w3.ItemArray)
                  .Concat(r0w4 != null ? r0w4.ItemArray : new object[] { })
                  .ToArray();

    foreach (object[] values in row)
        dt.Rows.Add(values);

    Console.WriteLine("Processed in " + sw.ElapsedMilliseconds);
    sw.Reset(); sw.Start();

    var directDt = fillDataTableByQuery(con, @"
        select t1.*, t2.*, t3.*, t4.*
        from t1
        join t2 on t2.id = t1.id
        join t3 on t3.id = t2.id
        left join t4 on t4.id = t3.id");

    Console.WriteLine("Sql process + fetch in " + sw.ElapsedMilliseconds);
}
3) Results (all times in milliseconds)

No parallel
Loaded in 4370
Processed in 3249
Sql process + fetch in 3033
With parallel
Loaded in 4240
Processed in 1904
Sql process + fetch in 3160
So, including the fetch, SQL performs better in this case (the IDs are all clustered primary keys). Also, using AsParallel makes a difference, at least on my machine (eight cores).
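One caveat about AsParallel (my note, not part of the original test): a parallel LINQ query does not guarantee the order of its results, so if the output row order matters, AsOrdered must be applied to keep results in source order, at some extra synchronization cost. A minimal sketch of the first join, assuming order matters:

var ordered = from r0w1 in dt1.AsEnumerable().AsParallel().AsOrdered()
              join r0w2 in dt2.AsEnumerable().AsParallel()
                  on r0w1.Field<int>("ID") equals r0w2.Field<int>("iD")
              // results now come back in dt1's row order
              select r0w1.ItemArray.Concat(r0w2.ItemArray).ToArray();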
[later edit] - Dictionary version

Since you have to deal with framework 3.5 (VS 2008), and you said that you load the data once and then do more processing on it, I think you can try the following:
class AggrData
{
    public int Id1 { get; set; }
    public String Name1 { get; set; }
    public int Id2 { get; set; }
    public String Name2 { get; set; }
    public int Id3 { get; set; }
    public String Name3 { get; set; }
    public int Id4 { get; set; }
    public String Name4 { get; set; }
}
// value is defined as an anonymous type for extensibility
var dict1 = dt1.AsEnumerable().ToDictionary(r => r.Field<int>("ID"), r => new { Name = r.Field<String>("Name") });
var dict2 = dt2.AsEnumerable().ToDictionary(r => r.Field<int>("ID"), r => new { Name = r.Field<String>("Name") });
var dict3 = dt3.AsEnumerable().ToDictionary(r => r.Field<int>("ID"), r => new { Name = r.Field<String>("Name") });
var dict4 = dt4.AsEnumerable().ToDictionary(r => r.Field<int>("ID"), r => new { Name = r.Field<String>("Name") });

Console.WriteLine("Dictionary setup took " + sw.ElapsedMilliseconds);
sw.Reset(); sw.Start();

var aggrList = new List<AggrData>();
foreach (var key in dict1.Keys)
{
    if (!dict2.ContainsKey(key) || !dict3.ContainsKey(key))
        continue;

    var newAggr = new AggrData()
    {
        Id1 = key, Name1 = dict1[key].Name,
        Id2 = key, Name2 = dict2[key].Name,
        Id3 = key, Name3 = dict3[key].Name
    };

    if (dict4.ContainsKey(key))
    {
        newAggr.Id4 = key;
        newAggr.Name4 = dict4[key].Name;   // was Name3 - a typo that overwrote the third name
    }

    aggrList.Add(newAggr);
}

Console.WriteLine("Aggregate POCOs fill took " + sw.ElapsedMilliseconds);
sw.Reset(); sw.Start();
Results (in milliseconds):

Dictionary setup took 3654
Aggregate POCOs fill took 202

So, the setup is expensive, but the actual fill is much faster, because checking for a key and retrieving data from a Dictionary is O(1).
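A small refinement to the loop above (my suggestion, unbenchmarked): ContainsKey followed by the indexer hashes each key twice, while TryGetValue does the check and the retrieval in a single lookup. Since C# 3 (VS 2008) needs a declarable type for the out variable, this sketch assumes the dictionaries map the id straight to the name (Dictionary<int, string>) instead of to the anonymous type:

// Assumption: dict4 is a Dictionary<int, string> mapping id -> name,
// because an out parameter cannot be of an anonymous type in C# 3.
var dict4 = dt4.AsEnumerable().ToDictionary(r => r.Field<int>("ID"), r => r.Field<String>("Name"));

string name4;
if (dict4.TryGetValue(key, out name4))   // one hash lookup instead of two
{
    newAggr.Id4 = key;
    newAggr.Name4 = name4;
}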
- Thanks for your reply. If I do calculations or run queries on the server, it takes a lot of the server's CPU. Also, I use the data taken from the DataTable in various places, which avoids a round trip to the database every time. – Harshit, Dec 21, 2015 at 4:06
- Then your path is clear: just add AsParallel to your IEnumerables and keep the logic in C#. Another option is to try a Dictionary<int, SomeClassHoldingTheData> for the data tables used in joins, so that the id match is done faster (I will give it a try once I have some spare time). – Alexei, Dec 21, 2015 at 6:16
- I am getting an error when I use AsParallel(): "does not contain a definition for 'AsParallel'". How did you run it? I am using Visual Studio 2008. – Harshit, Dec 21, 2015 at 7:15
- I am using VS2013 Community here. AsParallel() is part of PLINQ - msdn.microsoft.com/en-us/library/dd997425(v=vs.100).aspx - and it is available from .NET Framework 4.0 (Visual Studio 2010+). – Alexei, Dec 21, 2015 at 7:31
- Maybe that's why it is not working for me, since I am using VS 2008 with the 3.5 framework. Thanks for your reply. – Harshit, Dec 21, 2015 at 7:36