I have been trying to import a CSV file of about 60 MB; it contains about 400,000 rows, which should not be too much. I tried these ways to import the data:
- MySQL Workbench - it took about two hours!!!
- MySQL's LOAD DATA INFILE command - it took about 15 seconds! (a sketch of the command is shown after this list)
- Talend Open Studio - almost as long as Workbench.
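For reference, here is a minimal sketch of the LOAD DATA INFILE command referred to above; the file path, table name, and delimiter options are assumptions about the CSV, not details from the original import.

```sql
-- Minimal sketch, assuming a comma-separated file with a header row;
-- the path, table, and options below are placeholders.
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```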
Why do the GUI tools perform so badly? I want to use GUI-based tools instead of writing scripts.
With Talend Open Studio, is there a way to speed things up? I have heard there is another Talend product that deals with Big Data; would it be faster than Talend Open Studio?
Hope someone out there knows!
I know that in Hadoop, using Pig, I can get the CSV from HDFS into Pig memory very quickly.
- I can't explain the specifics you are seeing, but ... serious programming demands serious tools, not convenience tools such as UIs. – Rick James, Nov 6, 2016 at 4:42
- Hi Rick, thanks for your reply. But isn't Talend Open Studio considered a powerful tool for this kind of task? It's one of the big ETL tools from the open-source world. If that isn't a serious tool, then what would be the proper 'serious' tool? – Palu, Nov 6, 2016 at 4:51
- @RickJames - Talend really is one of the best ETL tools. – a_vlad, Nov 6, 2016 at 5:52
- Please go through this, hope it helps: nwazsohail.blogspot.com/2016/10/… – Nawaz Sohail, Nov 8, 2016 at 10:00
2 Answers
What parameters are you using for Talend's tMysqlOutput? What MySQL settings? It will not be faster than LOAD DATA INFILE, but it still depends heavily on the settings.
You can post your question on the TalendForge forum with a screenshot of your Job components, and I will try to help.
For example, one of my regular Talend-MySQL Jobs (billing preparation) transfers 2M records (400 MB) from MySQL to MySQL; it takes 6-15 minutes depending on the servers (we have 2 configurations).
Edit, because it is always better to show once than to explain a hundred times:
2.3M rows, 300 MB file, speed of tMysqlOutput with 100 rows per insert: (screenshot)
The same with 10,000 rows per insert: (screenshot)
tMysqlOutputBulkExec: twice as fast as the previous run. (screenshot)
But compare the total times - the original 10,000-rows-per-insert run: (screenshot)
and the bulk run: (screenshot) - the bulk total only looks longer because after execution it waits for the single transaction to commit.
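To make the "rows per insert" comparison concrete, here is an illustrative sketch at the SQL level; the table and column names are placeholders and not part of the original Job.

```sql
-- Illustration only: "100 rows per insert" means multi-row INSERT statements
-- carrying 100 value tuples each; larger batches mean fewer round trips and commits.
INSERT INTO my_table (id, payload)
VALUES (1, 'a'), (2, 'b'), (3, 'c');   -- ...up to 100 tuples per statement

-- With 10,000 rows per statement the same load needs far fewer statements,
-- which is why the second run above finishes considerably faster.
```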
- Hi a_vlad, thanks for your response, and also for being a user of Talend. I just started using Talend, so I am not sure what parameters would need to be set. In the tMap I just mapped the fields between the CSV and the MySQL database. In the tMysqlOutput I was able to verify the connection to the MySQL database. Other than that, I don't know what other settings I would need. – Palu, Nov 7, 2016 at 16:54
- Hi Palu, first of all, in tMysqlOutput, did you use Insert or Insert/Update? Second, what batch size (commit every)? Third, MySQL settings can also have an effect (InnoDB redo log size, for example). – a_vlad, Nov 7, 2016 at 19:59
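If it helps, these are standard MySQL variables you can inspect for the settings mentioned in the comment above; the variable names are real, but which values are appropriate depends entirely on your server, so this is only a starting point for checking, not a tuning recommendation.

```sql
-- Sketch: inspect (not tune) the server settings referred to above.
SHOW VARIABLES LIKE 'innodb_log_file_size';            -- InnoDB redo log size
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- flush-per-commit behaviour
SHOW VARIABLES LIKE 'max_allowed_packet';              -- caps the size of multi-row INSERTs
```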
- Hi there, I was away for a while. Thanks for all your analysis and suggestions. I will go back, try things out, and get back on this. Thanks for your super efforts, both you Vlad and Dillon. – Palu, Nov 30, 2016 at 4:50
First of all, what output component are you using? The speeds you describe suggest you are using tMysqlOutput. This is very slow because it is designed to write line by line. Try using tMysqlOutputBulkExec instead. It collects the data in a bulk file, then uploads the whole thing at once and commits it.
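Conceptually, that bulk flow boils down to something like the SQL below; the file path and table name are placeholders, and the Talend component generates the equivalent steps itself.

```sql
-- Rough sketch of the bulk-file approach: one load, one commit.
SET autocommit = 0;

LOAD DATA LOCAL INFILE '/tmp/talend_bulk_file.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n';

COMMIT;  -- a single commit for the whole load instead of one per row or batch
```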
- BulkExec is faster, but even with the normal component - just check the last Job - 300 MB, 2M rows: 95 sec with the normal tMysqlOutput (10,000-row batches; 300 sec with 100-row batches), versus 80 sec with BulkExec. In our case the difference is not important, and it is certainly nowhere near the two hours described; the problem is somewhere else. – a_vlad, Nov 15, 2016 at 3:12
- We won't know until he troubleshoots the issue, or at least tells us more about it. My answer gives him some options on where to continue testing and at least rule out the particular output component he is using as the cause. Just because something works for you and your use case does not mean it will work for his. In particular, a database on a high-latency connection will do very poorly with the first component, but one on a low-latency connection will perform much better. – Dillon Wright, Nov 15, 2016 at 3:26
- It could easily be a network issue or something else, but it is not a tMySQL component issue - see the pictures above (taken from everyday operation). – a_vlad, Nov 15, 2016 at 3:33
- Hi, I know it's late, but I did try out the command-line way of importing the text into MySQL, and it took less than a minute. So I am not sure where the overhead is in Talend Studio. Now I have a question: can Talend Studio be used to execute a batch script instead? That would be one way to use a GUI tool and not have to run things at the command line. – Palu, Sep 17, 2017 at 4:03
- You can always use the tSystem component to run things via the command prompt, although I am puzzled by the overhead in your job. Did you attempt to use the tMysqlOutputBulkExec component? – Dillon Wright, Sep 18, 2017 at 2:12