Another analysis worth doing is to measure the time taken by the "lookup" and "filter" transformations, using the same file as in the previous tests.
The first test already gave optimal results. Here, too, I tried to find the balance point between saturating the resources and the performance gained. Changing the parameters made no significant difference, so those runs were not recorded.
There are characteristics I try not to mention unless they stand out (for better or worse). One of them is ease of use: how intuitive a tool feels. Some will tell me that robustness inevitably brings complexity. That is completely wrong. With Talend, as with several other tools mentioned in this blog, I am astounded by how easily I can design, run, delete and redesign a job, again and again. With other tools, loading metadata from a flat file is an ordeal: not complex, just unfriendly.
And, to be fair in handing out criticism, there are also situations that make you laugh: you cannot find the fault, the step looks perfectly configured, and simply deleting it and recreating it makes it work.
Sorry, I digressed. Here is the job: with these parameters, it gave some excellent times.
CASE 1: -Xms256M, -Xmx1024M
|Objective:||To measure the elapsed time to read 6 million rows from a flat file, join the main flow with a lookup table (MySQL) to retrieve attributes, filter the flow, and write the result to a .txt file.|
|Resources:||Virtual machine with 2 GB of RAM, with Talend as the main process on the virtual platform. These resources are modest by today's standards; any production environment has enough processing power for current and future requirements. The goal here is to build, run and measure everything in the same environment, regardless of its limited resources.|
|Design & Run||
Setting up the MySQL connection
Filtering the data
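The job described in the Objective row can be approximated outside Talend with a short Java program, which may help when reproducing the measurement. This is a minimal sketch under stated assumptions, not the code Talend generates: the lookup table is assumed to be already loaded into an in-memory HashMap (in the real job those rows come from MySQL), the input is assumed to be ";"-delimited "key;value" lines, and the filter condition is hypothetical. Throughput is computed the same way as the "Rows per sec" figure: rows read divided by elapsed seconds.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class LookupFilterJob {

    /** Counters for one run: rows read, rows written, wall-clock seconds. */
    public static final class RunStats {
        public final long rowsRead;
        public final long rowsWritten;
        public final double elapsedSeconds;

        RunStats(long rowsRead, long rowsWritten, double elapsedSeconds) {
            this.rowsRead = rowsRead;
            this.rowsWritten = rowsWritten;
            this.elapsedSeconds = elapsedSeconds;
        }
    }

    /** Average throughput, as in the "Rows per sec" row: rows / seconds. */
    public static double rowsPerSecond(long rows, double seconds) {
        return seconds > 0 ? rows / seconds : 0.0;
    }

    /**
     * Streams "key;value" rows, joins each row on its key with the in-memory
     * lookup (inner join: unmatched rows are dropped), keeps only joined rows
     * accepted by the filter, and writes "key;value;attribute" lines.
     */
    public static RunStats run(BufferedReader in, Map<String, String> lookup,
                               Predicate<String[]> filter, BufferedWriter out) {
        long start = System.nanoTime();
        long read = 0;
        long written = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                read++;
                String[] fields = line.split(";", -1);
                String attribute = lookup.get(fields[0]);
                if (attribute == null) {
                    continue; // no match in the lookup table
                }
                String value = fields.length > 1 ? fields[1] : "";
                String[] joined = {fields[0], value, attribute};
                if (filter.test(joined)) {
                    out.write(String.join(";", joined));
                    out.newLine();
                    written++;
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        double elapsed = (System.nanoTime() - start) / 1e9;
        return new RunStats(read, written, elapsed);
    }

    public static void main(String[] args) {
        // Hypothetical lookup rows; in the real job they are read from MySQL.
        Map<String, String> lookup = new HashMap<>();
        lookup.put("1", "ES");
        lookup.put("2", "FR");

        StringWriter sink = new StringWriter();
        RunStats stats = run(
                new BufferedReader(new StringReader("1;alice\n2;bob\n3;carol\n")),
                lookup,
                row -> "ES".equals(row[2]), // hypothetical filter condition
                new BufferedWriter(sink));

        System.out.print(sink);
        System.out.printf("read=%d written=%d rows/s=%.0f%n",
                stats.rowsRead, stats.rowsWritten,
                rowsPerSecond(stats.rowsRead, stats.elapsedSeconds));
    }
}
```

Rows without a match in the lookup are dropped (inner-join semantics); a left outer join would keep them with an empty attribute instead. For a real 6-million-row file, the in-memory strings in main would be replaced by Files.newBufferedReader and Files.newBufferedWriter on the input and output paths.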
|Elapsed time (s)||112|
|Rows per second (avg)||53,431|
|How to Improve Performance||
- Adjust the JVM parameters:
  - Xms and Xmx (initial and maximum heap size), as shown in the figure above.
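To confirm that the heap settings actually reached the JVM running the job, a tiny Java program can print the limits the runtime reports. This is a generic sketch, not part of Talend: Runtime.maxMemory() roughly corresponds to -Xmx (some garbage collectors report slightly less), and Runtime.totalMemory() is the heap currently reserved, which starts near -Xms and grows as needed.

```java
public class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes (rounded down). */
    public static long toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to; roughly tracks -Xmx.
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // Heap currently reserved; starts near -Xms and grows on demand.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // Unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Run it with the same arguments as CASE 1, for example java -Xms256M -Xmx1024M HeapCheck; the "max heap" line should then report a value close to 1024.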