How to Optimize Memory and Performance When Working on Bigger Plans
Reported for version 10
DQC plans store data while processing, which means that the more input data there is, the more memory DQC will require. Depending on the amount of input data, it could take multiple hours to finish a plan, and in some cases, intensive executions can result in a memory overflow error if the default memory allocated for DQC is consumed. If this is the case, an increase of memory could present a solution. See the options below for more information.
Allocating More Memory to the DQC IDE
The Integrated Development Environment of DQC uses memory for calculations occurring within model projects and for storing plan information. Slow performance of the whole IDE may be the symptom you want to address, and it can be indicated by an error message similar to the one below:
Description of the issue
This problem occurs because DQC, as a Java program, is configured to use a pre-defined maximum amount of memory, which might not be enough in some cases, e.g. when one attribute in the input data contains a large amount of text for each record or when a plan contains many grouping steps. This error can also mean that the limit on the number of threads was exceeded, which happens when running complicated DQC plans that result in too many parallel service calls.
Usually, the problem can be fixed by increasing the Java Virtual Machine memory (`-Xmx`) from the default 256 MB to several gigabytes. To do so, follow the steps below:
- Open `dqc.ini` or `dqc64.ini`, located in the root folder of the DQC installation, with a text editor. This file contains the `JAVA_OPTS` that will be used by the virtual machine on start.
- Edit the `-Xmx` row and replace the number (256 by default) with a higher one to reserve more memory, e.g. `-Xmx1024m`, where 1024m means 1024 MB of memory.
Note on the `Xms` and `Xmx` parameters:
- `-Xms40m` -> minimal heap space in MB; DQC will be started with this value (you can start with 75% of the free memory)
- `-Xmx256m` -> maximal heap space in MB; DQC will use this as the maximum (should be bigger than `Xms`)
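As an illustration, after the edit the relevant rows of `dqc.ini` might look like this (the values below are examples only, not recommendations; the other rows of the file are omitted):

```ini
; example heap settings for a machine with several GB of free memory
-Xms512m
-Xmx2048m
```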
There are other situations in which one might receive a `java.lang.OutOfMemoryError`; the previous and following scenarios are just a few examples. Additionally, if you are running cyclic workflows, it might be worth considering an occasional cleaning of the `\workflows\resources` folder, which may be responsible for consuming more memory than necessary.
Also, make sure that you have a JRE compatible with the DQC you use (e.g. using 64-bit DQC without jre64 can cause the following error).
If the memory error persists after allocating additional memory, check the other `JAVA_OPTS` you use. If you enabled the single_batch parallel strategy (`-Dnme.parallel.strategy=single_batch`), it may consume a lot of memory as well. To disable this strategy, change the parameter back to its default value, `-Dnme.parallel.strategy=none`. A high processing parallelism level may also cause memory issues when two memory-intensive subtasks are executed alongside each other. To avoid this, lower the processing parallelism, e.g. to `-Dnme.consolidation.parallel=1` and `-Dnme.delta.parallel=1`. These are the default values of the processing parallelism, so if you have not changed them before, it is not necessary to use these parameters.
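Taken together, a memory-conservative set of `JAVA_OPTS` might then look like the following line (the values are illustrative, not official recommendations):

```
-Xmx2048m -Dnme.parallel.strategy=none -Dnme.consolidation.parallel=1 -Dnme.delta.parallel=1
```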
Allocating More Memory for Plan Execution
When creating a plan or when running memory-intensive steps like Profiling, we may want to allocate additional memory for plan execution. It may occur that even when there is enough memory allocated to the IDE itself (described in the Allocating More Memory to the DQC IDE scenario of this guide), the execution of all plans of the project is too slow. The reason is that the default settings for memory allocation are optimized for general use, which may not be ideal for all scenarios. To overcome such an issue, it is recommended that we allocate more memory for the whole project.
If an overall change of additional memory allocated for the project is needed, see the option below:
Open Window > Preferences > Ataccama DQC > Launching.
Change the value of the Default memory for launch configuration (MB) to 1024, which is recommended for modern machines.
When allocating more memory for plan execution, we should also consider doing the same for the whole DQC IDE (described in the Allocating More Memory to the DQC IDE solution above). Considering this step may be useful when the IDE is crashing even though we have enough memory allocated in the window where we launch the project.
Allocating Additional Memory for a Particular Plan Execution
When a plan requires more memory (e.g. due to large data volumes or memory-intensive tasks such as roll-ups), you can extend the reserved memory for that plan only, instead of reserving memory for the project as a whole or for DQC on launch. To configure memory allocation for a particular plan, follow these steps:
- Click the drop-down arrow beside Run and select Run Configuration...
- From within the Runtimes tab, in VM arguments, you can input the same variables for the VM that are added in the `JAVA_OPTS` environment variable.
- Set the `Xmx` variable, which configures the maximum heap size reserved by the Java Virtual Machine. Follow the notation for passing this parameter, e.g. `-Xmx1024m` stands for reserving 1 GB of memory as the maximum allocated for the plan.

Notice that you can see a complete list of all the plans being opened and executed by DQC in the left pane, with their particular VM argument settings.
If increasing `Xmx` to several gigabytes does not help, tuning the runtime properties may be required. If the plan is failing on the Profiling step, you can try setting the `profiling.inMemoryTotal` property to a smaller number than the default 1000000, e.g. 50000.
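For example, the VM arguments field of a plan that fails during Profiling could combine both of the settings mentioned above (the values are illustrative):

```
-Xmx4096m -Dprofiling.inMemoryTotal=50000
```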
Although the provided solution solves the problem as well, it is recommended to allocate more memory for the project itself. The procedure can be seen in the Allocating More Memory for Plan Execution scenario, where the memory change applies to the project as a whole. The process described in the Allocating Additional Memory for a Particular Plan Execution section is used more often for adding other VM arguments (e.g. when declaring a path).
Some of the most memory-intensive steps are Profiling, Unification, Extended Unification, Lookup Builder, and Sorter. The default settings for allocated resources are optimized for common usage.
Allocating More Memory for the DQC Web Console on a Server
A different use case occurs when working with the DQC Web Console on your server. The following error may indicate that there is not enough memory allocated in such a case:
There are several possible reasons for this error:
- Lack of Xmx memory
- System variable max user processes is set too low
- Lack of physical memory
To overcome the issue, change the `JAVA_OPTS` parameters by placing the following commands inside the scripts you invoke on your server:
- `set JAVA_OPTS=-Xmx1024m` for a Windows server (.bat files: `onlinectl.bat`, `runcif.bat`, `runewf.bat`)
- `export JAVA_OPTS="-Xmx1024m"` for a Linux server (.sh files: `onlinectl.sh`, `runcif.sh`, `runewf.sh`)
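For example, on a Linux server you could add the following line near the top of `runewf.sh` (the 1024m value is only an example; choose a value appropriate for your server):

```shell
# reserve up to 1 GB of heap for the Java process started by this script
export JAVA_OPTS="-Xmx1024m"
```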
The `run_java.bat|sh` file is executed from all these files, therefore it is possible to change the `JAVA_OPTS` parameters there, and they will override all the other `JAVA_OPTS` defined previously. This script file is also used during plan and workflow execution. Also, be sure to restart the online server after making changes.
If the error message remains, some of the other options may be worth considering:
- On the Linux platform, increase the values of two OS parameters, `open files` and `max user processes`, to 20,000 or more, depending on the plan complexity. The current values of the parameters can be found by running the `ulimit -a` command. Note that you may need to put the command in either your `.bashrc` or `.bash_profile` file.
- Add more physical memory.
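To check the two limits individually for the current session (the printed numbers depend on your system; compare them with the 20,000 recommendation above):

```shell
# print the current "open files" and "max user processes" limits
ulimit -n
ulimit -u
```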
Following the steps of one of the four solutions provided above should solve the memory issues. Which particular steps need to be taken depends on the specific task you are performing. Allocating more memory for all plan executions rather than for a single plan presents a longer-term solution, although the IDE itself or the web console on a server may sometimes require additional memory as well.
Allocating Additional Memory on the BDE Cluster
When working on a BDE cluster and using the Hadoop framework (e.g. performing MapReduce tasks), out-of-memory errors may also occur. This happens when the MapReduce task heap size is not large enough, for example when the MapReduce model is grouping or unifying too many records.
To overcome this issue, increase the memory in the `mapred-site.xml` file. To do so, set `Xmx` to the desired value in the `mapreduce.reduce.java.opts` property:
```xml
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx10240m</value>
</property>
```
Also, you need to increase the memory of the whole container in a similar way:
```xml
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
```
Usually, the allocated memory for one map is 75% of the whole container.
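As a concrete illustration of that ratio, an 8192 MB reduce container would pair with a heap of roughly 6144 MB (both values are examples to adapt to your cluster):

```xml
<!-- heap (-Xmx) set to roughly 75% of the container size -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>
```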
The `mapred-site.xml` file is located in your configuration folder. The path to the configuration folder can be set in several ways. One option is to specify the path in the `cluster.properties` file:

```
conf=path_to_your_cluster_configs
```
Another way to define the path is to set it in the BDE IDE: when you set up a new Hadoop cluster, specify the path to the configuration there.