How to Optimize Memory and Performance When Working on Bigger Plans

Reported for version 10

DQC plans hold data in memory while processing, so the more input data there is, the more memory DQC requires. Depending on the volume of input data, a plan can take several hours to finish, and memory-intensive executions can fail with an out-of-memory error once the default memory allocated to DQC is used up. In that case, increasing the allocated memory is usually the solution. The options below describe how to do so.

Allocating More Memory to the DQC IDE

The DQC Integrated Development Environment (IDE) uses memory for calculations within model projects and for storing plan information. If the whole IDE is short of memory, you may see an error message similar to the ones below:

java.lang.OutOfMemoryError: Java heap space
    at org.antlr.runtime.tree.BaseTree.createChildrenList(BaseTree.java:244)
    at org.antlr.runtime.tree.BaseTree.addChild(BaseTree.java:122)
    at org.antlr.runtime.tree.BaseTreeAdaptor.addChild(BaseTreeAdaptor.java:107)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.eqOp(ExpressionParser.java:1465)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.eqExpr(ExpressionParser.java:1304)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.notExpr(ExpressionParser.java:1130)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.andExpr(ExpressionParser.java:1001)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.xorExpr(ExpressionParser.java:899)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.orExpr(ExpressionParser.java:797)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.expr(ExpressionParser.java:689)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.arg(ExpressionParser.java:3572)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.argList(ExpressionParser.java:3409)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.funcCall(ExpressionParser.java:3067)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.atom(ExpressionParser.java:2934)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.termExpr(ExpressionParser.java:2629)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.unaryExpr(ExpressionParser.java:2482)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.multExpr(ExpressionParser.java:2041)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.addExpr(ExpressionParser.java:1904)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.relExpr(ExpressionParser.java:1805)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.eqExpr(ExpressionParser.java:1199)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.notExpr(ExpressionParser.java:1130)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.andExpr(ExpressionParser.java:1001)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.xorExpr(ExpressionParser.java:899)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.orExpr(ExpressionParser.java:797)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.expr(ExpressionParser.java:689)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.sequence(ExpressionParser.java:279)
    at com.ataccama.dqc.expressions.compile.ExpressionParser.main(ExpressionParser.java:156)
    at com.ataccama.dqc.expressions.compile.GeneralExpressionCompiler.parse(GeneralExpressionCompiler.java:80)
    at com.ataccama.dqc.expressions.compile.GeneralExpressionCompiler.compile(GeneralExpressionCompiler.java:95)
    at com.ataccama.dqc.expressions.compile.GeneralExpressionCompiler.compileWithException(GeneralExpressionCompiler.java:116)
    at com.ataccama.dqc.model.expressions.util.ExpressionCompiler.compile(ExpressionCompiler.java:311)
    at com.ataccama.dqc.model.elements.steps.bean.ExpressionWrapper.compile(ExpressionWrapper.java:73)
  
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.createChunk(Unknown Source)
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.ensureCapacity(Unknown Source)
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.createNode(Unknown Source)
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.createDeferredDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.startDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.startDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.startDocumentParsing(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at com.ataccama.dqc.commons.util.xml.XmlDomUtil.createRootElement(XmlDomUtil.java:478)
    at com.ataccama.dqc.commons.util.xml.XmlDomUtil.createRootElement(XmlDomUtil.java:483)
    at com.ataccama.dqc.model.environment.plugins.PluginConfiguration.findPlugins(PluginConfiguration.java:127)
    at com.ataccama.dqc.model.environment.plugins.PluginConfiguration.<init>(PluginConfiguration.java:63)
    at com.ataccama.dqc.model.environment.plugins.PluginConfiguration.<init>(PluginConfiguration.java:47)
    at com.ataccama.dqc.model.environment.plugins.PluginConfiguration$1.newInstance(PluginConfiguration.java:211)
    at com.ataccama.dqc.commons.util.reflect.PurityClassLoadingContext$ContextData.get(PurityClassLoadingContext.java:99)
    at com.ataccama.dqc.commons.util.reflect.PurityClassLoadingContext.getCorrectInstance(PurityClassLoadingContext.java:42)
    at com.ataccama.dqc.model.environment.plugins.PluginConfiguration.getInstance(PluginConfiguration.java:207)
    at com.ataccama.dqc.api.internal.services.CoreImpl.<clinit>(CoreImpl.java:39)
    at com.ataccama.dqc.api.internal.CoreSupport.<init>(CoreSupport.java:19)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.lang.Class.newInstance0(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    at com.ataccama.dqc.gui.proxy.GuiCoreProxy.initCoreSupport(GuiCoreProxy.java:152)

Description of the issue

This problem occurs because DQC, as a Java program, runs with a predefined maximum amount of memory. This limit might not be enough, for example, when one attribute in the input data contains a large amount of text in each record or when a plan contains many grouping steps. An OutOfMemoryError can also mean that the limit on creating new threads was reached. This happens when complicated DQC plans make too many parallel service calls.

Usually, the problem can be fixed by increasing the maximum heap size of the Java Virtual Machine (Xmx) from the default 256 MB to several gigabytes. To do so, follow the steps below:

  1. Open dqc.ini (or dqc64.ini), located in the root folder of the DQC installation, in a text editor. This file contains the JVM options (JAVA_OPTS) that the virtual machine uses at startup.
  2. Edit the -Xmx line and replace the number (256 by default) with a higher value to reserve more memory, e.g. -Xmx1024m, where 1024m means 1024 MB of memory.
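To double-check the notation before restarting DQC, the following Java sketch converts a heap option such as -Xmx1024m into megabytes. The class HeapOption is a hypothetical helper, not part of DQC; it assumes the size suffixes the HotSpot JVM accepts (k, m, g, t in either case, with no suffix meaning bytes).

```java
import java.util.Locale;

/** Hypothetical helper: converts -Xmx/-Xms options into megabytes. */
class HeapOption {

    /** Returns the size in MB for options like -Xmx1024m, -Xmx4g or -Xms40m. */
    static long toMegabytes(String option) {
        if (!option.startsWith("-Xmx") && !option.startsWith("-Xms")) {
            throw new IllegalArgumentException("Not a heap option: " + option);
        }
        String value = option.substring(4).toLowerCase(Locale.ROOT);
        if (value.isEmpty()) {
            throw new IllegalArgumentException("Missing size: " + option);
        }
        char unit = value.charAt(value.length() - 1);
        if (Character.isDigit(unit)) {
            // No suffix: the JVM reads the number as bytes.
            return Long.parseLong(value) / (1024 * 1024);
        }
        long number = Long.parseLong(value.substring(0, value.length() - 1));
        switch (unit) {
            case 'k': return number / 1024;
            case 'm': return number;
            case 'g': return number * 1024;
            case 't': return number * 1024 * 1024;
            default: throw new IllegalArgumentException("Unknown unit in: " + option);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMegabytes("-Xmx256m"));  // 256 (the dqc.ini default)
        System.out.println(toMegabytes("-Xmx1024m")); // 1024
        System.out.println(toMegabytes("-Xmx4g"));    // 4096
    }
}
```

Note that the lowercase m in -Xmx1024m and the uppercase M in -Xmx1024M are equivalent to the JVM.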

A note on the Xms and Xmx parameters:

-Xms40m -> initial (minimum) heap size in MB; DQC starts with this value (a reasonable starting point is 75% of free memory)

-Xmx256m -> maximum heap size in MB; DQC uses this as the upper limit (it must be at least as large as Xms)
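The two rules of thumb in this note (start Xms at about 75% of free memory; keep Xms no larger than Xmx) can be sketched in Java as follows. The free-memory figure is an input you supply yourself, for example from your operating system's task manager, and the class name HeapPair is hypothetical.

```java
/** Hypothetical sketch of the Xms/Xmx rules of thumb; not part of DQC. */
class HeapPair {

    /** Suggests an Xms value in MB: 75% of free memory, capped at Xmx. */
    static long suggestXmsMb(long freeMemoryMb, long xmxMb) {
        long candidate = freeMemoryMb * 3 / 4;
        return Math.min(candidate, xmxMb);
    }

    /** The JVM refuses to start when the initial heap is larger than the maximum heap. */
    static boolean isValid(long xmsMb, long xmxMb) {
        return xmsMb > 0 && xmsMb <= xmxMb;
    }

    public static void main(String[] args) {
        long xmxMb = 1024;
        long xmsMb = suggestXmsMb(800, xmxMb); // 800 MB free is an example figure
        System.out.println("-Xms" + xmsMb + "m -Xmx" + xmxMb + "m valid=" + isValid(xmsMb, xmxMb));
    }
}
```

With 800 MB free and -Xmx1024m, the sketch suggests -Xms600m. The cap at Xmx reflects the fact that the HotSpot JVM refuses to start when the initial heap size is set larger than the maximum heap size.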

There are other situations in which you might receive a java.lang.OutOfMemoryError; the errors shown in this guide are only some examples. If you run cyclic workflows, consider occasionally cleaning the \workflows\resources folder, which may be consuming more memory than necessary.
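A cleanup can start with a dry run that only lists old files. The Java sketch below walks a folder and prints regular files that have not been modified for a given period. The default folder path and the 30-day threshold are assumptions, the class name is hypothetical, and the sketch deliberately deletes nothing so that you can review the list first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical dry-run helper for reviewing old files in a workflow resources folder. */
class StaleResources {

    /** Returns regular files under dir that were last modified more than maxAge ago. */
    static List<Path> findStale(Path dir, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        try (Stream<Path> files = Files.walk(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> isOlderThan(p, cutoff))
                    .collect(Collectors.toList());
        }
    }

    private static boolean isOlderThan(Path p, Instant cutoff) {
        try {
            return Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
        } catch (IOException e) {
            return false; // skip entries whose timestamp cannot be read
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; pass the real path of your workflows resources folder as an argument.
        Path resources = Paths.get(args.length > 0 ? args[0] : "workflows/resources");
        if (!Files.isDirectory(resources)) {
            System.out.println("Not a directory: " + resources);
            return;
        }
        // Dry run: only print candidates, never delete.
        for (Path p : findStale(resources, Duration.ofDays(30))) {
            System.out.println("candidate for cleanup: " + p);
        }
    }
}
```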

Also, make sure that you use a JRE compatible with your DQC version; for example, running 64-bit DQC without jre64 can cause the following error:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at com.ataccama.dqc.commons.threads.ThreadPool.start(ThreadPool.java:75)
at com.ataccama.dqc.communication.rpc.common.AbstractEndImpl.<init>(AbstractEndImpl.java:78)
at com.ataccama.dqc.communication.client.ClientNodeImpl.<init>(ClientNodeImpl.java:61)
at com.ataccama.dqc.processor.launch.LaunchValidator.check(LaunchValidator.java:56)
at com.ataccama.dqc.processor.bin.CifProcessor.isGuiConnected(CifProcessor.java:409)
at com.ataccama.dqc.processor.internal.management.DevelopmentProcessMonitor.checkCorrectExecution(DevelopmentProcessMonitor.java:21)
at com.ataccama.dqc.processor.internal.management.ProcessMonitorBase$MonitorRunnable.run(ProcessMonitorBase.java:46)
at java.lang.Thread.run(Unknown Source)
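To confirm which JRE and data model a Java process uses, you can run a small class with the java executable from the JRE that DQC uses. This sketch (hypothetical class name) reads standard system properties; sun.arch.data.model is set by HotSpot-based JVMs only, so it falls back to a heuristic on os.arch.

```java
/** Hypothetical check of the running JVM's bitness; not part of DQC. */
class JvmBitness {

    /** Returns "32" or "64" for the running JVM, or "unknown" if it cannot be determined. */
    static String dataModel() {
        // Set by HotSpot-based JVMs (Oracle, OpenJDK); other vendors may omit it.
        String model = System.getProperty("sun.arch.data.model");
        if (model != null) {
            return model;
        }
        // Fallback heuristic: 64-bit architectures usually report names such as amd64 or aarch64.
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("java.home: " + System.getProperty("java.home"));
        System.out.println("os.arch:   " + System.getProperty("os.arch"));
        System.out.println("JVM bits:  " + dataModel());
    }
}
```

The java.home line shows which JRE installation actually ran the class, which helps when several JREs are installed side by side.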

If the memory error persists after you allocate additional memory, check the other JAVA_OPTS you use. The single_batch parallel strategy (-Dnme.parallel.strategy=single_batch) can also consume a lot of memory; to disable it, set the parameter back to its default value, -Dnme.parallel.strategy=none. A high processing parallelism level may also cause memory issues when two memory-intensive subtasks run at the same time. To avoid this, lower the processing parallelism, e.g. to -Dnme.consolidation.parallel=1 and -Dnme.delta.parallel=1. These are the default values, so if you have not changed them, you do not need to set these parameters.
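The checks from the paragraph above can be expressed as a small Java sketch that scans a list of JVM options and flags the settings linked to high memory use. It assumes the options arrive as whitespace-separated tokens (for example, from the JAVA_OPTS environment variable, without quoted values) and treats 1 as the default parallelism, as stated above; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical checker for memory-related parallelism options; not part of DQC. */
class ParallelOptsCheck {

    static List<String> check(List<String> jvmArgs) {
        List<String> warnings = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.equals("-Dnme.parallel.strategy=single_batch")) {
                warnings.add("single_batch strategy enabled; the default is none");
            } else if (arg.startsWith("-Dnme.consolidation.parallel=")
                    || arg.startsWith("-Dnme.delta.parallel=")) {
                String value = arg.substring(arg.indexOf('=') + 1);
                try {
                    if (Integer.parseInt(value) > 1) {
                        warnings.add(arg + " raises parallelism above the default of 1");
                    }
                } catch (NumberFormatException e) {
                    warnings.add(arg + " has a non-numeric value");
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        String javaOpts = System.getenv().getOrDefault("JAVA_OPTS", "");
        // Naive split: does not handle quoted values containing spaces.
        List<String> tokens = Arrays.asList(javaOpts.trim().split("\\s+"));
        check(tokens).forEach(System.out::println);
    }
}
```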

Allocating More Memory for Plan Execution 

When creating a plan or running memory-intensive steps such as Profiling, you may want to allocate additional memory for plan execution. Even when enough memory is allocated to the IDE itself (see Allocating More Memory to the DQC IDE in this guide), the execution of all plans in the project may be too slow, because the default memory settings are optimized for general use and may not suit every scenario. In that case, it is recommended to allocate more memory for the whole project.

To increase the memory allocated for the project as a whole, follow these steps:

  1. Open Window > Preferences > Ataccama DQC > Launching.

  2. Change the value of Default memory for launch configuration (MB) to 1024, which is recommended for modern machines.

When allocating more memory for plan execution, consider doing the same for the DQC IDE (see Allocating More Memory to the DQC IDE above). This is useful when the IDE crashes even though enough memory is allocated in the launch settings.

Allocating Additional Memory for a Particular Plan Execution

When a plan requires more memory (e.g. because of large data volumes or memory-intensive tasks such as roll-ups), you can increase the reserved memory for that plan only, instead of for the project as a whole or for DQC at launch. To configure memory allocation for a particular plan, follow these steps:

  1. Click the drop-down arrow next to Run and select Run Configuration...
  2. On the Runtimes tab, in the VM arguments field, enter the same options that you would otherwise add to the JAVA_OPTS environment variable.
  3. Set the Xmx option, which configures the maximum heap size reserved by the Java Virtual Machine. Use the standard notation: for example, -Xmx1024m reserves at most 1 GB of memory for the plan. The left pane lists all plans opened and executed by DQC, together with their individual VM argument settings.
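To see how the JVM interprets a VM argument such as -Xmx1024m, you can run a small Java class with the same argument, for example java -Xmx1024m HeapReport. The class is hypothetical and runs outside DQC; it prints Runtime.maxMemory(), the maximum heap the JVM will attempt to use, which can be slightly lower than the -Xmx value depending on the garbage collector.

```java
/** Hypothetical helper that reports the heap limits of the JVM it runs in. */
class HeapReport {

    private static final long MB = 1024 * 1024;

    /** Maximum heap in MB; Runtime.maxMemory() returns Long.MAX_VALUE if there is no limit. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / MB;
        System.out.println("max heap (MB):  " + maxHeapMb());
        System.out.println("used heap (MB): " + usedMb);
    }
}
```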

If increasing Xmx to several gigabytes does not help, you may need to tune the runtime properties.

If the plan fails on a Profiling step, try setting the profiling.inMemoryTotal property to a value smaller than the default 1000000, e.g. 50000.

Although the per-plan setting solves the problem as well, it is recommended to allocate more memory for the project itself, as described in Allocating More Memory for Plan Execution, where the change applies to the project as a whole. The per-plan procedure described in this section is more often used for adding other VM arguments (e.g. when declaring a path).

Some of the most memory-intensive steps are Profiling, Unification, Extended Unification, Lookup Builder, and Sorter. The default settings for allocated resources are optimized for common usage.

Allocating More Memory for the DQC Web Console on a Server

A different case occurs when you work with the DQC Web Console on your server. There, the following error may indicate that not enough memory is allocated:

java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:640)
        at com.ataccama.dqc.commons.threads.AsyncExecutor.<init>(AsyncExecutor.java:34)
        at com.ataccama.dqc.commons.threads.RunnableSetsProviderByThreadPool.ensureIdleExecutor(RunnableSetsProviderByThreadPool.java:63)
        at com.ataccama.dqc.commons.threads.RunnableSetsProviderByThreadPool.getExecutor(RunnableSetsProviderByThreadPool.java:47)
        at com.ataccama.dqc.commons.threads.RunnablesSetAsyncExecutor.run(RunnablesSetAsyncExecutor.java:107)
        at com.ataccama.dqc.processor.internal.runner.RuntimeModel.run(RuntimeModel.java:198)
        at com.ataccama.dqc.processor.internal.runner.Processor.run(Processor.java:128)
        at com.ataccama.dqc.processor.bin.CifProcessor.execute(CifProcessor.java:387)
        at com.ataccama.dqc.processor.bin.CifProcessor.main(CifProcessor.java:162)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.ataccama.dqc.bootstrap.DqcBootstrap.run(DqcBootstrap.java:127)
        at com.ataccama.dqc.bootstrap.DqcBootstrap.execute(DqcBootstrap.java:71)
        at com.ataccama.dqc.bootstrap.DqcBootstrap.main(DqcBootstrap.java:57)

There are several possible reasons for this error:

  1. Lack of Xmx memory
  2. The system limit max user processes is set too low
  3. Lack of physical memory

To overcome the issue, change the JAVA_OPTS parameters by adding the following commands to the scripts you invoke on your server:

  • set JAVA_OPTS=-Xmx1024m on a Windows server (.bat files: onlinectl.bat, runcif.bat, runewf.bat)
  • export JAVA_OPTS="-Xmx1024m" on a Linux server (.sh files: onlinectl.sh, runcif.sh, runewf.sh)

All these scripts call run_java.bat (or run_java.sh), so you can also change the JAVA_OPTS parameters there; values set in that file override any JAVA_OPTS defined previously. This script is also used during plan and workflow execution. Restart the online server after making changes.
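If a script appends to an existing JAVA_OPTS value instead of replacing it, the same option can end up in the command line more than once. The HotSpot JVM applies the last -Xmx it sees, which this hypothetical Java helper reproduces for a whitespace-separated options string.

```java
/** Hypothetical helper: finds the -Xmx option that takes effect in a JAVA_OPTS string. */
class EffectiveXmx {

    /** Returns the last -Xmx token (the one HotSpot applies), or null if there is none. */
    static String effectiveXmx(String javaOpts) {
        String result = null;
        for (String token : javaOpts.trim().split("\\s+")) {
            if (token.startsWith("-Xmx")) {
                result = token;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example: a value from runcif.sh followed by one appended later.
        String javaOpts = "-Xmx1024m -Dfile.encoding=UTF-8 -Xmx4g";
        System.out.println(effectiveXmx(javaOpts)); // -Xmx4g
    }
}
```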

If the error persists, consider the following options:

  • On Linux, increase the values of two OS parameters, open files and max user processes, to 20,000 or more, depending on the plan complexity. You can display their current values by running ulimit -a. Note that you may need to put the commands that raise these limits (ulimit -n for open files, ulimit -u for max user processes) in your .bashrc or .bash_profile file.
  • Add more physical memory.
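Limits set in .bashrc or .bash_profile apply only to processes started from such a shell. On Linux, the limits a running process actually received are listed in /proc/<pid>/limits, so a small Java sketch can read the soft limits for open files and max user processes. The class name is hypothetical; pass the PID of the server's Java process as an argument, or omit it to inspect the JVM running the sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical reader for Linux /proc/<pid>/limits; not part of DQC. */
class ProcLimits {

    /** Returns the soft limit of a row such as "Max open files", or null if the row is absent. */
    static String softLimit(List<String> limitsLines, String rowName) {
        for (String line : limitsLines) {
            if (line.startsWith(rowName)) {
                // After the row name come: soft limit, hard limit, units.
                String[] fields = line.substring(rowName.length()).trim().split("\\s+");
                return fields[0];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        Path limits = Paths.get("/proc", pid, "limits"); // Linux only
        List<String> lines = Files.readAllLines(limits);
        System.out.println("max user processes: " + softLimit(lines, "Max processes"));
        System.out.println("open files:         " + softLimit(lines, "Max open files"));
    }
}
```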

Following the steps of one of the four solutions above should solve most memory issues. Which steps apply depends on the task you are performing. Allocating more memory for all plan executions, rather than for a single plan, is the better long-term solution, although the IDE itself or the Web Console on a server may sometimes also need additional memory.

Allocating Additional Memory on the BDE Cluster

When working on a BDE cluster with the Hadoop framework (e.g. running MapReduce tasks), out-of-memory errors may also occur. This happens when the heap size of a MapReduce task is too small for its workload, for example when the MapReduce job groups or unifies too many records.

To overcome this issue, increase the memory in the mapred-site.xml file. To do so, set the Xmx to the desired value in the mapreduce.reduce.java.opts property:

  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Djava.net.preferIPv4Stack=true -Xmx6144m</value>
  </property>

Also, you need to increase the memory of the whole container in a similar way:

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>
  </property>

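Usually, the heap allocated to one task (the -Xmx value in mapreduce.reduce.java.opts) is about 75% of the whole container (mapreduce.reduce.memory.mb), leaving the rest for non-heap memory.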
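The 75% rule of thumb can be turned into a quick calculation. This Java sketch (hypothetical class name) derives the -Xmx value for mapreduce.reduce.java.opts from mapreduce.reduce.memory.mb and keeps the IPv4 flag from the example configuration; the ratio is a convention rather than a Hadoop requirement, so treat the result as a starting point.

```java
/** Hypothetical calculator for reduce-task heap sizes; not part of DQC or Hadoop. */
class ReduceMemory {

    /** Heap in MB for a reduce task: about 75% of the container memory. */
    static long heapForContainerMb(long containerMb) {
        return containerMb * 3 / 4;
    }

    /** Builds a value for mapreduce.reduce.java.opts. */
    static String javaOptsValue(long containerMb) {
        return "-Djava.net.preferIPv4Stack=true -Xmx" + heapForContainerMb(containerMb) + "m";
    }

    public static void main(String[] args) {
        long containerMb = 8192; // value of mapreduce.reduce.memory.mb
        System.out.println(javaOptsValue(containerMb)); // -Djava.net.preferIPv4Stack=true -Xmx6144m
    }
}
```

Keeping the heap below the container size matters because YARN may kill a container whose process exceeds the memory it was granted.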

The mapred-site.xml file is located in your cluster configuration folder. The path to this folder can be set in several ways. One option is to specify it in the cluster.properties file:

conf=path_to_your_cluster_configs

Another way to define the path is in the BDE IDE: when you set up a new Hadoop cluster, specify the path to the configuration there.
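When the path is defined through the conf entry in cluster.properties, a small Java sketch can confirm that it points at a folder containing mapred-site.xml. It assumes that cluster.properties uses standard java.util.Properties syntax; the class name and the example path are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper that locates mapred-site.xml via the conf entry of cluster.properties. */
class ClusterConfigLocator {

    static Path mapredSite(Reader clusterProperties) throws IOException {
        Properties props = new Properties();
        props.load(clusterProperties);
        String conf = props.getProperty("conf");
        if (conf == null || conf.trim().isEmpty()) {
            throw new IllegalStateException("cluster.properties has no conf entry");
        }
        return Paths.get(conf.trim()).resolve("mapred-site.xml");
    }

    public static void main(String[] args) throws IOException {
        Path propsFile = Paths.get(args.length > 0 ? args[0] : "cluster.properties");
        try (Reader reader = Files.newBufferedReader(propsFile)) {
            Path mapred = mapredSite(reader);
            System.out.println(mapred + " exists: " + Files.exists(mapred));
        }
    }
}
```

Because java.util.Properties treats a backslash as an escape character, a Windows path read this way needs doubled backslashes or forward slashes; whether DQC applies the same rules when it reads the file is not stated here.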