# Apache Spark 3.0 Memory Monitoring Improvements

**TLDR;** [Apache Spark 3.0](https://spark.apache.org/releases/spark-release-3-0-0.html) comes with many improvements, including new features for memory monitoring. These can help you troubleshoot memory usage and optimize the memory configuration of your Spark jobs for better performance and stability; see [SPARK-23429](https://issues.apache.org/jira/browse/SPARK-23429) and [SPARK-27189](https://issues.apache.org/jira/browse/SPARK-27189).

**The problem with memory**

Memory is key for the performance and stability of Spark jobs. If you don't allocate enough memory for your Spark executors, you are more likely to run into the much dreaded Java OOM (out of memory) errors or to substantially degrade your jobs' performance. Spark needs memory to execute DataFrame/RDD operations efficiently and to improve the performance of algorithms that would otherwise have to spill to disk during processing (e.g. shuffle operations); moreover, memory can be used for caching data, reducing I/O. This is all good in theory, but in practice how do you know how much memory you need?

**A basic solution**

A first, basic approach to memory sizing for Spark jobs is to start by giving the executors ample amounts of memory, provided your system has enough resources. For example, set the `spark.executor.memory` configuration parameter to several GBs (note: in local mode you would set `spark.driver.memory` instead). You can further tune the configuration by trial and error, reducing or increasing the memory with each test and observing the results. This approach may give good results quickly, but it is not a very solid approach to the problem.

**A more structured approach** to memory usage troubleshooting and to sizing memory for Spark jobs is to use monitoring data to understand how much memory is used by the Spark application, which jobs request more memory, and which memory areas are used, finally linking this back to the application details and to the utilization of other resources (for example, CPU usage).
This approach helps with drilling down on OOM issues and with allocating memory for Spark applications more precisely, aiming at using just enough memory as needed, without wasting memory that can be a scarce shared resource in some systems.
It is still an experimental and iterative process, but a more informed one than the basic trial-and-error solution.

## How memory is allocated and used by Spark

### Configuration of executor memory

The main configuration parameter used to request the allocation of executor memory is `spark.executor.memory`. When running on YARN, Kubernetes or Mesos, Spark adds to that a memory overhead to cover additional memory usage (OS, redundancy, filesystem cache, off-heap allocations, etc.), which is calculated as memory_overhead_factor * `spark.executor.memory` (with a minimum of 384 MB). The overhead factor is 0.1 (10%), and it can be configured, when running on Kubernetes (only), using `spark.kubernetes.memoryOverheadFactor`.
When using PySpark, additional memory can be allocated using `spark.executor.pyspark.memory`.
Additional memory for off-heap allocation is configured using `spark.memory.offHeap.size=<size>` and `spark.memory.offHeap.enabled=true`. This works on YARN; for K8S, see [SPARK-32661](https://issues.apache.org/jira/browse/SPARK-32661).
Note also the parameters for driver memory allocation: `spark.driver.memory` and `spark.driver.memoryOverhead`.

Note: this discussion covers recent versions of Spark at the time of this writing, notably Spark 3.0 and 2.4. See also the [Spark documentation](https://spark.apache.org/docs/latest/configuration.html#application-properties).
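As a back-of-the-envelope illustration of the sizing described above, here is a minimal Scala sketch (the variable names are mine, not Spark APIs) that computes the memory requested from the resource manager for one executor, assuming the default 0.1 overhead factor and no PySpark or off-heap allocations:

```scala
// Illustrative only: reproduces the executor container sizing formula described above.
val executorMemoryMiB = 32 * 1024L                 // spark.executor.memory = 32g
val overheadFactor    = 0.10                       // default memory overhead factor
val minOverheadMiB    = 384L                       // minimum overhead applied by Spark
val overheadMiB  = math.max((executorMemoryMiB * overheadFactor).toLong, minOverheadMiB)
val containerMiB = executorMemoryMiB + overheadMiB // ~35.2 GiB requested per executor
// spark.executor.pyspark.memory and spark.memory.offHeap.size, if configured, add on top of this.
```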
![Spark executor memory areas and configuration parameters](https://cern.ch/canali/docs/SparkExecutorMemory.png)

**Figure 1:** Pictorial representation of the memory areas allocated and used by Spark executors and the main parameters for their configuration.

- Image in png format: [SparkExecutorMemory.png](https://cern.ch/canali/docs/SparkExecutorMemory.png)
- Image source, in PowerPoint format: [SparkExecutorMemory.pptx](https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/SparkExecutorMemory.pptx)

### Spark unified memory pool

Spark tasks allocate memory for execution and storage from the JVM heap of the executors using a unified memory pool managed by the [Spark memory management system](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/package.scala). [Unified memory](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala) occupies by default 60% of the JVM heap: 0.6 * (`spark.executor.memory` - 300 MB). The factor 0.6 (60%) is the default value of the configuration parameter `spark.memory.fraction`; 300 MB is a hard-coded amount of "reserved memory". The rest of the heap is used for user data structures, internal metadata in Spark, and safeguarding against OOM errors.
Spark manages execution and storage memory requests using the unified memory pool. When little execution memory is used, storage can acquire most of the available memory, and vice versa. Additional structure in how storage and execution memory interact is exposed with the configuration parameter `spark.memory.storageFraction` (default 0.5), which guarantees that stored blocks below the specified threshold will not be evicted from the unified memory pool by execution.
The unified memory pool can optionally be allocated using off-heap memory; the relevant configuration parameters are `spark.memory.offHeap.size` and `spark.memory.offHeap.enabled`.
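The same kind of arithmetic applies to the unified memory pool. A small sketch, assuming a 32g executor heap and the default values of `spark.memory.fraction` and `spark.memory.storageFraction` (again, the names are illustrative, not Spark APIs):

```scala
val heapMiB         = 32 * 1024L   // spark.executor.memory = 32g
val reservedMiB     = 300L         // hard-coded reserved memory
val memoryFraction  = 0.6          // spark.memory.fraction (default)
val storageFraction = 0.5          // spark.memory.storageFraction (default)
val unifiedMiB          = ((heapMiB - reservedMiB) * memoryFraction).toLong  // ~19 GiB for execution + storage
val protectedStorageMiB = (unifiedMiB * storageFraction).toLong              // ~9.5 GiB of storage protected from eviction
// The remaining ~40% of the heap is left for user data structures and Spark internal metadata.
```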
<span style="color: rgb(29, 31, 34); font-family: &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">When little execution memory is used, storage can acquire most of the available memory and vice versa. </span>Additional structure in the working of the storage and execution memory is exposed with the configuration parameter <font data-keep-original-tag="true"><font face="courier">spark.memory.storageFraction</font></font> (default is 0.5), which guarantees that the stored blocks will not be evicted from the unified memory by execution below the specified threshold.</div> <div>The unified memory pool can optionally be allocated using off-heap memory, the relevant configuration parameters are: <font data-keep-original-tag="true"><font face="courier">spark.memory.offHeap.size</font></font> and <font data-keep-original-tag="true"><font face="courier">spark.memory.offHeap.enabled</font></font>. </div> <div>  </div> <h4 data-original-attrs="{&quot;style&quot;:&quot;&quot;}" style="text-align:left"><b><font data-keep-original-tag="false"><font data-original-attrs="{&quot;style&quot;:&quot;&quot;}"><font color="#ff0000">Opportunities for memory configuration settings</font></font></font></b></h4> <div>The first key configuration to get right is <font data-keep-original-tag="false"><font data-original-attrs="{&quot;style&quot;:&quot;&quot;}"><font face="courier">spark.executor.memory</font></font></font>. Monitoring data (see the following paragraphs) can help you understand if you need to <b>increase </b>the memory allocated to Spark executors and or if you are already allocating plenty of memory and can consider <b>reducing </b>the memory footprint.</div> <div>There are other memory-related configuration parameters that may need some <b>adjustments </b>for specific workloads: this can be analyzed and tested using <b>memory monitoring</b> data.</div> <div>In particular, increasing <font data-keep-original-tag="true"><font face="courier">spark.memory.fraction</font></font> (default is 0.6) may be useful when deploying large Java heap, as there is a chance that you will not need to set aside 40% of the JVM heap for user memory. 
## Memory monitoring improvements in Spark 3.0

Two notable improvements in Spark 3.0 for memory monitoring are:

- [SPARK-23429](https://issues.apache.org/jira/browse/SPARK-23429): Add executor memory metrics to heartbeat and expose in executors REST API
  - see also the umbrella ticket [SPARK-23206](https://issues.apache.org/jira/browse/SPARK-23206): Additional Memory Tuning Metrics
- [SPARK-27189](https://issues.apache.org/jira/browse/SPARK-27189): Add Executor metrics and memory usage instrumentation to the metrics system

When troubleshooting memory usage it is important to investigate how much memory is used as the workload progresses and to **measure peak values of memory usage**. Peak values are particularly important, as this is where you get possible slowdowns or even OOM errors. Spark 3.0 instrumentation adds monitoring data on the amount of memory used, drilling down on unified memory and on memory used by Python (when using PySpark). This is implemented using a new set of metrics called "[executor metrics](https://spark.apache.org/docs/latest/monitoring.html#executor-metrics)", which can be helpful for memory sizing and troubleshooting performance.
### Measuring memory usage and peak values using the REST API

An example of the data you can get from the [REST API](https://spark.apache.org/docs/latest/monitoring.html#rest-api) in Spark 3.0:

`WebUI URL + /api/v1/applications/<application_id>/executors`

Below is a snippet of the peak executor memory metrics, sampled on a snapshot and limited to one of the executors used for testing:

```
"peakMemoryMetrics" : {
  "JVMHeapMemory" : 29487812552,
  "JVMOffHeapMemory" : 149957200,
  "OnHeapExecutionMemory" : 12458956272,
  "OffHeapExecutionMemory" : 0,
  "OnHeapStorageMemory" : 83578970,
  "OffHeapStorageMemory" : 0,
  "OnHeapUnifiedMemory" : 12540212490,
  "OffHeapUnifiedMemory" : 0,
  "DirectPoolMemory" : 66809076,
  "MappedPoolMemory" : 0,
  "ProcessTreeJVMVMemory" : 38084534272,
  "ProcessTreeJVMRSSMemory" : 36998328320,
  "ProcessTreePythonVMemory" : 0,
  "ProcessTreePythonRSSMemory" : 0,
  "ProcessTreeOtherVMemory" : 0,
  "ProcessTreeOtherRSSMemory" : 0,
  "MinorGCCount" : 561,
  "MinorGCTime" : 49918,
  "MajorGCCount" : 0,
  "MajorGCTime" : 0
},
```
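For example, one way to pull these metrics from the command line, assuming the driver's WebUI is reachable on the default port 4040 and that jq is available to filter the JSON output, is:

```
curl -s http://<driver_host>:4040/api/v1/applications/<application_id>/executors | \
  jq '.[] | {id: .id, peakMemoryMetrics: .peakMemoryMetrics}'
```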
**Notes:**

- Procfs metrics ([SPARK-24958](https://issues.apache.org/jira/browse/SPARK-24958)) provide a view of the process memory usage from the OS point of observation.
  - Notably, procfs metrics provide a way to measure memory usage by Python when using PySpark and, in general, by other processes that may be spawned by Spark tasks.
- Procfs metrics are gathered conditionally:
  - if the /proc filesystem exists
  - if `spark.executor.processTreeMetrics.enabled=true`
  - The optional configuration `spark.executor.metrics.pollingInterval` allows gathering executor metrics at higher frequency, [see doc](https://spark.apache.org/docs/latest/configuration.html#executor-metrics).
- Additional improvements of the memory instrumentation via the REST API (targeting Spark 3.1) are in "[SPARK-23431](https://issues.apache.org/jira/browse/SPARK-23431): Expose the new executor memory metrics at the stage level".

### Improvements to the Spark metrics system and Spark performance dashboard

The [Spark metrics system, based on the Dropwizard metrics library](https://spark.apache.org/docs/latest/monitoring.html#metrics), provides the data source to build a [Spark performance dashboard](https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark). A dashboard naturally leads to time-series visualization of Spark performance and workload metrics. Spark 3.0 instrumentation ([SPARK-27189](https://issues.apache.org/jira/browse/SPARK-27189)) hooks the executor metrics data source into the metrics system and makes the time-series data on the evolution of memory usage available there.
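As a sketch of how this data can reach a dashboard, the Dropwizard metrics can be shipped with one of the bundled sinks; the lines below show a Graphite sink configuration (host, port and prefix are placeholders; the spark-dashboard project referenced later in this post packages a similar setup):

```
# metrics.properties (or the equivalent spark.metrics.conf.* configuration properties)
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.org
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark
```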
Some of the advantages of collecting metrics values and visualizing them with Grafana are:

- The possibility to see the evolution of the metrics values in real time and to compare them with other key metrics of the workload.
- Metrics can be examined as aggregated values or drilled down at the executor level. This allows you to understand if there are outliers or stragglers.
- It is possible to study the evolution of the metrics values over time and to understand which part of the workload generated certain spikes in a given metric, for example. It is also possible to annotate the dashboard graphs, as explained [at this link](https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark), with details of query id, job id, and stage id.

Here are a few examples of dashboard graphs related to memory usage:

![Memory-related dashboard graphs](https://1.bp.blogspot.com/-sI6y0ioIvMc/X0PTt0p-9gI/AAAAAAAAFac/DjmaKgVU5_MPp6SHX6Jz9Mz_SvV92CKUgCLcBGAsYHQ/s2048/Figure2_merged.png)

**Figure 2:** Graphs of memory-related metrics collected and visualized using a [Spark performance dashboard](https://github.com/cerndb/spark-dashboard). The metrics reported in the figure are: Java heap memory, RSS memory, execution memory, and storage memory. The Grafana dashboard allows us to drill down on the metrics values per executor. These types of plots can be used to study the time evolution of key metrics.

### What if you are using Spark 2.x?

Some monitoring features related to memory usage are already available in Spark 2.x and are still useful in Spark 3.0:

- Task metrics are available in the REST API and in the Dropwizard-based metrics and provide information on:
  - Garbage collection time: when garbage collection takes a significant amount of time, you typically want to investigate whether to allocate more memory (or to reduce memory usage).
  - Shuffle-related metrics: ample memory can save some shuffle operations from spilling to storage (I/O) and therefore benefit performance.
  - The task peak execution memory metric.
- The WebUI reports storage memory usage per executor.
- The Spark Dropwizard-based metrics system provides a JVM source with memory-related utilization metrics.

## Lab configuration

When experimenting and trying to get a grasp of the many parameters related to memory and monitoring, I found it useful to set up a small test workload.
Some notes on the setup I used:

- Tested using Spark 3.0 on YARN and Kubernetes.
- Spark performance dashboard: configuration and installation instructions for the [Spark dashboard at this link](https://github.com/cerndb/spark-dashboard).
- Workload generator: the TPCDS benchmark for Spark, with a small [modification to run on Spark 3.0](https://github.com/LucaCanali/spark-sql-perf/tree/runOnSpark3). Example:

```
bin/spark-shell --master yarn --num-executors 16 --executor-cores 8 \
  --driver-memory 4g --executor-memory 32g \
  --jars /home/luca/spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar \
  --conf spark.eventLog.enabled=false \
  --conf spark.sql.shuffle.partitions=512 \
  --conf spark.sql.autoBroadcastJoinThreshold=100000000 \
  --conf spark.executor.processTreeMetrics.enabled=true
```

```scala
import com.databricks.spark.sql.perf.tpcds.TPCDSTables

// Register the TPCDS tables (scale factor 1500) as temporary views over the parquet data
val tables = new TPCDSTables(spark.sqlContext, "/home/luca/tpcds-kit/tools", "1500")
tables.createTemporaryTables("/project/spark/TPCDS/tpcds_1500_parquet_1.10.1", "parquet")

// Run the TPCDS 2.4 query set
val tpcds = new com.databricks.spark.sql.perf.tpcds.TPCDS(spark.sqlContext)
val experiment = tpcds.runExperiment(tpcds.tpcds2_4Queries)
```
style="text-align:left"><li>Spark metrics and instrumentation are still an area in active development. There is room for improvement both in their implementation and documentation. I found that some of the metrics may be difficult to understand or may present what looks like strange behaviors in some circumstances. In general, more testing and sharing experience between Spark users may be highly beneficial for further improving Spark instrumentation.</li> <li>The tools and methods discussed here are based on metrics, they are reactive by nature, suitable for troubleshooting and iterative experimentation.</li> <li>This post is centered on  describing Spark 3.0 new features for memory monitoring and how you can experiment with them. A key piece left for future work is to show some real-world examples of troubleshooting using memory metrics and instrumentation.</li> <li>For the scope of this post, we assume that the workload to troubleshoot is a black box and that we just want to try to optimize the memory allocation  and use. This post does not cover techniques to improve the memory footprint of Spark jobs, however, they are very important for correctly using Spark. Examples of techniques that are useful in this area are: implementing the correct partitioning scheme for the data and operations, reducing partition skew, using the appropriate join mechanisms, streamlining caching, and many others, covered elsewhere. </li> </ul><p>  </p></div> </div> <h3 data-original-attrs="{&quot;style&quot;:&quot;&quot;}" style="text-align:left"><b><font data-keep-original-tag="true"><font color="#ff0000">References</font></font></b></h3> <div>Talks:</div> <div> <ul data-original-attrs="{&quot;style&quot;:&quot;&quot;}" style="text-align:left"><li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://databricks.com/session/metrics-driven-tuning-of-apache-spark-at-scale&quot;}" href="https://databricks.com/session/metrics-driven-tuning-of-apache-spark-at-scale">Metrics-Driven Tuning of Apache Spark at Scale</a>, Spark Summit 2018.</li> <li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://databricks.com/session_eu19/performance-troubleshooting-using-apache-spark-metrics&quot;}" href="https://databricks.com/session_eu19/performance-troubleshooting-using-apache-spark-metrics">Performance Troubleshooting Using Apache Spark Metrics</a>, Spark Summit 2019</li> <li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://databricks.com/session/tuning-apache-spark-for-large-scale-workloads&quot;}" href="https://databricks.com/session/tuning-apache-spark-for-large-scale-workloads">Tuning Apache Spark for Large-Scale Workloads</a>, Spark summit 2017</li> <li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://databricks.com/session/deep-dive-apache-spark-memory-management&quot;}" href="https://databricks.com/session/deep-dive-apache-spark-memory-management">Deep Dive: Apache Spark Memory Management</a>, Spark Summit 2016.</li> <li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://databricks.com/session/understanding-memory-management-in-spark-for-fun-and-profit&quot;}" href="https://databricks.com/session/understanding-memory-management-in-spark-for-fun-and-profit">Understanding Memory Management In Spark For Fun And Profit</a>, Spark Summit 2016.</li> </ul></div> <div>Spark documentation and blogs:</div> <div> <ul data-original-attrs="{&quot;style&quot;:&quot;&quot;}" style="text-align:left"><li>Monitoring guide: <a 
data-original-attrs="{&quot;data-original-href&quot;:&quot;https://spark.apache.org/docs/latest/monitoring.html#rest-api&quot;}" href="https://spark.apache.org/docs/latest/monitoring.html#rest-api">REST API</a>, <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://spark.apache.org/docs/latest/monitoring.html#executor-task-metrics&quot;}" href="https://spark.apache.org/docs/latest/monitoring.html#executor-task-metrics">Executor Task Metrics</a>. <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://spark.apache.org/docs/latest/monitoring.html#executor-metrics&quot;}" href="https://spark.apache.org/docs/latest/monitoring.html#executor-metrics">Executor Metrics</a>, <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://spark.apache.org/docs/latest/monitoring.html#metrics&quot;}" href="https://spark.apache.org/docs/latest/monitoring.html#metrics">Spark Metrics System</a></li> <li>Tuning guide: <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://spark.apache.org/docs/latest/tuning.html#memory-management-overview&quot;}" href="https://spark.apache.org/docs/latest/tuning.html#memory-management-overview">Memory Management Overview</a></li> <li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://www.waitingforcode.com/apache-spark/apache-spark-off-heap-memory/read&quot;}" href="https://www.waitingforcode.com/apache-spark/apache-spark-off-heap-memory/read">Apache Spark and off-heap memory</a></li> <li><a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark&quot;}" href="https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark">A Performance Dashboard for Apache Spark</a></li> </ul></div> <div> </div> <div> <div>JIRAs:  <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://issues.apache.org/jira/browse/SPARK-23206&quot;}" href="https://issues.apache.org/jira/browse/SPARK-23206">SPARK-23206</a>, <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://issues.apache.org/jira/browse/SPARK-23429&quot;}" href="https://issues.apache.org/jira/browse/SPARK-23429">SPARK-23429</a> and <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://issues.apache.org/jira/browse/SPARK-27189&quot;}" href="https://issues.apache.org/jira/browse/SPARK-27189">SPARK-27189</a> contain most of the details of the improvements in Apache Spark discussed here.</div> <div> </div> </div> <div>Spark code: <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/package.scala&quot;}" href="https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/package.scala">Spark Memory Manager</a>, <a data-original-attrs="{&quot;data-original-href&quot;:&quot;https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala&quot;}" href="https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala">Unified memory</a></div> <div> </div> <h3 data-original-attrs="{&quot;style&quot;:&quot;&quot;}" style="text-align:left"><b><font data-keep-original-tag="true"><font color="#ff0000">Conclusions and acknowledgments</font></font></b></h3> <div>It is important to correctly size memory configurations for Spark applications. 
This improves performance and stability, and it improves resource utilization in multi-tenant environments. Spark 3.0 brings important improvements to memory monitoring instrumentation. The analysis of peak memory usage, and of memory use broken down by area and plotted as a function of time, provides important insights for troubleshooting OOM errors and for sizing the memory of Spark jobs.
Many thanks to the Apache Spark community, and in particular to the committers and reviewers who have helped with the improvements in [SPARK-27189](https://issues.apache.org/jira/browse/SPARK-27189).
This work has been developed in the context of the data analytics services at CERN; many thanks to my colleagues for help and suggestions.

# Hardening Apache ZooKeeper security using zkpolicy

Apache ZooKeeper is an open-source server which enables highly reliable distributed coordination. Distributed applications can use it to maintain configuration information, implement naming, and provide synchronization and group services. Numerous applications use ZooKeeper, such as HBase, Kafka, YARN, HDFS and Spark.

Metadata is stored in data objects named znodes. Services, by accessing ZooKeeper, create znode trees to save metadata and coordination information.
## Controlling access to ZooKeeper information

ZooKeeper uses access control lists (ACLs) to control access to znodes. The ACL implementation is quite similar to UNIX file access permissions: it employs permission bits to allow/disallow various operations against a node, and a scope to which the bits apply. An ACL only applies to a specific znode and not to its children.

ACLs are comprised of elements of the format "scheme:id:permissions", where the scheme and id identify a client that is authenticated to the server using one of the supported pluggable authentication schemes [1], and permissions is the combination of permission bits for the supported operations (CREATE, READ, DELETE, WRITE, ADMIN).

ACLs are set on creation of each znode and can be altered later. It is not safe to rely on services to correctly set ACLs in order to achieve maximum security. The fact that multiple services use the same ZooKeeper znode tree to store information raises security concerns. The zkCli.sh example below shows how ACLs appear in practice.
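For illustration, this is how znode ACLs can be inspected and set from the standard zkCli.sh client (the path and identity are examples):

```
# inspect the ACL currently set on a znode
getAcl /hbase

# restrict the znode to the "hbase" SASL identity, using the scheme:id:permissions format
setAcl /hbase sasl:hbase:cdrwa
```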
## ZooKeeper security concerns

### Anonymous clients

Anonymous client connections are in many cases required, for example when ZooKeeper is used as a discovery mechanism for services. Znodes without proper access permissions can be altered by anonymous users.

Even when anonymous connections are not required, rejecting them is not as straightforward as it may seem. Before ZooKeeper 3.6.0 (released on 04/03/2020), security guides and other sources proposed *requireClientAuthScheme* as a solution to reject anonymous clients. However, this property was only ever a patch addition and never made it into the upstream codebase [2][3]. Starting with 3.6.0, *sessionRequireClientSASLAuth* can be used to accept connections and requests only from clients that have authenticated with the server via SASL.
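A minimal sketch of the corresponding server-side settings in zoo.cfg, assuming ZooKeeper 3.6.0 or later and a SASL (Kerberos) setup:

```
# enable the SASL authentication provider
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
# reject connections and requests from clients that have not authenticated via SASL
sessionRequireClientSASLAuth=true
```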
By default, services such as HBase and YARN do not set ACLs for their znodes. This means that they use the *world:anyone:cdrwa* scheme. Anonymous users can then change data, delete znodes or take ownership of them, accessing internal service information or even blocking services from accessing their znodes.

**Problematic scenario example:** an anonymous user connects to the ZooKeeper ensemble and, using the setAcl command, changes the permissions of the YARN leader election znode (*/yarn-leader-election*) so that it is accessible only by her. YARN will no longer be able to elect a leader or bring up a new node, because the znode path will not be accessible for READ and CREATE. A sketch of how this could look from the command line follows.
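To make the scenario concrete, with the default open ACL an unauthenticated session could take over the znode with a few zkCli.sh commands along these lines (illustrative only):

```
# connect without any authentication
zkCli.sh -server zookeeper-host:2181

# world:anyone:cdrwa includes ADMIN, so any client may rewrite the ACL:
# add a digest identity to the session and lock everyone else out
addauth digest attacker:secret
setAcl /yarn-leader-election auth::cdrwa
```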
### Service access to the ZooKeeper ensemble

Services operate on ZooKeeper using its exported API [4]. In fact, they connect and operate exactly as clients do when using the command line interface. This introduces the aforementioned issue in another form: there is no guarantee that a service only exposes znode operations on its own znodes, so a service can act as an intermediary through which an end user changes another service's znodes, providing another way to cause issues for those services or to extract znode data.

### Need for ACL policy auditing and enforcing

The previous issues signify the need for monitoring the ACL policies that each service defines. We should also have a way to intervene and enforce secure ACL policies for each service. This is the motivation for the development of *zkpolicy*, a tool to audit and enforce ACL policies on ZooKeeper.
## ZooKeeper Policy auditing tool - zkpolicy

### Features

zkpolicy is a tool that can be added to the arsenal of security and monitoring teams; it provides, inter alia, the following features:

- Querying the znode tree for nodes with specific characteristics (e.g. znodes accessible by a specific client, or completely open znodes).
- Definition of policies for widely used services, as well as custom definitions, using YAML files.
- Test execution for ACL policy compliance.
- Generation of audit reports with results from multiple tests and queries, as well as general information about the ensemble (e.g. the complete list of ACLs for the cluster znodes, or which Four Letter Word commands are enabled [5]).
- Built-in policies for various services, as defined by the Cloudera/Hortonworks best practices [6].
- Enforcing ACL policies on znode subtrees.

The tool is implemented in Java and these features are available either from a command line interface or as a Maven dependency published in the Central repository. Authentication with ZooKeeper is done using SASL, leveraging the JAAS Krb5LoginModule.

The tool is open source and available at https://github.com/cerndb/zkpolicy.
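The SASL setup follows the standard ZooKeeper client convention; a sketch of a JAAS configuration file (principal and keytab are placeholders), typically passed to the JVM with -Djava.security.auth.login.config, could look like this:

```
// jaas.conf
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/path/to/zkpolicy.keytab"
  principal="zkpolicy@EXAMPLE.ORG";
};
```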
style="text-decoration:none">zkpolicy query noACL --root-path / --list</span></span></span></span></span></span></p> <h4 style="line-height:1.38; margin-top:19px; margin-bottom:5px"><span style="font-size:12pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#666666"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">Get the list of znodes that are accessible by a certain SASL authenticated client  </span></span></span></span></span></span></h4> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">zkpolicy query regexMatchACL --root-path /  --args sasl:user1:.* --list</span></span></span></span></span></span></p> <h4 style="line-height:1.38; margin-top:19px; margin-bottom:5px"><span style="font-size:12pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#666666"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">Generate an audit report   </span></span></span></span></span></span></h4> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">Generating audit reports requires passing the appropriate audit configuration file (more information on how to structure such a file can be found in zkpolicy configuration documentation) and executing the command below:</span></span></span></span></span></span></p> <p style="line-height:1.38"><em><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">zkpolicy audit --input &lt;audit_config&gt; --output report.out</span></span></span></span></span></span></em></p> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">An audit report has the following format:</span></span></span></span></span></span></p> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none"><span style="border:none"><span style="display:inline-block"><span style="overflow:hidden"><span style="width:484px"><span style="height:484px"><img alt="" data-entity-type="" data-entity-uuid="" height="484" src="https://lh5.googleusercontent.com/yClIHj64t3DT9o99jPBIEeUhPqOtPFtFRFDJw4ObUc5vXV8LrzIDrOhNZNm1p_N_3V9ja9DyPtnP8vdgSZ2ciMCfSGrzd2iqlc5fXmqHVKdvtiANgIIxLKn4-hQxd_4-nnsNFWS5" width="484" /></span></span></span></span></span></span></span></span></span></span></span></p> <p style="line-height:1.38; text-align:justify"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">This 
report provides valuable information about the security state of ZooKeeper and can even be handed to security experts without giving them direct access to ZooKeeper, since it includes the complete list of ACL definitions for the ensemble.</span></span></span></span></span></span></p> <h4 style="line-height:1.38; margin-top:19px; margin-bottom:5px"><strong>Enforce a service policy</strong></h4> <p style="line-height:1.38">The audit report may point out a service that is not following secure policies. In that case, we can patch this vulnerability by enforcing the correct policy, using the following command:</p> <p style="line-height:1.38">zkpolicy enforce --service-policy &lt;service_name&gt;</p> <h4 style="line-height:1.38; margin-top:19px; margin-bottom:5px">Enforce a custom, user defined policy</h4> <p style="line-height:1.38">Policies may not always be relevant to a service but to general znode paths, like the root or the quota one. Custom defined policies can be enforced with:</p> <p style="line-height:1.38">zkpolicy enforce --input &lt;policy_definition_file&gt;</p> <p style="line-height:1.38">It is advised to first execute policy enforcements in dry run mode, so as to get the list of the znodes that would be affected without actually altering their ACLs. This can be done by adding the `--dry-run` option to the previous command.</p> <h4 style="line-height:1.38; margin-top:19px; margin-bottom:5px"><strong>Rolling back an unwanted enforcement</strong></h4> <p style="line-height:1.38; text-align:justify">It is possible to enforce an incorrectly defined policy, so the tool also provides rollback functionality. Before every enforcement operation, a snapshot of the ACL state of the nodes to be affected is taken. Snapshots are saved in the /opt/zkpolicy/rollback/ directory and can later be used to roll back by issuing the following command:</p> <p style="line-height:1.38; text-align:justify">zkpolicy rollback --input &lt;rollback_file&gt;</p> <h3 style="line-height:1.38; text-align:justify; margin-top:21px; margin-bottom:5px">Outcome</h3> <p style="line-height:1.38">Using zkpolicy, we audited the development cluster and spotted parts of the configuration that should be altered to harden ZooKeeper security. We have thoroughly tested the policies on our Hadoop/YARN/HBase clusters and are continuing the tests for ZooKeeper on Kafka. Thanks to the tool and these tests, we have taken the following actions to strengthen the security of the services:</p> <ul><li style="list-style-type:disc"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">We migrated ZooKeeper from Cloudera 3.4.5 to Apache 3.6.0 so as to support new security features. 
</span></span></span></span></span></span></li> <li style="list-style-type:disc"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">We hardened ACLs on HDFS and YARN and enabled SASL authentication between these services and ZooKeeper internally on our clusters.</span></span></span></span></span></span></li> <li style="list-style-type:disc"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">We disabled most of the Four Letter Words that were enabled by default.</span></span></span></span></span></span></li> <li style="list-style-type:disc"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">We decided to enable auth_to_local.</span></span></span></span></span></span></li> <li style="list-style-type:disc"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">We allowed only the superuser to access and alter all the znodes and by default all the users use SASL for authorization.</span></span></span></span></span></span></li> </ul><h3 style="line-height:1.38; text-align:justify; margin-top:21px; margin-bottom:5px"><span style="font-size:13.999999999999998pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#434343"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">Information</span></span></span></span></span></span></h3> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">The tool was initially developed internally for IT-DB in the corresponding Gitlab repository (no public access) [7] and then open sourced by mirroring to a public Github repository [8].  </span></span></span></span></span></span></p> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">More details of the tool can be found in the documentation or on the slides from the presentation given at the IT-DB technical meeting, 26-06-2020 [9]. </span></span></span></span></span></span></p> <p style="line-height:1.38"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">We hope that you will find this tool useful for your Big Data services and that you will help us with the future developments of this product. 
Please let us know in the comments your so-far experience and share any ideas for the future of the tool.</span></span></span></span></span></span></p> <h2 style="line-height:1.38; text-align:justify; margin-top:24px; margin-bottom:16px"><span style="font-size:16pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">References</span></span></span></span></span></span></h2> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[1] </span></span></span></span></span></span><a href="https://zookeeper.apache.org/doc/r3.6.1/zookeeperProgrammers.html#sc_BuiltinACLSchemes" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://zookeeper.apache.org/doc/r3.6.1/zookeeperProgrammers.html#sc_BuiltinACLSchemes</span></span></span></span></span></span></span></span></a></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[2] </span></span></span></span></span></span><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-1634" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://issues.apache.org/jira/browse/ZOOKEEPER-1634</span></span></span></span></span></span></span></span></a><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none"> </span></span></span></span></span></span></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[3] </span></span></span></span></span></span><a href="https://issues.apache.org/jira/browse/ZOOKEEPER-2526" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://issues.apache.org/jira/browse/ZOOKEEPER-2526</span></span></span></span></span></span></span></span></a><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span 
style="font-style:normal"><span style="text-decoration:none"> </span></span></span></span></span></span></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[4] </span></span></span></span></span></span><a href="https://zookeeper.apache.org/doc/r3.6.1/apidocs/zookeeper-server/index.html" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://zookeeper.apache.org/doc/r3.6.1/apidocs/zookeeper-server/index.html</span></span></span></span></span></span></span></span></a></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[5] </span></span></span></span></span></span><a href="https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_4lw" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_4lw</span></span></span></span></span></span></span></span></a></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[6] </span></span></span></span></span></span><a href="https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.5/bk_security/content/zookeeper_acls_best_practices.html" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.5/bk_security/content/zookeeper_acls_best_practices.html</span></span></span></span></span></span></span></span></a></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[7] </span></span></span></span></span></span><a href="https://gitlab.cern.ch/db/zookeeper-policy-audit-tool/" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span 
style="text-decoration-skip-ink:none">https://gitlab.cern.ch/db/zookeeper-policy-audit-tool/</span></span></span></span></span></span></span></span></a></p> <p style="line-height: 1.38;"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[8] </span></span></span></span></span></span><a href="https://github.com/cerndb/zkpolicy" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://github.com/cerndb/zkpolicy</span></span></span></span></span></span></span></span></a></p> <p><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#000000"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:none">[9] </span></span></span></span></span></span><a href="https://indico.cern.ch/event/875776/#2-zookeeper-policy-auditing-to" style="text-decoration:none"><span style="font-size:11pt; font-variant:normal; white-space:pre-wrap"><span style="font-family:Arial"><span style="color:#1155cc"><span style="font-weight:400"><span style="font-style:normal"><span style="text-decoration:underline"><span style="-webkit-text-decoration-skip:none"><span style="text-decoration-skip-ink:none">https://indico.cern.ch/event/875776/#2-zookeeper-policy-auditing-to</span></span></span></span></span></span></span></span></a></p> </div> </div> <span><a title="View user profile." href="/users/emil-kleszcz" lang="" about="/users/emil-kleszcz" typeof="schema:Person" property="schema:name" datatype="">ekleszcz</a></span> <span>Tue, 07/21/2020 - 15:42</span> <section> <h2>Add new comment</h2> <drupal-render-placeholder callback="comment.lazy_builders:renderForm" arguments="0=node&amp;1=182&amp;2=comment_node_blog_post&amp;3=comment_node_blog_post" token="I7YDzCJWe_7NNKeK1lIm2j_JnSSBkdXlpoGAemvW7TU"></drupal-render-placeholder> </section> Tue, 21 Jul 2020 13:42:57 +0000 ekleszcz 182 at https://db-blog.web.cern.ch ORDS - Managing APEX static images https://db-blog.web.cern.ch/blog/jakub-granieczny/2020-06-ords-managing-apex-static-images <span>ORDS - Managing APEX static images</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><p>In today’s post, we’ll be talking about the possible ways to manage the static images/CSS/JS that come shipped with APEX, when running on ORDS. They are separate resources (not contained in the DB like some other APEX images) necessary for your APEX applications look and behave the way they’re intended to. If you and your users browse your internet using <a href="https://lynx.browser.org">lynx</a> (see image below) feel free to skip this one. Otherwise - dig in! 
</p> <p><img alt="" src="https://cernbox.cern.ch/index.php/apps/gallery/preview//Lynx-wikipedia.png?width=2900&amp;height=2900&amp;c=169233968762191872%3Ab7978d53&amp;requesttoken=GAcPPQcIfycLHjESbTdzOWRbABIzEXNdCUhiEmBaARc%3D%3ALvgtbQJTiGsABB9RW7EKDCX%2FQ0SqU4IZdbJIWHXnu08%3D&amp;x-access-token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhY2NvdW50X2lkIjoiamdyYW5pZWMiLCJncm91cHMiOltdLCJkaXNwbGF5X25hbWUiOiJKYWt1YiBHcmFuaWVjem55IChqZ3JhbmllYykifQ.nEgCdYJH06ygomhD8YUnbpZDZgXE5cQROkTs4uOwtBE" style="width: 500px; height: 363px;" /></p> <p> </p> <h1>The standard way</h1> <p>Normally, what you’d do is take the APEX images directory, pack them into a war and deploy the images in the <span style="font-family:Courier New,Courier,monospace;">/i/</span> context on the same app server that ORDS is deployed in (like it’s mentioned in the <a href="https://docs.oracle.com/en/database/oracle/oracle-rest-data-services/19.4/aelig/installing-REST-data-services.html#GUID-AAC0B618-F3E9-4415-94D4-077D6BBF1CE2">official installation guide</a>. This is pretty straightforward and probably what will be sufficient for most. However, if you have multiple databases with APEX installed in them, possibly in different versions - this is where a problem arises.</p> <h1>Problem?</h1> <p>Well, the problem is that every APEX version has its version of the images. It may happen that from one version to the other some file are removed, some are added, a piece of JavaScript is changed, so as a result, you see some missing pictures, misaligned elements or just get served with a plaintext version of your site.</p> <p>So we have to provide different versions of the images for APEX to choose from. How should we do it and how do we tell APEX which one to choose?</p> <h1>The multi-database way - using aliases</h1> <p>Let’s say you have 3 databases - <span style="font-family:Courier New,Courier,monospace;">db1</span>, <span style="font-family:Courier New,Courier,monospace;">db2</span> and <span style="font-family:Courier New,Courier,monospace;">db3</span>. <span style="font-family:Courier New,Courier,monospace;">Db1</span> and <span style="font-family:Courier New,Courier,monospace;">db2</span> running APEX 18.2  and <span style="font-family:Courier New,Courier,monospace;">db3</span> running APEX 19.2.</p> <p>In your application server include 2 folders with APEX images -  <span style="font-family:Courier New,Courier,monospace;">apex_images1820</span> and <span style="font-family:Courier New,Courier,monospace;">apex_images1920</span>. 
Then create 3 links - <span style="font-family:Courier New,Courier,monospace;">i_db1</span> and <span style="font-family:Courier New,Courier,monospace;">i_db2</span> pointing to <span style="font-family:Courier New,Courier,monospace;">apex_images1820</span>, and <span style="font-family:Courier New,Courier,monospace;">i_db3</span> pointing to <span style="font-family:Courier New,Courier,monospace;">apex_images1920</span>.</p> <p>Then, for each of your APEX applications, you have to set the expected images’ context using the <span style="font-family:Courier New,Courier,monospace;">reset_image_prefix.sql </span><a href="https://docs.oracle.com/cd/E21611_01/doc.11/e21058/trouble.htm#AELIG7174">script shipped with APEX</a>.</p> <pre class="rteindent1"> SQL&gt; @&lt;apex directory&gt;\utilities\reset_image_prefix.sql
Enter the Application Express image prefix [/i/] /i_db1/</pre><p>For <span style="font-family:Courier New,Courier,monospace;">db1</span> we set it to <span style="font-family:Courier New,Courier,monospace;">i_db1</span>, <span style="font-family:Courier New,Courier,monospace;">db2</span> -&gt; <span style="font-family:Courier New,Courier,monospace;">i_db2</span>, <span style="font-family:Courier New,Courier,monospace;">db3</span> -&gt; <span style="font-family:Courier New,Courier,monospace;">i_db3</span>.</p> <p>This way, during an APEX upgrade, the DBAs take care of upgrading the APEX version in the database and all that has to be done is to change the link in the application server. So let’s assume we’re upgrading <span style="font-family:Courier New,Courier,monospace;">db1</span>. Simply change the <span style="font-family:Courier New,Courier,monospace;">i_db1</span> link to point to <span style="font-family:Courier New,Courier,monospace;">apex_images1920</span> and you’re done.</p> <h1>The multi-database way with the help of your friendly neighborhood DBA</h1> <p>Another approach to this is to leave all the work to the DBA - instead of setting up those aliases, just deploy the images folders - <span style="font-family:Courier New,Courier,monospace;">apex_images1820</span> and <span style="font-family:Courier New,Courier,monospace;">apex_images1920</span> in our case. Then, with each upgrade, remember to launch the reset_image_prefix.sql script and point to the right context. This may sound simpler, but one thing has to be kept in mind, to cite Joel Kallman from Oracle:</p> <blockquote><p>“You should <strong>not</strong> do this on a live system, as this process will invalidate many objects in the APEX schema and they will need to be recompiled. Again - don't do this on a live system.”</p> </blockquote> <p>On the other hand, you’re upgrading your APEX schema anyway, so that shouldn’t be that big of a problem.</p> <h1>Using a static file server</h1> <p>What if you have multiple application servers running ORDS (either to provide high availability or to serve different environments) and don’t want to store multiple copies of the APEX static files? Thankfully you can point APEX to use an outside URL for requesting the files. Actually, even <a href="https://blogs.oracle.com/apex/announcing-oracle-apex-static-resources-on-oracle-content-delivery-network">Oracle started providing these resources in a CDN</a>. 
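</p> <p>For example, the same <span style="font-family:Courier New,Courier,monospace;">reset_image_prefix.sql</span> script also accepts a full URL as the image prefix. A minimal sketch of what this could look like, assuming a hypothetical static file server (the hostname below is purely illustrative; use Oracle's published CDN address or your own server's URL for the matching APEX version):</p> <pre class="rteindent1"> SQL&gt; @&lt;apex directory&gt;\utilities\reset_image_prefix.sql
Enter the Application Express image prefix [/i/] https://your-static-server.example.com/apex_images1920/</pre><p>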
You can either use the URLs provided by Oracle or, if you’d rather rely on your own infrastructure, create a static file server on your own - just be sure to include the</p> <pre class="rteindent1"> Access-Control-Allow-Origin: “*”</pre><p>header to the responses, otherwise the browser will complain about <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS">CORS</a> issues.</p> <p>For example if you're running Apache HTTP Server, this small snippet may come in handy:</p> <pre> Header append Access-Control-Allow-Origin * Header merge Vary "Origin" </pre><p>Use the same command as before to set the URL as image prefix. Like before, you have to take care of changing the prefix when upgrading.</p> <h1>Using the query parameter? Unfortunately not an option</h1> <p>One interesting thing that can be observed about the request made for the resource is a query parameter indicating a version, e.g.:</p> <pre class="rteindent1"> GET /i/libraries/apex/minified/chartBundle.min.js?v=18.2.0.00.12</pre><p>if a request is made on behalf of APEX 18.2. This looks very promising - we could envision a simple <a href="https://httpd.apache.org/docs/2.4/mod/mod_rewrite.html">mod_rewrite</a>  or <a href="http://tomcat.apache.org/tomcat-9.0-doc/rewrite.html">Tomcat’s adaptation</a> to request the appropriate version of the resources automagically. Until it doesn’t - unfortunately some resources are requested without passing this parameter, for example:</p> <pre class="rteindent1"> GET /i/app_ui/font/apex-5-icon-font.woff2</pre><p>If this one piece of the puzzle wasn’t missing, we would have a maintenance-free way of managing the APEX images - every APEX version would be directed to its corresponding resources.</p> </div> </div> <span><a title="View user profile." href="/users/jakub-granieczny" lang="" about="/users/jakub-granieczny" typeof="schema:Person" property="schema:name" datatype="">jgraniec</a></span> <span>Thu, 06/11/2020 - 17:22</span> <section> <h2>Add new comment</h2> <drupal-render-placeholder callback="comment.lazy_builders:renderForm" arguments="0=node&amp;1=181&amp;2=comment_node_blog_post&amp;3=comment_node_blog_post" token="ylFcCLqiB_5kJXhtNvFLk25u6-awIVPcksXNS3aiwe8"></drupal-render-placeholder> </section> Thu, 11 Jun 2020 15:22:32 +0000 jgraniec 181 at https://db-blog.web.cern.ch Creating PDFs in APEX after ORDS 19.1 https://db-blog.web.cern.ch/blog/jakub-granieczny/2020-05-creating-pdfs-apex-after-ords-191 <span>Creating PDFs in APEX after ORDS 19.1</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><h1><strong>Creating PDFs in APEX after ORDS 19.1</strong></h1> <p>Until 19.1 ORDS provided a built-in printing engine based on Apache FOP which allowed you to download a PDF version of your reports and XLS-FO templates in a very easy manner. However in ORDS 18.4.0 release notes we could find information that this feature is deprecated and will be removed in future release. This is exactly what happened with the release of ORDS 19.2.</p> <h2><strong>So what actually happened? </strong></h2> <p>This is from Oracle’s release notes of ORDS 19.2:</p> <blockquote><p><em>“<strong>Deprecation of Apache FOP PDF Support</strong><br /> Support for generating PDF responses for PL/SQL Gateway calls will be removed in ORDS 19.2.0. This will impact the features in Oracle Application Express relating to generating PDF documents. 
Future versions of Oracle Application Express will move to a new mechanism to generate PDF resources.”</em></p> </blockquote> <p>This means that Apache FOP will not be available anymore in your ORDS installations. This may certainly cause problems for users, which were using this feature intensively, especially taking into account that at the time of writing this post there’s unfortunately still no new mechanism for generating PDFs pre-shipped. When we try to print a PDF, a very disappointing message is displayed: </p> <p><img alt="" src="https://cernbox.cern.ch/index.php/apps/files_sharing/ajax/publicpreview.php?x=2880&amp;y=974&amp;a=true&amp;file=%252F%252F&amp;t=5GQ36MJeymr4Wjr&amp;scalingup=0&amp;x-access-token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJkcm9wX29ubHkiOmZhbHNlLCJleHAiOiIyMDIwLTA1LTEyVDE4OjI3OjE2Ljc4MDY0Njg3NCswMjowMCIsImV4cGlyZXMiOjAsImlkIjoiMjU2NjUzIiwiaXRlbV90eXBlIjowLCJtdGltZSI6MTU4OTI5NzIxMCwib3duZXIiOiJqZ3JhbmllYyIsInBhdGgiOiJlb3Nob21lLWo6MTYxOTQzNDI1NzkxMjk1NDg4IiwicHJvdGVjdGVkIjpmYWxzZSwicmVhZF9vbmx5Ijp0cnVlLCJzaGFyZV9uYW1lIjoiU2NyZWVuc2hvdCAyMDIwLTA1LTEyIGF0IDE1LjIzLjUxLnBuZyIsInRva2VuIjoiNUdRMzZNSmV5bXI0V2pyIn0.s0gbPOHzCOhFGOG7Je9-ES1Z-RIXyW-EendEyI2C52Q" /></p> <p>However, fear not, the solution may turn out to be not too complex after all. </p> <h2><strong>So, what are my choices?</strong></h2> <p>There’s <strong>AOP</strong>  - Apex Office Print <a href="https://www.apexofficeprint.com/index.html#pricing">https://www.apexofficeprint.com/index.html#pricing</a>, but in the free tier you get to generate only 30 reports per month. The most expensive plans go up to $350 and 35000 reports per month. It’s a PLSQL package, so the transition will not be completely transparent to the application designers.</p> <p>A similar package is <strong>PL/PDF</strong> available here: <a href="https://www.plpdf.com">https://www.plpdf.com</a>. It has a different pricing scheme, where there’s no limit on the number of reports generated. </p> <p>Another option is <strong>Oracle BI Publisher</strong>, which is also a paid solution. It can cover many different use cases and possibility to use it as a print server is only a very small part of its capabilities. This is reflected in the cost of the license, however, if you already have Oracle BI as a part of your infrastructure, this might be the way to go.</p> <h2><strong>However...</strong></h2> <p>These options, apart from being paid solutions, require from the APEX developers to put in time and effort - either some PL/SQL code has to be written, or the template has to be recreated.</p> <p>Third option and the one I will cover in this post that solves these problems, is a self-hosted, <strong>external Apache FOP</strong> engine. It might sound complicated, but I will show you how this can be achieved in a couple of easy steps. The biggest challenge here was guessing in what format ORDS sends the data to the print server when auto-generating PDFs, but you don’t have to worry about this — it only took some trial-and-error:<br /><img alt="" src="https://cernbox.cern.ch/index.php/s/hl2kr9Z33kcswfS/download" style="width: 600px; height: 393px;" /></p> <p>In our case, we’re going to host our Apache FOP on a <a href="http://tomcat.apache.org">Tomcat</a>, just like our ORDS, creating a solution that is self-contained, seamless to the developers and free! Nothing stops you from doing  the same on a separate/different application server. It may even end up being an easier to maintain solution for one important reason that I will mention later. 
</p> <h2><strong>Step one - setting up the Apache FOP</strong></h2> <p>If you want to just be done with this step, you can use the precooked fop.war that you can find at the end of this post. If not - read on!</p> <p>Download the FOP source from <a href="https://xmlgraphics.apache.org/fop/download.html">https://xmlgraphics.apache.org/fop/download.html</a>.</p> <p>Since imitation is the highest form of flattery, let’s pay some compliments to the author of one of the bundled examples.</p> <p>Navigate to <span style="font-family:courier new,courier,monospace;">fop-core/src/main/java/org/apache/fop/servlet</span> and open <span style="font-family:courier new,courier,monospace;">FopServlet.java</span> in your editor/IDE of choice.</p> <p>ORDS makes a POST request with 2 parameters - “<span style="font-family:courier new,courier,monospace;">template</span>” and “<span style="font-family:courier new,courier,monospace;">xml</span>”. Their names are pretty self-explanatory and both of them contain the data in plain text. Knowing that, there are only a couple of changes needed in the servlet:</p> <ol><li> <p>Change the <span style="font-family:courier new,courier,monospace;">XSLT_REQUEST_PARAM</span> value to “<span style="font-family:courier new,courier,monospace;">template</span>”</p> </li> <li> <p>Rename the method “<span style="font-family:courier new,courier,monospace;">doGet</span>” to “<span style="font-family:courier new,courier,monospace;">doPost</span>”</p> </li> <li> <p>Modify the “<span style="font-family:courier new,courier,monospace;">convertString2Source</span>” method so it only contains 1 line:</p> </li> </ol><blockquote><p>return new StreamSource(new java.io.StringReader(param));</p> </blockquote> <p>With that ready, it’s time to compile the sources. Download Ant if you don’t already have it: <a href="https://ant.apache.org/bindownload.cgi">https://ant.apache.org/bindownload.cgi</a></p> <p>Navigate to the FOP directory in the console and simply execute: <span style="font-family:courier new,courier,monospace;">ant</span></p> <p>The <span style="font-family:courier new,courier,monospace;">fop.war</span> should now be present in the <span style="font-family:courier new,courier,monospace;">fop/build</span> directory.</p> <p>As a nice and quick testing scenario I always use Tim Hall’s ORDS in Docker - <a href="https://oracle-base.com/articles/linux/docker-oracle-rest-data-services-ords-on-docker">https://oracle-base.com/articles/linux/docker-oracle-rest-data-services-ords-on-docker</a>. It’s perfect as a minimalistic way to test whether our configuration works. Just mount the <span style="font-family:courier new,courier,monospace;">fop.war</span> into <span style="font-family:courier new,courier,monospace;">/u01/config/instance1/webapps/fop.war</span>.</p> <h2><strong>Step two - setting up ORDS</strong></h2> <p>First, you have to log in to the internal workspace in APEX.</p> <p>In order to do that, navigate to the main APEX page (in our case it will be <span style="font-family:courier new,courier,monospace;">localhost:8080/ords</span>). 
As a name of the workspace put INTERNAL and then the credentials to access the internal workspace (for us it’s<span style="font-family:courier new,courier,monospace;"> admin/ApexPassword1</span>).</p> <p>Go to “Manage Instance” menu, then “Instance Settings” and to the “Report Printing” tab.</p> <p>There, you should set following values:</p> <p><strong>Print Serve</strong>r: External (Apache FOP)</p> <p><strong>Host Address</strong>: be careful here, especially if you’re running in docker — take the container’s hostname:</p> <blockquote><p>docker inspect --format='{{.Config.Hostname}}' &lt;&lt;container_name&gt;&gt;</p> </blockquote> <div> <p><strong>Script</strong>: <span style="font-family:courier new,courier,monospace;">/fop/ </span></p> </div> <p><img alt="" src="https://cernbox.cern.ch/index.php/apps/files_sharing/ajax/publicpreview.php?x=2880&amp;y=974&amp;a=true&amp;file=%252F%252F&amp;t=WR2FV78YmkpqgMl&amp;scalingup=0&amp;x-access-token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJkcm9wX29ubHkiOmZhbHNlLCJleHAiOiIyMDIwLTA1LTEyVDE4OjI5OjAwLjc3MDM2MjA4NiswMjowMCIsImV4cGlyZXMiOjAsImlkIjoiMjU2NjU0IiwiaXRlbV90eXBlIjowLCJtdGltZSI6MTU4OTI5NzIxNywib3duZXIiOiJqZ3JhbmllYyIsInBhdGgiOiJlb3Nob21lLWo6MTYxOTQzNDI5MjgwOTU2NDE2IiwicHJvdGVjdGVkIjpmYWxzZSwicmVhZF9vbmx5Ijp0cnVlLCJzaGFyZV9uYW1lIjoiU2NyZWVuc2hvdCAyMDIwLTA1LTEyIGF0IDE2LjU1LjM1LnBuZyIsInRva2VuIjoiV1IyRlY3OFlta3BxZ01sIn0.mevoSqklYjUBKTA3UvxDUM2EUlGPYO_nJu9-5lYGX-M" /></p> <h2><strong>Step three - enjoy!</strong></h2> <p>That was simple, wasn’t it? Now you can go back to printing beautiful reports for free and without your users even noticing any change!</p> <h2><strong>Why move to a new version of ORDS and put in all of this effort?</strong></h2> <p>Well, of course there are bugs patches introduced in every new release. But what makes moving to 19.4 worthwhile for me is the introduction of new features, like the <a href="https://docs.oracle.com/en/database/oracle/sql-developer-web/19.1/sdweb/about-sdw.html#GUID-AF7601F9-7713-4ECC-8EC9-FB0296002C69">SQL Developer Web </a>- a quick way for your users to manage their’s data from the browser, without the hassle of taking care of TNS Names, tunneling and other access issues. <a href="https://www.thatjeffsmith.com/archive/2019/12/sql-developer-web-is-now-available/">That Jeff Smith</a> and <a href="https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-sql-developer-web">Tim Hall from Oracle-Base</a> have nice posts about it if you're interested. The REST aspect of ORDS is getting generous improvements as well — like the <a href="https://www.oracle.com/technetwork/developer-tools/rest-data-services/downloads/ords-releasenotes-194-5908833.html#performance-of-rest-apis">connection recycling</a>, which may significantly increase the throughput of the processed requests.</p> <div>The pre-cooked fop.war can be found <a href="https://cernbox.cern.ch/index.php/s/5U4O51IIHhuZ9Ji">here</a>.</div> <div> </div> <h2>Update: APEX 20.1 limited PDF printing support</h2> <p>Reading through <a href="https://docs.oracle.com/en/database/oracle/application-express/20.1/htmrn/index.html#HTMRN-GUID-DFE49F55-9A5F-4A57-A617-8975856D810F">APEX 20.1 release notes</a> there's mention of:</p> <blockquote><h4 id="HTMRN-GUID-DFE49F55-9A5F-4A57-A617-8975856D810F"><strong>Native PDF Printing for Interactive Grid</strong></h4> <p>You can now print PDF files directly from Interactive Grids. 
This feature produces a PDF file which includes formatting options such as highlighting, column grouping, and column breaks.</p> </blockquote> <p>I have given it a go and indeed it seems to work. But as mentioned, it's only for Interactive Grids. When using e.g. Interactive Report, we're greeted with the same, old 503 as before. Using Apache FOP as described in this post seems like a more reliable option for the moment, but this feature shows that things are on a good course and maybe we'll get full support for built-in PDF printing.</p> <p>Another small thing I noticed is support of Apex Office Print as a Print Server directly in the instance setting. So if you don't mind spending some money, it looks like a straightforward alternative as well:<img alt="" src="https://cernbox.cern.ch/index.php/s/XjI3vgnzFibXeZg/download" style="width: 500px; height: 126px;" /></p> </div> </div> <span><a title="View user profile." href="/users/jakub-granieczny" lang="" about="/users/jakub-granieczny" typeof="schema:Person" property="schema:name" datatype="">jgraniec</a></span> <span>Tue, 05/12/2020 - 17:39</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above"> <div class="field--label"><b>Tags</b></div> <div class="field--items"> <div class="field--item"><a href="/tags/ords" hreflang="en">ORDS</a></div> <div class="field--item"><a href="/tags/tomcat" hreflang="en">Tomcat</a></div> </div> </div> <section> <article data-comment-user-id="0" id="comment-36336" class="js-comment"> <mark class="hidden" data-comment-timestamp="1589436147"></mark> <footer> <article typeof="schema:Person" about="/user/0"> </article> <p>Submitted by <a rel="nofollow" href="https://www.apexofficeprint.com" lang="" typeof="schema:Person" property="schema:name" datatype="">Dimitri Gielis (not verified)</a> on Thu, 05/14/2020 - 08:02</p> <a href="/comment/36336#comment-36336" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36336#comment-36336" class="permalink" rel="bookmark" hreflang="und">AOP On-premises has unlimited prints.</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>Very nice post. I just wanted to mention that APEX Office Print (AOP) also has an on-premises version, which has unlimited reports.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=36336&amp;1=default&amp;2=und&amp;3=" token="qpXtjKJomrLVltq1l01cMBTgxZGw2-S2FVaWzejf5qA"></drupal-render-placeholder> </div> </article> <article data-comment-user-id="0" id="comment-36380" class="js-comment"> <mark class="hidden" data-comment-timestamp="1591296408"></mark> <footer> <article typeof="schema:Person" about="/user/0"> </article> <p>Submitted by <span lang="" typeof="schema:Person" property="schema:name" datatype="">Sven (not verified)</span> on Thu, 06/04/2020 - 20:46</p> <a href="/comment/36380#comment-36380" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36380#comment-36380" class="permalink" rel="bookmark" hreflang="und">Thanks for the nice post!</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>Thanks for the nice post! 
Have you also tried it with APEX 20.1?</p> <p>Thanks,<br /> Sven</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=36380&amp;1=default&amp;2=und&amp;3=" token="FMXnbeIWTuyTDP6jklT9UxpkJ6NEYjT6zp2dfguOmJc"></drupal-render-placeholder> </div> </article> <div class="indented"> <article data-comment-user-id="257" id="comment-36392" class="js-comment"> <mark class="hidden" data-comment-timestamp="1591779318"></mark> <footer> <article typeof="schema:Person" about="/users/jakub-granieczny"> </article> <p>Submitted by <a title="View user profile." href="/users/jakub-granieczny" lang="" about="/users/jakub-granieczny" typeof="schema:Person" property="schema:name" datatype="">jgraniec</a> on Wed, 06/10/2020 - 10:55</p> <p class="visually-hidden">In reply to <a href="/comment/36380#comment-36380" class="permalink" rel="bookmark" hreflang="und">Thanks for the nice post!</a> by <span lang="" typeof="schema:Person" property="schema:name" datatype="">Sven (not verified)</span></p> <a href="/comment/36392#comment-36392" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36392#comment-36392" class="permalink" rel="bookmark" hreflang="und">APEX 20.1</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>Hi Sven,</p> <p>I did not have a chance to give APEX 20.1 a go, I will update the post once I do! But I believe that if there's option to choose "external (Apache FOP)" as a printing engine, there should be no problems.</p> <p>Best regards,<br /> Jakub</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=36392&amp;1=default&amp;2=und&amp;3=" token="Hs6Xoba3LnIvKjgf9dFEzn4AcizxDyWe2Hc1CdcXL9I"></drupal-render-placeholder> </div> </article> <article data-comment-user-id="257" id="comment-36407" class="js-comment"> <mark class="hidden" data-comment-timestamp="1592377755"></mark> <footer> <article typeof="schema:Person" about="/users/jakub-granieczny"> </article> <p>Submitted by <a title="View user profile." href="/users/jakub-granieczny" lang="" about="/users/jakub-granieczny" typeof="schema:Person" property="schema:name" datatype="">jgraniec</a> on Wed, 06/17/2020 - 09:09</p> <p class="visually-hidden">In reply to <a href="/comment/36380#comment-36380" class="permalink" rel="bookmark" hreflang="und">Thanks for the nice post!</a> by <span lang="" typeof="schema:Person" property="schema:name" datatype="">Sven (not verified)</span></p> <a href="/comment/36407#comment-36407" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36407#comment-36407" class="permalink" rel="bookmark" hreflang="und">APEX 20.1 Update</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>I updated the post with info about APEX 20.1. 
The Apache FOP works the same as before, but there are some interesting changes as well.</p> </div> <drupal-render-placeholder callback="comment.lazy_builders:renderLinks" arguments="0=36407&amp;1=default&amp;2=und&amp;3=" token="9DQjsRqNzzM_wKRLXrFIPd1IzZHiw12TpNEf6x3_kW0"></drupal-render-placeholder> </div> </article> </div> <h2>Add new comment</h2> <drupal-render-placeholder callback="comment.lazy_builders:renderForm" arguments="0=node&amp;1=180&amp;2=comment_node_blog_post&amp;3=comment_node_blog_post" token="5d__oCWZIDb5F6arUFfPW6MUz4MP6DWNLT6kTQffL6k"></drupal-render-placeholder> </section> Tue, 12 May 2020 15:39:50 +0000 jgraniec 180 at https://db-blog.web.cern.ch Distributed Deep Learning for Physics with TensorFlow and Kubernetes https://db-blog.web.cern.ch/blog/luca-canali/2020-03-distributed-deep-learning-physics-tensorflow-and-kubernetes <span>Distributed Deep Learning for Physics with TensorFlow and Kubernetes</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><p style="margin: 0px;"><b><font color="#ff0000" data-blogger-escaped-style="color: red;">Summary:</font></b><span> </span>This post details a solution for distributed deep learning training for a High Energy Physics use case, deployed using cloud resources and Kubernetes. You will find the results for training using CPU and GPU nodes. This post also describes an experimental tool that we developed, <a href="https://github.com/cerndb/tf-spawner">TF-Spawner</a>, and how we used it to run distributed TensorFlow on a Kubernetes cluster.</p> <p style="margin: 0px;"> </p> <p style="margin: 0px;"><b><font color="#ff0000">Authors:</font></b> <a href="mailto:Riccardo.Castellotti@cern.ch">Riccardo.Castellotti@cern.ch</a> and <a href="mailto:Luca.Canali@cern.ch">Luca.Canali@cern.ch</a></p> <p style="margin: 0px;"> </p> <h2><font color="#ff0000" data-blogger-escaped-style="color: red;">A Particle Classifier</font></h2> <p style="margin: 0px;">This work was developed as part of the pipeline described in <a href="https://rdcu.be/b4Wk9" rel="nofollow">Machine Learning Pipelines with Modern Big DataTools for High Energy Physics</a>. The main goal is to build a particle classifier to improve the quality of data filtering for online systems at the LHC experiments. The classifier is implemented using a neural network model described in<span> </span><a href="https://link.springer.com/epdf/10.1007/s41781-019-0028-1?author_access_token=eTrqfrCuFIP2vF4nDLnFfPe4RwlQNchNByi7wbcMAY7NPT1w8XxcX1ECT83E92HWx9dJzh9T9_y5Vfi9oc80ZXe7hp7PAj21GjdEF2hlNWXYAkFiNn--k5gFtNRj6avm0UukUt9M9hAH_j4UR7eR-g%3D%3D">this research article.</a></p> <p style="margin: 0px;">The datasets used for test and training are stored in<span> </span><a href="https://www.tensorflow.org/tutorials/load_data/tfrecord">TFRecord<span> </span></a>format, with a cumulative size of about 250 GB, with 34 million events in the training dataset. A key part of the neural network (see Figure 1) is a <a href="https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU">GRU layer</a> that is trained using lists of 801 particles with 19 low-level features each, which account for most of the training dataset. The datasets used for this work have been produced using Apache Spark, see<span> </span><a href="https://github.com/cerndb/SparkDLTrigger">details and code</a>. 
The original pipeline produces files in Apache Parquet format; we have used Spark and the<span> </span><a href="https://mvnrepository.com/artifact/org.tensorflow/spark-tensorflow-connector_2.11">spark<font class="pl-k">-</font>tensorflow<font class="pl-k">-</font>connector</a><span> </span>to convert the datasets into TFRecord format,<span> </span><a href="https://github.com/cerndb/SparkDLTrigger/blob/master/Training_TFKeras_CPU/DataPrep_extract_and_convert_Full_Dataset_TFRecord.scala">see also the code</a>.</p> <p style="margin: 0px;"> </p> <p style="margin: 0px;"><b><font color="#ff0000" data-blogger-escaped-style="color: red;">Data:</font><span> </span><a href="https://github.com/cerndb/SparkDLTrigger/tree/master/Data">download the datasets used for this work from this link</a></b></p> <p style="margin: 0px;"><b><font color="#ff0000" data-blogger-escaped-style="color: red;">Code:</font> <a href="https://github.com/cerndb/SparkDLTrigger/tree/master/Training_TFKeras_CPU_GPU_K8S_Distributed">see the code for the tests reported in this post at this link</a></b></p> <p style="margin: 0px;"> </p> <p class="separator" data-blogger-escaped-style="clear: both; text-align: center;" style="margin: 0px; clear: both; text-align: center;"> </p> <p class="separator" data-blogger-escaped-style="clear: both; text-align: center;" style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; clear: both; text-align: center;"><a data-blogger-escaped-style="margin-left: 1em; margin-right: 1em;" href="https://1.bp.blogspot.com/-UHgKsqnhxGw/XnfD3TA1s4I/AAAAAAAAFTQ/cekgVX0OoVgskSLao1vsGQENy-8PeAGpgCLcBGAsYHQ/s1600/Figure1.png" style="margin-left: 1em; margin-right: 1em;"><img data-blogger-escaped-data-original-height="501" data-blogger-escaped-data-original-width="1144" src="https://1.bp.blogspot.com/-UHgKsqnhxGw/XnfD3TA1s4I/AAAAAAAAFTQ/cekgVX0OoVgskSLao1vsGQENy-8PeAGpgCLcBGAsYHQ/s1600/Figure1.png" style="cursor: move; border-width: 0px; border-style: solid; width: 800px; height: 350px;" /></a></p> <p style="margin: 0px;"><b>Figure 1:</b><span> </span>(left) Diagram of the neural network for the Inclusive Classifier model, from<span> </span><a href="https://arxiv.org/abs/1807.00083">T. Nguyen et. al.</a><span> </span>(right) TF.Keras implementation used in this work.</p> <p style="margin: 0px;"> </p> <h2><font color="#ff0000" data-blogger-escaped-style="color: red;">Distributed Training on Cloud Resources</font></h2> <p style="margin: 0px;">Cloud resources provide a suitable environment for scaling distributed training of neural networks. One of the key advantages of using cloud resources is the elasticity of the platform that allows allocating resources when needed. Moreover, container orchestration systems, in particular Kubernetes, provide a powerful and flexible API for deploying many types of workloads on cloud resources, including machine learning and data processing pipelines. CERN physicists, and data scientists in general, can access cloud resources and Kubernetes clusters via the CERN OpenStack private cloud. The use of public clouds is also being actively tested for High Energy Physics (HEP) workloads. 
The tests reported here have been run using resources from<span> </span><a href="https://www.oracle.com/cloud/">Oracle's OCI</a>.</p> <p style="margin: 0px;">For this work, we have developed a custom launcher script,<span> </span><a href="https://github.com/cerndb/tf-spawner">TF-Spawner</a><span> </span>(see also the paragraph on TF-Spawner for more details) for running distributed TensorFlow training code on Kubernetes clusters.</p> <p style="margin: 0px;">Training and test datasets have been copied to the cloud object storage prior to running the tests, OCI object storage in this case, while for tests run at CERN we used an S3 installation based on Ceph. Our model training job with TensorFlow used training and test data in TFRecord format, produced at the end of the data preparation part of the pipeline, as discussed in the previous paragraph. TensorFlow reads natively TFRecord format and has tunable parameters and optimizations when ingesting this type of data using the modules tf.data and tf.io. We found that reading from OCI object storage can become a bottleneck for distributed training, as it requires reading data over the network which can suffer from bandwidth saturation, latency spikes and/or multi-tenancy noise. We followed<span> </span><a href="https://www.tensorflow.org/guide/data_performance">TensorFlow's documentation recommendations</a><span> </span>for improving the data pipeline performance, by using prefetching, parallel data extraction, sequential interleaving, caching, and by using a large read buffer. Notably, caching has proven to be very useful for distributed training with GPUs and for some of the largest tests on CPU, where we observed that the first training epoch, which has to read the data into the cache, was much slower than subsequent epoch which would find data already cached.</p> <p style="margin: 0px;">Tests were run using TensorFlow version 2.0.1, using tf.distribute strategy "multi worker mirror strategy''. Additional care was taken to make sure that the different tests would also yield the same good results in terms of accuracy on the test dataset as what was found with<span> </span><a href="https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl">training methods tested in previous work</a>. To achieve this we have found that additional tuning was needed on the settings of the learning rate for the optimizer (we use the <a href="https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam">Adam </a>optimizer for all the tests discussed in this article). We scaled the learning rate with the number of workers, to match the increase in effective batch size (we used 128 for each worker). In addition, we found that slowly reducing the learning rate as the number of epochs progressed, was beneficial to the convergence of the network. This additional step is an ad hoc tuning that we developed by trial and error and that we validated by monitoring the accuracy and loss on the test dataset at the end of each training.</p> <p>To gather performance data, we ran the training for 6 epochs, which provided accuracy and loss very close to the best results that we would obtain by training the network up to 12 epochs. 
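</p> <p>To make the ingredients above more concrete, the following is a minimal sketch (not the actual training script, which is available in the repository linked above) of how a tf.data input pipeline with parallel reads, interleaving, caching and prefetching can be combined with the "multi worker mirror strategy" and with a learning rate scaled by the number of workers. The feature schema, the file paths and the build_model() helper are illustrative placeholders only:</p> <pre>
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE  # TF 2.0.x API

def make_dataset(file_pattern, batch_size=128):
    """Illustrative TFRecord input pipeline: parallel reads, caching, prefetching."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=False)
    # Parallel data extraction with a large read buffer, interleaving the files
    dataset = files.interleave(
        lambda f: tf.data.TFRecordDataset(f, buffer_size=8 * 1024 * 1024),
        cycle_length=4, num_parallel_calls=AUTOTUNE)

    def parse(serialized):
        # Hypothetical schema: 801 particles x 19 low-level features, plus a label
        spec = {"particles": tf.io.FixedLenFeature([801 * 19], tf.float32),
                "label": tf.io.FixedLenFeature([3], tf.float32)}
        example = tf.io.parse_single_example(serialized, spec)
        features = tf.reshape(example["particles"], [801, 19])
        return features, example["label"]

    return (dataset
            .map(parse, num_parallel_calls=AUTOTUNE)
            .cache()      # data is read over the network once, then served from memory
            .batch(batch_size)
            .prefetch(AUTOTUNE))

# The cluster configuration (TF_CONFIG) is expected to be provided to each worker
# by the launcher or the environment.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
n_workers = strategy.num_replicas_in_sync

train_ds = make_dataset("s3://mybucket/train/part-*.tfrecord")  # illustrative paths
test_ds = make_dataset("s3://mybucket/test/part-*.tfrecord")

with strategy.scope():
    model = build_model()  # hypothetical helper returning the tf.keras model
    # Scale the base learning rate with the number of workers, to match the
    # increase in effective batch size (128 records per worker)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001 * n_workers)
    model.compile(optimizer=optimizer, loss="categorical_crossentropy",
                  metrics=["accuracy"])

model.fit(train_ds, validation_data=test_ds, epochs=6)
</pre> <p>With this setup every worker runs the same script and tf.distribute coordinates the gradient updates across them; the cache() step is what makes the epochs after the first one considerably faster, as discussed above.</p> <p>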
We have also tested adding shuffling between each epoch, using the shuffle method of the tf.data API, however it has not shown measurable improvements so this technique has not been further used in the tests reported here.</p> <p style="margin: 0px;"> </p> <p class="separator" data-blogger-escaped-style="clear: both; text-align: center;" style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; clear: both; text-align: center;"><a data-blogger-escaped-style="margin-left: 1em; margin-right: 1em;" href="https://1.bp.blogspot.com/-Ke0hXsH6lRY/XnfGnngP8DI/AAAAAAAAFTc/zg3i3YqrUGwo3IMq7sbuXn79IjgJNtlPgCLcBGAsYHQ/s1600/Figure2_Speedup_CPU_GPU_matteo.png" style="margin-left: 1em; margin-right: 1em;"><img data-blogger-escaped-data-original-height="1000" data-blogger-escaped-data-original-width="900" src="https://1.bp.blogspot.com/-Ke0hXsH6lRY/XnfGnngP8DI/AAAAAAAAFTc/zg3i3YqrUGwo3IMq7sbuXn79IjgJNtlPgCLcBGAsYHQ/s1600/Figure2_Speedup_CPU_GPU_matteo.png" style="cursor: move; border-width: 0px; border-style: solid; width: 1000px; height: 700px;" /></a></p> <p style="margin: 0px;"><b>Figure 2:</b> Measured speedup for the distributed training of the Inclusive Classifier model using TensorFlow and tf.distribute with “multi  worker  mirror  strategy”, running on cloud resources with CPU and GPU nodes (Nvidia P100), training for 6 epochs.  The speedup values indicate how well the distributed training scales as the number of worker nodes, with CPU and GPU resources, increases.</p> <div> <p style="margin: 0px;"> </p> </div> <h2><font color="#ff0000" data-blogger-escaped-style="color: red;">Results and Performance Measurements, CPU and GPU Tests</font></h2> <p>We deployed our tests using Oracle's OCI. Cloud resources were used to build Kubernetes clusters using virtual machines (VMs). We used a set of Terraform script to automate the configuration process. The cluster for CPU tests used VMs of the flavor "VM.Standard2.16'', based on 2.0 GHz Intel Xeon Platinum 8167M, each providing 16 physical cores (Oracle cloud refers to this as OCPUs) and 240 GB of RAM. Tests in our configuration deployed 3 pods for each VM (Kubernetes node), each pod running one TensorFlow worker. Additional OS-based measurements on the VMs confirmed that this was a suitable configuration, as we could measure that the CPU utilization on each VM matched the number of available physical cores (OCPUs), therefore providing good utilization without saturation. The available RAM in the worker nodes was used to cache the training dataset using the tf.data API (data populates the cache during the first epoch).</p> <p>Figure 2 shows the results of the Inclusive Classifier model training <a href="https://en.wikipedia.org/wiki/Speedup">speedup </a>for a variable number of nodes and CPU cores. Tests have been run using <a href="https://github.com/cerndb/tf-spawner">TF-Spawner</a>. Measurements show that the training time decreases as the number of allocated cores increases. The speedup grows close to linearly in the range tested: from 32 cores to 480 cores. 
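</p> <p>For reference, speedup here is the usual quantity from the linked article: the ratio between the training time of a baseline configuration and the training time measured when more workers are allocated. A trivial sketch, with no measured numbers hard-coded:</p> <pre>
def speedup(t_reference, t_n_workers):
    """Speedup relative to the reference (smallest) configuration.

    t_reference: training time of the baseline configuration
    t_n_workers: training time measured with N workers allocated
    """
    return t_reference / t_n_workers

# Ideal linear scaling doubles the speedup when the allocated resources double;
# values close to that indicate the training is not yet limited by I/O or
# communication overhead.
</pre> <p>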
The largest distributed training test that we ran using CPUs used 480 physical cores (OCPUs), distributed over 30 VMs, each running 3 workers (each worker running in a separate container in a pod), for a total of 90 workers.</p> <p>Similarly, we have performed tests using GPU resources on OCI, running the workload with <a href="https://github.com/cerndb/tf-spawner">TF-Spawner</a>. For the GPU tests we have used the VM flavor "GPU 2.1", which comes equipped with one Nvidia P100 GPU, 12 physical cores (OCPU) and 72 GB of RAM. We have tested distributed training with up to 10 GPUs, and found that scalability was close to linear in the tested range. One important lesson learned when using GPUs is that the slow performance of reading data from OCI storage makes the first training epoch much slower than the rest of the epochs (up to 3-4 times slower). It was therefore very important to use TensorFlow's caching for the training dataset in our tests with GPUs. However, we could only cache the training dataset for tests using 4 nodes or more, given the limited amount of memory in the VM flavor used (72 GB of RAM per node) compared to the size of the training set (200 GB).</p> <p>Distributed training tests with CPUs and GPUs were performed using the same infrastructure, namely a Kubernetes cluster built on cloud resources and cloud storage allocated on OCI. Moreover, we used the same script for CPU and GPU training, the same APIs, tf.distribute and tf.keras, and the same TensorFlow version. The TensorFlow runtime used was different for the two cases, as training on GPU resources took advantage of TensorFlow's optimizations for CUDA and Nvidia GPUs. Figure 3 shows the distributed training time measured for some selected cluster configurations. We can use these results to compare the performance we found when training on GPU and on CPU. For example, we find in Figure 3 that the training time of the Inclusive Classifier for 6 epochs using 400 CPU cores (distributed over 25 VMs equipped with 16 physical cores each) is about 2000 seconds, which is similar to the training time we measured when distributing the training over 6 nodes equipped with GPUs. When training using GPU resources (Nvidia P100), we measured that each batch is processed in about 59 ms (except for epoch 1, which is I/O bound and is about 3x slower). Each batch contains 128 records and has a size of about 7.4 MB. This corresponds to a measured throughput of training data flowing through the GPU of about 125 MB/sec per node (i.e. 1.2 GB/sec when training using 10 GPUs). When training on CPU, the measured processing time per batch is about 930 ms, which corresponds to 8 MB/sec per worker, and amounts to 716 MB/sec for the training test with 90 workers and 480 CPU cores.
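</p> <p>The throughput figures quoted above follow directly from the per-batch measurements; the arithmetic, using the numbers reported in the text:</p> <pre>
batch_mb = 7.4        # size of one batch of 128 records, in MB

# GPU: about 59 ms per batch (after the I/O-bound first epoch)
gpu_node_mb_s = batch_mb / 0.059       # about 125 MB/sec per GPU node
gpu_total_mb_s = 10 * gpu_node_mb_s    # about 1.2 GB/sec when using 10 GPUs

# CPU: about 930 ms per batch, per worker
cpu_worker_mb_s = batch_mb / 0.93      # about 8 MB/sec per worker
cpu_total_mb_s = 90 * cpu_worker_mb_s  # about 716 MB/sec with 90 workers (480 cores)

print(gpu_node_mb_s, gpu_total_mb_s, cpu_worker_mb_s, cpu_total_mb_s)
</pre> <p>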
We do not believe these results can be easily generalized to other environments and models, however, they are reported here as they can be useful as an example and for future reference.</p> <p style="margin: 0px;"> </p> <p class="separator" data-blogger-escaped-style="clear: both; text-align: center;" style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; clear: both; text-align: center;"><a data-blogger-escaped-style="margin-left: 1em; margin-right: 1em;" href="https://1.bp.blogspot.com/-P1DToNSzje8/XnfHFxStw3I/AAAAAAAAFTg/U5JbCFQzrT8FM7BIpQZuiM_q0DntANTNQCLcBGAsYHQ/s1600/Figure3_Training_Time_CPU_GPU_distributed_matteo.png" style="margin-left: 1em; margin-right: 1em;"><img data-blogger-escaped-data-original-height="480" data-blogger-escaped-data-original-width="640" src="https://1.bp.blogspot.com/-P1DToNSzje8/XnfHFxStw3I/AAAAAAAAFTg/U5JbCFQzrT8FM7BIpQZuiM_q0DntANTNQCLcBGAsYHQ/s1600/Figure3_Training_Time_CPU_GPU_distributed_matteo.png" style="cursor: move; border-width: 0px; border-style: solid; width: 640px; height: 480px;" /></a></p> <p style="margin: 0px;"><b>Figure 3:</b> Selected measurements of the distributed training time for the Inclusive Classifier model using TensorFlow and tf.distribute with “multi worker mirror strategy”, training for 6 epochs, running on cloud resources, using CPU (2.0 GHz Intel Xeon Platinum 8167M) and GPU (Nvidia P100) nodes, on Oracle's OCI.</p> <p style="margin: 0px;"> </p> <h2><font color="#ff0000" data-blogger-escaped-style="color: red;">TF-Spawner</font></h2> <p style="margin: 0px;"><a href="https://github.com/cerndb/tf-spawner">TF-Spawner</a> is an experimental tool for running TensorFlow distributed training on Kubernetes clusters.</p> <p style="margin: 0px;">TF-Spawner takes as input the user's Python code for TensorFlow training, which is expected to use <a href="https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras">tf.distribute strategy for multi worker training</a>, and runs it on a Kubernetes cluster. TF-Spawner takes care of requesting the desired number of workers, each running in a container image inside a dedicated pod (unit of execution) on a Kubernetes cluster. We used the official TensorFlow images from Docker Hub for this work. 
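</p> <p style="margin: 0px;">As a rough sketch of the kind of training code this implies (a toy model and random data are used here only to keep the example self-contained and runnable; the actual Inclusive Classifier training code is linked below):</p> <pre>
import json
import os

import numpy as np
import tensorflow as tf

# tf.distribute reads the cluster layout from the TF_CONFIG environment
# variable, which the launcher sets for each worker pod.
tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
num_workers = len(tf_config.get("cluster", {}).get("worker", [])) or 1

# In TensorFlow 2.0 the multi-worker strategy lives under tf.distribute.experimental.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# Toy dataset standing in for the TFRecord pipeline described earlier.
x = np.random.random((1024, 14)).astype("float32")
y = np.random.randint(0, 2, size=(1024, 1)).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)

with strategy.scope():
    # Toy model; the real model is defined in the user-provided training script.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(14,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # Learning rate scaled with the number of workers, as discussed above.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001 * num_workers),
        loss="binary_crossentropy",
        metrics=["accuracy"])

model.fit(dataset, epochs=6)
</pre> <p style="margin: 0px;">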
Moreover, TF-Spawner handles the distribution of the necessary credentials for authenticating with cloud storage and manages the TF_CONFIG environment variable needed by tf.distribute.</p> <p style="margin: 0px;"> </p> <p style="margin: 0px;"><b>Examples:</b></p> <ul><li><a href="https://github.com/cerndb/tf-spawner">This is a link to a "toy example" of how to train MNIST</a> in the TF-Spawner README file</li> <li><a href="https://github.com/cerndb/SparkDLTrigger/tree/master/Training_TFKeras_CPU_GPU_K8S_Distributed">Here are the details of the steps used to train the Particle Classifier</a> with TF-Spawner</li> </ul><p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> </p> <p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"><b>TensorBoard metrics visualization:</b></p> <p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> </p> <p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"><a href="https://www.tensorflow.org/tensorboard" rel="nofollow">TensorBoard</a><span> </span>provides monitoring and instrumentation for TensorFlow operations.
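</p> <p style="margin: 0px;">With tf.keras this is typically done through the TensorBoard callback; a minimal sketch (the log directory below is just an example path, not the one used in our setup):</p> <pre>
import numpy as np
import tensorflow as tf

# The callback writes per-epoch metrics (loss, accuracy) that TensorBoard can display.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="./tb_logs")

# Toy model and data, used here only to make the example runnable end to end.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
x = np.random.random((256, 4)).astype("float32")
y = np.random.random((256, 1)).astype("float32")

model.fit(x, y, epochs=2, callbacks=[tensorboard_cb])
</pre> <p style="margin: 0px;">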
To use TensorBoard with TF-Spawner you can follow a few additional <a href="https://github.com/cerndb/tf-spawner#tensorboard">steps detailed in the documentation</a>.</p> <p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> </p> <p class="separator" data-blogger-escaped-style="clear: both; text-align: center;" style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial; clear: both; text-align: center;"><a data-blogger-escaped-style="margin-left: 1em; margin-right: 1em;" href="https://1.bp.blogspot.com/-6UGfvKeFL64/Xob08u9nfUI/AAAAAAAAFU0/a9u6DkKXGUk9V745e9wGewlvT60FPvLyACLcBGAsYHQ/s1600/Figure_TensorBoard_10GPU_12epochs_addXaxis.png" style="margin-left: 1em; margin-right: 1em;"><img data-blogger-escaped-data-original-height="1524" data-blogger-escaped-data-original-width="1426" src="https://1.bp.blogspot.com/-6UGfvKeFL64/Xob08u9nfUI/AAAAAAAAFU0/a9u6DkKXGUk9V745e9wGewlvT60FPvLyACLcBGAsYHQ/s1600/Figure_TensorBoard_10GPU_12epochs_addXaxis.png" style="cursor: move; border-width: 0px; border-style: solid; width: 598px; height: 640px;" /></a></p> <p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"><b>Figure 4:</b> TensorBoard visualization of the distributed training metrics for the Inclusive Classifier, trained on 10 GPU nodes on a Kubernetes cluster using TF-Spawner. Measurements show that training converges smoothly. Note: the lower accuracy and higher loss measured on the training dataset compared to the validation dataset are due to the use of dropout in the model.</p> <p style="margin: 0px; color: rgb(0, 0, 0); font-family: &quot;Times New Roman&quot;; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"> </p> <p style="margin: 0px;"><b>Limitations:<span> </span></b>We found TF-Spawner powerful and easy to use for the scope of this work. However, it is an experimental tool. Notably, there is no validation of the user-provided training script; it is simply passed to Python for execution. Users need to make sure that all the requested pods are effectively running, and have to manually take care of possible failures.
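</p> <p style="margin: 0px;">In practice these checks rely on standard kubectl commands; for example (the pod name below is illustrative):</p> <pre>
# Check that all requested worker pods are running (or Completed at the end of training).
kubectl get pods

# Retrieve training information, e.g. epoch timings, from a worker's log.
kubectl logs worker-0 | grep -i epoch
</pre> <p style="margin: 0px;">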
At the end of the training, the pods will be found in the "Completed" state; users can then manually get the information they need, such as the training time, from the pods' log files. Similarly, other common operations, such as fetching the saved trained model or monitoring training with TensorBoard, will need to be performed manually. These are all relatively easy tasks, but they require additional effort and some familiarity with the Kubernetes environment. </p> <p style="margin: 0px;">Another limitation of the proposed approach is that the use of TF-Spawner does not naturally fit with the use of Jupyter Notebooks, which are often the preferred environment for ML development. Ideas for future work in this direction and other tools that can be helpful in this area are listed in the conclusions.</p> <p style="margin: 0px;">If you try TF-Spawner and find it useful for your work, we welcome your feedback.</p> <p style="margin: 0px;"> </p> <h2><font color="#ff0000">Conclusions and Acknowledgements</font></h2> <p style="margin: 0px;">This work shows an example of how we implemented distributed deep learning for a High Energy Physics use case, using commonly used tools and platforms from industry and open source, namely TensorFlow and Kubernetes. A key point of this work is demonstrating the use of cloud resources to scale out distributed training.</p> <p style="margin: 0px;">Machine learning and deep learning on large amounts of data are standard tools for particle physics, and their use is expected to increase in the HEP community in the coming years, both for data acquisition and for data analysis workflows, notably in the context of the challenges of the<span> </span><a href="https://hilumilhc.web.cern.ch/">High Luminosity LHC project</a>. Improvements in productivity and cost reduction for the development, deployment, and maintenance of machine learning pipelines on HEP data are of high interest.</p> <p style="margin: 0px;">We have developed and used a simple tool for running TensorFlow distributed training on Kubernetes clusters, <a href="https://github.com/cerndb/tf-spawner">TF-Spawner</a>. Previously reported work has addressed the<span> </span><a href="https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl">implementation of the pipeline and distributed training using Apache Spark</a>. Future work may address the use of other solutions for distributed training, using cloud resources and open source tools, such as Horovod on Spark and KubeFlow. In particular, we are interested in further exploring the integration of distributed training with the CERN analytics platform based on Jupyter Notebooks.</p> <p style="margin: 0px;"> </p> <p>This work has been developed in the context of the Data Analytics services at CERN and of the<span> </span><a href="https://openlab.cern/">CERN openlab</a><span> </span>project on machine learning in the cloud, in collaboration with Oracle. Additional information on the work described here can be found in the article <a href="https://rdcu.be/b4Wk9" rel="nofollow">Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics</a>. The authors would like to thank Matteo Migliorini and Marco Zanetti of the University of Padova for their collaboration and joint work, and Thong Nguyen and Maurizio Pierini for their help, suggestions, and for providing the dataset and models for this work.
Many thanks also to CERN openlab, to our Oracle contacts for this project, and to our colleagues at the Spark and Hadoop Service at CERN.</p> <p> </p> <p> </p> </div> </div> <span><a title="View user profile." href="/users/luca-canali" lang="" about="/users/luca-canali" typeof="schema:Person" property="schema:name" datatype="">canali</a></span> <span>Mon, 03/23/2020 - 14:43</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above"> <div class="field--label"><b>Tags</b></div> <div class="field--items"> <div class="field--item"><a href="/tags/tensorflow" hreflang="en">TensorFlow</a></div> <div class="field--item"><a href="/tags/tools" hreflang="en">Tools</a></div> <div class="field--item"><a href="/tags/kubernetes" hreflang="en">kubernetes</a></div> <div class="field--item"><a href="/tags/machine-learning" hreflang="en">Machine Learning</a></div> <div class="field--item"><a href="/tags/performance-0" hreflang="en">Performance</a></div> </div> </div> <section> <h2>Add new comment</h2> <drupal-render-placeholder callback="comment.lazy_builders:renderForm" arguments="0=node&amp;1=179&amp;2=comment_node_blog_post&amp;3=comment_node_blog_post" token="LMVSjz1ZbbTF4FePeK06Cv1mNkXR0QeRwKNk2InaMzk"></drupal-render-placeholder> </section> Mon, 23 Mar 2020 13:43:59 +0000 canali 179 at https://db-blog.web.cern.ch Automatize the deployment of Kubernetes Clusters on Cloud Infrastructure https://db-blog.web.cern.ch/blog/antonio-nappi/2020-03-automatize-deployment-kubernetes-clusters-cloud-infrastructure <span> Automatize the deployment of Kubernetes Clusters on Cloud Infrastructure</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><p dir="ltr" id="docs-internal-guid-27e5c163-7fff-72d4-e663-db5e4c5c0740"><span>Hi, my name is Priyanshu Khandelwal. I was amongst the 40 students selected from all over the world to work at CERN as an Openlab Summer Student 2019. I worked in the IT-DB-DAR section under the supervision of Mr Antonio Nappi. </span></p> <h2 dir="ltr"><span>Introduction</span></h2> <p dir="ltr"><span>At CERN’s IT-DB-DAR section, t</span><span>here is ongoing work to migrate the hosting infrastructure from virtual machines to Kubernetes, profiting from the portability of Kubernetes to evaluate how CERN’s services can run on public clouds - in particular, on Oracle cloud.</span></p> <p dir="ltr"><span>During the summer, I worked on automating the process of infrastructural deployment on public/private clouds and making this deployment procedure cloud-agnostic.</span></p> <p dir="ltr"><span>As part of my project, we developed Terraform [</span>1<span>] (discussed below) modules that could deploy infrastructure (see Figure 1) consisting of a Kubernetes cluster, InfluxDB, Elasticsearch, Grafana and Kibana on Oracle Cloud. We also developed Terraform modules to create Kubernetes clusters and Magnum cluster templates on OpenStack. </span></p> <figure><img alt="" src="https://db-blog-multimedia.web.cern.ch/db-blog-multimedia/anappi/Infrastructure.png" /><figcaption><p class="rteleft">Figure 1. Infrastructure to be deployed on Oracle Cloud</p> </figcaption></figure><h2 dir="ltr"><span>About Terraform</span></h2> <p dir="ltr"><span>Terraform [</span>1<span>] is an open source tool developed by HashiCorp which can create and manage cloud infrastructure. 
It uses the HashiCorp Configuration Language (HCL) [</span><a href="https://github.com/hashicorp/hcl"><span>2</span></a><span>] for its configuration. With Terraform, it is possible to specify infrastructure components like networks, subnets, compute instances, etc. in a set of declarative files in HCL. All major cloud providers have Terraform plugins available, so Terraform can be used to deploy infrastructure on any of these cloud providers.</span></p> <h2 dir="ltr"><span>Creating a Terraform Module</span></h2> <p dir="ltr"><span>Using Terraform, we specified the infrastructure that we wanted to deploy on the cloud as HCL code. We coded virtual machines and related infrastructure components like network subnets, virtual networks, etc. as resources in Terraform.</span><br /><span>An example is shown in Figure 2, where we declare a virtual machine resource of type “oci_core_instance” in Terraform. The type represents the cloud provider, Oracle Cloud in this case. Further, several configuration arguments like Virtual Network Interface Card details, availability domain, shape, metadata, etc. can be added to the virtual machine resource.</span></p> <figure><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/anappi/ELK.png" /><figcaption><p>Figure 2. Coding a virtual machine as a resource in Terraform</p> </figcaption></figure><p dir="ltr"><span>After coding the infrastructure, we wrote bash scripts that could install particular software like Elasticsearch, Kibana, InfluxDB, etc. along with their dependencies on the virtual machine and configure the system for them. An example bash script is shown in Figure 3. These scripts were executed automatically on the virtual machine after it was provisioned. </span></p> <figure><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/anappi/ElkInstallation.png" /><figcaption><p>Figure 3. Bash script to set up Elasticsearch and Kibana on a VM</p> </figcaption></figure><p><span>Then, we used “remote-exec” to copy the needed configuration files from the local system to the provisioned virtual machine (shown in Figure 4).</span></p> <figure><img src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/anappi/RemoteExec.png" /><figcaption><p>Figure 4. Using ‘remote-exec’ to copy files from the local system.</p> </figcaption></figure><h2 dir="ltr"><span>Deploying a Terraform Module</span></h2> <p dir="ltr"><span>The process to deploy any of the developed modules is the same for any public/private cloud. It consists of the following steps:</span></p> <ol dir="ltr"><li><span>Enter the client secrets, passwords and preferences in the terraform.tfvars file. An example terraform.tfvars file is shown in Figure 5.</span></li> </ol><figure><img src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/anappi/TerraformTfvars.png" /><figcaption><p>Figure 5. An example terraform.tfvars file</p> </figcaption></figure><ol dir="ltr" start="2"><li><span>Run the following command to download and install the cloud provider specific dependencies on your local system automatically.</span></li> </ol><blockquote><pre> <code>$ terraform init</code> </pre></blockquote> <p><span>Run the following commands to deploy the current state of the infrastructure to the cloud. 
Note that Terraform provides an analysis of all the infrastructure components that would be created or destroyed if the current state of the module is deployed, before proceeding with the actual deployment.</span></p> <blockquote><pre> $ terraform plan</pre></blockquote> <blockquote><pre> $ terraform apply</pre></blockquote> <h2 dir="ltr"><span>Conclusion</span></h2> <p dir="ltr"><span>This way, we were able to automate and simplify the entire process of infrastructural setup. The need to manually set up the system along with the dependencies is eliminated, and developers no longer have to write long, complex commands every time they want to deploy infrastructure. The deployments are cloud-agnostic, as we are able to use the same deployment procedure for different cloud providers like Oracle Cloud and OpenStack.</span></p> <h2 dir="ltr"><span>Challenges</span></h2> <p dir="ltr"><span>Although we were able to achieve our goals, the following were a few problems that we encountered in the development process:</span></p> <ol dir="ltr"><li role="presentation"><span>We started development with Terraform version 0.11.x, which was lacking some useful functionalities that were introduced in later releases. For instance, Terraform 0.12.x introduced a “fileexists()” function to determine dynamically whether a file exists at a given path. </span></li> <li role="presentation"><span>Every time we rename a Terraform resource and run “terraform apply”, the corresponding infrastructure component is destroyed and recreated on the cloud instead of being updated in place.</span></li> </ol><h2 dir="ltr"><span>Acknowledgement</span></h2> <p dir="ltr"><span>Overall, I would say it was quite an enriching experience; the work was fun and the IT-DB-DAR team was amazing. I am greatly indebted to CERN for offering me such a wonderful opportunity and to my supervisor Mr. Antonio Nappi for guiding me throughout the program. This summer was one of the best summers I have ever had! </span><br /><span>More details regarding the project are available </span><a href="http://doi.org/10.5281/zenodo.3565664"><span>here</span></a><span>.</span></p> <h2 dir="ltr"><span>References</span></h2> <ol dir="ltr"><li role="presentation"><a href="https://www.terraform.io/docs"><span>https://www.terraform.io/docs</span></a></li> <li role="presentation"><a href="https://github.com/hashicorp/hcl"><span>https://github.com/hashicorp/hcl</span></a></li> <li role="presentation"><a href="http://doi.org/10.5281/zenodo.3565664"><span>http://doi.org/10.5281/zenodo.3565664</span></a></li> </ol></div> </div> <span><a title="View user profile." 
href="/users/antonio-nappi" lang="" about="/users/antonio-nappi" typeof="schema:Person" property="schema:name" datatype="">anappi</a></span> <span>Thu, 03/12/2020 - 16:21</span> <section> <h2>Add new comment</h2> <drupal-render-placeholder callback="comment.lazy_builders:renderForm" arguments="0=node&amp;1=178&amp;2=comment_node_blog_post&amp;3=comment_node_blog_post" token="6HpyEI643Mjt3PTF1iyBnffcOMrccTarnrg2ALMNbDc"></drupal-render-placeholder> </section> Thu, 12 Mar 2020 15:21:19 +0000 anappi 178 at https://db-blog.web.cern.ch Building and documenting REST APIs with ORDS https://db-blog.web.cern.ch/blog/luis-rodriguez-fernandez/2020-01-building-and-documenting-rest-apis-ords <span>Building and documenting REST APIs with ORDS</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><h1>Introduction</h1> <p>In the first part of the article we will provide an overview of how you can use Oracle REST Data Services for providing APIs directly from your PL/SQL code . The second part covers how to document our Web services using <a href="https://swagger.io/">Swagger</a>. Lets begin with a couple of technical concepts:</p> <ul><li><strong>ORDS</strong>: Oracle REST Data Services ORDS enables developers with <strong>SQL</strong> and database skills to develop <strong>REST APIs</strong> for the Oracle Database. In a few words ORDS is: <ul><li>A <strong>mid-tier Java</strong> application.</li> <li>Runs in a Java application server like <strong>WebLogic</strong> or <strong>Tomcat</strong>.</li> <li><strong>Maps</strong> standard <strong>http(s) </strong>RESTful requests to <strong>database transactions</strong>. <ul><li>Access to Relational data over HTTP(s) without installing JDBC/ODBC drivers</li> </ul></li> <li>Can declaratively returns results in <strong>JSON</strong> format.</li> <li>Can connect to Oracle <strong>NoSQL</strong> and Oracle container databases in <strong>Cloud</strong>.</li> <li>Supports <strong>Swagger</strong> based <strong>Open API</strong> integration.</li> <li>It was formally known as <strong>Oracle APEX Listener</strong>.</li> </ul></li> </ul><p><a href="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/lurodrig/iheb-ords/figure1.png"><img alt="Figure 1: Relational to JSON with ORDS" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/lurodrig/iheb-ords/figure1.png" style="width: 1024px; height: 355px;" /></a></p> <p class="rtecenter"><em>Figure 1: Relational to JSON with ORDS</em></p> <p>The picture above shows the ORDS architecture and how does it work. It basically acts as a <strong>middleman</strong> between clients (applications) and the database, <strong>mapping</strong> incoming <strong>HTTP(S) </strong>requests to <strong>resource handlers</strong> for a specific URI pattern. Resource handlers can have different source types, such as query and PL/SQL. With the query source type, ORDS executes the query, converts the results into JSON, and returns the JSON to the client.</p> <ul><li><strong>ORDS - URL Structure</strong>. 
The URL for an ORDS Web Service consists of five elements: <strong>https://&lt;host&gt;:&lt;port&gt;/ords/&lt;schema&gt;/&lt;module&gt;/&lt;template&gt;</strong> <ul><li>Host: Name of the host.</li> <li>Port: Port Number.</li> <li>Schema: Pattern defined for the schema.</li> <li>Module: Base path for the module.</li> <li>Template: Pattern defined for the template.</li> </ul></li> <li><strong>Swagger</strong>: Swagger is an<strong> open-source software framework</strong> supported by a large tool ecosystem that helps developers <strong>design</strong>, <strong>build</strong>, <strong>document</strong>, and <strong>consume</strong> <strong>RESTful</strong> web services. While most users use the Swagger UI tool to identify Swagger, the Swagger tool set includes automated documentation support, code generation and test case generation.</li> </ul><p>It is out of the scope how to install and configure ORDS. For that purpose you can check this fantastic article from <a href="http://https://oracle-base.com/misc/site-info">Tim Hall: </a><a href="https://oracle-base.com/articles/linux/docker-oracle-rest-data-services-ords-on-docker">https://oracle-base.com/articles/linux/docker-oracle-rest-data-services…</a></p> <h1>REST services, how to.</h1> <p>This example will show us how to create a Web Services within a new <strong>schema</strong> called <strong>ORDSEXAMPLE</strong>.</p> <h2>Create a new user</h2> <p>Log as SYSDBA and create a new user with the necessary privileges using the following SQL:</p> <pre> ALTER SESSION SET CONTAINER =pdb1; CREATE USER ORDSEXAMPLE IDENTIFIED BY ordsexample_1995 DEFAULT TABLESPACE USERS QUOTA UNLIMITED ON USERS; GRANT CREATE TABLE TO ORDSEXAMPLE; GRANT CREATE SESSION TO ORDSEXAMPLE</pre><p> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 25 /*--><!]]>*/ </style></p><p>Prepare your schema</p> <p>Create a copy of the good old classic Oracle <strong>EMP</strong> and <strong>DEPT</strong> tables with sample data using the following SQL snippets:</p> <pre> CREATE TABLE dept ( deptno NUMBER(2, 0), dname VARCHAR2(14), loc VARCHAR2(13), CONSTRAINT pk_dept PRIMARY KEY ( deptno ) ); CREATE TABLE emp ( empno NUMBER(4, 0), ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4, 0), hiredate DATE, sal NUMBER(7, 2), comm NUMBER(7, 2), deptno NUMBER(2, 0), CONSTRAINT pk_emp PRIMARY KEY ( empno ), CONSTRAINT fk_deptno FOREIGN KEY ( deptno ) REFERENCES dept ( deptno ) );</pre><pre lang="en-IE" style="margin-top: 0.08in; margin-bottom: 0in; line-height: 115%;" xml:lang="en-IE"> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></pre><pre> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></pre><p>Populate them:</p> <pre> INSERT INTO DEPT 
VALUES(10,'ACCOUNTING','NEW YORK'); INSERT INTO DEPT VALUES(20,'RESEARCH','DALLAS'); INSERT INTO DEPT VALUES(30,'SALES','CHICAGO'); INSERT INTO DEPT VALUES(40,'OPERATIONS','BOSTON'); INSERT INTO DEPT VALUES(50,'PURCHASING','WASHINGTON'); INSERT INTO DEPT VALUES(60,'HR','NEW YORK'); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7788,'SCOTT','ANALYST',7566,TO_DATE('19-APR-87','DD-MON-RR'),3000,NULL,20); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7698,'BLAKE','MANAGER',7839,TO_DATE('01-MAY-81','DD-MON-RR'),2850,NULL,30); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7839,'KING','PRESIDENT',NULL,TO_DATE('17-NOV-81','DD-MON-RR'),5000,NULL,10); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7566,'JONES','MANAGER',7839,TO_DATE('02-APR-81','DD-MON-RR'),2975,NULL,20); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7782,'CLARK','MANAGER',7839,TO_DATE('09-JUN-81','DD-MON-RR'),2450,NULL,10); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7902,'FORD','ANALYST',7566,TO_DATE('03-DEC-81','DD-MON-RR'),3000,NULL,20); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7876,'ADAMS','CLERK',7788,TO_DATE('23-MAY-87','DD-MON-RR'),1100,NULL,20); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7499,'ALLEN','SALESMAN',7698,TO_DATE('20-FEB-81','DD-MON-RR'),1600,300,30); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7900,'JAMES','CLERK',7698,TO_DATE('03-DEC-81','DD-MON-RR'),950,NULL,50); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7654,'MARTIN','SALESMAN',7698,TO_DATE('28-SEP-81','DD-MON-RR'),1250,1400,30); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7934,'MILLER','CLERK',7782,TO_DATE('23-JAN-82','DD-MON-RR'),1300,NULL,10); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7369,'SMITH','CLERK',7902,TO_DATE('17-DEC-80','DD-MON-RR'),800,NULL,60); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7844,'TURNER','SALESMAN',7698,TO_DATE('08-SEP-81','DD-MON-RR'),1500,0,30); INSERT INTO EMP (EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO) VALUES (7521,'WARD','SALESMAN',7698,TO_DATE('22-FEB-81','DD-MON-RR'),1250,500,40); COMMIT;</pre><p> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> <p>We can use check that the tables are correctly filled and include the default collection of six divisions and fourteen employees:</p> <pre lang="en-IE" style="margin-top: 0.08in; margin-bottom: 0in; line-height: 115%;" xml:lang="en-IE"> SELECT     ename,     dname,     job,     empno,     hiredate,     loc FROM     emp,     dept WHERE     emp.deptno = dept.deptno ORDER BY     emp.ename;</pre><h2 style="margin-bottom: 0in; line-height: 150%; break-before: page;">Enable the ORDS schema</h2> <p>Before creating any Web Services, we have to enable REST data services for the ORDSEXAMPLE schema:</p> <pre lang="en-IE" style="margin-top: 0.08in; margin-bottom: 0in; line-height: 115%;" xml:lang="en-IE"> BEGIN     ords.enable_schema(p_enabled =&gt; TRUE,                                     p_schema =&gt; 'ORDSEXAMPLE',                                     
p_url_mapping_type =&gt; 'BASE_PATH',                                     p_url_mapping_pattern =&gt; 'api',                                     p_auto_rest_auth =&gt; FALSE);     COMMIT; END;</pre><p> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p><p>Web services from the schema can now be referenced using the following base URL: <strong><a href="http://localhost:8080/ords/ORDSEXAMPLE/">http://localhost:8080/ords/ORDSEXAMPLE/</a></strong></p> <h2>Define Module</h2> <p>We name the resource module <strong>hr.v1</strong>. A Web Service name should always include the version number; this will allow us to publish updated versions of a Web Service that follow the same URL patterns:</p> <pre lang="en-IE" style="margin-top: 0.08in; margin-bottom: 0in; line-height: 115%;" xml:lang="en-IE"> BEGIN     ords.define_module(p_module_name =&gt; 'hr.v1',                                    p_base_path =&gt; 'hr/v1/',                                    p_items_per_page =&gt; 10,                                    p_status =&gt; 'PUBLISHED',                                    p_comments =&gt; 'Sample HR Module');     COMMIT; END;</pre><h2> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></h2><p>Define Template </p><pre align="left" lang="en-IE" style="margin-bottom: 0in; line-height: 100%;" xml:lang="en-IE"> BEGIN     ords.define_template(p_module_name =&gt; 'hr.v1',                                      p_pattern =&gt; 'departments',                                      p_comments =&gt; 'Departments Resource');                                      ords.define_template(p_module_name =&gt; 'hr.v1',                                      p_pattern =&gt; 'employees',                                      p_comments =&gt; 'Employees Resource');     COMMIT; END; </pre><h2>Define the handlers</h2> <p>One for each HTTP method: GET, POST,PUT and DELETE</p> <pre> -- The GET Handler method on the departments template BEGIN ords.define_handler(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'GET', p_source_type =&gt; ords.source_type_query, p_source =&gt; 'SELECT deptno, dname, loc FROM dept ORDER BY deptno', p_items_per_page =&gt; 5, p_comments =&gt; 'List departments'); COMMIT; END; -- The POST Handler method on the departments template BEGIN ords.define_handler(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'POST', p_source_type =&gt; ords.source_type_plsql , p_source =&gt; 'BEGIN INSERT INTO dept (deptno, dname, loc) VALUES (:pn_dept_no, :pv_dept_name, :pv_location); :pn_status := 200; :pv_result := ''Department Added''; EXCEPTION WHEN OTHERS THEN :pn_status := 400; :pv_result := ''Unable to add department: '' || SQLERRM; END;' , p_comments =&gt; 'Create a Department'); END; -- The PUT Handler method on the departments template BEGIN ords.define_handler(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'PUT', p_source_type =&gt; ords.source_type_plsql , 
p_source =&gt; 'BEGIN UPDATE dept SET dname = :pv_dept_name, loc = :pv_location WHERE deptno = :pn_dept_no; IF SQL%ROWCOUNT = 0 THEN :pn_status := 400; :pv_result := ''Invalid department number''; ELSE :pn_status := 200; :pv_result := ''Department Updated''; END IF; EXCEPTION WHEN OTHERS THEN :pn_status := 400; :pv_result := ''Unable to update department:'' || SQLERRM; END;' , p_comments =&gt; 'Create a Department'); END; -- The DELETE Handler method on the departments template BEGIN ords.define_handler(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'DELETE', p_source_type =&gt; ords.source_type_plsql , p_source =&gt; 'BEGIN DELETE FROM dept WHERE deptno = :pn_dept_no; IF SQL%ROWCOUNT = 0 THEN :pn_status := 400; :pv_result := ''Invalid department number''; ELSE :pn_status := 200; :pv_result := ''Department Deleted''; END IF; EXCEPTION WHEN OTHERS THEN :pn_status := 400; :pv_result := ''Unable to delete department: '' || SQLERRM; END;' , p_comments =&gt; 'Delete a Department'); COMMIT; END</pre><pre> -- The GET Handler method on the employees template BEGIN ords.define_handler(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'employees', p_method =&gt; 'GET', p_source_type =&gt; ords.source_type_query , p_source =&gt; 'SELECT d.dname, e.ename, e.job, e.empno, e.hiredate, d.loc FROM emp e, dept d WHERE e.deptno = d.deptno AND (:pn_deptno IS NULL OR d.deptno = :pn_deptno) ORDER BY d.dname, e.ename' , p_comments =&gt; 'List employees'); COMMIT; END;</pre><h2>Define Parameters</h2> <p>The following scripts will define all parameters for each handler:</p> <pre> --Departments PUT BEGIN ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'PUT', p_name =&gt; 'department_number', p_bind_variable_name =&gt; 'pn_dept_no', p_source_type =&gt; 'HEADER', p_param_type =&gt; 'INT', p_access_method =&gt; 'IN', p_comments =&gt; 'Department Number'); ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'PUT', p_name =&gt; 'department_name', p_bind_variable_name =&gt; 'pv_dept_name', p_source_type =&gt; 'HEADER', p_param_type =&gt; 'STRING', p_access_method =&gt; 'IN', p_comments =&gt; 'Department Name'); ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'PUT', p_name =&gt; 'location_name', p_bind_variable_name =&gt; 'pv_location', p_source_type =&gt; 'HEADER', p_param_type =&gt; 'STRING', p_access_method =&gt; 'IN', p_comments =&gt; 'Location Name'); ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'PUT', p_name =&gt; 'X-APEX-STATUS-CODE', p_bind_variable_name =&gt; 'pn_status', p_source_type =&gt; 'HEADER', p_param_type =&gt; 'INT', p_access_method =&gt; 'OUT', p_comments =&gt; 'Response status'); ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'PUT', p_name =&gt; 'result_message', p_bind_variable_name =&gt; 'pv_result', p_source_type =&gt; 'RESPONSE', p_param_type =&gt; 'STRING', p_access_method =&gt; 'OUT', p_comments =&gt; 'Result message'); COMMIT; END; --Department DELETE BEGIN ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'DELETE', p_name =&gt; 'department_number', p_bind_variable_name =&gt; 'pn_dept_no', p_source_type =&gt; 'HEADER', p_param_type =&gt; 'INT', p_access_method =&gt; 'IN', p_comments =&gt; 'Department Number'); ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 
'departments', p_method =&gt; 'DELETE', p_name =&gt; 'X-APEX-STATUS-CODE', p_bind_variable_name =&gt; 'pn_status', p_source_type =&gt; 'HEADER', p_param_type =&gt; 'INT', p_access_method =&gt; 'OUT', p_comments =&gt; 'Response status'); ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'departments', p_method =&gt; 'DELETE', p_name =&gt; 'result_message', p_bind_variable_name =&gt; 'pv_result', p_source_type =&gt; 'RESPONSE', p_param_type =&gt; 'STRING', p_access_method =&gt; 'OUT', p_comments =&gt; 'Result message'); COMMIT; END; --Employee GET BEGIN ords.define_parameter(p_module_name =&gt; 'hr.v1', p_pattern =&gt; 'employees', p_method =&gt; 'GET', p_name =&gt; 'department_number', p_bind_variable_name =&gt; 'pn_deptno', p_source_type =&gt; 'URI', p_param_type =&gt; 'INT', p_access_method =&gt; 'IN', p_comments =&gt; 'Restrict employees by department'); COMMIT; END;</pre><p>We can use the <strong>ORDS_METADATA</strong> <strong>views</strong> to query them. The following SQL query will return all of the Web Services defined for the ORDSEXAMPLE schema.</p> <pre> --Reviewing the Web Services SELECT uom.comments module_desc, uot.comments template_desc, uoh.comments handler_desc, uoh.method, uoh.source_type, '&lt;host_ref&gt;' || uos.pattern || uom.uri_prefix || uot.uri_template url, ( SELECT COUNT(id) FROM user_ords_parameters WHERE handler_id = uoh.id ) parameter_count FROM user_ords_schemas uos, user_ords_modules uom, user_ords_templates uot, user_ords_handlers uoh WHERE uot.module_id = uom.id AND uom.schema_id = uos.id AND uoh.template_id = uot.id AND uos.parsing_schema = 'ORDSEXAMPLE' ORDER BY uom.comments, uot.uri_template --Example &lt;host_ref&gt; = http://localhost:8080/ords/ </pre><h2>Testing Web Services</h2> <p>Just choose your preferred HTTP client!</p> <h1>Generating a Swagger document through ORDS</h1> <h2>1- Using the open-api-catalog (Swagger Editor)</h2> <p>The <strong>open-api-catalog</strong> output is in <strong>Open API (Swagger) 2.0</strong> format, which makes it really simple to generate documentation and an example of the calling code in many programming languages.</p> <p> An overview of the contents of an ORDS enabled schema can be displayed in a browser (GET HTTP method) using the following type of URL. The output is a JSON document:</p> <ul><li><strong>Format</strong>: <a href="http://server:port/ords/&lt;connection&gt;/&lt;schema-alias&gt;/open-api-catalog/&lt;object-alias&amp;gt">http://server:port/ords/&lt;connection&gt;/&lt;schema-alias&gt;/open-api-catalog/&lt;o…</a>;</li> <li><strong>Example</strong>: <a href="http://localhost:8080/ords/api/open-api-catalog/hr/v1/">http://localhost:8080/ords/api/open-api-catalog/hr/v1/</a></li> </ul><p>Go to the online Swagger Editor (<strong><a href="https://editor.swagger.io/">https://editor.swagger.io/</a></strong>), paste the output JSON text. It will be converted to YAML and display the available endpoints.</p> <h2>2- Using the swagger-UI server</h2> <p>In the previous version of APEX was not obvious to generate this documentation. The new setting takes a URL which points to a SWAGGER UI 2.0 server. If the instance setting is set, the URL that generates the web service’s swagger JSON document will be sent to the Swagger UI server. 
If there is no URL specified, raw JSON will be produced.</p> <p> </p> <p><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/lurodrig/iheb-ords/figure2.png" /></p> <p class="rtecenter"><em>Figure 2: Setting the Swagger URL </em></p> <p> </p> <p>Setting up the Swagger server is actually quite easy. It is a simple set of HTML, JavaScript and CSS files that can be downloaded from here (<a href="https://swagger.io/tools/swagger-ui/">https://swagger.io/tools/swagger-ui/</a>) and unzipped into a directory in the web root of our local web server. During my test I pulled a pre-built Docker image of swagger-ui directly from Docker Hub and ran it in a separate container at <a href="http://localhost:8888/">http://localhost:8888/</a>. Then, from the ORDS REST Workshop, navigate to the module definition level and click the Generate Swagger Doc button.</p> <p>If the Swagger UI URL is set correctly at the Instance level, APEX will forward the URL of the documentation to the Swagger UI server. The server must be able to reach back to the documentation URL, but as long as it can, we will not only get documentation but also be able to test the services.</p> <p><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/lurodrig/iheb-ords/figure3.png" /></p> <p class="rtecenter"><em>Figure 3: APEX REST Workshop for generating Swagger documents </em></p> <p class="rteleft"> </p> <p class="rteleft"><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/lurodrig/iheb-ords/figure4.png" /></p> <p class="rtecenter"><em>Figure 4: Documentation generated by Swagger UI </em></p> <p class="rtecenter"> </p> <h1 class="rteleft">Acknowledgements and Reference links</h1> <p>My greatest thanks go to my supervisor Luis Rodriguez Fernandez, who has been guiding me throughout the program. I thank him for all the time devoted to me, for his total availability, for his kindness and especially for his remarks and constructive criticism. 
I am also thankful to all the IT-DB group members for their warm welcome, their help during this project and for making this experience very pleasant and rewarding.</p> <ul><li><a href="https://docs.oracle.com/en/database/oracle/oracle-rest-data-services/19.1/index.html">https://docs.oracle.com/en/database/oracle/oracle-rest-data-services/19…</a></li> <li><a href="https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-open-api-swagger-support">https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-op…</a></li> </ul><p> </p> <p><style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> <p style="margin-bottom: 0in; line-height: 150%"> <style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> <p><style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> <p><style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> <p><style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { font-family: "Times New Roman", serif; }p.cjk { font-family: "Times New Roman"; }p.ctl { font-family: "Times New Roman"; font-size: 12pt; }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> <p><style type="text/css"> <!--/*--><![CDATA[/* ><!--*/ p { margin-bottom: 0.1in; direction: ltr; line-height: 115%; text-align: justify; }p.western { }a:link { color: rgb(0, 0, 255); } /*--><!]]>*/ </style></p> </div> </div> <span><a title="View user profile." 
href="/users/luis-rodriguez-fernandez" lang="" about="/users/luis-rodriguez-fernandez" typeof="schema:Person" property="schema:name" datatype="">lurodrig</a></span> <span>Fri, 01/24/2020 - 10:18</span> <section> <h2>Add new comment</h2> <drupal-render-placeholder callback="comment.lazy_builders:renderForm" arguments="0=node&amp;1=175&amp;2=comment_node_blog_post&amp;3=comment_node_blog_post" token="IQJfCbB5r4Lmr4VfhKHByfYXKkQc6xbBV8nFNwbV5UA"></drupal-render-placeholder> </section> Fri, 24 Jan 2020 09:18:30 +0000 lurodrig 175 at https://db-blog.web.cern.ch Benefits of a multi-layer system https://db-blog.web.cern.ch/blog/viktor-kozlovszky/2019-12-benefits-multi-layer-system <span>Benefits of a multi-layer system</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><h1>Introduction</h1> <p>Designing a <a href="https://stackify.com/n-tier-architecture/">multi-layer system</a> is not rocket science, the difficulty can lie in selecting the right technologies. The main concept behind the design is to have better control and fine tuning of the components. This blog post will discuss the benefits &amp; limitations of implementing this type of design and our practical experience gained from using it for the Open Days reservation system, which helped to welcome 75.000 people on our site and was hosted on the Oracle cloud using their cloud services.</p> <p>This post is part of the "Open Days reservation system - 2019" series. To learn more about the CERN Open Days project, read <a href="https://db-blog.web.cern.ch/blog/viktor-kozlovszky/2019-10-open-days-reservation-systems-high-level-overview-2019">the following post</a>.</p> <h1>Multi-layer concept</h1> <p>The multi-layer system is not a new invention, it allows to combine technologies working together. The main idea behind it is to group functionalities and implement them using the best fitting programming framework / language, which can result in a fast and task optimized system.</p> <div><spanp>One of the best ways to implement layer separation is by asking the following questions: <ul><li>What are the interaction points of the system? <em>(User interface(s))</em></li> <li>What is the status of the workflow and what changes are allowed during system usage? <em>(Business logic)</em></li> <li>How can changes be applied in a trackable and reversible way <em>(Data persistence)</em></li> </ul><p></p></spanp></div> <p>Based on the answers to these questions we can have a simple or a complex implementation. The following diagram shows the difference between these two implementation possibilities. Note that the layout of the complex implementation strategy is only an example and the final design can be a subset or a more complicated solution made up of some of the visualized elements. 
<em>(The cross-layer components of the complex strategy demonstrating the implementations of certain technologies like JAVA JSP, C# <span class="st">ASP.NET Web Forms, C# ASP.NET MVC, etc.</span></em><em>)</em></p> <div style="text-align:center"><span><span style="font-size: 11pt; font-family: Arial; color: rgb(0, 0, 0); background-color: transparent; font-variant-numeric: normal; font-variant-east-asian: normal; vertical-align: baseline; white-space: pre-wrap;"><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/vkozlovs/opendays-2019/multi-layer-concept.png" /></span></span></div> <p>A multi-layer design is very flexible, especially when it comes to selecting the technology of the architectural components. It allows the developer to use any combination of technologies as long as they support information exchange standards. Thanks to this feature architectural components are technology independent, which means that they can be replaced or reimplemented using another technology (i.e. given these meet the requirements of the replaced component) at anytime without adding additional risks to the system.</p> <h1>Design of the Open Days reservation system</h1> <p>Selecting the right software stack is a difficult task. The difficulty lies in forecasting future requirements of the system. It is possible to overcome this obstacle or at least be prepared for a technology shift, so the impact of any changes will be less severe. A multi-layer design aims to solve this by allowing different technologies to interact through standard interfaces(/messages), so at any point they can be replaced with the same functionality written in another language/framework. This allows for changes to the architectural components as the requirements of the system evolve over time.</p> <div><span>At the planning phase, our software stack selection was based on the “common ground” approach. This approach requires using technologies, which are well known by the majority of the team. </span>For the simple implementation startegy our stack selection was the following<span>:</span> <ul><li><strong>User interface layer</strong>: For this layer we chose the <a href="https://angular.io/">Angular</a> framework. This framework is optimized to work fast for form applications. It requires following a strict coding standard, has a substantial active developer community and lots of feature libraries. It supports the re-use of written components, has an <a href="https://db-blog.web.cern.ch/blog/viktor-kozlovszky/2019-11-internationalization-open-days-reservation-system-2019">out of the box solution for internationalization</a> and allows combination with <a href="https://material.angular.io/">Angular material design</a>.</li> <li><strong>Business logic layer</strong>: For this layer we chose the JAVA based <a href="https://spring.io/projects/spring-boot">Spring boot</a> framework. This framework has a very well integrated third party collection set. It avoids the developer to deal with all the boiler-plate spring configuration. Its annotation support it makes the code simpler and more readable.</li> <li><strong>Data persistence layer</strong>: For this layer we chose the <a href="https://docs.oracle.com/en/cloud/paas/atp-cloud/index.html">Oracle Autonomous Transaction Processing (ATP)</a> cloud database. 
It comes with auto update service for fixes and patches, plus it’s easy to scale resource allocation and thus helps to adjust the system based on its demand.</li> </ul></div> <div style="text-align:center;"><span><span style="font-size: 11pt; font-family: Arial; color: rgb(0, 0, 0); background-color: transparent; font-variant-numeric: normal; font-variant-east-asian: normal; vertical-align: baseline; white-space: pre-wrap;"><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/vkozlovs/opendays-2019/OpenDaysArchitecture.png" style="width: 700px; height: 353px;" /></span></span></div> <p> </p> <p>A well-built architectural design makes a big difference when a large stream of people are taking advantage of a system, but it is not the only significant feature. Introducing parallel task handling can also result in better resource performance and task management. It can help to distribute the usage of the system more uniformly between the components involved in execution of the system. This can be further improved by designing a <a href="https://www.w3.org/2001/tag/doc/state-20060215">stateless</a> system. The concept behind this is to split a task into smaller subtasks so they can be carried out by any generic executor. So the components would be individually capable of making smaller changes and the sum of those small changes will result as the main task’s execution. Introducing a stateless design to your system allows you to scale the number of the system components more easier based on demand. The Open Days reservation system was designed stateless, which allowed us to use only the necessary amount of resources.</p> <p>In order to make the components independently scalable we found it easy to follow the <a href="https://www.docker.com/resources/what-container">container-based</a> approach. We used <a href="https://www.docker.com/">Docker</a> containers for setting up individual building environments for the different layers. The containers were running on <a href="https://kubernetes.io/">Kubernetes</a> (K8s) clusters. The cluster deployments were completely automated using <a href="https://www.terraform.io/">Terraform</a> plans. These plans helped to interact with the Oracle <a href="https://docs.cloud.oracle.com/iaas/Content/ContEng/Concepts/contengoverview.htm">Container Engine for Kubernetes</a> (OKE) service, which is well integrated with the Oracle <a href="https://www.oracle.com/cloud/networking/load-balancing.html">load balancer</a> service. Thanks to the integration the configuration of traffic to the clusters was fast and easy.</p> <p>The Open Days reservation system performed very well during its 2.5 months production time. Scaling of the layers individually took only a couple of minutes. This helped our organisation to better manage costs and resources. During the peak time our Kubernetes cluster was scaled to three times and the Autonomous Database to ten times its normal capacity. The system was stress tested, in just 6 minutes it was capable of handling the complete booking for 20,000 parallel users.</p> <h1>Pros and cons of a multi-layer system</h1> <p>A multi-layer design focuses on the fine tuning possibilities of a system by allowing various technologies to work well together. It works best using simple tasks and components dedicated to functionality as its layers. By using continuous integration &amp; deployment (CI/CD) tasks, version management can become easier and simpler. With the right design, the root cause of a bug can be found faster. 
Adding a stateless design on top of the layer separation can further improve resource management.</p> <p>Unfortunately, despite these benefits this approach also has some disadvantages. For example, designing the system takes longer than usual, because it requires the developer to have a clear view of the whole system. Defining the communication standards between the layers and functional elements requires stricter rules than designing a single global solution.<br /> Since this approach allows you to combine different technologies, it can increase the complexity of the system, and understanding how best to fine-tune the layers requires deeper professional knowledge.</p> <h1>Conclusions</h1> <p>Depending on the use case, the deadlines, the clarity of the system’s purpose and the planned lifecycle, a multi-layer system approach can be a good choice. In the short term it can help teams reach their objectives faster, and in the long run it can definitely pay off when a technology shift is required to meet new emerging requirements.</p> <p> </p> <p> </p> <p> </p> </div> </div> <span><a title="View user profile." href="/users/viktor-kozlovszky" lang="" about="/users/viktor-kozlovszky" typeof="schema:Person" property="schema:name" datatype="">vkozlovs</a></span> <span>Thu, 12/19/2019 - 15:31</span> Thu, 19 Dec 2019 14:31:35 +0000 vkozlovs 174 at https://db-blog.web.cern.ch Oracle REST Data Services running on Tomcat - Basic Authentication using JNDI Realm https://db-blog.web.cern.ch/blog/jakub-granieczny/2019-12-oracle-rest-data-services-running-tomcat-basic-authentication-using <span>Oracle REST Data Services running on Tomcat - Basic Authentication using JNDI Realm</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><h1>Oracle REST Data Services running on Tomcat - Basic Authentication using JNDI Realm</h1> <h2>What do we want to achieve?</h2> <p>We want to protect our REST endpoints using Basic Authentication and authenticate the requests against our users directory (LDAP). We also want to manage the privileges centrally, through the ORDS Roles and Privileges (<a href="https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-authentication#ords-roles-and-privileges">https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-au…</a>), so that no matter whether ORDS runs on Oracle WebLogic or Apache Tomcat, it behaves the same and gives access to the same resources.</p> <h2>Setup</h2> <p>You should have the following installed:</p> <ul><li>ORDS version &gt;=18.1.1</li> <li>Tomcat &gt;=8.0.0</li> <li>JDK 7 or higher</li> </ul><p>For me, a great guide to get started quickly is <a href="https://oracle-base.com/articles/linux/docker-oracle-rest-data-services-ords-on-docker">this</a> one for running ORDS in Docker, written by Tim Hall.</p> <h2>First try</h2> <p>In the beginning the problem seems pretty straightforward. We'll try to authenticate using Tomcat users, following this guide: <a href="https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-authentication">https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-au…</a>.
First, create tomcat-users.xml in "$CATALINA_BASE/conf/tomcat-users.xml":</p> <pre>
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;tomcat-users xmlns="http://tomcat.apache.org/xml"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://tomcat.apache.org/xml tomcat-users.xsd"
              version="1.0"&gt;
  &lt;role rolename="ords-rest-access"/&gt;
  &lt;user username="tomcat" password="tomcat" roles="ords-rest-access"/&gt;
&lt;/tomcat-users&gt;
</pre><p>Next, configure the realm; let's define it in "$CATALINA_BASE/conf/server.xml". For more information, see <a href="https://tomcat.apache.org/tomcat-9.0-doc/realm-howto.html#Configuring_a_Realm">https://tomcat.apache.org/tomcat-9.0-doc/realm-howto.html#Configuring_a…</a></p> <pre>
&lt;Resource name="UserDatabase" auth="Container"
          type="org.apache.catalina.UserDatabase"
          description="User database that can be updated and saved"
          factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
          pathname="conf/tomcat-users.xml" /&gt;
...
&lt;Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase"/&gt;
</pre><p>We'll also define and secure an endpoint. It will print all the CGI variables for us and will be accessible only to members of the group "ords-rest-access".</p> <pre>
DECLARE
  l_roles    OWA.VC_ARR;
  l_modules  OWA.VC_ARR;
  l_patterns OWA.VC_ARR;
BEGIN
  -- define endpoint
  ORDS.DEFINE_MODULE(
    p_module_name    =&gt; 'api.v1',
    p_base_path      =&gt; '/api/v1/',
    p_items_per_page =&gt; 25,
    p_status         =&gt; 'PUBLISHED',
    p_comments       =&gt; NULL);
  ORDS.ENABLE_SCHEMA(
    p_enabled             =&gt; TRUE,
    p_schema              =&gt; 'ORDSEXAMPLE',
    p_url_mapping_type    =&gt; 'BASE_PATH',
    p_url_mapping_pattern =&gt; 'test',
    p_auto_rest_auth      =&gt; FALSE);
  ORDS.DEFINE_TEMPLATE(
    p_module_name =&gt; 'api.v1',
    p_pattern     =&gt; 'test_endpoint',
    p_priority    =&gt; 0,
    p_etag_type   =&gt; 'HASH',
    p_etag_query  =&gt; NULL,
    p_comments    =&gt; NULL);
  ORDS.DEFINE_HANDLER(
    p_module_name    =&gt; 'api.v1',
    p_pattern        =&gt; 'test_endpoint',
    p_method         =&gt; 'GET',
    p_source_type    =&gt; 'plsql/block',
    p_items_per_page =&gt; 5,
    p_mimes_allowed  =&gt; '',
    p_comments       =&gt; NULL,
    p_source         =&gt; 'BEGIN owa_util.print_cgi_env; END;');
  -- define security
  ORDS.CREATE_ROLE(p_role_name =&gt; 'ords-rest-access');
  l_roles(1)    := 'ords-rest-access';
  l_patterns(1) := '/api/v1/test_endpoint';
  ORDS.DEFINE_PRIVILEGE(
    p_privilege_name =&gt; 'test_privilege',
    p_roles          =&gt; l_roles,
    p_patterns       =&gt; l_patterns,
    p_modules        =&gt; l_modules,
    p_label          =&gt; '',
    p_description    =&gt; '',
    p_comments       =&gt; NULL);
  COMMIT;
END;
</pre><p> </p> <p>With this basic setup we test it and see if it works:</p> <p><code>curl -u tomcat:tomcat http://localhost:8080/ords/test/api/v1/test_endpoint </code></p> <h2>First results</h2> <p>Wow, it works! We authenticate with the Tomcat user and its role is passed to ORDS and checked against the one defined in the endpoint security constraint.</p>
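<p>As a quick counter-check (a hedged example of ours, not a step from the original guide), the same endpoint requested without credentials should now be rejected by ORDS, typically with an HTTP 401 response, which confirms that the privilege really protects the pattern:</p> <p><code>curl -i http://localhost:8080/ords/test/api/v1/test_endpoint</code></p>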
<p>That was easy!</p> <p> </p> <h2>Second try</h2> <p>Authenticating with Tomcat users is a nice training scenario, but what if we want something closer to a real production setup, like a <a href="https://docs.oracle.com/javase/tutorial/jndi/overview/index.html">JNDI realm</a>, to do the same thing? Just substitute the UserDatabaseRealm with JNDIRealm, provide the necessary parameters and everything should be fine. So let's do just that. It would look something like this:</p> <pre>
&lt;Realm className="<a href="https://tomcat.apache.org/tomcat-9.0-doc/realm-howto.html#JNDIRealm">org.apache.catalina.realm.JNDIRealm</a>"
       connectionURL="ldap://localhost:389"
       userBase="ou=people,dc=mycompany,dc=com"
       userSearch="(mail={0})"
       userRoleName="memberOf"
       roleBase="ou=groups,dc=mycompany,dc=com"
       roleName="cn"
       roleSearch="(uniqueMember={0})" /&gt;
</pre><p>Doesn't seem too different, does it?</p> <h2>Second results</h2> <p>Restart Catalina, send the request to the same endpoint, but this time authenticate with your LDAP user and... well, now you broke it. Just great.</p> <p>Alright, let's look at the logs up close. First, make sure that your <a href="https://tomcat.apache.org/tomcat-8.0-doc/logging.html#Using_java.util.logging_(default)">logging.properties</a> contains the following lines to get more information about the authentication process:</p> <pre>
org.apache.catalina.realm.level = ALL
org.apache.catalina.realm.useParentHandlers = true
org.apache.catalina.authenticator.level = ALL
org.apache.catalina.authenticator.useParentHandlers = true
oracle.dbtools.level=FINEST
</pre><p>We restart Catalina again and we see very weird behaviour -- from inspecting the catalina.log it seems that Catalina authenticates our request, but then ORDS says that the request has not been authenticated. That is not a big issue for unsecured endpoints, but endpoints that have been secured, like the one we defined earlier, are rendered inaccessible. The logs may look something like this:</p> <p><code>10-Dec-2019 16:14:42.965 FINE [http-nio-8080-exec-9] org.apache.catalina.realm.CombinedRealm.authenticate Authenticated user [jgraniec] with realm [org.apache.catalina.realm.JNDIRealm]</code><br /><code>10-Dec-2019 16:14:42.965 FINE [http-nio-8080-exec-9] org.apache.catalina.authenticator.AuthenticatorBase.register Authenticated 'jgraniec' with type 'NONE'<br /> 10-Dec-2019 16:14:42.965 FINE [http-nio-8080-exec-9] org.apache.catalina.authenticator.AuthenticatorBase.register Authenticated 'none' with type 'null'</code><br /><code>10-Dec-2019 16:14:42.968 FINE [http-nio-8080-exec-9] . did not authenticate request </code></p> <p>Hmm... that looks weird. Why does Tomcat authenticate the request while ORDS doesn't see it? If we look deeper into the tutorials, we find this nice tutorial on authentication using a JDBC realm (Tim Hall to the rescue again!): <a href="https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-basic-and-digest-authentication-on-tomcat-using-jdbcrealm">https://oracle-base.com/articles/misc/oracle-rest-data-services-ords-ba…</a></p> <p>There is only one significant difference, and it is the one that actually does the trick: it seems that setting up a security constraint makes the whole thing work as intended.
In the "$CATALINA_BASE/conf/web.xml" we should find something like this:  <code> <security-constraint><web-resource-collection></web-resource-collection></security-constraint><security-role></security-role></code></p> <pre> &lt;security-constraint&gt; &lt;web-resource-collection&gt; &lt;web-resource-name&gt;ords&lt;/web-resource-name&gt; &lt;url-pattern&gt;/*&lt;/url-pattern&gt; &lt;/web-resource-collection&gt; &lt;auth-constraint&gt; &lt;role-name&gt;*&lt;/role-name&gt; &lt;/auth-constraint&gt; &lt;/security-constraint&gt; &lt;login-config&gt; &lt;auth-method&gt;BASIC&lt;/auth-method&gt; &lt;/login-config&gt; &lt;security-role&gt; &lt;role-name&gt;*&lt;/role-name&gt; &lt;/security-role&gt; &lt;/security-coonstraint&gt;</pre><p>And even we see a note:</p> <blockquote><p>"Thanks to Marcel Boermann for helping me with this. I spent a lot of time trying to get this to work, then Marcel sent me an example "web.xml" file, which made it clear you need the full basic authentication setup in Tomcat, in addition to the normal ORDS security setup. (...)"</p> </blockquote> <p>However, it doesn't seem right -- this method protects everything below "/*". If we want a public endpoint, available without any authentication, we would have to exclude it in our web.xml's protection pattern, which seems like the wrong idea and forces us to manage the privileges from two locations - the web.xml of Tomcat and privilege constraints in ORDS itself. What places this situation even further away from perfect is the fact that in WebLogic installation such a problem does not exist. Surely there must be something that can be done to help this behaviour.</p> <h2>The problem</h2> <p>After further inspection (and some decompilation), the culprit seems to be found. When the no security-constraint is present in Tomcat, the authentication is handled first by ORDS itself, using CatalinaAuthenticator (oracle.dbtools.auth.container.catalina.CatalinaAuthenticator to be exact). There we see something along the lines of:</p> <div id="CatalinaAuthenticator" lang="Java" xml:lang="Java"> <pre> ... request.login(username, new String(credential)); try { Principal principal = request.getUserPrincipal(); if (!(principal instanceof User)) { return AuthenticationResult.unknown(); } ...</pre></div> <p> </p> <p>Which doesn't look too bad at first, but if we take a closer look at it, we see that the expected instance of principal is of type "org.apache.catalina.User". In its <a href="https://tomcat.apache.org/tomcat-9.0-doc/api/org/apache/catalina/User.html">JavaDoc</a> we can read:<code> </code></p> <blockquote><p>Abstract representation of a user in a UserDatabase.  Each user is optionally associated with a set of Groups through which he or she inherits additional security roles, and is optionally assigned a set of specific Roles.</p> </blockquote> <p>So we see that this User class is intended to, essentially, be used only with UserDatabaseRealm. As can be seen from the code above, if the <span style="font-family:courier new,courier,monospace;">UserPrincipal</span> is not an instance of <span style="font-family:courier new,courier,monospace;">User</span>, the AuthenticationResults is <cite>unknown</cite> (failed in the result). We see that the <span style="font-family:courier new,courier,monospace;">request.login</span>, in the end, calls Realm's <span style="font-family:courier new,courier,monospace;">authenticate</span> method. The difference between UserDatabaseRealm and JNDIRealm is very slight. 
If correctly authenticated, <span style="font-family:courier new,courier,monospace;">UserDatabaseRealm</span> returns <span style="font-family:courier new,courier,monospace;">new GenericPrincipal(String name, String password, List&lt;String&gt; roles, Principal userPrincipal)</span> while <span style="font-family:courier new,courier,monospace;">JNDIRealm</span> returns <span style="font-family:courier new,courier,monospace;">new GenericPrincipal(String name, String password, List&lt;String&gt; roles)</span>. As a result, in the case of JNDIRealm the <span style="font-family:courier new,courier,monospace;">GenericPrincipal</span>, when asked for a <span style="font-family:courier new,courier,monospace;">UserPrincipal</span>, returns an instance of itself:</p> <pre>
public Principal getUserPrincipal() {
    return (Principal)(this.userPrincipal != null ? this.userPrincipal : this);
}
</pre><p>The only problem is that GenericPrincipal has no relation to User. Both implement <span style="font-family:courier new,courier,monospace;">java.security.Principal</span> in the end, sure, but <span style="font-family:courier new,courier,monospace;">GenericPrincipal</span> is not castable to org.apache.catalina.User.</p> <p>In order to work around this problem, we'll create a Valve that will insert our special UserPrincipal into the GenericPrincipal returned.</p> <pre>
import java.security.Principal;
import java.util.Arrays;
import javax.servlet.ServletException;
import org.apache.catalina.User;
import org.apache.catalina.authenticator.BasicAuthenticator;
import org.apache.catalina.connector.Request;
import org.apache.catalina.realm.GenericPrincipal;

public class OrdsBasicAuthValve extends BasicAuthenticator {

    @Override
    protected Principal doLogin(Request request, String username, String password) throws ServletException {
        Principal principal = super.doLogin(request, username, password);
        if (principal instanceof GenericPrincipal) {
            GenericPrincipal gp = (GenericPrincipal) principal;
            if (!(gp.getUserPrincipal() instanceof User)) {
                // wrap the authenticated principal so that getUserPrincipal()
                // returns an org.apache.catalina.User, which is what ORDS expects
                User userPrincipal = new UserPrincipal(gp.getName(), gp.getPassword(), gp.getRoles());
                principal = new GenericPrincipal(gp.getName(), gp.getPassword(), Arrays.asList(gp.getRoles()), userPrincipal);
            }
        }
        return principal;
    }
}
</pre><p>Here, <span style="font-family:courier new,courier,monospace;">UserPrincipal</span> is a simple implementation of the <span style="font-family:courier new,courier,monospace;">org.apache.catalina.User</span> interface. A sample of how this can look can be found <a href="https://cernbox.cern.ch/index.php/s/scw6SCWQwIsonyF">here</a>.</p> <p>We make our Tomcat instance use our Valve (registered as sketched below) and voilà!</p>
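<p>For completeness, here is a sketch of how that registration can look. It is an assumption-laden example rather than the exact setup from the post: it assumes the compiled valve is packaged in a jar on Tomcat's class path (for example in $CATALINA_BASE/lib) and uses a made-up package name, so adapt the className to wherever you put the class. A Valve element placed in the application's context definition does the job:</p> <pre>
&lt;!-- e.g. in conf/Catalina/localhost/ords.xml or the application's META-INF/context.xml --&gt;
&lt;Context&gt;
  &lt;!-- hypothetical package name; use the one the valve was compiled under --&gt;
  &lt;Valve className="ch.example.ords.OrdsBasicAuthValve" /&gt;
&lt;/Context&gt;
</pre>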
<p>Our ORDS authentication now works through Tomcat as intended, without any need for security-constraints and exactly the same way as in WebLogic.</p> <h2>Sidenote</h2> <p><strong>For ORDS &lt; 18.1.1</strong> there seems to be an issue where all Tomcat authentication handled by ORDS (so without a security-constraint defined directly in Tomcat) fails, even for UserDatabaseRealm.<br /> This is related to BUG:26881221 ("Fix regression preventing authentication of Tomcat based users"), which was <a href="https://www.oracle.com/technetwork/developer-tools/rest-data-services/downloads/ords-relnotes-1811-4426686.html">fixed</a> in version 18.1.1 of ORDS.<br /> My recommendation would be to secure the whole application using the security-constraint and create a fallback realm, which would always authenticate as a 'fallback' user.<br /> However, the resulting behaviour will be a little bit different in the case of unsecured endpoints when it comes to the REMOTE_USER CGI variable. What would happen normally is:</p> <ul><li>user authenticated successfully -&gt; REMOTE_USER=<em>*user*</em></li> <li>not authenticated -&gt; REMOTE_USER=<em>*schema_name*</em></li> </ul><p>In the proposed solution:</p> <ul><li> user authenticated successfully -&gt; REMOTE_USER=<em>*user* </em>(just like before)</li> <li> not authenticated -&gt; REMOTE_USER=<em>*fallback_realm_user*</em></li> </ul><p>For building your own realm, a very useful reference I found was: <a href="https://dzone.com/articles/how-to-implement-a-new-realm-in-tomcat">https://dzone.com/articles/how-to-implement-a-new-realm-in-tomcat</a></p> <p> </p> </div> </div> <span><a title="View user profile." href="/users/jakub-granieczny" lang="" about="/users/jakub-granieczny" typeof="schema:Person" property="schema:name" datatype="">jgraniec</a></span> <span>Thu, 12/12/2019 - 16:13</span> <section> <article data-comment-user-id="0" id="comment-36212" class="js-comment"> <footer> <article typeof="schema:Person" about="/user/0"> </article> <p>Submitted by <a rel="nofollow" href="http://www.prolinux.cl" lang="" typeof="schema:Person" property="schema:name" datatype="">Horacio Miranda (not verified)</a> on Sun, 03/08/2020 - 22:29</p> <a href="/comment/36212#comment-36212" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36212#comment-36212" class="permalink" rel="bookmark" hreflang="und">Performance test Oracle-Tomcat using pool.</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>Hi, Oracle comes with a nice way to reduce memory usage, It is called DRCP, they some documents php-oracle and Java-Oracle that talks about howto implement DRCP.<br /> I hope this information reach you well and any application Oracle-Something performs using an small memory foot print.<br /> <a href="https://www.oracle.com/technetwork/database/application-development/jdbc-ucp-conn-mgmt-strategies-3045654.pdf">https://www.oracle.com/technetwork/database/application-development/jdb…</a></p> <p>--<br /> Regards, Horacio Miranda.<br /> Oracle RAC Specialist, RHCE, etc.</p> </div> </div> </article> <article data-comment-user-id="0" id="comment-36390" class="js-comment"> <footer>
<article typeof="schema:Person" about="/user/0"> </article> <p>Submitted by <span lang="" typeof="schema:Person" property="schema:name" datatype="">Alexander Roberts (not verified)</span> on Wed, 06/10/2020 - 00:30</p> <a href="/comment/36390#comment-36390" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36390#comment-36390" class="permalink" rel="bookmark" hreflang="und">How to build the &quot;valve&quot; workaround.</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>Hi,<br /> Thank you for your great post. - I too ran into this issue and have (for the time being) used a<br /> change to the &quot;web.xml&quot; file to get this working.</p> <p>I would like to implement the valve workaround as you have and to this end I asked<br /> a collegue of mine who has Java developer skills to build the valve based upon this post.</p> <p>He got so far but has not been able to locate the file that contains the &quot;UserPrincipal&quot; class.<br /> Could you provide me with the details of where this can be found?</p> <p>Best regards<br /> Alex</p> </div> </div> </article> <div class="indented"> <article data-comment-user-id="257" id="comment-36391" class="js-comment"> <footer> <article typeof="schema:Person" about="/users/jakub-granieczny"> </article> <p>Submitted by <a title="View user profile." href="/users/jakub-granieczny" lang="" about="/users/jakub-granieczny" typeof="schema:Person" property="schema:name" datatype="">jgraniec</a> on Wed, 06/10/2020 - 10:34</p> <p class="visually-hidden">In reply to <a href="/comment/36390#comment-36390" class="permalink" rel="bookmark" hreflang="und">How to build the &quot;valve&quot; workaround.</a> by <span lang="" typeof="schema:Person" property="schema:name" datatype="">Alexander Roberts (not verified)</span></p> <a href="/comment/36391#comment-36391" hreflang="und">Permalink</a> </footer> <div> <h3><a href="/comment/36391#comment-36391" class="permalink" rel="bookmark" hreflang="und">Hi!</a></h3> <div class="field field--name-comment-body field--type-text-long field--label-hidden field--item"><p>Hi!</p> <p>Having an ORDS REST service with a mix of public and secured endpoints can render the workaround using "web.xml" almost impossible to manage, so it's definitely worth investing some effort in using this valve.</p> <p>The UserPrincipal is a simple implementation of org.apache.catalina.User.<br /> I updated the post with this info and a sample class you can use.
</p> <p>We will soon publish a repo with this valve and many more interesting ones for authentication, so be sure to follow the blog!</p> </div> </div> </article> </div> </section> Thu, 12 Dec 2019 15:13:23 +0000 jgraniec 173 at https://db-blog.web.cern.ch Internationalization of the 2019 Open Days reservation system https://db-blog.web.cern.ch/blog/viktor-kozlovszky/2019-11-internationalization-2019-open-days-reservation-system <span>Internationalization of the 2019 Open Days reservation system</span> <div class="field field--name-body field--type-text-with-summary field--label-above"> <div class="field--label"><b>Blog article:</b></div> <div class="field--item"><h1>Introduction</h1> <p>International organisations can have multiple official languages; in these cases their workflows and processes are usually designed to support that. CERN is one of those organisations; its official languages are French and English. Therefore one of our tasks was to make the Open Days reservation system bilingual. In this article you will read about the choices we made to internationalize the system, what obstacles we faced and what solution we went for.</p> <p>This post is part of the "Open Days reservation system - 2019" series. If you want to know more about the CERN Open Days project, you can find it in <a href="https://db-blog.web.cern.ch/blog/viktor-kozlovszky/2019-10-open-days-reservation-systems-high-level-overview-2019">the following post</a>.</p> <h1>Translation layers for the Open Days reservation system</h1> <p>From the development perspective it is important to have a rough idea about the deliverable system before you start working on it. Of course it is not possible to plan everything in advance, but it is important to have a basic understanding of the requirements which the system needs to meet. Internationalization (<a href="https://www.w3.org/International/questions/qa-i18n#i18n">i18n</a>) is one of these aspects, and its importance increases as the system goes from internal use to external use.</p> <p><a href="https://db-blog.web.cern.ch/blog/viktor-kozlovszky/2019-10-internationalization-concepts-and-implementations">I18n depends highly on the technologies</a> and on the architectural design which you select for your project. Certain aspects also only become clear while you are reviewing the project as a whole: for example, at which phase the final display texts will be available, the technical skill level of the translators, how the components are going to share the translations, etc. It is important to know the answers to these in advance, so you can lower the risks and the amount of dirty solutions and hacks as you get closer to delivery.</p> <p>We knew from the beginning that the production data (entrances + opening times + capacity, etc.) for the Open Days would only be available when the registration period started. We were also aware that during the booking period we wanted basic reporting which shows the progress of the reservations (how many tickets are still available, how and where people are going to arrive, etc.).
Given this additional information, and because of the layer-separated system consisting of front-end, back-end and database layers, we found a layer-divided translation approach to be the best fit.</p> <p><img alt="" src="http://db-blog-multimedia.web.cern.ch/db-blog-multimedia/vkozlovs/opendays-2019/language_pyramid.png" /></p> <p>The image demonstrates a layer-separated translation design. Imagine your application/system as a pyramid which represents the stable, final version of the work. This is what the users speaking different languages are going to interact with. The workflows should not and will not change based on the language; they are as stable as a constructed building. On the pyramid you can see our different i18n layers, which are the following:</p> <ul><li>In our database we formed a tiny translation layer: we added the most frequently used and most dynamically changing content, such as list elements (radio buttons, checkboxes, comboboxes, etc.). Since this content lived there, it increased the number of executed queries as the content was loaded into the end-user's browser. One common practice for lowering the number of executed queries is to cache the end results of the queries and reuse them. By putting the selection options into the database we added extra flexibility to our development process. This became measurable as we were building the drill-down functionality for the dashboard on top of the gathered data. Keeping a single source of truth helped us avoid hardcoding the selection lists for data gathering and reporting.</li> <li>On our back-end application layer we had a <a href="https://www.w3.org/TR/ws-arch/#relwwwrest">REST API</a> based application implemented in <a href="https://spring.io/projects/spring-boot">JAVA using Spring boot</a>. This layer had more translations than the database layer. Here the translations covered the different errors which could occur while loading information from the database, or while validating and storing information in the database. Since our solution did not use user profiles, and because of our stateless system design, every message coming from the front-end included the preferred language.</li> <li>The biggest portion of the translation lived inside the front-end layer. For this we selected <a href="https://angular.io/">Angular</a>, because it allows collaboration in a standard way not just between the developers, but also between the translator team and the developer team.</li> </ul><p>The maintenance of the different display languages became very simple once we got the right i18n design. For the back-end application and database layers the design was straightforward, but for the front-end we did not get it right in the first iteration. Once we got that layer’s i18n design right, we were able to benefit from simplified and collaborative usage.
In the second part of this post we will focus on the obstacles we faced in the front-end layer and how we solved them.</p> <h1>Our experiences with i18n for Angular</h1> <p>Angular’s simplicity and true power lie in its command line interface (<a href="https://cli.angular.io/">cli</a>). It has an out-of-the-box approach for internationalization: you add translation decorations to HTML tags and, with the help of the cli, you extract them into resource files. For the resource files we used the XML-based <a href="https://www.w3.org/International/its/wiki/XLIFF_1.2_Mapping">XLIFF format</a>. Let’s see the concept through an example.</p> <p>Here is our decorated html tag:</p> <blockquote><p>&lt;div&gt;<br />    &lt;span i18n="welcome page| general greetings @@greetings"&gt;Hello&lt;/span&gt;<br /> &lt;/div&gt;</p> </blockquote> <p>Here is the relevant part of the resource file which will be generated:</p> <blockquote><p>...<br /> &lt;trans-unit id="greetings" datatype="html" approved="yes"&gt;<br />     &lt;source&gt;Hello&lt;/source&gt;<br />     &lt;target state="translated"&gt;Bonjour&lt;/target&gt;<br />     &lt;context-group purpose="location"&gt;<br />         &lt;context context-type="sourcefile"&gt;app/Pages/welcome-page/welcome-page.component.html&lt;/context&gt;<br />         &lt;context context-type="linenumber"&gt;4&lt;/context&gt;<br />     &lt;/context-group&gt;<br />     &lt;note priority="1" from="description"&gt;general greetings &lt;/note&gt;<br />     &lt;note priority="1" from="meaning"&gt;welcome page&lt;/note&gt;<br /> &lt;/trans-unit&gt;<br /> ...</p> </blockquote> <p>From the example you can see that the cli generates a “trans-unit” block for each i18n-decorated tag. The "source" tag contains the text to be translated in the source language, and is taken from the span above. The "target" tag contains the translation. The “context-group” tag lists all the different occurrences of the string. This is needed as the translation block can be used for multiple html tags if the identifier and the decorations are the same. In the translation decoration we can add extra information for the translator, like the “meaning” and “description” in the example; these provide extra context for the translators. You can find more detailed information about Angular i18n <a href="https://angular.io/guide/i18n">here</a>.</p> <p>Note that if you are going with named (@@greetings) translation units, then you have to pay attention to the naming convention. You should avoid adding the page name or component name to the translation unit id, because the cli is designed to handle multiple occurrences and you don’t want to manage two or more translation units with the same content. For these cases it’s better to use the exact same decoration in both places. Also, it is very important to understand that every new i18n decoration, source text change or translation context change requires regenerating the resource files.</p> <p>Once you have the resource files with the translations, you need to build an individual application for each translated language. This means that translation is done at build time and not at runtime, which results in the translated content reaching the user’s browser faster.</p> <p>At the beginning of our project, updates to the translations didn't happen too often, so manually managing the XLIFF files was not a problem. As new decorations were added, we extracted the XLIFF files and manually "merged" them with the old ones.</p>
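<p>To give an idea of what that cycle looked like in practice, here is a rough, hedged sketch; the exact commands are not in the original post and the flag names changed between Angular CLI versions, so treat it as an approximation for an Angular 8 era project. It extracts the decorated texts and then produces a per-language build from the translated file:</p> <pre>
# extract the i18n-decorated texts into a master XLIFF file
ng xi18n --output-path src/locale

# build the French version of the application from the translated file
ng build --prod --i18n-file=src/locale/messages.fr.xlf --i18n-format=xlf --i18n-locale=fr
</pre>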
<p>When we entered the pre-release phase of development, this process became insufficient. By that time the translation files had grown big (one decorated line generates roughly ten lines of XLIFF) and manually merging the already translated content became very time-consuming.</p> <p>We needed a tool to improve our translation cycle, because the manual process was no longer maintainable. As a result of our search we found <a href="https://github.com/martinroob/ngx-i18nsupport/tree/master/projects/xliffmerge">xliffmerge</a>. This tool merges new translation updates into the existing XLIFF files. With its help the developers no longer had to babysit the translation files, and could instead offload the work of translation to others completely, relying 100% on the tool to manage the files.</p> <p>At first, as we started using the tool, we faced some <a href="https://github.com/martinroob/ngx-i18nsupport/issues/145">difficulties</a>, but thanks to the fast replies from the maintainer of the tool, <a href="https://github.com/martinroob">Martin Roob</a>, we managed to find the proper way to use it. The tool requires a source language to be defined; from that source language it generates a master resource file, whose content is then merged into the other translation resource files. We started manually editing this generated master file to apply text changes for the English version. This caused us issues, because the modified text was not displayed properly. Thanks to the quick reply from the maintainer we understood that the intended approach to changing wordings in the source language is to edit the HTML source code directly, not the generated master resource file. The maintainer suggested an alternative approach, which fits our use case well: trick the tool into thinking the source language is some unrelated language, and instead have every real target language be a translation of this fake language. With this approach the appearance of our English web pages could be changed using the same method as that for the French ones. A funny consequence of this is that the source language of the application technically became Norwegian, although in practice this language was treated as "English without our changes".</p> <p>Xliffmerge helps with updating the XLIFF files, but you still need to edit them directly to perform the translations. <a href="https://virtaal.translatehouse.org/">Virtaal</a> is a tool we used that gives you a simple UI for editing the XLIFF files. It gives non-technical contributors an opportunity to contribute and collaborate without digging into XML files directly.</p> </div> </div> <span><a title="View user profile." href="/users/viktor-kozlovszky" lang="" about="/users/viktor-kozlovszky" typeof="schema:Person" property="schema:name" datatype="">vkozlovs</a></span> <span>Tue, 11/12/2019 - 15:12</span> Tue, 12 Nov 2019 14:12:57 +0000 vkozlovs 169 at https://db-blog.web.cern.ch