Talend Job Build Behaviour


Overview :

Talend Studio offers two ways to build a job: as an OSGI bundle or as a standalone job. Depending on the composition of the job, one or both of these options may be available. This can be confusing, but there is a logic behind it, which this article attempts to explain.

In earlier versions of Talend it was possible to build all jobs into OSGI bundles. This changed from version 7, and now only jobs containing an ESB component can become a bundle; anything else will be a standalone runtime job. Let’s take a deeper look at what is happening here and what effect it may have on integration and mediation processes.

OSGI and Karaf in Talend :

The Open Services Gateway initiative (OSGI) framework is designed to assist modular programming at the application level. It performs a similar task to the Spring framework, but where Spring provides methods to simplify the plumbing between components, OSGI provides a full lifecycle management engine for registered modules. It uses a container that is similar to a JVM in many respects but has additional features allowing multiple programs to execute within the same namespace. Bundles are Java JAR files with additional metadata that allows interaction with the container and the other bundles within it. Source code may contain specific compiler directives or annotations with special meanings to the container, causing the process to behave in a particular way. Bundles can depend on other bundles within the container, which enhances modularity and usability.
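The "additional metadata" mentioned above lives in the JAR's META-INF/MANIFEST.MF file. As a rough illustration, the sketch below checks whether a JAR declares the Bundle-SymbolicName header, which is the header that identifies an OSGI bundle. The helper names (parse_manifest, is_osgi_bundle, make_jar) and the demo manifest values are made up for this example, and the parser reads only the main section of the manifest.

```python
import io
import zipfile


def parse_manifest(text):
    """Parse the main section of a JAR manifest into a dict of headers."""
    headers = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            # Continuation line: append to the previous header value.
            headers[last_key] += line[1:]
            continue
        key, _, value = line.partition(":")
        last_key = key.strip()
        headers[last_key] = value.strip()
    return headers


def is_osgi_bundle(jar):
    """Return True if the JAR's manifest declares Bundle-SymbolicName."""
    with zipfile.ZipFile(jar) as archive:
        try:
            raw = archive.read("META-INF/MANIFEST.MF")
        except KeyError:
            return False  # no manifest at all
    return "Bundle-SymbolicName" in parse_manifest(raw.decode("utf-8"))


def make_jar(manifest_text):
    """Build a tiny in-memory JAR, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/MANIFEST.MF", manifest_text)
    buffer.seek(0)
    return buffer


# Hypothetical manifests: a plain JAR and one with bundle headers.
plain_jar = make_jar("Manifest-Version: 1.0\n")
bundle_jar = make_jar(
    "Manifest-Version: 1.0\n"
    "Bundle-SymbolicName: demo.job\n"
    "Bundle-Version: 0.1.0\n"
)
print(is_osgi_bundle(plain_jar))
print(is_osgi_bundle(bundle_jar))
```

Pointing is_osgi_bundle at a file path instead of an in-memory buffer would work the same way, which can be a quick way to confirm which kind of artifact a build produced.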

Apache Karaf is built on the OSGI framework and allows a choice of containers to be used. The default installation uses Apache Felix, but Talend runtime is configured to use Eclipse Equinox instead. This is not surprising, as the Open Studio development environment is derived from the Eclipse IDE, which is itself built on top of OSGI using the Equinox container. The container can be changed in the runtime Karaf, but this would need full re-configuration, effectively re-inventing the wheel.

Karaf extends the framework to include the concept of features. A feature takes the bundle idea a stage further and contains a manifest that lists all other bundles the main bundle depends on. When a feature is installed this list of dependencies will be checked and any not present in the container will be installed via Maven prior to installation of the lead bundle.
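Conceptually, the dependency check described above is a comparison between the bundles a feature lists and the bundles already in the container. The sketch below models only that idea; it does not use any Karaf API, and the function name and all bundle names are invented placeholders.

```python
def bundles_to_install(feature_bundles, installed):
    """Return the feature's bundles missing from the container, in listed order."""
    present = set(installed)
    return [name for name in feature_bundles if name not in present]


# Hypothetical feature: dependencies are listed first, the lead bundle last.
feature = ["shared-framework", "shared-transport", "demo-service"]
container = {"shared-framework"}

print(bundles_to_install(feature, container))
```

Keeping the listed order matters in this model because the dependencies need to be in place before the lead bundle is installed.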

Several features are preinstalled in the runtime container, but in the community edition of Talend, routes and jobs are deployed by installing bundles directly. The subscription version publishes bundles to the Nexus repository, and part of this process generates the associated features, which are stored in Nexus alongside the bundles. Deploying a job from TAC takes the feature from Nexus and installs it in Karaf, which then installs the appropriate bundles.

Apache CXF in Karaf :

The CXF framework simplifies creation of both SOAP and REST services and is available in Camel via the CXF bean component. Installing the Camel CXF features, which encapsulate the Camel libraries that support this, into Karaf enables web services to be built using the JAX-WS (SOAP) and JAX-RS (REST) APIs and accessed in both routes and jobs within Talend. Once installed into the container, the framework can be shared by all bundles, significantly reducing the footprint of each service module and saving resources. Since the bundles themselves no longer carry the framework, they won’t work outside the container. Making them run standalone would require adding the CXF libraries back in and building a Jetty server into each service, which becomes prohibitively expensive in terms of resources once several services are in use. Once an ESB component is detected within a job, Talend will offer the option to build as a bundle.

When there are no ESB components in the job, it is not making any use of the frameworks built into the container. It may require lots of other libraries, but they would still need to be added to the bundle, just as they are to the standalone JAR file. No advantage is gained from the services in the container, and the cost of the OSGI metadata and code annotations outweighs any benefit of running within Karaf. In the interest of efficiency, Talend therefore only allows such a job to be built as a standalone job. This differs from earlier versions, which allowed all jobs to be built as bundles, and it may be an inconvenience when migrating, but eventually jobs will be running in the optimum environment for their composition.

Things get a bit more confusing with ESB components. It seems, whether by design or default, there are 3 layers of ESB components.

  1. Components that make full use of installed frameworks and the Jetty server. These are the producer components:
       SOAP: tESBProviderRequest, tESBProviderResponse
       REST: tRESTRequest, tRESTResponse
  2. Items that use a part of the framework for access only:
       SOAP: tESBConsumer, tESBProviderFault
       REST: tRESTClient
  3. Other components which are part of the ESB family but don’t use the frameworks, including: tRouteInput, tRouteOutput and tRouteFault.

Components in layer 1 gain the maximum benefit from installation into the container and, conversely, require the largest number of additional libraries and resources to be added to a standalone version. Talend recognises the optimum solution and only offers the option to build as an OSGI bundle.

Layer 2 components do gain some benefit from being in the container but the overall balance between OSGI additions and stand-alone imports is less clear cut primarily due to less intensive usage of CXF and no requirement for Jetty. In this case Talend leaves the decision with the user and offers both build options.

The final layer is slightly different: components are placed in this group because they relate to jobs called by a route rather than to usage of the CXF framework. When these components are placed in a job, Talend will offer the option to build it as a bundle, but the build won’t work in practice because it will report that ESB components are missing (assuming no layer 1 or 2 components are also present). This may seem strange, but jobs containing these components should not be built in isolation. They should be built as part of the calling route, which picks them up automatically in its own build process. Effectively, the default behaviour for jobs with no ESB components is being overridden: as individual units these jobs gain no benefit from the container, but they must themselves be deployed as bundles for the OSGI route bundle to call them.

To summarise:

Component Type    Function              Build Options
Non ESB           Any                   Stand-Alone Only
ESB Layer 1       Producer              OSGI Bundle Only
ESB Layer 2       Consumer              Both
ESB Layer 3       Route Call Wrapper    Do Not Build
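
The summary can also be read as a simple lookup. The sketch below is only an illustration of the rules in this article, not Talend's actual implementation. The function name is hypothetical, the layer 3 set holds only the three components named above (the article's list is not exhaustive), and the assumption that layer 1 takes precedence over layer 2 when both appear in one job is not stated in the article.

```python
# Component layers as described in this article.
LAYER_1 = {"tESBProviderRequest", "tESBProviderResponse",
           "tRESTRequest", "tRESTResponse"}
LAYER_2 = {"tESBConsumer", "tESBProviderFault", "tRESTClient"}
LAYER_3 = {"tRouteInput", "tRouteOutput", "tRouteFault"}


def build_options(components):
    """Map the components used in a job to the summarised build options."""
    used = set(components)
    if used & LAYER_1:
        # Assumption: producers win if layers are mixed.
        return "OSGI bundle only"
    if used & LAYER_2:
        return "Both"
    if used & LAYER_3:
        return "Do not build (built with the calling route)"
    return "Stand-alone only"


# tFileInputDelimited stands in for any non-ESB component.
print(build_options(["tFileInputDelimited"]))
print(build_options(["tRESTClient", "tFileInputDelimited"]))
print(build_options(["tRESTRequest", "tRESTResponse"]))
print(build_options(["tRouteInput"]))
```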