LevSelector.com New York

Enterprise Application Integration.
 
On this page:
* Main considerations
* Use services instead of objects
* Multiple transports - general format
* Why EJBs are not recommended
* Why servlets-daemons are recommended
* Use XML/SOAP
* Distributed Transactions
* Deadlocks
* Simple locking with a queue
* Misc. Links

 
Main considerations

In big organizations you don't really see an "ALL Java", "ALL C++", "ALL PERL" or "ALL MICROSOFT" architecture. It is futile to try to design the system as a single application (compiled together, with tightly coupled parts). In the long run it never happens this way: different departments and groups can never agree on one language or platform.  What you end up with is a system consisting of heterogeneous parts (mainframe, unix, Microsoft, etc.).  The challenge is to make them work together.  The problem is that these systems use different data formats and protocols internally.  They may also have different philosophies about how security or session issues are handled, etc.

To make these systems work together, the organization has to define (to limit the choice of) the methods and formats of communication (cooperation) between systems.
These methods should feature:
 - low coupling/dependency between parts - to allow different parts of the system to evolve independently at their own speed and taste.
 - being simple and flexible - to be used by ALL current and future systems.

An approach we see more and more lately is a shift from Distributed Objects (like CORBA or DCOM) to Distributed Services (XML/SOAP).  This is the essence of the whole concept of "Web Services", the main defining feature of Microsoft's ".NET" architecture as well as of its competitor, J2EE.  Both competing approaches use XML-based communication.
 
Why are distributed objects (DO) not a good choice?
  -- 1. It is a "heavy-weight" technology (it takes highly qualified people to implement it and to maintain it through upgrades).
  -- 2. Some 3rd party applications don't support it, and it is difficult and expensive to add.
  -- 3. Using DO means that you create "high coupling" in your system, that is, a high level of dependency between parts of the system.  While developing and testing, you usually have to recompile both sides of the communicating systems. And you have to make sure that you are using the same versions of the compiler and of the distributed objects system.  When the next version of CORBA or Java comes out, you can't upgrade just one part of the system: you usually have to upgrade all parts simultaneously, or none. Otherwise they will not be able to communicate. So in practice you wait and don't upgrade systems which need upgrades badly.  This is NOT good.

The basic ideas described below on this page are:
 
 - Low coupling - use services instead of objects.  Allow parts of the system to evolve independently. Avoid a high level of coupling between parts of the system. Instead of using distributed objects (which usually requires re-compilation and testing of both sides of the communicating systems), use distributed services. That is, make small independent services which can be developed and tested independently. And define how you can request and receive these services (communication). Also teach those services a standard way of describing themselves to other services on request.

 - Low-coupling - use multiple transport mechanisms, but avoid transport-protocol-specific formats and binary formats. Instead use some simple common format (for example, XML and SOAP). Also use messaging middleware such as MQSeries (further decoupling, to assist with distributed transactions, etc.).

 - Avoid using EJBs. Instead you can use servlets-daemons in a commercial application server, or write your own servers.

 - Use XML/SOAP.


 
Distributed Services vs. Distributed Objects

Low coupling - use services instead of objects.  Allow parts of the system to evolve independently. Avoid a high level of coupling between parts of the system. Instead of using distributed objects (which usually requires re-compilation and testing of both sides of the communicating systems), use distributed services. That is, make small independent services which can be developed and tested independently.

Two main types of frameworks for distributed computing:
  Distributed objects - CORBA, DCOM and EJB. All distributed objects architectures in their attempt to provide higher-level services, have become very intrusive and impose severe architectural restrictions on the application services (high-level coupling).
  Distributed services - Java Servlets and XML/SOAP.

Distributed objects:
   CORBA - (Common Object Request Broker Architecture) - a specification for distributed object computing between applications written in different programming languages (C++, Java, VB, etc.). The interfaces between clients and servers are declared in a neutral language -  Interface Definition Language (IDL) - and then compiled to generate language-specific stubs and skeletons.  Various vendors provide their versions of IDL compilers for different platforms and languages.
CORBA makes it possible to bridge the platform gap (for example, between Windows-based clients and their Solaris/C++ servers). While CORBA is extremely promising in theory, it has failed to live up to expectations in practice:
  - Version incompatibilities - you can't upgrade parts of the system - they have to wait for the whole system to upgrade (high degree of coupling).
  - Complicated system - need very highly qualified developers, especially when using advanced capabilities such as the POA (Portable Object Adapter framework).
  - Problems with stability, poor vendor support, slow adoption of features.

   DCOM - (Distributed Component Object Model) - Microsoft's older framework, similar to CORBA, but practically unusable without the Microsoft Transaction Server (MTS), which is only available for Windows platforms. Microsoft has essentially abandoned DCOM in favor of COM+/SOAP as the distributed computing model of choice.

   EJB - (Enterprise JavaBeans) - framework included in the Java 2 Enterprise Edition (J2EE) specification proposed by Sun Microsystems. Similar to CORBA and DCOM.  EJB components can leverage the various facilities provided by runtime environments known as EJB Containers, including object persistence, transaction management, and access control. Main Containers (Application Servers) - IBM's WebSphere and BEA's WebLogic.

Distributed services:
   Java Servlets - a simple framework that allows HTTP-based access to Java services, it is part of Java 2 Enterprise Edition (J2EE) specification. Servlets are hosted in a Servlet Container, process HTTP requests (GET, POST), can communicate with databases (JDBC), use messaging services - and finally print the response back to the browser.
   SOAP - (Simple Object Access Protocol) is a way to implement communication (RPCs (Remote Procedure Calls) and Callbacks) in XML via HTTP.

Idea: decouple system by using "services" instead of "objects".  That is, separate individual components into standalone services which can be changed and debugged independently.  Thus you don't have to recompile and test the whole system - but just the small part of it.

Object-oriented programming is a proven winner for application design, but NOT for distributed applications.
Frameworks such as CORBA attempt to hide the fact that an object is remotely located when in reality that fact should not be hidden.
So use services instead of objects.

 
Transports and Formats

Low-coupling - use multiple transport mechanisms, but avoid transport-protocol-specific formats and binary formats. Instead use some simple common format (for example, XML and SOAP). Also use messaging middleware such as MQSeries (further decoupling, to assist with distributed transactions, etc.).

  Transports: HTTP(S), MQSeries, IIOP, and plain TCP, SMTP, FTP and others.
  Types of interaction: request-response (like a live chat) or messaging (like e-mail).

Idea: don't use transport-specific data format.  Instead use XML messages which can be passed over all transports. All you need to communicate via XML is an XML-parser, which is available for all languages and platforms.  Use SOAP for RPCs and callbacks.
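To make the idea concrete, here is a minimal Python sketch of a transport-independent XML message. The element and attribute names (request, param, service, operation) are made up for illustration and are not a SOAP envelope; the only point is that the same bytes could travel over HTTP, a queue, or a plain socket, and the receiver needs nothing but an XML parser.

```python
import xml.etree.ElementTree as ET


def build_request(service: str, operation: str, params: dict) -> bytes:
    """Serialize a service request as a small XML message (hypothetical layout)."""
    root = ET.Element("request", {"service": service, "operation": operation})
    for name, value in params.items():
        param = ET.SubElement(root, "param", {"name": name})
        param.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def parse_request(payload: bytes) -> tuple:
    """Recover (service, operation, params) from the XML bytes."""
    root = ET.fromstring(payload)
    params = {p.get("name"): p.text for p in root.findall("param")}
    return root.get("service"), root.get("operation"), params


# The payload is plain bytes, so any transport can carry it unchanged.
payload = build_request("accounts", "getBalance", {"accountId": 42})
service, operation, params = parse_request(payload)
```

Note that parameter values come back as text; agreeing on how to type them is part of defining the common format.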

Idea: decouple systems by using mostly messaging instead of request-response communication. Messaging is asynchronous (thus puts less load on network). It is also convenient to implement broadcasting.
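A rough illustration of the messaging style, using an in-process queue from the Python standard library as a stand-in for real middleware such as MQSeries (the queue, the message texts, and the None sentinel are all assumptions of this sketch). The sender drops messages and moves on; the receiver processes them whenever it gets to them.

```python
import queue
import threading

inbox = queue.Queue()   # stands in for a message queue between two systems
processed = []


def consumer():
    """Receiver: handles messages at its own pace."""
    while True:
        message = inbox.get()
        if message is None:          # sentinel: no more messages
            break
        processed.append(message.upper())


worker = threading.Thread(target=consumer)
worker.start()

# Sender: put() returns immediately, without waiting for the receiver.
for text in ["order created", "order shipped"]:
    inbox.put(text)

inbox.put(None)
worker.join()
```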

Note: Different transports may be required.  Sometimes nothing can beat pure sockets. For example, broadcasting of real-time updates to thousands of subscribed clients can be done very effectively from one computer using open sockets, but it will require much more hardware power to provide similar information as an HTTP server responding to periodic update requests from clients.

 
Why EJBs are NOT recommended

Java is a good technology - but EJBs are NOT recommended.
Here is why.

 - EJBs don't integrate well with other languages (EJBs use RMI/IIOP, which is similar to the OMG's CORBA and is impractical).
 - Most of the benefits of using EJBs are in fact provided by the container and are also available to servlets.

 - EJBs are complex.

 - EJBs are very restricted in what they can do.  For instance, EJBs are not allowed to start their own threads and therefore cannot start their own event loops; this means that they cannot support other transport protocols such as HTTP or MQSeries.

 - EJBs are slow
 - EJB's implementation of object persistence (Entity Beans) - leads to very tight coupling between the database and the application and to unmaintainable database schemas. This is bad, considering that data normally far outlives any application, and thus it is very important that the database schema be independent of the application services.  It is also a largely futile exercise to try to hide the database from the application service developer. Even the EJB specification states that "the overhead of an inter-component call will likely be prohibitive for object interactions that are too fine-grained". It also says that EJBs should be very coarse in granularity and that finer business object modeling should be done without EJBs. Entity Beans use significant system resources since each load or store forms a separate database operation; in contrast, data access objects would aggregate the database operations.

 - Lifecycle Management (activation/passivation of EJBeans) - is still an immature solution. CORBA faces the same problem and has gone through several iterations at trying to solve it. The latest Portable Object Adapter (POA) specification shows that container-managed lifecycles need to be driven by user-specified policies that are applied differently to different sets of objects. At the same time, anybody who has implemented a big POA-based system would concede that the paradigm introduces a lot of complexity. Furthermore, it is apparent that lifecycle management is a problem similar in nature to the caching implementations of relational databases. Decades of research have been poured into optimizing the caching abilities of databases. Can we honestly expect to see better caching in EJB containers in the near future? Of course, data caching is a simpler problem than object lifecycle management, but the question becomes: are we really gaining anything from the more complex solution? The rejection of IBM's Component Broker in the marketplace would indicate that object lifecycle management is not a feature that developers and system architects are clamoring for.

 - Distributed transaction management - is not needed for most transactions (local transactions can be handled by the database itself).  When you need a distributed transaction, you don't have to depend on EJBs. You can use reliable messaging systems (such as IBM's MQSeries), or use the Java Transaction API (JTA), which is an independent specification and is available to servlets as well.

 - Component Interfaces - The EJB framework dictates that Enterprise Beans can only be accessed through their Home or Remote interfaces via the JNDI service, which adds overhead. A better approach may be to use regular JavaBeans, which don't impose this limitation.

 - Session Management - good thing provided by the container, it is also available to Servlets (you don't need EJBs for that).

 - DB Connection Pooling - good thing provided by the container, it is also available to Servlets (you don't need EJBs for that).

 - Fine granularity access Control / Security - good thing, but fine granularity (on the method level) is not required by most applications. The security provided by the container to servlets is usually enough, and you don't need EJBs for that. When you need more, you can use other ways to enforce security (Kerberos, Netegrity, etc.).

 - Rapid Development - a true feature if you consider a stand-alone application. But the inflexibility of EJBs and the difficulties of integrating them with other systems may in fact make the development time longer (not shorter !). And further evolution is very difficult (as for all tightly-coupled systems). A better approach to RAD (Rapid Application Development) is to use pre-built services instead of pre-built components. IBM's MQSI v2 (messaging) and webMethods B2B (webmethods.com - xml) are two examples where true rapid application development can be achieved when the required application services are available. The use of XML as a common data representation and XML-RPC (whether SOAP or not) as a common communication mechanism simplifies the evolution of our systems and thus offers significant time-to-market benefits in the long run.

 - Portable Deployment - not true.  Different containers still don't allow portability of EJBs between them, because they differ in many pretty-basic aspects (for example, the find methods on EJB Home interfaces and the O/R mapping tools).

 - Third-party Components - not limited to EJBs. Most EJB vendors make their components also available in other forms such as plain class libraries or JavaBeans. Those components can also be built into standalone services providing an XML-based interface.

 - Strong Vendor Support - Yes. But it also exists for other technologies.

 - Successful Adoption - cannot be judged until a few years down the road, so we have not seen this yet.

If SOAP does emerge as a viable platform for distributed computing, we will most likely see an effort to facilitate EJB-SOAP interoperability. Another aspect of container-managed systems that is interesting is the CORBA 3 specification which includes support for CORBA components (and promises more bells and whistles than EJB - four types of Components in place of two types of Beans, for starters).

 
Why Servlets are recommended

There are many ways to offer services.
In many cases you can use a web server model (webserver-script-database).
In others you can run your own server (C++ or Java servers).
Probably the best way to build a server is to use Servlets, because they can take advantage of functionality provided by commercial application servers (such as session management, fail-over, load-balancing).

There are two types of servlets:
  - typical servlets (request-response)
  - daemon servlets - serve as services.

Unlike EJBs, Java Servlets are free to start their own threads. This allows servlets to manage event loops that can handle requests from other transports such as MQSeries. A servlet can use the init() and destroy() methods of an HttpServlet to start and stop threads. Thus, a servlet can start a separate thread for each additional transport that it intends to support. If all the transports carry XML/SOAP messages, you can use the centralized XML-processing functionality of the servlet for all the transport interfaces.
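The daemon pattern itself is not Java-specific, so here is a sketch of it in Python rather than against the real Servlet API. The class name DaemonService and the use of in-process queues as stand-ins for transports such as MQSeries or TCP are assumptions of the sketch; init() and destroy() merely mirror the servlet lifecycle methods mentioned above. Each transport gets its own listener thread, and all of them feed one shared handler.

```python
import queue
import threading


class DaemonService:
    """Owns one listener thread per transport; all share one handler."""

    def __init__(self, transports: dict):
        self.transports = transports   # name -> queue.Queue standing in for a transport
        self.handled = []
        self._lock = threading.Lock()
        self._threads = []

    def handle(self, transport_name, message):
        # Central processing shared by every transport.
        with self._lock:
            self.handled.append((transport_name, message))

    def _listen(self, name, source):
        # Event loop for a single transport.
        while True:
            message = source.get()
            if message is None:        # sentinel sent by destroy()
                return
            self.handle(name, message)

    def init(self):
        # Start one thread per transport, as a servlet could do in init().
        for name, source in self.transports.items():
            thread = threading.Thread(target=self._listen, args=(name, source))
            thread.start()
            self._threads.append(thread)

    def destroy(self):
        # Stop every listener and wait for it to finish.
        for source in self.transports.values():
            source.put(None)
        for thread in self._threads:
            thread.join()


mq = queue.Queue()
tcp = queue.Queue()
service = DaemonService({"mq": mq, "tcp": tcp})
service.init()
mq.put("<request id='1'/>")
tcp.put("<request id='2'/>")
service.destroy()
```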

Session support - 2 types of sessions:
   - transient per-client servlet sessions can be persisted temporarily, although at a fairly high cost.
   - per-application context information can be stored through the lifetime of a servlet and can be accessed by other servlets that belong in the same logical application

The Servlet Container also provides access to a JNDI server where the application service can publish a reference to itself so that clients and other services can locate it through a JNDI lookup instead of a URL.

Developing application services as servlets also has its drawbacks. This model makes the application service dependent on the Servlet Container to provide a runtime environment. This may add complexity that the application architecture does not need. So, if a service does not utilize any of the Container's facilities, it may be better to run the service as a standalone process.

 
XML vs. binary formats

Using XML is generally slower than using binary formats, especially with the DOM API (the SAX API has been shown to be pretty fast).

DOM is in fact not the best representation for data-oriented XML since it supports many intricacies that only practically apply to full-featured XML documents. JDOM (http://www.jdom.org/), an emerging API for XML parsing in Java, alleviates some of these issues by making the representation more appropriate for XML data. Another approach may be a HashTree-based XML API that will optimize performance.
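To show the difference between the two parsing styles, here is a small Python sketch using the standard library's DOM (xml.dom.minidom) and SAX (xml.sax) modules. The sample document and the OrderCounter handler are invented for the example. The DOM version builds the whole tree in memory before you can ask anything; the SAX version just reacts to events as the parser streams through the data.

```python
import xml.dom.minidom
import xml.sax

DOCUMENT = b"<orders><order id='1'/><order id='2'/><order id='3'/></orders>"

# DOM: the complete tree is built first, then queried.
dom = xml.dom.minidom.parseString(DOCUMENT)
dom_count = len(dom.getElementsByTagName("order"))


class OrderCounter(xml.sax.ContentHandler):
    """SAX: no tree is kept, we only count start tags as they stream by."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "order":
            self.count += 1


counter = OrderCounter()
xml.sax.parseString(DOCUMENT, counter)
sax_count = counter.count
```

For data-oriented messages, where you only need a few values out of the message, the event style is often enough.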

XML data takes more space and increases bandwidth requirements. But the cost of bandwidth is usually much less than cost of software development and maintenance.

XML is not the cure-all solution. In some situations it is impossible to reach required speed. Some 3rd party applications simply don't have XML interfaces.


 
Distributed transactions

The "traditional" transaction (not distributed) is simply a set of operations which should be either successfully performed together - or not performed at all. The simplest example - a money transfer between 2 accounts. It involves 2 actions: removing money from one account and adding money to the other. Imagine that the computer loses power in the middle of this process.  The money was removed from the 1st account - but was never added to the second. This is an error. How to prevent it? It is simple. We will record all steps of the transaction in a log file. This way after restarting the system can read the log and successfully finish the transaction (commit) or cancel all the changes (roll back).  This was a simple explanation of something called  "Transaction Protocol" (TP).  TP should comply with 4 fundamental properties, usually denoted ACID: Atomicity, Consistency, Isolation, Durability.
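The money transfer example can be tried with Python's built-in sqlite3 module, which is used here only because it needs no setup. The accounts table, its CHECK constraint, and the transfer() helper are assumptions of this sketch. The second transfer fails half-way (after the credit, on the debit) and the database rolls back both steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # debit would go negative, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```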

Transactions may be nested (one big transaction includes several smaller transactions, and failure of any one of them would roll back the whole big transaction).  There are some standards and specifications (ISO, OMG, JTA - Java Transaction API, etc.) for transaction protocols (basic decisions concerning the nested transaction models (open / closed subtransactions), the set of service primitives and their roles, etc.).

When individual actions of a transaction run on different systems, we are dealing with distributed transactions (DT).
Example 1: money transfers between remote accounts (between different banks).
Example 2: data replication between corporate directory, Outlook, some sales CRM package, Web Authorization database, etc.

DT allows individual actions to run simultaneously (in parallel); for some transactions this can be used to increase the speed.

Distributed transactions can be governed by different protocols. One of the simplest and most commonly used is the two-phase commit (2PC) protocol.   The 2PC protocol uses a central "transaction monitor" process. It goes like this: first, all changes required by a transaction are stored temporarily by each database. The transaction monitor then issues a "pre-commit" command to each database, which requires an acknowledgment. If the monitor receives the appropriate response from each database, the monitor issues the "commit" command, which causes all databases to make the transaction changes permanent.
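The flow of 2PC can be sketched in a few lines of Python. This is a toy, in-memory model only: the Participant class, its can_commit flag, and the two_phase_commit() function are hypothetical names, and a real monitor must also deal with time-outs, crashes and logging, which are left out here.

```python
class Participant:
    """Stands in for one database taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # simulates whether pre-commit succeeds
        self.pending = None
        self.committed = []

    def prepare(self, change):
        """Phase 1 (pre-commit): store the change temporarily and vote."""
        if not self.can_commit:
            return False
        self.pending = change
        return True

    def commit(self):
        """Phase 2: make the stored change permanent."""
        self.committed.append(self.pending)
        self.pending = None

    def rollback(self):
        """Phase 2 alternative: throw the stored change away."""
        self.pending = None


def two_phase_commit(participants, change):
    """The monitor: commit everywhere only if every participant voted yes."""
    votes = [p.prepare(change) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"


bank_a = Participant("A")
bank_b = Participant("B")
bank_c = Participant("C", can_commit=False)

first = two_phase_commit([bank_a, bank_b], "transfer 10")
second = two_phase_commit([bank_a, bank_c], "transfer 20")
```

In the second call bank_a had already stored the change, but it is discarded because bank_c voted no.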

You may define your own transaction protocol (TP) to custom fit your transactions. Do you need nested transactions? Do you need parallel processing? Some TPs don't have a central monitor, but instead have a truly distributed system. Some TPs also define the communication method they use (for example, XML messages). For example, a TP for a system where all parts are almost never available simultaneously should be different from a standard banking two-phase commit system: it needs different time-outs, and messaging is probably a requirement.

Check out the links below. Or for more reading search Internet for "distributed transaction" or "two phase commit".

- http://ei.cs.vt.edu/~cs5204/fall99/distributedDBMS/duckett/tpcp.html
- http://aspn.activestate.com/ASPN/Mail/Message/xml-dev/755432
- http://www.computer.org/proceedings/dexa/7662/76620100abs.htm
- http://www.vermicelli.pasta.cs.uit.no/ipv6/students/andrer/doc/html/
- http://java.sun.com/products/jta/ - The Java Transaction API (JTA) 1.0.1 Specification
 

Some books:
 - Data Replication : Tools and Techniques for Managing Distributed Information - by Marie Buretta
 - Principles of Distributed Database Systems - by M. Tamer Ozsu, Patrick Valduriez
 - Transaction Management : Managing Complex Transactions and Sharing Distributed Databases - by Dimitris N. Chorafas
 - Transaction Processing : Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems) - by Jim Gray, Andreas Reuter
 - Distributed Algorithms (Data Management Series) - by Nancy A. Lynch

Transaction:
 
 - Atomicity: All updates are successful or no updates are successful. Must support commit and rollback, may support savepoints. Do one transaction at a time. Or, for parallel - what should be read: the cache (or rollback segment) or the database?

 - Consistency: Each transaction leaves the database in a consistent state. Constraints are satisfied.

 - Isolation: Concurrent transactions have the same effect as if they were run one at a time. Uncommitted changes are hidden from other transactions. Changes from other transactions are hidden from the app.
  Problems:
    - Lost update. A.read B.read B.write B.commit A.write (thus the changes written by B are lost)
    - Dirty read. A.read A.write B.read A.rollback (thus B doesn't have correct info)
    - Unrepeatable read. A.read B.write A.read.
    - Phantom problem. A.read B.insert A.aggregate (like sum) B.rollback. A.commit.

 - Durability: Changes should stick. 
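The "lost update" sequence from the list above is easy to reproduce without any database. The Python sketch below replays it step by step, and then shows one common remedy, a version check on write (optimistic locking). That remedy is not described on this page; it is brought in here only as an illustration, and VersionedValue is a made-up class.

```python
# Replaying: A.read B.read B.write B.commit A.write
balance = 100
a_seen = balance          # A.read
b_seen = balance          # B.read
balance = b_seen + 50     # B.write, B.commit (deposit 50)
balance = a_seen - 30     # A.write (withdraw 30) silently overwrites B
lost_update_balance = balance   # B's deposit has disappeared


class VersionedValue:
    """A value that refuses writes based on a stale read."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        if expected_version != self.version:
            return False          # somebody else wrote since our read
        self.value = new_value
        self.version += 1
        return True


record = VersionedValue(100)
a_val, a_ver = record.read()
b_val, b_ver = record.read()
b_ok = record.write(b_val + 50, b_ver)         # B writes first
a_ok = record.write(a_val - 30, a_ver)         # A's stale write is refused
a_val, a_ver = record.read()                   # A re-reads and retries
a_retry_ok = record.write(a_val - 30, a_ver)
```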


 
Deadlocks

Example: what NOT to do.
Imagine that you have 2 processes reading/writing data from/between databases A & B. Imagine further that reading and writing puts locks on the tables. Imagine that one process has locked A, but couldn't get a lock on B - and vice versa, another process got B and is waiting for A.  What you have is a deadlock (each process holds a lock the other one needs). The processes may wait for each other forever.
 
 
 Solutions:
 - Avoid locks where possible (for example, use "dirty reads").
 - Don't allow locking of more than one resource/process at a time.
 - Don't hold and wait. Put limits on the length of the processes.
 - If you can't get a lock - release everything - and try again. If you can't get a lock after N attempts - report the problem or do something else (for example, put your request on a queue of some sort).
 - Institute rules helping to avoid deadlocks. For example, always allocate resources in the same order.
 - Use a "Monitor" process to oversee and allow locks in the system.
 - Use a "Cleaner" process which should wake up every couple minutes to resolve deadlocks (by killing one of the processes creating them).
 - Use messaging to pass request and responses. This way user processes can never get a lock on your data. Instead they are forced to go through your messaging API. You control the API and make sure it does not cause problems. 
 - Review users' SQL. Better yet force users to use only stored procedures which are tested and reviewed. 
 - Users going from MS Windows platform (especially via ODBC) can be very dangerous. You will have to kill their processes pretty often. Make sure they identify themselves in their ODBC settings.

Let's apply these principles to the A/B example above, namely let's forbid simultaneous locking of more than one resource. If one job can't lock A and B simultaneously, we need to alternate between A and B.  We can first read a little from A, release the lock on A, then get a lock on B and write there. Repeat this process as many times as needed.  Or we can first read everything from A into a temporary table, then release the A-lock. After that we can work between this temporary table and B.  Yet another approach would be to avoid locks on reading by using "dirty reads".
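The rule "if you can't get a lock - release everything - and try again" from the list of solutions can be sketched with ordinary thread locks in Python. The helper names acquire_all() and release_all(), the number of attempts, and the wait time are choices made for this example; database table locks would follow the same logic.

```python
import threading
import time


def acquire_all(locks, attempts=5, wait=0.01):
    """Take every lock or none: on any failure release what we hold and retry."""
    for _ in range(attempts):
        taken = []
        for lock in locks:
            if lock.acquire(blocking=False):   # never wait while holding locks
                taken.append(lock)
            else:
                break
        if len(taken) == len(locks):
            return True
        for lock in reversed(taken):           # release everything
            lock.release()
        time.sleep(wait)                       # give the other job a chance
    return False                               # after N attempts, report the problem


def release_all(locks):
    for lock in reversed(locks):
        lock.release()


lock_a = threading.Lock()
lock_b = threading.Lock()

got_both = acquire_all([lock_a, lock_b])       # nobody else holds anything
release_all([lock_a, lock_b])

lock_b.acquire()                               # another job is holding B
blocked = acquire_all([lock_a, lock_b], attempts=3)
a_still_held = lock_a.locked()                 # A was given back, so no deadlock
lock_b.release()
```

Always passing the locks in the same order (A before B) also follows the "allocate resources in the same order" rule.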


 
Simple locking with a queue
 
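# Pseudocode: every lock request is a row in a shared "locks" table.
# The first (oldest) row for a table holds the lock; the rest wait in line.
sub lock {
  delete locks where timestamp older than 15 sec;   # expire stale locks
  insert lock (tabname, my_pid, timestamp);          # join the queue
  for (1 .. 10) {                                    # wait up to about 10 seconds
    if (my_pid == pid of first lock for tabname)
      { return $success }
    else
      { sleep(1); next }
  }
  delete lock (tabname, my_pid);                     # give up and leave the queue
  return $error;
}

sub unlock {
  delete lock (tabname, my_pid);
}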
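For those who want to try it, here is one possible runnable version of the same idea in Python with an in-memory sqlite3 database. The table layout (id, tabname, pid, ts), the function signatures, and the use of an auto-increment id to decide who is first in line are all assumptions of this sketch. In real use the locks table would live in a database shared by all the processes.

```python
import sqlite3
import time

LOCK_TTL = 15  # seconds after which a lock row is considered stale


def open_lock_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "tabname TEXT, pid INTEGER, ts REAL)"
    )
    return conn


def unlock(conn, tabname, pid):
    with conn:
        conn.execute(
            "DELETE FROM locks WHERE tabname = ? AND pid = ?", (tabname, pid)
        )


def lock(conn, tabname, pid, attempts=10, wait=1.0):
    """Join the queue for tabname; succeed once our row is first in line."""
    now = time.time()
    with conn:
        conn.execute("DELETE FROM locks WHERE ts < ?", (now - LOCK_TTL,))
        conn.execute(
            "INSERT INTO locks (tabname, pid, ts) VALUES (?, ?, ?)",
            (tabname, pid, now),
        )
    for _ in range(attempts):
        first = conn.execute(
            "SELECT pid FROM locks WHERE tabname = ? ORDER BY id LIMIT 1",
            (tabname,),
        ).fetchone()
        if first is not None and first[0] == pid:
            return True
        time.sleep(wait)
    unlock(conn, tabname, pid)   # give up and leave the queue
    return False


conn = open_lock_table()
first_try = lock(conn, "orders", 101)                         # queue is empty
second_try = lock(conn, "orders", 202, attempts=2, wait=0.01)  # 101 is ahead
unlock(conn, "orders", 101)
third_try = lock(conn, "orders", 202, attempts=2, wait=0.01)   # now first in line
```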


 
Misc. links

* www.ittoolbox.com - One of the children sites is EAI.Toolbox ( http://eai.ittoolbox.com/  )

--------------------------------------