Tuesday, March 31, 2009

Does BI Go Together With SOA, or Should They be Considered Separate Projects?

As I ponder this question of whether SOA and BI go together or not, I am reminded of a concept I learned during my undergraduate studies about process control. Fundamentally, there are two types of processes: Open loop and Closed loop. Open loop processes are those in which the process executes from start to finish based only on the inputs to the process. In contrast, closed loop processes not only take into account the process inputs but continuously observe the process outputs and make dynamic adjustments aimed at improving process efficiency, correcting errors, or both.

As we already know, SOA is an architectural style that strives for business and IT alignment. SOA by itself is an open loop process because it achieves this alignment based only on the current business state and lacks the feedback mechanism to constantly ensure and optimize this alignment once it has been achieved. That is where BI fits in. BI is the broad category of applications and technologies that gather, store, analyze, and provide access to data aimed at helping the enterprise make better business decisions. These “better” decisions are what “close” the open loop SOA implementation by providing the feedback to ensure the continuous alignment between the business and IT. Click here to see the same concept pictorially.

So, although SOA and BI are fundamentally different, they can be very effective together since they both ultimately strive for business process efficiency, albeit in their own way. Now whether they are implemented together in the same program, as separate projects, or as subprojects of one project is purely an implementation decision that each organization must make for itself based on its individual capabilities.

Thursday, March 19, 2009

Do You Think IBM Is Really Going to Buy Sun Microsystems?

Although, the proposition of IBM acquiring Sun seems very attractive on the surface, I don't see the acquisition going through due to problems along two lines: Technology and Culture. Let's take a look at each one:

1. Technology related
Each company has a strong suite of products (Sun with SPARC, Solaris, Java, etc. and IBM with System z, AIX, WebSphere, etc.) with a strong following and customer base. These products are different enough to present serious challenges in creating a unified, consistent technology platform in a combined company.

2. Culture related
By far the biggest problem in the merger of two huge companies is going to be integrating the organizations, people, and processes. Ultimately, a dysfunctional culture in the resulting company might outweigh any potential benefits from synergies.

Personally, I always cringe when competition is reduced by M&A. We're seeing what's going on with the banks becoming too big to fail while paradoxically being too big to manage as well. Would IBM + Sun equate to the same?

Monday, March 16, 2009

Does SOA Increase Security Risks?

SOA is an architectural style that is being used in most modern day system implementations with great expectations. A question that many have, though, is how secure are these SOA-based systems? Are they any better or worse off than their non-SOA counterparts? In my opinion, there are 3 main reasons why SOA based systems might be more insecure compared to their non-SOA brethren:
  1. The first reason is what I call "SOA Security Proximate Cause Syndrome". Proximate cause is a legal term that allows one to link the effect of one action as the cause of another action. Although, there is no written rule that states that SOA systems must be distributed, the fact remains that SOA is the preferred architectural style for complex systems and complex systems tend to be distributed. Distributed systems in turn tend to have a higher "surface area". The more surface area of a system, the more vulnerable it becomes. Thus, the distributed nature of SOA systems becomes the proximate cause of their potential higher insecurity.
  2. The second reason is what I call the "SOA Security Paradox". An SOA is by its very nature designed to be highly flexible, extensible, and maintainable. Now, think about the classic principle of security “Security through Obscurity". There in lies the paradox -- a conflict between the inherent goal of SOA and the implication of this goal on security.
  3. The third reason is poor SOA governance. In the absence of strict governance (design and runtime) SOA systems tend to suffer from service proliferation similar to a virus spreading through its host. These unchecked services often open previously unthought-of of security loopholes. As an example, consider a service that is always called by a client on the extranet through an authentication service. A new "rogue" i.e. "ungoverned" service on the intranet calls this same service without the use of the authentication service. Now, consider what happens if this new "rogue" services is called by the extranet client. Oops! Did we just bypass the authentication service? This simplistic example plays out more often than one might think.

So, is an SOA system inherently insecure? In principle it shouldn't be but our experience in practice has proven otherwise.

* Originally posted on ebizQ Forum on March 11, 2009

KISS Your Web Services

In my post title "WS-Confusion", I talked about the state of confusion that many professionals dealing with Web Service technology are in. Well, that blog entry stirred up some interest and I ended up writing a follow-up article about the issue. The article titled "KISS Your Web Services" is available here.

WS-Confusion?

There’s absolutely no doubt about it… Web Services are hot and are here to stay. XML Schemas, SOAP, and WSDL are all indispensable while working with Web Services. Yes, to some extent (and in some form) UDDI too. And let’s not forget the Security related specifications such as XML Encryption, XML Digital Signatures, and WS-Security, which are quite useful when Web Service boundaries extend beyond the corporate firewall. As a consultant and an architect, I have implemented and audited/assessed complex business software systems that leverage Web Service technology as a core part of their architecture. The specifications that I mentioned above are pretty much all that I have used/seen used. Furthermore, all the Web Services have always been over HTTP/HTTPS. So what about all the other Web Service Specifications such as WS-Transaction, WS-Routing, WS-Reliability, WS-ReliableMessaging, WS-BPEL, WS-Notification, WS-Eventing, WS-AtomicTransactions, WS-Coordination, WS-SecureConversation, and so on? While I agree with the theory of these specifications and that most of them are very well written, my question is: Are we making Web Services more complex than they need to be? I am very interested in knowing if any one of you have used or seen these or other WS-specifications used in real-world (existing) systems?

Client Side Data Validation: A False Sense of Security

Microsoft defines a Web Application (WebApp) as a software program that uses HTTP for its core communication protocol and delivers Web-based information to the user in the HTML language. Such applications are also called Web-based applications. Although one could create a custom client for such an application, most applications will leverage an existing web browser client such as Internet Explorer, Netscape Navigator, Opera, etc. In this blog, I will be focusing entirely on the set of webapps that leverage a browser on the client side.

There are many benefits of creating a web application. A few of them are:
  • The ability to leverage existing communication infrastructure and protocols
  • The ability to leverage existing client side software (browsers) thus reducing the total development time and related costs.
  • Reduced client-side deployment costs. For most webapps the only software required by a client is a compliant browser.
There is no such thing as a free lunch and webapps are no exception. One of the major cons of webapps is the loss of control over the client software and environment. For example, if a webapp is designed for public access then it may be accessed from machines running different versions of the same browser, different versions of different browsers, different operating systems, and different hardware devices (such as kiosks, cell phones, and PDAs). It is also likely to be accessed by a type of user we affectionately refer to as a “hacker”. The primary objective of a hacker is to gain illegal access and control over your webapp and either cause it to malfunction/crash or expose sensitive data.

That’s where data validation and a properly designed validation framework fit in. No, this is not a typo or misprint. Security details such as SSL, certificate management, firewalls, etc. are important, but provide only the icing on the cake. They don’t guarantee that the cake (i.e. your webapp) is baked well. What I mean is that while these “details” may make it harder for a hacker to find holes in your webapp, they don’t seal the loopholes themselves. That is, they only delay the inevitable hacking of your application. Take SSL as an example. A common misconception is that SSL provides web application security. The fact is that it does not. SSL is used only to encrypt the data between the web browser and the web server, and thus prevents eavesdropping. SSL has no knowledge of your webapp and hence provides no security to it.

As a software consultant, I’ve had the opportunity to not only design and implement webapps but to assess/audit many webapps as well. I often encounter web pages within the application that are very sophisticated with lots of client side JavaScript that perform all kinds of checks on the data entered by the user. Even the HTML elements have data validation attributes such as MAXLENGTH. The HTML form is only submitted upon successful validation of all the data entered. The server side happily performs the business logic once it receives the posted form (request).

Do you see the problem here? The developers have made a big assumption of “control” here. They assume that all users of the webapp will be equally honest. They assume that all users will always access the webapp through the browser(s) that they (the developers) have tested upon. And so on. What they have forgotten is that it is very easy to simulate browser-like behavior through the command line using freely available tools. In fact, almost any “posted” form can be sent by typing in the appropriate URL in the browser window; although an easy way to prevent such “form posting” is to disable GET requests for these pages. But there is no way to prevent any one from simulating or even creating their own browser to hack into your system!

The underlying problem here is that the developers have failed to recognize the main difference between client side validation and server side validation. The main difference between the two is NOT where the validation is occurring such as on the client or on the server. The main difference is in the purpose behind the validation.

Client side validation is merely a convenience. It is performed to provide the user with quick feedback. It is performed to make the application appear responsive and give the illusion of a desktop application.

Server side validation, on the other hand, is a must for building a secure webapp. It is done to ensure that all data sent to the server from the client is valid data, no matter how the data was entered in on the client side.

Thus, only server side validation provides real application level security. Many developers fall into the trap of a false sense of security by performing all data validation only on the client side. Here are two examples to put things in perspective

Example 1
A typical “Logon” page has a textbox to enter a username and another textbox to enter a password. On the server side, one may encounter some code in the receiving servlet that constructs a SQL query of the form "SELECT * FROM SecurityTable WHERE username = '" + form.getParameter("username") + "' AND password = '" + form.getParameter("password") + "';" and execute it. If the query comes back with a row in the results then the user successfully logged in, otherwise not.

The first problem here is the way that the SQL is constructed, but let’s ignore that for this blog. What if the user types in a username such as “Alice’--“? Assuming that there is a user named “Alice” in SecurityTable, the user (or shall we call her “hacker”) successfully logs in. I’ll leave finding out why this happens as an exercise for you.

Some creative client side validation can prevent normal users from doing this from the browser. But what about the case where JavaScript is disabled on the client or for those advanced users (hackers) who can use another “browser like” program to send direct commands (HTTP POST and GET commands)? Server side validation is a must to prevent something like what was described above from happening and hence plugging a security hole in the webapp. SSL, firewalls, and the likes won’t help you here.

Example 2
A typical “User Registration” page for a public but limited access webapp includes several textboxes for user identification and authentication information such Name, SSN, Date of Birth, and other relatively uniquely known information. Once the user has proven her identity, she is issued a username and password to access the protected parts of the system. In special cases, the user may be prevented from directly registering and an “Administrative” user (i.e. the administrator) may have to register the user instead. The administrator uses the same registration page, but checkmarks a special checkbox on the page that tells the system (receiving servlet) to bypass all [special] checks and directly register the user. The JSP that renders the HTML page is smart enough only to include the checkbox if the user accessing the page is an administrator.

So far so good? Not unless there is some server side validation in the receiving servlet. The receiving servlet checks to see if the “bypass checks” parameter is present and if it is then it bypasses all special checks and registers the user. But it must also check to see if the logged in user is an administrator. Even though the JSP page did that when it rendered, the receiving servlet must perform the check again. In this case, the JSP check can actually be considered as part of the client side validation. It was merely done for convenience. After all, we don’t want to confuse regular users trying to register with the extra checkbox, since no matter what they select (checked or unchecked), it’s not going to make a difference. This is because regular users do not have the authority to bypass “authentication” checks. Furthermore, it would not be an impossible task for a hacker to figure out what the name of the checkbox was and manually issue a registration request with the checkbox name included in the request (with its value set to “checked” of course). Therefore the receiving servlet must check the identity of the logged on user (if there is a user logged on) and only allow an administrator to bypass special checks in the registration process.

Conclusion
Remember, client side validation is for convenience and server side validation is for security. You must always perform at least as much validation on the server as you perform on the client. All properly designed validation frameworks, such as the Struts Validation Framework, handle this for you. Feel free to leave me a comment and let me know your thoughts…

Model Driven Architecture - Hype Vs Reality

If you’ve been keeping up with your daily doze of buzzwords then you’ve most likely heard of MDA. MDA stands for Model Driven Architecture and is required knowledge for cocktail discussions.

The Concept
The concept behind MDA is certainly not new and is quite simple in theory. In its most rudimentary form, it is the all too familiar code generation that has been offered by leading software modeling tools such as Rational Rose and the Together family of products. These tools allow you to model your system as a series of packages and classes (for Java) and then generate the skeletal code based off of these models. They also offer something called “round trip engineering” in which you can make changes to your code and import these into your models thus keeping your models in sync with your code. Although a very noble concept, I have rarely seen it fully implemented in real projects. Even if a project did implement “round trip engineering”, I would question the benefit provided for the cost of doing so.

MDA is more than code generation. It is a formalization of several concepts by the Object Management Group (OMG). The OMG is best known for its distributed object model specification called CORBA and the widely used modeling language called the Unified Modeling Language or UML.

The Lingo
At the very core of MDA is the concept of a model. A model is an abstraction of the end (or target) system. It serves as a prototype and a proof-of-concept. MDA defines two types of models. A Platform Independent Model (PIM) is one that describes the target system without any details about the specifics of the implementation platform. On the other hand, a Platform Specific Model (PSM) describes the target system on its intended platform, such as J2EE, .NET, CORBA, etc. The process of converting a PIM into a PSM is called transformation. A model (PIM or PSM) is written in a modeling language. The OMG does not restrict MDA to any particular language. However, the modeling language must be well defined, which means that it must be precise to allow interpretation by a computer. Therefore, a human language such as English is not an option (at least for the foreseeable future). An example of a good modeling language is (obviously) the UML.

Is it just Hype?
As I mentioned earlier, MDA is not a novel concept. We already talked about code generation, but database designers have been using a form of MDA for a long time. ErWin by Computer Associates is a CASE (Computer Aided Software Engineering) tool that provides MDA abilities to database designers and administrators. ErWin allows you to define your database design using a logical model. A logical database model is free from any database vendor specific details. In MDA terminology the logical model is a PIM. ErWin automates the process of converting the database agnostic logical model into a database specific (such as Oracle, SQL server, etc.) model. This database specific model is known as the physical model by database designers and as a PSM in MDA lingo. As I mentioned earlier, the process of converting a PIM into a PSM is called transformation. Finally, ErWin can be used to generate the SQL code (DDL) to create the database structure (tables, views, indexes, triggers, etc.) for the targeted database. Based on the definition of MDA and the capabilities offered by the tool, ErWin is an MDA tool. In this case the modeling language is the well defined E/R diagramming notation. So there is no doubt that MDA tools are possible. However, there are several hurdles to overcome before such tools become mainstream for general purpose software development, and especially for custom development.

Two such hurdles include:

Transformation Complexity Although designing a properly normalized database that meets the business needs and application performance demands is a non trivial task, the process of converting the database PIM into a PSM and the PSM into SQL is fairly mundane. Transforming complex class and interaction diagrams is a more involved and [possibly] artistic process with many possible alternatives, each one with its own set of pros and cons.

Language Expressiveness Once again, the simplistic E/R diagramming notation is sufficient for describing complex database diagrams mainly because the complexity is not in the diagram but rather in the design decisions and tradeoffs considered while creating the diagram. E/R notation is also universally accepted as the language used by database designers for data modeling. UML, on the other hand (even with 2.0) is very controversial in its ability to support complex software interactions and is often extended with custom stereotypes and notations by software architects and designers. Even though, MDA is not tied to UML, the reality is that UML is the lingua franca of MDA.

The Reality
In my opinion, MDA tools, even with their existing limitations, have a definite place in any architect’s toolbox. But then, everything can be taken to an extreme and the same applies to MDA, which is not without its associated hypes.

Here are the two most common ones that I encounter:

MDA brings Software Architecture to the masses Remember, MDA is a tool in an architect’s toolbox. It is not the toolbox itself and it is definitely not the architect. MDA does not eliminate the need for competent and experienced architects, designers, and coders on the team. As the saying goes “Not everyone with a hammer in their hand is a carpenter”.

MDA equals Software Architecture using pictures Is this really possible? Even in database modeling, where a level of MDA is already being used, how far does MDA take database architecture? Talk with any database designer or DBA and you will quickly realize that most of their work does not really revolve around using ErWin. The same applies to software architecture in general. It involves much more than drawing pictures. In fact, one could argue that it involves too much, which is why we are still struggling to come up with a universally accepted definition of software architecture.

So, my recommendation is to use the MDA tools for what they are… tools, and stay away from the hype. Maybe the next acronym to take root will be CDA or Command Driven Architecture (coined by yours truly). You basically tell (or command) the CDA tool that you want a robust, multi-tiered architecture for handling bank transactions and the tool creates it for you. And while it’s at it, maybe it will bake you some cookies as well. What do you think?

Sunday, March 15, 2009

Service Oriented Architecture - All that glitters is not gold

I was in a quandary about what my first blog should be ever since JavaWorld approached me with the idea of a Java Design blog. It was a few days later as I was talking with a good friend of mine (after a couple of sets of some rigorous tennis) that he happened to describe the architecture of a product that he works on at his job. Without really thinking, I blurted out “Oh, it’s a service-oriented architecture”. As it turns out, it was not, which is when the idea of writing about the term service-oriented architecture came to mind. Thanks, Kurt.

The phrase “Service-oriented Architecture” is probably by far one of the most used and abused buzzwords today. It is abused not because people don’t know what the term means, but because they are too generous in its application. Although this may seem paradoxical, as you’ll see in a moment, it’s really not so.

First, let’s define a service-oriented architecture or SOA as it’s commonly abbreviated and referred to in conversation and literature. In its simplest form, an SOA is any architecture that can, at least on a logical level, be decomposed into three categories of components: a service, a provider of the service, and a consumer of the service.

Here’s the catch: almost any software application with a basic level of object orientation can be described in such a way, even if the designer of the application had never heard of SOA! The problem with this definition is that it is too vague and does not imply any level of sophistication in the application [architecture]. So how do you know if an architecture that appears to be an SOA actually is an SOA?

Here are four litmus tests that I typically use:

  1. Does the architecture explicitly address how service consumers find service providers? This test focuses on loose coupling between service providers and consumers. Typically, this test is satisfied by an implementation of the Factory design pattern as described by the Gang of Four. One way of achieving this within the bounds of J2EE is registering service providers in a JNDI directory. A better way would be to implement the Service Locator pattern as described in Core J2EE patterns.
  2. Is each service provided by a provider explicitly bounded by an input and output contract? Once again, this test focuses on loose coupling. However, in this case, we are concerned about the coupling between the service and its provider and consumers. One way of satisfying this test within J2EE is to start each service implementation with two interface definitions: one interface encapsulates all the input parameters and the other one encapsulates the output. Web Services achieve this by using well-defined SOAP (XML) messages that specify the input and output, and by providing a well documented description of these using the Web Services Description Language (WSDL).
  3. Does the architecture explicitly address location and distribution transparency? Test #1 described above gets us part of the way there. However, this test focuses more on the quality of service (QoS) characteristics of the architecture, such as service availability, fault-tolerance, and the ability of achieving performance and load scalability through server load balancing, server farms, and distribution/deployment across multiple tiers.
  4. Are the services really just objects with another name? This test probes the architecture to see if it was actually designed as an SOA or simply labeled one for better marketing exposure. Services are not distributed objects. Objects are by definition stateful i.e. they encapsulate some state and provide methods to manipulate that state. Services, on the other hand are stateless. The input message has all the information that the service needs to perform its task and the output message has all the information the client needs back from the service. Thus, the interaction of a service consumer with a service is in the form of a single call rather than an orchestration of multiple calls as it is with a regular object.
I am almost certain that there are details about an SOA that I have missed in this blog. I would love to hear from you about your experiences with SOAs, both positive and negative, and about tests that you have used to weed out SOA imposters.