Wednesday, July 4, 2007

In search for Data Provider

In .NET, access to Oracle in accomplished via so called Data Providers. To a developer - they are merely assemblies exposing objects and their methods to connect, run a query, traverse result set, etc. A few providers come with .NET; Oracle supplies their own; and a number of third party ones exist.

Picking a right data provider is essential for applications deployed to end users, there are several options to consider:


Third party providers do not quite fit for this project.

The product is a simple (at least from user perspective) self-contained utility. There should be as little external dependencies as possible, ideally zero. Shipping with a third party assembly may require extra configuration. And any bugs in their code will be seen as flaws of the application. Depending on other company's will to fix bugs may ruin product's creditability - I'm not taking this chance.

For reference, not all third party providers introduce extra dependencies. As an example, OraDirect for .NET reduces dependency to Oracle client software by offering a direct TCP database access (similarly to Java type-4 thin driver).


Oracle supplied provider does not fit for the same reason
To use the application, users must download a 450M Oracle client CD and install support for .NET. A bit of overkill for a little program like this. Think I'll be having hard time to convince people to install my program - not talking about registering on OTN, downloading the multi-meg file, unzipping it, running installer, navigating to .NET assembly and installing it. They'll forget the installer's option to pick by the time they get there.

I really regret this because Oracle provider is such a luxury. In addition to the basic features every provider has, it adds tons of useful functionality including a few critical options. Oracle Corporation really should stop thinking big projects and enterprise clients. [Added after July 11, 2007: Oracle just announced "Oracle 11" which supports instant client for .NET". This should address all Oracle Provider issues below - after it is released officially].


We are left with providers coming along with .NET framework. All of them work out of the box and need no extra configuration.

Microsoft Data Provider for Oracle
This was my choice previously, but this time it didn't work. Here is why:
  • SYSDBA connections are not supported.
    Connect in SYSDBA is accomplished by adding "DBA Privilege=SYSDBA" to the connection string. To a surprise, attempts to do so yields "keyword is not supported" exception. Searched Google and yes, Microsoft provider does not support AS SYSDBA.

    I'm working on a system utility to be used by DBAs. There are big chances the DBAs would want to connect as SYS or other SYSDBA user - and what do I tell them? "Sorry, because of limitation of Microsoft Data Provider, SYSDBA connections are not supported"? People do not care who's fault it is, they'll just march away.
  • Can't cancel running statements
    Not widely used feature, but crucial in my case. The application calls DataPump procedure get_status() waiting indefinitely for an event of interest to occur. Now, when user wants to stop or pause the job, we're ought to tell Oracle to do so by calling stop_job(). But we can't - we're in the middle of get_status call.

    One workaround is to invoke get_status() in non-blocking way. The procedure takes timeout parameter; we could poll database in a loop checking cancellation flag between the calls. The question is the value of the timeout interval. Make it too short, and program will trash the database (which is already busy pumping data). Make it too long, and user will loose patience waiting for response. Saved the idea for last resort.

    Another method is to abort the running query. OracleCommand class has method Cancel() which is a bit strange: it is not guaranteed to work - and doesn't. As if this wasn't enough, there is no indication of whenever cancellation succeeded. MSDN reference reads:
    "If there is nothing to cancel, nothing happens. However, if there is a command in process, and the attempt to cancel fails, no exception is generated."
    Way to go.
  • There is no way to control fetch buffer size.
    When query is returning resultset, there is a number of exchanges between server and the client. Server fills a buffer and ships it to the client, indicating whenever more data is available. The client consumes the buffer and asks server for another chunk. This is called a roundtrip. The number of roundtrips depends on amount of data to retrieve (which we can not control) and buffer size. Obviously, the smaller the buffer, the more roundtrips are needed.

    How bad the roundtrips are? In one of the companies I worked for, a test fetch of 100,000 rows one row at a time, took 30 minutes. Changing buffer size to 5000 rows reduced the time to 2 seconds. There is no typo in the figures: the same query ran 900 times faster.

    Who controls the buffer size? The client program, by calling OCI function OCIAttrSet with either OCI_ATTR_PREFETCH_ROWS or OCI_ATTR_PREFETCH_MEMORY.

    How this function is exposed to us, end users and developers - depends on the client program or high-level library we're using. SqlPlus has parameter ARRAYSIZE. Oracle Export and Import utilities are using BUFFER. Informatica is using DatabaseArrayOperatorSize setting. Oracle provider for .NET offers property FetchSize in OracleCommand and OracleDataReader.

    Microsoft Data Provider for Oracle supplies nothing.

    This is really strange shortcoming, and I don't believe there is a technical reason for this.

    AlderPump does not fetch lots of data. Think the biggest result set comes from DATABASE_EXPORT_OBJECTS and similar tables - when creating a job.

Considering the above, the decision was to stick with OleDb provider.
The provider comes with .NET just as Microsoft Data Provider, and does not require configuration nor brings extra dependencies. Connecting AS SYSDBA works. Queries are cancelled as expected. The only limitation remaining is fetch buffer size.

So far I only bumped into one strange problem, but it was easy to fix:

When running certain anonymous PL/SQL blocks, the provider fails to bind output variables and input variables are replaced with literal values.

Here is sample program:
using System;
using System.Data;
using System.Data.OleDb;

public class test {
public static void Main() {
try {
OleDbConnection conn = new OleDbConnection();
conn.ConnectionString = "Provider=MSDAORA;User ID=scott;Password=tiger;Data Source=ora10g";
conn.Open();

OleDbCommand cmd = new OleDbCommand("", conn);
cmd.CommandText = "declare n number; begin ? := 123; end;";

cmd.Parameters.Add("?", OleDbType.Numeric).Direction = ParameterDirection.Output;
//cmd.Parameters.Add("?", OleDbType.Numeric).Value = 321;
Console.WriteLine("parameter added");
cmd.ExecuteNonQuery();
decimal d = (decimal)cmd.Parameters[0].Value;
Console.WriteLine("result: {0}", d);
} catch( Exception x ) {
Console.WriteLine(x.Message);
}
}
}

The code is throwing Oracle exception "Not all variables bound".

After several hours trying to find a workaround, I almost accidentally enclosed the block into another begin/end pair - and it worked:
     cmd.CommandText = "begin declare n number; begin ? := 123; end; end;";

More problems may be lurking around, but so far OleDb provider is working.


And yes, the ODBC Data Provider.
I didn't test this one. ODBC requires an Oracle Driver to be installed (from Oracle, Microsoft or other party), and I don't think the driver is installed on every machine. Installing it requires either MDAC (Microsoft) or Oracle Client CD (Oracle). Nah.

No comments: