Monday, August 24, 2009

AlderPump 2.2 released

First update to publicly available release 2.1 is shipping now. While its official number is 2.2, the release has a major new feature: file management. Upgrade from 2.1 is free, existing licenses continue to work.

Oracle DataPump is a server component; hence it can't handle files on user machines or other servers. File Manager closes the gap by allowing files transfer between user workstation and database server machine. Other basic capabilities such as file deletion, renaming, etc are also present.

The biggest problem was directory listing. None of Oracle releases to date support directory browsing. One can read, write, delete, or rename files - but only whose names were known from elsewhere. There is no rational explanation to that. Security reasons - one might say? Nonsense. Limit listings to folders exposed via Directory objects, add BROWSE privilege to already existing READ and WRITE - and let us be as secure as we want.

Until this is available, there are few workarounds:

Listing server-side directories
  • First is Java stored procedure. Apparently Java code can browse files via java.io.File interfaces. We must ensure Java is installed on the database, compile a piece of Java code, create a PL/SQL wrapper around it, and finally grant JAVAUSERPRIV to the interested users. This is doable, but not so straightforward for developers of shrinkwrap software considering number of points of possible failure.
  • Another method is undocumented, but simpler. Oracle 10g+'s package DBMS_BACKUP_RESTORE has a procedure to list contents of directory. The package is not accessible by public, EXECUTE privilege must be explicitly granted to use it. The procedure populates an in-memory table which we can read. Interestingly, it not only lists contents of requested directory, but recursively dives into subdirectories and lists them too. This is better explained at Christopher Poole's page.

  • One may also consider capability of DBMS_SCHEDULER to execute OS commands. We could run dir and redirect its output to temp file, then parse it for file names. Again, from shrinkwrap software point of view, this is hell. Think about points of failure starting with scheduler jobs stuck (say, because job_queue_processes is 1 and job it is currently running got stuck - a real situation witnessed), recall all the OS-es out there and their variations of ls or dir, then mediate on where to write the temp file, finally think about formats of output. The method may work on a particular database with particular OS, but supporting any platform? Forget it.
AlderPump equally supports the first two methods i.e. Java and PL/SQL. Java has little advantage in terms of operations as it returns files sizes along with names; PL/SQL is simpler to configure, manage, and remove. It also runs on databases where Java is not present. Both methods with their advantages and disadvantages explained in greater detail on AlderProgs site.

But again, the entire idea of installing something on server side sucks. We live with it, but we are less than happy about it.

Reading and writing files
Another challenge was file copying. We couldn't read the entire file into big BLOB and transfer it to the client; DataPump files can easily span to gigabytes and reading them all to memory is not a good idea. So, slash them to chunks. But to read chunk, one must have file handle, and how to preserve the handle between calls? Pass it as a parameter you've said? Well, in 10g+ handle is not a single number it used to be in 9i, it is 3-field structure (see utl_file package). Worse yet, it is PL/SQL type, not Oracle type. Passing PL/SQL structures is not easy, especially with limitations of Microsoft provider for Oracle.

The only robust method is to pass file name and open/close file every time next chunk is read or written. Chunk size is limited to less than 32K. Reading 2G file would result in 64000 open/close operations not considering reads/writes themselves. Not very efficient.

AlderPump is using a hack: we pass PL/SQL in string unpacking it prior to any file operation. Should Oracle change the structure we are doomed, but the risk is measured.

As a side note, it is sad Oracle file handls are not atomic any more. There could be pretty complex structures behind them (and they are), but values exposed to users should be as simple as possible. Making it structure, especially one which can't be easily passed to client, effectively kills either performance or compatibility.

Working with remote databases
One of the features considered for implementation was working via database links. Indeed, if we can access a database, and that database has links to other databases - why can't we run AlderPump jobs there or at least manage files? As it tuned out, we can't.

What killed it was Oracle policy about types. Simply put, the fact that two types on different databases have the same name does not guarantee they are same type. Sounds logical, right? Yes, but implementation lacks forethought. Even though both types belong to SYS schema and database versions are the same up to patch level - stubborn type system still considers them different. DataPump uses types for job status, dumpfile info, and other purposes, unfortunately this effectively kills remote capabilities.

This type compatibility problem is very common, Oracle should really do something about it. Simplest coming to mind is hash, or checksum, or other sort of signature. "Sign" type with a key, then compare keys to ensure types are the same.

Installer
There was major rework on the installer. It can now install fresh version, upgrade 1.x and 2.1 to 2.2, or repair existing 2.2 installation. Uninstaller allso got smarter. Most of unsinstallers out there only clean out files they've created. AlderPump uninstaller also wipes out temporary ones such as sqlnet.log created by SqlNet on connection failure. It hunts down and removes saved job templates too - although this may be too obsessive. Finally, there is option to remove license, say to transfer it to another machine. Preserved licenses are picked up automatically on next install, they remain valid for all 2.x versions free of charge. Owners of 1.x versions can upgrade their licenses for free.

Next release
We are mostly done planning features for next release. It is scheduled to ship in 4 to 6 months, the rate at which AlderPump version are normally shipped. Like with this release, all 2.x licenses will continue to work and owners of 1.x (should any remain) may contact sales for free license upgrade. More details will be posted closer to release date.

AlderPump Lite will remain free for everybody although with limited features.

No comments: