BPM

Documentum’s BPM is one of those technologies that business can’t wait to implement. The evolution from the Workflow Manager to this is quite remarkable. In all fairness, I should also say that it has gotten quite temperamental over time, understandably so, given the rate at which features are being added.

I had the privilege of running around with the EMC Engineering folks to get a couple of BPM bugs fixed. This gets especially annoying when you realize that BPM is a different group from Content Server, and that each can point at the other. On the same note, I should admit that EMC Support tries its best to keep all those battles internal to EMC.

One of the errors that has long bothered me is “Failed to launch due to an operating system error: No Error.” This error just plain pisses me off. Come on, give us a reasonable error message, especially when this error has been around for many versions now. So, let’s talk about how we could go about debugging BPM.

The key to debugging BPM is to isolate the issue to BPM and BPM alone. A classic example: a developer is trying to run this WF: {start –> BPM apply LC –> end}, and the apply LC activity goes into Paused. The first thing you want to do as a developer is differentiate between the functionality and the driver that is using it. In this case, the functionality is applying the LC and the driver is the BPM LC attach activity. Let’s check that the functionality works by itself: use DA to apply the LC to that same object and see if it works. If it doesn’t, forget BPM and check your LC. Otherwise, it’s the driver that’s at fault: the BPM apply LC activity.

Once you have identified that it’s BPM, here are some suggestions:

* First and foremost, check your server config object to make sure that the worker threads (3) and sleep interval (5) have valid values [select r_object_id, object_name, wf_agent_worker_threads, wf_sleep_interval from dm_server_config]
* Take advantage of the WF tracing option.

– Turn on: apply,c,NULL,SET_OPTIONS,OPTION,S,trace_workflow_agent,VALUE,B,T
– Turn off: apply,c,NULL,SET_OPTIONS,OPTION,S,trace_workflow_agent,VALUE,B,F

* If you know the specific activity that is causing the problem, be sure to turn the trace on for the activity method.

– retrieve,c,dm_method where object_name = ''
– set,c,l,trace_launch,T
– save,c,l

* If the issue is related to a business policy, look in DA->Administration->configuration->Repository and check whether the Run Actions attribute for your repository is set to Super User.
* If the WF activities go into PENDING, there are several items to look at:

– Is your FAST running? Otherwise, the dmi_queue_item table could have grown so large that you see noticeable delays.
– Is your DB response timely? Otherwise, the activity might be merely waiting on the request to the DB.
– If there is a firewall between Documentum and the DB, please review the firewall logs and confirm that the TCP packet state from the Content Server is valid. Also, look into timeouts on the three layers – Documentum, firewall & DB.
– Get the process id for the WF agent threads and run a query in the DB to get the session info. This should return one master thread, which has a valid session, and the worker threads, with or without sessions assigned. Now kick off as many WFs as there are worker threads and, when they go into Pending, check whether all the worker threads have sessions assigned. If not, there’s your culprit. (Oracle: select sid, SERIAL# from v$session where process = '')

* If your WF activity is in PAUSED, it most probably encountered an error. If you are using LC Promote, check the object’s previous state before it reaches this activity. You might wish to ignore this error, in which case take advantage of “If method fails, Continue execution” in the activity configuration. Also dump the activity and check its r_runtime_state: 0 means dormant; 1, running; 2, finished; 3, halted; 4, terminated. And review the Docbase and JMS logs with tracing turned on.
* If WF activities go into PAUSED randomly, look at the “method times out in” setting in the activity configuration. If the server becomes very busy at a certain point of the day, the JMS response to the method execution could go over 60 seconds (the default setting), causing this error.
* And last but not least, there’s always our good old DMCL trace if you are on 5.3 or earlier. And of course, don’t forget to review the session logs.
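
If you end up repeating these checks often, a few of them are easy to script. What follows is only a rough Python sketch, not anything shipped with Documentum: the function names are made up for illustration, the process ids in the usage lines are invented, and it assumes you have already collected the worker thread process ids and the v$session rows by hand. It builds the trace on/off API string quoted above, translates an r_runtime_state value into the labels listed above, and reports which worker threads have no DB session.

```python
from typing import Dict, Iterable, List

# State codes exactly as listed in this post for r_runtime_state.
RUNTIME_STATES: Dict[int, str] = {
    0: "dormant",
    1: "running",
    2: "finished",
    3: "halted",
    4: "terminated",
}


def describe_runtime_state(code: int) -> str:
    """Translate an r_runtime_state value into the label used in this post."""
    return RUNTIME_STATES.get(code, f"unknown ({code})")


def workflow_trace_command(enable: bool) -> str:
    """Build the SET_OPTIONS API string that turns WF agent tracing on or off."""
    flag = "T" if enable else "F"
    return f"apply,c,NULL,SET_OPTIONS,OPTION,S,trace_workflow_agent,VALUE,B,{flag}"


def threads_without_session(
    worker_pids: Iterable[str], session_pids: Iterable[str]
) -> List[str]:
    """Return the worker thread process ids that have no matching DB session.

    worker_pids:  process ids of the WF agent worker threads (gathered by hand).
    session_pids: the `process` values of the rows found in v$session.
    """
    active = set(session_pids)
    return [pid for pid in worker_pids if pid not in active]


if __name__ == "__main__":
    # Hypothetical values, purely for illustration.
    print(workflow_trace_command(True))
    print(describe_runtime_state(3))
    print(threads_without_session(["4101", "4102", "4103"], ["4101", "4103"]))
```

Any process id that comes back from the last call is a worker thread that never got a session, which is the culprit described in the PENDING checklist.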
