System jobs, typically triggered by workflows, may have a status reason of ‘Waiting’. There are valid reasons for this, such as a wait or timeout condition used within the workflow logic. However, many other scenarios also result in waiting system jobs, either because the workflow fails due to environmental issues or simply because it has been misconfigured by the user.

In previous versions of CRM, a large number of waiting system jobs could adversely affect performance. This has become less of an issue in recent versions, especially with CRMOnline, where the load (and the problem) is seen as Microsoft’s responsibility. Upgrades can take longer if many of these system jobs exist, and many on-premise upgrades have required running SQL scripts to delete waiting system jobs, either to enable the upgrade or to speed it up.

CRMOnline customers are affected in another way, though: they pay for the needless storage of those waiting system jobs. Although it is not an exact science, in our tests we have seen 120,000 system jobs consume anywhere between 2 GB and 4.5 GB of storage.
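As a rough, back-of-the-envelope reading of those figures, assuming decimal units (1 GB = 10^9 bytes) and that storage is spread evenly across the jobs (both simplifications of ours), the per-job footprint works out as:

```latex
\frac{2 \times 10^{9}\ \text{bytes}}{120\,000} \approx 16.7\ \text{KB per job},
\qquad
\frac{4.5 \times 10^{9}\ \text{bytes}}{120\,000} = 37.5\ \text{KB per job}
```

In other words, each waiting system job costs somewhere in the region of 17 to 38 KB of paid storage.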

So why not use bulk delete to clear out those system jobs? The answer is that the Microsoft Dynamics CRM platform does not allow waiting jobs to be deleted: they must first be cancelled, and bulk delete simply can’t do this.
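To make that ordering constraint concrete, here is a minimal, hypothetical model of the rule. It assumes the standard system job (asyncoperation) codes: a waiting job sits in the Suspended state (statecode 1) with status reason Waiting (statuscode 10), and cancelling moves it to Completed (statecode 3) with status reason Canceled (statuscode 32). It also assumes that deletion is only permitted once a job is Completed. SystemJob, can_delete and cancel are illustrative names, not part of any CRM SDK, and the sketch makes no call to a real organisation.

```python
from dataclasses import dataclass

# Assumed asyncoperation option-set values; verify against your org's metadata.
STATE_READY, STATE_SUSPENDED, STATE_LOCKED, STATE_COMPLETED = 0, 1, 2, 3
STATUS_WAITING = 10
STATUS_CANCELED = 32


@dataclass(frozen=True)
class SystemJob:
    """Hypothetical in-memory stand-in for a system job record."""
    job_id: str
    statecode: int
    statuscode: int


def can_delete(job: SystemJob) -> bool:
    """Assumption: only jobs in the Completed state may be deleted."""
    return job.statecode == STATE_COMPLETED


def cancel_payload() -> dict:
    """Attribute values an update would set to cancel a waiting job."""
    return {"statecode": STATE_COMPLETED, "statuscode": STATUS_CANCELED}


def cancel(job: SystemJob) -> SystemJob:
    """Return a copy of the job moved to Completed/Canceled."""
    values = cancel_payload()
    return SystemJob(job.job_id, values["statecode"], values["statuscode"])
```

This is the step bulk delete skips: a waiting job fails can_delete until it has been through cancel.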

An enhancement to bulk delete was suggested on Microsoft Connect back in 2012, but the suggestion hasn’t yet made it into the core product.

Here at Gap Consulting we have decided to enhance our free community tool, ‘The Workflow Executor’ (for CRM 2011, CRM 2013 and CRM 2015), so that you can choose a system job view and then cancel or resume the system jobs in that view that have a status reason of ‘Waiting’. Note that in this initial release we filter any selected system job view so that only system jobs with a ‘System Job Type’ of ‘Workflow’ and a status reason of ‘Waiting’ are counted and cancelled/resumed.
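The tool’s internals aren’t described here, but to illustrate what that filter means, the sketch below narrows a system job view’s FetchXML to waiting workflow jobs. It assumes the view queries the asyncoperation entity and that ‘Workflow’ and ‘Waiting’ correspond to operationtype 10 and statuscode 10 respectively. The function name restrict_to_waiting_workflows is hypothetical; any existing filter in the view is kept by nesting it inside a new AND filter.

```python
import xml.etree.ElementTree as ET

OPERATION_TYPE_WORKFLOW = 10  # assumed value for System Job Type = Workflow
STATUS_WAITING = 10           # assumed value for status reason = Waiting


def restrict_to_waiting_workflows(fetch_xml: str) -> str:
    """Return the view's FetchXML with an extra AND filter for waiting workflows."""
    root = ET.fromstring(fetch_xml)
    entity = root.find("entity")
    if entity is None or entity.get("name") != "asyncoperation":
        raise ValueError("expected a FetchXML query on the asyncoperation entity")

    combined = ET.Element("filter", {"type": "and"})
    existing = entity.find("filter")
    if existing is not None:
        # Preserve the view's own criteria as a nested filter.
        entity.remove(existing)
        combined.append(existing)

    for attribute, value in (("operationtype", OPERATION_TYPE_WORKFLOW),
                             ("statuscode", STATUS_WAITING)):
        ET.SubElement(combined, "condition",
                      {"attribute": attribute, "operator": "eq", "value": str(value)})
    entity.append(combined)
    return ET.tostring(root, encoding="unicode")
```

Nesting rather than replacing means a personal view such as ‘my system jobs’ keeps its own criteria while only waiting workflow jobs are counted.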

Should I cancel or resume?

There are certain scenarios where workflows will wait until the year 9999 simply because a data value referenced in the workflow logic doesn’t allow the workflow to complete. Notable examples are:

  • A ‘Send Email’ step tries to send an email to a contact/user/lead/account/email-enabled entity that doesn’t have a valid email address. (System job Advanced Find search tip: use ‘Friendly Message’ contains ‘object address not found’.)
  • A ‘Send Email’ step tries to send an email to a contact/lead/account that has ‘send marketing materials’ incorrectly set to ‘do not allow’. (System job Advanced Find search tip: use ‘Friendly Message’ contains ‘party is marked as non-emailable’.)
  • A child workflow that’s referenced in a ‘start child workflow’ step wasn’t activated when the workflow was run. (System job Advanced Find search tip: use ‘Friendly Message’ contains ‘cannot run’.)
  • Platform issues result in a generic error. (System job Advanced Find search tip: use ‘Friendly Message’ contains ‘objectfaultedexception’.)
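Each search tip above is an Advanced Find ‘contains’ condition, which FetchXML expresses as a like operator with % wildcards on both sides. The sketch below builds such a query for waiting system jobs. It assumes the logical names asyncoperation, friendlymessage and statuscode (with 10 meaning Waiting); SEARCH_TIPS and waiting_jobs_by_message are hypothetical names of our own.

```python
import xml.etree.ElementTree as ET

# Friendly Message fragments from the search tips, keyed by our own labels.
SEARCH_TIPS = {
    "no_email_address": "object address not found",
    "non_emailable_party": "party is marked as non-emailable",
    "child_workflow_not_active": "cannot run",
    "platform_error": "objectfaultedexception",
}


def waiting_jobs_by_message(fragment: str) -> str:
    """Build FetchXML for waiting system jobs whose Friendly Message contains `fragment`."""
    fetch = ET.Element("fetch")
    entity = ET.SubElement(fetch, "entity", {"name": "asyncoperation"})
    for name in ("name", "friendlymessage", "statuscode", "createdon"):
        ET.SubElement(entity, "attribute", {"name": name})
    flt = ET.SubElement(entity, "filter", {"type": "and"})
    ET.SubElement(flt, "condition",
                  {"attribute": "statuscode", "operator": "eq", "value": "10"})
    # 'contains' in Advanced Find is saved as like '%fragment%'.
    ET.SubElement(flt, "condition",
                  {"attribute": "friendlymessage", "operator": "like",
                   "value": f"%{fragment}%"})
    return ET.tostring(fetch, encoding="unicode")
```

Saving one such query per cause as a personal view gives you a view per failure type to select later.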


In these cases, you can take remedial action (populate email addresses, change marketing preferences or activate the child workflow); if the system job is then resumed, it should progress past the previous failure point. For platform issues, simply resuming the system jobs during a quieter period may be enough for them to succeed.
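As a rough triage aid, the Friendly Message fragments from the list above can be mapped to the remedial actions just described. This is our own heuristic, not a feature of the tool: matching is a simple case-insensitive substring test, the names REMEDIES and suggest_remedy are hypothetical, and anything unrecognised falls through to a manual review of the workflow design.

```python
from typing import Optional

# (Friendly Message fragment, suggested remedy); first match wins.
REMEDIES = [
    ("object address not found", "populate a valid email address, then resume"),
    ("party is marked as non-emailable", "correct the email/marketing preference, then resume"),
    ("cannot run", "activate the child workflow, then resume"),
    ("objectfaultedexception", "resume during a quieter period"),
]


def suggest_remedy(friendly_message: Optional[str]) -> str:
    """Suggest a next step for a waiting job based on its Friendly Message."""
    text = (friendly_message or "").lower()
    for fragment, remedy in REMEDIES:
        if fragment in text:
            return remedy
    # Unknown causes are often design problems, where cancelling may fit better.
    return "review the workflow design; cancelling may be more appropriate"
```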

The reasons for cancelling waiting workflows are typically more numerous and, more often than not, are rooted in bad workflow design. Examples:

  • Wait or timeout conditions that wait for a condition that will never be met
  • A wait/timeout condition is not met before the parent record becomes inactive (such as a case being closed or a quote becoming locked).
  • Emails are sent to contacts/leads/accounts whose marketing preferences legitimately do not allow emails/marketing materials to be sent.
  • Assign steps attempt to assign records to invalid users or teams.
  • Emails are sent from records that either don’t have an email address or don’t allow emails to be sent on their behalf.

Advanced Find must be used to construct suitable system job views that identify the system jobs associated with your problematic workflow logic. Once those views are created, however, you can fire up the Workflow Executor and start cancelling!

Toggle the ‘Mode’ setting to ‘Cancel / Resume’.

Select the system job view (both system and personal views are available for selection). The number of waiting system jobs will be displayed in the ‘record count’ area; this can take several seconds when counting hundreds of thousands of records.

Choose the action (cancel or resume).

Apply throttling as desired. We recommend performing this action outside core business hours, although you will find that cancelling typically imposes less load on the asynchronous processing service than triggering workflows does.

Then hit Start!
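The Workflow Executor’s own throttling mechanism isn’t described here, so the following is only a hypothetical sketch of the general batch-and-pause idea: process jobs in fixed-size batches and sleep between batches so the asynchronous service is not flooded. The name process_in_batches and its parameters are our own; action stands in for whatever cancel or resume call you would make, and sleep is injectable purely so the pacing can be tested.

```python
import time
from typing import Callable, Iterable


def process_in_batches(
    job_ids: Iterable[str],
    action: Callable[[str], None],
    batch_size: int = 100,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply `action` to every job id, pausing between batches; return the count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(job_ids)
    for start in range(0, len(ids), batch_size):
        for job_id in ids[start:start + batch_size]:
            action(job_id)  # e.g. a cancel or resume request for this job
        # Throttle: pause between batches, but not after the final one.
        if start + batch_size < len(ids):
            sleep(pause_seconds)
    return len(ids)
```

A larger pause_seconds or smaller batch_size trades total run time for lower load, which matters less if you run the job out of hours as recommended.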


The above example was run against a CRMOnline organisation and took around 16 minutes to cancel 121,716 system jobs.
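Taking the ‘around 16 minutes’ at face value, the implied average throughput for that run is roughly:

```latex
\frac{121\,716\ \text{jobs}}{16 \times 60\ \text{s}}
  = \frac{121\,716}{960}\ \text{jobs/s}
  \approx 127\ \text{jobs/s}
  \quad (\approx 7\,600\ \text{jobs/min})
```

Throughput in your own organisation will depend on the throttling settings and the load at the time.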

Go to the Workflow Executor page to download!