PDPM

Probabilistic Declarative Process Mining

This site contains supplementary material for the paper "Probabilistic Declarative Process Mining" by Elena Bellodi, Fabrizio Riguzzi and Evelina Lamma.

The paper presents an approach that first uses the DPML (Declarative Process Model Learner) [Lamma, et al., 2007] algorithm to extract a process model, expressed as a set of integrity constraints (ICs) in first-order logic, from a collection of traces. The constraints are then translated into Markov Logic formulas, and the weight of each formula is tuned with the Alchemy system. The resulting theory can be used to classify unseen traces probabilistically.
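Why a weighted theory gives a probabilistic classifier can be sketched as follows. This rests on our own assumptions, not on details stated here: every translated formula has the form Body_j(i, ...) => Neg(i), Neg(i) is the only query atom for trace i, and no unit clause for Neg is added. Under those assumptions the MLN conditional probability reduces to a logistic function of the weighted number n_j(i) of true groundings of each body, P(Neg(i) | evidence) = 1 / (1 + exp(-sum_j w_j n_j(i))). Alchemy's MC-SAT approximates such marginals by sampling; the function below (its name is hypothetical) computes the closed form directly.

```python
import math
from typing import Sequence


def prob_negative(weights: Sequence[float], true_body_counts: Sequence[int]) -> float:
    """P(Neg(i) = true | evidence about trace i) for an MLN whose formulas are
    all assumed to have the form Body_j(i, ...) => Neg(i), with no unit clause
    for Neg and Neg(i) as the only query atom of trace i."""
    if len(weights) != len(true_body_counts):
        raise ValueError("expected one grounding count per formula")
    # A ground clause !Body_j v Neg(i) with a true body is satisfied only if
    # Neg(i) holds, so each true grounding shifts the log-odds of Neg(i) by w_j.
    log_odds = sum(w * n for w, n in zip(weights, true_body_counts))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)
```

For example, `prob_negative([1.5, 0.8], [1, 0])` gives about 0.82: only the first formula's body is true for the trace, so only its weight counts.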

Code

Here you can find the YAP Prolog source code of the Declarative Process Model Learner (DPML) for learning IC theories. YAP Prolog can be found here.
The weight learning and inference algorithms for Markov Logic Networks are available in the CVS version of Alchemy (http://alchemy.cs.washington.edu/alchemy1.html).

Mining Technique

(1) First, unzip dpml_src.zip into a folder. It contains:

dpml.pl (the main file implementing functions DPML and FindBestIC of [Lamma, et al., 2007])
ref_op.pl (implementing clause refinements in FindBestIC)

Add the following three files to the same folder, all sharing the same base name:

<name>.kb: contains the example interpretations (training log)
<name>.bg: contains the background knowledge
<name>.l: contains language bias information

where <name> is the dataset name. Examples of these files for three different logs (NetBill [Cox, et al., 1995], Cervical cancer screening and Students’ careers) can be found in the "DPML training logs" and "DPML bg_languagebias" directories of the zip file downloadable here; they refer to 5-fold cross-validation experiments. The .kb and .l files are shared by all folds.
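Before launching DPML it can help to check that the folder holds everything listed above. A minimal sketch; the helper name is hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List

DPML_SOURCES = ("dpml.pl", "ref_op.pl")
DATASET_SUFFIXES = (".kb", ".bg", ".l")


def missing_dpml_inputs(folder: str, name: str) -> List[str]:
    """List the files DPML expects in `folder` for dataset `name` that are absent."""
    base = Path(folder)
    expected = list(DPML_SOURCES) + [name + suffix for suffix in DATASET_SUFFIXES]
    return [f for f in expected if not (base / f).is_file()]
```

An empty result from `missing_dpml_inputs(".", "netbill")` means the source files and the netbill.kb, netbill.bg and netbill.l files are all in place.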

To execute DPML on one fold, load dpml.pl with YAP and call:
?- i(<name>).

The learned IC theory is written to the output file <name>.icl.out in the same folder. An example of this file can be found in Netbill\DPML bg_languagebias.
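The interactive call above can also be scripted, for instance to run several folds in a row. A sketch under assumptions not stated here: the `yap` executable is on PATH and accepts top-level goals on standard input, and the helper names are hypothetical. It only uses the standard Prolog goals consult/1 and halt/0 plus the i/1 entry point shown above.

```python
import subprocess


def dpml_goals(name: str) -> str:
    """Top-level goals: load dpml.pl, learn the IC theory for `name`, exit."""
    # Quoting the atom keeps names such as 'NetBill' from being read as
    # Prolog variables; `name` must not itself contain a single quote.
    return f"consult('dpml.pl').\ni('{name}').\nhalt.\n"


def run_dpml(name: str, folder: str = ".") -> subprocess.CompletedProcess:
    # Assumption: `yap` is on PATH and reads goals from standard input.
    return subprocess.run(["yap"], input=dpml_goals(name), text=True,
                          cwd=folder, capture_output=True, check=False)
```

After `run_dpml("netbill")` returns, the theory should be in netbill.icl.out; the captured stdout and stderr are kept for inspecting YAP's messages.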

(2) Second, translate the integrity constraints of <name>.icl.out into a Markov Logic Network (.mln file) and the training dataset .kb into the format required by Alchemy (.db file). The Alchemy syntax is described in the user manual at http://alchemy.cs.washington.edu/user-manual/manual.html. Examples of these two files (for one fold) can be found in the “Alchemy weight learning” directories of the logs.
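The .db side of the translation amounts to listing ground atoms, one per line. The sketch below starts from an in-memory representation of the traces because the .kb format is not described here. The predicate Happens(Trace,Activity,Time), the helper names and the use of integer timestamps as constants are all hypothetical; only Neg comes from the commands below. Activity and trace names are assumed to be alphanumeric.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int]  # (activity name, integer timestamp)


def constant(s: str) -> str:
    # Alchemy reads identifiers starting with a lowercase letter as variables,
    # so constants are given an initial capital.
    return s[:1].upper() + s[1:]


def to_alchemy_db(traces: Dict[str, List[Event]], negative: Set[str]) -> str:
    """Render traces as Alchemy evidence, one ground atom per line.

    Hypothetical encoding: Happens(Trace,Activity,Time) for every event and
    Neg(Trace) for every negative trace; positive traces get no Neg atom."""
    lines = []
    for trace_id in sorted(traces):
        t = constant(trace_id)
        for activity, time in traces[trace_id]:
            lines.append(f"Happens({t},{constant(activity)},{time})")
        if trace_id in negative:
            lines.append(f"Neg({t})")
    return "\n".join(lines) + "\n"
```

Whatever predicates are chosen, they must match those used in the translated .mln clauses.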

To run weight learning on one fold, put the .mln file and the training .db file in the same folder and run:
learnwts -i <name>.mln -o <output_name>.mln -t <train_name>.db -ne Neg -noAddUnitClauses
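where -ne Neg declares Neg, the predicate that marks the negative example interpretations, as a non-evidence predicate, and -noAddUnitClauses stops Alchemy from adding unit clauses to the theory. The resulting weighted theory is written to the output file <output_name>.mln in the same folder; an example of this file can be found in Netbill\Alchemy weight learning.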
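When the same call is repeated for every fold, it can be scripted. A minimal sketch, assuming Alchemy's learnwts binary is on PATH; the helper names are hypothetical and the flags are copied unchanged from the command above.

```python
import subprocess
from typing import List


def learnwts_cmd(mln: str, out_mln: str, train_db: str) -> List[str]:
    """Argument vector for the learnwts call shown above (flags copied from it)."""
    return ["learnwts", "-i", mln, "-o", out_mln, "-t", train_db,
            "-ne", "Neg", "-noAddUnitClauses"]


def learn_fold(mln: str, out_mln: str, train_db: str, folder: str = ".") -> None:
    # Assumption: Alchemy's learnwts is on PATH; raises on a non-zero exit code.
    subprocess.run(learnwts_cmd(mln, out_mln, train_db), cwd=folder, check=True)
```

Passing the argument list directly to subprocess.run avoids shell quoting issues with file names that contain spaces.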

(3) Third, to perform inference, add the test .db file to the folder used in (2) and run:
infer -ms -i <output_name>.mln -r <output_name>.result -e <test_name>.db -q Neg

The resulting marginal probabilities (for one fold) are written to the output file <output_name>.result. Examples of the test files (for all folds, in Alchemy format) can be found in the “Alchemy test logs” directories of the logs. An example output file can be found in Netbill\test logs.
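A sketch that runs the inference call and turns the .result file into class labels follows. It assumes that infer is on PATH and that each line of the .result file holds a ground atom followed by its probability (for example `Neg(T3) 0.87`). The helper names and the 0.5 decision threshold are hypothetical choices, not part of the described procedure.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def infer_cmd(weighted_mln: str, result: str, test_db: str) -> List[str]:
    """Argument vector for the infer call shown above (MC-SAT, query Neg)."""
    return ["infer", "-ms", "-i", weighted_mln, "-r", result,
            "-e", test_db, "-q", "Neg"]


def parse_result(text: str) -> Dict[str, float]:
    """Map each ground query atom to its marginal probability.

    Assumes one 'Atom probability' pair per line, e.g. 'Neg(T3) 0.87'."""
    probs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            probs[parts[0]] = float(parts[1])
    return probs


def classify(probs: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    # Hypothetical decision rule: a trace is labelled negative when P(Neg) >= threshold.
    return {atom: p >= threshold for atom, p in probs.items()}


def infer_fold(weighted_mln: str, result: str, test_db: str,
               folder: str = ".") -> Dict[str, float]:
    # Assumption: Alchemy's infer is on PATH; raises on a non-zero exit code.
    subprocess.run(infer_cmd(weighted_mln, result, test_db), cwd=folder, check=True)
    return parse_result(Path(folder, result).read_text())
```

Keeping the probabilities rather than only the labels leaves room for threshold-free evaluation measures across folds.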

To perform inference with the IC theory alone (a purely logical model), add a period at the end of each MLN clause from step (2) (after “Neg(i)”), which Alchemy reads as a hard constraint, and call infer directly on this .mln file.
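The edit can be automated for a whole .mln file. A minimal sketch, assuming every clause to be hardened ends with the literal text Neg(i) as described above; the function name is hypothetical, and comment lines (starting with //) are left alone.

```python
def to_hard_constraints(mln_text: str, head: str = "Neg(i)") -> str:
    """Terminate every clause that ends with `head` with a period.

    In Alchemy a formula ending in a period is hard (it must hold in every
    world), so the unweighted .mln of step (2) becomes a purely logical model."""
    out = []
    for line in mln_text.splitlines():
        stripped = line.rstrip()
        # Comments are skipped; clauses already ending in "Neg(i)." no longer
        # end with `head`, so they are not given a second period.
        if not stripped.lstrip().startswith("//") and stripped.endswith(head):
            stripped += "."
        out.append(stripped)
    return "\n".join(out) + "\n"
```

The input should be the unweighted .mln from step (2), not the weighted output of learnwts.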

References
[Lamma, et al., 2007] Lamma, E., Mello, P., Riguzzi, F., Storari, S.: Applying inductive logic programming to process mining. Proceedings of the 17th International Conference on Inductive Logic Programming, ILP 2007. pp. 132-146. No. 4894 in LNAI, Springer, Heidelberg, Germany.

[Cox, et al., 1995] Cox, B., Tygar, J., and Sirbu, M. (1995) NetBill security and transaction protocol. Proceedings of the First USENIX Workshop on Electronic Commerce (WOEC 1995), New York, NY, USA, 11-12 July, pp. 77-88. USENIX Association, Berkeley, CA, USA.

Bibliography
Elena Bellodi, Fabrizio Riguzzi, and Evelina Lamma. Probabilistic declarative process mining. In Yaxin Bi and Mary-Anne Williams, editors, Proceedings of the 4th International Conference on Knowledge Science, Engineering & Management (KSEM 2010), Belfast, UK, September 1-3, 2010, volume 6291 of Lecture Notes in Computer Science, pages 292-303, Heidelberg, Germany, 2010. © Springer. The original publication is available at http://www.springerlink.com.