Quality covers
Quality covers is a pattern mining algorithm.
Install
pip3 install --upgrade quality_covers
Transactional file
If your file looks like this
chess.dat:
1 3 5 7 10
1 3 5 7 10
1 3 5 8 9
1 3 6 7 9
1 3 6 8 9
or
P30968
P48551 P17181
P05121 Q03405 P00747 P02671
Q02643
P48551 P17181
use
import quality_covers
quality_covers.run_classic_size("chess.dat", False)
Binary file
If your file looks like this
chess.data:
1 0 1 0 1 0 1 0 0 1
1 0 1 0 1 0 1 0 0 1
1 0 1 0 1 0 0 1 1 0
1 0 1 0 0 1 1 0 1 0
1 0 1 0 0 1 0 1 1 0
use
import quality_covers
quality_covers.run_classic_size("chess.data", True)
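The two formats describe the same data: in the binary file, row i has a 1 in column j when item j appears in transaction i. Below is a minimal sketch of that conversion, assuming item identifiers are positive integers numbered from 1. The helper name transactions_to_binary is hypothetical and is not part of quality_covers.

```python
def transactions_to_binary(lines):
    """Convert transactional rows (lists of item ids) to 0/1 rows.

    Assumption: items are positive integers, and column j (1-based)
    of the binary row stands for item j.
    """
    rows = [[int(tok) for tok in line.split()] for line in lines if line.strip()]
    width = max(max(row) for row in rows)  # highest item id sets the column count
    binary = []
    for row in rows:
        present = set(row)
        binary.append([1 if item in present else 0 for item in range(1, width + 1)])
    return binary


chess_dat = [
    "1 3 5 7 10",
    "1 3 5 7 10",
    "1 3 5 8 9",
    "1 3 6 7 9",
    "1 3 6 8 9",
]
for bits in transactions_to_binary(chess_dat):
    print(" ".join(map(str, bits)))
```

Applied to the chess.dat rows above, this prints the rows shown for chess.data.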
Output of the functions
The functions create two files in the current directory:
- chess.data.out: the result file
- chess.data.clock: information about execution time
You can obtain binary matrices by calling extract_binary_matrices on the output file:
quality_covers.extract_binary_matrices('chess.data.out')
Optional arguments
Threshold coverage
You can pass a coverage threshold as the third argument, given as a fraction between 0 and 1 (for example, 0.6 for 60%).
quality_covers.run_classic_size("chess.data", True, 0.6)
Measures
Pass True as the fourth argument to also report the following measures for each result:
- frequency
- monocle
- separation
- object uniformity
quality_covers.run_classic_size("chess.data", True, 0.6, True)
The result file then has the following form:
3,4,9 ; 4,5,6,7,8#Object Uniformity=0.81944; Monocle=91.00000; Frequency=0.33333; Separation=0.48387
2,9 ; 1,3,7#Object Uniformity=0.68750; Monocle=28.00000; Frequency=0.22222; Separation=0.27273
1,6,9 ; 2,7#Object Uniformity=0.63889; Monocle=28.00000; Frequency=0.33333; Separation=0.31579
# Mandatory: 0
# Non-mandatory: 3
# Total: 3
# Coverage: 25/38(65.78947%)
# Mean frequency: 0.29630
# Mean monocle: 49.00000
# Mean object uniformity: 0.71528
# Mean separation: 0.35746
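Each result line has the form "extent ; intent#Measure=value; ...", and the summary lines start with "#". The sketch below reads such lines into Python structures. It is only an illustration of the format as it appears in this README: parse_concept and parse_results are hypothetical helpers, not functions of quality_covers. It assumes the extent holds integer row numbers, and that a line without measures simply has no "#" part.

```python
def parse_concept(line):
    """Split one result line into (extent, intent, measures).

    extent   -> list of int object (row) numbers
    intent   -> list of item names, kept as strings (items may be e.g. P30968)
    measures -> dict mapping measure name to float, empty if none are present
    """
    body, _, tail = line.partition("#")
    extent_txt, _, intent_txt = body.partition(";")
    extent = [int(tok) for tok in extent_txt.split(",") if tok.strip()]
    intent = [tok.strip() for tok in intent_txt.split(",") if tok.strip()]
    measures = {}
    for part in tail.split(";"):
        name, sep, value = part.partition("=")
        if sep:  # ignore empty fragments
            measures[name.strip()] = float(value)
    return extent, intent, measures


def parse_results(lines):
    """Parse all concept lines, skipping blank lines and '#' summary lines."""
    return [
        parse_concept(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


sample = ("2,9 ; 1,3,7#Object Uniformity=0.68750; Monocle=28.00000; "
          "Frequency=0.22222; Separation=0.27273")
extent, intent, measures = parse_concept(sample)
print(extent, intent, measures["Frequency"])
```

To read a real result file, pass the open file object to parse_results, for example parse_results(open("chess.data.out")).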
Different algorithms
There are currently four different algorithms:
run_classic_size
run_approximate_size
run_fca_cemb_with_mandatory
run_fca_cemb_without_mandatory
Examples
Transactional file with an 80% coverage threshold and measures information, using the approximate size algorithm
Data file
1 3 5 7 10
1 3 5 7 10
1 3 5 8 9
1 3 6 7 9
1 3 6 8 9
1 4 5 7 10
1 4 5 7 10
1 4 5 8 9
1 4 6 7 9
1 4 6 8 9
2 3 5 7 10
2 3 5 7 10
2 3 5 8 9
2 3 6 7 9
2 3 6 8 9
2 4 5 7 10
2 4 5 7 10
2 4 5 8 9
2 4 6 7 9
2 4 6 8 9
Python commands
import quality_covers
quality_covers.run_approximate_size('file.data', False, 0.8, True)
Results file.data.out
1,2,6,7,11,12,16,17 ; 5,7,10#Object Uniformity=0.60000; Monocle=648.00000; Frequency=0.40000; Separation=0.50000
4,5,9,10,14,15,19,20 ; 9,6#Object Uniformity=0.40000; Monocle=352.00000; Frequency=0.40000; Separation=0.36364
3,5,8,10,13,15,18,20 ; 8,9#Object Uniformity=0.40000; Monocle=352.00000; Frequency=0.40000; Separation=0.36364
11,12,13,14,15,16,17,18,19,20 ; 2#Object Uniformity=0.20000; Monocle=228.00000; Frequency=0.50000; Separation=0.20000
6,7,8,9,10,16,17,18,19,20 ; 4#Object Uniformity=0.20000; Monocle=258.00000; Frequency=0.50000; Separation=0.20000
1,2,3,4,5,11,12,13,14,15 ; 3#Object Uniformity=0.20000; Monocle=258.00000; Frequency=0.50000; Separation=0.20000
# Mandatory: 0
# Non-mandatory: 6
# Total: 6
# Coverage: 82/100(82.00000%)
# Mean frequency: 0.45000
# Mean monocle: 349.33334
# Mean object uniformity: 0.33333
# Mean separation: 0.30455
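The Coverage line can be checked by hand. The figure 82/100 is consistent with counting the (object, item) cells that fall inside at least one extent x intent rectangle and dividing by the number of 1-cells in the data. The sketch below redoes that count for this example. It is an independent check written for this README, under the assumption that coverage is defined this way, and it is not code from quality_covers.

```python
# Rebuild the 20 transactions of the data file: 4 prefixes x 5 suffixes,
# in the same order as the rows listed above.
heads = ["1 3", "1 4", "2 3", "2 4"]
tails = ["5 7 10", "5 7 10", "5 8 9", "6 7 9", "6 8 9"]
rows = [f"{h} {t}".split() for h in heads for t in tails]

# Every (row number, item) pair present in the data, rows numbered from 1.
ones = {(i, item) for i, row in enumerate(rows, start=1) for item in row}

# The six results from file.data.out as (extent, intent) pairs.
concepts = [
    ([1, 2, 6, 7, 11, 12, 16, 17], ["5", "7", "10"]),
    ([4, 5, 9, 10, 14, 15, 19, 20], ["9", "6"]),
    ([3, 5, 8, 10, 13, 15, 18, 20], ["8", "9"]),
    ([11, 12, 13, 14, 15, 16, 17, 18, 19, 20], ["2"]),
    ([6, 7, 8, 9, 10, 16, 17, 18, 19, 20], ["4"]),
    ([1, 2, 3, 4, 5, 11, 12, 13, 14, 15], ["3"]),
]

# Union of all extent x intent rectangles; overlapping cells count once.
covered = {(obj, item) for extent, intent in concepts
           for obj in extent for item in intent}
assert covered <= ones  # each covered cell is a 1 in the data

print(f"{len(covered)}/{len(ones)}({100 * len(covered) / len(ones):.5f}%)")
# prints 82/100(82.00000%)
```

The second and third results share four cells (item 9 for objects 5, 10, 15 and 20), which is why the total is 82 rather than 86.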
import quality_covers
quality_covers.extract_binary_matrices('file.data.out')
Result binary matrices extent
1 0 0 0 0 1
1 0 0 0 0 1
0 0 1 0 0 1
0 1 0 0 0 1
0 1 1 0 0 1
1 0 0 0 1 0
1 0 0 0 1 0
0 0 1 0 1 0
0 1 0 0 1 0
0 1 1 0 1 0
1 0 0 1 0 1
1 0 0 1 0 1
0 0 1 1 0 1
0 1 0 1 0 1
0 1 1 1 0 1
1 0 0 1 1 0
1 0 0 1 1 0
0 0 1 1 1 0
0 1 0 1 1 0
0 1 1 1 1 0
Result binary matrices intent
The first line gives the column names (the items)
5 7 10 9 6 8 2 4 3
1 1 1 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0
0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
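The two matrices can be read directly off the result lines: the first has one row per object and one column per result, the second one row per result and one column per item. The sketch below rebuilds both from the six results of this example. It illustrates the layout only and is not the implementation of extract_binary_matrices. The function binary_matrices is a hypothetical name, the number of objects is passed in explicitly, and items are ordered by first appearance, which matches the header line shown above.

```python
def binary_matrices(concepts, n_objects):
    """Return (items, extent_matrix, intent_matrix) for (extent, intent) pairs."""
    # Column order of the second matrix: items in order of first appearance.
    items = []
    for _, intent in concepts:
        for item in intent:
            if item not in items:
                items.append(item)
    # Row = object, column = result: 1 when the object is in that extent.
    extent_matrix = [
        [1 if obj in extent else 0 for extent, _ in concepts]
        for obj in range(1, n_objects + 1)
    ]
    # Row = result, column = item: 1 when the item is in that intent.
    intent_matrix = [
        [1 if item in intent else 0 for item in items]
        for _, intent in concepts
    ]
    return items, extent_matrix, intent_matrix


concepts = [
    ([1, 2, 6, 7, 11, 12, 16, 17], ["5", "7", "10"]),
    ([4, 5, 9, 10, 14, 15, 19, 20], ["9", "6"]),
    ([3, 5, 8, 10, 13, 15, 18, 20], ["8", "9"]),
    ([11, 12, 13, 14, 15, 16, 17, 18, 19, 20], ["2"]),
    ([6, 7, 8, 9, 10, 16, 17, 18, 19, 20], ["4"]),
    ([1, 2, 3, 4, 5, 11, 12, 13, 14, 15], ["3"]),
]
items, extent_matrix, intent_matrix = binary_matrices(concepts, n_objects=20)
print(" ".join(items))
for row in intent_matrix:
    print(" ".join(map(str, row)))
```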
More info
Paper associated
To come
Research lab
More tools about association rules
Authors
Amira Mouakher (amira.mouakher@u-bourgogne.fr)
Nicolas Gros (nicolas.gros01@u-bourgogne.fr)
Sebastien Gerin (sebastien.gerin@sayens.fr)