Question
Say I want to find the set of features/attributes that differentiate two classes by simple matching. Can I use CLP(FD) in Prolog to do this?
c_s_mining(Features,Value):-
    Features = [F1,F2,F3,F4],
    Features ins 0..1,
    ExampleA = [A1,A2,A3,A4],
    ExampleB = [B1,B2,B3,B4],
    ExampleC = [C1,C2,C3,C4],
    A1#=0, A2#=1, A3#=0, A4#=1,
    B1#=0, B2#=1, B3#=0, B4#=1,
    C1#=1, C2#=0, C3#=0, C4#=1,
    ExampleD = [D1,D2,D3,D4],
    ExampleE = [E1,E2,E3,E4],
    ExampleQ = [Q1,Q2,Q3,Q4],
    D1#=1, D2#=0, D3#=1, D4#=0,
    E1#=1, E2#=0, E3#=1, E4#=0,
    Q1#=0, Q2#=1, Q3#=1, Q4#=0,
    Positives = [ExampleA,ExampleB,ExampleC],
    Negatives = [ExampleD,ExampleE,ExampleQ],
    TP in 0..sup,
    FP in 0..sup,
    covers(Features,Positives,TP),
    covers(Features,Negatives,FP),
    Value in inf..sup,
    Value #= TP-FP.

covers(Features,Examples,Number_covered):-
    findall(*,(member(E,Examples),E=Features),Covers),
    length(Covers,Number_covered).
Each example is described by four binary features, and there are three positive examples (A,B,C) and three negative examples (D,E,Q).
An example is covered by a set of selected features if they match.
So, for example, if Features is unified with [0,1,0,1], then this will match two positives and 0 negatives.
I set Value to be equal to TP (true positives) - FP (false positives). I want to maximise Value and find the corresponding set of features.
I query ?- c_s_mining(Features,Value), labeling([max(Value)],[Value]).
The answer I expect is: Features = [0,1,0,1], Value = 2
but I get Features = [_G1,_G2,_G3,_G4], Value = 0, _G1 in 0..1, _G2 in 0..1, _G3 in 0..1, _G4 in 0..1.
Answer 1:
Reification of CLP(FD) constraints
To reason about what is matched and what is not, use constraint reification: It allows you to reflect the truth value of a constraint into a CLP(FD) variable denoting a Boolean value.
You can perform arithmetic with such values to denote the number of matched examples etc.
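As a minimal sketch of the idea, separate from the original problem, the following counts the positions at which two lists agree. The predicate names pair_equal/3 and count_equal/3 are hypothetical and chosen only for this illustration. Each comparison is reified into a 0/1 variable, and the 0/1 values are then summed:

```prolog
:- use_module(library(clpfd)).

% pair_equal(X, Y, B): B is 1 if X #= Y holds, and 0 otherwise.
% (Hypothetical helper, for illustration only.)
pair_equal(X, Y, B) :-
    B #<==> (X #= Y).

% count_equal(Xs, Ys, N): N is the number of positions
% at which the lists Xs and Ys have equal elements.
count_equal(Xs, Ys, N) :-
    maplist(pair_equal, Xs, Ys, Bs),
    sum(Bs, #=, N).

% Example query:
% ?- count_equal([0,1,0,1], [0,1,1,1], N).
% N = 3.
```

Because the truth values are ordinary CLP(FD) variables, the same definition also works when some list elements are still unbound; the count is then constrained rather than computed.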
For example, in your case, you can write:
:- use_module(library(clpfd)).

c_s_mining(Features, Value) :-
    ExampleA = [0,1,0,1],
    ExampleB = [0,1,0,1],
    ExampleC = [1,0,0,1],
    ExampleD = [1,0,1,0],
    ExampleE = [1,0,1,0],
    ExampleQ = [0,1,1,0],
    same_length(Features, ExampleA),
    Features ins 0..1,
    Positives = [ExampleA,ExampleB,ExampleC],
    Negatives = [ExampleD,ExampleE,ExampleQ],
    covers_number(Features, Positives, TP),
    covers_number(Features, Negatives, FP),
    Value #= TP-FP.

covers_number(Features, Examples, Number) :-
    maplist(covers_(Features), Examples, Numbers),
    sum(Numbers, #=, Number).

covers_([F1,F2,F3,F4], [E1,E2,E3,E4], Covered) :-
    Covered #<==> (F1#=E1 #/\ F2#=E2 #/\ F3#=E3 #/\ F4#=E4).
And then use the optimisation options of labeling/2 to get largest values first:

    ?- c_s_mining(Fs, Value), labeling([max(Value)], Fs).
    Fs = [0, 1, 0, 1],
    Value = 2 ;
    Fs = [1, 0, 0, 1],
    Value = 1 ;
    Fs = [0, 0, 0, 0],
    Value = 0 ;
    etc.
Notice also that I have removed some superfluous constraints, such as Value in inf..sup, since the constraint solver can figure them out on its own.
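A likely explanation for the result in the question, offered as a sketch rather than a full diagnosis: findall/3 collects solutions but discards the bindings made while proving its goal. In the original covers/3, Features is therefore still unbound after each call, every example unifies with it, and both counts come out as 3, which gives Value = 0. The effect can be seen in a small query that is independent of CLP(FD):

```prolog
% Both list elements unify with [A,B] inside findall/3,
% but A and B are still unbound afterwards.
?- findall(x, (member(E, [[0,1],[1,0]]), E = [A,B]), L).
L = [x, x].
```

Reification avoids this because the match is stated as a constraint on Features itself instead of being tested against a copy.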
CLP(B): A declarative alternative for Boolean constraints
For the case of such Boolean patterns, also check out CLP(B): Constraint Logic Programming over Boolean variables, available for example in SICStus Prolog and SWI-Prolog. Using CLP(B) requires you to formulate the search a bit differently, since it lacks the powerful labeling options of CLP(FD). However, in contrast to CLP(FD), CLP(B) is complete and may detect inconsistencies as well as entailed constraints much earlier.
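To give a feel for the CLP(B) notation, here is a small sketch assuming SWI-Prolog's library(clpb). The helper name exactly_two/1 is hypothetical and used only for illustration. In CLP(B) expressions, * denotes conjunction, ~ denotes negation, and card(Is, Bs) states that the number of true expressions in Bs is a member of the list Is:

```prolog
:- use_module(library(clpb)).

% exactly_two(Bs): exactly two of the Boolean variables in Bs are true.
% (Hypothetical helper, for illustration only.)
exactly_two(Bs) :-
    sat(card([2], Bs)).

% Example queries:
% ?- exactly_two([X,Y,Z]), X = 1, Y = 1, labeling([Z]).
%    succeeds with Z = 0, since only two variables may be true.
% ?- sat(X*Y), sat(~X).
%    fails, since X cannot be both true and false.
```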
In the following code, I am using CLP(FD) to guide the search for optimal values, and then use CLP(B) to state the actual constraints. A final call of labeling/1 (note that this is from library(clpb), not to be confused with CLP(FD)'s labeling/2) is used to ensure ground values for all CLP(B) variables. At the point it appears, it is only a formality in some sense: We already know that there is a solution at this point, thanks to CLP(B)'s completeness.
:- use_module(library(clpb)).
:- use_module(library(clpfd)).

c_s_mining(Features, Value) :-
    ExampleA = [0,1,0,1],
    ExampleB = [0,1,0,1],
    ExampleC = [1,0,0,1],
    ExampleD = [1,0,1,0],
    ExampleE = [1,0,1,0],
    ExampleQ = [0,1,1,0],
    same_length(Features, ExampleA),
    Positives = [ExampleA,ExampleB,ExampleC],
    Negatives = [ExampleD,ExampleE,ExampleQ],
    [TP,FP] ins 0..3, % (in this case)
    Value #= TP-FP,
    labeling([max(Value)], [TP,FP]),
    covers_number(Features, Positives, TP),
    covers_number(Features, Negatives, FP),
    labeling(Features).

covers_number(Features, Examples, Number) :-
    maplist(covers_(Features), Examples, Numbers),
    sat(card([Number], Numbers)).

covers_([F1,F2,F3,F4], [E1,E2,E3,E4], Covered) :-
    sat(Covered =:= ((F1=:=E1)*(F2=:=E2)*(F3=:=E3)*(F4=:=E4))).
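With this version the search is already part of c_s_mining/2, so a usage sketch needs no extra labeling call in the query. The candidate pair TP = 3, FP = 0 is tried first and rejected by CLP(B), because no single feature vector can match all three positives. The first answer should therefore be the one with TP = 2 and FP = 0:

```prolog
% Assumes the CLP(B) definitions above have been loaded.
?- c_s_mining(Fs, Value).
Fs = [0, 1, 0, 1],
Value = 2 .
```

Further answers on backtracking are reported in weakly decreasing order of Value, since the pairs of TP and FP are enumerated that way by labeling/2.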
Source: https://stackoverflow.com/questions/32565418/can-you-use-clpfd-to-implement-a-coverage-algorithm