Isabelle: getting three different results with sledgehammer for what seems to be identical lemmas

Submitted by 泪湿孤枕 on 2019-12-07 03:04:28

The short, non-expert answer to your question is that your different versions generate different problems for the ATP. You can see this simply by diffing the problem files that Sledgehammer writes when the overlord option is on, as I explain below.

Jasmin Blanchette is the main developer of Sledgehammer, but I haven't seen him on SO; he does respond on the Isabelle user mailing list. Larry Paulson has also been involved with Sledgehammer.

I'm answering mainly to give an example of how to use Sledgehammer's overlord option. The Vampire ATP doesn't work on Windows, and when I reported that to Jasmin a long time ago, he told me how to use overlord to generate a problem file to send to him, as I explain below.

I generate the problem files for your first two versions, using the E prover.

I've tried to understand Sledgehammer only well enough to be satisfied with a short answer like this: "There's an input language for E. Sledgehammer converts the lemma into a bunch of facts in that language, E does some magic and reports back to Sledgehammer. A lemma stated slightly differently will lead Sledgehammer's algorithm to generate a different set of facts. After all, we can often prove a theorem in more than one way, too."

An ambitious person could match up the contents of a problem file generated by Sledgehammer with the input language that E uses (TPTP, judging by the --tstp-in flag and the fof(...) entries shown below). That might make it obvious why a particular lemma, stated in two different ways, results in two different problems for E.

E has a home page at TUM (The E Theorem Prover) and a Wiki page.

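Now for the details. The overlord option makes Sledgehammer write the problem file, named prob_e_1, into my home folder (more precisely, into its .isabelle/Isabelle2013-2 subfolder, according to the path on the first line of the file). The prover's output comes back in a file named prob_e_1_proof, but I don't concern myself with that.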

(* Version 1 *)
notepad 
begin
  fix a :: "('a:: comm_ring_1) poly" 
  have "degree((monom 1 1) -CONST pCons a 0) =1"
sledgehammer[overlord=true, provers="e"]
oops

I do it again with your second version, and it overwrites the previous file:

(* Version 2 *)
notepad begin
 fix a :: "('a:: comm_ring_1) poly"
 def p ≡ "(monom 1 1) - CONST pCons a 0"
 from p_def have "degree p = 1"
sledgehammer[overlord=true, provers="e"]
oops
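Because every overlord run writes to the same file name, it helps to copy the problem aside before the next run. Below is a minimal Python sketch of that step. It assumes, based on the path on the first line of the generated file, that the file lives in the .isabelle/Isabelle2013-2 folder under your home directory; the helper name snapshot_problem and the label suffix are my own hypothetical choices, not anything Isabelle provides.

```python
import shutil
from pathlib import Path


def snapshot_problem(user_dir: Path, label: str, prover: str = "e") -> Path:
    """Copy prob_<prover>_1 to prob_<prover>_1_<label> in the same folder.

    Hypothetical helper: assumes Sledgehammer (overlord=true) has just written
    prob_<prover>_1 into user_dir, e.g. ~/.isabelle/Isabelle2013-2.
    """
    src = Path(user_dir) / f"prob_{prover}_1"
    if not src.is_file():
        raise FileNotFoundError(f"no problem file at {src}")
    dst = src.with_name(f"{src.name}_{label}")
    # Copy rather than move, so the next run can simply overwrite the original.
    shutil.copy2(src, dst)
    return dst
```

For example, after running Version 1 you would call snapshot_problem(Path.home() / ".isabelle" / "Isabelle2013-2", "version1") (adjust the folder to your Isabelle version), then run Version 2.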

I had renamed the first file beforehand. The two files have different sizes. A diff (I use the jEdit plugin JDiff) shows that they differ starting with the very first line, which is the command E is run with.
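If you don't use jEdit, Python's standard difflib module gives the same kind of comparison. This is a minimal sketch; the helper names diff_problems and first_differing_line are my own, and the file names in the usage note are hypothetical.

```python
import difflib
from typing import List, Optional


def diff_problems(old: str, new: str,
                  old_name: str = "version1", new_name: str = "version2") -> List[str]:
    """Unified diff of two problem files' contents (a plain-Python stand-in for JDiff)."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     fromfile=old_name, tofile=new_name, lineterm=""))


def first_differing_line(old: str, new: str) -> Optional[int]:
    """1-based number of the first line where the texts differ, or None if identical."""
    a, b = old.splitlines(), new.splitlines()
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return i
    if len(a) != len(b):
        # One file is a prefix of the other; they differ right after the shorter one ends.
        return min(len(a), len(b)) + 1
    return None
```

Called as first_differing_line(Path("prob_e_1_version1").read_text(), Path("prob_e_1").read_text()) on the two files described here, it should report line 1, since they already differ on the command line.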

To show you what you would see, here are the first few lines of the problem file for the first version:

% TIMEFORMAT='%3R'; { time (exec 2>&1; '/cygdrive/e/E_2/dev/Isabelle2013-2/contrib/e-1.8/x86-cygwin/eprover' --tstp-in --tstp-out --silent --split-clauses=4 --split-reuse-defs --simul-paramod --forward-context-sr --destructive-er-aggressive --destructive-er --presat-simplify --prefer-initial-clauses -winvfreqrank -c1 -Ginvfreqconjmax -F1 --delete-bad-limit=150000000 -WSelectMaxLComplexAvoidPosPred -H'(4*FunWeight(SimulateSOS,20,20,1.5,1.5,1,a2:0,degree_poly_a:0,minus_154650241poly_a:0,monom_poly_a:0,one_one_nat:0,one_one_poly_a:0,pCons_poly_a:0,zero_z2096148049poly_a:0,minus_1267152911poly_a:8,minus_1927295133poly_a:8,pCons_1263018438poly_a:8,minus_minus_a:9,minus_minus_poly_a:9,pCons_a:9,pCons_poly_poly_a:9,monom_1144868891atural:9,tt_poly_Code_natural:9,zero_z1864290105atural:9,zero_z2076475201atural:9,monom_poly_nat:9,tt_poly_nat:9,zero_z1059985641ly_nat:9,zero_zero_poly_nat:9,monom_67158909poly_a:9,tt_poly_poly_poly_a:9,zero_z1199790189poly_a:9,zero_z2064990175poly_a:9,monom_a:9,tt_a:9,zero_zero_a:9,zero_zero_poly_a:9,monom_poly_poly_a:10,tt_poly_poly_a:10,monom_nat:10,zero_zero_nat:10,monom_Code_natural:10,zero_z353611057atural:10,tt_poly_a:10,pCons_1690884498atural:11,tt_pol1445527701atural:11,pCons_poly_nat:11,tt_poly_poly_nat:11,tt_pol1146364953poly_a:12,pCons_nat:12,pCons_Code_natural:12,minus_1521903873atural:14,minus_minus_nat:14,one_one_poly_poly_a:15,one_one_a:16,one_on446885109atural:16,one_one_Code_natural:16,one_one_poly_nat:16,one_on392296739poly_a:16,degree_a:17,degree_Code_natural:18,degree_nat:18,degree_poly_poly_a:18,aa_poly_poly_a_bool:23,pp:23,aa_poly_a_bool:23,aa_pol1791115049l_bool:23,aa_poly_nat_bool:23,aa_pol1868968491a_bool:23,degree1502533245atural:24,degree_poly_nat:24,degree368812443poly_a:24,is_zero_poly_a:26,is_zero_a:26,is_zero_Code_natural:26,is_zero_nat:26,is_zero_poly_poly_a:26,pcompose_poly_a:27,pcompose_a:27,pcompo775675211atural:27,pcompose_nat:27,pcompose_poly_poly_a:27,power_276493840poly_a:27,power_power_poly_a:27,power_1061922746atural:28,power_power_poly_nat:28,power_1749536158poly_a:28,synthetic_div_poly_a:28,synthetic_div_a:28,synthe389744629atural:28,synthetic_div_nat:28,synthe1271504013poly_a:29,suc:29,coeff_a:30,coeff_poly_poly_a:30,coeff_nat:30,coeff_Code_natural:30,coeff_poly_a:30,coeff_2076392010atural:30,one_on1013003517atural:30,coeff_poly_nat:30,one_on1411366565ly_nat:30,coeff_36192014poly_a:30,one_on1584232881poly_a:30,poly_poly_a:31,poly_nat:31,poly_a:31,a:32,code_natural:32,fFalse:32,fTrue:32,nat:32,poly:32,tt_bool:32,undefi1030841758poly_a:32,undefi122925008atural:32,undefi1684997496ly_nat:32,undefi2131925448atural:32,undefi65090320poly_a:32,undefi880707458poly_a:32,undefined_a:32,undefined_poly_a:32,undefined_poly_nat:32,power_power_nat:36,aa_nat_bool:37,aa_nat_fun_nat_bool:37,code_natural_size:38,size_s686587580atural:38,ord_less_nat:39),3*ConjectureGeneralSymbolWeight(PreferNonGoals,200,100,200,50,50,1,100,1.5,1.5,1),1*Clauseweight(PreferProcessed,1,1,1),1*FIFOWeight(PreferProcessed))'  --term-ordering=KBO6 --cpu-limit=9 --proof-object=1 '/cygdrive/e/E_1/02-p/pi/home/.isabelle/Isabelle2013-2/prob_e_1' ) ; }
% This file was generated by Isabelle (most likely Sledgehammer)
% 2014-01-12 10:46:41.037

% Explicit typings (80)
fof(tsy_c_Groups_Ominus__class_Ominus_001t__Polynomial__Opoly_It__Polynomial__Opoly_It__Polynomial__Opoly_It__Polynomial__Opoly_Itf__a_J_J_J_J, axiom,
    ((![B_1, B_2]: tt_pol1146364953poly_a(minus_1927295133poly_a(B_1, B_2)) = minus_1927295133poly_a(B_1, B_2)))).
fof(tsy_c_Groups_Ominus__class_Ominus_001t__Polynomial__Opoly_It__Polynomial__Opoly_It__Polynomial__Opoly_Itf__a_J_J_J, axiom,
    ((![B_1, B_2]: tt_poly_poly_poly_a(minus_1267152911poly_a(B_1, B_2)) = minus_1267152911poly_a(B_1, B_2)))).
fof(tsy_c_Groups_Ominus__class_Ominus_001t__Polynomial__Opoly_It__Polynomial__Opoly_Itf__a_J_J, hypothesis,
    ((![B_1, B_2]: tt_poly_poly_a(minus_154650241poly_a(B_1, B_2)) = minus_154650241poly_a(B_1, B_2)))).
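To go one step beyond a raw diff, a short Python sketch can list which named fof entries (facts) appear in only one of the two problems, and which name:number pairs from the -H argument on line 1 appear in only one. It assumes every entry starts at the beginning of a line as fof(name, role, like the excerpt above; other TPTP forms (tff, cnf, ...) are not handled. I don't know exactly what E makes of the numbers in those pairs, so the sketch only compares the names. All helper names are my own.

```python
import re
from typing import Dict, List

# Start of a TPTP first-order formula entry, e.g. "fof(tsy_c_..., axiom,".
FOF_HEAD = re.compile(r"^fof\((\w+),\s*(\w+),", re.MULTILINE)
# "name:number" pairs, as in the FunWeight(...) list given to E with -H on line 1.
SYMBOL_PAIR = re.compile(r"\b([A-Za-z]\w*):(\d+)\b")


def fact_roles(problem: str) -> Dict[str, str]:
    """Map each fof entry name to its role (axiom, hypothesis, conjecture, ...)."""
    return dict(FOF_HEAD.findall(problem))


def symbol_pairs(problem: str) -> Dict[str, int]:
    """name:number pairs found on the first line (the commented E command line)."""
    first = problem.splitlines()[0] if problem else ""
    return {name: int(num) for name, num in SYMBOL_PAIR.findall(first)}


def compare_problems(old: str, new: str) -> Dict[str, List[str]]:
    """Set differences between two problem files, by fact name and by symbol name."""
    a, b = fact_roles(old), fact_roles(new)
    sa, sb = symbol_pairs(old), symbol_pairs(new)
    return {
        "facts_only_in_old": sorted(a.keys() - b.keys()),
        "facts_only_in_new": sorted(b.keys() - a.keys()),
        "role_changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
        "symbols_only_in_old": sorted(sa.keys() - sb.keys()),
        "symbols_only_in_new": sorted(sb.keys() - sa.keys()),
    }
```

Run on the saved Version 1 and Version 2 files, the facts_only_in_* lists would show concretely which facts Sledgehammer chose differently for the two statements.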

I also converted your first version to a theorem instead of a notepad, and the resulting problem file is the same except for the timestamp and about one other line.

(* Version 1 as theorem *)
theorem
  fixes a :: "('a:: comm_ring_1) poly"
  shows "degree((monom 1 1) -CONST pCons a 0) =1"
(*sledgehammer[overlord=true, provers="e"]*)
by(metis 
  One_nat_def degree_pCons_eq_if diff_0_right diff_pCons monom_0 
  monom_Suc one_poly_def zero_neq_one)

On Whether Sledgehammer Is Deterministic

Update 140112_1651

No experts have shown up on this Sunday evening, so I'll act as the document gofer and provide some links to Sledgehammer documentation below.

First, though, I make three easy observations about whether Sledgehammer can be deterministic as defined by the OP: "in the sense that if you run sledgehammer consecutively several times (i.e., on the exactly same theorem definition) you get the same results". Beyond that definition, I know nothing about determinism.

As I see it, for Sledgehammer to be deterministic, three things have to be deterministic; I try to state them generally enough to be correct.

One caveat first: the PIDE (Isabelle's Prover IDE framework) can run many things in parallel threads, which can greatly affect how processing time is allocated to everything, and that in turn can affect whether a particular ATP finds a particular proof within its time limit.

  1. Given a theorem statement and a fixed set of available facts (the prior theorems), the Sledgehammer algorithm must always generate the same problem for a particular automatic theorem prover (ATP). Besides the initial problem, Sledgehammer may generate a series of further problems for the ATP in order to eliminate unneeded facts. (It's probably more complicated than that; the real requirement may be that some fixed set of problems always produces the same result.)
  2. For a fixed problem that an ATP receives from Sledgehammer, the ATP proof attempt must always succeed or fail in the same way.
  3. The smt or metis methods, given the same facts, must always succeed or fail in the same way. This may be obvious, but I list it anyway.
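Condition 1 is the easiest one to check by hand: with overlord on, run Sledgehammer twice on the same statement, save both problem files, and compare them while ignoring the timestamp comment. Here is a minimal Python sketch of that comparison. It assumes the timestamp line has the form seen in the excerpt (% 2014-01-12 10:46:41.037) and that nothing else in the file legitimately changes between runs, which I haven't verified; it also only covers the initial problem, not conditions 2 and 3. The helper names are my own.

```python
import hashlib
import re

# Matches the generation-timestamp comment, e.g. "% 2014-01-12 10:46:41.037".
TIMESTAMP = re.compile(r"^% \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def problem_digest(problem: str) -> str:
    """SHA-256 of a problem file's text, ignoring the timestamp comment line."""
    kept = [line for line in problem.splitlines() if not TIMESTAMP.match(line)]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


def same_problem(a: str, b: str) -> bool:
    """True if two runs produced the same problem, up to the timestamp."""
    return problem_digest(a) == problem_digest(b)
```

If same_problem returns False for two consecutive runs on an identical statement, condition 1 fails for that input; if it returns True, any difference in outcome has to come from the ATP, the proof methods, or the scheduling issues mentioned above.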

The main value of the list above is that it may take away some of the mystery of what Sledgehammer is: a team effort between the Sledgehammer algorithm that generates problems, third-party ATPs, and the smt or metis proof methods.

Out of the thousands of available theorems, Sledgehammer selects the applicable ones and uses them to pose a problem to an ATP in the ATP's own input language. The ATP performs its logical magic on those facts and reports back to Sledgehammer. Sledgehammer reads the ATP's response and, if a proof was found, suggests in the output panel an smt or metis call that can be used. Along the way, Sledgehammer may replay proofs itself to eliminate unneeded facts, or do various other things depending on the options chosen.

Here is Larry Paulson's web page with documents specific to Isabelle and ATPs:

Linking Isabelle with Automated Theorem Provers

At the top are links to various publication pages, some of which have drop-down menus.

Here is Jasmin Blanchette's publications page. It contains all of his work, so I link to its workshop papers section, and to the one paper by Paulson and Blanchette that I found to be an easy-to-read overview.

Presumably the Sledgehammer experts could answer items 1 and 3 of my list. They are also familiar with much of what the ATPs do, but they probably don't try to know everything about them beyond what's needed to prevent problems such as inconsistencies.

Many statements have multiple proofs. When you transform a statement as you've done here, you affect the direction of the search, and in particular the relevance filter. Determinism was never a design goal for Sledgehammer.
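To make the relevance-filter point concrete, here is a toy Python filter, loosely modeled on the idea behind Meng and Paulson's relevance filter (MePo in Sledgehammer): select facts that share enough symbols with what is already known, then iterate. It is not Sledgehammer's implementation, and the fact names and symbol sets are made up. It also assumes, purely for illustration, that in Version 2 the filter starts only from the symbols of degree p = 1; in reality the chained p_def fact may contribute its symbols too.

```python
from typing import Dict, Iterable, List, Set


def relevant_facts(goal_syms: Iterable[str], facts: Dict[str, Set[str]],
                   threshold: float = 0.5, rounds: int = 2) -> List[str]:
    """Toy iterative relevance filter (illustration only, not Sledgehammer's code).

    A fact is selected when at least `threshold` of its symbols are already
    known; the symbols of facts selected in one round become known in the next.
    """
    known = set(goal_syms)
    selected: List[str] = []
    for _ in range(rounds):
        new_syms: Set[str] = set()
        for name in sorted(facts):  # fixed order keeps this toy deterministic
            syms = facts[name]
            if name in selected or not syms:
                continue
            if len(syms & known) / len(syms) >= threshold:
                selected.append(name)
                new_syms |= syms
        if not new_syms:
            break
        known |= new_syms
    return selected


# Hypothetical facts with invented symbol sets.
FACTS = {
    "lemma_a": {"minus", "pCons"},
    "lemma_b": {"monom", "pCons", "suc", "zero"},
    "lemma_c": {"degree", "pCons", "zero", "suc"},
    "lemma_d": {"one", "pCons", "zero"},
    "lemma_e": {"degree", "one"},
    "lemma_f": {"suc", "power"},
}
# Version 1 spells the polynomial out; Version 2 mentions only the abbreviation p.
GOAL_V1 = {"degree", "minus", "monom", "pCons", "one", "zero"}
GOAL_V2 = {"degree", "p", "one"}
```

With these inputs, relevant_facts(GOAL_V1, FACTS) selects lemma_a through lemma_e in the first round and lemma_f in the second (once "suc" is known), while relevant_facts(GOAL_V2, FACTS) selects only lemma_e. Same facts, logically equivalent goals, very different problems for the ATP.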
