1
$\begingroup$

I'm reading this paper (Tactic Learning and Proving for the Coq Proof Assistant written by Blaauwbroek et al). In section 3 (Proof Recording) it mentions one-shingles and two-shingles :

After creating a database of tactics and recorded proof states, we want to predict the correct tactic to make progress in a new, unseen proof state. For this, we must find a good characterization of the sentences present in a proof state. Currently, this characterization is a set of features. The features of a sentence are all of its one-shingles and two-shingles, meaning they are all identifiers and pairs of adjacent identifiers in the abstract syntax tree. For example, the sentence (x + y) + z = x + (y + z) will be converted to the feature set $$\{eq, plus, x, y, z, eq\_plus, plus\_x, plus\_y, plus\_z, eq\_z, eq\_x\}$$

What does a two-shingle mean? Here is my interpretation, first we need to draw the abstract syntax tree:

 =
 / \
 + +
 / \ | \
 + z x +
 / \ / \
 x y y z

Now, I assume one-shingles mean the set of all of the nodes, so it would be S1 = {eq, plus, x, y, z}. I also assume that two-shingles mean the node and it's parent, then it would be S2 = { eq plus, plus plus, plus x, plus y, plus z }.

How "eq_z" is considered a two shingle? I'm totally confused!

asked Oct 30 at 19:28
$\endgroup$
3
  • $\begingroup$ Please include title, authors and publisher of the paper. $\endgroup$ Commented Oct 31 at 21:18
  • $\begingroup$ title: "Tactic Learning and Proving for the Coq Proof Assistant" authors: "Blaauwbroek, Josef Urban, Herman Geuvers" $\endgroup$ Commented Nov 1 at 18:10
  • $\begingroup$ I suspect that's a typo, eq_z and eq_x shouldn't be there. You can email the authors for confirmation. $\endgroup$ Commented Nov 3 at 20:48

1 Answer 1

1
$\begingroup$

By the context you provided I suspect the author is talking about a way to automatically apply a tactic to a proof goal having some lexical property. In that case he/she is trying to solve some sort of matching problem.

The paper already told you what they are: "The features of a sentence are [...] all identifiers and pairs of adjacent identifiers in the abstract syntax tree." The notation and the language employed are definitely strange. By some books' standards (Dragon book, Modern compiler design) plain wrong.

Here, the author is finding all pairs of symbols appearing in the input string. In fact, "shingles" are lexems of language. "Two-shingles" are pair of lexems, but apparently, independently by the position of the adjacent lexem. This is why there is 'z=' and '=x'. To me, the mistake is the presence of '=+'.

I assume parentheses are not included because the precedence of the operations is encoded by the abstract syntax tree.

answered Nov 5 at 23:06
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.