Hackerrank Matchstick experiment

Question 1

The problem is regarding probability. The problem is given in detail here.

In an \$m \times n\$ grid, \2ドルmn -m -n\$ matchsticks are placed at the boundaries between cells.

enter image description here

We can play a chance experiment where each matchstick can be removed with a certain probability \$p\$. We can define the ratio between number of cells with area less than or equal to 3 and the total number of cells pre-experiment (\$m \times n\$) as score. We need to find the expectation of this score for a given grid specified by \$m\$, \$n\$ and a given probability \$p\$.

enter image description here

The score of the instance in fig.2 is: $$ \text{score} = \frac{16}{9 \times > 5} = 0.3555 $$

It is sufficient if we calculate the expected value of the score within an absolute error of 1e-9.

The code I have produced till now is:

import math
class Cell:
 def __init__(self, Row, Coloumn):
 self.Row = Row
 self.Coloumn = Coloumn
 self.Name = 'CellR'+str(Row)+'C'+str(Coloumn)
 self.UNeighbor = None
 self.DNeighbor = None
 self.LNeighbor = None
 self.RNeighbor = None
 self.UConnection = False
 self.DConnection = False
 self.LConnection = False
 self.RConnection = False 
class Grid:
 def __init__(self, m, n):
 self.Rows = m
 self.Coloumns = n
 self.Cells = [[Cell(i,j) for j in range(n)] for i in range(m)]
 for i in range(m):
 for j in range(n):
 if(i != 0):
 self.Cells[i][j].UNeighbor = self.Cells[i - 1][j]
 if(i != m-1):
 self.Cells[i][j].DNeighbor = self.Cells[i + 1][j]
 if(j != 0):
 self.Cells[i][j].LNeighbor = self.Cells[i][j - 1]
 if(j != n-1):
 self.Cells[i][j].RNeighbor = self.Cells[i][j + 1]
 def Remove(self, BoundaryStatuses):
 m = self.Rows
 n = self.Coloumns
 for i in range(m):
 for j in range(2*n-1):
 ThisCell = self.Cells[i][math.floor(j/2)]
 if(i != m-1):
 if(j %2 == 0):
 if(BoundaryStatuses[i][j] == 1):
 ThisCell.DConnection = True 
 ThisCell.DNeighbor.UConnection = True
 else:
 if(BoundaryStatuses[i][j] == 1):
 ThisCell.RConnection = True
 ThisCell.RNeighbor.LConnection = True
 else:
 if(j%2 != 0):
 if(BoundaryStatuses[i][j] == 1):
 ThisCell.RConnection = True
 ThisCell.RNeighbor.LConnection = True 
def Connections(Grid, x, y, Partial):
 ThisCell = Grid.Cells[x][y]
 Partial.add(ThisCell.Name)
 if(ThisCell.LConnection & (ThisCell.LNeighbor != None)):
 if(ThisCell.LNeighbor.Name not in Partial):
 Partial.union(Connections(Grid, x, y-1, Partial))
 if(ThisCell.RConnection & (ThisCell.RNeighbor != None)):
 if(ThisCell.RNeighbor.Name not in Partial):
 Partial.union(Connections(Grid, x, y+1, Partial))
 if(ThisCell.UConnection & (ThisCell.UNeighbor != None)):
 if(ThisCell.UNeighbor.Name not in Partial):
 Partial.union(Connections(Grid, x-1, y, Partial))
 if(ThisCell.DConnection & (ThisCell.DNeighbor != None)):
 if(ThisCell.DNeighbor.Name not in Partial):
 Partial.union(Connections(Grid, x+1, y, Partial))
 return Partial
def ConnectedRegions(Grid):
 ListOfRegions = []
 for i in range(Grid.Rows):
 for j in range(Grid.Coloumns):
 ThisCell = Grid.Cells[i][j]
 Accounted = False
 for Region in ListOfRegions:
 if(ThisCell.Name in Region):
 Accounted = True
 if(not(Accounted)):
 NewRegion = Connections(Grid, i, j, set())
 ListOfRegions.append(NewRegion)
 return ListOfRegions
def CalcScore(ListOfRegions, m, n):
 score = 0
 for Region in ListOfRegions:
 if(len(Region) <= 3):
 score = score + 1
 return score/(m*n)
def CalcExp(m, n, p):
 NoS = 2*m*n -m -n
 NoP = 2**(NoS)
 ListOfScores = []
 ProbsOfScores = []
 for i in range(NoP):
 BoundaryStatuses = [[0 for iy in range(2*n-1)] for ix in range(m)]
 quo = i
 NoSR = 0
 for ix in range(m):
 for iy in range(2*n-1):
 if(ix != m-1):
 BoundaryStatuses[ix][iy] = (quo%2)
 if(quo %2 == 1):
 NoSR = NoSR + 1 
 quo = int(quo/2) 
 else:
 if(iy %2 != 0):
 BoundaryStatuses[ix][iy] = (quo%2)
 if(quo %2 == 1):
 NoSR = NoSR + 1
 quo = int(quo/2)
 MatchGrid = Grid(m, n)
 MatchGrid.Remove(BoundaryStatuses)
 ListOfRegions = ConnectedRegions(MatchGrid)
 score = CalcScore(ListOfRegions, m, n)
 ProbOfInstance = (p**(NoSR))*((1-p)**(NoS - NoSR))
 if(score not in ListOfScores):
 ListOfScores.append(score)
 ProbsOfScores.append(ProbOfInstance)
 else:
 idx = ListOfScores.index(score)
 ProbsOfScores[idx] = ProbsOfScores[idx] + ProbOfInstance
 print(ListOfScores)
 print(ProbsOfScores)
 Exp = 0
 for idx,score in enumerate(ListOfScores):
 Exp = Exp + ProbsOfScores[idx]*score
 return Exp
q = 1
for q_itr in range(q):
 m = 4
 n = 3
 p = 0.9
 Exp = CalcExp(m, n, p)
 print(Exp)

My reasoning behind it is:

For an \$m\$ by \$n\$ grid, there are \$n_{ms} = 2mn -m -n\$ matchsticks (provided in the problem & represented in the code as NoS). For each stick, there are two possibilities - it can either be removed or can be retained (with a probability of \$p\$ or \1ドル-p\$). Therefore, there are \2ドル^{n_{ms}}\$ possible states of the Grid post experiment.
The expectation of the score is (from definition): $$E[score] = \sum x.Pr(score = x)$$ As I am unable to come up with an estimate of the probability of landing on a particular score. I have decided to go the other way around by searching through all the possibilities as: $$E[score] = \sum_{i=0}^{2^{n_{ms}}-1} score_i \times Pr(case_i)$$ Here every case is one of the possibilities listed in Step 1.
To do this, I can generate numbers from \0ドル\$ till \2ドル^{n_{ms}}\$ in binary. If each digit in the binary number represents the presence/absence of a particular matchstick, then if I remove/retain matchsticks according to the binary string (Remove method of the Grid class) I will have simulated the entire space of possibilities. For every case I can compute the score (\$score_i\$) with ConnectedRegions and CalcScore functions if I have the corresponding Grid.
For a particular case in step 3 (say, case \$i\$, \$i \in Z\$ & \$i \in [0, 2^{n_{ms}})\$), there will be \$n_r\$ sticks that are removed (represented in code as NoSR) and \$n_{ms}-n_r\$ sticks that are retained (basically the number of 1s and 0s in the binary representation of \$i\$). The probability of this particular case to occur is \$p^{n_r}(1-p)^{n_{ms} - n_r}\$ (which is nothing but \$Pr(case_i)\$). Now finally to compute the expected score, we just need to list the different scores and their corresponding probabilities to plug into the expression in Step 2.

It works as expected. However, in the question we are required to find this value correct to 9 decimal places. My code doesn't exploit this. Also, it is incredibly slow.

Can you help me with math ideas or alternate algorithms that can help me leverage the accuracy requirement into speed? Is my code style consistent enough? Or does it cause much confusion?

Question 2

Hi, I just want to point out I think this is a great first question, welcome to Code Review :)

Question 3

Thank you very much. Question credit goes to Hackerrank :)

Question 4

The question may come from HackerRank, but you've written your post in a great way, that's what I wanted to congratulate you about. There are explanations, code is there, everything's sharp

Question 5

As long as there are no answers to your post I believe it is fine to add things, just try not to add too much as it may break an answer someone might be writting at the moment for example

Question 6

math

... unable to come up with an estimate of the probability of landing on a particular score

This is the max-3-cell problem in a two-dimensional setting. I wonder if we could make analytic progress against this by considering the max-2-cell problem in a one-dimensional setting. There must be some sort of symmetry we can appeal to. Considering the density on an infinite line or an infinite plane would let us ignore the complexities of boundary conditions, at least for a little while.

code

 def __init__(self, Row, Coloumn):

PEP-8 asks that you not use caps for such variables. Prefer snake_case names like u_neighbor or u_connection. Run $ flake8, and heed its advice, before sharing your code. Also, typo: column.

You made a modeling choice here, that the Von Neumann neighborhood should be represented as union of four distinct unrelated things. I don't agree with that perspective. I would prefer to see a vector of four similar things, where you iterate through this list:

[(0, -1),
 (0, 1),
 (-1, 0),
 (1, 0)]

You wrote:

 self.Name = 'CellR'+str(Row)+'C'+str(Coloumn)

If a modern python is available to you, it would be more natural to use an f-string:

 self.name = f'CellR{row}C{column}'

Here is some tediously repetitive code to express the notion of adjacency:

 if(i != 0):
 self.Cells[i][j].UNeighbor = self.Cells[i - 1][j]
 if(i != m-1):
 self.Cells[i][j].DNeighbor = self.Cells[i + 1][j]
 if(j != 0):
 self.Cells[i][j].LNeighbor = self.Cells[i][j - 1]
 if(j != n-1):
 self.Cells[i][j].RNeighbor = self.Cells[i][j + 1]

Recommend using a single assignment that loops through the von_neumann list of delta x's & y's above.

This is obscure:

 for i in range(m):
 for j in range(2 * n - 1):
 ThisCell = self.Cells[i][math.floor(j / 2)]

It's not clear why there's no symmetry between horizontal and vertical. At a minimum it warrants a comment.

 if (i != m - 1):

This is lovely C bounds checking, and you can certainly keep it. But consider following the pythonic "ask forgiveness, not permission", by using try / catch IndexError to deal with the borders. Same remark for CalcExp.

Also, no need for ( extra parens ) in a python if statement.

Couple of issues with this one:

if (ThisCell.LConnection & (ThisCell.LNeighbor != None)):

No ( extra parens ), please.
Do not use bitwise & and on booleans. Use boolean and instead.
flake8 would have told you to use is to test identity of the None singleton, rather than an equality operator.

So that would leave us with:

if this_cell.l_connection and this_cell.l_neighbor is not None:

or even more simply:

if this_cell.l_connection and this_cell.l_neighbor:

Similarly a subsequent line would be if not accounted:

More tedious statements follow, that could be a loop through von_neumann.

In def connected_regions, this takes too long:

 Accounted = False
 for Region in ListOfRegions:
 if(ThisCell.Name in Region):
 Accounted = True

Please use a generator expression:

 accounted = any(this_cell.name in region
 for region in list_of_regions)

Why? Because any() will bail out early, upon encountering 1st True.

Kudos for using set(), so the in test will be fast.

 score = score + 1

In python we prefer to phrase that as score += 1.

The name calc_exp surprisingly turns out to relate to expectations, rather than math.exp or exponentials.

This is a bit cumbersome:

ListOfScores = []
ProbsOfScores = []

Better to store a single list of (score, prob) tuples. It would simplify some manipulations farther down in the function, including allowing you to finish with return sum( ... ).

visualization

Consider writing code that depicts how a given configuration of matchsticks turned out, with markings of the 1-, 2-, & 3-cell regions. This would facilitate manual checking of your results for a couple of particular configurations.

In a similar vein, unit tests using very small number of matchsticks wouldn't hurt.

high level advice

Run flake8. Follow its instructions. This will make your code more easily understood by people who interact with it.

J_H J_H 41.8k3 gold badges38 silver badges157 bronze badges · Accepted Answer · 2019-03-31 19:32:02Z

math

... unable to come up with an estimate of the probability of landing on a particular score

This is the max-3-cell problem in a two-dimensional setting. I wonder if we could make analytic progress against this by considering the max-2-cell problem in a one-dimensional setting. There must be some sort of symmetry we can appeal to. Considering the density on an infinite line or an infinite plane would let us ignore the complexities of boundary conditions, at least for a little while.

code

 def __init__(self, Row, Coloumn):

PEP-8 asks that you not use caps for such variables. Prefer snake_case names like u_neighbor or u_connection. Run $ flake8, and heed its advice, before sharing your code. Also, typo: column.

You made a modeling choice here, that the Von Neumann neighborhood should be represented as union of four distinct unrelated things. I don't agree with that perspective. I would prefer to see a vector of four similar things, where you iterate through this list:

[(0, -1),
 (0, 1),
 (-1, 0),
 (1, 0)]

You wrote:

 self.Name = 'CellR'+str(Row)+'C'+str(Coloumn)

If a modern python is available to you, it would be more natural to use an f-string:

 self.name = f'CellR{row}C{column}'

Here is some tediously repetitive code to express the notion of adjacency:

 if(i != 0):
 self.Cells[i][j].UNeighbor = self.Cells[i - 1][j]
 if(i != m-1):
 self.Cells[i][j].DNeighbor = self.Cells[i + 1][j]
 if(j != 0):
 self.Cells[i][j].LNeighbor = self.Cells[i][j - 1]
 if(j != n-1):
 self.Cells[i][j].RNeighbor = self.Cells[i][j + 1]

Recommend using a single assignment that loops through the von_neumann list of delta x's & y's above.

This is obscure:

 for i in range(m):
 for j in range(2 * n - 1):
 ThisCell = self.Cells[i][math.floor(j / 2)]

It's not clear why there's no symmetry between horizontal and vertical. At a minimum it warrants a comment.

 if (i != m - 1):

This is lovely C bounds checking, and you can certainly keep it. But consider following the pythonic "ask forgiveness, not permission", by using try / catch IndexError to deal with the borders. Same remark for CalcExp.

Also, no need for ( extra parens ) in a python if statement.

Couple of issues with this one:

if (ThisCell.LConnection & (ThisCell.LNeighbor != None)):

No ( extra parens ), please.
Do not use bitwise & and on booleans. Use boolean and instead.
flake8 would have told you to use is to test identity of the None singleton, rather than an equality operator.

So that would leave us with:

if this_cell.l_connection and this_cell.l_neighbor is not None:

or even more simply:

if this_cell.l_connection and this_cell.l_neighbor:

Similarly a subsequent line would be if not accounted:

More tedious statements follow, that could be a loop through von_neumann.

In def connected_regions, this takes too long:

 Accounted = False
 for Region in ListOfRegions:
 if(ThisCell.Name in Region):
 Accounted = True

Please use a generator expression:

 accounted = any(this_cell.name in region
 for region in list_of_regions)

Why? Because any() will bail out early, upon encountering 1st True.

Kudos for using set(), so the in test will be fast.

 score = score + 1

In python we prefer to phrase that as score += 1.

The name calc_exp surprisingly turns out to relate to expectations, rather than math.exp or exponentials.

This is a bit cumbersome:

ListOfScores = []
ProbsOfScores = []

Better to store a single list of (score, prob) tuples. It would simplify some manipulations farther down in the function, including allowing you to finish with return sum( ... ).

visualization

Consider writing code that depicts how a given configuration of matchsticks turned out, with markings of the 1-, 2-, & 3-cell regions. This would facilitate manual checking of your results for a couple of particular configurations.

In a similar vein, unit tests using very small number of matchsticks wouldn't hurt.

high level advice

Run flake8. Follow its instructions. This will make your code more easily understood by people who interact with it.

Stack Exchange Network

Hackerrank Matchstick experiment

1 Answer 1

math

code

visualization

high level advice

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Hackerrank Matchstick experiment

1 Answer 1

math

code

visualization

high level advice

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions