The problem is regarding probability. The problem is given in detail here.
In an \$m \times n\$ grid, \2ドルmn -m -n\$ matchsticks are placed at the boundaries between cells.
We can play a chance experiment where each matchstick can be removed with a certain probability \$p\$. We can define the ratio between number of cells with area less than or equal to 3 and the total number of cells pre-experiment (\$m \times n\$) as score. We need to find the expectation of this score for a given grid specified by \$m\$, \$n\$ and a given probability \$p\$.
The score of the instance in fig.2 is: $$ \text{score} = \frac{16}{9 \times > 5} = 0.3555 $$
It is sufficient if we calculate the expected value of the score within an absolute error of 1e-9.
The code I have produced till now is:
import math
class Cell:
def __init__(self, Row, Coloumn):
self.Row = Row
self.Coloumn = Coloumn
self.Name = 'CellR'+str(Row)+'C'+str(Coloumn)
self.UNeighbor = None
self.DNeighbor = None
self.LNeighbor = None
self.RNeighbor = None
self.UConnection = False
self.DConnection = False
self.LConnection = False
self.RConnection = False
class Grid:
def __init__(self, m, n):
self.Rows = m
self.Coloumns = n
self.Cells = [[Cell(i,j) for j in range(n)] for i in range(m)]
for i in range(m):
for j in range(n):
if(i != 0):
self.Cells[i][j].UNeighbor = self.Cells[i - 1][j]
if(i != m-1):
self.Cells[i][j].DNeighbor = self.Cells[i + 1][j]
if(j != 0):
self.Cells[i][j].LNeighbor = self.Cells[i][j - 1]
if(j != n-1):
self.Cells[i][j].RNeighbor = self.Cells[i][j + 1]
def Remove(self, BoundaryStatuses):
m = self.Rows
n = self.Coloumns
for i in range(m):
for j in range(2*n-1):
ThisCell = self.Cells[i][math.floor(j/2)]
if(i != m-1):
if(j %2 == 0):
if(BoundaryStatuses[i][j] == 1):
ThisCell.DConnection = True
ThisCell.DNeighbor.UConnection = True
else:
if(BoundaryStatuses[i][j] == 1):
ThisCell.RConnection = True
ThisCell.RNeighbor.LConnection = True
else:
if(j%2 != 0):
if(BoundaryStatuses[i][j] == 1):
ThisCell.RConnection = True
ThisCell.RNeighbor.LConnection = True
def Connections(Grid, x, y, Partial):
ThisCell = Grid.Cells[x][y]
Partial.add(ThisCell.Name)
if(ThisCell.LConnection & (ThisCell.LNeighbor != None)):
if(ThisCell.LNeighbor.Name not in Partial):
Partial.union(Connections(Grid, x, y-1, Partial))
if(ThisCell.RConnection & (ThisCell.RNeighbor != None)):
if(ThisCell.RNeighbor.Name not in Partial):
Partial.union(Connections(Grid, x, y+1, Partial))
if(ThisCell.UConnection & (ThisCell.UNeighbor != None)):
if(ThisCell.UNeighbor.Name not in Partial):
Partial.union(Connections(Grid, x-1, y, Partial))
if(ThisCell.DConnection & (ThisCell.DNeighbor != None)):
if(ThisCell.DNeighbor.Name not in Partial):
Partial.union(Connections(Grid, x+1, y, Partial))
return Partial
def ConnectedRegions(Grid):
ListOfRegions = []
for i in range(Grid.Rows):
for j in range(Grid.Coloumns):
ThisCell = Grid.Cells[i][j]
Accounted = False
for Region in ListOfRegions:
if(ThisCell.Name in Region):
Accounted = True
if(not(Accounted)):
NewRegion = Connections(Grid, i, j, set())
ListOfRegions.append(NewRegion)
return ListOfRegions
def CalcScore(ListOfRegions, m, n):
score = 0
for Region in ListOfRegions:
if(len(Region) <= 3):
score = score + 1
return score/(m*n)
def CalcExp(m, n, p):
NoS = 2*m*n -m -n
NoP = 2**(NoS)
ListOfScores = []
ProbsOfScores = []
for i in range(NoP):
BoundaryStatuses = [[0 for iy in range(2*n-1)] for ix in range(m)]
quo = i
NoSR = 0
for ix in range(m):
for iy in range(2*n-1):
if(ix != m-1):
BoundaryStatuses[ix][iy] = (quo%2)
if(quo %2 == 1):
NoSR = NoSR + 1
quo = int(quo/2)
else:
if(iy %2 != 0):
BoundaryStatuses[ix][iy] = (quo%2)
if(quo %2 == 1):
NoSR = NoSR + 1
quo = int(quo/2)
MatchGrid = Grid(m, n)
MatchGrid.Remove(BoundaryStatuses)
ListOfRegions = ConnectedRegions(MatchGrid)
score = CalcScore(ListOfRegions, m, n)
ProbOfInstance = (p**(NoSR))*((1-p)**(NoS - NoSR))
if(score not in ListOfScores):
ListOfScores.append(score)
ProbsOfScores.append(ProbOfInstance)
else:
idx = ListOfScores.index(score)
ProbsOfScores[idx] = ProbsOfScores[idx] + ProbOfInstance
print(ListOfScores)
print(ProbsOfScores)
Exp = 0
for idx,score in enumerate(ListOfScores):
Exp = Exp + ProbsOfScores[idx]*score
return Exp
q = 1
for q_itr in range(q):
m = 4
n = 3
p = 0.9
Exp = CalcExp(m, n, p)
print(Exp)
My reasoning behind it is:
For an \$m\$ by \$n\$ grid, there are \$n_{ms} = 2mn -m -n\$ matchsticks (provided in the problem & represented in the code as
NoS
). For each stick, there are two possibilities - it can either be removed or can be retained (with a probability of \$p\$ or \1ドル-p\$). Therefore, there are \2ドル^{n_{ms}}\$ possible states of the Grid post experiment.The expectation of the score is (from definition): $$E[score] = \sum x.Pr(score = x)$$ As I am unable to come up with an estimate of the probability of landing on a particular score. I have decided to go the other way around by searching through all the possibilities as: $$E[score] = \sum_{i=0}^{2^{n_{ms}}-1} score_i \times Pr(case_i)$$ Here every case is one of the possibilities listed in Step 1.
To do this, I can generate numbers from \0ドル\$ till \2ドル^{n_{ms}}\$ in binary. If each digit in the binary number represents the presence/absence of a particular matchstick, then if I remove/retain matchsticks according to the binary string (
Remove
method of theGrid
class) I will have simulated the entire space of possibilities. For every case I can compute the score (\$score_i\$) withConnectedRegions
andCalcScore
functions if I have the corresponding Grid.For a particular case in step 3 (say, case \$i\$, \$i \in Z\$ & \$i \in [0, 2^{n_{ms}})\$), there will be \$n_r\$ sticks that are removed (represented in code as
NoSR
) and \$n_{ms}-n_r\$ sticks that are retained (basically the number of 1s and 0s in the binary representation of \$i\$). The probability of this particular case to occur is \$p^{n_r}(1-p)^{n_{ms} - n_r}\$ (which is nothing but \$Pr(case_i)\$). Now finally to compute the expected score, we just need to list the different scores and their corresponding probabilities to plug into the expression in Step 2.
It works as expected. However, in the question we are required to find this value correct to 9 decimal places. My code doesn't exploit this. Also, it is incredibly slow.
Can you help me with math ideas or alternate algorithms that can help me leverage the accuracy requirement into speed? Is my code style consistent enough? Or does it cause much confusion?
-
1\$\begingroup\$ Hi, I just want to point out I think this is a great first question, welcome to Code Review :) \$\endgroup\$IEatBagels– IEatBagels2019年03月20日 13:31:19 +00:00Commented Mar 20, 2019 at 13:31
-
\$\begingroup\$ Thank you very much. Question credit goes to Hackerrank :) \$\endgroup\$Balakrishnan Rajan– Balakrishnan Rajan2019年03月20日 14:08:32 +00:00Commented Mar 20, 2019 at 14:08
-
1\$\begingroup\$ The question may come from HackerRank, but you've written your post in a great way, that's what I wanted to congratulate you about. There are explanations, code is there, everything's sharp \$\endgroup\$IEatBagels– IEatBagels2019年03月20日 14:09:36 +00:00Commented Mar 20, 2019 at 14:09
-
1\$\begingroup\$ As long as there are no answers to your post I believe it is fine to add things, just try not to add too much as it may break an answer someone might be writting at the moment for example \$\endgroup\$IEatBagels– IEatBagels2019年03月20日 18:19:03 +00:00Commented Mar 20, 2019 at 18:19
1 Answer 1
math
- ... unable to come up with an estimate of the probability of landing on a particular score
This is the max-3-cell problem in a two-dimensional setting. I wonder if we could make analytic progress against this by considering the max-2-cell problem in a one-dimensional setting. There must be some sort of symmetry we can appeal to. Considering the density on an infinite line or an infinite plane would let us ignore the complexities of boundary conditions, at least for a little while.
code
def __init__(self, Row, Coloumn):
PEP-8 asks that you not use caps for such variables.
Prefer snake_case names like u_neighbor
or u_connection
.
Run $ flake8
, and heed its advice, before sharing your code.
Also, typo: column
.
You made a modeling choice here, that the Von Neumann neighborhood should be represented as union of four distinct unrelated things. I don't agree with that perspective. I would prefer to see a vector of four similar things, where you iterate through this list:
[(0, -1),
(0, 1),
(-1, 0),
(1, 0)]
You wrote:
self.Name = 'CellR'+str(Row)+'C'+str(Coloumn)
If a modern python is available to you, it would be more natural to use an f-string:
self.name = f'CellR{row}C{column}'
Here is some tediously repetitive code to express the notion of adjacency:
if(i != 0):
self.Cells[i][j].UNeighbor = self.Cells[i - 1][j]
if(i != m-1):
self.Cells[i][j].DNeighbor = self.Cells[i + 1][j]
if(j != 0):
self.Cells[i][j].LNeighbor = self.Cells[i][j - 1]
if(j != n-1):
self.Cells[i][j].RNeighbor = self.Cells[i][j + 1]
Recommend using a single assignment that loops through the von_neumann
list of delta x's & y's above.
This is obscure:
for i in range(m):
for j in range(2 * n - 1):
ThisCell = self.Cells[i][math.floor(j / 2)]
It's not clear why there's no symmetry between horizontal and vertical. At a minimum it warrants a comment.
if (i != m - 1):
This is lovely C bounds checking, and you can certainly keep it.
But consider following the pythonic "ask forgiveness, not permission",
by using try
/ catch IndexError
to deal with the borders.
Same remark for CalcExp
.
Also, no need for (
extra parens )
in a python if
statement.
Couple of issues with this one:
if (ThisCell.LConnection & (ThisCell.LNeighbor != None)):
- No
(
extra parens)
, please. - Do not use bitwise
&
and on booleans. Use booleanand
instead. flake8
would have told you to useis
to test identity of theNone
singleton, rather than an equality operator.
So that would leave us with:
if this_cell.l_connection and this_cell.l_neighbor is not None:
or even more simply:
if this_cell.l_connection and this_cell.l_neighbor:
Similarly a subsequent line would be if not accounted:
More tedious statements follow, that could be a loop through von_neumann
.
In def connected_regions
, this takes too long:
Accounted = False
for Region in ListOfRegions:
if(ThisCell.Name in Region):
Accounted = True
Please use a generator expression:
accounted = any(this_cell.name in region
for region in list_of_regions)
Why? Because any()
will bail out early, upon encountering 1st True
.
Kudos for using set()
, so the in
test will be fast.
score = score + 1
In python we prefer to phrase that as score += 1
.
The name calc_exp
surprisingly turns out to relate to expectations,
rather than math.exp
or exponentials.
This is a bit cumbersome:
ListOfScores = []
ProbsOfScores = []
Better to store a single list of (score, prob) tuples.
It would simplify some manipulations farther down in the function,
including allowing you to finish with return sum( ... )
.
visualization
Consider writing code that depicts how a given configuration of matchsticks turned out, with markings of the 1-, 2-, & 3-cell regions. This would facilitate manual checking of your results for a couple of particular configurations.
In a similar vein, unit tests using very small number of matchsticks wouldn't hurt.
high level advice
Run flake8
.
Follow its instructions.
This will make your code more easily understood by people who interact with it.
Explore related questions
See similar questions with these tags.