I solved this problem: A. Ice Skating, Codeforces
My approach was to create a grid from the given snow drift coordinates, convert that grid into a graph, and then count how many disconnected components there are. The key observation for this problem was that any disconnected component can be connected to another by adding one snow drift, so we can assume that we connect everything to the component that contains node 0.
#include <iostream>
#include <vector>
#include <list>

using namespace std;

void dfs(vector<list<int>> &adj_list, int node, vector<bool> &visited)
{
    visited[node] = true;
    for (const int ngbr : adj_list[node])
    {
        if (visited[ngbr])
            continue;
        dfs(adj_list, ngbr, visited);
    }
}

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);
    int n;
    cin >> n;
    vector<vector<int>> grid(1000, vector<int>(1000, -1));
    int node = 0;
    for (int i = 0; i < n; ++i)
    {
        int x, y;
        cin >> x >> y;
        --x;
        --y;
        grid[y][x] = node;
        ++node;
    }
    vector<list<int>> adj_list(n, list<int>());
    for (int i = 0; i < 1000; ++i)
    {
        for (int j = 0; j < 1000; ++j)
        {
            if (grid[i][j] == -1)
                continue;
            for (int k = 0; k < 1000; ++k)
            {
                if (k != j && grid[i][k] != -1)
                    adj_list[grid[i][j]].push_back(grid[i][k]);
                if (k != i && grid[k][j] != -1)
                    adj_list[grid[i][j]].push_back(grid[k][j]);
            }
        }
    }
    vector<bool> visited(n);
    dfs(adj_list, 0, visited);
    int ans = 0;
    for (int i = 1; i < n; ++i)
    {
        if (!visited[i])
        {
            ++ans;
            dfs(adj_list, i, visited);
        }
    }
    cout << ans << '\n';
}
All test cases passed, but I'm not fully convinced about the complexity. I think it's O(10^6 + 10^5) (or maybe 10^8?): scanning the grid costs 10^6, and the innermost loop only executes for occupied cells, of which there are at most N (<= 100), each costing 10^3 iterations, so roughly 10^5 on top of the scan.
It's not very performant, so I wonder how I could improve the time complexity of this?
1 Answer
meaningful identifier
The name dfs is great, but main is rather long, and generically named. Consider having it delegate tasks to appropriately named helper function(s), such as one that's responsible for reading in the problem setup.
magic numbers
for (int j = 0; j < 1000; ++j)
{
...
for (int k = 0; k < 1000; ++k)
Those come straight from the problem statement. Still, they deserve a constexpr, or at least a #define.
sparsity
We have a hundred drifts on a grid of a million cells, which is to say it's a sparse setup, with 0.01% cell occupancy.
grid[y][x] = node;
That's a somewhat expensive assignment in terms of memory, and the time to scan a raster for node occupancy is especially expensive.
When thinking about this, you may as well assume the world_size parameter can be bigger than \10ドル^3\$, and might be e.g. \10ドル^5\$ or other arbitrarily large numbers. Armed with a solution for such a world, you can easily conquer the stated problem.
So rather than testing individual cell occupancy, we will want a pair of datastructures that can be queried by \$x\$ or \$y\$ coordinate. Sorting might prove useful, but for now a hash map should suffice. For a given coordinate it may need to produce the set of all matching drifts.
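A minimal sketch of such an index (the struct and member names are mine; an unordered_map keyed by coordinate, holding the ids of the matching drifts):

```cpp
#include <unordered_map>
#include <vector>

using namespace std;

// Query-by-coordinate index: by_x[x] holds the ids of all drifts with
// that x coordinate, by_y[y] those with that y coordinate.
struct DriftIndex
{
    unordered_map<int, vector<int>> by_x, by_y;

    void add(int id, int x, int y)
    {
        by_x[x].push_back(id);
        by_y[y].push_back(id);
    }

    // All drifts sharing the given x coordinate (empty if none).
    const vector<int> &in_column(int x) { return by_x[x]; }
    const vector<int> &in_row(int y) { return by_y[y]; }
};
```

Memory is proportional to the number of drifts, independent of world_size.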
components
Trivially we could consider each drift a connected component of size \1ドル\$, and then greedily connect them. But given the cost of one drift per connection, would we be better off first producing connected components, labeling the biggest one "Giant", and then connecting the remaining components to it in descending size order? Let's see.
Would it make a difference? Well, consider a world of four drifts containing an N-S component and a disconnected E-W component. Clearly we should add a drift. But notice that it doesn't matter where exactly we put it, since only connectivity matters. To minimize travel times we would place it in the middle, but since that's not part of our cost function, any of the four other (corner) locations would do just as well.
Now imagine that we had a pair of "L"-shaped disconnected components, each of size \3ドル\$. Does it matter where we place that one additional drift? No. It would only make a difference if placing a single drift could connect three disconnected components, and that can't happen: all drifts sharing a row already belong to a single component, as do all drifts sharing a column, so a new drift's row contributes at most one existing component and its column at most one more. So we can rest easy that the details of new drift coordinates don't really matter -- we only care that the drift we've placed induces a new edge in the connectivity graph.
algorithm
- Arbitrarily put the first drift in the "Giant" connected component.
- Pick an arbitrary disconnected (unclassified) node. Assess whether it's already connected to Giant and skip it if it is. (It might already be connected to some other component -- we don't care.) Use its \$x\$ coordinate to connect it to Giant via a new drift, arbitrarily ignoring \$y\$.
- Lather, rinse, return to step 2.
Hmmm, turns out a hash map suffices for this. And actually, we don't even need to query it for all matching drifts. It's enough to always use the \$y\$ coordinate of Giant's first node when constructing a new drift.
A hash map is going to be the winning datastructure here, for memory cost proportional to the number of drifts. But interestingly, a still fairly modest allocation of \2ドル \times 10^3\$ elements lets us efficiently solve the problem using just array references, without hashing, perhaps in a more cache-friendly and performant manner. Each coordinate axis needs just 125 bytes, a pair of cache lines, for a bit vector. Or keep it simple and use byte vectors.
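For completeness, a sketch of that array-backed variant (names and sizing are my assumptions; like the original code, it expects coordinates already shifted to be 0-based). It records only the first drift per coordinate value, which is all the merging step needs:

```cpp
#include <array>

using namespace std;

constexpr int world_size = 1000;  // grid bound from the problem statement

// Array-backed alternative to hashing: one slot per coordinate value,
// -1 meaning "no drift seen here yet", otherwise the first drift's id.
struct CoordIndex
{
    array<int, world_size> first_by_x, first_by_y;

    CoordIndex()
    {
        first_by_x.fill(-1);
        first_by_y.fill(-1);
    }

    // Returns an earlier drift sharing column x (or -1), then records id.
    int claim_x(int x, int id)
    {
        int prev = first_by_x[x];
        if (prev == -1)
            first_by_x[x] = id;
        return prev;
    }

    // Same for row y.
    int claim_y(int y, int id)
    {
        int prev = first_by_y[y];
        if (prev == -1)
            first_by_y[y] = id;
        return prev;
    }
};
```

Whenever claim_x or claim_y returns a non-negative id, that's a union-find merge; the rest of the component counting is unchanged.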