#!/usr/bin/python
# -*- coding: utf-8 -*-
"""I’m facing the following situation in a sort of data prevalence system.

I have the entire history of deltas, but I may be faced with arbitrary
sequences of undos followed by adding new deltas.  Recomputing the state
after an undo requires finding the latest snapshotted or checkpointed
state before that undo, then replaying the deltas forward from that
snapshot (possibly creating more snapshots in the process) — see the
restore() sketch below.

Ideally, I’d like a guarantee that no sequence of undos and delta
additions ever departs from a regime of amortized constant time per
(undo or delta-adding) operation, in constant space.  That is probably
not quite possible; I think the best I can get is amortized logarithmic
time in logarithmic space.  I think I have an algorithm that guarantees
that bound, but I don’t have a proof.

The algorithm is: maintain a set of O(log t) snapshots when you’re at
delta-history-length t.  When adding another snapshot, first delete the
lowest-value snapshot, where ‘value’ is the ratio of how much work that
snapshot will save you if you have to use it (b) to the number of undos
(d) you’ll have to go through before you get to it.

The motivating intuition is that if you start at t=256 and undo all the
way back down to 0, you’d be in good shape if you had snapshots at 128,
192, 224, 240, 248, 252, 254, and 255.  The first undo puts you at 255
and requires zero replays; the second, at 254, requires zero replays;
the third, at 253, requires one replay (forward from the snapshot at
252); the fourth, at 252, requires zero; the fifth, at 251, requires
three replays (forward from the snapshot at 248), creating new
snapshots at 249 and 250 along the way; and so on.  On the 65th undo
you reach 191, needing to replay 63 deltas starting from the snapshot
at 128.  I think this works out to a logarithmic (rather than constant)
number of replays per undo, and in any case you’re managing only a
logarithmic number of snapshots.  The key property is that after doing
the Nth undo, you never have to do more than N replays.  Or something
like that.

The greedy algorithm described above seems to do a reasonable job of
approximating such an exponentially-spaced set of checkpoints, in the
simple test given here.

It’s possible to implement this incremental greedy algorithm
considerably more efficiently than it’s done here; removing a snapshot
only changes the values of the adjacent snapshots, so it’s not
necessary to recompute all the values every time.  (A sketch of that
optimization appears at the end of this file.)

"""
import Numeric
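
# The docstring above describes how the snapshots get used: to answer an
# undo, find the latest snapshot at or before the target state and replay
# deltas forward from it, snapshotting the intermediate states as you go.
# The sketch below is not part of the original code; `restore`, `states`,
# `deltas`, and `apply_delta` are hypothetical names standing in for
# whatever the real prevalence system provides.
def restore(target, states, deltas, apply_delta):
    """Return the state at time `target`, snapshotting intermediates.

    `states` maps times to snapshotted states (time 0 must be present),
    `deltas[i]` takes state i to state i+1, and `apply_delta(state, delta)`
    returns a new state without modifying its arguments.  Afterwards the
    caller would cull the keys of `states` back down to a bounded count,
    e.g. with cullto() below.
    """
    base = max(t for t in states if t <= target)   # latest usable snapshot
    state = states[base]
    for i in range(base, target):                  # replay deltas forward
        state = apply_delta(state, deltas[i])
        states[i + 1] = state                      # creates more snapshots
    return state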

def tocull(snapshots, present):
    """Return the index of the lowest-value snapshot.

    `snapshots` is a sorted list of snapshot times, all < `present`.
    A snapshot's value is b/d, where b is the work it saves (its
    distance from the previous snapshot, or from state 0 for the first
    one) and d is the number of undos from `present` before it is first
    needed (one step past the next snapshot up).
    """
    s = Numeric.asarray(snapshots)
    b = s - Numeric.concatenate(([0], s[:-1]))
    d = (present - Numeric.concatenate((s[1:], [present]))) + 1
    v = b.astype(Numeric.Float64) / d
    return min(range(len(v)), key=lambda i: v[i])

def tocull_potato(snapshots, present):
    "A reimplementation in potato programming."
    s = [0] + snapshots + [present]
    worst_v = infinity = 1e309
    assert infinity == infinity * 2
    for ii in range(1, len(s)-1):
        v = float(s[ii] - s[ii-1]) / (present - s[ii+1] + 1)
        if v < worst_v:
            worst_v = v
            worst_index = ii-1
    return worst_index

def cull(snapshots, present):
    "Remove the lowest-value snapshot from `snapshots` in place."
    assert tocull_potato(snapshots, present) == tocull(snapshots, present)
    snapshots.pop(tocull(snapshots, present))

def cullto(snapshots, present, max):
    "Cull `snapshots` down to at most `max` entries."
    while len(snapshots) > max:
        cull(snapshots, present)

# Batch approach: start with every state snapshotted and cull down.
ns = [90, 50, 40, 20, 10, 6, 4, 3, 2, 1]
snapshots = range(100)
for n in ns:
    cullto(snapshots, 100, n)
    print snapshots

# Incremental approach: cull before adding each new snapshot.
def incremental(n, max):
    snapshots = []
    for present in range(max):
        if snapshots:
            cullto(snapshots, present, n-1)
        snapshots.append(present)
    return snapshots

for n in ns:
    print incremental(n, 100)

print incremental(7, 400)

for n in [8, 9, 10, 11, 12, 13]:
    for max in [900, 1000, 1100, 1200, 1300]:
        print incremental(n, max)
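
# The last paragraph of the docstring observes that removing a snapshot only
# changes the values of its two neighbours, so a cull need not recompute
# every ratio.  Below is one possible sketch of that optimization (not the
# author's implementation): keep the snapshots in a doubly-linked index
# chain, keep their current values in a heap, and lazily discard heap
# entries that have gone stale.  The name `cullto_heap` and its internals
# are illustrative assumptions.
import heapq

def cullto_heap(snapshots, present, max):
    "Like cullto(), but only recomputes values adjacent to each removal."
    s = [0] + list(snapshots) + [present]  # padded, as in tocull_potato
    n = len(s)
    pred = range(-1, n - 1)                # pred[i] == i - 1 initially
    succ = range(1, n + 1)                 # succ[i] == i + 1 initially
    def value(i):                          # b/d for the snapshot at index i
        return float(s[i] - s[pred[i]]) / (present - s[succ[i]] + 1)
    alive = set(range(1, n - 1))           # indices of the real snapshots
    current = {}
    heap = []
    for i in alive:
        current[i] = value(i)
        heapq.heappush(heap, (current[i], i))
    while len(alive) > max:
        v, i = heapq.heappop(heap)
        if i not in alive or v != current[i]:
            continue                       # stale entry; its value has changed
        alive.remove(i)
        p, q = pred[i], succ[i]
        succ[p], pred[q] = q, p            # splice index i out of the chain
        for j in (p, q):                   # only the neighbours change value
            if j in alive:
                current[j] = value(j)
                heapq.heappush(heap, (current[j], j))
    snapshots[:] = [s[i] for i in sorted(alive)]

# Quick check: on a test like the batch one above, this should make the same
# greedy choices as cullto (same ratios, same first-minimum tie-breaking).
a = range(100)
b = range(100)
cullto(a, 100, 10)
cullto_heap(b, 100, 10)
assert a == b
print b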