| LEFT | RIGHT |
| (no file at all) |
| 1 /* |
| 2 The sockdrawer command is an analysis and visualization tool to help |
| 3 you reorganize a complex Go package into several simpler ones. |
| 4 |
| 5 Overview |
| 6 |
| 7 sockdrawer operates on three kinds of graphs at different levels of |
| 8 abstraction. The lowest level is the NODE GRAPH. A node is a |
| 9 package-level declaration of a named entity (func, var, const or type). |
| 10 |
| 11 An entire constant declaration is treated as a single node, even if it |
| 12 contains multiple "specs" each defining multiple names, since constants |
| 13 so grouped are typically closely related; an important special case is |
| 14 an enumerated set data type. Also, we treat each "spec" of a var or |
| 15 type declaration as a single node. |
| 16 |
| 17 func f() // a func node |
| 18 const ( a, b = 0, 1; c = 0 ) // a single const node |
| 19 var ( |
| 20 a, b = 0, 1 // a single var node |
| 21 c = 0 // another var node |
| 22 ) |
| 23 type ( x int; y int ) // a single type node |
| 24 |
| 25 Each reference to a package-level entity E forms an edge in the node |
| 26 graph, from the node in which it appears to the node E. For example: |
| 27 |
| 28 var x int |
| 29 var y = x // edge y -> x |
| 30 func f() int { return y } // edge f -> y |
| 31 |
| 32 Each method declaration depends on its receiver named type; in addition |
| 33 we add an edge from each receiver type to its methods: |
| 34 |
| 35 type T int // edge T -> T.f |
| 36 func (T) f() // edge T.f -> T |
| 37 |
| 38 to ensure that a type and its methods stay together. |
| 39 |
| 40 The node graph is highly cyclic, and obviously all nodes in a cycle must |
| 41 belong to the same package for the package import graph to remain |
| 42 acyclic. |
| 43 |
| 44 So, we compute the second graph, the SCNODE GRAPH. In essence, the |
| 45 scnode graph is the graph of strongly connected components (SCCs) of the |
| 46 (ordinary) node graph. By construction, the scnode graph is acyclic. |
| 47 |
| 48 We optionally perform an optimization at this point, which is to fuse |
| 49 single-predecessor scnodes with their sole predecessor, as this tends to |
| 50 reduce clutter in big graphs. This means that the scnodes are no longer |
| 51 true SCCs; however, the scnode graph remains acyclic. |
| 52 |
| 53 We define a valid PARTITION P of the scnode graph as a mapping from |
| 54 scnodes to CLUSTERS such that the projection of the scnode graph using |
| 55 mapping P is an acyclic graph. This third graph is the CLUSTER GRAPH. |
| 56 |
| 57 Every partition represents a valid refactoring of the original package |
| 58 into hypothetical subpackages, each cluster being a subpackage. Two |
| 59 partitions define the extreme ends of a spectrum: the MINIMAL partition |
| 60 maps every scnode to a single cluster; it represents the status quo, a |
| 61 monolithic package. The MAXIMAL partition maps each scnode to a unique |
| 62 cluster; this breaks the package up into an impractically large number |
| 63 of small fragments. The ideal partition lies somewhere in between. |
| 64 |
| 65 |
| 66 Clusters file |
| 67 |
| 68 The --clusters=<file> argument specifies a CLUSTERS FILE that constrains |
| 69 the partition algorithm. The file consists of a number of stanzas, each |
| 70 assigning an import path to a cluster ("mypkg/internal/util") and |
| 71 assigning a set of initial nodes ({x, y, z}) to it: |
| 72 |
| 73 = mypkg/internal/util |
| 74 x |
| 75 y # this is a comment |
| 76 z |
| 77 |
| 78 Order of stanzas is important: clusters must be be declared bottom to |
| 79 top. After each stanza, all nodes transitively reachable (via the node |
| 80 graph) from that cluster are assigned to that cluster, if they have not |
| 81 yet been assigned to some other cluster. Thus we need only mention the |
| 82 root nodes of the cluster, not all its internal nodes. A warning is |
| 83 reported if a node mentioned in a stanza already belongs to a previously |
| 84 defined cluster. |
| 85 |
| 86 There is an implicit cluster, "residue", that holds all remaining nodes |
| 87 after the clusters defined by the file have been processed. Initially, |
| 88 when the clusters file is empty, the residue cluster contains the entire |
| 89 package. (It is logically at the top.) The task for the user is to |
| 90 iteratively define new clusters until the residue becomes empty. |
| 91 |
| 92 |
| 93 Visualization |
| 94 |
| 95 When sockdrawer is run, it analyzes the source package, builds the node |
| 96 graph and the scgraph, loads the clusters file, computes the clusters for |
| 97 every node, and then emits SVG renderings of the three levels of graphs, |
| 98 with nodes colors coded as follows: |
| 99 |
| 100 green = cluster (candidate subpackage) |
| 101 pink = scnode (strong component of size > 1) |
| 102 blue = node (func/type/var/const decl) |
| 103 |
| 104 The graphs of all clusters, a DAG, has green nodes; clicking one takes |
| 105 you to the graph over scnodes for that cluster, also a DAG. Each pink |
| 106 node in this graph represents a cyclical bunch of the node graph, |
| 107 collapsed together for ease of viewing. Each blue node here represents a |
| 108 singleton SCC, a single declaration; singular SCCs are replaced by |
| 109 their sole element for simplicity. |
| 110 |
| 111 Clicking a pink (plural) scnode shows the cyclical portion of the node |
| 112 graph that it represents. (If the fusion optimization was enabled, it |
| 113 may not be fully cyclic.) All of its nodes are blue. |
| 114 |
| 115 Clicking a blue node shows the definition of that node in godoc. |
| 116 (The godoc server's base URL is specified by the --godoc flag.) |
| 117 |
| 118 |
| 119 Workflow |
| 120 |
| 121 Initially, all nodes belong to the "residue" cluster. (GraphViz graph |
| 122 rendering can be slow for the first several iterations. A large monitor |
| 123 is essential.) |
| 124 |
| 125 The sockdrawer user's task when decomposing a package into clusters is |
| 126 to identify the lowest-hanging fruit (so to speak) in the residue |
| 127 cluster---a coherent group of related scnodes at the bottom of the |
| 128 graph---and to "snip off" a bunch at the "stem" by appending a new |
| 129 stanza to the clusters file and listing the roots of that bunch in the |
| 130 stanza, and then to re-run the tool. |
| 131 |
| 132 |
| 133 Nodes may be added to an existing stanza if appropriate, but if they are |
| 134 added to a cluster that is "too low", this may create conflicts; keep an |
| 135 eye out for warnings. |
| 136 |
| 137 This process continues iteratively until the residue has become empty |
| 138 and the sets of clusters are satisfactory. |
| 139 |
| 140 The tool prints the assignments of nodes to clusters: the "shopping |
| 141 list" for the refactoring work. Clusters should be split off into |
| 142 subpackages in dependency order, lowest first. |
| 143 |
| 144 |
| 145 Caveats |
| 146 |
| 147 The analysis chooses a single configuration, such as linux/amd64. |
| 148 Declarations for other configurations (e.g. windows/arm) will be absent |
| 149 from the node graph. |
| 150 |
| 151 There may be some excessively large SCCs in the node graph that reflect |
| 152 a circularity in the design. For the purposes of analysis, you can |
| 153 break them arbitrarily by commenting out some code, though more thought |
| 154 will be required for a principled fix (e.g. dependency injection). |
| 155 |
| 156 |
| 157 TODO |
| 158 |
| 159 - Document the refactoring. |
| 160 - Make pretty and stable names for anonymous nodes such as: |
| 161 func init() {...} |
| 162 var _ int = ... |
| 163 Currently their names are very sensitive to lexical perturbations. |
| 164 - Infer more constraints from co-located declarations. Most of the stuff |
| 165 in the runtime's residue could be disposed of this way. |
| 166 - Analyze the package's *_test.go files too. If they define an external |
| 167 test package, we'll have to deal with two packages at once. |
| 168 - Write tests. |
| 169 |
| 170 */ |
| 171 package main |
| LEFT | RIGHT |