Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

github.com/arl/sockdrawer

Package Overview
Dependencies
Alerts
File Explorer
Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

github.com/arl/sockdrawer

  • v0.0.0-20220716234242-7294b911f995
  • Source
  • Go
  • Socket score

Version published
Created
Source

sockdrawer

The sockdrawer command is an analysis and visualization tool to help you reorganize a complex Go package into several simpler ones.

Install with:

go install github.com/arl/sockdrawer@latest

Disclaimer

The code is in this repository is based on the sockdrawer program, an original work by Alan Donovan, which is accessible at https://codereview.appspot.com/186270043/.

As such it's covered by the original license and copyrighted to the Go Authors.

There are some rough edges, mostly due to the fact that sockdrawer has been developed way before the apparition of modules.

The only differences with the original repository have been made so the project can build.

No other modificaitons are planned for now.

Overview

sockdrawer operates on three kinds of graphs at different levels of abstraction. The lowest level is the NODE GRAPH. A node is a package-level declaration of a named entity (func, var, const or type).

An entire constant declaration is treated as a single node, even if it contains multiple "specs" each defining multiple names, since constants so grouped are typically closely related; an important special case is an enumerated set data type. Also, we treat each "spec" of a var or type declaration as a single node.

func f()                        // a func node
const ( a, b = 0, 1; c = 0 )    // a single const node
var (
    a, b = 0, 1                 // a single var node
    c = 0                       // another var node
)
type ( x int; y int )           // a single type node

Each reference to a package-level entity E forms an edge in the node graph, from the node in which it appears to the node E. For example:

var x int
var y = x                   // edge y -> x
func f() int { return y }   // edge f -> y

Each method declaration depends on its receiver named type; in addition we add an edge from each receiver type to its methods:

type T int      // edge T -> T.f
func (T) f()    // edge T.f -> T

to ensure that a type and its methods stay together.

The node graph is highly cyclic, and obviously all nodes in a cycle must belong to the same package for the package import graph to remain acyclic.

So, we compute the second graph, the SCNODE GRAPH. In essence, the scnode graph is the graph of strongly connected components (SCCs) of the (ordinary) node graph. By construction, the scnode graph is acyclic.

We optionally perform an optimization at this point, which is to fuse single-predecessor scnodes with their sole predecessor, as this tends to reduce clutter in big graphs. This means that the scnodes are no longer true SCCs; however, the scnode graph remains acyclic.

We define a valid PARTITION P of the scnode graph as a mapping from scnodes to CLUSTERS such that the projection of the scnode graph using mapping P is an acyclic graph. This third graph is the CLUSTER GRAPH.

Every partition represents a valid refactoring of the original package into hypothetical subpackages, each cluster being a subpackage. Two partitions define the extreme ends of a spectrum: the MINIMAL partition maps every scnode to a single cluster; it represents the status quo, a monolithic package. The MAXIMAL partition maps each scnode to a unique cluster; this breaks the package up into an impractically large number of small fragments. The ideal partition lies somewhere in between.

Clusters file

The --clusters=<file> argument specifies a CLUSTERS FILE that constrains the partition algorithm. The file consists of a number of stanzas, each assigning an import path to a cluster ("mypkg/internal/util") and assigning a set of initial nodes ({x, y, z}) to it:

= mypkg/internal/util
x
y  # this is a comment
z

Order of stanzas is important: clusters must be be declared bottom to top. After each stanza, all nodes transitively reachable (via the node graph) from that cluster are assigned to that cluster, if they have not yet been assigned to some other cluster. Thus we need only mention the root nodes of the cluster, not all its internal nodes. A warning is reported if a node mentioned in a stanza already belongs to a previously defined cluster.

There is an implicit cluster, "residue", that holds all remaining nodes after the clusters defined by the file have been processed. Initially, when the clusters file is empty, the residue cluster contains the entire package. (It is logically at the top.) The task for the user is to iteratively define new clusters until the residue becomes empty.

Visualization

When sockdrawer is run, it analyzes the source package, builds the node graph and the scgraph, loads the clusters file, computes the clusters for every node, and then emits SVG renderings of the three levels of graphs, with nodes colors coded as follows:

green = cluster  (candidate subpackage)
pink  = scnode   (strong component of size > 1)
blue  = node     (func/type/var/const decl)

The graphs of all clusters, a DAG, has green nodes; clicking one takes you to the graph over scnodes for that cluster, also a DAG. Each pink node in this graph represents a cyclical bunch of the node graph, collapsed together for ease of viewing. Each blue node here represents a singleton SCC, a single declaration; singular SCCs are replaced by their sole element for simplicity.

Clicking a pink (plural) scnode shows the cyclical portion of the node graph that it represents. (If the fusion optimization was enabled, it may not be fully cyclic.) All of its nodes are blue.

Clicking a blue node shows the definition of that node in godoc. (The godoc server's base URL is specified by the --godoc flag.)

Workflow

Initially, all nodes belong to the "residue" cluster. (GraphViz graph rendering can be slow for the first several iterations. A large monitor is essential.)

The sockdrawer user's task when decomposing a package into clusters is to identify the lowest-hanging fruit (so to speak) in the residue cluster — a coherent group of related scnodes at the bottom of the graph — and to "snip off" a bunch at the "stem" by appending a new stanza to the clusters file and listing the roots of that bunch in the stanza, and then to re-run the tool.

Nodes may be added to an existing stanza if appropriate, but if they are added to a cluster that is "too low", this may create conflicts; keep an eye out for warnings.

This process continues iteratively until the residue has become empty and the sets of clusters are satisfactory.

The tool prints the assignments of nodes to clusters: the "shopping list" for the refactoring work. Clusters should be split off into subpackages in dependency order, lowest first.

Caveats

The analysis chooses a single configuration, such as linux/amd64. Declarations for other configurations (e.g. windows/arm) will be absent from the node graph.

There may be some excessively large SCCs in the node graph that reflect a circularity in the design. For the purposes of analysis, you can break them arbitrarily by commenting out some code, though more thought will be required for a principled fix (e.g. dependency injection).

FAQs

Package last updated on 16 Jul 2022

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc