Link to this section Background
You might've noticed the tiny buttons/badges at the footer of this site and others, each showing a pixel-art design and linking to another website. These are called 88x31 buttons, and they're a de-facto standard of the indie web. Sources on this are few and far between, but The Neonauticon identified the start to be Netscape, which published "Netscape Now" buttons that sites could add to show their use for then-new web features.
People took this idea and ran with it, placing 88x31 buttons for themselves, their friends, and projects they supported in their websites, forum signatures, etc. Similar to webrings, 88x31 buttons provided a fun way for people to find related content to a given page.
My friends and I are working on mapping the entire 88x31 graph. The result of our work is the first ever snapshot of the entire 88x31 network, available in a single JSON file.
Relatedly, a few semesters ago, I took a course on network science. One of the main takeaways was that lots of real-world networks exhibit something called the small-world property, specifically:
- High clustering (you are more likely to be friends with your friends' friends)
- Low distance (the typical distance between two randomly-chosen nodes stays small as the graph grows)
We see this phenomenon across all sorts of network types: the original 1998 Watts & Strogatz paper uses networks of film actors, the electrical grid, and the neural network of a worm.
So: we've got a brand new dataset and some criteria to test. Does the 88x31 graph exhibit the small-worlds property?
Link to this section Analysis
To analyze the behavior of the graph as it expands, I'll be using two datasets: one with 4428 nodes that we obtained partway through the scraping work, and the most recent dataset with 16023 nodes. Analysis was done using Gephi.
Link to this section Clustering
Clustering is the measure of how locally-connected a graph is. Think about a social graph: if everyone had a set of, say, 20 friends randomly chosen from a group of 1 million, you'd have quite low clustering. However, typically, you're much more likely to be friends with your friends' friends, implying high clustering. Clustering is quantified by the clustering coefficient.
The formal definition of a graph is a set of vertices (nodes) and a set of edges (links) , such that:
We write that edge connects vertex with vertex .
To define clustering, it's useful to define the neighborhood of a vertex as the set of all of its connected neighbors, considering both links "in" and links "out":
Let's go a step further and define the degree of a vertex as the number of connected neighbors it has, i.e.,
where we use the vertical bars to mean the cardinality (number of items contained within) the set of neighbors.
How many links could exist within the neighborhood of a node? Let's ignore self-loops, so the number of possible links is -- each node can connect to each of the other nodes.
The number of links actually in the neighborhood can be written as:
And thus, the clustering coefficient for a given node is:
We can use software to compute the clustering for each node, and take the average across all of them. The average clustering coefficient is:
Nodes | Clustering Coefficient |
---|---|
4428 | 0.122 |
16023 | 0.123 |
Is this value high enough to indicate high clustering? For comparison, let's construct two random graphs with the same number of nodes and edges as each of our sample datasets: one with 4428 nodes and 17254 edges, and one with 16023 nodes and 57202 edges. Gephi supports the ErdΕsβRΓ©nyi model, which generates a graph based on the number of nodes and the probability of an edge between any two given nodes. Based on the real graph, we can determine the probability numbers as:
I needed to multiply these by 2 to get them to load into Gephi properly; perhaps there is some correction going on for undirected vs directed graphs.
Finally, we can get the average clustering coefficient for each of these graphs:
Nodes | Clustering (88x31) | Clustering (Random) |
---|---|---|
4428 | 0.122 | 0.001 |
16023 | 0.123 | 0.000 |
Note that the precision of these values is limited by the output of Gephi to 3 decimal places -- those values really are that low!
Our random graph has low (sometimes called vanishing) clustering as the number of nodes increases. This is the typical behavior for a random graph to have -- as graphs grow, the likelihood of a given node being connected to another node decreases, regardless if those nodes are linked by an intermediate node.
In contrast, our 88x31 graph has high (or nonvanishing) clustering even as the number of nodes increases. This is a notable property in and of itself, and brings us halfway to demonstrating the small-worlds property!
Link to this section Distance
The average path length of a graph is the average of the length of the shortest path between each pair of nodes, ignoring node pairs with no connecting path.
We can compute the average path length of our graph to be:
Nodes () | Avg. Path Length | |
---|---|---|
4428 | 8.396 | 7.182 |
16023 | 9.681 | 11.269 |
For the network to exhibit the small-worlds property, we need the average path length to be proportional to the logarithm of the number of nodes -- that's why I've added it to the table above.
Between graphs, the average path length increased by a factor of , while the number of nodes increased by a factor of (and its logarithm increased by ). This isn't exactly on the expected value for (where means "proportional to"), but it's also quite clearly less than the amount that increased, meaning . Personally, I'm comfortable calling this bound satisfied.
A few reasons come to mind as to why we might not be getting the expected value here:
- is small enough that this is just noise in the dataset. 16K nodes isn't that big.
- The topology of the two 88x31 graphs is different. The first graph consists of the nodes discovered off of notnite.com in a few hours, while the second graph is the entire network as identified by our scrapers. The first graph is going to skew towards the English-speaking, indie web, and LGBT communities, while the second graph includes more Italian and Brazilian communities, tabletop RPG forums, and other groups which are broadly separate from the cluster in which the scraper started. Qualitatively, these other groups seem less interconnected compared to the initial dataset.
- The second graph has a slightly different scraper implementation which analyses the HTML statically instead of using a webdriver to load the page, potentially missing more links on sites which use client-side rendering techniques.
Link to this section Hubs
Real-world graphs often have hubs. In a transportation network, hubs are points with connections to lots of other places; in the Internet, hubs are major ASNs like Hurricane Electric; in a social graph, hubs are just people who know a lot of other people.
We can see if our 88x31 graph exhibits this same behavior by looking at the degree distribution of the graph. For this, I generated the degree of each node in the full 88x31 graph in Gephi, then exported the table into Python to take a histogram and then into Google Sheets to make charts.
Above is a log-log plot of the degree distribution of the full 88x31 graph. Let's walk through it, since it's not the easiest thing to digest.
Each x-value represents a degree which a node could have, and the corresponding y-value is the number of nodes with that degree. Per the chart, there are about 10,000 nodes with degree 1, but closer to 500 with degree 10, and only a handful with degree 100.
The "clumping" taking place at the bottom is since we're working with a discrete number of nodes. For each of the higher degrees, we'll probably only see only 1 or 2 nodes with that degree, and whether it's 1 node or 2 nodes is up to random chance. Since there's nothing in between, we see those two lines at the bottom of the graph, and we can notice quite a bit more noise there as well. Similarly, towards the top-left of the graph, we're running into quantization there as the degree of a node must be a whole number.
So what are we looking for here? For reasons which will become clear, let's bring back our random graph from earlier and run the same procedure on it:
That looks very different -- sort of like a bell curve around some single most-popular degree value, giving the network a sense of "scale" or "size." The 88x31 graph, on the other hand, is scale-free in that hubs in the network get larger as the network grows.
Scale-free networks are more rigorously defined as a network in which the degree distribution follows a power law. That blue line in the graph is the power law function that best fits the data. Especially compared to the random graph, it's pretty clear that our 88x31 graph meets the definition of scale-free.
What are these hubs in our network? In our case, they're mostly sites which try to collect as many 88x31 buttons as possible into a single site, like the neonaut collection (which currently holds the #1 spot).
Link to this section Conclusions
In this post, we looked at an entirely new network dataset through the lens of network science, discovering that it fits the same criteria that characterize social, communication, transport, and biological networks in our world.
Network science is a field that interests me greatly, but that doesn't often come up much in my work, so I'm grateful for the chance to work on this project. A lot of key insights about real-world networks from the field come up in unexpected ways.
We'll continue to track the 88x31 network over the forseeable future. Maybe we'll discover an unmapped part of the network and end up with even more data to sort through. If you'd like to follow the technical effort, feel free to watch or engage via the project's GitHub repository.
Of course, you can join the network as well, and it's as straightforward as it sounds: find a friend with 88x31s on their website, make up your own little 88 pixel by 31 pixel image in your favorite bitmap image editor, and get them to add it to their site along with a link to your website! A few tips:
- GIMP is a great editor to get started with -- just crank the zoom up to 800% or so depending on your screen size.
- Save as a PNG (or GIF for animated 88x31s) to prevent lossy compression from messing with your pixel art.
- When adding a button to your website, use the CSS
image-rendering: pixelated
property to keep your pixel-art goodness from being blurred on high-DPI displays or when zoomed in. - Look for inspiration in the buttons made by your friends, and don't be afraid to experiment!
Hope to see you on eightyeightthirty.one soon!