The figure below is a standard GSEA “enrichment plot” showing the running sum statistic for some set / ranking over genes. We note that usually a false discovery rate is estimated when performing large numbers of these tests. We’ve skipped this procedure as we’re short on time, but this should be implemented when this technique is applied at scale.
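
As a rough illustration of the false discovery rate step mentioned above (which this post skips), here is a minimal sketch of the Benjamini-Hochberg adjustment. It assumes each feature/token-set test has already produced a p-value, for example from a permutation null. Neither the p-values nor the choice of Benjamini-Hochberg comes from this post, and GSEA implementations often use their own permutation-based FDR estimate instead.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: `p_values` holds one p-value per (feature, token set) test.
    How those p-values are obtained is not specified in this post.
    """
    m = len(p_values)
    # Indices of the tests, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, purely for illustration.
example_p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example_p_values)
```

Tests whose adjusted value falls below a chosen level (say 0.05) would then be reported as enriched.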

Here, we’re going to perform Token Set Enrichment Analysis, which you can think of as a kind of “reverse probing”. When probing, we train a classifier to distinguish data points according to some labels. Here, SAEs have already given us a number of classifiers (over tokens) and we wish to know which sets of tokens they mean to distinguish. The solution is to use a cheap statistical test which takes a hypothesis set of tokens, and checks whether they are elevated in the logit weight distributions.

Feature 4467. Above: Feature Dashboard Screenshot from Neuronpedia. It is not immediately obvious from the dashboard what this feature does. Below: Logit Weight distribution classified by whether the token starts with a space, clearly indicating that this feature promotes tokens which lack an initial space character.
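
The grouping used in the lower plot can be sketched as follows. This is a toy illustration only: the token strings and logit weights are made up, and `split_by_leading_space` is a hypothetical helper rather than code from this post.

```python
from statistics import mean


def split_by_leading_space(tokens, logit_weights):
    """Partition logit weights by whether the token string starts with a space."""
    with_space, without_space = [], []
    for token, weight in zip(tokens, logit_weights):
        if token.startswith(" "):
            with_space.append(weight)
        else:
            without_space.append(weight)
    return with_space, without_space


# Toy vocabulary and made-up logit weights (not real GPT2 values).
tokens = [" the", "ing", " cat", "ed", "s", " dog"]
logit_weights = [-0.8, 0.9, -0.5, 0.7, 0.6, -0.9]

with_space, without_space = split_by_leading_space(tokens, logit_weights)
print("mean with leading space:   ", mean(with_space))
print("mean without leading space:", mean(without_space))
```

A feature like the one described would show a clearly higher mean (and a shifted histogram) for the group without a leading space.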

I’d like to thank Johnny Lin for his work on Neuronpedia and ongoing collaboration which makes working with SAEs significantly more feasible (and enjoyable!).

Many features in GPT2 small seem fairly sexist, so it seemed like an interesting idea to use traditionally gendered names as enrichment sets in order to find features which promote them jointly or exclusively. Luckily, there’s actually a python package which makes it easy to get the most common American first names. We plot the enrichment scores for one set on the x-axis and the enrichment scores for another set on the y-axis to aid us in locating features.

Below, we show a plot of skewness vs kurtosis in the logit weight distribution of each feature, coloring by the standard deviation. See the appendix for skewness / kurtosis boxplots for all layers and this link to download scatterplots for all layers.

Token Set Enrichment Analysis (TSEA) is a statistical test which maps each logit weight distribution, and a library of sets of tokens, to an “enrichment score” (which indicates how strongly that feature seems to be promoting/suppressing each set of tokens).

In previous work, we trained and open-sourced a set of sparse autoencoders (SAEs) on the residual stream of GPT2 small. In collaboration with Neuronpedia, we’ve produced feature dashboards, auto-interpretability explanations and interfaces for browsing for ~300k+ features. The analysis in this post is performed on features from the layer 8 residual stream of GPT2 small (for no particular reason).

To get a sense of the kinds of features this finds, we show the logit weight distribution for feature 89 below, which was enriched for all-caps tokens. We show a screenshot of its feature dashboard and a logit weight distribution grouped by the all_caps classification, which show us:

We can use the NLTK part-of-speech tagger to automatically generate sets of tokens which are interesting from an NLP perspective. In practice these sets are highly imperfect as the tagger was not designed to be used on individual words, let alone tokens. We can nevertheless get passable results.

I’d like to thank Neel Nanda and Arthur Conmy for their support and feedback while I’ve been working on this and other SAE related work.

We then use the above plot as a launching point for finding different kinds of features (analogous to types of neurons found by Gurnee et al).

As before, we see a token set effect, though after seeing this result I feel more confident that set size doesn’t explain the set effect. Why do we not see features for base form verbs achieve higher enrichment scores than other (even smaller) verb sets? Possibly this is an artifact of tokenization in some way, though it's hard to say for sure. As before, let’s look at some examples to gain intuition.

To check whether this has pointed to some interesting features, let’s look at a case study from the bottom right and one from the top left.

This is an informal post sharing statistical methods which can be used to quickly / cheaply better understand Sparse Autoencoder (SAE) features.

Below we show the Manhattan plot of the enrichment scores of the top 5000 features by skewness and the following token set:

Moving on, feature 18006 appears to promote tokens labeled as past participles (nltk_pos_VBN) as well as past verbs (nltk_pos_VBD). This is actually somewhat expected once you realize that all of these tokens are verbs in the past tense (and that you can’t really distinguish the two out of context). Thus we see that our set enrichment results can be misleading if we aren’t keeping track of the relationship between our sets. To be clear, it is possible that a feature could promote one set and not the other, but to detect this we would need to track tokens which aren’t in the overlap (e.g. “began” vs “ begun” or “saw” vs “seen”). I don’t pursue this further here but consider it a cautionary tale and evidence we should be careful about how we generate these token lists in the future.
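
One way to track the non-overlapping tokens is to take set differences before running the enrichment test. The sketch below is illustrative only: the two token sets are tiny hand-written stand-ins for the tagger output, and `exclusive_tokens` is a hypothetical helper.

```python
def exclusive_tokens(set_a, set_b):
    """Return (tokens only in set_a, tokens only in set_b, tokens in both)."""
    return set_a - set_b, set_b - set_a, set_a & set_b


# Hand-written toy sets standing in for nltk_pos_VBD and nltk_pos_VBN.
past_tense = {" walked", " played", " began", " saw"}
past_participle = {" walked", " played", " begun", " seen"}

only_vbd, only_vbn, shared = exclusive_tokens(past_tense, past_participle)
print("only past tense:     ", sorted(only_vbd))
print("only past participle:", sorted(only_vbn))
print("ambiguous (in both): ", sorted(shared))
```

Running the enrichment test on the two exclusive sets, rather than the full sets, would indicate whether a feature genuinely prefers one form over the other.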

Gene Set Enrichment Analysis (GSEA) is a statistical method used to check if the genes within some set are elevated within some context. Biologists have compiled extensive sets of proteins associated with different biological phenomena which are often used as a reference point for various analyses. For example, the Gene Ontology Database contains hierarchical sets which group proteins by their structures, processes and functions. Other databases group proteins by their interactions or involvement in pathways (essentially circuits). Each of these databases supports GSEA, which is routinely used to map between elevated levels of proteins in samples and broader knowledge about biology or disease. For example, researchers might find that the set of proteins associated with insulin signaling is in particularly low abundance in patients with type 2 diabetes, indicating that insulin signaling may be related to diabetes.

To better understand these distributions (e.g. how many have thick tails, or how many have lots of tokens shifted left or right), we can use three well-known statistical measures:

Our largest enrichment score overall is feature 5382 for verbs in the gerund form (ending in “ing”). I don’t identify a more specific theme in the top 10 positive logits (verbs ending in “ing”), though maybe there is one, so it seems like the enrichment result is in agreement with the statistics. I’m disappointed with the NLTK tagger, which said that tokens like “ Viking”, “Ring” and “String” were gerund form verbs (these are the far left outliers, where the feature does not promote those tokens).

We see that there is a fairly strong token set effect whereby some of the sets we tested achieved generally higher enrichment scores than others. If we wanted to use these results to automatically label features, we’d want to decide on some meaningful threshold here, but let’s first establish we’re measuring what we think we are.

Since SAEs haven’t been around for very long, we don’t yet know what the logit weight distributions typically look like for SAE features. Moreover, we find that the form of logit weight distribution can vary greatly. In most cases we see a vaguely normal distribution and some outliers (which often make up an interpretable group of tokens boosted by the feature). However, in other cases we see significant left or right skew, or a second mode. The standard case has been described previously by Anthropic in the context of the Arabic feature they found here and is shown above for feature 6649.

This work was produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with support from Neel Nanda and Arthur Conmy. Joseph Bloom is funded by the LTFF, Manifund Regranting Program, donors and LightSpeed Grants. This post makes extensive use of Neuronpedia, a platform for interpretability focusing on accelerating interpretability researchers working with SAEs.

I think both of these case studies suggested we had found interesting features that were non-trivially related to boys’ names and girls’ names, but clearly enrichment results can’t be taken at face value due to factors like overlapping sets and the fact that we’re applying the logit lens in a model with a tied embedding.

Given previous results, we are particularly interested in identifying the set of tokens which a particular feature promotes or suppresses. Luckily, the field of bioinformatics has been doing set enrichment tests for years and they are a staple of some types of data analysis in systems biology. We provide some inspiration and technical detail in the appendix, but will otherwise provide only a cursory explanation of the technique.

SAEs might enable us to decompose model internals into interpretable components. Currently, we don’t have a good way to measure interpretability at scale, but we can generate feature dashboards which show things like how often the feature fires, its direct effect on tokens being sampled (the logit weight distribution) and when it fires (see examples of feature dashboards below). Interpreting the logit weight distribution in feature dashboards for multi-layer models is implicitly using Logit Lens, a very popular technique in mechanistic interpretability. Applying the logit lens to features means that we compute the product of a feature direction and the unembed (W_U W_dec[feature]), referred to as the “logit weight distribution”.
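
As a toy illustration of that product, the sketch below computes one logit weight per vocabulary token from a decoder direction and an unembedding matrix. The shapes and values are made up, the function name is hypothetical, and plain Python lists are used only to keep the example dependency-free (in practice this would be a single matrix product in a tensor library).

```python
def logit_weight_distribution(feature_direction, unembed):
    """Project a feature's decoder direction through the unembedding.

    feature_direction: list of d_model floats (one row of the SAE decoder).
    unembed: d_model x d_vocab nested list (the unembedding matrix).
    Returns one logit weight per vocabulary token.
    """
    # zip(*unembed) iterates over the columns, i.e. one column per token.
    return [
        sum(d * w for d, w in zip(feature_direction, column))
        for column in zip(*unembed)
    ]


# Made-up example with d_model = 2 and a 3-token vocabulary.
feature_direction = [1.0, 2.0]
unembed = [
    [1.0, 0.0, -1.0],
    [0.5, 1.0, 0.0],
]
logit_weights = logit_weight_distribution(feature_direction, unembed)
print(logit_weights)
```

Tokens with large positive entries are the ones the feature directly promotes, and large negative entries are the ones it suppresses.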

If we treat each of our feature logit distributions as a ranking over tokens, and then construct sets of interpretable tokens, we can calculate a running-sum statistic which quantifies the elevation of those tokens in each of the logit weight distributions for each set. The score is calculated by walking down the logit weight distribution, increasing a running-sum statistic when we encounter a token in the set, S, and decreasing it when we encounter a token not in S.
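
A minimal sketch of this running-sum statistic is given below. It follows the usual GSEA convention, which is an assumption here rather than something stated in this post: hits are weighted by |logit weight|^p and normalised to sum to 1, misses each subtract 1 / (number of misses), and the score is the largest deviation of the running sum from zero. The function name and the parameter `p` are hypothetical.

```python
def enrichment_score(logit_weights, in_set, p=1.0):
    """Return (score, running_sum_curve) for one feature and one token set.

    logit_weights: one float per vocabulary token.
    in_set: one bool per token, True if the token belongs to the set S.
    p: exponent on |logit weight| for hits (p=0 gives an unweighted walk).
    """
    # Rank tokens from most promoted to most suppressed.
    order = sorted(range(len(logit_weights)),
                   key=lambda i: logit_weights[i], reverse=True)
    n_miss = sum(1 for i in order if not in_set[i])
    hit_total = sum(abs(logit_weights[i]) ** p for i in order if in_set[i])
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need at least one hit with non-zero weight and one miss")

    running, curve = 0.0, []
    for i in order:
        if in_set[i]:
            running += abs(logit_weights[i]) ** p / hit_total  # token in S
        else:
            running -= 1.0 / n_miss                            # token not in S
        curve.append(running)
    # The score is the point of maximum deviation from zero (sign kept).
    return max(curve, key=abs), curve


# Made-up logit weights; the set S sits at the top of the ranking.
weights = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
promoted_set = [True, True, False, False, False, False]
score, curve = enrichment_score(weights, promoted_set)
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom gives a score near -1, and a set scattered through the ranking gives a score near 0.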

We note that statistics of the logit weight distribution of neurons have been previously studied in Universal Neurons in GPT2 Language models (Gurnee et al) where universal neurons (neurons firing on similar examples across different models) appeared likely to have elevated W_U kurtosis. Neurons with high kurtosis and positive skew were referred to as “prediction neurons” whilst neurons with high kurtosis and negative skew were described as “suppression neurons”. Furthermore, “partition neurons”, which promoted a sizable proportion of tokens while suppressing the remaining tokens, were identifiable via high variance in logit weight distribution.
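
These summary statistics can be computed per feature with a few lines. The sketch below uses population (biased) moment estimators and reports non-excess kurtosis (3 for a normal distribution). Both choices are assumptions, since the exact estimators used for the plots in this post are not stated.

```python
import math


def distribution_moments(values):
    """Return (standard deviation, skewness, kurtosis) of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n  # variance
    m3 = sum(c ** 3 for c in centered) / n  # third central moment
    m4 = sum(c ** 4 for c in centered) / n  # fourth central moment
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5   # > 0 means a longer right tail
    kurtosis = m4 / m2 ** 2     # larger means heavier tails
    return std, skewness, kurtosis


# Made-up logit weights: a symmetric case and a right-skewed case.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0, 0.0, 0.0, 0.0, 10.0]
std_sym, skew_sym, kurt_sym = distribution_moments(symmetric)
std_rs, skew_rs, kurt_rs = distribution_moments(right_skewed)
```

Under these conventions, a feature that strongly promotes a small group of tokens would show up with positive skew and high kurtosis.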

Below, we share some feature dashboard examples which have non-standard characteristics. We refer specifically to the red/blue histogram representing logit weight distribution of each feature, but share other dashboard components for completeness.

Looking at the feature activations on Neuronpedia, it seems like the feature is loss-reducing prior to tokens that are in all caps but not made only of two tokens, which supports the hypothesis suggested by the TSEA result.