Go 1.23: The New, Unique Package That Comes With It

The Go 1.23 standard library now includes the new unique package. The purpose of this package is to enable the canonicalization of comparable values. In other words, it lets you deduplicate values so that they point to a single, canonical, unique copy, while efficiently managing those canonical copies under the hood. You may already know this concept, called "interning". Let's dive in to see how it works and why it's useful.
A simple implementation of interning
At a high level, interning is very simple. Take the code sample below, which deduplicates strings using just a regular map.
import "strings"

// internPool maps each string seen so far to its canonical copy.
// It must be initialized: writing to a nil map panics.
var internPool = make(map[string]string)

// Intern returns a string that is equal to s but that may share storage with
// a string previously passed to Intern.
func Intern(s string) string {
	pooled, ok := internPool[s]
	if !ok {
		// Clone the string in case it's part of some much bigger string.
		// This should be rare, if interning is being used well.
		pooled = strings.Clone(s)
		internPool[pooled] = pooled
	}
	return pooled
}
This is useful when you are constructing many strings that are likely to be duplicates, such as when parsing a text format.
This implementation is super simple and works well enough for some cases, but it has some problems:
- It never removes strings from the pool.
- It cannot be used safely by several goroutines simultaneously.
- It only works with strings, even though the idea is quite general.
There is also a missed opportunity in this implementation, and it is subtle. Under the hood, strings are immutable structures consisting of a pointer and a length. When comparing two strings, if the pointers are not equal, we must compare their contents to determine equality. But if we know that two strings are canonicalized, it is sufficient to check just their pointers.
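To make the second and third problems from the list concrete, here is a minimal sketch of a pool that is guarded by a mutex and generic over any comparable type. The names Pool, Intern, and Len are hypothetical and chosen only for illustration; this is not how the standard library does it, and the sketch still never removes entries.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical interning pool for any comparable type.
// A mutex makes it safe for use by several goroutines, but entries
// are never evicted, so the first problem from the list remains.
type Pool[T comparable] struct {
	mu     sync.Mutex
	values map[T]T
}

// Intern returns the canonical copy of v, storing v if it is new.
func (p *Pool[T]) Intern(v T) T {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.values == nil {
		p.values = make(map[T]T)
	}
	if pooled, ok := p.values[v]; ok {
		return pooled
	}
	p.values[v] = v
	return v
}

// Len reports how many distinct values the pool currently holds.
func (p *Pool[T]) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.values)
}

func main() {
	var pool Pool[string]
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			pool.Intern("GET")
			pool.Intern("POST")
		}()
	}
	wg.Wait()
	fmt.Println(pool.Len()) // prints 2
}
```

A single mutex is the simplest choice here; it serializes every lookup, which is one reason a purpose-built concurrent map is attractive for a real implementation.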
Enter the unique package
The new unique package has a function similar to Intern called Make.
It works roughly the same way as Intern. Internally there is also a global map (a fast, generic, concurrent map), and Make looks up the provided value in that map. But it differs from Intern in two important ways. First, it accepts values of any comparable type. Second, it returns a wrapper value, a Handle[T], from which the canonical value can be retrieved.
This Handle[T] is the key to the design. A Handle[T] has the property that two Handle[T] values are equal if and only if the values used to create them are equal. In addition, comparing two Handle[T] values is cheap: it comes down to a pointer comparison. Compared to comparing two long strings, that is an order of magnitude cheaper!
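As a small illustration of these properties, the sketch below creates handles with unique.Make, compares them with ==, and recovers the canonical value with Value. The point type and the sameHandle helper are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"unique"
)

// point is a hypothetical comparable struct used only for illustration.
type point struct{ x, y int }

// sameHandle reports whether a and b canonicalize to the same handle.
func sameHandle[T comparable](a, b T) bool {
	return unique.Make(a) == unique.Make(b)
}

func main() {
	h1 := unique.Make("hello, " + "world")
	h2 := unique.Make("hello, world")

	fmt.Println(h1 == h2)   // true: equal values yield equal handles
	fmt.Println(h1.Value()) // hello, world

	// Make accepts any comparable type, not just strings.
	fmt.Println(sameHandle(point{1, 2}, point{1, 2})) // true
	fmt.Println(sameHandle(point{1, 2}, point{2, 1})) // false
}
```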
So far, this is nothing you cannot do in ordinary Go code.
But Handle[T] also has a second purpose: as long as a Handle[T] exists for a value, the map will retain the canonical copy of that value. Once all Handle[T] values that map to a specific value are gone, the package marks that internal map entry as deletable, to be reclaimed in the near future. This sets a clear policy for removing map entries: when the canonical entries are no longer in use, the garbage collector is free to clean them up.
If you have used Lisp before, this may sound quite familiar. Lisp symbols are interned strings, but they are not strings themselves, and the string values of all symbols are guaranteed to be in the same pool. This relationship between symbols and strings parallels the relationship between Handle[string] and string.
A real-world example
So how might one use unique.Make? Look no further than the net/netip package in the standard library, which interns values of type addrDetail, part of the netip.Addr structure.
Below is an abbreviated version of the actual code from net/netip that uses unique.
// Addr represents an IPv4 or IPv6 address (with or without a scoped
// addressing zone), similar to net.IP or net.IPAddr.
type Addr struct {
	// Other irrelevant unexported fields...

	// Details about the address, wrapped up together and canonicalized.
	z unique.Handle[addrDetail]
}

// addrDetail indicates whether the address is IPv4 or IPv6, and if IPv6,
// specifies the zone name for the address.
type addrDetail struct {
	isV6   bool   // IPv4 is false, IPv6 is true.
	zoneV6 string // May be != "" if IsV6 is true.
}

var z6noz = unique.Make(addrDetail{isV6: true})

// WithZone returns an IP that's the same as ip but with the provided
// zone. If zone is empty, the zone is removed. If ip is an IPv4
// address, WithZone is a no-op and returns ip unchanged.
func (ip Addr) WithZone(zone string) Addr {
	if !ip.Is6() {
		return ip
	}
	if zone == "" {
		ip.z = z6noz
		return ip
	}
	ip.z = unique.Make(addrDetail{isV6: true, zoneV6: zone})
	return ip
}
Since many IP addresses are likely to use the same zone, and this zone is part of their identity, it makes a lot of sense to canonicalize them. Deduplicating zones reduces the average memory footprint of each netip.Addr, while the fact that they are canonicalized means netip.Addr values are more efficient to compare, because comparing zone names becomes a simple pointer comparison.
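From the outside, this shows up as netip.Addr values that can be compared directly with ==, zone included. The sketch below uses only the public net/netip API; the equalWithZone helper and the example addresses are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"net/netip"
)

// equalWithZone is a hypothetical helper: it parses addr, attaches zone
// with WithZone, and reports whether the result equals the parsed full.
func equalWithZone(addr, zone, full string) bool {
	return netip.MustParseAddr(addr).WithZone(zone) == netip.MustParseAddr(full)
}

func main() {
	a := netip.MustParseAddr("fe80::1%eth0")
	fmt.Println(a.Zone()) // eth0

	// The same address and zone, built two different ways, compare equal.
	fmt.Println(equalWithZone("fe80::1", "eth0", "fe80::1%eth0")) // true

	// A different zone makes the addresses different.
	fmt.Println(equalWithZone("fe80::1", "eth1", "fe80::1%eth0")) // false

	// An empty zone removes the zone; on IPv4, WithZone is a no-op.
	fmt.Println(equalWithZone("fe80::1%eth0", "", "fe80::1"))     // true
	fmt.Println(equalWithZone("192.0.2.1", "eth0", "192.0.2.1")) // true
}
```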
While the unique package is useful, Make is admittedly not quite like Intern for strings, since the Handle[T] is required to keep a string from being deleted from the internal map. This means you need to modify your code to retain handles as well as strings.
But strings are special in that, although they behave like values, they actually contain pointers under the hood, as we mentioned earlier. This means we could potentially canonicalize just the underlying storage of the string, hiding the details of a Handle[T] inside the string itself. So there is still a place in the future for what I will call transparent string interning, in which strings can be interned without the Handle[T] type, similar to the Intern function but with semantics more closely resembling Make.
In the meantime, unique.Make("my string").Value() is one possible workaround. Even though failing to retain the handle allows the string to be deleted from unique's internal map, map entries are not deleted immediately. In practice, entries will not be deleted until at least the next garbage collection completes, so this workaround still allows some degree of deduplication in the periods between collections.
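A minimal sketch of this workaround, wrapped in a helper, might look like the following. The name internString is hypothetical, and because the handle is discarded right away, any sharing of storage only lasts as described above.

```go
package main

import (
	"fmt"
	"strings"
	"unique"
)

// internString is a hypothetical helper wrapping the workaround: it
// canonicalizes s through unique.Make and immediately drops the handle,
// so deduplication is only best-effort between garbage collections.
func internString(s string) string {
	return unique.Make(s).Value()
}

func main() {
	// Parsing a text format tends to produce many duplicate strings.
	fields := strings.Fields("GET GET POST GET")
	for i, f := range fields {
		fields[i] = internString(f)
	}
	fmt.Println(fields) // [GET GET POST GET]
}
```

The returned string is always equal to the input; what may change is whether duplicates end up sharing the same underlying storage.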
Some history, and a look to the future
The truth is that the net/netip package has actually interned zone strings since it was first introduced. The interning package it used was an internal copy of the go4.org/intern package. Like the unique package, it has a Value type (which looks a lot like a Handle[T], pre-generics), and it has the notable property that entries in the internal map are removed once their handles are no longer referenced.
But to achieve this behavior, it has to do some unsafe things. In particular, it makes some assumptions about the garbage collector's behavior in order to implement weak pointers outside the runtime. A weak pointer is a pointer that does not prevent the garbage collector from reclaiming a variable; when this happens, the pointer automatically becomes nil. As it happens, weak pointers are also the core abstraction underlying the unique package.
That's right: while implementing the unique package, we added proper weak pointer support to the garbage collector. And after stepping through the minefield of regrettable design decisions that accompany weak pointers (like, should weak pointers track object resurrection? No!), we were surprised by how it all turned out. Surprised enough that weak pointers are now a public proposal.
This work also led us to reexamine finalizers, which resulted in another proposal for an easier-to-use replacement for finalizers. With a hash function for comparable values also on the way, the future of memory-efficient caches in Go is bright!
Credits: Michael Knyszek
Photo by Mildlee on Unsplash