
Go 1.23: The New unique Package

The Go 1.23 standard library now includes the new unique package. The purpose of this package is to enable the canonicalization of comparable values. In other words, it lets you deduplicate values so that they point to a single, canonical, unique copy, while efficiently managing the canonical copies under the hood. You may already be familiar with this concept, called "interning". Let's dive in to see how it works and why it's useful.

A simple implementation of interning

At a high level, interning is very simple. Take the code sample below, which deduplicates strings using just a regular map.

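import "strings"

// internPool holds one canonical copy of each string. It is initialized
// with make because writing to a nil map panics.
var internPool = make(map[string]string)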

// Intern returns a string that is equal to s but that may share storage with
// a string previously passed to Intern.
func Intern(s string) string {
    pooled, ok := internPool[s]
    if !ok {
        // Clone the string in case it's part of some much bigger string.
        // This should be rare, if interning is being used well.
        pooled = strings.Clone(s)
        internPool[pooled] = pooled
    }
    return pooled
}

This is useful when you're constructing many strings that are likely to be duplicates, such as when parsing a text format.

This implementation is super simple and works well enough for some cases, but it has a few problems:

  • It never removes strings from the pool.
  • It cannot be safely used by multiple goroutines concurrently.
  • It only works with strings, even though the idea is quite general.
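The second and third problems can be addressed with ordinary Go, as the following minimal sketch shows. The Pool type and its methods are hypothetical names for illustration, not part of any library. A mutex makes the pool safe for concurrent use and type parameters generalize it beyond strings, but the first problem remains: entries are never removed, so memory still grows without bound.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical generic intern pool guarded by a mutex.
// Like the string version, it never removes entries.
type Pool[T comparable] struct {
	mu sync.Mutex
	m  map[T]T
}

// Intern returns the canonical copy of v, storing v if it is new.
func (p *Pool[T]) Intern(v T) T {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.m == nil {
		p.m = make(map[T]T) // lazily initialize on first use
	}
	if pooled, ok := p.m[v]; ok {
		return pooled
	}
	p.m[v] = v
	return v
}

// Len reports how many distinct values the pool holds.
func (p *Pool[T]) Len() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.m)
}

func main() {
	var p Pool[string]
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Intern("eth0") // safe to call from many goroutines
		}()
	}
	wg.Wait()
	fmt.Println(p.Len()) // 1
}
```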

There is also a missed opportunity in this implementation, and it is subtle. Under the hood, strings are immutable structures consisting of a pointer and a length. When comparing two strings, if the pointers are not equal, we must compare their contents to determine equality. But if we know that two strings are canonicalized, it is sufficient to just check their pointers.

Enter the unique package

The new unique package introduces a function similar to Intern called Make.

It works roughly the same way as Intern. Internally, there is also a global map (a fast, generic, concurrent map), and Make looks up the provided value in that map. But it also differs from Intern in two important ways. First, it accepts values of any comparable type. And second, it returns a wrapper value, a Handle[T], from which the canonical value can be retrieved.

This Handle[T] is key to the design. A Handle[T] has the property that two Handle[T] values are equal if and only if the values used to create them are equal. What's more, comparing two Handle[T] values is cheap: it comes down to a pointer comparison. Compared to comparing two long strings, that's an order of magnitude cheaper!

So far, this is nothing you can't do in ordinary Go code.

But Handle[T] also has a second purpose: as long as a Handle[T] exists for a value, the map will retain the canonical copy of that value. Once all Handle[T] values that map to a specific value are gone, the package marks that internal map entry as deletable, to be reclaimed in the near future. This sets a clear policy for removing map entries: when the canonical entries are no longer in use, the garbage collector is free to clean them up.

If you've used Lisp before, this may all sound quite familiar. Lisp symbols are interned strings, but not strings themselves, and the string values of all symbols are guaranteed to be in the same pool. This relationship between symbols and strings parallels the relationship between Handle[string] and string.

A real-world example

So how might one use unique.Make? Look no further than the net/netip package in the standard library, which interns values of type addrDetail, part of the netip.Addr structure.

Below is an abbreviated version of the actual code from net/netip that uses unique.

// Addr represents an IPv4 or IPv6 address (with or without a scoped
// addressing zone), similar to net.IP or net.IPAddr.
type Addr struct {
    // Other irrelevant unexported fields...

    // Details about the address, wrapped up together and canonicalized.
    z unique.Handle[addrDetail]
}

// addrDetail indicates whether the address is IPv4 or IPv6, and if IPv6,
// specifies the zone name for the address.
type addrDetail struct {
    isV6   bool   // IPv4 is false, IPv6 is true.
    zoneV6 string // May be != "" if IsV6 is true.
}

var z6noz = unique.Make(addrDetail{isV6: true})

// WithZone returns an IP that's the same as ip but with the provided
// zone. If zone is empty, the zone is removed. If ip is an IPv4
// address, WithZone is a no-op and returns ip unchanged.
func (ip Addr) WithZone(zone string) Addr {
    if !ip.Is6() {
        return ip
    }
    if zone == "" {
        ip.z = z6noz
        return ip
    }
    ip.z = unique.Make(addrDetail{isV6: true, zoneV6: zone})
    return ip
}

Since many IP addresses are likely to use the same zone, and the zone is part of their identity, it makes a lot of sense to canonicalize them. Deduplicating zones reduces the average memory footprint of each netip.Addr, while the fact that they are canonicalized means netip.Addr values are more efficient to compare, since comparing zone names becomes a simple pointer comparison.

While the unique package is useful, Make is admittedly not quite like Intern for strings, since the Handle[T] is required to keep a string from being deleted from the internal map. This means you need to modify your code to retain handles as well as strings.
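One way to retain handles is to store the Handle[string] in place of the plain string, as in this sketch (assuming Go 1.23 or later). The Record type and its methods are hypothetical. Holding the handle in the struct keeps the canonical copy alive in the internal map for as long as the Record is reachable, and comparisons become handle comparisons.

```go
package main

import (
	"fmt"
	"unique"
)

// Record is a hypothetical type that keeps a handle rather than a string,
// which keeps the canonical copy alive while the Record is reachable.
type Record struct {
	host unique.Handle[string]
}

func NewRecord(host string) Record {
	return Record{host: unique.Make(host)}
}

// Host returns the canonical string for callers that need the value.
func (r Record) Host() string { return r.host.Value() }

// SameHost compares handles instead of string contents.
func (r Record) SameHost(o Record) bool { return r.host == o.host }

func main() {
	a := NewRecord("example.com")
	b := NewRecord("example.com")
	fmt.Println(a.Host(), a.SameHost(b)) // example.com true
}
```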

But strings are special in that, although they behave like values, they actually contain pointers under the hood, as we mentioned earlier. This means that we could potentially canonicalize just the underlying storage of the string, hiding the details of a Handle[T] inside the string itself. So there is still a place in the future for what I'll call transparent string interning, in which strings can be interned without the Handle[T] type, similar to the Intern function but with semantics more closely resembling Make.

In the meantime, unique.Make("my string").Value() is one possible workaround. Even though failing to retain the handle allows the string to be deleted from unique's internal map, map entries are not deleted immediately. In practice, entries won't be deleted until at least the next garbage collection completes, so this workaround still allows some degree of deduplication in the periods between collections.
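The workaround can be wrapped in a small helper, sketched here assuming Go 1.23 or later. internString is a hypothetical name. Because the handle is dropped immediately, the entry may be reclaimed after a future garbage collection; until then, repeated calls with equal strings may return strings that share the same canonical storage, but nothing in this sketch guarantees it.

```go
package main

import (
	"fmt"
	"unique"
)

// internString is a hypothetical helper implementing the workaround.
// The handle is discarded, so deduplication is best-effort only.
func internString(s string) string {
	return unique.Make(s).Value()
}

func main() {
	fmt.Println(internString("my string")) // my string
}
```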

Some history, and a look toward the future

The truth is that the net/netip package has actually interned zone strings since it was first introduced. The interning package it used was an internal copy of the go4.org/intern package. Like the unique package, it has a Value type (which looks a lot like a pre-generics Handle[T]) and has the notable property that entries in the internal map are removed once their handles are no longer referenced.

But to achieve this behavior, it has to do some unsafe things. In particular, it makes some assumptions about the garbage collector's behavior to implement weak pointers outside the runtime. A weak pointer is a pointer that doesn't prevent the garbage collector from reclaiming a variable; when this happens, the pointer automatically becomes nil. As it happens, weak pointers are also the core abstraction underlying the unique package.

That's right: while implementing the unique package, we added proper weak pointer support to the garbage collector. And after navigating the minefield of regrettable design decisions that accompany weak pointers (for example, should weak pointers track object resurrection? No!), we were astonished by how simple and straightforward it all turned out to be. Astonished enough that weak pointers are now a public proposal.

This work also led us to reexamine finalizers, which resulted in another proposal for an easier-to-use replacement for finalizers. With a hash function for comparable values also on the way, the future of building memory-efficient caches in Go is bright!


Credits: Michael Knyszek

Photo by Mildlee on Unsplash

This article is available on The Go Blog under a CC BY 4.0 license.
