[Update Nov 27: This post had issues, and I retract some of my more provocative claims. See the errata at the end.]
All software comes with a version, some sequence of digits, periods and characters that seems to march ever upward. Rarely are the optimistically increasing versions accompanied by a commensurate increase in robustness. Instead, upgrading to new versions often causes regressions, and the stream of versions ends up spawning an extensive grapevine to disseminate information about the best version to use. Unsatisfying as this state of affairs is to everyone, I didn't think that the problem lay with these version numbers themselves. They're just names, right? However, over the past year I've finally had my attention focused on them, thanks to two people:
Why is version pinning so prevalent? The proximal reason is that modern package managers uniformly[1] fail to provide the sane default of "give me the latest compatible version, excluding breaking changes."
These are all deep, competent projects. Why are their defaults so uniformly useless and misleading? The underlying reason is the traditional format of version numbers: mashing together multiple numbers into a single string, and more importantly separating the version string from the name of a package. A dependency that is just a name provides no hint on what version you want compatibility with, so a package manager has no easy way to pick a good default version.
Towards a better approach
To begin with, it's weird that versions are strings. Parsing versions is non-trivial. Let's just make them a tuple. Instead of "3.0.2", we'll say "(3, 0, 2)".
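To see why strings are a poor representation, compare two of them directly: strings are ordered character by character, so "3.10.0" sorts before "3.9.0". A minimal sketch, assuming versions are plain dot-separated integers with no pre-release tags (the helper name `to_tuple` is hypothetical):

```python
def to_tuple(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version like "3.0.2" into a tuple like (3, 0, 2).

    Assumes only digits and dots; suffixes such as "rc1" are out of scope.
    """
    return tuple(int(part) for part in version.split("."))


# Lexicographic string comparison gets the order wrong ("1" < "9")...
print("3.10.0" < "3.9.0")                      # True
# ...while tuples compare element by element, numerically.
print(to_tuple("3.10.0") < to_tuple("3.9.0"))  # False
print(to_tuple("3.0.2"))                       # (3, 0, 2)
```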
Next, move the major version into the name of the package. "Rails 5.1.4" becomes "Rails-5 (1, 4)". By following Rich Hickey's suggestion above, we also sidestep the question of what the default version should be. There's just no way to refer to a package without its major version.
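The renaming step is mechanical. A sketch, assuming the version has already been parsed into a tuple; `split_major` is a hypothetical helper, not part of any package manager:

```python
def split_major(package: str, version: tuple[int, ...]) -> tuple[str, tuple[int, ...]]:
    """Fold the major version into the package name.

    ("Rails", (5, 1, 4)) -> ("Rails-5", (1, 4))
    """
    major, *rest = version
    return f"{package}-{major}", tuple(rest)


name, rest = split_major("Rails", (5, 1, 4))
print(name, rest)  # Rails-5 (1, 4)
```

Once names look like "Rails-5", a bare "Rails" simply isn't a valid dependency, which is what removes the "which major version by default?" question.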
Since we always want to provide the latest version by default, the distinction between minor versions and patch levels is moot. Just combine the 2-tuple into a single number. "LeftPad 17.5.0" would now become something like "LeftPad-17 37".
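One way to collapse (minor, patch) into a single number is to count releases within the major version. A minimal sketch; the release history below is made up for illustration, `sequential_numbers` is a hypothetical helper, and the "37" in the text is illustrative rather than derived from real data:

```python
def sequential_numbers(releases: list[tuple[int, int]]) -> dict[tuple[int, int], int]:
    """Map each (minor, patch) release within one major version to a single
    counter that increases by one per release, in version order."""
    return {release: i for i, release in enumerate(sorted(releases), start=1)}


# Hypothetical release history for LeftPad-17.
history = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0)]
numbers = sequential_numbers(history)
print(numbers[(1, 2)])  # 5
print(numbers[(2, 0)])  # 6
```

This numbers releases in version order. A registry could instead number them in publication order; the two differ whenever a patch is backported to an older minor line.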
At this point you could even get rid of the version altogether and just use the commit hash on the rare occasions when you need a version identifier. We're all on the internet, and we're all constantly running npm install or equivalent. Just say "LeftPad-17"; it's cleaner.
And that's it. Package managers should provide no options for version pinning.
A package manager that followed such a proposal would foster an eco-system with greater care for introducing incompatibility[2]. Packages that wantonly broke their consumers would "gain a reputation" and get out-competed by packages that didn't, rather than gaining a "temporary" pinning that serves only to perpetuate them. The occasional unintentional breakage would necessitate people downstream cloning repositories and changing dependency URLs, which would create a much more stringent atmosphere of accountability for the breaking package. As a result, breaking changes wouldn't live so long that they gain new users.
In particular, Semantic Versioning is misguided, an attempt to fix something that is broken beyond repair. The correct way to practice semantic versioning is without any version strings at all, just Rich Hickey's directive: if you change behavior, rename. Or ok, keep a number around if you really need a security blanket. Either way, we programmers should be manually messing with version numbers a whole lot less. They're a holdover from the pre-version-control pre-internet days of shrink-wrapped software, vestigial in a world of pervasive connectivity and constant push updates. All version numbers do is provide cover for incompatibilities to hide under.
Update (Nov 27)
This post aroused a lot of great feedback on Hacker News and Lobste.rs. After a day of engaging with comments my conclusion was that I should have been more explicit about my focus: the flow for upgrading (and testing) software in development, not deploying to production. In particular, the post doesn't make any claims about versions in production. Reproducible builds are great! But you just need a hash for them. Right?
A prolonged exchange with Joel Parker Henderson convinced me that it's just not feasible to separate operational concerns from development concerns. A common question when managing software in production is, "what version is this running?" And that question quickly requires drilling down to the constituent pieces of a release and their versions. A hash makes that too hard. And you can't have separate version strings for development and deployment either; that's just a recipe for confusion. Therefore, if you take operational considerations into account, my claim that we don't need versions at all is invalid.
What, if anything, remains of value in this post? Package managers should by default never upgrade dependencies past a major version.
The design goal of a package manager should be that a dependency once added to Gemfile or package.json should never need to be modified until it's deleted. What people specify manually goes there, what the package manager deduces goes somewhere else (like Gemfile.lock). If people are editing version strings en masse in Gemfile or equivalent, that is a smell.
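A toy resolver makes the split concrete. Assumptions: names follow the "Rails-5" convention proposed above, the registry is a hypothetical in-memory dict, and the "lockfile" is just a returned dict rather than a real format like Gemfile.lock:

```python
# The manifest is what a person writes by hand: names with the major
# version baked in, and nothing else.
MANIFEST = ["LeftPad-17", "Rails-5"]

# Hypothetical registry contents: package name -> published release numbers.
REGISTRY = {
    "LeftPad-17": [1, 2, 3, 37],
    "Rails-5": [10, 11, 12],
    "Rails-6": [1, 2],
}


def resolve(manifest: list[str], registry: dict[str, list[int]]) -> dict[str, int]:
    """Compute lockfile contents: the newest release of each listed package.

    The manifest is only read, never rewritten; picking up fixes within a
    major version is a matter of re-running resolve().
    """
    return {name: max(registry[name]) for name in manifest}


lock = resolve(MANIFEST, REGISTRY)
print(lock)  # {'LeftPad-17': 37, 'Rails-5': 12}
```

Rails-6 is never selected: crossing a major version requires a human to edit the manifest, while everything else is deduced.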
In the next mainstream platform, the versions people specify for dependencies should consist of just a major version, because that's the part that the package manager can never deduce. SemVer is a siren here because it conflates pieces from multiple jurisdictions. The major version is the user's responsibility, and minor and patch versions are the package manager's. Why coalesce the two? That just necessitates baroque syntax like twiddle-waka to do the safe thing.
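For comparison, here is roughly what RubyGems' twiddle-waka (`~>`, the pessimistic operator) computes: drop the last segment you wrote, then increment the new last one. The catch is that "~> 2.3" and "~> 2.3.1" have different upper bounds, so the meaning depends on how many segments you type. A sketch for purely numeric versions; the function names are hypothetical:

```python
def twiddle_waka_bounds(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (lower_inclusive, upper_exclusive) for a constraint like "~> 2.3".

    Drop the last segment (if there is more than one), then increment the
    new last segment. Numeric segments only.
    """
    lower = tuple(int(p) for p in constraint.removeprefix("~>").strip().split("."))
    upper = list(lower[:-1]) if len(lower) > 1 else list(lower)
    upper[-1] += 1
    return lower, tuple(upper)


def satisfies(version: str, constraint: str) -> bool:
    lower, upper = twiddle_waka_bounds(constraint)
    v = tuple(int(p) for p in version.split("."))
    # Pad with zeros so that, e.g., "2" is compared as (2, 0) against "~> 2.0".
    v += (0,) * (len(lower) - len(v))
    return lower <= v < upper


print(twiddle_waka_bounds("~> 2.3"))    # ((2, 3), (3,))   i.e. >= 2.3, < 3
print(twiddle_waka_bounds("~> 2.3.1"))  # ((2, 3, 1), (2, 4))   i.e. >= 2.3.1, < 2.4
```

Only the first form expresses "same major, latest everything else"; with a major-only dependency there would be nothing to get wrong.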
(And oh, if RubyGems and NPM are smelly, the Clojure approach totally stinks. Clojure requires manual intervention to pull in compatible/security fixes for dependencies. It follows the existing Java approach, but Java's eco-system predates the advance of package managers, half of whose reason for existence — after installing dependencies — is updating dependencies. I may still be unaware of some design rationale here, but for now I think Leiningen really missed an opportunity to improve on Java here.)
1. One exception here is Go, where the standard go get command requires no versions, and always grabs the head of the repo. However, the community seems to be turning from the light to the darkness with a proposal for a tool called go dep. It's unclear to me if this is due to a failure of communication on the part of the original authors of Go, or if there's a deeper justification for go dep that I'm missing. If you know, set me straight in the comments below.
2. Following Steve Losh, we'll allow packages ending in "-0" to make incompatible changes to their heart's content.
comments
Comments gratefully appreciated. Please send them to me by any method of your choice and I'll include them here.
Well, the issue obviously is that just "getting the latest head" doesn't work if that can have breaking changes.
If Go packages also changed their name (and thus their GitHub repo too?) to package-1, package-2, ..., package-n whenever they introduced a breaking change, as you propose above, then "get the latest of package-n" might work fine. But since this is not the case, "get the latest" is very dangerous and not what anybody wants.
I spent three! days (literally!) trying to find out why something did not work on my dev box. It worked in CI, it worked on others' boxes. What I found was that we did not have fixed versions: mine were a few weeks newer than the rest of the pack's. We switched to fixed versions as a result.
The question I'm trying to answer (following in the footsteps of Rich Hickey and Steve Losh) is: how can we make it easy for developers to upgrade their dependencies so that they pick up bugfixes and security fixes but not breaking changes?
We all know that we code and test against specific behavior, and this can include behavior resulting from things that could be considered bugs, especially when a bug produces behavior that doesn't conform to spec but works where code conforming to spec would not. Legitimate bug fixes often cause downstream breakage.
I believe that the only way to ensure maximum reliability in software is by version pinning, or vendoring, or immutable binaries... Allowing dependencies to change without full regression testing is a recipe for breakage, and dependency upgrades should be a managed, planned process. The last thing I want is to have to fix issues resulting from dependency changes while I'm trying to roll out my own fixes, and it is unacceptable for a dependency to change between code complete and a production roll-out.
This statement is disingenuous.
Yes, NPM will default to installing the latest version by invoking `npm install leftpad`, but when you run `npm install --save leftpad` to persist the dependency to your package manifest, npm will by default add a major version constraint.
Most usage of npm within a project follows this `--save` pattern, and if you don't use the save flag, the dependency is not stored in your package manifest. So no, unless you have a script that looks like this:
```
npm install dependency1
npm install dependency2
...
```
instead of using the package manifest to install your project's dependencies, npm is not a major-version footgun.
NB: This has been the case as long as I've used npm (since v2) - some very very old version may not behave in this way, but I've never heard of it.
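For reference, the constraint `npm install --save` writes by default is a caret range such as `^1.2.3`: the left-most non-zero component may not change. A sketch for plain x.y.z versions only (pre-release tags and partial ranges like `^1.2` are out of scope; the function names are hypothetical):

```python
Version = tuple[int, int, int]  # (major, minor, patch)


def caret_bounds(base: Version) -> tuple[Version, Version]:
    """Bounds for a caret range: ^1.2.3 allows < 2.0.0, ^0.2.3 allows < 0.3.0,
    and ^0.0.3 allows < 0.0.4."""
    # Index of the left-most non-zero component (the patch index if all are zero).
    i = next((idx for idx, part in enumerate(base) if part != 0), len(base) - 1)
    upper = base[:i] + (base[i] + 1,) + (0,) * (len(base) - i - 1)
    return base, upper


def in_caret_range(version: Version, base: Version) -> bool:
    lower, upper = caret_bounds(base)
    return lower <= version < upper


print(caret_bounds((1, 2, 3)))  # ((1, 2, 3), (2, 0, 0))
print(caret_bounds((0, 2, 3)))  # ((0, 2, 3), (0, 3, 0))
```

So for packages at 1.0.0 or later, the saved range is exactly a major-version constraint.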
`--save` is indeed handy. FYI, for devDependencies, there's `--save-dev` as well.
Thanks for updating the body with a note about this catch.
Seriously, the last paragraph says all that needs saying. Good article!
I spent the first half of my life in India, where people casually drive on the wrong side of the road. In the US, on the handful of times I saw someone accidentally go down the wrong way on a 1-way street, others immediately honk at them. Long before they're in any personal danger of collision. This is a hopeful sign to me, that it _is_ possible to engineer customs that safeguard a commons even though each individual isn't immediately affected.
Hopefully the next language to go mainstream will create this awareness from the start. It doesn't have to be _precisely_ the solution I outlined. Just so long as we think about what eco-system we want to have, and what policy will engender it. I'm just throwing another log into the fire, in hopes of turning the tide of awareness. Consider not letting people pin versions, because that easy fix is a source of technical debt for the community at large.
I hope so, but I think at some scale you have to treat the community as an unreliable system. Once you get enough people contributing, I don't think community norms are enough to keep everyone in line. Not everyone who contributes will participate in the community. Fixing the versioning system is great but I think it ultimately locates the problem in the wrong place, and hence can't fix it.
It could be interesting if you tried to build a tool to enforce the versioning scheme though. Parse the code, and examine the exposed API. You could then classify sets of changes as neutral, breaking, or additive, at least for a shallow definition based on function signatures and types. 🤔
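A shallow version of that classifier, assuming the public API has already been extracted into a dict of name to signature string (the extraction step, the names, and the example APIs are all hypothetical):

```python
from enum import Enum


class Change(Enum):
    NEUTRAL = "patch"   # nothing in the public API changed
    ADDITIVE = "minor"  # only new names were added
    BREAKING = "major"  # something was removed or its signature changed


def classify(old_api: dict[str, str], new_api: dict[str, str]) -> Change:
    removed = old_api.keys() - new_api.keys()
    changed = {n for n in old_api.keys() & new_api.keys() if old_api[n] != new_api[n]}
    added = new_api.keys() - old_api.keys()
    if removed or changed:
        return Change.BREAKING
    if added:
        return Change.ADDITIVE
    return Change.NEUTRAL


v1 = {"left_pad": "(s, width)"}
v2 = {"left_pad": "(s, width)", "right_pad": "(s, width)"}
v3 = {"left_pad": "(s, width, fill)", "right_pad": "(s, width)"}
print(classify(v1, v2))  # Change.ADDITIVE
print(classify(v2, v3))  # Change.BREAKING
```

The shallowness shows up quickly: adding an optional parameter changes the signature string and gets flagged as breaking even though existing callers keep working, and behavior changes behind an unchanged signature go unnoticed.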
In any case, I'm not claiming the problem is entirely located in the version format or package manager. But now that I have a handle on that part it's worth eradicating it. That'll make the system less noisy and hopefully make it easier to 'listen for' the next problem.
Your suggestion is indeed interesting. See Khalid's comment on Elm elsewhere in this thread. I wonder if avoiding smarts would be a better experience here than getting to 99% accuracy. I still prefer to avoid voice menus and use the touch tone interface because I know it'll be perfectly reliable. But that said, I'd be rooting for anyone who decides to explore this possibility.
The Elm approach is interesting. I wish Go had thought of this before going forward with dep.
A compromise approach is to enforce a better versioning scheme but still allow pegging your imported deps to a particular commit hash (or something). That way you get the versioning upgrade but can defend against errors.
So I think I'm already at your compromise position, and I refuse to compromise further :D I absolutely agree that enforcing a better version format isn't a panacea. I'm thinking of it rather as a strong default that will hopefully guide more people to doing the right thing.
i.e. sure, all bets are off if you go forward and back between Foo-N and Foo-M. But if Foo-N.2 is an open/closed extension of Foo-N.1, then if you depend on Foo-N.1 you KNOW you can go forward to Foo-N.2, but if you depend on Foo-N.2, you DON'T know if you can go backwards to Foo-N.1.
The semantic versioning minor number will at least tell you whether the change is open/closed and whether going backwards will work (iff it compiles).
Perhaps I'm captured by my constraints, but it didn't at all occur to me that going backwards is something someone may want to do. I've always treated that as a deployment consideration rather than a pinning consideration. When upgrading I may spend a while trying to fix issues, but if it doesn't work I may just roll back to an older version without any of my changes. Should I be wanting more than this?
Suppose you have a package foo that is semantically versioned...
foo-n.m.o
Your proposal essentially says call it foo-n version (max_o+1)*m+o.
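A quick check of that encoding. For the folded number to be unambiguous, each minor version needs max_o + 1 slots (patch numbers 0 through max_o), so the sketch uses (max_o + 1)*m + o. The fixed cap `MAX_O` is a hypothetical assumption; real projects don't bound their patch numbers:

```python
MAX_O = 9  # hypothetical cap on patch numbers


def encode(m: int, o: int, max_o: int = MAX_O) -> int:
    """Fold (minor, patch) into one number, assuming 0 <= o <= max_o."""
    if not 0 <= o <= max_o:
        raise ValueError("patch number out of range")
    return (max_o + 1) * m + o


def decode(n: int, max_o: int = MAX_O) -> tuple[int, int]:
    """Recover (minor, patch); only possible if max_o is known."""
    return divmod(n, max_o + 1)


print(encode(0, 9), encode(1, 0))  # 9 10
print(decode(37))                  # (3, 7)
```

The minor number survives only as long as everyone agrees on max_o; with a plain sequential counter it is gone entirely, which is the loss described next.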
So what have you lost? Essentially the minor number.
What did the minor number give you? The recognition that you have changed the API in an open closed manner.
https://en.wikipedia.org/wiki/Open/closed_principle
i.e. if a program compiled, linked, worked and all tests ran OK using version foo-n.m.o, you're promised by the developer that it will compile, link and work if you build with foo-n.m'.o' for m' >= m.
That's the promise, and the developer of foo's testing and design practices should verify that.
That is sort of the same promise you get for foo-n.m.o' for all o' >= o.
The only difference should be that some of your bugs may have disappeared, i.e. some of your tests that failed with foo-n.m.o because of bugs in foo may pass with foo-n.m.o'.
Conversely, you are _not_ guaranteed that a program that compiles with foo-n.m' will compile with foo-n.m for m < m'.
You are guaranteed that a statically typed program that compiles with both foo-n.m' AND foo-n.m for m < m' will run successfully.
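The forward/backward asymmetry described above can be written down as a small decision function. A sketch of the promise as stated in this comment, not of any tool's behavior; `describe_move` is hypothetical:

```python
Version = tuple[int, int, int]  # (major, minor, patch)


def describe_move(frm: Version, to: Version) -> str:
    """What SemVer's open/closed reading promises when switching versions."""
    if frm[0] != to[0]:
        return "different major: all bets are off"
    if to >= frm:
        return "forward: promised to keep working"
    if to[1] == frm[1]:
        return "patch rollback: same API, some bug fixes lost"
    return "minor rollback: may not even compile"


print(describe_move((2, 1, 0), (2, 3, 4)))  # forward: promised to keep working
print(describe_move((2, 3, 4), (2, 1, 0)))  # minor rollback: may not even compile
```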
Of course, Hickey, being The Clojure Guy (Clojure being a dynamically typed language), has no way of checking that. It's one of the things he has made the conscious choice to give up.
In the Ruby or JavaScript world, the same problem.
In the C/C++ world, if rolling back a minor number still compiles... you know it's safe to do.
So in a dynamically typed language the minor version number is less useful.
In the statically typed language the minor version number is... well... not very useful but slightly more so than in the dynamic language world.
Dynamic languages like Clojure, Ruby and Python lean very heavily on automated testing to give you safety in moving versions, i.e. you sort of don't care what the hell the version numbers are. They could be a SHA-1 hash for all you care.
I'd love a gemspec that says, "give me whatever damn version, any version that passes the most of my test cases weighted by customer value".
In the C/C++ world we have this optimistic fantasy: "It compiles... so it must work... Right?"
Of course best practice statically typed language library design and evolution is...
"Yup, if it compiles, odds on it will work, if it worked before and compiles now... I promise you I have evolved the library so that it will only work better.
I also promise you if I have changed things so it won't work now... I have done it in a manner so that the compiler will slap you in the face so you know about it".
- Doctrine ORM should have been renamed to Dogma ORM when the so-called "v2.0" was released, and to Axiom ORM in the future.
- Rails should have been renamed to Trainwheel, Track, Line Framework
- Symfony to Orchestra, Harmony
- Ruby itself to Zephyr, Esmerald
- Windows to Aperture OS, Fenestra OS, Lancet OS
- PHP to ... eer
Still an interesting idea and approach to the problem. At least obvious breaking changes (like changing the signature of a public API) are always accompanied by a major bump, so at least part of the SemVer problem is handled properly by the compiler.
[1] - https://github.com/elm-lang/elm-package/#version-rules
Do you have any insight into the Haskell eco-system? Does Cabal respect SemVer?
Setuptools can work well with most versioning schemes; there are, however, a few special things to watch out for:
A version consists of an alternating series of release numbers and pre-release or post-release tags. A release number is a series of digits punctuated by dots, such as 2.4 or 0.5. Each series of digits is treated numerically, so releases 2.1 and 2.1.0 are different ways to spell the same release number, denoting the first sub-release of release 2. But 2.10 is the tenth sub-release of release 2, and so is a different and newer release from 2.1 or 2.1.0. Leading zeros within a series of digits are also ignored, so 2.01 is the same as 2.1, and different from 2.0.1.
Following a release number, you can have either a pre-release or post-release tag. Pre-release tags make a version be considered older than the version they are appended to. So, revision 2.4 is newer than revision 2.4c1, which in turn is newer than 2.4b1 or 2.4a1. Post-release tags make a version be considered newer than the version they are appended to. So, revisions like 2.4-1 and 2.4pl3 are newer than 2.4, but are older than 2.4.1 (which has a higher release number).
A pre-release tag is a series of letters that are alphabetically before “final”. Some examples of pre-release tags would include alpha, beta, a, c, dev, and so on. You do not have to place a dot or dash before the pre-release tag if it’s immediately after a number, but it’s okay to do so if you prefer. Thus, 2.4c1 and 2.4.c1 and 2.4-c1 all represent release candidate 1 of version 2.4, and are treated as identical by setuptools.
In addition, there are three special pre-release tags that are treated as if they were the letter 'c': pre, preview, and rc. So, version 2.4rc1, 2.4pre1 and 2.4preview1 are all the exact same version as 2.4c1, and are treated as identical by setuptools.
A post-release tag is either a series of letters that are alphabetically greater than or equal to “final”, or a dash (-). Post-release tags are generally used to separate patch numbers, port numbers, build numbers, revision numbers, or date stamps from the release number. For example, the version 2.4-r1263 might denote Subversion revision 1263 of a post-release patch of version 2.4. Or you might use 2.4-20051127 to denote a date-stamped post-release.
After each pre or post-release tag, you are free to place another release number, followed again by more pre- or post-release tags. For example, 0.6a9.dev-r41475 could denote Subversion revision 41475 of the in-development version of the ninth alpha of release 0.6. Notice that dev is a pre-release tag, so this version is a lower version number than 0.6a9, which would be the actual ninth alpha of release 0.6. But the -r41475 is a post-release tag, so this version is newer than 0.6a9.dev.
For the most part, setuptools’ interpretation of version numbers is intuitive, but [one tip]: Don’t stick adjoining pre-release tags together without a dot or number between them. Version 1.9adev is the adev prerelease of 1.9, not a development pre-release of 1.9a. Use .dev instead, as in 1.9a.dev, or separate the pre-release tags with a number, as in 1.9a0dev. 1.9a.dev, 1.9a0dev, and even 1.9.a.dev are identical versions from setuptools’ point of view, so you can use whatever scheme you prefer.
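The rules quoted above can be turned into a sort key. The sketch below is modeled on the legacy `parse_version` logic from older releases of setuptools' `pkg_resources`; current setuptools follows PEP 440 through the `packaging` library instead, so treat this as an illustration of the quoted rules rather than of today's behavior. The name `legacy_version_key` is hypothetical:

```python
import re

# Split into runs of digits, runs of letters, dots and dashes.
_COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.|-)")
# pre/preview/rc behave like "c"; a dash marks a post-release;
# "dev" becomes "@", which sorts before every letter tag.
_REPLACE = {"pre": "c", "preview": "c", "rc": "c", "-": "final-", "dev": "@"}


def _parts(version: str):
    for part in _COMPONENT_RE.split(version):
        part = _REPLACE.get(part, part)
        if not part or part == ".":
            continue
        if part[0].isdigit():
            yield part.zfill(8)  # zero-pad so numbers (up to 8 digits) compare as strings
        else:
            yield "*" + part     # "*" sorts before digits, so tags rank below numbers
    yield "*final"               # a bare release outranks its pre-release tags


def legacy_version_key(version: str) -> tuple[str, ...]:
    parts: list[str] = []
    for part in _parts(version.lower()):
        if part.startswith("*"):
            if part < "*final":  # pre-release tag: drop a dash right before it
                while parts and parts[-1] == "*final-":
                    parts.pop()
            # Drop trailing zero components, so 2.1 == 2.1.0.
            while parts and parts[-1] == "00000000":
                parts.pop()
        parts.append(part)
    return tuple(parts)


print(sorted(["2.4", "2.4c1", "2.4-1", "2.4.1", "2.4a1"], key=legacy_version_key))
# ['2.4a1', '2.4c1', '2.4', '2.4-1', '2.4.1']
```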
Rotflmao.
It was controversial when first proposed, but seems to be working well in practice. Like you suggested, Go requires that you rename the package if you bump the major version. It still retains the three-segment semver versioning scheme though, rather than a single number or a commit hash.
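In Go modules, the rename takes the form of a major-version suffix on the module path: from v2 onward the path must end in /vN, while v0 and v1 use the bare path. A sketch of that rule; the module path is a made-up example, `module_path` is hypothetical, and the separate gopkg.in convention (paths like `gopkg.in/yaml.v2`) is out of scope:

```python
def module_path(base: str, version: str) -> str:
    """Import path for a module at a given semver tag, e.g. "v2.0.3"."""
    major = int(version.removeprefix("v").split(".")[0])
    return base if major < 2 else f"{base}/v{major}"


print(module_path("github.com/example/leftpad", "v1.4.0"))  # github.com/example/leftpad
print(module_path("github.com/example/leftpad", "v2.0.3"))  # github.com/example/leftpad/v2
```

This is close to the "LeftPad-17" naming from the post, with the minor and patch segments kept alongside.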