Amazon's internal tools for building code are _amazing_.

Brazil is their internal dependency management tool. It handles building and versioning software. It introduced the concept of version sets, which essentially let you group related software, e.g. version 1.0 of my app needs version 1.1 of library x and version 2.0 of runtime y. That particular set of software versions then gets its own version number.
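The grouping idea can be sketched roughly like this (a toy model with invented names, not Amazon's actual tooling):

```python
# Toy model: a version set pins exact versions of related packages
# and is itself versioned as a single unit.
version_set_rev1 = {
    "my-app":    "1.0",
    "library-x": "1.1",
    "runtime-y": "2.0",
}

# Bumping any member produces a new revision of the whole set, so a
# single set revision reproducibly identifies every version at once.
version_set_rev2 = {**version_set_rev1, "library-x": "1.2"}
```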

Everything from the CI/CD to the code review tool to your local builds uses the same build configuration with Brazil. All software packages in Brazil are built from source on Amazon's gigantic fleet of build servers. Builds are cached, so even though Amazon builds its own versions of Make, Java, etc., these are all built once on the build servers, cached, and downloaded.

A simple Java application at Amazon might have hundreds of dependencies (because you'll need to build Java from scratch), but since this is all cached you don't have to wait very long.

Lastly, there's Pipelines, their internal CI/CD tool, which integrates naturally with Brazil and the build fleet. It can deploy to their internal fleet with Apollo, or to AWS Lambda, S3 buckets, etc.

In all, everything is just very well integrated. I haven't seen anything come close to what you get internally at Amazon.



How did you avoid version hell? At Google, almost everything just shipped from master (except for some things that had more subtle bugs, those did their work on a dev branch and merged into master after testing).


Version sets take care of everything. A version set can be thought of as a Git repo with just one file. The file is just key/value pairs mapping dependencies to major/minor versions, e.g.

  <Name> <Major>-<Minor>

  Java 8-123
  Lombok 1.12-456
  ...

A version set revision is essentially a Git commit of that version set file. It determines exactly which software versions you use when building/developing/deploying/etc.
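The file format described above is simple enough to model in a few lines (a hypothetical sketch; the real format and parser are internal to Amazon):

```python
# Hypothetical sketch: a version set file is lines of "Name Major-Minor",
# and each revision of it is like a git commit of that one file.
def parse_version_set(text):
    """Parse lines like 'Java 8-123' into {name: (major, minor)}."""
    mapping = {}
    for line in text.strip().splitlines():
        name, version = line.split()
        major, minor = version.rsplit("-", 1)
        mapping[name] = (major, int(minor))
    return mapping

revision_1 = parse_version_set("""
Java 8-123
Lombok 1.12-456
""")
# revision_1 == {"Java": ("8", 123), "Lombok": ("1.12", 456)}
```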

Your pipeline (which is a specific noun at Amazon, not the general term) acts on a single version set. When you clone a repo, you have to choose which version set you want, when you deploy you have to choose a version set, etc.

Unlike most other dependency management systems, there's no notion of a "version of a package" without first choosing which version set you're working in, and that choice determines the minor versions of _all of the packages you're using_.

e.g. imagine you clone a Node project with all of its dependencies. Each dependency has a package.json file declaring which versions it needs. Now imagine some _additional_ metadata that goes a step further and chooses the exact minor version that each major version maps to.

All that to say: a package can declare which major version it depends on, but not which minor version. The version set you're using determines the minor version; the package determines the major version.
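That resolution rule can be sketched like this (names and data structures are invented for illustration, not Amazon's actual API):

```python
# A version set maps (package, major) -> minor; a package declares
# only the major version of each dependency.
version_set = {("Java", "8"): 123, ("Lombok", "1.12"): 456}

def resolve(package_deps, vset):
    """package_deps: {name: major} -> fully pinned {name: 'major-minor'}."""
    return {name: f"{major}-{vset[(name, major)]}"
            for name, major in package_deps.items()}

pinned = resolve({"Java": "8", "Lombok": "1.12"}, version_set)
# pinned == {"Java": "8-123", "Lombok": "1.12-456"}
```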

Version sets can only have one minor version per major version of a package, which prevents consistency issues.

e.g. I can have Java 8-123 and Java 11-123 in my version set, but I cannot have Java 8-123 and Java 8-456 in my version set.
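That constraint falls out naturally if you model the set as a mapping keyed by (name, major), as in the invented sketch below: a second entry for the same major replaces the first rather than coexisting with it.

```python
# One minor per major: keying by (name, major) makes duplicates impossible.
vset = {}
vset[("Java", "8")] = 123
vset[("Java", "11")] = 123   # fine: a different major can coexist
vset[("Java", "8")] = 456    # replaces 8-123; major 8 maps to one minor

# vset == {("Java", "8"): 456, ("Java", "11"): 123}
```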

Your pipeline will automatically build new minor versions into your version set from upstream. If the build fails, someone needs to step in. Every commit produces a new minor version of a package; that is to say, you can declare that your package is major version X, but the minor version is left up to Brazil.
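A hedged sketch of that auto-merge behavior (the function names and the build callback are stand-ins, not real Brazil interfaces):

```python
# When upstream publishes a new minor, the pipeline tries to build the
# consuming version set against it and only commits the bump on success.
def try_merge(vset, name, major, new_minor, build_ok):
    """Return an updated version set, or the old one if the build fails."""
    candidate = dict(vset)
    candidate[(name, major)] = new_minor
    if build_ok(candidate):       # stand-in for a real build of the set
        return candidate          # new revision of the version set
    return vset                   # build failed: a human must intervene

vset = {("Java", "8"): 123}
merged = try_merge(vset, "Java", "8", 124, build_ok=lambda vs: True)
stuck = try_merge(vset, "Java", "8", 125, build_ok=lambda vs: False)
# merged pins Java 8-124; stuck stays at Java 8-123
```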

This scheme actually works pretty well. There are internal tools (Gordian Knot) that analyze your dependency graph to make sure your dependencies are correct.

It's a lot to know; it took me a year or so to fully understand and appreciate. Most engineers at Amazon treat it like they do Git -- learn the things you need and ignore the rest. For the most part this stuff is all hands-off; you just need one person on the team keeping everything correct.


That actually sounds brilliant. Someone decided to brush less of the version stuff under the carpet.


You don't, you embrace version hell.


so what I'm hearing is that app-1.0 needs app-1.0-runtime-build-20240410 which was, itself, built from a base of runtime-y-2.0 and layering library-x-1.1 upon it, kind of like

  # in some "app-runtimes" project, they assemble your app's runtime
  cat > Dockerfile <<FOO
  FROM public.ecr.aws/runtimes/runtime-y:2.0
  ADD https://cache.example/library-x/1.1/library-x-1.1.jar /opt/library-x-1.1.jar
  FOO
  tar -cf - Dockerfile | podman build -t public.ecr.aws/app-runtimes/app-1.0-runtime-build:20240410 -

  # then you consume it in your project
  cat > Dockerfile <<FOO
  FROM public.ecr.aws/app-runtimes/app-1.0-runtime-build:20240410
  ADD ./app-1.0.jar /opt/app-1.0.jar
  FOO

  cat > .gitlab-ci.yml <<'YML'
  # you can also distribute artifacts other than just docker images
  # https://docs.gitlab.com/ee/user/packages/package_registry/supported_package_managers.html
  cook image:
    stage: package
    script:
    # or this https://docs.gitlab.com/ee/topics/autodevops/customize.html#customize-buildpacks-with-cloud-native-buildpacks
    - podman build -t $CI_REGISTRY_IMAGE .
    # https://docs.gitlab.com/ee/user/packages/#container-registry is built in
    - podman push     $CI_REGISTRY_IMAGE
  review env:
    stage: staging
    script: auto-devops deploy
    # for free: https://docs.gitlab.com/ee/ci/review_apps/index.html
    environment:
      name: review/${CI_COMMIT_REF_SLUG}
      url: https://${CI_ENVIRONMENT_SLUG}.int.example
      on_stop: teardown-review
  teardown-review:
    stage: staging
    script: auto-devops stop
    when: manual
    environment:
      name: review/${CI_COMMIT_REF_SLUG}
      action: stop
  ... etc ...
  YML
and then, yadda, yadda, blue-green, incremental rollout <https://gitlab.com/gitlab-org/gitlab/-/blob/v16.10.2-ee/lib/...>, feature flags <https://docs.gitlab.com/ee/operations/feature_flags.html>, error capture <https://docs.gitlab.com/ee/operations/error_tracking.html#in...>, project-managed provisioning <https://docs.gitlab.com/ee/user/infrastructure/iac/#integrat...>, on call management <https://docs.gitlab.com/ee/operations/incident_management/>, on call runbooks <https://docs.gitlab.com/ee/user/project/clusters/runbooks/in...>

you can orchestrate all that from ~~Slack~~ Chime :-D if you're into that kind of thing https://docs.gitlab.com/ee/ci/chatops/


No, not even close. You might even have it exactly backwards.


which is why, as I originally asked GP: what have you already tried, and what features were they missing?

I presume by "exactly backwards" you mean that one should have absolutely zero knobs to influence anything because the Almighty Jeff Build System does all the things, which GitLab also supports but is less amusing to look at on an Internet forum because it's "you can't modify anything, it just works, trust me"

Or, you know, if you have something constructive to add to this discussion feel free to use more words than "lol, no"


I don't work at Amazon, and haven't for a long time, and this format is insufficient to fully express what they're doing, so I won't try.

You're better off searching for how Brazil and Apollo work.

That being said, the short of it is that: imagine when you push a new revision to source control, you (you) can run jobs testing every potential consumer of that new revision. As in, you push libx-1.1.2 and anyone consuming libx >= 1.1 (or any variety of filters) is identified. If the tests succeed, you can update their dependencies on your package and even deploy them, safely and gradually, to production without involving the downstream teams at all. If they don't, you can choose your own adventure: pin them, fork, fix them, patch the package, revise the versioning, whatever you want.
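The consumer-identification step described above can be illustrated with a deliberately naive sketch (the data, the constraint syntax, and the function are invented; Brazil's real dependency query is far richer):

```python
# Given a new revision of libx, find every package whose declared
# dependency range admits it, so their tests can be run against it.
consumers = {
    "service-a": {"libx": ">=1.1"},
    "service-b": {"libx": ">=2.0"},
    "service-c": {"liby": ">=0.3"},
}

def affected_by(package, version, deps):
    """Very naive range check: only handles '>=X.Y' constraints."""
    def satisfies(constraint):
        floor = tuple(int(p) for p in constraint.lstrip(">=").split("."))
        return tuple(int(p) for p in version.split(".")[:2]) >= floor
    return [name for name, wants in deps.items()
            if package in wants and satisfies(wants[package])]

hits = affected_by("libx", "1.1.2", consumers)
# hits == ["service-a"]: service-b wants >=2.0, service-c doesn't use libx
```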

It's designed to be extremely safe and put power in the hands of those updating dependencies to do so safely within reason.

Imagine you work on a library and you can test your PR against every consumer.

It's not unlike what Google and other monorepos accomplish, but it's also quite different. You can have many live versions simultaneously. You don't have to slog it out and patch all the dependents -- maybe you should, but you have plenty of options.

It all feels very simple. I'm glossing over a lot.


Sorry, I wish I could phrase it better for you. All I can say is that I have tried a _lot_ of tools, and nothing has come close. Amazon has put a lot of work into making its tools efficient.

Here's a better explanation: https://gist.github.com/terabyte/15a2d3d407285b8b5a0a7964dd6...



