Originally published in the May/June 2014 issue of Java Magazine.

Mastering Binaries with Hudson, Maven, Git, Artifactory, and Bintray

by Michael Hüttermann

Published May 2014

A powerful tool chain that can be the backbone in a build/release and delivery pipeline

In this article, we discuss patterns for mastering binaries in Maven-based projects. We differentiate between version control systems and component repositories as well as Maven releases and Maven snapshots. We talk about automatic releasing and explore a tool chain that integrates Hudson, Maven, Git, Artifactory, and Bintray to be a backbone in a build/release and delivery pipeline.

Now let’s start by setting the stage with some important aspects of continuous integration (CI) and continuous delivery (CD).

CI and CD

CI includes code integrations that are run at least on a daily basis. The word continuous, as used in this context, denotes a repeatable process that occurs regularly and frequently. The word integration means that individually developed pieces of code that are stored in a source code repository are checked out as a whole; then they’re compiled, packaged, tested, inspected, and deployed with build results integrated into web pages, sent out as an e-mail, or both.

Building and integrating software as soon as developers check in their changes is called continuous build. A release build (or a release candidate build) is often the build that pulls a specific version (which was tested before successfully) from the version control system (VCS) and creates a new baseline from that. The build server acts as a “single point of truth,” so builds can be used with confidence for testing or as a production candidate.

With CD, you implement delivery pipelines, also called build staging, according to your corporate standards. For example, validating special quality requirements on higher build stages prevents code from being promoted before it’s ready. The initial staging area is the developers’ workspaces. Operationally, build pipelines often consist of different build jobs that are triggered automatically or manually and might have dependencies on each other. We talk about builds, but what exactly is a build?

Builds

A build is a standard, repeatable, and measurable compilation and packaging process done automatically. The specific steps of this process can vary. Some argue that building means compiling. Others might include in the process a preparation of the system, working on baselines in the VCS, compiling sources, running different sorts of tests, applying the “infrastructure as code” paradigm (for example, with Chef or Puppet), and packaging and distributing the configuration items. Build can also refer to the greater automation flow, including static code analysis, unit testing, and deployment. The output (the deliverables) of this process is often called a build, but from now on, we will call it an artifact.

The changes from multiple developers are integrated continuously and automatically. Each version of the build is the basis of later releases and distributions. As shown in Figure 1, builds—both local and central ones—check out (and may update) sources from the VCS and consume and produce artifacts from and to a component repository (also called a binary manager or distribution management). Version control and distribution management are in place and are complementary.


Figure 1

The artifact is configurable and runs on different target environments, including Windows and Linux machines (“build once, configure anywhere”). Be aware that even early in the process, you should use environments that match those you’ll use in production. With CD, promoting versions to higher staging environments is often a one-click event, where a lead engineer, a release manager, a deployment manager, or the domain expert pulls a new version to be deployed and configured to a defined target environment. An environment is all of the resources that your application needs to work and their configuration, as well as the hardware configuration (including CPUs, memory, and spindles) and the configuration of the operating system and middleware (including messaging systems and web servers). The term infrastructure summarizes all of the environments in your organization together with supporting services, such as firewalls and monitoring systems.

As shown in Figure 2, parts of each artifact are held in version control, and other parts are held in distribution management. The configuration management discipline calls all those artifacts configuration items. Configuration management decides whether to put artifacts into version control or into distribution management.


Figure 2

Now let’s explore version control and distribution management (component repositories) in detail.

Version Control

Generally, it’s important to archive your artifacts, but which types of artifacts you store where depends on the project context and the requirements. Coding artifacts—particularly source code, build scripts, and infrastructure as code—should be stored in the VCS. Although this sounds obvious, it’s not always the case. Many projects, for example, patch their applications in production without having those changes under source control. In addition to source code, tests and build scripts need to be versioned as well as Puppet manifests or Chef cookbooks. The latter is often referred to as DevOps, although DevOps is much more than that.

I recommend that you externalize (and version) your runtime configuration settings. It’s best practice to control all variable configuration values externally. Additionally, you should set up one central repository to store your assets, to avoid having multiple places where documentation might be located (for example, the VCS, a file sharing server, and your favorite e-mail system, all in parallel).

When you check your artifacts into the VCS, you’ll have to decide how to arrange the files in the system and decide who works on which stream and why. The rule of thumb here is that you shouldn’t open any additional streams for longer than necessary, even with a distributed VCS such as Git.

Component Repositories

Often it’s necessary to reproduce software that’s running on different machines. It can, therefore, be useful for a central release or configuration management department to label final versions of the software and put the build artifacts into a specific build archive, often called a definitive media library. This ensures binary integrity, which means that the same deployment units are delivered to each target environment (there’s no recompilation for further environments), and source code is always linked to produced binaries.

A component repository can also protect company assets and boost reusability. In the Java ecosystem, the binary versions are those from the standardized set of deployment units, such as JAR, WAR, and EAR files. Full-fledged component repositories offer many features including search capability, role-based permission systems, transactional handling of binary access, and much more. Now that we’ve differentiated between the VCS and component repositories, you’re fit for automatic releasing.

Automatic Releasing

During automatic releasing, which is fundamental for a CD initiative, major parts of the release process are performed by scripts and tool chains. In this process, the whole team profits from automation and, under optimal conditions, a specific button is simply pressed to promote automatically created snapshot versions to release candidates or to promote release candidates to release status.

Prerequisites of a holistic automatic releasing process include

  • Use highly integrated tool chains consisting of lightweight tools, for example, Hudson or Maven, that can be chosen and orchestrated as needed.
  • Put configuration items (including sources, database scripts, middleware, infrastructure, configuration files—such as Java properties files—and build/deploy scripts) into version control.
  • Wherever possible, use declarative formats (for example, Puppet manifests and Chef cookbooks) to set up the automatic releasing process.
  • Declare (explicitly) and isolate dependencies of application, middleware, and infrastructure.
  • Apply CI that continuously synchronizes the work of your colleagues.
  • Distinguish between VCSs (such as Subversion and Git), where you hold your sources, and component repositories (such as Artifactory and Nexus), where you hold your software binaries.
  • Build binaries once and deploy them to target systems by configuration (in other words, runtime configuration data is not packaged into the binaries in a static fashion; rather, the application is configured during deployment time or upon startup).
  • Keep environments similar between development and operations (keeping them equal is not practical because of costs, benefits, and different nonfunctional requirements in specific environments).
  • Define a process (and also one for patching and merging back changes in production versions).
  • To ensure reproducibility, ensure that delivered software is solely built by the build server and is neither built manually nor patched or built on developers’ desktops.

It’s critical to understand that automation is important in order to gain fast feedback. Don’t just automate because automating defined activities is so much fun, although this is a good motivator as well. 

When running an automatic releasing process with Maven, you need to take special care of Maven snapshots and Maven releases, which we discuss next.

Snapshots and Releases

First, it’s important to consider build hygiene while releasing and working with Maven projects. For instance, a release build (in Maven terms, every version without -SNAPSHOT in its version element) includes only frozen versions of both the produced and consumed artifacts. You might want to check this by using the convenient Maven Enforcer plugin, or you might want to adjust your project object model (POM) accordingly with the help of Maven’s Versions plugin.
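As a crude illustration of what such a check guards against, the following sketch flags lingering snapshot versions in a POM fragment. This is not the Enforcer plugin itself—the inline POM snippet is purely hypothetical—but it shows the rule a release build must satisfy:

```shell
#!/bin/sh
# Hypothetical stand-in for the Enforcer plugin's snapshot check:
# a release build must not reference any -SNAPSHOT version.
pom='<dependency><version>2.0-SNAPSHOT</version></dependency>'
if printf '%s\n' "$pom" | grep -q -- '-SNAPSHOT'; then
  echo "snapshot dependency found; not fit for a release build"
fi
```

In real projects, prefer the Enforcer plugin's requireReleaseDeps rule, which inspects the resolved dependency tree rather than the raw POM text.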

Let’s make this even clearer by introducing a very small example. Let’s imagine we want to play around with lambdas in a freshly built Java SE 8 application that looks like Listing 1.

package com.huettermann;

/**
 * Hello world!
 */
public class App
{
    public static void main(String[] args) {
        Runnable r = () -> System.out.println("Hello World!");
        Thread t = new Thread(r);
        t.start();

    }
}

Listing 1

To build this in a reproducible way, we just set up a minimalistic Maven POM, as shown in Listing 2. Take note of the compiler settings. We use Java 8 for both source and target to be able to use and compile the lambda expression.

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.huettermann</groupId>
    <artifactId>cat</artifactId>
    <version>${myVersion}</version>
    <name>devops</name>
    <url>http://huettermann.net/devops/</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Listing 2

Now look at the version element. We’ve introduced a variable, ${myVersion}, for the version that can be passed through while starting the Maven build. The parameterized version number is often a good approach during releasing. We can just call a build—for example, with mvn clean install -DmyVersion=1.0.0-GA—and the produced binary will have the correct version. This way we can easily create release candidates or release continuously, without the overhead of frequently patching the POM files with respective version numbers. The drawback is that the POM, as part of a baseline in your VCS, does not contain the concrete version number. This can be a showstopper, depending on project constraints and governance requirements.
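The effect of passing -DmyVersion can be sketched without running Maven at all, because the final artifact name is simply the concatenation Maven applies. The values below mirror the example invocation:

```shell
#!/bin/sh
# Maven names the artifact <artifactId>-<version>.<packaging>;
# with -DmyVersion=1.0.0-GA, the POM's ${myVersion} resolves accordingly.
artifactId=cat
myVersion=1.0.0-GA
packaging=jar
echo "${artifactId}-${myVersion}.${packaging}"   # -> cat-1.0.0-GA.jar
```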

Often a project team feels comfortable using Maven snapshots during development and providing releases for deployment to target environments. Take note: the term deployment is heavily overloaded. Please differentiate between deploying artifacts to a component repository—that means publishing binaries to it—and deploying artifacts to an application server (that is, the runtime container for the application). In this article, I don’t cover any deployments of WAR files or EAR files to application servers. It’s important to understand that although it’s often fine to work with Maven snapshots during development and deploy them to a component repository, it’s seldom a good idea to deploy snapshots to an application server on a higher integration test environment.

Let’s now look at a different approach for what releasing can look like by designing an appropriate delivery pipeline. Keep in mind that for that, we’ve changed the POM file and replaced the version variable with 0.0.1-SNAPSHOT.

A Delivery Pipeline

Let’s go through the essentials of the solution and discuss some common recipes for a Hudson build pipeline that has three build stages triggered continuously on each Git commit (see Figure 3). Be aware that, in practice, you’ll probably want to set up a holistic pipeline, which might consist of multiple subpipelines. All build jobs have the Git integration configured pointing to the central master repository.


Figure 3

Stage 1. The first stage is triggered automatically on each commit in Git. Code is compiled, tested, packaged, and installed locally on the build server. Choose your build server topology wisely, for example, by scaling out horizontally and running builds only on build slaves, not on the master.

After the Maven build step mvn clean install runs successfully, the produced binary is placed in the folder named “target” of the build job’s workspace and installed into the local Maven repository, which, depending on your configuration, is either in the home directory of the user running Hudson or another place local to your Hudson builds. According to the Maven approach, the artifact is named cat-0.0.1-SNAPSHOT.jar, which is the concatenation of the artifactId (cat), the version (0.0.1-SNAPSHOT), and the deployment type (.jar).

After we’ve produced the artifact, we can place it into an exchange medium for later reuse, for example, for sharing among developers, or later use between development and operations. Often, this is a component repository such as Artifactory, for both snapshots and releases, or Bintray, for releases only.

In our case, as part of the pipeline, we want to locally reuse the freshly created artifact in downstream build steps and check its quality before we distribute it to any teams. Thus, we go for the approach of temporarily archiving the artifact on the file system. We could also move and exchange artifacts across build slaves, but let’s keep it simple here and focus on the scope of this article.

So let’s copy the file to a transfer folder by squeezing some shell commands into a Hudson “execute shell” script build step, as shown in Listing 3.

#!/bin/sh
rm -rf /home/michael/talk/transfer
mkdir /home/michael/talk/transfer
cp devops/target/*.jar /home/michael/talk/transfer

Listing 3

Although there are options for versioning scripts and tracking changes in Hudson, for better maintainability, in real project life, you often want to put your shell scripts into the VCS as well. Because we later need the project version and the Git checkout hash, we now extract and store that information. Let’s start with the version of the Maven-based project (see Listing 4), which is also part of the dedicated Hudson build step.

#!/bin/sh
version1=$(sed -n 's/.*<version>/<version>/p' devops/pom.xml | head -1)
version2=$(echo $version1 | sed 's/<version>//g')
version=$(echo $version2 | sed 's/<\/version>//g')
echo "version=$version" > /home/michael/talk/transfer/version.properties

Listing 4

Many possible ways exist for extracting the version number. We’ve decided to use some sed commands, and store the version number in a Java properties file named version.properties. Given the version 0.0.1-SNAPSHOT, the file afterward includes the key/value pair version=0.0.1-SNAPSHOT.
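A single sed call with a capture group performs the same extraction in one step. Here is a sketch against an inline POM line (in the build job, you would read devops/pom.xml instead):

```shell
#!/bin/sh
# One-step alternative to the three sed calls: capture the text
# between the first <version> and </version> pair.
pomline='    <version>0.0.1-SNAPSHOT</version>'
version=$(printf '%s\n' "$pomline" | sed -n 's:.*<version>\(.*\)</version>.*:\1:p' | head -1)
echo "version=$version"   # -> version=0.0.1-SNAPSHOT
```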

Because we later want to cherry-pick a tested version to promote to be a release, we also store and visualize the Git hash, that is, the commit that was built successfully by Hudson. For that, we’ve coded a Groovy postbuild action, similar to Listing 5.

def matcher = manager.getLogMatcher(".*Checking out Revision (.*)\$")
if (matcher?.matches()) {
    manager.addShortText(matcher.group(1).substring(0, 8))
}

Listing 5

This script adds the first eight characters of the hash to the build history of the build job for later reference; see Figure 4, which shows an enriched build history with Git commits at your fingertips.


Figure 4

Finally, Stage 1 triggers the downstream build job. In our case, this is Stage 2; thus, we add this build job to be Hudson’s target destination of the “Trigger parameterized build on other project.” Here, don’t forget to pass through parameters you want to use later from inside downstream build jobs. In our case, we’ll need the Git commit hash; thus, we configure to pass through the “Git commit that was built.” Let’s now move forward to Stage 2.

Stage 2. Stage 2 is an example dummy stage that illustrates any further activities on the previously built artifact or code baseline. Please imagine some heavy processing here, testing, and many more helpful things.

If you need to access the sources from version control, it’s important to not rely only on Hudson’s Git plugin. It’s more stable to trigger a git checkout $GIT_COMMIT as the first build step in that particular build job of Stage 2. This ensures that the build job does work on exactly the same Git commit as the build job of Stage 1.

Closing Stage 2, we trigger Stage 3 with the same approach that we used to call Stage 2 from Stage 1. Additionally, we now add “Parameters from properties file” while configuring the “Trigger parameterized build on other project” section. Here, we use the properties from the file /home/michael/talk/transfer/version.properties, which was created during Stage 1. It’s sometimes a bit tricky to make context information and environment variables available in Hudson build jobs. By passing the properties file, its key/value pairs are made available to the downstream build job dynamically.
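Because the file holds plain key=value lines, a downstream shell step could even source it directly instead of relying on Hudson's parameter passing. A sketch (the file is created here only for illustration):

```shell
#!/bin/sh
# Java properties files with simple key=value lines are also valid
# shell assignments, so a downstream step can source them directly.
echo "version=0.0.1-SNAPSHOT" > version.properties
. ./version.properties
echo "building against version $version"   # -> building against version 0.0.1-SNAPSHOT
rm -f version.properties
```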

Stage 3. Our last stage contains the integration with the component repository, that is, deploying artifacts. If everything went well until now, nothing breaks, and no quality gates are missed, we can now deploy the previously created artifact to the component repository.

What has to be done? First of all, we add a shell build step to the newly created Hudson build job for Stage 3. To be more consistent and to document what’s going on, the build step copies the previously created artifact to the workspace local to this build job, as shown in Listing 6.

#!/bin/sh
rm *.jar
cp /home/michael/talk/transfer/*.jar .

Listing 6

Now we need to configure the component repository, in our case, Artifactory. Artifactory is a component repository that can be smartly integrated with Hudson. It serves as a tool for artifact exchange and provisioning across all artifact types, including Java, Linux RPMs, and many more.

In the Hudson build job, we now configure the desired Artifactory target repository—that is, a logical target path in the overall repository. Side note: having the Git baseline as well, it would be easy to just rebuild the Maven project and deploy the outcome directly to Artifactory. In that case, we would not use mvn clean deploy because that is an antipattern. The better approach would be to fire an mvn clean install command and let Artifactory do the deployment in a transactional way.

But, in our case, requirements might differ, for instance, needs for highly performant build cycles, and we’ve decided to take the built artifact from Stage 1 and deploy it with Artifactory’s generic integration facility. For this, we can use an Ant-style configuration of the file pattern for the published artifacts (what to deploy) and target destination (where to deploy). Our configuration might look like Listing 7.

**/*.jar=>com/huettermann/cat/${version}

Listing 7
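A rough shell rendering of what that Ant-style mapping resolves to may help; the workspace layout below is a stand-in, created and removed just for the demonstration:

```shell
#!/bin/sh
# Illustrates the pattern **/*.jar=>com/huettermann/cat/${version}:
# every jar found in the workspace maps under the versioned target path.
mkdir -p demo/devops/target
touch demo/devops/target/cat-0.0.1-SNAPSHOT.jar
version=0.0.1-SNAPSHOT
for f in demo/devops/target/*.jar; do
  echo "$f => com/huettermann/cat/$version/$(basename "$f")"
done
rm -rf demo
```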

Now we’re done with deploying ongoing snapshot artifacts to Artifactory as part of our continuous build. The Hudson build job history shows the Artifactory icon to document that binaries were deployed to Artifactory (see Figure 5).


Figure 5

It comes in handy to be able to directly navigate from the build history to the built artifacts by just selecting the Artifactory icon of a specific build. We’re then directed to Artifactory and can work on or just obtain information for those artifacts. Figure 6 shows the Artifactory browser listing different deployed snapshot versions. As you see, although we deploy snapshot versions, inside Artifactory, artifacts are handled as unique, fully qualified versions. This is achieved by pure configuration inside Artifactory.


Figure 6

In Artifactory, artifacts are browsable. You can operate on those artifacts depending on your permissions.

We’re done: Git commits continuously trigger the pipeline, and new snapshot versions are produced and distributed. That’s fine for development, but we move on in the releasing process and sooner or later (with luck, sooner), we want to create a release, which we explore next.

Creating a Release

Our releasing process is implemented in a dedicated Hudson build job. First, we want to check out a previously compiled and tested baseline of the continuous build. Thus, we parameterize the Hudson release build job with an input field of type String and assign HEAD to be its default value. During build job execution, it’s highly recommended to enter a specific, well-tested Git commit hash, but for build chain testing purposes, HEAD might be sufficient. As the first build step, we can then access the parameter and check out the given Git commit as a shell execution by using git checkout $rev.

Hudson’s Git plugin creates a clone, and performing the checkout aligns us with the desired baseline. Maven’s Release plugin is often a good choice—see Section 5.4 in my Agile ALM book (Manning, 2011)—but sometimes it’s not the best fit for the given requirements. An alternative is to directly apply fine-grained steps as needed. Examples include tagging in the VCS (for example, by executing shell commands as build steps) or setting the version number in the POM files. The latter can be achieved easily by a self-written Maven Mojo, as shown in Listing 8.

package VersionFetcher;

import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoExecutionException;
import org.apache.maven.project.MavenProject;


/**
 * @goal release
 * @phase process-sources
 */
public class VersionFetcher extends AbstractMojo {

    /**
     * @parameter expression = "${project}"
     * @readonly
     */
    private MavenProject project;

    public void execute() throws MojoExecutionException {
        String version = project.getVersion();
        String release = version;
        if (version.indexOf("-SNAPSHOT") > -1) {
            release = version.substring(0, version.indexOf("-SNAPSHOT"));
            getLog().info("SNAPSHOT found: " + release);
        }
        project.getProperties().setProperty("newVersion", release);
    }
}

Listing 8 

The special trick with this Maven plugin is that it reads the project version of the Maven project it was applied on, strips its snapshot version, and dynamically assigns the result (that is, the release version) to the property newVersion. That property, in turn, is the input of the set goal of the Maven Versions plug-in. Thus, in the Hudson build job, we can fire the command shown in Listing 9 on the underlying to-be-released Maven-based project.

mvn com.huettermann:versionfetcher:1.0.0:release versions:set -DgenerateBackupPoms=false

Listing 9

We skip the generation of backup POMs, because we work on defined baselines and we can always roll back. Even better, we always roll forward because we just work on the new changes piped through the pipeline. Also, here, it does not make any difference whether we work on one POM or hundreds of child modules. The procedure is the same.

After we’ve patched the POM, the source code on the build machine is now ready to be tagged and deployed to a full-fledged component repository. We use mvn clean install to install the release artifact locally. Afterward, depending on our requirements, we orchestrate the target version number, that is, the version number the artifact is labeled with in the component repository. For that, we execute the shell as a build step in our Hudson release build job (see Listing 10).

#!/bin/sh
version1=$(sed -n 's/.*<version>/<version>/p' devops/pom.xml | head -1)
version2=$(echo $version1 | sed 's/<version>//g')
version=$(echo $version2 | sed 's/<\/version>//g')
hash=$GIT_COMMIT
var=$(echo $hash | cut -b 1-6)
revisioncount=$(git log --oneline | wc -l)
release=$version-$var.r$revisioncount.b$BUILD_NUMBER
echo $release

Listing 10

After processing, the release variable contains a string that is enriched with a lot of useful context information including the version from the POM file, the first six characters of the Git hash, the Git revision number, and the build number of the executing Hudson build job. It’s just an example, but you get the point: take the information that is most helpful to you; concrete requirements might influence your choice. But be aware of any implications. For example, adding only the build number from the Hudson build job itself is too fragile, because it is not unique across different Hudson build jobs.

Let’s now proceed with our releasing process. The local target directory contains the release artifact that was built by Maven. We want to assign our individual release version to it; thus, we copy the artifact, as shown in Listing 11.

#!/bin/sh
cp devops/target/cat-$version.jar devops/target/cat-$release.jar

Listing 11
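The release label assembled in the shell build step above can be dry-run in isolation. The sketch below substitutes hardcoded stand-in values for $GIT_COMMIT, the git log count, and $BUILD_NUMBER, just to show the shape of the result:

```shell
#!/bin/sh
# Stand-in values instead of the live $GIT_COMMIT, git log, and $BUILD_NUMBER:
version=0.0.1-SNAPSHOT
hash=1a2b3c4d5e6f7890
var=$(echo $hash | cut -b 1-6)
revisioncount=42
BUILD_NUMBER=7
release=$version-$var.r$revisioncount.b$BUILD_NUMBER
echo $release   # -> 0.0.1-SNAPSHOT-1a2b3c.r42.b7
```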

Now let’s deploy the release artifact to our component repository. In our case, we use Bintray. Bintray is like a “GitHub for binaries.” It’s a social media service for developers to publish, download, store, promote, and share open source software packages. We use Bintray’s powerful REST API to distribute our artifact.

First, for testing purposes only, we delete the target version of our component (in Bintray notation, this is a “package”) in Bintray, as shown in Listing 12. This comes in handy if the version already exists and we no longer need it.

curl -u michaelhuettermann:${bintray_key} -X DELETE \
https://api.bintray.com/packages/michaelhuettermann/meow/cat/versions/$release

Listing 12

As you see, a cURL command is sufficient, but you can, of course, also wrap the handling with Groovy (see this library for encapsulating the access) or any other language of choice.

Bintray follows some conventions. We host a repository (in our case, meow), which is owned by me, michaelhuettermann. An access key must be generated in Bintray, and we deposit it in Hudson as a configuration variable. Our repository consists of a package named cat.

Next we create the new target version by defining a Hudson build step to execute a shell script, as shown in Listing 13. After we’ve created the new version, we’re ready to deploy the binary, as shown in Listing 14.

curl -u michaelhuettermann:${bintray_key} \
-H "Content-Type: application/json" \
-X POST https://api.bintray.com/packages/michaelhuettermann/meow/cat/versions \
--data "{ \"name\": \"$release\", \"desc\": \"desc\" }"

Listing 13 

curl -T "$WORKSPACE/devops/target/cat-$release.jar" \
-u michaelhuettermann:${bintray_key} \
-H "X-Bintray-Package:cat" -H "X-Bintray-Version:$release" \
https://api.bintray.com/content/michaelhuettermann/meow/

Listing 14 

Learn More


 Agile ALM, by Michael Hüttermann (Manning, 2011)

 DevOps for Developers, by Michael Hüttermann (Apress, 2012)

Finally, we have to make the binary visible; that is, we have to publish it. We can do that either in Bintray’s web application itself or we can again execute an API call (see Listing 15). As a result, we now have distributed the artifact, in its new version, to Bintray. Figure 7 shows that the freshly published versions of artifacts are available in Bintray. Crisp, isn’t it?

curl -u michaelhuettermann:${bintray_key} \
-H "Content-Type: application/json" \
-X POST \
https://api.bintray.com/content/michaelhuettermann/meow/cat/$release/publish \
--data "{ \"discard\": \"false\" }"

Listing 15


Figure 7

Conclusion

This article explained some concepts for mastering binaries with Maven projects, and it introduced a powerful tool chain. We’ve done a round-trip through some concrete examples to show how part of a delivery pipeline can look. Now you are prepared to further streamline your software delivery process. As a result, you’ll have more effective and efficient processes and tool chains and even more fun with your daily work.

Now, I wish you much success with mastering your binaries and fun trying out these tools yourself!






Java Champion Michael Hüttermann is a freelance delivery engineer and the author of the first books on agile application lifecycle management (ALM) and DevOps. Follow him at @huettermann.