Archive for the 'Atomic' Category

Matching Fedora OSTree Released Content With Each 2 Week Atomic Release

Cross posted with this Project Atomic Blog post

TL;DR: The default Fedora cadence for updates in the RPM streams is once a day. Until now, the OSTree-based updates cadence has matched this, but we're changing the default OSTree update stream to match the Fedora Atomic Host image release cadence (once every two weeks).

---

In Fedora we release a new Atomic Host approximately every two weeks. In the past this has meant that we bless and ship new ISO, QCOW, and Vagrant images that can then be used to install and/or start a new Atomic Host server. But what if you already have an Atomic Host server up and running?

Servers that are already running are configured to get their updates directly from the OSTree repo that is sitting on Fedora Infrastructure servers. The client asks "What is the newest commit for my branch/ref?" and the server kindly replies with the most recent commit. If the client is on an older version, it pulls the newer commit and applies the update.
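
If you're curious, you can watch this exchange from the client side: list the refs the remote advertises and compare them against what you're currently booted into. A minimal sketch, assuming the remote is named fedora-atomic (the default on Fedora Atomic Host; yours may differ):

# ostree remote refs fedora-atomic
# rpm-ostree status

The first command lists the branches/refs the server offers, and the second shows the commit your host is currently running.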

That exchange is exactly how the client is supposed to behave. The problem with the way we have been doing things, though, is that we have been updating everyone's branch/ref every night when we do our updates runs in Fedora.

Updating the ref nightly means that users get content as soon as it has been created, but it also means that the two week release process, where we perform testing and validation, means very little for these users: they receive content before we have done any testing on it.

We have decided to slow down the cadence of the fedora-atomic/25/x86_64/docker-host ref within the OSTree repo to match the exact releases that we do for the two week release process. Users will be able to track this ref like they always have, but it will only update when we do a release, approximately every two weeks.

We have also decided to create a new ref that will get updated every night, so that we can still do our testing. This ref will be called fedora-atomic/25/x86_64/updates/docker-host. If you want to keep following the content as soon as it is created you can rebase to this branch/ref at any time using:

# rpm-ostree rebase fedora-atomic/25/x86_64/updates/docker-host
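
And if you later decide that the two week cadence is what you want after all, rebasing back to the default ref should work the same way (a sketch, mirroring the command above):

# rpm-ostree rebase fedora-atomic/25/x86_64/docker-host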

As an example, let's say that we have a Fedora Atomic host which is on the default ref. That ref will now be updated every two weeks, and only every two weeks:

-bash-4.3# date
Fri Feb 10 21:05:27 UTC 2017

-bash-4.3# rpm-ostree status
State: idle
Deployments:
● fedora-atomic:fedora-atomic/25/x86_64/docker-host
       Version: 25.51 (2017-01-30 20:09:59)
        Commit: f294635a1dc62d9ae52151a5fa897085cac8eaa601c52e9a4bc376e9ecee11dd
        OSName: fedora-atomic

-bash-4.3# rpm-ostree upgrade
Updating from: fedora-atomic:fedora-atomic/25/x86_64/docker-host
1 metadata, 0 content objects fetched; 569 B transferred in 1 seconds
No upgrade available.

If you want the daily ostree updates, as you previously had, you need to switch to the updates ref:

-bash-4.3# rpm-ostree rebase --reboot fedora-atomic/25/x86_64/updates/docker-host

812 metadata, 3580 content objects fetched; 205114 KiB transferred in 151 seconds
Copying /etc changes: 24 modified, 0 removed, 54 added
Connection to 192.168.121.128 closed by remote host.
Connection to 192.168.121.128 closed.

[laptop]$ ssh fedora@192.168.121.128
[fedora@cloudhost ~]$ sudo su -
-bash-4.3# rpm-ostree status
State: idle
Deployments:
● fedora-atomic:fedora-atomic/25/x86_64/updates/docker-host
       Version: 25.55 (2017-02-10 13:59:37)
        Commit: 38934958d9654721238947458adf3e44ea1ac1384a5f208b26e37e18b28ec7cf
        OSName: fedora-atomic

  fedora-atomic:fedora-atomic/25/x86_64/docker-host
       Version: 25.51 (2017-01-30 20:09:59)
        Commit: f294635a1dc62d9ae52151a5fa897085cac8eaa601c52e9a4bc376e9ecee11dd
        OSName: fedora-atomic

We hope you are enjoying using Fedora Atomic Host. Please share your success or horror stories with us on the mailing lists or in IRC: #atomic or #fedora-cloud on Freenode.

Cheers!

The Fedora Atomic Team

Installing an OpenShift Origin Cluster on Fedora 25 Atomic Host: Part 2

Cross posted with this Project Atomic Blog post

Introduction

In part 1 of this series we used the OpenShift Ansible Installer to install OpenShift Origin on three servers running Fedora 25 Atomic Host. The three machines we'll be using have the following roles and IP address configurations:

+-------------+----------------+--------------+
|     Role    |   Public IPv4  | Private IPv4 |
+=============+================+==============+
| master,etcd | 54.175.0.44    | 10.0.173.101 |
+-------------+----------------+--------------+
|    worker   | 52.91.115.81   | 10.0.156.20  |
+-------------+----------------+--------------+
|    worker   | 54.204.208.138 | 10.0.251.101 |
+-------------+----------------+--------------+

In this blog, we'll explore the installed Origin cluster and then launch an application to see if everything works.

The Installed Origin Cluster

With the cluster up and running, we can log in as admin to the master node via the oc command. To install the oc CLI on your machine, you can follow these instructions or, on Fedora, you can install it via dnf install origin-clients. For this demo, we have the origin-clients-1.3.1-1.fc25.x86_64 rpm installed:

$ oc login --insecure-skip-tls-verify -u admin -p OriginAdmin https://54.175.0.44:8443
Login successful.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-system
    logging
    management-infra
    openshift
    openshift-infra

Using project "default".
Welcome! See 'oc help' to get started.

NOTE: --insecure-skip-tls-verify was added because we do not have properly signed certificates. See the docs for installing a custom signed certificate.

After we log in we can see that we are using the default namespace. Let's see what nodes exist:

$ oc get nodes
NAME           STATUS                     AGE
10.0.156.20    Ready                      9h
10.0.173.101   Ready,SchedulingDisabled   9h
10.0.251.101   Ready                      9h

The nodes represent each of the servers that are a part of the Origin cluster. The name of each node corresponds with its private IPv4 address. Also note that 10.0.173.101 is the private IP address of the master,etcd node and that its status contains SchedulingDisabled. This is because we specified openshift_schedulable=false for this node when we did the install in part 1.
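
If you ever need to change that setting after the install, the oadm tool on the master can toggle schedulability for a node. A sketch, run as a cluster admin on the master, using the node name as shown by oc get nodes:

# oadm manage-node 10.0.173.101 --schedulable=false

Passing --schedulable=true would make the node schedulable again.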

Now let's check the pods, services, and routes that are running in the default namespace:

$ oc get pods -o wide 
NAME                       READY     STATUS    RESTARTS   AGE       IP             NODE
docker-registry-3-hgwfr    1/1       Running   0          9h        10.129.0.3     10.0.156.20
registry-console-1-q48xn   1/1       Running   0          9h        10.129.0.2     10.0.156.20
router-1-nwjyj             1/1       Running   0          9h        10.0.156.20    10.0.156.20
router-1-o6n4a             1/1       Running   0          9h        10.0.251.101   10.0.251.101
$ 
$ oc get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
docker-registry    172.30.2.89      <none>        5000/TCP                  9h
kubernetes         172.30.0.1       <none>        443/TCP,53/UDP,53/TCP     9h
registry-console   172.30.147.190   <none>        9000/TCP                  9h
router             172.30.217.187   <none>        80/TCP,443/TCP,1936/TCP   9h
$ 
$ oc get routes
NAME               HOST/PORT                                        PATH      SERVICES           PORT               TERMINATION
docker-registry    docker-registry-default.54.204.208.138.xip.io              docker-registry    5000-tcp           passthrough
registry-console   registry-console-default.54.204.208.138.xip.io             registry-console   registry-console   passthrough

NOTE: If there are any pods that have failed to run, you can try to debug them with the oc status -v and oc describe pod/<podname> commands. You can retry any failed deployments with the oc deploy <deploymentname> --retry command.

We can see that we have a pod, service, and route for both a docker-registry and a registry-console. The docker registry is where any container builds within OpenShift will be pushed and the registry console is a web frontend interface for the registry.

Notice that there are two router pods and that they are running on two different nodes: the worker nodes. We can send traffic to either of these nodes and it will get routed appropriately. For our install we elected to set openshift_master_default_subdomain to 54.204.208.138.xip.io, which means we are only directing traffic to one of the worker nodes. Alternatively, we could have configured this as a hostname that was load balanced and/or performed round robin to either worker node.

Now that we have explored the install, let's try logging in as admin to the OpenShift web console at https://54.175.0.44:8443:

image

And after we've logged in, we see the list of projects that the admin user has access to:

image

We then select the default project and can view the same applications that we looked at before using the oc command:

image

At the top, there is the registry console. Let's try out accessing the registry console by clicking the https://registry-console-default.54.204.208.138.xip.io/ link in the top right. Note that this is the link from the exposed route:

image

We can log in with the same admin/OriginAdmin credentials that we used to log in to the OpenShift web console.

image

After logging in, there are links to each project so we can see images that belong to each project, and we see recently pushed images.

And.. We're done! We have poked around the infrastructure of the installed Origin cluster a bit. We've seen registry pods, router pods, and accessed the registry web console frontend. Next we'll get fancy and throw an example application onto the platform for the user user.

Running an Application as a Normal User

Now that we've observed some of the more admin like items using the admin user's account, we'll give the normal user a spin. First, we'll log in:

$ oc login --insecure-skip-tls-verify -u user -p OriginUser https://54.175.0.44:8443                                                                                        
Login successful.

You don't have any projects. You can try to create a new project, by running

    oc new-project <projectname>

After we log in as a normal user, the CLI tools recognize pretty quickly that this user has no projects and no applications running. The CLI tools give us some helpful clues as to what we should do next: create a new project. Let's create a new project called myproject:

$ oc new-project myproject
Now using project "myproject" on server "https://54.175.0.44:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.

After creating the new project the CLI tools again give us some helpful text showing us how to get started with a new application on the platform. It is telling us to try out the ruby application with source code at github.com/openshift/ruby-ex.git and build it on top of the Source-to-Image (or S2I) image known as centos/ruby-22-centos7. Might as well give it a spin:

$ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
--> Found Docker image ecd5025 (10 hours old) from Docker Hub for "centos/ruby-22-centos7"

    Ruby 2.2 
    -------- 
    Platform for building and running Ruby 2.2 applications

    Tags: builder, ruby, ruby22

    * An image stream will be created as "ruby-22-centos7:latest" that will track the source image
    * A source build using source code from https://github.com/openshift/ruby-ex.git will be created
      * The resulting image will be pushed to image stream "ruby-ex:latest"
      * Every time "ruby-22-centos7:latest" changes a new build will be triggered
    * This image will be deployed in deployment config "ruby-ex"
    * Port 8080/tcp will be load balanced by service "ruby-ex"
      * Other containers can access this service through the hostname "ruby-ex"

--> Creating resources with label app=ruby-ex ...
    imagestream "ruby-22-centos7" created
    imagestream "ruby-ex" created
    buildconfig "ruby-ex" created
    deploymentconfig "ruby-ex" created
    service "ruby-ex" created
--> Success
    Build scheduled, use 'oc logs -f bc/ruby-ex' to track its progress.
    Run 'oc status' to view your app.

Let's take a moment to digest that. A new image stream was created to track the upstream ruby-22-centos7:latest image. A ruby-ex buildconfig was created that will perform an S2I build that will bake the source code into the image from the ruby-22-centos7 image stream. The resulting image will be the source for another image stream known as ruby-ex. A deploymentconfig was created to deploy the application into pods once the build is done. Finally, a ruby-ex service was created so the application can be load balanced and discoverable.
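
If you'd like to look at each of those objects directly, they can all be listed with one command (a quick sketch):

$ oc get is,bc,dc,svc

That shows the image streams, build configs, deployment configs, and services that new-app just created.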

After a short time, we check the status of the application:

$ oc status 
In project myproject on server https://54.175.0.44:8443

svc/ruby-ex - 172.30.213.94:8080
  dc/ruby-ex deploys istag/ruby-ex:latest <-
    bc/ruby-ex source builds https://github.com/openshift/ruby-ex.git on istag/ruby-22-centos7:latest 
      build #1 running for 26 seconds
    deployment #1 waiting on image or update

1 warning identified, use 'oc status -v' to see details.

NOTE: The warning referred to in the output is a warning about there being no healthcheck defined for this service. You can view the text of this warning by running oc status -v.

We can see here that there is a svc (service) that is associated with a dc (deploymentconfig) that is associated with a bc (buildconfig) that has a build that has been running for 26 seconds. The deployment is waiting for the build to finish before attempting to run.

After some more time:

$ oc status 
In project myproject on server https://54.175.0.44:8443

svc/ruby-ex - 172.30.213.94:8080
  dc/ruby-ex deploys istag/ruby-ex:latest <-
    bc/ruby-ex source builds https://github.com/openshift/ruby-ex.git on istag/ruby-22-centos7:latest 
    deployment #1 running for 6 seconds

1 warning identified, use 'oc status -v' to see details.

The build is now done and the deployment is running.

And after more time:

$ oc status 
In project myproject on server https://54.175.0.44:8443

svc/ruby-ex - 172.30.213.94:8080
  dc/ruby-ex deploys istag/ruby-ex:latest <-
    bc/ruby-ex source builds https://github.com/openshift/ruby-ex.git on istag/ruby-22-centos7:latest 
    deployment #1 deployed about a minute ago - 1 pod

1 warning identified, use 'oc status -v' to see details.

We have an app! What are the running pods in this project?:

$ oc get pods
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          13m
ruby-ex-1-mo3lb   1/1       Running     0          11m

The build has Completed and the ruby-ex-1-mo3lb pod is Running. The only thing we have left to do is expose the service so that it can be accessed via the router from the outside world:

$ oc expose svc/ruby-ex
route "ruby-ex" exposed
$ oc get route/ruby-ex
NAME      HOST/PORT                                 PATH      SERVICES   PORT       TERMINATION
ruby-ex   ruby-ex-myproject.54.204.208.138.xip.io             ruby-ex    8080-tcp   
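
The exposed route can also be given a quick sanity check from the command line. A sketch, assuming curl is available on your workstation and that the app answers on /:

$ curl -I http://ruby-ex-myproject.54.204.208.138.xip.io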

With the route exposed we should now be able to access the application at ruby-ex-myproject.54.204.208.138.xip.io. Before we do that, we'll log in to the OpenShift web console as the user user and view the running pods in project myproject:

image

And pointing the browser to ruby-ex-myproject.54.204.208.138.xip.io we see:

image

Woot!

Conclusion

We have explored the basic OpenShift Origin cluster that we set up in part 1 of this two part blog series. We viewed the infrastructure components (the docker registry and the router) and discussed how the routers are set up. We also ran through an example application that was suggested to us by the command line tools, and were able to define that application, monitor its progress, and eventually access it from our web browser. Hopefully this blog gives the reader an idea or two about how they can get started with setting up and using an Origin cluster on Fedora 25 Atomic Host.

Enjoy!
Dusty

Installing an OpenShift Origin Cluster on Fedora 25 Atomic Host: Part 1

Cross posted with this Project Atomic Blog post

Introduction

OpenShift Origin is the upstream project that builds on top of the Kubernetes platform and feeds into the OpenShift Container Platform product that is available from Red Hat today. Origin is a great way to get started with Kubernetes, and what better place to run a container orchestration layer than on top of Fedora Atomic Host?

We recently released Fedora 25, along with the first biweekly release of Fedora 25 Atomic Host. This blog post will show you the basics for getting a production installation of Origin running on Fedora 25 Atomic Host using the OpenShift Ansible Installer. The OpenShift Ansible installer will allow you to install a production-worthy OpenShift cluster. If you'd like to just try out OpenShift on a single node instead, you can set up OpenShift with the oc cluster up command, which we will detail in a later blog post.

This first post will cover just the installation. In a later blog post we'll take the system we just installed for a spin and make sure everything is working as expected.

Environment

We've tried to make this setup as generic as possible. In this case we will be targeting three generic servers that are running Fedora 25 Atomic Host. As is common with cloud environments these servers each have an "internal" private address that can't be accessed from the internet, and a public NATed address that can be accessed from the outside. Here is the identifying information for the three servers:

+-------------+----------------+--------------+
|     Role    |   Public IPv4  | Private IPv4 |
+=============+================+==============+
| master,etcd | 54.175.0.44    | 10.0.173.101 |
+-------------+----------------+--------------+
|    worker   | 52.91.115.81   | 10.0.156.20  |
+-------------+----------------+--------------+
|    worker   | 54.204.208.138 | 10.0.251.101 |
+-------------+----------------+--------------+

NOTE In a real production setup we would want multiple master nodes and multiple etcd nodes, closer to what is shown in the installation docs.

As you can see from the table we've marked one of the nodes as the master and the other two as what we're calling worker nodes. The master node will run the api server, scheduler, and controller manager. We'll also run etcd on it. Since we want to make sure we don't starve the node running etcd, we'll mark the master node as unschedulable so that application containers don't get scheduled to run on it.

The other two nodes, the worker nodes, will have the proxy and the kubelet running on them; this is where the containers (inside of pods) will get scheduled to run. We'll also tell the installer to run a registry and an HAProxy router on the two worker nodes so that we can perform builds as well as access our services from the outside world via HAProxy.

The Installer

OpenShift Origin uses Ansible to manage the installation of different nodes in a cluster. The code for this is aggregated in the OpenShift Ansible Installer on GitHub. Additionally, to run the installer we'll need Ansible installed on our workstation or laptop.

NOTE At this time Ansible 2.2 or greater is REQUIRED.

We already have Ansible 2.2 installed so we can skip to cloning the repo:

$ git clone https://github.com/openshift/openshift-ansible.git &>/dev/null
$ cd openshift-ansible/
$ git checkout 734b9ae199bd585d24c5131f3403345fe88fe5e6
Previous HEAD position was 6d2a272... Merge pull request #2884 from sdodson/image-stream-sync
HEAD is now at 734b9ae... Merge pull request #2876 from dustymabe/dusty-fix-etcd-selinux

In order to document this better in this blog post we are specifically checking out commit 734b9ae199bd585d24c5131f3403345fe88fe5e6 so that we can get reproducible results, since the OpenShift Ansible project is fast-moving. These instructions will probably work on the latest master, but you may hit a bug, in which case you should open an issue.

Now that we have the installer we can create an inventory file called myinventory in the same directory as the git repo. This inventory file can be anywhere, but for this install we'll place it there.

Using the IP information from the table above we create the following inventory file:

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_user=fedora
ansible_become=true
deployment_type=origin
containerized=true
openshift_release=v1.3.1
openshift_router_selector='router=true'
openshift_registry_selector='registry=true'
openshift_master_default_subdomain=54.204.208.138.xip.io

# enable htpasswd auth
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$zgSjCrLt$1KSuj66CggeWSv.D.BXOA1', 'user': '$apr1$.gw8w9i1$ln9bfTRiD6OwuNTG5LvW50'}

# host group for masters
[masters]
54.175.0.44 openshift_public_hostname=54.175.0.44 openshift_hostname=10.0.173.101

# host group for etcd, should run on a node that is not schedulable
[etcd]
54.175.0.44

# host group for worker nodes, we list master node here so that
# openshift-sdn gets installed. We mark the master node as not
# schedulable.
[nodes]
54.175.0.44    openshift_hostname=10.0.173.101 openshift_schedulable=false
52.91.115.81   openshift_hostname=10.0.156.20  openshift_node_labels="{'router':'true','registry':'true'}"
54.204.208.138 openshift_hostname=10.0.251.101 openshift_node_labels="{'router':'true','registry':'true'}"

Well that is quite a bit to digest, isn't it? Don't worry, we'll break down this file in detail.

Details of the Inventory File

OK, so how did we create this inventory file? We started with the docs and copied one of the examples from there. The type of install we are doing is called a BYO (Bring Your Own) install because we are bringing our own servers rather than having the installer contact a cloud provider to bring up the infrastructure for us. For reference there is also a much more detailed BYO inventory file you can study.

So let's break down our inventory file. First we have the OSEv3 group and list the hosts in the masters, nodes, and etcd groups as children of that group:

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd

Then we set a bunch of variables for that group:

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_user=fedora
ansible_become=true
deployment_type=origin
containerized=true
openshift_release=v1.3.1
openshift_router_selector='router=true'
openshift_registry_selector='registry=true'
openshift_master_default_subdomain=54.204.208.138.xip.io

Let's run through each of them:

  • ansible_user=fedora - fedora is the user that you use to connect to Fedora 25 Atomic Host.
  • ansible_become=true - We want the installer to sudo when running commands.
  • deployment_type=origin - Run OpenShift Origin.
  • containerized=true - Run Origin from containers.
  • openshift_release=v1.3.1 - The version of Origin to run.
  • openshift_router_selector='router=true' - Set it so that any nodes that have this label applied to them will run a router by default.
  • openshift_registry_selector='registry=true' - Set it so that any nodes that have this label applied to them will run a registry by default.
  • openshift_master_default_subdomain=54.204.208.138.xip.io - This setting is used to tell OpenShift what subdomain to apply to routes that are created when exposing services to the outside world.

Whew ... quite a bit to run through there! Most of them are relatively self-explanatory, but openshift_master_default_subdomain might need a little more explanation. Basically, the value of this needs to be a Wildcard DNS Record so that any domain can be prefixed onto the front of the record and it will still resolve to the same IP address. We have decided to use a free service called xip.io so that we don't have to set up wildcard DNS just for this example.

So for our example, a domain like app1.54.204.208.138.xip.io will resolve to IP address 54.204.208.138. A domain like app2.54.204.208.138.xip.io will also resolve to that same address. These requests will come in to node 54.204.208.138, which is one of our worker nodes where a router (HAProxy) is running. HAProxy will route the traffic based on the domain used (app1 vs app2, etc) to the appropriate service within OpenShift.
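
You can verify the wildcard behavior yourself with any DNS lookup tool. A sketch, assuming dig is installed:

$ dig +short app1.54.204.208.138.xip.io
54.204.208.138
$ dig +short app2.54.204.208.138.xip.io
54.204.208.138

Any prefix resolves to the same embedded IP address, which is exactly what the router needs.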

OK, next up in our inventory file we have some auth settings:

# enable htpasswd auth
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'admin': '$apr1$zgSjCrLt$1KSuj66CggeWSv.D.BXOA1', 'user': '$apr1$.gw8w9i1$ln9bfTRiD6OwuNTG5LvW50'}

You can use a multitude of authentication providers with OpenShift. The above statements say that we want to use htpasswd for authentication and we want to create two users. The password for the admin user is OriginAdmin, while the password for the user user is OriginUser. We generated these passwords by running htpasswd on the command line like so:

$ htpasswd -bc /dev/stdout admin OriginAdmin
Adding password for admin user
admin:$apr1$zgSjCrLt$1KSuj66CggeWSv.D.BXOA1
$ htpasswd -bc /dev/stdout user OriginUser
Adding password for user user
user:$apr1$.gw8w9i1$ln9bfTRiD6OwuNTG5LvW50

OK, now on to the host groups. First up, our master nodes:

# host group for masters
[masters]
54.175.0.44 openshift_public_hostname=54.175.0.44 openshift_hostname=10.0.173.101

We have used 54.175.0.44 as the hostname and also set openshift_public_hostname to this same value so that certificates will use that hostname rather than a detected hostname. We're also setting the openshift_hostname=10.0.173.101 because there is a bug where the golang resolver can't resolve *.ec2.internal addresses. This is also documented as an issue against Origin. Once this bug is resolved, you won't have to set openshift_hostname.

Next up we have the etcd host group. We're simply re-using the master node for a single etcd node. In a production deployment, we'd have several:

# host group for etcd, should run on a node that is not schedulable
[etcd]
54.175.0.44

Finally, we have our worker nodes:

# host group for worker nodes, we list master node here so that
# openshift-sdn gets installed. We mark the master node as not
# schedulable.
[nodes]
54.175.0.44    openshift_hostname=10.0.173.101 openshift_schedulable=false
52.91.115.81   openshift_hostname=10.0.156.20  openshift_node_labels="{'router':'true','registry':'true'}"
54.204.208.138 openshift_hostname=10.0.251.101 openshift_node_labels="{'router':'true','registry':'true'}"

We include the master node in this group so that the openshift-sdn will get installed and run there. However, we do set the master node as openshift_schedulable=false because it is running etcd. The last two nodes are our worker nodes and we have also added the router=true and registry=true node labels to them so that the registry and the router will run on them.
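
Once the install has finished, you can confirm that those labels actually landed on the nodes (a quick sketch):

$ oc get nodes --show-labels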

Executing the Installer

Now that we have the installer code and the inventory file named myinventory in the same directory, let's see if we can ping our hosts and check their state:

$ ansible -i myinventory nodes -a '/usr/bin/rpm-ostree status'
54.175.0.44 | SUCCESS | rc=0 >>
State: idle
Deployments:
● fedora-atomic:fedora-atomic/25/x86_64/docker-host
       Version: 25.42 (2016-11-16 10:26:30)
        Commit: c91f4c671a6a1f6770a0f186398f256abf40b2a91562bb2880285df4f574cde4
        OSName: fedora-atomic

54.204.208.138 | SUCCESS | rc=0 >>
State: idle
Deployments:
● fedora-atomic:fedora-atomic/25/x86_64/docker-host
       Version: 25.42 (2016-11-16 10:26:30)
        Commit: c91f4c671a6a1f6770a0f186398f256abf40b2a91562bb2880285df4f574cde4
        OSName: fedora-atomic

52.91.115.81 | SUCCESS | rc=0 >>
State: idle
Deployments:
● fedora-atomic:fedora-atomic/25/x86_64/docker-host
       Version: 25.42 (2016-11-16 10:26:30)
        Commit: c91f4c671a6a1f6770a0f186398f256abf40b2a91562bb2880285df4f574cde4
        OSName: fedora-atomic

Looks like they are up and all at the same state. The next step is to unleash the installer. Before we do, we should note that Fedora has moved to python3 by default. While Atomic Host still has python2 installed for legacy package support, not all of the modules needed by the installer are available for python2 on Atomic Host. Thus, we'll forge ahead and use python3 as the interpreter for ansible by specifying -e 'ansible_python_interpreter=/usr/bin/python3' on the command line:

$ ansible-playbook -i myinventory playbooks/byo/config.yml -e 'ansible_python_interpreter=/usr/bin/python3'
Using /etc/ansible/ansible.cfg as config file
....
....
PLAY RECAP *********************************************************************
52.91.115.81               : ok=162  changed=49   unreachable=0    failed=0   
54.175.0.44                : ok=540  changed=150  unreachable=0    failed=0   
54.204.208.138             : ok=159  changed=49   unreachable=0    failed=0   
localhost                  : ok=15   changed=9    unreachable=0    failed=0

We snipped pretty much all of the output. You can download the log file in its entirety from here.
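
As an aside, if you'd rather not pass the interpreter on every ansible-playbook run, the same setting can live in the inventory itself. A sketch of the relevant fragment; ansible_python_interpreter is a standard Ansible variable, and placing it under [OSEv3:vars] is just one reasonable spot:

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
ansible_python_interpreter=/usr/bin/python3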

So now the installer has run, and our systems should be up and running. There is only one more thing we have to do before we can take this system for a spin.

We created two users, user and admin. Currently there is no way to have the installer associate one of these users with the cluster admin role in OpenShift (we opened a request for that), so we must run a command to associate the admin user we created with the cluster admin role for the cluster. The command is oadm policy add-cluster-role-to-user cluster-admin admin.

We'll go ahead and run that command now on the master node via ansible:

$ ansible -i myinventory masters -a '/usr/local/bin/oadm policy add-cluster-role-to-user cluster-admin admin'
54.175.0.44 | SUCCESS | rc=0 >>

And now we are ready to log in as either the admin or user users using oc login https://54.175.0.44:8443 from the command line or visiting the web frontend at https://54.175.0.44:8443.

NOTE To install the oc CLI tool follow these instructions.

To Be Continued

In this blog we brought up an OpenShift Origin cluster on three servers that were running Fedora 25 Atomic Host. We reviewed the inventory file in detail to explain exactly what options were used and why. In a future blog post we'll take the system for a spin, inspect some of the running system that was generated from the installer, and spin up an application that will run on and be hosted by the Origin cluster.

If you run into issues following these installation instructions, please report them in one of the following places:

Cheers!
Dusty

Kompose Up for OpenShift and Kubernetes

Cross posted with this Red Hat Developer Blog post

Introduction

Kompose is a tool to convert from higher level abstractions of application definitions into more detailed Kubernetes artifacts. These artifacts can then be used to bring up the application in a Kubernetes cluster. What higher level application abstraction should kompose use?

One of the most popular application definition formats for developers is the docker-compose.yml format, used with docker-compose to communicate with the docker daemon to bring up an application. Since this format has gained significant traction, we decided to make converting it to Kubernetes artifacts the initial focus of Kompose. So, where you would choose docker-compose to bring up the application in docker, you can use kompose to bring up the same application in Kubernetes, if that is your preferred platform.

How Did We Get Here?

At Red Hat, we had initially started on a project similar to Kompose, called Henge. We soon found Kompose and realized we had a lot of overlap in our goals so we decided to jump on board with the folks at Skippbox and Google who were already working on it.

TL;DR We have been working hard with the Kompose and Kubernetes communities. Kompose is now a part of the Kubernetes Incubator, and we have also added support in Kompose for getting up and running in your target environment with one command:

$ kompose up 

In this blog I'll run you through a simple application example and use kompose up to bring up the application on Kubernetes and OpenShift.

Getting an Environment

It is now easier than ever to get up and running with Kubernetes and OpenShift. If you want a hosted option, you can spin up clusters in many cloud environments including Google Container Engine and OpenShift Online (with the developer preview). If you want a local experience for trying out Kubernetes/OpenShift on your laptop, there is the RHEL based CDK (and the ADB for upstream components), oc cluster up, minikube, and the list goes on!

Any way you look at it, there are many options for trying out Kubernetes and OpenShift these days. For this blog I'll choose to run on OpenShift Online, but the steps should work on any OpenShift or Kubernetes environment.

Once I had logged in to the OpenShift console at api.preview.openshift.com I was able to grab a token by visiting https://api.preview.openshift.com/oauth/token/request and clicking Request another token. It will then show you the oc command you can run to log your local machine into OpenShift Online.

I'll log in below and create a new project for this example blog:

$ oc login --token=xxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --server=https://api.preview.openshift.com
Logged into "https://api.preview.openshift.com:443" as "dustymabe" using the token provided.

You don't have any projects. You can try to create a new project, by running

    $ oc new-project <projectname>

$ oc new-project blogpost
Now using project "blogpost" on server "https://api.preview.openshift.com:443".

You can add applications to this project with the 'new-app' command. For example, try:

    $ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-hello-world.git

to build a new hello-world application in Ruby.
$

Example Application

Now that I have an environment to run my app in, I need to give it an app to run! I took the example mlbparks application that we have been using for OpenShift for some time and converted the template to a simpler definition of the application using the docker-compose.yml format:

$ cat docker-compose.yml
version: "2"
services:
  mongodb:
    image: centos/mongodb-26-centos7
    ports:
      - '27017'
    volumes:
      - /var/lib/mongodb/data
    environment:
      MONGODB_USER: user
      MONGODB_PASSWORD: mypass
      MONGODB_DATABASE: mydb
      MONGODB_ADMIN_PASSWORD: myrootpass
  mlbparks:
    image: dustymabe/mlbparks
    ports:
      - '8080'
    environment:
      MONGODB_USER: user
      MONGODB_PASSWORD: mypass
      MONGODB_DATABASE: mydb
      MONGODB_ADMIN_PASSWORD: myrootpass

Basically we have the mongodb service and then the mlbparks service, which is backed by the dustymabe/mlbparks image. I simply generated this image from the openshift3mlbparks source code using s2i with the following command:

$ s2i build https://github.com/gshipley/openshift3mlbparks openshift/wildfly-100-centos7 dustymabe/mlbparks 

Now that we have our compose yaml file we can use kompose to bring it up. I am using kompose version v0.1.2 here:

$ kompose --version
kompose version 0.1.2 (92ea047)
$ kompose --provider openshift up
We are going to create OpenShift DeploymentConfigs, Services and PersistentVolumeClaims for your Dockerized application. 
If you need different kind of resources, use the 'kompose convert' and 'oc create -f' commands instead. 

INFO[0000] Successfully created Service: mlbparks       
INFO[0000] Successfully created Service: mongodb        
INFO[0000] Successfully created DeploymentConfig: mlbparks 
INFO[0000] Successfully created ImageStream: mlbparks   
INFO[0000] Successfully created DeploymentConfig: mongodb 
INFO[0000] Successfully created ImageStream: mongodb    
INFO[0000] Successfully created PersistentVolumeClaim: mongodb-claim0 

Your application has been deployed to OpenShift. You can run 'oc get dc,svc,is,pvc' for details.

OK, what happened here? We created an mlbparks Service, DeploymentConfig, and ImageStream, as well as a mongodb Service, DeploymentConfig, and ImageStream. We also created a PersistentVolumeClaim named mongodb-claim0 for /var/lib/mongodb/data.

Note: If you don't have Persistent Volumes the application will never come up because the claim will never get satisfied. If you want to deploy somewhere without Persistent Volumes then add --emptyvols to your command like kompose --provider openshift up --emptyvols.
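
As the kompose output above hints, you can also generate the artifacts without creating them, inspect or edit them, and then hand them to oc yourself. A sketch; the exact file names kompose writes out will vary:

$ kompose --provider openshift convert
$ oc create -f <generated-file>.yaml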

So let's see what is going on in OpenShift by querying from the CLI:

$ oc get dc,svc,is,pvc
NAME             REVISION                               REPLICAS       TRIGGERED BY
mlbparks         1                                      1              config,image(mlbparks:latest)
mongodb          1                                      1              config,image(mongodb:latest)
NAME             CLUSTER-IP                             EXTERNAL-IP    PORT(S)     AGE
mlbparks         172.30.67.72                           <none>         8080/TCP    4m
mongodb          172.30.111.51                          <none>         27017/TCP   4m
NAME             DOCKER REPO                            TAGS           UPDATED
mlbparks         172.30.47.227:5000/blogpost/mlbparks   latest         4 minutes ago
mongodb          172.30.47.227:5000/blogpost/mongodb    latest         4 minutes ago
NAME             STATUS                                 VOLUME         CAPACITY   ACCESSMODES   AGE
mongodb-claim0   Bound                                  pv-aws-adbb5   100Mi      RWO           4m

and the web console looks like:

image

One final thing we have to do is set it up so that we can connect to the service (i.e. the service is exposed to the outside world). On OpenShift, we need to expose a route. This will be done for us automatically in the future (follow along at #140), but for now the following command will suffice:

$ oc expose svc/mlbparks
route "mlbparks" exposed
$ oc get route mlbparks 
NAME       HOST/PORT                                          PATH      SERVICE         TERMINATION   LABELS
mlbparks   mlbparks-blogpost.44fs.preview.openshiftapps.com             mlbparks:8080                 service=mlbparks

For me this means I can now access the mlbparks application by pointing my web browser to mlbparks-blogpost.44fs.preview.openshiftapps.com.

Let's try it out:

image

Success!
Dusty

Atomic Host Red Hat Summit Lab

Red Hat Summit was a blast this year. I participated in several Hands On Labs to help the community learn about the new tools that are available in the ecosystem. For one of the labs I wrote up a section on Atomic Host, but more specifically on rpm-ostree. I have copied a portion of the lab here as well as added example text to the code blocks.

Lab Intro

Atomic Host is a minimalistic operating system that is designed to contain a very small subset of tools that are needed for running container-based applications. A few of its features are shown below:

  • It is Lightweight
    • a small base means less potential issues.
  • Provides Atomic Upgrades and Rollbacks
    • upgrades/rollbacks are staged and take effect on reboot
  • Static and Dynamic
    • software/binaries in /usr and other similar directories are read-only
      • this guarantees no changes have been made to the software
    • configuration and temporary directories are read/write
      • you can still make important configuration changes and have them propagate forward

We will explore some of these features as we illustrate a bit of the lifecycle of managing a RHEL Atomic Host.

Hello rpm-ostree World

In an rpm-ostree world you can't install new software on the system or even touch most of the software that exists. Go ahead and try:

-bash-4.2# echo 'Crazy Talk' > /usr/bin/docker
-bash: /usr/bin/docker: Read-only file system

What we can do is configure the existing software on the system using the provided mechanisms for configuration. We can illustrate this by writing to motd and then logging in to see the message:

-bash-4.2# echo 'Lab 1 is fun' > /etc/motd
-bash-4.2# ssh root@localhost
Last login: Fri Jun  5 02:26:59 2015 from localhost
Lab 1 is fun
-bash-4.2# exit
logout
Connection to localhost closed.

Even though you can't install new software, your Atomic Host operating system isn't just a black box. The rpm command is there and we can run queries just the same as if we were on a traditional system. This is quite useful because we can use the tools we are familiar with to investigate the system. Try out a few rpm queries on the Atomic Host:

-bash-4.2# rpm -q kernel
kernel-3.10.0-229.4.2.el7.x86_64
-bash-4.2# rpm -qf /usr/bin/vi
vim-minimal-7.4.160-1.el7.x86_64
-bash-4.2# rpm -q --changelog util-linux | wc -l
1832

Another nice thing about atomic, or rather the underlying ostree software, is that it is like git for your OS. At any point in time you can see what has changed between what was delivered in the tree and what is on the system. That means that for the few directories that are read/write, you can easily view what changes have been made to them.

Let's take a look at the existing differences between what we have and what was delivered in the tree:

-bash-4.2# ostree admin config-diff | head -n 5
M    adjtime
M    motd
M    group
M    hosts
M    gshadow

You can see, right in the middle, the motd file we just modified.

As a final step before we do an upgrade let's run a container and verify all is working:

-bash-4.2# docker run -d -p 80:80 --name=test repo.atomic.lab:5000/apache
e18a5f7d54c8dbe0d352e2c2854af16d27f166d11b95bc37a3b4267cfcd39cd6
-bash-4.2# curl http://localhost
Apache
-bash-4.2# docker rm -f test
test

Performing an Upgrade

OK, now that we have taken a little tour, let's actually perform an upgrade, moving from one version of the tree to a newer version. First, let's check the current status of the system:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         VERSION   ID             OSNAME               REFSPEC
* 2015-05-30 04:10:40               d306dcf255     rhel-atomic-host     lab:labtree
  2015-05-07 19:00:48     7.1.2     203dd666d3     rhel-atomic-host     rhel-atomic-host-ostree:rhel-...

Note that the * indicates which tree is currently booted. The ID is a short commit ID for that commit in the tree. The REFSPEC for the latest tree specifies the remote we are using (lab) and the ref that we are tracking (labtree). Quite a lot of information!

A fun fact is that the atomic host command is just a frontend for the rpm-ostree utility; it exposes the subset of rpm-ostree functionality that is suitable for most daily use. Let's use rpm-ostree now to check the status:

-bash-4.2# rpm-ostree status
  TIMESTAMP (UTC)         VERSION   ID             OSNAME               REFSPEC
* 2015-05-30 04:10:40               d306dcf255     rhel-atomic-host     lab:labtree
  2015-05-07 19:00:48     7.1.2     203dd666d3     rhel-atomic-host     rhel-atomic-host-ostree:rhel-...

The next step is to actually move to a new tree. For the purposes of this lab, and to illustrate Atomic's usefulness, we are going to upgrade to a tree that has some bad software in it. If we were to run an atomic host upgrade command it would take us to the newest commit in the repo. In this case we want to go to an intermediate (bad) commit, so we are going to run a special command to get us there:

-bash-4.2# rpm-ostree rebase lab:badtree

26 metadata, 37 content objects fetched; 101802 KiB transferred in 7 seconds
Copying /etc changes: 26 modified, 8 removed, 70 added
Transaction complete; bootconfig swap: yes deployment count change: 0
Freed objects: 180.1 MB
Deleting ref 'lab:labtree'
Changed:
  etcd-2.0.11-2.el7.x86_64
  kubernetes-0.17.1-1.el7.x86_64
Removed:
  setools-console-3.3.7-46.el7.x86_64

What we did there was rebase to another ref (badtree), but we kept with the same remote (lab).
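
For reference, the general pattern is rpm-ostree rebase <remote>:<ref> (a sketch with placeholder names):

-bash-4.2# rpm-ostree rebase <remote>:<ref>

If you leave off the <remote>: part, the current remote is kept and only the ref changes.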

So we have rebased to a new tree but we aren't yet using that tree. During an upgrade the new environment is staged for the next boot but is not yet in use. This allows the upgrade to be atomic. Before we reboot we can check the status. You will see the new tree as well as the old tree listed. The * should still be next to the old tree, since that is the tree that is currently booted and running:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         ID             OSNAME               REFSPEC
  2015-05-30 04:39:22     146b72d9d7     rhel-atomic-host     lab:badtree
* 2015-05-30 04:10:40     d306dcf255     rhel-atomic-host     lab:labtree

After checking the status reboot the machine in order to boot into the new tree.

Rolling Back

So why would you ever need to roll back? It's a perfect world and nothing ever breaks, right? No! Sometimes problems arise and it is always nice to have an undo button to fix them. In the case of Atomic, that is atomic host rollback. Do we need to use it now? Let's see if everything is OK on the system:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         ID             OSNAME               REFSPEC
* 2015-05-30 04:39:22     146b72d9d7     rhel-atomic-host     lab:badtree
  2015-05-30 04:10:40     d306dcf255     rhel-atomic-host     lab:labtree
-bash-4.2#
-bash-4.2# docker run -d -p 80:80 --name=test repo.atomic.lab:5000/apache
ERROR
-bash-4.2# curl http://localhost
curl: (7) Failed connect to localhost:80; Connection refused
-bash-4.2# systemctl --failed | head -n 3
UNIT           LOAD   ACTIVE SUB    DESCRIPTION
docker.service loaded failed failed Docker Application Container Engine

Did anything fail? Of course it did. So let's press the eject button and get ourselves back to safety:

-bash-4.2# atomic host rollback
Moving 'd306dcf255b370e5702206d064f2ca2e24d1ebf648924d52a2e00229d5b08365.0' to be first deployment
Transaction complete; bootconfig swap: yes deployment count change: 0
Changed:
  etcd-2.0.9-2.el7.x86_64
  kubernetes-0.15.0-0.4.git0ea87e4.el7.x86_64
Added:
  setools-console-3.3.7-46.el7.x86_64
Sucessfully reset deployment order; run "systemctl reboot" to start a reboot
-bash-4.2# reboot

Now, let's check to see if we are back to a good state:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         ID             OSNAME               REFSPEC
* 2015-05-30 04:10:40     d306dcf255     rhel-atomic-host     lab:labtree
  2015-05-30 04:39:22     146b72d9d7     rhel-atomic-host     lab:badtree
-bash-4.2# docker run -d -p 80:80 --name=test repo.atomic.lab:5000/apache
a28a5f80bc2d1da9d405199f88951a62a7c4c125484d30fbb6eb2c4c032ef7f3
-bash-4.2# curl http://localhost
Apache
-bash-4.2# docker rm -f test
test

All dandy!

Final Upgrade

Since the badtree was released, the developers have fixed the bug and put out a new, fixed tree. Now we can upgrade to that newest tree. As part of this upgrade let's explore some of the rpm-ostree features.

First, create a file in /etc/ and show that ostree knows that it has been created and differs from the tree that was delivered:

-bash-4.2# echo "Before Upgrade d306dcf255" > /etc/before-upgrade.txt
-bash-4.2# ostree admin config-diff | grep before-upgrade
A    before-upgrade.txt

Now we can do the upgrade:

-bash-4.2# atomic host upgrade --reboot
Updating from: lab:labtree

48 metadata, 54 content objects fetched; 109056 KiB transferred in 9 seconds
Copying /etc changes: 26 modified, 8 removed, 74 added
Transaction complete; bootconfig swap: yes deployment count change: 0

After the upgrade let's run a few commands to see what the actual difference (in terms of rpms) is between the two trees:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         ID             OSNAME               REFSPEC
* 2015-05-30 05:12:55     ec89f90273     rhel-atomic-host     lab:labtree
  2015-05-30 04:10:40     d306dcf255     rhel-atomic-host     lab:labtree
-bash-4.2# rpm-ostree db diff -F diff d306dcf255 ec89f90273
ostree diff commit old: d306dcf255 (d306dcf255b370e5702206d064f2ca2e24d1ebf648924d52a2e00229d5b08365)
ostree diff commit new: ec89f90273 (ec89f902734e70b4e8fbe5000e87dd944a3c95ffdb04ef92f364e5aaab049813)
!atomic-0-0.22.git5b2fa8d.el7.x86_64
=atomic-0-0.26.gitcc9aed4.el7.x86_64
!docker-1.6.0-11.el7.x86_64
=docker-1.6.0-15.el7.x86_64
!docker-python-1.0.0-35.el7.x86_64
=docker-python-1.0.0-39.el7.x86_64
!docker-selinux-1.6.0-11.el7.x86_64
=docker-selinux-1.6.0-15.el7.x86_64
!docker-storage-setup-0.0.4-2.el7.noarch
=docker-storage-setup-0.5-2.el7.x86_64
!etcd-2.0.9-2.el7.x86_64
=etcd-2.0.11-2.el7.x86_64
!kubernetes-0.15.0-0.4.git0ea87e4.el7.x86_64
=kubernetes-0.17.1-4.el7.x86_64
+kubernetes-master-0.17.1-4.el7.x86_64
+kubernetes-node-0.17.1-4.el7.x86_64
!python-websocket-client-0.14.1-78.el7.noarch
=python-websocket-client-0.14.1-82.el7.noarch
-setools-console-3.3.7-46.el7.x86_64

This shows the rpms that were added (+), removed (-), and changed (! marks the old version, = the new version) between the two trees.

Now, remember that file we created before the upgrade? Is it still there? Let's check, and also create a new file that represents the after-upgrade state:

-bash-4.2# cat /etc/before-upgrade.txt
Before Upgrade d306dcf255
-bash-4.2# echo "After Upgrade ec89f90273" > /etc/after-upgrade.txt
-bash-4.2# cat /etc/after-upgrade.txt
After Upgrade ec89f90273

Now, which of the files do you think will exist after a rollback? Only you can find out:

-bash-4.2# rpm-ostree rollback --reboot
Moving 'd306dcf255b370e5702206d064f2ca2e24d1ebf648924d52a2e00229d5b08365.0' to be first deployment
Transaction complete; bootconfig swap: yes deployment count change: 0

After rollback:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         ID             OSNAME               REFSPEC
* 2015-05-30 04:10:40     d306dcf255     rhel-atomic-host     lab:labtree
  2015-05-30 05:12:55     ec89f90273     rhel-atomic-host     lab:labtree
-bash-4.2# ls -l /etc/*.txt
-rw-r--r--. 1 root root 26 Jun  5 03:35 /etc/before-upgrade.txt

Fin!

Now you know quite a bit about upgrading, rolling back, and querying information from your Atomic Host. Have fun exploring!

Dusty

Fedora 22 Updates-Testing Atomic Tree

It has generally been difficult to test new updates for the rpm-ostree or ostree packages for Atomic Host. This is because in the past you had to build your own tree in order to test them. Now, however, Fedora has started building a tree based off the updates-testing yum repositories. This means that you can easily test updates by simply running Fedora Atomic Host and rebasing to the fedora-atomic/f22/x86_64/testing/docker-host ref:

# rpm-ostree rebase fedora-atomic:fedora-atomic/f22/x86_64/testing/docker-host
# reboot

After reboot you are now (hopefully) booted into the tree with updates baked in. You can do your tests and report your results back upstream. If you ever want to go back to following the stable branch then you can do that by running:

# rpm-ostree rebase fedora-atomic:fedora-atomic/f22/x86_64/docker-host
# reboot

Testing updates this way can apply to any of the packages within Atomic Host. Since Atomic Host has a small footprint, the package you want to test might not be included; but if it is, this is a great way to test things out.
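
A quick way to check whether a given package is even part of the tree is a plain rpm query, which works fine on Atomic Host even though you can't install packages with rpm. A sketch; the package names are just examples:

# rpm -q ostree rpm-ostree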

Dusty

F22 Cloud/Atomic Test Day May 7th!

Hey everyone! Fedora 22 is on the cusp of being released and the Fedora Cloud Working Group has elected to organize a test day for May 7th in order to work out some bugs before shipping it off to the rest of the world.

With a new release come some new features and tools. We are working on Vagrant images as well as a testing tool called Tunir. Joe Brockmeier has a nice writeup about Vagrant, and Kushal Das maintains some docs on Tunir.

On the test day we will be testing both the Cloud Base Image and the Fedora Atomic Host cloud image. The landing pages where we are organizing instructions and information are here (for Cloud Base) and here (for Atomic). If you're available to test on the test day (or any other time) please go there and fill out your name and test results.

Happy Testing!

Dusty

Crisis Averted.. I'm using Atomic Host

This blog has been running on Docker on Fedora 21 Atomic Host since early January. Occasionally I log in, run rpm-ostree upgrade, and then reboot (usually after I inspect a few things). Today I happened to do just that, and what did I come up with? A bunch of 404s. Digging through some of the logs for the systemd unit file I use to start my wordpress container I found this:

systemd[1]: wordpress-server.service: main process exited, code=exited, status=1/FAILURE
docker[2321]: time="2015-01-31T19:09:24-05:00" level="fatal" msg="Error response from daemon: Cannot start container 51a2b8c45bbee564a61bcbffaee5bc78357de97cdd38918418026c26ae40fb09: write /sys/fs/cgroup/memory/system.slice/docker-51a2b8c45bbee564a61bcbffaee5bc78357de97cdd38918418026c26ae40fb09.scope/memory.memsw.limit_in_bytes: invalid argument"

Hmmm.. So that means I have updated to the latest atomic and docker doesn't work?? What am I to do?

Well, the nice thing about atomic host is that in moments like these you can easily go back to the state you were before you upgraded. A quick rpm-ostree rollback and my blog was back up and running in minutes.

Whew! Crisis averted.. But now what? Well the nice thing about atomic host is that I can easily go to another (non-production) system and test out exactly the same scenario as the upgrade that I performed in production. Some quick googling led me to this github issue which looks like it has to do with setting memory limits when you start a container using later versions of systemd.

Let's test out that theory by recreating this failure.

Recreating the Failure

To recreate I decided to start with the Fedora 21 atomic cloud image that was released in December. Here is what I have:

-bash-4.3# ostree admin status
* fedora-atomic ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm -q docker-io systemd
docker-io-1.3.2-2.fc21.x86_64
systemd-216-12.fc21.x86_64
-bash-4.3#
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
Unable to find image 'busybox' locally
Pulling repository busybox
4986bf8c1536: Download complete
511136ea3c5a: Download complete
df7546f9f060: Download complete
ea13149945cb: Download complete
Status: Downloaded newer image for busybox:latest
I'm Alive

So the system is up and running and able to run a container with the --memory option set. Now let's upgrade to the same commit that I did when I saw the failure earlier, and reboot:

-bash-4.3# ostree pull fedora-atomic 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb

778 metadata, 4374 content objects fetched; 174535 KiB transferred in 156 seconds
-bash-4.3#
-bash-4.3# echo 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb > /ostree/repo/refs/remotes/fedora-atomic/fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# ostree admin deploy fedora-atomic:fedora-atomic/f21/x86_64/docker-host
Copying /etc changes: 26 modified, 4 removed, 36 added
Transaction complete; bootconfig swap: yes deployment count change: 1
-bash-4.3#
-bash-4.3# ostree admin status
  fedora-atomic 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* fedora-atomic ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
  2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3# reboot

Note that I had to manually update the ref to point at the commit I downloaded in order to get this to work. I'm not sure why that is, but it wouldn't work otherwise.

OK, now I have a system on the same tree I was on when I saw the failure. Let's check whether it still happens:

-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm -q docker-io systemd
docker-io-1.4.1-5.fc21.x86_64
systemd-216-17.fc21.x86_64
-bash-4.3#
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
FATA[0003] Error response from daemon: Cannot start container d79629bfddc7833497b612e2b6d4cc2542ce9a8c2253d39ace4434bbd385185b: write /sys/fs/cgroup/memory/system.slice/docker-d79629bfddc7833497b612e2b6d4cc2542ce9a8c2253d39ace4434bbd385185b.scope/memory.memsw.limit_in_bytes: invalid argument

Yep! It happens consistently. That's good, because it means anyone can use these steps as a recreator to verify the problem on their own. For completeness I'll go ahead and roll back the system to show that the problem goes away once we're back in the old state:

-bash-4.3# rpm-ostree rollback
Moving 'ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0' to be first deployment
Transaction complete; bootconfig swap: yes deployment count change: 0
Changed:
  NetworkManager-1:0.9.10.0-13.git20140704.fc21.x86_64
  NetworkManager-glib-1:0.9.10.0-13.git20140704.fc21.x86_64
  ...
  ...
Removed:
  flannel-0.2.0-1.fc21.x86_64
Sucessfully reset deployment order; run "systemctl reboot" to start a reboot
-bash-4.3# reboot

And the final test:

-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
I'm Alive

Bliss! And you can thank Atomic Host for that.

Dusty

F21 Atomic Test Day && Test steps for Atomic Host

Test Day on Thursday 11/20

The F21 test day for Atomic is this Thursday, November 20th. If you can participate, please drop into #atomic on freenode; it would be great to have more people involved in helping build and test this new technology.

In anticipation of the test day I have put together some test notes for other people to follow, in the hope that they will help smooth things along.

Booting with cloud-init

The first step is to start an Atomic Host using any method/cloud provider you like. I decided to use OpenStack, since I have Juno running on F21 here in my apartment. I used this user-data for the Atomic Host:

#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
runcmd:
  - [ sh, -c, 'echo -e "ROOT_SIZE=4G\nDATA_SIZE=10G" > /etc/sysconfig/docker-storage-setup']
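
If you are also on OpenStack, booting the image with that user-data looks roughly like this (the image name, flavor, key name, and user-data filename below are placeholders for whatever exists in your environment):

$ nova boot --flavor m1.small \
            --image fedora-21-atomic \
            --key-name mykey \
            --user-data atomic-user-data.txt \
            atomic-testday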

Note that the build of Atomic I used for this testing resides here.

Verifying docker-storage-setup

docker-storage-setup is a service that can configure storage for Docker in different ways at instance bring-up. Notice in the user-data above that I set config variables for docker-storage-setup: they mean that I want to resize my atomicos/root LV to 4G, and that I want to create an atomicos/docker-data LV that is 10G in size.
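
In other words, once the instance is up, the runcmd from the user-data should have left behind a config file that looks like this:

# cat /etc/sysconfig/docker-storage-setup
ROOT_SIZE=4G
DATA_SIZE=10G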

To verify the storage was set up successfully, log in (as the fedora user) and become root (using sudo su -). Now you can check whether docker-storage-setup worked by looking at its logs as well as at the output from lsblk:

# journalctl -o cat --unit docker-storage-setup.service
CHANGED: partition=2 start=411648 old: size=12171264 end=12582912 new: size=41531232,end=41942880
Physical volume "/dev/vda2" changed
1 physical volume(s) resized / 0 physical volume(s) not resized
Size of logical volume atomicos/root changed from 1.95 GiB (500 extents) to 4.00 GiB (1024 extents).
Logical volume root successfully resized
Rounding up size to full physical extent 24.00 MiB
Logical volume "docker-meta" created
Logical volume "docker-data" created
#
# lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                       252:0    0   20G  0 disk
├─vda1                    252:1    0  200M  0 part /boot
└─vda2                    252:2    0 19.8G  0 part
  ├─atomicos-root         253:0    0    4G  0 lvm  /sysroot
  ├─atomicos-docker--meta 253:1    0   24M  0 lvm
  └─atomicos-docker--data 253:2    0   10G  0 lvm

Verifying Docker Lifecycle

To verify Docker runs fine on the Atomic Host, we will perform a simple run of the busybox Docker image. This will contact the Docker Hub, pull down the image, and run /bin/true:

# docker run -it --rm busybox true && echo "PASS" || echo "FAIL"
Unable to find image 'busybox' locally
Pulling repository busybox
e72ac664f4f0: Download complete
511136ea3c5a: Download complete
df7546f9f060: Download complete
e433a6c5b276: Download complete
PASS

After the Docker daemon has started, the LVs that were created by docker-storage-setup are used by device mapper, as shown in the lsblk output below:

# lsblk
NAME                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                               252:0    0   20G  0 disk
├─vda1                            252:1    0  200M  0 part /boot
└─vda2                            252:2    0 19.8G  0 part
  ├─atomicos-root                 253:0    0    4G  0 lvm  /sysroot
  ├─atomicos-docker--meta         253:1    0   24M  0 lvm
  │ └─docker-253:0-6298462-pool   253:3    0   10G  0 dm
  │   └─docker-253:0-6298462-base 253:4    0   10G  0 dm
  └─atomicos-docker--data         253:2    0   10G  0 lvm
    └─docker-253:0-6298462-pool   253:3    0   10G  0 dm
      └─docker-253:0-6298462-base 253:4    0   10G  0 dm
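
As an aside, if you prefer LVM's own view of things, lvs shows the same picture (output below is approximate and trimmed to a few fields; the sizes match the lsblk output above):

# lvs atomicos -o lv_name,vg_name,lv_size
  LV          VG       LSize
  docker-data atomicos  10.00g
  docker-meta atomicos  24.00m
  root        atomicos   4.00g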

Atomic Host: Upgrade

Now on to an Atomic upgrade. First let's check what commit we are currently on, and store it in /etc/file1 so we have it saved for later:

# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME                 REFSPEC
* 2014-11-12 22:28:04     1877f1fa64     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# ostree admin status
* fedora-atomic-host 1877f1fa64be8bec8adcd43de6bd4b5c39849ec7842c07a6d4c2c2033651cd84.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# cat /ostree/repo/refs/heads/ostree/0/1/0
1877f1fa64be8bec8adcd43de6bd4b5c39849ec7842c07a6d4c2c2033651cd84
#
# cat /ostree/repo/refs/heads/ostree/0/1/0 > /etc/file1

Now run an upgrade to the latest atomic compose:

# rpm-ostree upgrade
Updating from: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
14 metadata, 19 content objects fetched; 33027 KiB transferred in 16 seconds
Copying /etc changes: 26 modified, 4 removed, 39 added
Transaction complete; bootconfig swap: yes deployment count change: 1
Updates prepared for next boot; run "systemctl reboot" to start a reboot

And do a bit of poking around right before we reboot:

# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME                 REFSPEC
  2014-11-13 10:52:06     18e02c4166     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* 2014-11-12 22:28:04     1877f1fa64     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# ostree admin status
  fedora-atomic-host 18e02c41666ef5f426bc43d01c4ce1b7ffc0611e993876cf332600e2ad8aa7c0.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* fedora-atomic-host 1877f1fa64be8bec8adcd43de6bd4b5c39849ec7842c07a6d4c2c2033651cd84.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# reboot

Note that the * in the above output indicates which tree is currently booted.

After the reboot the new tree should be booted. Let's check things out and create /etc/file2 with our new commit hash in it:

# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME                 REFSPEC
* 2014-11-13 10:52:06     18e02c4166     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2014-11-12 22:28:04     1877f1fa64     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# ostree admin status
* fedora-atomic-host 18e02c41666ef5f426bc43d01c4ce1b7ffc0611e993876cf332600e2ad8aa7c0.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  fedora-atomic-host 1877f1fa64be8bec8adcd43de6bd4b5c39849ec7842c07a6d4c2c2033651cd84.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# cat /ostree/repo/refs/heads/ostree/1/1/0
18e02c41666ef5f426bc43d01c4ce1b7ffc0611e993876cf332600e2ad8aa7c0
#
# cat /ostree/repo/refs/heads/ostree/1/1/0 > /etc/file2

As one final item, let's run a Docker container to make sure things still work there:

# docker run -it --rm busybox true && echo "PASS" || echo "FAIL"
PASS

Atomic Host: Rollback

Atomic Host provides the ability to revert to the previous working tree if things go awry with the new tree. Let's revert our upgrade now and make sure things still work:

# rpm-ostree rollback
Moving '1877f1fa64be8bec8adcd43de6bd4b5c39849ec7842c07a6d4c2c2033651cd84.0' to be first deployment
Transaction complete; bootconfig swap: yes deployment count change: 0
Sucessfully reset deployment order; run "systemctl reboot" to start a reboot
#
# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME                 REFSPEC
  2014-11-12 22:28:04     1877f1fa64     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* 2014-11-13 10:52:06     18e02c4166     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# reboot

After reboot:

# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME                 REFSPEC
* 2014-11-12 22:28:04     1877f1fa64     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2014-11-13 10:52:06     18e02c4166     fedora-atomic-host     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
#
# cat /etc/file1
1877f1fa64be8bec8adcd43de6bd4b5c39849ec7842c07a6d4c2c2033651cd84
# cat /etc/file2
cat: /etc/file2: No such file or directory

Notice that /etc/file2 was created only after we had booted into the upgraded tree, so it lives in that deployment's /etc and does not persist through the rollback, while /etc/file1, which was created before the upgrade, is still there.

And the final item on the list is to make sure Docker still works:

# docker run -it --rm busybox true && echo "PASS" || echo "FAIL"
PASS

Anddd boom... you have just put Atomic Host through its paces.