Introduction

As Kubernetes and Red Hat's distribution of it, OpenShift, gain popularity, we at Oteemo are often tasked with solving the challenges of integrating third-party tools with the platform. One of our most interesting and challenging assignments was integrating VMware's NSX-T networking software with Red Hat's OpenShift platform. Our hope is that you walk away from this blog with the prerequisites you can implement to successfully install OpenShift with NSX-T integration.

NSX-T Overview

NSX is a software-defined network (SDN) provided by VMware. NSX-T is the NSX offering that supports multiple virtualization and container platforms (e.g. KVM, Docker, OpenShift). This SDN can be integrated with Red Hat's OpenShift Container Platform (OCP); however, the integration between the two platforms isn't as simple as filling in a few parameters in a hosts file. We therefore highly recommend reading the official NSX-T/OpenShift integration documentation from VMware and Red Hat respectively, otherwise the following blog may read like a foreign language.[1][2]

As one can see from the official documentation, the integration process involves many manual steps. Steps such as tagging the second vNIC's logical port with the cluster name and VM name are prone to human error, and troubleshooting that error isn't trivial; speaking from experience, it takes meticulously walking back through every installation step to find the root cause. These manual steps are tedious and often frustrating, because something will go awry, and watching an OpenShift installation fail at the 40-minute mark is demoralizing.

Therefore, this blog will focus on automating the manual steps and building confidence that the OpenShift installation with NSX-T will run seamlessly and successfully. In fact, we at Oteemo were able to spin up OpenShift clusters in roughly 35 minutes, repeatedly and successfully. We hope to help you achieve the same.

NSX-T Automation

There are a few prerequisites that have to be met on the NSX-T side, and they determine whether the OpenShift installation succeeds.

These are the NSX-T prerequisites:[2]

  • A Tier 0 router
  • An Overlay Transport Zone
  • An IP Block for Pod Networking
  • An IP Pool for SNAT (i.e. pod external egress)
  • Tagging all of the logical ports on the second vNIC

This blog will focus on automating prerequisites 3 and 5. It is worth noting, however, that the approach used to automate prerequisite 3 applies equally to the other prerequisites (1, 2, and 4).

By implementing this automation, you and your team will spend less time troubleshooting and more time swiftly standing up OpenShift clusters with NSX-T integration.

Automating IP Block for Pod Networking

We automated the creation of the IP block, also called a Pod CIDR block (for clarity, we will refer to it as the IP block), using Ansible. It can also be automated in Python, specifically with the requests module, but that approach carries more overhead, so we will stick with Ansible to create the IP block.

How it Works

In order to create the IP block with Ansible, we work with five files (the last one is generated at runtime):
1. create_ipblock.yml – The Ansible playbook that calls the create_ipblock role, whose tasks live in ../roles/create_ipblock/tasks

# create_ipblock.yml
---
- name: "Automate IP Block Creation"
  hosts: localhost
  connection: local
  gather_facts: no
  roles:
    - create_ipblock

2. ../roles/create_ipblock/tasks/main.yml – The main task file that [1] calls. It gathers existing IP block information, checks whether the IP block already exists, renders a JSON file [5] from the Jinja2 template [4], and creates the IP block.

---
- name: Get IP Block Information
  uri:
    url: "https://{{ hostname }}/api/v1/pools/ip-block"
    method: GET
    return_content: yes
    client_cert: "{{ cert_path }}"
    client_key: "{{ key_path }}"
    force_basic_auth: yes
    validate_certs: no
  register: nsx_facts

- name: Don't create if IP Block already exists
  set_fact:
    create_ipblock: false
  when: ip_block_pods_name == item.display_name
  with_items: "{{ nsx_facts.json.results }}"

- debug:
    var: create_ipblock

- name: Create the IP Block when it does not already exist
  block:
    - name: Create json from template
      template:
        src: ip_block.json.j2
        dest: /tmp/ip_block.json

    - name: Create IP Block
      uri:
        url: "https://{{ hostname }}/api/v1/pools/ip-blocks"
        method: POST
        return_content: yes
        client_cert: "{{ cert_path }}"
        client_key: "{{ key_path }}"
        force_basic_auth: yes
        validate_certs: no
        status_code: 201
        body: "{{ lookup('file', '/tmp/ip_block.json') }}"
        body_format: json
      register: nsx_block

    - name: Creation results
      debug:
        var: nsx_block
  when: create_ipblock | default(true) | bool
3. ../answerfile.yml – The variables file whose values populate the Jinja2 template below.
4. ip_block.json.j2 – The Jinja2 template used to render the JSON body for the POST request.
POST https://<nsx-mgr>/api/v1/pools/ip-blocks [3]
{
	"display_name": "{{ ip_block_pods_name }}",
	"description": "{{ ip_block_pods_desc }}",
	"tags": [
		{
			"scope": "{{ ip_block_pods_cluster_scope }}",
			"tag": "{{ ip_block_pods_cluster_name }}"
		}
	],
	"cidr": "{{ ip_block_pods_cidr }}"
}

5. ip_block.json – The JSON body rendered from [4]; an example rendering is shown below.
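
Rendered with illustrative values (placeholders, not taken from our engagement), the resulting /tmp/ip_block.json would look roughly like this:

{
	"display_name": "ocp-pod-block",
	"description": "IP block for OpenShift pod networking",
	"tags": [
		{
			"scope": "ncp/cluster",
			"tag": "dev"
		}
	],
	"cidr": "172.16.0.0/16"
}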

Make a Unique IP Block

In order to make a unique IP block under the IPAM setting in NSX-T, one must make changes to the answerfile.yml that contains the variables. The variables that need to be changed include:

  • ip_block_pods_name – The name of the IP block given under IPAM setting in NSX-T.
  • ip_block_pods_desc – A description to give to the CIDR block.
  • ip_block_pods_cluster_scope – This will always be ‘ncp/cluster’.
  • ip_block_pods_cluster_name – The name of the OpenShift cluster that integrates with NSX-T.
  • ip_block_pods_cidr – The CIDR block from which IPs are provisioned for the OpenShift pods.

After making changes to one’s variables file, creating the IP block is as easy as running the create_ipblock.yml playbook (e.g. ansible-playbook create_ipblock.yml).
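
For reference, a minimal answerfile.yml might look like the following. Every value here is an illustrative placeholder, and we assume the playbook loads the file (for example via vars_files or -e @answerfile.yml):

# answerfile.yml (illustrative values only -- adjust for your environment)
hostname: nsx-mgr.example.com            # NSX-T manager address used by the uri tasks
cert_path: /etc/nsx/nsx.crt              # client certificate for API authentication
key_path: /etc/nsx/nsx.key
ip_block_pods_name: ocp-pod-block
ip_block_pods_desc: "IP block for OpenShift pod networking"
ip_block_pods_cluster_scope: ncp/cluster
ip_block_pods_cluster_name: dev
ip_block_pods_cidr: 172.16.0.0/16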

Automating NSX-T Tagging

The tagging of the logical ports associated with the second vNIC is essential for a successful OpenShift installation with NSX-T integration. The tags allow the NSX Container Plugin (NCP) to recognize which port is the parent VIF for all of the pods running on an OpenShift node.[2] Furthermore, they allow the NSX node agents to propagate the tags to the NCP, which in turn surfaces them on the corresponding NSX-T resources as new OpenShift resources are created.[2] In other words, when an OpenShift project is created, an associated Tier 1 router is created in NSX-T carrying the tags that were applied to the logical ports. This is integral to making NSX-T work properly with OpenShift.

Note: VMware dropped the ‘T’ from the names of the NSX Container Plugin (NCP) and the NSX node agents.

How it Works

In order to tag all the logical ports associated with the second vNIC, we need to use Python (i.e. tag_vnic.py); the tagging process is involved enough that it would be very difficult to do with Ansible alone. The tags in question are two key/value pairs:

tags = [{'ncp/node_name': 'node_name'}, {'ncp/cluster': 'cluster_name'}]

As one can see, tags is a list of nested dictionaries. tags therefore becomes a key in the JSON body of the PUT request, and its value carries the contents of those tags. The keys 'ncp/node_name' and 'ncp/cluster' are mandatory; their values should be, respectively, the name of the VM that owns the second vNIC's logical port and the name of the cluster.
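
To make the request concrete, here is a minimal sketch (not the exact tag_vnic.py code) of how one logical port could be tagged through the NSX-T Manager API. The names session, mgr_url, and lport_id are illustrative, and the Manager API expresses each tag as a scope/tag pair, which is the on-the-wire form of the shorthand shown above:

# Illustrative sketch only -- names and flow are assumptions, not the original script.
import json
import requests

def tag_logical_port(session, mgr_url, lport_id, node_name, cluster):
    # mgr_url is e.g. 'https://nsx-mgr.example.com' (placeholder)
    url = '%s/api/v1/logical-ports/%s' % (mgr_url, lport_id)
    # Fetch the current port definition so the PUT carries the latest _revision.
    port = session.get(url, verify=False).json()
    port['tags'] = [
        {'scope': 'ncp/node_name', 'tag': node_name},
        {'scope': 'ncp/cluster', 'tag': cluster},
    ]
    # Send the whole object back with the updated tags.
    resp = session.put(url, data=json.dumps(port),
                       headers={'Content-Type': 'application/json'},
                       verify=False)
    resp.raise_for_status()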

Note: This script was created by altering the nsx_cleanup.py script authored by Yasen Simeonov of VMware.[4] His GitHub repo, along with the Oteemo GitHub repo containing the tagging code, is linked below.[4]

The script, tag_vnic.py, imports four important modules (a condensed setup sketch follows the list):

  1. optparse – Allows the script to accept command-line arguments, which are used to instantiate the NSXClient object. The object exists to persist the session that the requests module establishes with the NSX-T REST API.
  2. requests – Allows the script to interact with the NSX-T REST API.
  3. itertools – Used for its zip-style iteration, which lets the script traverse two lists in parallel.
  4. json – Used to convert a dictionary to JSON when making a PUT request to NSX-T.
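
The sketch below shows roughly how the first two modules fit together. It is an illustration of the pattern, not the original script; the option names simply mirror the CLI flags shown later:

# Illustrative sketch -- parse the CLI flags and build a persistent requests session.
from optparse import OptionParser
import requests

def build_client():
    parser = OptionParser()
    parser.add_option('--mgr-ip', dest='mgr_ip', help='NSX manager address')
    parser.add_option('--nsx-cert', dest='nsx_cert', help='Path to the client certificate')
    parser.add_option('--key', dest='key', help='Path to the certificate private key')
    parser.add_option('--cluster', dest='cluster', help='OpenShift cluster name')
    options, _ = parser.parse_args()

    session = requests.Session()
    session.cert = (options.nsx_cert, options.key)  # client-certificate authentication
    session.verify = False                          # matches validate_certs: no above
    return options, session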

The script runs four methods (a sketch of the node-name generation follows the list):

  1. get_logical_ports_for_second_vnic(self) – Gets all the logical ports associated with the second vNIC (i.e. from all VMs/clusters). It uses the ID of the second vNIC to make a GET request for all logical ports of that vNIC (we don’t want to accidentally tag the first vNIC) and returns them.
  2. generate_node_names(self) – Takes the cluster name parameter (i.e. self.cluster) and dynamically creates the node names, since an inventory of specific VMs is needed for tagging. Important: following a naming scheme for the VMs matters here. One needs to be able to tell from a VM's name which cluster it belongs to and whether the VM is a master, compute, or infra node. A good scheme would be <cluster>-<m/c/i>-### (e.g. dev-m-001 corresponds to the first master of the dev cluster). With such a scheme, the method can dynamically build the inventory of VMs to tag without a manually maintained list; it returns the list of VMs associated with the cluster (see the sketch after this list).
  3. get_lport_attachment_id(self, nodes) – Gets the external IDs for all of the VMs and all of the logical port attachments. It then filters the external IDs down to the VMs returned by [2] (i.e. the VMs we want to tag). Next, it uses a selection-sort-style pass to filter the logical port attachments down to only those that correspond to the second vNIC. Finally, it matches each filtered external ID to the corresponding logical port attachment of the second vNIC, building a dictionary key on the fly to store the lport_attachment_id of the second vNIC against its VM. The method returns a list of dictionaries mapping each VM to its lport_attachment_id.
  4. tag_logical_ports_of_second_vnic(self, filter_nodes, json_ports) – Takes the output from [1] and [3] and performs another selection-sort-style pass in which the logical ports are filtered to match the inventory of nodes you provide, ensuring that the lport_attachment_id key matches in both lists of resources. This yields the logical ports of the second vNIC that are associated with the VMs from [2]. Finally, the method tags those ports with the keys (i.e. 'ncp/node_name' and 'ncp/cluster') and their respective values.
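
As an example of method 2, a node-name generator following the naming scheme above might look like this. The node counts are assumptions for illustration; the real script derives its own inventory:

# Illustrative sketch of generating node names as <cluster>-<m/c/i>-###.
def generate_node_names(cluster, masters=3, computes=3, infras=2):
    names = []
    for prefix, count in [('m', masters), ('c', computes), ('i', infras)]:
        for i in range(1, count + 1):
            names.append('%s-%s-%03d' % (cluster, prefix, i))
    return names

# generate_node_names('dev') -> ['dev-m-001', 'dev-m-002', 'dev-m-003',
#                                'dev-c-001', ..., 'dev-i-002']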

To run the script:

python2.7 tag_vnic.py --nsx-cert=<path-to-cert> --key=<path-to-private-key> --mgr-ip=https://<nsx-manager-address> --cluster=<cluster-name>
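
For example, with illustrative paths and names (placeholders, not values from our engagement):

python2.7 tag_vnic.py --nsx-cert=/etc/nsx/nsx.crt --key=/etc/nsx/nsx.key --mgr-ip=https://nsx-mgr.example.com --cluster=dev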

Summary

In our engagement, we noticed that documentation on the integration between NSX-T and OpenShift is sparse. Red Hat's official documentation, for example, mirrors VMware's almost word for word, so when something goes wrong there is little independent material to fall back on. Most troubleshooting solutions sit behind Red Hat's forums, which require a paid subscription, and many of them target earlier versions of OpenShift and NSX-T, so they may not help with later releases.

Both platforms are also under active development, so new compatibility issues keep appearing. One issue Oteemo faced is that NSX-T exposes two different APIs through its UI: the Simple UI uses the new declarative API, while the Advanced Settings UI uses the existing imperative API. The Simple UI and its declarative API were released in NSX-T 2.4 to help users configure and create NSX-T resources with "bare minimum user input."[5] The declarative API also adheres to infrastructure as code, allowing the user to leverage automation frameworks such as Ansible or scripting languages such as Python.[5] As great as this is, VMware neglected to state in their NSX-T integration with OpenShift documentation that the GUI uses one API or the other depending on whether the user works in the Simple UI or the Advanced Settings UI. This led many OpenShift installations with NSX-T integration to fail because both APIs were being used in the same installation run; it wasn't until receiving official VMware support and reading a forum post that we figured this out.[6]

As for which API one should use, it depends. The new API is still being actively developed, and at the time of this publication there are no official Ansible modules for NSX-T, although VMware maintains a GitHub repo where they are actively developing them.[7] In our engagement we leveraged the old API because our client had already created NSX-T resources using the Advanced Settings UI. Best practice, however, is to leverage the new declarative API, because the old one will be phased out. Therefore, make sure to create every OpenShift-cluster-dependent NSX-T resource using the Simple UI, otherwise the installation will fail, and make sure that the Python script that automates these prerequisites uses the new API as well.

By following these steps, one will be able to automate the NSX-T prerequisites required for OpenShift and NSX-T integration. This saves a lot of time compared to entering these items manually in the NSX-T web UI, and it prevents human errors such as mistyping a tag name. Have a great and seamless OpenShift install with NSX-T integration!

References

  1. Configuring NSX-T SDN. (n.d.). Retrieved from https://docs.openshift.com/container-platform/3.11/install_config/configuring_nsxtsdn.html
  2. Simeonov, Y. (2019, April 09). NSX-T Integration with Openshift: Network Virtualization: VMware. Retrieved from https://blogs.vmware.com/networkvirtualization/2019/02/nsx-t-integration-with-openshift.html/
  3. NSX-T Data Center Policy Manager API. (n.d.). Retrieved from https://code.vmware.com/apis/521/nsx-t-data-center-nsx-t-data-center-policy-manager-api
  4. Simeonov, Y. (2019, March 13). K8s-lab. Retrieved from https://github.com/yasensim/k8s-lab 
  5. Pisolkar, D., & Nair, R. (2019, February). Introducing NSX-T 2.4 – A Landmark Release in the History of NSX: Network Virtualization: VMware. Retrieved from https://blogs.vmware.com/networkvirtualization/2019/02/introducing-nsx-t-2-4-a-landmark-release-in-the-history-of-nsx.html/
  6. VMTN. (n.d.). Retrieved from https://communities.vmware.com/thread/612695.
  7. Vmware. (2019, August 22). vmware/ansible-for-nsxt. Retrieved from https://github.com/vmware/ansible-for-nsxt.