Getting started
Installing Kubernetes
Kubernetes Infra Offload requires Kubernetes, Docker*, and containerd* to be installed. See Kubernetes, Docker, and containerd Installation for instructions. If these components are already installed on the machine, proceed to the next step.
Set Up Target and Dependencies
Kubernetes Infra Offload supports two targets: P4-DPDK and Intel IPU E2100. The Intel IPU E2100 target requires proper hardware setup and initialization. On both platforms, the Kubernetes Infra Offload software depends on the InfraP4d daemon of the IPDK networking recipe running in the background. Once InfraP4d is running, Kubernetes can load its P4 pipeline and offload various functionalities onto it (i.e., onto the P4 data plane).
The instructions to set up the target and install InfraP4d and its dependencies differ between the two targets. See Target Setup for P4-DPDK for instructions on installing the SDE and InfraP4d on the P4-DPDK target. See Target Setup for Intel IPU E2100 for host setup and compilation of P4-SDE and P4-CP on the Intel IPU E2100 target.
Set Up P4 Kubernetes
On the Intel IPU, k8s-infra-offload can run in two different modes; details are given in each section where mode-specific configuration is needed. The modes are:
a. Split mode, where the inframanager runs on the IPU ARM cores for rule offloads while the infraagent runs on the host.
b. Host mode, where every component runs on the host and offloads happen from the host.
On DPDK, only the host mode is supported.
The following steps cover setting up P4-K8s in either mode, once the dependencies mentioned above are compiled and installed.
1. Install the Go package by following the instructions at https://go.dev/doc/install. (Pick the Go compiler version corresponding to the K8s recipe release version. This information can be found in the file release-notes.rst. Information on the latest supported version is available in the "Versions and Third-parties" section below.)
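For example, installing go1.21.6 (the version listed under "Versions and Third-parties" below) on x86_64 Linux typically looks like the following sketch, per the go.dev instructions:

wget https://go.dev/dl/go1.21.6.linux-amd64.tar.gz
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.21.6.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version   # should report go1.21.6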
2. Pull the P4-K8s software from the GitHub repository:
git clone https://github.com/ipdk-io/k8s-infra-offload.git p4-k8s
cd p4-k8s
git checkout ipdk_v24.01
To build the K8s recipe, follow the steps below.
3. Build the K8s P4 artifacts.
Notes: i) For the E2100 target, obtain the K8s P4 artifacts and copy them into p4-k8s/k8s_dp/es2k/. This must be done before running the make commands below. Ensure the following artifacts are present:
cd k8s_dp/es2k/
ls
tdi.json context.json k8s_dp.p4 k8s_dp.pb.bin p4Info.txt
To generate the artifacts for E2100, refer to the compiling-p4-programs section.
ii) By default, the Makefile is configured to build for the E2100 target. To build for the P4-DPDK target, pass the "tagname=dpdk" argument to both make targets below.
Build Kubernetes binaries:
make build
4. Generate the certificates required for the mTLS connection between infraagent, inframanager, and infrap4d:
make gen-certs
Note that the above script generates the default keys and certificates and uses the cipher suites specified in the deploy/common-config.yaml file. Refer to the section inframanager config file update for any custom cipher suite, key, or certificate change.
For split mode, the openssl.cnf file under the scripts/tls directory requires the remote node IP address to be added under the [server_alt_names] section; the inframanager in this mode runs on the remote ARM-ACC complex, and this addition is required for mTLS between the infraagent and inframanager to work. Look for the sample "10.10.0.2" and replace it with the right IP address.
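For reference, the relevant portion of scripts/tls/openssl.cnf might look like the following sketch (the IP.n entries assume standard OpenSSL alt-name syntax; only the 10.10.0.2 sample value is confirmed by this guide):

[server_alt_names]
IP.1 = 127.0.0.1
IP.2 = 10.10.0.2   # replace with the IP address configured on the remote ARM-ACC vport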
5. Run make install to install all config and other artifacts to the relevant directories:

make install

6. Run the setup_infra.sh script, which, in addition to creating the specified number of virtual interfaces (TAP type on the DPDK target, IDPF sub-functions on E2100), sets up the HugePages and starts infrap4d. The script supports infrastructure setup in two different modes:
a. Split mode on E2100, where the inframanager runs on the IPU ARM cores (remote end) while the infraagent runs on the host. In this mode, the communication channel between the IPU ACC-ARM complex and the host must exist prior to execution of the script. This channel can be provisioned using the node policy file on the IPU; the sample cdq node policy file has it pre-configured, and the channel will be functional if the IPU is booted with this file. Please configure an IP address with a netmask of 255.255.0.0 on the remote ARM-ACC vport of this communication channel; it is used later as an argument to the setup_infra.sh script. For convenience, the certificate configuration file openssl.cnf is pre-configured with an example IP address of 10.10.0.2 for the remote end. In case a different IP address is configured, update scripts/tls/openssl.cnf and re-execute step 4.
b. Host mode on both targets, where every component runs on the host (engineering preview).
For CDQ interfaces:
./scripts/setup_infra.sh -i <8|16|..> -m <host> [-r <remote IP>]
For SRIOV interfaces:
./scripts/setup_infra_sriov.sh -i <8|16|..> -m <host>
Where the options are:
-i  Number of interfaces to configure for the deployment
-m  Mode; host for running the inframanager on the host
-r  IP address configured by the user on the ACC-ARM complex for connectivity to the host. This is provisioned using the node policy comms channel ([5,0],[4,0]),([4,2],[0,3]). It is needed only when running in split mode.
The script auto-assigns an IP address from the same subnet on the host-side vport for connectivity. The communication channel vport interface name is auto-detected by the script for the above-mentioned comms channel configuration in the cdq node policy.
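For example, a host-mode deployment with 8 CDQ interfaces might be invoked as below (a sketch; in split mode you would additionally pass the ACC comms-channel IP with -r, e.g. -r 10.10.0.2):

./scripts/setup_infra.sh -i 8 -m host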
Please also set the following environment variables for the deployment. These paths are needed to set up the dependencies correctly:
SDE_INSTALL    - Default p4sde install directory
P4CP_INSTALL   - Default p4-cp install directory
DEPEND_INSTALL - Default target dependencies directory
K8S_RECIPE     - Path to the k8s recipe on the host
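For example (the paths below are illustrative assumptions; substitute your actual install locations):

export SDE_INSTALL=/root/p4sde/install        # assumed p4sde install directory
export P4CP_INSTALL=/root/p4cp/install        # assumed p4-cp install directory
export DEPEND_INSTALL=/root/deps/install      # assumed target dependencies directory
export K8S_RECIPE=/root/p4-k8s                # directory where p4-k8s was cloned in step 2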
After running the above script, verify that infrap4d is running.
ps -ef | grep infrap4d
root 1254701 1 99 13:34 ? 00:13:10 /host/networking-recipe/install/sbin/infrap4d
On the E2100 target, this script also loads the IDPF driver. Verify the presence of the PF:
devlink dev show
pci/0000:af:00.0
7. For the Intel IPU E2100, connect to the IMC from the host and run the following command on the IMC:
devmem 0x202920C100 64 0x8yy
where yy is the 2nd octet of the MAC of the host interface ending with 'd5' (when using CDQ) or 'v1' (when using SR-IOV). For example, if the interface ens801f0d5 has MAC 00:11:00:05:03:14, the command needs to be:

devmem 0x202920C100 64 0x811

This command is currently required for the core-dns component of Kubernetes to function. For CDQ, interfaces up to the one ending with d3 are default sub-functions, so the first interface ending with d4 needs to be assigned to the ARP proxy and d5 to the host. For SR-IOV, the first VF interface ending with v0 goes to the ARP proxy and v1 goes to the host.
8. Run the ARP-Proxy script, which creates a new namespace and assigns an interface from the pool of interfaces created in the previous step. On the E2100 target, the user needs to explicitly configure the interface to be assigned using the IFACE environment variable:
export IFACE=ens801f0d4
Make changes to the infraagent config file for the interface and interface type. For the DPDK target, change the interfaceType in the config.yaml file to "tap". The script then runs the arp-proxy on the assigned interface, within the isolated namespace:
./scripts/arp_proxy.sh
Please note, any changes to the config file need to be made as per the section inframanager config file update before building the images in the next step. Refer to it for updating the ARP MAC address in the config file.
9. Make the docker images. This step builds the Kubernetes container images:
make docker-build
10. Push the InfraManager and InfraAgent images into the docker private repo, either manually or through the make command, using either of the following:
make docker-push
or
docker push localhost:5000/infraagent:latest
docker push localhost:5000/inframanager:latest
The docker images should now be listed in the local repository as below.
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost:5000/inframanager latest 7605ed47e042 5 minutes ago 22.1MB
<none> <none> 485d7bc6ec38 5 minutes ago 1.38GB
localhost:5000/infraagent latest 500075b89922 6 minutes ago 68.7MB
<none> <none> dc519d06de56 6 minutes ago 1.68GB
...
11. Pull images for use by the Kubernetes Container Runtime Interface (CRI):
crictl pull localhost:5000/inframanager:latest
crictl pull localhost:5000/infraagent:latest
infraagent config file update
The config file deploy/common-config.yaml is used to inform the infraagent which interface and interface type to use. The interfaceType should be cdq for E2100, and the interface name is the base name of the PF for PCI device ID 1452. For SRIOV interfaces, the type should be sriov.
interfaceType : cdq
interface: ens801f0
mtls: true
insecure: false
For split mode, also configure the following:
managerAddr : <IP address of comms channel on ACC>
managerPort : 50002
Run make after updating deploy/common-config.yaml to bring the changes into effect.
inframanager config file update
The config file deploy/common-config.yaml is used to define the parameters that the inframanager uses for connection establishment with infrap4d and for the interfaces created. All fields have a default value in the file. Please verify that the values correspond to the desired values, especially arpmac.
InfraManager section:
addr: The local address to which the inframanager binds as the listening socket for the infraagent. In host mode, it can be the localhost:

addr: 127.0.0.1:50002

For split mode, it needs to be the ACC comms channel IP. Example:

addr: 10.10.0.2:50002
interfaceType: Needs to be sriov for SRIOV, since it defaults to cdq.
arpmac: The arpmac needs to be configured. This should be the MAC of the interface the user wants to configure as the ARP-proxy gateway. It is the address of the interface given to the arp-proxy namespace by the scripts/arp_proxy.sh script mentioned in Set Up P4 Kubernetes for the ARP proxy gateway.
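Putting these fields together, the InfraManager section of deploy/common-config.yaml for a split-mode CDQ setup might look like the following sketch (key names follow the descriptions above but exact casing may differ in your file; the MAC is a placeholder, so substitute the MAC of your ARP-proxy gateway interface):

InfraManager:
  addr: 10.10.0.2:50002        # ACC comms-channel IP; use 127.0.0.1:50002 in host mode
  interfaceType: cdq           # set to sriov for SR-IOV deployments
  arpmac: 00:11:22:33:44:55    # placeholder: MAC of the ARP-proxy gateway interface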
If the user doesn't wish to use these default keys, certificates, and cipher suites, then modify the scripts/mev/tls/gen_certs.sh script accordingly before running make gen-certs, and modify the deploy/common-config.yaml file with the preferred cipher suites. These changes need to be made prior to the creation of the container images in step 9 of the Set Up P4 Kubernetes section.
Run make after updating deploy/common-config.yaml to bring the changes into effect.
Deploy P4 Kubernetes
Initialize and start the core Kubernetes components:
kubeadm init --pod-network-cidr=<pod-cidr> --service-cidr=<service-cidr>
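For example, with illustrative CIDRs (the pod CIDR below matches the 10.244.0.x pod addresses shown later in this guide, and the service CIDR matches the default kubeadm service range):

kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12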
Once the Kubernetes control plane initialization has completed successfully, do either of the following:
As a non-root user:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Or as root user:
export KUBECONFIG=/etc/kubernetes/admin.conf
Install and set up the Calico plugin:
cd /usr/local/bin
curl -L https://github.com/projectcalico/calico/releases/download/v3.24.1/calicoctl-linux-amd64 -o kubectl-calico
chmod +x kubectl-calico
Remove taints from the node. For a single-node deployment, the node must be untainted to allow worker pods to share the node with the control plane. The taint to remove is "control-plane" or "master" or both. These taints can be removed as shown:
kubectl taint node <node-name> node-role.kubernetes.io/control-plane-
kubectl taint node <node-name> node-role.kubernetes.io/master-
Create Kubernetes secrets from the generated certificates. The infraagent and inframanager read the certificates from the secrets.
make tls-secrets
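You can confirm that the secrets were created by listing them (the exact secret names depend on the makefile target):

kubectl get secrets -A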
Start the deployments:
For split mode on the Intel IPU E2100, run the following on the host:

make deploy-split
make deploy-calico
For host mode, run the following instead:

make deploy
make deploy-calico
Check deployment using the following:
kubectl get pods -A -o wide
Pod-to-Pod Ping
To run a simple ping test from one pod to another, create two test pods. Note that the yaml file in the package creates a single test pod; you can copy and modify it to create a pod with a different name. For example, copy it as test-pod2.yaml and change the metadata name and container name to test-pod2 (one way to do this is shown after the yaml below). The .yaml file for test-pod2 should look as below.
apiVersion: v1
kind: Pod
metadata:
name: test-pod2
spec:
containers:
- name: test-pod2
image: quay.io/quay/busybox:latest
...
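One way to produce this file is to copy the sample and rename the pod (a sketch; paths assume the repository root as the working directory):

cp example/test_pod.yaml example/test_pod2.yaml
sed -i 's/test-pod/test-pod2/g' example/test_pod2.yaml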
Then, carry out the following steps.
Create both the test pods:
kubectl create -f example/test_pod.yaml
kubectl create -f example/test_pod2.yaml
Check that the two test pods are ready and running:
kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE     IP           NODE    NOMINATED NODE   READINESS GATES
test-pod    1/1     Running   0          10m     10.244.0.6   ins21   <none>           <none>
test-pod2   1/1     Running   0          9m33s   10.244.0.7   ins21   <none>           <none>
Use the IP address from the above output or ifconfig to get the IP address assigned to one of the pods. Then, ping that address from the other pod:

kubectl exec test-pod -- ifconfig eth0
kubectl exec test-pod2 -- ping 10.244.0.6
PING 10.244.0.6 (10.244.0.6): 56 data bytes
64 bytes from 10.244.0.6: seq=0 ttl=64 time=0.112 ms
64 bytes from 10.244.0.6: seq=1 ttl=64 time=0.098 ms
64 bytes from 10.244.0.6: seq=2 ttl=64 time=0.102 ms
64 bytes from 10.244.0.6: seq=3 ttl=64 time=0.112 ms
...
To delete the test pods created above:
kubectl delete pod test-pod test-pod2
Service Deployment
Under the example/service directory, there are TCP and UDP examples. They contain scripts for running and cleaning up, and each has a README giving guidelines for the steps. After running the run scripts, the sample output should look like below:
kubectl get svc -A -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
iperf-server ClusterIP 10.111.123.3 <none> 5201/TCP 6m56s app=iperf-server
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 15m <none>
kubectl get ep -A -o wide
NAMESPACE NAME ENDPOINTS AGE
default iperf-server 10.244.0.5:5201,10.244.0.6:5201 5h22m
default kubernetes 10.233.134.119:6443 5h35m
kube-system kube-dns 10.244.0.3:53,10.244.0.4:53,10.244.0.3:53 + 3 more... 5h35m
kubectl get pods -A -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
iperf-clients-8gkv7 1/1 Running 0 18m 10.244.0.9 ins21 <none> <none>
iperf-server-deployment-59bf4754f9-4hp4c 1/1 Running 0 18m 10.244.0.8 ins21 <none> <none>
...
Troubleshooting
Debugging
The Kubernetes Infrastructure Offload software provides logging capabilities. Check the logs emitted to stdout and stderr using:

kubectl logs <pod> -n <namespace>
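For example, to view the inframanager logs (a sketch assuming the pod runs in kube-system and its name contains "inframanager"; adjust the namespace and pattern to your deployment):

kubectl logs -n kube-system $(kubectl get pods -n kube-system -o name | grep inframanager)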
FAQs
idpf crash observed leading to host reboot
Reason: The setup_infra.sh script in the scripts/es2k directory installs the idpf driver and then proceeds with the creation of sub-functions. Under some circumstances, the sleep configured in the script is not sufficient: the driver is still busy allocating resources and initializing the base interfaces when the first sub-function creation request comes in, leading to a crash.
Solution: Increase the sleep time in the setup_infra.sh script after the "install_drivers" function.

"failed to get a CDQ interface for pod: no free resources left" error is seen on infraagent and remaining pods do not come up
Reason: The wrong cp_init.cfg file was used on the IMC; the number of host APFs under num_max_vport in the cp_init file needs to be at least 50.
Solution: Use the CDQ use-case cp_init.cfg file.
CDQ interfaces not coming up
Reason: The IDPF driver failed to load.
Solution: Verify using the dmesg command that this is the case, then perform a:

modprobe idpf
"Failed to connect to inframanager" seen on the host when in split mode.
Reason: Firewalld is blocking it.
Solution: Disable the firewall service on the ACC. You might also need to disable the network-manager service on both the host and the ACC:
systemctl disable NetworkManager
Certs error while processing, seen on the inframanager when in split mode.
Reason: Time might be out of sync.
Solution: Ensure that the time is synced using the correct protocol.
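For example, on systemd-based distros you can check clock synchronization and, if chrony is in use, force an immediate correction (a sketch; your time-sync daemon may differ):

timedatectl status    # look for "System clock synchronized: yes"
chronyc makestep      # chrony only: step the clock immediately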
Clean Up
Resetting Kubernetes stops and removes all pods. Then remove all k8s runtime configurations and other files, and finally stop the container services. The short way to clean up everything:
./scripts/cleanup.sh
To delete only the started pods, service deployments, namespaces, and daemonsets:
kubectl delete pod < >
kubectl delete deployment < >
sudo ip -all netns delete
make undeploy
make undeploy-calico
Reset Kubernetes and remove all configuration and runtime directories associated with Kubernetes:
kubeadm reset -f
rm -rf /etc/cni /etc/kubernetes
rm -rf /var/lib/etcd /var/lib/kubelet /var/lib/cni
rm -rf /var/run/kubernetes
rm -rf $HOME/.kube
Stop the local container registry and stop the container services:
docker container stop registry && docker container rm -v registry
systemctl stop containerd
Stop the running ARP proxy and infrap4d processes. This also removes all the virtual interfaces that were created earlier:
pkill arp_proxy
pkill infrap4d
For ACC cleanup, only the following are needed:
pkill infrap4d
pkill inframanager
Versions and Third-parties
Versions of Kubernetes, Linux distros, Docker, and other third-party libraries tested with (Calico, Felix):
OS
Linux: Rocky Linux 9.2, RHEL 9.2
golang
go1.21.6
docker
docker version
Client: Docker Engine - Community
Version: 20.10.12
API version: 1.41
containerd
Tested on 1.6.x
ctr version
kubernetes
Versions tested and supported:
1.25.x
$ dnf list installed | grep kube
cri-tools.x86_64 1.25.2-0
kubeadm.x86_64 1.25.4-0
kubectl.x86_64 1.25.4-0
kubelet.x86_64 1.25.4-0
Calico
v3.24.1