BERT Hugging Face Model Deployment using Kubernetes [GitHub Repo] - 03/07/2024

Transformer Model deployment using Kubernetes

Jul 03, 2024

GitHub Repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment

Motivation:

Model development is of little use if you don't deploy the model to production, and production brings its own scalability and portability issues.

I deployed a basic BERT model from the Hugging Face Transformers library on Kubernetes, with the help of Docker. The project gives a feel for how to deploy and manage pods in production.

Model Serving and deployment:

ML Pipeline:

Workflow:

Model server (using FastAPI and uvicorn) for the BERT uncased model →

Containerize the model and inference scripts into a Docker image →

Kubernetes deployment of the model servers (for scalability) → Testing

Components:

Model server

Used the BERT uncased model from Hugging Face to predict the masked word ([MASK]). Inference is served with transformers-cli, which uses FastAPI and uvicorn to expose the model endpoints.
Server streaming:
Testing (via the FastAPI docs page):
http://localhost:8888/docs/
{
  "output": [
    { "score": 0.21721847355365753, "token": 2204, "token_str": "good", "sequence": "today is a good day" },
    { "score": 0.16623663902282715, "token": 2047, "token_str": "new", "sequence": "today is a new day" },
    { "score": 0.07342924177646637, "token": 2307, "token_str": "great", "sequence": "today is a great day" },
    { "score": 0.0656224861741066, "token": 2502, "token_str": "big", "sequence": "today is a big day" },
    { "score": 0.03518620505928993, "token": 3376, "token_str": "beautiful", "sequence": "today is a beautiful day" }
  ]
}
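The same endpoint can be called from code instead of the docs page. Below is a minimal Python client sketch. It assumes three things:

- The server was started with something like `transformers-cli serve --task fill-mask --model bert-base-uncased` (default port 8888).
- The server exposes a `POST /forward` endpoint that takes a body of the form `{"inputs": ...}`.
- The response has the `{"output": [...]}` shape shown above.

Check the /docs page for the endpoints your server actually lists. `fill_mask` and `top_predictions` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: transformers-cli fill-mask server on localhost:8888 with POST /forward.
SERVER_URL = "http://localhost:8888"


def fill_mask(sentence: str, base_url: str = SERVER_URL, timeout: float = 30.0) -> dict:
    """Send a sentence containing [MASK] to the (assumed) /forward endpoint; return parsed JSON."""
    payload = json.dumps({"inputs": sentence}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/forward",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def top_predictions(result: dict, k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring (token_str, score) pairs from a fill-mask response."""
    ranked = sorted(result["output"], key=lambda c: c["score"], reverse=True)
    return [(c["token_str"], c["score"]) for c in ranked[:k]]
```

With the server running, `top_predictions(fill_mask("today is a [MASK] day"))` returns the best three candidates. For the response above, those are "good", "new" and "great".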

Containerization

Created a Docker image from a Hugging Face GPU base image and pushed it to Docker Hub after testing.

Testing on the Docker container:

You can pull the image directly: docker pull vaibhaw06/bert-kubernetes:latest
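Loading the model takes a while, so a container smoke test usually waits for the HTTP port to come up before sending requests. Below is a minimal Python sketch of that wait. It rests on two assumptions:

- The server port is published on the host. For example, `docker run -p 8888:8888 vaibhaw06/bert-kubernetes:latest`, where the 8888 mapping is an assumption about which port the image's server listens on.
- FastAPI's `/docs` page returns HTTP 200 once the app is up.

`wait_until_ready` is a hypothetical helper.

```python
import time
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 900.0, interval_s: float = 5.0) -> bool:
    """Poll `url` with GET until it answers HTTP 200 or `timeout_s` elapses.

    Returns True once a 200 is seen, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, connection refused and timeouts are all OSError
            # subclasses: the server is not ready yet, so retry after a pause.
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_ready("http://localhost:8888/docs")` blocks until the docs page loads or 15 minutes pass.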

K8s deployment

Used minikube and kubectl to create a single pod that serves the model, configured through a Deployment and a Service (both defined in deployment.yaml):

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-deployment
  labels:
    app: bertapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bertapp
  template:
    metadata:
      labels:
        app: bertapp
    spec:
      containers:
      - name: bertapp
        image: vaibhaw06/bert-kubernetes
        ports:
        - containerPort: 8080
        
---
apiVersion: v1
kind: Service
metadata:
  name: bert-service
spec:
  type: NodePort
  selector:
    app: bertapp
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 30100
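The Service only reaches the model server if two things line up:

- Its selector matches the pod labels.
- Its targetPort matches the port the container serves on.

Below is a hedged Python sketch of that consistency check. The two manifests above are transcribed as dicts so no YAML library is needed, and `check_wiring` is a hypothetical helper. It assumes numeric (not named) ports and the default NodePort range 30000-32767. `containerPort` is informational in Kubernetes, so passing this check does not prove the server inside the image listens on 8080.

```python
# The manifests from deployment.yaml, transcribed as plain dicts (hypothetical copies).
DEPLOYMENT = {
    "kind": "Deployment",
    "metadata": {"name": "bert-deployment", "labels": {"app": "bertapp"}},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "bertapp"}},
        "template": {
            "metadata": {"labels": {"app": "bertapp"}},
            "spec": {"containers": [{
                "name": "bertapp",
                "image": "vaibhaw06/bert-kubernetes",
                "ports": [{"containerPort": 8080}],
            }]},
        },
    },
}

SERVICE = {
    "kind": "Service",
    "metadata": {"name": "bert-service"},
    "spec": {
        "type": "NodePort",
        "selector": {"app": "bertapp"},
        "ports": [{"protocol": "TCP", "port": 8080, "targetPort": 8080, "nodePort": 30100}],
    },
}


def check_wiring(deployment: dict, service: dict) -> list[str]:
    """Return problems that would stop the Service from routing to the Deployment's pods."""
    problems = []
    pod_labels = deployment["spec"]["template"]["metadata"]["labels"]
    # The Deployment's selector must match its own pod template labels.
    if any(pod_labels.get(k) != v for k, v in deployment["spec"]["selector"]["matchLabels"].items()):
        problems.append("Deployment selector does not match pod template labels")
    # The Service only selects pods that carry every label in its selector.
    if any(pod_labels.get(k) != v for k, v in service["spec"]["selector"].items()):
        problems.append("Service selector does not match pod labels")
    container_ports = {
        p["containerPort"]
        for c in deployment["spec"]["template"]["spec"]["containers"]
        for p in c.get("ports", [])
    }
    for port in service["spec"]["ports"]:
        target = port.get("targetPort", port["port"])  # targetPort defaults to port
        if target not in container_ports:
            problems.append(f"targetPort {target} is not a declared containerPort")
        node_port = port.get("nodePort")
        if node_port is not None and not 30000 <= node_port <= 32767:
            problems.append(f"nodePort {node_port} is outside the default 30000-32767 range")
    return problems
```

`check_wiring(DEPLOYMENT, SERVICE)` returns an empty list for the manifests above.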

Set up minikube and run the pods using kubectl and deployment.yaml:

minikube start
kubectl apply -f deployment.yaml

Final Testing:

kubectl get all

It took around 15 minutes to pull the image and create the pods.
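Because pulling the image can take many minutes, it helps to wait until the pods report Ready before testing. From the shell, `kubectl wait --for=condition=Ready pod -l app=bertapp --timeout=20m` does this. The Python sketch below runs the same check on the JSON from `kubectl get pods -l app=bertapp -o json`. The helper names are hypothetical, and `fetch_pods` assumes kubectl is on PATH with the minikube context active.

```python
import json
import subprocess


def pods_ready(pod_list: dict) -> bool:
    """True if the list has at least one pod and every pod has condition Ready=True."""
    pods = pod_list.get("items", [])
    if not pods:
        return False
    for pod in pods:
        conditions = pod.get("status", {}).get("conditions", [])
        if not any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            return False
    return True


def fetch_pods(label_selector: str = "app=bertapp") -> dict:
    """Run `kubectl get pods -l <selector> -o json` and return the parsed pod list."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-l", label_selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, poll `pods_ready(fetch_pods())` every few seconds until it returns True.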

minikube image ls

kubectl get svc

minikube service bert-service

After running the last command, minikube service bert-service, you can verify the deployment at the web endpoint it opens.
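A NodePort Service is reachable on the node's IP at its nodePort (30100 in deployment.yaml). Below is a small Python sketch that builds that URL from `minikube ip`. The helper names and the `/docs` path are illustrative. With the Docker driver on macOS or Windows, the node IP may not be reachable from the host; in that case `minikube service bert-service` is the more reliable route.

```python
import subprocess


def node_port_url(node_ip: str, node_port: int = 30100, path: str = "/docs") -> str:
    """Build the URL of a NodePort Service exposed on a node's IP."""
    return f"http://{node_ip}:{node_port}{path}"


def minikube_ip(profile: str = "minikube") -> str:
    """Return the minikube node IP (requires the minikube CLI on PATH)."""
    result = subprocess.run(
        ["minikube", "ip", "-p", profile], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

For example, `node_port_url(minikube_ip())` gives the docs page URL of the deployed service.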

Find the GitHub link: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment

If you have any questions, ping me on my LinkedIn: https://www.linkedin.com/in/vaibhaw-khemka-a92156176/

Follow ML Umbrella for more detailed, actionable projects like this.

Future Extensions:

Scaling with pod replicas and a load balancer
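There are two ways to scale. You can raise `replicas` in deployment.yaml by hand (or run `kubectl scale deployment bert-deployment --replicas=3`), or you can let a HorizontalPodAutoscaler adjust it.

The Python sketch below implements the HPA's documented scaling rule: desired = ceil(current × currentMetric / targetMetric). The HPA skips scaling when the ratio is within the default 0.1 tolerance of 1.0. The function name and the min/max clamp parameters are illustrative; they stand in for the HPA's minReplicas/maxReplicas.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     tolerance: float = 0.1, min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replica count per the HPA rule, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, one replica at 180% CPU against a 60% target scales to 3 replicas.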

Self-healing
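Kubernetes self-healing has two parts that matter here:

- The Deployment's ReplicaSet replaces pods that are deleted or lost.
- A liveness probe (not configured in deployment.yaml above) lets the kubelet restart a hung container.

Below is a simplified Python model of the probe rule: the container is restarted after `failureThreshold` consecutive failed probes (default 3). The model ignores `periodSeconds`, `initialDelaySeconds` and restart back-off, and `restart_points` is a hypothetical name.

```python
def restart_points(probe_results: list[bool], failure_threshold: int = 3) -> list[int]:
    """Indices of probe results at which a liveness probe would trigger a restart."""
    restarts = []
    consecutive_failures = 0
    for i, ok in enumerate(probe_results):
        if ok:
            consecutive_failures = 0  # any success resets the failure count
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(i)
            consecutive_failures = 0  # count starts again for the restarted container
    return restarts
```

For example, `restart_points([True, False, False, True, False, False, False])` restarts only at index 6, because the success at index 3 resets the count.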

ML Umbrella