Build a Paddy Disease Object Detection System with ONLY $13

M Sadewa Wicaksana
9 min read · Oct 21, 2024


Never walk alone

In the current era, AI has become a powerful tool that simplifies our work and makes analysis more efficient. However, many might think that developing AI requires a massive budget and access to supercomputers. From my own experience working with my lecturer, I’ve learned that it’s possible to build an affordable AI solution for object detection, costing as little as $13 per month. Here, I’ll share some key aspects of how we made it happen.

a. Introduction: Dataset and Training Process

b. Explanation of the REST-API Concept

c. Engineering Perspective for Implementing It

A. Introduction: Dataset and Training Process

For this study, I utilized a publicly available dataset from Roboflow called Paddy Dataset v2, which contains 1,704 images across 4 classes. You can download the dataset using the provided link. For the detection task, I employed Ultralytics, a company specializing in AI and computer vision, primarily known for their development of YOLOv5, a leading model in object detection. YOLO (You Only Look Once) is a family of models built for real-time object detection, image classification, and segmentation. Ultralytics has optimized the YOLO architecture to be faster, more accurate, and user-friendly.
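If you want to pull the dataset programmatically, the Roboflow Python client can export it in YOLOv8 format. Here is a minimal sketch; the API key, workspace, and project identifiers below are placeholders, not the real ones:

from roboflow import Roboflow

# placeholders: replace with your own key, workspace, and project names
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("paddy-dataset")
dataset = project.version(2).download("yolov8")  # YOLOv8-format export
print(dataset.location)  # local folder containing data.yaml and the images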

The original YOLO was developed by Joseph Redmon using a custom framework called Darknet, a highly flexible research framework written in low-level languages. Over the years, this framework has produced a series of top-performing real-time object detectors in computer vision, including YOLOv2, YOLOv3, YOLOv4, all the way to YOLOv5, and even further with YOLOv6, YOLOv7, YOLOv8, YOLO-NAS, YOLO-World, YOLOv9, and YOLOv10.

History of YOLO versions

In this mini-project, I used YOLOv8, specifically the YOLOv8n model, for object detection, and to optimize the training process for speed and efficiency, I ran the training on Google Colaboratory Pro+. To make the concept clearer, I’ve illustrated it in the image below.

illustration concept training dataset

Based on the image above, the final step of the training process is exporting the model to ONNX format using Ultralytics. For context, the ONNX (Open Neural Network Exchange) format is a way to save and share machine learning models across different platforms and frameworks. It acts like a universal language for AI models: once you train a model in one framework (like PyTorch or TensorFlow), you can easily export it in ONNX format and use it in another environment. Another reason for choosing ONNX is that this format typically runs faster than other model formats in a CPU environment during inference (a training-and-export sketch follows the references below). Here are some references on ONNX performance:

  1. https://medium.com/@deeplch/the-beginners-guide-cpu-inference-optimization-with-onnx-99-8-tf-20-5-pytorch-speedup-83fd5cd38615
  2. https://ubiops.com/make-your-model-faster-on-cpu-using-onnx/
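Putting the pipeline together, a minimal training-and-export sketch with the Ultralytics API could look like this. The data.yaml path and the epoch count are assumptions for illustration, not the exact values we used:

from ultralytics import YOLO

# start from the pretrained YOLOv8n checkpoint
model = YOLO("yolov8n.pt")

# train on the paddy dataset (data.yaml path and epochs are placeholders)
model.train(data="paddy-dataset/data.yaml", epochs=100, imgsz=640)

# reload the best checkpoint from the run and export it to ONNX for CPU inference
best = YOLO("runs/detect/train/weights/best.pt")
best.export(format="onnx")  # writes best.onnx next to the weights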

B. Explanation of the REST-API Concept

Concept of Architecture

Based on the image above, there are several components in the tech stack I used to create this system. First is FastAPI, then Celery with Redis, and finally OpenCV, Ultralytics, and Flower.

a. FastAPI is a modern Python web framework that allows you to rapidly develop APIs (Application Programming Interfaces). I chose this framework because it offers various tools and built-in functions that make scaling our systems much easier.

b. Celery is an open-source distributed task queue that enables you to manage and execute tasks asynchronously (in the background) and schedule them to run at specific times. It is often utilized in web applications to handle time-consuming tasks without interrupting the main application. This is especially important because resource limitations necessitate careful usage to prevent our VPS from becoming stuck or shutting down due to excessive computational demands.

c. Redis (Remote Dictionary Server) is an open-source, in-memory data structure store used as a database, cache, and message broker. It is known for its speed, flexibility, and support for a variety of data structures. I used it because it is simple, lightweight, and fast enough for a small task queue like this mini-project (a quick connectivity check is sketched after this list).

d. Ultralytics is used for inference, or predictions, based on the ONNX model we obtained from the training process.

e. OpenCV (Open Source Computer Vision Library) is a robust and popular library for computer vision and image processing tasks. After obtaining the prediction results from Ultralytics, we utilized OpenCV to add bounding boxes and labels to the video, and then saved it under a new name.

f. Flower is a web-based monitoring tool for Celery, a distributed task queue system used to run tasks asynchronously and schedule them to run at specific times. Flower provides real-time monitoring of Celery workers, task statuses, and overall system health. With this library, we can easily check the status of our task IDs.
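Before wiring Celery to its broker, it helps to confirm that Redis is actually reachable. A minimal sanity check, assuming Redis runs on the default localhost:6379:

import redis

# connect to the default local Redis instance (host/port are assumptions)
r = redis.Redis(host="localhost", port=6379, db=0)
r.set("healthcheck", "ok")   # simple write
print(r.get("healthcheck"))  # b'ok' if the server responds
print(r.ping())              # True when the connection is healthy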

For the VPS (Virtual Private Server), I bought the Nebula product from jagoanhosting for only $12.93 per month.

C. Engineering Perspective for Implementing It

First, create a new environment; you can create it with conda or with venv (virtual environment). Make sure you have already installed libraries such as ultralytics, fastapi, celery, opencv-python, and redis. A minimal setup sketch is shown below.
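A minimal setup sketch (the package names are the PyPI ones; the environment path is arbitrary):

# create and activate a virtual environment, then install the main libraries
python -m venv .venv
source .venv/bin/activate
pip install ultralytics fastapi "uvicorn[standard]" celery redis flower opencv-python

Here are my results from pip freeze: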

amqp==5.2.0
annotated-types==0.7.0
anyio==4.6.0
asttokens==2.4.1
billiard==4.2.1
celery==5.4.0
certifi==2024.8.30
charset-normalizer==3.4.0
click==8.1.7
click-didyoumean==0.3.1
click-plugins==1.1.1
click-repl==0.3.0
coloredlogs==15.0.1
contourpy==1.3.0
cycler==0.12.1
decorator==5.1.1
dnspython==2.7.0
email_validator==2.2.0
environs==11.0.0
executing==2.1.0
fastapi==0.115.2
fastapi-cli==0.0.5
fastapi_cors==0.0.6
filelock==3.16.1
flatbuffers==24.3.25
flower==2.0.1
fonttools==4.54.1
fsspec==2024.9.0
h11==0.14.0
httpcore==1.0.6
httptools==0.6.1
httpx==0.27.2
humanfriendly==10.0
humanize==4.11.0
idna==3.10
ipython==8.28.0
jedi==0.19.1
Jinja2==3.1.4
kiwisolver==1.4.7
kombu==5.4.2
lapx==0.5.11
markdown-it-py==3.0.0
MarkupSafe==3.0.1
marshmallow==3.22.0
matplotlib==3.9.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mpmath==1.3.0
networkx==3.4.1
numpy==2.1.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.77
nvidia-nvtx-cu12==12.1.105
onnx==1.17.0
onnxruntime==1.19.2
opencv-python==4.10.0.84
packaging==24.1
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pillow==10.4.0
prometheus_client==0.21.0
prompt_toolkit==3.0.48
protobuf==5.28.2
psutil==6.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
py-cpuinfo==9.0.0
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.12
pytz==2024.2
PyYAML==6.0.2
redis==5.1.1
requests==2.32.3
rich==13.9.2
scipy==1.14.1
seaborn==0.13.2
setuptools==75.1.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
stack-data==0.6.3
starlette==0.39.2
sympy==1.13.3
torch==2.4.1+cpu
torchaudio==2.4.1+cpu
torchvision==0.19.1+cpu
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
triton==3.0.0
typer==0.12.5
typing_extensions==4.12.2
tzdata==2024.2
ultralytics==8.3.9
ultralytics-thop==2.0.9
urllib3==2.2.3
uvicorn==0.31.1
uvloop==0.20.0
vine==5.1.0
watchfiles==0.24.0
wcwidth==0.2.13
websockets==13.1
1. API Upload

Concept of Endpoint Upload

For the upload endpoint, I created it using FastAPI with a file in the request body, specifically a video upload. The file is then forwarded to Celery to obtain a queued task ID. Users need to save this task ID for the next endpoint, where they can check the status of their uploaded video.

# api.py
import os
import shutil
import uuid  # needed for uuid.uuid4() below

from fastapi import FastAPI, File, UploadFile
from fastapi.staticfiles import StaticFiles

from celery_config import celery_app, track_video

app = FastAPI()

# serve processed videos (the results directory must already exist)
app.mount("/results", StaticFiles(directory="results"), name="results")

UPLOAD_DIR = "/Users/sadewawicak/Project/JagoPadi/ultralytics-object-detection-paddy/files"

@app.post("/upload/")
async def upload_file(file: UploadFile = File(...)):
    # save the upload under a unique name to avoid collisions
    unique_filename = f"{uuid.uuid4()}_{file.filename}"
    file_location = os.path.join(UPLOAD_DIR, unique_filename)
    with open(file_location, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)

    # send it into Celery for queue management
    task = track_video.delay(file_location, unique_filename)

    return {"token": task.id, "filename": unique_filename}
# celery_config.py
import cv2
from celery import Celery
from ultralytics import YOLO

# initialize the connection, using Redis as both broker and result backend
celery_app = Celery(
    'worker',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/0'
)

# background task for object detection
@celery_app.task
def track_video(location: str, unique_filename: str):
    # Initialize the model from the exported ONNX weights
    model = YOLO('best.onnx', task='detect')

    # Open the video source
    cap = cv2.VideoCapture(location)

    # Get video properties
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Define the codec and create VideoWriter object
    out = cv2.VideoWriter(f'results/{unique_filename}.avi',
                          cv2.VideoWriter_fourcc(*'XVID'),
                          fps, (frame_width, frame_height))

    # Run inference with stream=True (generator of Results objects)
    results = model(source=location, stream=True)

    # Process the results and save the bounding boxes to the video
    for r in results:
        ret, frame = cap.read()
        if not ret:
            break

        boxes = r.boxes  # Boxes object for bbox outputs
        for box in boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            class_id = int(box.cls[0])          # Class ID
            class_name = model.names[class_id]  # Get class name from model
            confidence = box.conf[0]            # Confidence score
            label = f'{class_name} {confidence:.2f}'  # Class name plus confidence
            cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # Draw bounding box
            cv2.putText(frame, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)  # Draw the label

        out.write(frame)  # Write the annotated frame to the output video

    # Release everything once the job is finished
    cap.release()
    out.release()
    cv2.destroyAllWindows()
    return {"file_path": f'results/{unique_filename}.avi'}

There are several commands for running the code above, for example:

#api.py
uvicorn api:app --reload --host 0.0.0.0

#celery_config.py
celery -A celery_config.celery_app worker --loglevel=info --concurrency=1
celery -A celery_config.celery_app flower --host 0.0.0.0 --loglevel=info
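Once the API is up, you can exercise the upload endpoint with a request like the one below. The host and port assume the default uvicorn settings, and the file name is a placeholder:

# hypothetical example request against the upload endpoint
curl -X POST "http://localhost:8000/upload/" \
  -F "file=@sample_paddy_field.mp4"
# expected response shape: {"token": "<task-id>", "filename": "<uuid>_sample_paddy_field.mp4"}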
Ex. Request and Response Upload File
Ex. Task Monitoring with Flower

What if 10 users upload video files at the same time? Will the VPS get stuck or freeze? The answer is actually NO, because I set the concurrency to 1. That means if 10 users upload videos, Celery will process them one by one, sorted by received date and time. The image above illustrates how the first task starts processing while the others stay pending until the previous task is completed.

Ex. Completed Task

To illustrate the completed process, see the image above, where the result column shows the return value once a task is completed, or see the image below with my sample illustration. 😁😁

Ex. Illustration Process

2. API Result

Ex. Pending Process
Ex. Completed

In the example image labeled “Pending Process,” when users attempt to access this endpoint, they will receive a response indicating that the status is pending. On the other hand, in the “Completed” image, users will receive a response containing the download link for their video. The following sample code demonstrates how this process works.

@app.get("/result/{task_id}")
async def get_result(task_id: str):
    task_result = celery_app.AsyncResult(task_id)
    if task_result.state == 'PENDING':
        return {"task_id": task_id, "status": task_result.state, "result": None}
    elif task_result.state != 'FAILURE':
        return {"task_id": task_id, "status": task_result.state, "result": task_result.result}
    else:
        # something went wrong in the background job
        return {"task_id": task_id, "status": task_result.state, "result": str(task_result.info)}
Ex. Request and Response Result (Pending)
Ex. Request and Response Result (Success)
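To check a task from the command line, a request like this (the task ID is a placeholder) returns the status payloads shown in the images above:

# hypothetical status check for an uploaded video
curl "http://localhost:8000/result/<task-id>"
# while queued:  {"task_id": "...", "status": "PENDING", "result": null}
# when finished: {"task_id": "...", "status": "SUCCESS", "result": {"file_path": "results/<filename>.avi"}}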

For additional information: how do you run it on the VPS? I implemented it with a simple nohup setup. nohup (no hang up) is a command in Unix-based operating systems, like Linux, that allows you to run a process or command in the background, even after you log out or disconnect from a terminal session. Normally, when you close a terminal, the processes running inside it are terminated, but nohup prevents that from happening.

nohup uvicorn api:app --reload --host 0.0.0.0 &
nohup celery -A celery_config.celery_app worker --loglevel=info --concurrency=1 &
nohup celery -A celery_config.celery_app flower --host 0.0.0.0 --loglevel=info &
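To verify the background processes are alive and to follow their logs, something like this works (by default nohup appends output to nohup.out):

# list the running services and tail the combined log
ps aux | grep -E "uvicorn|celery"
tail -f nohup.out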

For a more advanced setup, you can run FastAPI, Celery, and Flower inside Docker; that is the better practice I would recommend.

Where is the task list shown by Flower stored? It is stored in Redis. For example, you can see it in the image below.

Ex. Redis Task Celery
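If you want to inspect those entries yourself, Celery's Redis result backend stores task metadata under a default key prefix. A minimal sketch, where the task ID is a placeholder:

# list Celery result keys and read one task's metadata
redis-cli KEYS "celery-task-meta-*"
redis-cli GET "celery-task-meta-<task-id>"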

Happy to explore! :)


Written by M Sadewa Wicaksana

Artificial Intelligence and Fullstack Engineering Enthusiast and Still Learning
