Recently, I was working on my final-year thesis, which proposes an ecosystem for precision farming using IoT, environmental data, and embedded machine learning & Edge computing. The main idea is simple but powerful: create a low-cost platform for farmers where they can analyze their own fields, decide which crops to cultivate, and determine the right fertilizers — all without relying on expensive cloud infrastructure. Farmers can still send data to the cloud for long-term storage and deeper analysis, but the system is designed to work off-grid, remain sustainable, and minimize recurring costs.
The key to keeping costs down was processing data directly on the device. This meant integrating embedded machine learning so that the collected environmental data could be processed onboard. Farmers could get instant results without an internet connection. To demonstrate this concept, I needed to build a portable sensor module with onboard ML capabilities. That’s when my exploration of embedded machine learning began.
This is the idea, basically.
For the hardware, my primary choice was the ESP32. It’s IoT-friendly, with both BLE and Wi-Fi built in, and it packs a surprising amount of processing power thanks to its dual-core Tensilica Xtensa LX6 processor. With 4MB of flash and 520KB of SRAM, it’s more than capable of many ML inference tasks, and for an embedded device that’s plenty of room to get started. Most importantly, it is cheap: you can get a generic ESP32 module for around 3 USD or 450 BDT, which is a lot of features for the price.
But then came the big question: how do I actually run a machine learning model on such a tiny device? At first, this felt overwhelming. Unlike on a laptop where you can just load a model into Python, microcontrollers have strict memory limits and can’t handle big frameworks. So I started exploring the available approaches to embedded ML.
The Ways Around Embedded Machine Learning & TinyML
My friend Navid collected the dataset and worked on the paper, so the hardware and programming responsibilities were mine. To implement a model using his dataset, I had to look into the available options. After some searching on Google and Bing and a few ChatGPT sessions, I found a workable path into embedded machine learning and TinyML.
At this point you might be wondering: what is the difference between TinyML and embedded ML?
TinyML targets ultra-low-power microcontrollers such as Arduino boards, the ESP32, and the STM32. These microcontrollers draw less than 1W and have limited resources like SRAM and flash memory. Embedded ML is machine learning for embedded systems in general, including single-board computers like the Raspberry Pi or Jetson Nano, which are a lot more powerful than microcontrollers.
Think of it like layers: Machine Learning ⟶ Embedded ML (for embedded devices) ⟶ TinyML (for ultra-low-power microcontrollers)
At first, I got stuck with the TensorFlow Lite for Microcontrollers (TFLM) framework. It required building and training my model, then converting it into a .tflite file. The TFLite format is basically the model itself with the extra dependencies stripped out, and the values can be quantized to make it lighter. After that, the TFLite model has to be converted into a C array so it can be embedded in the ESP32 firmware. The ESP32 can't simply read a .tflite file from an SD card and run it, and loading it that way would also require a lot of memory. So this whole process is necessary for the ESP32 to be able to read the model data.
This process was fine in theory, but in practice I ran into many problems and got really frustrated. Then my friend Muntakim gave me some articles and blog posts to read about embedded machine learning on the ESP32, and my understanding of TinyML became much clearer. Here are the links to the blogs:
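To make the "convert the model into a C array" step concrete, here is a minimal Python sketch of what that conversion amounts to. It is not the TFLM tooling itself. The function name `bytes_to_c_array`, the variable name `model_data`, and the stand-in bytes are all assumptions made for illustration. A real run would read the bytes from the actual .tflite file.

```python
def bytes_to_c_array(data: bytes, var_name: str = "model_data", per_line: int = 12) -> str:
    """Render raw bytes as a C array definition that can be pasted into a header."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(data)};\n"
    )


# Stand-in bytes for illustration only; a real run would use something like
# open("model.tflite", "rb").read() (hypothetical file name).
fake_model = bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33])
header_text = bytes_to_c_array(fake_model)
print(header_text)
```

The resulting text is compiled into the firmware, so the model lives in flash next to the program instead of being loaded from a file at runtime.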
- TinyML — Random Forest (Classifier and Regressor) | by Thommaskevin | Medium
- TinyML — XGBoost (Classifier). From mathematical foundations to… | by Thommaskevin | Medium
- TinyML —K-Nearest Neighbors (KNN-Classifier) | by Thommaskevin | Medium
These three articles have enough information to get started with TinyML, and that's what I did. Since my goal was to take soil sensor measurements and classify which fertilizer to recommend to the user, I already had a good idea of what to do next.
And yes, I've uploaded all the code from this blog to my GitHub, so you can check it out there.
Preparing The Dataset & The Model
The Environment
pip install numpy pandas scikit-learn micromlgen
Each package is necessary for a different part of the workflow. numpy is used for mathematical operations in Python, such as arrays, matrices, preprocessing inputs, and vectorized operations. pandas is used for data manipulation and analysis. Our dataset is a CSV file, and we use pandas to read and process it. scikit-learn is used for training our models. It contains ML algorithms along with tools for splitting datasets, scaling data, evaluating trained models, and more. The last one is micromlgen, which converts our scikit-learn models into plain C code so that they can run on microcontrollers.
Preparing My Dataset
feature1,feature2,...,featureN,label
value11,value12,...,value1N,class1
value21,value22,...,value2N,class2
In my dataset, there are two types of data. Temperature, humidity, moisture, and the NPK measurements are numeric. The other columns (soil type, crop type, and the fertilizer name) are strings, i.e. categorical data.
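Since the model can only work with numbers, the categorical columns have to be turned into integer indexes before training. The sketch below mimics that step in plain Python by sorting the unique labels and numbering them, which is also the ordering scikit-learn's `LabelEncoder` produces. The helper name `fit_label_mapping` and the sample values are assumptions for illustration, not part of the actual dataset.

```python
def fit_label_mapping(values):
    """Map each distinct label to an integer, in sorted order of the labels."""
    classes = sorted(set(values))
    return {label: idx for idx, label in enumerate(classes)}


# Illustrative soil-type column (made-up rows, not the real dataset)
soil_column = ["Sandy", "Loamy", "Black", "Red", "Clayey", "Loamy"]

soil_mapping = fit_label_mapping(soil_column)
encoded = [soil_mapping[value] for value in soil_column]

print(soil_mapping)
print(encoded)
```

The important detail is that the index depends on the sorted order of the labels, not on the order in which they appear in the file, so the same mapping has to be used later on the device.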
Writing The Script, preparing the model
# train_model.py
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from micromlgen import port

# Load dataset
df = pd.read_csv("Fertilizer_Prediction.csv")

# Strip whitespace from column names
df.columns = df.columns.str.strip()

# Encode categorical columns as integer indexes
label_encoders = {}
categorical_columns = ['Soil Type', 'Crop Type', 'Fertilizer Name']
for col in categorical_columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

# Prepare features and target
X = df.drop(columns=['Fertilizer Name']).values
y = df['Fertilizer Name'].values

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train using Random Forest (better accuracy)
model = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Evaluate accuracy
print("Accuracy:", model.score(X_test, y_test))

# Export the trained model as a header file using micromlgen.
# The file name matches the #include used in the ESP32 sketch.
try:
    c_code = port(model)
    with open("fertilizer_model_micromlgen_RandomForest.h", "w") as f:
        f.write(c_code)
except Exception as e:
    print("Export error:", e)

# Print the encoded index for every label of every categorical column
for col in categorical_columns:
    print(f"\nIndex Mapping for {col}:")
    for idx, label in enumerate(label_encoders[col].classes_):
        print(f"{idx}: {label}")
721 lines of just decision logic, and this is only the header file for the model.
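To give a feel for what those hundreds of lines of decision logic do, here is a rough Python sketch of random forest inference: each tree is a chain of threshold checks that returns a class index, and the forest returns the class with the most votes. The three tiny trees, their thresholds, and the two-feature input are invented for illustration and have nothing to do with the real exported model.

```python
def tree_a(x):
    # x = [nitrogen, moisture]; thresholds are made up for this sketch
    if x[0] <= 20.0:
        return 1 if x[1] <= 35.0 else 0
    return 2


def tree_b(x):
    if x[1] <= 30.0:
        return 1
    return 2 if x[0] > 25.0 else 0


def tree_c(x):
    return 2 if x[0] > 15.0 else 1


def forest_predict(x, trees, n_classes):
    """Let every tree vote for a class and return the class with the most votes."""
    votes = [0] * n_classes
    for tree in trees:
        votes[tree(x)] += 1
    # On a tie, the lowest class index wins
    return max(range(n_classes), key=lambda c: votes[c])


trees = [tree_a, tree_b, tree_c]
prediction = forest_predict([30.0, 40.0], trees, n_classes=3)
print(prediction)
```

A real forest with 10 trees of depth 5 simply has many more of these nested comparisons, which is why the generated header gets so long while still needing very little RAM.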
Why print all the Indexes?
py train_model.py
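The exported model only returns integer class indexes, so the firmware needs its own lookup tables to translate between indexes and names, and those tables must follow the same order the training script printed. One way to avoid copying them by hand is to generate the C array from the class list. This is a small sketch under that assumption. The helper `to_c_string_array` is hypothetical, and in the real script the labels would come from `label_encoders[col].classes_` rather than a hard-coded list.

```python
def to_c_string_array(name, labels):
    """Build a C array of strings whose positions match the encoded indexes."""
    rows = [f'  "{label}",  // {idx}' for idx, label in enumerate(labels)]
    return f"const char *{name}[] = {{\n" + "\n".join(rows) + "\n};"


# Soil types in the order of their encoded indexes
soil_classes = ["Black", "Clayey", "Loamy", "Red", "Sandy"]

snippet = to_c_string_array("soilTypes", soil_classes)
print(snippet)
```

If the dataset ever gains a new soil or crop type, regenerating the arrays this way keeps the firmware in step with the retrained model.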
Writing the Code for ESP32
#include <Arduino.h>
#include "fertilizer_model_micromlgen_RandomForest.h"

// Classifier generated by micromlgen, declared in the header above.
// Without this line the compiler reports that 'model' is not declared.
Eloquent::ML::Port::RandomForest model;

// Fertilizer index → name
const char *fertilizerNames[] = {
  "10-26-26", "14-35-14", "17-17-17", "20-20", "28-28", "DAP", "Urea"
};

// Soil Type name → index mapping (same as in training)
const char *soilTypes[] = {
  "Black",   // 0
  "Clayey",  // 1
  "Loamy",   // 2
  "Red",     // 3
  "Sandy"    // 4
};

const int NUM_SOIL_TYPES = sizeof(soilTypes) / sizeof(soilTypes[0]);
const int NUM_FERTILIZERS = sizeof(fertilizerNames) / sizeof(fertilizerNames[0]);

int selectedSoilType = -1;
bool soilTypeSet = false;
String inputString = "";
int stage = 0;
float N, P, K;

void askSoilType() {
  Serial.println("Select Soil Type by index:");
  for (int i = 0; i < NUM_SOIL_TYPES; i++) {
    Serial.print(i);
    Serial.print(" → ");
    Serial.println(soilTypes[i]);
  }
  Serial.print("Enter soil type index: ");
}

void setup() {
  Serial.begin(115200);
  delay(1000);
  askSoilType();
}

void loop() {
  if (Serial.available()) {
    char c = Serial.read();
    if (c == '\n' || c == '\r') {
      inputString.trim();
      if (inputString.length() == 0) {
        return;  // ignore empty lines, e.g. the second half of a CRLF line ending
      }
      if (!soilTypeSet) {
        selectedSoilType = inputString.toInt();
        if (selectedSoilType >= 0 && selectedSoilType < NUM_SOIL_TYPES) {
          soilTypeSet = true;
          Serial.print("Selected Soil Type: ");
          Serial.println(soilTypes[selectedSoilType]);
          Serial.println("Enter Nitrogen value (N): ");
        } else {
          Serial.println("Invalid index. Try again.");
          askSoilType();
        }
      } else {
        switch (stage) {
          case 0:
            N = inputString.toFloat();
            Serial.println("Enter Phosphorous value (P): ");
            stage++;
            break;
          case 1:
            P = inputString.toFloat();
            Serial.println("Enter Potassium value (K): ");
            stage++;
            break;
          case 2: {
            K = inputString.toFloat();
            // All inputs are collected, now predict
            Serial.println("Predicting fertilizer...");

            // Default/sample values for the features we don't ask for
            float temperature = 30;
            float humidity = 50;
            float moisture = 40;
            int cropType = 1;  // Example crop (should match label-encoded index from training)

            // Feature order must match the training columns
            float input[] = {
              temperature, humidity, moisture,
              (float)selectedSoilType, (float)cropType,
              N, K, P
            };
            int prediction = model.predict(input);

            Serial.print("Predicted Fertilizer Index: ");
            Serial.println(prediction);
            if (prediction >= 0 && prediction < NUM_FERTILIZERS) {
              Serial.print("Predicted Fertilizer Name: ");
              Serial.println(fertilizerNames[prediction]);
            } else {
              Serial.println("Prediction out of range.");
            }

            // Reset for next run
            stage = 0;
            soilTypeSet = false;
            Serial.println("\n--- Restarting ---");
            askSoilType();
            break;
          }
        }
      }
      inputString = "";  // clear buffer
    } else {
      inputString += c;
    }
  }
}
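One detail in the firmware that is easy to get wrong is the order of the values in the input array: the sketch passes temperature, humidity, moisture, soil type, crop type, N, K, P, with potassium before phosphorous, and the model expects exactly the order it was trained on. A small Python sketch of the same idea, keeping the order in one place and building the vector from named readings, could look like this. The feature names and the `build_input` helper are assumptions for illustration, and the order is simply mirrored from the firmware code.

```python
# Feature order mirrored from the input array in the ESP32 sketch
FEATURE_ORDER = [
    "temperature", "humidity", "moisture",
    "soil_type", "crop_type",
    "nitrogen", "potassium", "phosphorous",
]


def build_input(readings):
    """Return the feature vector in training order, failing loudly on gaps."""
    missing = [name for name in FEATURE_ORDER if name not in readings]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(readings[name]) for name in FEATURE_ORDER]


# Same sample values as the serial session in this post
readings = {
    "nitrogen": 45, "phosphorous": 30, "potassium": 25,
    "temperature": 30, "humidity": 50, "moisture": 40,
    "soil_type": 2, "crop_type": 1,
}
vector = build_input(readings)
print(vector)
```

Swapping two features would not cause a compile error or a crash on the device; the model would just quietly return worse predictions, which is why it is worth checking.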
Why does it say model is not defined or declared?
Eloquent::ML::Port::RandomForest model;
Select Soil Type by index:
0 → Black
1 → Clayey
2 → Loamy
3 → Red
4 → Sandy
Enter soil type index: 2
Selected Soil Type: Loamy
Enter Nitrogen value (N): 45
Enter Phosphorous value (P): 30
Enter Potassium value (K): 25
Predicting fertilizer...
Predicted Fertilizer Index: 6
Predicted Fertilizer Name: Urea

--- Restarting ---
Select Soil Type by index:
0 → Black
1 → Clayey
2 → Loamy
3 → Red
4 → Sandy
Enter soil type index: