imageomics/x3d-BaboonLand

@@ -17,51 +17,56 @@ tags:
 - UAV
 - drone
 - video
-model_description: "Behavior recognition model for in situ drone videos of baboons, built using X3D model. It is trained on the BaboonLand mini-scene dataset, which is comprised of 20 hours of aerial video footage of baboons captured using a DJI Mavic 2S drone."
 ---
-# Model Card for X3D-KABR-Kinetics
-x3d-BaboonLand is a behavior recognition model for in situ drone videos of zbaboons,
-built using X3D model.
-It is trained on the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset.
-It includes both spatiotemporal (i.e., mini-scenes) and behavior annotations provided by an expert
-behavioral ecologist.
 ## Model Details
 ### Model Description
 - **Developed by:** Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Daniel Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart
 - **Model type:** X3D-L
 - **License:** MIT
 - **Fine-tuned from model:** [X3D-L](https://github.com/facebookresearch/SlowFast/blob/main/configs/Kinetics/X3D_L.yaml)
-This model was developed for the benefit of the community as an open-source product, thus we request that any derivative products are also open-source.
 ### Model Sources
-- **Repository:** [Project Repo](https://github.com/Imageomics/kabr-tools)
-- **Paper:** [Paper Link](https://link.springer.com/article/10.1007/s11263-025-02493-5)
 - **Project Page:** [BaboonLand Project Page](https://baboonland.xyz)
 ## Uses
-Baboon behavior recognition form in situ drone videos.
 ### Out-of-Scope Use
-This model was trained to detect and classify behavior from drone videos of baboons in Kenya. It may not perform well on other species or settings.
 ## How to Get Started with the Model
-Please see the illustrative examples in the [kabr-tools docs](https://imageomics.github.io/kabr-tools/)
-for more information on how this model can be used.
 ## Training Details
-We include the configuration file ([config.yaml](https://huggingface.co/imageomics/x3d-BaboonLand/blob/main/config.yaml)) utilized by SlowFast for X3D model training.
 ### Training Data
@@ -69,17 +74,17 @@ This model was trained on the [BaboonLand](https://huggingface.co/datasets/image
 #### Training Hyperparameters
-The model was trained for 120 epochs, using a batch size of 5.
-We used the EQL loss function to address the long-tailed class distribution and SGD optimizer with a learning rate of 1e5.
-We used a sample rate of 16x5, and random weight initialization.
 ## Evaluation
-The dataset was evaluated on the X3D-L model utilizing the [SlowFast](https://github.com/facebookresearch/SlowFast) framework, specifically utilizing the [test_net script](https://github.com/facebookresearch/SlowFast/blob/main/tools/test_net.py).
 ### Testing Data
-We provide a train-test split of the mini-scenes from the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) for evaluation purposes, with 75% for train and 25% for testing. No mini-scene was divided by the split.
 #### Metrics
@@ -87,17 +92,17 @@ We report Top-1, Top-3, and Top-5 macro-scores. For full details, please refer t
 **Micro-Average (Per Instance) Scores**
-| WI       | BS | Top-1 | Top-3 | Top-5 |
-|----------|----|----------|----------|----------|
-|  Random  | 5 | **64.89**   | **92.54**| **96.66**|
 ### Model Architecture and Objective
-Please see the [Base Model Description](https://arxiv.org/pdf/2004.04730).
 #### Hardware
-Running the X3D model requires a modern NVIDIA GPU with CUDA support. X3D-L is designed to be computationally efficient, and requires 10–16 GB of GPU memory during training.
 ## Citation

 - UAV
 - drone
 - video
+model_description: "Behavior recognition model for in situ drone videos of baboons, built using an X3D model. It was trained on the BaboonLand mini-scene dataset, which is comprised of 20 hours of aerial video footage of baboons captured using a DJI Mavic 2S drone."
 ---
+# Model Card for x3d-BaboonLand
+x3d-BaboonLand is a behavior recognition model for in situ drone videos of baboons, built using the X3D architecture. It was trained on the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset, which includes both spatiotemporal clips (mini-scenes) and behavior annotations provided by an expert behavioral ecologist.
 ## Model Details
 ### Model Description
 - **Developed by:** Isla Duporge, Maksim Kholiavchenko, Roi Harel, Scott Wolf, Daniel Rubenstein, Meg Crofoot, Tanya Berger-Wolf, Stephen Lee, Julie Barreau, Jenna Kline, Michelle Ramirez, Charles Stewart
 - **Model type:** X3D-L
 - **License:** MIT
 - **Fine-tuned from model:** [X3D-L](https://github.com/facebookresearch/SlowFast/blob/main/configs/Kinetics/X3D_L.yaml)
+This model was developed for the benefit of the community as an open-source product; we request that derivative products also remain open-source.
 ### Model Sources
+- **Repository:** [kabr-tools](https://github.com/Imageomics/kabr-tools)
+- **BaboonLand scripts:** [BaboonLand/scripts](https://huggingface.co/datasets/imageomics/BaboonLand/tree/main/BaboonLand/scripts)
+- **Paper:** [BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos](https://link.springer.com/article/10.1007/s11263-025-02493-5)
 - **Project Page:** [BaboonLand Project Page](https://baboonland.xyz)
+### Data Processing Software
+The [kabr-tools](https://github.com/Imageomics/kabr-tools) repository is the primary open-source package used as the basis for processing and formatting data for behavior-recognition workflows. For BaboonLand, we did **not** duplicate the full codebase into this model repository. Instead, we used the `kabr-tools` workflow with BaboonLand-specific inputs and lightweight script adaptations.
+In particular, several scripts used for BaboonLand were derived from `kabr-tools` utilities, but were adapted for this dataset and renamed for clarity. The resulting BaboonLand-specific scripts are provided here:
+- [BaboonLand/scripts](https://huggingface.co/datasets/imageomics/BaboonLand/tree/main/BaboonLand/scripts)
+These scripts document the dataset-specific preprocessing used for BaboonLand, while `kabr-tools` remains the main reference implementation for the broader workflow.
 ## Uses
+This model is intended for baboon behavior recognition from in situ drone videos.
 ### Out-of-Scope Use
+This model was trained to classify behavior from drone videos of baboons in Kenya. It may not perform well for other species, environments, camera viewpoints, annotation schemes, or behavior taxonomies.
 ## How to Get Started with the Model
+Please see the illustrative examples in the [kabr-tools docs](https://imageomics.github.io/kabr-tools) for the general workflow.
 ## Training Details
+We include the configuration file ([config.yaml](https://huggingface.co/imageomics/x3d-BaboonLand/blob/main/config.yaml)) used for X3D training in SlowFast.
 ### Training Data
 #### Training Hyperparameters
+The model was trained for 120 epochs using a batch size of 5.
+We used the EQL loss function to address the long-tailed class distribution and SGD optimization with a learning rate of `1e-5`.
+We used a sample rate of `16x5` and random weight initialization.
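As a rough illustration of what a `16x5` sample rate implies, the sketch below assumes the notation means 16 frames per clip taken with a temporal stride of 5, which is how the SlowFast X3D-L configuration is usually read. The function name `clip_frame_indices` and its parameters are hypothetical, and this is a simplification rather than the frame sampler that SlowFast itself uses.

```python
def clip_frame_indices(start, num_frames=16, sampling_rate=5, video_len=None):
    """Return the frame indices covered by one clip.

    Assumption: "16x5" = 16 frames, one taken every 5 source frames.
    """
    indices = [start + i * sampling_rate for i in range(num_frames)]
    if video_len is not None:
        # Clamp to the last valid frame when the clip runs past the end.
        indices = [min(i, video_len - 1) for i in indices]
    return indices


if __name__ == "__main__":
    frames = clip_frame_indices(0)
    # Under this reading, one clip spans 76 source frames (indices 0 through 75).
    print(len(frames), frames[0], frames[-1])
```

Under this assumption, each clip sees a longer stretch of behavior than 16 consecutive frames would, at the same compute cost per clip.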
 ## Evaluation
+The model was evaluated using the [SlowFast](https://github.com/facebookresearch/SlowFast) framework, specifically the [test_net.py](https://github.com/facebookresearch/SlowFast/blob/main/tools/test_net.py) evaluation script.
 ### Testing Data
+We provide a train-test split of the mini-scenes from the [BaboonLand](https://huggingface.co/datasets/imageomics/BaboonLand) dataset for evaluation, with 75% used for training and 25% for testing. No mini-scene was split across train and test partitions.
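The property that no mini-scene is split across partitions can be sketched as a group-aware split. The procedure actually used to produce the BaboonLand split is not described here, so the code below is only a minimal illustration: `split_by_miniscene`, the `(sample_id, miniscene_id)` input format, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_miniscene(samples, test_fraction=0.25, seed=0):
    """Split samples so that each mini-scene lands entirely in train or test.

    samples: iterable of (sample_id, miniscene_id) pairs (hypothetical format).
    """
    groups = defaultdict(list)
    for sample_id, miniscene_id in samples:
        groups[miniscene_id].append(sample_id)

    # Shuffle mini-scene IDs, not individual samples, to keep groups intact.
    miniscene_ids = sorted(groups)
    random.Random(seed).shuffle(miniscene_ids)

    n_test = max(1, round(len(miniscene_ids) * test_fraction))
    test_ids = set(miniscene_ids[:n_test])

    train = [s for g in miniscene_ids if g not in test_ids for s in groups[g]]
    test = [s for g in miniscene_ids if g in test_ids for s in groups[g]]
    return train, test, test_ids


if __name__ == "__main__":
    demo = [(f"clip{g}_{i}", f"scene{g}") for g in range(8) for i in range(3)]
    train, test, test_ids = split_by_miniscene(demo)
    print(len(train), len(test), sorted(test_ids))
```

Splitting at the mini-scene level avoids leaking near-identical frames of the same animal and moment into both partitions, which would inflate test scores.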
 #### Metrics
 **Micro-Average (Per Instance) Scores**
+| WI      | BS | Top-1 | Top-3 | Top-5 |
+|---------|----|------:|------:|------:|
+| Random  | 5  | 64.89 | 92.54 | 96.66 |
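For readers comparing micro-average (per instance) and macro-average (per class) Top-k scores, the following sketch shows how the two differ on a toy example. It is not the SlowFast evaluation code; the function names and the toy scores are hypothetical.

```python
from collections import defaultdict


def topk_hits(scores, labels, k):
    """For each sample, check whether the true label is among the k highest scores."""
    hits = []
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits.append(label in ranked[:k])
    return hits


def micro_topk(scores, labels, k):
    """Per-instance accuracy: every sample counts equally."""
    hits = topk_hits(scores, labels, k)
    return sum(hits) / len(hits)


def macro_topk(scores, labels, k):
    """Per-class accuracy averaged over classes: every class counts equally."""
    per_class = defaultdict(list)
    for hit, label in zip(topk_hits(scores, labels, k), labels):
        per_class[label].append(hit)
    return sum(sum(h) / len(h) for h in per_class.values()) / len(per_class)


if __name__ == "__main__":
    # Toy data: three samples of a common class 0, one sample of a rare class 1.
    scores = [
        [0.9, 0.05, 0.05],
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
    ]
    labels = [0, 0, 0, 1]
    print(micro_topk(scores, labels, 1), macro_topk(scores, labels, 1))
```

On long-tailed data such as behavior labels, the micro score is dominated by frequent classes, while the macro score exposes weak performance on rare ones.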
 ### Model Architecture and Objective
+Please see the [base model description](https://arxiv.org/pdf/2004.04730).
 #### Hardware
+Running the X3D-L model requires a modern NVIDIA GPU with CUDA support. X3D-L is designed to be computationally efficient and typically requires 10–16 GB of GPU memory during training.
 ## Citation