Datasets

Provide NEO with access to your data using multiple methods and formats

NEO handles real-world data automatically, supporting various access methods, file formats, and data sources.

Supported File Formats

Format	Type	Description	Platform	VS Code
CSV	Tabular	Standard format for tabular data and time series	✅	✅
Parquet	Tabular	Recommended for datasets >100MB	✅	✅
JSON	Structured	Ideal for logs and nested data structures	✅	✅
Excel	Tabular	Business data and reports	✅ (upload only)	✅
Images	Visual	JPG, PNG, TIFF formats (ZIP)	✅ (ZIP)	✅
Audio	Audio	WAV, MP3, FLAC formats (ZIP)	✅ (ZIP)	✅
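
Since image and audio datasets are listed as ZIP archives, a small script can bundle a folder of media files before you reference it in a task. The following is a minimal sketch using only the Python standard library. The function name `bundle_media` and the extra `.jpeg` and `.tif` spellings are assumptions made for illustration and are not part of NEO.

```python
import zipfile
from pathlib import Path

# JPG, PNG, and TIFF come from the format table; the ".jpeg" and ".tif"
# spellings are an assumption added for convenience.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def bundle_media(source_dir, archive_path, extensions=IMAGE_EXTENSIONS):
    """Write every matching file under source_dir into one ZIP archive.

    Returns the archive member names that were written.
    """
    source = Path(source_dir)
    members = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file() and path.suffix.lower() in extensions:
                # Store paths relative to source_dir so the archive
                # contains no absolute paths.
                name = path.relative_to(source).as_posix()
                archive.write(path, arcname=name)
                members.append(name)
    return members
```

Calling `bundle_media("images", "product_images.zip")` would collect the images under a hypothetical `images` folder. For audio, pass `extensions={".wav", ".mp3", ".flac"}`.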

Data Access Methods

Platform Mode

Method	Description	Limits and Notes
File Upload	Drag and drop directly in chat	Max 50MB per file
Public URLs	Reference public dataset URLs	No size limit
Cloud Storage	S3, GCS, Azure via Secrets Manager	No size limit
GitHub	Access repository datasets	Public and private repos
Kaggle	Competition datasets	Via API
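
To decide between direct upload and the other methods, it can help to check a file's size against the 50MB upload cap first. The sketch below is illustrative only. The function names are hypothetical, and treating 50MB as 50,000,000 bytes is a conservative assumption, since the table does not say whether the limit is decimal or binary.

```python
import os

# "Max 50MB per file" from the table above. Decimal megabytes are assumed
# here because that is the stricter reading of the limit.
PLATFORM_UPLOAD_LIMIT_BYTES = 50 * 1000 * 1000


def suggest_access_method(size_bytes):
    """Suggest an access method for a file of the given size in bytes."""
    if size_bytes <= PLATFORM_UPLOAD_LIMIT_BYTES:
        return "File Upload"
    return "Cloud Storage or VS Code Extension"


def suggest_for_file(path):
    """Look up the size of a local file and suggest an access method."""
    return suggest_access_method(os.path.getsize(path))
```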

VS Code Extension

Local Files - Place datasets in your workspace folder. No size limits apply.

Integrated Providers:

Provider	Use Case
Amazon S3	Datasets and model checkpoints
Weights & Biases	Experiment tracking and artifacts
Hugging Face	Model hub access
Kaggle	Competition data
GitHub	Repository datasets

Quick Setup Guide

Step	Action	Details
Step 1	Choose Access Method	Upload, URL, cloud, or local files
Step 2	Prepare Your Data	Use supported formats (CSV, Parquet, JSON, etc.)
Step 3	Reference in Task	Include file path or URL in your task description

Example Tasks

CSV Dataset Example

Analyze the retail sales data in sales_data.csv (columns: date, product_id,
quantity, price, store_id) and forecast demand for each product category.
Include confidence intervals.

Parquet Dataset Example

Use the large transaction dataset in transactions.parquet (10M+ records)
to detect fraudulent transactions. Optimize for precision to minimize
false positives.

Cloud Storage Example

Analyze customer feedback from s3://company-data/feedback/2024/ and
perform sentiment analysis. Generate monthly sentiment trends.

Multi-Source Example

Combine customer_data.parquet, transactions.json, and product_images.zip
to build a personalized recommendation engine.

Best Practices

Start Small - Test with samples under 10MB before scaling
Clean Column Names - Use descriptive, consistent naming (e.g., customer_id not “Customer ID”)
Standardize Dates - Use ISO 8601 format (e.g., 2024-03-15) for consistent parsing
Use Parquet for Large Data - Faster processing, smaller storage footprint
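
The column-name and date practices above can be applied with a few lines of preprocessing before you hand data to NEO. The following is a rough sketch using only the standard library. The helper names `clean_column_name` and `to_iso_date` are hypothetical, and the default input format `%m/%d/%Y` is an assumption about the source data that you would adjust to match your own files.

```python
import re
from datetime import datetime


def clean_column_name(name):
    """Convert a header such as "Customer ID" to snake_case ("customer_id")."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def to_iso_date(value, input_format="%m/%d/%Y"):
    """Parse a date string in a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(value, input_format).date().isoformat()
```

For example, `clean_column_name("Customer ID")` gives `customer_id`, and `to_iso_date("03/15/2024")` gives `2024-03-15`. Passing the format explicitly avoids guessing between day-first and month-first dates.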

Troubleshooting

Issue	Solution
File Not Found	Verify file path, check spelling, ensure file exists
Format Not Supported	Convert to CSV, Parquet, or JSON
File Too Large	Platform uploads are limited to 50MB per file; use cloud storage or the VS Code Extension for larger files
Access Denied	Verify credentials and permissions
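
For local files, several of the issues above can be caught before you submit a task. The sketch below is a hypothetical pre-flight check, not a NEO feature. The extension set is an assumption derived from the Supported File Formats table (with `.xlsx` standing in for Excel), and the read-permission test only covers local file permissions, not cloud credentials.

```python
import os
from pathlib import Path

# Assumed mapping from the Supported File Formats table to file extensions.
SUPPORTED_EXTENSIONS = {".csv", ".parquet", ".json", ".xlsx", ".zip"}


def preflight(path):
    """Return the troubleshooting issues found for a local dataset path.

    An empty list means none of the checks failed.
    """
    candidate = Path(path)
    if not candidate.is_file():
        # Nothing else can be checked if the file is missing.
        return ["File Not Found"]
    problems = []
    if candidate.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("Format Not Supported")
    if not os.access(candidate, os.R_OK):
        # Local read permission only; cloud access is not checked here.
        problems.append("Access Denied")
    return problems
```

Each returned label matches a row in the table, so the result can be used to look up the suggested solution.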

Need Help?

Resource	Description
Getting Started	Learn how to submit your first task
Use Cases	See data examples in action
FAQ	Find answers to common questions