Datasets
Provide NEO with access to your data using multiple methods and formats
NEO handles real-world data automatically, supporting various access methods, file formats, and data sources.
Supported File Formats
| Format | Type | Description | Platform | VS Code |
|---|---|---|---|---|
| CSV | Tabular | Standard format for tabular data and time series | ✅ | ✅ |
| Parquet | Tabular | Recommended for datasets >100MB | ✅ | ✅ |
| JSON | Structured | Ideal for logs and nested data structures | ✅ | ✅ |
| Excel | Tabular | Business data and reports | Upload only ✅ | ✅ |
| Images | Visual | JPG, PNG, TIFF formats (ZIP) | ZIP ✅ | ✅ |
| Audio | Audio | WAV, MP3, FLAC formats (ZIP) | ZIP ✅ | ✅ |
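
The table above indicates that image and audio files reach the Platform as ZIP archives. One way to build such an archive is sketched below with Python's standard `zipfile` module. The function name `zip_media`, the folder names, and the extra `.jpeg`/`.tif` spellings are illustrative assumptions; this page does not say whether NEO expects a particular layout inside the archive.

```python
import zipfile
from pathlib import Path

# Extensions based on the Images row above (.jpeg and .tif are assumed variants).
# Pass a different set, e.g. {".wav", ".mp3", ".flac"}, for audio.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def zip_media(folder, archive_path, extensions=IMAGE_EXTENSIONS):
    """Bundle matching files under `folder` into one ZIP and return their count."""
    folder = Path(folder)
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file() and path.suffix.lower() in extensions:
                # Store paths relative to the folder so the archive stays portable.
                zf.write(path, arcname=path.relative_to(folder).as_posix())
                count += 1
    return count
```

For example, `zip_media("product_images", "product_images.zip")` would collect every matching file under a hypothetical `product_images` folder, including subfolders.
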
Data Access Methods
Platform Mode
| Method | Description | Limit |
|---|---|---|
| File Upload | Drag and drop directly in chat | Max 50MB per file |
| Public URLs | Reference public dataset URLs | Any size |
| Cloud Storage | S3, GCS, Azure via Secrets Manager | No size limit |
| GitHub | Access repository datasets | Public & Private repos |
| Kaggle | Competition datasets | Via API |
VS Code Extension
- Local Files - Place datasets in your workspace folder; there is no size limit.
Integrated Providers:
| Provider | Use Case |
|---|---|
| Amazon S3 | Datasets and model checkpoints |
| Weights & Biases | Experiment tracking and artifacts |
| Hugging Face | Model hub access |
| Kaggle | Competition data |
| GitHub | Repository datasets |
Quick Setup Guide
| Step | Action | Details |
|---|---|---|
| Step 1 | Choose Access Method | Upload, URL, cloud, or local files |
| Step 2 | Prepare Your Data | Use supported formats (CSV, Parquet, JSON, etc.) |
| Step 3 | Reference in Task | Include file path or URL in your task description |
Example Tasks
CSV Dataset Example
Analyze the retail sales data in sales_data.csv (columns: date, product_id,
quantity, price, store_id) and forecast demand for each product category.
Include confidence intervals.
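
The example above names the file and lists its columns in the task text. As a rough sketch, the column list can be read from the CSV header instead of typed by hand. The helper `describe_csv_task` and its wording are hypothetical; nothing on this page suggests NEO requires this exact phrasing.

```python
import csv


def describe_csv_task(csv_path, goal):
    """Return a task description naming the file, its columns, and the goal."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # The first row is assumed to be the header; an empty file yields no columns.
        header = next(csv.reader(f), [])
    columns = ", ".join(name.strip() for name in header)
    return f"Analyze the data in {csv_path} (columns: {columns}) and {goal}"
```
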
Parquet Dataset Example
Use the large transaction dataset in transactions.parquet (10M+ records)
to detect fraudulent transactions. Optimize for precision to minimize
false positives.
Cloud Storage Example
Analyze customer feedback from s3://company-data/feedback/2024/ and
perform sentiment analysis. Generate monthly sentiment trends.
Multi-Source Example
Combine customer_data.parquet, transactions.json, and product_images.zip
to build a personalized recommendation engine.
Best Practices
- Start Small - Test with samples under 10MB before scaling
- Clean Column Names - Use descriptive, consistent naming (e.g., customer_id not “Customer ID”)
- Standardize Dates - Use ISO 8601 format for consistent parsing
- Use Parquet for Large Data - Faster processing, smaller storage footprint
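
The first three practices above can be sketched with the Python standard library alone. All function names here are illustrative, the 10MB default is assumed to mean 10 × 1024 × 1024 bytes, and the sampler works line by line, so it suits CSV files without line breaks inside quoted fields.

```python
import re
from datetime import datetime


def clean_column_name(name):
    """Convert a header such as "Customer ID" to snake_case ("customer_id")."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return name.strip("_").lower()


def to_iso_date(value, input_format):
    """Parse `value` with an explicit format and return an ISO 8601 date string."""
    return datetime.strptime(value, input_format).date().isoformat()


def write_sample(src_path, dst_path, max_bytes=10 * 1024 * 1024):
    """Copy the leading lines of a file, stopping before `max_bytes` is exceeded."""
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if written + len(line) > max_bytes:
                break
            dst.write(line)
            written += len(line)
    return written  # number of bytes copied, header line included
```

For instance, `to_iso_date("03/15/2024", "%m/%d/%Y")` gives `"2024-03-15"`. Passing the input format explicitly avoids guessing between day-first and month-first dates.
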
Troubleshooting
| Issue | Solution |
|---|---|
| File Not Found | Verify file path, check spelling, ensure file exists |
| Format Not Supported | Convert to CSV, Parquet, or JSON |
| File Too Large | Use cloud storage or VS Code Extension |
| Access Denied | Verify credentials and permissions |
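
Several of the issues above can be caught locally before a task is submitted. The sketch below checks a local file against the first three rows of the table. The function `preflight`, the extension list, and the reading of 50MB as 50 × 1024 × 1024 bytes are assumptions made for illustration, and credential problems (Access Denied) are outside its scope.

```python
import os

# Assumed extensions for the formats this page lists; adjust as needed.
SUPPORTED_EXTENSIONS = {".csv", ".parquet", ".json", ".xlsx", ".xls", ".zip"}
UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024  # assumed meaning of "Max 50MB per file"


def preflight(path, upload_limit=UPLOAD_LIMIT_BYTES):
    """Return a list of likely problems with a local dataset file (empty if none)."""
    if not os.path.isfile(path):
        return ["File Not Found: verify the path and spelling"]
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append("Format Not Supported: convert to CSV, Parquet, or JSON")
    if os.path.getsize(path) > upload_limit:
        problems.append("File Too Large: use cloud storage or the VS Code Extension")
    return problems
```

An empty list means none of these three checks failed; it does not guarantee the file parses correctly.
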
Need Help?
| Resource | Description |
|---|---|
| Getting Started | Learn how to submit your first task |
| Use Cases | See data examples in action |
| FAQ | Find answers to common questions |