Evaluations API
The Evaluations API allows you to create, run, and analyze evaluations for your AI workflows on the Lumea platform.
Overview
The Evaluations API is a framework for testing and validating AI workflows. You create evaluation datasets, run them against your workflows, and analyze the results to check the reliability and quality of your AI operations.
Note: Each customer receives a custom deployment with a unique API endpoint. The endpoints described in this documentation should be prefixed with your organization's specific API URL provided during onboarding.
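Because every deployment has its own base URL, client code usually joins that base with the paths documented here. The following is a minimal Python sketch of that join. The base URL `https://lumea.example.com/api` and the helper name `endpoint_url` are placeholders invented for illustration, not part of the API.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join a deployment base URL with a documented endpoint path."""
    # Strip the slashes at the seam so exactly one separates base and path.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical base URL; substitute the one provided during onboarding.
BASE_URL = "https://lumea.example.com/api"
print(endpoint_url(BASE_URL, "/evals/datasets"))
```

Plain string joining is used rather than `urllib.parse.urljoin`, which would drop the last path segment of a base URL that lacks a trailing slash.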
Data Models
EvalDataset Object
Field | Type | Description |
---|---|---|
id | string | Unique identifier for the dataset |
name | string | Name of the dataset |
description | string | Optional description of the dataset |
created_at | string | Timestamp when the dataset was created |
updated_at | string | Timestamp when the dataset was last updated |
archived | boolean | Whether the dataset is archived |
EvalDatasetEntry Object
Field | Type | Description |
---|---|---|
dataset_id | string | ID of the dataset this entry belongs to |
entry_id | string | Unique identifier for the entry within the dataset |
data | object | The content of the dataset entry |
EvalRun Object
Field | Type | Description |
---|---|---|
id | string | Unique identifier for the eval run |
dataset_id | string | ID of the dataset being evaluated |
workflow_id | string | ID of the workflow being evaluated |
eval_workflow_id | string | ID of the evaluation workflow |
eval_workflow_run_id | string | ID of the evaluation workflow run |
comment | string | Optional comment about this evaluation run |
created_at | string | Timestamp when the eval run was created |
updated_at | string | Timestamp when the eval run was last updated |
status | string | Current status of the eval run (the examples in this document show pending, running, and completed) |
EvalRunResult Object
Field | Type | Description |
---|---|---|
id | string | Unique identifier for the result |
eval_run_id | string | ID of the eval run this result belongs to |
entry_id | string | ID of the dataset entry |
result | object | The evaluation result data |
created_at | string | Timestamp when the result was created |
Endpoints
Create Dataset
Creates a new evaluation dataset.
URL: /evals/datasets
Method: POST
Request Body:
{
"name": "Test Dataset",
"description": "Dataset for testing our model"
}
Field | Type | Required | Description |
---|---|---|---|
name | string | Yes | Name of the dataset |
description | string | No | Description of the dataset |
Success Response:
- Code: 200 OK
- Content:
{
"id": "dataset123",
"name": "Test Dataset",
"description": "Dataset for testing our model",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:00:00Z",
"archived": false
}
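As an illustration, the request above could be assembled with Python's standard library as sketched below. The base URL is a placeholder, the `Content-Type: application/json` header is an assumption based on the JSON bodies shown here, and authentication is omitted because this document does not describe it.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the one provided during onboarding.
BASE_URL = "https://lumea.example.com/api"


def create_dataset_request(name, description=None):
    """Build (but do not send) the POST /evals/datasets request."""
    body = {"name": name}
    if description is not None:
        body["description"] = description  # optional field, omitted when unset
    return urllib.request.Request(
        BASE_URL + "/evals/datasets",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_dataset_request("Test Dataset", "Dataset for testing our model")
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen` after adding whatever authentication headers your deployment requires.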
List Datasets
Retrieves a list of evaluation datasets.
URL: /evals/datasets
Method: GET
Query Parameters:
Parameter | Required | Description |
---|---|---|
dataset_name | No | Filter datasets by name |
archived | No | Whether to include archived datasets (default: false) |
Success Response:
- Code: 200 OK
- Content:
[
{
"id": "dataset123",
"name": "Test Dataset",
"description": "Dataset for testing our model",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:00:00Z",
"archived": false
},
{
"id": "dataset456",
"name": "Production Dataset",
"description": "Dataset for production evaluation",
"created_at": "2023-06-02T12:00:00Z",
"updated_at": "2023-06-02T12:00:00Z",
"archived": false
}
]
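The query parameters above can be encoded as in the following sketch. The helper name is hypothetical, and sending booleans as lowercase `true`/`false` is an assumption; this document does not specify the accepted boolean spelling.

```python
from urllib.parse import urlencode


def list_datasets_path(dataset_name=None, archived=None):
    """Return the GET path for listing datasets, with optional filters."""
    params = {}
    if dataset_name is not None:
        params["dataset_name"] = dataset_name
    if archived is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        params["archived"] = "true" if archived else "false"
    query = urlencode(params)
    return "/evals/datasets" + ("?" + query if query else "")


print(list_datasets_path())
print(list_datasets_path(dataset_name="Test Dataset", archived=True))
```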
Get Dataset
Retrieves details of a specific dataset.
URL: /evals/datasets/{dataset_id}
Method: GET
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
Success Response:
- Code: 200 OK
- Content:
{
"id": "dataset123",
"name": "Test Dataset",
"description": "Dataset for testing our model",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:00:00Z",
"archived": false
}
Update Dataset
Updates properties of an existing dataset.
URL: /evals/datasets/{dataset_id}
Method: PUT
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
Request Body:
{
"name": "Updated Dataset Name",
"description": "Updated dataset description"
}
Field | Type | Required | Description |
---|---|---|---|
name | string | No | Updated name of the dataset |
description | string | No | Updated description of the dataset |
Success Response:
- Code: 200 OK
- Content:
{
"id": "dataset123",
"name": "Updated Dataset Name",
"description": "Updated dataset description",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T14:00:00Z",
"archived": false
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Dataset not found"}
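Error bodies throughout this document carry a single `detail` field. A client might extract it as sketched below; the fallback for non-JSON bodies is a defensive assumption, not documented behavior.

```python
import json


def error_detail(body: bytes) -> str:
    """Extract the "detail" message from an error response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not valid JSON (for example, a proxy error page): return the raw text.
        return body.decode("utf-8", errors="replace")
    if isinstance(payload, dict) and "detail" in payload:
        return str(payload["detail"])
    return str(payload)


print(error_detail(b'{"detail": "Dataset not found"}'))
```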
Update Dataset Entries
Updates entries in a dataset.
URL: /evals/datasets/{dataset_id}/entries
Method: PUT
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
Request Body:
{
"entries": [
{
"entry_id": "entry1",
"data": {
"input": "What is the capital of France?",
"expected_output": "Paris"
}
},
{
"entry_id": "entry2",
"data": {
"input": "What is the capital of Italy?",
"expected_output": "Rome"
}
}
]
}
Field | Type | Required | Description |
---|---|---|---|
entries | array | Yes | Array of dataset entries to update |
Success Response:
- Code: 200 OK
- Content:
{
"message": "Entries updated successfully"
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Dataset not found"}
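The request body above can be generated from tabular data, as in this sketch. The `input` and `expected_output` keys mirror the example only; the API describes `data` simply as an object, so your entries may use different keys. The helper name is hypothetical.

```python
import json


def entries_payload(rows):
    """Build the request body from (entry_id, input, expected_output) tuples."""
    return {
        "entries": [
            {
                "entry_id": entry_id,
                # Keys inside "data" follow the example in this document;
                # the API itself only requires "data" to be an object.
                "data": {"input": question, "expected_output": answer},
            }
            for entry_id, question, answer in rows
        ]
    }


payload = entries_payload([
    ("entry1", "What is the capital of France?", "Paris"),
    ("entry2", "What is the capital of Italy?", "Rome"),
])
print(json.dumps(payload, indent=2))
```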
Get Dataset Entries
Retrieves all entries in a dataset.
URL: /evals/datasets/{dataset_id}/entries
Method: GET
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
Success Response:
- Code: 200 OK
- Content:
[
{
"dataset_id": "dataset123",
"entry_id": "entry1",
"data": {
"input": "What is the capital of France?",
"expected_output": "Paris"
}
},
{
"dataset_id": "dataset123",
"entry_id": "entry2",
"data": {
"input": "What is the capital of Italy?",
"expected_output": "Rome"
}
}
]
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Dataset not found"}
Get Dataset Entry
Retrieves a specific entry in a dataset.
URL: /evals/datasets/{dataset_id}/entries/{entry_id}
Method: GET
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
entry_id | The ID of the entry (may contain slashes) |
Success Response:
- Code: 200 OK
- Content:
{
"dataset_id": "dataset123",
"entry_id": "entry1",
"data": {
"input": "What is the capital of France?",
"expected_output": "Paris"
}
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Entry not found"}
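Since `entry_id` may contain slashes, building the path needs some care. The sketch below keeps slashes literal and percent-encodes other reserved characters. Whether the server expects literal or percent-encoded slashes is not stated in this document, so treat the `safe="/"` choice as an assumption to verify against your deployment.

```python
from urllib.parse import quote


def entry_path(dataset_id: str, entry_id: str) -> str:
    """Return the path for a single dataset entry."""
    # Assumption: slashes inside entry_id are sent literally, so "/" stays
    # unescaped while spaces, "?", "#" and similar characters are escaped.
    return "/evals/datasets/{}/entries/{}".format(
        quote(dataset_id, safe=""), quote(entry_id, safe="/")
    )


print(entry_path("dataset123", "geo/capitals/entry 1"))
```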
Delete Dataset Entry
Deletes a specific entry in a dataset.
URL: /evals/datasets/{dataset_id}/entries/{entry_id}
Method: DELETE
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
entry_id | The ID of the entry (may contain slashes) |
Success Response:
- Code: 200 OK
- Content:
{
"message": "Entry deleted successfully"
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Entry not found"}
Start Eval Run
Initiates a new evaluation run.
URL: /evals/runs
Method: POST
Request Body:
{
"dataset_id": "dataset123",
"workflow_id": "workflow123",
"eval_workflow_id": "eval_workflow123",
"comment": "Testing model performance on basic questions",
"start_task": "task1",
"end_at_task": "task3"
}
Field | Type | Required | Description |
---|---|---|---|
dataset_id | string | Yes | ID of the dataset to use for evaluation |
workflow_id | string | No | ID of the workflow to evaluate |
eval_workflow_id | string | No | ID of the evaluation workflow to use |
comment | string | No | Comment about this evaluation run |
start_task | string | No | Task to start from in the workflow |
end_at_task | string | No | Task to end at in the workflow |
Success Response:
- Code: 200 OK
- Content:
{
"id": "evalrun123",
"dataset_id": "dataset123",
"workflow_id": "workflow123",
"eval_workflow_id": "eval_workflow123",
"eval_workflow_run_id": "run123",
"comment": "Testing model performance on basic questions",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:00:00Z",
"status": "pending"
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Error message"}
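Only `dataset_id` is required, so a client can leave out any optional field it does not set. The following sketch builds such a body. It assumes that omitting a field is equivalent to not specifying it, which this document implies but does not state; the helper name is hypothetical.

```python
def start_run_body(dataset_id, workflow_id=None, eval_workflow_id=None,
                   comment=None, start_task=None, end_at_task=None):
    """Build the POST /evals/runs body, omitting unset optional fields."""
    optional = {
        "workflow_id": workflow_id,
        "eval_workflow_id": eval_workflow_id,
        "comment": comment,
        "start_task": start_task,
        "end_at_task": end_at_task,
    }
    body = {"dataset_id": dataset_id}
    # Keep only the optional fields the caller actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


print(start_run_body("dataset123", workflow_id="workflow123"))
```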
List Eval Runs
Retrieves a list of evaluation runs with optional filtering.
URL: /evals/runs
Method: GET
Query Parameters:
Parameter | Required | Description |
---|---|---|
dataset_id | No | Filter runs by dataset ID |
workflow_id | No | Filter runs by workflow ID |
eval_workflow_id | No | Filter runs by evaluation workflow ID |
limit | No | Maximum number of runs to return |
offset | No | Number of runs to skip for pagination |
Success Response:
- Code: 200 OK
- Content:
[
{
"id": "evalrun123",
"dataset_id": "dataset123",
"workflow_id": "workflow123",
"eval_workflow_id": "eval_workflow123",
"eval_workflow_run_id": "run123",
"comment": "Testing model performance on basic questions",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:05:00Z",
"status": "completed"
},
{
"id": "evalrun456",
"dataset_id": "dataset456",
"workflow_id": "workflow456",
"eval_workflow_id": "eval_workflow456",
"eval_workflow_run_id": "run456",
"comment": "Production evaluation",
"created_at": "2023-06-02T12:00:00Z",
"updated_at": "2023-06-02T12:01:00Z",
"status": "running"
}
]
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Error message"}
Get Eval Run
Retrieves details of a specific evaluation run.
URL: /evals/runs/{run_id}
Method: GET
URL Parameters:
Parameter | Description |
---|---|
run_id | The unique ID of the eval run |
Success Response:
- Code: 200 OK
- Content:
{
"id": "evalrun123",
"dataset_id": "dataset123",
"workflow_id": "workflow123",
"eval_workflow_id": "eval_workflow123",
"eval_workflow_run_id": "run123",
"comment": "Testing model performance on basic questions",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:05:00Z",
"status": "completed"
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Error message"}
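A run starts in a non-final state, so clients commonly poll this endpoint until the status settles. The sketch below does that with a bounded number of polls. `fetch_run` is a hypothetical callable wrapping the GET request, and only `completed` is treated as final because it is the only final-looking status shown in this document; add other terminal statuses if your deployment reports them.

```python
import time


def wait_for_run(fetch_run, run_id, terminal=("completed",), interval=5.0,
                 max_polls=120, sleep=time.sleep):
    """Poll an eval run until its status is terminal or polls run out."""
    for _ in range(max_polls):
        run = fetch_run(run_id)
        if run["status"] in terminal:
            return run
        sleep(interval)  # wait before asking again
    raise TimeoutError("eval run {} did not finish in time".format(run_id))


# Stand-in for GET /evals/runs/{run_id} that advances through three statuses.
_statuses = iter(["pending", "running", "completed"])
final = wait_for_run(
    lambda run_id: {"id": run_id, "status": next(_statuses)},
    "evalrun123",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final["status"])
```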
Get Eval Run Batch
Retrieves batch information for a specific evaluation run, such as item counts and batch status.
URL: /evals/runs/{run_id}/batch
Method: GET
URL Parameters:
Parameter | Description |
---|---|
run_id | The unique ID of the eval run |
Success Response:
- Code: 200 OK
- Content:
{
"id": "batch123",
"run_id": "run123",
"total_items": 50,
"completed_items": 25,
"failed_items": 2,
"status": "running",
"created_at": "2023-06-01T12:00:00Z",
"updated_at": "2023-06-01T12:03:00Z"
}
Error Responses:
- Code: 400 Bad Request
- Content:
{"detail": "Eval run not found"}
- Content:
{"detail": "Eval run workflow run not started."}
- Content:
{"detail": "Batch not started yet"}
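The item counts in the batch response can be turned into progress fractions, as sketched below. The two fractions are computed independently because this document does not say whether `completed_items` includes failed items. The helper name is hypothetical.

```python
def batch_progress(batch):
    """Return completed and failed fractions of a batch's total items."""
    total = batch["total_items"]
    if total == 0:
        # Avoid dividing by zero for an empty batch.
        return {"completed": 0.0, "failed": 0.0}
    return {
        "completed": batch["completed_items"] / total,
        "failed": batch["failed_items"] / total,
    }


sample = {"total_items": 50, "completed_items": 25, "failed_items": 2}
print(batch_progress(sample))
```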
Get Eval Run Results
Retrieves the results for a specific evaluation run.
URL: /evals/runs/{run_id}/results
Method: GET
URL Parameters:
Parameter | Description |
---|---|
run_id | The unique ID of the eval run |
Success Response:
- Code: 200 OK
- Content:
[
{
"id": "result123",
"eval_run_id": "evalrun123",
"entry_id": "entry1",
"result": {
"correct": true,
"score": 1.0,
"model_output": "Paris",
"metrics": {
"accuracy": 1.0,
"latency": 0.23
}
},
"created_at": "2023-06-01T12:01:00Z"
},
{
"id": "result124",
"eval_run_id": "evalrun123",
"entry_id": "entry2",
"result": {
"correct": true,
"score": 1.0,
"model_output": "Rome",
"metrics": {
"accuracy": 1.0,
"latency": 0.19
}
},
"created_at": "2023-06-01T12:02:00Z"
}
]
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Eval run not found"}
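Once results are retrieved, they can be aggregated client-side. The sketch below assumes every `result` object has the `correct` and `score` fields shown in the example; the API describes `result` only as an object, so the actual fields depend on your evaluation workflow.

```python
def summarize(results):
    """Aggregate a list of EvalRunResult objects into simple totals."""
    # Assumption: each result carries "score" and "correct" as in the example.
    scores = [item["result"]["score"] for item in results]
    correct = sum(1 for item in results if item["result"]["correct"])
    return {
        "count": len(results),
        "correct": correct,
        "mean_score": sum(scores) / len(scores) if scores else None,
    }


sample_results = [
    {"id": "result123", "entry_id": "entry1", "result": {"correct": True, "score": 1.0}},
    {"id": "result124", "entry_id": "entry2", "result": {"correct": True, "score": 1.0}},
]
print(summarize(sample_results))
```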
Stop Eval Run
Stops a running evaluation.
URL: /evals/runs/{run_id}/stop
Method: POST
URL Parameters:
Parameter | Description |
---|---|
run_id | The unique ID of the eval run |
Success Response:
- Code: 200 OK
- Content:
{
"message": "Eval run stopped successfully"
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Error message"}
Get Eval Result
Retrieves a specific evaluation result.
URL: /evals/runs/results/{result_id}
Method: GET
URL Parameters:
Parameter | Description |
---|---|
result_id | The unique ID of the result |
Success Response:
- Code: 200 OK
- Content:
{
"id": "result123",
"eval_run_id": "evalrun123",
"entry_id": "entry1",
"result": {
"correct": true,
"score": 1.0,
"model_output": "Paris",
"metrics": {
"accuracy": 1.0,
"latency": 0.23
}
},
"created_at": "2023-06-01T12:01:00Z"
}
Error Response:
- Code: 400 Bad Request
- Content:
{"detail": "Error message"}
Archive Dataset
Archives a dataset.
URL: /evals/datasets/{dataset_id}/archive
Method: POST
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
Success Response:
- Code: 200 OK
- Content:
{
"message": "Dataset archived successfully"
}
Error Response:
- Code: 404 Not Found
- Content:
{"detail": "Dataset not found"}
Unarchive Dataset
Unarchives a dataset.
URL: /evals/datasets/{dataset_id}/unarchive
Method: POST
URL Parameters:
Parameter | Description |
---|---|
dataset_id | The unique ID of the dataset |
Success Response:
- Code: 200 OK
- Content:
{
"message": "Dataset unarchived successfully"
}
Error Response:
- Code: 404 Not Found
- Content:
{"detail": "Dataset not found"}