Evaluations API

The Evaluations API allows you to create, run, and analyze evaluations for your AI workflows on the Lumea platform.

Overview

The Evaluations API provides a framework for testing and validating AI workflows. With it you can create evaluation datasets, run evaluations against your workflows, and analyze the results to check the reliability and quality of your AI operations. Evaluating workflows systematically in this way helps maintain trust in your AI systems.

Note: Each customer receives a custom deployment with a unique API endpoint. Prefix every endpoint path in this documentation with your organization's specific API URL, which is provided during onboarding.
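Because every path on this page is relative to a deployment-specific base URL, it can help to keep the joining logic in one place. The following is a minimal sketch only: the helper name `endpoint_url` and the base URL `https://api.example-org.invalid/v1/` are hypothetical and not part of the Lumea API.

```python
from urllib.parse import urlsplit


def endpoint_url(base_url: str, path: str) -> str:
    """Join an organization-specific base URL with an endpoint path from this page."""
    if not urlsplit(base_url).scheme:
        raise ValueError("base_url must include a scheme such as https://")
    # Normalize slashes so exactly one "/" separates the two parts.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical base URL; substitute the one provided during onboarding.
BASE_URL = "https://api.example-org.invalid/v1/"
print(endpoint_url(BASE_URL, "/evals/datasets"))
```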

Data Models

EvalDataset Object

Field	Type	Description
id	string	Unique identifier for the dataset
name	string	Name of the dataset
description	string	Optional description of the dataset
created_at	string	Timestamp when the dataset was created (ISO 8601, as in the examples, e.g. 2023-06-01T12:00:00Z)
updated_at	string	Timestamp when the dataset was last updated (same format as created_at)
archived	boolean	Whether the dataset is archived

EvalDatasetEntry Object

Field	Type	Description
dataset_id	string	ID of the dataset this entry belongs to
entry_id	string	Unique identifier for the entry within the dataset
data	object	The content of the dataset entry

EvalRun Object

Field	Type	Description
id	string	Unique identifier for the eval run
dataset_id	string	ID of the dataset being evaluated
workflow_id	string	ID of the workflow being evaluated
eval_workflow_id	string	ID of the evaluation workflow
eval_workflow_run_id	string	ID of the evaluation workflow run
comment	string	Optional comment about this evaluation run
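created_at	string	Timestamp when the eval run was created (ISO 8601, as in the examples)
updated_at	string	Timestamp when the eval run was last updated (ISO 8601, as in the examples)
status	string	Current status of the eval run (the examples on this page show pending, running, and completed)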

EvalRunResult Object

Field	Type	Description
id	string	Unique identifier for the result
eval_run_id	string	ID of the eval run this result belongs to
entry_id	string	ID of the dataset entry
result	object	The evaluation result data
created_at	string	Timestamp when the result was created
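
On the client side, these objects can be mirrored with small typed containers. The sketch below covers only EvalDataset and assumes the timestamp format shown in the examples on this page (ISO 8601 with a trailing "Z"). The class and function names are illustrative, not part of any official SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    # Older Python versions of datetime.fromisoformat reject a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class EvalDataset:
    id: str
    name: str
    description: Optional[str]
    created_at: datetime
    updated_at: datetime
    archived: bool

    @classmethod
    def from_json(cls, obj: dict) -> "EvalDataset":
        """Build an EvalDataset from a decoded JSON object."""
        return cls(
            id=obj["id"],
            name=obj["name"],
            description=obj.get("description"),  # documented as optional
            created_at=parse_timestamp(obj["created_at"]),
            updated_at=parse_timestamp(obj["updated_at"]),
            archived=obj["archived"],
        )


dataset = EvalDataset.from_json({
    "id": "dataset123",
    "name": "Test Dataset",
    "description": "Dataset for testing our model",
    "created_at": "2023-06-01T12:00:00Z",
    "updated_at": "2023-06-01T12:00:00Z",
    "archived": False,
})
```

The other three objects can follow the same pattern, keeping `data` and `result` as plain dictionaries since their contents are not fixed by this page.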

Endpoints

Create Dataset

Creates a new evaluation dataset.

URL: /evals/datasets

Method: POST

Request Body:

json

{
  "name": "Test Dataset",
  "description": "Dataset for testing our model"
}

Field	Type	Required	Description
name	string	Yes	Name of the dataset
description	string	No	Description of the dataset

Success Response:

Code: 200 OK
Content:

json

{
  "id": "dataset123",
  "name": "Test Dataset",
  "description": "Dataset for testing our model",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:00:00Z",
  "archived": false
}
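
As an illustration, the request above could be assembled with Python's standard library as follows. This is a sketch under stated assumptions: the base URL is a placeholder, and because this page does not describe authentication, the `headers` parameter is a hypothetical hook for whatever credentials your deployment requires.

```python
import json
import urllib.request


def create_dataset_request(base_url, name, description=None, headers=None):
    """Build (but do not send) a POST /evals/datasets request."""
    body = {"name": name}
    if description is not None:
        # description is optional, so include it only when given.
        body["description"] = description
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/evals/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


# Placeholder base URL; use your organization's API URL instead.
req = create_dataset_request(
    "https://api.example-org.invalid",
    "Test Dataset",
    "Dataset for testing our model",
)
```

Sending the request is then a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON response body.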

List Datasets

Retrieves a list of evaluation datasets.

URL: /evals/datasets

Method: GET

Query Parameters:

Parameter	Required	Description
dataset_name	No	Filter datasets by name
archived	No	Whether to include archived datasets (default: false)

Success Response:

Code: 200 OK
Content:

json

[
  {
    "id": "dataset123",
    "name": "Test Dataset",
    "description": "Dataset for testing our model",
    "created_at": "2023-06-01T12:00:00Z",
    "updated_at": "2023-06-01T12:00:00Z",
    "archived": false
  },
  {
    "id": "dataset456",
    "name": "Production Dataset",
    "description": "Dataset for production evaluation",
    "created_at": "2023-06-02T12:00:00Z",
    "updated_at": "2023-06-02T12:00:00Z",
    "archived": false
  }
]
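
The query parameters can be encoded with `urllib.parse.urlencode`. The sketch below assumes that the `archived` flag is sent as the lowercase string `true` or `false`, which this page does not state explicitly. The function name is illustrative.

```python
from urllib.parse import urlencode


def list_datasets_url(base_url, dataset_name=None, archived=None):
    """Build the GET /evals/datasets URL with optional filters."""
    params = {}
    if dataset_name is not None:
        params["dataset_name"] = dataset_name
    if archived is not None:
        # Assumption: booleans are serialized as "true" / "false".
        params["archived"] = "true" if archived else "false"
    url = base_url.rstrip("/") + "/evals/datasets"
    return url + ("?" + urlencode(params) if params else "")


print(list_datasets_url("https://api.example-org.invalid", "Test Dataset", True))
```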

Get Dataset

Retrieves details of a specific dataset.

URL: /evals/datasets/{dataset_id}

Method: GET

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset

Success Response:

Code: 200 OK
Content:

json

{
  "id": "dataset123",
  "name": "Test Dataset",
  "description": "Dataset for testing our model",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:00:00Z",
  "archived": false
}

Update Dataset

Updates properties of an existing dataset.

URL: /evals/datasets/{dataset_id}

Method: PUT

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset

Request Body:

json

{
  "name": "Updated Dataset Name",
  "description": "Updated dataset description"
}

Field	Type	Required	Description
name	string	No	Updated name of the dataset
description	string	No	Updated description of the dataset

Success Response:

Code: 200 OK
Content:

json

{
  "id": "dataset123",
  "name": "Updated Dataset Name",
  "description": "Updated dataset description",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T14:00:00Z",
  "archived": false
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Dataset not found"}
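
Error bodies on this page share the shape `{"detail": "..."}`, so a client can surface the message in one place. The following sketch assumes you already have the HTTP status code and raw response body in hand. `EvalsApiError` and `raise_for_error` are hypothetical names.

```python
import json


class EvalsApiError(Exception):
    """Raised when the API answers with a non-2xx status."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status: int, body: bytes) -> None:
    """Raise EvalsApiError for non-2xx responses, using "detail" when present."""
    if 200 <= status < 300:
        return
    detail = None
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            detail = payload.get("detail")
    except ValueError:
        # Body was not valid JSON; fall back to the raw text below.
        pass
    if detail is None:
        detail = body.decode("utf-8", errors="replace")
    raise EvalsApiError(status, str(detail))
```

Matching on the status code alone may not be enough to tell a missing resource from an invalid request, since several endpoints on this page report "not found" conditions with 400.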

Update Dataset Entries

Updates entries in a dataset.

URL: /evals/datasets/{dataset_id}/entries

Method: PUT

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset

Request Body:

json

{
  "entries": [
    {
      "entry_id": "entry1",
      "data": {
        "input": "What is the capital of France?",
        "expected_output": "Paris"
      }
    },
    {
      "entry_id": "entry2",
      "data": {
        "input": "What is the capital of Italy?",
        "expected_output": "Rome"
      }
    }
  ]
}

Field	Type	Required	Description
entries	array	Yes	Array of dataset entries to update

Success Response:

Code: 200 OK
Content:

json

{
  "message": "Entries updated successfully"
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Dataset not found"}
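
When entries come from another source, such as a spreadsheet or a test suite, the request body can be generated rather than written by hand. The sketch below builds the `entries` array from `(entry_id, data)` pairs and rejects duplicate IDs, on the basis that `entry_id` is documented as unique within a dataset. The function name is illustrative, and the `input` and `expected_output` keys simply follow the example above.

```python
def build_entries_payload(entries):
    """Build the request body for PUT /evals/datasets/{dataset_id}/entries."""
    seen = set()
    payload_entries = []
    for entry_id, data in entries:
        if entry_id in seen:
            raise ValueError(f"duplicate entry_id: {entry_id}")
        seen.add(entry_id)
        # Copy the data so later changes by the caller do not alter the payload.
        payload_entries.append({"entry_id": entry_id, "data": dict(data)})
    return {"entries": payload_entries}


payload = build_entries_payload([
    ("entry1", {"input": "What is the capital of France?", "expected_output": "Paris"}),
    ("entry2", {"input": "What is the capital of Italy?", "expected_output": "Rome"}),
])
```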

Get Dataset Entries

Retrieves all entries in a dataset.

URL: /evals/datasets/{dataset_id}/entries

Method: GET

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset

Success Response:

Code: 200 OK
Content:

json

[
  {
    "dataset_id": "dataset123",
    "entry_id": "entry1",
    "data": {
      "input": "What is the capital of France?",
      "expected_output": "Paris"
    }
  },
  {
    "dataset_id": "dataset123",
    "entry_id": "entry2",
    "data": {
      "input": "What is the capital of Italy?",
      "expected_output": "Rome"
    }
  }
]

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Dataset not found"}

Get Dataset Entry

Retrieves a specific entry in a dataset.

URL: /evals/datasets/{dataset_id}/entries/{entry_id}

Method: GET

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset
entry_id	The ID of the entry (may contain slashes)

Success Response:

Code: 200 OK
Content:

json

{
  "dataset_id": "dataset123",
  "entry_id": "entry1",
  "data": {
    "input": "What is the capital of France?",
    "expected_output": "Paris"
  }
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Entry not found"}
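
Since `entry_id` may contain slashes, building the URL by hand deserves some care. The sketch below rests on an assumption that this page does not confirm: that the server accepts the slashes in `entry_id` as literal path separators, so only other reserved characters are percent-encoded. If your deployment expects slashes to be encoded as `%2F` instead, pass `safe=""` for the entry part as well. The function name is hypothetical.

```python
from urllib.parse import quote


def dataset_entry_url(base_url: str, dataset_id: str, entry_id: str) -> str:
    """Build the URL for a single dataset entry."""
    # dataset_id is one path segment, so encode everything, including "/".
    dataset_part = quote(dataset_id, safe="")
    # Assumption: slashes in entry_id are sent as-is; spaces, "?", "#" and
    # similar characters are percent-encoded.
    entry_part = quote(entry_id, safe="/")
    return f"{base_url.rstrip('/')}/evals/datasets/{dataset_part}/entries/{entry_part}"


print(dataset_entry_url("https://api.example-org.invalid", "dataset123", "geo/europe/entry 1"))
```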

Delete Dataset Entry

Deletes a specific entry in a dataset.

URL: /evals/datasets/{dataset_id}/entries/{entry_id}

Method: DELETE

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset
entry_id	The ID of the entry (may contain slashes)

Success Response:

Code: 200 OK
Content:

json

{
  "message": "Entry deleted successfully"
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Entry not found"}

Start Eval Run

Initiates a new evaluation run.

URL: /evals/runs

Method: POST

Request Body:

json

{
  "dataset_id": "dataset123",
  "workflow_id": "workflow123",
  "eval_workflow_id": "eval_workflow123",
  "comment": "Testing model performance on basic questions",
  "start_task": "task1",
  "end_at_task": "task3"
}

Field	Type	Required	Description
dataset_id	string	Yes	ID of the dataset to use for evaluation
workflow_id	string	No	ID of the workflow to evaluate
eval_workflow_id	string	No	ID of the evaluation workflow to use
comment	string	No	Comment about this evaluation run
start_task	string	No	Task to start from in the workflow
end_at_task	string	No	Task to end at in the workflow

Success Response:

Code: 200 OK
Content:

json

{
  "id": "evalrun123",
  "dataset_id": "dataset123",
  "workflow_id": "workflow123",
  "eval_workflow_id": "eval_workflow123",
  "eval_workflow_run_id": "run123",
  "comment": "Testing model performance on basic questions",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:00:00Z",
  "status": "pending"
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Error message"}
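
Only `dataset_id` is required, so a client will typically want to omit the optional fields it does not set rather than send them as null. A minimal sketch, assuming the server treats an absent field as "not provided" (the function name is illustrative):

```python
def start_eval_run_body(
    dataset_id,
    workflow_id=None,
    eval_workflow_id=None,
    comment=None,
    start_task=None,
    end_at_task=None,
):
    """Build the request body for POST /evals/runs, skipping unset optional fields."""
    if not dataset_id:
        raise ValueError("dataset_id is required")
    optional = {
        "workflow_id": workflow_id,
        "eval_workflow_id": eval_workflow_id,
        "comment": comment,
        "start_task": start_task,
        "end_at_task": end_at_task,
    }
    body = {"dataset_id": dataset_id}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = start_eval_run_body("dataset123", workflow_id="workflow123", comment="Smoke test")
```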

List Eval Runs

Retrieves a list of evaluation runs with optional filtering.

URL: /evals/runs

Method: GET

Query Parameters:

Parameter	Required	Description
dataset_id	No	Filter runs by dataset ID
workflow_id	No	Filter runs by workflow ID
eval_workflow_id	No	Filter runs by evaluation workflow ID
limit	No	Maximum number of runs to return
offset	No	Number of runs to skip for pagination

Success Response:

Code: 200 OK
Content:

json

[
  {
    "id": "evalrun123",
    "dataset_id": "dataset123",
    "workflow_id": "workflow123",
    "eval_workflow_id": "eval_workflow123",
    "eval_workflow_run_id": "run123",
    "comment": "Testing model performance on basic questions",
    "created_at": "2023-06-01T12:00:00Z",
    "updated_at": "2023-06-01T12:05:00Z",
    "status": "completed"
  },
  {
    "id": "evalrun456",
    "dataset_id": "dataset456",
    "workflow_id": "workflow456",
    "eval_workflow_id": "eval_workflow456",
    "eval_workflow_run_id": "run456",
    "comment": "Production evaluation",
    "created_at": "2023-06-02T12:00:00Z",
    "updated_at": "2023-06-02T12:01:00Z",
    "status": "running"
  }
]

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Error message"}
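
The `limit` and `offset` parameters can be combined to walk through all runs page by page. The sketch below keeps the HTTP call out of the picture: `fetch_page` is a hypothetical callable that you supply, taking `(limit, offset)` and returning the decoded JSON array. It also assumes that a page shorter than `limit` marks the end of the list, which this page does not state.

```python
from typing import Callable, Iterator, List


def iter_eval_runs(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield every eval run by requesting consecutive pages."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # Assumption: a short (or empty) page means there is nothing left.
        if len(page) < page_size:
            return
        offset += len(page)


# Usage with an in-memory stand-in for the real endpoint.
fake_runs = [{"id": f"evalrun{i}"} for i in range(5)]
all_runs = list(iter_eval_runs(lambda limit, offset: fake_runs[offset:offset + limit], page_size=2))
```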

Get Eval Run

Retrieves details of a specific evaluation run.

URL: /evals/runs/{run_id}

Method: GET

URL Parameters:

Parameter	Description
run_id	The unique ID of the eval run

Success Response:

Code: 200 OK
Content:

json

{
  "id": "evalrun123",
  "dataset_id": "dataset123",
  "workflow_id": "workflow123",
  "eval_workflow_id": "eval_workflow123",
  "eval_workflow_run_id": "run123",
  "comment": "Testing model performance on basic questions",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:05:00Z",
  "status": "completed"
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Error message"}
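
A common use of this endpoint is to poll until a run finishes. The sketch below is deliberately generic: `fetch_run` is a hypothetical callable that performs the GET and returns the decoded EvalRun object. The only statuses shown on this page are pending, running, and completed, so the default set of finished statuses contains just "completed". Extend `done_statuses` with whatever failure or stopped statuses your deployment reports.

```python
import time


def wait_for_run(
    fetch_run,
    run_id,
    done_statuses=frozenset({"completed"}),
    poll_seconds=5.0,
    timeout_seconds=600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll an eval run until its status is in done_statuses or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        run = fetch_run(run_id)
        if run["status"] in done_statuses:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"eval run {run_id} still {run['status']!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage with a scripted sequence of statuses instead of real HTTP calls.
statuses = iter(["pending", "running", "completed"])
final = wait_for_run(
    lambda run_id: {"id": run_id, "status": next(statuses)},
    "evalrun123",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```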

Get Eval Run Batch

Retrieves the batch information for a specific evaluation run.

URL: /evals/runs/{run_id}/batch

Method: GET

URL Parameters:

Parameter	Description
run_id	The unique ID of the eval run

Success Response:

Code: 200 OK
Content:

json

{
  "id": "batch123",
  "run_id": "run123",
  "total_items": 50,
  "completed_items": 25,
  "failed_items": 2,
  "status": "running",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:03:00Z"
}

Error Responses:

Code: 400 Bad Request
- Content: {"detail": "Eval run not found"}
- Content: {"detail": "Eval run workflow run not started."}
- Content: {"detail": "Batch not started yet"}
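
The counters in the batch object lend themselves to a simple progress readout. The sketch below uses the field names from the example response above. Since this page does not say whether `completed_items` includes `failed_items`, the two fractions are reported separately rather than combined. The function name is illustrative.

```python
def batch_progress(batch: dict) -> dict:
    """Return completed and failed fractions for an eval run batch."""
    total = batch["total_items"]
    if total <= 0:
        # Avoid division by zero for an empty batch.
        return {"completed_fraction": 0.0, "failed_fraction": 0.0}
    return {
        "completed_fraction": batch["completed_items"] / total,
        "failed_fraction": batch["failed_items"] / total,
    }


progress = batch_progress({"total_items": 50, "completed_items": 25, "failed_items": 2})
print(f"{progress['completed_fraction']:.0%} completed, {progress['failed_fraction']:.0%} failed")
# prints "50% completed, 4% failed"
```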

Get Eval Run Results

Retrieves the results for a specific evaluation run.

URL: /evals/runs/{run_id}/results

Method: GET

URL Parameters:

Parameter	Description
run_id	The unique ID of the eval run

Success Response:

Code: 200 OK
Content:

json

[
  {
    "id": "result123",
    "eval_run_id": "evalrun123",
    "entry_id": "entry1",
    "result": {
      "correct": true,
      "score": 1.0,
      "model_output": "Paris",
      "metrics": {
        "accuracy": 1.0,
        "latency": 0.23
      }
    },
    "created_at": "2023-06-01T12:01:00Z"
  },
  {
    "id": "result124",
    "eval_run_id": "evalrun123",
    "entry_id": "entry2",
    "result": {
      "correct": true,
      "score": 1.0,
      "model_output": "Rome",
      "metrics": {
        "accuracy": 1.0,
        "latency": 0.19
      }
    },
    "created_at": "2023-06-01T12:02:00Z"
  }
]

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Eval run not found"}
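
Once the results are retrieved, a short aggregation gives a run-level summary. The sketch below assumes each `result` object has the `correct` and `score` keys shown in the example above. The actual contents of `result` are described only as "the evaluation result data", so they may differ for your evaluation workflow. The function name is illustrative.

```python
from statistics import mean


def summarize_results(results: list) -> dict:
    """Summarize a list of EvalRunResult objects."""
    if not results:
        return {"count": 0, "correct": 0, "mean_score": None}
    payloads = [item["result"] for item in results]
    return {
        "count": len(payloads),
        # Count only entries explicitly marked correct.
        "correct": sum(1 for payload in payloads if payload.get("correct") is True),
        "mean_score": mean(payload["score"] for payload in payloads),
    }


summary = summarize_results([
    {"id": "result123", "entry_id": "entry1", "result": {"correct": True, "score": 1.0}},
    {"id": "result124", "entry_id": "entry2", "result": {"correct": False, "score": 0.0}},
])
```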

Stop Eval Run

Stops a running evaluation.

URL: /evals/runs/{run_id}/stop

Method: POST

URL Parameters:

Parameter	Description
run_id	The unique ID of the eval run

Success Response:

Code: 200 OK
Content:

json

{
  "message": "Eval run stopped successfully"
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Error message"}

Get Eval Result

Retrieves a specific evaluation result.

URL: /evals/runs/results/{result_id}

Method: GET

URL Parameters:

Parameter	Description
result_id	The unique ID of the result

Success Response:

Code: 200 OK
Content:

json

{
  "id": "result123",
  "eval_run_id": "evalrun123",
  "entry_id": "entry1",
  "result": {
    "correct": true,
    "score": 1.0,
    "model_output": "Paris",
    "metrics": {
      "accuracy": 1.0,
      "latency": 0.23
    }
  },
  "created_at": "2023-06-01T12:01:00Z"
}

Error Response:

Code: 400 Bad Request
- Content: {"detail": "Error message"}

Archive Dataset

Archives a dataset.

URL: /evals/datasets/{dataset_id}/archive

Method: POST

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset

Success Response:

Code: 200 OK
Content:

json

{
  "message": "Dataset archived successfully"
}

Error Response:

Code: 404 Not Found
- Content: {"detail": "Dataset not found"}

Unarchive Dataset

Unarchives a dataset.

URL: /evals/datasets/{dataset_id}/unarchive

Method: POST

URL Parameters:

Parameter	Description
dataset_id	The unique ID of the dataset

Success Response:

Code: 200 OK
Content:

json

{
  "message": "Dataset unarchived successfully"
}

Error Response:

Code: 404 Not Found
- Content: {"detail": "Dataset not found"}
