Evaluations API

The Evaluations API allows you to create, run, and analyze evaluations for your AI workflows on the Lumea platform.

Overview

The Evaluations API provides a framework for testing and validating AI workflows. With it you create evaluation datasets, run evaluations against your workflows, and analyze the results to check the reliability and quality of your AI operations. Evaluating workflows systematically in this way helps maintain trust in your AI systems.

Note: Each customer receives a custom deployment with a unique API endpoint. The endpoints described in this documentation should be prefixed with your organization's specific API URL provided during onboarding.
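
Because every path in this reference is relative to a deployment-specific prefix, a small helper that joins the two can avoid doubled or missing slashes. The following is a minimal sketch; the function name `endpoint_url` and the base URL `https://api.example.com/v1/` are placeholders and not part of the Lumea API.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join an organization-specific base URL with an endpoint path."""
    # Normalize slashes so exactly one separates the two parts.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical base URL; substitute the one provided during onboarding.
BASE_URL = "https://api.example.com/v1/"

print(endpoint_url(BASE_URL, "/evals/datasets"))
```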

Data Models

EvalDataset Object

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the dataset |
| name | string | Name of the dataset |
| description | string | Optional description of the dataset |
| created_at | string | Timestamp when the dataset was created |
| updated_at | string | Timestamp when the dataset was last updated |
| archived | boolean | Whether the dataset is archived |

EvalDatasetEntry Object

| Field | Type | Description |
|---|---|---|
| dataset_id | string | ID of the dataset this entry belongs to |
| entry_id | string | Unique identifier for the entry within the dataset |
| data | object | The content of the dataset entry |

EvalRun Object

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the eval run |
| dataset_id | string | ID of the dataset being evaluated |
| workflow_id | string | ID of the workflow being evaluated |
| eval_workflow_id | string | ID of the evaluation workflow |
| eval_workflow_run_id | string | ID of the evaluation workflow run |
| comment | string | Optional comment about this evaluation run |
| created_at | string | Timestamp when the eval run was created |
| updated_at | string | Timestamp when the eval run was last updated |
| status | string | Current status of the eval run |

EvalRunResult Object

| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the result |
| eval_run_id | string | ID of the eval run this result belongs to |
| entry_id | string | ID of the dataset entry |
| result | object | The evaluation result data |
| created_at | string | Timestamp when the result was created |
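
On the client side, the four objects above can be mirrored as typed records. The sketch below uses Python dataclasses with the field names from the tables. Which fields may be absent is an assumption (only those described as optional, or not required when starting a run, get a `None` default), and the class definitions are illustrative rather than an official SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EvalDataset:
    id: str
    name: str
    created_at: str
    updated_at: str
    archived: bool
    description: Optional[str] = None  # described as optional


@dataclass
class EvalDatasetEntry:
    dataset_id: str
    entry_id: str
    data: Dict[str, Any]  # free-form content of the entry


@dataclass
class EvalRun:
    id: str
    dataset_id: str
    status: str
    created_at: str
    updated_at: str
    # Assumed nullable: these are not required when starting a run.
    workflow_id: Optional[str] = None
    eval_workflow_id: Optional[str] = None
    eval_workflow_run_id: Optional[str] = None
    comment: Optional[str] = None


@dataclass
class EvalRunResult:
    id: str
    eval_run_id: str
    entry_id: str
    result: Dict[str, Any]  # free-form evaluation result data
    created_at: str


# Build a record from a decoded JSON response body.
dataset = EvalDataset(**{
    "id": "dataset123",
    "name": "Test Dataset",
    "description": "Dataset for testing our model",
    "created_at": "2023-06-01T12:00:00Z",
    "updated_at": "2023-06-01T12:00:00Z",
    "archived": False,
})
```

Constructing a record with `**payload` raises `TypeError` if the response carries a field the class does not declare, so a real client may prefer to filter keys first.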

Endpoints

Create Dataset

Creates a new evaluation dataset.

URL: /evals/datasets

Method: POST

Request Body:

json
{
  "name": "Test Dataset",
  "description": "Dataset for testing our model"
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name of the dataset |
| description | string | No | Description of the dataset |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "dataset123",
  "name": "Test Dataset",
  "description": "Dataset for testing our model",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:00:00Z",
  "archived": false
}
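
As an illustration of the request above, the following sketch builds the `POST /evals/datasets` call with Python's standard library. The base URL is a placeholder, the helper name `create_dataset_request` is hypothetical, and authentication headers are omitted because this section does not describe them.

```python
import json
import urllib.request

# Hypothetical base URL; substitute your organization's API URL.
BASE_URL = "https://api.example.com"


def create_dataset_request(name, description=None):
    """Build (but do not send) a Create Dataset request."""
    body = {"name": name}
    if description is not None:
        body["description"] = description  # optional field
    return urllib.request.Request(
        url=f"{BASE_URL}/evals/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_dataset_request("Test Dataset", "Dataset for testing our model")

# Sending requires a real deployment, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     dataset = json.load(resp)
```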

List Datasets

Retrieves a list of evaluation datasets.

URL: /evals/datasets

Method: GET

Query Parameters:

| Parameter | Required | Description |
|---|---|---|
| dataset_name | No | Filter datasets by name |
| archived | No | Whether to include archived datasets (default: false) |

Success Response:

  • Code: 200 OK
  • Content:
json
[
  {
    "id": "dataset123",
    "name": "Test Dataset",
    "description": "Dataset for testing our model",
    "created_at": "2023-06-01T12:00:00Z",
    "updated_at": "2023-06-01T12:00:00Z",
    "archived": false
  },
  {
    "id": "dataset456",
    "name": "Production Dataset",
    "description": "Dataset for production evaluation",
    "created_at": "2023-06-02T12:00:00Z",
    "updated_at": "2023-06-02T12:00:00Z",
    "archived": false
  }
]
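
The query parameters for this endpoint can be assembled as in the sketch below. It assumes that booleans are sent as the strings `true` and `false`, which this reference does not specify; the helper name and base URL are placeholders.

```python
from urllib.parse import urlencode


def list_datasets_url(base_url, dataset_name=None, archived=None):
    """Build the List Datasets URL, omitting parameters that are not set."""
    params = {}
    if dataset_name is not None:
        params["dataset_name"] = dataset_name
    if archived is not None:
        # Assumption: the server accepts lowercase "true"/"false".
        params["archived"] = "true" if archived else "false"
    url = f"{base_url}/evals/datasets"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(list_datasets_url("https://api.example.com", "Test Dataset", archived=True))
```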

Get Dataset

Retrieves details of a specific dataset.

URL: /evals/datasets/{dataset_id}

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "dataset123",
  "name": "Test Dataset",
  "description": "Dataset for testing our model",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:00:00Z",
  "archived": false
}

Update Dataset

Updates properties of an existing dataset.

URL: /evals/datasets/{dataset_id}

Method: PUT

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |

Request Body:

json
{
  "name": "Updated Dataset Name",
  "description": "Updated dataset description"
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | Updated name of the dataset |
| description | string | No | Updated description of the dataset |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "dataset123",
  "name": "Updated Dataset Name",
  "description": "Updated dataset description",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T14:00:00Z",
  "archived": false
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Dataset not found"}
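
The error bodies in this reference share the shape `{"detail": "..."}`. A client can surface that message with a small helper such as the sketch below; the function name `error_detail` and the fallback text are illustrative choices, not part of the API.

```python
import json


def error_detail(status_code: int, body: bytes) -> str:
    """Return a readable message from an error response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        payload = None
    if isinstance(payload, dict) and "detail" in payload:
        return f"{status_code}: {payload['detail']}"
    return f"{status_code}: <no detail in response body>"


print(error_detail(400, b'{"detail": "Dataset not found"}'))
```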

Update Dataset Entries

Updates entries in a dataset.

URL: /evals/datasets/{dataset_id}/entries

Method: PUT

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |

Request Body:

json
{
  "entries": [
    {
      "entry_id": "entry1",
      "data": {
        "input": "What is the capital of France?",
        "expected_output": "Paris"
      }
    },
    {
      "entry_id": "entry2",
      "data": {
        "input": "What is the capital of Italy?",
        "expected_output": "Rome"
      }
    }
  ]
}

| Field | Type | Required | Description |
|---|---|---|---|
| entries | array | Yes | Array of dataset entries to update |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "message": "Entries updated successfully"
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Dataset not found"}
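
The request body for this endpoint can be generated from tabular data. The sketch below builds it from `(entry_id, input, expected_output)` tuples. The `input` and `expected_output` keys come from the example above only, since `data` is a free-form object, and the duplicate check is a client-side convenience rather than documented server behavior.

```python
def build_entries_payload(rows):
    """Build the body for PUT /evals/datasets/{dataset_id}/entries.

    rows: iterable of (entry_id, input_text, expected_output) tuples.
    """
    seen = set()
    entries = []
    for entry_id, input_text, expected_output in rows:
        # entry_id identifies an entry within its dataset, so reject repeats.
        if entry_id in seen:
            raise ValueError(f"duplicate entry_id: {entry_id}")
        seen.add(entry_id)
        entries.append({
            "entry_id": entry_id,
            "data": {"input": input_text, "expected_output": expected_output},
        })
    return {"entries": entries}


payload = build_entries_payload([
    ("entry1", "What is the capital of France?", "Paris"),
    ("entry2", "What is the capital of Italy?", "Rome"),
])
```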

Get Dataset Entries

Retrieves all entries in a dataset.

URL: /evals/datasets/{dataset_id}/entries

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |

Success Response:

  • Code: 200 OK
  • Content:
json
[
  {
    "dataset_id": "dataset123",
    "entry_id": "entry1",
    "data": {
      "input": "What is the capital of France?",
      "expected_output": "Paris"
    }
  },
  {
    "dataset_id": "dataset123",
    "entry_id": "entry2",
    "data": {
      "input": "What is the capital of Italy?",
      "expected_output": "Rome"
    }
  }
]

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Dataset not found"}

Get Dataset Entry

Retrieves a specific entry in a dataset.

URL: /evals/datasets/{dataset_id}/entries/{entry_id}

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |
| entry_id | The ID of the entry (may contain slashes) |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "dataset_id": "dataset123",
  "entry_id": "entry1",
  "data": {
    "input": "What is the capital of France?",
    "expected_output": "Paris"
  }
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Entry not found"}
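
Since `entry_id` may contain slashes, a client has to decide whether to send them literally or percent-encoded. This reference does not say which form the server expects, so the sketch below supports both and leaves the choice to the caller; the helper name and the sample IDs are hypothetical.

```python
from urllib.parse import quote


def entry_url(base_url, dataset_id, entry_id, encode_slashes=False):
    """Build the URL of a single dataset entry.

    Slashes in entry_id stay literal unless encode_slashes is True;
    other reserved characters are always percent-encoded.
    """
    safe = "" if encode_slashes else "/"
    return (
        f"{base_url}/evals/datasets/{quote(dataset_id, safe='')}"
        f"/entries/{quote(entry_id, safe=safe)}"
    )


print(entry_url("https://api.example.com", "dataset123", "geo/capitals/entry 1"))
print(entry_url("https://api.example.com", "dataset123", "geo/capitals/entry 1",
                encode_slashes=True))
```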

Delete Dataset Entry

Deletes a specific entry in a dataset.

URL: /evals/datasets/{dataset_id}/entries/{entry_id}

Method: DELETE

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |
| entry_id | The ID of the entry (may contain slashes) |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "message": "Entry deleted successfully"
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Entry not found"}

Start Eval Run

Initiates a new evaluation run.

URL: /evals/runs

Method: POST

Request Body:

json
{
  "dataset_id": "dataset123",
  "workflow_id": "workflow123",
  "eval_workflow_id": "eval_workflow123",
  "comment": "Testing model performance on basic questions",
  "start_task": "task1",
  "end_at_task": "task3"
}

| Field | Type | Required | Description |
|---|---|---|---|
| dataset_id | string | Yes | ID of the dataset to use for evaluation |
| workflow_id | string | No | ID of the workflow to evaluate |
| eval_workflow_id | string | No | ID of the evaluation workflow to use |
| comment | string | No | Comment about this evaluation run |
| start_task | string | No | Task to start from in the workflow |
| end_at_task | string | No | Task to end at in the workflow |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "evalrun123",
  "dataset_id": "dataset123",
  "workflow_id": "workflow123",
  "eval_workflow_id": "eval_workflow123",
  "eval_workflow_run_id": "run123",
  "comment": "Testing model performance on basic questions",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:00:00Z",
  "status": "pending"
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Error message"}
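
Only `dataset_id` is required when starting a run, so a client can leave unset optional fields out of the body. The following sketch does that; the helper name is hypothetical, and whether the server also accepts explicit `null` values is not stated in this reference.

```python
def start_run_payload(dataset_id, workflow_id=None, eval_workflow_id=None,
                      comment=None, start_task=None, end_at_task=None):
    """Build the body for POST /evals/runs, dropping unset optional fields."""
    optional = {
        "workflow_id": workflow_id,
        "eval_workflow_id": eval_workflow_id,
        "comment": comment,
        "start_task": start_task,
        "end_at_task": end_at_task,
    }
    payload = {"dataset_id": dataset_id}
    payload.update({key: value for key, value in optional.items()
                    if value is not None})
    return payload


print(start_run_payload("dataset123", workflow_id="workflow123",
                        start_task="task1", end_at_task="task3"))
```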

List Eval Runs

Retrieves a list of evaluation runs with optional filtering.

URL: /evals/runs

Method: GET

Query Parameters:

| Parameter | Required | Description |
|---|---|---|
| dataset_id | No | Filter runs by dataset ID |
| workflow_id | No | Filter runs by workflow ID |
| eval_workflow_id | No | Filter runs by evaluation workflow ID |
| limit | No | Maximum number of runs to return |
| offset | No | Number of runs to skip for pagination |

Success Response:

  • Code: 200 OK
  • Content:
json
[
  {
    "id": "evalrun123",
    "dataset_id": "dataset123",
    "workflow_id": "workflow123",
    "eval_workflow_id": "eval_workflow123",
    "eval_workflow_run_id": "run123",
    "comment": "Testing model performance on basic questions",
    "created_at": "2023-06-01T12:00:00Z",
    "updated_at": "2023-06-01T12:05:00Z",
    "status": "completed"
  },
  {
    "id": "evalrun456",
    "dataset_id": "dataset456",
    "workflow_id": "workflow456",
    "eval_workflow_id": "eval_workflow456",
    "eval_workflow_run_id": "run456",
    "comment": "Production evaluation",
    "created_at": "2023-06-02T12:00:00Z",
    "updated_at": "2023-06-02T12:01:00Z",
    "status": "running"
  }
]

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Error message"}
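
The `limit` and `offset` parameters allow a client to page through all runs. The sketch below shows one way to do so. It assumes that a page shorter than `limit` marks the end of the list, which this reference does not confirm, and `fetch_page` is a hypothetical callable standing in for the actual `GET /evals/runs` call.

```python
from typing import Callable, Iterator, List


def iter_eval_runs(fetch_page: Callable[[int, int], List[dict]],
                   page_size: int = 50) -> Iterator[dict]:
    """Yield every eval run by advancing offset one page at a time."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)  # (limit, offset) -> list of runs
        yield from page
        if len(page) < page_size:
            break  # assumed end-of-list signal
        offset += page_size


# Stand-in for the HTTP call so the sketch runs without a deployment.
fake_runs = [{"id": f"evalrun{i}"} for i in range(5)]


def fake_fetch(limit: int, offset: int) -> List[dict]:
    return fake_runs[offset:offset + limit]


run_ids = [run["id"] for run in iter_eval_runs(fake_fetch, page_size=2)]
```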

Get Eval Run

Retrieves details of a specific evaluation run.

URL: /evals/runs/{run_id}

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| run_id | The unique ID of the eval run |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "evalrun123",
  "dataset_id": "dataset123",
  "workflow_id": "workflow123",
  "eval_workflow_id": "eval_workflow123",
  "eval_workflow_run_id": "run123",
  "comment": "Testing model performance on basic questions",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:05:00Z",
  "status": "completed"
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Error message"}
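
A common use of this endpoint is polling until a run finishes. The sketch below treats `pending` and `running` as the only non-final statuses, an assumption based solely on the example responses in this reference, since the full list of status values is not documented here. `get_run` is a hypothetical callable that performs `GET /evals/runs/{run_id}` and returns the decoded JSON.

```python
import time

# Assumption: statuses seen in the examples that mean "not finished yet".
ACTIVE_STATUSES = {"pending", "running"}


def wait_for_run(get_run, run_id, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll an eval run until its status leaves ACTIVE_STATUSES."""
    for _ in range(max_polls):
        run = get_run(run_id)
        if run["status"] not in ACTIVE_STATUSES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"eval run {run_id} still active after {max_polls} polls")


# Stand-in for the HTTP call: reports a different status on each poll.
_statuses = iter(["pending", "running", "completed"])


def fake_get_run(run_id):
    return {"id": run_id, "status": next(_statuses)}


final_run = wait_for_run(fake_get_run, "evalrun123", sleep=lambda seconds: None)
```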

Get Eval Run Batch

Retrieves the batch information for a specific evaluation run.

URL: /evals/runs/{run_id}/batch

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| run_id | The unique ID of the eval run |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "batch123",
  "run_id": "run123",
  "total_items": 50,
  "completed_items": 25,
  "failed_items": 2,
  "status": "running",
  "created_at": "2023-06-01T12:00:00Z",
  "updated_at": "2023-06-01T12:03:00Z"
}

Error Responses:

  • Code: 400 Bad Request
    • Content: {"detail": "Eval run not found"}
    • Content: {"detail": "Eval run workflow run not started."}
    • Content: {"detail": "Batch not started yet"}
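
The counters in the batch response can be turned into a progress readout. The sketch below reports completed and failed items as percentages of `total_items`. Whether `completed_items` includes failed items is not stated in this reference, so the two figures are kept separate rather than combined.

```python
def batch_progress(batch: dict) -> dict:
    """Express completed and failed items as percentages of the total."""
    total = batch["total_items"]
    if total == 0:
        return {"percent_completed": 0.0, "percent_failed": 0.0}
    return {
        "percent_completed": 100 * batch["completed_items"] / total,
        "percent_failed": 100 * batch["failed_items"] / total,
    }


# Counters taken from the example response above.
progress = batch_progress({"total_items": 50, "completed_items": 25,
                           "failed_items": 2})
print(progress)
```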

Get Eval Run Results

Retrieves the results for a specific evaluation run.

URL: /evals/runs/{run_id}/results

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| run_id | The unique ID of the eval run |

Success Response:

  • Code: 200 OK
  • Content:
json
[
  {
    "id": "result123",
    "eval_run_id": "evalrun123",
    "entry_id": "entry1",
    "result": {
      "correct": true,
      "score": 1.0,
      "model_output": "Paris",
      "metrics": {
        "accuracy": 1.0,
        "latency": 0.23
      }
    },
    "created_at": "2023-06-01T12:01:00Z"
  },
  {
    "id": "result124",
    "eval_run_id": "evalrun123",
    "entry_id": "entry2",
    "result": {
      "correct": true,
      "score": 1.0,
      "model_output": "Rome",
      "metrics": {
        "accuracy": 1.0,
        "latency": 0.19
      }
    },
    "created_at": "2023-06-01T12:02:00Z"
  }
]

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Eval run not found"}
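
Once the results are retrieved, they can be summarized on the client. The sketch below assumes each `result` object carries the `correct` and `score` keys shown in the example response. Because `result` is a free-form object whose contents depend on your evaluation workflow, treat those key names as illustrative.

```python
from statistics import mean


def summarize_results(results):
    """Count results, count correct ones, and average the available scores."""
    scores = [r["result"]["score"] for r in results if "score" in r["result"]]
    n_correct = sum(1 for r in results if r["result"].get("correct") is True)
    return {
        "count": len(results),
        "correct": n_correct,
        "mean_score": mean(scores) if scores else None,
    }


# Trimmed copies of the two results in the example response.
sample_results = [
    {"id": "result123", "entry_id": "entry1",
     "result": {"correct": True, "score": 1.0}},
    {"id": "result124", "entry_id": "entry2",
     "result": {"correct": True, "score": 1.0}},
]
summary = summarize_results(sample_results)
```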

Stop Eval Run

Stops a running evaluation.

URL: /evals/runs/{run_id}/stop

Method: POST

URL Parameters:

| Parameter | Description |
|---|---|
| run_id | The unique ID of the eval run |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "message": "Eval run stopped successfully"
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Error message"}

Get Eval Result

Retrieves a specific evaluation result.

URL: /evals/runs/results/{result_id}

Method: GET

URL Parameters:

| Parameter | Description |
|---|---|
| result_id | The unique ID of the result |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "id": "result123",
  "eval_run_id": "evalrun123",
  "entry_id": "entry1",
  "result": {
    "correct": true,
    "score": 1.0,
    "model_output": "Paris",
    "metrics": {
      "accuracy": 1.0,
      "latency": 0.23
    }
  },
  "created_at": "2023-06-01T12:01:00Z"
}

Error Response:

  • Code: 400 Bad Request
    • Content: {"detail": "Error message"}

Archive Dataset

Archives a dataset.

URL: /evals/datasets/{dataset_id}/archive

Method: POST

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "message": "Dataset archived successfully"
}

Error Response:

  • Code: 404 Not Found
    • Content: {"detail": "Dataset not found"}

Unarchive Dataset

Unarchives a dataset.

URL: /evals/datasets/{dataset_id}/unarchive

Method: POST

URL Parameters:

| Parameter | Description |
|---|---|
| dataset_id | The unique ID of the dataset |

Success Response:

  • Code: 200 OK
  • Content:
json
{
  "message": "Dataset unarchived successfully"
}

Error Response:

  • Code: 404 Not Found
    • Content: {"detail": "Dataset not found"}
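
Archiving and unarchiving differ only in the final path segment, so one helper can cover both. The following is a minimal sketch using Python's standard library; the helper name and base URL are placeholders, and authentication headers are omitted because this section does not describe them.

```python
import urllib.request


def archive_request(base_url, dataset_id, archived=True):
    """Build (but do not send) an archive or unarchive request."""
    action = "archive" if archived else "unarchive"
    return urllib.request.Request(
        url=f"{base_url}/evals/datasets/{dataset_id}/{action}",
        method="POST",
    )


# Hypothetical base URL; substitute your organization's API URL.
archive_req = archive_request("https://api.example.com", "dataset123")
unarchive_req = archive_request("https://api.example.com", "dataset123",
                                archived=False)
```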