Create An Evaluation Dataset
An evaluation dataset lets you load test cases from:
- CSV files
- JSON files
- Custom data sources (to be supported soon)
Defining A Dataset
An evaluation dataset is a list of test cases, designed to make evaluating a large number of test cases easy. Testing at scale is important for enterprise production use cases, and we support a number of ways to quickly get started.
Example
from deepeval.dataset import EvaluationDataset
# from a csv
# sample.csv
# input,expected_output,id
# sample_input,sample_output,312
ds = EvaluationDataset.from_csv(
csv_filename="sample.csv",
input_column="input",
expected_output_column="expected_output",
id_column="312"
)
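If you want to create the sample.csv file shown above before loading it, Python's standard csv module is enough. The snippet below writes the exact header and row from the comments:

import csv

# Write the three-column sample file used in the from_csv example above.
with open("sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["input", "expected_output", "id"])
    writer.writerow(["sample_input", "sample_output", "312"])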
Running Tests
Running the tests is easy with the run_evaluation method. When you call run_evaluation, it runs every test case and outputs a text file for you to review, containing the results table shown below.
ds.run_evaluation(
    callable_fn=generate_llm_output,  # your function: test case input -> LLM output
)
# Returns the evaluation results
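run_evaluation expects a callable that maps each test case's input to your model's output. generate_llm_output above stands in for your own function; a minimal sketch, assuming your application exposes some completion call, might look like this (my_model.complete is a hypothetical placeholder):

def generate_llm_output(input: str) -> str:
    # Call your LLM or application here; my_model.complete is a
    # hypothetical stand-in for whatever produces your output text.
    return my_model.complete(input)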
Once you run these tests, you will be given a table like the one below, which is also saved to a text file.
| Test Passed | Metric Name | Score | Output | Expected Output | Message |
| --- | --- | --- | --- | --- | --- |
| True | EntailmentScoreMetric | 0.000830871 | Our customer success phone line is 1200-231-231. | 1800-213-123 | EntailmentScoreMetric was unsuccessful for "What is the customer success number", which should have matched "1800-213-123" |
View A Sample Of Data Inside The Evaluation Dataset
To view a sample of the data, simply run:
ds.sample(5)
From CSV
You can set up an evaluation dataset from a CSV file using the from_csv method:
dataset = EvaluationDataset.from_csv(
csv_filename="input.csv",
input_column="input",
expected_output_column="expected_output",
)
Parameters
- csv_filename - the name of the CSV file
- input_column - the input column name
- expected_output_column - the expected output column name
- id_column - the ID column name
- metrics - the list of metrics you want to supply to run this test
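Putting it all together, a full run over a CSV might look like the sketch below. The EntailmentScoreMetric import path is an assumption based on the metric name in the sample results above (verify it against your installed deepeval version), and generate_llm_output is your own function as described earlier:

from deepeval.dataset import EvaluationDataset
# Import path is an assumption; check your deepeval version.
from deepeval.metrics import EntailmentScoreMetric

dataset = EvaluationDataset.from_csv(
    csv_filename="input.csv",
    input_column="input",
    expected_output_column="expected_output",
    id_column="id",
    metrics=[EntailmentScoreMetric()],  # metrics to score every test case
)

# Runs each test case through your function and writes the results table.
dataset.run_evaluation(callable_fn=generate_llm_output)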