Add __repr__ methods for DatasetColumn and DatasetColumnPair #765

Open
zanvari wants to merge 1 commit into huggingface:main from zanvari:add-dataset-column-repr

zanvari commented Jun 17, 2026

Summary

This PR addresses #547 by adding __repr__() implementations for DatasetColumn and DatasetColumnPair.

Both classes currently subclass list but never populate the underlying list, so they inherit list's default representation and print as [] even though they provide lazy access to dataset columns. This makes debugging harder and produces misleading exception messages and logs.

The new representations provide a concise summary of each object without loading dataset contents into memory. For DatasetColumn, the representation includes the column key and dataset length. For DatasetColumnPair, it includes the configured column names, key mappings, and dataset length.
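To illustrate the idea, here is a minimal self-contained sketch of the DatasetColumn case. It is not the code from this PR's diff: the class below is a stand-in that assumes the wrapper stores `dataset` and `key` attributes and delegates `__len__` to the dataset, and a plain list of dicts stands in for a real dataset.

```python
class DatasetColumn(list):
    """Stand-in for the lazy column wrapper (assumed shape, not the library's code)."""

    def __init__(self, dataset, key):
        # list.__init__ is never given items, so the underlying list stays empty.
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        return self.dataset[i][self.key]

    def __iter__(self):
        return (self.dataset[i][self.key] for i in range(len(self)))

    def __repr__(self):
        # Only metadata is formatted here; no rows are read from the dataset.
        return f"{type(self).__name__}(key={self.key!r}, len={len(self)})"


rows = [{"text": "a"}, {"text": "b"}, {"text": "c"}]  # stand-in dataset
column = DatasetColumn(rows, "text")
print(column)  # DatasetColumn(key='text', len=3)
```

Because `__repr__` touches only `self.key` and `len(self)`, its cost is independent of the column contents. Without the override, the inherited `list.__repr__` reads the empty underlying list and yields `[]`.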

Changes

  • Added __repr__() to DatasetColumn
  • Added __repr__() to DatasetColumnPair
  • Added tests covering both representations

Example

Before:

print(DatasetColumn(...))
# []

After:

print(DatasetColumn(...))
# DatasetColumn(key='text', len=3)

Similarly, DatasetColumnPair now provides a meaningful representation instead of displaying as an empty list.
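The exact output format for DatasetColumnPair is not shown in this description, so the following is only a sketch of one possible shape. The attribute names (`first_col`, `second_col`, `first_key`, `second_key`) and the format string are assumptions made for illustration, and the class is a self-contained stand-in rather than the library's implementation.

```python
class DatasetColumnPair(list):
    """Stand-in for the lazy two-column wrapper (assumed shape, not the library's code)."""

    def __init__(self, dataset, first_col, second_col, first_key, second_key):
        self.dataset = dataset
        self.first_col = first_col
        self.second_col = second_col
        self.first_key = first_key
        self.second_key = second_key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        row = self.dataset[i]
        return {
            self.first_key: row[self.first_col],
            self.second_key: row[self.second_col],
        }

    def __repr__(self):
        # Hypothetical format: column names, the keys they map to, and the row count.
        return (
            f"{type(self).__name__}("
            f"first_col={self.first_col!r}, second_col={self.second_col!r}, "
            f"first_key={self.first_key!r}, second_key={self.second_key!r}, "
            f"len={len(self)})"
        )


rows = [
    {"question": "q1", "context": "c1"},
    {"question": "q2", "context": "c2"},
]  # stand-in dataset
pair = DatasetColumnPair(rows, "question", "context", "q", "c")
print(pair)
```

As with the single-column case, the representation is built from configuration and length only, so no rows are materialized.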

Testing

I added tests for both classes and verified that the new representations are returned as expected.

python -m pytest tests/test_evaluator_utils.py -v

Result:

2 passed
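The test bodies are not included in this description. As a rough sketch of what such checks could look like, the pytest-style functions below assert on the documented DatasetColumn output and also confirm that building the representation reads no rows. `CountingDataset` is a hypothetical helper, and the DatasetColumn class is again a stand-in with an assumed shape, not the library's code.

```python
class CountingDataset:
    """Hypothetical dataset stand-in that records every row access."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        self.reads += 1
        return self._rows[i]


class DatasetColumn(list):
    """Stand-in for the lazy column wrapper (assumed shape)."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        return self.dataset[i][self.key]

    def __repr__(self):
        return f"{type(self).__name__}(key={self.key!r}, len={len(self)})"


def test_repr_reports_key_and_length():
    dataset = CountingDataset([{"text": "a"}, {"text": "b"}, {"text": "c"}])
    assert repr(DatasetColumn(dataset, "text")) == "DatasetColumn(key='text', len=3)"


def test_repr_does_not_read_rows():
    dataset = CountingDataset([{"text": "a"}])
    repr(DatasetColumn(dataset, "text"))
    # __repr__ should rely on len() only, never on row access.
    assert dataset.reads == 0
```

Checking the access counter guards the "without loading dataset contents" property directly, which a plain string comparison would not catch.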

Fixes #547.

Development

Successfully merging this pull request may close these issues.

DatasetColumn and DatasetColumnPair should have __repr__() methods defined