Human Value Classification with pyvalues - Methods Hub

Learning Objectives

By the end of this tutorial, you will be able to

Detect Schwartz’ values in natural language text using different classifiers
Get an overview of the values in different texts
Evaluate classifications against a ground truth

Target Audience

This tutorial is aimed at beginners in programming.

Duration

About half a work day.

Use Cases

To analyze which human values different groups (e.g., media of different countries) associate with the same topic, acquire texts (e.g., news articles) on this topic written by the different groups, classify the texts by human values, and compare the relative frequencies of the human values between the texts of the groups. This use case was the inspiration for the ValuesML project.
To identify the human values that are important to a specific group, classify texts of that group (e.g., manifestos, opinion pieces) and visualize the relative frequencies of the human values.

1. Environment Setup

This tutorial is based on the pyvalues library (reference documentation). The library must be installed to follow this tutorial. Install and import it like this:

!pip install pyvalues==0.11.1
import pyvalues

Hint: Run code like the one above by selecting the code area and pressing Ctrl+Enter. Try it for the code above!

The code below, on the other hand, checks whether a graphics card is available:

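import subprocess
gpu_memory_free = 0
try:
  command = "nvidia-smi --query-gpu=memory.free --format=csv"
  command_output = subprocess.check_output(command.split()).decode('ascii')
  # drop the CSV header line and the empty string after the final newline
  memory_free_info = command_output.split('\n')[:-1][1:]
  # each remaining line looks like "15109 MiB"; keep only the number
  memory_free_values = [int(x.split()[0]) for x in memory_free_info]
  if len(memory_free_values) > 0:
    gpu_memory_free = memory_free_values[0] / 1024  # first GPU only, MiB converted to GB
except (FileNotFoundError, subprocess.CalledProcessError):
  # nvidia-smi is not installed or could not query a graphics card
  pass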

if gpu_memory_free > 0:
  print(f"Found graphics card with {gpu_memory_free} GB memory")
else:
  print("No graphics card found.")

No graphics card found.

If you are using an online virtual environment that supports graphics cards and would like to use one (for the ValueEval’24 classifier if you have no API token available, see there), but none is found, check whether you can enable one. For example, here are instructions for Google Colab with GPU.

2. Basics of Human Value Classification

2.1. Value Schemas

The original 10 human values were later refined to 19 human values in a hierarchical structure - for example, “Self-direction” was refined into “Self-direction: action” and “Self-direction: thought”; but also the values “Face” and “Humility” were added (Schwartz et al., 2012).

Thus there are three schemas that are typically used in value research. When selecting the schema to use for one’s work, one has to consider that not all classification algorithms/models support all of these schemas. The schemas available in pyvalues are:

The 10 original values:

print(pyvalues.OriginalValues.names())

['Self-direction', 'Stimulation', 'Hedonism', 'Achievement', 'Power', 'Security', 'Tradition', 'Conformity', 'Benevolence', 'Universalism']

The 19 refined values:

print(pyvalues.RefinedValues.names())

['Self-direction: action', 'Self-direction: thought', 'Stimulation', 'Hedonism', 'Achievement', 'Power: dominance', 'Power: resources', 'Face', 'Security: personal', 'Security: societal', 'Tradition', 'Conformity: rules', 'Conformity: interpersonal', 'Humility', 'Benevolence: caring', 'Benevolence: dependability', 'Universalism: concern', 'Universalism: nature', 'Universalism: tolerance']

The 12 values when collapsing the 19 refined ones. These are the 10 original values plus “Face” and “Humility”:

print(pyvalues.RefinedCoarseValues.names())

['Self-direction', 'Stimulation', 'Hedonism', 'Achievement', 'Power', 'Face', 'Security', 'Tradition', 'Conformity', 'Humility', 'Benevolence', 'Universalism']

2.2. Value Scores

In computational analyses, one typically assigns a “score” to texts for each value of the selected schema. Such scores indicate a prevalence, confidence, or similar. Scores range from 0 (weakest) to 1 (strongest).

The following code illustrates how to set value scores and how to convert to schemas with fewer values:

scores = pyvalues.RefinedValues(self_direction_action=0.5, self_direction_thought=0.25, face=0.5)

print(f"Refined:        {scores}")
print(f"Refined coarse: {scores.coarse_values()}") # take maximum score of the respective refined values
print(f"Original:       {scores.original_values()}") # same as coarse_values() and remove "Face" and "Humility"

Refined:        {"Self-direction: thought":0.25,"Self-direction: action":0.5,"Face":0.5}
Refined coarse: {"Self-direction":0.5,"Face":0.5}
Original:       {"Self-direction":0.5}
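
The conversion rules stated in the code comments above (take the maximum over the refined values, then drop “Face” and “Humility”) can be sketched in plain Python. This is only an illustration with dictionaries, not the pyvalues implementation; the helper names collapse_to_coarse and drop_face_and_humility are made up for this sketch.

```python
def parent_name(refined_name):
    # "Self-direction: action" -> "Self-direction"; "Face" stays "Face"
    return refined_name.split(":")[0]

def collapse_to_coarse(refined_scores):
    # for each coarse value, keep the maximum score of its refined values
    coarse = {}
    for name, score in refined_scores.items():
        parent = parent_name(name)
        coarse[parent] = max(coarse.get(parent, 0.0), score)
    return coarse

def drop_face_and_humility(coarse_scores):
    # the 10 original values do not include "Face" and "Humility"
    return {name: score for name, score in coarse_scores.items()
            if name not in ("Face", "Humility")}

refined = {"Self-direction: action": 0.5, "Self-direction: thought": 0.25, "Face": 0.5}
coarse = collapse_to_coarse(refined)
original = drop_face_and_humility(coarse)
print(coarse)
print(original)
```

With the same input as above, the sketch yields a score of 0.5 for “Self-direction” in both schemas and keeps “Face” only in the coarse one.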

2.3. Value Attainment

Some classifiers not only assign a score for a value, but also distinguish whether the value is (partially) attained or (partially) constrained in the text. A value is (partially) attained if the text describes a past, current, or hypothetical situation in which the value is (partially) fulfilled. On the other hand, a value is (partially) constrained if the text describes a past, current, or hypothetical situation that is against the (partial) fulfillment of the value. For example, for the value of Security, attainment would mean that something is made safer or healthier. In contrast, an event can be stated in a way that thwarts/constrains safety or health (Reitis-Münstermann et al., 2024).

Such classifiers assign to each value a score for attained and one for constrained. The sum of both scores has the same meaning as the single score assigned by other classifiers. The larger the attained-score is in comparison to the constrained-score for a value, the more the classifier sees the value as attained in the text (and vice versa).

In pyvalues, each schema is also available in a version WithAttainment:

# The 38 scores for 19 refined values (attained and constrained each)
print(pyvalues.RefinedValuesWithAttainment.names())
print()

scores = pyvalues.RefinedValuesWithAttainment(
  self_direction_action=pyvalues.AttainmentScore(attained=0.5),
  self_direction_thought=pyvalues.AttainmentScore(constrained=0.25),
  face=pyvalues.AttainmentScore(constrained=0.25),
)

print(f"Refined:                 {scores}")
print(f"Refined coarse:          {scores.coarse_values()}") # take maximum like above, keep ratio of attained and constrained
print(f"Original:                {scores.original_values()}") # same as coarse_values() and remove "Face" and "Humility"
print()
print(f"Refined (no attainment): {scores.without_attainment()}") # sum scores for attained and constrained

['Self-direction: action attained', 'Self-direction: action constrained', 'Self-direction: thought attained', 'Self-direction: thought constrained', 'Stimulation attained', 'Stimulation constrained', 'Hedonism attained', 'Hedonism constrained', 'Achievement attained', 'Achievement constrained', 'Power: dominance attained', 'Power: dominance constrained', 'Power: resources attained', 'Power: resources constrained', 'Face attained', 'Face constrained', 'Security: personal attained', 'Security: personal constrained', 'Security: societal attained', 'Security: societal constrained', 'Tradition attained', 'Tradition constrained', 'Conformity: rules attained', 'Conformity: rules constrained', 'Conformity: interpersonal attained', 'Conformity: interpersonal constrained', 'Humility attained', 'Humility constrained', 'Benevolence: caring attained', 'Benevolence: caring constrained', 'Benevolence: dependability attained', 'Benevolence: dependability constrained', 'Universalism: concern attained', 'Universalism: concern constrained', 'Universalism: nature attained', 'Universalism: nature constrained', 'Universalism: tolerance attained', 'Universalism: tolerance constrained']

Refined:                 {"Self-direction: thought":{"constrained":0.25},"Self-direction: action":{"attained":0.5},"Face":{"constrained":0.25}}
Refined coarse:          {"Self-direction":{"attained":0.3333333333333333,"constrained":0.16666666666666669},"Face":{"constrained":0.25}}
Original:                {"Self-direction":{"attained":0.3333333333333333,"constrained":0.16666666666666669}}

Refined (no attainment): {"Self-direction: thought":0.25,"Self-direction: action":0.5,"Face":0.25}
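
The numbers in the output above can be reproduced by hand. The following plain-Python sketch is one reading of the code comments (“take maximum”, “keep ratio of attained and constrained”, “sum scores”) and is an assumption, not the pyvalues implementation: the total score of a coarse value is the maximum total of its refined values, and this total is split according to the summed attained and constrained scores. Scores are represented as hypothetical (attained, constrained) tuples.

```python
def parent_name(refined_name):
    # "Self-direction: action" -> "Self-direction"
    return refined_name.split(":")[0]

def without_attainment(scores):
    # one score per value: attained + constrained
    return {name: attained + constrained
            for name, (attained, constrained) in scores.items()}

def collapse_with_attainment(scores):
    groups = {}
    for name, pair in scores.items():
        groups.setdefault(parent_name(name), []).append(pair)
    coarse = {}
    for parent, pairs in groups.items():
        total = max(a + c for a, c in pairs)     # maximum, as without attainment
        sum_attained = sum(a for a, _ in pairs)
        sum_constrained = sum(c for _, c in pairs)
        denominator = sum_attained + sum_constrained
        if denominator == 0:
            coarse[parent] = (0.0, 0.0)
        else:
            # split the total in the ratio attained : constrained
            coarse[parent] = (total * sum_attained / denominator,
                              total * sum_constrained / denominator)
    return coarse

refined = {
    "Self-direction: action": (0.5, 0.0),
    "Self-direction: thought": (0.0, 0.25),
    "Face": (0.0, 0.25),
}
coarse = collapse_with_attainment(refined)
flat = without_attainment(refined)
print(coarse)
print(flat)
```

For “Self-direction”, the total is max(0.5, 0.25) = 0.5 and the ratio of attained to constrained is 0.5 : 0.25, which gives roughly 0.333 attained and 0.167 constrained, matching the printed scores up to floating-point rounding.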

3. Loading Data

For this tutorial, we use a part of the dataset that was used in the ValueEval’23 competition.

3.1. Getting Data

For this tutorial, we download the data:

%env DATASET_URL=https://zenodo.org/records/10564870/files/arguments-test.tsv?download=1
!wget "$DATASET_URL" -O dataset.tsv

env: DATASET_URL=https://zenodo.org/records/10564870/files/arguments-test.tsv?download=1
--2026-06-22 22:16:40--  https://zenodo.org/records/10564870/files/arguments-test.tsv?download=1
Resolving zenodo.org (zenodo.org)... 188.184.103.118, 188.184.98.114, 137.138.52.235, ...
Connecting to zenodo.org (zenodo.org)|188.184.103.118|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 290185 (283K) [application/octet-stream]
Saving to: ‘dataset.tsv’

dataset.tsv         100%[===================>] 283.38K  --.-KB/s    in 0.07s   

2026-06-22 22:16:41 (4.10 MB/s) - ‘dataset.tsv’ saved [290185/290185]

Alternatively, all interactive online environments include a file browser that allows you to upload and download files. Examples: Binder or Jupyter4NFDI

You can upload and use your own data for the tutorial; however, we recommend working through the tutorial using the ValueEval’23 data first before using your own data.

To verify that the download was successful, you can view the first few lines of the dataset as follows:

!head dataset.tsv

Argument ID Conclusion  Stance  Premise
A26004  We should end affirmative action    against affirmative action helps with employment equity.
A26010  We should end affirmative action    in favor of affirmative action can be considered discriminatory against poor whites
A26016  We should ban naturopathy   in favor of naturopathy is very dangerous for the most vulnerable people, like children and cancer patients. people use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death.
A26024  We should prohibit women in combat  in favor of women shouldn't be in combat because they aren't as strong or fast as men and can be a weak link
A26026  We should ban naturopathy   in favor of once eradicated illnesses are returning due to people turning to naturopathy.
A26045  We should end racial profiling  in favor of racial profiling is a preconceived idea of people that views an entire race as criminal
A26060  We should end affirmative action    in favor of affirmative action is not fair to the rest of the people and knowing that should not be implemented.
A26064  We should prohibit flag burning against flag burning is a freedom of speech and as such is protected under the constitution. it is an act of defiance that must continue to be allowed to protect freedoms for all.
A26068  We should subsidize stay-at-home dads   in favor of we should subsidize stay-at-home dads because with them staying home, they more and likely would not have the same income level had the wife stayed at home since income equality still exists.

In the ValueEval’23 dataset, the “Premise” column contains the texts that should be classified.

3.2. Segmenting Texts

Human values are typically expressed in texts at a small scale, such as in sentences, phrases, or sometimes even individual words. Therefore, classifiers for human values generally take short texts (one or two sentences) as input.

However, many analyses operate at a larger scale and focus on the prevailing values of entire documents or even collections of documents. In such cases, it is therefore advisable to first divide the documents into smaller “segments” for classification and then recombine the classifications of each document’s segments for analysis (see Section 5, “Analyzing Classifications”).

This tutorial uses the sentencex library to segment texts into sentences. We install sentencex and define the segmentation function here. This function will then be applied in the following sections.

Although the texts (the “Premises”) in the ValueEval’23 dataset are short and segmentation is not strictly necessary, we will nevertheless segment them in this tutorial for illustrative purposes.

!pip install sentencex
import sentencex

def segment_text(text, language="en"):
  return [sentence.strip() for sentence in sentencex.segment(language, text)]

3.3. Reading Data

The following sections illustrate how to read data of different formats. For the ValueEval’23 example, see Section 3.3.1, “Reading Comma or Tab-Separated Value Files” and ignore the others. If you use your own data, use the section that fits best.

3.3.1. Reading Comma or Tab-Separated Value Files

To read files in comma or tab-separated values format, use the pyvalues.Document.read_tsv function.

First, look once more at the first lines of the dataset:

!head -n 4 dataset.tsv

Argument ID Conclusion  Stance  Premise
A26004  We should end affirmative action    against affirmative action helps with employment equity.
A26010  We should end affirmative action    in favor of affirmative action can be considered discriminatory against poor whites
A26016  We should ban naturopathy   in favor of naturopathy is very dangerous for the most vulnerable people, like children and cancer patients. people use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death.

We thus use the following parameters when loading the data:

delimiter="\t", as we are reading a tab-separated values file
document_id_field="Argument ID", as the “Argument ID” contains an identifier for each text
text_field="Premise", as the “Premise” field contains the text to be classified
segmenter=segment_text, to use the sentence segmenter defined in Section 3.2 to split each text into sentences, which are then classified

documents = list(pyvalues.Document.read_tsv(
    "dataset.tsv",
    delimiter="\t",
    document_id_field="Argument ID",
    text_field="Premise",
    segmenter=segment_text)
)

# print first three documents
for index in range(3):
  print(documents[index].model_dump_json(indent=2, exclude_defaults=True))

{
  "id": "A26004",
  "segments": [
    "affirmative action helps with employment equity."
  ]
}
{
  "id": "A26010",
  "segments": [
    "affirmative action can be considered discriminatory against poor whites"
  ]
}
{
  "id": "A26016",
  "segments": [
    "naturopathy is very dangerous for the most vulnerable people, like children and cancer patients.",
    "people use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death."
  ]
}

Pre-Segmented Datasets

In some datasets (like the ValueEval’24 dataset), the documents are already segmented beforehand. If this is the case in the dataset you are using, and if there is some column that specifies which document each segment (i.e., line in the dataset file) originates from:

Remove the segmenter=segment_text parameter to use the lines as segments
Make sure the document_id_field=COLUMN_NAME parameter is set correctly, with COLUMN_NAME being the name of the column that specifies the document, to group the segments with the same COLUMN_NAME value together, in the same order as they appear in the dataset file
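
To illustrate what grouping pre-segmented lines by document means, here is a small standalone sketch using only Python's csv module. It does not use pyvalues; the file content and the column names “Text-ID” and “Text” are made up for this example.

```python
import csv
import io

# hypothetical pre-segmented file: one segment per line,
# the "Text-ID" column names the document each segment belongs to
tsv_content = (
    "Text-ID\tSentence-ID\tText\n"
    "doc1\t1\tFirst sentence of the first document.\n"
    "doc1\t2\tSecond sentence of the first document.\n"
    "doc2\t1\tOnly sentence of the second document.\n"
)

def group_segments(tsv_file, document_id_field, text_field):
    # dictionaries keep insertion order, so documents and their
    # segments stay in the order in which they appear in the file
    segments_by_document = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        segments_by_document.setdefault(row[document_id_field], []).append(row[text_field])
    return segments_by_document

grouped = group_segments(io.StringIO(tsv_content), "Text-ID", "Text")
for document_id, segments in grouped.items():
    print(document_id, segments)
```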

Non-English Texts

If the texts are not (only) in English:

If all texts of the file are in the same language, add the language=XX parameter, with XX being the respective ISO 639-1 / alpha-2 language code
If the documents or segments of the file have different languages, and if there is some column that contains the ISO 639-1 / alpha-2 language code of the respective language, add the language_field=COLUMN_NAME parameter, with COLUMN_NAME being the name of the column that specifies the language

The language is used both by the text segmenter (if any) and human values classifiers.

3.3.2. Reading Text Files

To read files from a directory, each one corresponding to one document, use the pyvalues.Document.read_txt function.

We define a helper function to read all files from a given directory:

def read_text_files(directory_path, segmenter=segment_text, language="en"):
  import os
  documents = []
  for filename in os.listdir(directory_path):
    file_path = os.path.join(directory_path, filename)
    if os.path.isfile(file_path):
      document_id = os.path.splitext(filename)[0] # filename without extension
      documents.append(pyvalues.Document.read_txt(
          file_path,
          document_id=document_id,
          segmenter=segmenter,
          language=language
      ))
  return documents

Use the helper function like this (replace path/to/your/directory):

directory_path = "path/to/your/directory"
# uncomment and execute to use the files in the directory instead of the ones from Section 3.3.1
# documents = read_text_files(directory_path, language="en")

4. Classifying Texts

Human value classifiers are computational methods that detect human values in text. For each sentence, they predict a score for each value:

A score close to 0 means the classifier is confident in predicting that the sentence does not refer to (or attain or constrain) the value.
A score close to 1 means the classifier is confident in predicting that the sentence refers to (or attains or constrains) the value.
A score between 0 and 1 means the classifier places its confidence in-between, with a score of 0.5 meaning the classifier is absolutely unsure.

So that the classifications of the different classifiers can be used in Sections 5 and 6 of this tutorial, we store them in classifications_by:

classifications_by = {}

To continue with this tutorial, execute at least one of the subsections (“Classifying with …”).

4.1. Classifying with the Dictionary Classifier

This method checks whether the text contains a word from a predefined list (“dictionary”) that indicates a specific value. The dictionaries for each value were developed by Ponizovskiy et al. (2020).

from pyvalues.dictionary_classifier import OriginalValuesDictionaryClassifier
dictionary_classifier = OriginalValuesDictionaryClassifier.get_default(
    score_threshold = 1, # sets how often words from a dictionary have to occur in a text for it to be classified as indicating the value
    max_values = 0  # If not 0, at most this many values are classified (ranked by how often words from the dictionary occur in the text)
)

For illustration, the following line shows 10 words from the English dictionary that are used as indicators of “Achievement”.

print([word for word, values in dictionary_classifier._classifiers["en"]._dictionaries.items() if "Achievement" in values][0:10])

['accomplish', 'accomplished', 'achieve', 'achievement', 'achieving', 'advance', 'advancement', 'advantage', 'appreciate', 'appreciation']

A separate dictionary is needed for each language and value. The classifier used here only contains dictionaries for the original values and for a few languages:

print(list(dictionary_classifier._classifiers.keys()))

['bg', 'de', 'el', 'en', 'fr', 'it', 'nl']

To classify the sentences according to the dictionaries, use classify_documents_for_original_values:

# classify each segment
classifications_by["dictionary"] = list(
    dictionary_classifier.classify_documents_for_original_values(documents)
)

print(classifications_by["dictionary"][0])

# print classifications for first 3 documents
for document_index in range(3):
  classifications = classifications_by["dictionary"][document_index]
  print(f"document{document_index}: {classifications.id}")
  for index in range(len(classifications.segments)):
    segment = classifications.segments[index]
    values = classifications.values[index]
    print(f"    '{segment}'\n    -> {values}")

id='A26004' language='en' segments=['affirmative action helps with employment equity.'] values=[OriginalValues(self_direction=1.0, stimulation=0.0, hedonism=0.0, achievement=1.0, power=0.0, security=0.0, tradition=0.0, conformity=0.0, benevolence=0.0, universalism=0.0)]
document0: A26004
    'affirmative action helps with employment equity.'
    -> {"Self-direction":1.0,"Achievement":1.0}
document1: A26010
    'affirmative action can be considered discriminatory against poor whites'
    -> {"Self-direction":1.0}
document2: A26016
    'naturopathy is very dangerous for the most vulnerable people, like children and cancer patients.'
    -> {"Security":1.0,"Universalism":1.0}
    'people use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death.'
    -> {}

By default, whenever one word from a dictionary occurs, the text is classified as indicating the corresponding value. But this behavior can be changed through the score_threshold and max_values parameters (see the creation of the classifier above).
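
The idea behind dictionary-based classification, including the effect of score_threshold and max_values, can be sketched in a few lines of plain Python. This is a simplified illustration and not the pyvalues implementation: the toy dictionary is made up (it is not the dictionary of Ponizovskiy et al.), the tokenization is naive, and the exact semantics of the two parameters in pyvalues may differ from what is assumed here.

```python
import re
from collections import Counter

# made-up toy dictionary: word -> values it indicates
toy_dictionary = {
    "achieve": ["Achievement"],
    "success": ["Achievement"],
    "safe": ["Security"],
    "danger": ["Security"],
    "freedom": ["Self-direction"],
}

def classify(text, dictionary, score_threshold=1, max_values=0):
    # count how often words of each value's dictionary occur in the text
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for value in dictionary.get(token, []):
            counts[value] += 1
    # assumption: a value is assigned if its count reaches score_threshold
    ranked = [(value, count) for value, count in counts.most_common()
              if count >= score_threshold]
    # assumption: max_values > 0 keeps only the most frequent values
    if max_values > 0:
        ranked = ranked[:max_values]
    return {value: 1.0 for value, _ in ranked}

text = "Freedom lets people achieve success, but success is never safe."
default_result = classify(text, toy_dictionary)
strict_result = classify(text, toy_dictionary, score_threshold=2)
top_result = classify(text, toy_dictionary, max_values=1)
print(default_result)
print(strict_result)
print(top_result)
```

In the toy text, words for “Achievement” occur three times and words for “Security” and “Self-direction” once each, so raising the threshold to 2 or limiting the output to one value leaves only “Achievement”.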

You can also supply your own dictionaries when constructing an OriginalValuesDictionaryClassifier (or the dictionary classifier for the respective value schema) instead of using get_default().

4.2. Classifying with the ValueEval’24 Classifier

This approach classifies a text for values with a trained multilingual language model (Legkas et al., 2024). It won the Touché ValueEval’24 Human Value Detection Task. As the model sometimes needs the context of a sentence to interpret it properly, this classifier also considers the previous sentences and how they were classified.

The classifier classifies for the 19 refined values with attainment for the 9 languages of the ValueEval’24 task.

To run the classifier, you can either install and use it in this environment (Section 4.2.1, requires a 20GB GPU for reasonable speed - there is a check in Section 1 for whether you have a GPU available and how large it is), or use an API that serves the classifier remotely (Section 4.2.2, requires an API token).

4.2.1. Using the Classifier via Installation

Installing and loading the ValueEval’24 classifier (package valueeval24_hierocles_of_alexandria) can take several minutes.

Moreover, the original classifier requires a graphics card with at least 20 GB of memory. If only a smaller graphics card is available, a smaller (less accurate) version will be loaded automatically. If you do not have a graphics card with at least 5 GB of memory, the classifier will load without a graphics card instead (very slow).

!pip install valueeval24-hierocles-of-alexandria==0.11.1
import valueeval24_hierocles_of_alexandria
valueeval24_classifier = valueeval24_hierocles_of_alexandria.ValueEval24Classifier()

The classifier is available for nine different languages:

print(valueeval24_hierocles_of_alexandria.multi_head_model.lang_dict.keys())

Now that you have installed the classifier, continue with Section 4.2.3.

4.2.2. Using the Classifier via Remote API

The classifier is also hosted on remote servers that you can access with an API token. The RemoteClassifier sends the documents to the remote server and receives the classifications back from it. No GPU is needed here.

One such remote server is managed by the Methods Hub, and you might be able to get an API token this way. First, make sure that you are using the latest version of this tutorial: follow this link to it on the Methods Hub, open it again, and come back to this section. Once you are on the latest version, read on.

The Methods Hub is currently running an experimental service to host this classifier for social scientists to use it in their research. For as long as it is experimental, the service is free of charge, except that you have to provide us with a few details on what you plan to use the classifier on (including the order of magnitude of sentences you want to classify), provide us with a bit of feedback after use, and mention (via DOI) this tutorial in any publication that uses the classifications. Contact us for more information and to get your token.

Once you have the API token, you can run the code below, which will ask you to paste the API token.

from pyvalues.remote_classifier import RefinedValuesWithAttainmentRemoteClassifier
from getpass import getpass

valueeval24_classifier_url = "https://hierocles-of-alexandria.methodshub.workers.dev"
valueeval24_classifier_token = getpass("Paste your API token (and hit Enter): ")

valueeval24_classifier = RefinedValuesWithAttainmentRemoteClassifier(
            url=valueeval24_classifier_url,
            authorization_token=valueeval24_classifier_token
        )

Now that you have set up the classifier, continue with Section 4.2.3.

4.2.3. Running the Chosen Classifier

Classification works the same way as for all other classifiers (using classify_documents_for_original_values etc.).

As this classifier needs some time for each classification, we classify only the first 5 documents in this tutorial. If you want to classify the entire dataset, remove the [:5].

If you get an error 503 Service Unavailable when you run the classifier, wait two minutes and try again: when the remote API has not been used for some time, it powers off and needs a minute to start again.

# classify first 5 documents
classifications_by["valueeval24"] = list(
    valueeval24_classifier.classify_documents_for_original_values(documents[:5])
)

# print classifications for first 3 documents
for document_index in range(3):
  classifications = classifications_by["valueeval24"][document_index]
  print(f"document{document_index}: {classifications.id}")
  for index in range(len(classifications.segments)):
    segment = classifications.segments[index]
    values = classifications.values[index]
    print(f"    '{segment}'\n    -> {values}")
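
If the remote API answers with 503 Service Unavailable as described above, the waiting and retrying can be automated. The following is a generic sketch, not part of pyvalues: call_with_retries is a hypothetical helper, and which exception the remote classifier raises on a 503 is an assumption you would need to check, so the sketch retries on any exception by default. The flaky function only simulates a service that needs two attempts to wake up.

```python
import time

def call_with_retries(function, attempts=3, wait_seconds=120, retry_on=(Exception,)):
    # call function(); on failure wait and try again, at most `attempts` times
    for attempt in range(1, attempts + 1):
        try:
            return function()
        except retry_on as error:
            if attempt == attempts:
                raise  # give up and show the last error
            print(f"Attempt {attempt} failed ({error}); retrying in {wait_seconds} s")
            time.sleep(wait_seconds)

# simulated service that fails twice before it answers
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("503 Service Unavailable")
    return "ok"

result = call_with_retries(flaky, attempts=5, wait_seconds=0)
print(result, "after", calls["count"], "calls")

# possible use with the classifier from this section (not executed here):
# classifications_by["valueeval24"] = call_with_retries(
#     lambda: list(valueeval24_classifier.classify_documents_for_original_values(documents[:5]))
# )
```

The list(...) call sits inside the lambda so that the classification itself, and not only the creation of the result iterator, is covered by the retry.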

While this tutorial uses classify_documents_for_original_values for consistency with other classifiers, this classifier is actually able to classify for the 19 refined values and with attainment. Use classify_documents_for_refined_values_with_attainment to get the respective classifications.

5. Analyzing Classifications

5.1. Saving and Loading Classifications

After values are classified, you should save them for future use. We here write a separate file for each classifier:

for classifier, classifications in classifications_by.items():
  with open(f"{classifier}.tsv", "w") as f:
    print(f"Saving {classifier} results to {f.name}")
    writer = pyvalues.OriginalValues.writer_tsv_with_text(f)
    writer.write_documents(classifications)

Saving dictionary results to dictionary.tsv

For example, if you used the Dictionary classifier (Section 4.1), its results will be in dictionary.tsv:

!head dictionary.tsv # read first 10 lines of the file

ID  Text    Language    Self-direction  Stimulation Hedonism    Achievement Power   Security    Tradition   Conformity  Benevolence Universalism
A26004  affirmative action helps with employment equity.    en  1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
A26010  affirmative action can be considered discriminatory against poor whites en  1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
A26016  naturopathy is very dangerous for the most vulnerable people, like children and cancer patients.    en  0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
A26016  people use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death.   en  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
A26024  women shouldn't be in combat because they aren't as strong or fast as men and can be a weak link    en  0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
A26026  once eradicated illnesses are returning due to people turning to naturopathy.   en  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
A26045  racial profiling is a preconceived idea of people that views an entire race as criminal en  1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
A26060  affirmative action is not fair to the rest of the people and knowing that should not be implemented.    en  1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
A26064  flag burning is a freedom of speech and as such is protected under the constitution.    en  1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

If you are reading this tutorial in an online virtual environment, there is usually a way to browse and download files in the menu on the top or left. Look for a directory symbol.

When you want to continue from saved classifications, load them like this:

dataset_file_name = list(classifications_by.keys())[0] + ".tsv"
print("Reading " + dataset_file_name)
dataset_reloaded = list(pyvalues.OriginalValues.read_tsv(
    input_file=dataset_file_name,
    document_id_field="ID",
    segment_field="Text",
    language_field="Language"
))
print(f"Read {len(dataset_reloaded)} documents")

Reading dictionary.tsv
Read 1576 documents
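
Because the saved files are ordinary tab-separated files, they can also be inspected without pyvalues, for example with Python's csv module. The sketch below works on an in-memory copy of the first lines shown above (so it runs without the file); to use the real file, open dictionary.tsv instead of the io.StringIO object.

```python
import csv
import io

# first lines of dictionary.tsv as shown above, with only two value columns kept
sample = (
    "ID\tText\tLanguage\tSelf-direction\tAchievement\n"
    "A26004\taffirmative action helps with employment equity.\ten\t1.0\t1.0\n"
    "A26010\taffirmative action can be considered discriminatory against poor whites\ten\t1.0\t0.0\n"
    "A26016\tnaturopathy is very dangerous for the most vulnerable people, like children and cancer patients.\ten\t0.0\t0.0\n"
    "A26016\tpeople use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death.\ten\t0.0\t0.0\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
document_ids = []
for row in rows:
    # one line per segment; lines with the same ID belong to one document
    if row["ID"] not in document_ids:
        document_ids.append(row["ID"])

segments_with_self_direction = sum(1 for row in rows if float(row["Self-direction"]) > 0)
print(f"{len(rows)} segments in {len(document_ids)} documents")
print(f"{segments_with_self_direction} segments with Self-direction")
```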

5.2. Averaging Scores

Some research questions require an analysis of values per document (or even per set of documents) and not per segment. The straightforward way to do so is to average the scores over all segments.

Let us look again at the scores for the first 3 documents:

for document_index in range(3):
  for segment_index in range(len(documents[document_index].segments)):
    print(f"{documents[document_index].id} segment{segment_index+1}")
    for classifier, classifications in classifications_by.items():
      if document_index < len(classifications):
        values = classifications[document_index].values[segment_index]
        print(f"- {classifier}: {values}")

A26004 segment1
- dictionary: {"Self-direction":1.0,"Achievement":1.0}
A26010 segment1
- dictionary: {"Self-direction":1.0}
A26016 segment1
- dictionary: {"Security":1.0,"Universalism":1.0}
A26016 segment2
- dictionary: {}

Now we average within each document:

values_document_averages_by = {}
for classifier, classifications in classifications_by.items():
  with open(f"{classifier}-document-averages.tsv", "w") as output_file:
    writer = pyvalues.OriginalValues.writer_tsv(output_file)
    values_document_averages = []
    for document in classifications:
      # averaging:
      values_document_average = pyvalues.OriginalValues.average(document.values)
      values_document_averages.append(values_document_average)
      # saving:
      writer.write(values_document_average, record_id=document.id)
    values_document_averages_by[classifier] = values_document_averages

# printing:
for document_index in range(3):
  print(f"{documents[document_index].id}")
  for classifier, classifications in classifications_by.items():
    print(f"- {classifier}: {values_document_averages_by[classifier][document_index]}")

A26004
- dictionary: {"Self-direction":1.0,"Achievement":1.0}
A26010
- dictionary: {"Self-direction":1.0}
A26016
- dictionary: {"Security":0.5,"Universalism":0.5}
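
The averaging for document A26016 can be checked by hand. The following plain-Python sketch mirrors the idea with dictionaries and is not the pyvalues implementation; average_scores is a hypothetical helper, and a value missing from a segment is treated as a score of 0.

```python
def average_scores(segment_scores, value_names):
    # mean score per value over all segments of one document
    if not segment_scores:
        return {}
    averages = {}
    for name in value_names:
        mean = sum(scores.get(name, 0.0) for scores in segment_scores) / len(segment_scores)
        if mean > 0:  # like the printed output, omit values with a score of 0
            averages[name] = mean
    return averages

value_names = ["Security", "Universalism"]
# the two segments of document A26016 as classified by the dictionary classifier
segments_a26016 = [{"Security": 1.0, "Universalism": 1.0}, {}]
document_average = average_scores(segments_a26016, value_names)
print(document_average)
```

The first segment has a score of 1 for both values and the second segment has none, so each value averages to (1 + 0) / 2 = 0.5.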

The average over all documents can provide an interesting overview:

values_averages_by = {}

# save the calculated averages, one line per classifier
with open("averages.tsv", "w") as output_file:
  writer = pyvalues.OriginalValues.writer_tsv(output_file)

  # calculate for each classifier
  for classifier, classifications in classifications_by.items():
    # averaging:
    values_averages_by[classifier] = pyvalues.OriginalValues.average_documents(classifications)
    # saving:
    writer.write(values_averages_by[classifier], record_id=classifier)

for classifier, values_averages in values_averages_by.items():
  print(f"- {classifier}: {values_averages}")

- dictionary: {"Self-direction":0.23006909193457417,"Stimulation":0.05045549311094996,"Hedonism":0.02832668600435098,"Achievement":0.1466075960841189,"Power":0.1412575034243816,"Security":0.11074802594472645,"Tradition":0.10887871646120378,"Conformity":0.1250475888324873,"Benevolence":0.16854352993312385,"Universalism":0.21766905366207398}

5.3. Binarizing Scores via Threshold

Typically, one is not interested in values that are classified with a small score. To get only the prevalent values, one can select only those values with a score above a certain threshold by setting these to a score of 1 and the others to a score of 0 (“binarization”). We here use a threshold of 0.1 - so if at least every tenth segment has a score of 1, the averaged and binarized score is also 1.

threshold = 0.1 # for the ValueEval'23 dataset, set to 0.51 to see how the output changes

values_binarized_by = {}
for classifier, classifications in classifications_by.items():
  with open(f"{classifier}-binarized.tsv", "w") as output_file:
    writer = pyvalues.OriginalValues.writer_tsv(output_file)
    values_binarized_by[classifier] = []
    for document_index in range(len(classifications)):
      # averaging within document:
      values = pyvalues.OriginalValues.average(classifications[document_index].values)
      # binarizing:
      values_binarized = values.binarize(threshold)
      values_binarized_by[classifier].append(values_binarized)
      # saving:
      writer.write(values_binarized, record_id=documents[document_index].id)

for document_index in range(3):
  print(f"{documents[document_index].id}")
  for classifier, values_binarized in values_binarized_by.items():
    print(f"- {classifier}: {values_binarized[document_index]}")

A26004
- dictionary: {"Self-direction":1.0,"Achievement":1.0}
A26010
- dictionary: {"Self-direction":1.0}
A26016
- dictionary: {"Security":1.0,"Universalism":1.0}
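The effect of the threshold can be reproduced without pyvalues. The following is a minimal sketch in plain Python, not the library's implementation: the helper names `average_scores` and `binarize` and the segment scores are hypothetical, and the sketch assumes that a score equal to the threshold counts as reaching it.

```python
def average_scores(segment_scores):
    """Average per-segment score dicts into one dict (missing values count as 0)."""
    names = sorted({name for scores in segment_scores for name in scores})
    return {
        name: sum(scores.get(name, 0.0) for scores in segment_scores) / len(segment_scores)
        for name in names
    }


def binarize(scores, threshold):
    """Keep only values whose score reaches the threshold and set them to 1.0.

    Assumption: ">=" is used here; the library may treat the boundary differently.
    """
    return {name: 1.0 for name, score in scores.items() if score >= threshold}


# hypothetical document with two segments, each scored for a different value
segments = [{"Security": 1.0}, {"Universalism": 1.0}]
averaged = average_scores(segments)

print(averaged)                  # {'Security': 0.5, 'Universalism': 0.5}
print(binarize(averaged, 0.1))   # {'Security': 1.0, 'Universalism': 1.0}
print(binarize(averaged, 0.51))  # {}
```

With a threshold of 0.51, a value has to be scored in more than half of the segments to survive, which is why both values disappear in this example.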

The average of the binarized document scores then gives an impression of how prevalent a value is in the entire collection of documents (relative frequencies / fraction of documents with the value):

relative_binarized_document_frequencies_by = {}
# save the calculated frequencies, one line per classifier
with open("relative-binarized-document-frequencies.tsv", "w") as output_file:
  writer = pyvalues.OriginalValues.writer_tsv(output_file)

  # calculate for each classifier
  for classifier, classifications in classifications_by.items():
    values_binarized = []
    for document_index in range(len(classifications)):
      # averaging within document:
      values = pyvalues.OriginalValues.average(classifications[document_index].values)
      # binarizing per-document averages:
      values_binarized.append(values.binarize(threshold))
    # averaging binarized values across documents:
    relative_binarized_document_frequencies_by[classifier] = pyvalues.OriginalValues.average(values_binarized)
    # saving:
    writer.write(relative_binarized_document_frequencies_by[classifier], record_id=classifier)

for classifier, relative_binarized_document_frequencies in relative_binarized_document_frequencies_by.items():
  print(f"- {classifier}: {relative_binarized_document_frequencies}")

- dictionary: {"Self-direction":0.26776649746192893,"Stimulation":0.06472081218274112,"Hedonism":0.03489847715736041,"Achievement":0.17449238578680204,"Power":0.16941624365482233,"Security":0.13324873096446702,"Tradition":0.12119289340101523,"Conformity":0.1567258883248731,"Benevolence":0.19733502538071065,"Universalism":0.25253807106598986}
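To make the interpretation as "fraction of documents with the value" concrete, the following sketch recomputes such frequencies in plain Python for only the first three documents printed above. The function name `document_frequencies` is hypothetical and the code is an illustration, not the pyvalues implementation.

```python
def document_frequencies(binarized_documents, value_names):
    """Fraction of documents in which each value has a binarized score of 1."""
    count = len(binarized_documents)
    return {
        name: sum(document.get(name, 0.0) for document in binarized_documents) / count
        for name in value_names
    }


# binarized scores of the first three documents as printed above
binarized = [
    {"Self-direction": 1.0, "Achievement": 1.0},  # A26004
    {"Self-direction": 1.0},                      # A26010
    {"Security": 1.0, "Universalism": 1.0},       # A26016
]
names = ["Self-direction", "Achievement", "Security", "Universalism"]

frequencies = document_frequencies(binarized, names)
for name, frequency in frequencies.items():
    print(f"{name}: {frequency:.2f}")
```

Self-direction occurs in two of the three documents and therefore gets a frequency of about 0.67, while each of the other three values occurs in one document and gets about 0.33.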

5.4. Limiting to k Values Per Document

An alternative approach to getting only the prevalent values (compared to binarization, Section 5.3) is to limit the values of each document to those with the highest scores. In the extreme case, one takes only the single value with the highest score per document (k = 1). All scores outside the top k are set to 0.

k = 1
binarize = True # set the score of the top-k to 1

values_topped_by = {}
# calculate for each classifier
for classifier, classifications in classifications_by.items():
  # save the top-k values, one line per document
  with open(f"{classifier}-top.tsv", "w") as output_file:
    writer = pyvalues.OriginalValues.writer_tsv(output_file)
    values_topped_by[classifier] = []
    for document_index in range(len(classifications)):
      # averaging within document:
      values = pyvalues.OriginalValues.average(classifications[document_index].values)
      # taking only the top:
      values_top = values.top(k=k, binarize=binarize)
      values_topped_by[classifier].append(values_top)
      # saving:
      writer.write(values_top, record_id=documents[document_index].id)

for document_index in range(3):
  print(f"{documents[document_index].id}")
  for classifier, values_topped in values_topped_by.items():
    print(f"- {classifier}: {values_topped[document_index]}")

A26004
- dictionary: {"Self-direction":1.0}
A26010
- dictionary: {"Self-direction":1.0}
A26016
- dictionary: {"Security":1.0}

The average of the top-k-value document scores then gives an impression of how prevalent a value is in the entire collection of documents (relative frequencies / fraction of documents with the value in its top-k):

relative_topped_document_frequencies_by = {}
# save the calculated frequencies, one line per classifier
with open("relative-topped-document-frequencies.tsv", "w") as output_file:
  writer = pyvalues.OriginalValues.writer_tsv(output_file)

  # calculate for each classifier
  for classifier, classifications in classifications_by.items():
    values_topped = []
    for document_index in range(len(classifications)):
      # averaging within document:
      values = pyvalues.OriginalValues.average(classifications[document_index].values)
      # taking the top of the per-document averages:
      values_topped.append(values.top(k=k, binarize=binarize))
    # averaging topped values across documents:
    relative_topped_document_frequencies_by[classifier] = pyvalues.OriginalValues.average(values_topped)
    # saving:
    writer.write(relative_topped_document_frequencies_by[classifier], record_id=classifier)

for classifier, relative_topped_document_frequencies in relative_topped_document_frequencies_by.items():
  print(f"- {classifier}: {relative_topped_document_frequencies}")

- dictionary: {"Self-direction":0.43147208121827413,"Stimulation":0.03680203045685279,"Hedonism":0.02601522842639594,"Achievement":0.10532994923857868,"Power":0.09390862944162437,"Security":0.07106598984771574,"Tradition":0.06091370558375635,"Conformity":0.04949238578680203,"Benevolence":0.05520304568527919,"Universalism":0.06979695431472081}

Note that, since ties are always broken in the same order, taking the top-k is less expressive if values are often tied in score (e.g., when the ValueEval’23 dataset is classified by the Dictionary classifier).
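The effect of tie-breaking can be sketched in plain Python. This is an illustration under assumptions, not the pyvalues implementation: the function `top_k` is hypothetical, and it assumes that ties go to the value that comes first in the order in which the values are printed in this tutorial, which is consistent with the output for document A26016 above.

```python
# order of the ten values as they appear in the outputs of this tutorial
VALUE_ORDER = [
    "Self-direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Tradition", "Conformity", "Benevolence", "Universalism",
]


def top_k(scores, k=1, binarize=True):
    """Keep the k highest-scoring values; ties go to the earlier value in VALUE_ORDER."""
    # assumption: values with a score of 0 are never selected
    candidates = [name for name in VALUE_ORDER if scores.get(name, 0.0) > 0]
    # sorted() is stable, so values with equal scores keep their VALUE_ORDER position
    ranked = sorted(candidates, key=lambda name: -scores[name])
    return {name: (1.0 if binarize else scores[name]) for name in ranked[:k]}


tied = {"Security": 0.5, "Universalism": 0.5}
print(top_k(tied, k=1))  # {'Security': 1.0}
print(top_k(tied, k=2))  # {'Security': 1.0, 'Universalism': 1.0}
```

With k = 1, Universalism is dropped although it has the same score as Security. If many documents contain such ties, values that come early in the order are systematically favored in the resulting frequencies.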

5.5. Visualizing Scores

To get a visual impression of the calculated scores, a radar plot can be used to show the scores like in the circle of the human values theory. The visualization works for all kinds of scores, whether they are created through binarization, top-k-filtering, or something else.

scores = [pyvalues.OriginalValues.average_documents(classifications) for classifications in classifications_by.values()]
# uncomment one of the following lines to look at these other scores:
# scores = list(relative_binarized_document_frequencies_by.values())
# scores = list(relative_topped_document_frequencies_by.values())

plot = pyvalues.OriginalValues.plot_all(
    scores,
    labels=list(classifications_by.keys()),
    gridlines=[0.1,0.2,0.3,0.4,0.5] # lines in the chart
)
plot.title("Scores per Classifier", pad=20)
plot.savefig("scores-per-classifier.pdf") # save the plot as file

Note that, if you used only the first 5 documents for the ValueEval24 classifier (as is the default in this tutorial), the plot will look very “spiky” for that classifier. Plots like the one above are only meaningful for a larger number of documents.

6. Evaluating

Evaluation answers the question of how well a classifier performed. It relies on human value labels (or “codes”) that are assumed to be correct (thus called “ground truth”) and compares them with the classifications of the classifier. Through this comparison, one can judge how often a classifier classified correctly or incorrectly, and thus understand how well it does and how it compares to other classifiers.

6.1. Getting Data

Similar to Section 3.1, we download the human labels (the “ground truth”) of the same dataset that was used in the ValueEval’23 competition:

%env GROUND_TRUTH_URL=https://zenodo.org/records/10564870/files/labels-test.tsv?download=1
!wget "$GROUND_TRUTH_URL" -O ground-truth.tsv

# the human annotations ("ground truth") use the refined values scheme
ground_truth_refined = pyvalues.RefinedValues.read_tsv(
  input_file="ground-truth.tsv",
  document_id_field="Argument ID"
)
# this tutorial uses the original 10 values, so we convert them:
ground_truth = [
    pyvalues.ValuesAnnotatedDocument[pyvalues.OriginalValues](
        id=document.id,
        values=[values.convert(pyvalues.OriginalValues) for values in document.values]
    ) for document in ground_truth_refined
]
# just so we know that something was indeed read:
print(f"Read {len(ground_truth)} ground-truth labels")

env: GROUND_TRUTH_URL=https://zenodo.org/records/10564870/files/labels-test.tsv?download=1
--2026-06-22 22:16:47--  https://zenodo.org/records/10564870/files/labels-test.tsv?download=1
Resolving zenodo.org (zenodo.org)... 137.138.153.219, 188.185.43.153, 188.184.98.114, ...
Connecting to zenodo.org (zenodo.org)|137.138.153.219|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74444 (73K) [application/octet-stream]
Saving to: ‘ground-truth.tsv’

ground-truth.tsv    100%[===================>]  72.70K  --.-KB/s    in 0.02s

2026-06-22 22:16:47 (2.98 MB/s) - ‘ground-truth.tsv’ saved [74444/74444]

Read 1576 ground-truth labels

6.2. Classifying with the Naive Baseline Classifier

To judge classifiers, it is usually helpful to compare their performance to that of a naive baseline classifier, i.e., a classifier that does something easy to comprehend, even if it does not really classify values. An example is a classifier that assigns labels without looking at the input text.

A good baseline for human value classification is the AllAttainedClassifier, which classifies each segment as attaining all values. It is straightforward and comprehensible, and for some measures of classification performance it reaches quite good values, even though it would not be useful in a real analysis. This is exactly what makes it worth having: if another classifier does not perform better than the AllAttainedClassifier, one should not use that classifier for real analyses either.

from pyvalues.baseline_classifier import AllAttainedClassifier
all_attained_classifier = AllAttainedClassifier()
classifications_by["all_attained"] = list(
    all_attained_classifier.classify_documents_for_original_values(documents)
)

# print classifications for first 3 documents
for document_index in range(3):
  classifications = classifications_by["all_attained"][document_index]
  print(f"document{document_index}: {classifications.id}")
  for index in range(len(classifications.segments)):
    segment = classifications.segments[index]
    values = classifications.values[index]
    print(f"    '{segment}'\n    -> {values}")

document0: A26004
    'affirmative action helps with employment equity.'
    -> {"Self-direction":1.0,"Stimulation":1.0,"Hedonism":1.0,"Achievement":1.0,"Power":1.0,"Security":1.0,"Tradition":1.0,"Conformity":1.0,"Benevolence":1.0,"Universalism":1.0}
document1: A26010
    'affirmative action can be considered discriminatory against poor whites'
    -> {"Self-direction":1.0,"Stimulation":1.0,"Hedonism":1.0,"Achievement":1.0,"Power":1.0,"Security":1.0,"Tradition":1.0,"Conformity":1.0,"Benevolence":1.0,"Universalism":1.0}
document2: A26016
    'naturopathy is very dangerous for the most vulnerable people, like children and cancer patients.'
    -> {"Self-direction":1.0,"Stimulation":1.0,"Hedonism":1.0,"Achievement":1.0,"Power":1.0,"Security":1.0,"Tradition":1.0,"Conformity":1.0,"Benevolence":1.0,"Universalism":1.0}
    'people use ineffective treatments and forgo proven cures, such as antibiotics or chemo, often resulting in death.'
    -> {"Self-direction":1.0,"Stimulation":1.0,"Hedonism":1.0,"Achievement":1.0,"Power":1.0,"Security":1.0,"Tradition":1.0,"Conformity":1.0,"Benevolence":1.0,"Universalism":1.0}
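What this baseline does can be written down in a few lines of plain Python. The following is a hypothetical stand-in for illustration, not the code of the AllAttainedClassifier: the function name `all_attained` is made up, and the list of value names is copied from the output above.

```python
# the ten values as printed in the output above
VALUE_NAMES = [
    "Self-direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Tradition", "Conformity", "Benevolence", "Universalism",
]


def all_attained(segment):
    """Ignore the text of the segment and score every value with 1.0."""
    return {name: 1.0 for name in VALUE_NAMES}


# the result is the same for any input, even for an empty segment
for segment in ["affirmative action helps with employment equity.", ""]:
    scores = all_attained(segment)
    print(len(scores), set(scores.values()))  # 10 {1.0}
```

Because the result never depends on the input, any difference in performance between this baseline and a real classifier shows how much the real classifier gains from actually reading the text.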

6.3. Calculating Classifier Performance

The F-score is a typical measure of performance for human value classifications. It condenses performance into a single number by taking the harmonic mean of precision and recall. For analyzing errors, however, one should rather inspect its constituents:

Precision: ratio of predicted values that are correct according to the ground truth (“if the classifier classified a value, what is the probability it is correct?”)
Recall: ratio of values in the ground truth that are also predicted (“what percentage of the documents for a value did the classifier find?”)
F-score: 2 ⋅ precision ⋅ recall / (precision + recall)
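The three definitions above can be checked on a small example for a single value. This is a sketch in plain Python with made-up labels, not the evaluation code of pyvalues. The function name `precision_recall_f` is hypothetical, and returning 0 when a denominator is 0 is an assumed convention.

```python
def precision_recall_f(predicted, truth):
    """Precision, recall, and F-score for one value; inputs are 0/1 labels per document."""
    true_positives = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    predicted_positives = sum(predicted)
    actual_positives = sum(truth)
    # assumed convention: a measure is 0 when its denominator is 0
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# made-up labels for five documents and one value
truth = [1, 0, 0, 1, 1]
predicted = [1, 1, 0, 1, 0]

# two of three predictions are correct, two of three true labels are found
print(precision_recall_f(predicted, truth))

# predicting the value for every document: recall is 1.0, precision is 3/5
print(precision_recall_f([1] * 5, truth))
```

The second call mimics a classifier that always predicts the value: it cannot miss anything, so recall is perfect, while precision equals the fraction of documents that carry the value in the ground truth.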

fscores = {} # for Section 6.4

for classifier, classifications in classifications_by.items():
  # convert per-segment classifications to per-document classifications
  per_document_classifications = [
      pyvalues.ValuesAnnotatedDocument[pyvalues.OriginalValues](
          values=[pyvalues.OriginalValues.average(document.values)]
      )
      for document in classifications
  ]

  # compare classifications with ground truth
  evaluation = pyvalues.OriginalValues.evaluate_documents(
      per_document_classifications,
      # in case not all documents were classified, bring ground truth to the same size
      ground_truth[:len(per_document_classifications)]
  )

  # calculate (macro-)average F-Score, precision, and recall
  num_values_in_ground_truth = len(evaluation.get_values_in_ground_truth())
  f, precision, recall = evaluation.f()
  macro_f = sum(f.to_list()) / num_values_in_ground_truth
  macro_precision = sum(precision.to_list()) / num_values_in_ground_truth
  macro_recall = sum(recall.to_list()) / num_values_in_ground_truth

  # print results
  print(f'{classifier} (evaluated on {len(per_document_classifications)} documents with {num_values_in_ground_truth} values)')
  print(f'- F-Score:   {macro_f}; {f}')
  print(f'- Precision: {macro_precision}; {precision}')
  print(f'- Recall:    {macro_recall}; {recall}')
  fscores[classifier] = f

dictionary (evaluated on 1576 documents with 10 values)
- F-Score:   0.303152925651196; {"Self-direction":0.3746898263027295,"Stimulation":0.11111111111111112,"Hedonism":0.13513513513513514,"Achievement":0.37575757575757573,"Power":0.2997762863534676,"Security":0.2896281800391389,"Tradition":0.3438395415472779,"Conformity":0.25193798449612403,"Benevolence":0.4264099037138927,"Universalism":0.42324371205550737}
- Precision: 0.40582430140416986; {"Self-direction":0.39841688654353563,"Stimulation":0.10588235294117647,"Hedonism":0.10416666666666667,"Achievement":0.5,"Power":0.2838983050847458,"Security":0.8131868131868132,"Tradition":0.3314917127071823,"Conformity":0.3051643192488263,"Benevolence":0.543859649122807,"Universalism":0.6721763085399449}
- Recall:    0.26887214825856026; {"Self-direction":0.35362997658079626,"Stimulation":0.11688311688311688,"Hedonism":0.19230769230769232,"Achievement":0.30097087378640774,"Power":0.3175355450236967,"Security":0.1761904761904762,"Tradition":0.35714285714285715,"Conformity":0.2145214521452145,"Benevolence":0.3506787330316742,"Universalism":0.30886075949367087}
all_attained (evaluated on 1576 documents with 10 values)
- F-Score:   0.351900871361855; {"Self-direction":0.42636045931103345,"Stimulation":0.09316394434361766,"Hedonism":0.03245942571785269,"Achievement":0.41448692152917505,"Power":0.23614997202014548,"Security":0.6953642384105961,"Tradition":0.1926605504587156,"Conformity":0.3225119744544971,"Benevolence":0.4380574826560952,"Universalism":0.6677937447168216}
- Precision: 0.23451776649746195; {"Self-direction":0.27093908629441626,"Stimulation":0.04885786802030457,"Hedonism":0.01649746192893401,"Achievement":0.2614213197969543,"Power":0.13388324873096447,"Security":0.5329949238578681,"Tradition":0.1065989847715736,"Conformity":0.19225888324873097,"Benevolence":0.28045685279187815,"Universalism":0.501269035532995}
- Recall:    1.0; {"Self-direction":1.0,"Stimulation":1.0,"Hedonism":1.0,"Achievement":1.0,"Power":1.0,"Security":1.0,"Tradition":1.0,"Conformity":1.0,"Benevolence":1.0,"Universalism":1.0}

For each classifier and measure (F-score, precision, recall), the output above contains the result averaged across values, followed by the result for each value separately.
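The reported numbers can be re-derived by hand as a consistency check. The sketch below uses only numbers copied from the output above and plain Python, with no pyvalues calls. It shows that the macro-averaged F-score is the mean of the per-value F-scores, which is not the same as the F-score of the macro-averaged precision and recall.

```python
# per-value F-scores of the dictionary classifier, copied from the output above
dictionary_f = {
    "Self-direction": 0.3746898263027295, "Stimulation": 0.11111111111111112,
    "Hedonism": 0.13513513513513514, "Achievement": 0.37575757575757573,
    "Power": 0.2997762863534676, "Security": 0.2896281800391389,
    "Tradition": 0.3438395415472779, "Conformity": 0.25193798449612403,
    "Benevolence": 0.4264099037138927, "Universalism": 0.42324371205550737,
}

# macro-average: unweighted mean over the values
macro_f = sum(dictionary_f.values()) / len(dictionary_f)
print(f"mean of per-value F-scores:   {macro_f:.4f}")

# F-score computed from the macro-averaged precision and recall instead
macro_precision = 0.40582430140416986
macro_recall = 0.26887214825856026
f_of_macros = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)
print(f"F-score of macro-averaged P/R: {f_of_macros:.4f}")

# all_attained on Security: precision and recall copied from the output above
security_precision, security_recall = 0.5329949238578681, 1.0
security_f = (2 * security_precision * security_recall
              / (security_precision + security_recall))
print(f"all_attained F-score for Security: {security_f:.4f}")
```

The last number also explains why the baseline looks strong on frequent values: with a recall of 1.0, its F-score depends only on how often the value occurs in the ground truth, and Security occurs in more than half of the documents.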

6.4. Visualizing Classifier Performance

A radar plot makes it possible to visually compare the performance (F-scores calculated in Section 6.3) of multiple classifiers:

plot = pyvalues.OriginalValues.plot_all(list(fscores.values()), labels=list(fscores.keys()))
plot.title("F-score per Classifier", pad=20);

7. Conclusion

This tutorial showed how to use the pyvalues library and different classifiers to classify segments and documents according to human values schemas. Moreover, it showed how to post-process and visualize the classifications, as well as how to evaluate classifications.

References

Legkas, S., Christodoulou, C., Zidianakis, M., Koutrintzes, D., Dagioglou, M., & Petasis, G. (2024). Hierocles of Alexandria at Touché: Multi-task & multi-head custom architecture with transformer-based models for human value detection. In G. Faggioli, N. Ferro, P. Galuscakova, & A. García Seco Herrera (Eds.), Working Notes Papers of the CLEF 2024 Evaluation Labs (Vol. 3740, CEUR Workshop Proceedings, pp. 3419–3432).

Ponizovskiy, V., Ardag, M., Grigoryan, L., Boyd, R., Dobewall, H., & Holtz, P. (2020). Development and Validation of the Personal Values Dictionary: A Theory-Driven Tool for Investigating References to Basic Human Values in Text. European Journal of Personality, 34(5), 885–902. https://doi.org/10.1002/per.2294

Reitis-Münstermann, T., Schulze Brock, P., Scharfbillig, M., Stefanovitch, N., & De Longueville, B. (2024). Values in news and political manifestos: Annotation guidelines. Publications Office of the European Union. https://doi.org/10.2760/7398

Schwartz, S. H., Cieciuch, J., Vecchione, M., Davidov, E., Fischer, R., Beierlein, C., Ramos, A., Verkasalo, M., Lönnqvist, J. E., Demirutku, K., Dirilen-Gumus, O., & Konty, M. (2012). Refining the theory of basic individual values. Journal of Personality and Social Psychology, 103(4), 663–688. https://doi.org/10.1037/a0029393