About the Workshop

In neural models, generalization is the ability to apply learned knowledge to new, unseen data. Compositionality is a principle that enables such generalization by allowing complex structures to be represented and processed as combinations of simpler elements, which provides a systematic way to interpret new structures. Together, they are key to bridging the gap between learning and genuine adaptability in AI.
This one‑day workshop aims to discuss compositionality, structured representation, and the integration of neural and symbolic reasoning in modern learning systems.

Our Research

Our work investigates hybrid models of reasoning, a neuro-symbolic approach to deductive inference that integrates neural learning with symbolic logic within restricted fragments of natural language. We began with a pilot study [1] on a simple propositional corpus to examine whether neural networks can assist a symbolic prover by selecting the formulas from a knowledge base that are necessary to prove a given hypothesis. We then extended this work [2] to the syllogistic fragment, evaluating feedforward, recurrent, convolutional, and transformer architectures. Despite the simplicity of the experimental setup—training and testing on a single knowledge base with one-hot encoded inputs—our results indicated that models trained from scratch failed to capture the underlying logical structure.

To validate and extend these findings, we next employed modern pretrained language models for formula selection in direct and indirect proofs, integrating neural assistants with a symbolic prover to evaluate their interaction within a hybrid reasoning framework [3]. In this phase, models were trained on multiple knowledge bases and tested on unseen ones, using pseudoword-based textual representations for both input and output. To further investigate these compositional limitations, and in collaboration with researchers from the University of Trento, we explored a meta-learning approach on our syllogistic corpus [4] to study how models adapt to novel reasoning patterns in the formula selection task.
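To make the formula selection task concrete, here is a minimal sketch (in Python) of the division of labor in such a hybrid loop, on a toy knowledge base of "All X are Y" statements. The scoring function stands in for the neural assistant, and the prover is a bare-bones transitive-closure procedure; all names and encodings below are illustrative choices for this page, not the implementation used in [1–3].

    # Illustrative sketch only: "All X are Y" statements encoded as (X, Y) pairs.
    kb = [("cats", "mammals"), ("mammals", "animals"), ("birds", "animals")]
    hypothesis = ("cats", "animals")  # "All cats are animals"

    def score(premise, goal):
        # Stand-in for the neural assistant's relevance score;
        # here, a trivial term-overlap heuristic.
        return len(set(premise) & set(goal))

    def proves(premises, goal):
        # Bare-bones symbolic prover: transitive closure of "All X are Y".
        derived = set(premises)
        while True:
            new = {(a, d) for (a, b) in derived for (c, d) in derived if b == c}
            if new <= derived:
                return goal in derived
            derived |= new

    # Rank formulas by the (stub) neural score, then grow the selected
    # subset until the prover succeeds or the knowledge base is exhausted.
    ranked = sorted(kb, key=lambda p: score(p, hypothesis), reverse=True)
    for k in range(1, len(ranked) + 1):
        if proves(ranked[:k], hypothesis):
            print(f"Proved with {k} selected premises: {ranked[:k]}")
            break
    else:
        print("Hypothesis not provable from this knowledge base.")

In the studies themselves, the scorer is a trained network (from scratch in [1, 2], pretrained in [3]) and the prover handles direct and indirect proofs in the full fragment; the sketch only fixes how the two components interact.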

Compositionality is a fundamental component of reasoning and continues to challenge neural models. Our results suggest that hybrid architectures integrating symbolic inference with neural learning offer a promising path toward overcoming these limitations and, more broadly, a compelling direction for future exploration—shedding light on how structured reasoning can emerge from pattern-based learning systems in AI.

The research project “Hybrid Models of Reasoning”, funded by the National Science Centre, Poland, is conducted by Maciej Malicki (University of Warsaw), Jakub Szymanik (University of Trento), and Manuel Vargas Guzmán (University of Warsaw).

Published Papers

Program

Invited Speakers

Dieuwke Hupkes (Meta AI Research)
Compositional generalisation in LLMs through multilingual lenses

“Good generalisation” is often mentioned as a desirable property for NLP models. For LLMs, given among other things the sheer size of their training corpora, it becomes more and more challenging both to evaluate it and to assess its importance. In this presentation, I discuss the challenges of evaluating compositional generalisation in LLMs, with a special focus on how multilingual datasets can help.

Ian Pratt-Hartmann (University of Manchester & University of Opole)
Natural Language Inference: from Aristotle to AI

For most of recorded history, logic was seen as an attempt to systematize the entailment patterns observed in natural—that is to say, human—languages. Only with the rise of quantification theory and the emergence of mathematical logic at the end of the nineteenth century did the syntactic structure of natural language lose its pre-eminence. Recently, however, there has been a resurgence of interest in natural language reasoning, as a result of two very different developments. The first is the discovery of a rich, complexity-theoretic landscape among fragments of natural languages defined by the syntactic devices they feature: quantifying determiners, relative clauses, ditransitive verbs, passive constructions, anaphora, and so on. The second is the recent rise of transformer-based language models, which can be fine-tuned to solve a range of natural language inference tasks. In this talk I combine both these strands of research to direct the spotlight back on logical systems based on natural, rather than formal, languages. As I shall argue, the study of such systems opens up new avenues of logical research.

Andrea de Varda (MIT)
Behavioral and structural signatures of human-like reasoning in LLMs

What does it mean for a model to reason in a human-like way? A core signature of cognitive effort in psychology is reaction time: harder problems take longer because they require more intermediate steps. We show that large reasoning models capture this cost of thinking. Across seven reasoning domains, the length of a model's chain of thought predicts human reaction times, tracking both item-level difficulty and broader task-level demands. This alignment is robust across models and is substantially stronger for reasoning models than for base LLMs. Motivated by this correspondence in behavior, we ask whether similarities between humans and models extend to the internal organization of their reasoning systems. Intelligent behavior in humans is supported by a set of specialized brain networks that segregate language processing, domain-general reasoning, social reasoning, and intuitive physics. Drawing inspiration from neuroscience, we used task-based functional localization in LLMs to identify units that selectively respond to tasks in each of these domains. We found that the units' selectivity profiles exhibit the same within-domain overlap and across-domain separability observed in the human brain. Together, these findings show that LLMs and their reasoning-optimized variants not only mirror patterns of cognitive effort but also develop emergent functional structure reminiscent of the modular architecture supporting human thought.

Marcin Miłkowski (Polish Academy of Sciences)
Composing Moves: How Procedural Memory Builds Novel Action

Compositionality is fundamental to both human and artificial cognition, yet theories of procedural memory often underestimate its representational demands. This presentation argues that any adequate account of skilled action must satisfy three conceptual desiderata rooted in compositionality: method-specific directivity (disambiguating kinematically equivalent execution paths), hierarchical sequencing (implementing conditional branching and nested timing), and dual error evaluation (distinguishing execution noise from content mismatch). Empirical patterns from apraxia research—where patients execute isolated movements but cannot combine them into novel tool-use sequences or meaningless gestures—reveal compositionality as a distinct theoretical requirement, not a product of associative learning. The analysis demonstrates that anti-representationalist appeals to "smooth coping" or affordance-responsiveness evade these constraints by masking the necessary representational architecture. Action guidance requires hybrid concepts merging descriptive properties with directive force; otherwise, the interface between linguistic instruction and motor execution remains unexplained. By articulating the minimal conceptual requirements for procedural memory, this framework compels both embodied cognition and artificial intelligence research to confront the compositional logic of action, precluding theoretical shortcuts and establishing a foundation for genuine practical rationality in biological and artificial agents.

Justyna Grudzińska-Zawadowska (University of Warsaw)
Disentangling Form and World Knowledge in LLM Interpretation: Evidence from Quantifier Scope Disambiguation

This talk summarizes a series of studies that use Quantifier Scope Disambiguation (QSD) as a probing task for understanding how LLMs construct meaning. The central question motivating this work is how to design an approach that effectively incorporates world knowledge into QSD, and more broadly, how LLMs balance linguistic form with background knowledge when selecting an interpretation. Across several projects, we first developed corpora that minimize the influence of surface-level cues. For Polish, a scope-annotated corpus was constructed and carefully balanced for linear order, quantifier type, and grammatical role. For English, we created an analogous balanced dataset by combining corpus-based materials with generated and hand-designed examples. In parallel, we conducted an experiment using entirely invented sentences with no real-world referents; this setting forces LLMs to rely solely on formal cues, allowing us to measure how much of their QSD performance depends on stored world knowledge versus structural patterns. Together, these resources allow us to treat QSD as a controlled interpretation probe. We then compared PLMs (HerBERT, RoBERTa) and full-scale LLMs (GPT-4o, Qwen2.5) with systems enriched with external knowledge, including dynamic RAG models that draw on ConceptNet and Simple Wikipedia. The results show that baseline models already rely on implicit world knowledge to achieve high accuracy on balanced datasets, while models with on-demand retrieval perform even better. I conclude by arguing that LLMs exhibit a form of world-knowledge-supported compositionality: their scope assignments arise not from surface heuristics alone, but from the interaction between broad background expectations about how situations typically unfold and the interpretive pressures encoded in the input.

Leonardo Bertolazzi (University of Trento)
Logic, Plausibility, and Generalization: Making LLMs More Systematic

Large language models (LLMs) can now tackle a wide range of complex tasks once exclusive to human intelligence, including mathematics, programming, and social and emotional reasoning. However, this impressive performance is often paired with surprising failures and a lack of systematicity. In this talk, I argue that we can learn valuable lessons from studies of human cognition to better understand and improve these capabilities in LLMs. I will present two applications of cognitively-inspired approaches, each addressing a different aspect of systematic reasoning in LLMs:

1. Investigating content effects in deductive reasoning. This line of work challenges the idea that LLMs learn to reason formally by examining how semantic content influences LLM performance on logical tasks, mirroring well-documented phenomena in human reasoning. Using controlled syllogistic reasoning experiments, our findings reveal that LLMs conflate logical validity with real-world plausibility. Through representational analysis, we demonstrate how validity and plausibility become entangled in the model's internal representations and explore interpretability techniques as tools for debiasing models to reason more formally.

2. Teaching systematic generalization through meta-learning. We investigated whether meta-learning could teach small language models to systematically apply logical rules. Inspired by systematic generalization in human cognition, we show that meta-learning enables models to apply inference rules from syllogistic logic to entirely novel premise structures, achieving both compositional generalization (handling shorter inference chains) and recursive generalization (handling longer chains).

Together, these studies demonstrate how insights from human cognition can help us better understand and design models that are more systematic reasoners.

Manuel Vargas Guzmán (University of Warsaw)

TBA

Venue

The workshop will be held at the Institute of Philosophy and Sociology, Polish Academy of Sciences, located in the Staszic Palace, a historic building in the heart of Warsaw:

ul. Nowy Świat 72
00-330 Warszawa
Poland

How to reach the venue: Google Maps

The venue is conveniently located near the University of Warsaw, with bus stops (Uniwersytet) and the M2 subway station (Nowy Świat-Uniwersytet) in close proximity. Tickets can be purchased from ticket machines (3.40 PLN for 20 minutes or 4.40 PLN for 75 minutes). The entrance to the Staszic Palace, home of the Institute, is easily identifiable by the Nicolaus Copernicus Monument.

The workshop sessions will take place on the third floor, in Room 268.


Attendance is free and open to all. No registration is required.

Contact

For questions about the workshop, please contact us at:
compositionalityworkshop@gmail.com