1- School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran.
Abstract:
Various decision-making systems work together to shape human behavior. The goal-directed and habitual systems are the two most prominent; in reinforcement learning (RL), they are modeled by the model-based (MB) and model-free (MF) learning styles, respectively. Human behavior resembles a combination of these two decision-making paradigms, commonly modeled as a weighted sum of the action values of the two styles within an RL framework. The weighting parameter is typically estimated by maximum likelihood (ML) or maximum a posteriori (MAP) estimation. In this study, we employ RL agents that combine MB and MF decision-making to perform the well-known Daw two-stage task. ML and MAP yield unreliable estimates of the weighting parameter, often exhibiting a large bias toward extreme values. We propose a k-nearest-neighbor (kNN) estimator as a nonparametric alternative that reduces the estimation error, based on a set of 20 features extracted from the agent's behavior. Simulation experiments show that the proposed method reduces both the bias and the variance of the estimation error. We also analyze human behavioral data from previous studies: the proposed method predicts indices such as age, gender, IQ, gaze dwell time, and psychiatric-disorder indices that the traditional methods miss. In brief, the proposed method increases the reliability of the estimated parameters and enhances the applicability of reinforcement-learning paradigms in clinical trials.
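For reference, the combination of the two systems described above is typically written as a weighted sum of their action-value functions (a standard hybrid formulation; w denotes the weighting parameter estimated in this work):

\[
Q_{\mathrm{net}}(s, a) = w \, Q_{\mathrm{MB}}(s, a) + (1 - w) \, Q_{\mathrm{MF}}(s, a), \qquad 0 \le w \le 1,
\]

where w = 1 corresponds to purely goal-directed (MB) behavior and w = 0 to purely habitual (MF) behavior.

Below is a minimal sketch of the proposed nonparametric estimation step, assuming scikit-learn's KNeighborsRegressor; the random feature matrices, sample sizes, and number of neighbors are placeholders for illustration only, not the feature definitions or settings used in the study.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: each row is one simulated RL agent described by
# 20 behavioral features; y_train holds the agent's true weighting parameter w.
X_train = rng.random((1000, 20))
y_train = rng.random(1000)

# Fit the kNN regressor on simulated agents whose w is known by construction.
knn = KNeighborsRegressor(n_neighbors=10)
knn.fit(X_train, y_train)

# Estimate w for new behavioral feature vectors (e.g., from human subjects).
X_new = rng.random((5, 20))
w_hat = knn.predict(X_new)  # estimated weighting parameters, each in [0, 1]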
Type of Study:
Original
Subject:
Computational Neuroscience
Received: 2023/09/30 | Accepted: 2024/10/06