ASD-SL Dataset

The Autism Spectrum Disorder - Severity Levels (ASD-SL) dataset contains anonymized data from a consecutive sample of children referred to the Department of Pediatrics, Unit of Child and Adolescent Neuropsychiatry, University Federico II of Naples, for evaluation of a clinically suspected ASD. About 141 individuals (76.5% male), aged 18 to 156 months, received a full assessment, including clinical history, structured clinical interviews, and validated observations.

The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) was administered by a licensed clinician, both to confirm the diagnosis and to evaluate symptom severity using the comparison score. To determine the developmental/intellectual level, either the Griffiths Mental Development Scales, Extended Revised (GMDS-ER) or the Leiter International Performance Scale-Revised (Leiter-R) was administered. To assess the adaptive functioning of all patients, parents were interviewed using the Vineland Adaptive Behavior Scales, Second Edition (VABS-II).

ASD was diagnosed according to DSM-5 criteria. We recorded the DSM-5 specifiers used to determine the ASD severity level: "With/without accompanying language impairment" and "With/without accompanying intellectual impairment". The dataset also integrates data on environmental factors, genetic factors, cognitive, social, and language impairments, and other measures commonly used to diagnose ASD.

For further details, please refer to the paper "Diagnosing severity levels of Autism Spectrum Disorder with Machine Learning", presented at DCAI'21 @ NeurIPS.


raw_data.csv

prepared_data.csv

raw_data.csv and prepared_data.csv are the raw and prepared versions of the dataset, respectively. Compared with the raw version, the prepared version contains the following modifications:

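  • The following attributes have been deleted: Data di Nascita (date of birth), Diagnosi Principale (main diagnosis), Tipo di familiarità (type of family history), Condizione medica associata (associated medical condition), Tipo di esposizione ambientale (type of environmental exposure), Tipo di gene/condizione genetica nota associata (type of associated known gene/genetic condition), and the unnamed column ' ' (tipo di compromissione del linguaggio, i.e. type of language impairment)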
  • Rows with no associated severity level have been deleted
  • DQ (developmental quotient) has been computed as (QS/mesi)*100, where mesi is the age in months. In rows 66-73, the QS column already contained DQ values; these have been replaced with (DQ*mesi)/100
  • DQ_QI has been created by merging the DQ and QI (intelligence quotient) columns
  • n_alterated_chromosomes, n_mutations, n_dup and n_del have been derived from the genetic features Cromosomi alterati (altered chromosomes) and tipo di mutazione (mutation type), which have then been deleted
  • Sesso (sex) has been converted from categorical to numeric (maschio/male: 1, femmina/female: 0)
  • Problemi in gravidanza (pregnancy problems) has been converted from categorical to binary: ["Sì", "sì, gravidanza pretermine", "si", "pre termine, PMA"]: 1; ["No", "no"]: 0. The positive values are "yes" answers, including preterm pregnancy and PMA (medically assisted procreation)
  • Anomalie sono presenti alla nascita (anomalies present at birth) has been converted to binary: the anomaly type is ignored and only the presence of anomalies is recorded
  • Familiarità per disturbi Psichiatrici (family history of psychiatric disorders) has been converted to binary: any value other than "No" or "no" is replaced with 1, and "No"/"no" with 0
  • QS: only the numerical part of the string is kept
  • Specificatori (specifiers), comorbidità psichiatriche (psychiatric comorbidities), alterazioni array CGH (array CGH alterations): replaced with binary values
  • Disturbi della nutrizione (feeding disorders), selettività alimentare (food selectivity): missing values in the first column are filled with the corresponding values from the second column, which is then deleted. Finally, the resulting column is replaced with binary values.
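The QS cleanup and the DQ computation can be sketched in plain Python as follows. This is a minimal illustration, not the authors' preparation script. It assumes each row is a dict keyed by the raw column names QS and mesi, that mesi holds the age in months, and that the first number in a QS string is the score. The helper names (parse_qs, compute_dq, qs_from_dq, prepare_row) and the qs_holds_dq flag are hypothetical.

```python
import re


def parse_qs(raw):
    """Keep only the numerical part of a QS string.

    Assumption: the first number in the string (integer or decimal, with '.'
    or ',' as separator) is the score. Returns None if no number is found.
    """
    match = re.search(r"\d+(?:[.,]\d+)?", str(raw))
    if match is None:
        return None
    return float(match.group().replace(",", "."))


def compute_dq(qs, mesi):
    """DQ = (QS / mesi) * 100, assuming mesi is the age in months."""
    return (qs / mesi) * 100


def qs_from_dq(dq, mesi):
    """Inverse transform for rows whose QS column already holds a DQ value."""
    return (dq * mesi) / 100


def prepare_row(row, qs_holds_dq=False):
    """Return a copy of `row` with a numeric QS and a derived DQ column.

    `qs_holds_dq` is a hypothetical flag marking rows (rows 66-73 in the
    raw data) whose QS column already contains a DQ value.
    """
    out = dict(row)
    qs = parse_qs(row["QS"])
    mesi = float(row["mesi"])
    if qs is not None and qs_holds_dq:
        qs = qs_from_dq(qs, mesi)  # convert the stored DQ back to QS
    out["QS"] = qs
    out["DQ"] = compute_dq(qs, mesi) if qs is not None else None
    return out
```

For example, prepare_row({"QS": "30", "mesi": "40"}) yields a DQ of 75.0. A row flagged with qs_holds_dq=True whose QS is "75" and mesi is "40" is first converted back to a QS of 30.0 and then yields the same DQ.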
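The categorical-to-binary encodings for Sesso, Problemi in gravidanza and Familiarità per disturbi Psichiatrici can be sketched as simple lookup functions. The value lists are taken from the dataset description; the function names are hypothetical. Ignoring case and surrounding spaces in Sesso values is an assumption, and so is returning None for pregnancy values outside both lists, since the description does not say how such values are handled.

```python
# Value lists taken from the dataset description; function names are hypothetical.
SEX_MAP = {"maschio": 1, "femmina": 0}
PREGNANCY_YES = {"Sì", "sì, gravidanza pretermine", "si", "pre termine, PMA"}
NO_VALUES = {"No", "no"}


def encode_sex(value):
    """maschio -> 1, femmina -> 0; raises KeyError for any other value.

    Assumption: case and surrounding spaces are ignored.
    """
    return SEX_MAP[value.strip().lower()]


def encode_pregnancy(value):
    """Listed positive answers -> 1, "No"/"no" -> 0."""
    if value in PREGNANCY_YES:
        return 1
    if value in NO_VALUES:
        return 0
    return None  # assumption: unlisted values are left missing


def encode_family_history(value):
    """Any value other than "No"/"no" counts as a positive family history."""
    return 0 if value in NO_VALUES else 1
```

Exact set membership mirrors the explicit value lists in the description, so a spelling variant not listed there (for instance "NO") is not silently mapped to 0 by encode_pregnancy.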
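The merge of the two nutrition columns amounts to a coalesce step: keep the first column's value where it is defined, otherwise take the second column's value, then drop the second column. This minimal sketch assumes rows are dicts keyed by the column names exactly as written in the description, and that "non-defined" means None, NaN or a blank string. The final binary rule in to_binary ("No"/"no" to 0, any other defined value to 1) is a hypothetical choice, since the description does not spell it out.

```python
import math


def is_missing(value):
    """Treat None, NaN and blank strings as non-defined values (assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def merge_columns(rows, first, second):
    """Fill missing values of `first` from `second`, then drop `second`."""
    merged = []
    for row in rows:
        out = dict(row)
        fallback = out.pop(second, None)  # remove the second column
        if is_missing(out.get(first)):
            out[first] = fallback
        merged.append(out)
    return merged


def to_binary(value):
    """Hypothetical rule: missing stays None, "No"/"no" -> 0, anything else -> 1."""
    if is_missing(value):
        return None
    return 0 if value in {"No", "no"} else 1
```

For example, merging rows whose "Disturbi della nutrizione" values are "" and "no" with "selettività alimentare" values "sì" and "sì" leaves a single column with "sì" and "no", which to_binary maps to 1 and 0.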