Primary Dataset: NY SPARCS Hospital Inpatient Discharges (2021)
The New York State Statewide Planning and Research Cooperative System (SPARCS) [https://health.data.ny.gov/resource/tg3i-cinn.csv?$limit=10000] is a free, publicly available hospital discharge database from the NY Department of Health. The 2021 de-identified file contains over 2 million patient-level discharge records with race, ethnicity, gender, diagnosis, total charges, total costs, length of stay, insurance type, and severity of illness. The dataset is downloaded directly from the NY Health Data open portal; no login or account is needed. Note that the download URL includes $limit=10000, so a request to it returns at most 10,000 of these records. We load the file locally and keep only respiratory diagnoses by filtering on the ccsr_diagnosis_code column, which uses CCSR (Clinical Classifications Software Refined) codes. All respiratory diagnoses begin with “RSP”.
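The download step described above can be scripted so the local copy is fetched only once. This is a minimal sketch, not the code used for this report: the helper name fetch_sparcs is hypothetical, and the local path is assumed to match the one read by read_csv() in the loading code.

```r
# Hypothetical helper: download the SPARCS extract once and reuse the local copy.
# The URL is the one cited in the text; the path is an assumption that mirrors
# the read_csv() call in the loading code.
sparcs_url  <- "https://health.data.ny.gov/resource/tg3i-cinn.csv?$limit=10000"
sparcs_path <- "../data/raw/tg3i-cinn.csv"

fetch_sparcs <- function(url = sparcs_url, path = sparcs_path) {
  # Skip the download when a local copy already exists
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  }
  invisible(path)
}

# Usage (performs a network request on the first call only):
# fetch_sparcs()
```

Keeping the download behind a file.exists() check means re-rendering the report does not hit the portal again.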
The following output shows the structure of the filtered respiratory dataset, including variable names, types, and a preview of values.
library(tidyverse)
# Source: https://health.data.ny.gov/resource/tg3i-cinn.csv?$limit=10000
# Downloaded from NY Health Data open portal
sparcs_raw <- read_csv("../data/raw/tg3i-cinn.csv", show_col_types = FALSE)
# Filter to respiratory diagnoses only (all CCSR codes starting with RSP)
# and clean numeric columns
sparcs <- sparcs_raw |>
  filter(str_starts(ccsr_diagnosis_code, "RSP")) |>
  mutate(
    length_of_stay = as.numeric(length_of_stay),
    total_charges = as.numeric(total_charges),
    total_costs = as.numeric(total_costs)
  )
glimpse(sparcs)
Rows: 278
Columns: 33
$ hospital_service_area <chr> "New York City", "New York City", "New …
$ hospital_county <chr> "Bronx", "Bronx", "Bronx", "Bronx", "Br…
$ operating_certificate_number <dbl> 7000006, 7000006, 7000008, 7000008, 700…
$ permanent_facility_id <chr> "001168", "003058", "001172", "001172",…
$ facility_name <chr> "Montefiore Medical Center-Wakefield Ho…
$ age_group <chr> "50 to 69", "70 or Older", "50 to 69", …
$ zip_code_3_digits <chr> "104", "104", "104", "104", "104", "104…
$ gender <chr> "F", "F", "M", "M", "F", "M", "M", "M",…
$ race <chr> "Other Race", "Other Race", "Other Race…
$ ethnicity <chr> "Spanish/Hispanic", "Spanish/Hispanic",…
$ length_of_stay <dbl> 3, 11, 1, 2, 3, 1, 2, 1, 2, 1, 2, 3, 2,…
$ type_of_admission <chr> "Emergency", "Emergency", "Emergency", …
$ patient_disposition <chr> "Short-term Hospital", "Hospice - Medic…
$ discharge_year <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 202…
$ ccsr_diagnosis_code <chr> "RSP009", "RSP010", "RSP008", "RSP002",…
$ ccsr_diagnosis_description <chr> "Asthma", "Aspiration pneumonitis", "Ch…
$ ccsr_procedure_code <chr> "ADM017", "ESA004", "ESA004", NA, "ESA0…
$ ccsr_procedure_description <chr> "ADMINISTRATION OF NUTRITIONAL AND ELEC…
$ apr_drg_code <chr> "141", "137", "140", "139", "140", "139…
$ apr_drg_description <chr> "ASTHMA", "MAJOR RESPIRATORY INFECTIONS…
$ apr_mdc_code <chr> "04", "04", "04", "04", "04", "04", "04…
$ apr_mdc_description <chr> "DISEASES AND DISORDERS OF THE RESPIRAT…
$ apr_severity_of_illness_code <dbl> 3, 4, 3, 3, 3, 3, 1, 4, 2, 1, 3, 2, 1, …
$ apr_severity_of_illness <chr> "Major", "Extreme", "Major", "Major", "…
$ apr_risk_of_mortality <chr> "Major", "Extreme", "Major", "Moderate"…
$ apr_medical_surgical <chr> "Medical", "Medical", "Medical", "Medic…
$ payment_typology_1 <chr> "Medicare", "Medicare", "Medicaid", "Me…
$ payment_typology_2 <chr> "Medicaid", "Private Health Insurance",…
$ payment_typology_3 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ birth_weight <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ emergency_department_indicator <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",…
$ total_charges <dbl> 77183.27, 219857.19, 19207.88, 15297.87…
$ total_costs <dbl> 12057.91, 38291.05, 12276.24, 9777.25, …
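Because as.numeric() turns any value that is not a plain number into NA, it can be worth checking how many missing values the cleaned numeric columns contain. The sketch below is one way to do that; the helper name count_missing is hypothetical, and the default column names are the three converted in the loading code.

```r
library(dplyr)

# Hypothetical helper: count NA values in the numeric columns cleaned above.
count_missing <- function(df,
                          cols = c("length_of_stay", "total_charges", "total_costs")) {
  df |>
    summarise(across(all_of(cols), ~ sum(is.na(.x))))
}

# Usage, assuming `sparcs` is the filtered data frame from the loading code:
# count_missing(sparcs)
```

A nonzero count does not by itself indicate an error, since some values may be missing in the source file, but it flags columns that deserve a closer look.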
The table below lists all respiratory diagnosis codes present in the data, sorted from most to least frequent.
# A tibble: 14 × 3
ccsr_diagnosis_code ccsr_diagnosis_description n
<chr> <chr> <int>
1 RSP008 Chronic obstructive pulmonary disease and bronchie… 57
2 RSP009 Asthma 51
3 RSP002 Pneumonia (except that caused by tuberculosis) 49
4 RSP012 Respiratory failure; insufficiency; arrest 49
5 RSP010 Aspiration pneumonitis 20
6 RSP014 Pneumothorax 10
7 RSP006 Other specified upper respiratory infections 9
8 RSP011 Pleurisy, pleural effusion and pulmonary collapse 6
9 RSP016 Other specified and unspecified lower respiratory … 6
10 RSP004 Acute and chronic tonsillitis 5
11 RSP005 Acute bronchitis 5
12 RSP007 Other specified and unspecified upper respiratory … 5
13 RSP017 Postprocedural or postoperative respiratory system… 4
14 RSP015 Mediastinal disorders 2
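The code behind this table is collapsed on the page. A table of this shape could be produced with dplyr::count(), as in the minimal sketch below; the helper name count_diagnoses is hypothetical, and this is not necessarily the code the report used.

```r
library(dplyr)

# Hypothetical helper: tally discharges per CCSR diagnosis, most frequent first.
count_diagnoses <- function(df) {
  df |>
    count(ccsr_diagnosis_code, ccsr_diagnosis_description, sort = TRUE)
}

# Usage, assuming `sparcs` is the filtered respiratory data frame:
# count_diagnoses(sparcs)
```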
The table below shows the racial and ethnic breakdown of respiratory patients in the dataset, sorted by count.
# A tibble: 12 × 3
race ethnicity n
<chr> <chr> <int>
1 Black/African American Not Span/Hispanic 77
2 White Not Span/Hispanic 71
3 Other Race Spanish/Hispanic 67
4 Other Race Not Span/Hispanic 18
5 Other Race Unknown 17
6 Black/African American Unknown 11
7 Black/African American Spanish/Hispanic 4
8 Multi-racial Not Span/Hispanic 4
9 White Unknown 4
10 White Spanish/Hispanic 3
11 Black/African American Multi-ethnic 1
12 Multi-racial Unknown 1
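A cross-tabulation like the one above can be sketched the same way by counting each race and ethnicity combination. The helper name count_race_ethnicity is hypothetical, and the code is an illustration rather than the report's own.

```r
library(dplyr)

# Hypothetical helper: count patients per race/ethnicity pair, largest group first.
count_race_ethnicity <- function(df) {
  df |>
    count(race, ethnicity, sort = TRUE)
}

# Usage, assuming `sparcs` is the filtered respiratory data frame:
# count_race_ethnicity(sparcs)
```

Counting the two columns jointly keeps combinations such as "Other Race" with "Spanish/Hispanic" visible, which separate per-column counts would hide.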
Key variables:
| Variable | Description |
|---|---|
| race / ethnicity | Patient self-reported race and Hispanic/Latino ethnicity |
| gender | Patient sex |
| length_of_stay | Days hospitalized |
| total_charges | Amount billed by the hospital |
| total_costs | Estimated actual cost of care |
| payment_typology_1 | Insurance type (Medicaid, Medicare, Private Health Insurance, Self-Pay) |
| apr_severity_of_illness_code | 1–4 scale of illness severity at admission |
| apr_severity_of_illness | Severity label: Minor, Moderate, Major, Extreme |
| apr_risk_of_mortality | Risk label: Minor, Moderate, Major, Extreme |
| ccsr_diagnosis_code | CCSR diagnosis code; RSP codes = respiratory diagnoses |
| ccsr_diagnosis_description | Plain-text diagnosis name |
| patient_disposition | Outcome: home, expired, transferred, etc. |
| hospital_county | County of the treating hospital |
| emergency_department_indicator | Whether patient came through the ER |
Source: NY State Department of Health — Health Data NY
Download URL: https://health.data.ny.gov/resource/tg3i-cinn.csv?$limit=10000
License: Public domain / open government data — no login required