Data Wrangling: EHR Healthcare Claims Data

medical-billing data-analytics dplyr rstats r-bloggers

Using the {tidyverse} to wrangle EHR/EMR healthcare claims data.

Andrew Bruce https://andrewbruce.netlify.app
03-18-2022

Introduction

This is intended to be a step-by-step guide for data analysis of healthcare claims data from an EHR/EMR; one of a series of posts I’ll be working on with the goal of turning the code into functions in the interest of speeding up repetitive analyses.

Packages

I’ll load the packages that I’ll be using:

Data

I’m using a mock data set I’ve created using medical coding practice exercises that I have collected, the {randomNames} package for creating patient names, and my personal knowledge of EMR/EHR data systems to fill in the rest. A quick note: this is a very light set of data as compared to what you might actually see in an EHR. In the interest of time, I’ve tried to use only the types of data I would need to write these functions to perform the applicable analyses. Let’s take a look at the structure of the raw data:

str(claimsdata)
tibble [75 x 27] (S3: tbl_df/tbl/data.frame)
 $ enc       : chr [1:75] "AB073A1" "AB071A3" "AB072A2" "AB070A4" ...
 $ type      : chr [1:75] "HCFA" "HCFA" "HCFA" "HCFA" ...
 $ pt        : chr [1:75] "Long" "Noorani" "Tampubolon" "Martin" ...
 $ datecreate: POSIXct[1:75], format: "2022-02-03" "2022-02-04" ...
 $ datedos   : POSIXct[1:75], format: "2022-02-03" "2022-02-04" ...
 $ datenow   : POSIXct[1:75], format: "2022-02-12" "2022-02-12" ...
 $ datesign  : POSIXct[1:75], format: NA NA ...
 $ pos       : chr [1:75] "11" NA "11" "11" ...
 $ prov      : chr [1:75] "Johnson" "Johnson" "Johnson" NA ...
 $ loc       : chr [1:75] "Valdosta" "Valdosta" "Valdosta" "Valdosta" ...
 $ payer     : chr [1:75] "Aetna" "Medicare" "UHC" "BCBS" ...
 $ class     : chr [1:75] "Primary" "Primary" "Primary" "Primary" ...
 $ stat      : chr [1:75] "Unsigned" "Place of Service" "Date of Service" "Provider" ...
 $ datebill  : POSIXct[1:75], format: NA NA ...
 $ datecomp  : POSIXct[1:75], format: NA NA ...
 $ bal       : num [1:75] 975 711.8 723.1 659.8 71.9 ...
 $ rej       : logi [1:75] NA NA NA NA NA NA ...
 $ den       : chr [1:75] NA NA NA NA ...
 $ px1       : num [1:75] 99214 99204 99214 99203 99214 ...
 $ px1m      : num [1:75] 25 NA NA NA 25 NA NA 25 NA 25 ...
 $ px2       : num [1:75] 93350 NA NA NA 93289 ...
 $ px2m      : chr [1:75] NA NA NA NA ...
 $ px3       : chr [1:75] NA NA NA NA ...
 $ px3m      : chr [1:75] NA NA NA NA ...
 $ dx1       : chr [1:75] "R55" "R00.0" "R60.0" "I49.3" ...
 $ dx2       : chr [1:75] "E78.5" "Z33.1" "I11.9" "I34.0" ...
 $ dx3       : chr [1:75] "I25.9" NA NA "I34.1" ...

Data Cleaning & Wrangling

First I need to make sure any columns with dates are converted to date objects. Next, I need to convert the Procedure code columns to the character type. Since each of the Diagnosis code columns are alpha-numeric (as all ICD-10-CM codes are), they were automatically identified as the character type when the data was read in.

Convert ‘Date’ and ‘Character’ Types

# Convert date columns to date object
claimsdata$datecreate <- as.Date(claimsdata$datecreate, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datedos <- as.Date(claimsdata$datedos, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datesign <- as.Date(claimsdata$datesign, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datenow <- as.Date(claimsdata$datenow, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datebill <- as.Date(claimsdata$datebill, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datecomp <- as.Date(claimsdata$datecomp, "%yyyy-%mm-%dd", tz = "EST")

# Convert Px/Dx/Mod columns to factor type
claimsdata$px1 <- as.factor(claimsdata$px1)
claimsdata$px1m <- as.factor(claimsdata$px1m)
claimsdata$px2 <- as.factor(claimsdata$px2)
claimsdata$px2m <- as.factor(claimsdata$px2m)
claimsdata$px3 <- as.factor(claimsdata$px3)
claimsdata$px3m <- as.factor(claimsdata$px3m)

str(claimsdata)
tibble [75 x 27] (S3: tbl_df/tbl/data.frame)
 $ enc       : chr [1:75] "AB073A1" "AB071A3" "AB072A2" "AB070A4" ...
 $ type      : chr [1:75] "HCFA" "HCFA" "HCFA" "HCFA" ...
 $ pt        : chr [1:75] "Long" "Noorani" "Tampubolon" "Martin" ...
 $ datecreate: Date[1:75], format: "2022-02-02" "2022-02-03" ...
 $ datedos   : Date[1:75], format: "2022-02-02" "2022-02-03" ...
 $ datenow   : Date[1:75], format: "2022-02-11" "2022-02-11" ...
 $ datesign  : Date[1:75], format: NA NA ...
 $ pos       : chr [1:75] "11" NA "11" "11" ...
 $ prov      : chr [1:75] "Johnson" "Johnson" "Johnson" NA ...
 $ loc       : chr [1:75] "Valdosta" "Valdosta" "Valdosta" "Valdosta" ...
 $ payer     : chr [1:75] "Aetna" "Medicare" "UHC" "BCBS" ...
 $ class     : chr [1:75] "Primary" "Primary" "Primary" "Primary" ...
 $ stat      : chr [1:75] "Unsigned" "Place of Service" "Date of Service" "Provider" ...
 $ datebill  : Date[1:75], format: NA NA ...
 $ datecomp  : Date[1:75], format: NA NA ...
 $ bal       : num [1:75] 975 711.8 723.1 659.8 71.9 ...
 $ rej       : logi [1:75] NA NA NA NA NA NA ...
 $ den       : chr [1:75] NA NA NA NA ...
 $ px1       : Factor w/ 18 levels "93224","93227",..: 11 8 11 7 11 11 11 17 11 11 ...
 $ px1m      : Factor w/ 2 levels "25","26": 1 NA NA NA 1 NA NA 1 NA 1 ...
 $ px2       : Factor w/ 9 levels "17110","23650",..: 7 NA NA NA 6 NA NA 4 NA 5 ...
 $ px2m      : Factor w/ 3 levels "26","59","RT": NA NA NA NA 1 NA NA 2 NA 1 ...
 $ px3       : Factor w/ 6 levels "12011","93010",..: NA NA NA NA NA NA NA 2 NA NA ...
 $ px3m      : Factor w/ 1 level "XS-RT": NA NA NA NA NA NA NA NA NA NA ...
 $ dx1       : chr [1:75] "R55" "R00.0" "R60.0" "I49.3" ...
 $ dx2       : chr [1:75] "E78.5" "Z33.1" "I11.9" "I34.0" ...
 $ dx3       : chr [1:75] "I25.9" NA NA "I34.1" ...

New Columns: Age Open, Age Closed, Bucket

# Add 'Age Open' Column to assign 
# an Aging Bucket to each Open claim
claimsdata <- claimsdata %>%
    mutate(ageopen = days(datenow) - days(datedos))

claimsdata$ageopen <- as.numeric(claimsdata$ageopen, "hours")

claimsdata <- claimsdata %>%
    mutate(ageopen = ageopen / 24)

# Add 'Bucket' Column to assign 
# an Aging Bucket to each Open claim
claimsdata <- claimsdata %>%
    mutate(bucket = case_when(
        claimsdata$ageopen < 31 ~ "0-30",
        claimsdata$ageopen >= 31 & claimsdata$ageopen < 61 ~ "31-60",
        claimsdata$ageopen >= 61 & claimsdata$ageopen < 91 ~ "61-90",
        claimsdata$ageopen >= 91 & claimsdata$ageopen < 121 ~ "91-120",
        claimsdata$ageopen >= 121 & claimsdata$ageopen < 151 ~ "121-150",
        claimsdata$ageopen >= 151 & claimsdata$ageopen < 181 ~ "151-180",
        claimsdata$ageopen >= 181 ~ "181+",
        TRUE ~ "NA"
    ))

# Convert 'Bucket' column to factor
claimsdata$bucket <- factor(claimsdata$bucket, 
                            levels = c("0-30", 
                                       "31-60", 
                                       "61-90", 
                                       "91-120", 
                                       "121-150", 
                                       "151-180", 
                                       "181+", 
                                       "NA"
                                       ), 
                            ordered = is.ordered(claimsdata$bucket))

# Add 'Age Close' Column to calculate 
# the number of days it took to Close claim
claimsdata <- claimsdata %>%
    mutate(ageclose = days(datecomp) - days(datedos))

claimsdata$ageclose <- as.numeric(claimsdata$ageclose, "hours")

claimsdata <- claimsdata %>%
    mutate(ageclose = ageclose / 24)

# Print data frame
claimsdata %>% 
  select(
    enc,
    stat,
    datecreate,
    datedos,
    datenow,
    datesign,
    datebill,
    bucket,
    ageopen,
    ageclose
  ) %>% 
  head()
enc stat datecreate datedos datenow datesign datebill bucket ageopen ageclose
AB073A1 Unsigned 2022-02-02 2022-02-02 2022-02-11 NA NA 0-30 9 NA
AB071A3 Place of Service 2022-02-03 2022-02-03 2022-02-11 NA NA 0-30 8 NA
AB072A2 Date of Service 2022-02-03 NA 2022-02-11 NA NA NA NA NA
AB070A4 Provider 2022-02-10 2022-02-10 2022-02-11 NA NA 0-30 1 NA
AB069A5 Billed 2022-02-09 2022-02-09 2022-02-11 2022-02-27 2022-02-27 0-30 2 NA
AB068A6 Ready 2022-02-08 2022-02-08 2022-02-11 2022-02-09 NA 0-30 3 NA

Dates, Months, Quarters, Years

# function to create many different date types based on the Date of Service
claims_ymq <- function(df, date_col) {
  stopifnot(inherits(df, "data.frame"))
  stopifnot(class(df %>% dplyr::pull({{date_col}})) == 'Date')
  dplyr::mutate(df,
                nmon = lubridate::month({{date_col}}, label = FALSE),
                ndip = lubridate::days_in_month({{date_col}}),
                month = lubridate::month({{date_col}}, label = TRUE, abbr = FALSE),
                mon = lubridate::month({{date_col}}, label = TRUE, abbr = TRUE),
                year = lubridate::year({{date_col}}),
                nqtr = lubridate::quarter({{date_col}}),
                yqtr = lubridate::quarter({{date_col}}, with_year = TRUE),
                dqtr = paste0(lubridate::quarter({{date_col}}),"Q", format({{date_col}},"%y")),
                ymon = as.numeric(format({{date_col}}, "%Y.%m")),
                mmon = format({{date_col}}, "%b %Y"),
                nhalf = lubridate::semester({{date_col}}),
                yhalf = lubridate::semester({{date_col}}, with_year = TRUE),
                dhalf = paste0(lubridate::semester({{date_col}}), "H", format({{date_col}},"%y"))
                )
}

# call function
claimsdata <- claims_ymq(df = claimsdata, date_col = datecreate)

# print data frame
claimsdata %>% 
  select(
    datecreate,
    nmon,
    ndip,
    month,
    mon,
    year,
    nqtr,
    yqtr,
    dqtr,
    ymon,
    nmon,
    nhalf,
    yhalf,
    dhalf
  ) %>% 
  head()
datecreate nmon ndip month mon year nqtr yqtr dqtr ymon nhalf yhalf dhalf
2022-02-02 2 28 February Feb 2022 1 2022.1 1Q22 2022.02 1 2022.1 1H22
2022-02-03 2 28 February Feb 2022 1 2022.1 1Q22 2022.02 1 2022.1 1H22
2022-02-03 2 28 February Feb 2022 1 2022.1 1Q22 2022.02 1 2022.1 1H22
2022-02-10 2 28 February Feb 2022 1 2022.1 1Q22 2022.02 1 2022.1 1H22
2022-02-09 2 28 February Feb 2022 1 2022.1 1Q22 2022.02 1 2022.1 1H22
2022-02-08 2 28 February Feb 2022 1 2022.1 1Q22 2022.02 1 2022.1 1H22

Status Columns

# Add 'Cycle' Column to assign an RCM Status to each claim
claimsdata <- claimsdata %>%
    mutate(cycle = case_when(
        claimsdata$stat == "Complete" ~ "Closed",
        claimsdata$stat == "Refund" ~ "Closed",
        claimsdata$stat == "Credit" ~ "Closed",
        TRUE ~ "Open"
    ))

# Add 'Review' Column to assign Claims to Responsible Party
claimsdata <- claimsdata %>%
    mutate(review = case_when(
        claimsdata$stat == "Unsigned" ~ "Client",
        claimsdata$stat == "Charges" ~ "Client",
        claimsdata$stat == "Place of Service" ~ "Client",
        claimsdata$stat == "Date of Service" ~ "Client",
        claimsdata$stat == "Location" ~ "Client",
        claimsdata$stat == "Provider" ~ "Client",
        claimsdata$stat == "Payer" ~ "Client",
        claimsdata$stat == "Refund" ~ "Client",
        claimsdata$stat == "Credit" ~ "Client",
        claimsdata$stat == "Complete" ~ "None",
        claimsdata$stat == "Billed" ~ "Billing",
        claimsdata$stat == "New" ~ "Billing",
        claimsdata$stat == "Ready" ~ "Billing",
        claimsdata$stat == "Rejection" ~ "Billing",
        claimsdata$stat == "Denial" ~ "Aging",
        claimsdata$stat == "Appeal" ~ "Aging",
        TRUE ~ "Other"
    ))

## Add 'incomp' Column to group Incomplete Statuses
claimsdata <- claimsdata %>%
    mutate(incomp = case_when(
        claimsdata$stat == "Unsigned" ~ "Incomplete",
        claimsdata$stat == "Charges" ~ "Incomplete",
        claimsdata$stat == "Place of Service" ~ "Incomplete",
        claimsdata$stat == "Date of Service" ~ "Incomplete",
        claimsdata$stat == "Location" ~ "Incomplete",
        claimsdata$stat == "Provider" ~ "Incomplete",
        claimsdata$stat == "Payer" ~ "Incomplete",
        claimsdata$stat == "Refund" ~ "Review",
        claimsdata$stat == "Credit" ~ "Review",
        claimsdata$stat == "Complete" ~ "Complete",
        claimsdata$stat == "Billed" ~ "Billed",
        claimsdata$stat == "New" ~ "New",
        claimsdata$stat == "Ready" ~ "Ready",
        claimsdata$stat == "Rejection" ~ "Rejection",
        claimsdata$stat == "Denial" ~ "Denial",
        claimsdata$stat == "Appeal" ~ "Appeal",
        TRUE ~ "Other"
    ))

# print data frame
claimsdata %>% 
  select(
    enc,
    stat,
    cycle,
    review,
    incomp
  ) %>% 
  head()
enc stat cycle review incomp
AB073A1 Unsigned Open Client Incomplete
AB071A3 Place of Service Open Client Incomplete
AB072A2 Date of Service Open Client Incomplete
AB070A4 Provider Open Client Incomplete
AB069A5 Billed Open Billing Billed
AB068A6 Ready Open Billing Ready

Encounter Counts & Percentage

Encounters by Provider

# Number of Encounters by Provider
claimsdata %>%
    group_by(prov) %>%
    filter(bal > 0 & prov != "NA") %>%
    summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100) %>% 
    arrange(desc(n))
prov n percent
Smith 33 47.14286
Johnson 28 40.00000
Gibson 9 12.85714

Encounters by Payer

claimsdata %>%
    group_by(payer) %>%
    filter(bal > 0 & payer != "NA") %>%
    summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100) %>% 
    arrange(desc(n)) 
payer n percent
Medicare 19 27.142857
UHC 17 24.285714
Patient 14 20.000000
BCBS 7 10.000000
Aetna 4 5.714286
Humana 4 5.714286
Meritain 2 2.857143
Ambetter 1 1.428571
Cigna 1 1.428571
Medicaid 1 1.428571

Encounters by Incomplete Status

claimsdata %>%
    filter(bal > 0 & incomp == "Incomplete") %>%
    group_by(stat) %>%
    summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100) %>% 
    arrange(desc(n))
stat n percent
Unsigned 6 42.857143
Location 3 21.428571
Charges 1 7.142857
Date of Service 1 7.142857
Payer 1 7.142857
Place of Service 1 7.142857
Provider 1 7.142857

Aging: Count, Amount, Percent

# Encounters in Aging Buckets
buckcount <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(n = n()) %>% 
  mutate(percent = n/sum(n) * 100)

# AR Balance in Aging Buckets + Percentage
buckamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
buckamtcount <- merge(buckcount, buckamt, by = "bucket")

# Rename percent cols
buckamtcount <- buckamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

#Print data frame
buckamtcount
bucket enc pctenc bal pctbal
0-30 34 48.571429 13586.38 68.041072
31-60 18 25.714286 4166.39 20.865429
61-90 5 7.142857 817.00 4.091565
91-120 13 18.571429 1398.14 7.001935

Primary: Count, Amount, Percent

# Encounters in Primary by bucket
primbk <- claimsdata %>%
  group_by(bucket) %>%
  filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
  summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100)

# Balance in Primary by bucket
primamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
primamtcount <- merge(primbk, primamt, by = "bucket")

# Rename percent cols
primamtcount <- primamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

# Print data frame
primamtcount
bucket enc pctenc bal pctbal
0-30 25 60.975610 10891.27 75.0769984
31-60 15 36.585366 3483.89 24.0155651
91-120 1 2.439024 131.64 0.9074365

Secondary: Count, Amount, Percent

# Encounters in Secondary by bucket
secbk <- claimsdata %>%
  group_by(bucket) %>%
  filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
  summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100)

# Balance in Secondary by bucket
secamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
secamtcount <- merge(secbk, secamt, by = "bucket")

# Rename percent cols
secamtcount <- secamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

# Print data frame
secamtcount
bucket enc pctenc bal pctbal
0-30 6 46.15385 1590.24 56.860488
31-60 3 23.07692 682.50 24.403413
61-90 2 15.38462 264.00 9.439562
91-120 2 15.38462 260.00 9.296538

Patient: Count, Amount, Percent

# Encounters in Patient by bucket
patbk <- claimsdata %>%
  group_by(bucket) %>%
  filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
  summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100)

# Balance in Patient by bucket
patamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
patamtcount <- merge(patbk, patamt, by = "bucket")

# Rename percent cols
patamtcount <- patamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

# Print data frame
patamtcount
bucket enc pctenc bal pctbal
0-30 2 13.33333 464.87 22.96369
61-90 3 20.00000 553.00 27.31714
91-120 10 66.66667 1006.50 49.71917

Visualizations

Encounter Counts

Show code
# Number of Encounters by Provider
hcprov <- claimsdata %>%
    group_by(prov) %>%
    filter(bal > 0 & prov != "NA") %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    hchart("column", hcaes(x = prov, y = n, color = prov), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Encounters by Provider") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Number of Encounters by Payer
hcpay <- claimsdata %>%
    group_by(payer) %>%
    filter(bal > 0 & payer != "NA") %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    hchart("bar", hcaes(x = payer, y = n, color = payer), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
    hc_title(text = "Encounters by Payer") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Number of Encounters by Incomplete Status
hcstat <- claimsdata %>%
    filter(bal > 0 & incomp == "Incomplete") %>%
    group_by(stat) %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    hchart("bar", hcaes(x = stat, y = n, color = stat), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
  hc_title(text = "Incomplete Encounters") %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

crosstalk::bscols(
    widths = NA,
hcprov,
hcpay,
hcstat
)

Aging: Count, Amount, Percent

Show code
# Encounters in Aging Buckets
hcbucket <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(n = n()) %>%
    hchart("column", hcaes(x = bucket, y = n, color = bucket), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Encounters in Aging Buckets") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    )

# Percentage of Encounters in Aging Buckets
hcbucket4 <- buckamtcount %>% 
    hchart("column", hcaes(x = bucket, y = round(pctenc, 2), color = bucket), name = "Percentage of Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Percentage of Encounters in Aging Buckets") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    )

# AR Balance in Aging Buckets
hcbucket2 <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(bal = sum(bal)) %>%
    hchart("column", hcaes(
      x = bucket, 
      y = bal, 
      color = bucket), 
      name = "Balance") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Aging Buckets: AR Balance") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    )

# AR Balance Percentage in Aging Buckets
hcbucket3 <- buckamtcount %>% 
    hchart("column", hcaes(
      x = bucket, 
      y = round(pctbal, 2), 
      color = bucket), 
      name = "Percentage of Balance") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Aging Buckets: AR Balance Percentage") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) 

crosstalk::bscols(
    widths = NA,
hcbucket,
hcbucket4,
hcbucket2,
hcbucket3
)

Aging Percentages by Class

Show code
# Primary Aging by Count
primary_pct <- claimsdata %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
    count(class, bucket) %>% 
    mutate(percent = n/sum(n) * 100)

primary_two <- primary_pct %>%
    group_by(class) %>% 
    filter(bucket == "0-30" | bucket == "31-60") %>% 
    summarise(npct = sum(percent))

options(scipen = 999, digits = 2)
hcbucket5 <- primary_pct %>%
    hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Primary Aging: 97.61%") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
        valueDecimals = 2, 
        valueSuffix = '%',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "#34608D",
            dataLabels = list(
                valueDecimals = 2, 
                valueSuffix = '%',
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Secondary Aging by Count
second_pct <- claimsdata %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
    count(class, bucket) %>% 
    mutate(percent = n/sum(n) * 100)

options(scipen = 999, digits = 2)
hcbucket6 <- second_pct %>%
    hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Secondary Aging: 69.23%") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valueSuffix = '%',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "#440154",
            dataLabels = list(
                valueDecimals = 2, 
                valueSuffix = '%',
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Patient Aging by Count
patient_pct <- claimsdata %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
    count(class, bucket) %>% 
    mutate(percent = n/sum(n) * 100)

options(scipen = 999, digits = 2)
hcbucket7 <- patient_pct %>%
    hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Patient Aging: 33.33%") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valueSuffix = '%',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "#4BBD85",
            dataLabels = list(
                valueDecimals = 2, 
                valueSuffix = '%',
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

crosstalk::bscols(
    widths = NA,
hcbucket5,
hcbucket6,
hcbucket7
)

Age and Balance Plots

Show code
# Aging Buckets: Count, Amount & Percent
buckamtcount %>% 
  hchart("scatter",
         hcaes(
    x = enc,
    y = bal,
    size = pctbal,
    group = bucket
  )
  )%>%
  hc_yAxis(
      title = list(text = "Balance"),
      labels = list(format = "${value}")
      ) %>%
  hc_xAxis(
    title = list(text = "Number of Claims"),
    labels = list(format = "{value}"),
     crosshair = list(
       snap = TRUE,
       width = 2,
       zIndex = 0)
    ) %>% 
  hc_title(text = "Aging Buckets: Count, Amount & Percent") %>%
  hc_subtitle(text = "Bubble size represents 
              the percentage of the total AR balance.") %>%
  hc_credits(
    enabled = TRUE,
    text = "HIMNI Billing Inc.",
    href = "https://www.himni.info/"
  ) %>%
  hc_add_theme(hc_theme_smpl()) %>%
  hc_caption(
    text = "<b>Bubble represents the different aging buckets while the 
    size of the bubble represents the percentage of the total AR 
    balance.</b><br><em> Excludes claims with a balance less than 
    or equal to zero, as well as any unsubmitted claims missing 
    information. <br></em>"
  ) %>% 
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valuePrefix = '$',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>% 
    hc_legend(
    align = "right",
    verticalAlign = "top",
    layout = "vertical",
    x = 0,
    y = 100
  )
Show code
# Claims by Age and Outstanding Balance
options(scipen = 999, digits = 2)
hcscatter_1 <- claimsdata %>%
    filter(bal > 0 & payer != "NA") %>%
    hchart("scatter",
  hcaes(
    x = ageopen,
    y = bal,
    size = bal,
    group = payer
  )
) %>%
  hc_yAxis(
      title = list(text = "Balance"),
      labels = list(format = "${value}")
      ) %>%
  hc_xAxis(
    title = list(text = "Age of Claim"),
    labels = list(format = "{value} days"),
     crosshair = list(
       snap = TRUE,
       width = 2,
       zIndex = 0)
    ) %>% 
  hc_title(text = "Open Claims by Age and Outstanding Balance") %>%
  hc_subtitle(text = "Bubble size represents 
              balance amount remaining on the claim.") %>%
  hc_credits(
    enabled = TRUE,
    text = "HIMNI Billing Inc.",
    href = "https://www.himni.info/"
  ) %>%
  hc_add_theme(hc_theme_smpl()) %>%
  hc_caption(
    text = "<b>Bubble size represents the balance amount remaining on the claim.</b><br><em>
    Excludes claims with a balance less than or equal to zero, 
    as well as any unsubmitted claims missing information. <br></em>"
  ) %>% 
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valuePrefix = '$',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>% 
    hc_legend(
    align = "right",
    verticalAlign = "top",
    layout = "vertical",
    x = 0,
    y = 100
  ) %>%
    hc_size(height = 600, width = 600)

# Unsigned Encounters by Age and Balance
options(scipen = 999, digits = 2)
hcscatter_2 <- claimsdata %>%
    filter(bal > 0 & prov != "NA") %>%
    hchart("scatter",
  hcaes(
    x = ageopen,
    y = bal,
    size = bal,
    group = prov, 
    colorKey = "bal"
  )
) %>%
    hc_colorAxis(
      min = 0,
      max = max(claimsdata$bal),
      minColor = "blue", 
      maxColor = "red",
      layout = "horizontal"
      ) %>%
  hc_yAxis(
      title = list(text = "Balance"),
      labels = list(format = "${value}")
      ) %>%
  hc_xAxis(
    title = list(text = "Age of Claim"),
    labels = list(format = "{value} days"),
     crosshair = list(
       snap = TRUE,
       width = 2,
       zIndex = 0)
    ) %>% 
  hc_title(text = "Unsigned Encounters by Age and Outstanding Balance") %>%
  hc_subtitle(text = "Bubble size represents gross charge amount on the claim.") %>%
  hc_credits(
    enabled = TRUE,
    text = "HIMNI Billing Inc.",
    href = "https://www.himni.info/"
  ) %>%
  hc_add_theme(hc_theme_smpl()) %>%
  hc_caption(
    text = "<b>Bubble size represents 
    the balance amount remaining on the claim.</b><br><em>
    Excludes claims with a balance less than or equal to zero, 
    as well as any unsubmitted claims missing information. <br></em>"
  ) %>% 
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valuePrefix = '$',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>% 
    hc_legend(
    align = "right",
    verticalAlign = "bottom",
    layout = "horizontal",
    x = 0,
    y = 10
  ) %>%
    hc_size(height = 600, width = 600)

crosstalk::bscols(
    widths = NA,
hcscatter_1,
hcscatter_2
)

Diagnosis and Procedure Coding

Show code
# Primary Procedure by Provider
hcpx1 <- claimsdata %>%
    group_by(prov, px1) %>%
    filter(px1 != "none" & prov != "NA") %>%
    summarise(n = n()) %>%
    filter(n > 1) %>%
    arrange(desc(n)) %>%
    hchart("column", hcaes(x = px1, y = n, group = prov)) %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
    hc_title(text = "Primary Procedure by Provider (Oct '21 - Feb '22)") %>%
    hc_subtitle(text = "For Procedure Codes Occurring More Than Once") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 550, width = 750)

# Primary Procedure by Payer
hcpx2 <- claimsdata %>%
    group_by(payer, px1) %>%
    filter(px1 != "none" & payer != "NA") %>%
    summarise(n = n()) %>%
    filter(n > 1) %>%
    arrange(desc(n)) %>%
    hchart("column", hcaes(x = px1, y = n, group = payer)) %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
    hc_title(text = "Primary Procedure by Payer (Oct '21 - Feb '22)") %>%
    hc_subtitle(text = "For Procedure Codes Occurring More Than Once") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 550, width = 750)

crosstalk::bscols(
    widths = NA,
hcpx1,
hcpx2
)

Data Dictionary

Below is a quick overview of the variables in the data set, their names and descriptions:

Variable Name Description
enc Encounter ID An alpha-numeric string that is a concatenation of three separate unique identifiers. The first two characters (“AB”) represent the billing client ID, the numeric string following it functions as the unique patient ID, and the final letter-number combination is the visit number for that patient.
type Claim Type There are two types of claims: A HCFA (CMS-1500) and a UB-04 (CMS-1450).
pt Patient Name The patient’s last name.
dos Date of Service (DOS) The specific date that the encounter (time at which a patient has been given medical treatment) occurred on.
today Today’s Date The date that you’re viewing this information in the EHR and/or creating the report on.
pos Place of Service (POS) Two-digit code indicating the setting that a healthcare service has been provided.
prov Provider Name The provider’s last name.
loc Location The location of the healthcare facility that the encounter took place at.
payer Payer The party responsible for reimbursement of the claim, whether a third-party payer or the patient themselves.
class Insurance Class The three classes of health insurance: Primary, Secondary, and Tertiary.
stat Claim Status Current status of the claim.
billed Billed Date Date the claim was last billed.
doc Date Completed Date the claim’s balance reached $0.00 or less.
bal Claim Balance Amount currently owed on the claim.
rej Reason for Rejection CARC/RARC code indicating the reason for rejection of the claim.
den Reason for Denial CARC/RARC code indicating the reason for denial of the claim.
att Attention Message Post-adjudication warning message.
px1 Primary Procedure Code HCPCS Level II code for the primary procedure code
px1m Primary Procedure Code Modifier HCPCS Level II modifier for the primary procedure code
px2 Secondary Procedure Code HCPCS Level II code for the secondary procedure code
px2m Secondary Procedure Code Modifier HCPCS Level II modifier for the secondary procedure code
px3 Tertiary Procedure Code HCPCS Level II code for the tertiary procedure code
px3m Tertiary Procedure Code Modifier HCPCS Level II modifier for the tertiary procedure code
dx1 Primary Diagnosis Code ICD-10-CM code for the primary/principal diagnosis
dx2 Secondary Diagnosis Code ICD-10-CM code for the secondary diagnosis
dx3 Tertiary Diagnosis Code ICD-10-CM code for the tertiary diagnosis

Last updated on

[1] "2022-04-06 06:33:27 EDT"

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/andrewallenbruce, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Bruce (2022, March 18). Andrew Bruce: Data Wrangling: EHR Healthcare Claims Data. Retrieved from https://andrewbruce.netlify.app/posts/data-wrangling-healthcare-claims/

BibTeX citation

@misc{bruce2022data,
  author = {Bruce, Andrew},
  title = {Andrew Bruce: Data Wrangling: EHR Healthcare Claims Data},
  url = {https://andrewbruce.netlify.app/posts/data-wrangling-healthcare-claims/},
  year = {2022}
}