Andrew Bruce: Data Wrangling: EHR Healthcare Claims Data

Andrew Bruce

Introduction

This is intended to be a step-by-step guide for data analysis of healthcare claims data from an EHR/EMR; one of a series of posts I’ll be working on with the goal of turning the code into functions in the interest of speeding up repetitive analyses.

Packages

I’ll load the packages that I’ll be using:

library(tidyverse)
library(tidyr)
library(rmarkdown)
library(lubridate)
library(highcharter)
library(htmltools)
library(reactable)
library(reactablefmtr)
library(ggsci)
library(printr)

Data

I’m using a mock data set I’ve created using medical coding practice exercises that I have collected, the {randomNames} package for creating patient names, and my personal knowledge of EMR/EHR data systems to fill in the rest. A quick note: this is a very light set of data as compared to what you might actually see in an EHR. In the interest of time, I’ve tried to use only the types of data I would need to write these functions to perform the applicable analyses. Let’s take a look at the structure of the raw data:

str(claimsdata)

tibble [75 x 27] (S3: tbl_df/tbl/data.frame)
 $ enc       : chr [1:75] "AB073A1" "AB071A3" "AB072A2" "AB070A4" ...
 $ type      : chr [1:75] "HCFA" "HCFA" "HCFA" "HCFA" ...
 $ pt        : chr [1:75] "Long" "Noorani" "Tampubolon" "Martin" ...
 $ datecreate: POSIXct[1:75], format: "2022-02-03" "2022-02-04" ...
 $ datedos   : POSIXct[1:75], format: "2022-02-03" "2022-02-04" ...
 $ datenow   : POSIXct[1:75], format: "2022-02-12" "2022-02-12" ...
 $ datesign  : POSIXct[1:75], format: NA NA ...
 $ pos       : chr [1:75] "11" NA "11" "11" ...
 $ prov      : chr [1:75] "Johnson" "Johnson" "Johnson" NA ...
 $ loc       : chr [1:75] "Valdosta" "Valdosta" "Valdosta" "Valdosta" ...
 $ payer     : chr [1:75] "Aetna" "Medicare" "UHC" "BCBS" ...
 $ class     : chr [1:75] "Primary" "Primary" "Primary" "Primary" ...
 $ stat      : chr [1:75] "Unsigned" "Place of Service" "Date of Service" "Provider" ...
 $ datebill  : POSIXct[1:75], format: NA NA ...
 $ datecomp  : POSIXct[1:75], format: NA NA ...
 $ bal       : num [1:75] 975 711.8 723.1 659.8 71.9 ...
 $ rej       : logi [1:75] NA NA NA NA NA NA ...
 $ den       : chr [1:75] NA NA NA NA ...
 $ px1       : num [1:75] 99214 99204 99214 99203 99214 ...
 $ px1m      : num [1:75] 25 NA NA NA 25 NA NA 25 NA 25 ...
 $ px2       : num [1:75] 93350 NA NA NA 93289 ...
 $ px2m      : chr [1:75] NA NA NA NA ...
 $ px3       : chr [1:75] NA NA NA NA ...
 $ px3m      : chr [1:75] NA NA NA NA ...
 $ dx1       : chr [1:75] "R55" "R00.0" "R60.0" "I49.3" ...
 $ dx2       : chr [1:75] "E78.5" "Z33.1" "I11.9" "I34.0" ...
 $ dx3       : chr [1:75] "I25.9" NA NA "I34.1" ...

Data Cleaning & Wrangling

First I need to make sure any columns with dates are converted to date objects. Next, I need to convert the Procedure code columns to the character type. Since each of the Diagnosis code columns are alpha-numeric (as all ICD-10-CM codes are), they were automatically identified as the character type when the data was read in.

Convert ‘Date’ and ‘Character’ Types

# Convert date columns to date object
claimsdata$datecreate <- as.Date(claimsdata$datecreate, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datedos <- as.Date(claimsdata$datedos, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datesign <- as.Date(claimsdata$datesign, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datenow <- as.Date(claimsdata$datenow, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datebill <- as.Date(claimsdata$datebill, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datecomp <- as.Date(claimsdata$datecomp, "%yyyy-%mm-%dd", tz = "EST")

# Convert Px/Dx/Mod columns to factor type
claimsdata$px1 <- as.factor(claimsdata$px1)
claimsdata$px1m <- as.factor(claimsdata$px1m)
claimsdata$px2 <- as.factor(claimsdata$px2)
claimsdata$px2m <- as.factor(claimsdata$px2m)
claimsdata$px3 <- as.factor(claimsdata$px3)
claimsdata$px3m <- as.factor(claimsdata$px3m)

str(claimsdata)

tibble [75 x 27] (S3: tbl_df/tbl/data.frame)
 $ enc       : chr [1:75] "AB073A1" "AB071A3" "AB072A2" "AB070A4" ...
 $ type      : chr [1:75] "HCFA" "HCFA" "HCFA" "HCFA" ...
 $ pt        : chr [1:75] "Long" "Noorani" "Tampubolon" "Martin" ...
 $ datecreate: Date[1:75], format: "2022-02-02" "2022-02-03" ...
 $ datedos   : Date[1:75], format: "2022-02-02" "2022-02-03" ...
 $ datenow   : Date[1:75], format: "2022-02-11" "2022-02-11" ...
 $ datesign  : Date[1:75], format: NA NA ...
 $ pos       : chr [1:75] "11" NA "11" "11" ...
 $ prov      : chr [1:75] "Johnson" "Johnson" "Johnson" NA ...
 $ loc       : chr [1:75] "Valdosta" "Valdosta" "Valdosta" "Valdosta" ...
 $ payer     : chr [1:75] "Aetna" "Medicare" "UHC" "BCBS" ...
 $ class     : chr [1:75] "Primary" "Primary" "Primary" "Primary" ...
 $ stat      : chr [1:75] "Unsigned" "Place of Service" "Date of Service" "Provider" ...
 $ datebill  : Date[1:75], format: NA NA ...
 $ datecomp  : Date[1:75], format: NA NA ...
 $ bal       : num [1:75] 975 711.8 723.1 659.8 71.9 ...
 $ rej       : logi [1:75] NA NA NA NA NA NA ...
 $ den       : chr [1:75] NA NA NA NA ...
 $ px1       : Factor w/ 18 levels "93224","93227",..: 11 8 11 7 11 11 11 17 11 11 ...
 $ px1m      : Factor w/ 2 levels "25","26": 1 NA NA NA 1 NA NA 1 NA 1 ...
 $ px2       : Factor w/ 9 levels "17110","23650",..: 7 NA NA NA 6 NA NA 4 NA 5 ...
 $ px2m      : Factor w/ 3 levels "26","59","RT": NA NA NA NA 1 NA NA 2 NA 1 ...
 $ px3       : Factor w/ 6 levels "12011","93010",..: NA NA NA NA NA NA NA 2 NA NA ...
 $ px3m      : Factor w/ 1 level "XS-RT": NA NA NA NA NA NA NA NA NA NA ...
 $ dx1       : chr [1:75] "R55" "R00.0" "R60.0" "I49.3" ...
 $ dx2       : chr [1:75] "E78.5" "Z33.1" "I11.9" "I34.0" ...
 $ dx3       : chr [1:75] "I25.9" NA NA "I34.1" ...

New Columns: Age Open, Age Closed, Bucket

# Add 'Age Open' Column to assign 
# an Aging Bucket to each Open claim
claimsdata <- claimsdata %>%
    mutate(ageopen = days(datenow) - days(datedos))

claimsdata$ageopen <- as.numeric(claimsdata$ageopen, "hours")

claimsdata <- claimsdata %>%
    mutate(ageopen = ageopen / 24)

# Add 'Bucket' Column to assign 
# an Aging Bucket to each Open claim
claimsdata <- claimsdata %>%
    mutate(bucket = case_when(
        claimsdata$ageopen < 31 ~ "0-30",
        claimsdata$ageopen >= 31 & claimsdata$ageopen < 61 ~ "31-60",
        claimsdata$ageopen >= 61 & claimsdata$ageopen < 91 ~ "61-90",
        claimsdata$ageopen >= 91 & claimsdata$ageopen < 121 ~ "91-120",
        claimsdata$ageopen >= 121 & claimsdata$ageopen < 151 ~ "121-150",
        claimsdata$ageopen >= 151 & claimsdata$ageopen < 181 ~ "151-180",
        claimsdata$ageopen >= 181 ~ "181+",
        TRUE ~ "NA"
    ))

# Convert 'Bucket' column to factor
claimsdata$bucket <- factor(claimsdata$bucket, 
                            levels = c("0-30", 
                                       "31-60", 
                                       "61-90", 
                                       "91-120", 
                                       "121-150", 
                                       "151-180", 
                                       "181+", 
                                       "NA"
                                       ), 
                            ordered = is.ordered(claimsdata$bucket))

# Add 'Age Close' Column to calculate 
# the number of days it took to Close claim
claimsdata <- claimsdata %>%
    mutate(ageclose = days(datecomp) - days(datedos))

claimsdata$ageclose <- as.numeric(claimsdata$ageclose, "hours")

claimsdata <- claimsdata %>%
    mutate(ageclose = ageclose / 24)

# Print data frame
claimsdata %>% 
  select(
    enc,
    stat,
    datecreate,
    datedos,
    datenow,
    datesign,
    datebill,
    bucket,
    ageopen,
    ageclose
  ) %>% 
  head()

enc	stat	datecreate	datedos	datenow	datesign	datebill	bucket	ageopen	ageclose
AB073A1	Unsigned	2022-02-02	2022-02-02	2022-02-11	NA	NA	0-30	9	NA
AB071A3	Place of Service	2022-02-03	2022-02-03	2022-02-11	NA	NA	0-30	8	NA
AB072A2	Date of Service	2022-02-03	NA	2022-02-11	NA	NA	NA	NA	NA
AB070A4	Provider	2022-02-10	2022-02-10	2022-02-11	NA	NA	0-30	1	NA
AB069A5	Billed	2022-02-09	2022-02-09	2022-02-11	2022-02-27	2022-02-27	0-30	2	NA
AB068A6	Ready	2022-02-08	2022-02-08	2022-02-11	2022-02-09	NA	0-30	3	NA

Dates, Months, Quarters, Years

# function to create many different date types based on the Date of Service
claims_ymq <- function(df, date_col) {
  stopifnot(inherits(df, "data.frame"))
  stopifnot(class(df %>% dplyr::pull({{date_col}})) == 'Date')
  dplyr::mutate(df,
                nmon = lubridate::month({{date_col}}, label = FALSE),
                ndip = lubridate::days_in_month({{date_col}}),
                month = lubridate::month({{date_col}}, label = TRUE, abbr = FALSE),
                mon = lubridate::month({{date_col}}, label = TRUE, abbr = TRUE),
                year = lubridate::year({{date_col}}),
                nqtr = lubridate::quarter({{date_col}}),
                yqtr = lubridate::quarter({{date_col}}, with_year = TRUE),
                dqtr = paste0(lubridate::quarter({{date_col}}),"Q", format({{date_col}},"%y")),
                ymon = as.numeric(format({{date_col}}, "%Y.%m")),
                mmon = format({{date_col}}, "%b %Y"),
                nhalf = lubridate::semester({{date_col}}),
                yhalf = lubridate::semester({{date_col}}, with_year = TRUE),
                dhalf = paste0(lubridate::semester({{date_col}}), "H", format({{date_col}},"%y"))
                )
}

# call function
claimsdata <- claims_ymq(df = claimsdata, date_col = datecreate)

# print data frame
claimsdata %>% 
  select(
    datecreate,
    nmon,
    ndip,
    month,
    mon,
    year,
    nqtr,
    yqtr,
    dqtr,
    ymon,
    nmon,
    nhalf,
    yhalf,
    dhalf
  ) %>% 
  head()

datecreate	nmon	ndip	month	mon	year	nqtr	yqtr	dqtr	ymon	nhalf	yhalf	dhalf
2022-02-02	2	28	February	Feb	2022	1	2022.1	1Q22	2022.02	1	2022.1	1H22
2022-02-03	2	28	February	Feb	2022	1	2022.1	1Q22	2022.02	1	2022.1	1H22
2022-02-03	2	28	February	Feb	2022	1	2022.1	1Q22	2022.02	1	2022.1	1H22
2022-02-10	2	28	February	Feb	2022	1	2022.1	1Q22	2022.02	1	2022.1	1H22
2022-02-09	2	28	February	Feb	2022	1	2022.1	1Q22	2022.02	1	2022.1	1H22
2022-02-08	2	28	February	Feb	2022	1	2022.1	1Q22	2022.02	1	2022.1	1H22

Status Columns

# Add 'Cycle' Column to assign an RCM Status to each claim
claimsdata <- claimsdata %>%
    mutate(cycle = case_when(
        claimsdata$stat == "Complete" ~ "Closed",
        claimsdata$stat == "Refund" ~ "Closed",
        claimsdata$stat == "Credit" ~ "Closed",
        TRUE ~ "Open"
    ))

# Add 'Review' Column to assign Claims to Responsible Party
claimsdata <- claimsdata %>%
    mutate(review = case_when(
        claimsdata$stat == "Unsigned" ~ "Client",
        claimsdata$stat == "Charges" ~ "Client",
        claimsdata$stat == "Place of Service" ~ "Client",
        claimsdata$stat == "Date of Service" ~ "Client",
        claimsdata$stat == "Location" ~ "Client",
        claimsdata$stat == "Provider" ~ "Client",
        claimsdata$stat == "Payer" ~ "Client",
        claimsdata$stat == "Refund" ~ "Client",
        claimsdata$stat == "Credit" ~ "Client",
        claimsdata$stat == "Complete" ~ "None",
        claimsdata$stat == "Billed" ~ "Billing",
        claimsdata$stat == "New" ~ "Billing",
        claimsdata$stat == "Ready" ~ "Billing",
        claimsdata$stat == "Rejection" ~ "Billing",
        claimsdata$stat == "Denial" ~ "Aging",
        claimsdata$stat == "Appeal" ~ "Aging",
        TRUE ~ "Other"
    ))

## Add 'incomp' Column to group Incomplete Statuses
claimsdata <- claimsdata %>%
    mutate(incomp = case_when(
        claimsdata$stat == "Unsigned" ~ "Incomplete",
        claimsdata$stat == "Charges" ~ "Incomplete",
        claimsdata$stat == "Place of Service" ~ "Incomplete",
        claimsdata$stat == "Date of Service" ~ "Incomplete",
        claimsdata$stat == "Location" ~ "Incomplete",
        claimsdata$stat == "Provider" ~ "Incomplete",
        claimsdata$stat == "Payer" ~ "Incomplete",
        claimsdata$stat == "Refund" ~ "Review",
        claimsdata$stat == "Credit" ~ "Review",
        claimsdata$stat == "Complete" ~ "Complete",
        claimsdata$stat == "Billed" ~ "Billed",
        claimsdata$stat == "New" ~ "New",
        claimsdata$stat == "Ready" ~ "Ready",
        claimsdata$stat == "Rejection" ~ "Rejection",
        claimsdata$stat == "Denial" ~ "Denial",
        claimsdata$stat == "Appeal" ~ "Appeal",
        TRUE ~ "Other"
    ))

# print data frame
claimsdata %>% 
  select(
    enc,
    stat,
    cycle,
    review,
    incomp
  ) %>% 
  head()

enc	stat	cycle	review	incomp
AB073A1	Unsigned	Open	Client	Incomplete
AB071A3	Place of Service	Open	Client	Incomplete
AB072A2	Date of Service	Open	Client	Incomplete
AB070A4	Provider	Open	Client	Incomplete
AB069A5	Billed	Open	Billing	Billed
AB068A6	Ready	Open	Billing	Ready

Encounter Counts & Percentage

Encounters by Provider

# Number of Encounters by Provider
claimsdata %>%
    group_by(prov) %>%
    filter(bal > 0 & prov != "NA") %>%
    summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100) %>% 
    arrange(desc(n))

prov	n	percent
Smith	33	47.14286
Johnson	28	40.00000
Gibson	9	12.85714

Encounters by Payer

claimsdata %>%
    group_by(payer) %>%
    filter(bal > 0 & payer != "NA") %>%
    summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100) %>% 
    arrange(desc(n))

payer	n	percent
Medicare	19	27.142857
UHC	17	24.285714
Patient	14	20.000000
BCBS	7	10.000000
Aetna	4	5.714286
Humana	4	5.714286
Meritain	2	2.857143
Ambetter	1	1.428571
Cigna	1	1.428571
Medicaid	1	1.428571

Encounters by Incomplete Status

claimsdata %>%
    filter(bal > 0 & incomp == "Incomplete") %>%
    group_by(stat) %>%
    summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100) %>% 
    arrange(desc(n))

stat	n	percent
Unsigned	6	42.857143
Location	3	21.428571
Charges	1	7.142857
Date of Service	1	7.142857
Payer	1	7.142857
Place of Service	1	7.142857
Provider	1	7.142857

Aging: Count, Amount, Percent

# Encounters in Aging Buckets
buckcount <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(n = n()) %>% 
  mutate(percent = n/sum(n) * 100)

# AR Balance in Aging Buckets + Percentage
buckamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
buckamtcount <- merge(buckcount, buckamt, by = "bucket")

# Rename percent cols
buckamtcount <- buckamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

#Print data frame
buckamtcount

bucket	enc	pctenc	bal	pctbal
0-30	34	48.571429	13586.38	68.041072
31-60	18	25.714286	4166.39	20.865429
61-90	5	7.142857	817.00	4.091565
91-120	13	18.571429	1398.14	7.001935

Primary: Count, Amount, Percent

# Encounters in Primary by bucket
primbk <- claimsdata %>%
  group_by(bucket) %>%
  filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
  summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100)

# Balance in Primary by bucket
primamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
primamtcount <- merge(primbk, primamt, by = "bucket")

# Rename percent cols
primamtcount <- primamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

# Print data frame
primamtcount

bucket	enc	pctenc	bal	pctbal
0-30	25	60.975610	10891.27	75.0769984
31-60	15	36.585366	3483.89	24.0155651
91-120	1	2.439024	131.64	0.9074365

Secondary: Count, Amount, Percent

# Encounters in Secondary by bucket
secbk <- claimsdata %>%
  group_by(bucket) %>%
  filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
  summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100)

# Balance in Secondary by bucket
secamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
secamtcount <- merge(secbk, secamt, by = "bucket")

# Rename percent cols
secamtcount <- secamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

# Print data frame
secamtcount

bucket	enc	pctenc	bal	pctbal
0-30	6	46.15385	1590.24	56.860488
31-60	3	23.07692	682.50	24.403413
61-90	2	15.38462	264.00	9.439562
91-120	2	15.38462	260.00	9.296538

Patient: Count, Amount, Percent

# Encounters in Patient by bucket
patbk <- claimsdata %>%
  group_by(bucket) %>%
  filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
  summarise(n = n()) %>%
  mutate(percent = n/sum(n) * 100)

# Balance in Patient by bucket
patamt <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
    summarise(bal = sum(bal)) %>% 
  mutate(percent = bal / sum(bal) * 100)

# Merge the data frames
patamtcount <- merge(patbk, patamt, by = "bucket")

# Rename percent cols
patamtcount <- patamtcount %>% 
  rename(
    enc = n,
    pctenc = percent.x,
    pctbal = percent.y
    )

# Print data frame
patamtcount

bucket	enc	pctenc	bal	pctbal
0-30	2	13.33333	464.87	22.96369
61-90	3	20.00000	553.00	27.31714
91-120	10	66.66667	1006.50	49.71917

Visualizations

Encounter Counts

Show code

# Number of Encounters by Provider
hcprov <- claimsdata %>%
    group_by(prov) %>%
    filter(bal > 0 & prov != "NA") %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    hchart("column", hcaes(x = prov, y = n, color = prov), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Encounters by Provider") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Number of Encounters by Payer
hcpay <- claimsdata %>%
    group_by(payer) %>%
    filter(bal > 0 & payer != "NA") %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    hchart("bar", hcaes(x = payer, y = n, color = payer), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
    hc_title(text = "Encounters by Payer") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Number of Encounters by Incomplete Status
hcstat <- claimsdata %>%
    filter(bal > 0 & incomp == "Incomplete") %>%
    group_by(stat) %>%
    summarise(n = n()) %>%
    arrange(desc(n)) %>%
    hchart("bar", hcaes(x = stat, y = n, color = stat), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
  hc_title(text = "Incomplete Encounters") %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

crosstalk::bscols(
    widths = NA,
hcprov,
hcpay,
hcstat
)

Aging: Count, Amount, Percent

Show code

# Encounters in Aging Buckets
hcbucket <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(n = n()) %>%
    hchart("column", hcaes(x = bucket, y = n, color = bucket), name = "Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Encounters in Aging Buckets") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    )

# Percentage of Encounters in Aging Buckets
hcbucket4 <- buckamtcount %>% 
    hchart("column", hcaes(x = bucket, y = round(pctenc, 2), color = bucket), name = "Percentage of Encounters") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Percentage of Encounters in Aging Buckets") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    )

# AR Balance in Aging Buckets
hcbucket2 <- claimsdata %>%
    group_by(bucket) %>%
    filter(bal > 0 & bucket != "NA") %>%
    summarise(bal = sum(bal)) %>%
    hchart("column", hcaes(
      x = bucket, 
      y = bal, 
      color = bucket), 
      name = "Balance") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Aging Buckets: AR Balance") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    )

# AR Balance Percentage in Aging Buckets
hcbucket3 <- buckamtcount %>% 
    hchart("column", hcaes(
      x = bucket, 
      y = round(pctbal, 2), 
      color = bucket), 
      name = "Percentage of Balance") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Aging Buckets: AR Balance Percentage") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "red",
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) 

crosstalk::bscols(
    widths = NA,
hcbucket,
hcbucket4,
hcbucket2,
hcbucket3
)

Aging Percentages by Class

Show code

# Primary Aging by Count
primary_pct <- claimsdata %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
    count(class, bucket) %>% 
    mutate(percent = n/sum(n) * 100)

primary_two <- primary_pct %>%
    group_by(class) %>% 
    filter(bucket == "0-30" | bucket == "31-60") %>% 
    summarise(npct = sum(percent))

options(scipen = 999, digits = 2)
hcbucket5 <- primary_pct %>%
    hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Primary Aging: 97.61%") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
        valueDecimals = 2, 
        valueSuffix = '%',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "#34608D",
            dataLabels = list(
                valueDecimals = 2, 
                valueSuffix = '%',
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Secondary Aging by Count
second_pct <- claimsdata %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
    count(class, bucket) %>% 
    mutate(percent = n/sum(n) * 100)

options(scipen = 999, digits = 2)
hcbucket6 <- second_pct %>%
    hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Secondary Aging: 69.23%") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valueSuffix = '%',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "#440154",
            dataLabels = list(
                valueDecimals = 2, 
                valueSuffix = '%',
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

# Patient Aging by Count
patient_pct <- claimsdata %>%
    filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
    count(class, bucket) %>% 
    mutate(percent = n/sum(n) * 100)

options(scipen = 999, digits = 2)
hcbucket7 <- patient_pct %>%
    hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>% 
    hc_title(text = "Patient Aging: 33.33%") %>% 
    hc_add_theme(hc_theme_smpl()) %>%
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valueSuffix = '%',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_plotOptions(
        column = list(
            color = "#4BBD85",
            dataLabels = list(
                valueDecimals = 2, 
                valueSuffix = '%',
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 300, width = 350)

crosstalk::bscols(
    widths = NA,
hcbucket5,
hcbucket6,
hcbucket7
)

Age and Balance Plots

Show code

# Aging Buckets: Count, Amount & Percent
buckamtcount %>% 
  hchart("scatter",
         hcaes(
    x = enc,
    y = bal,
    size = pctbal,
    group = bucket
  )
  )%>%
  hc_yAxis(
      title = list(text = "Balance"),
      labels = list(format = "${value}")
      ) %>%
  hc_xAxis(
    title = list(text = "Number of Claims"),
    labels = list(format = "{value}"),
     crosshair = list(
       snap = TRUE,
       width = 2,
       zIndex = 0)
    ) %>% 
  hc_title(text = "Aging Buckets: Count, Amount & Percent") %>%
  hc_subtitle(text = "Bubble size represents 
              the percentage of the total AR balance.") %>%
  hc_credits(
    enabled = TRUE,
    text = "HIMNI Billing Inc.",
    href = "https://www.himni.info/"
  ) %>%
  hc_add_theme(hc_theme_smpl()) %>%
  hc_caption(
    text = "<b>Bubble represents the different aging buckets while the 
    size of the bubble represents the percentage of the total AR 
    balance.</b><br><em> Excludes claims with a balance less than 
    or equal to zero, as well as any unsubmitted claims missing 
    information. <br></em>"
  ) %>% 
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valuePrefix = '$',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>% 
    hc_legend(
    align = "right",
    verticalAlign = "top",
    layout = "vertical",
    x = 0,
    y = 100
  )

Show code

# Claims by Age and Outstanding Balance
options(scipen = 999, digits = 2)
hcscatter_1 <- claimsdata %>%
    filter(bal > 0 & payer != "NA") %>%
    hchart("scatter",
  hcaes(
    x = ageopen,
    y = bal,
    size = bal,
    group = payer
  )
) %>%
  hc_yAxis(
      title = list(text = "Balance"),
      labels = list(format = "${value}")
      ) %>%
  hc_xAxis(
    title = list(text = "Age of Claim"),
    labels = list(format = "{value} days"),
     crosshair = list(
       snap = TRUE,
       width = 2,
       zIndex = 0)
    ) %>% 
  hc_title(text = "Open Claims by Age and Outstanding Balance") %>%
  hc_subtitle(text = "Bubble size represents 
              balance amount remaining on the claim.") %>%
  hc_credits(
    enabled = TRUE,
    text = "HIMNI Billing Inc.",
    href = "https://www.himni.info/"
  ) %>%
  hc_add_theme(hc_theme_smpl()) %>%
  hc_caption(
    text = "<b>Bubble size represents the balance amount remaining on the claim.</b><br><em>
    Excludes claims with a balance less than or equal to zero, 
    as well as any unsubmitted claims missing information. <br></em>"
  ) %>% 
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valuePrefix = '$',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>% 
    hc_legend(
    align = "right",
    verticalAlign = "top",
    layout = "vertical",
    x = 0,
    y = 100
  ) %>%
    hc_size(height = 600, width = 600)

# Unsigned Encounters by Age and Balance
options(scipen = 999, digits = 2)
hcscatter_2 <- claimsdata %>%
    filter(bal > 0 & prov != "NA") %>%
    hchart("scatter",
  hcaes(
    x = ageopen,
    y = bal,
    size = bal,
    group = prov, 
    colorKey = "bal"
  )
) %>%
    hc_colorAxis(
      min = 0,
      max = max(claimsdata$bal),
      minColor = "blue", 
      maxColor = "red",
      layout = "horizontal"
      ) %>%
  hc_yAxis(
      title = list(text = "Balance"),
      labels = list(format = "${value}")
      ) %>%
  hc_xAxis(
    title = list(text = "Age of Claim"),
    labels = list(format = "{value} days"),
     crosshair = list(
       snap = TRUE,
       width = 2,
       zIndex = 0)
    ) %>% 
  hc_title(text = "Unsigned Encounters by Age and Outstanding Balance") %>%
  hc_subtitle(text = "Bubble size represents gross charge amount on the claim.") %>%
  hc_credits(
    enabled = TRUE,
    text = "HIMNI Billing Inc.",
    href = "https://www.himni.info/"
  ) %>%
  hc_add_theme(hc_theme_smpl()) %>%
  hc_caption(
    text = "<b>Bubble size represents 
    the balance amount remaining on the claim.</b><br><em>
    Excludes claims with a balance less than or equal to zero, 
    as well as any unsubmitted claims missing information. <br></em>"
  ) %>% 
    hc_tooltip(
    useHTML = TRUE,
    # pointFormat = tltip,
        valueDecimals = 2, 
        valuePrefix = '$',
        crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>% 
    hc_legend(
    align = "right",
    verticalAlign = "bottom",
    layout = "horizontal",
    x = 0,
    y = 10
  ) %>%
    hc_size(height = 600, width = 600)

crosstalk::bscols(
    widths = NA,
hcscatter_1,
hcscatter_2
)

Diagnosis and Procedure Coding

Show code

# Primary Procedure by Provider
hcpx1 <- claimsdata %>%
    group_by(prov, px1) %>%
    filter(px1 != "none" & prov != "NA") %>%
    summarise(n = n()) %>%
    filter(n > 1) %>%
    arrange(desc(n)) %>%
    hchart("column", hcaes(x = px1, y = n, group = prov)) %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
    hc_title(text = "Primary Procedure by Provider (Oct '21 - Feb '22)") %>%
    hc_subtitle(text = "For Procedure Codes Occurring More Than Once") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 550, width = 750)

# Primary Procedure by Payer
hcpx2 <- claimsdata %>%
    group_by(payer, px1) %>%
    filter(px1 != "none" & payer != "NA") %>%
    summarise(n = n()) %>%
    filter(n > 1) %>%
    arrange(desc(n)) %>%
    hchart("column", hcaes(x = px1, y = n, group = payer)) %>%
    hc_yAxis(
        gridLineWidth = 0,
        labels = list(style = list(color =  "#000000")),
        title = list(text = "", style = list(color = "#000000"))
    ) %>%
    hc_xAxis(
        labels = list(style = list(color =  "#000000")),
        title = list(text= ""),
        lineWidth = 0,
        tickWidth = 0
    ) %>%
    hc_title(text = "Primary Procedure by Payer (Oct '21 - Feb '22)") %>%
    hc_subtitle(text = "For Procedure Codes Occurring More Than Once") %>%
    hc_tooltip(
    useHTML = TRUE,
    crosshairs = TRUE,
    borderWidth = 1,
    sort = TRUE
        ) %>%
    hc_add_theme(hc_theme_smpl()) %>%
    hc_plotOptions(
        column = list(
            dataLabels = list(
                enabled = TRUE
            )
        )
    ) %>%
    hc_size(height = 550, width = 750)

crosstalk::bscols(
    widths = NA,
hcpx1,
hcpx2
)

Data Dictionary

Below is a quick overview of the variables in the data set, their names and descriptions:

Variable	Name	Description
`enc`	Encounter ID	An alpha-numeric string that is a concatenation of three separate unique identifiers. The first two characters (“AB”) represent the billing client ID, the numeric string following it functions as the unique patient ID, and the final letter-number combination is the visit number for that patient.
`type`	Claim Type	There are two types of claims: A HCFA (CMS-1500) and a UB-04 (CMS-1450).
`pt`	Patient Name	The patient’s last name.
`dos`	Date of Service (DOS)	The specific date that the encounter (time at which a patient has been given medical treatment) occurred on.
`today`	Today’s Date	The date that you’re viewing this information in the EHR and/or creating the report on.
`pos`	Place of Service (POS)	Two-digit code indicating the setting that a healthcare service has been provided.
`prov`	Provider Name	The provider’s last name.
`loc`	Location	The location of the healthcare facility that the encounter took place at.
`payer`	Payer	The party responsible for reimbursement of the claim, whether a third-party payer or the patient themselves.
`class`	Insurance Class	The three classes of health insurance: Primary, Secondary, and Tertiary.
`stat`	Claim Status	Current status of the claim.
`billed`	Billed Date	Date the claim was last billed.
`doc`	Date Completed	Date the claim’s balance reached $0.00 or less.
`bal`	Claim Balance	Amount currently owed on the claim.
`rej`	Reason for Rejection	CARC/RARC code indicating the reason for rejection of the claim.
`den`	Reason for Denial	CARC/RARC code indicating the reason for denial of the claim.
`att`	Attention Message	Post-adjudication warning message.
`px1`	Primary Procedure Code	HCPCS Level II code for the primary procedure code
`px1m`	Primary Procedure Code Modifier	HCPCS Level II modifier for the primary procedure code
`px2`	Secondary Procedure Code	HCPCS Level II code for the secondary procedure code
`px2m`	Secondary Procedure Code Modifier	HCPCS Level II modifier for the secondary procedure code
`px3`	Tertiary Procedure Code	HCPCS Level II code for the tertiary procedure code
`px3m`	Tertiary Procedure Code Modifier	HCPCS Level II modifier for the tertiary procedure code
`dx1`	Primary Diagnosis Code	ICD-10-CM code for the primary/principal diagnosis
`dx2`	Secondary Diagnosis Code	ICD-10-CM code for the secondary diagnosis
`dx3`	Tertiary Diagnosis Code	ICD-10-CM code for the tertiary diagnosis

Last updated on

[1] "2022-04-06 06:33:27 EDT"

Comment on this article Share:

Data Wrangling: EHR Healthcare Claims Data

Introduction

Packages

Data

Data Cleaning & Wrangling

Convert ‘Date’ and ‘Character’ Types

New Columns: Age Open, Age Closed, Bucket

Dates, Months, Quarters, Years

Status Columns

Encounter Counts & Percentage

Encounters by Provider

Encounters by Payer

Encounters by Incomplete Status

Aging: Count, Amount, Percent

Primary: Count, Amount, Percent

Secondary: Count, Amount, Percent

Patient: Count, Amount, Percent

Visualizations

Encounter Counts

Aging: Count, Amount, Percent

Aging Percentages by Class

Age and Balance Plots

Diagnosis and Procedure Coding

Data Dictionary

Last updated on

Corrections

Reuse

Citation