Using the {tidyverse} to wrangle EHR/EMR healthcare claims data.
This is intended to be a step-by-step guide for data analysis of healthcare claims data from an EHR/EMR; one of a series of posts I’ll be working on with the goal of turning the code into functions in the interest of speeding up repetitive analyses.
I’ll load the packages that I’ll be using:
I’m using a mock data set I’ve created using medical coding practice exercises that I have collected, the {randomNames} package for creating patient names, and my personal knowledge of EMR/EHR data systems to fill in the rest. A quick note: this is a very light set of data as compared to what you might actually see in an EHR. In the interest of time, I’ve tried to use only the types of data I would need to write these functions to perform the applicable analyses. Let’s take a look at the structure of the raw data:
str(claimsdata)
tibble [75 x 27] (S3: tbl_df/tbl/data.frame)
$ enc : chr [1:75] "AB073A1" "AB071A3" "AB072A2" "AB070A4" ...
$ type : chr [1:75] "HCFA" "HCFA" "HCFA" "HCFA" ...
$ pt : chr [1:75] "Long" "Noorani" "Tampubolon" "Martin" ...
$ datecreate: POSIXct[1:75], format: "2022-02-03" "2022-02-04" ...
$ datedos : POSIXct[1:75], format: "2022-02-03" "2022-02-04" ...
$ datenow : POSIXct[1:75], format: "2022-02-12" "2022-02-12" ...
$ datesign : POSIXct[1:75], format: NA NA ...
$ pos : chr [1:75] "11" NA "11" "11" ...
$ prov : chr [1:75] "Johnson" "Johnson" "Johnson" NA ...
$ loc : chr [1:75] "Valdosta" "Valdosta" "Valdosta" "Valdosta" ...
$ payer : chr [1:75] "Aetna" "Medicare" "UHC" "BCBS" ...
$ class : chr [1:75] "Primary" "Primary" "Primary" "Primary" ...
$ stat : chr [1:75] "Unsigned" "Place of Service" "Date of Service" "Provider" ...
$ datebill : POSIXct[1:75], format: NA NA ...
$ datecomp : POSIXct[1:75], format: NA NA ...
$ bal : num [1:75] 975 711.8 723.1 659.8 71.9 ...
$ rej : logi [1:75] NA NA NA NA NA NA ...
$ den : chr [1:75] NA NA NA NA ...
$ px1 : num [1:75] 99214 99204 99214 99203 99214 ...
$ px1m : num [1:75] 25 NA NA NA 25 NA NA 25 NA 25 ...
$ px2 : num [1:75] 93350 NA NA NA 93289 ...
$ px2m : chr [1:75] NA NA NA NA ...
$ px3 : chr [1:75] NA NA NA NA ...
$ px3m : chr [1:75] NA NA NA NA ...
$ dx1 : chr [1:75] "R55" "R00.0" "R60.0" "I49.3" ...
$ dx2 : chr [1:75] "E78.5" "Z33.1" "I11.9" "I34.0" ...
$ dx3 : chr [1:75] "I25.9" NA NA "I34.1" ...
First I need to make sure any columns with dates are converted to
date objects. Next, I need to convert the Procedure code
columns to the character type. Since each of the Diagnosis
code columns are alpha-numeric (as all ICD-10-CM codes are), they were
automatically identified as the character type when the
data was read in.
# Convert date columns to date object
claimsdata$datecreate <- as.Date(claimsdata$datecreate, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datedos <- as.Date(claimsdata$datedos, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datesign <- as.Date(claimsdata$datesign, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datenow <- as.Date(claimsdata$datenow, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datebill <- as.Date(claimsdata$datebill, "%yyyy-%mm-%dd", tz = "EST")
claimsdata$datecomp <- as.Date(claimsdata$datecomp, "%yyyy-%mm-%dd", tz = "EST")
# Convert Px/Dx/Mod columns to factor type
claimsdata$px1 <- as.factor(claimsdata$px1)
claimsdata$px1m <- as.factor(claimsdata$px1m)
claimsdata$px2 <- as.factor(claimsdata$px2)
claimsdata$px2m <- as.factor(claimsdata$px2m)
claimsdata$px3 <- as.factor(claimsdata$px3)
claimsdata$px3m <- as.factor(claimsdata$px3m)
str(claimsdata)
tibble [75 x 27] (S3: tbl_df/tbl/data.frame)
$ enc : chr [1:75] "AB073A1" "AB071A3" "AB072A2" "AB070A4" ...
$ type : chr [1:75] "HCFA" "HCFA" "HCFA" "HCFA" ...
$ pt : chr [1:75] "Long" "Noorani" "Tampubolon" "Martin" ...
$ datecreate: Date[1:75], format: "2022-02-02" "2022-02-03" ...
$ datedos : Date[1:75], format: "2022-02-02" "2022-02-03" ...
$ datenow : Date[1:75], format: "2022-02-11" "2022-02-11" ...
$ datesign : Date[1:75], format: NA NA ...
$ pos : chr [1:75] "11" NA "11" "11" ...
$ prov : chr [1:75] "Johnson" "Johnson" "Johnson" NA ...
$ loc : chr [1:75] "Valdosta" "Valdosta" "Valdosta" "Valdosta" ...
$ payer : chr [1:75] "Aetna" "Medicare" "UHC" "BCBS" ...
$ class : chr [1:75] "Primary" "Primary" "Primary" "Primary" ...
$ stat : chr [1:75] "Unsigned" "Place of Service" "Date of Service" "Provider" ...
$ datebill : Date[1:75], format: NA NA ...
$ datecomp : Date[1:75], format: NA NA ...
$ bal : num [1:75] 975 711.8 723.1 659.8 71.9 ...
$ rej : logi [1:75] NA NA NA NA NA NA ...
$ den : chr [1:75] NA NA NA NA ...
$ px1 : Factor w/ 18 levels "93224","93227",..: 11 8 11 7 11 11 11 17 11 11 ...
$ px1m : Factor w/ 2 levels "25","26": 1 NA NA NA 1 NA NA 1 NA 1 ...
$ px2 : Factor w/ 9 levels "17110","23650",..: 7 NA NA NA 6 NA NA 4 NA 5 ...
$ px2m : Factor w/ 3 levels "26","59","RT": NA NA NA NA 1 NA NA 2 NA 1 ...
$ px3 : Factor w/ 6 levels "12011","93010",..: NA NA NA NA NA NA NA 2 NA NA ...
$ px3m : Factor w/ 1 level "XS-RT": NA NA NA NA NA NA NA NA NA NA ...
$ dx1 : chr [1:75] "R55" "R00.0" "R60.0" "I49.3" ...
$ dx2 : chr [1:75] "E78.5" "Z33.1" "I11.9" "I34.0" ...
$ dx3 : chr [1:75] "I25.9" NA NA "I34.1" ...
# Add 'Age Open' Column to assign
# an Aging Bucket to each Open claim
claimsdata <- claimsdata %>%
mutate(ageopen = days(datenow) - days(datedos))
claimsdata$ageopen <- as.numeric(claimsdata$ageopen, "hours")
claimsdata <- claimsdata %>%
mutate(ageopen = ageopen / 24)
# Add 'Bucket' Column to assign
# an Aging Bucket to each Open claim
claimsdata <- claimsdata %>%
mutate(bucket = case_when(
claimsdata$ageopen < 31 ~ "0-30",
claimsdata$ageopen >= 31 & claimsdata$ageopen < 61 ~ "31-60",
claimsdata$ageopen >= 61 & claimsdata$ageopen < 91 ~ "61-90",
claimsdata$ageopen >= 91 & claimsdata$ageopen < 121 ~ "91-120",
claimsdata$ageopen >= 121 & claimsdata$ageopen < 151 ~ "121-150",
claimsdata$ageopen >= 151 & claimsdata$ageopen < 181 ~ "151-180",
claimsdata$ageopen >= 181 ~ "181+",
TRUE ~ "NA"
))
# Convert 'Bucket' column to factor
claimsdata$bucket <- factor(claimsdata$bucket,
levels = c("0-30",
"31-60",
"61-90",
"91-120",
"121-150",
"151-180",
"181+",
"NA"
),
ordered = is.ordered(claimsdata$bucket))
# Add 'Age Close' Column to calculate
# the number of days it took to Close claim
claimsdata <- claimsdata %>%
mutate(ageclose = days(datecomp) - days(datedos))
claimsdata$ageclose <- as.numeric(claimsdata$ageclose, "hours")
claimsdata <- claimsdata %>%
mutate(ageclose = ageclose / 24)
# Print data frame
claimsdata %>%
select(
enc,
stat,
datecreate,
datedos,
datenow,
datesign,
datebill,
bucket,
ageopen,
ageclose
) %>%
head()
| enc | stat | datecreate | datedos | datenow | datesign | datebill | bucket | ageopen | ageclose |
|---|---|---|---|---|---|---|---|---|---|
| AB073A1 | Unsigned | 2022-02-02 | 2022-02-02 | 2022-02-11 | NA | NA | 0-30 | 9 | NA |
| AB071A3 | Place of Service | 2022-02-03 | 2022-02-03 | 2022-02-11 | NA | NA | 0-30 | 8 | NA |
| AB072A2 | Date of Service | 2022-02-03 | NA | 2022-02-11 | NA | NA | NA | NA | NA |
| AB070A4 | Provider | 2022-02-10 | 2022-02-10 | 2022-02-11 | NA | NA | 0-30 | 1 | NA |
| AB069A5 | Billed | 2022-02-09 | 2022-02-09 | 2022-02-11 | 2022-02-27 | 2022-02-27 | 0-30 | 2 | NA |
| AB068A6 | Ready | 2022-02-08 | 2022-02-08 | 2022-02-11 | 2022-02-09 | NA | 0-30 | 3 | NA |
# function to create many different date types based on the Date of Service
claims_ymq <- function(df, date_col) {
stopifnot(inherits(df, "data.frame"))
stopifnot(class(df %>% dplyr::pull({{date_col}})) == 'Date')
dplyr::mutate(df,
nmon = lubridate::month({{date_col}}, label = FALSE),
ndip = lubridate::days_in_month({{date_col}}),
month = lubridate::month({{date_col}}, label = TRUE, abbr = FALSE),
mon = lubridate::month({{date_col}}, label = TRUE, abbr = TRUE),
year = lubridate::year({{date_col}}),
nqtr = lubridate::quarter({{date_col}}),
yqtr = lubridate::quarter({{date_col}}, with_year = TRUE),
dqtr = paste0(lubridate::quarter({{date_col}}),"Q", format({{date_col}},"%y")),
ymon = as.numeric(format({{date_col}}, "%Y.%m")),
mmon = format({{date_col}}, "%b %Y"),
nhalf = lubridate::semester({{date_col}}),
yhalf = lubridate::semester({{date_col}}, with_year = TRUE),
dhalf = paste0(lubridate::semester({{date_col}}), "H", format({{date_col}},"%y"))
)
}
# call function
claimsdata <- claims_ymq(df = claimsdata, date_col = datecreate)
# print data frame
claimsdata %>%
select(
datecreate,
nmon,
ndip,
month,
mon,
year,
nqtr,
yqtr,
dqtr,
ymon,
nmon,
nhalf,
yhalf,
dhalf
) %>%
head()
| datecreate | nmon | ndip | month | mon | year | nqtr | yqtr | dqtr | ymon | nhalf | yhalf | dhalf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-02-02 | 2 | 28 | February | Feb | 2022 | 1 | 2022.1 | 1Q22 | 2022.02 | 1 | 2022.1 | 1H22 |
| 2022-02-03 | 2 | 28 | February | Feb | 2022 | 1 | 2022.1 | 1Q22 | 2022.02 | 1 | 2022.1 | 1H22 |
| 2022-02-03 | 2 | 28 | February | Feb | 2022 | 1 | 2022.1 | 1Q22 | 2022.02 | 1 | 2022.1 | 1H22 |
| 2022-02-10 | 2 | 28 | February | Feb | 2022 | 1 | 2022.1 | 1Q22 | 2022.02 | 1 | 2022.1 | 1H22 |
| 2022-02-09 | 2 | 28 | February | Feb | 2022 | 1 | 2022.1 | 1Q22 | 2022.02 | 1 | 2022.1 | 1H22 |
| 2022-02-08 | 2 | 28 | February | Feb | 2022 | 1 | 2022.1 | 1Q22 | 2022.02 | 1 | 2022.1 | 1H22 |
# Add 'Cycle' Column to assign an RCM Status to each claim
claimsdata <- claimsdata %>%
mutate(cycle = case_when(
claimsdata$stat == "Complete" ~ "Closed",
claimsdata$stat == "Refund" ~ "Closed",
claimsdata$stat == "Credit" ~ "Closed",
TRUE ~ "Open"
))
# Add 'Review' Column to assign Claims to Responsible Party
claimsdata <- claimsdata %>%
mutate(review = case_when(
claimsdata$stat == "Unsigned" ~ "Client",
claimsdata$stat == "Charges" ~ "Client",
claimsdata$stat == "Place of Service" ~ "Client",
claimsdata$stat == "Date of Service" ~ "Client",
claimsdata$stat == "Location" ~ "Client",
claimsdata$stat == "Provider" ~ "Client",
claimsdata$stat == "Payer" ~ "Client",
claimsdata$stat == "Refund" ~ "Client",
claimsdata$stat == "Credit" ~ "Client",
claimsdata$stat == "Complete" ~ "None",
claimsdata$stat == "Billed" ~ "Billing",
claimsdata$stat == "New" ~ "Billing",
claimsdata$stat == "Ready" ~ "Billing",
claimsdata$stat == "Rejection" ~ "Billing",
claimsdata$stat == "Denial" ~ "Aging",
claimsdata$stat == "Appeal" ~ "Aging",
TRUE ~ "Other"
))
## Add 'incomp' Column to group Incomplete Statuses
claimsdata <- claimsdata %>%
mutate(incomp = case_when(
claimsdata$stat == "Unsigned" ~ "Incomplete",
claimsdata$stat == "Charges" ~ "Incomplete",
claimsdata$stat == "Place of Service" ~ "Incomplete",
claimsdata$stat == "Date of Service" ~ "Incomplete",
claimsdata$stat == "Location" ~ "Incomplete",
claimsdata$stat == "Provider" ~ "Incomplete",
claimsdata$stat == "Payer" ~ "Incomplete",
claimsdata$stat == "Refund" ~ "Review",
claimsdata$stat == "Credit" ~ "Review",
claimsdata$stat == "Complete" ~ "Complete",
claimsdata$stat == "Billed" ~ "Billed",
claimsdata$stat == "New" ~ "New",
claimsdata$stat == "Ready" ~ "Ready",
claimsdata$stat == "Rejection" ~ "Rejection",
claimsdata$stat == "Denial" ~ "Denial",
claimsdata$stat == "Appeal" ~ "Appeal",
TRUE ~ "Other"
))
# print data frame
claimsdata %>%
select(
enc,
stat,
cycle,
review,
incomp
) %>%
head()
| enc | stat | cycle | review | incomp |
|---|---|---|---|---|
| AB073A1 | Unsigned | Open | Client | Incomplete |
| AB071A3 | Place of Service | Open | Client | Incomplete |
| AB072A2 | Date of Service | Open | Client | Incomplete |
| AB070A4 | Provider | Open | Client | Incomplete |
| AB069A5 | Billed | Open | Billing | Billed |
| AB068A6 | Ready | Open | Billing | Ready |
# Number of Encounters by Provider
claimsdata %>%
group_by(prov) %>%
filter(bal > 0 & prov != "NA") %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100) %>%
arrange(desc(n))
| prov | n | percent |
|---|---|---|
| Smith | 33 | 47.14286 |
| Johnson | 28 | 40.00000 |
| Gibson | 9 | 12.85714 |
claimsdata %>%
group_by(payer) %>%
filter(bal > 0 & payer != "NA") %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100) %>%
arrange(desc(n))
| payer | n | percent |
|---|---|---|
| Medicare | 19 | 27.142857 |
| UHC | 17 | 24.285714 |
| Patient | 14 | 20.000000 |
| BCBS | 7 | 10.000000 |
| Aetna | 4 | 5.714286 |
| Humana | 4 | 5.714286 |
| Meritain | 2 | 2.857143 |
| Ambetter | 1 | 1.428571 |
| Cigna | 1 | 1.428571 |
| Medicaid | 1 | 1.428571 |
claimsdata %>%
filter(bal > 0 & incomp == "Incomplete") %>%
group_by(stat) %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100) %>%
arrange(desc(n))
| stat | n | percent |
|---|---|---|
| Unsigned | 6 | 42.857143 |
| Location | 3 | 21.428571 |
| Charges | 1 | 7.142857 |
| Date of Service | 1 | 7.142857 |
| Payer | 1 | 7.142857 |
| Place of Service | 1 | 7.142857 |
| Provider | 1 | 7.142857 |
# Encounters in Aging Buckets
buckcount <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA") %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100)
# AR Balance in Aging Buckets + Percentage
buckamt <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA") %>%
summarise(bal = sum(bal)) %>%
mutate(percent = bal / sum(bal) * 100)
# Merge the data frames
buckamtcount <- merge(buckcount, buckamt, by = "bucket")
# Rename percent cols
buckamtcount <- buckamtcount %>%
rename(
enc = n,
pctenc = percent.x,
pctbal = percent.y
)
#Print data frame
buckamtcount
| bucket | enc | pctenc | bal | pctbal |
|---|---|---|---|---|
| 0-30 | 34 | 48.571429 | 13586.38 | 68.041072 |
| 31-60 | 18 | 25.714286 | 4166.39 | 20.865429 |
| 61-90 | 5 | 7.142857 | 817.00 | 4.091565 |
| 91-120 | 13 | 18.571429 | 1398.14 | 7.001935 |
# Encounters in Primary by bucket
primbk <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100)
# Balance in Primary by bucket
primamt <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
summarise(bal = sum(bal)) %>%
mutate(percent = bal / sum(bal) * 100)
# Merge the data frames
primamtcount <- merge(primbk, primamt, by = "bucket")
# Rename percent cols
primamtcount <- primamtcount %>%
rename(
enc = n,
pctenc = percent.x,
pctbal = percent.y
)
# Print data frame
primamtcount
| bucket | enc | pctenc | bal | pctbal |
|---|---|---|---|---|
| 0-30 | 25 | 60.975610 | 10891.27 | 75.0769984 |
| 31-60 | 15 | 36.585366 | 3483.89 | 24.0155651 |
| 91-120 | 1 | 2.439024 | 131.64 | 0.9074365 |
# Encounters in Secondary by bucket
secbk <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100)
# Balance in Secondary by bucket
secamt <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
summarise(bal = sum(bal)) %>%
mutate(percent = bal / sum(bal) * 100)
# Merge the data frames
secamtcount <- merge(secbk, secamt, by = "bucket")
# Rename percent cols
secamtcount <- secamtcount %>%
rename(
enc = n,
pctenc = percent.x,
pctbal = percent.y
)
# Print data frame
secamtcount
| bucket | enc | pctenc | bal | pctbal |
|---|---|---|---|---|
| 0-30 | 6 | 46.15385 | 1590.24 | 56.860488 |
| 31-60 | 3 | 23.07692 | 682.50 | 24.403413 |
| 61-90 | 2 | 15.38462 | 264.00 | 9.439562 |
| 91-120 | 2 | 15.38462 | 260.00 | 9.296538 |
# Encounters in Patient by bucket
patbk <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
summarise(n = n()) %>%
mutate(percent = n/sum(n) * 100)
# Balance in Patient by bucket
patamt <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
summarise(bal = sum(bal)) %>%
mutate(percent = bal / sum(bal) * 100)
# Merge the data frames
patamtcount <- merge(patbk, patamt, by = "bucket")
# Rename percent cols
patamtcount <- patamtcount %>%
rename(
enc = n,
pctenc = percent.x,
pctbal = percent.y
)
# Print data frame
patamtcount
| bucket | enc | pctenc | bal | pctbal |
|---|---|---|---|---|
| 0-30 | 2 | 13.33333 | 464.87 | 22.96369 |
| 61-90 | 3 | 20.00000 | 553.00 | 27.31714 |
| 91-120 | 10 | 66.66667 | 1006.50 | 49.71917 |
# Number of Encounters by Provider
hcprov <- claimsdata %>%
group_by(prov) %>%
filter(bal > 0 & prov != "NA") %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
hchart("column", hcaes(x = prov, y = n, color = prov), name = "Encounters") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Encounters by Provider") %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_plotOptions(
column = list(
color = "red",
dataLabels = list(
enabled = TRUE
)
)
) %>%
hc_size(height = 300, width = 350)
# Number of Encounters by Payer
hcpay <- claimsdata %>%
group_by(payer) %>%
filter(bal > 0 & payer != "NA") %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
hchart("bar", hcaes(x = payer, y = n, color = payer), name = "Encounters") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Encounters by Payer") %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_plotOptions(
column = list(
dataLabels = list(
enabled = TRUE
)
)
) %>%
hc_size(height = 300, width = 350)
# Number of Encounters by Incomplete Status
hcstat <- claimsdata %>%
filter(bal > 0 & incomp == "Incomplete") %>%
group_by(stat) %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
hchart("bar", hcaes(x = stat, y = n, color = stat), name = "Encounters") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Incomplete Encounters") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
dataLabels = list(
enabled = TRUE
)
)
) %>%
hc_size(height = 300, width = 350)
crosstalk::bscols(
widths = NA,
hcprov,
hcpay,
hcstat
)
# Encounters in Aging Buckets
hcbucket <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA") %>%
summarise(n = n()) %>%
hchart("column", hcaes(x = bucket, y = n, color = bucket), name = "Encounters") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Encounters in Aging Buckets") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "red",
dataLabels = list(
enabled = TRUE
)
)
)
# Percentage of Encounters in Aging Buckets
hcbucket4 <- buckamtcount %>%
hchart("column", hcaes(x = bucket, y = round(pctenc, 2), color = bucket), name = "Percentage of Encounters") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Percentage of Encounters in Aging Buckets") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "red",
dataLabels = list(
enabled = TRUE
)
)
)
# AR Balance in Aging Buckets
hcbucket2 <- claimsdata %>%
group_by(bucket) %>%
filter(bal > 0 & bucket != "NA") %>%
summarise(bal = sum(bal)) %>%
hchart("column", hcaes(
x = bucket,
y = bal,
color = bucket),
name = "Balance") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Aging Buckets: AR Balance") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "red",
dataLabels = list(
enabled = TRUE
)
)
)
# AR Balance Percentage in Aging Buckets
hcbucket3 <- buckamtcount %>%
hchart("column", hcaes(
x = bucket,
y = round(pctbal, 2),
color = bucket),
name = "Percentage of Balance") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Aging Buckets: AR Balance Percentage") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "red",
dataLabels = list(
enabled = TRUE
)
)
)
crosstalk::bscols(
widths = NA,
hcbucket,
hcbucket4,
hcbucket2,
hcbucket3
)
# Primary Aging by Count
primary_pct <- claimsdata %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Primary") %>%
count(class, bucket) %>%
mutate(percent = n/sum(n) * 100)
primary_two <- primary_pct %>%
group_by(class) %>%
filter(bucket == "0-30" | bucket == "31-60") %>%
summarise(npct = sum(percent))
options(scipen = 999, digits = 2)
hcbucket5 <- primary_pct %>%
hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Primary Aging: 97.61%") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
valueDecimals = 2,
valueSuffix = '%',
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "#34608D",
dataLabels = list(
valueDecimals = 2,
valueSuffix = '%',
enabled = TRUE
)
)
) %>%
hc_size(height = 300, width = 350)
# Secondary Aging by Count
second_pct <- claimsdata %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Secondary") %>%
count(class, bucket) %>%
mutate(percent = n/sum(n) * 100)
options(scipen = 999, digits = 2)
hcbucket6 <- second_pct %>%
hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Secondary Aging: 69.23%") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
# pointFormat = tltip,
valueDecimals = 2,
valueSuffix = '%',
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "#440154",
dataLabels = list(
valueDecimals = 2,
valueSuffix = '%',
enabled = TRUE
)
)
) %>%
hc_size(height = 300, width = 350)
# Patient Aging by Count
patient_pct <- claimsdata %>%
filter(bal > 0 & bucket != "NA" & class != "Closed" & class == "Patient") %>%
count(class, bucket) %>%
mutate(percent = n/sum(n) * 100)
options(scipen = 999, digits = 2)
hcbucket7 <- patient_pct %>%
hchart("column", hcaes(x = bucket, y = round(percent, 2)), name = "Percentage") %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Patient Aging: 33.33%") %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_tooltip(
useHTML = TRUE,
# pointFormat = tltip,
valueDecimals = 2,
valueSuffix = '%',
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_plotOptions(
column = list(
color = "#4BBD85",
dataLabels = list(
valueDecimals = 2,
valueSuffix = '%',
enabled = TRUE
)
)
) %>%
hc_size(height = 300, width = 350)
crosstalk::bscols(
widths = NA,
hcbucket5,
hcbucket6,
hcbucket7
)
# Aging Buckets: Count, Amount & Percent
buckamtcount %>%
hchart("scatter",
hcaes(
x = enc,
y = bal,
size = pctbal,
group = bucket
)
)%>%
hc_yAxis(
title = list(text = "Balance"),
labels = list(format = "${value}")
) %>%
hc_xAxis(
title = list(text = "Number of Claims"),
labels = list(format = "{value}"),
crosshair = list(
snap = TRUE,
width = 2,
zIndex = 0)
) %>%
hc_title(text = "Aging Buckets: Count, Amount & Percent") %>%
hc_subtitle(text = "Bubble size represents
the percentage of the total AR balance.") %>%
hc_credits(
enabled = TRUE,
text = "HIMNI Billing Inc.",
href = "https://www.himni.info/"
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_caption(
text = "<b>Bubble represents the different aging buckets while the
size of the bubble represents the percentage of the total AR
balance.</b><br><em> Excludes claims with a balance less than
or equal to zero, as well as any unsubmitted claims missing
information. <br></em>"
) %>%
hc_tooltip(
useHTML = TRUE,
# pointFormat = tltip,
valueDecimals = 2,
valuePrefix = '$',
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_legend(
align = "right",
verticalAlign = "top",
layout = "vertical",
x = 0,
y = 100
)
# Claims by Age and Outstanding Balance
options(scipen = 999, digits = 2)
hcscatter_1 <- claimsdata %>%
filter(bal > 0 & payer != "NA") %>%
hchart("scatter",
hcaes(
x = ageopen,
y = bal,
size = bal,
group = payer
)
) %>%
hc_yAxis(
title = list(text = "Balance"),
labels = list(format = "${value}")
) %>%
hc_xAxis(
title = list(text = "Age of Claim"),
labels = list(format = "{value} days"),
crosshair = list(
snap = TRUE,
width = 2,
zIndex = 0)
) %>%
hc_title(text = "Open Claims by Age and Outstanding Balance") %>%
hc_subtitle(text = "Bubble size represents
balance amount remaining on the claim.") %>%
hc_credits(
enabled = TRUE,
text = "HIMNI Billing Inc.",
href = "https://www.himni.info/"
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_caption(
text = "<b>Bubble size represents the balance amount remaining on the claim.</b><br><em>
Excludes claims with a balance less than or equal to zero,
as well as any unsubmitted claims missing information. <br></em>"
) %>%
hc_tooltip(
useHTML = TRUE,
# pointFormat = tltip,
valueDecimals = 2,
valuePrefix = '$',
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_legend(
align = "right",
verticalAlign = "top",
layout = "vertical",
x = 0,
y = 100
) %>%
hc_size(height = 600, width = 600)
# Unsigned Encounters by Age and Balance
options(scipen = 999, digits = 2)
hcscatter_2 <- claimsdata %>%
filter(bal > 0 & prov != "NA") %>%
hchart("scatter",
hcaes(
x = ageopen,
y = bal,
size = bal,
group = prov,
colorKey = "bal"
)
) %>%
hc_colorAxis(
min = 0,
max = max(claimsdata$bal),
minColor = "blue",
maxColor = "red",
layout = "horizontal"
) %>%
hc_yAxis(
title = list(text = "Balance"),
labels = list(format = "${value}")
) %>%
hc_xAxis(
title = list(text = "Age of Claim"),
labels = list(format = "{value} days"),
crosshair = list(
snap = TRUE,
width = 2,
zIndex = 0)
) %>%
hc_title(text = "Unsigned Encounters by Age and Outstanding Balance") %>%
hc_subtitle(text = "Bubble size represents gross charge amount on the claim.") %>%
hc_credits(
enabled = TRUE,
text = "HIMNI Billing Inc.",
href = "https://www.himni.info/"
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_caption(
text = "<b>Bubble size represents
the balance amount remaining on the claim.</b><br><em>
Excludes claims with a balance less than or equal to zero,
as well as any unsubmitted claims missing information. <br></em>"
) %>%
hc_tooltip(
useHTML = TRUE,
# pointFormat = tltip,
valueDecimals = 2,
valuePrefix = '$',
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_legend(
align = "right",
verticalAlign = "bottom",
layout = "horizontal",
x = 0,
y = 10
) %>%
hc_size(height = 600, width = 600)
crosstalk::bscols(
widths = NA,
hcscatter_1,
hcscatter_2
)
# Primary Procedure by Provider
hcpx1 <- claimsdata %>%
group_by(prov, px1) %>%
filter(px1 != "none" & prov != "NA") %>%
summarise(n = n()) %>%
filter(n > 1) %>%
arrange(desc(n)) %>%
hchart("column", hcaes(x = px1, y = n, group = prov)) %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Primary Procedure by Provider (Oct '21 - Feb '22)") %>%
hc_subtitle(text = "For Procedure Codes Occurring More Than Once") %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_plotOptions(
column = list(
dataLabels = list(
enabled = TRUE
)
)
) %>%
hc_size(height = 550, width = 750)
# Primary Procedure by Payer
hcpx2 <- claimsdata %>%
group_by(payer, px1) %>%
filter(px1 != "none" & payer != "NA") %>%
summarise(n = n()) %>%
filter(n > 1) %>%
arrange(desc(n)) %>%
hchart("column", hcaes(x = px1, y = n, group = payer)) %>%
hc_yAxis(
gridLineWidth = 0,
labels = list(style = list(color = "#000000")),
title = list(text = "", style = list(color = "#000000"))
) %>%
hc_xAxis(
labels = list(style = list(color = "#000000")),
title = list(text= ""),
lineWidth = 0,
tickWidth = 0
) %>%
hc_title(text = "Primary Procedure by Payer (Oct '21 - Feb '22)") %>%
hc_subtitle(text = "For Procedure Codes Occurring More Than Once") %>%
hc_tooltip(
useHTML = TRUE,
crosshairs = TRUE,
borderWidth = 1,
sort = TRUE
) %>%
hc_add_theme(hc_theme_smpl()) %>%
hc_plotOptions(
column = list(
dataLabels = list(
enabled = TRUE
)
)
) %>%
hc_size(height = 550, width = 750)
crosstalk::bscols(
widths = NA,
hcpx1,
hcpx2
)
Below is a quick overview of the variables in the data set, their names and descriptions:
| Variable | Name | Description |
|---|---|---|
enc |
Encounter ID | An alpha-numeric string that is a concatenation of three separate unique identifiers. The first two characters (“AB”) represent the billing client ID, the numeric string following it functions as the unique patient ID, and the final letter-number combination is the visit number for that patient. |
type |
Claim Type | There are two types of claims: A HCFA (CMS-1500) and a UB-04 (CMS-1450). |
pt |
Patient Name | The patient’s last name. |
dos |
Date of Service (DOS) | The specific date that the encounter (time at which a patient has been given medical treatment) occurred on. |
today |
Today’s Date | The date that you’re viewing this information in the EHR and/or creating the report on. |
pos |
Place of Service (POS) | Two-digit code indicating the setting that a healthcare service has been provided. |
prov |
Provider Name | The provider’s last name. |
loc |
Location | The location of the healthcare facility that the encounter took place at. |
payer |
Payer | The party responsible for reimbursement of the claim, whether a third-party payer or the patient themselves. |
class |
Insurance Class | The three classes of health insurance: Primary, Secondary, and Tertiary. |
stat |
Claim Status | Current status of the claim. |
billed |
Billed Date | Date the claim was last billed. |
doc |
Date Completed | Date the claim’s balance reached $0.00 or less. |
bal |
Claim Balance | Amount currently owed on the claim. |
rej |
Reason for Rejection | CARC/RARC code indicating the reason for rejection of the claim. |
den |
Reason for Denial | CARC/RARC code indicating the reason for denial of the claim. |
att |
Attention Message | Post-adjudication warning message. |
px1 |
Primary Procedure Code | HCPCS Level II code for the primary procedure code |
px1m |
Primary Procedure Code Modifier | HCPCS Level II modifier for the primary procedure code |
px2 |
Secondary Procedure Code | HCPCS Level II code for the secondary procedure code |
px2m |
Secondary Procedure Code Modifier | HCPCS Level II modifier for the secondary procedure code |
px3 |
Tertiary Procedure Code | HCPCS Level II code for the tertiary procedure code |
px3m |
Tertiary Procedure Code Modifier | HCPCS Level II modifier for the tertiary procedure code |
dx1 |
Primary Diagnosis Code | ICD-10-CM code for the primary/principal diagnosis |
dx2 |
Secondary Diagnosis Code | ICD-10-CM code for the secondary diagnosis |
dx3 |
Tertiary Diagnosis Code | ICD-10-CM code for the tertiary diagnosis |
[1] "2022-04-06 06:33:27 EDT"
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/andrewallenbruce, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bruce (2022, March 18). Andrew Bruce: Data Wrangling: EHR Healthcare Claims Data. Retrieved from https://andrewbruce.netlify.app/posts/data-wrangling-healthcare-claims/
BibTeX citation
@misc{bruce2022data,
author = {Bruce, Andrew},
title = {Andrew Bruce: Data Wrangling: EHR Healthcare Claims Data},
url = {https://andrewbruce.netlify.app/posts/data-wrangling-healthcare-claims/},
year = {2022}
}