In this vignette we show different functions to get characteristics (e.g. age, sex, prior history…) of subjects in OMOP CDM tables and cohort tables. These functions are useful for exploratory analysis and also as building blocks for more complex analyses.
The PatientProfiles package is designed to work with data in the OMOP CDM format, so our first step is to create a reference to the data using the DBI and CDMConnector packages. The connection to a Postgres database would look like:
library(DBI)
library(CDMConnector)
# The input arguments provided are for illustrative purposes only and do not provide access to any database.
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = "omop_cdm",
  host = "10.80.192.00",
  user = "user_name",
  password = "user_password"
)
cdm <- CDMConnector::cdm_from_con(
  con,
  cdm_schema = "main",
  write_schema = "main",
  cohort_tables = "cohort_example"
)
For this example we will work with simulated data generated by the mockPatientProfiles() function provided in this package, which mimics a database in the OMOP CDM format:
library(PatientProfiles)
library(duckdb)
library(dplyr)
cdm <- mockPatientProfiles(
patient_size = 1000,
drug_exposure_size = 1000
)
addAge(): adds a new column to the input table containing each patient’s age at a given date, specified in indexDate. The function can set the month and/or day of birth for patients with missing values, or impose them on all subjects. It can also classify patients into age groups based on the ageGroup argument.
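The age calculation can be pictured with a small base-R sketch. It assumes the birth date is assembled from the OMOP person columns year_of_birth, month_of_birth and day_of_birth, counts completed years at the index date, and fills a missing month or day with a default (or imposes the default on every subject). The function age_at() and its impose argument are hypothetical illustrations, not the package’s internals.

```r
# Hypothetical helper, not the PatientProfiles implementation.
# Age in completed years at index_date. Missing birth month/day are filled
# with default_month/default_day; with impose = TRUE the defaults replace
# every subject's month and day of birth.
age_at <- function(year_of_birth, month_of_birth, day_of_birth, index_date,
                   default_month = 1, default_day = 1, impose = FALSE) {
  if (impose) {
    month_of_birth <- rep(default_month, length(year_of_birth))
    day_of_birth <- rep(default_day, length(year_of_birth))
  } else {
    month_of_birth[is.na(month_of_birth)] <- default_month
    day_of_birth[is.na(day_of_birth)] <- default_day
  }
  index_date <- as.Date(index_date)
  idx_year <- as.integer(format(index_date, "%Y"))
  idx_month <- as.integer(format(index_date, "%m"))
  idx_day <- as.integer(format(index_date, "%d"))
  # Subtract one year when the birthday has not yet occurred in the index year
  not_yet <- idx_month < month_of_birth |
    (idx_month == month_of_birth & idx_day < day_of_birth)
  idx_year - year_of_birth - as.integer(not_yet)
}

age_at(
  year_of_birth = c(1950, 1980, 2000),
  month_of_birth = c(6, NA, 12),
  day_of_birth = c(15, NA, NA),
  index_date = c("2005-08-25", "2005-01-01", "2005-12-01")
)
```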
Suppose we want to calculate the age at condition start date for records in the condition_occurrence table. We also want to group patients into 20-year age bands, and to classify them by whether they are 60 years old or older.
cdm$condition_occurrence %>%
glimpse()
## Rows: ??
## Columns: 5
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ condition_occurrence_id <int> 314, 970, 349, 234, 193, 919, 113, 277, 491, 9…
## $ person_id <int> 314, 970, 349, 234, 193, 919, 113, 277, 491, 9…
## $ condition_concept_id <int> 4, 5, 1, 2, 3, 4, 3, 2, 1, 2, 1, 5, 3, 4, 4, 1…
## $ condition_start_date <date> 2005-08-25, 2007-02-15, 2009-02-15, 2008-05-0…
## $ condition_end_date <date> 2006-06-14, 2007-06-05, 2011-06-24, 2009-01-0…
cdm$condition_occurrence_mod <- cdm$condition_occurrence %>%
  addAge(
    ageDefaultMonth = 1,
    ageDefaultDay = 6,
    indexDate = "condition_start_date",
    ageGroup = list(
      "ageBand_20" = list(
        "0 to 19" = c(0, 19),
        "20 to 39" = c(20, 39),
        "40 to 59" = c(40, 59),
        "60 to 79" = c(60, 79),
        "80 to 99" = c(80, 99),
        ">= 100" = c(100, 150)
      ),
      "ageThreshold_60" = list(
        "less60" = c(0, 59),
        "more60" = c(60, 150)
      )
    )
  )
cdm$condition_occurrence_mod %>%
glimpse()
## Rows: ??
## Columns: 8
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ condition_occurrence_id <int> 314, 234, 277, 26, 276, 656, 275, 249, 150, 9,…
## $ person_id <int> 314, 234, 277, 26, 276, 656, 275, 249, 150, 9,…
## $ condition_concept_id <int> 4, 2, 2, 1, 5, 3, 4, 2, 4, 5, 2, 2, 2, 3, 1, 1…
## $ condition_start_date <date> 2005-08-25, 2008-05-03, 2007-04-26, 2009-01-0…
## $ condition_end_date <date> 2006-06-14, 2009-01-05, 2007-05-07, 2010-02-0…
## $ age <dbl> 58, 57, 22, 9, 76, 89, 70, 49, 55, 17, 39, 26,…
## $ ageBand_20 <chr> "40 to 59", "40 to 59", "20 to 39", "0 to 19",…
## $ ageThreshold_60 <chr> "less60", "less60", "less60", "less60", "more6…
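Each entry of ageGroup is a named list of c(lower, upper) ranges. The base-R sketch below shows how such a list could turn ages into labels, assuming both bounds are inclusive and the ranges do not overlap. label_age() is a hypothetical helper, and returning NA for ages outside every range is this sketch’s choice, not documented package behaviour. The example ages are the first six values of the age column printed above.

```r
# Hypothetical helper: assign each age the label of the range it falls in.
# Bounds are treated as inclusive; ages outside every range stay NA.
label_age <- function(age, groups) {
  out <- rep(NA_character_, length(age))
  for (label in names(groups)) {
    bounds <- groups[[label]]
    hit <- !is.na(age) & age >= bounds[1] & age <= bounds[2]
    out[hit] <- label
  }
  out
}

ageBand_20 <- list(
  "0 to 19" = c(0, 19), "20 to 39" = c(20, 39), "40 to 59" = c(40, 59),
  "60 to 79" = c(60, 79), "80 to 99" = c(80, 99), ">= 100" = c(100, 150)
)
ageThreshold_60 <- list("less60" = c(0, 59), "more60" = c(60, 150))

ages <- c(58, 57, 22, 9, 76, 89)
label_age(ages, ageBand_20)
label_age(ages, ageThreshold_60)
```

Because later ranges overwrite earlier ones in the loop, overlapping ranges would silently keep the last match; checking for overlaps up front is one way to guard against that.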
addSex(): appends a column to the input table indicating the sex of each patient as “Female” or “Male”.
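In the OMOP CDM, sex is recorded in person.gender_concept_id, where the standard concepts 8507 and 8532 denote male and female. Assuming addSex() works from that column, the lookup amounts to a recode followed by a join on person_id, as in this hypothetical base-R sketch. add_sex_sketch() is illustrative only, and mapping every other concept to NA is an assumption.

```r
# Hypothetical sketch, not the PatientProfiles implementation.
# 8507 and 8532 are the standard OMOP concepts for male and female; every
# other gender_concept_id (including 0, "no matching concept") becomes NA.
add_sex_sketch <- function(x, person) {
  sex <- ifelse(person$gender_concept_id == 8507, "Male",
                ifelse(person$gender_concept_id == 8532, "Female",
                       NA_character_))
  lookup <- data.frame(person_id = person$person_id, sex = sex,
                       stringsAsFactors = FALSE)
  # Remember the input order, since merge() may reorder rows
  x$.row <- seq_len(nrow(x))
  # Left join: rows of x without a matching person keep sex = NA
  out <- merge(x, lookup, by = "person_id", all.x = TRUE)
  out <- out[order(out$.row), ]
  out$.row <- NULL
  rownames(out) <- NULL
  out
}

person <- data.frame(person_id = 1:3, gender_concept_id = c(8507, 8532, 0))
conditions <- data.frame(condition_occurrence_id = 10:13,
                         person_id = c(1, 2, 3, 1))
add_sex_sketch(conditions, person)
```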
We can add the sex of each patient to the table and use it, together with the age groups computed above, to count occurrences of condition_concept_id 5 among males aged 60 years or older, stratified by 20-year age band.
cdm$condition_occurrence_mod <- cdm$condition_occurrence_mod %>%
addSex()
cdm$condition_occurrence_mod %>%
glimpse()
## Rows: ??
## Columns: 9
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ condition_occurrence_id <int> 314, 234, 277, 26, 276, 656, 275, 249, 150, 9,…
## $ person_id <int> 314, 234, 277, 26, 276, 656, 275, 249, 150, 9,…
## $ condition_concept_id <int> 4, 2, 2, 1, 5, 3, 4, 2, 4, 5, 2, 2, 2, 3, 1, 1…
## $ condition_start_date <date> 2005-08-25, 2008-05-03, 2007-04-26, 2009-01-0…
## $ condition_end_date <date> 2006-06-14, 2009-01-05, 2007-05-07, 2010-02-0…
## $ age <dbl> 58, 57, 22, 9, 76, 89, 70, 49, 55, 17, 39, 26,…
## $ ageBand_20 <chr> "40 to 59", "40 to 59", "20 to 39", "0 to 19",…
## $ ageThreshold_60 <chr> "less60", "less60", "less60", "less60", "more6…
## $ sex <chr> "Male", "Male", "Male", "Female", "Male", "Mal…
numConditions <- cdm$condition_occurrence_mod %>%
  # males aged 60 or older with condition_concept_id 5
  filter(
    sex == "Male",
    ageThreshold_60 == "more60",
    condition_concept_id == 5
  ) %>%
  # count records within each 20-year age band
  group_by(ageBand_20) %>%
  summarise(n = n())
numConditions
## # Source: SQL [2 x 2]
## # Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## ageBand_20 n
## <chr> <dbl>
## 1 60 to 79 29
## 2 80 to 99 15
PatientProfiles functions can be used on both OMOP CDM tables and cohort tables. In this example we apply some of them to a cohort table:
addInObservation(): adds a new binary column to the input table, indicating whether each subject is in observation at a specific date.
addPriorObservation(): appends a column to the input table containing the number of days each patient has been in observation up to a specified date.
addFutureObservation(): adds a column with the number of days of future observation for each individual from a specified date.
We can use the first function to keep only the patients who are in observation at “cohort_start_date”, and then add their prior and future observation days. Note that we do not pass the “indexDate” argument, because it defaults to “cohort_start_date”.
cdm$cohort1 %>%
glimpse()
## Rows: ??
## Columns: 4
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 1, 2
## $ subject_id <dbl> 1, 1, 2, 3
## $ cohort_start_date <date> 2020-01-01, 2020-06-01, 2020-01-02, 2020-01-01
## $ cohort_end_date <date> 2020-04-01, 2020-08-01, 2020-02-02, 2020-03-01
cdm$cohort1 <- cdm$cohort1 %>%
  addInObservation() %>%
  # keep only subjects in observation at cohort_start_date
  filter(in_observation == 1) %>%
  addPriorObservation() %>%
  addFutureObservation()
cdm$cohort1 %>%
glimpse()
## Rows: ??
## Columns: 7
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 2, 1, 1, 1
## $ subject_id <dbl> 3, 1, 2, 1
## $ cohort_start_date <date> 2020-01-01, 2020-06-01, 2020-01-02, 2020-01-01
## $ cohort_end_date <date> 2020-03-01, 2020-08-01, 2020-02-02, 2020-04-01
## $ in_observation <dbl> 1, 1, 1, 1
## $ prior_observation <dbl> 4635, 5350, 4168, 5198
## $ future_observation <dbl> 36925, 18232, 17348, 18384
If the database allows multiple observation periods per person, note that the results of these functions are based on the observation period in which “indexDate” falls. If a patient is not in observation at the specified date, addPriorObservation() and addFutureObservation() return NA.
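The rule above can be sketched in base R: for each row, look for that person’s observation period containing the index date; if one exists, prior observation is the number of days from the period start to the index date and future observation the number of days from the index date to the period end; otherwise both are NA. observation_at() is hypothetical, it assumes observation periods of the same person do not overlap, and whether the package counts the index day itself may differ from the plain date differences used here.

```r
# Hypothetical sketch, not the PatientProfiles implementation.
# obs: one row per observation period, with Date columns
# observation_period_start_date and observation_period_end_date.
observation_at <- function(person_id, index_date, obs) {
  index_date <- as.Date(index_date)
  n <- length(person_id)
  prior <- rep(NA_real_, n)
  future <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    # Period of this person that contains the index date (if any)
    p <- obs[obs$person_id == person_id[i] &
               obs$observation_period_start_date <= index_date[i] &
               obs$observation_period_end_date >= index_date[i], ]
    if (nrow(p) == 1) {
      prior[i] <- as.numeric(index_date[i] - p$observation_period_start_date)
      future[i] <- as.numeric(p$observation_period_end_date - index_date[i])
    }
  }
  data.frame(in_observation = as.numeric(!is.na(prior)),
             prior_observation = prior,
             future_observation = future)
}

obs <- data.frame(
  person_id = c(1, 1, 2),
  observation_period_start_date = as.Date(c("2010-01-01", "2020-01-01", "2015-06-01")),
  observation_period_end_date = as.Date(c("2015-12-31", "2022-12-31", "2021-06-01"))
)
# Person 1 on 2017-01-01 falls between two periods, so both counts are NA
observation_at(c(1, 1, 2), c("2020-06-01", "2017-01-01", "2020-01-02"), obs)
```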
addDemographics(): adds all the features presented in this vignette (except for addInObservation()) at once, to both OMOP CDM tables and cohort tables.
If we want to get the age, sex and prior observation of individuals on the day they enter a cohort, we can use addDemographics() as follows:
cdm$cohort2 %>%
glimpse()
## Rows: ??
## Columns: 4
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 1, 2, 3, 1
## $ subject_id <dbl> 1, 3, 1, 2, 1
## $ cohort_start_date <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…
## $ cohort_end_date <date> 2019-12-30, 2020-01-01, 2020-05-25, 2020-01-01, 2…
cdm$cohort2 <- cdm$cohort2 %>%
  addDemographics(
    age = TRUE,
    ageName = "age",
    ageGroup = NULL,
    sex = TRUE,
    sexName = "sex",
    priorObservation = TRUE,
    priorObservationName = "prior_observation",
    futureObservation = FALSE
  )
cdm$cohort2 %>%
glimpse()
## Rows: ??
## Columns: 7
## Database: DuckDB 0.7.1 [martics@Windows 10 x64:R 4.2.1/:memory:]
## $ cohort_definition_id <dbl> 1, 3, 2, 1, 1
## $ subject_id <dbl> 1, 2, 1, 1, 3
## $ cohort_start_date <date> 2020-05-25, 2020-01-01, 2020-05-25, 2019-12-30, 2…
## $ cohort_end_date <date> 2020-05-25, 2020-01-01, 2020-05-25, 2019-12-30, 2…
## $ age <dbl> 41, 50, 41, 40, 42
## $ sex <chr> "Male", "Male", "Male", "Male", "Female"
## $ prior_observation <dbl> 5343, 4167, 5343, 5196, 4635
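The idea of adding several characteristics in one call can be sketched in base R: join the cohort to a person table and an observation-period table once, then derive age, sex and prior observation from the joined columns. add_demographics_sketch() is hypothetical and assumes at most one observation period per person, a default of 1 for a missing birth month or day, and the OMOP column names of the person and observation_period tables; the toy data are made up for illustration.

```r
# Hypothetical sketch, not the PatientProfiles implementation.
add_demographics_sketch <- function(cohort, person, obs) {
  x <- merge(cohort, person, by.x = "subject_id", by.y = "person_id",
             all.x = TRUE)
  x <- merge(x, obs, by.x = "subject_id", by.y = "person_id", all.x = TRUE)
  idx <- as.Date(x$cohort_start_date)

  # Age in completed years at cohort_start_date
  bm <- ifelse(is.na(x$month_of_birth), 1, x$month_of_birth)
  bd <- ifelse(is.na(x$day_of_birth), 1, x$day_of_birth)
  im <- as.integer(format(idx, "%m"))
  iday <- as.integer(format(idx, "%d"))
  x$age <- as.integer(format(idx, "%Y")) - x$year_of_birth -
    (im < bm | (im == bm & iday < bd))

  # Sex from the standard OMOP gender concepts (8507 male, 8532 female)
  x$sex <- ifelse(x$gender_concept_id == 8507, "Male",
                  ifelse(x$gender_concept_id == 8532, "Female", NA_character_))

  # Prior observation only when cohort_start_date lies inside the period
  in_obs <- idx >= x$observation_period_start_date &
    idx <= x$observation_period_end_date
  x$prior_observation <- ifelse(
    in_obs, as.numeric(idx - x$observation_period_start_date), NA
  )

  x <- x[order(x$subject_id, x$cohort_start_date), ]
  rownames(x) <- NULL
  x[, c(names(cohort), "age", "sex", "prior_observation")]
}

cohort <- data.frame(
  cohort_definition_id = c(1, 1), subject_id = c(1, 2),
  cohort_start_date = as.Date(c("2020-05-25", "2020-01-01")),
  cohort_end_date = as.Date(c("2020-05-25", "2020-01-01"))
)
person <- data.frame(
  person_id = c(1, 2), gender_concept_id = c(8507, 8532),
  year_of_birth = c(1979, 1970), month_of_birth = c(6, NA),
  day_of_birth = c(1, NA)
)
obs <- data.frame(
  person_id = c(1, 2),
  observation_period_start_date = as.Date(c("2005-07-01", "2021-01-01")),
  observation_period_end_date = as.Date(c("2030-01-01", "2030-01-01"))
)
# Subject 2 enters the cohort before their observation period starts
add_demographics_sketch(cohort, person, obs)
```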