Predicting adverse outcomes following catheter ablation treatment for atrial fibrillation

Objective: To develop prognostic survival models for predicting adverse outcomes after catheter ablation treatment for non-valvular atrial fibrillation (AF). Methods: We used a linked dataset including hospital administrative data, prescription medicine claims, emergency department presentations, and death registrations of patients in New South Wales, Australia. The cohort included patients who received catheter ablation for AF. Traditional and deep survival models were trained to predict major bleeding events and a composite of heart failure, stroke, cardiac arrest, and death. Results: Out of a total of 3285 patients in the cohort, 177 (5.4%) experienced the composite outcome (heart failure, stroke, cardiac arrest, death) and 167 (5.1%) experienced major bleeding events after catheter ablation treatment. Models predicting the composite outcome had high risk discrimination accuracy, with the best model having a concordance index > 0.79 at the evaluated time horizons. Models for predicting major bleeding events had poor risk discrimination performance, with all models having a concordance index < 0.66. The most impactful features for the models predicting higher risk were comorbidities indicative of poor health, older age, and therapies commonly used in sicker patients to treat heart failure and AF. Conclusions: Diagnosis and medication history did not contain sufficient information for precise risk prediction of experiencing major bleeding events. The models for predicting the composite outcome have the potential to enable clinicians to identify and manage high-risk patients following catheter ablation proactively. Future research is needed to validate the usefulness of these models in clinical practice.


Introduction
Atrial fibrillation (AF) is the most common cardiac arrhythmia. AF is a significant driver of cardiovascular hospitalization, and it is associated with adverse outcomes including stroke, heart failure, and mortality (1,2). Studies have shown worldwide increasing trends in AF (3,4), which makes the treatment of AF a significant target for managing patients' cardiac health and reducing cardiac-related deaths. Amongst the treatment options recommended for managing AF (5,6), catheter ablation has been increasingly used and continues to be the focus of comparative effectiveness research. Recent clinical trials have demonstrated that compared to medical therapy, catheter ablation can lead to better outcomes in selected populations (7-9).
While machine learning (ML) has been increasingly applied in studies of diagnosis, risk prediction, and management of AF, only a few studies have focused on using ML for prognostic modeling of adverse outcomes following catheter ablation treatment (10,11). AF recurrence following ablation has been the adverse outcome most commonly analyzed in prior studies, with ML models trained with laboratory and clinical parameters (12) and magnetic resonance imaging scans (13,14). One study used ML to predict non-pulmonary vein trigger origins to reduce AF recurrence after ablation (15). Other studies used ML to predict 30-day and 90-day readmission following catheter ablation (16,17). One study predicted major cardiac outcomes (major bleeding, stroke/systemic embolism, and death) with ML, but the cohort was newly diagnosed AF patients treated with vitamin K antagonists and not catheter ablation (18). Beyond ML, statistical risk scores are the models most commonly used for predicting risk of adverse outcomes for AF patients (e.g. the CHADS2 score for stroke prediction) and recurrence of AF after catheter ablation (19). The major limitation of these ML studies for predicting AF ablation outcomes is that the models predict binary outcomes (probability of experiencing the event), whereas a time-to-event modeling approach would enable the calculation of survival curves and risk estimates at various points of interest. A survival time-to-event approach also allows less biased modeling in the presence of loss to follow-up and competing events.
Given the 20-40% recurrence of AF in patients who undergo ablation and the association of AF with adverse outcomes (5,20), prognostic models could enable clinicians to develop treatment and management plans for patients, and act as a communication tool with patients who can be part of the decision-making process by knowing their risk profile. To date, there are no ML survival prognostic models for predicting the risk of experiencing major cardiac adverse outcomes after catheter ablation. This study aims to develop prognostic ML models that predict risk of experiencing adverse outcomes, including major bleeding, heart failure, stroke, cardiac arrest, and death following catheter ablation.

Materials and methods

Study cohort
The cohort included patients with a hospital episode with a primary diagnosis of AF or atrial flutter and a catheter ablation procedure in the same episode between January 2009 and December 2018 (codes in Appendix 2). The first such episode was identified as the index ablation. For each patient, we used three years of medical history before the index ablation and a maximum follow-up period of three years. We excluded patients under 18 years of age at the time of index ablation and patients with a diagnosis of valvular heart disease or mitral valve stenosis, or a mitral valve replacement procedure, before or during the index ablation episode. If a hospital stay was made up of various episodes of care, we aggregated all diagnosis and procedure codes from these episodes into a single hospital stay (see Appendix 1).
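The aggregation of episodes of care into hospital stays can be sketched as follows. This is a minimal illustration and not the study's linkage code: the `Episode` and `Stay` classes, the `ends_in_transfer_or_type_change` flag, and the example codes are hypothetical, and the sketch assumes that an episode ending in a type change or transfer is continued by the same patient's next episode.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Episode:
    admitted: date
    separated: date
    ends_in_transfer_or_type_change: bool  # hypothetical flag derived from separation type
    codes: set = field(default_factory=set)  # diagnosis and procedure codes


@dataclass
class Stay:
    admitted: date
    separated: date
    codes: set


def build_stays(episodes):
    """Merge one patient's episodes of care into contiguous hospital stays."""
    stays = []
    open_stay = None   # stay currently being extended
    continues = False  # did the previous episode end in a transfer or type change?
    for ep in sorted(episodes, key=lambda e: (e.admitted, e.separated)):
        if open_stay is not None and continues:
            # Same stay: keep the first admission date, extend the separation date,
            # and pool all codes recorded across the episodes.
            open_stay.separated = max(open_stay.separated, ep.separated)
            open_stay.codes |= ep.codes
        else:
            open_stay = Stay(ep.admitted, ep.separated, set(ep.codes))
            stays.append(open_stay)
        continues = ep.ends_in_transfer_or_type_change
    return stays


# Illustrative data only: two linked episodes followed by a separate admission.
episodes = [
    Episode(date(2015, 3, 1), date(2015, 3, 4), True, {"I48"}),
    Episode(date(2015, 3, 4), date(2015, 3, 10), False, {"I50.0"}),
    Episode(date(2016, 1, 5), date(2016, 1, 6), False, {"I10"}),
]
stays = build_stays(episodes)
```

In this example the first two episodes collapse into one stay that carries both sets of codes, while the third episode remains a separate stay.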

Outcomes
We built prognostic models to predict common adverse outcomes for AF, with the ICD-10 codes used for each outcome based on prior observational studies (Appendix 2). We used a composite outcome of death, heart failure, stroke, and cardiac arrest, due to the low number of adverse events in the cohort. Major bleeding events (including gastrointestinal bleeding, intracranial bleeding, and other bleeding) were modeled separately and not included in the composite outcome since they tend to be a result of treatment and have different causes. Outcomes were obtained from the primary diagnosis in inpatient (APDC) and emergency department (EDDC) records, the death registry, and the cause of death records.

Prognostic Machine Learning Algorithms
Survival models were built for the composite outcome and for major bleeding events. The composite outcome was modeled using single risk models, with censoring occurring at the earlier of three years of follow-up or December 2018. Major bleeding events were modeled using competing risks, with the composite outcome as the competing risk. The following survival algorithms were selected based on commonly used algorithms for prognosis and more recent deep survival models (further details in Appendix 3): Cox proportional hazards model with elastic net penalty (Cox); random survival forest (RSF); gradient boosted survival (GBT); DeepSurv, a Cox proportional hazards neural network (22); and deep survival machines (DSM), a neural network that estimates the survival function as a mixture of individual parametric survival distributions (Weibull or Lognormal) (23). For major bleeding events, DSM supports competing risks, but cause-specific modeling was used for all other algorithms. A cause-specific DSM was included for performance comparison. For the major bleeding cause-specific models, censoring was the earlier of experiencing the composite outcome, three years of follow-up, or December 2018. The CHA2DS2-VASc and HAS-BLED scores were included as baselines for the composite outcome and major bleeding events, with the risk estimates obtained from a cohort study of 182,678 AF patients (24).
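The censoring rules described above can be illustrated with a short sketch. The function name `time_to_event` and its arguments are hypothetical, and two simplifications are assumed: "December 2018" is taken as 31 December 2018, and three years of follow-up is approximated as 1095 days.

```python
from datetime import date, timedelta

STUDY_END = date(2018, 12, 31)           # assumption: end of December 2018
MAX_FOLLOW_UP = timedelta(days=3 * 365)  # assumption: three years as 1095 days


def time_to_event(index_date, event_date, competing_date=None):
    """Return (days, indicator); indicator is 1 if the event was observed, 0 if censored.

    For the single-risk composite model, leave competing_date as None.
    For a cause-specific major bleeding model, pass the date of the composite
    outcome as competing_date so that it censors follow-up.
    """
    censor_date = min(index_date + MAX_FOLLOW_UP, STUDY_END)
    if competing_date is not None:
        censor_date = min(censor_date, competing_date)
    if event_date is not None and event_date <= censor_date:
        return (event_date - index_date).days, 1
    return (censor_date - index_date).days, 0


# Composite outcome observed during follow-up.
observed = time_to_event(date(2014, 6, 1), date(2015, 1, 10))
# No event: administratively censored at the end of the study period.
censored = time_to_event(date(2017, 6, 1), None)
# Bleeding after the composite outcome: censored at the composite outcome date.
cause_specific = time_to_event(date(2014, 6, 1), date(2015, 6, 1),
                               competing_date=date(2015, 1, 10))
```

The resulting (time, indicator) pairs are the usual input format for survival estimators; a competing risks model would instead keep a separate indicator value for each event type.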

Feature Extraction
All available diagnosis codes from inpatient episodes and all pharmaceutical data during the three-year lookback period and during the index ablation episode were used as features for the ML models. The features were coded in binary, with 1 indicating having experienced an event. Rare ICD-10-AM codes and medications associated with fewer than ten patients were excluded. Diagnosis codes from emergency department visits were not used as features because the diagnosis coding in emergency administrative data contained multiple coding schemes. Sex and age at the time of the index ablation episode were included as features.
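A minimal sketch of this binary encoding, including the removal of rare codes, is shown below. The function `build_feature_matrix`, the `history` mapping, and the example patients are hypothetical and serve only to illustrate the rule; sex and age would be appended as additional columns.

```python
from collections import Counter

MIN_PATIENTS = 10  # codes recorded for fewer than ten patients are dropped


def build_feature_matrix(history, min_patients=MIN_PATIENTS):
    """Encode each patient's codes as a 0/1 vector over the retained codes.

    history maps a patient id to the set of diagnosis and medication codes
    recorded in the lookback period or the index ablation episode.
    """
    # Count how many distinct patients carry each code.
    counts = Counter(code for codes in history.values() for code in set(codes))
    kept = sorted(code for code, n in counts.items() if n >= min_patients)
    matrix = {
        pid: [1 if code in codes else 0 for code in kept]
        for pid, codes in history.items()
    }
    return kept, matrix


# Illustrative data: "I10" is recorded for 12 patients, "E11" for only 3.
history = {f"p{i}": {"I10"} | ({"E11"} if i < 3 else set()) for i in range(12)}
history["p12"] = set()  # a patient with no recorded codes
kept, matrix = build_feature_matrix(history)
```

Here only "I10" survives the rarity filter, so every patient is represented by a single binary feature.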

Evaluation
The prognostic models were evaluated using single-time-point metrics at the event horizon quantiles of 25%, 50%, and 75% for risk discrimination (using the time-dependent concordance index) and calibration (using the expected ℓ1 calibration error [ECE]). ECE measures the absolute difference between the observed and expected event rates, conditional on the estimated risk scores (25), with the expected event rates estimated using a Kaplan-Meier (KM) curve. These single-time-point metrics were adjusted with an inverse propensity of censoring estimate. Calibration over the entire distribution of the prognostic models was assessed with distributional calibration (D-calibration), with the D-calibration test indicating that the survival curve generated by the model for patients is not calibrated if P-value < 0.05 (26). KM curves for the composite outcome and major bleeding events were generated to estimate the cohort average cumulative risk of experiencing these events.
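For readers unfamiliar with the KM estimator referred to above, the following sketch computes the product-limit survival estimate from (time, indicator) pairs. It is a plain illustration with made-up data, not the implementation used in the study, and it omits the inverse propensity of censoring adjustment applied to the single-time-point metrics.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Illustrative data: five patients, three observed events.
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
# Cumulative risk at each event time is one minus the survival estimate.
cumulative_risk = [(t, 1.0 - s) for t, s in curve]
```

With these data the survival estimate steps down to 0.8, 0.6, and 0.3 at times 2, 3, and 5, up to floating-point rounding.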
For each model, we performed 10-fold cross-validation, reporting the mean and standard deviation across the folds. The best model for each fold was selected by performing hyperparameter tuning on each fold (see Appendix 3). Post-hoc explanations of the best performing models were obtained with SHapley Additive exPlanations (SHAP), using the Kernel SHAP method for the deep survival models and the Tree SHAP method for the survival tree ensemble models (27,28), and visualized with summary plots. In this paper, SHAP explains the risk prediction of a patient by calculating the contribution of each feature to the risk prediction. SHAP summary plots show the magnitude and direction of feature attributions to the risk prediction, with the SHAP values from each validation fold aggregated to generate the summary plots.

Patient characteristics
A total of 3285 patients had a hospitalization with AF as the primary diagnosis and an ablation procedure performed during that episode of care. The median age was 63 years (interquartile range, 56-70 years) and 1110 (33.8%) were female (Table 1). All patients were followed for a median of three years; patients who experienced the composite outcome (heart failure, stroke, cardiac arrest, and death) had a median follow-up of 10 months, and patients who experienced major bleeding events had a median follow-up of 12 months. The incidence of adverse events was low for all outcomes: composite (177, 5.4%), major bleeding (167, 5.1%), heart failure (103, 3.1%), stroke (18, 0.5%), cardiac arrest (0.2%), and death (75, 2.2%). The highest rate of events for both the composite outcome and major bleeding events occurred within the first five months of follow-up (KM plots in Appendix 4). At baseline, patients who experienced the composite outcome had higher CHA2DS2-VASc scores and higher rates of medical history of heart failure, hypertension, diabetes, and vascular diseases (Table 1). Patients who experienced major bleeding events had higher CHA2DS2-VASc scores and higher rates of medical history of hypertension (Table 1). Amongst those who died during follow-up, half had a history of hypertension.

Prognostic Models for Composite Outcome and Major Bleeding Events
Table 2 shows the prognostic model performance for predicting the composite outcome. GBT achieved the highest concordance index on the 25% and 75% quantile of event time horizons. The Cox and DeepSurv models were competitive at the 50% quantile of event time horizons. DeepSurv had the lowest expected calibration error on all three quantiles of event time horizons. For the composite outcome, the concordance index of the Cox model and the survival ensembles (GBT and RSF) was similar, with GBT having slightly better performance on the 50% and 75% quantiles of event times. The calibration scores suggest that the estimated risks of the adverse outcomes are consistent with the cumulative probabilities of the KM curve. All models except for DSM were D-calibrated. The low number of events made visual assessment of calibration unreliable, but calibration plots of GBT at the event quantiles show that the model increasingly underestimates the risk of experiencing the event at the longer event horizons (Appendix 5). The CHA2DS2-VASc score had better risk discrimination performance at the 25% quantile of event times, but the ML survival models provided better discrimination performance on longer horizons (the 50% and 75% quantiles of event times). The ML models also had better calibration performance across all three quantiles of event times.
Table 3 shows the prognostic model performance for predicting major bleeding events. The risk discrimination performance of all models was low, with only DSM achieving a concordance index greater than 0.60 on all three quantiles of event times. DSM was the only model that was not D-calibrated. The cause-specific DSM model performance was on par with the other cause-specific models, which was lower than the competing risk DSM. The HAS-BLED score also had poor discrimination (concordance index < 0.60) and calibration performance.

Explainability
For the composite outcome (Figure 1), older age and a medical history of heart failure (congestive heart failure and left ventricular failure), fluid overload, disorders of magnesium metabolism, atherosclerotic heart disease, pneumonia, long term (current) use of anticoagulants, primary hypertension, presence of a cardiac device, and chronic kidney disease contributed to the model predicting higher risk. A medication history of furosemide (high-ceiling diuretic), spironolactone (aldosterone antagonist), cefalexin (antibacterial), amiodarone (antiarrhythmic), metformin (blood glucose lowering), warfarin (antithrombotic), bisoprolol (beta blocking agent), and allopurinol (antigout preparation) contributed to the model predicting higher risk. A medication history of flecainide (antiarrhythmic) contributed to lower risk.
Figure 2 shows the post-hoc explanation of the DSM model for predicting major bleeding events. Age had the greatest average impact on the model predictions, with older age contributing to a higher risk prediction. A medical history of tobacco use (past and present) and primary hypertension contributed to higher risk. A diagnosis code for injury, poisoning or other adverse effect with place of occurrence recorded as a health service area also contributed to the model predicting higher risk. A medication history of selected antibacterials and antibiotics (amoxicillin and beta-lactamase inhibitor, cefalexin, chloramphenicol, amoxicillin), warfarin (antithrombotic), atorvastatin (lipid modifying agent), pantoprazole (proton pump inhibitor), allopurinol (antigout preparation), amiodarone (antiarrhythmic), and prednisone (corticosteroid) contributed to higher risk. A medical history of paroxysmal AF, chronic hypertension, unspecified AF and atrial flutter, and a medication history of flecainide (antiarrhythmic) and class III antiarrhythmics contributed to predicting lower risk.

Discussion
This study evaluated survival prognostic ML models for predicting adverse outcomes in patients following catheter ablation treatment. The main findings of this study are: (1) patients in our cohort who underwent catheter ablation experienced adverse outcomes at low rates (5.7% for the composite of death, heart failure, stroke, and cardiac arrest, and 5.3% for major bleeding events); (2) the composite outcome could be predicted with high precision, but prediction of major bleeding events was poor; (3) the most important features for predicting higher risk of experiencing the composite and the major bleeding outcomes were indicative of sicker and older patients.
Our study is the first to use ML-based survival models to predict adverse outcomes in patients following catheter ablation. Prior studies have focused on using risk scores that use a small number of variables (six or fewer) for predicting AF recurrence in patients following catheter ablation (19).
Other studies have explored ML for predicting adverse events after catheter ablation, including AF recurrence (12-14) and 30-day and 90-day readmission (16,17). Our study improves on prior literature by predicting the risk of major cardiovascular events post catheter ablation, which are the primary endpoints that have been assessed when determining the effectiveness of catheter ablation over pharmacotherapy (8,9,29).

Meaning of study
The strong performance results for predicting the composite outcome highlight its potential clinical utility. The competitive performance of both complex models (gradient boosted survival) and interpretable models (elastic net Cox) demonstrates that various deployment options are available to clinical teams, depending on the prioritization of transparency, maximal performance gains, or calibration. The poor performance of models predicting major bleeding events suggests that: (1) the composite outcome is easier to model than major bleeding events; (2) diagnosis history and prescription claims prior to the catheter ablation treatment do not contain sufficient information to predict future major bleeding events; (3) future research should explore additional clinical and biological variables for predicting major bleeding events.
Older age was the factor that had the highest impact on predicting risk of experiencing the composite outcome and major bleeding events. A medication history of flecainide contributed to predicting a lower risk of experiencing the composite outcome and major bleeding events, which may be due to flecainide being selectively used in lower risk patients (as recommended in clinical guidelines for the management of AF (5)). Known cardiovascular risk factors and markers of underlying disease contributed to predicting higher risk, including hypertension, heart failure, heart disease, presence of a cardiac device, and tobacco use. This is consistent with factors identified in a prior study that used deep learning to predict the risk of experiencing CVD events (30). Comorbidities indicative of poor health (pneumonia, chronic kidney disease) and therapies used in sicker patients to treat heart failure and AF (antiarrhythmics, antithrombotics, diuretics, and beta blocking agents) also contributed to higher risk predictions of adverse outcomes. A health service area coded as the place of occurrence for an external cause code is indicative of healthcare-related adverse events and complications, including medication errors, which may have occurred prior to or during a hospital stay. The best model for the composite outcome had more comorbidities in the top 20 features, whereas the best model for major bleeding had more antibacterials and antibiotics, suggesting complications and sicker patients. Paroxysmal AF is associated with better prognosis than permanent AF, reflected by a diagnosis of paroxysmal AF contributing to lower risk prediction.
The models developed in this study provide an example of prognostic modeling that could be incorporated in clinical practice following the administration of catheter ablation. Identifying patients at high risk may lead to proactive management by clinicians.

Machine Learning Implications
For the composite outcome, the deep survival models did not outperform the traditional survival algorithms. This may be due to the censoring rate (> 94%), the low number of events, and the tabular dataset (where deep learning is not guaranteed to outperform other models). For major bleeding events, the difference in performance between the cause-specific DSM and the competing risk DSM suggests that the representation learning layer for the competing risks captures additional information that leads to significant performance gains (23).
Caution must be taken with the interpretation of the summary plots and the impact of the features on risk predictions. The SHAP explanations capture correlations between the input features and the risk prediction, but they are not causal. While explanation of the models focused on the top 20 features, other features not listed also impacted the risk predictions. Explanations with SHAP are particularly useful when paired with models that use non-linearities and capture complex interactions in the input features, such as the survival ensembles and deep survival models.

Limitations
The high degree of censoring and low number of events in our cohort may limit how well our results generalize. This study did not include data regarding family history, outpatient medical history, or social and lifestyle factors (except in cases where tobacco use was recorded in a hospital admission). Our analysis relied on coded hospital diagnoses, recorded only for conditions that significantly affect patient management during an episode of care. Potential sources of bias in our analysis include using diagnoses as a proxy for incident or prevalent disease and the quality of hospital diagnosis coding (accuracy 51-98% across 32 studies) (31). Given the high rate of history of heart failure amongst patients who experienced the composite outcome (34.5%) and history of heart failure being an important feature for predicting the composite outcome, a separate study on a heart failure cohort would be desirable, but it is left for future work due to the small size of our current dataset.

Conclusion
ML survival models using diagnostic and medication history predicted the risk of experiencing a composite adverse outcome (death, heart failure, stroke, and cardiac arrest) with high precision following catheter ablation for AF. The same data and ML survival models could not predict major bleeding events. The models presented may be useful in proactively managing high-risk patients following catheter ablation. Future research is needed to validate the usefulness of these models in clinical practice.

Appendix 1
Admitted patient records relate to episodes of care, defined as the period of admitted patient care between a formal or statistical admission and a formal or statistical separation, characterised by only one care type. A "statistical" admission or separation records the commencement or cessation of an episode of care, which may occur when there is a change in the type of care provided to a patient (e.g. from acute to subacute care). A "formal" admission or separation records the commencement or cessation of a patient's treatment and/or care and/or accommodation.
A hospital stay, defined as the period of admitted patient care between a formal admission and a formal discharge, may comprise one or more episodes of care. For episodes of care ending in type change (e.g. from acute to subacute care) or transfer, we constructed contiguous periods of stay using admission dates and admission status from the first episode and separation dates and separation type from the last episode of care. Therefore, hospital stays as defined in this report include continuous periods of inpatient care that were provided by one or more hospitals.

For HAS-BLED and CHA2DS2-VASc ICD-10 codes and ATC codes see (3).

Appendix 3
All models were implemented in Python. Hyperparameter optimization was implemented with Optuna with 20 trials using three-fold cross-validation on the training set. Cox proportional hazards model with elastic net penalty (1), random survival forest (3), and gradient boosted survival (4) used the implementation from the scikit-survival library (2). The neural network survival models DeepSurv (5) and deep survival machines (DSM) (7) used the implementation from the auton-survival library (6). The neural networks were trained for 10 epochs, with the number of epochs chosen by experimentation on a subset of the data.

Figure 1. SHAP summary plot for the best performing model (gradient boosting survival) for

Figure 2. SHAP summary plot for the best performing model (deep survival machine) for

Figure 4 - Kaplan-Meier survival curve showing the cumulative probability of experiencing major bleeding events after catheter ablation treatment for atrial fibrillation. The highest rate of events for the composite outcome occurred within the first five months of follow-up.

Figure 6 - Survival calibration curve at the 50% quantile of the event times for the gradient boosted survival model predicting the composite outcome of death, heart failure, cardiac arrest, and stroke.

Figure 7 - Survival calibration curve at the 75% quantile of the event times for the gradient boosted survival model predicting the composite outcome of death, heart failure, cardiac arrest, and stroke.
This study used linked administrative inpatient and emergency department data, pharmaceutical claims data, and mortality data for patients from New South Wales (NSW), Australia. Data on AF hospitalizations were extracted from the NSW Admitted Patient Data Collection (APDC), which includes records of all inpatient separations (discharges, transfers, and deaths) from public and private hospitals in NSW (see Appendix 1 for details). Data on emergency visits were extracted from the NSW Emergency Department Data Collection (EDDC), which includes records of presentations to public hospital emergency departments (EDs) in NSW. Pharmaceutical dispensing data were obtained from Pharmaceutical Benefits Scheme (PBS) records, which contain claims for subsidized prescription medicines in Australia dispensed in community pharmacies and private hospitals. Prescription medicines dispensed in NSW public hospitals are not included in PBS data (21). Death records were extracted from the NSW Registry of Births, Deaths, and Marriages registration file (RBDM) and the Cause of Death Unit Record File (COD). Data linkage was performed by the NSW Ministry of Health Centre for Health Record Linkage and the Australian Institute of Health and Welfare (AIHW) Data Integration Services Centre. This study was granted ethical approval by the University of New South Wales, NSW Population and Health Services Research (HREC/18/CIPHS/56), Aboriginal Health and Medical Research Council of NSW (1503/19), and Australian Institute of Health and Welfare (EO2018/2/431) research ethics committees. AIHW privacy regulations require cell sizes of five or less to be suppressed due to risk of re-identification of individuals in the study.

Table 1 -
Summary of baseline characteristics of patients at time of index catheter ablation for atrial fibrillation (AF) admission to a public or private hospital in New South Wales, Australia. † Death, heart failure, cardiac arrest, and stroke. **Vascular diseases: coronary artery disease, peripheral artery disease, atherosclerosis, and myocardial infarction. Values are given as median and IQR, or total number (n) and %. †P-value < 0.05. P-values for continuous variables are the results of the Mann-Whitney U test on patients who experienced the event and those who did not experience the event. For categorical variables, Fisher's Exact test was used to compare frequencies.

Table 2 -
Concordance index and expected calibration error results for the machine learning survival models that predict the composite outcome of death, heart failure, cardiac arrest, and stroke. Standard deviation from the 10-fold cross-validation shown in parentheses.

Table 3 -
Concordance index and expected calibration error results for the machine learning survival models that predict major bleeding events. Standard deviation from the 10-fold cross-validation shown in parentheses.

Table 4 -
Hyperparameter values for Cox proportional hazards with elastic net penalty.

Table 5 -
Hyperparameter values for random survival forest.

Table 8 -
Hyperparameter values for deep survival machines.