19th International Summer School on Biocomplexity from Gene to System

Travel Info:

Sponsors:

Past Editions:

Student Presentations

Monica Isgut	Highly elevated polygenic risk scores are better predictors of myocardial infarction risk early in life than later Background: Several polygenic risk scores (PRS) have been developed for cardiovascular risk prediction, but the added value of including PRS together with conventional risk factors is questionable. This study assesses the clinical utility of including four PRS generated from 194, 46K, 1.5M, and 6M SNPs, along with conventional risk factors, to predict risk of ischemic heart disease (IHD), myocardial infarction (MI), and first MI event on or before age 50 (early MI). Methods: A cross-validated logistic regression (LR) algorithm was trained either on ~440K European ancestry individuals from the UK Biobank (UKB) or on the full UKB population, including as features different combinations of conventional risk factors established at birth (ancestry, sex) and risk factors that change over an individual’s lifespan (age, BMI, hypertension, hyperlipidemia, diabetes, smoking, family history), with and without PRS. The algorithm was trained separately with IHD, MI, and early MI as prediction labels. Results: When LR was trained using only risk factors established at birth, adding the four PRS significantly improved the area under the curve (AUC) for IHD (0.62 to 0.67) and MI (0.67 to 0.73), as well as for early MI (0.70 to 0.79). When LR was trained using all risk factors, adding the four PRS resulted in a significantly higher disease prevalence only in the 98th and 99th percentiles of both the IHD and MI scores. Conclusions: PRS improve cardiovascular risk stratification early in life when knowledge of later-life risk factors is unavailable. However, by middle age, when many risk factors are known, the improvement attributed to PRS is marginal for the general population.
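
The AUC values reported above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that computation follows; the function name `auc` and the toy scores are illustrative assumptions, not data or code from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up predicted risks for 3 cases (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 0, 0, 0, 0]
risk_without_prs = [0.60, 0.40, 0.30, 0.50, 0.35, 0.20, 0.10]
risk_with_prs = [0.70, 0.50, 0.30, 0.45, 0.35, 0.20, 0.10]

auc_without = auc(risk_without_prs, labels)  # 9 of 12 pairs ordered correctly
auc_with = auc(risk_with_prs, labels)        # 10 of 12 pairs ordered correctly
```

Comparing `auc_with` against `auc_without` on held-out data mirrors the with/without-PRS comparison described in the abstract, although the study itself used cross-validated logistic regression to produce the risk scores.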

Xiaoxi Wei	Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods can neither generalise well across human subjects' data nor learn from different heterogeneously collected data sets, which limits the scale of training data. On the other hand, developments in transfer learning would benefit significantly from a real-world benchmark with immediate practical application. Therefore, we pick electroencephalography (EEG) as an exemplar for what makes biosignal machine learning hard.
In the BEETL competition, part of the NeurIPS 2021 Competition Track, we designed two transfer learning challenges around diagnostics and Brain-Computer Interfacing (BCI) that have to be solved in the face of low signal-to-noise ratios, major variability among subjects, differences in the data recording sessions and techniques, and even differences between the specific BCI tasks recorded in the dataset. Task 1 is centred on the field of medical diagnostics, addressing automatic sleep stage annotation across subjects. Task 2 is centred on BCI, addressing motor imagery decoding across both subjects and data sets.
The BEETL competition, with over 30 competing teams and 3 winning entries, brought attention to the potential of deep transfer learning and of combinations of set theory and conventional machine learning techniques to overcome these challenges. In this presentation, we will show the key observations and lessons learnt from the competition.

Dimitrios Zaridis	Fine-tuned feature selection to improve prostate segmentation via a fully connected meta-learner architecture Precise delineation of the prostate gland on MRI is the cornerstone for accurate prostate cancer diagnosis, detection, characterization and treatment. The present work proposes a meta-learner deep learning (DL) network that combines the complexity of 3 well-established DL models and fine-tunes them in order to improve the segmentation of the prostate compared to the base learners. The backbone of the meta-learner consists of the original U-net, Dense2U-net and Bridged U-net models. A model with four convolutions of different receptive fields was added on top of the three base networks. The meta-learner outperformed the base learners in 4 out of 5 performance metrics. The median Dice score for the meta-learner was 89%, while for the second-best model it was 83%. Except for Hausdorff distance, where the meta-learner and Dense2U-net performed equally well, the improvement achieved in terms of average sensitivity, balanced accuracy, Dice score and Rand error, compared to the best-performing base learner, was 6%, 3%, 5% and 4%, respectively.
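
For readers unfamiliar with the Dice score quoted above, the following minimal sketch shows the standard definition, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name and the toy masks are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
def dice_score(pred_mask, truth_mask):
    """Dice coefficient between two equally sized flat binary masks."""
    if len(pred_mask) != len(truth_mask):
        raise ValueError("masks must have the same number of pixels")
    pred_fg = {i for i, v in enumerate(pred_mask) if v}
    truth_fg = {i for i, v in enumerate(truth_mask) if v}
    if not pred_fg and not truth_fg:
        return 1.0  # convention: two empty masks agree perfectly
    overlap = len(pred_fg & truth_fg)
    return 2.0 * overlap / (len(pred_fg) + len(truth_fg))


# Toy 1-D "images": 3 predicted pixels, 3 true pixels, 2 in common.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
toy_dice = dice_score(predicted, reference)  # 2 * 2 / (3 + 3)
```

A 2-D or 3-D segmentation can be scored the same way after flattening the mask to one list of pixels or voxels.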

Melina Tourni	Automated Electromechanical Wave Imaging at Reduced Frame Rates during Sinus Rhythm Using Machine Learning Background, Motivation and Objective: Spatial resolution is prioritized over temporal resolution in conventional echocardiography for optimal qualitative diagnosis. Yet, this restricts the use of time-shift-based techniques that operate at higher frame rates (FR), such as Electromechanical Wave Imaging (EWI). EWI is an ultrasound-based modality that non-invasively maps the cardiac activation sequence. At lower FRs, EWI strain curves have a different profile, which constrains manual input and interpretation. In this study, we investigate how machine learning can assist in accurate activation time estimation from low-FR data. Methods: A diverging wave sequence (Verasonics Vantage 256, fc = 2.5 MHz, PRF = 2000 Hz) was used to image N = 6 patients in sinus rhythm. Standard manual EWI processing was performed on the four apical views (Melki et al. STM 2020) and the corresponding isochrones were used as baseline. The RF data were then decimated to produce EWI strain curves at 500 Hz, 250 Hz and 125 Hz. Resulting features were automatically collected for 2500 isochrone pixels per view and fed to a Random Forest Classifier (RFC) (Melki et al. TMI 2021) for automated isochrone generation. We selected the 20-40% of pixels with the highest prediction probability for each automated isochrone. The absolute activation time difference (ATD) between the manual baseline and the automated isochrones was computed in each view and averaged across all patients. Results/Discussion: The RFC successfully identified the normal early activated basal septum in all lower-FR cases. For the case (M, 17 yo) shown in Fig. 1, the isochrone at 500 Hz (b) was in best qualitative agreement with the baseline isochrone (a) (78% ATD ≤ 20 ms). The 250 Hz (c) and 125 Hz (d) maps also show good agreement (76%, 56% ATD ≤ 20 ms), differing mainly in the LV anterior and anterolateral walls, respectively.
These findings indicate that an efficient ML approach can be used to generate accurate EWI activation maps with a more flexible FR range that includes clinical systems at low FRs, where manual processing would otherwise be impractical and/or inaccurate.
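
The agreement figures quoted above (for example, 78% ATD ≤ 20 ms) can be understood as the share of pixels whose automated activation time lies within 20 ms of the manual baseline. The sketch below shows that computation under this reading; the function name, the per-pixel list layout, and the toy activation times are assumptions made for illustration.

```python
def fraction_within_tolerance(manual_ms, automated_ms, tol_ms=20):
    """Share of pixels whose absolute activation time difference is <= tol_ms."""
    if len(manual_ms) != len(automated_ms):
        raise ValueError("both isochrones must cover the same pixels")
    if not manual_ms:
        raise ValueError("at least one pixel is required")
    diffs = [abs(m - a) for m, a in zip(manual_ms, automated_ms)]
    return sum(1 for d in diffs if d <= tol_ms) / len(diffs)


# Made-up activation times (ms) for four pixels of one view.
manual = [10, 50, 90, 120]
automated = [15, 75, 95, 100]
agreement = fraction_within_tolerance(manual, automated)  # diffs: 5, 25, 5, 20
```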

Haoyang Mi	According to the US Centers for Disease Control and Prevention, cancer is the second leading cause of death. Advances in cancer treatments, such as immune-checkpoint blockade therapy, have been successful in improving the survival of cancer patients. However, response rates are generally low, which hampers the expansion of these benefits. It is evident that intra-tumoral heterogeneity is a hallmark of therapy resistance. Such heterogeneity is modulated by the complex and dynamic ecosystem of various cell populations, molecules, metabolites, etc., namely the tumor microenvironment (TME). To study the underlying cancer biology, my research applies advanced imaging technologies, such as imaging mass cytometry, and computational approaches, including spatial statistics and artificial intelligence, to decode the complexity of the TME into components and analyze their spatial correlations, with the endpoint of predictive and prognostic biomarker discovery. I started by using pathological images with 5 immunohistochemistry (IHC) labels (CD3, CD4, CD8, CD20, and FoxP3) from 5 treatment-naïve patients with triple-negative breast cancer (TNBC). Using spatial statistics such as a hierarchical clustering algorithm, I found a clustering pattern of immune cells in the invasive front, which contributes to the characterization of the immune landscape of TNBC. My next projects expanded the label panel from 5 to 12 and the cohort size from 5 to 66. This increase in data size enabled me to apply machine learning techniques to extract subtle spatial features for the establishment of predictive biomarkers for the stratification of patients with muscle-invasive bladder cancer treated with neoadjuvant chemotherapy. Next, we further advanced our imaging technique from single-plex to multiplex. Using imaging mass cytometry, we were able to stain 27 proteins simultaneously on a single slide.
This advance enabled me to study multiple cell populations and their interplay at an unprecedented resolution. The method ushers in a new era of studying the tumor microenvironment as an ecosystem. With this technology, we analyzed 12 patients treated with a combination therapy of nivolumab and cabozantinib against hepatocellular carcinoma and found a potential biomarker to predict response. Similarly, we used multiplex IHC to study the tumor microenvironment for a cohort of 45 patients with pancreatic ductal adenocarcinoma and established a potential prognostic biomarker to predict survival. In conclusion, my research bridges the gap between cancer biology and artificial intelligence using computational approaches and advanced imaging techniques. As a biomedical engineer, my research during my Ph.D. has advanced this domain by establishing pipelines and a framework that may have significant impact on the field of immuno-oncology and computational systems biology. The framework could also facilitate the translation of precision medicine and further benefit the design and optimization of cancer immunotherapy.

Andre Cakici	Decoding Hand Digit Dynamics with a Motor Unit Model The neural drive to the skeletal muscles, which are responsible for different degrees of freedom in the extremities, is provided either by the motor cortex in the brain, propagated down the spine and efferent neurons, or by afferent feedback. The motor unit transforms this control signal into force generation. The study of motor unit discharge patterns reveals direct information about this signal. In order to investigate the biphasic activity of hand digits, a combination of a blind source separation algorithm (convolutional kernel compensation) and a factorization method (non-negative matrix factorization) was used. During the experiments, 320 channels of surface electromyography and video were recorded, which were synchronized in post-processing. Healthy human participants performed a number of tasks such as single-digit flexion and extension, two- and three-digit pinching, and grasping. Non-negative matrix factorization results showed that although there was a group of motor units that were correlated with neither flexion nor extension, most variance in the population was explained by these two components. This work contributes to efforts towards building models for peripheral neural interfaces that perform precise execution of the intended movement of their user.

Elisavet Kapetanou	Utilizing bioengineering and synthetic biology tools for developing a POC biosensor for saliva cancer screening Synthetic biology, which has been advancing rapidly, is the systematic and rational engineering of biological systems, aiming to impart genetically engineered biological devices with novel functionalities or aspiring to instill life-like behaviors in artificial entities from the bottom up (1). The iGEM competition has acted as a catalyst for the infusion of synthetic biology with interdisciplinary fundamental and translational research, as several disciplines converge to develop synthetic biological systems aimed at tackling global issues through engineering-based approaches, towards a sustainable future. In the context of participating in the iGEM competition, our project proposed a versatile and modular point-of-care (POC) device for cancer screening in saliva. Utilizing cutting-edge synthetic biology tools and a two-step detection assay that can even be applied at home without the need for bulky equipment and trained medical professionals, we proposed a solution for a non-invasive, early colorectal cancer diagnosis platform. Following the example of the SHERLOCK and DETECTR protocols, the two-step assay involves an isothermal amplification technique, recombinase polymerase amplification (RPA), followed by a highly specific target identification step that exploits the collateral cleavage properties of the CRISPR-Cas12a enzyme. The assay is inherently modular and can be adapted to multiple biomarkers as well as different types of biomarkers. The readout method is highly flexible and can be designed to be fluorescent, electrochemical, or colorimetric using a lateral flow strip.
Besides the scientific and research aspects of our project, our team also focused heavily on entrepreneurship and dissemination, organizing scientific conferences and workshops for the public, participating in many entrepreneurship competitions, joining an incubator, and receiving entrepreneurial training. We also conducted a youth exchange training program titled “Communicating science to non-scientists through non-formal forms of communication” with EUR 20,000 in funding received from the European Union.

Artur Wysoczanski	Total lung volume inference on cardiac computed tomography using deep learning: the MESA Lung and SPIROMICS Studies Total lung volume (TLV) is a physiologic parameter strongly associated with pulmonary disease risk, disease severity, and respiratory mortality. However, compared to standard pulmonary functional measures acquired by spirometry, lung volume estimation by plethysmography, tracer gas dilution, or thoracic computed tomography (CT) is far more technically burdensome and less frequently performed. Cardiac CT, meanwhile, allows for robust quantitative study of lung morphology and is significantly more abundant, but includes only about one-half to two-thirds of the lung fields, making it an imperfect predictor of TLV. To address this need, we design and validate a multi-view convolutional neural network (CNN) model to regress TLV directly from cardiac CT, using the imaging data available in Exam 5 of the Multi-Ethnic Study of Atherosclerosis (MESA). We have trained custom CNN feature extractors on 2-D projections of cardiac CT volumes into the axial, coronal and sagittal planes, taking as input the in-plane projection of the lung segmentation, and the maximum- and mean-intensity projection of the CT image. After independent optimization, the latent feature vectors obtained in each imaging plane are combined and passed to a fully-connected network for final TLV prediction. We demonstrate that our model predicts TLV from MESA Exam 5 cardiac imaging with an error (mean/SD) of -0.009 +/- 0.519 L compared to contemporaneous full-lung imaging, comparable to the reproducibility of TLV measured by repeated gold-standard full-lung CT. Our results suggest that a deep-learned TLV estimate is a suitable proxy for direct TLV measurement where the latter is unavailable in retrospective analysis, or where it is unfeasible in large prospective studies.
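
An error reported as mean/SD, as above, summarizes the signed differences between predicted and reference volumes: the mean captures systematic bias and the standard deviation captures spread. The sketch below illustrates this with invented volumes; the function name and the choice of the sample (n - 1) standard deviation are assumptions, since the abstract does not state which estimator was used.

```python
from statistics import mean, stdev


def error_summary(predicted_l, reference_l):
    """Return (mean, sample SD) of signed errors, predicted minus reference."""
    if len(predicted_l) != len(reference_l):
        raise ValueError("inputs must be paired measurements")
    if len(predicted_l) < 2:
        raise ValueError("need at least two pairs for a standard deviation")
    errors = [p - r for p, r in zip(predicted_l, reference_l)]
    return mean(errors), stdev(errors)


# Invented lung volumes in litres for four participants.
predicted = [5.0, 4.5, 6.0, 5.5]
reference = [5.5, 4.0, 6.0, 5.0]
bias, spread = error_summary(predicted, reference)
```

A mean close to zero with a non-trivial SD, as in the abstract, indicates little systematic over- or under-estimation even though individual predictions still vary.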

Eleftherios Trivizakis	Radiotranscriptomics of Non-Small Cell Lung Cancer Radiogenomic and radiotranscriptomic studies have the potential to pave the way for a holistic decision support system built on genomics, transcriptomics, radiomics, deep features and clinical parameters to assess treatment evaluation and care planning. The integration of invasive and routine imaging data into a common feature space has the potential to yield robust models for inferring the drivers of underlying biological mechanisms. Therefore, a multi-omics representation comprised of deep features and transcriptomics was evaluated to further explore the synergetic and complementary properties of these diverse multi-view data sources by utilizing data-driven machine learning models. The proposed deep radiotranscriptomic analysis is a feature-based fusion that significantly enhances sensitivity by up to 0.174 and AUC by up to 0.22, compared to the baseline single source models, across all experiments on the unseen testing set. Additionally, a radiomics-based fusion was also explored as an alternative methodology yielding radiomic signatures that are comparable to several previous publications in the field of radiogenomics. The clinical impact of such high-performing models can add prognostic value and lead to optimal treatment assessments by targeting specific oncogenes.

Antony Gitau	NeeFlex: A wearable device for measuring knee flexion angles in rehabilitating patients The knee bears 80% of the weight of a human body when standing still. This makes it prone to many healthcare conditions. Osteoarthritis is a common condition characterized by knee pain. According to Johns Hopkins Medicine, major causes of knee pain include obesity, injury, aging, and repeated strain on the joint and ligament protecting the end of the tibia and femur. Treatment varies depending on the cause of the knee pain. However, surgery is a common treatment method. Total knee replacement (TKR) is an effective procedure done on osteoarthritis patients. However, TKR patients suffer from knee stiffness and reduced knee flexion of less than 90°, so they need physiotherapy sessions. Restricted postoperative knee flexion is the most frequent complication after TKR procedures and is also the main cause of patient dissatisfaction. Currently, surgeons use goniometers to obtain the postoperative knee range of motion (ROM) during routine ward inspections or clinic visits for outpatients. This approach is slow, highly prone to errors, and considered invasive, as it involves too much contact with the patient. In this project, we developed a wearable knee device using an Arduino Nano 33 BLE Sense and a flex sensor. The device was able to instantly and simultaneously record flexion angles, transmit them to a computer, and display them on a dashboard. The project used Bluetooth Low Energy technology to connect the Arduino (hardware) and the computer (software) for data transfer. The device was designed for rehabilitation purposes. The goal is to allow physiotherapists and/or orthopedic surgeons to use it on their patients to track knee flexion angles digitally.
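
One simple way a raw flex-sensor reading could be turned into a flexion angle is a two-point linear calibration, sketched below. This is an illustration only: the abstract does not describe the device's calibration, real flex sensors are not perfectly linear, and the function name and ADC values are hypothetical.

```python
def make_angle_mapper(adc_straight, adc_bent, bent_angle_deg=90.0):
    """Build a reading-to-angle function from two calibration points.

    adc_straight: hypothetical raw reading with the knee fully extended (0 deg).
    adc_bent: hypothetical raw reading at the known angle bent_angle_deg.
    """
    if adc_straight == adc_bent:
        raise ValueError("calibration readings must differ")
    span = adc_bent - adc_straight

    def to_angle(adc_value):
        # Linear interpolation (and extrapolation) between the two points.
        return (adc_value - adc_straight) / span * bent_angle_deg

    return to_angle


# Hypothetical calibration: 300 counts when straight, 600 counts at 90 degrees.
to_angle = make_angle_mapper(300, 600)
mid_angle = to_angle(450)  # halfway between the calibration readings
```

On the host side, each value received over Bluetooth Low Energy could be passed through `to_angle` before being plotted on the dashboard.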

Konstantinos Polyzos	Weighted Ensembles for Active Learning with Adaptivity Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, and computer vision. To efficiently train machine learning models under such high labeling costs or restrictive privacy concerns, active learning (AL) aims at judiciously selecting the most informative data instances to label on-the-fly. This active sampling process can benefit from a statistical function model that is typically captured by a Gaussian process (GP). Belonging to Bayesian approaches, GPs are capable of learning nonlinear functions with uncertainty quantification in a sample-efficient manner, which is of particular interest in various applications including biomedical ones. While most GP-based AL approaches rely on a single kernel function, the present work advocates an ensemble of GP models with weights adapted to the labeled data collected incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based on the uncertainty and disagreement rules. An adaptively weighted ensemble of EGP-based acquisition functions is also introduced to further robustify performance. Extensive tests on synthetic and real datasets, including biomedical ones, showcase the merits of the proposed EGP-based approaches with respect to the single GP-based AL alternatives.
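
The idea of adapting ensemble weights to incrementally collected labels can be sketched with a Bayesian update: each model's weight is multiplied by the likelihood it assigned to the newly observed label, and the weights are then renormalized. The code below is a minimal illustration under the assumption that every model issues a Gaussian predictive distribution; the function names and numbers are hypothetical, and this is not presented as the exact EGP algorithm from the work.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def update_weights(weights, means, variances, y_observed):
    """Reweight models by how well each one predicted the new label."""
    unnormalized = [
        w * gaussian_pdf(y_observed, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all models assigned zero likelihood to the label")
    return [u / total for u in unnormalized]


def ensemble_prediction(weights, means, variances):
    """Mean and variance of the weighted mixture of Gaussian predictions."""
    mix_mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mix_mean, second_moment - mix_mean ** 2


# Two hypothetical models predicting the same query point.
prior_weights = [0.5, 0.5]
pred_means = [0.0, 2.0]
pred_vars = [1.0, 1.0]

mix_mean, mix_var = ensemble_prediction(prior_weights, pred_means, pred_vars)
posterior_weights = update_weights(prior_weights, pred_means, pred_vars, y_observed=0.0)
```

The mixture variance grows when the models disagree, which is one way an uncertainty- or disagreement-based acquisition rule could rank candidate points for labeling.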

Seongmi Song	Underwater balance perturbations modulate human frontoparietal alpha band spectral power Maintaining standing balance is a whole-body task that requires multisensory integration from spinal and supraspinal neural pathways. Bodyweight support is distributed among lower extremity joints and muscles in opposition to gravitational and inertial loads that create an unstable equilibrium when standing in an upright posture. To overcome gravitational loads, several methods have been used to provide external bodyweight support for simulating low gravity conditions (i.e., deep space) or reducing balance demands and lower limb loads during recovery and rehabilitation. Reduced gravity is often simulated in underwater environments by relying on buoyancy of the human body. Postural adjustments and neural control of standing balance underwater have been investigated using lower limb electromyography (EMG), but we understand relatively little about supraspinal neural control processes while maintaining standing balance in water. In this study, we aimed to identify alterations in healthy human electrical brain and muscle dynamics during standing balance, with and without bodyweight support provided by underwater buoyancy, and in response to balance perturbations applied using external underwater fluid forces. Our broader aim is to better understand the neural control of human gait and balance. Because standing in water can reduce gravity load, we hypothesized that standing underwater would reduce cognitive demands. Greater sensory feedback from the surrounding fluid environment and increased balance demands during balance perturbations caused by external fluid forces could further enhance theta band spectral power increases from parietal, frontal, and central cortical regions.
Compared to standing on land, theta band spectral power from prefrontal and parietal cortices increased when participants were submerged in water, while alpha band spectral power decreased from cingulate cortex. By increasing external fluid forces in the posterior direction while standing underwater, theta band spectral power from cingulate and parietal cortices increased, while alpha band spectral power from cingulate cortex decreased. Our results align with prior studies that showed reduced alpha band spectral power from central cortical regions during lower limb sensorimotor processing, including locomotion. Here, we identified electrocortical correlates of increased sensorimotor processing when standing in water compared to standing on land.
