Machine learning and metabolic modelling assisted implementation of a novel process analytical technology in cell and gene therapy manufacturing

Cell culture
A serum-free, suspension-adapted, HEK293T-based LVV PCL, used to produce recombinant, VSV-G pseudotyped, HIV-1-GFP based LVVs, was previously generated from an adherent HEK293T cell line supplied by Oxford Biomedica; all LVV-encoding genetic elements were stably integrated into the host-cell genome. The PCL was cultured in FreeStyle 293 Expression Medium (Gibco, Thermo Fisher Scientific, Waltham, MA, USA) and cultivated in 5 L bioreactors (Applikon Biotechnology, The Netherlands). An Acridine Orange (total cell stain) and 4′,6-diamidino-2-phenylindole (DAPI, dead cell stain) staining procedure was used to determine viable cell concentration (VCC) and cell viability via analysis on an automated NucleoCounter NC200 (ChemoMetec, Denmark). Online bioreactor process parameters were recorded every minute using a supervisory control and data acquisition (SCADA) system. Offline samples were taken during daily monitoring activities and prior to key process steps to measure VCC, cell viability, cell aggregation, functional vector titre and the concentration of the p24 capsid protein. Additionally, concentrations of glucose, lactate, ammonium, sodium, potassium, glutamine, and glutamate, denoted [glucose], [lactate], [NH4+], [Na+], [K+], [Gln] and [Glu], respectively, were measured using a Nova FLEX 2 (Nova Biomedical, Waltham, MA, USA).

Lentiviral vector production
Recombinant, VSV-G pseudotyped, HIV-1-GFP based LVV production was induced in the LVV PCL via the addition of doxycycline to cell cultures in 5 L bioreactors. A secondary induction step, involving the addition of sodium butyrate to the cell populations, was performed approximately 20 h after the addition of doxycycline to enhance transcription of the genetic elements encoding the LVV.
The HIV-1-based LVV-containing supernatant was harvested approximately 44 h following the initial induction of LVV expression, clarified through a 0.45 µm filter and stored at −80 °C for subsequent analysis.

PAT measurements and protocols
PTI, derived from the RI of the cell culture, was calculated using in situ optical probes mounted inside the 5 L bioreactors. A data filtering algorithm was applied to the PTI measurements to automatically derive the MRI of the culture in real time. Measurements were taken every 5 s. The Ranger RI system was used to perform 'parameter probing', which was executed in two modes of operation, employing pre-programmed and autonomous logic methodologies. The pre-programmed mode of operation followed a pre-defined script detailing a series of pH parameter changes scheduled against fixed process times, communicated to the bioreactor's bio-controller unit (Applikon Biotechnology, The Netherlands) via an open platform communication (OPC) connection. The autonomous mode of operation involved the dynamic adjustment of parameter set points, in real time, determined by the response observed following the previous parameter adjustment in the experiment, in order to home in on the parameter settings which maximised cellular activity.

Transduction and titration of LVVs
Functional LVV titre
Functional vector titres were determined via the transduction of adherent HEK293T cells, seeded in 12-well plates 24 h prior to transduction. Test samples were serially diluted in DMEM (Sigma-Aldrich, Merck, Burlington, MA, USA) supplemented with 8 µg/mL polybrene (Sigma-Aldrich, Merck, Burlington, MA, USA).
Transduced cells were harvested 72 h following their exposure to the diluted viral vector preparations and analysed for GFP expression by flow cytometry on an Attune NxT acoustic focusing flow cytometer (Thermo Fisher Scientific, Waltham, MA, USA) using an optimised GFP detection protocol. Functional LVV particles were expressed as the number of transducing units/mL, calculated using Eq. (1).

$$\text{Titre}\,(\text{TU/mL}) = \frac{\left[\frac{\%\,\text{Parent}}{100} \times \text{Conc. of cells prior to transduction} \times \text{Dilution factor}\right]}{\text{Volume of vector added (mL)}}$$
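Equation (1) can be sketched as a small helper function; the function name and example numbers below are illustrative, not taken from the paper.

```python
def functional_titre_tu_per_ml(percent_gfp_parent, cells_at_transduction_per_ml,
                               dilution_factor, vector_volume_ml):
    """Functional titre (TU/mL) per Eq. (1): fraction of GFP+ cells times the
    cell concentration at transduction, scaled by the dilution factor and
    divided by the vector volume added."""
    transduced_cells = (percent_gfp_parent / 100.0) * cells_at_transduction_per_ml
    return transduced_cells * dilution_factor / vector_volume_ml

# Hypothetical example: 10% GFP+ of 1e5 cells/mL, 100-fold dilution, 0.5 mL vector
titre = functional_titre_tu_per_ml(10, 1e5, 100, 0.5)  # ≈ 2e6 TU/mL
```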
Total LVV particles
Quantification of the p24 capsid protein in LVV preparations was carried out by enzyme-linked immunosorbent assay (ELISA). p24 concentrations were measured with an Alliance HIV-1 p24 ELISA kit (Perkin Elmer, Waltham, MA, USA) and converted into the total number of LVV particles using Eq. (2).

$$\text{Total particles}\left(\frac{\text{TP}}{\text{mL}}\right) = \left(\frac{\text{p24 concentration}\left(\frac{\text{g}}{\text{mL}}\right)}{\text{Molecular weight of p24}}\right) \times \left(\frac{\text{Avogadro's number}}{\text{Number of p24 molecules per HIV particle}}\right)$$
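Equation (2) can be sketched as follows, using the p24 molecular weight and molecules-per-particle constants given in the text; the function name and the example concentration are illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
P24_MW_G_PER_MOL = 2.4e4     # molecular weight of p24 (from the text)
P24_PER_PARTICLE = 2000      # p24 molecules per HIV particle (from the text)

def total_particles_per_ml(p24_g_per_ml):
    """Total LVV particles per mL per Eq. (2): moles of p24 per mL converted
    to molecules, then divided by the p24 copies carried per particle."""
    moles_p24_per_ml = p24_g_per_ml / P24_MW_G_PER_MOL
    return moles_p24_per_ml * AVOGADRO / P24_PER_PARTICLE

# Hypothetical example: 1 ng/mL p24 corresponds to roughly 1.25e7 particles/mL
particles = total_particles_per_ml(1e-9)
```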
Molecular weight of p24: 2.4 × 10⁴. Number of p24 molecules per HIV particle: 2000.

Metabolic modelling
Model
The HEK293 metabolic model developed by Martínez-Monge20, derived from the recent human genome-scale metabolic reconstruction Recon 2.216, was used. This model comprised 354 reactions between 335 metabolites and was extended in this work to include two additional exchange reactions for sodium and potassium ions, respectively. This modification affected the flux through 23 reactions (Supplementary Table 1).

Analyses
Flux balance analysis (FBA)21,22 was performed using the COBRA toolbox (v. 3.0) running under MATLAB (v. 9.8), with the Gurobi optimiser (v. 9.1.1) as the mathematical solver. The objective function to be maximised was the growth rate flux. The upper and lower bounds of exchange reactions for which experimental data were available were constrained accordingly. The constraints for the upper bounds of several exchange reactions that allow the uptake of nutrients, for which empirical data were not available, were determined using flux variability analysis (FVA). Amino acid exchange fluxes were among these important constraints. FVA23 was performed using the same platform. The flux ranges of the amino acid exchange reactions provided by the FVA solution were evaluated against typical amino acid concentrations in cell culture media; a few hundred milligrams per litre up to slightly more than 1 g/L was accepted as the typical range. When the FVA solution range exceeded this, the amino acid exchange fluxes were constrained with tighter bounds and the FVA was repeated.
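The bound-tightening step described above can be illustrated with a minimal pure-Python sketch. The helper, the reaction identifiers and the cap values are hypothetical (the actual analysis ran FVA in the MATLAB COBRA toolbox); the caps simply mirror the idea of clipping implausibly wide FVA ranges to a physiologically plausible window.

```python
def tighten_exchange_bounds(fva_ranges, cap=1.0, tight_cap=0.5, tight_set=("val", "ile")):
    """Clip FVA-derived exchange-flux ranges (mmol/(g CDW * h)) to a plausible
    symmetric window, using a tighter cap for a designated subset of reactions.
    All identifiers and caps here are illustrative, not from the paper."""
    bounds = {}
    for rxn, (lo, hi) in fva_ranges.items():
        c = tight_cap if rxn in tight_set else cap
        bounds[rxn] = (max(lo, -c), min(hi, c))
    return bounds

# Hypothetical FVA output: valine gets the tighter cap, glutamine the default one
clipped = tighten_exchange_bounds({"val": (-3.0, 2.0), "gln": (-0.2, 5.0)})
```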
FVAs resulted in a range of ±1 mmol/(g CDW × h) for all amino acids apart from valine and isoleucine, which were constrained to a range of ±0.5 mmol/(g CDW × h).

Identifying feasible ranges for exchange fluxes of medium components of undefined composition
Genome-scale metabolic models are highly underdetermined systems. Consequently, the predictions made about the metabolic state of the cells often represent a range of available solutions that satisfy certain criteria equally well while achieving an optimal objective, which, in this case, would be the optimal cellular growth rate. From a practical point of view, the more of these criteria can be identified and constrained to satisfy specific requirements, the narrower this range of possible solutions becomes, thus improving the precision of our understanding of the metabolic behaviour of the cell at any given metabolic snapshot. However, in a bioprocess setting, information and data on many such criteria essential for metabolic modelling may be unavailable. There are different reasons for the absence of such information, and one very prominent cause is the proprietary nature of process settings and operations. Knowledge of the specific chemical composition of the cell culture medium harbours valuable insight into how cellular metabolism behaves, and is thus important in constraining metabolic models to achieve good predictive capability. However, the specific compositions of such media are often proprietary and not disclosed, complicating this process. For this reason, the constraints of important model inputs for which no experimental data were available were estimated using FVA in this analysis.
The amino acid fluxes were found to be particularly important and, therefore, the focus was set on these. Since the only external input factor that was varied across the experiments was the cell culture pH profile, a method was needed to incorporate extracellular pH information into the metabolic model (Fig. 5b). This allowed us to implement, for the first time, a method which accounted for the metabolic variations instigated by the extracellular pH at which the cell culture was maintained. Furthermore, the amino acid requirements, as determined by the respective ranges of the uptake fluxes, were strongly associated with the incorporation of the sodium and potassium exchange reactions in the model.

Introducing pH as a constraint into the metabolic model
The following method was proposed to address the problem: every cell exports a fixed amount of protons, which in turn contributes to the measured extracellular pH. This constant and continuous proton export leads to an increase in the extracellular proton concentration which, if not counteracted, would cause a pH decrease in the extracellular environment. Since the quasi-steady-state operating assumption must hold, there needs to be an alternative route to counteract (or neutralise) this proton accumulation and maintain the pH at the fixed control setpoint. A bicarbonate buffering system was used in the bioprocesses; therefore, HCO3− was chosen as the neutralising compound to be exported, resulting in a steady equilibrium, i.e., a defined pH value. At a fixed proton efflux, the bicarbonate efflux was specifically tuned to achieve the target pH value. Namely, changes were introduced to the constraints of two exchange reactions, those for H+ and HCO3−. First, the actual pH value of the cell culture at the time point of interest was converted into the corresponding molar proton concentration (A).
The proton export flux, which was set as a fixed constraint for the sake of comparability, was converted into a molar concentration (B) as well, and the difference between concentrations (A) and (B) was calculated. As (B) was consistently larger than (A), it was assumed that a certain amount of organic base had to be exported to neutralise the excess extracellular protons and maintain the pH. Based on the knowledge that a bicarbonate buffering system was used in the bioprocesses, HCO3− was chosen as the neutralising compound. Under the assumption that 1 mol of base neutralises 1 mol of acid, the molar concentration of HCO3− equal to the absolute difference between (A) and (B) was computed and subsequently converted into a flux value as described elsewhere. This flux was set as a fixed constraint in FBA to simulate HCO3− export. Prior to its utilisation in this work, a proof-of-concept study was conducted in which two FBAs were run to represent the same experimental setup, differing only in the HCO3− efflux constraint, representing the different pH values of 6 and 7. The distribution of the fluxes for 26 reactions varied between the two FBA solutions (Supplementary Table 2). In all flux comparisons, absolute differences in fluxes between any two conditions were computed. As a preliminary analysis, flux distributions obtained from FBA for different conditions were evaluated in a 2 × 2 factorial design: pH taken into account by the model or not, and Na+/K+ exchange reactions accounted for in the model or not (Supplementary Tables 3, 5, 7, 9).

Visualisation of metabolic pathway fluxes
Visualisation maps were generated in Escher-FBA24. The flux values of all reactions in the FBA solution were set as constraints for a new MATLAB model, which was then converted into a JSON model file using COBRApy (v. 0.21.0).
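The concentration arithmetic of this pH constraint can be sketched as follows. The helper name is illustrative, and the process-specific conversion between the proton export flux and concentration (B), as well as the back-conversion of the HCO3− concentration to a flux, is deliberately omitted.

```python
def bicarbonate_neutralisation_conc(ph, proton_conc_b):
    """Molar HCO3- concentration needed to neutralise the excess extracellular
    protons (1 mol base per 1 mol acid), given the target pH and the molar
    concentration (B) corresponding to the fixed proton export flux."""
    proton_conc_a = 10.0 ** (-ph)            # concentration (A): [H+] from pH
    return abs(proton_conc_b - proton_conc_a)

# Hypothetical example: target pH 7.0 with (B) = 5e-4 M of exported protons
hco3_conc = bicarbonate_neutralisation_conc(7.0, 5e-4)
```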
The JSON model was uploaded into Escher-FBA and laid onto the RECON1 Escher map accessed from the BiGG Models database.

Bioprocess data analysis
Data and platforms
The data analysis was conducted using Python (v. 3.8.7), R (v. 4.0.3) and RStudio (v. 1.1.463). Other supporting Python and R packages employed were "gplearn" (v. 0.4.1), "Lazy Predict" (v. 0.2.9), "NumPy" (v. 1.19.1), "pandas" (v. 1.2.3), "seaborn" (v. 0.11.1), "SHAP" (v. 0.38.1), "scikit-learn" (v. 0.23.1), "cowplot" (v. 1.1.0), "desiderata" (v. 0.38.0), "dplyr" (v. 1.0.2), "ggplot2" (v. 3.3.2), "reshape2" (v. 1.4.4) and "stringr" (v. 1.4.0).

A total of 22 upstream experiments were carried out (bioprocess duration range of 66.15–69.25 h); 18 of the experiments were conducted as part of the process development phase and 4 experiments were executed as part of the subsequent validation phase. Essentially, the process development phase involved the completion of experiments to develop a process able to enhance the metabolic activity of the cell culture compared to the 'unoptimised' process. The validation phase involved a direct comparison of the process that had been optimised for high levels of cellular activity with the 'unoptimised' process. For each bioprocess run, three subsets of data were acquired: bioreactor SCADA data, Ranger RI system data and offline sampling data. The bioreactor SCADA data comprised once-per-minute online records of elapsed process time (EPT), cumulative alkali addition, cumulative air addition, cumulative CO2 addition, cumulative N2 addition, cumulative O2 addition, measured dissolved oxygen, measured pH, measured stirrer speed, measured temperature, air flow rate, CO2 flow rate, N2 flow rate, O2 flow rate, dissolved oxygen set point, pH set point, stirrer speed set point and temperature set point.
Three parameters were extracted from the Ranger RI system, with data recorded every 5 s: EPT, PTI and MRI. The offline sampling data collected for the development experiments were records of VCC, cell viability and cell aggregation at five time points of each bioprocess. For the four validation experiments, records were collected at seven time points. In addition to VCC, cell viability and cell aggregation, there were records of functional vector titre, the concentration of the p24 capsid protein and the concentrations of glucose, lactate, ammonium, sodium, potassium, glutamine and glutamate, denoted [glucose], [lactate], [NH4+], [Na+], [K+], [Gln] and [Glu], respectively. In the forecasting and regression experiments, either MRI or PTI was used as the output variable; the cumulative alkali addition, cumulative air addition, cumulative CO2 addition, cumulative N2 addition, cumulative O2 addition, measured dissolved oxygen, measured pH, measured stirrer speed, measured temperature, air flow rate, CO2 flow rate, N2 flow rate and O2 flow rate were used as the input variables as necessary.

Online bioprocess data pre-processing
Handling of missing data
For the SCADA datafiles, the low rate of missingness allowed flexibility in the choice of method, and a simple moving average was chosen as the imputation method. The fractions of missing data in the MRI data were, on average, larger than those observed in the SCADA datafiles because the Ranger RI system is programmed not to record MRI data generated immediately following a modification to a parameter setting. The proportion of missing data resulting from pH set point changes ranged from 0% up to 53% across the various experiments, related to the number of pH changes that were made during the processes.
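Single-point moving-average imputation of the kind described above can be sketched in pure Python; the helper and window size are illustrative (the analysis itself used R packages for imputation).

```python
def impute_single_gaps(series, k=2):
    """Fill isolated None values with the simple moving average of up to k
    valid neighbours on each side. A sketch of single-point imputation only;
    longer gaps would need a model-based method such as Kalman smoothing."""
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            window = [x for x in out[max(0, i - k): i + k + 1] if x is not None]
            out[i] = sum(window) / len(window)
    return out

# Hypothetical MRI trace with one missing reading
filled = impute_single_gaps([1.0, 2.0, None, 4.0, 5.0])
```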
MRI datafiles with extensive missingness (defined as more than 15% of data missing) were excluded from further analysis in order to avoid imputation-related information gain or loss in the data analysis downstream of this task. Univariate time series imputation was conducted using "imputeTS" (v. 3.1)25. For missing data at only a single time point, imputation was done by simple moving average. Sections of missing data in the MRI measurements were imputed by Kalman smoothing on a state space representation using "auto.arima" in the R "forecast" package (v. 8.3). Details are shown in Supplementary Figs. 1 and 2.

Data condensation
In order for the timeframes of the time series data to match in length and in time step size, the Ranger RI system dataset, in which data were recorded every 5 s, was condensed to a data frequency of once every minute by computing the mean of each variable for every minute, and was then merged with the remaining dataset. This merged dataset of each bioprocess run will henceforth be referred to as online data.

Feature selection
Different features may or may not bring in different information regarding a bioprocess run depending on the experiment that was conducted, and identifying informative process parameters, i.e. features, and running a sanity check on them prior to their utilisation in data analytics and modelling can become a non-trivial task26. In order to address this, a preliminary feature selection step was included in the process. sp_dO, sp_Stirrer and sp_Temperature were all removed prior to the analysis, since these parameters were kept constant across all runs and hence would not carry any information. Dynamic time warping27,28 ("dtw" v.
1.22–1.23 in R) and Pearson correlation were used to detect multicollinearity among the online parameters (features) and to examine the similarity to, and correlation with, the target variables MRI/PTI. Pearson correlation was used to compare and contrast the results obtained by dynamic time warping in order to reveal the role of time in the identification of multicollinearity in the dataset. Each bioprocess was investigated individually for correlations within itself, and correlations were also sought across all available datasets. Correlation analysis led to the removal of EPT and sp_pH from the datasets owing to their high correlation with other features kept in the dataset. m_Stirrer and m_Temperature were excluded, too, since these features never showed significant temporal variation.

Online data processing
Multiple forecasting
The artificial recurrent neural network architecture long short-term memory (LSTM) was applied and implemented using "Keras" (v. 2.4.3) in Python. The datasets of each process run were converted into a set of time windows; this was done separately for the features and the target variable to enable different window sizes for training and forecasting. Forecasting accuracy was evaluated using the root mean square error (RMSE) between forecasted and true data points. Two training approaches were used: one providing a training set of subsequent windows that differ by only one time step ("sliding window"), and another obtained by shuffling the windows of a training set. The randomised model training significantly decreased the loss during training and led to a much more stable and faster-converging learning curve (Supplementary Fig. 3).
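The window construction and shuffling described above can be sketched as follows; the helper is hypothetical (the actual pipeline built windows for Keras LSTM training), with the window and horizon sizes chosen only for illustration.

```python
import random

def make_windows(values, window, horizon):
    """Split a series into (input window, forecast target) pairs whose start
    indices differ by one time step, i.e. the "sliding window" layout."""
    return [(values[i:i + window], values[i + window:i + window + horizon])
            for i in range(len(values) - window - horizon + 1)]

series = list(range(10))                      # hypothetical target series
pairs = make_windows(series, window=3, horizon=2)
random.shuffle(pairs)                         # randomised training order
```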
The improved training performance in turn also improved forecasting precision, as given by RMSE, by up to one order of magnitude, because the whole process was used for learning, whereas the sliding window approach only allowed the model to learn from an early fraction of the process, leaving the information in the remaining time points never learned by the model and thus resulting in poor forecasting performance. The LSTM models were cross-validated using the "scikit-learn" (v. 0.23.1) package, employing stacked cross-validation.

Regression
Regression analyses were conducted with the "XGBoost" package (v. 1.1.1) in Python29. The coefficient of determination (R2) was used as an estimate of how well the model predictions approximate the underlying training data. RMSE was applied to evaluate model predictions for the target variable based on feature values the model had not seen during training. The selection of the regression method to use was facilitated by Lazy Predict for Python. The XGBoost Regressor was chosen among regression methods with comparable performance based on the metrics R2, RMSE and running time generated by the "LazyRegressor" function on a few randomly chosen process runs (Supplementary Fig. 4).

Univariate forecasting
Univariate forecasting was carried out using the "pmdarima"30 (v. 1.8.0) and "sktime" (v. 0.5.3) packages31 in Python. For each forecasting task, one hour of past data was used for training to make a forecast one hour into the future. Forecasting accuracy was evaluated using the RMSE between forecasted and true data points.

Statistical analysis
Statistical analysis was carried out using Prism v9.1 (GraphPad Software, San Diego, CA, USA). Graphs depict the mean ± one standard deviation. Differences between two means were evaluated using unpaired two-tailed Student's t-tests.
Results were considered statistically significant when p values were less than 0.05 (*), 0.01 (**), 0.001 (***) or 0.0001 (****).
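The RMSE used throughout for forecasting and regression evaluation is, for reference, a minimal sketch (the pipeline itself relied on library implementations):

```python
def rmse(y_true, y_pred):
    """Root mean square error between forecasted and true data points."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5
```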
