Abstract
Background. Solubility is a fundamental physicochemical property of active pharmaceutical ingredients. The optimization of a dissolution medium aims not only to increase solubility; other aspects must also be considered, such as environmental impact, toxicity, availability, and cost. Obtaining comprehensive solubility characteristics of chemical compounds is a non-trivial and demanding process. Therefore, support from theoretical approaches is of practical importance.
Objectives. This study aims to examine the accuracy of the reference solubility approach in the case of sulfanilamide dissolution in a variety of binary solvents. This pharmaceutically active substance has been extensively studied, and a substantial amount of solubility data is available. Unfortunately, using this set of data directly for theoretical modeling is impeded by noticeable inconsistencies in the published solubility data. Hence, this aspect is addressed by data curation using theoretical and experimental confirmations.
Materials and methods. In the experimental part of our study, the popular shake-flask method combined with ultraviolet (UV) spectrophotometric measurements was applied for solubility determination. The computational phase utilized the conductor-like screening model for real solvents (COSMO-RS) approach.
Results. The analysis of the solubility calculations for sulfanilamide in binary solvents revealed abnormally high error values for acetone-ethyl acetate mixtures, which were further verified with new experimental measurements. Additional confirmation was obtained by extending the solubility measurements to a series of homologous acetate esters.
Conclusions. Our study addresses the crucial issue of coherence of solubility data used for many theoretical inquiries, including parameter fitting of semi-empirical models, in-depth thermodynamic interpretations and application of machine learning protocols. The effectiveness of the proposed methodology for dataset curation was demonstrated for sulfanilamide solubility in binary mixtures. This approach enabled not only the formulation of a consistent dataset of sulfanilamide solubility in binary solvent mixtures, but also its implementation as a qualitative tool guiding rational solvent selection for experimental solubility screening.
Key words: in silico, solubility, sulfanilamide, binary solvents, COSMO-RS
Summary
Background. Solubility is one of the most fundamental physicochemical characteristics of pharmaceutically active substances. Optimization of the dissolution medium is a complex problem that covers not only the determination of solubility itself but also aspects such as environmental impact, toxicity, availability, and the costs of purchasing and disposing of solvents. Because obtaining a comprehensive solubility characterization is not trivial and often involves a costly and time-consuming process, theoretical studies provide important support.
Objectives. The aim of this work is a quantitative assessment of the effectiveness of computing solubility in binary mixtures based on the reference solvent methodology. Owing to the considerable interest in sulfanilamide, a substantial amount of experimental data is available, enabling detailed theoretical analysis. Unfortunately, the direct use of this dataset for theoretical modeling is hindered by the observed inconsistencies. This problem was analyzed in detail on the basis of theoretical considerations and experiments.
Materials and methods. The experimental part of the study comprised equilibrium solubility measurements using a spectrophotometric method in the UV range. The theoretical studies were carried out using the COSMO-RS model.
Results. Analysis of the computed solubility of sulfanilamide in binary solvents revealed anomalously high error values for acetone-ethyl acetate mixtures, which was verified by new experimental measurements. Additional confirmation was provided by extending the solubility measurements to a series of homologous acetate esters. Recomputing the solubility using the reference solvent methodology led to a significantly smaller discrepancy between the computed and measured mole fractions.
Conclusions. This report addresses the key issue of ensuring the consistency of solubility data, which are frequently used in theoretical considerations such as the interpretation of solubility based on empirical or semi-empirical models and the thermodynamic characterization of dissolution processes. Applying the methodology of computing solubility in binary mixtures from data for neat solvents enabled not only the identification of data inconsistencies but also potential screening studies.
Key words: in silico, solubility, sulfanilamide, COSMO-RS
Background
Sulfonamides are a highly important class of active pharmaceutical ingredients (APIs). This group of antimicrobial substances has a broad spectrum of applications, including the treatment of various bacterial and fungal infections and diabetes therapy.1, 2 Sulfanilamide (SA; C6H8N2O2S; CAS:63-74-1, DrugBank: DB00259) is a precursor of sulfonamides used as topical anti-infectives.3 Furthermore, besides its conventional applications, SA has been employed in the synthesis of functional polymers with potential biomedical and pharmaceutical significance.4, 5, 6
The solubility of a particular API is a fundamental physicochemical property routinely determined at the early stages of drug development. Moreover, the optimization of the dissolution media is often required to fulfill the criterion of solubilization effectiveness and additional requirements such as ecological issues, toxicity, availability, and costs. However, due to the enormously large solvent space, the practical realization of this step poses serious challenges. Hence, multiple theoretical approaches have been developed for the estimation of the properties of saturated solutions. Among the many available methods, the conductor-like screening model for real solvents (COSMO-RS)7, 8 is one of the most ambitious theoretical approaches, aiming to determine the comprehensive characteristics of liquids exclusively from molecular structure. This particular approach harnesses first-principles quantum chemistry computations augmented with statistical thermodynamics. This 2-step procedure results in full temperature-dependent thermodynamics for bulk systems, even with complex compositions. A unique feature is its theoretical framework, which explicitly includes the structural and energetic diversity of liquid systems by representing each molecule as a set of its most probable conformers. There is considerable evidence of the successful application of this method to a wide range of chemical problems. Some reported examples include solubility modeling of rutin in natural deep eutectic solvents (NADES),9 halogenated hydrocarbons in ionic liquids,10 or cellulose in ionic liquids.11 Notably, according to a recent report by Klajmon,12 the COSMO-RS model is generally significantly more accurate for modeling the solubility of various pharmaceuticals in conventional solvents compared to the popular perturbed-chain statistical associating fluid theory (PC-SAFT) method.
Importantly, COSMO-RS is a theory of bulk liquids, and the characteristics of the solid-liquid equilibria (SLE) require taking into account the Gibbs free energy of fusion of the solute (ΔGfus). This thermodynamic property describes the transfer of the compound from the crystalline state into an unordered liquid and is typically determined experimentally using differential scanning calorimetry (DSC) measurements, including melting temperature (Tm), heat of fusion (ΔHfus), and heat capacity change upon melting. Although many solids have been fully characterized in these respects,13, 14 such thermodynamic data are not available for numerous active pharmaceutical ingredients. The reasons for this are related to thermal instability and decomposition below the melting point. Additionally, the complex polymorphic behavior, which is often observed for many solids, requires further determination of thermodynamic equilibria between the different polymorphs. Occasionally,15 the polymorph dissolved under the experimental conditions of solubility measurements differs from the crystal form stable at the melting point. Sulfanilamide is also considered a problematic solid. In the Cambridge Structural Database (CSD), the sulfanilamide solid structure is deposited in 4 sets of records documenting distinct polymorphic forms, namely α (Pbca), β (P21/c), γ (P21/c), and δ (Pbca). It has been reported16, 17 that both the α and β forms undergo a transition to the γ form during heating, and that the γ form is the only form existing under melting conditions. The 4th form is metastable and does not melt. The β-polymorph is the most stable one under ambient conditions and is usually the commercially available form.18 This complicates the theoretical computations of solubility, since experimental fusion data for this form are unavailable.
Moreover, proper characterization of fusion requires the determination of the temperature-dependent heat capacities of the solid and liquid states over ranges that often extend beyond the melting conditions. Therefore, the terms melting and fusion are often distinguished. The former is used to depict the solid-liquid phase transition at the melting point, while the latter is reserved for solid transfer into a subcooled liquid state at any other temperature. Furthermore, the possibility of solvate formation should be considered. Fortunately, for sulfanilamide, no solvent inclusion occurs after crystallization.16, 19 The reported DSC thermograms19 provide a consistent picture in which the thermograms’ shapes are preserved irrespective of the solvent used for recrystallization. This is evident from the small endothermic peak corresponding to the polymorphic transition and the large endothermic peak associated with the melting of the γ form.
The above information suggests that solubility computation is not a straightforward task, even in conjunction with a first-principles approach. Hence, many simplifications have been proposed by some authors, but criticized by others.20 However, the lack of consensus regarding the appropriate way of including fusion thermodynamics in solubility predictions should not be a prohibiting factor. Indeed, in the COSMOtherm program (Dassault Systèmes, Biovia, San Diego, USA), which practically implements the COSMO-RS theory, it is possible to compute solid solubility by indirectly defining the necessary fusion data. This is achieved by providing experimental solubility data for a given solute in a given solvent or mixture at a given temperature as input for computing solubility in different media. This approach is referred to as the reference solubility method. Using the value of ΔGfus determined in such a manner reproduces the solubility of the solute in the reference system. The method is very flexible, since the reference solubility can be provided at conditions different from those of the solubility predictions. In particular, it is possible to supply several reference solubility values or a variety of temperatures. The former option is utilized in our study by providing 2 reference solubility values, experimentally determined for the neat solvents, for computing the solubility in their binary mixture at the given temperature. This reference solvent procedure was previously used with varying effectiveness, depending on the system under consideration.21 In general, the proper selection of a reference solvent can provide a reasonable solubility estimate. Such a solvent is commonly referred to as a consonance solvent.15, 21 It is interesting to note that this approach is not utilized for the practical prediction of solubility in solvent mixtures.
Herein, we analyzed the accuracy of the aforementioned approach for sulfanilamide dissolved in a variety of binary solvent mixtures. This particular solute has been extensively studied, and the available solubility dataset is large enough for theoretical analysis.
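The logic of the reference solubility method can be illustrated with a minimal sketch. It assumes a single reference point and the relation ln(γ·x) = −ΔGfus/(RT); the one-parameter Margules expression used for ln γ is a hypothetical stand-in for the activity coefficients that COSMO-RS would supply, and the function names are illustrative only. It is not a description of how COSMOtherm implements the procedure internally.

```python
import math
from typing import Callable

R = 8.314462618  # gas constant, J/(mol K)


def apparent_dg_fus(x_ref: float, ln_gamma_ref: float, temperature: float) -> float:
    """Back out an apparent Gibbs energy of fusion (J/mol) from one
    reference solubility, using ln(gamma * x) = -dG_fus / (R T)."""
    return -R * temperature * (math.log(x_ref) + ln_gamma_ref)


def solve_solubility(dg_fus: float, temperature: float,
                     ln_gamma: Callable[[float], float],
                     tol: float = 1e-10, max_iter: int = 500) -> float:
    """Solve ln x + ln gamma(x) = -dG_fus / (R T) by damped fixed-point iteration."""
    ln_activity = -dg_fus / (R * temperature)
    x = min(math.exp(ln_activity), 1.0)  # start from the ideal solubility
    for _ in range(max_iter):
        x_new = min(math.exp(ln_activity - ln_gamma(x)), 1.0)
        if abs(x_new - x) < tol:
            return x_new
        x = 0.5 * (x + x_new)  # damping stabilizes the iteration
    raise RuntimeError("solubility iteration did not converge")


def margules(a: float) -> Callable[[float], float]:
    """Hypothetical activity model: ln gamma = a * (1 - x)^2."""
    return lambda x: a * (1.0 - x) ** 2


T = 298.15
x_ref = 0.01                      # hypothetical reference mole fraction
ln_gamma_ref_solvent = margules(1.0)
dg = apparent_dg_fus(x_ref, ln_gamma_ref_solvent(x_ref), T)

# Using the reference solvent again must reproduce the reference solubility.
x_same = solve_solubility(dg, T, ln_gamma_ref_solvent)
# A less favorable (more positive ln gamma) medium gives a lower solubility.
x_other = solve_solubility(dg, T, margules(2.0))
```

The round trip through the reference solvent shows why the method reproduces its own input, and why any error in the reference solubility propagates directly into every prediction derived from it.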
Materials and methods
Analytical grade sulfanilamide (≥98%, SA, CAS: 63-74-1), ethyl acetate (99.5%, CAS: 141-78-6), n-propyl acetate (≥99.5%, CAS: 109-60-4), n-butyl acetate (99.7%, CAS: 123-86-4), and methanol (≥99.8%, CAS: 67-56-1) were obtained from Sigma-Aldrich (Saint Louis, USA) and used as received. Nitrogen (99.999%) for DSC analyses was supplied by Linde (Warsaw, Poland).
Solubility measurements
In this study, the shake-flask solubility determination procedure was applied. It is worth noting that the protocol has been previously employed for various organic compounds and validated against literature data.19, 22, 23, 24, 25, 26, 27 To determine the concentration of the saturated solutions of sulfanilamide in the considered set of solvents, the samples containing an undissolved excess of the active substance were incubated with mixing (60 rpm) at 25°C. The mixtures were prepared by dissolving sulfanilamide in a solvent in glass tubes (10 mL). Then, the test tubes were placed in an incubator (Orbital Shaker ES-20/60; Biosan, Riga, Latvia). After allowing the undissolved precipitate to settle for 1 h at 25°C, the solution and the sediment were separated by filtration using a syringe equipped with a 0.22 μm PTFE filter. Spectrophotometric measurements were taken using 0.1 mL of the filtered solution, which was immediately diluted with 2 mL of methanol. Rapid dilution of the samples was essential to avoid crystallization during subsequent analytical procedures for concentration determination. For the same reason, the equipment employed at the filtration and dilution steps (syringes, filters, test tubes for filtrates, and pipette tips) was preheated to 25°C before use. For density determination, 1 mL of the filtrate was immediately transferred using an automatic pipette (Eppendorf Reference 2; Eppendorf, Hamburg, Germany) with a preheated tip to a tared 10 mL glass flask and weighed using an analytical balance (RADWAG AS 110 R2.PLUS; Radwag, Radom, Poland).
The quantification of SA in filtrates was performed using a spectrophotometric method (λmax = 262 nm). To establish the calibration curve, the absorbance values were determined for a series of dilutions of the stock sulfanilamide solution (1.78×10–5–6.68×10–5 M). All spectrophotometric measurements were performed using the A360 UV-VIS device (AOE Instruments, Shanghai, China).
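The data reduction implied by this protocol can be sketched as follows. The sketch assumes a linear calibration curve, a single dilution step with additive volumes (0.1 mL of filtrate in 2 mL of methanol), and conversion of molar concentration to mole fraction through the measured solution density; any further dilution steps would multiply the dilution factor. Function names and the numerical inputs in the usage lines are hypothetical.

```python
import numpy as np

M_SA = 172.20  # molar mass of sulfanilamide (C6H8N2O2S), g/mol


def fit_calibration(conc_molar, absorbance):
    """Least-squares line A = slope * c + intercept through the calibration points."""
    slope, intercept = np.polyfit(conc_molar, absorbance, 1)
    return slope, intercept


def saturated_concentration(absorbance, slope, intercept,
                            v_sample_ml=0.1, v_diluent_ml=2.0):
    """Concentration (mol/L) of the undiluted filtrate, assuming additive volumes."""
    c_diluted = (absorbance - intercept) / slope
    dilution_factor = (v_sample_ml + v_diluent_ml) / v_sample_ml
    return c_diluted * dilution_factor


def mole_fraction(c_molar, density_g_ml, m_solvent):
    """Solute mole fraction from molarity, solution density, and the
    (average) molar mass of the solvent, evaluated for 1 L of solution."""
    mass_solution = density_g_ml * 1000.0      # g per litre of solution
    mass_solvent = mass_solution - c_molar * M_SA
    n_solvent = mass_solvent / m_solvent
    return c_molar / (c_molar + n_solvent)


# Hypothetical usage with synthetic calibration data.
conc = np.array([2.0e-5, 4.0e-5, 6.0e-5])
absb = 1.0e4 * conc + 0.01
slope, intercept = fit_calibration(conc, absb)
c_sat = saturated_concentration(0.41, slope, intercept)
x_sa = mole_fraction(0.5, 0.90, 88.11)  # 88.11 g/mol: ethyl acetate
```

For a binary solvent, `m_solvent` would be the mole-fraction-weighted average molar mass of the two components.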
Thermal analysis
After the shake-flask procedure, SA solid residues were air-dried and subjected to DSC analysis using a DSC 6000 PerkinElmer calorimeter (PerkinElmer, Waltham, USA). The heating rate was set to 5 K/min, and the nitrogen flow was 20 mL/min. The samples were measured in standard aluminum pans. The heat flow and temperature calibration were performed using zinc and indium standards supplied by the DSC device manufacturer.
Solubility dataset
An extensive literature review revealed sulfanilamide solubility data for 18 different binary mixtures, including water-propylene glycol,28 1,4-dioxane-water,29 water-methanol,30 acetone-ethanol,31 acetone-methanol,31 acetone-toluene,31 acetone-ethyl acetate,31 methanol-toluene,32 ethanol-toluene,32 methanol-chloroform,32 ethanol-chloroform,32 methanol-ethanol,32 1,4-dioxane-water,19 ethanol-water,19 dimethyl sulfoxide (DMSO)-water,19 dimethylformamide (DMF)-water,19 acetonitrile-water,19 and 4-formylmorpholine-water.19 This collection included 13 solvents used for the preparation of binary mixtures over broad ranges of compositions and temperatures. The total number of data points was n = 1171.
COSMO-RS computations
The COSMO-RS quantum chemistry approach7, 8 was used for solubility predictions, as implemented in the COSMOtherm software.33 A standard procedure for conformer generation was adopted, using the COSMOconf program (Dassault Systèmes, Biovia, San Diego, USA)34 at the highest parametrization level available (BP_TZVPD_FINE_21.ctd). The most stable conformer is shown in Figure 1.
The solubility computations relied on a thermodynamic equation defining the general solid-liquid equilibria in the following form (Equation 1):
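\[
\ln a_i = \frac{\Delta H_{fus}}{R}\left(\frac{1}{T_m}-\frac{1}{T}\right) + \frac{1}{RT}\int_{T}^{T_m}\Delta C_{p,fus}\,dT - \frac{1}{R}\int_{T}^{T_m}\frac{\Delta C_{p,fus}}{T}\,dT \qquad (1)
\]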
where the left side defines the solute activity ai = γi · xi at saturated conditions, Tm stands for melting temperature, ΔHfus represents heat of fusion, ΔCp,fus denotes heat capacity change upon melting, and R is the gas constant. The fusion data indispensable for direct solubility computations in COSMOtherm were taken as the average of available values,13 namely heat of fusion ΔHfus = 23.40 ±0.38 kJ/mol (average of reported 23.28 kJ/mol,35 23.30 kJ/mol,35, 36 24.02 kJ/mol,37 23.0 kJ/mol,38 and 23.42 kJ/mol32) and melting point Tm = 437.3 ±1.84 K (mean value of reported 435.35 K,35 435.4 K,35, 36 439.3 K,37 438.7 K,38 and 437.7 K32). The application of this equation for any solid solute is often simplified by assuming the temperature independence of ΔCp,fus. There are 2 common alternatives, which suffer from different inaccuracies depending on the studied system. The crudest simplification ignores heat capacity by setting ΔCp,fus = 0, which is often quite acceptable and is denoted as COSMO-RS in our study. Alternatively, one can assume ΔCp,fus ≈ ΔSfus = ΔHfus/Tm, referred to below as COSMO-RS(2). It is worth mentioning that these fusion data were used for computing the solubility in binary mixtures only for comparison with the reference solubility computations. The latter did not require fusion thermodynamics, since they were computed using the COSMOtherm program from the provided reference solubility data. The solubility values were computed by completely solving the SLE problem, utilizing a sequence of the following options in the input file “SLESOL solub screening”. For every reference solvent computation, actual values of neat solvent solubility were declared using the “ref_sol_x_log10” option along with the binary mixture composition, “x_ref_sol”, and temperature, “T_ref_sol”, keywords.
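As a worked illustration of the quantities above, the following sketch averages the literature fusion data quoted in the text and evaluates the solute activity at saturation (the ideal solubility) under the 2 simplifications. It assumes the standard closed form of the solid-liquid equilibrium relation for a temperature-independent ΔCp,fus, and it assumes that the quoted ± values are sample standard deviations, which is consistent with the numbers given. Function and variable names are illustrative.

```python
import math
import statistics

R = 8.314462618  # gas constant, J/(mol K)

# Literature values quoted in the text.
dh_fus_kj = [23.28, 23.30, 24.02, 23.0, 23.42]   # kJ/mol
t_melt = [435.35, 435.4, 439.3, 438.7, 437.7]    # K

dh_mean, dh_sd = statistics.mean(dh_fus_kj), statistics.stdev(dh_fus_kj)
tm_mean, tm_sd = statistics.mean(t_melt), statistics.stdev(t_melt)


def ln_activity(temperature, dh_fus, t_m, dcp_fus=0.0):
    """ln(a) at saturation for constant dCp,fus (J/(mol K)):
    ln a = (dH/R)(1/Tm - 1/T) + (dCp/R)(Tm/T - 1 - ln(Tm/T))."""
    enthalpy_term = (dh_fus / R) * (1.0 / t_m - 1.0 / temperature)
    ratio = t_m / temperature
    heat_capacity_term = (dcp_fus / R) * (ratio - 1.0 - math.log(ratio))
    return enthalpy_term + heat_capacity_term


T = 298.15
dh_j = dh_mean * 1000.0  # J/mol

# Simplification 1: dCp,fus = 0
x_ideal_1 = math.exp(ln_activity(T, dh_j, tm_mean))
# Simplification 2: dCp,fus approximated by the entropy of fusion dH/Tm
x_ideal_2 = math.exp(ln_activity(T, dh_j, tm_mean, dcp_fus=dh_j / tm_mean))
```

Because the heat capacity term is non-negative below the melting point, the second simplification always yields a higher ideal solubility than the first for a positive ΔCp,fus.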
Results and discussion
The solubility of sulfanilamide in binary solvent mixtures was computed using the reference solvent approach. The experimental mole fractions of the saturated solutions were collected from the literature for the systems listed in the methodology section. The computed values of SA solubility for the whole dataset are shown in Figure 2, which comprises 4 sets of results. The 1st set consisted of the prediction of SA solubility in binary mixtures derived directly from solving SLE within the COSMO-RS framework, and was displayed as the distribution of gray crosses. This method clearly provided only a very rough estimate of the experimental solubility data. Indeed, reports have stated that COSMO-RS performs poorly in solubility prediction for various systems,22, 39 although there are cases where the computed values are in good agreement with the experimental ones.24 This was not the case for SA dissolved in the binary solvent mixtures, and we did not rely on such values for screening purposes, because the back-computed values suffered from serious inaccuracies. The mean absolute percentage error (MAPE) of mole fractions was as high as 1,310% (or 100% of the mean absolute logarithmic error), and in some cases the predicted values deviated by 3 orders of magnitude from the measured ones. This was found for SA dissolution in chloroform, DMSO, DMF, and 4-formylmorpholine. Fortunately, the reference solvent procedure significantly increased the overall accuracy, reducing MAPE to about 40%. Although this was not a quantitative prediction, it offered rational estimates directing solvent selection for further experiments, if screening was needed. This was a very practical result, since only about 10% of the systems were studied experimentally within the whole solubility space formed by all possible combinations of the 13 neat solvents involved in the collected dataset of SA.
Additionally, by performing measurements of SA in new neat solvents, the procedure enabled the estimation of solubility in virtually any combinations forming new binary mixtures.
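The error measures quoted above can be computed as in the following sketch. The exact definitions used in the study are not spelled out in the text, so the common forms are assumed here: MAPE as the mean relative deviation of mole fractions in percent, and a mean absolute deviation of decimal logarithms. The input values are hypothetical.

```python
import numpy as np


def mape(x_exp, x_calc):
    """Mean absolute percentage error of mole fractions, in percent."""
    x_exp, x_calc = np.asarray(x_exp, float), np.asarray(x_calc, float)
    return 100.0 * np.mean(np.abs(x_calc - x_exp) / x_exp)


def mean_abs_log_error(x_exp, x_calc):
    """Mean absolute deviation of log10 mole fractions
    (1.0 corresponds to one order of magnitude on average)."""
    x_exp, x_calc = np.asarray(x_exp, float), np.asarray(x_calc, float)
    return np.mean(np.abs(np.log10(x_calc) - np.log10(x_exp)))


# Hypothetical experimental and computed mole fractions.
x_exp = [0.01, 0.02]
x_calc = [0.02, 0.01]
err_pct = mape(x_exp, x_calc)                # mean of 100% and 50%
err_log = mean_abs_log_error(x_exp, x_calc)  # log10(2) for both points
```

The logarithmic measure treats over- and underestimation symmetrically, whereas MAPE is dominated by overestimated values, which is why both are useful when deviations span orders of magnitude.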
Apart from the above determinations, additional interesting aspects emerged after performing reference solvent computations. Unexpected and anomalous behavior of acetone-ethyl acetate binary systems can be observed.31 Such situations occasionally occur when the reported solubilities are incongruent,23 and such data should be used with caution. Herein, the highly inaccurate values of SA solubility in the mentioned binary mixture computed using the reference solvent approach are highlighted in Figure 2 as gray circles. Reports suggest a similarity in the solubility of SA in toluene and ethyl acetate.31 However, due to the differences in the polarity of both solvents, the opposite is expected. Notably, Asadi et al.40 showed relatively high deviations for SA solubility in the acetone-ethyl acetate system when interpreted using different solubility equations. Hence, extremely inaccurate predictions using reference solvent approaches for this system should not be attributed to theoretical issues. To resolve this problem and unequivocally address the observed discrepancy, 2 series of new measurements were performed. In the 1st series, the solubility of SA was measured in 9 compositions of acetone and ethyl acetate at room temperature. To further confirm the solubility trend, the solubility of SA was determined in a series of ester homologs. The results of these 2 series of experiments are provided in Figure 3. The obtained data highlighted the problem of the reported SA solubility in acetone-ethyl acetate binary mixtures and provided a comprehensive and convincing resolution.
First, the left panel confirms the serious discrepancy between our measurements and those reported in the literature.31 Interestingly, the application of the reference solvent method to these new solubility data resulted in a very accurate prediction. Indeed, as shown in Figure 2, the red circles representing the values for SA in acetone-ethyl acetate mixtures followed the overall trend and were close to the experimental data. Hence, this analysis confirmed the higher reliability of the new measurements collected in Figure 3A over those previously reported.31 Additional support for our observations was offered by the results of new experimental solubility measurements (Figure 3B). It was reasonable to expect that, for a homologous series of solvents, a smooth solubility trend should be observed. As shown in Figure 3B, this was the case for a series of 4 acetate esters. A linear trend was found between the number of carbon atoms in the alcohol part and SA solubility in the corresponding acetate esters. The literature values fell outside the measured trend.
Undoubtedly, factors related to the characteristics of the solid, such as purity and polymorphism, have an impact on the results of solubility measurements. Therefore, calorimetric studies could provide valuable insight. Importantly, the DSC results for the sediments collected after the shake-flask procedure in neat solvents did not differ significantly from those of the pure sulfanilamide measured in this study (Figure 4) and those reported in the literature.31 For all thermograms, a small peak corresponding to the polymorphic transformation and an intense γ form melting peak were observed. The onset values for the polymorphic and solid-liquid phase transitions of pure SA were 391.84 K and 437.41 K, with corresponding enthalpy values of 1.89 kJ/mol and 24.17 kJ/mol. Notably, the melting point obtained in our study was similar to that reported by Kodide et al. for SA powder used for solubility measurements (Tm = 437.78 K).31
Conclusions
In this report, we addressed an important problem of solubility data coherence. The accuracy of these data is crucial, as they are often used for further theoretical investigations, such as fitting parameters of empirical or semi-empirical solubility equations, solubility thermodynamics interpretation or more sophisticated inquiries, such as local composition determinations. Furthermore, the development of non-linear models using machine learning methodology also requires a reliable and consistent dataset for hyperparameter tuning. Hence, data curation for solubility collections is a vital and valuable step.
This study presented a straightforward methodology for the curation of certain subsets of the solubility data available in the published resources. The well-known saying “garbage in – garbage out” was exemplified for the solubility computations using the reference solvent approach. The accuracy of computed values in mixed solvents was inherently dependent on the reliability of solubility in neat solvents, as shown for sulfanilamide dissolved in binary mixtures of acetone and ethyl acetate. The replacement of suspicious solubility values in neat solvents increased the overall accuracy of solubility computations. This method is a reliable way to test the consistency of solubility data for solid solutes, including sulfanilamide, in mixed dissolution media at various temperatures.
Although qualitatively accurate, the reference solvent solubility values are still not accurate enough to replace actual measurements. However, machine learning protocols may be implemented to further increase the accuracy of predicted solubility. The curated dataset, which encompasses congruent and coherent data, will be used in our next project for this purpose.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.