Beans are a valuable food product for both humans and animals. They are the seeds of several genera of the flowering plant family Fabaceae, the third-largest family of land plants by number of species, with nearly twenty thousand known species. Beans are valued for both their health benefits and their long shelf life. They can be cooked in many different ways, such as boiling, frying, and baking. Numerous varieties are grown all over the world, each with distinct features, textures, and flavors. Some of the most popular beans include green beans, lima beans, kidney beans, black beans, chickpeas, and soybeans. Beans are popular because they are affordable, easy to transport thanks to their long shelf life, and high in nutritional content. Beans are a major source of protein, dietary fiber, carbohydrates, minerals, and vitamins. They offer high protein and amino acid content while being lower in calories and saturated fat than many other high-protein sources, such as meat and dairy products. Studies have shown that components in beans can act as antioxidants and anti-inflammatory agents and can improve heart health. Other potential health benefits include reduced blood sugar levels, lower blood pressure, and improved gut health, and the fiber and healthy starches in beans can help people feel full, helping to prevent overeating and aiding in weight loss. Growing and harvesting beans is relatively simple, and the process is similar for most types of beans. One exception is green beans, which are best harvested while immature, because pods that are bulging are past their peak. Properly dried beans are shipped all over the world and can be stored for a long period of time without refrigeration. They can be sold dry as is, canned, or processed into various products. Some examples of processed bean products include baked beans, bean pastes, puffed snacks, refried beans, rehydrated beans, and bean flours.
There is increasing demand for beans all over the world due to many factors, including greater consumer awareness of the health benefits of plant-based diets and a need for food with a long shelf life during the COVID-19 pandemic. Consumers are looking for ways to replace less healthy meat and dairy products, and beans are an ideal food for doing that. With demand continuing to grow and research moving forward at a rapid pace, there is a need for new testing methods to meet the challenges of optimizing bean breeding, growing, harvesting, and processing. Traditional methods are often expensive, time-consuming, and impractical for use on a large scale. One method that is fast, non-invasive, and suitable for large-scale testing, and that has shown potential for measuring parameters of interest in beans, is NIR spectroscopy. Applications that have been examined using NIR spectroscopy include:


  • Protein 
  • Moisture 
  • Starch 
  • Fat/Oil 
  • Tannins  
  • Vicine 
  • Convicine 
  • Total polyphenols 
  • Lipids 
  • Ash 
  • Carbohydrates 
  • Total dietary fiber 
  • Seed weight 
  • Hydration capacity 
  • Alcohol insoluble solids 
  • Dry matter 
  • Sensory attributes 
  • Amylose 
  • Calcium 
  • Magnesium 
  • Germination time 
  • Water content 
  • Ascorbic acid 
  • Total isoflavones 
  • Mineral and protein content based on contrasting tannins 
  • Discrimination between arabica and robusta green coffee beans 
  • Coffee roasting levels 

Summary of Published Papers, Articles, and Reference Materials 

Application of Infrared Spectroscopy for The Prediction of Nutritional Content and Quality Assessment of Faba Bean (Vicia faba L.) 

There is increasing demand for functional food products that have the potential to provide health benefits. Modern consumers are more connected than ever to information about nutritional content and health benefits. One food with growing consumer interest and demand is faba bean. Faba bean is one of the world’s oldest cultivated crops and is also known as broad bean and fava bean. The high levels of antioxidant and phenolic compounds in faba beans are linked to numerous health benefits, such as protection against free radicals, antihypertensive benefits, and anticancer activity. Traditionally consumed in the Middle East and Southeast Asia, faba beans have seen steadily increasing production in many developed countries over the last few decades. The COVID-19 pandemic has also fueled demand for faba beans due to increasing health awareness, a desire for immune system strengthening, and the long shelf life of faba beans, which enables both exporting and storage by consumers. Increased demand and production have created a need for methods to assess the nutritional quality and bioactive compounds in faba beans. Traditional methods for analyzing parameters of interest are often expensive and time-consuming, require the use of toxic chemicals and solvents, and are impractical for large-scale testing. One method that has been extensively studied for measuring nutritional and bioactive components in faba beans is NIR spectroscopy. NIR spectroscopy has the advantages of being fast and non-invasive, requiring little or no sample preparation, measuring multiple parameters with a single scan, and being suitable for large-scale testing.
Mid-infrared (MIR) spectroscopy has been studied as well and while it is not as well-suited for the quantitative measurement of parameters of interest in faba beans as NIR, it does have a larger array of absorption peaks for a range of chemical bonds, making it a powerful tool for analyzing certain kinds of molecular changes in faba beans and other foods.  This review paper discusses and analyzes studies that have been performed to analyze faba beans using both NIR and MIR spectroscopy. 

NIR spectroscopy has historically been the dominant form of infrared spectroscopy used for food analysis due to low instrumentation cost, high signal-to-noise ratio, and greater penetration into the sample matrix due to the shorter wavelengths used compared with mid-infrared. Shortwave NIR spectroscopy can penetrate centimeters into a sample and can even be used for transmission through certain solids, such as whole grains. Longwave NIR spectroscopy has shorter penetration and is better suited for reflectance surface analysis of homogeneous samples. Technological advances in NIR spectrometers have also enabled their use as both portable instruments and on-line process measurement tools. The term proximate nutritional composition refers to the broad classes of macronutrients that compose the majority of food. These include moisture, protein, starch, ash, oil, crude fat, and crude fiber. NIR spectroscopy has been examined for the determination of numerous proximate nutritional composition parameters in faba beans as well as for discrimination analysis of both varieties and growing locations, leaf analysis, and root analysis. Shown below is a list of reviewed application studies for various parameters in faba beans.

Analyte | Matrix | Accuracy

Protein | Milled Seed | RMSE 0.56%, R2 0.96
Protein | Not Reported | CV 1.13%
Protein | Milled Seed | RMSECV 0.34%
Protein | Whole Seed | RMSECV 0.60%
Moisture | Milled Seed | RMSE 0.30%, R2 0.93
Starch | Milled Seed | RMSECV 0.72%, R2 0.86
Starch | Whole Seed | RMSECV Not Reported
Oil | Milled Seed | RMSECV 0.17%
Oil | Whole Seed | RMSECV 0.18%
Tannins | Whole Seed | SEP 0.54%
Vicine and Convicine | Flour | SECV 0.094%
Total Polyphenols | Milled Seed | RMSECV 0.40 mg/g
Total Polyphenols | Whole Seed | RMSECV 0.42 mg/g
Glycine Betaine | Leaflets | RPD 1.81

Protein and moisture were among the first NIR spectroscopic applications developed for food products; the first protein study shown above and the moisture study were both conducted in 1978. Two separate models were developed: protein using the ratio of absorption at 2180 nm to absorption at 2100 nm, and moisture using the ratio of absorption at 1940 nm to absorption at 1800 nm. Both models showed a correlation coefficient higher than 0.90, and considering the limited instrumentation available at the time, the prediction error and correlation are excellent. Another study used a limited set of fifty samples to build a calibration model for protein and showed good correlation, with validation predictions showing a standard deviation of 0.28% between values obtained from the NIR calibration model and the reference Kjeldahl method and a coefficient of variability of 1.13%. Other protein studies included one that determined the protein content of faba beans in an attempt to optimize crop combinations of various plants for the greatest protein yield per acre, and one that compared protein determination in ground faba bean powder with protein determination in intact seeds. Results and correlation were far better for the ground powder than for intact seeds, with the powder model showing a correlation coefficient of 0.94 and an RMSECV of 0.34% and the intact seed model showing a correlation coefficient of 0.76 and an RMSECV of 0.60%. The higher error in the intact seed model is most likely due to greater heterogeneity and to light penetrating only the outer seed surface.
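The two-wavelength ratio models described above can be sketched as a straight-line fit of the reference value against the absorbance ratio. The Python sketch below is illustrative only: the function names, the ordinary least-squares fit, and all sample numbers are assumptions made for the example, not details taken from the 1978 study.

```python
def fit_ratio_calibration(abs_numerator, abs_denominator, reference):
    """Least-squares fit of reference = slope * (A_num / A_den) + intercept."""
    ratios = [num / den for num, den in zip(abs_numerator, abs_denominator)]
    n = len(ratios)
    mean_ratio = sum(ratios) / n
    mean_ref = sum(reference) / n
    sxx = sum((r - mean_ratio) ** 2 for r in ratios)
    sxy = sum((r - mean_ratio) * (y - mean_ref) for r, y in zip(ratios, reference))
    slope = sxy / sxx
    intercept = mean_ref - slope * mean_ratio
    return slope, intercept


def predict_from_ratio(a_numerator, a_denominator, slope, intercept):
    """Predict the constituent value for one sample from its two absorbances."""
    return slope * (a_numerator / a_denominator) + intercept


# Made-up absorbances at 2180 nm and 2100 nm with made-up protein values (%).
a2180 = [0.52, 0.55, 0.58, 0.61]
a2100 = [0.50, 0.50, 0.50, 0.50]
protein_ref = [26.0, 27.5, 29.0, 30.5]

slope, intercept = fit_ratio_calibration(a2180, a2100, protein_ref)
predicted = predict_from_ratio(0.56, 0.50, slope, intercept)
print(slope, intercept, predicted)
```

The same two functions would apply to the moisture model by passing absorbances at 1940 nm and 1800 nm instead.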

Starch is another important parameter, not only in faba beans but in all grain and legume products, and it plays an important role in determining overall nutritional quality. One study examined the determination of starch in ground faba beans using over two hundred samples and showed reasonable results, with a correlation coefficient of 0.86 and an RMSECV of 0.72%. The same study also analyzed oil content in both ground and whole faba beans. Results were similar for both types of samples, with correlation coefficients of 0.66 and nearly identical RMSECV values of 0.17% and 0.18%. Considering that the range of values for oil was small, from 0.48% to 1.99%, these results are reasonable and the models are considered adequate for screening purposes.

Polyphenols are one of the major groups of phytochemicals in faba beans. They are known for their health benefits, especially for positive cardiovascular effects. NIR spectroscopy has been examined for predicting the total polyphenol content in ground faba bean, with a correlation coefficient of 0.79 and an RMSECV of 0.40 mg/g. These results show potential to replace the traditional reference method, the Folin-Ciocalteu assay, which is time-consuming and expensive to implement. Tannins are a complex group of polyphenols that are considered anti-nutritive because they can reduce the efficiency of nutrient uptake and metabolism. It is important to consider tannin concentration when developing new faba bean varieties, and one study examined using NIR spectroscopy for this purpose. Sixty whole faba bean samples were used with a range from 0.01% to 7% w/w (although no samples from 1% to 3.5% were in the calibration), and good correlation was obtained, with a correlation coefficient of 0.93 and an SEP of 0.54%. Tannins are largely contained in the seed coat, which likely explains the strong correlation.

Vicine and convicine are alkaloid glycosides that can cause favism, a hemolytic condition, in individuals with a deficiency of the enzyme glucose-6-phosphate dehydrogenase (G6PD). Although the concentration of these compounds in faba beans is typically low, at around 0.6% to 0.9% w/w, one successful study reported good correlation in faba bean flour, with a correlation coefficient of 0.968 and an RMSECV of 0.094%. It must be noted that calibrations for low-concentration micronutrients in faba beans and other foods may actually be measuring a secondary correlation of the micronutrients with certain macronutrients. While such a correlation is acceptable, it must be examined carefully and properly validated to determine whether the correlation is real.
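One simple first check for the kind of secondary correlation mentioned above is to see how strongly the reference values of the low-concentration analyte track the reference values of a macronutrient across the calibration set. The sketch below computes a Pearson correlation coefficient for that purpose; the function name and all numbers are made up for illustration, and a high value would only suggest, not prove, that a calibration is relying on the macronutrient signal.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up reference values: vicine + convicine (% w/w) and protein (%).
vicine_convicine = [0.60, 0.70, 0.80, 0.90]
protein = [26.0, 29.0, 27.0, 31.0]

r = pearson_r(vicine_convicine, protein)
print(f"correlation between analyte and protein reference values: {r:.3f}")
```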

Other potential applications of NIR spectroscopy in faba beans include authentication of variety and growing area, leaf analysis of carbon and nitrogen, and root analysis. Mid-IR spectroscopy has also been examined and has shown good potential for a number of applications. It is better suited than NIR for molecular-level analyses such as protein secondary structure, polymer characterization, starch crystallinity, and starch granular architecture. Mid-IR is also good for certain types of discrimination analysis, such as distinguishing different colors of beans, cultivars, growing years, and high and low tannin varieties. While Mid-IR does contain a larger array of specific absorption peaks for a range of functional groups, its use is limited by low light penetration, the need to use Attenuated Total Reflectance (ATR) to increase signal amplitude, poor suitability for portable and on-line applications, and difficulty of use for quantitative measurements. As more applications using infrared spectroscopy are studied and developed, there will be increased use of both NIR and Mid-IR spectroscopy to analyze faba beans, and many potential applications could use both methods in conjunction with each other.

Application of infrared spectroscopy for the prediction of nutritional content and quality assessment of faba bean (Vicia faba L.) – Johnson – 2020 – Legume Science – Wiley Online Library 


Near-Infrared Spectroscopy (NIRS) Applied to Legume Analysis: A Review 

Legumes are a very important food in the human diet.  They are known for their health benefits and high nutritional value.  About twenty types of legumes are used as dry grains for human nutrition in many parts of the world and are sources of complex carbohydrates, protein, dietary fiber, vitamins, and minerals.  These include common beans, peas, chickpeas, and lentils.  Consumption of these products is increasing every year and there is a need to develop methods for analyzing parameters of interest in legumes.  Conventional methods for determining nutritional composition in legumes are time-consuming, expensive, often require the use of toxic chemicals and solvents, require sample destruction, and are impractical for implementing for large-scale testing.  One method that has been studied extensively for replacing traditional methods is NIR spectroscopy.  While NIR spectroscopy does require collecting spectra of samples, performing reference tests, and building chemometric models that correlate the NIR spectra to parameters of interest, once this process is completed the advantages are enormous.  NIR spectroscopy is fast, non-invasive, requires little or no sample preparation, does not destroy samples, and has the ability to measure multiple parameters with a single light scan once calibrations are made.  There have been a number of application studies to determine the feasibility of using NIR spectroscopy as an analytical tool for analysis of legumes and many of these studies are reviewed here.  Shown below is a list of application studies for various types of legumes. 


Faba Bean 

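Sample | Parameter | Accuracy

244 milled & intact seeds | Protein (Milled) | RPD = 4, R2 = 0.97
244 milled & intact seeds | Protein (Whole) | RPD = 2 – 2.5
244 milled & intact seeds | Starch (Milled) | RPD = 3, R2 = 0.93
244 milled & intact seeds | Starch (Whole) | RPD = 2 – 2.5
244 milled & intact seeds | Polyphenols (Milled) | RPD = 2 – 2.5
244 milled & intact seeds | Oil | RPD = 2 – 2.5, R2 = 0.89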

This application study for analyzing faba beans using NIR spectroscopy showed excellent results for protein in the milled samples. RPD is defined as Residual Prediction Deviation, the standard deviation of observed values divided by the Root Mean Square Error of Prediction (RMSEP). It is a metric of model validity and is considered more objective than RMSEP as well as more easily comparable across different model validation studies. Protein in intact seed samples showed lower correlation, most likely due to large differences in the size of the particles and the fact that milled samples are more homogeneous. The starch model for milled samples also showed good predictive capacity. The whole seed model for starch, the milled seed model for polyphenols, and the oil model did not show good enough results for practical use.
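The RPD definition given above (standard deviation of the observed values divided by RMSEP) can be written out as a short calculation. The Python sketch below is illustrative: the function names and numbers are made up, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which form the reviewed studies used.

```python
import math


def rmsep(observed, predicted):
    """Root Mean Square Error of Prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rpd(observed, predicted):
    """Standard deviation of observed values divided by RMSEP."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Sample standard deviation (n - 1); an assumption for this sketch.
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    return sd_obs / rmsep(observed, predicted)


# Made-up reference and NIR-predicted protein values (%).
observed = [24.0, 26.0, 28.0, 30.0, 32.0]
predicted = [24.5, 25.5, 28.5, 29.5, 32.5]

print(f"RMSEP = {rmsep(observed, predicted):.3f}")
print(f"RPD   = {rpd(observed, predicted):.2f}")
```

Because RPD scales the prediction error by the spread of the reference values, a model tested on a narrow range of samples can have a small RMSEP and still a low RPD.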


Soybean

Sample | Parameter | Accuracy

153 whole grains | Crude Protein | R2 = 0.97, RMSEC = 0.61, RMSEP = 0.76
153 whole grains | Fat | R2 = 0.97, RMSEC = 0.36, RMSEP = 0.41
80 samples | Total Dietary Fiber | R2 = 0.80, RMSEC = 1.7, RMSEP = 0.86

The two application studies shown here determined crude protein, fat, and total dietary fiber in soybean.  Results for protein and fat were excellent and demonstrate the potential of NIR spectroscopy to replace traditional reference methods for measuring these parameters in soybeans.  The model for total dietary fiber was less accurate but still has an acceptable correlation coefficient and reasonable error in predictions, indicating that this model could be used for screening purposes.  

Chickpea and Pea 

Sample | Parameter | Accuracy (Milled/Whole)

156 pea samples | Crude Protein | R2 = 0.99/0.94, SECV = 0.27/0.57
156 pea samples | Moisture | R2 = 0.90/0.51, SECV = 0.19/0.39
151 chickpeas | Moisture | R2 = 0.77/0.84, SECV = 0.36/0.31
151 chickpeas | Ash | R2 = 0.77/0.72, SECV = 0.19/0.39
151 chickpeas | Seed Weight | R2 = 0.89/0.88, SECV = 1.50/1.50
151 chickpeas | Hydration Capacity | R2 = 0.82/0.90, SECV = 3.33/2.65
151 chickpeas | Percentage of Husk | R2 = 0.64/0.74, SECV = 5.46/5.05
151 chickpeas | Peeling Efficiency | R2 = 0.59/0.80, SECV = 1.23/0.85
151 chickpeas | Cooking Quality | R2 = 0.53/0.71, SECV = 2.93/2.40

The calibration models for pea showed excellent correlation for crude protein in both types of samples; for moisture, correlation was good in milled peas but poor in whole peas. In general, the chickpea calibration models were better for the ground samples when measuring chemical composition but better for the whole samples when measuring physical or functional properties. Grinding makes the samples more homogeneous, so the chemical properties are more easily determined, but it likely changes the physical properties.

Fresh and Frozen Peas 

Sample | Parameter | Accuracy (Fresh/Frozen)

114 samples | Alcohol Insoluble Solids | R2 = 0.96/0.84
114 samples | Dry Matter | R2 = 0.97/0.96
114 samples | Sensory Attributes | R2 = 0.97/0.97
114 samples | Firmness of Flesh | R2 = 0.83 (Fresh)
114 samples | Sweet Flavor | R2 = 0.82 (Fresh)
114 samples | Strength of Flavors | R2 = 0.76 (Fresh)
114 samples | Brightness of Color | R2 = 0.89 (Fresh)

Results from this study were good and demonstrated the ability of NIR spectroscopy to measure chemical and physical indicators of maturity in peas.  Decent correlation was attained for sensory attributes for texture and flavor.  There is potential to use the methods developed here for on-line sorting of peas by degree of maturity in a pea processing factory. 

Dry Pea Flour 

Sample | Parameter | Accuracy

123 samples | Amylose | R2 = 0.95
123 samples | Resistant Starch | R2 = 0.76
123 samples | Digestible Starch | R2 = 0.80
123 samples | Total Starch | R2 = 0.88

This application study used Multiple Linear Regression (MLR) calibration models to predict amylose, resistant starch, digestible starch, and total starch in dry pea flours. Values predicted by the calibration models were in good agreement with the laboratory reference values for the parameters of interest, supporting the feasibility of the correlations and calibration models.
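As a rough illustration of what an MLR calibration involves, the sketch below fits a reference value to absorbances at a small number of selected wavelengths by ordinary least squares. It is a minimal sketch under stated assumptions: the function names, the choice of two wavelengths, and all absorbance and reference numbers are hypothetical and are not taken from the dry pea flour study.

```python
def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    k = len(rows[0])
    # Build the augmented normal equations (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict_mlr(coefs, x):
    """Predict one sample from its absorbances at the selected wavelengths."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Hypothetical absorbances at two selected wavelengths for five flour samples,
# with made-up reference values (%).
absorbances = [[0.40, 0.30], [0.45, 0.28], [0.50, 0.35], [0.55, 0.31], [0.60, 0.40]]
reference = [16.5, 17.6, 18.25, 19.45, 20.0]

coefs = fit_mlr(absorbances, reference)
print([round(c, 3) for c in coefs])
print(round(predict_mlr(coefs, [0.52, 0.33]), 2))
```

MLR of this kind works with a handful of wavelengths; with full spectra, where neighboring wavelengths are highly correlated, methods such as PLS are generally used instead.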

Common Bean 

Sample | Parameter | Accuracy (Dispersive/FT-NIR)

54 genotypes (White and Colored) | Protein | R2 = 0.96-0.97
54 genotypes (White and Colored) | Starch | R2 = 0.95-0.96
54 genotypes (White and Colored) | Amylose | R2 = 0.94-0.95

This study compared two different types of NIR spectrometers for analyzing protein, starch, and amylose in different genotypes of common bean.  Correlation was higher and predictive performance was better for models using an FT-NIR spectrometer than for models using a dispersive spectrometer.   

Sample | Parameter | Accuracy

121 samples | Moisture | R2 = 0.94, SEP = 0.39
121 samples | Starch | R2 = 0.88, SEP = 0.9
121 samples | Protein | R2 = 0.94, SEP = 0.56
121 samples | Fat | R2 = 0.74, SEP = 0.13

An independent validation set was used to confirm the validity of the models and there was good agreement between the predicted values from the NIR calibrations and reference methods, especially for starch and protein. 

Sample | Parameter | Accuracy (Whole/Milled)

90 seed coats | Dietary Fiber | SEP = 1.23/2.60
90 seed coats | Uronic Acids | SEP = 1.40/1.49
90 seed coats | Ash | SEP = 2.03/3.49
90 seed coats | Calcium | SEP = 2.40/3.57
90 seed coats | Magnesium | SEP = 1.33/1.50

The models developed in this application study showed sufficient results for screening of ash and calcium using NIR spectroscopy.  Samples scanned were ground husk and all models with the exception of uronic acids, which showed very poor correlation, could be used for rough screening and classifying seed husks based on the parameter of interest.  All studies discussed here have shown the potential to use NIR spectroscopy as a replacement for traditional expensive and time-consuming reference methods for determining parameters of interest in legumes. 


Near-Infrared Spectroscopy and Aquaphotomics for Monitoring Mung Bean (Vigna radiata) Sprout Growth and Validation of Ascorbic Acid Content 

Mung bean is an important food commodity, especially in Asia. It is an inexpensive protein source in cereal-based diets and can be eaten whole, cooked, fermented, or milled into flour. Mung bean flour is used to make multiple products, including noodles, breads, and various bakery products. In addition to significant amounts of protein, mung bean also contains fiber (including soluble fiber), potassium, vitamins, and minerals. Phosphorus content is significant and is present in the form of phytate, an anti-nutritive component that binds with minerals to form insoluble compounds. However, processes such as germination, soaking, fermentation, and cooking have all been shown to reduce these anti-nutritive effects of phytate. During sprouting, many nutritional compounds are formed, and one significant compound is ascorbic acid, better known as Vitamin C. Ascorbic acid content is significantly affected by germination time. Initial content has been reported as low as 3 mg/100 g, and the final content after germination can go as high as 98 mg/100 g. In addition to ascorbic acid, several quality components can be used to determine germination time, such as water content, pH, and conductivity. However, determining these components in sprouting mung beans is time-consuming, requires sample destruction, uses toxic chemicals and solvents for some tests, and is impractical for large-scale testing. NIR spectroscopy was examined as a method for determining germination time and ascorbic acid content in mung bean. NIR spectroscopy is often a correlative method: while the calibration models may not measure the parameters of interest directly, a measurable component (such as water content) that is correlated with them (such as germination time and ascorbic acid) can be used to determine them indirectly.
In such cases, models must be carefully examined and validated to ensure proper correlation.  One such method for doing this is known as aquaphotomics, which characterizes complex aqueous systems through changes in the hydrogen bonding network of water molecules from 1300 nm to 1600 nm.  A simpler explanation is that low concentration components that are below the threshold of detection for NIR spectroscopy can in fact be measured indirectly if they cause a change in water molecules, which are highly absorbing of light in the near-infrared range.   

Mung beans from Thailand were procured for the study. Six separate 400 g packages were homogenized and separated into twenty-one different holders, each containing about 100 g of beans. Germination times ranged from zero to one hundred twenty hours. A standard soaking, draining, and incubation process with constant temperature and humidity was used for germination. At each desired germination time, the beans in that holder were dried and scanned using an NIR spectrometer. Twenty bean samples from each holder were scanned in triplicate from 900 nm to 1700 nm. After bean sprout scanning, 100 g of each sprout was weighed, mixed with distilled water, crushed, and filtered. The filtrate was divided into portions for reference tests for pH, conductivity, and ascorbic acid content. Another portion was placed in a 1 mm quartz cuvette and scanned using the same NIR spectrometer. Water content was determined by drying bean sprouts in an oven. Various pre-processing methods were applied to the spectral data before chemometric analysis. NIR spectra of the bean sprouts were used to create a Linear Discriminant Analysis (LDA) model to classify the germination time of bean sprouts at 24 h intervals. NIR spectra of the filtrate were used with reference values for germination time, water content, and ascorbic acid to create Partial Least Squares (PLS) models correlating the spectra to these parameters of interest. Models used the wavelength range from 1300 nm to 1600 nm.
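The data handling described above (triplicate scans, restriction to the 1300 nm to 1600 nm range, and pre-processing before modeling) can be sketched as follows. This is only an illustration: the study does not name its pre-processing methods here, so averaging the triplicates and applying Standard Normal Variate (SNV) scaling are assumptions chosen as common examples, and the function names, wavelength grid, and absorbance values are made up.

```python
import math


def average_scans(scans):
    """Pointwise mean of replicate scans of one sample."""
    return [sum(values) / len(values) for values in zip(*scans)]


def select_range(wavelengths, spectrum, low_nm=1300, high_nm=1600):
    """Keep only the points whose wavelength lies in [low_nm, high_nm]."""
    pairs = [(w, a) for w, a in zip(wavelengths, spectrum) if low_nm <= w <= high_nm]
    return [w for w, _ in pairs], [a for _, a in pairs]


def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / sd for a in spectrum]


# Coarse made-up wavelength grid (nm) and three made-up replicate scans.
wavelengths = list(range(900, 1701, 100))
triplicate = [
    [0.10, 0.12, 0.15, 0.20, 0.30, 0.45, 0.40, 0.25, 0.18],
    [0.11, 0.13, 0.16, 0.21, 0.31, 0.46, 0.41, 0.26, 0.19],
    [0.12, 0.14, 0.17, 0.22, 0.32, 0.47, 0.42, 0.27, 0.20],
]

mean_spectrum = average_scans(triplicate)
band_wavelengths, band_absorbance = select_range(wavelengths, mean_spectrum)
band_snv = snv(band_absorbance)
print(band_wavelengths)
print([round(v, 3) for v in band_snv])
```

The pre-processed spectra would then be passed to the LDA or PLS modeling step, which is not shown here.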

Sample | Parameter | Accuracy

Whole Beans | LDA Classification Based on 24 Hour Intervals of Germination Time | 100% Accuracy
Bean Sprout Extract | Germination Time (h) | R2 = 0.960, RMSEC = 8.18
Bean Sprout Extract | Water Content (%) | R2 = 0.966, RMSEC = 2.34
Bean Sprout Extract | Ascorbic Acid (mg/100 g) | R2 = 0.962, RMSEC = 22.9

The results of this study show promise for determining germination time, water content, and ascorbic acid in mung bean.  However, there was a fair amount of error in the predicted values for these parameters despite the high correlation coefficients.  One likely reason for this is that reference tests were only performed for samples every twenty-four hours, creating a large number of samples that likely exhibited spectral differences but had the same reference values for the parameters of interest.  Results are likely to improve with more frequent sampling and reference tests.  Aquaphotomics analysis did determine a good correlation between water content, germination time, and ascorbic acid content.  More work is needed before using these models in a practical setting but the potential was demonstrated to use NIR spectroscopy as a fast and non-invasive method for determining germination time, water content, and ascorbic acid content in mung beans. 

Near-Infrared Spectroscopy and Aquaphotomics for Monitoring Mung Bean (Vigna radiata) Sprout Growth and Validation of Ascorbic Acid Content – Sensors

Comparison and Application of Non-Destructive NIR Evaluations of Seed Protein and Oil Content in Soybean Breeding 

Soybean is a major crop grown worldwide and plays an important role in agricultural production, industrial biofuel manufacturing, and international trade. On average, the dry weight of soybeans is around 40% protein and 20% oil, with most of the remainder consisting of carbohydrates, minerals, and water. There are a number of reasons why analysis of nutritional components and other important traits is necessary. Breeders need to assess large numbers of breeding materials for multiple traits in a short period of time to select the desired genotypes in breeding populations with complicated variations. Soybean usage is very dependent on seed composition. High-oil breeds are used for vegetable oil processing and biodiesel manufacturing, while high-protein breeds are preferred for human diets and soy food products. Determining protein and oil composition requires wet chemistry methods such as the Kjeldahl method for protein and the Soxhlet method for oil. While accurate, both methods are time-consuming and expensive, require sample destruction and the use of toxic chemicals, and are impractical for large-scale testing. NIR spectroscopy was examined as a method for determining protein and oil content. Two separate spectrometers with pre-built calibrations for protein and oil in soybeans were used in the study. One instrument is for laboratory use while the other is portable and can be used in the field. Whole seed samples of sixteen different genotypes were procured for the study. For four of the sixteen genotypes, additional samples were taken from either a different harvesting year or location to examine the variability from different seed sources. Protein and oil content were analyzed using the laboratory NIR spectrometer, the portable NIR spectrometer, and wet chemistry methods. In total, seven hundred and sixty soybeans were scanned with the spectrometers.

Laboratory NIR Spectrometer – Correlation with Wet Chemistry Methods 

Protein R2 = 0.977 
Oil R2 = 0.960 

Correlation with the reference methods was excellent when using the laboratory NIR spectrometer but much poorer when using the portable NIR spectrometer.  However, this was expected as the laboratory NIR spectrometer used calibrations developed and updated by the manufacturer while the calibrations used for the portable instrument were the original installed calibrations.  After analysis of the spectral data, it was determined that both genotype and particle size of the seeds had significant effects on the predictions.  After analysis of the variations and bias corrections to the equations used for the calibrations, both correlation coefficients for the protein and oil models for the portable instrument increased to higher than 0.75.  Results were validated by predicted values from an additional two hundred and forty samples scanned with the portable instrument.  The study showed that the laboratory instrument could be used for quantitative analysis of protein and oil in soybeans while the portable instrument could be used for screening single plants in breeding selection.   
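The text above mentions bias corrections to the calibration equations of the portable instrument without giving their form. One common form is a slope and bias adjustment, in which the instrument's predictions are regressed against reference values and the resulting line is applied to later predictions. The sketch below shows that idea only; it is an assumption that the study used anything similar, and the function names and numbers are made up.

```python
def fit_slope_bias(predicted, reference):
    """Least-squares line reference = slope * predicted + bias."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    spp = sum((p - mean_p) ** 2 for p in predicted)
    spr = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    slope = spr / spp
    bias = mean_r - slope * mean_p
    return slope, bias


def apply_correction(raw_prediction, slope, bias):
    """Correct one raw instrument prediction."""
    return slope * raw_prediction + bias


# Made-up protein values (%): raw portable-instrument predictions and
# wet chemistry reference values for the same samples.
portable_raw = [36.0, 38.0, 40.0, 42.0]
wet_chemistry = [37.5, 39.0, 40.5, 42.0]

slope, bias = fit_slope_bias(portable_raw, wet_chemistry)
corrected = apply_correction(39.0, slope, bias)
print(slope, bias, corrected)
```

A correction fitted this way should be checked on samples that were not used to fit it, as the study did with its additional two hundred and forty samples.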

Comparison and Application of Non-Destructive NIR Evaluations of Seed Protein and Oil Content in Soybean Breeding – Agronomy

Use of Near-Infrared Reflectance Spectroscopy for the Estimation of the Isoflavone Contents of Soybean Seeds 

There has been increased interest in the composition and physiological functions of food products in recent years as consumers look for healthy and alternative foods in their diet. With this increased interest, it is important for food sellers to obtain and promote food with higher nutritional composition. Isoflavones are one family of compounds found in soybeans that may be associated with lower rates of postmenopausal cancer in women and may help to prevent osteoporosis. The traditional method for analyzing isoflavone content is HPLC, which is effective but time-consuming and expensive, requires sample destruction and the use of toxic chemicals, and is impractical for large-scale testing. NIR spectroscopy was examined for determining isoflavone content in soybeans. Forty-eight soybean samples were procured from different growing areas in Japan for the study. All samples were scanned using an NIR spectrometer from 1100 nm to 2500 nm at 2 nm intervals. After scanning, all samples were milled and the powdered samples were scanned again using the NIR spectrometer. Isoflavone content was determined by HPLC. Individual isoflavone components were determined as well. Various pre-processing methods were applied to the NIR spectra before chemometric analysis. Multiple Linear Regression (MLR) calibration models correlating the NIR spectra to the parameters of interest were created for total isoflavone and for the individual components. The NIR spectra for thirty-six samples were used as a calibration set and the remaining twelve samples were used as a validation set.
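The calibration and validation split described above, and the Standard Error of Prediction (SEP) used to report validation results, can be sketched as follows. This is an illustration under stated assumptions: how the study assigned samples to each set is not given here, so the simple positional split is hypothetical, SEP is computed in its common bias-corrected form, and the function names and numbers are made up.

```python
import math


def split_calibration_validation(samples, n_calibration=36):
    """Split a list of samples into calibration and validation sets by position."""
    return samples[:n_calibration], samples[n_calibration:]


def bias_and_sep(reference, predicted):
    """Mean prediction error (bias) and bias-corrected standard error of prediction."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    bias = sum(residuals) / n
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    return bias, sep


# Forty-eight placeholder sample IDs split 36/12, as in the study design.
sample_ids = list(range(48))
calibration_ids, validation_ids = split_calibration_validation(sample_ids)

# Made-up total isoflavone values (mg/100 g) for four validation samples.
hplc_reference = [300.0, 350.0, 400.0, 450.0]
nir_predicted = [310.0, 340.0, 420.0, 450.0]

bias, sep = bias_and_sep(hplc_reference, nir_predicted)
print(len(calibration_ids), len(validation_ids))
print(f"bias = {bias:.2f} mg/100 g, SEP = {sep:.2f} mg/100 g")
```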

Total Isoflavone 

Intact Samples (mg/100 g) | R2 = 0.92, SEP = 38.51
Powdered Samples (mg/100 g) | R2 = 0.85, SEP = 63.43

Results for total isoflavone showed high correlation for the powdered samples and decent correlation for the intact samples.  The range of values for the calibration set was from 133.44 mg/100 g of dry weight to 633.42 mg/100 g of dry weight.  Independent predictions using the validation set confirmed the validity of the models.  Models were also created for the individual isoflavone components and while some of them had high correlation coefficients, it is almost certain that these models are correlating to something besides the individual isoflavone components as the concentration of these parameters is far below the threshold of detection for NIR spectroscopy.  Some components had a mean value of less than 10 mg/100 g and the model for these is definitely correlating another parameter which is measurable using NIR spectroscopy.  It is possible that the individual components are affecting macronutrient concentration which is then the basis for the correlation to the NIR spectra. However, such an indirect correlation must be examined and validated carefully.  This study showed that measuring the total isoflavone content in soybeans is feasible in both intact and powdered samples of soybeans. 

Use of Near-infrared Reflectance Spectroscopy for the Estimation of the Isoflavone Contents of Soybean Seeds: Plant Production Science: Vol 11, No 4

Seed Mineral Composition and Protein Content of Faba Beans (Vicia faba L.) with Contrasting Tannin Contents 

Faba bean is widely grown around the world.  It is used as a source of protein in human diets and as a fodder and forage crop for animals, and it has a good ability to fix atmospheric nitrogen.  It is also a good source of energy, fiber, and minerals.  Protein content is high, ranging from 24% to 35% of the seed dry matter.  Mineral content is especially important because an estimated two-thirds of the world’s population is at risk of deficiency in one or more essential minerals, such as calcium, magnesium, zinc, and potassium.  One course of action that has been studied to help address mineral deficiencies in humans is genetic biofortification through plant breeding.  The technique involves screening and developing micronutrient-rich germplasm, conducting genetic studies, and developing molecular markers to facilitate breeding.  While this method is effective, it requires testing that can be expensive, time-consuming, and difficult to implement on a large scale.  One potential way to facilitate these kinds of studies and tests is to investigate variations in chemical and genetic composition by genotype, growing area, and other environmental factors and then correlate those with an easily measurable macronutrient component.  In this study, faba bean genotypes with contrasting tannin contents, grown in different areas, were investigated for variation in mineral components and protein using NIR spectroscopy and other testing methods.  Twenty-five different faba bean genotypes grown at three different locations in Canada during two separate growing seasons were procured for the study.  Each location had a different soil type as well.  Plot samples were threshed, washed, and ground to a fine powder.  Mineral content was analyzed using the standard method of Inductively Coupled Plasma Mass Spectrometry (ICP-MS).  Protein content was determined using an NIR spectrometer with a pre-built calibration for protein.  
It is known that white-flowered genotypes contain low tannins while spotted-flowered genotypes contain high tannins.  The combination of year and location was considered as “environment” and two different data analysis methods were applied.  Mixed Model Analysis of Variance (ANOVA) was used to determine variance, with genotype as a fixed effect while location, year, and replications nested within the site-year were considered random effects.  Principal Component Analysis (PCA) was used to characterize associations among genotypes, mineral elements, and protein.  The data analysis indicated that both seed mineral concentrations and protein were affected by environmental variation and the tannin profile.  Specifically, low-tannin white-flowered faba beans were found to be rich in calcium, magnesium, iron, and zinc, minerals that are known to be deficient in the diets of many people.  A higher protein content was also found in these beans.  The high heritability observed for mineral concentrations in the seeds suggests that genetic improvement is possible for these traits.  While more study and a deeper examination would be required, this study shows the potential to use NIR spectroscopy as a tool for relating protein concentration in faba beans to tannin and mineral concentration to assist in breeding.   

Seed Mineral Composition and Protein Content of Faba Beans (Vicia faba L.) with Contrasting Tannin Contents: Agronomy

Robust Prediction Performance of Inner Quality Attributes in Intact Cocoa Beans Using Near-Infrared Spectroscopy and Multivariate Analysis 

Chocolate is made from raw cocoa beans that are extracted from the cocoa tree pod and then fermented, roasted, and ground into processed products.  It can be formulated as a paste or a solid from a combination of roasted or ground cocoa and fat.  Chocolate is typically sweetened with additional sugar and other ingredients, formed into bars, and eaten as confectionery.  There are two quality classifications for cocoa beans: bulk cocoa, which is considered standard quality, and flavor cocoa, which is considered high quality.  Chocolate manufacturers need to check their incoming cocoa beans to verify their quality.  Fat and moisture content are considered the two primary quality parameters in cocoa beans.  Current methods for determining quality parameters in cocoa beans are time-consuming and expensive, require the use of toxic chemicals and solvents as well as sample destruction, and are impractical for large-scale testing.  Fat testing is done by the Soxhlet method, which is both time-consuming and solvent-based.  Moisture testing requires a drying and gravimetric method that takes well over an hour.  NIR spectroscopy was examined as a method for determining fat and moisture content in cocoa beans.  One hundred and ten bulk cocoa bean samples that were harvested from June to August from the same plantations in Indonesia were procured for the study.  Each bulk sample contained around 54 g of intact beans.  All samples were scanned using an NIR spectrometer from 1000 nm to 2500 nm with a scan interval of 0.2 nm.  Thirty-two scans were collected per reading and averaged into one spectrum per sample.  Fat and moisture content were determined for each sample using the standard Soxhlet and gravimetric methods.  Various pre-processing methods were applied to the spectral data before chemometric analysis.  Partial Least Squares (PLS) calibration models were created correlating the fat and moisture content to the NIR spectra. 

Fat: R2 = 0.86, RMSEP = 0.79
Moisture: R2 = 0.92, RMSEP = 0.41

The results indicate that NIR spectroscopy is a feasible method for determining fat and moisture content in cocoa beans.  Cross-validation was performed by removing spectra from the calibration models, recalculating the models without those spectra, and then using the new models to predict values from the removed spectra. Predictions were in good agreement with the reference method values which confirms the validity of the models.  Before using these models in a practical setting, further study and addition of data would be warranted.  Samples from more growing areas and from different harvest seasons would likely improve modeling results.  This study shows the potential to use NIR spectroscopy as a faster and cheaper alternative to traditional methods for determining fat and moisture content in cocoa beans. 
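
The cross-validation procedure described above (remove a spectrum, rebuild the model, predict the removed sample) can be sketched as a leave-one-out loop. This is only an illustration: it swaps in a one-variable least-squares line for the PLS model used in the study, and the absorbance and moisture values are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def leave_one_out_rmse(x, y):
    """Remove each sample in turn, refit, and predict the removed sample."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        errors.append((a + b * x[i]) - y[i])
    # Root mean square of the held-out prediction errors
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical single-wavelength absorbances and reference moisture values (%)
absorbance = [0.41, 0.45, 0.52, 0.58, 0.63, 0.70]
moisture = [5.9, 6.4, 7.1, 7.9, 8.3, 9.2]
print(f"RMSECV = {leave_one_out_rmse(absorbance, moisture):.3f} %")
```

Because every prediction is made by a model that never saw the held-out sample, the resulting error is a more honest estimate than the fit error on the calibration set itself.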

Robust prediction performance of inner quality attributes in intact cocoa beans using near infrared spectroscopy and multivariate analysis – PubMed

Rapid Prediction of Moisture Content in Intact Green Coffee Beans Using Near Infrared Spectroscopy 

Moisture is a very important quality parameter in green coffee beans and is strictly regulated by most countries that import and export coffee. The safe range for moisture is from 8% to 12.5% based on fresh matter. Moisture below 8% causes shrunken beans and an unwanted appearance. Moisture above 12.5% facilitates fungal growth and mycotoxin formation and can cause problems during storage and roasting. NIR spectroscopy was examined as a method for measuring moisture content in both Arabica and Robusta green coffee beans. Twelve sets of samples were used for the study: three Arabica species and four Robusta species of different origins for the calibration set, and two Arabica species and three Robusta species of different origins for the validation set. NIR diffuse reflectance spectra were collected from all samples from 1000 nm to 2500 nm at 2 nm intervals. Each individual spectrum consisted of the average of sixty-four scans. Three replicates were acquired for each sample and these spectra were averaged as well, resulting in one hundred and eight total spectra of the twelve different samples. Reference values were obtained for moisture and these were used with the NIR spectra to create Partial Least Squares (PLS) calibration models for moisture content. 

Moisture (Full Wavelength Range): R2 = 0.9850, RMSEP = 0.57%
Moisture (Selected Wavelengths): R2 = 0.9743, RMSEP = 0.77%

Two sets of PLS calibration models were created: one using the full wavelength range and the other using seven selected wavelengths (1155 nm, 1212 nm, 1340 nm, 1409 nm, 1724 nm, 1908 nm, and 2249 nm) that were chosen based on the correlation of the full-range model. Some of these are moisture-absorbing regions of the NIR spectrum and others correlate to organic compounds affected by a change in moisture. Prediction results on the validation set using both models demonstrated the feasibility of the measurement. Results were comparable for both models, and either could be applied in an on-line setting to determine moisture in green coffee beans. 
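
Reducing a full spectrum to the seven selected wavelengths amounts to picking the nearest points on the measurement grid. The sketch below assumes the grid starts exactly at 1000 nm with 2 nm steps, so odd wavelengths such as 1155 nm fall between two grid points. The tie-breaking rule (the lower grid point wins) and the flat placeholder spectrum are assumptions for illustration only.

```python
# Wavelength grid described in the text: 1000 nm to 2500 nm at 2 nm intervals
grid = list(range(1000, 2501, 2))

# The seven selected wavelengths (nm) listed in the text
selected_nm = [1155, 1212, 1340, 1409, 1724, 1908, 2249]

def nearest_index(grid_nm, target_nm):
    """Index of the grid point closest to target_nm (lower point wins ties)."""
    return min(range(len(grid_nm)), key=lambda i: abs(grid_nm[i] - target_nm))

def reduce_spectrum(spectrum, grid_nm, targets_nm):
    """Keep only the spectrum values nearest to the target wavelengths."""
    return [spectrum[nearest_index(grid_nm, t)] for t in targets_nm]

# Hypothetical flat spectrum standing in for a measured one
spectrum = [0.5] * len(grid)
reduced = reduce_spectrum(spectrum, grid, selected_nm)
print(len(grid), "points reduced to", len(reduced))
```

A reduced input like this is what makes a selected-wavelength model attractive for on-line use: far fewer variables need to be measured and processed per reading.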

Rapid Prediction of Moisture Content in Intact Green Coffee Beans Using Near Infrared Spectroscopy: Foods

Reliable Discrimination of Green Coffee Beans Species: A Comparison of UV-Vis-Based Determination of Caffeine and Chlorogenic Acid with Non-Targeted Near-Infrared Spectroscopy 

Coffee consumption is increasing every year across the world and adulteration is a common problem in the coffee market.  The two main types of coffee beans are Arabica and Robusta.  Arabica makes up around 58% of global coffee production while Robusta makes up the remaining 42%.  They differ in several aspects such as taxonomic classification, morphology, bean size and color, chemical compounds, and sensory evaluation.  Visual inspection of the beans has limitations because the physical characteristics can vary considerably between species and varieties due to different genotypes and environmental factors.  Certain varieties of Arabica also have sensory properties very similar to Robusta, such as mouthfeel and bitterness.  The average annual price of Arabica green coffee beans is around $2.51 per kg while the price of Robusta is around $1.63 per kg.  The price difference makes substituting Robusta for Arabica enticing, and adulteration can also include substituting less desirable varieties from different geographical regions.  Both NIR and UV-Vis spectroscopy were examined for discriminating between different species of green coffee beans.  UV-Vis spectroscopy has been used as a method to determine caffeine and chlorogenic acid content in coffee beans, and in this study it was also used to discriminate between Arabica and Robusta beans.  Seventy-four green coffee bean samples from different locations in Indonesia were procured for the study.  The samples were chosen specifically to represent different environmental factors, agricultural practices, and genetic characteristics.  They were sourced from thirty-eight different processing facilities during the same harvesting season.  Thirty-two samples were Arabica and forty-two were Robusta.  Caffeine and chlorogenic acid were determined using UV-Vis spectroscopy by standard methods and procedures.  
An FT-NIR spectrometer was used to scan all samples from 1000 nm to 2500 nm at 2 nm intervals.  Each sample was scanned sixty-four times and the scans were averaged into a single spectrum.  This process was repeated three times and the three spectra per sample were further averaged into one spectrum.  Various pre-processing methods were applied to the spectral data before chemometric analysis.  Linear Discriminant Analysis (LDA) was performed on both the UV-Vis spectra and NIR spectra as a model for discriminating between species.  In the case of the UV-Vis spectra, differences in values between species for caffeine and chlorogenic acid were analyzed as well.  Results are shown below. 


UV-Vis: 97.3% Correct Classification
NIR: 95.5% Correct Classification

The results for both sets of spectra show that both methods can be used to discriminate between Arabica and Robusta coffee beans.  There was some overlap in caffeine and chlorogenic acid values between the two species, indicating that these values alone cannot be used as a basis for classification.  While the results were slightly better using the UV-Vis spectra, it must be noted that UV-Vis spectroscopy is a far more labor-intensive method than NIR spectroscopy.  UV-Vis requires extensive sample preparation and the use of solvents and standard solutions.  By contrast, once a calibration model is created, NIR spectroscopy only requires collecting a spectrum for analysis, typically taking around thirty seconds per reading.  The results here show the potential to use NIR spectroscopy for classifying Arabica and Robusta coffee beans.  Further research should include beans of different species and varieties from different parts of the world. 
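
To illustrate how a two-class linear discriminant assigns a sample to a species, the sketch below fits Fisher's linear discriminant on two features and reports the fraction of correct classifications. It is a simplified stand-in for the study's LDA on full spectra: the two features (caffeine and chlorogenic acid, in %) and all sample values are hypothetical, and equal class priors are assumed for the decision threshold.

```python
def mean_vec(rows):
    """Mean of a list of (x, y) pairs."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def scatter(rows, m):
    """Within-class scatter of (x, y) pairs about mean m, as (sxx, sxy, syy)."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy

def fit_lda(class_a, class_b):
    """Fisher discriminant: w = Sw^-1 (mean_b - mean_a), midpoint threshold."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Closed-form inverse of the pooled 2x2 scatter matrix applied to (dx, dy)
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]

def predict(model, sample, label_a="Arabica", label_b="Robusta"):
    """Project the sample onto w and compare with the threshold."""
    w, threshold = model
    score = w[0] * sample[0] + w[1] * sample[1]
    return label_b if score > threshold else label_a

# Hypothetical (caffeine %, chlorogenic acid %) values, not data from the study
arabica = [(1.1, 6.0), (1.3, 6.8), (1.2, 6.4), (1.4, 7.1)]
robusta = [(2.0, 8.5), (2.3, 9.4), (2.1, 9.0), (2.4, 9.8)]
model = fit_lda(arabica, robusta)
labeled = [(s, "Arabica") for s in arabica] + [(s, "Robusta") for s in robusta]
accuracy = sum(predict(model, s) == lab for s, lab in labeled) / len(labeled)
print(f"Correct classification: {accuracy:.1%}")
```

Accuracy computed on the same samples used to fit the model is optimistic. A figure comparable to those reported above would come from samples held out of the fit.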

Reliable Discrimination of Green Coffee Beans Species: A Comparison of UV-Vis-Based Determination of Caffeine and Chlorogenic Acid with Non-Targeted Near-Infrared Spectroscopy: Foods

Application of Detrended Fluctuation Analysis and Yield Stability Index to Evaluate Near Infrared Spectra of Green and Roasted Coffee Samples 

The quality of coffee is determined by many factors, such as species, variety, geographic location, and processing method.  The physical properties and chemical composition of the final product all depend on these factors, which in turn affect the final price of coffee in the market.  Variation can be significant, and NIR spectroscopy is a proven method for determining numerous chemical and physical properties of coffee beans as well as for discrimination analysis and adulteration detection.  These include caffeine content, color, roasting conditions, roasting degree, Arabica/Robusta ratio in ground coffee, place of origin, chemical composition of coffee grounds, and sensory properties of beverages.  NIR spectroscopy does require chemometric modeling to correlate NIR spectra to parameters of interest.  There are numerous multivariate statistical methods that can be used, as well as pre-processing techniques that help extract the maximum information from the NIR spectra.  Two promising methods which have recently been applied to NIR spectroscopy are Detrended Fluctuation Analysis (DFA) and Yield Stability Index (YSI).  DFA is a widely used time series analysis tool and has been applied in areas such as high-viscosity gas-liquid flows, water contaminant classification, EEG patterns associated with real and imaginary arm movements, air traffic flow analysis, and even the analysis of NBA basketball games.  YSI was developed to measure extremes in agricultural time series by measuring the proportion of annual yields that are reasonably close to the expected trend value within a given time period.  When applied to NIR spectra of coffee at different roasting levels, it should provide information about the stability of the signals.  In this study, DFA and YSI were applied to NIR spectra of different coffee samples with varying roasting levels.  
Fifteen different coffee samples (fourteen Arabica and one Canephora Robusta) were procured from different parts of the world for the study.  Before roasting, each sample was scanned using an FT-NIR spectrometer from 12500 cm-1 to 3800 cm-1 at 16 cm-1 resolution.  Sixteen scans were collected per reading and averaged into one spectrum.  Samples were then divided into three portions and roasted at three separate levels: light, medium, and dark.  NIR spectra were then collected for all the roasted samples using the same parameters.  Various pre-processing methods were applied to the NIR spectra before analysis.  Principal Component Analysis (PCA) was performed first, followed by DFA and YSI.  PCA successfully differentiated the roasting levels after preprocessing when all samples were analyzed together.  DFA showed clear discrimination between the green unroasted samples and the roasted samples, but discrimination between different roasting levels was less clear.  However, DFA was able to discriminate very well between roasting levels within the same group of samples.  This is an important distinction because DFA analyzes one spectrum at a time while PCA analyzes the entire data set at once.  This puts PCA at a disadvantage when it is applied to a new set of samples.  The nature of DFA makes it possible to set certain coefficients as global thresholds for determining whether a sample is green, light, medium, or dark roasted.  Higher YSI values indicate greater stability, and the light roast samples were the most stable of all roasting levels.  Additional research should examine how other spectral transformation methods affect DFA and should analyze different types of samples to determine the robustness of the method.  
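
Because DFA works on one series at a time, its core computation is compact. The sketch below follows the standard DFA steps (integrate the mean-centered series, remove a linear trend in each window, compute the RMS fluctuation, and take the slope of log fluctuation against log window size). It is not the study's implementation: treating the ordered spectral points as the series, the window sizes, and the synthetic test signals are all assumptions for illustration.

```python
import math
import random

def linear_residual_ss(segment):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(segment)
    mx = (n - 1) / 2
    my = sum(segment) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    sxy = sum((i - mx) * (v - my) for i, v in enumerate(segment))
    slope = sxy / sxx
    return sum((v - (my + slope * (i - mx))) ** 2 for i, v in enumerate(segment))

def dfa_exponent(signal, window_sizes):
    """Estimate the DFA scaling exponent of a 1-D series."""
    mean = sum(signal) / len(signal)
    # Step 1: integrate the mean-centered series to get the profile
    profile, total = [], 0.0
    for v in signal:
        total += v - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in window_sizes:
        count = len(profile) // n
        # Step 2: detrend each non-overlapping window with a straight line
        ss = sum(linear_residual_ss(profile[k * n:(k + 1) * n]) for k in range(count))
        # Step 3: root-mean-square fluctuation at this window size
        f = math.sqrt(ss / (count * n))
        if f > 0:  # skip degenerate windows so the logarithm is defined
            log_n.append(math.log(n))
            log_f.append(math.log(f))
    # Step 4: slope of log F(n) versus log n is the scaling exponent
    m = len(log_n)
    mx, my = sum(log_n) / m, sum(log_f) / m
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_f))
    return num / sum((x - mx) ** 2 for x in log_n)

# Synthetic stand-ins for spectra: uncorrelated noise and its running sum
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(2048)]
walk, level = [], 0.0
for v in noise:
    level += v
    walk.append(level)

windows = [8, 16, 32, 64, 128]
alpha_noise = dfa_exponent(noise, windows)
alpha_walk = dfa_exponent(walk, windows)
print(f"noise: {alpha_noise:.2f}, random walk: {alpha_walk:.2f}")
```

Uncorrelated noise is expected to give an exponent near 0.5 and a random walk a clearly larger one. Since the exponent is a single number per spectrum, it lends itself to the kind of global thresholding between roasting levels described above.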

Application of Detrended Fluctuation Analysis and Yield Stability Index to Evaluate Near Infrared Spectra of Green and Roasted Coffee Samples: Processes