Machine Learning for Colorimetric Analysis of Saliva-Alcohol Test Strips
By Euiwon Bae, Purdue University
Saliva-alcohol test strips provide a rapid, inexpensive way to assess blood alcohol concentrations. Treated with an enzyme that reacts with methyl, ethyl, and allyl alcohols, the strips turn shades of green and blue when in contact with a solution containing alcohol. They cost only a few dollars and yield results within minutes.
One drawback of the strips is that they require subjective evaluation. The user must compare the shade of green-blue on the reactive strip to a color chart of shades corresponding to blood alcohol levels (Figure 1). The final determination is heavily dependent on the ambient lighting, as well as on the judgment of the individual conducting the test.
My research group at Purdue University has developed a system that eliminates variability and subjectivity in the analysis of colorimetric strips. Our system makes use of smartphone features—including the camera, flash, and processing power—and a hardware attachment that we designed to hold the strip in place and illuminate it consistently (see sidebar). We developed software in MATLAB® that analyzes the color space composition of images captured with the smartphone camera and employs machine learning algorithms to classify each image and identify the corresponding blood alcohol level.
We chose MATLAB for this work because it provided the image processing capabilities we needed for color analysis—including cropping and color space conversion—as well as the statistical and machine learning features we needed for classification. The ability to test new ideas and see the results immediately in MATLAB, instead of digging into C code each time, enabled us to rapidly explore a variety of analysis options and identify the best one.
A Smartphone Attachment for Colorimetric Imaging
To ensure the consistency of the images captured with the smartphone camera, we designed a small chamber that illuminates the test strips and keeps them at a predetermined lateral (XY) position and distance (Z) from the camera lens (Figure 2). The chamber includes a reflector and diffuser to provide uniform lighting from the camera’s LED flash, as well as a plano–convex lens to enable the camera to capture images of the test strips at close range.
Analyzing Experimental Results
We began by photographing test strips that had been exposed to five precisely measured solutions with varying alcohol concentrations. These concentrations—0.0%, 0.02%, 0.04%, 0.08%, and 0.30%—correspond to the five concentrations pictured on the color chart provided by the test strip manufacturer (Figure 1).
We imported the JPEG photos from the smartphone into MATLAB and used Image Processing Toolbox™ to crop them, producing 120-by-120-pixel images of the central, colored area of each photo. Working in MATLAB, we examined the images in two color spaces: red–green–blue (RGB) and hue–saturation–value (HSV). We used the MATLAB function rgb2hsv to convert the smartphone’s native RGB images to HSV.
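In outline, this preprocessing step is a crop followed by a color space conversion. The following sketch illustrates the idea; the file name and crop center shown here are illustrative placeholders, not the values from our experiments.

% Minimal preprocessing sketch; file name and crop center are placeholders.
img = imread('strip_sample.jpg');             % smartphone JPEG (native RGB)
cx = 1512; cy = 2016;                         % assumed center of the reactive pad

% Crop a 120-by-120-pixel window around the colored reaction area.
roi = imcrop(img, [cx-60, cy-60, 119, 119]);

% Convert the cropped RGB region to HSV.
roiHSV = rgb2hsv(roi);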
We tried two different approaches to identifying the alcohol concentration from the image. First, we averaged the spatial intensity of each color channel, but found that this method could not distinguish blood alcohol levels above 0.2%. Next, we plotted histograms of each channel’s intensity in MATLAB for the five standard alcohol concentrations (Figure 3). The green, blue, and value channel histograms showed sufficient inter-peak distance among the five concentrations to enable us to infer the concentration of unknown samples from their mean peak values.
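Both approaches reduce to simple per-channel statistics on the cropped image. The snippet below sketches the idea for the green and value channels, continuing from the preprocessing sketch above; the histogram bin width is illustrative.

% Per-channel statistics; roi and roiHSV come from the preprocessing sketch.
G = double(roi(:,:,2));                       % green channel (0-255)
V = roiHSV(:,:,3);                            % value channel (0-1)

meanG = mean(G(:));                           % spatial mean intensity (first approach)
meanV = mean(V(:));

% Intensity histogram of the green channel (second approach); the peak
% locations of these histograms separate the reference concentrations.
histogram(G(:), 0:4:255)
xlabel('Green channel intensity')
ylabel('Pixel count')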
Initial Results and the First-Generation App
Based on the histograms for the green channel and the value channel of the test strip image, we developed an inverse algorithm in MATLAB that estimates blood alcohol concentration. After verifying that this algorithm could accurately identify the five standard concentrations, we added four intermediate concentrations (0.01%, 0.03%, 0.06%, and 0.15%) to our test sample set. The algorithm still worked for most samples, but at this level of resolution had a misclassification rate of 17%–25%.
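Conceptually, the inverse step maps a measured channel statistic back to the nearest calibration point. The sketch below illustrates that idea with made-up reference values; it is not our calibration data, and the actual algorithm combines the green and value channels.

% Nearest-reference lookup in the spirit of the inverse algorithm;
% the reference green values are made up for illustration only.
refConc  = [0.00 0.02 0.04 0.08 0.30];        % standard concentrations (%)
refGreen = [205  180  150  115   60];         % assumed mean green-channel intensities

g = double(roi(:,:,2));
meanG = mean(g(:));                           % green-channel mean of the unknown sample
[~, idx] = min(abs(refGreen - meanG));        % closest calibration point
estimatedBAC = refConc(idx);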
We translated our MATLAB algorithms into Java® and packaged them in an Android™ app (Figure 4). Like the original MATLAB algorithms, the app is capable of identifying each of the five standard concentrations with 100% accuracy. It includes a database that stores the date, image, RGB values, estimated concentration, and geographical location for each sample tested.
Adding New Color Spaces and Machine Learning Classification
After documenting the results of our initial research and first app, our group was eager to continue exploring new smartphone-based colorimetric analysis methods. We had three ideas in particular that we wanted to test. First, we wanted to see if the YUV and CIE 1976 L*a*b* (Lab) color spaces offered any improvement over the RGB and HSV color spaces we had already tried. Second, we wanted to know whether machine learning algorithms would improve classification of minute color variations. Third, we wanted to re-architect the software to make it scalable to a large number of users.
We developed the algorithms for the next generation of our software in MATLAB, again starting with images of test strips that had been exposed to the five standard alcohol concentrations. This time, we used the rgb2lab function in Image Processing Toolbox to convert the native RGB images to the Lab color space and developed our own MATLAB function for converting to the YUV color space.
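The YUV conversion itself is a fixed 3-by-3 linear transform of the RGB values. The function below is a minimal sketch of such a conversion using the standard BT.601 coefficients; scaling and offset conventions vary, so treat it as illustrative rather than as our exact implementation.

function yuv = rgb2yuv(rgb)
% RGB2YUV  Convert an RGB image to the YUV color space (BT.601 sketch).
rgb = im2double(rgb);
M = [ 0.299    0.587    0.114;     % Y (luma)
     -0.14713 -0.28886  0.436;     % U
      0.615   -0.51499 -0.10001];  % V
[h, w, ~] = size(rgb);
yuv = reshape(reshape(rgb, h*w, 3) * M.', h, w, 3);
end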
We separated the channels and plotted a histogram of each channel’s intensity for the five alcohol concentrations (Figure 3). As before, we found that the green and blue channels of RGB and the value channel of HSV showed good peak separation. We also found that the brightness (Y) and lightness (L) channels of YUV and Lab, respectively, showed relatively little overlap and thus could be used to discriminate between alcohol concentrations.
Next, we explored the use of machine learning algorithms to estimate blood alcohol levels based on color channel intensity data extracted from each image. We used Statistics and Machine Learning Toolbox™ to test three classification models: linear discriminant analysis (LDA), support vector machine (SVM), and an artificial neural network (ANN). We tried each one across data from all four color spaces. For the five standard concentration samples, all three methods provided positive predictive values (PPV) above 95%, with SVM and ANN close to 100% for all color spaces. Overall, we found the ANN model provided the highest PPVs with the Lab color space; it was even capable of identifying non-standard blood alcohol concentrations (such as 0.01% and 0.03%) that required finer resolution.
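The sketch below shows how such a comparison can be set up with Statistics and Machine Learning Toolbox. The feature matrix here is synthetic placeholder data, not our measurements; fitcdiscr and fitcecoc are one way to fit the LDA and multiclass SVM models, and the ANN model is trained analogously with a neural network tool.

% Classification sketch with synthetic placeholder data; in practice X
% holds per-channel features (e.g., mean G, B, V, Y, and L values) and
% y the known concentration for each training image.
concs = [0 0.02 0.04 0.08 0.30];
y = repelem(concs, 20)';                          % 20 strips per concentration
X = 150 + y*[-400 -300] + 5*randn(numel(y), 2);   % synthetic two-feature set

ldaModel = fitcdiscr(X, categorical(y));          % linear discriminant analysis
svmModel = fitcecoc(X, categorical(y));           % multiclass SVM (ECOC)

cvSVM = crossval(svmModel, 'KFold', 5);           % 5-fold cross-validation
fprintf('SVM accuracy: %.1f%%\n', 100*(1 - kfoldLoss(cvSVM)))

% Classify a new strip from its feature vector.
predictedBAC = predict(svmModel, [118 127]);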
Our first-generation Android app had a few drawbacks. Only the smartphone owner had access to the data, and memory storage constraints on the smartphone limited the number of images that could be collected. In addition, any new algorithms that we developed in MATLAB had to be validated on multiple smartphone platforms, and then the app had to be reinstalled on each device. To address these shortcomings, we modified the app to send the captured image via HTTP to a host running MATLAB and a web server constructed with Python® Flask.
Because all image processing and colorimetric analysis is now performed on the server, we can easily update the algorithms without changing the Android app itself. We updated the MATLAB code to save results to an SQL database, making it easy to perform large-scale tracking of all strip test results.
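As a rough sketch of the server side of this workflow, the snippet below analyzes an uploaded image and appends a row to a SQLite database through Database Toolbox. The file paths, table name, column names, and the classifyStrip helper are hypothetical; the production schema also stores the fields listed earlier, such as RGB values and location.

% Server-side sketch: analyze an uploaded strip image and log the result.
% Paths, table name, and the classifyStrip helper are hypothetical.
imgPath = '/srv/uploads/strip_latest.jpg';     % image posted by the Android app
img = imread(imgPath);
bac = classifyStrip(img);                      % hypothetical wrapper around the trained classifier

conn = sqlite('striptests.db');                % Database Toolbox interface to SQLite
insert(conn, 'results', {'tested_at', 'image_path', 'estimated_bac'}, ...
       {char(datetime('now')), imgPath, bac});
close(conn);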
Extending the Technology to Food Safety
We are currently extending the use of smartphone-based colorimetric analysis to other areas, including food safety. Our research colleagues at Purdue are developing colorimetric pads that change color when exposed to pathogenic bacteria, providing an inexpensive way for food inspectors to identify tainted products.
As with saliva-alcohol strip tests, current tests using these pads rely on human observation and interpretation of the color change. Our immediate goal is to apply the same approach and MATLAB algorithms to provide objective interpretation of the pathogenic bacteria pads. Ultimately, we plan to further develop our technology to detect color changes that the human eye cannot distinguish, which will enable researchers to make the colorimetric pads orders of magnitude more sensitive than they are today.
This material is based upon work supported by the U.S. Department of Agriculture, Agricultural Research Service, under Project No. 8072-42000-077. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the U.S. Department of Agriculture.
Published 2017 - 93155v00