Data Management and Analytics

DuraMAT investigates photovoltaic (PV) material degradation and durability through an ambitious data collection and analytics effort. We aggregate data from diverse sources such as device simulation, materials characterization, and time-series PV performance into one place—the DuraMAT Data Hub.

This centralized resource for PV-centric data will enable researchers to share their data with the greater research community, discover new data sets, and explore new collaborative opportunities with consortium members. For more information, visit the Data Hub.

In addition to storing heterogeneous data, this capability will actively perform PV reliability research by mining resources in the Data Hub to find new and interesting correlations. This research will inspire the development of open-source software tools that other PV researchers can freely use in their own projects. The software will be useful for researchers looking to process or clean their data, make informative visualizations, or build machine learning models. It will be built on the popular and flexible Python ecosystem, using libraries such as NumPy, SciPy, pandas, scikit-learn, and Matplotlib.
Previous topics have included modeling optimal string sizes, modeling climate zones relevant for solar degradation, and clear-sky filtering of data. Current topics include modeling degradation of module parameters using performance data, image analysis of electroluminescence images, and developing methods to measure the thickness of antireflective coatings. Web-based versions of some tools are available on the Berkeley Lab website.
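
As a rough illustration of the kind of analysis described above, the sketch below uses pandas and NumPy to clean a time series of PV performance data and fit a linear degradation trend. It is not the PVPro method or any other DuraMAT tool; the function name `estimate_degradation_rate`, the synthetic data, and the 0.5% per year loss are all assumptions made up for this example.

```python
import numpy as np
import pandas as pd


def estimate_degradation_rate(daily_performance: pd.Series) -> float:
    """Fit a straight line to a daily performance series.

    Returns the slope in percent per year, relative to the fitted starting
    value. Hypothetical helper for illustration only.
    """
    clean = daily_performance.dropna()  # basic cleaning: drop missing samples
    # Elapsed time in years since the first valid sample.
    years = np.asarray((clean.index - clean.index[0]).days, dtype=float) / 365.25
    slope, intercept = np.polyfit(years, clean.to_numpy(), 1)
    return 100.0 * slope / intercept


# Synthetic data (assumed values): three years of daily samples that lose
# 0.5% per year, with Gaussian noise and a short simulated data outage.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=3 * 365, freq="D")
elapsed_years = np.arange(index.size) / 365.25
true_rate = -0.5  # percent per year, made up for this example
performance = 0.85 * (1 + true_rate / 100 * elapsed_years)
performance = performance + rng.normal(0, 0.005, index.size)
series = pd.Series(performance, index=index)
series.iloc[100:110] = np.nan  # simulate missing measurements

rate = estimate_degradation_rate(series)
print(f"Estimated degradation rate: {rate:.2f} %/yr")
```

A straight-line fit is the simplest possible choice here. Real field data typically has seasonality, soiling, and outages that call for more careful filtering before any trend is estimated.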

Core Objective

Central Data Resource


Lawrence Berkeley National Laboratory


This research needs collaborators who are collecting interesting, industry-relevant data but lack the resources or expertise for data analysis. Approach the team with your data and with questions that statistical analysis, data mining, or machine learning might be able to answer. The analytics team is open to working with data from any collection technique, whether time-series performance data, image data, or materials properties.


Open to all researchers—academic, government, or industry.


1. Karin, T. & Jain, A. Photovoltaic String Sizing Using Site-Specific Modeling. IEEE J. Photovoltaics 10, 888–897 (2020).

2. Ellis, B. H., Deceglie, M. & Jain, A. Automatic Detection of Clear-Sky Periods From Irradiance Data. IEEE J. Photovoltaics 998–1005 (2019). doi:10.1109/JPHOTOV.2019.2914444

3. Karin, T., Jones, C. B. & Jain, A. Photovoltaic Degradation Climate Zones. in 2019 IEEE 46th Photovoltaic Specialists Conference (PVSC) 0687–0694 (IEEE, 2019). doi:10.1109/PVSC40753.2019.8980831 


To learn more about this project, contact Anubhav Jain.

A chart image showing cell temperatures, an image of cells, another image of six charts showing pvpro and True, and two other X-ray-like images

Left: An example of data analytics that uses production data to estimate module circuit parameters with the PVPro method, which is based on Suns-Vmp. Right: An example of automated image analysis that processes electroluminescence images to detect busbars, cracks, and power loss areas.