Computational Physics, Inc. (CPI) Data Center
Archiving, documenting, and distributing scientific data is an important emerging need in the scientific community because of the sheer volume and increasing complexity of the data being collected, particularly remotely sensed data from satellites. For example, NASA's Earth Observing System Data and Information System archive, the agency's vault of observational data from satellites and terrestrial sources, grew from 100 terabytes in 2000 to 4600 terabytes in 2010 [Space News, 2012; USGCRP, 2011]. There is a pressing need to acquire remotely sensed data and deliver it efficiently and rapidly to scientists, both for analysis of trends in the Earth's weather and climate and for assimilation into numerical weather prediction models to improve forecast accuracy. To achieve these goals, a data center provides its user community with remotely sensed data products, data information, user services, and tools unique to its particular science. The organization hosting the data center performs the invaluable functions of operational data management and user services: ingesting and storing data, filling user orders, answering inquiries, monitoring user comments, and providing referrals to other data centers.
CPI has experience supporting a data center that provides such services to scientific organizations, offering data set hosting, query capabilities, and web-based collaboration. CPI's data center has supported the distribution of WindSat remotely sensed data products for the Naval Research Laboratory (NRL) since May 2008, distributing WindSat data products for ocean surface wind, sea surface temperature, total precipitable water, integrated cloud liquid water, and rain rate over the ocean. CPI's software and hardware infrastructure can be easily extended to host the data products of other scientific organizations that wish to outsource their data distribution without undertaking the substantial investment in manpower and resources required to establish, maintain, and service a data center.
The CPI data center is built on the Model-View-Controller (MVC) framework as implemented in Ruby on Rails (RoR) for database-driven websites. The open-source PostgreSQL database stores the metadata for large scientific data sets, which the model layer queries. The controller handles all web server interaction, and the view generates the HTML that is actually displayed in the browser. An Apache server acts as a load balancer, distributing incoming HTTP requests across multiple Thin application servers running on distinct ports. Each Thin server hosts a RoR application, currently supporting the WindSat data sets. In addition, ingest scripts written in the Ruby programming language automate the loading of scientific data sets into the database. Data set files are archived in the Linux file system. Files returned in a query result set can be staged for HTTPS or SFTP pull by the data center user, and the entire data set can be ordered and delivered to a user on hard-drive media.
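To illustrate the ingest step described above, the following is a minimal Ruby sketch of how such a script might extract metadata from a data file's name and prepare a parameterized SQL insert for the catalog database. The filename pattern, the granules table, and its column names are illustrative assumptions, not the actual WindSat or CPI conventions.

```ruby
require 'time'

# Hypothetical ingest helper: parse metadata out of a granule filename,
# e.g. "windsat_sst_20100315T120000.dat". The naming pattern below is an
# assumption for illustration only.
def parse_granule(path)
  name = File.basename(path)
  m = name.match(/\A(?<sensor>\w+?)_(?<product>\w+?)_(?<ts>\d{8}T\d{6})\.dat\z/)
  raise ArgumentError, "unrecognized filename: #{name}" unless m
  {
    sensor:      m[:sensor],
    product:     m[:product],
    observed_at: Time.strptime(m[:ts], '%Y%m%dT%H%M%S'),
    path:        path
  }
end

# Build a parameterized INSERT for an assumed "granules" metadata table,
# suitable for execution with the pg gem's exec_params.
def insert_sql(meta)
  ["INSERT INTO granules (sensor, product, observed_at, path) " \
   "VALUES ($1, $2, $3, $4)",
   [meta[:sensor], meta[:product], meta[:observed_at].utc.iso8601, meta[:path]]]
end

meta = parse_granule('/archive/windsat/windsat_sst_20100315T120000.dat')
sql, params = insert_sql(meta)
puts sql
puts params.inspect
```

In a production ingest script the statement and parameters would be passed to the database driver (for example, the `pg` gem against PostgreSQL), so that file registration and archiving can run unattended as new granules arrive.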
More details on these capabilities can be found in the Data Center brochure.
Leone, D. (2012), NRC Report Re-emphasizes NOAA Warning on Weather Satellite Gap, Space News, 23(2), 14.