How RMP Map processes, deduplicates, and displays EPA Risk Management Program data.
The data displayed on RMP Map originates from the U.S. EPA Risk Management Program database and follows this chain of custody:
We import the Data Liberation Project's SQLite database into PostgreSQL with minimal transformation:
We maintain a separate facility_geocoding table that validates facility-reported coordinates against state bounding boxes. This table is additive—it doesn't modify source data, just supplements it. Coordinates that don't match the facility's stated location are flagged for address-based geocoding.
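A minimal sketch of the bounding-box check described above, in Python. The BoundingBox type, the needs_address_geocoding helper, and the approximate Pennsylvania box are illustrative assumptions, not the actual facility_geocoding implementation. A real table would hold one box per state. Treating missing coordinates or an unknown state as "flag for geocoding" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned latitude/longitude box for one state."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


# Illustrative, approximate box for Pennsylvania only; a real table covers every state.
STATE_BOXES = {
    "PA": BoundingBox(min_lat=39.72, max_lat=42.27, min_lon=-80.52, max_lon=-74.69),
}


def needs_address_geocoding(state: str, lat: Optional[float], lon: Optional[float]) -> bool:
    """Return True when the reported coordinates can't be trusted for the stated state.

    Assumption: missing coordinates or an unknown state are also flagged.
    """
    box = STATE_BOXES.get(state)
    if box is None or lat is None or lon is None:
        return True
    return not box.contains(lat, lon)


# Usage: Philadelphia-area coordinates pass; a dropped minus sign on longitude is flagged.
print(needs_address_geocoding("PA", 39.95, -75.17))
print(needs_address_geocoding("PA", 39.95, 75.17))
```

For instance, a longitude reported without its minus sign (75.17 instead of -75.17) lands far outside the box, so the facility would be routed to address-based geocoding while the original coordinates stay untouched in the source table.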
RMP Map displays only what we receive. The data shown here is not guaranteed to be comprehensive. Facilities may have incomplete records, missing information, or reporting errors in their original RMP submissions to the EPA.
| Type of Issue | Who to Contact | Examples |
|---|---|---|
| Our calculations or display | rmpmap@drexel.edu | Wrong accident counts, duplicate facilities showing, incorrect deduplication, map display errors |
| Source data quality | U.S. EPA RMP Program | Missing facilities, wrong addresses, incorrect chemical lists, missing accidents, outdated information |
Certain RMP data fields are withheld from public release because facilities may claim them as Confidential Business Information (CBI) under 40 CFR Part 2. In the public dataset we receive, the following fields are entirely empty across all facilities:
The database tables for these scenarios exist and contain row-level linkages to regulated chemicals, but the key impact fields (distance, quantity, population) are null across all 237,000+ rows. This means we cannot display worst-case or alternative release scenario details, even though the RMP regulation requires facilities to report them. The data is collected by EPA but not included in the public data release.
If you need worst-case scenario data for a specific facility, you may be able to obtain it through a FOIA request to EPA or by contacting the facility's Local Emergency Planning Committee (LEPC).
We do not modify source data. If a facility's information appears incorrect in our system, it's almost certainly incorrect in the EPA's database as well. We faithfully reproduce what we receive; corrections must be made at the source.
We're happy to hear about potential data issues so we can verify whether they originate in our processing or in the source data itself. Contact us at rmpmap@drexel.edu and we'll investigate.
The EPA RMP database has a key structural characteristic that can create apparent duplicates if not handled correctly.
| ID Type | Description | Behavior |
|---|---|---|
| FacilityID | Internal submission ID | New ID assigned with every RMP submission |
| EPAFacilityID | Canonical facility identifier | Stable across submissions for the same physical facility |
When a facility submits a new RMP (every 5 years, or when changes occur), it receives a new FacilityID but retains its same EPAFacilityID.
Each RMP submission includes the facility's complete accident history. This means the same accident appears in multiple rows with different AccidentHistoryID values (one per submission).
Why This Matters
Consider a facility with 3 submissions, each listing the same 3 accidents.
If we naively counted accidents by AccidentHistoryID: 3 submissions × 3 accidents = 9 accident records (wrong!)
If we deduplicate by the actual accident event: 3 unique accidents (correct)
We apply deduplication logic at display time, not at import. This preserves the source data while presenting a coherent view to users.
Throughout the application, we treat EPAFacilityID as the "real" facility identifier:
- Facility URLs use EPAFacilityID, e.g. /facilities/100000193471

When displaying facility information, we use PostgreSQL's DISTINCT ON to select the most recent submission:
```sql
SELECT DISTINCT ON ("EPAFacilityID")
       "FacilityName", "FacilityStr1", "FacilityCity", ...
FROM tbls1facilities
WHERE "EPAFacilityID" = '100000193471'
ORDER BY "EPAFacilityID", "ReceiptDate" DESC
```
This returns one row per facility—the most recently submitted data.
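The same "most recent submission wins" rule, sketched in plain Python for readers without PostgreSQL. Submission and latest_submissions are hypothetical names, and the facility names, dates, and second EPAFacilityID in the usage lines are made up. As with DISTINCT ON when ORDER BY has no further tiebreaker, which of two rows sharing a ReceiptDate is kept is not meaningful; here it is simply the first one seen.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple


class Submission(NamedTuple):
    facility_id: int       # FacilityID: new for every RMP submission
    epa_facility_id: str   # EPAFacilityID: stable for the physical facility
    facility_name: str
    receipt_date: date     # ReceiptDate


def latest_submissions(rows: Iterable[Submission]) -> Dict[str, Submission]:
    """Keep one row per EPAFacilityID: the one with the latest ReceiptDate."""
    latest: Dict[str, Submission] = {}
    for row in rows:
        current = latest.get(row.epa_facility_id)
        # Strict ">" keeps the first row seen when two share a ReceiptDate.
        if current is None or row.receipt_date > current.receipt_date:
            latest[row.epa_facility_id] = row
    return latest


# Usage with made-up rows: two submissions for one facility, one for another.
rows = [
    Submission(1, "100000193471", "Example Plant", date(2014, 6, 1)),
    Submission(2, "100000193471", "Example Plant (renamed)", date(2019, 6, 1)),
    Submission(3, "100000999999", "Other Site", date(2020, 1, 1)),
]
latest = latest_submissions(rows)
print(latest["100000193471"].facility_name)
```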
Since the same accident appears in multiple submissions, we identify unique accidents by their date and time:
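```sql
-- Get unique accidents for one facility across all of its submissions
SELECT DISTINCT ON ("AccidentDate", "AccidentTime")
       "AccidentHistoryID", "AccidentDate", "AccidentTime", ...
FROM tbls6accidenthistory
WHERE "FacilityID" IN (
    -- every FacilityID (submission) filed under this EPAFacilityID
    SELECT "FacilityID" FROM tbls1facilities WHERE "EPAFacilityID" = '100000193471'
)
ORDER BY "AccidentDate" DESC, "AccidentTime"
```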
We use (AccidentDate, AccidentTime) as the deduplication key because:
Note: AccidentHistoryID is not a good deduplication key because each submission generates new IDs for the same historical accidents.
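To make the accident query concrete, here is a Python sketch of the same logic: gather every FacilityID filed under one EPAFacilityID, order rows newest date first, and keep the first row for each (AccidentDate, AccidentTime) pair. The Submission and Accident types, the unique_accidents function, the sample rows, and the date/time string formats are assumptions for illustration. In the sample, four raw rows for the facility (two submissions × two accidents) collapse to two events.

```python
from typing import Iterable, List, NamedTuple


class Submission(NamedTuple):
    facility_id: int       # FacilityID (one per submission)
    epa_facility_id: str   # EPAFacilityID (one per physical facility)


class Accident(NamedTuple):
    accident_history_id: int  # AccidentHistoryID: regenerated with every submission
    facility_id: int          # FacilityID of the submission that listed it
    accident_date: str        # assumed ISO "YYYY-MM-DD", so string order = date order
    accident_time: str        # assumed "HHMM"


def unique_accidents(epa_facility_id: str,
                     submissions: Iterable[Submission],
                     accidents: Iterable[Accident]) -> List[Accident]:
    # Every FacilityID ever filed under this EPAFacilityID (the SQL subquery).
    facility_ids = {s.facility_id for s in submissions if s.epa_facility_id == epa_facility_id}
    candidates = [a for a in accidents if a.facility_id in facility_ids]

    # Match the SQL ORDER BY: date descending, then time ascending.
    # Two stable sorts, least significant key first.
    candidates.sort(key=lambda a: a.accident_time)
    candidates.sort(key=lambda a: a.accident_date, reverse=True)

    # DISTINCT ON: keep the first row for each (date, time) pair.
    seen = set()
    result = []
    for a in candidates:
        key = (a.accident_date, a.accident_time)
        if key not in seen:
            seen.add(key)
            result.append(a)
    return result


# Made-up data: two submissions each repeat the same two accidents.
subs = [Submission(1, "100000193471"), Submission(2, "100000193471"), Submission(9, "100000999999")]
rows = [
    Accident(10, 1, "2015-03-02", "1430"),
    Accident(11, 1, "2017-08-19", "0215"),
    Accident(20, 2, "2015-03-02", "1430"),
    Accident(21, 2, "2017-08-19", "0215"),
    Accident(30, 9, "2018-01-01", "1200"),  # different facility, excluded
]
events = unique_accidents("100000193471", subs, rows)
print([a.accident_date for a in events])
```

Which duplicate's AccidentHistoryID survives is incidental (here, the first in input order), which is one more reason the ID itself is never used as the key.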
| Entity | Raw Database | Display Logic | Dedup Key |
|---|---|---|---|
| Facilities | Multiple rows per physical facility | Show most recent submission | EPAFacilityID + latest ReceiptDate |
| Accidents | Same accident in multiple submissions | Count/show unique events | (AccidentDate, AccidentTime) per facility |
| Chemicals | Listed per submission | Show from most recent submission | N/A (inherited from facility dedup) |
In short, we prefer stable identifiers: EPAFacilityID over FacilityID, and (AccidentDate, AccidentTime) over AccidentHistoryID. This approach lets us:
RMP Map provides a free, public API for programmatic access to this data. The API returns JSON and requires no authentication.
- … format=geojson for map-ready data
- /api/search: Search facilities with filters
- /api/facilities/:id: Get facility details by EPAFacilityID
- /api/accidents/:id: Get accident details
- /api/facilities/geo: All facilities as GeoJSON

For complete endpoint documentation, parameters, and examples, see the full API documentation.
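A sketch of building request URLs for the endpoints listed above, using only Python's standard library. BASE_URL is a placeholder because the API host is not stated here, and "state" in the usage lines is a hypothetical filter name; the full API documentation has the real host and parameters. The point it illustrates is that :id in /api/facilities/:id is an EPAFacilityID, not a per-submission FacilityID.

```python
from urllib.parse import quote, urlencode

# Placeholder host: substitute the real RMP Map address.
BASE_URL = "https://rmpmap.example.org"


def facility_url(epa_facility_id: str) -> str:
    """URL for /api/facilities/:id, where :id is an EPAFacilityID (not a FacilityID)."""
    return f"{BASE_URL}/api/facilities/{quote(epa_facility_id, safe='')}"


def search_url(**filters: str) -> str:
    """URL for /api/search with query-string filters (names must come from the API docs)."""
    query = urlencode(filters)
    return f"{BASE_URL}/api/search" + (f"?{query}" if query else "")


print(facility_url("100000193471"))
print(search_url(state="PA"))  # "state" is a hypothetical filter name
```

With the real host filled in, urllib.request.urlopen(facility_url(...)) would fetch the JSON response; no authentication header is needed.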
All API responses include a _meta block with data provenance information:
```json
{
  "_meta": {
    "source": "U.S. EPA Risk Management Program via Data Liberation Project",
    "license": "CC BY-SA 4.0",
    "disclaimer": "Data as reported by facilities to EPA"
  }
}
```

See API Terms of Use for rate limits and attribution requirements.
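Clients that store or republish API results can read the _meta block to carry provenance along. Below is a minimal Python sketch that parses a response body, checks for the three keys shown above, and formats a credit line. provenance and attribution_line are hypothetical helpers, and the credit-line wording is an assumption, not the attribution text required by the API Terms of Use.

```python
import json

REQUIRED_META_KEYS = ("source", "license", "disclaimer")


def provenance(response_body: str) -> dict:
    """Return the _meta block of an RMP Map API response, or raise ValueError."""
    payload = json.loads(response_body)
    meta = payload.get("_meta") if isinstance(payload, dict) else None
    if not isinstance(meta, dict):
        raise ValueError("response has no _meta block")
    missing = [key for key in REQUIRED_META_KEYS if key not in meta]
    if missing:
        raise ValueError(f"_meta is missing keys: {missing}")
    return meta


def attribution_line(meta: dict) -> str:
    """Hypothetical credit line; the API Terms of Use set the actual requirements."""
    return f"Source: {meta['source']} ({meta['license']}). {meta['disclaimer']}."


body = """{"_meta": {
  "source": "U.S. EPA Risk Management Program via Data Liberation Project",
  "license": "CC BY-SA 4.0",
  "disclaimer": "Data as reported by facilities to EPA"}}"""
print(attribution_line(provenance(body)))
```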