The Uncommon Data Set
Choosing a college should not require stitching together a dozen tabs, a spreadsheet, a federal database, and a commercial search product just to answer basic questions.
The good news is that a lot of the data already exists. Colleges publish Common Data Set files. The Department of Education publishes College Scorecard and IPEDS data. The hard part is that none of those sources, by itself, gives families a simple, current, trustworthy way to browse the college landscape.
IPEDS is the core postsecondary data system of NCES, the Department of Education's statistical center, and it collects institution-reported federal data.
The problem
Common Data Set files are excellent but scattered. They appear on school websites in many formats, on different timetables, and under different URLs. Only a minority of the 3,000+ in-scope undergraduate institutions publish a current public CDS that is easy to find.
College Scorecard is useful, especially for outcomes like net price, debt, completion, and earnings, but it is not a substitute for the richer admissions and aid details schools publish in the CDS.
IPEDS is powerful and broad, but federal releases lag the freshest school-published files, and the official tools can be hard to navigate unless you already know the survey components, table names, and Access database workflow.
And if you want enriched data, the default option has often been a proprietary vendor platform. Those tools can be useful, but they may require accounts, hide their source lineage, limit API access, or create another student-data profile along the way.
So the gap was not “does college data exist?” The gap was: where can a student, parent, counselor, journalist, or builder browse the freshest school-published facts, federal baseline data, source links, and an open API in one place?
What you can do now
collegedata.fyi is a public college-data browser built around source transparency. Search for a school, see whether we found a public CDS, inspect the original source file, read extracted fields, and compare key facts across schools without creating an account.
- Find a school's latest archived Common Data Set and download the original PDF, XLSX, DOCX, or HTML source.
- Browse extracted admissions, enrollment, test-score, aid, and academic fields across schools.
- See source-labeled federal baseline facts for schools where no public CDS is archived.
- Use academic positioning, admission strategy, merit-aid, and match list tools without sending student profile data to a server.
- Query the same data through a public REST API for spreadsheets, research, dashboards, or your own tools.
If you want starter ideas, the Recipes page has worked examples you can adapt.
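As a rough illustration of the API bullet above, here is a minimal Python sketch of building a query URL and fetching JSON using only the standard library. The base URL, the `school_facts` resource name, and the `name`, `fields`, and `limit` parameters are all placeholders invented for this example, not the real endpoints; check the Public API documentation for the actual paths and parameters.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder values: these are NOT the real collegedata.fyi endpoints.
BASE_URL = "https://api.example.invalid"
RESOURCE = "school_facts"  # hypothetical resource name


def build_query_url(school_name: str, fields: list[str], limit: int = 10) -> str:
    """Build a GET URL for the hypothetical school-facts resource."""
    params = {
        "name": school_name,
        "fields": ",".join(fields),
        "limit": str(limit),
    }
    return f"{BASE_URL}/{RESOURCE}?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (this performs a real HTTP request)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL is safe to run offline; fetch_json(url) would hit the network.
url = build_query_url("Example College", ["acceptance_rate", "source"], limit=5)
print(url)
```

Keeping URL construction separate from the network call makes the query easy to inspect, log, or paste into a spreadsheet import before any request is sent.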
What makes it different
- Fresh school-authored data first. When a current CDS exists, we treat it as the primary source for CDS-native fields.
- Federal coverage where CDS is missing. NCES/IPEDS fills in source-labeled baseline facts for institutions that do not publish a public CDS.
- Clear provenance. Values keep their source attached: CDS, IPEDS provisional/final, or Scorecard context. We do not blend them into one unlabeled number.
- Accessible tables and durable links. Public pages prioritize readable, keyboard-friendly tables and link back to the original source documents.
- Open API. The API is the same data surface the website uses, so researchers and builders do not have to scrape the site.
- Privacy by default. The core site works without accounts. Student profile tools are local-first unless a future feature explicitly says otherwise.
- Built for everyday use. Pages are designed to be fast, readable, accessible, and stable enough for families, counselors, and builders to rely on.
Structured extracts are useful, but source documents still matter. Every school-year page links back to the original file, and federal baseline rows keep enough source context to understand where a number came from and how it should be read.
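The provenance rule described above can be sketched as a small Python data structure: every value carries its own source label, and choosing between sources means picking one labeled value rather than blending several. This is an illustrative sketch, not the project's actual schema. The class name, field names, and label strings are hypothetical, and only the "CDS before federal data" ordering comes from the text; the order among the federal labels is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed precedence, lowest number wins. Only "CDS first" comes from the text;
# the relative order of the federal labels is an illustrative guess.
SOURCE_PRIORITY = {
    "cds": 0,
    "ipeds_final": 1,
    "ipeds_provisional": 2,
    "scorecard": 3,
}


@dataclass(frozen=True)
class SourcedValue:
    """One fact plus the label and link that say where it came from."""
    field: str
    value: float
    source: str
    source_url: str


def pick_value(candidates: list[SourcedValue]) -> Optional[SourcedValue]:
    """Return the single highest-priority candidate with its label intact.

    Values are never averaged or merged, so the result always has one source.
    Candidates with an unrecognized source label are ignored.
    """
    known = [c for c in candidates if c.source in SOURCE_PRIORITY]
    if not known:
        return None
    return min(known, key=lambda c: SOURCE_PRIORITY[c.source])


# Invented sample numbers and URLs, for illustration only.
candidates = [
    SourcedValue("acceptance_rate", 0.41, "ipeds_provisional", "https://example.invalid/ipeds"),
    SourcedValue("acceptance_rate", 0.39, "cds", "https://example.invalid/cds.pdf"),
]
chosen = pick_value(candidates)
print(chosen.source, chosen.value)
```

Because the chosen record is returned whole, a table cell built from it can show both the number and a link back to the original document.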
Open source
The entire project is open source under the MIT license. The code, the schema, the extraction pipeline, and the archived documents are all public.
- GitHub repository
- Public API
- CDS Initiative (the original template publisher)
Credits
Built on Supabase (Postgres, Edge Functions, Storage). Extraction of flattened PDFs is powered by Docling. Reducto reference extracts serve as a quality benchmark. Federal baseline facts come from official NCES/IPEDS releases.
Project Sponsors
collegedata.fyi is supported by: