Abstract
Statistical models built using different data sources and methods can exhibit conflicting patterns. We used the northern stock of black sea bass (Centropristis striata) as a case study to assess how the choice of fisheries data sources and laboratory‐derived physiological metrics affects the development of thermal habitat models for marine fishes. We constructed thermal habitat models using generalized additive models (GAMs) fitted to several input datasets: the NOAA Northeast Fisheries Science Center (NEFSC) bottom trawl surveys, various inshore fisheries‐independent trawl surveys (state waters), NEFSC fisheries‐dependent observer data, and laboratory‐based physiological metrics. We compared the GAM response curves of the models and coupled the models to historical ocean conditions on the U.S. Northeast Shelf using bias‐corrected ocean temperature output from a regional ocean model. Thermal habitat models based on shelf‐wide data (NEFSC fisheries‐dependent observer data and fisheries‐independent spring and fall surveys) explained the most variation in black sea bass presence/absence data (~15% deviance explained). Models based on a narrower range of sampled thermal habitat, namely the inshore survey data from the Northeast Area Monitoring and Assessment Program (NEAMAP) and the geographically isolated Long Island Sound data, performed poorly. All models had similar lower thermal limits around 8.5℃, but thermal optima, when present, ranged from 16.7 to 24.8℃. The GAMs could reliably predict habitat for years excluded from model training but, because of strong seasonal temperature fluctuations in the region, could not be used to predict habitat for seasons excluded from training. We conclude that the choice of survey data source can greatly affect the development and interpretation of thermal habitat models for marine fishes. 
We suggest that model development be based on data sources that sample the widest range of ocean temperature and physical habitat throughout multiple seasons when possible, and encourage thorough consideration of how data gaps may influence model uncertainty.
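
For readers unfamiliar with the "deviance explained" statistic used above to compare models, the following is a minimal sketch of how it can be computed for presence/absence data: one minus the ratio of the model's residual deviance to the deviance of an intercept-only (null) model. The function names and the toy data are hypothetical and serve only as illustration; they are not the study's code or data, and the fitted probabilities would in practice come from a fitted GAM.

```python
import math


def binomial_deviance(observed, predicted, eps=1e-12):
    """Binomial (Bernoulli) deviance for 0/1 observations and predicted probabilities."""
    total = 0.0
    for y, p in zip(observed, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


def deviance_explained(observed, predicted):
    """Fraction of the null deviance accounted for by the predictions.

    Assumes the observations contain both presences and absences,
    so that the null deviance is non-zero.
    """
    mean_presence = sum(observed) / len(observed)
    null_dev = binomial_deviance(observed, [mean_presence] * len(observed))
    resid_dev = binomial_deviance(observed, predicted)
    return 1.0 - resid_dev / null_dev


# Hypothetical toy data (NOT from the study): presence/absence per tow and
# the presence probabilities some fitted model assigned to each tow.
presence = [0, 0, 1, 0, 1, 1, 0, 1]
fitted = [0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.3, 0.6]

print(round(deviance_explained(presence, fitted), 3))
```

A value near 0 means the model does little better than predicting the overall presence rate everywhere; a value near 1 means the predictions almost perfectly separate presences from absences.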