Painless Data Augmentation with BigQuery | by Austin Poor | Jan, 2021


Quickly Augmenting Your Datasets with BigQuery Public Data

Photo by Lukas Blazek on Unsplash

Google Cloud’s BigQuery is a handy tool for data scientists to quickly and easily augment their datasets with external data. Specifically, BigQuery has a list of public datasets from a variety of different sources. All you need is a Google Cloud account and some basic SQL knowledge.

Here are just a few useful public datasets:

I think one of the most useful BigQuery public datasets is the US Census ACS data, which gives multi-year data broken down geographically (by state, zip code, county, etc.).

It has a lot of great demographic information like population (broken down by age, race, gender, marital status, etc.), education levels, employment, income, and much more.

For example, say I wanted to query the total population and median household income for three zip codes in the NYC area. There’s a table called zip_codes_2018_5yr that gives a 5-year estimate of census data for the year 2018, broken down by zip code.

Here’s what my query will look like:

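SELECT geo_id, -- Where geo_id is the zip code
  total_pop, median_income
FROM `bigquery-public-data.census_bureau_acs.zip_codes_2018_5yr`
WHERE geo_id in ("11377","11101","10708");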

And I can run it in the BigQuery UI…

Screenshot of the BigQuery UI

And get the following results…

Viewing query results in the BigQuery UI

Great! I got my answer in 0.4 seconds and now I can go back and expand my query to get this data for multiple years. Or, I can export the results to a CSV or JSON file to join it up with my data.
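Expanding the query to cover multiple years can be sketched as follows. This is only a sketch under a few assumptions: that the dataset path is bigquery-public-data.census_bureau_acs, that each year has its own table following the zip_codes_<year>_5yr naming pattern, and that the columns are called total_pop and median_income. All three are worth checking in the BigQuery UI first. The helper name build_multi_year_query and the example year range are hypothetical.

```python
# Sketch: build one query that stacks several yearly ACS tables with UNION ALL.
# Assumption: every year has a table named zip_codes_<year>_5yr in the
# bigquery-public-data.census_bureau_acs dataset, with the same column names.

DATASET = "bigquery-public-data.census_bureau_acs"
ZIP_CODES = ("11377", "11101", "10708")


def build_multi_year_query(years, zip_codes=ZIP_CODES):
    """Return a SQL string with one SELECT per year, joined by UNION ALL."""
    zip_list = ", ".join(f'"{z}"' for z in zip_codes)
    selects = [
        f"SELECT {year} AS year, geo_id, total_pop, median_income\n"
        f"FROM `{DATASET}.zip_codes_{year}_5yr`\n"
        f"WHERE geo_id IN ({zip_list})"
        for year in years
    ]
    # The trailing ORDER BY applies to the combined result of all SELECTs.
    return "\nUNION ALL\n".join(selects) + "\nORDER BY geo_id, year;"


if __name__ == "__main__":
    # Print the SQL so it can be pasted into the BigQuery UI.
    print(build_multi_year_query(range(2016, 2019)))
```

Adding the year as a literal column keeps the rows distinguishable once the yearly tables are stacked.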

Screenshot showing export options for BigQuery results
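Once the results are exported as CSV, joining them to your own data comes down to matching rows on the zip code. Here is a minimal sketch using only Python's standard csv module. The two inline CSV strings stand in for real files, and every value in them is a made-up placeholder, not a real census figure. The column names total_pop and median_income, and the helper names load_census and augment, are assumptions for illustration.

```python
import csv
import io

# Placeholder stand-in for an exported BigQuery CSV (values are NOT real census data).
CENSUS_CSV = """geo_id,total_pop,median_income
11377,1000,50000
11101,2000,60000
"""

# Placeholder stand-in for your own dataset, keyed by zip code.
MY_DATA_CSV = """customer,zip_code
alice,11377
bob,11101
carol,10708
"""


def load_census(csv_text):
    """Index census rows by zip code. Zip codes stay strings to keep leading zeros."""
    return {row["geo_id"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def augment(rows_text, census_by_zip):
    """Left-join your rows to the census rows; unmatched zip codes get None values."""
    augmented = []
    for row in csv.DictReader(io.StringIO(rows_text)):
        census = census_by_zip.get(row["zip_code"], {})
        augmented.append({
            **row,
            "total_pop": census.get("total_pop"),
            "median_income": census.get("median_income"),
        })
    return augmented


if __name__ == "__main__":
    for record in augment(MY_DATA_CSV, load_census(CENSUS_CSV)):
        print(record)
```

With real files, swap io.StringIO(...) for open(path, newline=""). Keeping zip codes as strings matters because reading them as integers would drop leading zeros and break the match.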

Finally, as a bonus, you can connect to BigQuery through Python with the following package:
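Assuming the package meant here is Google's official google-cloud-bigquery client (installed with pip install google-cloud-bigquery), running the zip code query from Python might look like the sketch below. It also assumes Google Cloud credentials and a default project are already configured, and that the dataset path and column names match the public dataset's schema. The function name fetch_census_rows is hypothetical.

```python
# Sketch: run the zip code query from Python with the google-cloud-bigquery client.
# Assumptions: the package is installed and default Google Cloud credentials
# and a default project are available in the environment.

SQL = """
SELECT geo_id, total_pop, median_income
FROM `bigquery-public-data.census_bureau_acs.zip_codes_2018_5yr`
WHERE geo_id IN UNNEST(@zip_codes)
"""


def fetch_census_rows(zip_codes):
    """Run the query and return a list of plain dicts, one per matching zip code."""
    # Imported here so this module still loads where the client is not installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up the default project and credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # Pass the zip codes as a query parameter instead of formatting them into the SQL.
            bigquery.ArrayQueryParameter("zip_codes", "STRING", list(zip_codes)),
        ]
    )
    rows = client.query(SQL, job_config=job_config).result()  # waits for the job to finish
    return [dict(row.items()) for row in rows]
```

Calling fetch_census_rows(["11377", "11101", "10708"]) runs the same query as in the UI. Passing the zip codes as a parameter avoids building SQL by string concatenation.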

