The Databricks workspace packs so many features into its platform UI that it could be a case study in solid product development for other SaaS platforms to learn from. One workspace default worth highlighting is the default catalog for Unity Catalog.
Not too long ago, the hive_metastore was the default location for running queries, landing data, and so on. With the introduction of and focus on Unity Catalog, newer Databricks accounts and workspaces are set up with a Unity Catalog right away, and Databricks typically makes it the default.
But what if you want or need to change the default catalog for your workspace?
Here’s how you can set the default Unity Catalog:
- Navigate to your workspace
- In the upper-right corner of the platform, click your user icon
- Select “Settings” from the list of options
- On the resulting Settings page, select the Advanced link in the left menu
- Scroll to the “Other” section in the main area to find the “Default catalog for the workspace” setting
- Enter your new default catalog in the field provided
- Click Save when ready
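If you’d rather script the change than click through the UI, the workspace default catalog is exposed through the Databricks SDK’s Default Namespace API. This is a minimal sketch, assuming a recent `databricks-sdk` for Python with authentication already configured (e.g. via `DATABRICKS_HOST`/`DATABRICKS_TOKEN`); the catalog name `retail_prod` is just an example:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.settings import DefaultNamespaceSetting, StringMessage

w = WorkspaceClient()

# Read the current workspace default catalog (the "default namespace" setting)
current = w.settings.default_namespace.get()
print(current.namespace.value)

# Update it; field_mask tells the API which field is being changed, and
# allow_missing lets the call create the setting if it was never set before
w.settings.default_namespace.update(
    allow_missing=True,
    field_mask="namespace.value",
    setting=DefaultNamespaceSetting(namespace=StringMessage(value="retail_prod")),
)
```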
As the instructions state, once you click Save (or apply the change via the API), the setting will not take effect until you restart any compute (SQL warehouses or clusters). The Databricks documentation summarizes the behavior:
> Setting the default catalog for the workspace determines the catalog that is used when queries do not reference a fully qualified 3 level name. For example, if the default catalog is set to ‘retail_prod’ then a query ‘SELECT * FROM myTable’ would reference the object ‘retail_prod.default.myTable’ (the schema ‘default’ is always assumed).
>
> If the default catalog is in Unity Catalog (set to any value other than ‘hive_metastore’ or ‘spark_catalog’), MLflow client code that reads or writes models will target that catalog by default. Otherwise, models will be written to and read from the workspace model registry.
>
> Creating new registered models in workspace model registry is disabled if the default catalog is in Unity Catalog (set to any value other than ‘hive_metastore’ or ‘spark_catalog’).
>
> This setting requires a restart of clusters and SQL warehouses to take effect. Additionally, this setting only applies to Unity Catalog compatible compute, i.e. when the workspace has an assigned Unity Catalog metastore, and the cluster is in access mode ‘Shared’ or ‘Single User’, or in SQL warehouses.
Source: https://docs.databricks.com/aws/en/catalogs/default
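To see that resolution behavior in practice, here’s a small PySpark sketch you could run in a notebook attached to Unity Catalog-enabled compute (`retail_prod` and `myTable` are the example names from the excerpt above):

```python
# In a Databricks notebook, `spark` (the SparkSession) is predefined.

# Confirm which catalog and schema unqualified names will resolve against
spark.sql("SELECT current_catalog(), current_schema()").show()

# Unqualified reference: resolves to <default catalog>.default.myTable
df_unqualified = spark.sql("SELECT * FROM myTable")

# Fully qualified three-level name: unambiguous regardless of the default
df_qualified = spark.sql("SELECT * FROM retail_prod.default.myTable")

# Override the default for this session only, without touching the workspace setting
spark.sql("USE CATALOG retail_prod")
```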
Basically, from the summary above, be sure to consider the fully qualified three-level name (catalog.schema.table) when referencing a table in a general query. There’s also an impact on newer Databricks features, such as MLflow model registration, that require Unity Catalog instead of the legacy hive_metastore. Lastly, as mentioned above, updating the default catalog requires a restart of the compute.
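On the MLflow point specifically, client code doesn’t have to rely on the workspace default; it can pin the registry explicitly. A hedged sketch (the run ID and model names are placeholders):

```python
import mlflow

# Explicitly target the Unity Catalog model registry; models use
# three-level names (catalog.schema.model)
mlflow.set_registry_uri("databricks-uc")
mlflow.register_model("runs:/<run_id>/model", "retail_prod.default.my_model")

# Or explicitly target the legacy workspace model registry (flat names).
# Note: per the doc excerpt above, creating *new* workspace-registry models
# is disabled once the default catalog is in Unity Catalog.
mlflow.set_registry_uri("databricks")
mlflow.register_model("runs:/<run_id>/model", "my_model")
```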
Hopefully that helps as you start making more use of your Databricks environments.
If you’d like to dig deeper into the nuances of this topic, take a look at the Databricks Manage the default catalog page.