Welcome to episode five of the “Building on the Awesense Platform” series. This is a multi-part series, where we showcase the power of the Awesense Digital Energy Platform, the Awesense Data Engine, and the Awesense Open Energy Data Model. In this series we show you how these features can be used to rapidly build, test, and prototype energy-focused applications and analytics using a variety of software tools and techniques. In this episode specifically, we will focus on data science notebooks like Apache Zeppelin and Jupyter Notebook.
Starting with the fifth article may not be the best way to engage with this series. Click here to check out some of our earlier content and work your way up to this article. The series is meant not just to help you build applications, but also to gradually demonstrate techniques for application design, from least to most complex. If you are in the right place, please continue.
What are Data Science Notebooks, and why do we use them?
Data scientists use a lot of tools to interpret and analyze data in various ways. Data science notebooks are one of the most essential tools in a data scientist’s arsenal for tracking their work as it progresses. Notebooks allow data scientists to write code, design custom applications, and generate interesting and impactful visualizations of their data. They give data scientists the ability to interact with their code, change portions of it, and pinpoint where errors arise. This allows for a high degree of customization while maintaining a robust structure throughout the code.
Notebooks are commonly used to design more custom and niche applications, which may require more love and care than a typical method allows for. Each notebook gets built from scratch, or at least from a simple template, and the data scientist can embed their personality and vision right into the code. The resulting notebook can then be shared with team members and key decision-makers, and the results it produces can drive critical decisions. Overall, notebooks allow for greater flexibility in building focused applications, leading to better outcomes.
Because of their ease of use and increasing availability, many other groups of people are beginning to experiment with notebooks to achieve their data-driven goals. This includes business analysts, engineers, and really anyone whose role requires working with data.
Notebooks outline the steps and code required to reach a conclusion, and because of this, they allow a story to be created from the data provided. This storytelling is a powerful way to bridge the gap between our understanding of the world and the data we use to make decisions about it. Like most stories told with data, a certain genre or structure is put in place to guide it along so that we can make proper sense of it.
Step one involves defining a data set. This is important because it establishes which source of data we are using, preparing, and relying on to make a decision. Step two is the cleansing and preparation of that data, but no need to worry about that yet: the Awesense Data Engine will take care of that part for you! Step three involves your creative side: coming up with the models and schemas you think are best suited to the data, in order to tell that Cinderella story to your own Prince Charming (or a crucial decision-maker, in this case)! Lastly, step four involves interpreting the results of the story you told. This is where you get to decide whether it’s a happy ending or not. By being able to parse through every line of the story you’ve written, and see how the interactive interpretation leads to a natural conclusion, you can be sure that the glass slipper will fit.
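The four steps above can be sketched in a few notebook cells. This is a minimal illustration using pandas with fabricated sensor readings; the column names and the rolling-mean "model" are stand-ins for whatever your real data and analysis look like:

```python
import pandas as pd

# Step 1: define the data set. Here we fabricate a tiny set of sensor
# readings; in practice this would come from your data platform.
readings = pd.DataFrame({
    "timestamp": pd.date_range("2020-11-01", periods=6, freq="h"),
    "current_a": [10.2, 10.8, None, 11.4, 10.9, 11.1],
})

# Step 2: cleanse and prepare (e.g. drop incomplete rows).
clean = readings.dropna(subset=["current_a"])

# Step 3: model. A rolling mean is a deliberately simple stand-in
# for whatever model best suits your data.
clean = clean.assign(rolling_mean=clean["current_a"].rolling(2).mean())

# Step 4: interpret the results and tell the story.
summary = clean["current_a"].agg(["mean", "min", "max"])
print(summary)
```

Because each cell's output sits right next to the code that produced it, a reader can follow the plot from raw data to conclusion.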
Notebooks have received a lot of attention recently because of their ability to help tell a story in a way that matches the data and the intention of the user. Though many tools exist in this space such as Deepnote, Polynote, Mode, Google Colab, and others, Jupyter Notebook and Apache Zeppelin are the two notebooking tools we will focus on below.
“Now that she’s back in the atmosphere, with drops of Jupiter in her hair… She acts like summer and walks like rain, reminds me that there’s a time to change…” – Drops of Jupiter, Train (2001)
With some beautiful song lyrics to start us off, we can’t imagine a better introduction to our most beloved notebooking tool, Jupyter Notebook. As a technology company, we spend a good portion of our free time romantically engaging with the future of electric vehicles and incredible data science tools, much of which serves to remind us that there is time to change. Jupyter Notebook (formerly IPython Notebook) is by far the most popular notebooking tool in use today. With support for almost 50 programming languages, including Python and R, there’s pretty much not a thing we can’t change about the old way we used to work with data.
Though limited in certain capacities, Jupyter Notebook makes third-party tools easy to integrate and use to achieve almost any desired result. We promise we’re not trying to advertise for Jupyter Notebook; we just really, really like it!
Why We Love Jupyter as a Notebooking Tool
Jupyter Notebooks integrate easily with open-source platforms used by many data enthusiasts, such as GitHub, which renders them with a clear display and an easy-to-use interface. Jupyter Notebooks operate through a basic IPython kernel, which means they do not allow switching between programming languages from cell to cell, though some third-party tools enable this functionality if it is needed.
Spin-off projects such as JupyterLab also exist, so there is a range of options to suit your preferences and whatever you plan to do with Jupyter Notebooks. This offers a large amount of flexibility in working with your preferred data and formats, meaning that anyone can use Jupyter Notebooks to create what they have in mind.
Another reason we love working with Jupyter Notebooks is that they can easily be turned into a standalone data application using frameworks like Voila or Panel. These frameworks and the whole world of embedded analytics is a topic we’ll explore more in-depth in future articles in this series (stay tuned!).
Were you expecting Led Zeppelin lyrics to start us off? Not quite this time. If Jupyter Notebook takes gold, Apache Zeppelin is always right behind, bagging silver on the podium. As the second most popular notebooking tool out there, Zeppelin supports four primary languages, namely Scala (Spark Scala), Python (PySpark), R (SparkR), and SQL (SparkSQL).
Why We Love Apache Zeppelin as a Notebooking Tool
Apache Zeppelin has some incredible integration features, most notably its built-in integration with Apache Spark. This feature allows users to pull from databases using a Java API. One reason we love Apache Spark, which Zeppelin pairs with so well, is that it is a cluster computing system that supports Spark data frames and lazy execution. These are extremely useful when working with large time-series datasets or GIS data.
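To illustrate what lazy execution means in practice, here is a minimal PySpark sketch. It assumes a local Spark runtime is available (e.g. installed via `pip install pyspark`); the data is synthetic:

```python
# Requires a Spark runtime; sketch only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

# Transformations on a Spark DataFrame are lazy: nothing is
# actually computed by these two lines.
df = spark.range(1_000_000).withColumn("doubled", F.col("id") * 2)
filtered = df.where(F.col("doubled") % 3 == 0)

# Only an action such as count() triggers execution, and Spark can
# then plan the whole pipeline at once across the cluster.
print(filtered.count())
```

Deferring work until an action is called is exactly what makes Spark practical on time-series and GIS datasets too large to hold in memory at once.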
Not just that, though: Apache Zeppelin can be used end to end for data ingestion, discovery, analytics and visualization, and for collaborating to share your knowledge with a wider team. Apache Zeppelin even offers multi-user support and simultaneous editing.
Though Apache Zeppelin notebooks can be version-controlled via GitHub, Zeppelin’s display format makes them a tad harder to share, as GitHub often renders them as plain text files. That said, given Zeppelin’s browser-based interface, this is rarely an obstacle in the workplace.
A unique feature Apache Zeppelin offers is the ability for users to manipulate their BI dashboard. This is more tweakable than even Jupyter Notebooks, as most notebooking tools do not allow for this level of customization.
The Fun Part- How We Notebook (With Examples!)
It’s never fun being left out, and that’s how many data scientists and engineering professionals feel when they see the expansive opportunities notebooking tools offer. Because of that, we developed ways for these data professionals to connect their preferred notebooks to the Awesense Open Energy Platform and benefit from our Open Energy Data Model!
Examples, Examples, Examples!
Below are some examples of how we use notebooking capabilities in Apache Zeppelin and Jupyter Notebook combined with the Awesense Digital Energy Platform.
We like to connect to the Awesense Open Energy Platform from Jupyter Notebook in two ways: the first uses Jupyter’s SQL magic, and the second uses a Python PostgreSQL adapter module. Let’s start with SQL magic.
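For reference, loading the SQL magic and opening a connection typically takes just two notebook lines like the following; the connection string here is a placeholder, so substitute your own host, database, and credentials:

```
%load_ext sql
%sql postgresql://username:password@host:5432/database
```

The `sql` magic comes from the ipython-sql package (`pip install ipython-sql`); once loaded, any cell beginning with `%sql` or `%%sql` runs against the connected database.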
Above are the two lines of code that load Jupyter’s SQL magic. This is all it takes to connect to the Awesense Open Energy Platform’s database. From here, we can use SQL to access the Open Energy Data Model. A simple query can be seen below, displaying a subset of tables from the database.
A more complex example is demonstrated next: a query that computes the average current (I) reported by the Awesense Raptor 3 sensor over the whole month of November 2020.
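While the query itself runs server-side, the underlying computation is easy to sketch in pandas with synthetic readings. The sensor identifier and column names below are illustrative placeholders, not the actual Open Energy Data Model schema:

```python
import pandas as pd

# Synthetic per-sensor current readings spanning the Oct/Nov 2020 boundary.
df = pd.DataFrame({
    "sensor_id": ["raptor_3"] * 4 + ["raptor_7"],
    "measured_at": pd.to_datetime([
        "2020-10-31 23:00", "2020-11-01 00:00",
        "2020-11-15 12:00", "2020-11-30 23:00",
        "2020-11-02 08:00",
    ]),
    "current_a": [9.5, 10.0, 11.0, 12.0, 99.9],
})

# Filter to one sensor and one month, then average -- the same logic a
# SQL AVG(...) with a WHERE clause expresses server-side.
mask = (
    (df["sensor_id"] == "raptor_3")
    & (df["measured_at"] >= "2020-11-01")
    & (df["measured_at"] < "2020-12-01")
)
avg_current = df.loc[mask, "current_a"].mean()
print(avg_current)  # 11.0
```

Note that the October 31 reading and the other sensor's readings are correctly excluded from the average.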
A similar result can be achieved using the Python modules psycopg2 and pandas. In the example below, psycopg2 is responsible for the secure connection to the Awesense Open Energy Data Model, and pandas is responsible for executing the SQL query.
Similarly, the average-current example for an Awesense Raptor can be reproduced using psycopg2 and pandas in tandem.
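A minimal sketch of that psycopg2 + pandas pattern follows. The table and column names, and the connection parameters in the commented usage, are placeholders rather than the actual Open Energy Data Model schema:

```python
import pandas as pd

# Hypothetical query -- table and column names are placeholders.
QUERY = """
SELECT AVG(current_a) AS avg_current
FROM measurements
WHERE sensor_id = %(sensor_id)s
  AND measured_at >= %(start)s
  AND measured_at < %(end)s
"""

def fetch_avg_current(conn, sensor_id, start, end):
    """psycopg2 supplies the secure connection; pandas runs the query."""
    params = {"sensor_id": sensor_id, "start": start, "end": end}
    return pd.read_sql(QUERY, conn, params=params)

# Usage against a live database (placeholder credentials throughout):
#
#   import psycopg2
#   conn = psycopg2.connect(host="...", dbname="...",
#                           user="...", password="...")
#   df = fetch_avg_current(conn, "raptor_3", "2020-11-01", "2020-12-01")
#   conn.close()
```

Using named `%(...)s` parameters lets psycopg2 handle quoting and escaping, and the result comes back as a pandas DataFrame ready for further analysis or plotting.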
To connect to the Awesense Open Energy Platform from Apache Zeppelin, a new interpreter in the %jdbc group has to be created with the PostgreSQL driver.
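In Zeppelin's interpreter settings, a minimal configuration for such an interpreter (here named `edm`, with placeholder host and credentials) might look something like:

```
default.driver    org.postgresql.Driver
default.url       jdbc:postgresql://<host>:5432/<database>
default.user      <username>
default.password  <password>
```

With the interpreter named `edm`, its cells can then be invoked with %edm.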
Next, in the Zeppelin notebook, all that is needed is to call the %edm interpreter and start interacting directly with the Awesense Open Energy Data Model.
To tie back to the previous examples, let’s query Awesense Raptor 3 data using the %edm interpreter. The result of the query is conveniently visualized in the same cell (which is a great feature of Apache Zeppelin).
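A Zeppelin cell for this kind of query might look like the following; the table and column names are illustrative placeholders, not the actual Open Energy Data Model schema:

```sql
%edm
SELECT date_trunc('day', measured_at) AS day,
       AVG(current_a) AS avg_current
FROM measurements
WHERE sensor_id = 'raptor_3'
  AND measured_at >= '2020-11-01'
  AND measured_at < '2020-12-01'
GROUP BY 1
ORDER BY 1;
```

Zeppelin’s built-in charting can then render the result as a table, bar, or line chart directly beneath the cell.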
A Final Word
Data science notebook tools are a powerful way for you to build new applications and analytics focused on solving problems in your energy system. Often the most important step is asking the right questions. Today, with so much data being generated from electrical networks, there are very few questions that can’t be answered through data. Notebooks offer a simple path to answering some of these complex questions, many of which we hardly dared to ask before tools like these existed. If you’re having trouble solving a particular problem, or if you need to visualize data in a new way, the Awesense Digital Energy Platform’s integrations with data science notebook tools could be what you’re looking for.
Next Time, on the Awesense Build Better Series…
Extract, Transform and Load (ETL) tools are up next for episode six of the Build Better series. What are ETL tools? You’ll have to wait to find out! We hope you’ve enjoyed following this Building Better with the Awesense Platform series, and we hope you continue to follow along. Stop living in the past with your old ways of tackling data. It’s time to use your data to propel you into the future, and we’re grateful that ETL Tools and Notebooks allow us to do just that.
Free For a Chat?
We love to connect with our wide audience and would love for you to share our content! Follow along with this series and let us know what ideas YOU would like to see us write about. Whether it’s more content about the topics we’ve already written on or even a specific use case or tool you would like to know more about, let us know.
If you or your team are interested in building a custom application using the Awesense Platform, or you have an analytical tool you would like us to demonstrate with our platform, please feel free to reach out at firstname.lastname@example.org.