Hacking Smart Light Switches and Other IoT Devices

Gosund SW1 circuitboard with test hook clips on serial connections

If you’ve ever had a free weekend, a desire to create a more secure smart home, and questionable judgment, you’ve come to the right place. In this post I’ll talk about how to take common IoT (Internet of Things) devices and put your own software on them.

Disclaimer: depending on the device, this exercise can range from pretty easy to drink-bourbon-and-slam-your-head-against-the-desk difficult. Oh, and there is some risk of electrocuting yourself or setting your house on fire. So everything after this point is for entertainment purposes only…

Why Hack Your IoT Devices?

Most people creating a smart home take the easy path… pick out some cheap and popular devices on Amazon, install the smartphone app to configure them, and they're good to go. Why would anyone want to go through the extra effort to hack the device? There are a few good reasons:

  1. Security: With few exceptions, most smart devices require installing an app on your phone, often from an unknown vendor and with questionable device permissions. The devices themselves are tiny, wifi-connected computers whose software is updated by connecting to a server in some other country and installing whatever it delivers onto a device sitting on your home network. Having a cheap device on your home network that requires full access to the Internet to work is bad enough, but it is worse when that software can be changed at any time, to do whatever the person changing it wants. That could turn your light switch into part of a botnet or, worse, let it be exploited to attack other devices on your home network. By replacing the software, you create a device that works properly without ever needing access to the Internet, lowering the security risk. You can also see (and change) exactly what software the device is running.
  2. Sustainability: Since the devices require communicating with an external company for configuration and updates, when that company stops supporting the device or, worse, goes out of business and turns off its servers, your device becomes useless or stuck in its current configuration forever. By replacing the software, you can keep the device working even if the company ceases to exist. And by using open source software with a robust community, you will likely have very long-term support.
  3. Because I Can (mu ha ha ha): Okay, this is more of a fun reason, but worth mentioning. I’ve generally been much happier with the hacked versions of my products, whether it be my TiVo, Wii, or car dashboard. Smart light switches are a relatively low-risk hack, as they are inexpensive, and I’m assuming the worst-case risk is turning one into a brick, not causing an electrical fire (I’ll update the blog if I have an update on that).

Getting Started

My adventure started with the spontaneous purchase of a Gosund Smart Light Switch. Like a gazillion IoT devices sold by name-brand and random manufacturers alike, this switch is controlled by an ESP8266. Most of these ESP8266 devices use a turnkey software solution made by Tuya, a Chinese company powering thousands of brands from Philips to complete randos.

For security and sustainability reasons, I decided I didn’t want this switch connected to my home network as-is: even if I wrote complex firewall rules to limit its access, it would still need to connect to the open Internet and to other devices in my house to work properly.

I did some research and found Tasmota, an open source project that replaces the software on ESP8266 or ESP8285 devices, eliminating the need for Internet access and adding functionality that makes them easier to connect to controllers like Amazon’s Alexa. The older examples required disassembling the device and soldering to hack it, which is exactly what I didn’t want to do. However, more recently an OTA (over the air) solution appeared that doesn’t require opening the device at all and does all of the hacking over wifi… that sounded great.

Tasmota Wifi Installation

When I tinker, I like to use a computer that I can reset easily so that I don’t have to worry about an odd configuration causing problems later. I have an extra Raspberry Pi that is handy for this, and I installed a clean copy of the Raspberry Pi Desktop onto a spare Micro SD card.

I installed TUYA-CONVERT, which basically creates a new wifi network and forges the DNS (how computers translate a name like tuya.com into the numbers that identify a server) so that it resolves to itself rather than to the Tuya servers. When the device goes to get a software update from the mothership, it gets the Tasmota software installed instead – hacking complete.
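For anyone following along, the whole setup is only a handful of commands on the Pi. The sketch below is roughly what I ran; the script names come from the tuya-convert README at the time, so check the project’s documentation if they have changed.

  # Rough sketch of setting up and running tuya-convert on the Raspberry Pi.
  # Script names may have changed; follow the tuya-convert README if so.
  git clone https://github.com/ct-Open-Source/tuya-convert
  cd tuya-convert
  ./install_prereq.sh    # installs the packages the scripts depend on
  ./start_flash.sh       # starts the fake wifi network and waits for a device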

Gosund Light Switch In Dangerous Setting
An example of poor judgment; note, however, that the red load wire is capped, as that is not a good wire to touch when the switch is on.

I started running the tuya-convert script on my Raspberry Pi and, rather than go through the full process of installing the switch in the wall, I found that a standard PC power cable (C13) was the perfect size to hold the wires in place and allow testing on my desk. DO NOT DO THIS – I am showing it only as an example of what a person of questionable judgment might do. The switch powered up, and on the tuya-convert console I could see it connecting and trying to get the new software! I love it when things just work.

But then, it didn’t work. While there was a lot of exciting communication happening between the Raspberry Pi and the switch, ultimately the install failed. Looking at the logs, I was getting the message “could not establish sslpsk socket”, and found this open issue, New PSK format #483. Apparently, newer versions of the Tuya software require a secret key from the server to do a software update, and without the key (known only to Tuya), no new software will be accepted. So, damn… these newer devices can’t use the simple OTA update. Also, if you have older devices, do not configure them with the app they come with if you plan on hacking them, as that will update them from the OTA-friendly version to one requiring the secret key.

Tasmota Serial Cable Installation

I realized I was too far down the rabbit hole to give up, so it was on to the disassembly and soldering option. The Tasmota site has a pretty good overview of how to do this, although I suspected a no-solder solution was possible and tried to find the path requiring the least effort (yay laziness).

Gosund Light Switch Circuitboard
Gosund light switch SW5-V1.2 circuitboard, pen for scale. The connection points are the six dots towards the top, running down the right side (zoom in for labels).

Opening the switch required a Torx T5 screwdriver (a tiny, star-shaped tool), and I happened to have one lying around from when I replaced my MacBook Pro battery. Looking at the circuit board, I realized that the very tiny labels and contact points, combined with my declining eyesight, made this a challenge. I took a quick photo with my Pixel 4a and zoomed in to see what I needed… the serial connections on the side of the board (look for the tiny RX, TX, GND, and 3.3 labels… no, really, look). While soldering would be the most reliable connection, I was hoping test hook clips would do the job.

Since I was already using a Raspberry Pi, I didn’t need a USB serial adapter, as I could connect the Pi’s GPIO pins directly to the switch. Again, the Tasmota project has a page giving an example of connecting directly to the Pi. Whatever method you use, it is critical that you connect with 3.3V, not 5V, as the higher voltage will likely fry the ESP8266. If you have a meter handy, check and double-check the voltage. And if you’re using the Raspbian OS, you may find /dev/ttyS0 is disabled… you will need to add enable_uart=1 to your /boot/config.txt file and reboot.
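For reference, enabling the Pi’s serial port looked roughly like this for me; the paths are the standard Raspbian ones, so adjust as needed for your setup.

  # Enable the Pi's UART (roughly what I did on Raspbian; adjust for your setup).
  echo "enable_uart=1" | sudo tee -a /boot/config.txt
  # Also disable the serial login console (raspi-config, under the interfacing
  # / serial port options) so esptool gets the port to itself.
  sudo reboot
  ls -l /dev/ttyS0    # after the reboot, the serial device should exist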

I connected the switch directly to the Raspberry Pi. There were several annoying things about this, starting with the fact that each time the switch was connected to the 3.3V pin, it rebooted the Pi. And since almost every command to the switch requires resetting its programming mode through a power cycle, that meant rebooting the Pi frequently (fortunately it is a fast boot process).

Test hook clips connecting the Raspberry Pi to the Gosund switch worked surprisingly well.

The good news is, the test hook clips worked, which was a bit of a surprise. I added a connection from Pi ground to the switch’s 00 pin (green wire in the photo), as that forces the switch to enter programming mode at boot (it is okay to leave that connected during the hacking process, or you can detach it once the switch is in programming mode). I made sure everything was precariously balanced to add excitement and more opportunities for failure to the process. I was able to confirm that I had entered programming mode and had access to the switch via esptool, a command line utility for accessing ESP82xx devices. Success! 🎉
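The checks I ran were simple read-only esptool commands, something along these lines (the port name assumes the Pi’s built-in UART):

  # Read-only sanity checks while the switch is in programming mode.
  esptool.py --port /dev/ttyS0 chip_id     # reports the chip type and ID
  esptool.py --port /dev/ttyS0 read_mac    # reports the wifi MAC address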

The bad news is, other than being able to read the very basics from the switch, like the chip type, frequency, and MAC address, pretty much everything else failed. And each successful access only worked once and then required a reboot. I was unable to upload new software to the switch. After researching a bit, the best clue I found was reports of voltage drops on homemade serial setups, which seemed like it might apply to wiring directly to the Pi circuitboard. At this point I needed a drink, and went with a nice IPA.

But hey, once you’re this far down the rabbit hole, why stop? I decided to try a more traditional serial connection, using a CH340G USB to serial board.

Serial Killer Part Two

Apparently there was indeed an issue with using the Raspberry Pi directly for the serial communication, because the USB to serial adapter worked perfectly. I validated the connection using esptool and then used the tasmotizer GUI, which makes it easy to back up, flash, and install new software on the switch. Many steps require rebooting the switch to proceed, but that is as simple as unplugging the USB cable and plugging it back in (even better, it no longer triggers a reboot of the Raspberry Pi each time).
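If you prefer the command line, my understanding is that tasmotizer is doing roughly the equivalent of these esptool steps. The port name, 1MB flash size, and dout flash mode below are assumptions for this particular switch, so verify them for your device before erasing anything.

  # Roughly the manual equivalent of the tasmotizer backup / flash flow.
  esptool.py --port /dev/ttyUSB0 read_flash 0x0 0x100000 gosund-sw1-backup.bin
  esptool.py --port /dev/ttyUSB0 erase_flash
  esptool.py --port /dev/ttyUSB0 write_flash -fm dout 0x0 tasmota.bin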

Tasmotizer and the default web interface to configure your newly-hacked switch

Once the new software is installed, there is one final reboot of the switch (don’t forget to disconnect the ground-to-00 connection or else it boots back into programming mode). At this point the switch sets up a wifi network named tasmota[mac], where [mac] is part of the MAC address. Connect to this network and point your browser to http://192.168.4.1 and you are able to configure your device. Set AP1 SSId and AP1 Password to your home wifi credentials, click “save”, and a few seconds later your switch will be accessible from your home network.
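If you would rather script this step than use the web form, Tasmota also accepts console commands over HTTP. A hypothetical example (the network name and password are placeholders):

  # Set the wifi credentials through Tasmota's HTTP command endpoint instead of
  # the web form (placeholder credentials; the switch restarts and joins the network).
  curl -G "http://192.168.4.1/cm" \
    --data-urlencode 'cmnd=Backlog SSID1 MyHomeNetwork; Password1 MyWifiPassword'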

I’ll provide the details of the configuration in a follow-up post, but I used the Gosund SW1 Switch template, following these instructions to import it, and turned on “Belkin WeMo” emulation to make the switch automatically discoverable by Alexa, without the need to install special apps on my phone or skills on Alexa. The configuration process and connecting to Alexa were incredibly easy and took less than 5 minutes.
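As a teaser for that follow-up post, the relevant console commands can also be sent over the same HTTP endpoint once the switch is on your network. The IP address below is a placeholder, and the Gosund SW1 template JSON should be pasted in first (via the web UI or a Template command).

  # Sketch of the console commands behind the configuration described above.
  curl -G "http://192.168.1.50/cm" --data-urlencode 'cmnd=Module 0'     # activate the saved template
  curl -G "http://192.168.1.50/cm" --data-urlencode 'cmnd=Emulation 1'  # 1 = Belkin WeMo emulation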

Update January 2, 2020: I added a post on hacking the Gosund Smart Dimmer Switch (SW2) and the Youngzuth 2-in-1 Switch, each of which required a different technique.

If you’re curious about attempting this yourself, have questions about my sanity, or have other experiences hacking your smart devices, I’d love to hear from you – please leave a reply below!

Google I/O 2019, Some Exciting Bits that Were not Obviously Exciting

Over the last couple of days I’ve been looking at the various product announcements that came out of Google I/O 2019, and there were a couple of themes that got me pretty excited about where Google can go and how that can make a pretty positive impact on millions of people.

Creating Opportunities for People… All People

I loved the Google Lens announcements from Aparna Chennapragada because the application of the technology can make such a huge difference in people’s lives, and not just for the people I typically see in Silicon Valley wearing fleece vests and sipping cold brew coffee. What was most compelling to me was the transcribing / Google Translate integration that was demonstrated, especially when combined with the processing being done on-device (not in the cloud) and being accessible on extremely low-end ($35) devices. Visual translation was always a very cool feature and, when I was trying to figure out menus in Paris, I was happy to have the privilege of a high-end phone and data plan. Making this technology widely accessible breaks down barriers created by illiteracy, assists the visually impaired, and helps human interactions across language borders.

Google also announced Live Caption, where pretty much every form of video (including third-party apps and live chat) can have real-time subtitles. This is also done on-device and works offline, so it can be applied to live events, like watching a speaker at a conference. A shoutout to my friend and former colleague KR Liu for her work with Google on this project, which makes the world far more accessible to people with hearing challenges.

Also notable, Google’s Project Euphonia is making speech recognition more accessible to people with impaired speech.

Movement Towards Device vs. Cloud

The “on device” and “offline” features I mentioned (which were also part of other announcements, like the Google Assistant improvements) are important because of the implications they have for making the technology available to everyone, and also because of the personal privacy that capability enables.

Of course, my data, Google’s access to it, and personal privacy is a much larger, complicated conversation… for now I am going to focus on possibilities, not challenges.

For years there has been a push for all aspects of people’s lives to be captured and collected in the cloud. There are many reasons this may have been necessary, from correlating data to make it useful, to raw processing power requirements, to over-reaching policies and business models that require collecting all the things to win. Once in the cloud, personal information can be used for purposes never imagined by the consumer, including detailed profiling, sharing with third parties, accidental leaks to malicious parties, revealing personal content, and various other exploitations that can negatively impact the consumer.

When the processing stays on your device and does not require transferring data off of it, products can still provide incredible benefits while also being respectful of customer privacy. This is exciting because there are product opportunities in areas like personal health (physical and mental) that will likely require deep trust and protection of consumer information to gain wide acceptance and benefit the most people.

Personal Assistant of My Dreams

And something I am more selfishly excited about…

For several years I have wished that all of Google’s products would integrate with each other and eliminate almost every manual step I have in organizing my day. I am going to side-step the discussion about how much data a company has about an individual and say that I intentionally choose to trust my information to two companies (Google being one) because of the value I get from them. I use Google to organize most aspects of my life, from email communication to coordinating my kid’s schedules, video conferencing, travel planning, finding my way around anywhere, and almost every form of document. As a result, all the parts of Google know a lot about me. But still, when I send an email to set up a meeting, I usually need to manually add it to my calendar and then also add in the travel details (I frequently take trains instead of driving)… it’s a couple of extra minutes that I could be spending on better things, or just looking at pictures of cats on the Internet.

With the progress of Google Assistant and Google Duplex, I am seeing a path where administrivia is eliminated, where email, text messages, phone calls, and video conferencing can all provide inputs that guide this assistant in organizing my life behind the scenes… action items discussed in a Hangout can automatically result in a summary document, a coordinated follow-up lunch, optimal travel details, and a task list.

There is an obvious contradiction between my excitement for the announcements that emphasize better human outcomes and my “let Google know all the things” excitement over a personal assistant, but again, this is about my personal, intentional choice to share data vs. products that mandate supplying personal data, often far in excess of what is necessary to deliver the product or service.

There were some other “that’s cool” announcements, and I’ll probably be buying a Pixel 3a, which seems like a great deal for the feature set, but overall I’m more excited about the direction than the specific products showcased.

Hinder, Don’t Halt: Griefing Content Thieves for Fun and Profit

The art of deterring content theft is an ongoing game of cat and mouse – generally any barrier you create to prevent theft is temporary, as thieves continue to find new ways to steal the content, so long as the value of the content exceeds the effort necessary to steal it. For this reason, it can often be more effective to hinder thieves instead of trying to stop them.

I encounter this “hinder, don’t halt” pattern when talking with others who run large services, and you can see it reflected in solutions like shadow banning. One of the most common themes I hear is the satisfaction that comes from solutions that cause frustration for bad actors, so I’m sharing one from my personal experience…

At IMVU, customers called Creators make content that they sell to other IMVU customers. The content they create is 3D items like avatar clothing, items to decorate an environment, and ways to customize an avatar. This content creates real value for other IMVU customers, who spend real money to purchase it from the catalog of over 10 million items. While many Creators create content just for the enjoyment of creating, some do it as a business, with a few making over $100K US annually. Whether creating for pleasure or business, all Creators hated having their work stolen. And, since there is real money from the sales of content, there is real incentive for thieves to try to steal it.

At one point we discovered a site selling a service that allowed people to steal Creator content without paying for it. It was pretty easy to detect the service, and the initial response was to block it, which immediately broke their service completely and, not surprisingly, prompted the thieves to quickly find a new way around the block. The block lasted less than a day and the thieves were back in business.

The next response was more fun… rather than blocking the thieves, we made their service not work… sometimes… and inconsistently. Code was added to detect thieves accessing content, and some of the content being accessed would be randomly, mildly corrupted. The corruption could be configured to occur at certain rates, on certain items, at certain times of day, and to be disabled when access patterns looked like testing for the corruption. As a result, customers of the thieves started getting inconsistent results that would sometimes lead to content failing to load and even crashes. If you are an engineer reading this, you understand why this is a nightmare scenario to debug and fix… customers report different failure cases with no consistent way of reproducing the problem to understand the cause. And since your code is working fine, the bug isn’t going to be found there… you eventually have to discover that you are being served different content than is being served to legitimate customers.

The result of hindering was much more effective than blocking… it took many weeks for the thieves to understand what was happening and, during that time, we could see them getting bashed by the people who paid them, because the stolen content was ruining their experience. By the time the thieves had found another workaround, they had such a bad reputation that people were less willing to give them money.

If you have dealt with content thieves I would be interested in hearing your stories, successful or not. Please leave a reply, below!


Scaling Continuous Delivery: Happiness as a Metric

A few days ago Jeff Atwood (Coding Horror) suggested that a good measure of a tech company’s health is the time it takes for a simple change to become available to customers.

And while there are numerous metrics that determine the health of a tech company (see Jez Humble’s book, Accelerate, for an amazingly comprehensive overview), Continuous Delivery strongly correlates with successful outcomes.

I support Jeff’s assertion – I witnessed the value created by Continuous Delivery at IMVU, where we pioneered some of the crazy processes that would later be followed by more sane practitioners. From day one, IMVU placed value on the speed of product iteration and “designed” its build systems accordingly. In 2006, development was done on Windows using Reactor Server to provide the LAMP-ish stack, and the deploy process looked something like this:

 svn-server$ rcp website/* production:/var/www/

If you’re wondering why I omitted the test framework, I didn’t. Code went from a local Windows sandbox environment, to source control, to live on a Linux environment running a version of PHP different from the local sandbox. Fun! While there were numerous problems with this system, iteration velocity was amazing (those one-word copy changes could ship in less than 5 minutes, and so could full features). This development velocity was a key component enabling IMVU to build a large, successful business in a space where failed companies outnumber survivors 50 to 1.

And to be clear, time to get a change live to customers doesn’t in itself indicate healthy tech, but a lot of tech health comes from the corresponding systems necessary to make rapid deployment work.

Fast forward a few years to 2008, when I transitioned from leading the operations team to leading the engineering organization. By then the build and deploy systems had matured, with reasonable test coverage, automated deployment, and automated rollbacks when something unfortunate made it into production. It was pretty cool, even though publicly the process was mostly received with the sentiment, “that will never work, and certainly won’t scale”.

Scaling Problems

One of my first challenges as the new engineering leader was a team unhappy about their ability to get work done because it was taking hours for a commit to become live to customers. It’s astonishing when you think about it – every engineer in the company had come from companies where the commit-to-live process was typically measured in months, but once they experienced the value of Continuous Delivery, anything more than minutes seemed unbearable.

Digging into the problem, I came to understand that the issue was not slow builds (although that was part of it); the most significant problems were caused by the shared responsibility for the build systems which, combined with the desire to deliver features to customers, created a tragedy of the commons. When an engineer had a failure in the build system, the optimal solution for that engineer was to fix the problem in place, blocking the build system for everybody else in the queue, which meant the number of commits in the next build increased, which meant the chance of a failure in that build increased, ad infinitum. The result was that pushing to production could be blocked for hours, sometimes for most of the work day.

Solving for Happiness

I thought the best solution was to formalize a project, define a clear success outcome, and have a single person with the responsibility for (and therefore authority over) the build / deploy systems. The first problem was determining clear success criteria… any time metric I chose would be somewhat arbitrary, so instead I chose engineering happiness as the success criterion, or more specifically, that pushing to production was no longer causing unhappiness. While I generally hate subjective success criteria, there were ways to assess progress through 1:1 conversations and Likert scale surveys. We also had great (highly objective) data around commit-to-deploy times, so we could see the correlation with the more subjective happiness index.

There was some pretty straightforward work to improve the actual test and deploy speeds, including simple things like adding more hardware, the slightly less simple work of sorting tests to run by speed (a surprisingly large performance gain), and fixing the slowest of the tests. But some of the most important gains came from the human parts of the deployment system… engineers were required to immediately revert code and fix the issue in their sandbox rather than blocking the build system. This was not a popular policy change, as engineers immediately experienced the direct impact of a failed commit but didn’t immediately see any gains to the overall system. After a few weeks, though, the improvements were clear in the average commit-to-deploy time. And giving credit where it is due, Eric Prestemon was the “Buildbot Sheriff” who identified many of the opportunities for improvement and delivered the results… many people helped, but Eric bore the burden of hearing a lot of critical feedback about unpopular policy changes (eventually outweighed by the praise for the results he produced).

Eventually the build system frustration ceased to be a common topic in 1:1 meetings, and it faded away as a meaningful problem in engineering surveys. 12 minutes. When the commit-to-live time was 12 minutes, the system was operating well. That became the new value for alerting – under 12 minutes, all was good; beyond that, we needed to actively drive improvements. In practice, deploy time was usually around 11 minutes: 8 for parallel test builds/runs and 3 minutes for rollout checks (thanks for the reminder, @jwatte).

Diminishing Returns

I have been asked why we didn’t try to make the build and deploy systems as fast as possible… why not 2 minutes? We constantly worked on optimizing these systems, adding separate hypothesis builds, automatically isolating build servers to allow diagnosing and fixing without blocking, etc. And sometimes deployment would take less than 9 minutes.

However, much like the difference between 99.99% and 99.999% uptime for a service, the difference to the customer can be negligible while the resources necessary to deliver that improvement can be extraordinary. When business requirements are being met and engineering is happy with deploy times, the resources necessary to dramatically improve were better spent delivering value to customers.

Key Takeaways

  1. Working in a (well functioning) Continuous Delivery environment is empowering, naturally encourages other strong technical practices, and is hard to retreat from once experienced.
  2. Certain problems fall into what I call the “roommates and dishes” category, where “it’s everybody’s responsibility” sounds good, but in practice actually means “it’s nobody’s responsibility”. In these cases it is better to find a results-driven person and ensure they have responsibility and corresponding authority.
  3. Hire Eric Prestemon or somebody like him.

 

Have you worked in a Continuous Delivery environment and experienced non-obvious scaling challenges? I’d like to hear about your experience – please leave a comment!