
Taiwan WWII Map Overlays

A while ago I came across the Formosa (Taiwan) City Plans, U.S. Army Map Service, 1944-1945 collection, in the Perry-Castañeda Library Map Collection of the University of Texas at Austin. I’m a sucker for maps, enjoy learning about history a lot, and have a lot of interest in my current home, Taiwan, so you can call this a magic mix of cool stuff.

There are 26 maps in the collection, made by the US Army by flying over different parts of the island and, I guess, mostly stitching together aerial photographs. The maps themselves were not that easy to check in an image viewer: there’s no context, zooming is clumsy, and I had no idea where about half the places were located. Instead, I thought it would be great to have them as an overlay on top of current maps and satellite imagery on Google Maps.

The result is Taiwan City Maps overlays, which does exactly that. Feel free to click the link and explore right now! In the rest of this post, I first show how that page was made, then share some history lessons I learned while making it.


Automating the hell out of it

Even before The 4-Hour Work Week made me more serious about it, I really enjoyed automating tasks that I would otherwise need to remember to do, or that would be troublesome to do by hand. This frees up a lot of time, keeps a bunch of problems away, and it is actually quite fun when the information comes to me instead of me going to it.

So far I have automated checking my bank account and credit card balances, updating a server’s dynamic IP address, tracking ebook sales numbers, and keeping clocks synchronized. Below I first summarize some general ideas, then introduce each of those scripts.

Banking script


Most of my scripts are written in bash, because it’s relatively straightforward to hammer out simple stuff, and it is surprisingly simple to do a lot of things once I have thought enough about a problem. The Advanced Bash-Scripting Guide is always on my reading list, but I usually get to check only the parts that are relevant to the given problem. You can get quite far with a few simple constructs.

The most common parts I seem to come across:

  • if-then-else constructs: if [ -d 'directory' ]; then echo "Found!"; fi
  • for loops: for f in *.png; do optipng "$f"; done
  • loading the results of a command into a variable: VAR=$(command)

For most other problems, a little keyword-fu always turns up an answer on StackOverflow or elsewhere on the web.

Another group of scripts uses Python, for when a bit more data manipulation is needed, like web scraping or JSON parsing. Actually, all of the scripts could be rewritten in Python for consistency, and it would probably be simpler too, which is something for the future.

As a general tip: most of these scripts need tweaking, and all of them are sort of alpha-beta quality code. To make hacking easier and reduce the heartache of mangled clever code, I keep everything in git repos. I share those repos online, so I have to make sure no secrets are ever checked in. It helps to use .gitignore strategically, keep the secrets in separate files, and include an example of what the secrets file should look like inside.
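A minimal Python sketch of that secrets-file pattern, assuming hypothetical file names: secrets.json holds the real values and is listed in .gitignore, while secrets.example.json is committed with the same keys and placeholder values. Checking the real file against the example catches a forgotten key before the script gets halfway through a login.

```python
import json
from pathlib import Path


def load_secrets(path="secrets.json", example_path="secrets.example.json"):
    """Load the git-ignored secrets file and check it against the example.

    Both file names are hypothetical; the example file is the one that gets
    committed, with placeholder values instead of real credentials.
    """
    secrets = json.loads(Path(path).read_text(encoding="utf-8"))
    expected = json.loads(Path(example_path).read_text(encoding="utf-8"))
    missing = sorted(set(expected) - set(secrets))
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return secrets
```

The matching .gitignore needs just one line, secrets.json, so the real file can never be committed by accident.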

Most of these scripts are run periodically by cron, so it is worth having some basic knowledge of how to schedule jobs with it.

Some scripts send me emails under specific circumstances (some after every run, some when new information appears), and for reliable delivery I have set up postfix to use Gmail as an SMTP relay. This way I’m sure to receive the emails, and to receive them quickly.


These are the scripts I have used most often and for the longest time. Still, many of them are under development, and I adjust them whenever I learn how to do things better. I link to all their repos, where they can be improved.

Banking account balances

My two main bank accounts are queried once a day for the available balance, and I’m notified by email. Both needed quite a bit of web scraping (I got them done at two different OpenHack Taipei events). The banks’ websites are pretty awfully organized (iframes within iframes within iframes, no CSS classes or ids), though I guess they don’t have to be good for me, they have to be good for the bank.

Cathay United Bank

The cathaycheck (click for repo) script queries the available balance at Cathay United Bank by logging in with curl and parsing the final page with Beautiful Soup. The script can be a skeleton for any other website where one has to log in and then navigate through a series of pages to get the information. The required HTML form variable names can be extracted with the help of the Inspect Element tools in Chrome.
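The login step boils down to collecting the form’s fields, including any hidden ones the server expects back, and posting them together with the credentials. A rough Python sketch of the collecting half, using the standard library’s html.parser in place of Beautiful Soup so it has no dependencies:

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of every named <input> on the page,
    including hidden ones (session tokens and the like)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            # Attributes without a value (e.g. "disabled") come through as None.
            self.fields[name] = attrs.get("value") or ""


def form_fields(page_html):
    """Return {field name: default value} for the inputs in page_html."""
    parser = FormFieldParser()
    parser.feed(page_html)
    parser.close()
    return parser.fields
```

Filling in the credential fields (whatever names Inspect Element shows, say a hypothetical userId and password) on the returned dict gives the body of the login POST, whether it is then sent with curl’s --data or with urllib.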

At the moment the credentials are stored in the crontab command, which is not really ideal; I should rewrite it to use a secrets file, though given that it runs on a server where I’m the only user (and root), there’s no practical difference for me at the moment. I have set it up so that I receive an email at the end of the day with the current balance.

ANZ Taiwan credit card

The anzcheck (click for repo) script queries my spending with the ANZ Taiwan credit card. Again, bash for logging in and Beautiful Soup for parsing the final page. It needs a bit more logic to extract information from a table, because the website’s developers added no classes or ids to the items to make them easier to understand (or to style, but that’s not my problem).
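Without classes or ids, the practical approach is to flatten the table into rows of cell text and then pick columns by position. A sketch, again with html.parser standing in for Beautiful Soup; it assumes every tr, td and th tag is explicitly closed, and which column holds the date or the amount has to be read off the real page:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Turn every <tr> into a list of its cell texts (td and th)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def table_rows(page_html):
    parser = TableRows()
    parser.feed(page_html)
    parser.close()
    return parser.rows
```

Picking out one day’s spending is then a list comprehension over the rows, filtering on whichever column turns out to hold the date.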

I just recently updated it to extract the spending items added to my balance on a given day, so I will never be caught by surprise again (hopefully). Since many of my charges go to companies with Chinese names, I quickly ran into the problem of having to tell Heirloom Mailx (which I use to send emails on my ArchLinux box) that the text I want to mail is plain text, not an attachment. After some hacking, the solution was to pass a few more parameters to “mail” so it knows that the text is UTF-8. From “sendthatmail.sh” in the repo, the parameters needed are:

I could still extract some more information from the bank’s website, though nothing really urgent.
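For comparison, in Python the UTF-8 problem mostly goes away: EmailMessage.set_content() from the standard email package labels a text part with charset=utf-8 by default, so Chinese merchant names arrive as inline plain text. A sketch that hands the message to the local postfix relay; the addresses are placeholders:

```python
import smtplib
from email.message import EmailMessage


def build_message(subject, body, sender, recipient):
    """Plain-text mail whose body may contain non-ASCII (e.g. Chinese) text."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # set_content() creates a text/plain part with charset=utf-8 (its default)
    # and picks a suitable transfer encoding, so no attachment is involved.
    msg.set_content(body)
    return msg


def send(msg, host="localhost"):
    """Hand the message to the local MTA (here: postfix relaying via Gmail)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```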

No-IP address updater

At the Taipei Hackerspace we have a handful of servers running, but our residential internet connection from Chunghwa Telecom only gives us a dynamic IP address. Applying for a static IP seems to be pretty troublesome, so in the meantime I’m using a script on one of the servers to update the IP address associated with our dynamic tpehack.no-ip.biz address.

The no-ip-bash-updater (click for repo) script was originally forked from elsewhere, but I have rewritten it quite a bit so that it

  • needs no extra file to store the current IP address, but compares the external IP with the result of a DNS query
  • stores no secrets in the file

It uses a pretty straightforward API call with HTTP authentication; the only real logic in there is checking when that call actually needs to be made.
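The logic can be sketched in Python like this. Assumptions, loudly: IP_SERVICE is a hypothetical URL that returns your external address as plain text, and UPDATE_URL with its hostname/myip parameters is No-IP’s update protocol as I understand it, so check their current documentation before relying on it:

```python
import base64
import socket
import urllib.parse
import urllib.request

# Assumptions: IP_SERVICE is hypothetical; UPDATE_URL is No-IP's update
# endpoint as I understand it.
IP_SERVICE = "https://ip.example.com/"
UPDATE_URL = "https://dynupdate.no-ip.com/nic/update"


def needs_update(external_ip, hostname, resolve=socket.gethostbyname):
    """Skip the API call when DNS already points at our current address."""
    try:
        return resolve(hostname) != external_ip
    except socket.gaierror:
        return True  # name doesn't resolve at all: update to be safe


def update_request(hostname, ip, user, password):
    """Build the update call with HTTP Basic authentication."""
    query = urllib.parse.urlencode({"hostname": hostname, "myip": ip})
    req = urllib.request.Request(f"{UPDATE_URL}?{query}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


def main(hostname, user, password):
    with urllib.request.urlopen(IP_SERVICE, timeout=10) as resp:
        external_ip = resp.read().decode().strip()
    if needs_update(external_ip, hostname):
        req = update_request(hostname, external_ip, user, password)
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(resp.read().decode().strip())
```

One consequence of comparing against DNS instead of a stored file: right after an update, resolvers may keep returning the old cached address until the record’s TTL expires, so an extra, harmless update call or two can happen.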

E-book sales

Recently I helped a friend publish an ebook version of How to Start a Business in Taiwan on Leanpub, and of course I want to know when any sales are made (disclaimer: I don’t get a cut of the sales, it all goes to the author). The leanpubsales (click for repo) script is written in Python, because handling JSON is easier there than it would be in bash. Otherwise the logic is quite simple: keep an external file around to check whether the sales number has increased, and if so, send an email. To send an email conditional on the output of the script, the “ifne” command from moreutils is very useful (meaning: “if input is not empty”).
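The compare-and-report part is small enough to sketch fully. The Leanpub endpoint and JSON field are left as parameters (url, key) on purpose, since I’m not reproducing the exact API here, and the state file name is hypothetical:

```python
import json
import urllib.request
from pathlib import Path


def report_new_sales(count, state_file="last_sales.json"):
    """Compare count with the previous run; return a message, or "" if no new sales."""
    state = Path(state_file)
    previous = json.loads(state.read_text())["count"] if state.exists() else 0
    state.write_text(json.dumps({"count": count}))
    if count > previous:
        return f"{count - previous} new sale(s), {count} in total"
    return ""


def main(url, key):
    """url and key are placeholders: use the JSON endpoint and field
    from Leanpub's API documentation for your book."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        count = int(json.load(resp)[key])
    message = report_new_sales(count)
    if message:  # print nothing at all otherwise, so ifne sends no email
        print(message)
```

The if message: guard matters because ifne fires on any input at all; even the lone newline from an empty print would trigger a blank email. Piped into something like ifne mail -s "Leanpub sales" me@example.com, the email only goes out when there is news.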

The query runs periodically, and it’s lovely to receive the results. I will surely set up the same kind of script when I get my own book ideas published on Leanpub.

RTC correction

As an atomic physicist (a field very much concerned with keeping precise time), I keep all my servers’ clocks synchronized via the Network Time Protocol (NTP) using chrony. One difficulty is that the real-time clock (RTC) of those computers is pretty crappy and drifts away. That wouldn’t be a problem if I never restarted them, but it’s a pain when I do: after a restart the clock can be tens of seconds off until it is synchronized again.

Chrony can sync the RTC to NTP time, but it doesn’t do that automatically; it has to be triggered manually. So I have written an rtccorrect (click for repo) script that runs every 2 hours or so (once a day would actually do) and eliminates the drift of the RTC.
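I’m not reproducing the repo here, but here is a sketch of the idea in Python, assuming chrony is configured with an rtcfile so that chronyc’s trimrtc command is available (and run as root). The drift helper shows why the exact interval hardly matters: with an illustrative 50 ppm error (not a measured figure), the RTC wanders about 0.36 s in 2 hours, about 4.3 s in a day, and about 30 s in a week, which is the “tens of seconds” territory.

```python
import subprocess


def rtc_drift_seconds(ppm, hours):
    """Error accumulated by a clock running `ppm` parts per million off."""
    return ppm * 1e-6 * hours * 3600


def trim_rtc():
    """Ask chronyd to set the RTC from the NTP-disciplined system time.

    Assumes chrony.conf has an rtcfile line and that this runs as root.
    """
    subprocess.run(["chronyc", "trimrtc"], check=True)
```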

Server backup

For backing up data between servers, rsync has proven invaluable. I have a couple of scripts that do just that, though they are among my oldest ones, and back then I hadn’t separated out personal information (way too easy to inline every credential, email, login, and all that), so I need to sanitize them first. A couple of ideas about these backup scripts:

  • sometimes higher transfer speeds can be achieved by messing with the ssh ciphers, e.g. passing -e 'ssh -c arcfour' to rsync
  • more often there’s even better performance when an rsync daemon is running on the remote computer (though with a Raspberry Pi, both cases are still frustratingly slow)
  • you can exclude files that don’t need transferring, e.g. --filter='- *.part'
  • when using rsync not just to transfer but to mirror, --delete (delete files at the target that don’t exist at the origin) and --archive are pretty useful
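Put together, the tips above amount to a command line like the one this Python sketch builds. The paths and host are placeholders. Passing a list to subprocess.run means no shell is involved, so --filter=- *.part is a single argument without the extra quoting the shell needs. The cipher is optional because arcfour may not be enabled in newer OpenSSH versions.

```python
import subprocess


def rsync_mirror(src, dest, excludes=("*.part",), ssh_cipher=None, run=True):
    """Mirror src to dest with the options discussed above; returns the argv."""
    cmd = ["rsync", "--archive", "--delete"]
    # Each filter is one argv element, so no shell quoting is needed.
    cmd += [f"--filter=- {pattern}" for pattern in excludes]
    if ssh_cipher:
        cmd += ["-e", f"ssh -c {ssh_cipher}"]
    cmd += [src, dest]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

Mind rsync’s trailing-slash rule on the source: /data/ copies the directory’s contents, while /data would create a data directory inside the target.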

For these backups I also use Dead Man’s Snitch to know when things didn’t work out, e.g. with a command like the following in the crontab, where backup.sh is my script’s name and xxxxxxxx is the snitch ID from my account:

This way I found out when my backup server kept dying because of a bad heatsink, or when my host server went down because of a flaky hosting company.
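The check-in logic is simple: ping the snitch only when the backup succeeds, so a failed backup and a dead machine look the same to Dead Man’s Snitch (no check-in) and both raise an alert. A Python sketch of that wrapper; the nosnch.in check-in URL form is my understanding of the service, and the snitch ID stays a placeholder:

```python
import subprocess
import urllib.request

# Check-in URL form as I understand Dead Man's Snitch; use your own ID.
SNITCH_URL = "https://nosnch.in/xxxxxxxx"


def run_and_check_in(cmd, ping):
    """Run the backup command; call `ping` only if it exited successfully."""
    result = subprocess.run(cmd)
    if result.returncode == 0:
        ping()
    return result.returncode


def ping_snitch(url=SNITCH_URL):
    """Tell the snitch the job ran (a plain GET request)."""
    with urllib.request.urlopen(url, timeout=30):
        pass
```

From cron this would be something like run_and_check_in(["./backup.sh"], ping_snitch).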


I guess there will be just more automation in the future, and maybe many of these scripts can be ported onto a common base so new ones are made much easier. What else do you guys automate?

Barometric recording of Typhoon Soulik

It all started a few weeks ago with Sparkfun having a “20%-off” day, when I got myself (among other things) a BMP085 barometric pressure sensor. When it arrived, I soldered some pins on it and set it up with an Arduino Nano to read it out easily.

View of the circuit
BMP085 barometric pressure sensor breakout board from Sparkfun

Originally all I wanted was some laid-back pressure recording, so maybe I could use it to predict the weather a bit: “Pressure falls: bad weather comes; pressure rises: things will clear up”. I recorded for about a week, and nothing really noteworthy came out of that.

Then came the news that the year’s first typhoon was on its way to Taiwan, and it was supposed to be a big one. Obviously I would try to record the barometric pressure as it passed, but I wanted to make it more interesting and informative: more visual than just a time-series plot of pressures.

The Japan Meteorological Agency (JMA) is a good place to watch for information about typhoons. They publish path predictions, typhoon properties like strength, wind speed, and central pressure, as well as satellite imagery. Putting these together, two days before the typhoon arrived I set up a script to download the satellite imagery as it became available.

Satellite picture of Typhoon Soulik and location of Taiwan on 2013-07-12 morning
The morning before the typhoon arrived

The JMA usually publishes 2 satellite images an hour for our North Western Quadrant (at :00 and :30): one covers the whole area, the other covers just the top 80% or so, leaving a dark band at the bottom. Nevertheless, matching up the pressure readings with the satellite pictures seemed like a good little project.
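The download side can be sketched in Python: round the current time down to the latest :00 or :30 slot and fetch that image unless it is already on disk, with cron running the script every few minutes. The URL pattern here is hypothetical (the real JMA paths differ), and I’m assuming the image times are in UTC:

```python
import urllib.request
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical URL pattern; the real JMA image paths differ.
URL_PATTERN = "https://example.com/jma/%Y%m%d%H%M.png"


def latest_slot(now, step_minutes=30):
    """Round `now` down to the most recent :00 or :30 publication time."""
    now = now.replace(second=0, microsecond=0)
    return now - timedelta(minutes=now.minute % step_minutes)


def fetch(slot, folder="frames"):
    """Download the image for `slot` unless we already have it."""
    target = Path(folder) / slot.strftime("%Y%m%d%H%M.png")
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(slot.strftime(URL_PATTERN), target)
    return target


def main():
    # An image that isn't published yet raises HTTPError;
    # the next cron run simply tries again.
    fetch(latest_slot(datetime.now(timezone.utc)))
```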

Friday came, and the government gave the afternoon off, though it turned out that no landfall happened until everyone would have been off anyway; there was just a bit of on-and-off rain. People stocked up on convenience store food (I now have a good supply of instant noodles :) and water, taped over their glass windows, and took in their plants and BBQ equipment from outside; well, those who had planned ahead did.

Around 10pm the big rain arrived; here’s a video of how it looked from my window. I went to sleep later, and was woken up around 3:30am by the rain having turned into pretty darn big wind. Here’s another video of the violent part of the typhoon at that time in the morning, which doesn’t even really do it justice. The buildings around here are pretty tall, and I wonder whether they protected us from the wind or acted as artificial canyons channeling it. Some things got broken, though not as much as I expected, which is a very good thing.

In the meantime, by the power of the Internet, I checked how the pressure readings were going a few miles away at the Taipei Hackerspace, where I had left the barometric pressure sensor (geolocation: 25.052993,121.516981).

Here’s the entire recording, covering approximately 2 days of typhoon. The weather was pretty okay at the start and end of the plot.

Plot of pressure readings
Pressure reading during the passing of Typhoon Soulik, recorded at the Taipei Hackerspace

The readings have been corrected to sea level (from about 20m height, where the Taipei Hackerspace is) and should be accurate to within 1hPa.
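The post doesn’t spell out the correction, so here is a sketch using the sea-level formula given in Bosch’s BMP085 datasheet, p0 = p / (1 - h/44330)^5.255. At 20 m it adds a bit over 2 hPa, roughly 0.12 hPa per metre of height, so the height estimate matters more than the fine details of the formula:

```python
def sea_level_pressure(p_hpa, altitude_m):
    """Reduce station pressure (hPa) at altitude_m metres to sea level,
    using p0 = p / (1 - h/44330)**5.255 from the BMP085 datasheet."""
    return p_hpa / (1 - altitude_m / 44330.0) ** 5.255
```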

The pressure was indeed dropping like a rock, and the dip on the graph coincided with the most violent wind that woke me up. According to the JMA, the central area of the typhoon had pressures down to 950hPa, which means that the core must have passed pretty close to here, since the readings went below 958hPa, though probably not directly overhead, as the pressure didn’t stay down there for long.

I made a video syncing up the pressure reading and the satellite picture. The red dot on the video marks the recording location. (Watching it in full screen and HD makes it clearer.)
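Syncing the two mostly means picking, for each satellite frame, the pressure reading closest in time. A Python sketch with the standard library’s bisect, assuming the readings are sorted by timestamp and the frame times are taken from the image file names:

```python
from bisect import bisect_left


def nearest_reading(times, values, t):
    """Return the value whose timestamp is closest to t (times must be sorted)."""
    i = bisect_left(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    before, after = times[i - 1], times[i]
    return values[i] if after - t < t - before else values[i - 1]
```

Binary search keeps this fast even with two days of readings and a frame every half hour.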

I wonder what the flat part in the readings was while the typhoon was leaving. By the look of it, maybe a sign of it changing direction.

Either way, this was fun to do, and I am glad that only a few people got hurt here, far fewer than even during less powerful typhoons. Maybe scaring people a little (like with the “super typhoon” talk that went around) helps them keep safe? Just don’t use it too often.

Extra material

I put almost all the material used here into a gist: the satellite imagery download script, the plotting, the movie frame generation, the movie generation script, and the complete barometric recording. Because this last part is pretty big (5MB), GitHub truncated the display of the other scripts, but I guess it’s still okay to check it out. I will add the Arduino sketch to read the sensor and the logging script later.

The satellite imagery weighs about 60MB, so I didn’t put it online, but if anyone wants it, let me know.

Keep safe!

Laboratory 2.0 – a monitoring system

It looks like one of my specialties as a physicist, and my contribution to the labs where I have worked so far, is bringing different kinds of programming techniques and technologies to the table. I’m not saying I’m any better than many of the professors, post-docs, and students I’ve met so far (there are plenty of ingenious ones); it’s more that I experiment with different tools, have tried more cutting-edge or recent technologies, have done some web programming, and can whip up something quickly that might not work very well at first, but does broaden the horizon for everyone else.

Also, I’m a lazy person, so I want to automate as much as possible. That was on my mind recently while we were preparing a vacuum-system bake-out. It’s essentially a procedure where a delicate experimental system, mostly made up of steel, glass, and the like, is sealed off from the atmosphere, pumped down, and then heated to high temperature (~150-300°C). One has to be careful, because things can break: some materials have temperature limits, and there are also limits on how quickly the temperature can change, so the status of the system needs careful monitoring. And the whole thing takes something like two weeks or more. Perfect setting for automation.

Set up the electronics

The pressure measurements are done by other (expensive) equipment, so I didn’t have to bother with those yet and set to work on the temperature monitoring first. Previously it was a bunch of thermocouples and multimeters, requiring manual intervention and lots of labour. Instead, I got some inspiration from Adafruit’s Thermocouple Breakout Board, which uses the MAX31855 chip, and from the Thermocouple Multiplexer Shield. The MAX31855 can handle only one channel, but another chip can be used alongside it to switch between the thermocouples, so we can read them out one by one. The Adafruit board could only handle that 1 channel, and the multiplexer shield used an older measurement chip that I could not buy anymore. In the end, I found a good analog multiplexer sold in the computer market here in Taipei, the CD4067B, and it works pretty well.

Breadboard setup for temperature monitoring with Arduino

Of course, setting it all up was quite a bit of fun times, as there were way too many gotchas along the way.

  • MAX31855 is a surface-mount component, and I hadn’t worked with one before. Not too bad, and it can be much neater, it just takes some practice
  • MAX31855 is a 3.3V circuit, so the 5V logic levels used by my Arduino Mega ADK had to be level-shifted
  • Unlike the older chip, MAX31855 really needs differential input, and it’s much more sensitive to the environment. This required a different kind of analog multiplexer than the multiplexer shield had
  • The Arduino Mega was a new model for me, and showed some strange behaviour in its serial communication
  • Surprisingly, there are not many options for 3.3V voltage regulators over here, just the LM1117, which is different from what others use elsewhere
  • Lots of noise and stability issues until I figured out how everything should be set up. For example, the thermocouples should under no circumstances touch conducting surfaces, and ground loops must be avoided
  • While MAX31855 is “cold-junction compensated”, meaning that it accounts for the chip’s local temperature when measuring the thermocouple, the compensation doesn’t appear to be complete, so the measurement can change unexpectedly when the chip heats up, for example from being in a closed box
  • Figuring out the right amount of time to wait between switching channels (375ms seems to be good enough, 500ms is totally fine)
In the end, though, we did have a nice 16-channel thermocouple multiplexer, sending the measurements to an LCD screen and to the computer over a USB cable.
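The channel switching can be sketched as follows. The real code is an Arduino sketch; this is Python for illustration only. The CD4067B picks one of its 16 channels from four address lines (S0 to S3, in binary), and the loop waits the settling time mentioned above before each reading. The two callables are hypothetical hooks standing in for the pin writes and the MAX31855 read:

```python
import time

SETTLE_S = 0.5  # 375 ms seemed good enough, 500 ms was totally fine


def select_bits(channel):
    """CD4067B address lines (S0, S1, S2, S3) for channel 0-15; S0 is the LSB."""
    if not 0 <= channel < 16:
        raise ValueError("CD4067B has 16 channels (0-15)")
    return tuple((channel >> bit) & 1 for bit in range(4))


def scan(set_address, read_temperature, channels=range(16), settle=SETTLE_S):
    """Read each thermocouple in turn.

    set_address and read_temperature are hypothetical hardware hooks
    (e.g. digital pin writes and a MAX31855 read over SPI).
    """
    readings = {}
    for ch in channels:
        set_address(select_bits(ch))
        time.sleep(settle)  # let the input settle before trusting a reading
        readings[ch] = read_temperature()
    return readings
```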
Temperature monitoring board soldered
Temperature monitoring board in its lab setting with 16 thermocouple channels

This is then saved in a database, and can be accessed from elsewhere.


The thing that my co-workers were most amazed by wasn’t the electronics. Sure, they hadn’t worked with Arduinos, but they had done similar stuff. Instead, they liked the monitoring interface much more; it’s the one in the picture here.

Bakeout Monitor interface showing the vacuum system, temperatures, pressures and long term graphs
Bakeout Monitor interface

It’s a schematic layout of our equipment, with the temperatures positioned where the actual sensors are. The measured values are also plotted over time in live-scrolling graphs.

I’m not saying it’s great. Thinking about it, the major insight that made it useful for everyone else was realizing how much better people understand visual data: placing the values at the corresponding locations on the schematic. That’s the only trick.

So under the hood it’s a MongoDB database (having learned from previous mistakes, set up as a replica set at least), with Python scripts talking to the sensors and saving the data, NodeJS / Smoothie Charts for visualization (and plain old CSS positioning of <input> tags for the reading display), and nginx’s upstream module for running two monitoring servers just in case. Most of it is in the GitHub repo of the monitoring code, along with the Arduino sketch for talking to the electronics.
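The sensor-to-database step is roughly this, sketched in Python. The “channel,temperature” line format is hypothetical (the real Arduino sketch may send something else), and the collection only needs an insert_one method, which a pymongo Collection provides:

```python
from datetime import datetime, timezone


def parse_line(line):
    """Parse a hypothetical 'channel,temperature' line sent by the Arduino."""
    channel, temperature = line.strip().split(",")
    return {
        "channel": int(channel),
        "temperature": float(temperature),
        "time": datetime.now(timezone.utc),
    }


def log_lines(lines, collection):
    """Store each parsed reading; `collection` is e.g. a pymongo Collection."""
    stored = 0
    for line in lines:
        try:
            doc = parse_line(line)
        except ValueError:
            continue  # skip garbled lines, e.g. after a USB hiccup
        collection.insert_one(doc)
        stored += 1
    return stored
```

The lines themselves would come from pyserial’s readline(), decoded to text. Skipping unparsable lines instead of crashing makes the logger more tolerant of flaky USB connections.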

It was actually quite fun to write it all: the gradual improvements, trying the new tech, trying not to lose too much data, and being amazed at how well it works. I especially had a good time learning about the database, scaling, fault tolerance, performance…

Of course there could be room for a lot more improvements.

  • My failover-restart bash scripts are awful, though they do seem to work more or less and counteract the USB unreliability
  • There are some changes to Smoothie Charts I could make: logarithmic plotting, some display enhancements, and maybe performance optimizations
  • More efficient data loading: 12h of data is about 30MB in JSON format, which I send compressed; apparently that gets it down to ~5% of the size, but it still takes quite a bit of time to process on the frontend
  • The layout can now be changed from config files when the sensors change, so co-workers can do that without programming knowledge. I wonder if that can be simplified even more
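One way to keep the layout in config, sketched in Python for illustration (the actual frontend is NodeJS, and this is not its code): a JSON file maps each sensor to a pixel position on the schematic, and each sensor becomes an absolutely positioned, read-only input element. The config format and labels are hypothetical, and the container around the schematic needs position: relative for the offsets to line up.

```python
import html
import json

# Hypothetical config format: sensor name -> pixel position on the schematic.
EXAMPLE_LAYOUT = """
{"T1": {"x": 120, "y": 40, "label": "Oven door"},
 "T2": {"x": 310, "y": 85, "label": "Ion pump"}}
"""

TEMPLATE = ('<input id="{id}" title="{title}" readonly '
            'style="position:absolute; left:{x}px; top:{y}px">')


def render_inputs(layout_json):
    """Emit one absolutely positioned read-only <input> per sensor."""
    layout = json.loads(layout_json)
    parts = []
    for sensor, spot in sorted(layout.items()):
        parts.append(TEMPLATE.format(
            id=html.escape(sensor),
            title=html.escape(spot.get("label", sensor)),
            x=int(spot["x"]),
            y=int(spot["y"]),
        ))
    return "\n".join(parts)
```

Moving a sensor then means editing two numbers in the JSON file, with no code changes.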

Of course, I’m a person who generally overengineers stuff, so maybe it’s good to stop somewhere. And that somewhere might be the point where I started using my Kindle for monitoring (it craps out on 1h of data already, but some real-time views are good enough).

Bakeout Monitor interface running on Kindle
Bakeout Monitor running on a Kindle 3, not perfect but it does work

Get on with it

I did learn a lot along the way, and I’m sure that with this experience I will be allowed to do a bit more in the lab in terms of programming ideas. I don’t like that the rest of the system is currently stuck with LabVIEW, but that’s for another post, and there are so many things that could be improved in general as well. Let’s just go and do that.