Hugues Ross - Blog: Dusting the Cobwebs out of Singularity: Mistakes and Lessons
Hugues Ross

10/22/17

Dusting the Cobwebs out of Singularity: Mistakes and Lessons

I'm pretty tired today, so I'm going to talk a little about my strategy for finishing up this Singularity update and some of the issues I see in my old code.

Homecoming

Getting back to an old codebase is usually a wince-inducing experience, but this instance was worse than usual. Singularity was always a learning process for me, as I taught myself software development over the past few years. Since putting the project down, I've had close to 9 months of software development experience. Naturally, my skills have changed drastically and my old code looks terrible now.

Since I left off mid-refactor, the code is in a bit of a half-finished state. I don't want to go all the way back to the last update, so I'm trying to fix and re-purpose what I've already got to avoid some poor decisions from the past.

Static Shock

For a long time, I regarded globally-accessible data as a very bad thing. As a result, I avoided static functions and variables like the plague in most of my software. You don't want to overuse these types of things, but it's important to remember that there's pretty much nothing in programming that should always be avoided.

In this case, the most obvious use case for statics is Singularity's settings classes. That's right, there were two of them. Depending on how loose your definition of "settings" is, you could even say that there were three: one to act as an interface to the application preferences, one to provide command-line settings, and one to provide the resolved path to the database. None of them were static, so if I needed them in a class I had to make that class hold a reference to the one(s) I needed. Some of those references even had different names!

I have since made the contents of both "main" settings classes static, and nested the command-line settings into the global settings. While I was at it, I also folded the path resolver into the command-line settings (since that's what decides the db path most of the time anyway). The result absolutely violates SOLID, but that honestly doesn't matter because it also greatly simplifies the affected code and makes it much more understandable.
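To make the new layout concrete, here is a minimal sketch in Python. It is not Singularity's actual code: every class, field, and default value below is a hypothetical stand-in, and the fallback database path is just a placeholder.

```python
from pathlib import Path
from typing import Optional


class CommandLineSettings:
    """Options parsed from the command line (all names here are hypothetical)."""

    verbose: bool = False
    database_path: Optional[Path] = None  # set when the user passes an explicit path

    @classmethod
    def resolved_database_path(cls) -> Path:
        # The path resolver, folded into the command-line settings:
        # prefer the explicit path, otherwise fall back to a placeholder default.
        if cls.database_path is not None:
            return cls.database_path
        return Path("singularity.db")


class GlobalSettings:
    """Application preferences, reachable from anywhere without passing references."""

    update_interval_minutes: int = 30  # hypothetical preference
    command_line = CommandLineSettings  # command-line settings nested inside


# Any class can now read a setting directly instead of holding its own reference.
db_path = GlobalSettings.command_line.resolved_database_path()
```

The trade-off is the one described above: everything is global, but there is exactly one name for each setting and nothing has to be threaded through constructors.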

Lesson Learned: Trying to follow a particular coding paradigm perfectly at the expense of your actual design is, naturally, a terrible idea. At the end of the day, there's a time and place for everything.

Shameful SQL

I don't think I've ever really enjoyed working with SQL, nor do I expect this to be a particularly unpopular opinion. Within Singularity's SQLite database, there's a table containing all of the feed entries that it has saved. As it turns out, these entries have no unique keys and are subject to change. That makes swapping them out in a clean fashion damn near impossible.

So where did this terrible idea come from, anyway? The answer is GUIDs. RSS/Atom feed entries require a unique ID. However, we can't use that as our key in the database because keys could overlap between two feeds. My current plan is to combine something from the owning feed with the entry's GUID, and then hash the result. Simple and reasonable, right? Well, my OLD solution for handling updates was this:
  1. Create a new temporary table and fill it with entries that match our feed.
  2. Create a brand new unique index on the GUID (since there's only one feed, it should be unique now).
  3. Save the items to this temporary table.
  4. Copy the entire thing back to its original location.
  5. Drop the temporary table.
  6. Repeat for every individual feed (about 700 times).
  7. Put the kettle on the stove, because this is going to take a while.
Sometimes it crashes the entire program, because a GUID managed to show up twice and database errors are treated as critical. Looking back, all I can do is cry.
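For comparison, the planned approach (combine something from the owning feed with the entry's GUID, then hash the result) could look roughly like the sketch below. This is only an illustration under assumptions: the table layout, the column names, the use of a numeric feed ID, and the choice of SHA-256 are all mine, not Singularity's real schema.

```python
import hashlib
import sqlite3


def entry_key(feed_id: int, guid: str) -> str:
    """Combine the owning feed's ID with the entry GUID and hash the result."""
    # The NUL separator keeps (1, "23") and (12, "3") from producing the same input.
    raw = f"{feed_id}\x00{guid}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def save_entries(conn, feed_id, entries):
    """Insert new entries and overwrite changed ones in a single pass."""
    rows = [(entry_key(feed_id, guid), feed_id, guid, title) for guid, title in entries]
    with conn:  # one transaction per feed: commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO entries (key, feed_id, guid, title) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (key TEXT PRIMARY KEY, feed_id INTEGER, guid TEXT, title TEXT)"
)
save_entries(conn, 1, [("post-1", "Hello"), ("post-2", "World")])
save_entries(conn, 2, [("post-1", "Same GUID, different feed")])
save_entries(conn, 1, [("post-1", "Hello (edited)")])  # updates the row in place
```

With a real primary key, an entry that changes is simply replaced, and a GUID that shows up twice overwrites itself instead of raising an error. No temporary tables or throwaway indexes are involved.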

Lesson Learned: Just because you've picked up a couple of fancy SQL tricks doesn't mean you should actually use them. Instead, consider if there's a better solution that doesn't overwork your database for no reason. Or at least use a better database, like PostgreSQL.

Think Big! No, Smaller!

Often, it's a good idea to consider how well a solution might scale. After all, projects grow in scope over time and you don't want to leave yourself with a subpar solution down the line. On the other hand, it's also important to consider the current scale of the project you're working on.

Case in point, I made the rather poor decision to load as little as possible into memory. This seems smart, since it saves on RAM, but I don't need to do it! Seriously. Let's look at the numbers:
  • I'm subscribed to ~700 active feeds
  • My current Singularity database holds ~1 year of feed entries
  • My current Singularity database is just over 100MB in size
At this rate, Singularity might end up using 1GB of RAM... in a decade. Most modern web browsers regularly eat up that much right now! By the time the database gets big enough to fill my current machine's 16 gigs of RAM, I'll be several decades underground already. Unless I decide tomorrow to make Singularity into a big public online service, there's no point to optimizing for space.
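The back-of-the-envelope math is short enough to write down. This sketch assumes steady linear growth, rounds "just over 100MB" down to 100MB per year, and uses decimal units (1GB = 1000MB); the function name is made up for the example.

```python
DB_GROWTH_MB_PER_YEAR = 100  # roughly 1 year of entries, rounded down


def years_until(target_mb: float, growth_mb_per_year: float = DB_GROWTH_MB_PER_YEAR) -> float:
    """Years for the database to reach target_mb, assuming steady linear growth."""
    return target_mb / growth_mb_per_year


years_to_1gb = years_until(1000)        # 10.0
years_to_16gb = years_until(16 * 1000)  # 160.0
print(years_to_1gb, years_to_16gb)
```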

More than just saving on RAM, I've repeatedly butted heads with a messy "clean up" system to auto-delete entries after a certain length of time passes. This is even more misguided, considering how cheap and plentiful hard drive space is. On top of adding extra bugs and code, this "feature" is much less useful than, say, a simple archival system that hides old entries while still letting you search for them. Unless a user were tracking a ludicrous number of feeds, I really don't see the point in trying any fancy space-saving tricks that slow things down and introduce bugs.
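As a rough idea of what such an archival system might look like, here is a small sketch. It is purely hypothetical: the `archived` flag, the table layout, and the function names are assumptions for illustration, not anything that exists in Singularity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT, "
    "published TEXT, archived INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO entries (title, published) VALUES (?, ?)",
    [("Old post about SQLite", "2016-01-15"), ("Fresh post", "2017-10-20")],
)


def archive_older_than(conn, cutoff_date):
    """Hide entries published before cutoff_date instead of deleting them."""
    with conn:
        # ISO 8601 date strings sort chronologically, so plain comparison works.
        conn.execute("UPDATE entries SET archived = 1 WHERE published < ?", (cutoff_date,))


def visible_titles(conn):
    """Titles shown in the normal view: archived entries are skipped."""
    rows = conn.execute("SELECT title FROM entries WHERE archived = 0 ORDER BY id")
    return [title for (title,) in rows]


def search_titles(conn, term):
    """Search covers every entry, archived or not."""
    rows = conn.execute(
        "SELECT title FROM entries WHERE title LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [title for (title,) in rows]


archive_older_than(conn, "2017-01-01")
```

After the call above, `visible_titles(conn)` only lists the fresh post, while `search_titles(conn, "SQLite")` still finds the archived one. Nothing is ever deleted, so there is no cleanup logic to get wrong.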

Lesson Learned: Base your projections on real use-cases, not crazy what-ifs and guesswork.

Conclusion

There are plenty of other problems left, but these are the big ones that I'm working on right now. Besides this, I've still got a few other projects that I'm toying around with. Still, this one has my interest right now, so you can probably count on more updates as development continues.
