One fallout of this problem, besides the lost time was that we also lost our primary "Gold Schema." This was a big problem.... The birth of our system was a bastard child conceived by one of our Senior Developers as part of a bet. The conversation went something like this...
S: I really think the attempt to upgrade the PHP application to a newer version is a waste of time.
A: I think it is better then the alternative of starting over on a new platform.
S: I bet I can get a working version of the system on Rails in a week!
A: So do it.
As any of you that have had this kind of conversation know, this was a bad idea on both sides. It lead to the creation of the new system, which was supposed to be a proof of concept, in under a week. Given it was a 60-80 hour week but all the same, code that is rushed like that takes on a ton of technical debt and tends to inherit the legacy system's debt as well. The debt I am currently talking about is that the Rails application's database was a copy of a MySQL database ported over into Oracle on short notice. Using this tactic, we do not have migrations from blank to current state. Not usually an issue with Oracle considering our in-house systems allow us to request a clone of a current schema, but in this case, with the loss of the Oracle schema, we lost our baseline.
Recovering the baseline from one of the developers that was using Oracle Express was straightforward, but we forgot one thing when we made that developers' schema the master. We forgot about VPD. For those of you that don't know what VPD is, be thankful. It is Oracle's home grown security system affectionately called Virtual Private Database. VPD has good uses, and ones that can be transparent to the developer. Things like table level security and filtering as well as constructing database sessions with additional audit information. A few examples can be found here. One it is not good at is row level security, which is what we were using it for.
Jobs are another part of the Oracle schema that we failed to remember at first. Our Database Developer informed me that there are at least three ways you can schedule jobs. We managed to not clear out all of these when we removed VPD from our security model. One of the jobs that was scheduled every fifteen minutes constructed a materialized view of what user was allowed to see what document based on a series of permission levels and tags. Generally you can imagine that this job would not cause any issues, it was just reading from around four tables and constructing a new view. All good. But in this case, things were not as nice as they should have been, you see, there was a bug.
The bug caused this job to lock all of the rows that it was using to construct the view in an attempt to verify that the view it would construct was accurate. Oracle locking somewhere in the range of half a million rows across four tables causes things like deadlocks. At least the deadlock provided a trace showing what had locks on the rows. As soon as we found this, we went through the new "gold" schema and blew away all remaining vestigial VPD operations.
Lessons to be learned:
- Make sure your migrations allow you to build a new database from scratch
- Verify that migrations remove tasks / jobs / views that are no longer needed and could impact the performance of your system
- Do not use a bet as a good reason to create a new production system
- Attempt to learn from the last generation of software's sins
- Be friends with your DBA and DBD, they can save your ass.
--Just because you have a hammer does not make it the right tool for the job
No comments:
Post a Comment