December 12, 2018

38. When (and When Not) to Use Open Source Libraries

Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache discuss when and when not to use open source libraries for your projects. We remember when few people were using them. Now, there’s not much of your own code in a fully functional application. Most everything is open source.

Some of the highlights of the show include:

In the early 2000s, open source emerged; people were either strongly for or against this controversial and new software
Software used to be something you paid for and a fiercely protected asset
Open source became a community and future of software development; people leveraged and shared with each other – get more done by working together
Open source libraries were developed and most of the world moved to JavaScript, Node, and React
Is open source too much of a good thing? Where is that line? Balance?
Open source has always been around for newer developers, so they take it for granted
Not impossible to build software without the open source community, but would be difficult to write without leveraging some core pieces of open source software
Consider specific criteria because you’re entrusting code written by someone else that you are responsible for maintaining and making sure has high fidelity and integrity
Express, Winston, Sequelize, AWS SDK Module are fundamental things being leveraged and depended on by millions of other pieces of code
Should you write the code or find existing open source code? Rule of Thumb: How much time will it take or save you?
Consider various factors before using open source code: Is the code clean? Can you understand what it’s doing? Does it have unit tests and appropriate comments?
Leverage security through tools that scan open source code for vulnerabilities

Links and Resources

Beanstalk

GitHub Private

The Cathedral and the Bazaar

Rich: In episode 38 of Mobycast, we discuss when and when not to use open source libraries in your projects. Welcome to Mobycast, a weekly conversation about containerization, Docker and modern software deployment. Let’s jump right in.

Jon: Hey, another episode of Mobycast. Rich, what have you been up to last week?

Rich: We are about to upgrade our servers to PHP 7, which shouldn’t be a big deal, but we have almost 100 different websites that we manage on our servers.

Jon: That’s a few.

Rich: Yeah, and all of them are using A records to point to the IP address, so now all of these will have to be updated. The problem, of course, is that we don’t have the credentials for GoDaddy or whatever their domain manager is. I’m preparing this for January 15th because I know that if I sent out the email, it would take a month and a half for everyone to actually do it. You can do this once a year if you like. Unfortunately, this upgrade requires an IP address change, so it’s just a nightmare. In addition to that we’re also like, “Well, if we are going to do this, then maybe we should…” Our version control is now on Beanstalk and we’re moving everything to GitHub Private, so now I have 200 repos probably, because we have the theme that’s versioned and the core functionality plugin. A lot of really meticulous, boring, just slog work.

Jon: Migration hell, yeah sounds good. Good luck.

Rich: Yeah.

Jon: How about you Chris? What are you up to?

Chris: Yeah. I’m just kind of busy with the normal routine. This is always kind of a busy time of the year as we head into the Thanksgiving and Christmas holidays. So I’m trying to get some projects done and a couple of trips, travel trips coming up. I’m arranging for that and lamenting the change of weather now too as well. So not being able to get outside as often. I’m just trying to come to mental terms with that.

Jon: Yeah, I’m having the same difficulty. It’s rare that I have to do much in the way of business work or negotiation other than with new projects. But this week, we ran across a client that was maybe wanting us to push harder than we normally do and give up some weekends and family time, that kind of thing. So in order to do that, I was looking at ways of evening out the risk-reward ratio on our side. I would call it a fire drill, basically, and then it ended up not needing to happen.

Lots of stress and hard work for it, then just going back to normal. I think if that had actually happened, we wouldn’t be talking right now. We would have had to skip this episode of Mobycast, so I’m glad it didn’t happen. This week I think we’re going to talk about using libraries: open source libraries, SDKs, just pieces of code that you can put into your own stuff that you build. It’s a very broad topic and we only have 20 minutes.

We just have some opinions about it, and we’ve been around from the time when there wasn’t much that people were using to now, where there can be not much of your own code in a fully functioning application and most everything is open source. Yeah, let’s jump in. Chris, given your career history, I just want to hear from your perspective how you see that. Just give a couple of highlights of when you started seeing open source getting used a lot and how your attitude about it has changed over the past several years.

Chris: Yeah, I mean, being an old-timer in this game, I definitely did witness the transformation and the emergence of open source. That was early 2000, 2001, 2002, where it really started to take root, along with the whole idea of open source as a community. There are a couple of books written about the subject. I believe The Cathedral and the Bazaar is one of the ones that was pretty popular at that time. Basically, it was kind of like the Bitcoin of its era in a way. It was the technology buzz term at that time because it was so new and it was kind of like this really…

Jon: And controversial maybe too huh? Like people have a strong opinion for and against.

Chris: Up until that point, software was something you paid for. This is the big lunch bin of just Microsoft. The beginnings of that company and its core philosophy, software was an asset, it was to be fiercely protected by the company and people paid for it. If you didn’t pay for that software, if you got it by copying from someone else and didn’t pay for it, they would go after you. They very much are going to protect themselves against that.

That was true with most of the other companies as well. At that point I think things like UNIX, basically the software coming out of the universities, was the equivalent of open source software, but it was really just big pieces of software. It was operating systems or compilers; it really wasn’t libraries at that point. This was a very big, radical, philosophical change for the industry, but obviously it caught fire and people realized, you know what, there’s something nice about this, about having a community where people can leverage each other, they can share, and we can get a lot more done by working together as opposed to making everyone pay for everything that we do.

Jon: Right. When I started my career it was like, here we are. Open source starts today, like you graduate and go ahead and become a professional software developer. That very day it’s like, “Hey, there’s this Linux thing, you should try installing it on this machine.” It’s like two days later, I did it, but none of the drivers work. Exactly when I was starting to decide what software was all about in terms of a professional career is exactly when this was happening. And I’m sure you and I probably touched the same open source projects to begin with: Linux obviously at first, and then probably Apache, and then maybe, I don’t know if you ever touched JBoss or Tomcat type of stuff.

Chris: Unfortunately, I like to forget that part of my career.

Jon: But then yeah, a few years later, along comes, I think, one of the next big things that happened. There were obviously libraries, JAR files that were open source that you could include in your Java projects, but the next really big leap forward was probably Ruby on Rails, that community, Gems. And I don’t know that you dove much into that world. At that time in your career, were you playing much with open source and getting involved in any of those communities?

Chris: No, going back, I was very much a Microsoft fan. I worked there for a while and definitely cut my teeth on that tech stack. At the time that Ruby was gaining hold and starting to get some adoption, I was still pretty much into the Microsoft stack, .NET. I was still in that space. There was some open source stuff in that but not nearly as much. There was no package manager for .NET. There is now; I mean, there eventually did come to be one.

Jon: It’s that sort of mentality that started to take hold in the Ruby world, and that has only increased since then, that we really want to get at today, because the Ruby world was the first time you just saw every little thing, anything. Throw it on GitHub, I guess it wasn’t even GitHub at that time, but throw it somewhere. I’m trying to remember the name, RubyGems.org or something. Throw it there and let other people use it, and it could be anything from something as big as an ORM for mapping objects to relational tables in your database to a little widget, just a tiny little text box with a special feature, and people were just grabbing Gems.

You would see Ruby projects with 20, 30, 50, 70, 150 gems in them. Now, the whole world has kind of moved over to JavaScript and Node and React and I think it’s only increased. What would you say, Chris? Has that been your first real like, I know you did some work using Python but you’re also deeply involved in JavaScript. You use JavaScript to write production code. Has that been kind of a place where you’ve seen the most action in libraries and available things?

Chris: Yeah, I think that NPM probably is the biggest repository for packages. Rails I think definitely was before Node and it kind of started this concept. I don’t think it had its own repository. I think other things just sprang up around it to do so. I forget what the numbers are, but after a few years, it was like 25,000 packages in the Rails community type of thing. Node came on the scene I think in 2009, 2010. Pretty soon after Node was built, that’s when they said, “Hey, we need a package manager for this, so let’s go ahead and build NPM to do so.”

For whatever reasons, Node took off and the whole concept of having this reusable code as packages took off. NPM itself grew very quickly. I’ve seen the charts before that show the adoption of modules that are available for Node versus Rails, like how long it took NPM to get to the same size as Rails. It’s compressed by something like 3X. It’s just an ungodly number, it’s billions of package downloads a month type of thing, it’s really exploding.

Jon: And this conversation is about to take a hard left turn but before it does, I just want to say that I myself even remember somewhere around 2010 or 2011 just having an argument with somebody who is a bit of a programming language purist and trying to argue with me about some technical benefit of whatever it was that they were using maybe with Scala, I can’t remember. About why I should be thinking about that and using that instead of, at the time I was using Ruby. I argued back pretty fiercely at that time.

I specifically remember this conversation. Just saying, “You have no idea what you’re saying. It doesn’t matter that these esoteric technology improvements exist in the language you’re talking about, because I can accomplish the same business work in the language that I’m using and the community around it is so much more powerful.” And that community is really the future of software development. I just remember having that argument and feeling so right. I was like, “Yes, this is what it is all about.”

Now is where we take the hard left turn. I still do think that that’s important, but I think that there can maybe be too much of a good thing, and I think that that’s what we’re going to talk about for the rest of this conversation. Where is that line? Where is the balance between overusing open source and using just the right amount of open source? Chris, let’s talk about what we see among people that are still earlier in their careers. They’ve graduated with a degree or they’ve come through a bootcamp and they’ve got maybe two or three years of experience being a developer. What do we typically see in terms of their relationship with open source?

Chris: Yeah, I think for the most part it’s kind of always been there for them. They can’t remember a time when it wasn’t there. It’s kind of like just taking it for granted. This is just part of the software code base. I definitely see, I guess, just implicit trust, like, “Hey, if I can go find something that does what I need it to do, just go ahead and use it and I’m done. This is great. Now I can basically build software by just gluing things together, kind of like a Lego approach.” Just attaching these boxes together type of thing.

Jon: Exactly. It’s sort of the default behavior of Futurex.

Chris: Open source community is great and there is so much good work that’s being done there, that has been done there.

Jon: Be careful, what we’re saying is a little controversial so we want to make sure we are very obvious that we’re pro Open Source.

Chris: I mean, it’s not impossible to build software today without the open source community, but to build modern software for, like, the cloud, it would be really hard to write without leveraging some core pieces of software that have been open sourced.

Jon: Right, and it would be a problem from a business perspective, right? You would be spending value that you shouldn’t be.

Chris: Yup, absolutely. So it’s definitely an important foundational piece, but it’s a double-edged sword too, because you need to be thinking about a whole bunch of particular facets and criteria as you decide to go use this software. Like, what does that actually mean? You’re now entrusting code that was written by someone else, and you’re probably not even looking at the code. You’ve now sucked it up into your project and, for all intents and purposes, it’s now your code base and you are now really responsible for maintaining it and making sure that it has high fidelity and integrity.

Jon: Chris, before we started this conversation when we were kind of planning it, you said, “Is this as good as the code that you would write? Would you consider it your own?” and when you said that kind of the warning bells that went off in me is, I think a lot of people, especially in their first, let’s say five years of work, they may assume that any other code is better than their code. There may be just a confidence issue. If it’s on the web and it’s in GitHub, it’s probably better than anything I would write so let me just use that.

I know I can trust it because mine is not trustworthy. I think we need to break that, that’s not okay to assume that all other code is better than anything you would write yourself. Right? You know your own context, you know your own problems, you know what your customers want, and you know what the rest of your application is doing. Give yourself some credit. Your code is not default bad code. Don’t assume that open source code is better than yours always.

Chris: Absolutely. If you have any experience whatsoever with writing code, if you’ve been doing it for any amount of time and studying the craft, studying the material, you should have some pretty good thoughts, you should have some base level of confidence. And even if you are new, just by looking at the code, you can be learning from it. There’s nothing to say that you won’t see a way to improve it.

I think being inexperienced is no excuse for not being able to take at least a critical look at something and still decide whether or not this feels good. Being more inexperienced, it doesn’t necessarily have to be like, I’m going to go evaluate the efficiency of this function and the way that it was developed, is it really the best algorithm. You can pop up a few levels and ask of this piece of open source software that you’re bringing in: does it have unit tests? Is it commented accordingly? Are the files laid out in a way that makes it easy to understand what it’s doing and how it’s organized? Those are all things that don’t require a lot of experience.

Jon: If it’s in GitHub, does it have lots of issues? Do they look like they’re getting attention? How many other people are using it?

Chris: Exactly. I don’t think it matters whether you’re inexperienced or you’re a seasoned veteran. When you’re looking at adopting some piece of open source code, you need to make that a considered decision. You essentially are saying, “Whatever code you’ve written, I’m now bringing it into my own project. It should be treated with the same kind of respect and concerns that I would have for code written by a fellow colleague on my team or myself.”

Jon: I guess I’m just trying to think of the best way to ask this question: what would you typically want to use as open source? We’ve said what we typically see newer developers do is ask, is there an open source project for this feature that I’m about to build? Okay, there is, I’m going to pull it in. Instead of having that be the default mode, what do you think is a better default mode? And it can’t be like, I’m going to make a very considered, well-researched decision. It’s got to be a little bit more easily digestible, sort of an MO or a way of going about making this decision, rather than something that could lead to analysis paralysis. I don’t want to suggest that everybody gets into analysis paralysis. How do we tell people when they should go look for open source?

Chris: I think for most projects you have a core set of these fundamental open source projects that it makes sense for you to leverage, and you can do that without really thinking twice about it. If you’re writing Node code, that’s things like Express or Winston, or Sequelize if you’re working with relational databases, or the official Redis library.

Obviously the AWS SDK module. These are all fundamental things that are being depended on by millions of other pieces of code, they’re getting millions of downloads, there are lots of eyes looking at this stuff. You don’t have to think twice about it. You should not be writing that yourself outright. This is the kind of code you can depend upon and you should leverage that.

I think regardless of what platform you’re on, whether you’re Node or Rails or Go or Python, they’re all going to have their equivalent like just foundational frameworks, libraries that they depend upon. But that should be relatively small maybe like five to 10 libraries or frameworks that would fall into that category. Then after that, you get into something that becomes much more application specific. These are where you start making considered choices and you definitely want to—I mean I’m very much a believer that the fewer dependencies that you have, the better off that you’re going to be.

It’s just a trade-off. If you do have an open source library out there that does mostly or exactly what it is you need done, and it’s an appreciable amount of work to go duplicate it, then it makes sense for you to go and look at that and say, “Hey, should I bring this in and embed it?” If you’re looking at something like, I need to do an iteration over an array that can deal with null values, that’s not a package. Go write the three lines of code and make it your own package type of thing.
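To make the array example concrete, here is a minimal TypeScript sketch of that kind of helper. The name `forEachPresent` is hypothetical and not taken from any library; the only point is that something this small is easier to write and own than to pull in as a dependency.

```typescript
// Hypothetical helper: iterate over an array, skipping null and undefined entries.
function forEachPresent<T>(
  items: ReadonlyArray<T | null | undefined> | null | undefined,
  visit: (item: T, index: number) => void,
): void {
  if (items == null) return; // tolerate a missing array as well
  items.forEach((item, index) => {
    if (item != null) visit(item, index);
  });
}

// Usage: collect only the values that are actually present.
const seen: string[] = [];
forEachPresent<string>(["a", null, "b", undefined], (value) => {
  seen.push(value);
});
```

The loose `!= null` comparison is deliberate here: it filters both `null` and `undefined` while keeping falsy values such as `0` or an empty string.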

Jon: I want to say, there’s got to be a rule of thumb. Is it going to save me at least three days or at least a week or something? If you feel like, if I use this, I could avoid writing code for a week, then it maybe feels like it’s worth at least exploring.

Chris: I think so, yeah. I think it depends too on where you’re at in your career and how much experience you have. My default would actually be that the less experienced you are, the bigger that timeframe should be.

Jon: Right, because if you’ve got more experience then you’re going to be able to better take a look at the package and say, is this well written or not and be like, “Oh, it just saved me $5. This is great.” or whatever.

Chris: Yeah. As an inexperienced person, the only way to get more experience and get better at coding is through writing code. If, instead of writing code, you just bring in someone else’s code, then you miss that opportunity.

Jon: Yeah, and I want to add that one of the best reasons to not use open source is when you’d be using it to hide the complexity of something that developers are generally expected to learn. I just have to tell a personal story. I made this mistake myself on a project that I was working on with you. You were working for the client and I was working for Kelsus, and this was at least eight years ago, or maybe six. I was writing some iOS code and I knew I needed to persist stuff on the client and have that get synced up to some data on the server, and you were writing the APIs for that.

I was coming off the heels of doing a lot of work in Rails, and there was a framework that somebody had written that was getting a lot of attention that kind of made writing iOS code feel sort of Railsy, and I said, “I’ve got to try that, that sounds great.” And it just ended up not being great. It was just the worst POS I have ever dealt with. I remember asking you, “Can you reformat it, change it this way and that way?” because I was just dealing with limitations in the library. What I was trying to avoid is that iOS still to this day has this fairly complicated persistence API, Core Data.

Especially at that time, you really had to understand the threading model around Core Data or you could just ruin applications. The threading model was sophisticated; it took multiple weeks of study to become proficient with it. I just was like, “I can use this library and I don’t have to learn that because it’ll take care of it for me.” Guess who ended up learning Core Data eventually and getting rid of that stupid freaking framework. I was like, “Sorry client, I’m going to start over.” I literally started that application over again. I learned my lesson the hard way. So don’t make that mistake. Learn the libraries.

Chris: Absolutely. Maybe just quickly, some other heuristics to cover here that I look at when deciding whether or not to use an open source module. Look at things like how many other folks are using it: if it’s in a GitHub repo, how many stars does it have, how many other dependent applications does it have. Consider who wrote it as well. There are well-known folks in the community, so obviously if it’s someone that has an established name for themselves and a great track record that you know about, that’s going to lend more credence. You should give that more weight.

That doesn’t exclude folks that don’t have that history or that background, but it’s obviously bonus points if it’s someone that has a track record that you can look at. Look at the commit history as well. When was the last time this thing was updated? If it’s been three years or four years since the last commit, that’s going to be a bit of a code smell. You may want to think twice, three times, four times about whether you pull that in. Look at the code itself. Is the code clean, can you understand what it’s doing, is it laid out well, does it have unit tests, does it have the appropriate comments? Look at things like that to give you an idea of the overall quality of it, and then you can obviously look at things like issues. Are there a lot of issues?

Paradoxically, I think the more issues something has, the more it usually indicates it’s actually something you can rely upon for the most part, as long as they’re not just sitting there and stacking up. Usually, things that have lots of issues have lots of people using them. If it has like 70 open issues, it might mean that there are a million people using it. So anyhow, those are some of the things I think you should look at when deciding whether to use a piece of open source software.
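The heuristics above could be turned into a rough review checklist. The TypeScript sketch below is only an illustration: the `PackageSignals` fields, the `reviewPackage` name, and the star and dependent cutoffs are all assumptions made up for this example. Only the "three years since the last commit" smell comes from the conversation, and you would fill in the values by hand from the repository page.

```typescript
// Hypothetical record of what you observed while looking at a package's repo.
interface PackageSignals {
  stars: number;
  dependents: number;
  knownMaintainer: boolean;        // author has a track record you recognize
  yearsSinceLastCommit: number;
  hasTests: boolean;
  openIssues: number;
  issuesGettingAttention: boolean; // maintainers respond; issues are not just stacking up
}

// Returns a list of concerns to think about; an empty list means nothing stood out.
function reviewPackage(s: PackageSignals): string[] {
  const concerns: string[] = [];
  if (s.yearsSinceLastCommit >= 3) {
    concerns.push("no commits in three or more years");
  }
  if (!s.hasTests) {
    concerns.push("no unit tests");
  }
  // Many open issues are fine on their own; unattended ones are the warning sign.
  if (s.openIssues > 0 && !s.issuesGettingAttention) {
    concerns.push("open issues are stacking up without attention");
  }
  // Illustrative cutoffs only: little usage and no recognizable author.
  if (s.stars < 100 && s.dependents < 10 && !s.knownMaintainer) {
    concerns.push("little evidence that others use or vouch for it");
  }
  return concerns;
}

// Usage: a popular but apparently abandoned package raises two concerns.
const staleConcerns = reviewPackage({
  stars: 5000,
  dependents: 1200,
  knownMaintainer: false,
  yearsSinceLastCommit: 4,
  hasTests: true,
  openIssues: 70,
  issuesGettingAttention: false,
});
```

Returning a list of concerns instead of a score keeps this a prompt for judgment, which matches the idea of a considered decision that stops short of analysis paralysis.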

Jon: I think that basically paints a pretty clear picture. Hopefully, some folks are listening to this and will adjust accordingly. Don’t just save yourself two hours; do the learning and use the solid packages, the ones that really are useful.

Chris: One other thing to definitely make sure you consider is security. This is software that anyone can write. They can do whatever they want. You can have Node modules that access the file system. They can read and write whatever they want. Security is a big issue, but there are lots of tools out there in the communities to help with this. Recently in the Node community, there was actually an independent company that was doing security evaluations and certifying modules as being clean, and then NPM bought that company.

Now it’s actually a part of NPM, so you can just run npm audit, and that command will go and scan for any known vulnerabilities in the packages that you’re using. There are other scanning solutions out there as well that do this. They either go against a CVE database or actually work at the application level as well. That is another huge consideration when using open source software: what are the security ramifications?
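One way a team might act on an audit result is to fail a build when vulnerabilities at or above a chosen severity are present. npm already offers this through the `--audit-level` option of `npm audit`; the TypeScript sketch below only illustrates the underlying logic. It assumes you have already extracted per-severity counts from a scanner's report, and `SeverityCounts` and `shouldFailBuild` are hypothetical names for this example, not part of npm.

```typescript
// Assumed input: how many known vulnerabilities a scanner reported at each severity.
interface SeverityCounts {
  info: number;
  low: number;
  moderate: number;
  high: number;
  critical: number;
}

type Severity = keyof SeverityCounts;

// Severities ordered from least to most serious.
const SEVERITY_ORDER: Severity[] = ["info", "low", "moderate", "high", "critical"];

// True when any vulnerability at or above the threshold severity was reported.
function shouldFailBuild(counts: SeverityCounts, threshold: Severity): boolean {
  const start = SEVERITY_ORDER.indexOf(threshold);
  return SEVERITY_ORDER.slice(start).some((level) => counts[level] > 0);
}

// Usage with made-up counts: two low and one moderate finding.
const sampleCounts: SeverityCounts = { info: 0, low: 2, moderate: 1, high: 0, critical: 0 };
const failOnHigh = shouldFailBuild(sampleCounts, "high");         // false
const failOnModerate = shouldFailBuild(sampleCounts, "moderate"); // true
```

Choosing the threshold is a policy decision; a stricter level catches more but also interrupts builds more often.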

Jon: Right, you don’t want the fancy text box that you got off of some React library putting a coin miner into a browser.

Chris: Indeed.

Jon: All right, well thanks so much for this conversation about open source community libraries.

Chris: Thanks guys.

Jon: Thanks for putting this together Rich.

Rich: Yup.

Jon: Talk to you next time, bye.

Rich: Well dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with the show notes and other valuable resources is available at mobycast.fm/38. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.
