The GSD Programmer

Saturday, December 30, 2017

URN vs. URL vs. URI

Uniform Resource Identifier (URI) : Identify
A compact sequence of characters that identifies an abstract or physical resource. Encompasses URNs and URLs (and URCs).

Example: "http://www.google.com" (strictly speaking, a bare "www.google.com" has no scheme, so it is only a URI reference rather than a full URI)

Uniform Resource Name (URN) : Uniquely name
Identifies a resource by a unique and persistent name, but doesn't necessarily tell you how to locate it on the internet.

Example use: "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66"

Uniform Resource Locator (URL) : Locate
Contains information about how to fetch a resource from its location. URLs always start with a scheme (the protocol, such as 'http://').
Example use: "http://www.google.com"

So if I want to be 'that guy', when should I use which terms?
Use URI for almost everything, unless the identifier includes a protocol ('http://', 'https://', etc.), in which case be 'that guy' and call it a URL.
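
As a rough sketch of that rule of thumb, the snippet below uses the `URL` class built into Node.js and modern browsers to read the scheme off a string. The `classify` helper and its labels are made up for illustration, and treating everything with a non-`urn:` scheme as a URL is a simplification.

```javascript
// Hypothetical helper: label a string using the rule of thumb above.
function classify(identifier) {
  let parsed;
  try {
    // The URL constructor throws a TypeError when the string has no scheme.
    parsed = new URL(identifier);
  } catch (err) {
    return 'not an absolute URI (no scheme)';
  }
  // `protocol` is the scheme plus the trailing colon, e.g. "http:" or "urn:".
  if (parsed.protocol === 'urn:') {
    return 'URN';
  }
  return 'URL';
}

console.log(classify('http://www.google.com'));
console.log(classify('urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66'));
console.log(classify('www.google.com'));
```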

Source
https://danielmiessler.com/study/url-uri/

Dependency Injection

What is Dependency Injection

A technique where the dependencies of an object (or function) are passed in, or 'injected', from the outside instead of being created or looked up inside it.

Not this (b is a hidden dependency pulled in from the surrounding scope):
function a(id) {
  return b(id)
    .then(response => response.json());
}

This (b is passed in by the caller):
function a(b, id) {
  return b(id)
    .then(response => response.json());
}

Why use it?

It's more explicit, easier to read and gives you more control over inputs. Practically speaking, though, the biggest win is that it makes your application easier to unit test, because a test can pass in a fake dependency. There are many ways to make code testable, but dependency injection is a common, well-documented design pattern that can (for the most part) be implemented in most languages.
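
To make the testing benefit concrete, here is a minimal sketch that reuses the injected version of `a` from above. The `fakeB` stub and the shape of its response are assumptions invented for the example; in real code `b` would be something like a wrapper around an HTTP client.

```javascript
// The injected version from above: `b` is passed in rather than hard-coded.
function a(b, id) {
  return b(id).then(response => response.json());
}

// Hypothetical stub standing in for a real HTTP call.
// It records the ids it was called with and resolves with a fake response.
const calls = [];
function fakeB(id) {
  calls.push(id);
  return Promise.resolve({ json: () => ({ id: id, name: 'stubbed' }) });
}

// No network needed, because the test controls the dependency.
const result = a(fakeB, 7);
result.then(data => console.log(data.name));
```

Because `a` never reaches for a global `b`, the same function works unchanged with the real dependency in production and with the stub in a test.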

Source
https://www.youtube.com/watch?v=0X1Ns2NRfks
https://en.wikipedia.org/wiki/Dependency_injection

Sunday, November 26, 2017

Updating query URLs to destination URLs

Quick Note: I know it's been a while but I'm back! The actual solution and setup are all the way down. Enjoy!

I was recently contracted to go through and update a client's database values. The issue was that the URLs stored in the database were query URLs that redirected to the destination URLs, rather than the destination URLs themselves. Example:

https://www.somedomain.com/Query/SomeProduct/Details.asp?Code=blahblah

Instead of:

https://www.somedomain.com/SomeProduct

Pfft, easy.

So what I'm thinking:

Create a script that will iterate over each of the query URLs, perform an HTTP request with the query URL and, once the page has loaded, evaluate the current URL. I'll get the data in a CSV and use Python and its awesomeness to read the data, then use a library to scrape the web and run a quick JS script to evaluate the query URL into a new URL.

Pfft, easy.

So what actually happened:

Python library and dependency hell. Despite using pip to install and manage my dependencies, I found that one dependency did not support Python 3. So I had to reinstall Python 2.7, only to find out afterwards that another dependency required Python 3. Eventually, I hacked together some patches and libraries to make this work.

TL;DR: Use a webdriver to open each query URL, wait for the page to load and run the JavaScript "location.href" to evaluate the destination URL and return its value.

However, the speed was horrible...at least 7-10 seconds per website, and some of the websites wouldn't finish loading using the Selenium driver, resulting in an undefined value. To further my frustration, I would have to download more sketchy dependencies in order to adjust the timeout of the webdriver to...holy shit this is terrible.

I decided to change my tech stack. I initially chose Python because I didn't want issues reading the data in the CSV, but thankfully there were Node.js libraries for that. Furthermore, I decided to analyse the actual HTTP requests that were being made and received. Surely, there had to be some sign of the destination URL in the HTTP response for the query URL since that's the nature of a redirect. There was!

So final solution:

Use a simple Node script to read query URLs from the CSV, make an HTTP request for each one, read the destination URL out of the HTTP response ($("link[rel='canonical']").attr('href');) and record it next to the query URL. I did have some issues with async and writing the CSV due to my order of operations, so I changed my algorithm to collect all the URLs before writing them. That way I could maintain the order of the data and ensure I wasn't accidentally mismatching query URLs with destination URLs just because some HTTP requests were faster than others. The script was fast too: it evaluated 8-12 query URLs per second, taking the job from an estimated 2.5 hours with the Python script down to about 3 minutes with the Node script.
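
The ordering fix can be sketched roughly as follows. This is a sketch of the idea rather than the actual script: `resolveAll`, `extractCanonical` and the injected `fetchHtml` function are hypothetical names, and the regular expression is a simplified stand-in for the `link[rel='canonical']` selector. `Promise.all` returns results in input order no matter which request finishes first, which is what keeps each query URL lined up with its destination URL.

```javascript
// Simplified stand-in for $("link[rel='canonical']").attr('href').
// Assumes `rel` appears before `href` inside the <link> tag.
function extractCanonical(html) {
  const match = /<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i.exec(html);
  return match ? match[1] : null;
}

// fetchHtml is injected (hypothetical): given a URL, it resolves with the page HTML.
function resolveAll(queryUrls, fetchHtml) {
  const pending = queryUrls.map(queryUrl =>
    fetchHtml(queryUrl).then(html => ({
      queryUrl: queryUrl,
      destinationUrl: extractCanonical(html)
    }))
  );
  // Promise.all keeps results in the same order as queryUrls,
  // regardless of which request completes first.
  return Promise.all(pending);
}
```

Only once `resolveAll` has resolved would the rows be written back to the CSV, so a slow request can never end up on the wrong line.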

PS: I'm happy to share my script with anyone who would like to use it or if you would like to see how I did what I did.

Sunday, October 9, 2016

C# Generics

Generics allow us to treat different types and classes similarly through the power of type parameters. For example, rather than defining the same method across multiple classes, we can define a single class, in a single location, that works with multiple types. Hopefully the examples provided below can help illuminate exactly what I'm talking about.

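Let's say we have a backpack that we put a bunch of stuff into: books, pencils and notebooks, for example. Without generics, we might end up writing one collection class per item type, like this (a Notebooks class would look exactly the same):

using System.Collections.ObjectModel;

// Book and Pencil are assumed to be simple item classes defined elsewhere.
public class Books
{
    // Initialize the collection so Add does not hit a null reference.
    private Collection<Book> BookCollection = new Collection<Book>();
    public void Add(Book book)
    {
        this.BookCollection.Add(book);
    }
}
public class Pencils
{
    private Collection<Pencil> PencilCollection = new Collection<Pencil>();
    public void Add(Pencil pencil)
    {
        this.PencilCollection.Add(pencil);
    }
}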

Can you see something wrong here? Every class has a method with the exact same name, and each one does the same thing: it adds something of some type to that type's collection. This is a serious violation of DRY. Why do we have to do this though? Because C# requires us to define what we are performing the action on - it has no idea what's going on otherwise. So how can we reduce our code footprint while simultaneously giving C# something to define? Generics.

Rather than having to create a separate, unique list class for each one of our item types - we can simply make a GenericList class whose item type is chosen when it is instantiated. We defer type specification until instantiation without the cost or risk of runtime casts or boxing operations.

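using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class GenericList<T> : IEnumerable<T>
{
    // Initialize the backing collection so Add does not hit a null reference.
    private Collection<T> GenericCollection = new Collection<T>();

    // One Add method now covers every item type.
    public void Add(T item)
    {
        this.GenericCollection.Add(item);
    }

    public IEnumerator<T> GetEnumerator()
    {
        return this.GenericCollection.GetEnumerator();
    }

    // The non-generic version simply defers to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}

public class TestClass
{
    static void Main()
    {
        // The type argument is the item type (Book, Pencil), not a wrapper class.
        // Assumes Book and Pencil have parameterless constructors.
        var bookCollection = new GenericList<Book>();
        var pencilCollection = new GenericList<Pencil>();
        bookCollection.Add(new Book());
        pencilCollection.Add(new Pencil());
    }
}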

Once we have that, we can use a GenericList wherever and whenever - this gives us more flexibility and reduces the amount of code. Generics can be seen as an instance of composition, whereby we extract some common behavior or property from classes. This is especially useful for collections of different data types, such as Dictionaries. Typically, we use generics more often than we create them - regardless, generics are a great way to reuse code and improve performance, specifically by avoiding boxing/unboxing and casting operations.

Sunday, October 2, 2016

Pillars of OOP

Object-oriented programming is a programming paradigm that focuses code on the construction and treatment of objects, or instances of classes. This is illustrated through a set of features (pillars), which are important to understand and implement in order to minimize bugs, reduce complexity and create quality code!

Encapsulation:
Separate concerns. The waiter shouldn't care how the chef is making food, just that he makes it and the same holds that the chef shouldn't care how the waiter is delivering food, just that he's doing it. The implementation of a procedure should not be shared with other blocks of your code - they should remain separate and should interact minimally.

Prevents unintended consequences of changing code in one place.
Loosely coupled code for maintainability.
Examples:

Access Modifiers (Public, Private, Protected, Internal, Protected Internal)
Accessors (get; set;)

Inheritance:
Code re-usability. You shouldn't have to redefine an entire set of properties and methods for classes that are similar - instead, you should structure the code so that classes inherit from other classes in order to reduce the amount of code written. Classes that have a similar relationship, either through properties or behavior, should have an explicitly defined relationship that allows one to 'inherit' from the other. Based on an "is a" relationship.

Reduces code count
Improves structure and organization, leading to improved readability.
Examples:

The ":" operator extends a derived class (left) from a base class (right)
"is" and "as" keywords for casting (also upcasting and downcasting)

Polymorphism:
Extract and distribute commonalities. The term derives from the ability to use the same name for what may be different actions on objects of different types. Humans can walk and so can dogs. We can abstract this commonality out from both, make a class called Walkability and extend it to classes that can walk. This explicit relationship means that humans and dogs can walk because they extend Walkability. This is typically done through interfaces or virtual methods - they act as a contract guaranteeing a specific function. The way that humans walk is different from dogs, but they still walk - the same thing applies here: while the implementation is different, we still guarantee that humans can "walk" and dogs can "walk". Based on a "can do" relationship (the type can perform the action), as opposed to the "is a" relationship of inheritance.

Loosely coupled code for maintainability.
Flexibility to change commonalities among separate types
Reduce code count
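
As a small illustration of the walking example, here is a sketch in JavaScript (ES2015 classes) rather than C#. The return strings and the `describeWalk` helper are invented for the example. Since JavaScript has no interfaces, a shared base class with an overridable `walk` method plays the role of the contract here.

```javascript
// Base "contract": anything that extends Walkability promises to implement walk().
class Walkability {
  walk() {
    throw new Error('walk() must be implemented by the subclass');
  }
}

class Human extends Walkability {
  walk() {
    return 'walks on two legs';
  }
}

class Dog extends Walkability {
  walk() {
    return 'walks on four legs';
  }
}

// The caller relies only on the shared name, not on the concrete type.
function describeWalk(walker) {
  return walker.constructor.name + ' ' + walker.walk();
}

[new Human(), new Dog()].forEach(walker => console.log(describeWalk(walker)));
```

The same call, `walker.walk()`, does something different depending on the object it is called on, which is the whole point of the pillar.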

For some, there is a fourth pillar (data abstraction), but I've decided to make a separate blog post on that so that I can explain why some include it and some don't, while also covering what it is.

Summary: Understand your pillars so that you can take full advantage of OOP to write better code.

Saturday, October 1, 2016

Static vs Instance C#/JS

Welcome back! I've finally found some time to myself to continue this blog. Also, I have become hopelessly overwhelmed with the subtle differences between C# and JavaScript, so I will attempt to overcome that mountain, and maybe you'll learn something new (or correct me!).

Static Method vs. Class/Instance/Non-static methods

2 cent summary
So we need to first establish a key difference: classes outline the characteristics (properties) and behaviors (methods) of a construct. Instance methods are specific to the instances of that class - they cannot be called without first instantiating the class. Static methods are essentially functions on the type itself that can, within limits, be called at any point without the need of an object or instance, so long as the type is available.

C#
What are static methods? Basically, they are functions that are available without the instantiation of an object or instance from a class - which makes a lot more sense having been exposed to a language like C# or Java. An example of a static method is System.Diagnostics.Debug.WriteLine(). Ever notice how you can call this function without instantiating System.Diagnostics.Debug()? Probably not, because that wouldn't make sense right? Well, this is an example of a static method - it's a function attached to the type itself rather than an instantiation/object.

Javascript
However, coming from my JS roots, what does this mean? Well, classes are not really a syntactic thing (at least not in ECMAScript 5), but depending on your design and instantiation pattern, you can mimic this mechanic. Through a constructor function, you can define methods specific to each instance (callable either internally or externally) while also attaching methods to the constructor itself (static methods).

Example:
var Person = function(name, age) {
  this.name = name;
  this.age = age;
  // Instance method: available on every object created with `new Person(...)`
  this.introduce = function() {
    console.log("Hello, my name is " + this.name);
  };
};
// Static method: attached to the constructor itself
Person.AnswerToUniverse = function() {
  console.log("42");
};
var Cameron = new Person("Cameron", 27);

//Static Method:
Person.AnswerToUniverse();

//Instance Method:
Cameron.introduce();
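
For comparison, ES2015 added `class` syntax with a `static` keyword, which expresses the same idea more directly. The sketch below is the same Person example rewritten that way; it is largely syntactic sugar over the constructor-function pattern, with the small difference that `introduce` now lives on the prototype rather than on each instance.

```javascript
class Person {
  constructor(name, age) {
    this.name = name;
    this.age = age;
  }

  // Instance method: needs an object created with `new`.
  introduce() {
    return 'Hello, my name is ' + this.name;
  }

  // Static method: lives on the class itself, not on its instances.
  static answerToUniverse() {
    return '42';
  }
}

const cameron = new Person('Cameron', 27);
console.log(Person.answerToUniverse());
console.log(cameron.introduce());
// Static methods are not reachable through an instance:
console.log(typeof cameron.answerToUniverse);
```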

Summary
Static methods are functions attached to the type itself; instance methods are attached to instances/objects.

EDIT: OH GAWD, I WAS SO WRONG!!! Fixed.

Reference
https://www.youtube.com/watch?v=53LWUQVyZb8

Saturday, May 14, 2016

Why Node?

Friday, May 13 2016

Why choose to develop in Node? What are the pros and the cons? What was it designed for?

Node was designed with web application development in mind, specifically aiming to handle many non-CPU-intensive (I/O-bound) operations asynchronously and fast. Knowing when to use Node and when not to use Node depends on context; to make this easier, below is a list of pros and cons to using Node.

Pros:

One language:

Node runs JavaScript, which means that, assuming you're already using JavaScript for your frontend, you would only ever have one language to work with. That makes it easier for developers to focus on logic rather than language syntax. This is especially true for companies that may be using JS, PHP, C or a mix of multiple languages.

It's fast.

V8: The V8 engine, developed and maintained by Google and written in C++, compiles JavaScript into machine code, which makes it incredibly fast.
The Event Loop: As I've written in a previous post, the event loop is what gives JavaScript its awesome power in conjunction with the V8 engine. JavaScript runs on a single thread; when it hits an I/O operation, it hands that operation off along with a callback and carries on running the rest of the code. Once the async operation completes, the event loop calls the callback.
Network connections, filesystem access and database queries are handled very efficiently in Node thanks to the event loop and V8 compilation.
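
A tiny sketch of that behavior: the callback handed to `setTimeout` is queued, the rest of the script keeps running, and the callback only fires once the current code has finished. The `order` array is just there for illustration.

```javascript
const order = [];

order.push('start');

// Hand the work off with a callback; the event loop runs it later.
setTimeout(() => {
  order.push('callback');
  console.log(order.join(' -> '));
}, 0);

// This line runs before the callback, even with a 0 ms delay.
order.push('end');
```

When the callback finally runs, `order` already contains both 'start' and 'end', which shows that the script did not sit and wait for the timer.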

NPM hosts the largest package registry on the web, and the vast majority of its packages are open source. With a large, thriving community supporting it, Node's package manager should stay reliable for many years to come.
Additionally, many of these modules are made in a plug-and-play sort of manner and are incredibly performant - one of the many benefits of having the support of a large online community for an open source platform (e.g. bluebird, socket.io, etc.).

Cons:

It's based on JavaScript

This one has less to do with Node itself and more to do with the language it's built for. JavaScript itself is not as opinionated as other languages, which can be a good thing and can be a bad thing. If you're looking to create a safety-critical system where any bug would be disastrous, then yeah, maybe look for a more opinionated language, like TypeScript, with static type checking built in.

CPU heavy requests

Because Node runs your JavaScript on a single thread, CPU-intensive requests are not ideal: while one is being computed, nothing else gets served (unless a significant portion of the request can be run asynchronously, but this is not always the case).
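
To see why, here is a small sketch where a CPU-bound loop holds up a timer that was due immediately. `busyWait` is a made-up stand-in for any heavy synchronous computation, and the 50 ms figure is arbitrary.

```javascript
// Made-up stand-in for CPU-heavy work: spins until `ms` milliseconds have passed.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU on the single JavaScript thread
  }
}

const scheduledAt = Date.now();
let firedAfterMs = null;

// Due "immediately", but it cannot run while the thread is busy.
setTimeout(() => {
  firedAfterMs = Date.now() - scheduledAt;
  console.log('timer fired after ' + firedAfterMs + ' ms');
}, 0);

busyWait(50);
```

The timer cannot fire until `busyWait` returns, so it reports a delay of at least 50 ms. In a server, that delayed timer would be every other request waiting in line.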

The number of modules

NPM is flooded with many different modules, which can be a good thing and, again, a bad thing. The bad thing about having many different modules is that you can have 2 separate apps that do the exact same thing but are built very differently with different modules. This can make it difficult to find one answer to one problem with one type of module. However, this is minor since there are community-recommended modules that are used more than others, which consolidates a good portion of the FAQs and makes them easy to look up.

References:
http://blog.modulus.io/top-10-reasons-to-use-node
http://stackoverflow.com/questions/5062614/how-to-decide-when-to-use-node-js?rq=1