Data & Objects
In most programming languages, you can create your own types. This is a major productivity boost, and I would not recommend using a language that does not allow it. Here, I’m going to try to categorize the types that you might create and assign some properties to each category. Then, I’ll make some recommendations for language designers to consider when creating their languages.
Data & Objects
If you read the title, you probably already know what categories I’m going to use: Data and Objects. Before we get into the nitty-gritty, let’s think about some things people typically associate with Data and Objects.
Objects: encapsulation, behaviours, polymorphism
Data: query, store, copy, compress
Now let’s also think about some types you might see in a typical codebase, and what category they fit into:
Mutex: A mutex looks a lot like an object. It has behaviours (you can lock it and unlock it), and it has encapsulation, since it hides its details from the user and prevents the user from accessing its fields directly.
User: A user looks a lot like data. You usually store your users in a database, you frequently query them, and you frequently access their fields directly (or almost directly, via getters that just return the field).
That was easy. Now let’s think about some more borderline cases.
File: Many codebases have a “File” type, often from the standard library. A file pretty clearly has behaviours, like write() and close(), but it also has stored contents that can be queried or compressed, which are things we said were more data-like. Hmmmm.
String: The basic string type is built into almost every modern language. Like files, strings can be compressed, they support queries like hasSubstring(), and in many languages they support mutating methods like append().
OK, so how do we categorize these? What is the fundamental difference between a String and a File? My claim is that the File type is an Object and the String type is Data. The fundamental difference is that a File is a handle through which some resource is accessed, in this case the filesystem. A String is just a string: plain, raw data.
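To make the handle-versus-value distinction concrete, here is a minimal Python sketch. The file name and variable names are made up for illustration, and Python is just one language where this behaviour can be observed; other languages may differ.

```python
import os
import tempfile

# A string is plain data: two strings with the same contents are interchangeable.
a = "hello"
b = "".join(["hel", "lo"])
strings_equal = (a == b)

# A file object is a handle to a resource that lives outside the program.
# The path below is a throwaway location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as writer:
    writer.write("hello")

with open(path) as first, open(path) as second:
    # Python file objects compare by identity, so two handles to the
    # same file are not "equal" even though they read the same bytes.
    handles_equal = (first == second)
    same_contents = (first.read() == second.read())
```

The two strings compare equal because only their contents matter. The two handles do not, even though the resource behind them is the same file.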
More on Data
So what can we do with Data? Or, more realistically, what might we want or expect to do with Data? What programming languages actually let you do with Data varies.
Store and transmit the data: This implies some serialization format that transforms the data into a form usable for storage and transmission. This could be JSON, it could be a database, or it could be any number of other formats.
Look at the data: This is closely related to storage and transmission, but implies some form of textual format with labelled subfields that a human can look at and understand. JSON would work for this purpose.
Compare the data: One obvious thing to do with data is check whether it’s equal to some other piece of data of the same type. Some languages also allow you to check whether a piece of data is equal to a piece of data of a different type, with varying results.
Hash the data: Hashing data is useful in many contexts. By storing a hash of some data you can do some very interesting things. You can build a storage system that uses the hash of each piece of data to cheaply de-duplicate identical pieces. You can also ask someone to show that they have a piece of data by sending you its hash over an insecure channel, without having to transmit the data itself.
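As a rough illustration of these four operations, here is a sketch using Python’s dataclasses. The User type and its fields are hypothetical, and JSON and SHA-256 are just convenient choices for the example, not the only reasonable ones.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical fields, chosen only for illustration.
    name: str
    email: str


alice = User(name="Alice", email="alice@example.com")
also_alice = User(name="Alice", email="alice@example.com")

# Store and transmit: serialize to JSON and read it back.
encoded = json.dumps(asdict(alice), sort_keys=True)
decoded = User(**json.loads(encoded))

# Look at: the generated repr labels every field, e.g. name='Alice'.
text = repr(alice)

# Compare: the generated equality checks every field.
round_trips = (decoded == alice)

# Hash: equal values hash equally, so a set de-duplicates them.
unique_users = {alice, also_alice, decoded}

# A content hash of the serialized form. Unlike Python's built-in hash(),
# which is salted per process for strings, this digest is stable across runs.
digest = hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

None of the equality, hashing, or printing code was written by hand here; the frozen dataclass declaration was enough to get all three.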
Data & Objects in Languages
What does this mean for language designers? What I ask is that language designers allow users to create Data types and Object types, and keep them separate. Then, if a programmer declares that a type is a Data type, make sure that it actually behaves like data!
- Don’t make them write their own implementation of equality. Programmers are lazy: they probably won’t do it until they need to, and when they need to, they will probably get it wrong, e.g. by forgetting some fields.
- Don’t make them write their own hashing functions. Hashing functions are tricky to do properly, and as a language designer you can make a better default hash than most people can.
- Help them look at the data. Looking at data is critical during debugging, so make it as easy as you can.
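The “forgotten field” failure is easy to reproduce. The sketch below contrasts a hand-written equality with one generated by the language, using Python’s dataclasses as a stand-in for a language-level Data declaration. Both classes and their fields are invented for this example.

```python
from dataclasses import dataclass


class ManualUser:
    """Hand-written equality; `email` was added later and never included."""

    def __init__(self, name, email):
        self.name = name
        self.email = email

    def __eq__(self, other):
        # Bug: compares only `name`, silently ignoring `email`.
        return isinstance(other, ManualUser) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


@dataclass(frozen=True)
class DataUser:
    """Equality, hashing, and repr are generated from the field list."""

    name: str
    email: str


# Two users that differ only in their email address.
manual_says_equal = ManualUser("Alice", "a@example.com") == ManualUser("Alice", "b@example.com")
generated_says_equal = DataUser("Alice", "a@example.com") == DataUser("Alice", "b@example.com")
```

The hand-written version reports the two users as equal, while the generated version does not, because it is derived from the full list of fields and cannot drift out of date.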
Next, don’t let them treat Object types like Data types! Copying a File object isn’t something you can Just Do. You need a new file path to copy the file to, and the copy can fail, for example if the user runs out of space on their filesystem in the middle of the copy. Copying a file is thus something that shouldn’t be done with the humble “equals” symbol. It needs some error reporting mechanism, and some way to provide information like the destination path to the copying mechanism.
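Here is a small sketch of what an explicit file copy looks like in practice, using Python’s shutil.copyfile. The copy_file wrapper and the paths are hypothetical, and returning an (ok, error) pair is just one possible way to report errors.

```python
import os
import shutil
import tempfile

# Set up a throwaway source file for the sketch.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.txt")
with open(source, "w") as handle:
    handle.write("some contents")


def copy_file(src, dst):
    """Copy src to dst, returning (ok, error) instead of raising."""
    try:
        shutil.copyfile(src, dst)
    except OSError as error:
        return False, error
    return True, None


# The copy needs a destination path, and it can succeed...
ok, error = copy_file(source, os.path.join(workdir, "copy.txt"))

# ...or fail, here because the destination directory does not exist.
failed_ok, failed_error = copy_file(
    source, os.path.join(workdir, "missing-dir", "copy.txt")
)
```

Both requirements from the paragraph above show up in the signature: the caller has to supply a destination, and has to be prepared for the operation to fail.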
Succinctly, my final recommendation: Let programmers distinguish between Data types and Object types, and help them make the Data types behave like Data, and their Object types behave like Objects. There is lots of room for debate in how exactly to go about this, but I hope this baseline recommendation can form a good starting point.