What is artificial intelligence?

Artificial intelligence is currently a fertile and diverse area of scientific research, drawing on expertise from numerous disciplines, including computer science, mathematics, statistics and data science, to address specific research challenges. These disciplines operate in a rich terminological space, with terms often used loosely and interchangeably by commentators. For this reason, this article offers definitions of a number of commonly used terms.

Artificial intelligence (AI) may be defined as:

“…the theory and development of computer systems able to perform tasks normally requiring human intelligence…”

Machine learning (ML), a branch of AI, may be defined as:

“…the science of getting computers to act without being explicitly programmed…”

At its simplest, a computer may act through the application of simple rules-based “if-then” logic written into a programme. The central challenge in machine learning is to train a computer algorithm to act autonomously, without the need for rules to be explicitly predefined. The acts undertaken might include objective tasks such as learning or predicting, but might also conceivably include more subjective tasks such as creating or relating. A common task an ML algorithm is set is to predict an outcome based on past occurrences of that outcome, or on the co-occurrence of related factors, the central aim typically being to predict the outcome as accurately as possible. Predictive accuracy can be assessed relatively easily, simply by comparing the algorithm’s predictions to what actually happened in reality. Statistics can also be used to undertake such predictive tasks, as an alternative to machine learning based techniques.
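The contrast between explicitly predefined rules and learned behaviour can be sketched in a few lines of Python. Everything here, the temperature data, the labels and the threshold rule, is invented purely for illustration:

```python
# An explicit rule, written by a programmer in advance:
def rule_based_classifier(temperature):
    # "If the temperature exceeds 30 degrees, predict 'hot'."
    return "hot" if temperature > 30 else "not hot"

# A learned rule: the threshold is inferred from labelled examples
# rather than being explicitly predefined.
def learn_threshold(examples):
    # examples: list of (temperature, label) pairs
    hot = [t for t, label in examples if label == "hot"]
    not_hot = [t for t, label in examples if label == "not hot"]
    # Place the decision boundary midway between the two groups.
    return (min(hot) + max(not_hot)) / 2

examples = [(25, "not hot"), (28, "not hot"), (33, "hot"), (36, "hot")]
threshold = learn_threshold(examples)  # 30.5, learned from the data

def learned_classifier(temperature):
    return "hot" if temperature > threshold else "not hot"

# Predictive accuracy is measured by comparing predictions to reality:
actual = [(29, "not hot"), (35, "hot")]
correct = sum(learned_classifier(t) == label for t, label in actual)
accuracy = correct / len(actual)
```

The point of the sketch is that the learned classifier behaves like the hand-written one, but its decision boundary came from the data rather than from the programmer.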

Statistics may be defined as:

“…the science of collecting and analysing numeric data in large quantities, especially for the purpose of inferring proportions in a whole from those observed in a representative sample…”

Statistical exercises typically involve bringing together a sample of data, making general assumptions about how the data vary in the form of an assumed statistical model, and then making inferences about relationships within the data sample using the model as a guide. The central aim of any statistical exercise is less predictive accuracy, although this is often desired, and more the making of valid interpretations and appropriate generalisations from the data sample in question. Key here are the suitability of the statistical model used to make the inferences and the representativeness of the data sample relative to the population from which it was drawn. Making accurate predictions using machine learning techniques, in contrast, is contingent on having a sufficiently large data sample available to train the algorithm, and sufficient computing power to undertake the training exercise. Both these prerequisites have experienced step changes for the better over recent years. For this reason, machine learning techniques are increasingly being used across many disciplines to support predictive analytic exercises.
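As a minimal illustration of the statistical approach, the following Python sketch infers a population mean from a small, hypothetical sample. The data, and the use of the normal critical value 1.96 for an approximate 95% confidence interval, are simplifying assumptions for illustration only:

```python
import math
import statistics

# Hypothetical sample: heights (cm) drawn from a larger population.
sample = [162, 168, 171, 175, 169, 173, 166, 170, 172, 174]

n = len(sample)
mean = statistics.mean(sample)   # point estimate of the population mean
sd = statistics.stdev(sample)    # sample standard deviation
se = sd / math.sqrt(n)           # standard error of the mean

# Approximate 95% confidence interval for the population mean, using the
# normal critical value 1.96 (a t-distribution value would be more
# precise for a sample this small).
ci = (mean - 1.96 * se, mean + 1.96 * se)
```

The interval expresses an inference about the whole population from the sample, which is the emphasis of statistics; its validity rests on the modelling assumptions and on the sample being representative.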

The terms machine learning (ML) and deep learning (DL) are also often used interchangeably. However, the two techniques differ in a number of subtle ways. Both involve algorithms that 1) parse data, that is, analyse data in order to understand its content and structure, 2) learn from the parsing process, and then 3) apply what they’ve learned. However, deep learning refers to the use of a particular category of algorithm in the process, known as an artificial neural network (ANN), and specifically an ANN with a certain degree of complexity (characterised by the number of layers it has). A key difference between ML and DL is that the former requires a certain level of manual intervention from an analyst to inform the learning process, specifically the definition of the variables or features to be used. In the latter, the definition of the variables or features is self-directed by the algorithm. (Figure: the typical unit structure of an ANN, how it operates, and its likeness to a human neurone.)
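A single ANN unit, and the way layers of such units are stacked to give a network its depth, can be sketched as follows; the weights, biases and inputs are arbitrary values chosen for illustration:

```python
import math

# A single artificial "neurone": a weighted sum of its inputs passed
# through a non-linear activation, loosely analogous to a biological
# neurone firing once its combined inputs exceed a threshold.
def neurone(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid activation, output in (0, 1)

# A layer is simply several such units applied to the same inputs; a
# "deep" network chains layers so each layer's output feeds the next.
def layer(inputs, weight_matrix, biases):
    return [neurone(inputs, w, b) for w, b in zip(weight_matrix, biases)]

x = [0.5, -1.2]                                       # raw input features
hidden = layer(x, [[0.4, 0.9], [-0.7, 0.2]], [0.1, 0.0])  # two hidden units
output = layer(hidden, [[1.5, -1.1]], [0.05])             # single output unit
```

Only two layers are shown; deep learning networks stack many, and it is the intermediate layers that allow features to be derived by the network itself rather than defined by an analyst.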

Different types of ML approaches lie on a continuum of increasing functional complexity. The continuum starts at one end with simple rules-based algorithms, moves on to the use of inferential statistics for prediction, and then to ML-based and DL-based prediction. The latter two techniques are examples of what is often termed artificial narrow intelligence (ANI), that is, AI systems developed and trained for a specific task within a limited context.

The general process by which a machine learning algorithm is trained to predict an outcome of interest, and then subsequently operationalised, runs as follows. The process starts by defining the objective function for the AI system, that is, the specific outcome that the system is being trained to predict. A dataset is then compiled, made up of variables characterising the outcome to be predicted, along with potential predictors. In this dataset, termed the training set, the relationship between the outcome and its predictors is known. The training set is split into a set used to train the algorithm to predict, and a second set used to test how well the final trained algorithm performs. Training is typically a sequential, cyclical process of repeated training, validating, tuning and revalidating of the algorithm using different portions of the training set. In this way, the way the algorithm predicts is repeatedly refined and validated until a desired level of predictive performance is achieved. As the relationship between the outcome and predictors is known in the training set, the predictive accuracy of the algorithm can be measured at each stage of the validation process, and the effect of tuning the algorithm on predictive accuracy judged. The final performance of the algorithm is then measured on the held-back test set of data. If the predictive performance of the algorithm is deemed sufficiently accurate, the final step in the process is to deploy it by feeding in new, unlabelled data on a routine basis.
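The train/validate/tune/test cycle described above can be sketched in Python. The dataset and the deliberately simple “model” (a single learned threshold) are invented so that the process, rather than any particular algorithm, is the focus:

```python
import random

random.seed(0)
# Labelled training data: (feature, outcome) pairs in which the
# relationship between predictor and outcome is known.
data = [(x, int(x > 5.0)) for x in [random.uniform(0, 10) for _ in range(100)]]

random.shuffle(data)
test_set = data[:20]    # held back for the final performance measurement
train_set = data[20:]

def train(examples, threshold):
    # "Training" here just fixes a decision threshold; a real algorithm
    # would fit many parameters to the examples.
    return lambda x: int(x > threshold)

def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

# Cyclical train/validate/tune loop: each candidate setting is trained
# on one portion of the training set and validated on another.
validation, core = train_set[:20], train_set[20:]
best_threshold, best_score = None, -1.0
for threshold in [3.0, 4.0, 5.0, 6.0, 7.0]:
    model = train(core, threshold)
    score = accuracy(model, validation)
    if score > best_score:
        best_threshold, best_score = threshold, score

# Final, unbiased measurement on the untouched test set.
final_model = train(train_set, best_threshold)
test_accuracy = accuracy(final_model, test_set)
```

Because the outcome labels are known in both the validation and test portions, predictive accuracy can be measured at every stage, exactly as the text describes.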

As technological advances continue, both in AI and ML themselves and in the computing power needed to fuel them, many futurists regard it as inevitable that artificial narrow intelligence systems will be replaced by wider, more general intelligence systems. This is obviously contingent on progress continuing unchecked. Artificial general intelligence (AGI) systems can be defined as AI systems with broad, adaptable cognitive abilities, similar to those of humans. On the ML continuum, ANI can be considered to give way to AGI at the point where algorithms transition from merely performing a particular task to actually mastering it. Such algorithms can be characterised by the capability of constructing abstract concepts and transferring and applying them from one domain to another, as humans are able to do.

The final point at the far end of the ML continuum is the emergence of artificial super intelligence (ASI), which may be described as AI systems with general cognitive abilities surpassing those of humans. Whereas humans are essentially fixed entities in time, super intelligence might conceivably be achieved through the emergence of entities that are able to change their own architecture and design instantaneously to adapt to changing needs.

The emergence of AGI is no longer confined to the realms of science fiction: a recent survey of prominent AI experts found that the consensus opinion was that AGI would more likely than not be realised by 2100. The capacity of the fastest supercomputers to store information is already around ten times greater than that of the human brain, whilst the speed with which information is processed is around four times greater. However, the number of calculations per second that the fastest supercomputer can currently manage is dwarfed by the human brain (whose capacity is twenty times greater in this regard). Placing the latter in context, one second of human brain activity would take the fastest supercomputers around forty minutes to process. Computers therefore still have quite a way to go to match human brain power on this metric.