What things do I need to develop an artificial intelligence? Originally appeared on Quora.
Like a lot of things, the answer is “it depends”. If we take deep learning as an example of an increasingly popular A.I. workload, building an A.I. system for deep learning training on your datasets is largely a function of the resources, expertise, and infrastructure you have readily accessible. For example, the system you might employ as an independent developer, or as a researcher in a smaller setting, would look considerably different from what you would need to support a larger organization’s efforts to “A.I.-enable” its business interactions with customers, improve the quality of clinical care, or detect fraud in a voluminous flow of financial transaction data. Ultimately this becomes a question of whether you design and build your own system, or employ a purpose-built solution for your problem.
For the former, the GPU technology commonly found in today’s consumer market often makes its way into “do-it-yourself” A.I. systems that are quite capable and deliver substantially better performance than CPU-based systems. The readily available toolkits that make general-purpose parallel computing possible on today’s GPUs, combined with the programming skills required by popular deep learning frameworks, can yield great results. This was first demonstrated by Alex Krizhevsky’s “AlexNet”, which made history by winning the ImageNet Large Scale Visual Recognition Challenge in 2012 using a convolutional neural network running on GPUs with NVIDIA’s CUDA.
For the latter (deep learning at scale, for larger organizations), there are two important considerations that should influence your approach to implementing an A.I. system:
1. How Soon Do You Need to Start Seeing Results?
If you’re on an exploratory journey, and you enjoy the intellectual challenge and sometimes “sleuth work” of playing with various hardware and software configurations, it’s likely that you’re not operating against a timeline set by your boss, or some kind of business or research imperative. In this setting, it may be perfectly fine to follow a meandering path as you piece together a system from GPUs, drivers, libraries, and deep learning frameworks that interest you, sifting through potentially hundreds of pages of documentation as you take on the role of “system integrator”.
However, if your efforts are steered by an overriding need to derive insights from data as fast as possible, you likely need a simpler, accelerated path to get there. In such cases, you’ll want to take advantage of “solution-ized” platforms or appliances like the NVIDIA DGX-1 or DGX Station that integrate all the componentry you need, and are validated via published benchmark testing, delivering quantifiable performance on the frameworks you care about. Such appliances should be evaluated on the basis of how “plug-and-play” they are, and whether the turn-up experience is intuitive, guiding you through a simple administrative interface that’s easy to navigate as you manage datasets, allocate resources, and schedule jobs. This mode of deployment spares you from having to wear an “IT admin” hat, and lets you jump into training neural nets as fast as possible, getting up and running in as little as one day.
Another aspect of getting faster insights is the performance of the hardware and software working together in concert. The benefit of GPU-optimized deep learning software stacks is the ability to reach much higher speed-up factors on deep learning training than is possible with GPU hardware alone. NVIDIA DGX Systems see a 30% increase in deep learning performance compared with other systems built using the same Tesla V100 GPUs but lacking integrated, optimized deep learning software. The important takeaway here is that, even if you build an A.I. system on your own using the absolute latest GPU technology, that system would still be at a performance disadvantage relative to an integrated hardware and software system that’s fully optimized and software-engineered for maximum performance on each deep learning framework.
2. How Much Time Am I Ready to Spend On Managing and Optimizing Infrastructure?
As with the “exploratory journey” route, the ongoing operational experience with your A.I. system can look very different depending on which route you follow. If you thrive on spending time fine-tuning your software stack, playing with different combinations of frameworks and supporting libraries, and don’t mind seeking troubleshooting support in community forums, building your own system may be the way to go. The reality is that a considerable expenditure of software engineering skill is often required to tune the “perfect” deep learning stack, from the framework right down to the GPU driver, and every layer in between. Consider also that the frameworks themselves are often open source and are continually evolving. Ensuring your stack is running at optimal performance therefore means an ongoing commitment of engineering hours to tune, re-tune, and perhaps rebuild your stack. If that’s your charter, then it might not be an issue, but if you’re funding someone else to do this work, it could mean hundreds of thousands of dollars in software engineering OpEx to ensure you’re maximizing the ROI of your A.I. investment.
Alternatively, A.I. appliances like NVIDIA’s DGX, which include access to popular deep learning frameworks like TensorFlow, Caffe2, MXNet and more, as well as supporting libraries, all integrated with the hardware, can save considerable time and money. Such offerings feature a pre-optimized stack that’s updated by the solution provider on a regular (ideally monthly) basis. This insulates you from the churn and uncertainty of open source software, while also giving you enterprise-grade support should you have issues with any element of the hardware or software.
Additionally, given the experimental nature of data science and A.I., developers often find themselves (or their teams) needing to simultaneously experiment with different combinations of system resources and software configurations in order to determine which model can derive insights fastest. Docker-based containers make it possible to run multiple versions of the deep learning stack side by side, each isolated from the others, with its own instances of supporting drivers and libraries. This keeps the base image of the system OS “clean” and avoids having to re-image the appliance should the experimenter want to try different configuration permutations. Containers, combined with the right management and scheduling apparatus, can also enable teams of researchers to use the platform simultaneously and collaborate on the models they’re developing, thereby increasing the utilization of the system and driving a higher return on your department’s A.I. investment.
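To make the container-based isolation above concrete, here is a minimal sketch of a Dockerfile for one experiment’s stack. The base image tag, framework version, and `train.py` entry point are illustrative assumptions, not a specific NVIDIA-published recipe:

```dockerfile
# Hypothetical Dockerfile for one experiment's isolated deep learning stack.
# Base image tag and framework version below are illustrative assumptions.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# Pin this experiment's framework and library versions inside the container,
# leaving the host OS image untouched.
RUN apt-get update && apt-get install -y python3 python3-pip && \
    pip3 install tensorflow==2.12.0

WORKDIR /workspace
CMD ["python3", "train.py"]
```

A second experiment would carry its own Dockerfile pinning a different CUDA or framework version; each image bundles its own libraries and can be scheduled alongside the others on the same system (for example, `docker run --gpus all experiment-a`), so trying a new configuration never requires re-imaging the host.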
All of this enables greater productivity for you, the data scientist who’s trying to harness the power of A.I. in your enterprise. So, you’ve really got two paths to building an A.I. system, each suited to a different set of timelines, business goals, and operational settings you may be dealing with. The choice will ultimately boil down to whether you have the latitude to expand your charter to include the roles of system integrator and IT administrator, or whether your responsibilities keep you squarely focused on data science and deriving insights for your organization.