Chain-of-Thought Prompting on LLMs

It is amazing to see how human-like reasoning abilities appear to be built into large language models. This paper shows that LLMs solve reasoning problems better when asked to go step by step, an approach known as chain-of-thought prompting.

Here is an example which is quite self-explanatory:

User:

I had 27 lemons, 20 were used to make lemon juice, then I bought 3 bags of lemon each with 10 lemons in it. Now how many lemons do I have?

ChatGPT:

Let's break it down step by step:

You used 20 lemons to make lemon juice, so you have 27 - 20 = 7 lemons left.
You bought 3 bags of lemons, and each bag contains 10 lemons. So, you added 3 * 10 = 30 lemons to your existing 7 lemons.
Now, to find out how many lemons you have in total:

7 (initial lemons) + 30 (lemons from the bags) = 37 lemons

You now have 37 lemons.

User:

for further problems I give, please do not use steps, okay?

ChatGPT:

Of course, I'll provide direct answers without showing the steps for future problems. Just let me know what you'd like to know or calculate, and I'll give you the answer directly.

User:

I had 27 lemons, 20 were used to make lemon juice, then I bought 3 bags of lemon each with 10 lemons in it. Now how many lemons do I have?

ChatGPT:

You currently have 37 lemons.

User:

I had 90 apples, I ate 27 of them, then I bought 5 crates of apple each with 50 apples in it. Now how many apples do I have now?

ChatGPT:

You currently have 363 apples.

Interestingly, ChatGPT’s default GPT-3.5 model appears to use chain of thought by default. Without it, the model goes wrong, as seen in the second problem: the correct answer is 90 - 27 + 5 × 50 = 313 apples, not 363.
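The arithmetic behind both problems can be checked mechanically. Here is a minimal sketch that mirrors the step-by-step reasoning; the function name and parameters are illustrative and not part of the original conversation.

```python
def remaining_after_purchase(start, used, bags, per_bag):
    """Return the item count after using some items and buying full bags."""
    left = start - used        # step 1: subtract what was consumed
    bought = bags * per_bag    # step 2: total items in the new bags
    return left + bought       # step 3: add the purchase to what is left

lemons = remaining_after_purchase(start=27, used=20, bags=3, per_bag=10)
apples = remaining_after_purchase(start=90, used=27, bags=5, per_bag=50)

print(lemons)  # 37, matches both ChatGPT answers
print(apples)  # 313, whereas the direct answer in the chat was 363
```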

References:

આજનો ગલ્લો કેટલો છે? AI for Small Businesses

આજનો ગલ્લો કેટલો છે? (Gujarati for “What are our sales today?”) A shopkeeper asks his assistant this at the end of the day, and in fact a few times a day.

While many shops in India may not have an electronic sales register, small and medium businesses have come a long way in digitising their sales, purchase and other accounts data. Most use specialised accounting software and some even use ERPs. So a business is sitting on plenty of current and historic data.

But can decision makers actually leverage their data? Is accessing data an easy task? Is it even possible to use data beyond operational use cases?

The average decision maker at a small or medium business is not expected to be well versed in writing queries or using data analysis tools. A few can read the reports generated by accounting packages. Visualisation is a great way to see the big picture, they say, but I haven’t met a single MSME owner who uses charts to make decisions! Let me know in the comments if you use charts for operational and strategic decision making at your MSME.

Being a business owner you have the right questions to ask, but your data is veiled behind the layers of software which are largely alien to you.

You need a data analyst they say, but do you? LLMs have made it possible for us to get answers to questions like:

1. આજનો ગલ્લો કેટલો છે? (What are our sales today?)

2. How many new customers did we get last month?

3. What were the top-selling products in the previous week?

4. What was the average order value last month?

5. What business is up for renewal this month?

If you use spreadsheets, an accounting package or an ERP, the time is ripe for you to cut the clutter and interact with your data directly in your own language, be it Hindi, Gujarati or Tamil!
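The post does not describe how such a system is built. One common approach is for the LLM to translate the question into a database query. The sketch below shows the kind of query that would answer the first question. The sales table, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per sale, with an ISO date and an amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-14", "tea", 120.0),
        ("2024-01-15", "tea", 80.0),
        ("2024-01-15", "sugar", 150.0),
    ],
)

def todays_sales(conn, today):
    """Answer 'What are our sales today?' for the given ISO date."""
    # This is the kind of query a model would have to produce from the question.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE sale_date = ?",
        (today,),
    ).fetchone()
    return row[0]

print(todays_sales(conn, "2024-01-15"))  # 230.0
```

The date is passed in as a parameter rather than read from the clock, so the same function can answer the question for any day.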

Building a Simple Thermometer with Arduino

Hey, let’s build a simple Arduino-based thermometer! If you want to start playing around right away, start with the simulation link given at the bottom.

Parts:

1. Arduino Uno
2. TMP36 temperature sensor
3. LCD 16x2
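The TMP36 in the parts list is an analog sensor, so the board has to convert a raw reading into a temperature. This is not the post's original code, only a Python sketch of the conversion arithmetic. It assumes the Uno's default 5 V analog reference and 10-bit ADC, and the function name is made up for illustration.

```python
# Assumptions (not from the post): the Uno's default 5 V analog reference
# and its 10-bit ADC, which reports readings from 0 to 1023.
ADC_STEPS = 1024
REFERENCE_VOLTS = 5.0

def tmp36_celsius(adc_reading):
    """Convert a raw ADC reading from a TMP36 into degrees Celsius."""
    volts = adc_reading * REFERENCE_VOLTS / ADC_STEPS
    # TMP36: 0.5 V offset at 0 degrees C, then 10 mV per degree C.
    return (volts - 0.5) * 100.0

print(round(tmp36_celsius(154), 1))  # 25.2
```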

Diagram:

Code:

Simulation:

Ubuntu Performance Tips

1. Check I/O activity: Whether you are security-paranoid or someone whose disk LED never goes off, the iotop command can help. It displays the top processes reading from or writing to disk.
2. Check why boot is slow: systemd-analyze blame lists the units to be “blamed” for a slow boot, ordered by the time they took to initialise. Also try bootchart.
3. See what’s crashing: Annoyed by “System program problem detected” pop-ups? Check the /var/crash/ directory. You can remove its contents if you want neither to dig into the issues nor to report them.
4. Speed up shutdown: When shutting down, the OS gives apps 90 seconds by default to quit, which seems too high! Uncomment DefaultTimeoutStopSec in the /etc/systemd/system.conf file and set it to 30 seconds.
5. Check scheduled jobs: See the contents of /etc/cron.* to check what runs daily, weekly, or monthly. Carefully remove unwanted cron jobs!
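After editing system.conf, it is easy to leave the setting commented out by mistake. The sketch below is a hypothetical helper (not a systemd tool) that reports whether DefaultTimeoutStopSec is actually active in a piece of config text. The sample text is a made-up excerpt in the style of system.conf.

```python
def read_stop_timeout(conf_text):
    """Return the active DefaultTimeoutStopSec value, or None if it is unset."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are comments, so the built-in default applies.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == "DefaultTimeoutStopSec":
            value = raw.strip()  # a later assignment overrides an earlier one
    return value

sample = """[Manager]
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
"""
edited = sample.replace("#DefaultTimeoutStopSec=90s", "DefaultTimeoutStopSec=30s")

print(read_stop_timeout(sample))  # None
print(read_stop_timeout(edited))  # 30s
```

To check the real file, pass in the contents of /etc/systemd/system.conf instead of the sample string.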

Happy tweaking! Don’t forget to share your tips in the comments section below!

Welcome to the Hadoop and Big Data series! This is the first article in the series, where we present an introduction to Hadoop and its ecosystem.

## In the beginning

In October 2003, a paper titled The Google File System (Ghemawat et al.) was published. The paper describes the design and implementation of a scalable distributed file system. This paper, along with another paper on MapReduce, inspired Doug Cutting and Mike Cafarella to create what is now known as Hadoop. Eventually, project development was taken over by the Apache Software Foundation, hence the name Apache Hadoop.

## What is in the name?

The name Hadoop sparks curiosity, but it is not computing jargon and there is no logic behind the choice. Cutting couldn’t find a name for the new project, so he named it Hadoop! “Hadoop” was the name his son gave to his stuffed yellow elephant toy!

When it comes to processing huge amounts (I mean really huge!) of data, Hadoop is really useful. Without Hadoop, processing such huge data was only possible with specialized hardware, call them supercomputers! The key advantage that Hadoop brings is that it runs on commodity hardware. You can actually use a couple of ordinary laptops to set up a working Hadoop cluster.

1. HDFS: HDFS, or Hadoop Distributed File System, is the component responsible for storing files in a distributed manner. It is a robust file system that provides integrity, redundancy and other services. It has two main components: NameNode and DataNode.
2. MapReduce: MapReduce provides a programming model for parallel computations. It has two main operations: Map and Reduce. MapReduce 2.0 is sometimes referred to as YARN.
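To make the Map and Reduce operations concrete, here is a single-machine Python sketch of the classic word count. It only illustrates the programming model. It does not use Hadoop's API, and the function names are chosen for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster, the map calls run in parallel on the nodes holding the data, and the grouping step moves each key's values to the node that reduces it.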

The Hadoop Ecosystem refers to the collection of products that work with Hadoop. Each product carries out a different task. For example, using Ambari, we can easily install and manage clusters. At this point, there is no need to dive into the details of each product. All of the products shown in the image are from the Apache Software Foundation and are free under the Apache License 2.0.

Setting up Apache Hadoop Single Node Cluster

## System Requirements

• Ubuntu 16.04
• Java 8 Installed

#### 7. Set Java Home for Hadoop

• Find and edit the line as :

yarn.nodemanager.aux-services
mapreduce_shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class

fs.default.name
hdfs://localhost:9000

dfs.replication
1
dfs.namenode.name.dir
dfs.datanode.data.dir
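The property names and values listed above go inside Hadoop's XML configuration files, where each setting is a property element with a name and a value. The sketch below renders that layout in Python. The helper name is made up, and only pairs whose values appear in the listing are used. By Hadoop's standard layout, fs.* settings go in core-site.xml, dfs.* in hdfs-site.xml and yarn.* in yarn-site.xml.

```python
import xml.etree.ElementTree as ET

def hadoop_site_xml(properties):
    """Render name/value pairs in the layout used by Hadoop *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Only pairs whose values appear in the listing above are included.
core_site = hadoop_site_xml({"fs.default.name": "hdfs://localhost:9000"})
hdfs_site = hadoop_site_xml({"dfs.replication": "1"})
yarn_site = hadoop_site_xml({"yarn.nodemanager.aux-services": "mapreduce_shuffle"})

print(core_site)
```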

bin/hdfs namenode -format

sbin/start-dfs.sh
sbin/start-yarn.sh

jps

#### 15. Check Running Jobs

http://localhost:8088

#### Done!

Emerging AI Technologies : A Quick Guide

This post introduces you to emerging AI technologies that are likely to drive the growth of artificial intelligence applications over the next two to five years.

### 1. Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) use two models, a generator and a discriminator, which are trained together. The generator produces new examples that resemble the training data and passes them to the discriminator along with some original, real examples. The discriminator classifies these samples as real or fake. The two models work as adversaries: with each round, the discriminator gets better at telling real from fake, and the generator gets better at creating fakes. Read more about GANs here.

### 2. Capsule Networks

When our eyes see a 3D object, they can identify hierarchical relationships between the object’s parts. In simple terms, humans can identify objects in different poses! But the internal data representation of a convolutional neural network does not take into account important spatial hierarchies between simple and complex objects. This is where capsule networks can help.

Capsule theory has two important parts: collections of neurons called “capsules” and an algorithm for “dynamic routing between capsules”. The algorithm allows capsules to communicate and build something similar to the scene graphs used in computer graphics. This can drastically improve the efficiency of image classification and object identification tasks. Further reading on Capsule Networks.

### 3. Conscious Machines

Machine consciousness means that a machine is aware of a situation or fact. In scary terms, it is like “Skynet of the Terminator series becoming self-aware.” As a demonstration, Columbia Engineering researchers have created a robot that learns what it is. Without any prior knowledge of its build, the robot can create a self-simulation. The robot can then use that self-simulator internally to contemplate and adapt to different situations, handling new tasks as well as detecting and repairing damage in its own body. Further reading on conscious machines: Forbes, Columbia University.

### 4. Contextual AI

Contextual AI refers to applications that can understand the user’s context. Given enough information about the environment, situation and context, the system can see the human perspective. Contextual AI makes applications more personalized. For example, a smart home assistant knows your preferences and learns your habits to provide a more personal experience. Read more about contextual AI at IBM.

### 5. Custom AI Chips

There was a time when only gamers needed GPUs, whereas today GPUs are used in a variety of ML and AI applications. Given the rise of the GPU, chip manufacturers such as Intel are creating specialized chips with computing power of up to 3 TOPS (trillion operations per second). One example is the Intel Neural Compute Stick.

### 6. Debating Systems

Think of a system that scans newspaper and magazine articles to present a dueling narrative for a topic. That’s what IBM’s Project Debater did! Such systems can help humans build persuasive arguments and make better informed decisions.

What other emerging AI technologies should be included in the list? Put your suggestions in the comments below.

Indian Elections and Social Media

India is a growing Internet market, with only 29.5% of the population connected to the information superhighway. It is already the world’s second-largest online market. Unprecedented growth in mobile Internet, with the world’s cheapest mobile data rates, is currently the biggest growth driver.

Political parties have realized the importance of social media better than any Indian corporate. Slogans have turned into hashtags, and social media ads are taking over from print ads. Social media ads played a great role in the last two general elections to the Lok Sabha. It is extremely important for the Election Commission to examine the content of such ads and ensure that political parties are accountable for the expenditure that goes into such online campaigns.

### A Timeline of Political Ads Censorship in India

• 1999 : ECI bans all political ads on electronic media prior to elections.
• 1999 : In Gemini Television Ltd. and others Vs Election Commission of India, the Andhra Pradesh High Court ruled the ECI ban unconstitutional, as it contravened the freedom of speech provisions of Article 19 (1) (a) of the Constitution of India. The court also declared the ban to be contrary to the provisions of the RP Act, 1951.
• 2004 : In Ministry of Information and Broadcasting Vs. M/s. Gemini TV Pvt. Ltd. and others, the Supreme Court mandated all political advertising to be pre-certified by the ECI before broadcast.
• 2012 : ECI orders the formation of the Media Certification and Monitoring Committee (MCMC) for pre-certification of political ads.
• 2013 : ECI issued guidelines on social media usage during election campaigning.
• 2014 : ECI, in its “Compendium of Instructions on Paid News and Related Matters”, states that its earlier order No. 509/75/2004/JS-1/4572, following the SC order of 2004, shall also apply to social media mutatis mutandis.
• 2019 : Facebook and Instagram offer searchable archive of political ads.

Credits: Boom

### MCMC and political ads on social media

The Media Certification and Monitoring Committee (MCMC) is a three-level committee organized at the national, state and district levels. The committee includes an “Intermediary Expert / Social Media Expert”. Here, intermediary is as defined in section 2(1)(w) of the IT Act, 2000, so the role calls for an expert on search engines, web hosting, etc.

The committee has access to publicly available spending data. It also has the power to remove ads and content that it finds in violation of regulations. District-level committees can play a greater role if they are provided with more manpower and resources.

### Challenges in Regulating Social Media Ads

• Political parties have found a way around the transparency policies of large social media platforms by creating their own platforms. A fully functional social network owned by a political party is where we are heading.
• How to attribute ads run for star campaigners to individual candidates for expenditure calculation is an important challenge, as Indian elections are shifting towards a presidential style in which local candidates do not matter.
• The surprisingly low number of political ads on Twitter indicates that political campaigns on Twitter are not run through the standard ad platform but through an army of paid third parties operating ghost accounts. In other words, political campaigns use regular tweets rather than ads. Getting influencers to retweet your tweets, or engaging them in other ways, is also seen.
• Facebook’s searchable archive is a good step forward, but the challenge for regulators is knowing what to search for! Weekly data releases raise the same question! Political advertisers may find ways to escape the regulator’s eyes using different names and content choices. How good Facebook is at distinguishing political ads from non-political ones is another question.
• WhatsApp, which is the most preferred channel for social interaction in India, provides no public information about groups run by political entities or messages created by them.

India needs a balanced regulatory framework for online political advertising which does not engage in excessive censorship and makes parties more accountable. Opaque services such as WhatsApp can be asked to share metadata of the messages and IT companies can be hired to account expenditure for campaigns run on such platforms. Clear rules should be laid down to bifurcate party spending from candidate spending. Voters should be educated about tools available to fight misinformation.

Disclaimer : The author served as social media expert for ECI - Kutch. Views are personal.

Pandas Basics : Analyzing Indian Temperatures from 1901 - 2017

The objective of this post is to present an intuitive overview of the features of the pandas DataFrame object. Minimum temperature data from 1901 to 2017, provided by data.gov.in, is used as an example.

### 1. What is pandas?

It is a Python library for data analysis. Its name, interestingly, is derived from “panel data”. It has rich data structures and tools for working with structured data sets common to statistics and other fields. Its main data structure is called DataFrame.

### 2. Installing pandas

conda install pandas

• If you have Anaconda installed, you can install pandas using above command.

### 4. Creating a DataFrame from Excel or CSV

import pandas as pd
temp = pd.read_excel("min_temperatures.xlsx")  # placeholder filename, not from the original post
temp = temp.set_index(temp.YEAR)

• First, we import the pandas library.
• read_excel() and read_csv() both return a DataFrame object. Here we use read_excel() because the input file is an Excel file.
• Every DataFrame has an index; in this case we want the YEAR column to be the index. set_index() returns a new DataFrame and doesn’t modify the existing one.
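The fact that set_index() returns a new DataFrame is easy to miss. Here is a small sketch on made-up data (not the data.gov.in file) that shows the original frame is left untouched:

```python
import pandas as pd

# Toy data standing in for the real dataset; the values are made up.
toy = pd.DataFrame({"YEAR": [1901, 1902, 1903], "ANNUAL": [19.5, 19.4, 19.3]})

indexed = toy.set_index(toy.YEAR)

print(toy.index.tolist())      # [0, 1, 2], the original is unchanged
print(indexed.index.tolist())  # [1901, 1902, 1903]
```

This is why the post assigns the result back to temp.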

### 5. Glancing at the data

temp.head()

• head() returns the first five rows of the data with column headers.

### 6. Statistical overview of the data

temp.describe()

• describe() returns basic statistics from the dataset e.g. count, mean, min, max, std etc.

### 7. Finding the hottest year

temp['ANNUAL'].idxmax()

2016

• idxmax() returns the index of the row where the column value is maximum. Because YEAR is our index, calling idxmax() on the ANNUAL column gives us the hottest year directly.

### 8. Visualizing annual minimum temperature over years

import matplotlib.pyplot as plt
x = temp.index
y = temp.ANNUAL

plt.scatter(x,y)
plt.show()

• We’ve imported matplotlib for plotting.
• Here, a scatter plot of the ANNUAL column against YEAR is drawn.

### 9. Visualizing temperatures rise and fall (Mean Temp - Months)

mean_months = temp.loc[:, 'JAN':'DEC'].mean()
plt.plot(mean_months.index, mean_months)

JAN 13.167009
FEB 14.656239
MAR 17.774872
APR 21.054274
MAY 23.233846
JUN 23.838291
JUL 23.718462
AUG 23.386838
SEP 22.228974
OCT 19.735299
NOV 16.255470
DEC 13.735641
dtype: float64

• loc is used to access values by labels. Here we are accessing the columns from ‘JAN’ through ‘DEC’, both included.
• loc with a single label, e.g. temp.loc[1901], returns a Series.
• loc with a list of labels, e.g. temp.loc[[1901]], returns a DataFrame.
• mean() computes the mean of each selected column.
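The difference between the loc forms is easiest to see on a tiny frame. Here is a sketch with made-up values, indexed by year like the real data:

```python
import pandas as pd

# Toy frame with made-up values, indexed by year like the real data.
toy = pd.DataFrame(
    {"JAN": [13.0, 13.5], "FEB": [14.5, 15.0], "MAR": [17.5, 18.0]},
    index=pd.Index([1901, 1902], name="YEAR"),
)

one_row = toy.loc[1901]               # single label: a Series
one_row_frame = toy.loc[[1901]]       # list of labels: a DataFrame
jan_to_feb = toy.loc[:, "JAN":"FEB"]  # label slices include both endpoints

print(type(one_row).__name__)        # Series
print(type(one_row_frame).__name__)  # DataFrame
print(list(jan_to_feb.columns))      # ['JAN', 'FEB']
print(jan_to_feb.mean().tolist())    # [13.25, 14.75]
```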

### 10. Finding hottest seasons (1901-2017)

hottest_seasons = {'Winter': temp['JAN-FEB'].idxmax(),
                   'Summer': temp['MAR-MAY'].idxmax(),
                   'Monsoon': temp['JUN-SEP'].idxmax(),
                   'Autumn': temp['OCT-DEC'].idxmax()}
print(hottest_seasons)

{'Winter': 2016, 'Summer': 2016, 'Monsoon': 2016, 'Autumn': 2017}

### 11. Finding the most extreme year

temp['DIFF'] = temp.loc[:, 'JAN':'DEC'].max(axis=1) - temp.loc[:, 'JAN':'DEC'].min(axis=1)
temp.DIFF.idxmax()

1921

• Calculate min() and max() on JAN to DEC columns for each row
• Calculate difference = max - min for each row
• Add difference (DIFF) column to the dataframe
• Do idxmax() on DIFF column
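The same four steps can be tried on a toy frame where the result is easy to verify by hand. The values below are made up. Only the column names follow the post.

```python
import pandas as pd

toy = pd.DataFrame(
    {"JAN": [13.0, 12.0], "JUN": [23.0, 24.5], "DEC": [14.0, 13.5]},
    index=pd.Index([1901, 1902], name="YEAR"),
)

months = toy.loc[:, "JAN":"DEC"]
# axis=1 reduces across the columns, giving one value per row (year).
toy["DIFF"] = months.max(axis=1) - months.min(axis=1)

print(toy["DIFF"].tolist())  # [10.0, 12.5]
print(toy["DIFF"].idxmax())  # 1902
```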

### 12. Plotting Difference over Years

axes= plt.axes()
axes.set_ylim([5,15])
axes.set_xlim([1901,2017])
plt.plot(temp.index, temp.DIFF)

temp.DIFF.mean()

10.895128205128202

### 13. Looking into abnormal winters

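# Map each year to a {month: temperature} dict
year_dict = temp.loc[:, 'JAN':'DEC'].to_dict(orient='index')
sorted_months = []
for key, value in year_dict.items():
    # Keep the four coldest months of the year, coldest first
    sorted_months.append(sorted(value, key=value.get)[:4])

winter = sorted_months[:]
winter_set = []
for x in winter:
    winter_set.append(set(x))
temp['WINTER'] = winter_set

# The most frequent list of coldest months is treated as the routine winter
winter_routine = max(sorted_months, key=sorted_months.count)

# Years whose coldest months differ from the routine winter
temp.WINTER[temp.WINTER != set(winter_routine)]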

YEAR
1957 {FEB, JAN, MAR, DEC}
1976 {FEB, JAN, MAR, DEC}
1978 {FEB, JAN, MAR, DEC}
1979 {FEB, JAN, MAR, DEC}
Name: WINTER, dtype: object

• An abnormal winter here means a year whose four coldest months differ in at least one month from the most commonly observed set of four coldest months.
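The same idea can be written more compactly with Series.nsmallest() and collections.Counter. This is an alternative sketch on made-up data, not the code used to produce the output above:

```python
from collections import Counter

import pandas as pd

# Made-up monthly values for three years; only the ordering matters here.
toy = pd.DataFrame(
    {
        "JAN": [13.0, 13.2, 13.1],
        "FEB": [14.5, 14.6, 14.4],
        "MAR": [17.5, 17.8, 16.0],
        "NOV": [16.2, 16.3, 16.5],
        "DEC": [13.7, 13.6, 13.8],
    },
    index=pd.Index([1901, 1902, 1903], name="YEAR"),
)

# For each year, collect the labels of the four smallest monthly values.
coldest = {
    int(year): frozenset(row.nsmallest(4).index)
    for year, row in toy.iterrows()
}

# The most frequent set is treated as the routine winter.
routine = Counter(coldest.values()).most_common(1)[0][0]
abnormal_years = [year for year, months in coldest.items() if months != routine]

print(sorted(routine))  # ['DEC', 'FEB', 'JAN', 'NOV']
print(abnormal_years)   # [1903]
```

Using frozenset makes the comparison independent of the order in which the coldest months appear.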

### References

Configure Anaconda on Emacs

Perhaps my quest for an ultimate IDE ends with Emacs. My goal was to use Emacs as a full-fledged Python IDE. This post describes how to set up Anaconda on Emacs. My setup:

OS: Trisquel 8.0
Emacs: GNU Emacs 25.3.2

Quick Key Guide (See full guide) :

C-x = Ctrl + x
M-x = Alt + x
RET = ENTER

1.1 Download: Download Anaconda from here. You should download the Python 3.x version, as Python 2 will run out of support in 2020. You don’t need Python 3.x on your machine. It will be installed by this install script.

1.2 Install:

bash Anaconda3-2018.12-Linux-x86.sh

### 2. Adding Anaconda to Emacs

2.1 Adding MELPA to Emacs

The Emacs package named anaconda-mode can be used. This package is in the MELPA repository. Emacs 25 requires this repository to be added explicitly. Important: Follow this post on how to add MELPA to Emacs.

2.2 Installing the anaconda-mode package in Emacs

M-x package-install RET
anaconda-mode RET

2.3 Configure anaconda-mode in Emacs

echo "(add-hook 'python-mode-hook 'anaconda-mode)" >> ~/.emacs.d/init.el

### 3. Running your first script on Anaconda from Emacs

3.1 Create new .py file

C-x C-f
HelloWorld.py RET

print("Hello World from Emacs")

3.3 Running it

C-c C-p
C-c C-c

Output

Python 3.7.1 (default, Dec 14 2018, 19:46:24)
[GCC 7.3.0] :: Anaconda, Inc. on linux