The post Identifying Misspelled Words in your Dataset with Hunspell appeared first on Brandon Rozek.

I assume in this article that you have hunspell and its integration with Python installed. If not, please refer to the article mentioned above and follow the prerequisite steps.

This article is inspired by the need to correct misspelled words in the Dress Attributes Dataset. I'll share with you my initial pitfall, and what I ended up doing instead.

Misspelled words are common when dealing with survey data or data where humans type in the responses manually. In the Dress Attributes Dataset this is apparent when looking at the sleeve lengths of the different dresses.

```
dresses_data['SleeveLength'].value_counts()
```

Word | Frequency |
---|---|
sleevless | 223 |
full | 97 |
short | 96 |
halfsleeve | 35 |
threequarter | 17 |
thressqatar | 10 |
sleeveless | 5 |
sleeevless | 3 |
capsleeves | 3 |
cap-sleeves | 2 |
half | 1 |
Petal | 1 |
urndowncollor | 1 |
turndowncollor | 1 |
sleveless | 1 |
butterfly | 1 |
threequater | 1 |

Ouch, so many misspelled words. This is when my brain starts racking up all the ways I can automate the problem away, which is how I stumbled upon Markus' post.

First, I decided to completely ignore what Markus warns in his post and automatically correct all the words in that column.

To begin the code, let's import and create an instance of the spellchecker:

```
from hunspell import HunSpell
spellchecker = HunSpell('/usr/share/hunspell/en_US.dic', '/usr/share/hunspell/en_US.aff')
```

I modified his `correct_words` function so that it only corrects one word and so I can `apply` it along the `SleeveLength` column.

```
def correct_word(checker, word, add_to_dict=()):
    "Takes in a hunspell object and a word and corrects the word if needed"
    # Add custom words to the dictionary
    for w in add_to_dict:
        checker.add(w)
    corrected = ""
    # Check to see if it's a string
    if isinstance(word, str):
        # Check the spelling
        ok = checker.spell(word)
        if not ok:
            # Grab suggestions for misspelled word
            suggestions = checker.suggest(word)
            if suggestions:
                # Grab the best suggestion
                best = suggestions[0]
                corrected = best
            else:
                # There are no suggestions for misspelled word, return the original
                corrected = word
        else:
            # Word is spelled correctly
            corrected = word
    else:
        # Not a string. Return original
        corrected = word
    return corrected
```

Now let's apply the function over the `SleeveLength` column of the dataset:

```
dresses_data['SleeveLength'] = dresses_data['SleeveLength'].apply(
    lambda x: correct_word(spellchecker, x))
```

Doing so creates the following series:

Word | Frequency |
---|---|
sleeveless | 232 |
full | 97 |
short | 96 |
half sleeve | 35 |
three quarter | 17 |
throatiness | 10 |
cap sleeves | 3 |
cap-sleeves | 2 |
Petal | 1 |
butterfly | 1 |
turndowncollor | 1 |
half | 1 |
landownership | 1 |
forequarter | 1 |

As you might be able to tell, this process didn't go as intended. `landownership` isn't even a length of a sleeve!

This is when I have to remember that technology isn't perfect. Instead, we should rely on ourselves to decide what the correct spelling of each word should be.

Keeping that in mind, I modified the function again to take in a list of the data, and return a dictionary that has the misspelled words as the keys and suggestions as the values represented as a list.

```
def list_word_suggestions(checker, words, echo=True, add_to_dict=()):
    "Takes in a list of words and returns a dictionary with misspelled words as keys and suggestions as a list. Also prints it out"
    # Add custom words to the dictionary
    for w in add_to_dict:
        checker.add(w)
    suggestions = {}
    for word in words:
        if isinstance(word, str):
            ok = checker.spell(word)
            # Only look up each misspelled word once
            if not ok and word not in suggestions:
                suggestions[word] = checker.suggest(word)
                if not suggestions[word] and echo:
                    print(word + ": No suggestions")
                elif echo:
                    print(word + ": " + "[", ", ".join(repr(i) for i in suggestions[word]), "]")
    return suggestions
```

With that, I can use the function on my data. To do so, I convert the pandas values to a list and pass it to the function:

```
s = list_word_suggestions(spellchecker, dresses_data['SleeveLength'].values.tolist())
```

These are the suggestions it produces:

```
sleevless: [ 'sleeveless', 'sleepless', 'sleeves', 'sleekness', 'sleeve', 'lossless' ]
threequarter: [ 'three quarter', 'three-quarter', 'forequarter' ]
halfsleeve: ['half sleeve', 'half-sleeve', 'sleeveless' ]
turndowncollor: No suggestions
threequater: [ 'forequarter' ]
capsleeves: [ 'cap sleeves', 'cap-sleeves', 'capsules' ]
sleeevless: [ 'sleeveless', 'sleepless', 'sleeves', 'sleekness', 'sleeve' ]
urndowncollor: [ 'landownership' ]
thressqatar: [ 'throatiness' ]
sleveless: [ 'sleeveless', 'levelness', 'valveless', 'loveless', 'sleepless' ]
```

From here, you can analyze the output and do the replacements yourself:

```
# Assign the result back instead of using inplace=True on a selected column
dresses_data['SleeveLength'] = dresses_data['SleeveLength'].replace('sleevless', 'sleeveless')
```

This is where you ask "What's the difference if it doesn't automatically fix my data?"

When you have large datasets, it can be hard to individually identify which items are misspelled. Using this method will allow you to have a list of all the items that are misspelled which can let you deal with it in a systematic way.
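One way to keep those manual decisions systematic is to collect them in a single mapping from each misspelling to the spelling you settled on. The sketch below is a minimal illustration on a plain Python list. The `corrections` dictionary and the `apply_corrections` helper are hypothetical names, and the chosen spellings are only my reading of the suggestions, not part of the original analysis.

```python
from collections import Counter

# Hypothetical mapping, built by hand after reading the suggestions.
corrections = {
    "sleevless": "sleeveless",
    "sleeevless": "sleeveless",
    "sleveless": "sleeveless",
    "threequater": "threequarter",
    "thressqatar": "threequarter",
    "urndowncollor": "turndowncollar",
    "turndowncollor": "turndowncollar",
}

def apply_corrections(values, mapping):
    """Return a new list where known misspellings are replaced."""
    # Non-string entries (such as missing values) and unknown words
    # pass through unchanged.
    return [mapping.get(v, v) if isinstance(v, str) else v for v in values]

sample = ["sleevless", "full", "thressqatar", None, "sleveless", "short"]
cleaned = apply_corrections(sample, corrections)
print(Counter(cleaned))
```

With pandas, the same dictionary could be passed to `Series.replace` on the `SleeveLength` column, so that every decision lives in one reviewable place.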

The post Obtaining Command Line Input in Java appeared first on Brandon Rozek.

`Scanner` class

First import the relevant library:

```
import java.util.Scanner;
```

Then create a variable to hold the `Scanner` object:

```
Scanner input;
input = new Scanner(System.in);
```

Inside the parentheses, the `Scanner` binds to the system input, which is by default the console.

The new variable `input` now has the ability to obtain input from the console. To do so, use any of the following methods:

Method | What it Returns |
---|---|
next() | The next space-separated string from the console |
nextInt() | An integer if it exists from the console |
nextDouble() | A double if it exists from the console |
nextFloat() | A float if it exists from the console |
nextLine() | A string up to the next newline character from the console |
hasNext() | Returns true if there is another token |
close() | Unbinds the Scanner from the console |

Here is an example program where we get the user’s first name

```
import java.util.Scanner;

public class GetName {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.print("Please enter your name: ");
        String firstName = input.next();
        System.out.println("Your first name is " + firstName);
    }
}
```

The post Escape Sequences in Java appeared first on Brandon Rozek.

Character | Escape Sequence |
---|---|
Newline | `\n` |
Tab | `\t` |
Backspace | `\b` |
Double Quote | `\"` |
Single Quote | `\'` |
Backslash | `\\` |

The post Java Swing Components appeared first on Brandon Rozek.

Buttons are created using the JButton component. The constructor takes the text placed inside the button.

```
JButton stopBtn = new JButton("Stop");
```

You can also add images inside a button. To do that you need to get the image and make it into an icon. The following code grabs the image file “smallpanda.jpg” from the applet’s code base, the location the applet was loaded from.

```
Image img = this.getImage(this.getCodeBase(), "smallpanda.jpg");
ImageIcon imgIcon = new ImageIcon(img);
JButton feedBtn = new JButton("Feed", imgIcon);
```

Sometimes, you want to change the location of the text in the button. For example, say we want to center the text horizontally and place it at the bottom vertically.

```
feedBtn.setHorizontalTextPosition(JButton.CENTER);
feedBtn.setVerticalTextPosition(JButton.BOTTOM);
```

Don’t forget to add your buttons to the screen!

```
this.add(stopBtn);
this.add(feedBtn);
```

One of the most common forms of input is a text field, usually distinguished with a label. Those components are called JTextField and JLabel respectively. The constructor for JTextField can take just the width of the text field in columns, or, as another common use, some initial text along with the width.

```
JLabel nameLabel = new JLabel("Enter in your name: ");
// Create an input and set the width to be 10 columns wide
JTextField nameInput = new JTextField(10);
// Override nameInput with a field that already contains the text "Brandon"
// and is 10 columns wide
nameInput = new JTextField("Brandon", 10);
this.add(nameLabel);
this.add(nameInput);
```

Checkboxes are commonly used when multiple answers are possible, such as "check all of the foods that you like."

```
JCheckBox pizza = new JCheckBox("Pizza");
JCheckBox noodles = new JCheckBox("Noodles");
JCheckBox rice = new JCheckBox("Rice");
this.add(pizza);
this.add(noodles);
this.add(rice);
```

You can even replace the default look of the checkbox with an image. To do this, you need to make image icons for both when it’s checked and when it’s unchecked.

```
Image checkedImage = this.getImage(this.getCodeBase(), "checked.png");
Image uncheckedImage = this.getImage(this.getCodeBase(), "unchecked.png");
ImageIcon checkedIcon = new ImageIcon(checkedImage);
ImageIcon uncheckedIcon = new ImageIcon(uncheckedImage);
JCheckBox checkbox = new JCheckBox("Check Me", uncheckedIcon);
checkbox.setSelectedIcon(checkedIcon);
this.add(checkbox);
```

Text areas differ from text fields in that they are made to hold multiple lines of text. The component is called JTextArea, and its constructor takes the number of rows and columns as its arguments.

```
JTextArea textarea = new JTextArea(10, 10);
```

By default, when someone inputs more text than the text area can hold, it will automatically grow with the text. To override this behaviour and introduce scroll bars instead, define a JScrollPane and put the text area inside of it by passing it as the argument to the scroll pane’s constructor.

```
JScrollPane scrollPane = new JScrollPane(textarea);
```

Radio buttons are used for when you only want one out of many different options to be selected. For this, one needs to define a button group that houses the radio buttons for the user to choose from. This can be achieved with ButtonGroup and JRadioButton respectively.

```
// Make the radio buttons
JRadioButton radio1 = new JRadioButton("Pies");
JRadioButton radio2 = new JRadioButton("Cakes");
JRadioButton radio3 = new JRadioButton("Cookies");
// Put the radio buttons in a group
ButtonGroup desserts = new ButtonGroup();
desserts.add(radio1);
desserts.add(radio2);
desserts.add(radio3);
// Add the radio buttons to the screen
this.add(radio1);
this.add(radio2);
this.add(radio3);
```

To display a list of items that are clickable by the user, you can use a `JList`. JLists require a model that stores the list implementation; we’ll use `DefaultListModel` to achieve this purpose.

```
// The type parameter avoids raw-type warnings; the list holds strings
DefaultListModel<String> model = new DefaultListModel<>();
JList<String> list = new JList<>(model);
```

To add scrolling capabilities, remember to add it to a scroll pane

```
JScrollPane sp = new JScrollPane(list);
```

You can set the number of items you wish to see in the list. The example below allows us to see three items in the list.

```
list.setVisibleRowCount(3);
```

There are a variety of ways to add items to the list. If a number is specified, the item is placed at that index, counting from zero at the top down to the bottom.

```
model.addElement("Apples");
model.addElement("Cherries");
model.addElement("Bananas");
// Adds 'Oranges' to the top
model.add(0, "Oranges");
```

Sometimes, you want to only let the user select one item. At the end, don’t forget to add the component to the screen!

```
list.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
this.add(sp);
```

To create a dropdown list of different options, consider using a JComboBox.

```
JComboBox<String> cb = new JComboBox<>();
cb.addItem("Select Food Option");
cb.addItem("Pizza");
cb.addItem("Burger");
cb.addItem("Hot Dog");
cb.addItem("Steak");
// Add it to the screen
this.add(cb);
```

The post Using System Themes In Java Swing appeared first on Brandon Rozek.

In the init method of your Java application, place the following code.

```
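try {
    // Ask the UIManager for the platform's native look and feel
    UIManager.setLookAndFeel(
        UIManager.getSystemLookAndFeelClassName());
} catch (Exception e) {
    // Lookup failed: keep the default look and feel
}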
```

Here the application will attempt to look up the system theme and set that as the default styles for the Swing components. If the lookup fails, then it will default back to the metal theme.

For more information, check out this page from Oracle.

If it is so easy to set up applications that look native to each desktop environment, why not have that by default? With the cross platform metal theme, you can ensure that the style of your application is the same across all the operating systems. In this fashion, you don’t need to worry about spacing between components and have full control of the “look and feel” of your application.

Since I am used to developing for the web, I don’t have strong motivation to have an application look the same on all platforms. I prefer the application to match the system theme and look like it was built for the platform that I am on. You lose some control over the presentation of your application across different desktop environments, but with a strong layout, it is possible to make it look organized and integrated.

The post Viewing Java Applets appeared first on Brandon Rozek.

Following along using a normal text editor, I found that I couldn’t just compile and run the code like I have with my Java programs in the past. To be able to help around and assist in the course, I need to be able to build and run these applications. The rest of this article describes the process I underwent to be able to use my existing setup to write and build Java applets. Of course you can always install JGrasp and have that all built in, but it’s always nice to not have to change your workflow.

When I tried following along, I would receive the following error

`Main method not found in class HelloWorld, please define main method as...`

Which makes sense since I have never defined a main method inside my source code. So how do I go about doing this?

Java applets are meant to run on web pages, so one needs an HTML file to host the applet. The following code gives you the bare minimum for setting up the HTML file. I called it `HelloWorld.html`.

```
<html>
<head><title>Applet Container</title></head>
<body>
<applet code='HelloWorld.class' width=400 height=400></applet>
</body>
</html>
```

To get it up and running, I will show a “Hello World” like application for applets.

```
import javax.swing.JApplet;
import java.awt.Graphics;

public class HelloWorld extends JApplet {
    public void paint(Graphics g) {
        g.drawString("Hello World", 30, 30);
    }
}
```

Now we need to compile the code

`javac HelloWorld.java`

Then run the appletviewer

`appletviewer HelloWorld.html`

This concludes the setup for running a simple Java applet. From here you can look at the different methods in the Graphics library and play around.

The post Monte Carlo Pi appeared first on Brandon Rozek.

Pi is a mathematical constant defined as the ratio between the circumference of a circle and its diameter.

The circumference of the circle is defined to be $$ C = 2\pi r$$ while the diameter of the circle is $$d = 2r$$

Take the ratio between the two and you get $$\frac{2\pi r}{2r} = \pi$$

Now let us consider the area of a circle. One can derive the area of a circle by taking the integral of the circumference with respect to its radius $$ A_{circle} = \int{(2\pi r) dr} = \pi r^2 $$

Let us simplify the formula more by setting the radius equal to one. $$A_{circle} = \pi$$

Now consider only the first quadrant of the circle. Since our circle is centered at the origin and all the points on the circumference are equidistant from the center, the quarter-circle has an area of $$A_{circle} = \frac{1}{4} \pi$$

And bound the quarter-circle in a 1×1 box with an area of $$A_{square} = 1^2 = 1$$

Notice that the ratio between the circle and the square is a quarter of pi $$\frac{A_{circle}}{A_{square}} = \frac{\frac{1}{4} \pi}{1} = \frac{1}{4} \pi$$

The formula for a circle centered at the origin with radius one is $$x^2 + y^2 = 1$$

Let us focus again on the first quadrant, and do a Monte Carlo simulation to find the area of the quarter-circle.

We can do this by what is called the dart board method. We generate a random x and y between 0 and 1. If the point satisfies the inequality $$x^2 + y^2 \leq 1$$ then it counts as being inside the circle; if not, it lies outside the circle.

That point will count as a really small area. The point will always be inside the square but may sometimes be inside the circle. Running the simulation a large number of times allows us to add up all the tiny little areas that make up the circle and the square.

To add up these small areas we need to make an assumption. The assumption is that the variance of all the little Monte Carlo trials is the same. Since we are using a pseudo-random number generator, it is safe to assume this is met.

This will allow us to compute a pooled empirical probability over the simulations to sum up the areas.

Meaning the area of the circle will be the number of times that the inequality was satisfied $$A_{circle} = \# Successes$$

And the area of the square will be the number of times the simulation was run, since the random numbers generated will always be between 0 and 1 $$A_{square} = \# Trials$$

Recall that taking the ratio of the area of the circle and the area of the square is a fourth of pi. $$\frac{\frac{1}{4} \pi}{1} = \frac{1}{4} \pi$$

Multiply this number by 4 and you get the value for pi.

This tells us that four times the probability that the randomly generated point is in the circle is equal to pi.

$$\pi = 4 * (Probability\ of\ being\ inside\ circle) = 4 * \frac{\# Success}{\# Trials} = 4 * \frac{A_{circle}}{A_{square}}$$

For the Monte Carlo simulation I used Java. The BigInteger and BigDecimal implementations were used so that there wouldn’t be any issue with integer size limits.

```
// Big Integers are used so we don't run into the integer size limit
import java.math.BigInteger;
import java.math.BigDecimal;

/** Calculates Pi
 * @author Brandon Rozek
 */
class MonteCarloPi {
    public static void main(String[] args) {
        BigInteger successes = BigInteger.ZERO;
        BigInteger trials = BigInteger.ZERO;
```

For this simulation, we will run 1,000,000,000 trials

```
        BigInteger numTrials = new BigInteger("1000000000");
        /*
            Monte Carlo Simulation
            Generate a random point 0 <= x < 1 and 0 <= y < 1
            If the generated point satisfies x^2 + y^2 < 1
            Count as a success
            Keep track of the number of trials and successes
        */
        for (; trials.compareTo(numTrials) < 0; trials = trials.add(BigInteger.ONE)) {
            double randomX = Math.random();
            double randomY = Math.random();
            if (Math.pow(randomX, 2) + Math.pow(randomY, 2) < 1) {
                successes = successes.add(BigInteger.ONE);
            }
        }
```

And then we finalize it with a quick calculation of pi

```
        // (Number of successes) / (Number of trials) * 4 gives the approximation for pi.
        // divide() is exact here because the trial count is a power of ten.
        BigDecimal pi = new BigDecimal(successes)
            .divide(new BigDecimal(trials))
            .multiply(new BigDecimal("4"));
        System.out.println("The calculated value of pi is: " + pi);
    }
}
```

We found an approximation of pi using the Monte Carlo methods! I find that really awesome, however, there are some concerns I have with this approach.

1) We don’t keep track of double counting. One possible solution for this is increasing the radius and bounding box appropriately so that the probability of double counting is low.

2) Speed. The more trials you ask it to run, the longer it takes to perform all of the simulations. One possible way around this is to write a parallel version of this code. That’s possible because of the equal variance that we spoke of earlier. Pooling the successes and trials will still result in a good approximation.
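A minimal sketch of that pooling idea, written in Python rather than the Java used above. The names `run_chunk` and `pooled_pi` are hypothetical, and the chunks here run one after another. The sketch only shows that independently produced counts can be summed before applying the formula; it is not a tuned parallel implementation.

```python
import random

def run_chunk(num_trials, seed):
    """Run one independent batch of dart throws; return (successes, trials)."""
    rng = random.Random(seed)  # each chunk gets its own generator
    successes = 0
    for _ in range(num_trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            successes += 1
    return successes, num_trials

def pooled_pi(chunk_results):
    """Pool the counts from all chunks, then apply pi = 4 * successes / trials."""
    total_successes = sum(s for s, _ in chunk_results)
    total_trials = sum(t for _, t in chunk_results)
    return 4 * total_successes / total_trials

# Each call is independent of the others, so the chunks could be handed
# to separate workers instead of being run in this loop.
results = [run_chunk(50_000, seed) for seed in range(4)]
print(pooled_pi(results))
```

Because only two counters per chunk need to be combined, the workers never have to share state while they run.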

The post Simplifying Expressions with Octave appeared first on Brandon Rozek.

First install Octave and the symbolic package using the website or your package manager of choice.

Then, in Octave, type in the following code:

```
pkg load symbolic
```

For every variable not defined earlier in your expression, make sure to declare it as a symbolic data type

`syms x y`

Then make an expression

```
expr = y + sin(x)^2 + cos(x)^2
```

You can then ask Octave to simplify the expression for you

```
simp_expr = simplify(expr)
```

Displaying it shows it as

`(sym) y + 1`

This is indeed a simplification, using the trig identity $$\sin^2(x) + \cos^2(x) = 1$$

The post Uniformity of Math.random() appeared first on Brandon Rozek.

Today, I will compare Internet Explorer 11, Chrome, and Firefox on a Windows 7 machine and report my results.

H0: The random numbers outputted follow the uniform distribution

HA: The random numbers outputted do not follow the uniform distribution

I wrote a small website and obtained my data by getting the CSV outputted when I use IE11, Firefox, and Chrome.

The website works by calling `Math.random()` 1,000,000 times, each time producing a random number between 1 and 100 inclusive, and storing the results in a file.

This website produces a file with all the numbers separated by a comma. We want these commas to be replaced by newlines. To do so, we can run a simple command in the terminal

```
grep -oE '[0-9]+' Random.csv > Random_corrected.csv
```
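If `grep` is not available, the same clean-up can be sketched in a few lines of Python. The helper name `numbers_to_lines` is hypothetical. It mirrors the pattern above by pulling out every run of digits and writing one per line.

```python
import re

def numbers_to_lines(text):
    """Extract every run of digits and put each one on its own line."""
    tokens = re.findall(r"[0-9]+", text)
    return "".join(token + "\n" for token in tokens)

# Small inline example instead of reading Random.csv from disk
print(numbers_to_lines("12,7,100"), end="")
```

To process a real file, read its contents into a string, pass it to the function, and write the result to the corrected file.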

Do this with all three files and make sure to keep track of which is which.

Here is a copy of my files for Firefox, Chrome, and IE11.

Since we’re interested in whether the random values occur uniformly, we need to perform a Chi-Square test for Goodness of Fit. With every test comes some assumptions:

__Counted Data Condition:__ The data can be converted from quantitative to count data.

__Independence Assumption:__ One random value does not affect another.

__Expected Cell Frequency Condition:__ The expected count in each cell is 10,000, well above the usual minimum of 5.

Since all of the conditions are met, we can use the Chi-square test of Goodness of Fit

For the rest of the article, we will use R for analysis. Looking at the histograms for the three browsers below, the random numbers all appear to occur uniformly.

```
rm(list=ls())
chrome = read.csv("~/Chrome_corrected.csv", header = F)
firefox = read.csv("~/Firefox_corrected.csv", header = F)
ie11 = read.csv("~/IE11_corrected.csv", header = F)
```

```
hist(ie11$V1, main = "Distribution of Random Values for IE11", xlab = "Random Value")
```

`hist(firefox$V1, main = "Distribution of Random Values for Firefox", xlab = "Random Value")`

`hist(chrome$V1, main = "Distribution of Random Values for Chrome", xlab = "Random Value")`

Before we run our test, we need to convert the quantitative data to count data by using the plyr package.

```
#Transform to count data
library(plyr)
chrome_count = count(chrome)
firefox_count = count(firefox)
ie11_count = count(ie11)
```

Run the tests

```
# Chi-Square Test for Goodness-of-Fit
chrome_test = chisq.test(chrome_count$freq)
firefox_test = chisq.test(firefox_count$freq)
ie11_test = chisq.test(ie11_count$freq)
# Test results
chrome_test
```

As you can see in the test results below, we fail to reject the null hypothesis at a 5% significance level because all of the p-values are above 0.05.

```
##
## Chi-squared test for given probabilities
##
## data: chrome_count$freq
## X-squared = 101.67, df = 99, p-value = 0.4069
```

`firefox_test`

```
##
## Chi-squared test for given probabilities
##
## data: firefox_count$freq
## X-squared = 105.15, df = 99, p-value = 0.3172
```

`ie11_test`

```
##
## Chi-squared test for given probabilities
##
## data: ie11_count$freq
## X-squared = 78.285, df = 99, p-value = 0.9384
```

At a 5% significance level, we fail to obtain enough evidence to suggest that the distribution of random numbers is not uniform. This is a good thing, since it is consistent with our random number generators giving all numbers an equal chance of being represented. We can use `Math.random()` with peace of mind.
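For readers curious about what the test computes, here is a minimal Python sketch of the chi-square statistic against equal expected counts, which is what `chisq.test` assumes when it is given only a vector of counts. The function name `chi_square_uniform` and the sample counts are made up for illustration. The sketch does not compute the p-value.

```python
def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)  # same expected count in every cell
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    degrees_of_freedom = len(counts) - 1
    return statistic, degrees_of_freedom

# Made-up counts for four categories, not data from the experiment
stat, df = chi_square_uniform([48, 52, 55, 45])
print(stat, df)
```

With 100 categories the degrees of freedom come out to 99, matching the `df = 99` reported in the R output above.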

The post Knit a Document in RStudio appeared first on Brandon Rozek.

First go to File->Knit Document. If this is your first time, then it will install RMarkdown, a dependency this tool needs to compile the report.

Once that is downloaded, it will let you choose between three different file formats (HTML, PDF, MS Word). For the purposes of blog posts, I like to output it in HTML so I can copy and paste the code. But for personal use, I like using PDFs.

After you select the file format, hit compile, and voila! A nice neat compiled report is created for you. Here is a pdf example of the report I made.
