Skeleton Coder

Saturday, November 04, 2006

Beware of SQL Injection

SQL injection is a simple but very powerful security threat, and it is very common across websites.

Let us say we have a website and we want to let only registered users log in by asking for a username and password. Let us say we store the username and password in the database.

To check whether the user is valid or not, we use the following query.


select * from users where username='abcd' and password='xyz'


If the result contains at least one row, we say the username and password match.

Typical Java code would be:


String username = "abcd"; // or get from the user.
String password = "xyz"; // or get from the user.

Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(
"select * from users"
+ " where username='" + username + "'"
+ " and password='" + password + "'"
);
if(rs.next()){
// user is logged in.
}else{
// login failed.
}


Because we simply concatenate the strings, whatever the user types in the username and password fields becomes part of the SQL statement itself.

What happens if the user types ' or ''=' (including the quotes) as the password?
Then the SQL query will be

select * from users where username='abcd' and password='' or ''=''

This query succeeds regardless of the username or password the user types, because the condition ''='' is always true. This is clearly a security threat.
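To see the effect without a database, the concatenation can be reproduced with plain strings. This is only a sketch: the class name InjectionDemo and the helper buildQuery are made up for this illustration, and the query text is printed rather than executed.

```java
// Hypothetical demo class: it only shows how the query text is built.
class InjectionDemo {

    // Builds the login query by concatenation, like the vulnerable code above.
    static String buildQuery(String username, String password) {
        return "select * from users"
                + " where username='" + username + "'"
                + " and password='" + password + "'";
    }

    public static void main(String[] args) {
        // A normal login attempt.
        System.out.println(buildQuery("abcd", "xyz"));
        // The malicious input from the text, typed as the password.
        System.out.println(buildQuery("abcd", "' or ''='"));
    }
}
```

The second line printed is the query with the always-true condition appended, which shows that the input changed the structure of the statement and not just a value in it.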

How to fix this?
Should we check whether the input string contains a single quote and reject it if so? That is difficult to get right for each and every field.
A simpler technique is to use PreparedStatement.

The code will then be,

String username = "abcd"; // or get from the user.
String password = "xyz"; // or get from the user.

PreparedStatement stmt = con.prepareStatement(
"select * from users"
+ " where username=?"
+ " and password=?"
);
stmt.setString(1,username);
stmt.setString(2,password);

ResultSet rs = stmt.executeQuery();

if(rs.next()){
// user is logged in.
}else{
// login failed.
}

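When the user types the password as before, the login fails. The prepared statement treats the values as data (escaping or binding them as needed) rather than as part of the SQL text, so quotes in the input cannot change the structure of the query, thereby guaranteeing the expected behavior.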

Conclusion:
Always use PreparedStatement instead of string concatenation wherever user input goes into a query.

Saturday, October 07, 2006

JDBC Tutorials: Commit or Rollback transaction in finally block

In most JDBC books, the transaction management idiom is: commit after executing the update statements, and roll back if an SQLException is thrown.
That is,


Connection con = null;
try{
con = //...
con.setAutoCommit(false);

Statement stmt1 = ...
stmt1.executeUpdate();

// Some operations

Statement stmt2 = ...
stmt2.executeUpdate();

con.commit();
con.setAutoCommit(true);
}catch(SQLException e){
if(con!=null){
try{
con.rollback();
}catch(SQLException ex){
// Log the error...
}
}
}



A similar structure is followed in the JDBC(TM) API Tutorial and Reference from Sun Microsystems. Have a look at the Transactions Tutorial and the sample code provided.

There is a severe problem with this way of committing and rolling back. The problem is that we handle only SQLException. What will happen if a RuntimeException occurs after executing the first update statement but before the second update statement?

The transaction is open, but neither committed nor rolled back. This puts data integrity at risk. If we reuse the same connection (as in most cases) and a later statement commits the transaction, we are in serious trouble: we have inconsistent data.

What is the solution?
Catch Exception instead of SQLException
A simpler but not recommended solution is to catch all exceptions, including RuntimeException. Even then, what if an Error is thrown, say OutOfMemoryError or some other VirtualMachineError? Whatever happens in the code, the transaction should be either committed or rolled back. So, in the worst case, we would have to catch Throwable instead of Exception.

Doesn't it look awkward that whenever we use transactions we have to catch Throwable, or at least Exception?

Use finally block
A clean and yet simple solution is to use a finally block, since the finally block is guaranteed to be executed even when an exception is thrown or when the method returns early.



Connection con = null;
boolean success = false;
try{
con = //...
con.setAutoCommit(false);

Statement stmt1 = ...
stmt1.executeUpdate();

// Some operations

Statement stmt2 = ...
stmt2.executeUpdate();

success = true;

}catch(SQLException e){
success = false;
}finally{
if(con!=null){
try{
if(success){
con.commit();
con.setAutoCommit(true);
}else{
con.rollback();
}
}catch(SQLException e){
// Log the error...
}
}
}



This way of implementing transactions guarantees that the transaction is either committed or rolled back before we exit the method.
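The guarantee can be checked without a database. The sketch below assumes a made-up FakeConnection class that only records whether commit or rollback was called; it is not a JDBC class, and the method name transfer is also hypothetical. It shows that the finally-based idiom rolls back even when a RuntimeException interrupts the work.

```java
// Hypothetical stand-in for a JDBC connection; it only records what was called.
class FakeConnection {
    boolean committed = false;
    boolean rolledBack = false;

    void commit() { committed = true; }
    void rollback() { rolledBack = true; }
}

class FinallyTransactionDemo {

    // Runs two "updates"; a RuntimeException may be thrown between them.
    static void transfer(FakeConnection con, boolean failInBetween) {
        boolean success = false;
        try {
            // first update would go here
            if (failInBetween) {
                throw new IllegalStateException("failure between the two updates");
            }
            // second update would go here
            success = true;
        } finally {
            // Runs on normal completion and on any exception or error.
            if (success) {
                con.commit();
            } else {
                con.rollback();
            }
        }
    }

    public static void main(String[] args) {
        FakeConnection ok = new FakeConnection();
        transfer(ok, false);
        System.out.println("committed=" + ok.committed + ", rolledBack=" + ok.rolledBack);

        FakeConnection failed = new FakeConnection();
        try {
            transfer(failed, true);
        } catch (IllegalStateException e) {
            // The exception still propagates; the rollback has already happened.
        }
        System.out.println("committed=" + failed.committed + ", rolledBack=" + failed.rolledBack);
    }
}
```

Note that no catch clause is needed for the rollback itself: the success flag stays false on every path that does not reach the end of the try block.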

Saturday, September 30, 2006

Java Tutorials: Overloading is compile-time binding

Most beginners in Java get confused between Overloading and Overriding. One should understand that overloading is compile-time binding whereas overriding is runtime binding.

Have a look at the following example. There are three classes - Base, Derived and Test. As the name indicates class Derived extends class Base. The class Test has two overloaded methods with name methodA, with parameters Base and Derived respectively.





class Base{

}

class Derived extends Base{

}

class Test{
public void methodA(Base b){
System.out.println("Test.methodA(Base)");
}
public void methodA(Derived b){
System.out.println("Test.methodA(Derived)");
}

public static void main(String []args){
Test t = new Test();
Base b = new Base();
Base d = new Derived();

t.methodA(b);
t.methodA(d);

}
}


What is the output?

If your answer is


Test.methodA(Base)
Test.methodA(Derived)

then, surprisingly, it is wrong. The actual output is



Test.methodA(Base)
Test.methodA(Base)


Surprised?

This is because overloading is compile-time binding. When the compiler sees the line t.methodA(d);
it checks the declared type of 'd', which is 'Base'. So it looks for the method methodA(Base) and binds the call to it, hence the result.
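The two kinds of binding can be seen side by side in one call. This is a small sketch with made-up classes (Animal, Dog, BindingDemo) that are not part of the example above: the overload is picked from the declared type of the argument, while the overridden method is picked from the runtime type of the object.

```java
class Animal {
    String name() { return "Animal"; }
}

class Dog extends Animal {
    @Override
    String name() { return "Dog"; }   // overriding: selected at run time
}

class BindingDemo {
    // Overloads: selected at compile time from the declared argument type.
    static String describe(Animal a) { return "describe(Animal) got " + a.name(); }
    static String describe(Dog d)    { return "describe(Dog) got " + d.name(); }

    public static void main(String[] args) {
        Animal a = new Dog();   // declared type Animal, runtime type Dog

        System.out.println(describe(a));         // describe(Animal) got Dog
        System.out.println(describe((Dog) a));   // describe(Dog) got Dog
    }
}
```

The cast in the second call changes only the static type the compiler sees, which is enough to select the other overload.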

Let us look at another common problem: programmers intend to override 'equals' but end up overloading it, thereby creating unforeseen problems.

Have a look at the following code. Are the equals and hashCode methods implemented correctly?





public class EqualsOverloadTest {

String id;

public EqualsOverloadTest(String id){
this.id = id;
}

public boolean equals(EqualsOverloadTest other){
return (other!=null) && this.id.equals(other.id);
}

public int hashCode() {
return id.hashCode();
}

}



At first glance, anyone would say the equals method is implemented correctly.
It follows all the constraints for the 'equals' method, and 'hashCode()' is implemented consistently with it. But if you look closer, you will notice that this 'equals' method really overloads the Object.equals(Object) method instead of overriding it.

To prove that this won't work, let me give a simple program. In the main method, we are creating two EqualsOverloadTest objects with the same id. The two objects are added into the Set. Then we are printing the size of the set.





public static void main(String[] args) {
EqualsOverloadTest first = new EqualsOverloadTest("123");
EqualsOverloadTest second =
new EqualsOverloadTest(new String("123"));

System.out.println(first.equals(second));

Set set = new HashSet();
set.add(first);
set.add(second);
System.out.println(set.size());
}






We would expect the size of the Set to be '1' since the two objects are equal. But it prints '2'.
This is because we didn't override the 'equals' method. The first check returned true because the compiler bound the call to equals(EqualsOverloadTest), so our method was called. But the set calls equals(Object), which we did not implement, so Object.equals(Object) is used, and that only checks whether both references point to the same instance. Hence we get the unexpected behaviour.
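For comparison, here is a sketch of the same class with equals(Object) actually overridden. The class name EqualsOverrideDemo and the helper method are made up for this illustration. The @Override annotation (available since Java 5) is worth the extra line, because the compiler then rejects a method that only overloads.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of EqualsOverloadTest with equals(Object) overridden.
class EqualsOverrideDemo {

    private final String id;

    EqualsOverrideDemo(String id) {
        this.id = id;
    }

    @Override   // compile error if this signature does not override anything
    public boolean equals(Object other) {
        if (!(other instanceof EqualsOverrideDemo)) {
            return false;   // also covers other == null
        }
        return id.equals(((EqualsOverrideDemo) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    // Adds two equal objects to a set and returns the resulting size.
    static int sizeOfSetWithTwoEqualObjects() {
        Set<EqualsOverrideDemo> set = new HashSet<EqualsOverrideDemo>();
        set.add(new EqualsOverrideDemo("123"));
        set.add(new EqualsOverrideDemo(new String("123")));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeOfSetWithTwoEqualObjects());   // prints 1
    }
}
```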



Summary:
Overloading is a static or compile-time binding and Overriding is dynamic or run-time binding.

Sunday, September 17, 2006

Java Tutorials: i = i++

This is one of the frequently asked interview questions. It is usually asked about 'C', but sometimes about 'Java' too.

What is the output of the program?


int i = 0;
i = i++;
System.out.println(i);


For the C language, Dennis Ritchie mentions in the book "The C Programming Language" that the behaviour is undefined and left to the implementation. Still, most implementations produce '0'.

Let us see what the output will be in 'Java'.
When the question was put to many people, they immediately answered '1' and gave the following explanation, which is wrong.

The line

i=i++;

is equivalent to

i=i;
i++;


That is, the value of i (0) is stored in the LHS, i.e. 'i'. Then 'i' is incremented to '1', and hence the result is '1'.

Unfortunately, the answer is wrong. Let us see the reason.

i++ means post-increment: the expression yields the old value of the variable, and the variable is incremented as a side effect. It does not mean that the increment is postponed until the whole statement has completed. That is, the value of 'i' is first saved in a temporary location, then 'i' is incremented, and then the actual operation (in this case the assignment) is performed on the value in the temporary location.

Hence,

i = i++;

is equivalent to,

int temp = i; // temp = 0
i++; // i=1
i = temp; // i = 0


Hence we get the result as '0'.
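The three cases can be compared directly. This is a minimal sketch with a made-up class name (PostIncrementDemo); each method returns the value left behind so that the results are easy to check.

```java
class PostIncrementDemo {

    // Returns the value of i after "i = i++".
    static int assignPostIncrement() {
        int i = 0;
        i = i++;        // old value (0) is saved, i becomes 1, then 0 is assigned back
        return i;
    }

    // Returns the value of i after a plain "i++".
    static int plainPostIncrement() {
        int i = 0;
        i++;
        return i;
    }

    // Returns {j, i} after "j = i++", where the target is a different variable.
    static int[] assignToOtherVariable() {
        int i = 0;
        int j = i++;    // j receives the old value, i keeps the increment
        return new int[] { j, i };
    }

    public static void main(String[] args) {
        System.out.println(assignPostIncrement());   // 0
        System.out.println(plainPostIncrement());    // 1
        int[] ji = assignToOtherVariable();
        System.out.println(ji[0] + " " + ji[1]);     // 0 1
    }
}
```

The increment is lost only when the assignment target is the same variable, because the saved old value overwrites the incremented one.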

To get a clear understanding, let us look at the byte code for this operation.


public class Increment {

public static void main(String[] args) {
int i=0;
i=i++;
// System.out.println(i); // commented out to avoid unnecessary lines
}

}


Compile the file 'Increment.java'.

Java Byte Code:

In the command line, type,
> javap -c Increment

This will display the methods and the byte code for the Increment class. The byte code will be displayed in the form of mnemonics - human readable form.

The following lines will be printed:

C:\Projects\test\src>javap -c Increment
Compiled from "Increment.java"
public class Increment extends java.lang.Object{
public Increment();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iload_1
3: iinc 1, 1
6: istore_1
7: return
}


Let us concentrate on the byte code for the main method. To follow it, you need a basic understanding of the Java Virtual Machine architecture.
So let us start with a brief introduction to the Java Virtual Machine.

Java Virtual Machine
The JVM is stack based. That is, for each operation, the operands are pushed onto the stack and then popped off the stack to perform the operation. There is another data structure, typically an array, that stores the local variables.
The local variables are given ids which are just the index to the array. For a non-static method, 'this' reference will be at the array index '0', followed by method parameters and then the other local variables. For static methods, the parameters will start with '0' followed by local variables.

Program Illustration

Let us look at the mnemonics in main() method line by line.

Stack and Local Variables Array before the start of main() method.


| | args i
|---| --------------
| | | | |
|---| --------------
|___|
Stack Local Variable


0: iconst_0
The constant value '0' is pushed to the stack.


| | args i
|-----| --------------
| | | | |
|-----| --------------
| 0 |
-----
Stack Local Variable


1: istore_1
The top element of the stack is popped out and stored in the local variable with index '1'. That is 'i'.


| | args i
|-----| --------------
| | | | 0 |
|-----| --------------
| |
-----
Stack Local Variable


2: iload_1
The value at the location 1, is pushed into the stack.



| | args i
|-----| --------------
| | | | 0 |
|-----| --------------
| 0 |
-----
Stack Local Variable

3: iinc 1, 1
The local variable at index '1' (that is, 'i') is incremented by '1'. The value on the stack is not affected.


| | args i
|-----| --------------
| | | | 1 |
|-----| --------------
| 0 |
-----
Stack Local Variable

6: istore_1
The value at the top of the stack is popped and stored in the local variable at index '1'. That is, '0' is assigned to 'i'.


| | args i
|-----| --------------
| | | | 0 |
|-----| --------------
| |
-----
Stack Local Variable


Hence, we get the result as '0', and not '1'.

Unlike C/C++, this behaviour is guaranteed in Java.

I think this article will give a good insight about the Java Language and the Java VM.

Friday, September 15, 2006

Java Tutorials: ArrayList or Vector?

This is one of the famous questions that a Java beginner has in his mind. This is also a famous question asked in interviews. Following are the differences between ArrayList and Vector.

1. The Vector and Hashtable classes have been available since the initial JDK 1.0, whereas ArrayList and HashMap were added as part of the new Collections API in JDK 1.2.

2. Vector and Hashtable are synchronized, whereas ArrayList and HashMap are unsynchronized.

When to use Vector? When to use ArrayList?

1. ArrayList is faster when compared to Vector since ArrayList is unsynchronized. So, if the List will be modified by only one thread, use ArrayList. If the list is a local variable, you can always use ArrayList.
2. If the List will be accessed by multiple threads, use Vector; otherwise you must take care of the synchronization yourself.

To visualize the problem with synchronization, try the following code.

There is a Producer class that adds 5000 elements to the List (ArrayList/Vector). Another class, Consumer class removes 5000 elements from the same list. There are around 10 producer threads and 10 consumer threads.



import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class Producer implements Runnable {

private List list;

public Producer(List pList) {
list = pList;
}

public void run() {
System.out.println("Producer started");
for (int i = 0; i < 5000; i++) {
list.add(Integer.toString(i));
}
System.out.println("Producer completed");
}

}




class Consumer implements Runnable {
private List list;

public Consumer(List pList) {
list = pList;
}

public void run() {
System.out.println("Consumer started");
for (int i = 0; i < 5000; i++) {
while (!list.remove(Integer.toString(i))) {
// Just iterating till an element is removed
}

}
System.out.println("Consumer completed");
}
}



public class ListTest {

public static void main(String[] args) throws InterruptedException {
// List list = new Vector();
List list = new ArrayList();

for (int i = 0; i < 10; i++) {
Thread p1 = new Thread(new Producer(list));
p1.start();
}

for (int i = 0; i < 10; i++) {
Thread c1 = new Thread(new Consumer(list));
c1.start();
}
Thread.yield();

while (Thread.activeCount() > 1) {
Thread.sleep(100);
}

System.out.println(list.size());

}

}



Try running the program with ArrayList. You may see a number of ArrayIndexOutOfBoundsExceptions, and the Consumer threads will keep waiting for elements that will never be added, because a Producer terminated after throwing the exception.

Now, change the line,
List list = new ArrayList();

to
List list = new Vector();

and run the program.

Now you can see a proper result.

This clearly explains why you should use Vector class when there are multiple threads in the system.

In this program, even if you remove the Consumer class and the Consumer threads, you can see that the Producers themselves throw the exception.

This is because, while adding an element, ArrayList checks the size of its internal array. If the array is not large enough, a new array is created and the elements are copied into it. If a context switch between threads happens at this point, we can get an ArrayIndexOutOfBoundsException. Sometimes you may not get any exception, but some elements will be missing, or other unexpected behaviour will occur.

So always use Vector if there are multiple threads. The same rule applies to HashMap vs Hashtable and StringBuilder vs StringBuffer.
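For readers who would rather keep ArrayList and take care of the synchronization themselves, one option is the standard Collections.synchronizedList wrapper. The sketch below is an alternative to Vector, not the approach used in the program above; the class name SynchronizedListDemo and its methods are made up for the illustration. It uses join() to wait for the producers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListDemo {

    // Starts 10 producers that each add 5000 elements, waits for them, returns the size.
    static int fill(final List<String> list) {
        Thread[] producers = new Thread[10];
        for (int t = 0; t < producers.length; t++) {
            producers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 5000; i++) {
                        list.add(Integer.toString(i));
                    }
                }
            });
            producers[t].start();
        }
        for (Thread producer : producers) {
            try {
                producer.join();   // wait until this producer has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    // Runs the producers against an ArrayList wrapped in a synchronized view.
    static int runWithSynchronizedList() {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        return fill(list);
    }

    public static void main(String[] args) {
        System.out.println(runWithSynchronizedList());   // prints 50000
    }
}
```

Every add() goes through the wrapper's lock, so no element is lost. Iterating over such a list still needs an explicit synchronized block around the loop.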

Summary:
1. Use Vector if there are multiple threads and ArrayList if there is only a single thread.
2. Use Hashtable if there are multiple threads and HashMap if there is only a single thread.
3. Use StringBuffer if there are multiple threads and StringBuilder if there is only a single thread.

Wednesday, September 13, 2006

Java Tutorials: Why no 'unsigned'?

In the Java programming language, we don't have the concept of 'unsigned' integers. One question that arises in the minds of most beginners is: why are there no 'unsigned' integers in Java?

In languages like C and C++, programmers may have used unsigned integers extensively for declaring variables like 'age', which can never be negative, and they prefer the same style of declaring non-negative values as unsigned. But Java doesn't allow that.

Before answering why unsigned is not supported in Java, let us see the problems that can happen because of 'unsigned' type.

Let us say we store the age of a person.

unsigned int age;

Now we have a problem: what if the age of the person is not known? We come up with a great plan of using the value '-1' to mean 'not known'.

The requirement is that users should be allowed to perform some action only if they are 18 or older. If they didn't specify their age, they should be blocked. Let us see how to write the code...


if(age<18){
printf("You should be above 18 years to access this feature");
return;
}


On seeing the code, it is common to assume that '-1' is less than '18', so a user who didn't specify the age will automatically be blocked.

But if you check the C specification, you will find that when a signed int and an unsigned int appear in the same expression, the signed value is converted to unsigned automatically.

It means the value '-1' is converted into its unsigned counterpart, i.e. the largest unsigned value (0xFFFF for a 16-bit int, 0xFFFFFFFF for a 32-bit int). This is definitely greater than '18', thus allowing the user to continue with the blocked activity.
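The same effect can be reproduced in Java, which makes the conversion explicit instead of silent. This sketch relies on Integer.toUnsignedLong and Integer.compareUnsigned, which were added in Java 8 (later than this post); the class and method names are made up for the illustration.

```java
class UnsignedComparisonDemo {

    // Signed comparison, which is what Java does for int.
    static boolean blockedSigned(int age) {
        return age < 18;
    }

    // The comparison C performs when 'age' is unsigned: both sides treated as unsigned.
    static boolean blockedUnsigned(int age) {
        return Integer.compareUnsigned(age, 18) < 0;
    }

    public static void main(String[] args) {
        int unknownAge = -1;   // the "not known" marker from the text

        System.out.println(Integer.toUnsignedLong(unknownAge));   // 4294967295
        System.out.println(blockedSigned(unknownAge));            // true
        System.out.println(blockedUnsigned(unknownAge));          // false
    }
}
```

With the unsigned view, the "not known" user is no longer blocked, which is exactly the bug described above.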

This is just a trivial example, but such problems are very difficult to trace in real-life projects. To avoid such potential problems, Java left out the 'unsigned' keyword.

This reason may look weird, but there is not much to gain from introducing 'unsigned' when compared to the drawbacks. The omission of unsigned data types is controversial. There are cases where unsigned could help. For example, having unsigned raises the maximum limit of the data type. All other reasons revolve around this one. But this can be solved by using a bigger type. For example, instead of unsigned int we can use 'long', and similarly for unsigned short we can use 'int'. Still, an unsigned 'long' has no alternative other than the 'BigInteger' class.
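The "use a bigger type" workaround can be sketched as follows. The class and method names are made up for this illustration. Each method takes the raw bits of a narrower value and returns them as a non-negative number in the next wider type.

```java
import java.math.BigInteger;

class WiderTypeDemo {

    // An "unsigned short" (0..65535) kept in an int.
    static int unsignedShort(short raw) {
        return raw & 0xFFFF;
    }

    // An "unsigned int" (0..4294967295) kept in a long.
    static long unsignedInt(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    // An "unsigned long" does not fit in any primitive type, so BigInteger is used.
    static BigInteger unsignedLong(long raw) {
        BigInteger value = BigInteger.valueOf(raw);
        // A negative raw value stands for 2^64 + raw.
        return raw >= 0 ? value : value.add(BigInteger.ONE.shiftLeft(64));
    }

    public static void main(String[] args) {
        System.out.println(unsignedShort((short) -1));   // 65535
        System.out.println(unsignedInt(-1));             // 4294967295
        System.out.println(unsignedLong(-1L));           // 18446744073709551615
    }
}
```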

There are lots of 'Requests For Enhancement (RFE)' to add unsigned. Here is one: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4504839. It lists a lot of reasons for having unsigned integers, but almost all of them revolve around 'size'.

Let us look at whether it is possible to add the new feature without the drawback that we mentioned.

It looks possible to add the new keyword without losing the type safety of Java, that is, without losing the title 'strongly typed language': use the same level of type checking, and require explicit casts to convert between unsigned and signed.

Let us say that for 'unsigned int', Java adds the keyword 'uint'. Then,


uint a = 10;
int b = 20;

uint c = a + b; // should be a compilation error.


We should instead use,


uint a = 10;
int b = 20;

uint c = a + (uint)b; // should work


This would preserve the strong type checking of Java while satisfying the needs of programmers: a win-win situation for both the Java architects and the Java programmers.

Saturday, July 29, 2006

Implementing 'equals()' method is no longer easy

The Object class provides two important methods - equals() and hashCode().

There are a number of tutorials and books that suggest the correct way of implementing these methods. Please have a look at Chapter 3 of the book Effective Java. It explains, with clear examples, how to implement the equals() and hashCode() methods, and the common mistakes. In this post, we look at one possible problem that is not covered in that book or in most other books. But before proceeding, please read Chapter 3 of Effective Java.


An equals() method implementation is said to be correct if the following conditions are satisfied.
[source: javadocs]


  1. It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  2. It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  3. It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  4. It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  5. For any non-null reference value x, x.equals(null) should return false.
A hashCode() method implementation is said to be correct if the following conditions are satisfied.
[source: javadocs]


  1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  3. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Read the consistency condition for the equals method once again:
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.

Logically, if the two objects are equal at a particular instant of time, but one object changes its value later, then the two objects should not be equal. This is true as per the specifications. But from an implementation perspective, this will become a source of potential bugs in the system.

Let me explain with an example.
Suppose we have a graph or a geographic map, and we want to mark a point on it with some text. Now we have to develop an application that stores the point and the text, like a server-side Java implementation for Google Maps.

Consider the following class.

public final class Location{

private int latitude;
private int longitude;

public Location(){
this(0,0);
}

public Location(int latitude, int longitude){
this.latitude = latitude;
this.longitude = longitude;
}

public int getLatitude() { return latitude; }
public int getLongitude(){ return longitude; }

public void setLatitude(int latitude){this.latitude=latitude; }
public void setLongitude(int longitude){this.longitude=longitude; }

public boolean equals(Object obj){
if( !(obj instanceof Location))
return false;

Location location = (Location) obj;

return latitude== location.latitude
&& longitude== location.longitude;
}

public int hashCode(){
// Following the same instructions as mentioned
// in Effective Java

int result = 17;
result = 37*result + latitude;
result = 37*result + longitude;
return result;
}
}



This is a perfect implementation of equals() and hashCode() methods as specified in Java API Docs and by Effective Java.

The equals() method satisfies all the 5 conditions. The hashCode() method satisfies all the 3 conditions.

Let us write a simple Java program that stores the map of Location --> Place Name pairs in a HashMap and simulates the following situation.

  1. Add the mapping for the "Madras" city. The Latitude : 13, Longitude : 81. (Approximating to integer values).
  2. Make a small correction in the latitude-longitude pair (13,80).
  3. Change the city name from "Madras" to "Chennai". That is add the new value to the Map.

See the sample program.


import java.util.HashMap;

public class LocationTest{

public static void main(String []args) {

HashMap map = new HashMap();

Location loc = new Location(13,81);
map.put(loc, "Madras");

loc.setLongitude(80);

/* Since the key is already present in the map,
* the put method should replace the
* existing entry.
* If you want to explicitly remove the previous mapping,
* uncomment the following line.
*/

// map.remove(loc);

map.put(loc, "Chennai");

System.out.println("Number of elements : "+map.size());

}

}

Guess the output!!!

The expected result is 'Number of elements : 1'. But the actual result is 'Number of elements : 2'

Removing the first location explicitly will not change the result. Can you guess where we went wrong?

The problem is that a key that is already present in the map was modified.

Let us see this with a small illustration.

When the location [13,81] is added for the first time, its hashCode is 23835. Assume that the HashMap implementation simply takes the remainder modulo 3 to choose the hash bucket. Since 23835 mod 3 = 0, the entry is added to hash bucket 0.

The Hash Bucket looks as follows.




----- ---------
0 --> [13,81]
----- ---------
1
-----
2
-----




Now we are changing the longitude from 81 to 80.





----- ---------
0 --> [13,80]
----- ---------
1
-----
2
-----



The new hashCode is 23834, and 23834 mod 3 = 2. Hence this entry should go to bucket 2.
In bucket 2, there is no entry that is equal to [13,80]. Hence a new value is added.





----- ---------
0 --> [13,80]
----- ---------
1
----- ---------
2 --> [13,80]
----- ---------
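The walk-through can be reproduced in a few lines. This sketch uses a cut-down, made-up key class (MutablePoint) with the same hashCode formula as Location, so the hash codes 23835 and 23834 from the text appear again. The real java.util.HashMap uses more than three buckets, but it also looks entries up by the current hash code of the key, so the effect is the same.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, cut-down mutable key modelled on the Location class.
class MutablePoint {
    int latitude;
    int longitude;

    MutablePoint(int latitude, int longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MutablePoint)) {
            return false;
        }
        MutablePoint other = (MutablePoint) obj;
        return latitude == other.latitude && longitude == other.longitude;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + latitude;
        result = 37 * result + longitude;
        return result;
    }
}

class MutableKeyDemo {

    // Repeats the Madras/Chennai steps and returns the final number of entries.
    static int entriesAfterKeyChange() {
        Map<MutablePoint, String> map = new HashMap<MutablePoint, String>();
        MutablePoint loc = new MutablePoint(13, 81);
        map.put(loc, "Madras");    // stored under hash code 23835
        loc.longitude = 80;        // hash code of the key is now 23834
        map.remove(loc);           // finds nothing: the lookup uses the new hash code
        map.put(loc, "Chennai");   // added as a second entry
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(new MutablePoint(13, 81).hashCode());   // 23835
        System.out.println(new MutablePoint(13, 80).hashCode());   // 23834
        System.out.println(entriesAfterKeyChange());
    }
}
```

The old "Madras" entry is still inside the map but can no longer be reached through its own key, which is why the explicit remove has no effect.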




How to solve this issue?


  • Notify the HashMap to update the hash buckets whenever an entry is changed.

  • Change the Map implementation to check all the elements in the HashMap whenever a new entry is added.

  • Make the keys immutable.


The first two solutions only patch the problem. They carry so much overhead that they are either very difficult to implement or very inefficient to use. The third solution addresses the root cause, so the problem cannot arise in the first place.

Never use a mutable object as a key for Maps and Sets.
From this example, you can understand the need for making classes immutable. This is also one of the reasons why classes like String, Integer, Long, Float, Double, etc. are immutable.
Hence the correct implementation is,

public final class Location{
private final int latitude;
private final int longitude;

public Location(){
this(0,0);
}
public Location(int latitude, int longitude){
this.latitude = latitude;
this.longitude = longitude;
}
public int getLatitude() { return latitude; }
public int getLongitude(){ return longitude; }

public boolean equals(Object obj){
if( !(obj instanceof Location))
return false;

Location location = (Location) obj;
return latitude == location.latitude
&& longitude== location.longitude;
}

public int hashCode(){
// Following the same instructions as mentioned
// in Effective Java

int result = 17;
result = 37*result + latitude;
result = 37*result + longitude;
return result;
}
}
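With an immutable key, a correction such as the one in the Madras example becomes "remove the old mapping, add a new one". The sketch below uses a made-up immutable class (Coordinates) so that it is self-contained; the helper name correctedMap is also hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable key with the same equals/hashCode logic as Location.
final class Coordinates {
    private final int latitude;
    private final int longitude;

    Coordinates(int latitude, int longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Coordinates)) {
            return false;
        }
        Coordinates other = (Coordinates) obj;
        return latitude == other.latitude && longitude == other.longitude;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 37 * result + latitude;
        result = 37 * result + longitude;
        return result;
    }
}

class ImmutableKeyDemo {

    // Corrects a location by replacing the mapping instead of mutating the key.
    static Map<Coordinates, String> correctedMap() {
        Map<Coordinates, String> map = new HashMap<Coordinates, String>();
        Coordinates old = new Coordinates(13, 81);
        map.put(old, "Madras");

        map.remove(old);   // the key never changed, so the lookup finds it
        map.put(new Coordinates(13, 80), "Chennai");
        return map;
    }

    public static void main(String[] args) {
        Map<Coordinates, String> map = correctedMap();
        System.out.println("Number of elements : " + map.size());   // 1
        System.out.println(map.get(new Coordinates(13, 80)));       // Chennai
    }
}
```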

Summary:
  1. Use only immutable objects as Keys for Maps and Sets.
  2. Don't use mutable fields for equals comparison and hashCode computation. (See how equals and hashCode work for the StringBuffer class.)
  3. The equals and hashCode methods should be consistent over time, although the Java specs don't mandate it.