Introduction to IP: Internet Protocol

Within the TCP/IP protocol suite there’s little doubt which protocol does the most work, and that’s IP itself. All the data generated by TCP, UDP, ICMP and even IGMP is transmitted in the form of IP datagrams. Many people are surprised to learn that IP provides only a connectionless and unreliable way of delivering datagrams.

You’ll hear this expression often when discussing network protocols, so what exactly does ‘unreliable’ mean? It means that there is no guarantee that an IP datagram will make it to its destination. IP provides a best-effort service to deliver the data to its specified destination. If there is a problem, such as a router failure, IP has a very simple way of handling the error: it throws away the datagram and tries to send an ICMP message back to the source device. If you need reliable delivery, it must be supplied by one of the upper layer protocols, most commonly TCP.

The other important term is connectionless, which means that IP doesn’t keep any state information about successive datagrams. Every datagram is handled completely independently of all the others. The most important implication is that IP datagrams can arrive out of sequence: each datagram can potentially take a different route, so there’s no guarantee that the first datagram sent will arrive before the second.

The above diagram illustrates the format of an IP datagram. The normal size of the IP header is 20 bytes, although it is larger if any options are present. You can see that the most significant bit, on the left hand side, is numbered 0, whereas the least significant bit of the 32 bit value is numbered 31, on the right hand side.

The four bytes in the 32-bit value are transmitted in a specific order: bits 0-7 first, then bits 8-15, then bits 16-23, and finally bits 24-31. This ordering is called big endian, and it is the byte ordering which must be used for all binary integers in the TCP/IP headers. It is also known as network byte order. Big endian means that the data is stored ‘big end first’: the most significant byte in the sequence is stored at the lowest storage address (i.e. the first).
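As a small illustration of the ordering described above, the following Python sketch lays out a 32-bit integer in big endian order. The value `0x0A0B0C0D` and the variable names are arbitrary choices made for this example.

```python
# An arbitrary 32-bit integer chosen purely for illustration.
value = 0x0A0B0C0D

# Big endian: the most significant byte (0x0A) is placed first,
# which is the order in which the bytes would go onto the wire.
wire_bytes = value.to_bytes(4, byteorder="big")

# Show the bytes in transmission order.
print(" ".join(f"{b:02x}" for b in wire_bytes))  # 0a 0b 0c 0d
```

The first byte printed holds bits 0-7 in the numbering used above, and the last holds bits 24-31.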

There are other formats which can be used to store binary integers, either for transmission or in a computer’s memory. The opposite format is called, unsurprisingly, little endian. Strictly speaking byte order is a property of the processor rather than the operating system, and many systems are little endian, for example the VAX (running VMS) and the x86 PCs that run OS/2 and Windows. On these machines integers must be converted into network byte order before being transmitted.
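One way to picture the conversion is with Python’s standard `struct` module, where the format prefix `<` means little endian and `!` means network (big endian) order. This is only a sketch: the helper name `to_network_order` is made up for this example, and in C the same job is normally done by functions such as `htonl`.

```python
import struct

value = 0x0A0B0C0D  # arbitrary example value

# The same integer in the two byte orders.
little = struct.pack("<I", value)   # as a little endian machine stores it
network = struct.pack("!I", value)  # as it must appear in an IP header

def to_network_order(little_endian_bytes: bytes) -> bytes:
    """Re-encode a 4-byte little endian integer in network byte order."""
    (number,) = struct.unpack("<I", little_endian_bytes)
    return struct.pack("!I", number)

print(little.hex(), "->", to_network_order(little).hex())
```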

Remember that the protocol version of IP most widely used across the internet is version 4, so you’ll normally see IP and IP addresses referred to as IPv4. This corresponds to the 4 bit version field at the beginning of the header, which holds the value 4. There have been many proposals for updating the current version of IP, of which IPv6 is the designated successor.

Next comes the header length, which specifies the number of 32 bit words in the header, including any options. An important point to remember is that this is only a 4 bit field, so its largest value is 15 and the header is limited to 60 bytes. With no options present this field is set to 5 (5 × 4 = 20 bytes). The 60 byte limit leaves at most 40 bytes for options, which makes some options fairly useless.
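The version and header length share the first byte of the header, four bits each. The sketch below shows how the two could be pulled apart. The function name `parse_first_byte` is hypothetical, and the byte `0x45` is simply the value you would see for IPv4 with no options.

```python
def parse_first_byte(first_byte):
    """Split the first header byte into (version, header length in bytes)."""
    version = first_byte >> 4           # upper 4 bits
    length_words = first_byte & 0x0F    # lower 4 bits, counted in 32 bit words
    return version, length_words * 4    # convert words to bytes

print(parse_first_byte(0x45))  # (4, 20): IPv4, no options
print(parse_first_byte(0x4F))  # (4, 60): the largest header the field allows
```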

The type of service field, normally abbreviated to TOS, is made up of several parts: a 3 bit precedence field which is largely ignored today, 4 type of service bits, and an unused bit which must be set to 0. The 4 TOS bits are as follows:

  • Minimize Delay
  • Maximize Throughput
  • Maximize Reliability
  • Minimize Monetary Cost

Only one of these four bits can be turned on at any time. If all the bits are 0, which is the most common case, the datagram gets normal service. The guidelines for these bits and a detailed explanation of this feature can be found in RFC 1349. The appropriate value normally depends on the application being used. For example, an interactive login application like Telnet would normally request minimize delay, because it transfers relatively small amounts of data in response to human keystrokes and commands, whereas an application which transfers large amounts of data, like the File Transfer Protocol (FTP), would request maximize throughput.
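To make the layout concrete, here is a rough sketch of decoding the TOS byte according to the RFC 1349 layout described above (3 precedence bits, 4 TOS bits, 1 unused bit). The names `TOS_NAMES` and `decode_tos` are invented for this example.

```python
# The four TOS bits as laid out in RFC 1349, plus the all-zero default.
TOS_NAMES = {
    0b1000: "minimize delay",
    0b0100: "maximize throughput",
    0b0010: "maximize reliability",
    0b0001: "minimize monetary cost",
    0b0000: "normal service",
}

def decode_tos(tos_byte):
    """Return (precedence, description of the TOS bits)."""
    precedence = tos_byte >> 5          # top 3 bits
    tos_bits = (tos_byte >> 1) & 0x0F   # next 4 bits; the last bit is unused
    # More than one bit set is not a defined value, so report it as such.
    return precedence, TOS_NAMES.get(tos_bits, "undefined combination")

print(decode_tos(0x10))  # (0, 'minimize delay'), e.g. Telnet
print(decode_tos(0x08))  # (0, 'maximize throughput'), e.g. FTP data
```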

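Most TCP/IP implementations do not generally support this feature, although this could change. Newer routing protocols are implementing features to enable routing decisions to be based on this field if it is set. Note that RFC 2474 later redefined this byte as the Differentiated Services field, so the RFC 1349 meanings described here are the historical ones.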

The total length field is simply the total size of the datagram in bytes, header included. Together with the header length field it identifies where the data portion of the IP datagram begins and how long it is. The field is 16 bits wide, so the maximum size of a single IP datagram is 65535 bytes. Although a 65535 byte datagram is possible, most link layers would require a datagram that large to be fragmented.
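The way the two length fields combine can be sketched as follows. The function name `payload_bounds` and the hand-built sample datagram are assumptions made for this example. Only the first byte and the total length field of the sample are filled in, and the rest of the header is left as zeros.

```python
import struct

def payload_bounds(datagram: bytes):
    """Return (offset where the data starts, number of data bytes)."""
    header_length = (datagram[0] & 0x0F) * 4               # 32 bit words -> bytes
    (total_length,) = struct.unpack("!H", datagram[2:4])   # 16 bit, network order
    return header_length, total_length - header_length

# A made-up datagram: a 20 byte header followed by 8 bytes of data.
sample = bytes([0x45, 0x00]) + struct.pack("!H", 28) + bytes(16) + b"payload!"

start, size = payload_bounds(sample)
print(start, size)                  # 20 8
print(sample[start:start + size])   # b'payload!'
```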

Each datagram sent by a host is uniquely identified by the identification field. Network analysts and engineers find this field useful because it has traditionally incremented by one every time another datagram is sent, although not every implementation behaves this way. The field is also used in fragmentation and reassembly, but that is a little out of scope for this introductory article. For further reading it’s worth checking out RFC 791, which discusses the field and its use in more detail.

The next field to look at is the TTL, or time to live, field which, despite its name, sets an upper limit on the number of routers each datagram can pass through. It’s an important field as it limits the datagram’s lifespan. The value is initialized by the sender (often 32 or 64) and every router that handles the datagram reduces it by one. When the value reaches 0 the router discards the datagram and notifies the sender with an ICMP message. Without this field, a datagram could get stuck in a routing loop, potentially forever.
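The decrement-and-discard behaviour can be modelled with a few lines of Python. This is a toy simulation rather than real router code, and the function name `forward` and its return values are invented for the example.

```python
def forward(ttl, routers_on_path):
    """Simulate a datagram crossing a chain of routers.

    Returns (delivered, hops), where hops is the number of routers
    that handled the datagram before it was delivered or discarded.
    """
    for hop in range(1, routers_on_path + 1):
        ttl -= 1              # each router reduces the TTL by one
        if ttl <= 0:
            # This router discards the datagram; a real router would
            # also send an ICMP message back to the sender.
            return False, hop
    return True, routers_on_path

print(forward(64, 10))  # (True, 10): plenty of TTL left
print(forward(3, 5))    # (False, 3): discarded at the third router
```

The same loop also shows why a routing loop cannot trap a datagram forever: however long the path, the TTL runs out after a bounded number of hops.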

Lots of protocols can hand data to IP, including ICMP, IGMP, UDP and of course TCP. To identify which protocol the data belongs to, there needs to be an identifier in the IP header. IP provides this with an 8 bit value in its header known as the protocol field.
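As an illustration, the protocol field is the tenth byte of the header (offset 9), and the assigned numbers for the protocols mentioned above are 1 for ICMP, 2 for IGMP, 6 for TCP and 17 for UDP. The lookup table and function name below are just for this sketch.

```python
# Assigned protocol numbers for the protocols mentioned in this article.
PROTOCOL_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP"}

def protocol_name(header: bytes) -> str:
    """Name the upper layer protocol carried by an IP datagram."""
    number = header[9]  # the protocol field sits at byte offset 9
    return PROTOCOL_NAMES.get(number, f"unknown ({number})")

# A made-up 20 byte header with every field zeroed except the protocol.
sample_header = bytes(9) + bytes([6]) + bytes(10)
print(protocol_name(sample_header))  # TCP
```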

The next field is the header checksum, which is calculated over the IP header only. It’s important to realise that the checksum doesn’t cover any data which follows the header. Protocols such as ICMP, UDP and TCP have their own checksums in their own headers, which cover their header and data. To compute the checksum, the checksum field is first set to 0. The entire header is then treated as a sequence of 16 bit words, the one’s complement sum of those words is calculated, and the 16 bit one’s complement of that sum is stored in the checksum field.
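The calculation can be sketched in Python as below. The function name `ip_header_checksum` is made up for this example, and the sample header is a 20 byte IPv4 header written out by hand with its checksum field set to 0.

```python
import struct

def ip_header_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16 bit words."""
    if len(header) % 2:
        raise ValueError("header must contain a whole number of 16 bit words")
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold any carries out of the low 16 bits back in (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample header with the checksum field (bytes 10-11) set to 0.
sample = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = ip_header_checksum(sample)
print(hex(checksum))  # 0xb861

# A receiver runs the same sum over the header with the checksum in place;
# an undamaged header gives 0.
filled = sample[:10] + struct.pack("!H", checksum) + sample[12:]
print(ip_header_checksum(filled))  # 0
```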

The next two fields, the source and destination IP addresses, concern IP routing, which deserves its own post and which we will cover separately.

 
