XPath CheatSheet







# Getting started with XPath


What is XPath?

XPath uses a non-XML syntax to provide a flexible way of addressing (pointing to) different parts of an XML document. It can also be used to test addressed nodes within a document to determine whether they match a pattern or not.

  • XPath stands for XML Path Language
  • XPath is a W3C recommendation
  • XPath is a major element in the XSLT standard
  • XPath uses "path like" syntax to identify and navigate nodes in an XML document
  • XPath can be used to navigate through elements and attributes in an XML document
  • XPath contains over 200 built-in functions

Why XPath?

XPath is a specification for a query language that locates and extracts data from XML documents, and a comprehensive set of functions for the manipulation of that data. XPath is used to identify, filter, and test nodes and content; and to apply functions or operations on the resulting data sets.

XML provides tree-structured data objects. Each tag in an XML document is a node in the tree. A tag can contain attributes, other tags, and raw data content: these are also nodes in the tree. A well-formed XML document has exactly one element at the root of its tree. That element contains its children and, inside those children, descendant nodes branching out until they reach terminal nodes that contain only data content.

XPath provides a language with which to locate nodes, either by identifying a node's address or by finding nodes using tests or filters, and to perform operations on the identified nodes.

In identifying the location of a node, XPath uses the concept of an axis, which describes the relationship between the node that is currently identified (the context) and the node that one wants to locate. These axes include familial relations (parent, child, descendant, sibling, etc.), linear relations (preceding, following, etc.), and two XML markup identifiers (attribute and namespace). Combined with tests and filters, XPath provides a powerful but succinct way to identify a specific node or collection of nodes exactly.

An example XPath statement can illustrate this power. The only further knowledge you require is that the slash character is used to separate steps in the traversal; double colons separate axes from tag names, and square brackets contain filtering statements.
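The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path is an example made up here, the helper names `split_steps` and `parse_step` are hypothetical, and the splitter only handles unabbreviated paths with single slashes between steps.

```python
def split_steps(path):
    """Split a location path on '/' characters that sit outside predicates."""
    steps, current, depth = [], "", 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                steps.append(current)
            current = ""
        else:
            current += ch
    if current:
        steps.append(current)
    return steps


def parse_step(step):
    """Return (axis, node test, predicate text) for one step."""
    head, bracket, rest = step.partition("[")
    axis, sep, test = head.partition("::")
    if not sep:  # no explicit axis: the child axis is implied
        axis, test = "child", head
    return axis, test, bracket + rest


# Hypothetical example path: slashes separate steps, '::' separates
# the axis from the tag name, and '[...]' holds the filter.
path = "/child::html/child::body/descendant::p[@class='intro']"
parsed = [parse_step(step) for step in split_steps(path)]
for axis, test, predicate in parsed:
    print(axis, test, predicate)
```

Read left to right, the path says: start at the root, take the `html` child, then its `body` child, then every `p` descendant whose `class` attribute equals `intro`.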

Overview

Test these commands in the Firefox or Chrome console:
$x('/html/body')
$x('//h1')
$x('//h1')[0].innerText
$x('//a[text()="XPath"]')[0].click()
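Outside the browser, roughly the same lookups can be tried with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. A minimal sketch, assuming a small made-up page; ElementTree paths are relative to an element, hence the leading `.`:

```python
import xml.etree.ElementTree as ET

# Made-up page used only for this illustration.
page = ET.fromstring(
    "<html><body>"
    "<h1>XPath CheatSheet</h1>"
    "<a href='/xpath'>XPath</a>"
    "</body></html>"
)

body = page.findall("./body")           # like $x('/html/body')
headings = page.findall(".//h1")        # like $x('//h1')
first_heading_text = headings[0].text   # like $x('//h1')[0].innerText

# ElementTree has no text() node test, so compare the text in Python.
links = [a for a in page.iter("a") if a.text == "XPath"]

print(first_heading_text)
print(links[0].get("href"))
```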

# XPath Selectors


Descendant selectors

XPath           CSS
//h1            h1
//div//p        div p
//ul/li         ul > li
//ul/li/a       ul > li > a
//div/*         div > *
/*              :root
/html/body      :root > body

Order selectors

XPath                 CSS
//ul/li[1]            ul > li:first-child
//ul/li[2]            ul > li:nth-child(2)
//ul/li[last()]       ul > li:last-child
//li[@id="id"][1]     li#id:first-child
//a[1]                a:first-of-type
//a[last()]           a:last-of-type
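Positional predicates count from 1, not 0. The sketch below tries the three `li` selectors with Python's standard `xml.etree.ElementTree` (which supports integer positions and `last()`); the sample list is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document: one list with three items.
doc = ET.fromstring(
    "<div><ul><li>one</li><li>two</li><li>three</li></ul></div>"
)

first = doc.find(".//ul/li[1]")        # ul > li:first-child
second = doc.find(".//ul/li[2]")       # ul > li:nth-child(2)
last = doc.find(".//ul/li[last()]")    # ul > li:last-child

print(first.text, second.text, last.text)  # one two three
```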

Siblings

XPath                                   CSS
//h1/following-sibling::ul              h1 ~ ul
//h1/following-sibling::ul[1]           h1 + ul
//h1/following-sibling::*[@id="id"]     h1 ~ #id

jQuery

XPath                             jQuery
//ul/li/..                        $('ul > li').parent()
//li/ancestor-or-self::section    $('li').closest('section')
//a/@href                         $('a').attr('href')
//span/text()                     $('span').text()

Attribute selectors

XPath                             CSS
//*[@id="id"]                     #id
//*[@class="class"]               .class
//input[@type="submit"]           input[type="submit"]
//a[@id="abc"][@for="xyz"]        a#abc[for="xyz"]
//a[@rel]                         a[rel]
//a[starts-with(@href, '/')]      a[href^='/']
//a[ends-with(@href, '.pdf')]     a[href$='.pdf']   (ends-with() needs XPath 2.0 or later)
//a[contains(@href, '://')]       a[href*='://']
//a[contains(@rel, 'help')]       a[rel~='help']
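A minimal sketch of the attribute selectors in Python, assuming a made-up document. The standard `xml.etree.ElementTree` module understands `[@attr]` and `[@attr='value']` but has no `starts-with()` or `contains()`, so those two tests are done with ordinary string methods here.

```python
import xml.etree.ElementTree as ET

# Made-up document for the illustration.
doc = ET.fromstring(
    "<body>"
    "<a id='home' href='/'>Home</a>"
    "<a href='https://example.com/guide.pdf' rel='help'>Guide</a>"
    "<input type='submit'/>"
    "</body>"
)

by_id = doc.findall(".//*[@id='home']")           # #home
submit = doc.findall(".//input[@type='submit']")  # input[type="submit"]
with_rel = doc.findall(".//a[@rel]")              # a[rel]

# starts-with(@href, '/') and contains(@href, '://'), filtered in Python
local = [a for a in doc.iter("a") if a.get("href", "").startswith("/")]
external = [a for a in doc.iter("a") if "://" in a.get("href", "")]

print([a.text for a in local], [a.text for a in external])
```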

Misc selectors

XPath                               CSS or meaning
//h1[not(@id)]                      h1:not([id])
//button[text()="Submit"]           Text match
//button[contains(text(),"Go")]     Text contains (substring)
//product[@price > 2.50]            Arithmetic
//ul[*]                             Has children
//ul[li]                            Has children (specific)
//a[@name or @href]                 Or logic
//a | //div                         Union (joins results)

# XPath Expressions


Steps and axes

//      ul      /       a[@id='link']
Axis    Step    Axis    Step

Prefixes

Prefix   Example                  Means
//       //hr[@class='edge']      Anywhere
/        /html/body/div           Root
./       ./div/p                  Relative

Axes

Axis   Example                Means
/      //ul/li/a              Child
//     //*[@id="list"]//a     Descendant

# XPath Predicates


Predicates

//div[true()]
//div[@class="head"]
//div[@class="head"][@id="top"]

Operators

# Comparison
//a[@id = "abcdef"]
//a[@id != "abcdef"]
//a[@yourPrice > 5000]

# Logic (and/or)
//div[@id="head" and position()=10]
//div[(xa and ya) or not(za)]

Using nodes

# Use it inside functions
//ul[count(li) > 5]
//ul[count(li[@class='hide']) > 5]

# Returns '<ul>' that has a '<li>' child
//ul[li]
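A node test inside a predicate can be counted or simply checked for existence. The sketch below mirrors both forms in Python, assuming made-up data and a smaller threshold (2 instead of 5) to keep the sample short; `count()` is not available in the standard `xml.etree.ElementTree`, so the count is done with `len()`.

```python
import xml.etree.ElementTree as ET

# Made-up document: three lists of different sizes.
doc = ET.fromstring(
    "<body>"
    "<ul id='short'><li/><li/></ul>"
    "<ul id='long'><li/><li/><li/></ul>"
    "<ul id='empty'/>"
    "</body>"
)

# //ul[count(li) > 2]
long_lists = [ul for ul in doc.iter("ul") if len(ul.findall("li")) > 2]

# //ul[li]  (ElementTree supports the [tag] predicate directly)
non_empty = doc.findall(".//ul[li]")

print([ul.get("id") for ul in long_lists])  # ['long']
print([ul.get("id") for ul in non_empty])   # ['short', 'long']
```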

Indexing

# first <a> among its siblings
//a[1]

# last <a> among its siblings
//a[last()]

# second <li>
//ol/li[2]

# same as above
//ol/li[position()=2]

# every <li> except the first, like li:not(:first-child)
//ol/li[position() > 1]

Chaining order

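# Order is significant: predicates apply left to right, so these two differ
a[1][@href='/']     # the first <a>, kept only if its href is '/'
a[@href='/'][1]     # the first of the <a> elements whose href is '/'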
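Each predicate filters the node list produced by the one before it, which is why the order matters. A minimal sketch of that behaviour in plain Python, with three `<a>` siblings modelled as dicts (made-up data, and hypothetical helper names `first` and `href_is_root`):

```python
# Three <a> children of one parent, modelled as plain dicts (made-up data).
links = [{"href": "/about"}, {"href": "/"}, {"href": "/"}]


def first(nodes):
    """[1]: keep only the node at position 1 of the current list."""
    return nodes[:1]


def href_is_root(nodes):
    """[@href='/']: keep nodes whose href attribute equals '/'."""
    return [n for n in nodes if n["href"] == "/"]


index_then_filter = href_is_root(first(links))   # a[1][@href='/']
filter_then_index = first(href_is_root(links))   # a[@href='/'][1]

print(index_then_filter)   # []
print(filter_then_index)   # [{'href': '/'}]
```

The first chain selects nothing because the first link points to `/about`; the second chain selects the second link.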

Nesting predicates

//section[.//h1[@id='hi']]
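A nested predicate keeps the outer node when the inner path finds at least one match. The standard `xml.etree.ElementTree` module cannot evaluate a path inside a predicate, so this sketch (made-up document) expresses the same idea as an outer loop with an inner search:

```python
import xml.etree.ElementTree as ET

# Made-up document: only the first section contains <h1 id="hi">.
doc = ET.fromstring(
    "<body>"
    "<section id='a'><div><h1 id='hi'>Hello</h1></div></section>"
    "<section id='b'><h1 id='other'>Other</h1></section>"
    "</body>"
)

# //section[.//h1[@id='hi']]
matches = [
    section
    for section in doc.iter("section")
    if section.find(".//h1[@id='hi']") is not None
]

print([section.get("id") for section in matches])  # ['a']
```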

# Functions in XPath


Node functions

Index  Function                    Description
1)     node()                      Node test that selects nodes of any kind.
2)     processing-instruction()    Node test that selects processing-instruction nodes.
3)     text()                      Node test that selects text nodes.
4)     name()                      Returns the name of the node.
5)     position()                  Returns the position of the node within the current context.
6)     last()                      Returns the number of nodes in the current context, which is also the position of the last node.
7)     comment()                   Node test that selects comment nodes.

Example

name()            # //*[starts-with(name(), 'cool')]
text()            # //button[text()="Submit"]
                  # //button/text()
lang(str)
namespace-uri()

count()           # //table[count(tr)=1]
position()        # //ol/li[position()=2]

String functions

Index  Function                                     Description
1)     starts-with(string1, string2)                Returns true when string1 starts with string2.
2)     contains(string1, string2)                   Returns true when string1 contains string2.
3)     substring(string, offset, length?)           Returns the part of string that starts at offset (counted from 1) and runs for length characters, or to the end of the string if length is omitted.
4)     substring-before(string1, string2)           Returns the part of string1 before the first occurrence of string2.
5)     substring-after(string1, string2)            Returns the part of string1 after the first occurrence of string2.
6)     string-length(string)                        Returns the number of characters in string.
7)     normalize-space(string)                      Strips leading and trailing whitespace and collapses each internal run of whitespace into a single space.
8)     translate(string1, string2, string3)         Returns string1 with each character that occurs in string2 replaced by the character at the same position in string3; characters with no counterpart in string3 are removed.
9)     concat(string1, string2, ...)                Concatenates all of its arguments.
10)    format-number(number1, string1, string2?)    XSLT function, not part of core XPath. Returns number1 formatted with the pattern string1; the optional string2 names a decimal format.

Example

contains()        # font[contains(@class,"head")]
starts-with()     # font[starts-with(@class,"head")]
ends-with()       # font[ends-with(@class,"head")]   (XPath 2.0 or later)

concat(x,y)
substring(str, start, len)
substring-before("01/02", "/")  # Result: 01
substring-after("01/02", "/")   # Result: 02
translate()
normalize-space()
string-length()
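To make the edge cases concrete, here are rough Python analogues of four of the string functions. They are sketches rather than a conforming implementation: the function names are hypothetical, `substring` assumes integer arguments, and `substring_before` / `substring_after` assume a non-empty separator.

```python
import re


def substring(s, start, length=None):
    """substring(): positions count from 1, not 0."""
    begin = max(start - 1, 0)
    if length is None:
        return s[begin:]
    end = max(start - 1 + length, 0)
    return s[begin:end]


def substring_before(s, sep):
    head, found, _ = s.partition(sep)
    return head if found else ""   # no match gives the empty string


def substring_after(s, sep):
    _, found, tail = s.partition(sep)
    return tail if found else ""


def normalize_space(s):
    """Trim both ends and collapse inner runs of space, tab, CR, LF."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s, src, dst):
    """Map src[i] to dst[i]; src characters beyond len(dst) are deleted."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:           # first occurrence wins
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)


print(substring("12345", 2, 3))            # 234
print(substring_before("01/02", "/"))      # 01
print(substring_after("01/02", "/"))       # 02
print(normalize_space("  a \n b  "))       # a b
print(translate("--aaa--", "abc-", "ABC"))  # AAA
```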

Boolean functions

Function    Description
true()      Returns true.
false()     Returns false.
not()       Takes a boolean argument and returns the inverse value.
boolean()   Takes an argument and converts it into a boolean value.

boolean() converts its argument as follows:

  • Numbers: 0 and NaN return false; any other number returns true
  • Node-sets: an empty node-set returns false; a non-empty one returns true
  • Strings: an empty string returns false; any other string returns true. Note: by this rule the string "false" returns true.

not(expr)         # button[not(starts-with(text(),"Submit"))]
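The conversion rules can be written down as a small Python function. This is a sketch under assumptions: `xpath_boolean` is a hypothetical name, and a node-set is modelled as a plain Python list.

```python
import math


def xpath_boolean(value):
    """Mimic the boolean() conversion rules for numbers, strings, node-sets."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        return len(value) > 0
    return len(value) > 0   # node-set modelled as a list


print(xpath_boolean(0))             # False
print(xpath_boolean(float("nan")))  # False
print(xpath_boolean("false"))       # True: any non-empty string is true
print(xpath_boolean([]))            # False: empty node-set
```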

Type conversion

string()
number()
boolean()

# XPath Axes


Using axes

//      ul      /       a[@id='link']
Axis    Step    Axis    Step
//ul/li                       # ul > li
//ul/child::li                # ul > li (same)
//ul/following-sibling::li    # ul ~ li
//ul/descendant-or-self::li   # ul li
//ul/ancestor-or-self::li     # $('ul').closest('li')
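Axes describe a direction of travel through the tree. Python's standard `xml.etree.ElementTree` has no axis syntax and elements hold no link to their parent, so the sketch below builds a parent map first and then emulates two axes by hand. The document is made up, and `following_siblings` and `closest` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Made-up document for the illustration.
doc = ET.fromstring(
    "<section><h1>Title</h1><p>Intro</p>"
    "<ul><li><a href='/'>Home</a></li></ul><ul/></section>"
)

# Map every element to its parent so we can walk upwards and sideways.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def following_siblings(elem, tag):
    """Emulate elem/following-sibling::tag."""
    parent = parent_of.get(elem)
    if parent is None:
        return []
    siblings = list(parent)
    return [s for s in siblings[siblings.index(elem) + 1:] if s.tag == tag]


def closest(elem, tag):
    """Emulate the nearest match on elem/ancestor-or-self::tag."""
    while elem is not None:
        if elem.tag == tag:
            return elem
        elem = parent_of.get(elem)
    return None


h1 = doc.find("h1")
link = doc.find(".//a")
print(len(following_siblings(h1, "ul")))   # 2
print(closest(link, "section").tag)        # section
```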

Child axis

# both the same
//ul/li/a
//child::ul/child::li/child::a

# both the same
# this works because `child::li` is truthy 
//ul[li]
//ul[child::li]

# both the same
//ul[count(li) > 2]
//ul[count(child::li) > 2]

Descendant-or-self axis

# both the same
//div//h4
//div/descendant-or-self::h4

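# similar, but not identical: the predicate applies to each step separately
//ul//*[last()]                        # last child element of every node under <ul>
//ul/descendant-or-self::*[last()]     # last element among <ul> and its descendants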

Unions

//a | //span
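A union returns the nodes matched by either path, in document order and without duplicates. The standard `xml.etree.ElementTree` module does not support `|`, so this sketch (made-up document) gets the same result with a single pass over the tree:

```python
import xml.etree.ElementTree as ET

# Made-up document for the illustration.
doc = ET.fromstring("<p><a>1</a><b>2</b><span>3</span><a>4</a></p>")

# //a | //span: iter() walks the tree in document order
union = [elem for elem in doc.iter() if elem.tag in {"a", "span"}]
texts = [elem.text for elem in union]

print(texts)  # ['1', '3', '4']
```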

Other axes

Axis                  Abbrev   Notes
ancestor
ancestor-or-self
attribute             @        @href is short for attribute::href
child                          div is short for child::div
descendant
descendant-or-self    //       // is short for /descendant-or-self::node()/
namespace
self                  .        . is short for self::node()
parent                ..       .. is short for parent::node()
following
following-sibling
preceding
preceding-sibling




