How to use Advanced Search

Unlike the Quick Search, the Advanced Search lists only those terms that satisfy the search criteria exactly. For example, if you search for smoked, you will find the term smoked, but you will not find a link to the term smoke, although it is in the database. This search does not disregard punctuation. If you search for rain-forest (with a hyphen), you will not find it, as the database only contains rain forest (without a hyphen). Neither of the searches is case-sensitive, however, so whether you enter téarma or TÉARMA (or even TéArMa) you will get the same results.
The following criteria can be used individually or in combination to narrow your search results:

Length (single-word/multi-word)
This criterion allows you to limit your search to either single-word or multi-word terms. The default option includes both.

Extent
This criterion allows you to limit your search to terms beginning with, ending with or incorporating a particular word or string of characters.

Language
This criterion allows you to limit your search results to a particular language. The default option includes terms from all languages.

Part of speech
This criterion allows you to limit your search results to a particular part of speech. For example, you can request a list containing nouns only.

Domain
This criterion allows you to limit your search results to a particular domain. For example, you could search for the word bat and restrict your results to the ‘Sports’ domain, thus filtering out results relating to nocturnal flying mammals.

Combining Criteria
You can select any combination of the above criteria for your search. The search engine will return terms which satisfy all of the selected criteria. If your search yields no result, you may have selected criteria that are too narrow. In that case, it is worth looking back over your criteria and broadening them in some way.

Wildcards in Advanced Search
A clever way to make your Advanced Search more flexible is to use wildcards in the 'Foclaíocht/Wording' box. A wildcard is a symbol which stands for one or more unspecified characters. These are the available wildcards:

Underscore: _
This represents any single character. For example, if you search for l_w, you will find law and low, as well as the abbreviations LBW and LLW.

Percentage: %
This represents any number (including zero) of any characters. For example, if you search for met%rology, you will find metamorphic petrology and meteorology, as well as metrology.
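The behaviour of the two wildcards can be imitated with a short Python sketch. It is not how the search engine is implemented; it only assumes that _ means exactly one character, that % means any number of characters, that the whole term must match, and that matching ignores case. The term list and the function names are made up for illustration.

```python
import re

def wildcard_to_regex(pattern):
    """Translate the _ and % wildcards into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")            # exactly one character
        elif ch == "%":
            parts.append(".*")           # any number of characters, including zero
        else:
            parts.append(re.escape(ch))  # everything else, hyphens included, is literal
    # IGNORECASE mirrors the statement that the searches are not case-sensitive.
    return re.compile("".join(parts), re.IGNORECASE)

def search_terms(pattern, terms):
    """Return the terms that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [term for term in terms if regex.fullmatch(term)]

# Hypothetical mini-database, built from the examples in the text.
TERMS = ["law", "low", "LBW", "LLW", "lawn", "metrology", "meteorology",
         "metamorphic petrology", "rain forest"]

print(search_terms("l_w", TERMS))          # ['law', 'low', 'LBW', 'LLW']
print(search_terms("met%rology", TERMS))   # ['metrology', 'meteorology', 'metamorphic petrology']
print(search_terms("rain-forest", TERMS))  # [] because the hyphen is not disregarded
```

Searching for % on its own returns the whole list, since the pattern then matches any term.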
The percentage wildcard (%), when used on its own, will produce a list of every entry that satisfies the selected search criteria.

Regular expressions are a convention of using certain characters to stand in for unspecified letters or numbers. They are used to set criteria for strings of characters, e.g. words or tags, which share a common pattern, e.g. start the same way, finish the same way or contain certain characters. Regular expressions are used mainly inside CQL, in word lists and in n-grams.

This page only gives a few basic examples; for more, please refer to Wikipedia, the regular expressions exercises or an interactive course.

Wild cards
Wild cards are not regular expressions but users know them from other software. They are only supported in the simple concordance search.

Using wild cards in simple concordance search
Only in simple concordance search, the asterisk (*), question mark (?), double dashes (--) and vertical bar (|) can be used as wild cards.

asterisk (*) stands for zero or more characters, e.g. c*t
question mark (?) stands for exactly 1 character, e.g. c?t
vertical bar (|) stands for or, e.g. cat|dog|horse will find cat, dog, horse

To search for a literal asterisk or question mark, put a backslash (\) in front of it, such as \* or \?.

Regular expressions
Regular expressions (not wildcards!) are used in all the other concordance searches, in CQL to specify patterns for values, and with wordlists to include or exclude certain types of items.

Regular expressions and CQL
Regular expressions are used in CQL to specify patterns for values.

[word = "dis.*"] [tag = "V.*"]
finds words beginning dis- followed by a verb

[tag="J.*"] [word="[[:upper:]]*"]
finds adjectives followed by an acronym (= word in capitals)

Spaces in CQL and regular expressions
Spaces are used in CQL to make the code easier to read for the human eye. The use of spaces in CQL does not have any effect on the result. In regular expressions, a space refers to a real space, e.g. the space between two words. Since CQL criteria are set for each token separately, a space inside a regular expression is generally a mistake and will not produce the required result.

dot ' . '
A dot stands for a single unspecified character.
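The CQL examples earlier in this section pair the dot with an asterisk and apply one regular expression per token. The Python sketch below imitates that idea under two assumptions: each regular expression has to match the whole attribute value, and a query is a sequence of per-token conditions. The token list, the tag names and the function find_sequences are hypothetical, and Python's re module stands in for the corpus engine.

```python
import re

# Hypothetical tagged tokens; the tag names are made up for illustration.
TOKENS = [
    {"word": "They", "tag": "PP"},
    {"word": "disagree", "tag": "VV"},
    {"word": "and", "tag": "CC"},
    {"word": "dislike", "tag": "VV"},
    {"word": "arguing", "tag": "VVG"},
]

def find_sequences(tokens, query):
    """Return word sequences where every token satisfies its condition.

    query is a list of dicts mapping an attribute name to a regular
    expression; each expression must match the whole attribute value.
    """
    hits = []
    for start in range(len(tokens) - len(query) + 1):
        window = tokens[start:start + len(query)]
        if all(re.fullmatch(regex, token[attr])
               for token, condition in zip(window, query)
               for attr, regex in condition.items()):
            hits.append([token["word"] for token in window])
    return hits

# Rough equivalent of [word = "dis.*"] [tag = "V.*"]
QUERY = [{"word": "dis.*"}, {"tag": "V.*"}]
print(find_sequences(TOKENS, QUERY))  # [['dislike', 'arguing']]

# A space inside the regular expression is a literal space, so this fails.
print(re.fullmatch("dis agree", "disagree"))  # None
```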
question mark ' ? '
A question mark stands for zero or 1 occurrence of the preceding character.

asterisk ' * '
An asterisk stands for zero or more occurrences of the preceding character.
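A small Python sketch of the dot, question mark and asterisk follows. It assumes the pattern has to match the whole word, which is why re.fullmatch is used; the word list and the helper matches are made up for illustration.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

WORDS = ["color", "colour", "colouur", "ct", "cat", "coat"]

print(matches("colou?r", WORDS))  # ['color', 'colour']            u is optional
print(matches("colou*r", WORDS))  # ['color', 'colour', 'colouur'] any number of u
print(matches("c.t", WORDS))      # ['cat']                        exactly one character
print(matches("c.*t", WORDS))     # ['ct', 'cat', 'coat']          any characters in between
```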
range ' [ ] '
Use square brackets to specify a list or range.

not ' ^ '
Use ^ to indicate that the character(s) should not be included. The characters have to be enclosed in square brackets.
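Lists, ranges and negation can be tried out with the following Python sketch. The word list and the helper matches are illustrative assumptions, and the pattern is again required to match the whole word.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

WORDS = ["bat", "bet", "bit", "b2t", "bot"]

print(matches("b[ae]t", WORDS))   # ['bat', 'bet']         list: a or e
print(matches("b[a-i]t", WORDS))  # ['bat', 'bet', 'bit']  range: a to i
print(matches("b[^ae]t", WORDS))  # ['bit', 'b2t', 'bot']  anything except a or e
```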
letters and digits
Letters can be specified by a range or by character class.
\d stands for a digit, i.e. characters 0-9; \D stands for any non-digit character.
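The digit shorthands can be checked with a short Python sketch. The tokens are made up, and Python's re module is used as a stand-in, so details may differ from the corpus engine.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

TOKENS = ["2024", "4x4", "abc", "R2D2", "42"]

print(matches(r"\d\d", TOKENS))             # ['42']    two digits
print(matches(r"\d\D\d", TOKENS))           # ['4x4']   digit, non-digit, digit
print(matches(r"[a-z][a-z][a-z]", TOKENS))  # ['abc']   three letters given as a range
print(matches(r"[A-Z]\d[A-Z]\d", TOKENS))   # ['R2D2']  ranges and digits combined
```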
character classes
Character classes are special codes used to refer to a group of characters.
Example: [[:alpha:]]* finds all words composed of letters.

or ' | '
The pipe | is used to indicate OR.

plus ' + '
The plus stands for 'one or more repetitions of the preceding character'.
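The pipe and the plus can be illustrated with the Python sketch below. The word list and the helper matches are assumptions for illustration; the asterisk variant is included only to show how it differs from the plus.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

WORDS = ["cat", "dog", "horse", "catalog", "goal", "goooal", "gal"]

print(matches("cat|dog|horse", WORDS))  # ['cat', 'dog', 'horse']
print(matches("go+al", WORDS))          # ['goal', 'goooal']         at least one o
print(matches("go*al", WORDS))          # ['goal', 'goooal', 'gal']  zero o is allowed
```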
case sensitivity switch (?i)
Regular expressions are always case sensitive, i.e. Bill is different from bill. To make the whole regular expression case insensitive, put these four characters at the beginning: (?i)
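Python's re module accepts the same (?i) prefix, so the switch can be tried out directly. The names below are made up for illustration.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

NAMES = ["Bill", "bill", "BILL"]

print(matches("bill", NAMES))      # ['bill']                  case sensitive by default
print(matches("(?i)bill", NAMES))  # ['Bill', 'bill', 'BILL']  switch at the very beginning
```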
repetition ' { } '
Use curly brackets to indicate repetition of the preceding character.

grouping ' ( ) '
Any part of a regular expression can be surrounded by parentheses to make it a single unit onto which other regular expressions can be applied.
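The difference between repeating a single character and repeating a group shows up in this Python sketch. The word list and the helper matches are illustrative assumptions.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

WORDS = ["ha", "haa", "haha", "hahaha", "hahahaha"]

print(matches("ha{2}", WORDS))      # ['haa']             only the a is repeated
print(matches("(ha){2}", WORDS))    # ['haha']            the group ha is repeated twice
print(matches("(ha){2,3}", WORDS))  # ['haha', 'hahaha']  two or three repetitions
```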
escaping
To search for the characters . ? * which already have a special function in regular expressions, you have to put a backslash in front of them. This is called escaping (e.g. you have to escape a question mark). The characters $ and # in part of speech tags also have to be escaped.

regular expression    matching result
.                     a b c d e f g h etc. (any single character)
\.                    .
ok?                   o ok (the question mark makes the preceding character optional)
ok\?                  ok?
\                     produces an error: the backslash escapes the following character, but no such character exists

not starting with ' ?! '
Use ?! to say "not starting with", also called negative lookahead. The ?! and the excluded pattern must be enclosed in round brackets; the brackets are required. The brackets have to be followed by a regular expression defining what the token should consist of: use .* for any token, ... for 3-letter tokens, [[:upper:]]* for tokens consisting of uppercase characters, etc.
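Escaping and negative lookahead can both be checked in Python, whose re module follows the same conventions for these two features. The prefix un and the word list are made up for illustration; is_valid and matches are hypothetical helpers.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

def is_valid(pattern):
    """Report whether the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Unescaped, the question mark makes the k optional.
print(matches("ok?", ["o", "ok", "ok?"]))    # ['o', 'ok']
# Escaped, it is a literal question mark.
print(matches(r"ok\?", ["o", "ok", "ok?"]))  # ['ok?']
# re.escape adds the backslashes for you.
print(re.escape("ok?"))                      # ok\?
# A lone backslash has nothing to escape, so the pattern is rejected.
print(is_valid("\\"))                        # False

# Negative lookahead: any token (.*) that does not start with un.
WORDS = ["unhappy", "happy", "under", "sun"]
print(matches("(?!un).*", WORDS))            # ['happy', 'sun']
```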
backreferences
since manatee 2.65
It is possible to place brackets around one or several parts of a regular expression and refer to those parts later. The first part in brackets is referred to with number 1, the second with number 2, etc. This only works within one token, e.g. [word="(ba)..\1..*"] to find baseball, basketball, etc. The N-grams tool also supports backreferences across different tokens, e.g. (.*) or \1 to find occurrences such as may or may, do or do, etc.
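Backreferences can be sketched in Python as well. Two things here are assumptions rather than a description of the tools: the within-token pattern is tested on a plain word, and the cross-token case is imitated by joining the tokens of an n-gram with spaces, because Python's re module works on single strings.

```python
import re

def matches(pattern, items):
    """Return the items that the pattern matches in full."""
    return [item for item in items if re.fullmatch(pattern, item)]

# Within one token: \1 repeats whatever the first bracketed part matched.
print(matches(r"(ba)..\1..*", ["baseball", "banana", "ball"]))  # ['baseball']

# Across tokens, imitated by treating each 3-gram as one string.
NGRAMS = ["may or may", "do or do", "now or never", "more or less"]
print(matches(r"(.*) or \1", NGRAMS))  # ['may or may', 'do or do']
```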
What is a symbol that stands for one or more unspecified characters in the search criterion?
A wildcard character is a symbol that stands for one or more unspecified characters.

Which wildcard character replaces a single character?
A wildcard (also referred to as a wild character or wildcard character) is a symbol used to replace or represent one or more characters. The most common wildcards are the asterisk (*), which represents zero or more characters, and the question mark (?), which represents a single character.

What is a display of files that is limited based on specified criteria?

When you want to locate files meeting two criteria, which Boolean operator do you use?
The AND operator is used when you want to include two or more search criteria.