Soundex
The Soundex system is a method of grouping words, in this case surnames, by how they are pronounced in English, so that spelling variations of the same name end up in the same group. Soundex was used by the U.S. Works Progress Administration (WPA) during the Great Depression to index census records. Although this was a make-work project, the resulting indexes have helped genealogists find ancestors.
How long is the code?
The code has four (4) positions for census work. Other records may use additional positions, but the census always uses 4. The first position is always the first letter of the name, and it is followed by three (3) digits.
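When transcribing codes from an index, a quick format check can catch typing mistakes. The sketch below is only an illustration: the function name `is_census_code` is made up for this page, and it assumes the digits 1 to 6 from the code table, with 0 appearing only as padding at the end.

```python
import re

# One capital letter, then three digits.
# Assumption: digits are 1-6, and 0 is used only as trailing padding.
CENSUS_CODE = re.compile(r"[A-Z](?:[1-6]{3}|[1-6]{2}0|[1-6]00|000)")

def is_census_code(code: str) -> bool:
    """Return True if `code` looks like a 4-position census Soundex code."""
    return CENSUS_CODE.fullmatch(code) is not None

print(is_census_code("A352"))  # True
print(is_census_code("W45"))   # False, only three positions
print(is_census_code("D040"))  # False, a zero cannot come before another digit
```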
How does it work?
The name is converted into the code using the following rules, and all records associated with that code are grouped together in the index. The index is usually sorted by the actual surname and given name, if known or readable.
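As a rough picture of how such an index is organized, the sketch below groups a few records by code and sorts each group by surname and then given name. The codes are taken from the examples on this page; the given names and the function name `build_index` are invented for illustration, and a real census index was kept on cards or film, not in a program.

```python
from collections import defaultdict

# (code, surname, given name). The given names are made up for this example.
records = [
    ("D400", "Daly", "Mary"),
    ("S000", "Shea", "Patrick"),
    ("D400", "Daily", "John"),
    ("D400", "Deely", "Ann"),
    ("S000", "Shay", "Ellen"),
]

def build_index(records):
    """Group records by Soundex code, sorting each group by surname, then given name."""
    index = defaultdict(list)
    for code, surname, given in records:
        index[code].append((surname, given))
    for entries in index.values():
        entries.sort()
    return dict(index)

index = build_index(records)
print(index["D400"])
# [('Daily', 'John'), ('Daly', 'Mary'), ('Deely', 'Ann')]
```

A researcher looking for "Daly" would go to the D400 group and find the Daily and Deely entries next to it.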
Rules
- The first letter of the name is moved to the first position in the code
- Notice that the first letter is not changed to a number
- If the name begins with a prefix such as O, Mc, Mac, Le, La, De, Van or Von, it may have been coded with or without the prefix; check both groups to be safe
- Remove the following letters from the rest of the name (the first letter is always kept): a, e, i, o, u, y, w, h
- Change all remaining letters to numbers using the table below
- If two or more letters that sit side by side in the original name code to the same number, only code the first of them
- When done coding the name, keep only the first three numbers; if there are fewer than three, add zeros (0) to create a 3-digit number
Code table
| Code | Letters |
|---|---|
| 1 | B, P, F, V |
| 2 | C, S, K, G, J, Q, X, Z |
| 3 | D, T |
| 4 | L |
| 5 | M, N |
| 6 | R |
Examples
- Adams = A352
- Daily = D400 - only "D" and "L" are coded
- Daly = D400 - small spelling variations are grouped together using the same code
- Deely = D400
- Hendershot = H536
- Hennescheidt = H523
- O'Shea = O200 - O, Mc, Mac, Le, De, Van, and other prefixes were sometimes coded and sometimes only the root name was coded
- Shay = S000 - only "S" is kept; "H", "A" and "Y" are removed, so the code is padded with zeros
- Shea = S000
- Williams = W452 - the second "L" is ignored
- Williamsons = W452 - same as "Williams" because the census only codes 4 positions
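The rules and code table can be tried out with a short program. This is a minimal sketch, not an official implementation, and it makes a few assumptions that the rule list does not spell out: repeated codes are merged only when the letters are next to each other in the original name or separated only by H or W (the convention described by the U.S. National Archives), the first letter's own code counts when merging, and a prefix is recognized only when it is followed by a capital letter, so mixed-case input is expected. The names `soundex`, `strip_prefix` and `search_codes` are made up for this page.

```python
import re

# Letter-to-digit table from the "Code table" section.
CODES = {}
for digit, letters in {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit

def soundex(name: str) -> str:
    """Return the 4-position census code for a surname ('' if it has no letters)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")      # assumption: the first letter's code counts for merging
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:         # only the first of a run of equal codes is kept
                digits.append(code)
            prev = code
        elif ch not in "HW":         # vowels and Y end a run; H and W do not (assumption)
            prev = ""
    return (first + "".join(digits) + "000")[:4]   # pad with zeros, keep 4 positions

# Prefixes named in this article. Assumption: a prefix only counts when a
# capital letter follows it, so "Lewis" and "Olson" are left alone.
PREFIX = re.compile(r"^(?:O|Mac|Mc|Le|La|De|Von|Van)[' ]?(?=[A-Z])")

def strip_prefix(name: str) -> str:
    """Return the root name with a leading prefix removed, if one is present."""
    return PREFIX.sub("", name, count=1)

def search_codes(name: str) -> list:
    """Return the code(s) worth checking: with the prefix and, if different, without it."""
    codes = [soundex(name)]
    root_code = soundex(strip_prefix(name))
    if root_code and root_code not in codes:
        codes.append(root_code)
    return codes

print(soundex("Hennescheidt"))   # H523
print(soundex("Williamsons"))    # W452
print(search_codes("O'Shea"))    # ['O200', 'S000']
```

The prefix check is deliberately cautious and will miss names written entirely in capitals, so it is a starting point for a search rather than a complete answer.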
Limitations
Soundex works well for names when pronounced in English. Remember that this is a phonetic system for finding records; that means the index is all about how the word sounds, not how it is really spelled. Asian names, North American Indian names, and names from other cultures whose spelling does not match the way English uses the letters can be hard to locate.
Luckily, Soundex was usually only used for the census. Other records used different ways to index and find records.
