Sorry about the terminology; it's hard to break from it when you've used it for a while. The motif is a subsequence, and it may be found several times in a localized region. More often than not, you'll find one or two instances near one another, with other instances very far away (including on other chromosomes). So you are typically comparing a set of strings drawn from various parts of the genome. In that sense, it bears more similarity to multiple sequence alignment.
The source material is {ATCG} only; the format you are mentioning is what the OP was requesting as output, also known as IUPAC format. Some people encode the motif in terms of bits of information to produce a motif logo, which shows the conservation at each position. Motifs typically range from 4 to 20 or so bases (characters) in length, with some positions in the motif substring being conserved more often than others (e.g., if the base at the third position of the motif isn't an A, the protein doesn't bind). The repetitions will be of the same or similar length, yes. As for where to start looking, regions of high evolutionary conservation and protein binding sites (via ChIP-Sequencing data) are common ways to narrow down where to look.
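To make the IUPAC output and the "bits of information" idea concrete, here is a minimal sketch in Python. It assumes you already have motif instances of equal length, aligned without gaps. The function names and the `min_fraction` cutoff are my own choices for illustration, not a standard API, and the information content is the plain 2 minus Shannon entropy per position, with no small-sample correction and a uniform background assumed.

```python
import math
from collections import Counter

# IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def column_counts(instances):
    """Count bases at each position of equal-length, pre-aligned instances."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    return [Counter(column) for column in zip(*instances)]


def iupac_consensus(instances, min_fraction=0.25):
    """Collapse aligned instances into one IUPAC string.

    A base is kept at a position if it appears in at least `min_fraction`
    of the instances (an arbitrary cutoff chosen for this sketch).
    """
    n = len(instances)
    codes = []
    for counts in column_counts(instances):
        kept = frozenset(b for b, c in counts.items() if c / n >= min_fraction)
        # Fall back to N if nothing passes the cutoff or a non-ACGT base appears.
        codes.append(IUPAC.get(kept, "N"))
    return "".join(codes)


def information_content(instances):
    """Per-position information in bits: 2 - Shannon entropy of the column."""
    n = len(instances)
    bits = []
    for counts in column_counts(instances):
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits
```

With four made-up sites, `iupac_consensus(["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"])` gives `TGASTMA`, and `information_content` returns 2 bits for the fully conserved positions and 1 bit for the position split evenly between C and G. Those per-position bit values are what set the letter heights in a motif logo.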
As an aside, not all repetitive sequence is informative in the same way. There is plenty of repetitive sequence in the genome that has functions outside of protein binding (and the term "repetitive sequence" has a different meaning in genomics than the one you might ascribe to it). Tools like RepeatMasker are used to identify these regions, and databases of these sequences exist that you could use to determine whether an enriched sequence is informative. Simple repeats like ATATATATTATTATATATATATAT aren't as likely to be a protein binding site, for instance.
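As a rough illustration of screening out simple repeats, here is a crude low-complexity check in Python. To be clear, this is not how RepeatMasker works (it compares against libraries of known repeats); it is just a sketch that measures the Shannon entropy of overlapping k-mers. The function names and the `max_bits` threshold are assumptions of mine and would need tuning on real data.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=2):
    """Shannon entropy (in bits) of the overlapping k-mers in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_like_simple_repeat(seq, k=2, max_bits=2.0):
    """Flag sequences whose k-mer entropy falls below an arbitrary threshold."""
    return kmer_entropy(seq, k) < max_bits
```

The AT-rich example above contains only three distinct dinucleotides (AT, TA, TT), so its dinucleotide entropy cannot exceed log2(3), about 1.58 bits, and it gets flagged. A sequence that uses many different dinucleotides scores higher and passes.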