Processing of Data: Editing, Coding, Classification

Processing of Data refers to the systematic procedure of converting raw data into meaningful and useful information. After data collection, the information is often unorganized and difficult to understand. Through data processing, it is edited, classified, coded, and tabulated to make it clear and structured. This process helps in removing errors, improving accuracy, and preparing data for analysis. Proper data processing is important for drawing valid conclusions and making effective decisions. It ensures that the research results are reliable and easy to interpret. In business research, data processing plays a key role in transforming facts into valuable insights for managers.

Editing

Editing is the first step in data processing after collection. It involves reviewing raw data to detect errors, omissions, inconsistencies, and illegible responses. The purpose is to ensure that data are accurate, complete, consistent, and ready for coding and analysis. In Indian business research, editing can be field editing (done by the interviewer immediately after interview while memory is fresh) or central editing (done at a central office by a dedicated editor). Common errors include missing answers, out of range responses (e.g., age 200), contradictory answers (e.g., age 25 but years of experience 30), and illegible handwriting. Editing improves data quality before coding. Always document editing decisions and keep original raw data for audit.

Types of Editing:

1. Field Editing

Field editing is performed by the interviewer immediately after completing each interview, while the respondent is still available or the situation is fresh in memory. The interviewer reviews the questionnaire for obvious omissions, illegible entries, or inconsistent answers. If a question is left blank, the interviewer may politely ask the respondent to provide the missing information. In Indian business research, field editing reduces follow up costs and improves accuracy. However, interviewers must avoid changing answers based on their own assumptions. Field editing is the first line of defense against data errors and should be done systematically, not casually.

2. Central Editing

Central editing is performed at a central office or data processing center by a dedicated editor who was not involved in data collection. All questionnaires from all interviewers are reviewed collectively for completeness, consistency, and accuracy. The central editor checks for out of range responses, logical contradictions, illegible handwriting, and pattern responses (e.g., all answers the same). Problematic questionnaires may be returned to the field for clarification or assigned missing value codes. In Indian business research, central editing ensures uniform standards across all interviewers. It is more objective than field editing because the editor has no personal involvement with respondents.

3. In-House Editing

In house editing is conducted within the researcher’s own organization using permanent staff. The editor follows a standardized edit manual that specifies rules for handling common problems. In house editing is efficient because editors are familiar with the research objectives and coding schemes. They can quickly identify unusual responses and apply consistent correction rules. In Indian business research, in house editing is common for large scale surveys conducted by market research firms or academic institutions. Advantages include quality control and faster turnaround. Disadvantages include potential bias if editors become too familiar with expected results. Regular audits prevent drift from edit specifications.

4. Field Return Editing

Field return editing occurs when a questionnaire is returned from the field to the central office, and the editor identifies problems that cannot be resolved without going back to the respondent. The questionnaire is marked with specific queries and returned to the same interviewer, who then revisits or recontacts the respondent for clarification. In Indian business research, field return editing is time consuming and expensive but necessary when data quality is critical. It is most feasible when respondents are easily reachable (e.g., employees, regular customers). For geographically dispersed or hard to reach populations, field return editing may be impractical. Use this method sparingly for high priority missing or inconsistent data.

5. Online Editing

Online editing occurs in real time during digital data collection. Web based surveys and electronic forms can be programmed with validation rules that prevent or flag errors as the respondent types. For example, if a respondent enters age 200, the system shows an error message and refuses to proceed. Skip patterns are automated. In Indian business research, online editing dramatically reduces post collection editing effort. It also improves data quality by catching errors at the source. However, online editing cannot correct for logical contradictions across questions that the respondent answers sincerely but inconsistently. Despite this limitation, online editing is highly recommended for all digital data collection.
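A validation rule of the kind described above can be sketched in a few lines. This is a minimal illustration, not any survey platform's actual API; the field name `age` and the 18–100 limits are assumptions taken from the examples in this article.

```python
# Sketch of an online-style validation rule applied as the respondent types.
# The field name and limits are illustrative assumptions.
def validate_age(response):
    """Return an error message if the age entry is invalid, else None."""
    age = response.get("age")
    if age is None:
        return "Age is required."
    if not 18 <= age <= 100:
        return "Please enter an age between 18 and 100."
    return None  # entry passes; the survey may proceed

print(validate_age({"age": 200}))  # flagged at the source
print(validate_age({"age": 35}))   # passes
```

A real web survey would attach such a rule to the input field so the respondent cannot proceed until the entry is corrected.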

Process of Editing:

1. Receiving and Logging Data

The editing process begins when completed questionnaires or data files are received from the field. Each questionnaire is logged with a unique identifier, date received, interviewer name, and batch number. This creates an audit trail. In Indian business research, logging prevents loss of questionnaires and allows tracking of non response. For online surveys, logging is automatic with timestamps. For paper surveys, maintain a receipt log. Any missing batches are flagged for follow up. The log also records the number of questionnaires received versus expected. Discrepancies trigger investigation. Proper logging ensures accountability and provides a count of usable cases before detailed editing begins. Never skip logging; lost data cannot be edited.

2. Preliminary Screening

The editor conducts a quick preliminary screening of each questionnaire to identify major problems that make the questionnaire unusable. These include: entire sections blank, obvious respondent misunderstanding (e.g., circling two answers for every question), or evidence of cheating (e.g., identical handwriting across multiple questionnaires). In Indian business research, screening removes fraudulent or completely unusable cases before detailed editing. This saves time on cases that will be discarded anyway. Establish clear criteria for rejection before screening. For example, “More than 50 percent of questions unanswered = reject.” Document all rejections with reasons. Preliminary screening is a gatekeeping step, not a substitute for detailed editing.

3. Checking Completeness

The editor verifies that every question intended for the respondent has been answered. For paper questionnaires, this involves visually scanning each page. For online surveys, completeness checks are often automated. In Indian business research, pay special attention to skip patterns: if a respondent was instructed to skip to question 15, question 14 should be blank; if they answered question 14 but should have skipped, flag as error. Missing answers are recorded using missing value codes (e.g., 99). For critical questions, attempt to retrieve missing data through follow up contact if feasible. For non critical items, assign missing codes. Document the proportion of missing data per variable. High missing rates suggest poor question design or interviewer error.
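The skip-pattern logic above can be sketched as a small check. The question numbers and the 99 missing value code follow the text; the skip rule itself (an answer of 1 to q13 means skip q14) is an illustrative assumption.

```python
# Minimal sketch of a skip-pattern completeness check.
MISSING = 99  # missing value code from the text

def check_skip(record):
    """Flag skip-pattern violations; assign the missing code where needed."""
    errors = []
    if record.get("q13") == 1:                  # respondent should skip q14
        if record.get("q14") is not None:
            errors.append("q14 answered but should have been skipped")
    elif record.get("q14") is None:             # q14 was required but blank
        errors.append("q14 missing; coded 99")
        record["q14"] = MISSING
    return errors

print(check_skip({"q13": 1, "q14": 3}))  # flags the wrongly answered q14
```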

4. Checking Consistency

The editor examines responses for logical contradictions. Consistency checking ensures that answers to different questions do not conflict with each other. For example, a respondent who reports age 22 and years of education 20 is inconsistent (20 years of education would mean starting formal schooling at around age 2, which is implausible). Another example: “Never use ecommerce” but later “Rate satisfaction with last ecommerce purchase.” In Indian business research, consistency checks reveal respondent carelessness, misunderstanding, or interviewer error. Create a consistency matrix showing which variable pairs should align. Contradictions may be resolved by returning to the respondent, using the most reliable answer based on context, or assigning missing codes. Document all consistency resolutions. Inconsistent data cannot be analyzed meaningfully without correction or exclusion.
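One entry from such a consistency matrix can be expressed as a rule. This is a hedged sketch: the assumption that formal schooling starts around age 5 is illustrative, not a fixed standard.

```python
# One consistency rule: reported education must fit within the reported age.
# The school starting age of 5 is an illustrative assumption.
def consistent_age_education(age, years_education, school_start_age=5):
    """True if the reported education could fit within the reported age."""
    return age - years_education >= school_start_age

print(consistent_age_education(22, 20))  # False: the contradiction above
print(consistent_age_education(40, 16))  # True
```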

5. Checking Accuracy

The editor verifies that responses fall within permissible ranges and are plausible. Accuracy checks include: numerical values within specified limits (e.g., age 18 to 100, not 200), codes matching the codebook (e.g., gender only 1 or 2, not 3), and dates logical (e.g., purchase date not in the future). In Indian business research, out of range responses may indicate data entry errors (e.g., transposed digits such as typing 81 instead of 18 for age) or respondent misunderstanding. For paper surveys, check for illegible handwriting that could be misread during data entry. Flag any questionable entries. If the correct value cannot be determined, assign a missing code. Accuracy checks are especially critical for variables that will be used in statistical testing. One outlier can distort results.
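The range and codebook checks above can be collected into a small rule table. The age limits and gender codes come from the examples in the text; the rule table structure itself is an illustrative assumption.

```python
# Illustrative accuracy rules: each field maps to a permissible-range test.
RULES = {
    "age":    lambda v: 18 <= v <= 100,   # limits from the text
    "gender": lambda v: v in (1, 2),      # codebook codes from the text
}

def accuracy_errors(record):
    """Return the fields whose values fall outside the permissible range."""
    return [field for field, ok in RULES.items()
            if field in record and not ok(record[field])]

print(accuracy_errors({"age": 200, "gender": 1}))  # ['age']
```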

6. Handling Non Response

The editor systematically addresses missing data. First, distinguish between legitimate skips (respondent correctly followed skip instructions) and item non response (respondent failed to answer when they should have). Legitimate skips are not errors; they receive special codes (e.g., 98 for Not Applicable). True item non response receives missing codes (e.g., 99). In Indian business research, if non response exceeds 10 percent for any variable, investigate causes. For critical variables, attempt follow up contact. For analysis, decide on treatment: listwise deletion (remove cases with any missing), pairwise deletion (use available data per analysis), or imputation (estimate missing values). Document the proportion of missing data and the chosen treatment. High non response rates threaten validity regardless of editing.
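The distinction between legitimate skips and true item non-response can be sketched as a small calculation. The 98 and 99 codes follow the text; the example values are illustrative.

```python
# Sketch: legitimate skips (98) are excluded before computing the
# item non-response rate; 99 marks true missing answers (codes per the text).
NOT_APPLICABLE, MISSING = 98, 99

def missing_rate(values):
    """Proportion of true item non-response among eligible respondents."""
    eligible = [v for v in values if v != NOT_APPLICABLE]
    if not eligible:
        return 0.0
    return sum(1 for v in eligible if v == MISSING) / len(eligible)

rate = missing_rate([1, 99, 98, 2, 99, 3])  # 2 missing out of 5 eligible
print(rate)  # 0.4
```

A rate above the 10 percent threshold mentioned above would trigger an investigation of causes.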

7. Recording Editing Decisions

Every editing decision must be documented in an edit log. The log records: questionnaire ID, question number or variable name, original response (if any), problem identified, action taken (e.g., corrected, assigned missing code, discarded), editor initials, and date. In Indian business research, the edit log serves as an audit trail for quality control and replication. It allows another researcher to verify editing decisions and assess their impact. Without a log, editing is invisible and cannot be defended during thesis viva voce or journal review. Maintain the log separately from the raw data. Keep original questionnaires unchanged; record edits in a separate column or file. Transparency is the hallmark of professional data processing.

8. Final Review and Approval

After all editing is complete, a senior editor or research supervisor conducts a final review of a random sample of edited questionnaires (typically 10 to 20 percent). The final review verifies that editing rules were applied consistently and correctly. Any systematic errors discovered require re editing of the entire batch. In Indian business research, final approval is documented before data entry begins. The approval signifies that data are ready for coding and entry. For large surveys, conduct interim final reviews at regular intervals, not only at the end. Final review catches mistakes that individual editors may miss. It also provides feedback for improving future editing processes. Approved data sets are locked; no further changes without justification.

Coding

Coding is the process of assigning numerical or symbolic codes to raw responses so they can be entered into a database and analyzed statistically. Raw data from questionnaires are often in text form (e.g., “Male,” “Female”) or open ended answers. Coding transforms these into numbers (e.g., 1 for Male, 2 for Female) that statistical software like SPSS or Excel can process. The purpose is to reduce large volumes of text into compact, analyzable categories without losing meaning. In Indian business research, coding also applies to Likert scale responses (Strongly Agree = 5, Strongly Disagree = 1) and open ended questions (e.g., coding “Delivery was late” under category “Delivery Issues”). Good coding preserves information while enabling analysis.
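The mappings described above (Male = 1, Female = 2; Strongly Agree = 5 down to Strongly Disagree = 1) can be sketched as simple lookup tables; the field names are illustrative.

```python
# Minimal coding sketch: text responses mapped to the numeric codes
# described in the text. Field names are illustrative assumptions.
GENDER_CODES = {"Male": 1, "Female": 2}
LIKERT_CODES = {"Strongly Disagree": 1, "Disagree": 2, "Neutral": 3,
                "Agree": 4, "Strongly Agree": 5}

raw = {"gender": "Female", "q1": "Strongly Agree"}
coded = {"gender": GENDER_CODES[raw["gender"]],
         "q1": LIKERT_CODES[raw["q1"]]}
print(coded)  # {'gender': 2, 'q1': 5}
```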

Purpose of Coding:

1. Data Reduction

Coding reduces large volumes of raw text responses into manageable categories. Open ended answers, interview narratives, or observation notes may contain thousands of words. Coding condenses these into a limited set of numerical categories (e.g., 10 complaint types). In Indian business research, a survey asking “Why do you prefer this ecommerce platform?” might generate 500 different sentences. Coding reduces these to categories like “Price” (code 1), “Delivery Speed” (code 2), “Product Quality” (code 3). This reduction makes analysis feasible. Without coding, each response is unique and cannot be summarized statistically. Data reduction preserves essential information while discarding irrelevant variation. It transforms qualitative richness into quantitative analyzability.

2. Enabling Statistical Analysis

Raw text responses cannot be entered directly into statistical software like SPSS, R, or Excel. Statistical analysis requires numbers. Coding converts text responses into numerical values that can be counted, averaged, correlated, and tested for significance. For example, “Strongly Agree” becomes 5, “Agree” becomes 4, and so on. In Indian business research, coding enables frequency distributions (e.g., 40 percent chose code 1), chi square tests, t tests, regression, and factor analysis. Without coding, only qualitative analysis (thematic, narrative) is possible. Coding is the bridge between raw data and statistical inference. It allows researchers to move from individual responses to population level patterns. Every quantitative analysis depends on prior coding.

3. Standardizing Data Across Respondents

Different respondents may express similar meanings using different words. Coding standardizes these variations into common categories. For example, for occupation, respondents may write “businessman,” “self employed,” “entrepreneur,” or “owns shop.” Coding assigns all these to a single code (e.g., code 3 = Self Employed). In Indian business research, standardization is essential because language diversity (Hindi, Tamil, Bengali, English) produces many wordings for the same concept. Without standardization, each unique wording appears as a separate category, making analysis impossible. Standardization also enables comparison across subgroups (e.g., urban vs rural) because the same codes apply to all respondents. A good coding scheme captures meaning while ignoring superficial wording differences.

4. Facilitating Data Entry

Coded data are easier and faster to enter into databases than raw text. Data entry operators can quickly type numerical codes (e.g., 1, 2, 3) instead of full text responses (e.g., “Very satisfied with delivery”). This reduces entry time, keystrokes, and transcription errors. In Indian business research, large surveys with thousands of respondents benefit enormously from precoding. Numerical codes also occupy less storage space. For paper questionnaires, precoded response options (numbers printed next to answers) allow direct entry without interpretation. For online surveys, coding is automatic. Coding transforms the data entry task from skilled (requiring interpretation) to mechanical (simple typing). Faster, cheaper, more accurate data entry is a major practical purpose of coding.

5. Enabling Data Retrieval and Replication

Coded data stored in numerical format can be easily retrieved, sorted, filtered, and reanalyzed by different researchers. A codebook documents exactly what each code means. Another researcher can later retrieve your data, understand the codes, and replicate your analysis or perform new analyses. In Indian business research, replication is essential for scientific credibility. If you share raw text responses, the next researcher must recode everything from scratch, introducing new subjectivity. Coded data with a codebook preserve your coding decisions permanently. Data archives (e.g., ICSSR data archive) require coded data. Coding ensures that your research investment continues to benefit future researchers. It transforms ephemeral responses into permanent, shareable knowledge.

6. Reducing Subjectivity in Interpretation

Coding forces the researcher to make explicit, consistent rules for categorizing responses. Without coding, interpretation is impressionistic and varies with each reading. A coding scheme with clear category definitions and examples reduces individual judgment. Multiple coders applying the same scheme should achieve high agreement (inter coder reliability above 0.80). In Indian business research, coding reduces the risk that the researcher unconsciously favors responses that support their hypotheses. It imposes discipline. While some subjectivity remains in creating categories, once the scheme is fixed, coding is rule based. This objectivity is the foundation of scientific analysis. Coding transforms vague impressions into verifiable measurements. Without coding, research is opinion; with coding, it becomes evidence.
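The inter-coder reliability mentioned above is commonly measured with Cohen's kappa, which corrects raw agreement for chance. A pure-Python sketch, with the 0.80 threshold taken from the text and the example codings invented for illustration:

```python
# Sketch of Cohen's kappa for two coders applying the same scheme.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two lists of codes."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = [1, 2, 3, 1, 2, 3, 1, 2]  # illustrative codings by coder A
b = [1, 2, 3, 1, 2, 3, 1, 3]  # coder B disagrees on one item
print(round(cohens_kappa(a, b), 2))  # 0.81, above the 0.80 threshold
```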

7. Enabling Computerized Analysis

Modern data analysis relies on computer software. Computers cannot understand natural language text directly without complex natural language processing. Coding converts human language into numbers that computers process easily. In Indian business research, coded data enable the use of powerful tools: pivot tables in Excel, descriptive statistics in SPSS, regression in R, and structural equation modeling in AMOS. These tools produce results in seconds that would take weeks manually. Coding also enables automated cross tabulation, chart generation, and significance testing. Without coding, researchers would manually count frequencies and calculate percentages, introducing errors and limiting complexity. Coding unlocks the full power of computational analysis. It is the essential translation layer between human responses and machine processing.

8. Preserving Anonymity

Coding can separate identifying information from research data. Respondents’ names, addresses, or contact numbers receive one code, while their answers receive another code. The linking code is kept separately and securely. In Indian business research, this protects participant privacy. For example, a codebook might show that Respondent A (code 4321) gave certain answers, but without the master list, the researcher cannot identify who Respondent A is. For published datasets, all identifying information is removed, and only anonymous codes remain. Coding thus serves an ethical purpose: enabling data sharing and secondary analysis without violating confidentiality. Proper coding with anonymization is required by Indian data protection laws (DPDP Act 2023). Ethics and coding are connected.

9. Facilitating Data Cleaning

Coded data make it easier to detect errors, outliers, and inconsistent responses. Numerical codes have expected ranges (e.g., gender codes 1 and 2 only). Any value outside this range (e.g., 5) is immediately visible as an error. Raw text responses hide errors because any text is plausible. In Indian business research, after coding, run frequency tables to check for illegal codes. For example, a 5 point Likert scale should show only codes 1,2,3,4,5. Code 9 (missing) may appear but should be documented. Outliers in continuous variables (e.g., age code 999) are obvious. Coding transforms error detection from subjective judgment to mechanical rule checking. Clean data require clean codes. Without coding, many errors remain undetected until analysis fails.
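The frequency-table check described above can be sketched directly: any code outside the legal set for a 5-point Likert item (plus 9 for missing, as in the text) is mechanically flagged. The example responses are invented for illustration.

```python
# Sketch of an illegal-code check via a frequency table.
from collections import Counter

LEGAL = {1, 2, 3, 4, 5, 9}           # 5-point Likert codes plus 9 (missing)
responses = [1, 2, 5, 9, 3, 7, 2]    # 7 is an illegal code

freq = Counter(responses)            # the frequency table
illegal = sorted(c for c in freq if c not in LEGAL)
print(illegal)  # [7]
```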

10. Enabling Longitudinal and Cross Study Comparison

When the same coding scheme is used across multiple time points or different studies, data can be compared directly. For example, if customer complaint categories are coded identically in 2024 and 2025, the researcher can test whether complaint patterns changed. In Indian business research, standardized coding schemes (e.g., National Industrial Classification, Standard Occupational Classification) enable comparison across different researchers and institutions. Without consistent coding, each study is an island. Coding creates a common language. For longitudinal studies (same respondents over time), coding ensures that responses at Time 1 and Time 2 are categorized identically. Cross study comparison builds cumulative knowledge. Coding transforms isolated findings into an accumulating evidence base for business theory and practice.

Types of Coding:

1. Precoding

Precoding assigns numerical codes to response options before data collection begins. During questionnaire design, each answer choice is printed with a code number next to it. For example, “Gender: Male (1) Female (2).” Respondents or interviewers may not even notice the codes. In Indian business research, precoding is used for all closed ended questions (multiple choice, Likert scales, yes/no). Advantages include faster data entry, reduced errors, and immediate readiness for analysis. No post collection coding decisions are needed. Precoding requires that the researcher anticipates all possible responses accurately. If an unexpected response occurs, it cannot be precoded. Therefore, always include an “Other (please specify)” option with a code for unanticipated answers.

2. Post Coding

Post coding assigns codes to responses after data collection, typically for open ended questions where responses cannot be anticipated. The researcher reads a sample of responses, identifies common themes, creates categories, assigns codes, and then codes all responses. For example, “Why did you choose this ecommerce platform?” may generate answers like “price,” “cheap,” “affordable,” “good deal” all coded as “Price Consciousness (code 1).” In Indian business research, post coding is time consuming and requires trained coders. Inter coder reliability must be checked (above 0.80). Post coding preserves richness but introduces subjectivity. Always keep raw text responses alongside codes for verification. Post coding is necessary when exploring new topics or when response options cannot be predetermined.
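A first pass of the post-coding step above might look like the sketch below. The synonym list is an assumption a real coder would build manually from a sample of responses, and code 1 follows the “Price Consciousness” example in the text.

```python
# Illustrative post-coding pass: synonyms mapped to one category code.
# The synonym list is an assumption built from a first reading of responses.
PRICE_TERMS = {"price", "cheap", "affordable", "good deal", "low cost"}

def post_code(answer):
    """Return 1 for Price Consciousness, 0 for manual review."""
    text = answer.strip().lower()
    if any(term in text for term in PRICE_TERMS):
        return 1
    return 0

codes = [post_code(a) for a in ["Cheap!", "Affordable prices", "Fast delivery"]]
print(codes)  # [1, 1, 0]
```

In practice the raw text is kept alongside the assigned codes, as the text recommends, so a second coder can verify the assignments.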

3. Hierarchical Coding

Hierarchical coding organizes codes into a tree structure with broader categories at higher levels and narrower subcategories at lower levels. For example, level 1: “Product Issue (code 100)”; level 2: “Quality Issue (110)” and “Size Issue (120)”; level 3 under Quality: “Damaged (111),” “Defective (112),” “Wrong Color (113).” In Indian business research, hierarchical coding is used for complex open ended questions, content analysis, and qualitative data. Advantages include flexibility to analyze at different levels of detail. A researcher can report broad patterns (level 1) or specific complaints (level 3). Hierarchical coding requires careful planning and a detailed codebook. It is more time consuming than flat coding but provides richer analytical options. Use when response diversity is high.
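The three-level scheme above can be sketched with a simple numeric convention, assuming the hundreds digit marks level 1 and the tens digit marks level 2 (as in the codes 100, 110, 111 from the example).

```python
# Sketch of the hierarchical scheme: hundreds = level 1, tens = level 2.
LEVEL3 = {111: "Damaged", 112: "Defective", 113: "Wrong Color"}
LEVEL2 = {110: "Quality Issue", 120: "Size Issue"}
LEVEL1 = {100: "Product Issue"}

def roll_up(code, level):
    """Collapse a detailed code to a broader level of the hierarchy."""
    if level == 1:
        return code // 100 * 100   # e.g. 112 -> 100 (Product Issue)
    if level == 2:
        return code // 10 * 10     # e.g. 112 -> 110 (Quality Issue)
    return code

print(roll_up(112, 1), roll_up(112, 2))  # 100 110
```

This is what lets the researcher report broad patterns at level 1 or specific complaints at level 3 from the same coded data.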

4. Flat Coding

Flat coding assigns codes to categories without any hierarchical structure. All categories are at the same level. For example, “Complaint Type: Price (1), Delivery (2), Quality (3), Packaging (4), Customer Service (5).” No category contains subcategories. In Indian business research, flat coding is simpler and faster than hierarchical coding. It is appropriate when categories are mutually exclusive and no natural hierarchy exists. Flat coding works well for closed ended questions with limited response options (5 to 15 categories). Disadvantages include inability to analyze at different levels of detail. If a category becomes too broad (e.g., “Delivery” includes both late delivery and rude courier), important distinctions are lost. Use flat coding for simple classification tasks where depth is not required.

5. Numerical Coding

Numerical coding assigns numbers to response categories where the numbers have quantitative meaning. The codes preserve order and distance. For example, Likert scale: Strongly Disagree (1), Disagree (2), Neutral (3), Agree (4), Strongly Agree (5). The number 5 represents a higher level of agreement than 3. In Indian business research, numerical coding is used for interval and ratio scale data. Advantages include allowing full parametric statistical analysis (means, standard deviations, t tests, regression). The numeric values can be directly entered into SPSS or Excel. However, numerical coding assumes that the distance between codes is meaningful (e.g., the gap between 1 and 2 equals the gap between 4 and 5). This assumption is controversial for Likert scales but widely accepted in practice.

6. Categorical Coding

Categorical coding assigns numbers or symbols to categories where the numbers have no quantitative meaning. The codes are simply labels or identifiers. For example, “Region: North (1), South (2), East (3), West (4).” The number 4 does not mean “more” than 1. In Indian business research, categorical coding is used for nominal scale variables like gender, religion, occupation, city, or brand preference. Any numerical codes are arbitrary; 1 for Male and 2 for Female could be reversed without changing analysis. Only non parametric statistical tests (chi square, mode, frequency) are appropriate. Means and standard deviations are meaningless for categorical codes. Categorical coding is simple but limited. Never perform arithmetic operations (e.g., average of gender) on categorical codes.

7. Alphabetic Coding

Alphabetic coding uses letters instead of numbers as codes. For example, “Gender: M (Male), F (Female)” or “Region: N (North), S (South), E (East), W (West).” In Indian business research, alphabetic coding is less common than numerical coding because statistical software prefers numbers. However, alphabetic codes are more memorable and intuitive for human readers. They are useful during manual data entry or for small datasets analyzed manually. Alphabetic codes can be converted to numerical codes later for statistical analysis. Disadvantages include limited sorting options (codes sort alphabetically, placing F before M, rather than in any meaningful order) and incompatibility with many statistical packages. Use alphabetic coding for temporary or small scale projects. For large scale research, convert to numerical codes for software compatibility.

8. Alphanumeric Coding

Alphanumeric coding combines letters and numbers in the same code. For example, “A1” for “Price Complaint,” “A2” for “Quality Complaint,” “B1” for “Delivery Complaint.” In Indian business research, alphanumeric coding is useful for hierarchical or complex coding schemes where letters indicate broad categories and numbers indicate subcategories. Advantages include human readability and logical grouping. For example, all A codes are product related; all B codes are service related. Disadvantages include incompatibility with many statistical packages that expect purely numeric variables. Alphanumeric codes require conversion or special handling in SPSS, R, or Excel. Use alphanumeric coding for qualitative data management (e.g., NVivo) rather than quantitative analysis. For statistical analysis, convert to numeric codes with a separate codebook.

9. Dummy Coding

Dummy coding converts categorical variables with k categories into k-1 binary (0/1) variables for use in regression analysis. For example, “Payment Method” has three categories: UPI, Credit Card, Cash on Delivery. Dummy coding creates two variables: Payment_UPI (1 if UPI, else 0) and Payment_Credit (1 if Credit Card, else 0). Cash on Delivery becomes the reference category (coded 0 on both). In Indian business research, dummy coding is essential for including categorical independent variables in regression, ANOVA, and other parametric models. One category must be omitted as reference to avoid perfect multicollinearity. Interpretation compares each category to the reference category. Dummy coding preserves all information from categorical variables while enabling powerful statistical analysis. It is standard practice in business research.
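The payment-method example above can be sketched directly: k = 3 categories become k − 1 binary variables, with Cash on Delivery as the reference category coded 0 on both.

```python
# Sketch of the dummy coding example: 3 categories -> 2 dummy variables.
def dummy_code(payment_method):
    """Return k-1 dummies; Cash on Delivery is the reference (0 on both)."""
    return {
        "Payment_UPI":    1 if payment_method == "UPI" else 0,
        "Payment_Credit": 1 if payment_method == "Credit Card" else 0,
    }

print(dummy_code("UPI"))               # {'Payment_UPI': 1, 'Payment_Credit': 0}
print(dummy_code("Cash on Delivery"))  # {'Payment_UPI': 0, 'Payment_Credit': 0}
```

In a regression, the coefficient on `Payment_UPI` then compares UPI users to the Cash on Delivery reference group.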

10. Effect Coding

Effect coding (also called deviation coding) is an alternative to dummy coding for categorical variables in regression. Instead of comparing each category to a reference category, effect coding compares each category to the overall mean of all categories. Codes are typically -1, 0, and 1. For a variable with three categories, two effect coded variables are created. The mean of the effect codes across categories is zero. In Indian business research, effect coding is less common than dummy coding but useful when no natural reference category exists. Interpretation: a positive coefficient means that category scores above the grand mean; negative means below. Effect coding is preferred in some experimental designs and when all categories are equally meaningful. However, dummy coding remains the industry standard due to simpler interpretation.
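Using the same three payment-method categories as the dummy coding example, effect coding can be sketched as follows; the choice of Cash on Delivery as the category coded −1 is an illustrative assumption.

```python
# Sketch of effect (deviation) coding for three categories.
# The -1 row makes each effect-coded column sum to zero across categories,
# so coefficients compare each category to the grand mean.
EFFECT_CODES = {
    "UPI":              (1, 0),
    "Credit Card":      (0, 1),
    "Cash on Delivery": (-1, -1),  # category coded -1 on both variables
}

cols = list(zip(*EFFECT_CODES.values()))
print([sum(c) for c in cols])  # [0, 0] -- each column sums to zero
```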
